Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 3444
Missing cells (%): 5.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
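
The derived figures in this block can be re-computed from the raw counts. The sketch below is a plausibility check rather than the profiler's own code; it assumes that the missing-cell percentage is taken over all cells (rows times columns) and that 1 KiB equals 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 6
n_observations = 10_000
missing_cells = 3_444
total_size_kib = 566.4

# Share of missing cells over all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")
print(f"{avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures shown in the report (5.7% and 58.0 B).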

Variable types

Text: 3
Categorical: 1
Numeric: 2

Dataset

Description: 관리_폐쇄말소대장_PK, 동명칭, 호_명, 층_구분_코드, 층_번호, 작업_일자
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15749/S/1/datasetView.do

Alerts

층_구분_코드 is highly imbalanced (81.9%) [Imbalance]
동명칭 has 3416 (34.2%) missing values [Missing]
관리_폐쇄말소대장_PK has unique values [Unique]
층_번호 has 651 (6.5%) zeros [Zeros]
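
The report does not say how the imbalance percentage is derived. One common definition is one minus the Shannon entropy of the class frequencies, normalised by the logarithm of the number of classes. The sketch below assumes that definition (the function name `imbalance_score` is illustrative, not part of the report) and applies it to the class counts listed in the 층_구분_코드 section.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes. 0 means perfectly balanced;
    values close to 1 mean that one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0  # convention for this sketch: nothing to compare against
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Class counts of 층_구분_코드 as listed in its variable section.
counts = [9316, 682, 1, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

With these counts the assumed formula gives about 0.819, which agrees with the 81.9% in the alert.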

Reproduction

Analysis started: 2024-05-11 05:37:13.167563
Analysis finished: 2024-05-11 05:37:15.564202
Duration: 2.4 seconds
Software version: ydata-profiling v4.5.1

Variables

관리_폐쇄말소대장_PK
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 15.948
Min length: 10

Characters and Unicode

Total characters: 159480
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
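
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to look up the general category of each character in one value of this variable (`11650-100777865`, the first sample row). It covers categories only; scripts and blocks are not exposed by the standard library, and this is not necessarily how the profiler computes them.

```python
import unicodedata
from collections import Counter

# First sample value of the identifier column, copied from the report.
value = "11650-100777865"

# General category of each character: "Nd" is Decimal Number,
# "Pd" is Dash Punctuation.
categories = Counter(unicodedata.category(ch) for ch in value)

print(sorted(categories))
print(categories["Nd"], categories["Pd"])
```

The value contains only digits and a single hyphen, so it falls into the same two categories that the report lists for the whole column.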

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11650-100777865
2nd row: 11710-101330516
3rd row: 11215-100674522
4th row: 11710-100159170
5th row: 11530-1000000000000001491424
Value | Count | Frequency (%)
11650-100777865 | 1 | < 0.1%
11410-100382287 | 1 | < 0.1%
11530-1000000000000001491061 | 1 | < 0.1%
11470-100124086 | 1 | < 0.1%
11710-100299574 | 1 | < 0.1%
11650-97280 | 1 | < 0.1%
11530-1000000000000001496691 | 1 | < 0.1%
11710-100313482 | 1 | < 0.1%
11710-100298785 | 1 | < 0.1%
11440-100511145 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

ValueCountFrequency (%)
0 44590
28.0%
1 41898
26.3%
- 10000
 
6.3%
5 9469
 
5.9%
3 9394
 
5.9%
7 9089
 
5.7%
2 8046
 
5.0%
4 7809
 
4.9%
6 7349
 
4.6%
8 6033
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 149480
93.7%
Dash Punctuation 10000
 
6.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 44590
29.8%
1 41898
28.0%
5 9469
 
6.3%
3 9394
 
6.3%
7 9089
 
6.1%
2 8046
 
5.4%
4 7809
 
5.2%
6 7349
 
4.9%
8 6033
 
4.0%
9 5803
 
3.9%
Dash Punctuation
ValueCountFrequency (%)
- 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 159480
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 44590
28.0%
1 41898
26.3%
- 10000
 
6.3%
5 9469
 
5.9%
3 9394
 
5.9%
7 9089
 
5.7%
2 8046
 
5.0%
4 7809
 
4.9%
6 7349
 
4.6%
8 6033
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 159480
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 44590
28.0%
1 41898
26.3%
- 10000
 
6.3%
5 9469
 
5.9%
3 9394
 
5.9%
7 9089
 
5.7%
2 8046
 
5.0%
4 7809
 
4.9%
6 7349
 
4.6%
8 6033
 
3.8%

동명칭
Text

MISSING 

Distinct: 560
Distinct (%): 8.5%
Missing: 3416
Missing (%): 34.2%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 4.7531896
Min length: 1

Characters and Unicode

Total characters: 31295
Distinct characters: 273
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
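
The mean length reported for 동명칭 appears to be the total character count divided by the number of non-missing values. The short check below assumes that reading; the report itself does not document the formula.

```python
# Figures copied from the 동명칭 section of the report.
n_observations = 10_000
n_missing = 3_416
total_characters = 31_295

# Assumption: the mean length is taken over non-missing values only.
n_present = n_observations - n_missing
mean_length = total_characters / n_present

print(n_present)
print(f"{mean_length:.4f}")
```

Under this assumption the result agrees with the reported mean length of about 4.753.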

Unique

Unique: 138
Unique (%): 2.1%

Sample

1st row: 509동
2nd row: 더 라움 펜트하우스
3rd row: 가든파이브라이프
4th row: 305동
5th row: 341동
Value | Count | Frequency (%)
가든파이브라이프 | 1029 | 14.4%
101동 | 313 | 4.4%
가든파이브툴 | 210 | 2.9%
102동 | 200 | 2.8%
(value missing in source) | 135 | 1.9%
상가 | 128 | 1.8%
비동 | 112 | 1.6%
103동 | 89 | 1.2%
2동 | 88 | 1.2%
시티 | 84 | 1.2%
Other values (572) | 4736 | 66.5%

Most occurring characters

ValueCountFrequency (%)
4017
 
12.8%
1 3095
 
9.9%
2454
 
7.8%
0 2409
 
7.7%
2 1623
 
5.2%
1498
 
4.8%
1397
 
4.5%
3 1358
 
4.3%
1241
 
4.0%
1239
 
4.0%
Other values (263) 10964
35.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 19417
62.0%
Decimal Number 11151
35.6%
Space Separator 540
 
1.7%
Uppercase Letter 99
 
0.3%
Open Punctuation 28
 
0.1%
Close Punctuation 28
 
0.1%
Dash Punctuation 18
 
0.1%
Other Punctuation 10
 
< 0.1%
Lowercase Letter 4
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4017
20.7%
2454
12.6%
1498
 
7.7%
1397
 
7.2%
1241
 
6.4%
1239
 
6.4%
1188
 
6.1%
1065
 
5.5%
227
 
1.2%
210
 
1.1%
Other values (226) 4881
25.1%
Uppercase Letter
ValueCountFrequency (%)
A 37
37.4%
B 20
20.2%
C 10
 
10.1%
D 7
 
7.1%
E 4
 
4.0%
I 3
 
3.0%
H 3
 
3.0%
V 3
 
3.0%
T 3
 
3.0%
J 2
 
2.0%
Other values (5) 7
 
7.1%
Decimal Number
ValueCountFrequency (%)
1 3095
27.8%
0 2409
21.6%
2 1623
14.6%
3 1358
12.2%
4 802
 
7.2%
5 656
 
5.9%
6 386
 
3.5%
7 322
 
2.9%
8 278
 
2.5%
9 222
 
2.0%
Other Punctuation
ValueCountFrequency (%)
* 6
60.0%
, 2
 
20.0%
& 1
 
10.0%
. 1
 
10.0%
Lowercase Letter
ValueCountFrequency (%)
u 1
25.0%
s 1
25.0%
e 1
25.0%
o 1
25.0%
Space Separator
ValueCountFrequency (%)
540
100.0%
Open Punctuation
ValueCountFrequency (%)
( 28
100.0%
Close Punctuation
ValueCountFrequency (%)
) 28
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 18
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 19417
62.0%
Common 11775
37.6%
Latin 103
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4017
20.7%
2454
12.6%
1498
 
7.7%
1397
 
7.2%
1241
 
6.4%
1239
 
6.4%
1188
 
6.1%
1065
 
5.5%
227
 
1.2%
210
 
1.1%
Other values (226) 4881
25.1%
Latin
ValueCountFrequency (%)
A 37
35.9%
B 20
19.4%
C 10
 
9.7%
D 7
 
6.8%
E 4
 
3.9%
I 3
 
2.9%
H 3
 
2.9%
V 3
 
2.9%
T 3
 
2.9%
J 2
 
1.9%
Other values (9) 11
 
10.7%
Common
ValueCountFrequency (%)
1 3095
26.3%
0 2409
20.5%
2 1623
13.8%
3 1358
11.5%
4 802
 
6.8%
5 656
 
5.6%
540
 
4.6%
6 386
 
3.3%
7 322
 
2.7%
8 278
 
2.4%
Other values (8) 306
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
Hangul 19417
62.0%
ASCII 11878
38.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4017
20.7%
2454
12.6%
1498
 
7.7%
1397
 
7.2%
1241
 
6.4%
1239
 
6.4%
1188
 
6.1%
1065
 
5.5%
227
 
1.2%
210
 
1.1%
Other values (226) 4881
25.1%
ASCII
ValueCountFrequency (%)
1 3095
26.1%
0 2409
20.3%
2 1623
13.7%
3 1358
11.4%
4 802
 
6.8%
5 656
 
5.5%
540
 
4.5%
6 386
 
3.2%
7 322
 
2.7%
8 278
 
2.3%
Other values (27) 409
 
3.4%

호_명
Text

Distinct: 3376
Distinct (%): 33.9%
Missing: 28
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 13
Mean length: 4.2095868
Min length: 1

Characters and Unicode

Total characters: 41978
Distinct characters: 92
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2678
Unique (%): 26.9%

Sample

1st row: 102호
2nd row: 1702
3rd row: 2동-501호
4th row: Y-3034
5th row: 204
Value | Count | Frequency (%)
401 | 143 | 1.4%
201 | 134 | 1.3%
301 | 132 | 1.3%
302 | 131 | 1.3%
202 | 130 | 1.3%
101호 | 110 | 1.1%
101 | 103 | 1.0%
202호 | 103 | 1.0%
501 | 101 | 1.0%
201호 | 101 | 1.0%
Other values (3287) | 8905 | 88.2%

Most occurring characters

ValueCountFrequency (%)
0 8773
20.9%
1 7304
17.4%
2 4548
10.8%
3379
 
8.0%
3 3154
 
7.5%
4 2398
 
5.7%
5 1965
 
4.7%
6 1483
 
3.5%
- 1468
 
3.5%
7 1303
 
3.1%
Other values (82) 6203
14.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 33085
78.8%
Other Letter 5256
 
12.5%
Uppercase Letter 1952
 
4.7%
Dash Punctuation 1468
 
3.5%
Space Separator 121
 
0.3%
Close Punctuation 41
 
0.1%
Open Punctuation 41
 
0.1%
Other Punctuation 14
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3379
64.3%
918
 
17.5%
225
 
4.3%
140
 
2.7%
90
 
1.7%
82
 
1.6%
73
 
1.4%
49
 
0.9%
33
 
0.6%
30
 
0.6%
Other values (52) 237
 
4.5%
Uppercase Letter
ValueCountFrequency (%)
B 535
27.4%
T 291
14.9%
Y 274
14.0%
L 266
13.6%
F 239
12.2%
A 117
 
6.0%
E 77
 
3.9%
S 51
 
2.6%
D 47
 
2.4%
C 42
 
2.2%
Other values (4) 13
 
0.7%
Decimal Number
ValueCountFrequency (%)
0 8773
26.5%
1 7304
22.1%
2 4548
13.7%
3 3154
 
9.5%
4 2398
 
7.2%
5 1965
 
5.9%
6 1483
 
4.5%
7 1303
 
3.9%
8 1134
 
3.4%
9 1023
 
3.1%
Other Punctuation
ValueCountFrequency (%)
, 9
64.3%
. 5
35.7%
Dash Punctuation
ValueCountFrequency (%)
- 1468
100.0%
Space Separator
ValueCountFrequency (%)
121
100.0%
Close Punctuation
ValueCountFrequency (%)
) 41
100.0%
Open Punctuation
ValueCountFrequency (%)
( 41
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 34770
82.8%
Hangul 5256
 
12.5%
Latin 1952
 
4.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3379
64.3%
918
 
17.5%
225
 
4.3%
140
 
2.7%
90
 
1.7%
82
 
1.6%
73
 
1.4%
49
 
0.9%
33
 
0.6%
30
 
0.6%
Other values (52) 237
 
4.5%
Common
ValueCountFrequency (%)
0 8773
25.2%
1 7304
21.0%
2 4548
13.1%
3 3154
 
9.1%
4 2398
 
6.9%
5 1965
 
5.7%
6 1483
 
4.3%
- 1468
 
4.2%
7 1303
 
3.7%
8 1134
 
3.3%
Other values (6) 1240
 
3.6%
Latin
ValueCountFrequency (%)
B 535
27.4%
T 291
14.9%
Y 274
14.0%
L 266
13.6%
F 239
12.2%
A 117
 
6.0%
E 77
 
3.9%
S 51
 
2.6%
D 47
 
2.4%
C 42
 
2.2%
Other values (4) 13
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 36722
87.5%
Hangul 5252
 
12.5%
Compat Jamo 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 8773
23.9%
1 7304
19.9%
2 4548
12.4%
3 3154
 
8.6%
4 2398
 
6.5%
5 1965
 
5.4%
6 1483
 
4.0%
- 1468
 
4.0%
7 1303
 
3.5%
8 1134
 
3.1%
Other values (20) 3192
 
8.7%
Hangul
ValueCountFrequency (%)
3379
64.3%
918
 
17.5%
225
 
4.3%
140
 
2.7%
90
 
1.7%
82
 
1.6%
73
 
1.4%
49
 
0.9%
33
 
0.6%
30
 
0.6%
Other values (48) 233
 
4.4%
Compat Jamo
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

층_구분_코드
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
지상 (above ground): 9316
지하 (below ground): 682
옥탑 (rooftop): 1
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0002
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 지상
2nd row: 지상
3rd row: 지상
4th row: 지상
5th row: 지상

Common Values

Value | Count | Frequency (%)
지상 | 9316 | 93.2%
지하 | 682 | 6.8%
옥탑 | 1 | < 0.1%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
지상 | 9316 | 93.2%
지하 | 682 | 6.8%
옥탑 | 1 | < 0.1%
na | 1 | < 0.1%

층_번호
Real number (ℝ)

ZEROS 

Distinct: 44
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.4313
Minimum: 0
Maximum: 49
Zeros: 651
Zeros (%): 6.5%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 8
95-th percentile: 16
Maximum: 49
Range: 49
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.3727616
Coefficient of variation (CV): 0.98922202
Kurtosis: 5.5267689
Mean: 5.4313
Median Absolute Deviation (MAD): 3
Skewness: 1.9371405
Sum: 54313
Variance: 28.866567
Monotonicity: Not monotonic
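
Several of these statistics are simple functions of one another, which makes for a quick consistency check. The sketch below uses only the textbook relationships (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum); it is not taken from the profiler's source.

```python
# Figures copied from the 층_번호 section of the report.
mean = 5.4313
std = 5.3727616
q1, q3 = 2, 8
minimum, maximum = 0, 49

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # spread between the extremes

print(f"CV={cv:.4f} variance={variance:.3f} IQR={iqr} range={value_range}")
```

All four derived values match the figures reported for 층_번호 to the precision shown.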
Histogram with fixed size bins (bins=44)
Value | Count | Frequency (%)
1 | 1745 | 17.4%
2 | 1324 | 13.2%
3 | 1108 | 11.1%
4 | 859 | 8.6%
5 | 722 | 7.2%
0 | 651 | 6.5%
6 | 524 | 5.2%
7 | 468 | 4.7%
9 | 418 | 4.2%
8 | 416 | 4.2%
Other values (34) | 1765 | 17.6%

Value | Count | Frequency (%)
0 | 651 | 6.5%
1 | 1745 | 17.4%
2 | 1324 | 13.2%
3 | 1108 | 11.1%
4 | 859 | 8.6%
5 | 722 | 7.2%
6 | 524 | 5.2%
7 | 468 | 4.7%
8 | 416 | 4.2%
9 | 418 | 4.2%

Value | Count | Frequency (%)
49 | 1 | < 0.1%
44 | 1 | < 0.1%
42 | 1 | < 0.1%
40 | 4 | < 0.1%
39 | 1 | < 0.1%
38 | 2 | < 0.1%
37 | 5 | 0.1%
36 | 6 | 0.1%
35 | 4 | < 0.1%
34 | 1 | < 0.1%

작업_일자
Real number (ℝ)

Distinct: 123
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20225085
Minimum: 20201201
Maximum: 20240327
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20201201
5-th percentile: 20211029
Q1: 20211214
Median: 20231104
Q3: 20231104
95-th percentile: 20231124
Maximum: 20240327
Range: 39126
Interquartile range (IQR): 19890

Descriptive statistics

Standard deviation: 8922.0596
Coefficient of variation (CV): 0.00044113831
Kurtosis: -0.85324708
Mean: 20225085
Median Absolute Deviation (MAD): 20
Skewness: -0.96863241
Sum: 2.0225085 × 10^11
Variance: 79603147
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
20231104 | 3171 | 31.7%
20211029 | 2068 | 20.7%
20231124 | 1092 | 10.9%
20231110 | 778 | 7.8%
20230329 | 467 | 4.7%
20230321 | 415 | 4.2%
20231028 | 407 | 4.1%
20230324 | 164 | 1.6%
20211127 | 106 | 1.1%
20230411 | 73 | 0.7%
Other values (113) | 1259 | 12.6%

Value | Count | Frequency (%)
20201201 | 11 | 0.1%
20201204 | 3 | < 0.1%
20201208 | 6 | 0.1%
20201216 | 36 | 0.4%
20201230 | 33 | 0.3%
20210106 | 8 | 0.1%
20210108 | 2 | < 0.1%
20210119 | 60 | 0.6%
20210126 | 10 | 0.1%
20210130 | 11 | 0.1%

Value | Count | Frequency (%)
20240327 | 1 | < 0.1%
20231124 | 1092 | 10.9%
20231110 | 778 | 7.8%
20231104 | 3171 | 31.7%
20231028 | 407 | 4.1%
20230908 | 15 | 0.1%
20230708 | 2 | < 0.1%
20230602 | 2 | < 0.1%
20230527 | 3 | < 0.1%
20230422 | 35 | 0.4%
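
The values of 작업_일자 look like calendar dates written as YYYYMMDD integers, although the report profiles the column as a real number. If that reading is correct, numeric statistics such as the mean or the range are hard to interpret directly. The sketch below assumes the YYYYMMDD encoding (the helper name `parse_yyyymmdd` is illustrative) and converts the reported minimum and maximum into dates using only the standard library.

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20231104 into a date (2023-11-04)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

# Minimum and maximum of 작업_일자 as reported above.
earliest = parse_yyyymmdd(20201201)
latest = parse_yyyymmdd(20240327)

# Elapsed calendar days, as opposed to the numeric "Range" of 39126.
span_days = (latest - earliest).days
print(earliest, latest, span_days)

# The reported mean, 20225085, is not itself a valid date (month "50").
try:
    parse_yyyymmdd(20225085)
    mean_is_valid_date = True
except ValueError:
    mean_is_valid_date = False
print(mean_is_valid_date)
```

Parsing the column as dates before profiling would give date-aware summaries; whether that suits the intended analysis is a judgment call for the dataset's users.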

Interactions

[Four interaction plots; images not included in this text version.]

Correlations

Variable | 층_구분_코드 | 층_번호 | 작업_일자
층_구분_코드 | 1.000 | 0.263 | 0.207
층_번호 | 0.263 | 1.000 | 0.232
작업_일자 | 0.207 | 0.232 | 1.000

Variable | 층_번호 | 작업_일자 | 층_구분_코드
층_번호 | 1.000 | 0.259 | 0.162
작업_일자 | 0.259 | 1.000 | 0.084
층_구분_코드 | 0.162 | 0.084 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
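
Nullity correlation can be sketched as the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The example below uses a handful of hypothetical toy rows with made-up column names `a`, `b` and `c`; none of it comes from the dataset, and the report's heatmap may be computed differently.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy rows, not taken from the dataset; None marks a missing value.
rows = [
    {"a": "101동", "b": "201", "c": None},
    {"a": None, "b": None, "c": "x"},
    {"a": "102동", "b": "301", "c": None},
    {"a": None, "b": None, "c": "y"},
]

def present(column):
    """1 where the value is present, 0 where it is missing."""
    return [0 if row[column] is None else 1 for row in rows]

same = pearson(present("a"), present("b"))      # always missing together
opposite = pearson(present("a"), present("c"))  # one present, other missing
print(same, opposite)
```

A value near 1 means two columns tend to be missing in the same rows, and a value near -1 means one tends to be present exactly where the other is missing. Columns with no missing values at all have a constant indicator, so the correlation is undefined for them and this sketch would divide by zero.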

Sample

Index | 관리_폐쇄말소대장_PK | 동명칭 | 호_명 | 층_구분_코드 | 층_번호 | 작업_일자
42648 | 11650-100777865 | <NA> | 102호 | 지상 | 1 | 20211029
31041 | 11710-101330516 | 509동 | 1702 | 지상 | 17 | 20211029
57546 | 11215-100674522 | 더 라움 펜트하우스 | 2동-501호 | 지상 | 5 | 20231110
8555 | 11710-100159170 | 가든파이브라이프 | Y-3034 | 지상 | 3 | 20231104
86755 | 11530-1000000000000001491424 | 305동 | 204 | 지상 | 2 | 20230329
51294 | 11650-100864310 | 341동 | 304호 | 지상 | 0 | 20220309
96569 | 11545-61161 | <NA> | 3층 301호 | 지상 | 3 | 20230104
65677 | 11710-101406822 | 1503동 | 803 | 지상 | 8 | 20211029
39530 | 11650-100776270 | <NA> | 301호 | 지상 | 3 | 20211029
92190 | 11650-97586 | 엘동 | 제1의5호 | 지상 | 1 | 20230321

Index | 관리_폐쇄말소대장_PK | 동명칭 | 호_명 | 층_구분_코드 | 층_번호 | 작업_일자
56716 | 11590-100403851 | <NA> | 202 | 지상 | 2 | 20211029
2250 | 11545-100353173 | <NA> | 1006 | 지상 | 10 | 20231104
20695 | 11710-100313448 | 가든파이브라이프 | T-6084 | 지상 | 6 | 20231104
71572 | 11440-100735347 | 102동 | 1403 | 지상 | 14 | 20231104
99216 | 11650-97329 | 102동 | 306호 | 지상 | 3 | 20230411
13520 | 11260-100132900 | <NA> | A(EAST)동603호 | 지상 | 6 | 20231110
16608 | 11710-101527957 | 1208동 | 1801 | 지상 | 18 | 20231104
66921 | 11650-100827557 | 319동 | 202호 | 지상 | 0 | 20211029
4288 | 11545-100310328 | <NA> | 1309 | 지상 | 13 | 20231104
44685 | 11650-100779189 | <NA> | 101호 | 지상 | 1 | 20211029