Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 3825
Missing cells (%): 9.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
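The missing-cell and memory figures can be cross-checked against the per-variable numbers reported further down. The Python sketch below takes the per-column missing counts listed for each variable. It assumes the total size equals the average record size (40.0 B) times the number of rows, which is consistent with the reported 390.6 KiB:

```python
# Missing counts per column, as listed for each variable in this report.
missing_per_column = {
    "관리_건축물대장_PK": 0,
    "동명칭": 3816,
    "호_명": 9,
    "층_구분_코드": 0,
}
n_rows = 10_000
n_cols = len(missing_per_column)

# Missing cells are counted over the whole rows x columns grid.
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# Assumption: total bytes = average record size x number of rows.
avg_record_bytes = 40.0
total_bytes = avg_record_bytes * n_rows
total_kib = total_bytes / 1024  # 1 KiB = 1024 bytes

print(f"Missing cells: {missing_cells} ({missing_pct:.1f}%)")  # Missing cells: 3825 (9.6%)
print(f"Total size in memory: {total_kib:.1f} KiB")            # Total size in memory: 390.6 KiB
```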

Variable types

Text: 3
Categorical: 1

Dataset

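Description: 관리_건축물대장_PK, 동명칭, 호_명, 층_구분_코드 (building-register management key, building name, unit name, floor-type code)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15393/S/1/datasetView.do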

Alerts

층_구분_코드 is highly imbalanced (86.5%) [Imbalance]
동명칭 has 3816 (38.2%) missing values [Missing]
관리_건축물대장_PK has unique values (all 10000 values are distinct) [Unique]
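The 86.5% imbalance figure is consistent with an entropy-based score, 1 - H/log(k), where H is the Shannon entropy of the category frequencies and k is the number of categories. This is a reading of how ydata-profiling scores imbalance, not something the report states. The sketch below applies the formula to the 층_구분_코드 counts from that variable's Common Values table (9661, 338, 1) and reproduces 86.5%:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k) for a list of category counts.

    0.0 means all k categories are equally frequent; values near 1.0 mean one
    category dominates. H uses the natural log; the base cancels in the ratio.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# 층_구분_코드 counts: 지상 (above ground), 지하 (basement), 옥탑 (rooftop)
score = imbalance_score([9661, 338, 1])
print(f"{score:.1%}")  # 86.5%
```

A perfectly balanced column, such as counts of [5, 5], scores 0.0.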

Reproduction

Analysis started: 2024-05-18 03:51:22.758348
Analysis finished: 2024-05-18 03:51:24.850108
Duration: 2.09 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
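The duration is simply the difference between the two timestamps. A quick check with Python's standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-18 03:51:22.758348")
finished = datetime.fromisoformat("2024-05-18 03:51:24.850108")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 2.09 seconds
```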

Variables

관리_건축물대장_PK
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 11
Mean length: 12.8371
Min length: 11
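These length statistics are computed from the character length of each non-missing value. Multiplying the mean by the number of values gives the total character count reported below (12.8371 × 10000 = 128371). The sketch below uses only the five sample values listed for this variable, so its numbers will not match the full-column statistics:

```python
from statistics import mean, median

# The five sample values shown for 관리_건축물대장_PK (not the full column).
sample = ["11320-20542", "11530-100243033", "11380-100182917", "11170-75974", "11350-92272"]

def length_summary(values):
    """Summarize the character lengths of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),  # equals mean * len(values)
    }

print(length_summary(sample))
```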

Characters and Unicode

Total characters: 128371
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
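The category counts come from the Unicode general category of each character. Python's standard unicodedata module exposes the category as a two-letter code. The sketch below maps the codes seen in this report to the long names used here, using a hand-written table for illustration, and tallies them for a few sample values. This is not necessarily how ydata-profiling computes them internally:

```python
import unicodedata
from collections import Counter

# Hand-written mapping from general-category codes to the long names used in
# this report (only the categories that appear here).
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["11320-20542", "111동", "605호"]))
```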

Unique

Unique: 10000
Unique (%): 100.0%
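Distinct counts how many different values occur. Unique counts the values that occur exactly once. For this key column both equal 10000, so every value appears once. By contrast, 동명칭 reports 778 distinct values but only 379 unique ones. Both percentages are taken over non-missing values (for 동명칭, 778 / 6184 = 12.6%). A minimal sketch with hypothetical input:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique, distinct %, unique %), ignoring missing values (None)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0, 0, 0.0, 0.0
    counts = Counter(present)
    distinct = len(counts)                               # different values
    unique = sum(1 for n in counts.values() if n == 1)   # values seen exactly once
    n = len(present)
    return distinct, unique, 100 * distinct / n, 100 * unique / n

# Hypothetical column with one missing value.
print(distinct_and_unique(["101동", "101동", "102동", "103동", None]))  # (3, 2, 75.0, 50.0)
```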

Sample

1st row: 11320-20542
2nd row: 11530-100243033
3rd row: 11380-100182917
4th row: 11170-75974
5th row: 11350-92272
Value | Count | Frequency (%)
11320-20542 | 1 | < 0.1%
11410-91737 | 1 | < 0.1%
11710-74309 | 1 | < 0.1%
11440-49148 | 1 | < 0.1%
11170-43779 | 1 | < 0.1%
11710-74768 | 1 | < 0.1%
11350-52486 | 1 | < 0.1%
11380-111671 | 1 | < 0.1%
11590-100201159 | 1 | < 0.1%
11260-100270427 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 31827 | 24.8%
0 | 25347 | 19.7%
- | 10000 | 7.8%
5 | 9838 | 7.7%
3 | 9738 | 7.6%
2 | 9385 | 7.3%
4 | 6899 | 5.4%
8 | 6766 | 5.3%
6 | 6474 | 5.0%
7 | 6453 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 118371 | 92.2%
Dash Punctuation | 10000 | 7.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 31827 | 26.9%
0 | 25347 | 21.4%
5 | 9838 | 8.3%
3 | 9738 | 8.2%
2 | 9385 | 7.9%
4 | 6899 | 5.8%
8 | 6766 | 5.7%
6 | 6474 | 5.5%
7 | 6453 | 5.5%
9 | 5644 | 4.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%
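The same character can have different percentages in different tables because each table divides by its own total. The overall character table divides by all 128371 characters. The per-category table divides by the 118371 Decimal Number characters. For the digit 1:

```python
# Figures for 관리_건축물대장_PK taken from the tables above.
count_of_1 = 31827
total_characters = 128371      # all characters in the column
decimal_number_total = 118371  # characters in the Decimal Number category

share_overall = 100 * count_of_1 / total_characters
share_in_category = 100 * count_of_1 / decimal_number_total
print(f"{share_overall:.1f}% overall, {share_in_category:.1f}% within Decimal Number")
# 24.8% overall, 26.9% within Decimal Number
```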

Most occurring scripts

Value | Count | Frequency (%)
Common | 128371 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 31827 | 24.8%
0 | 25347 | 19.7%
- | 10000 | 7.8%
5 | 9838 | 7.7%
3 | 9738 | 7.6%
2 | 9385 | 7.3%
4 | 6899 | 5.4%
8 | 6766 | 5.3%
6 | 6474 | 5.0%
7 | 6453 | 5.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 128371 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 31827 | 24.8%
0 | 25347 | 19.7%
- | 10000 | 7.8%
5 | 9838 | 7.7%
3 | 9738 | 7.6%
2 | 9385 | 7.3%
4 | 6899 | 5.4%
8 | 6766 | 5.3%
6 | 6474 | 5.0%
7 | 6453 | 5.0%

동명칭
Text

MISSING 

Distinct: 778
Distinct (%): 12.6%
Missing: 3816
Missing (%): 38.2%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 4
Mean length: 4.1235446
Min length: 1

Characters and Unicode

Total characters: 25500
Distinct characters: 335
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 379
Unique (%): 6.1%

Sample

1st row: 111동
2nd row: 804동
3rd row: 811동
4th row: 104동
5th row: 204동
Value | Count | Frequency (%)
101동 | 505 | 7.8%
102동 | 376 | 5.8%
103동 | 237 | 3.7%
104동 | 223 | 3.5%
105동 | 217 | 3.4%
106동 | 216 | 3.4%
108동 | 115 | 1.8%
110동 | 92 | 1.4%
109동 | 90 | 1.4%
203동 | 84 | 1.3%
Other values (832) | 4279 | 66.5%
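The counts in this table add up to 6434, which is more than the 6184 non-missing values. The gap equals the 250 Space Separator characters reported for this column. That suggests the table counts whitespace-separated tokens rather than whole values. This is an inference from the figures, not something the report states. A sketch of the check, with a hypothetical tokenizer for illustration:

```python
from collections import Counter

def token_counts(values):
    """Hypothetical tokenizer: count whitespace-separated tokens across values."""
    counts = Counter()
    for value in values:
        counts.update(value.split())
    return counts

# Figures reported for 동명칭.
non_missing = 10_000 - 3_816
space_chars = 250
top10 = [505, 376, 237, 223, 217, 216, 115, 92, 90, 84]
other = 4_279

table_total = sum(top10) + other
# Assumption: each space adds exactly one token (no leading, trailing or doubled spaces).
print(table_total, non_missing + space_chars)  # 6434 6434
print(f"{100 * top10[0] / table_total:.1f}%")  # 7.8%, the reported share of 101동
print(token_counts(["101동", "상가 A동"]))       # hypothetical values
```

The same arithmetic holds for 호_명: 9991 non-missing values plus 62 spaces gives 10053. That matches 1597 (top ten) + 8456 (other values).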

Most occurring characters

Value | Count | Frequency (%)
동 | 5602 | 22.0%
1 | 5039 | 19.8%
0 | 3914 | 15.3%
2 | 1685 | 6.6%
3 | 1174 | 4.6%
4 | 943 | 3.7%
5 | 729 | 2.9%
6 | 684 | 2.7%
8 | 520 | 2.0%
7 | 367 | 1.4%
Other values (325) | 4843 | 19.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 15361 | 60.2%
Other Letter | 9329 | 36.6%
Uppercase Letter | 407 | 1.6%
Space Separator | 250 | 1.0%
Close Punctuation | 43 | 0.2%
Open Punctuation | 43 | 0.2%
Lowercase Letter | 32 | 0.1%
Dash Punctuation | 25 | 0.1%
Other Punctuation | 8 | < 0.1%
Letter Number | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
동 | 5602 | 60.0%
| 219 | 2.3%
| 166 | 1.8%
| 147 | 1.6%
| 129 | 1.4%
| 94 | 1.0%
| 90 | 1.0%
| 89 | 1.0%
| 88 | 0.9%
| 73 | 0.8%
Other values (280) | 2632 | 28.2%

Uppercase Letter
Value | Count | Frequency (%)
A | 76 | 18.7%
B | 55 | 13.5%
T | 50 | 12.3%
S | 25 | 6.1%
E | 25 | 6.1%
W | 23 | 5.7%
V | 20 | 4.9%
R | 20 | 4.9%
O | 18 | 4.4%
I | 15 | 3.7%
Other values (12) | 80 | 19.7%

Decimal Number
Value | Count | Frequency (%)
1 | 5039 | 32.8%
0 | 3914 | 25.5%
2 | 1685 | 11.0%
3 | 1174 | 7.6%
4 | 943 | 6.1%
5 | 729 | 4.7%
6 | 684 | 4.5%
8 | 520 | 3.4%
7 | 367 | 2.4%
9 | 306 | 2.0%

Lowercase Letter
Value | Count | Frequency (%)
l | 14 | 43.8%
e | 6 | 18.8%
z | 6 | 18.8%
i | 6 | 18.8%

Other Punctuation
Value | Count | Frequency (%)
. | 6 | 75.0%
& | 1 | 12.5%
, | 1 | 12.5%

Letter Number
Value | Count | Frequency (%)
| 1 | 50.0%
| 1 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 250 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 43 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 43 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 25 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 15730 | 61.7%
Hangul | 9329 | 36.6%
Latin | 441 | 1.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
동 | 5602 | 60.0%
| 219 | 2.3%
| 166 | 1.8%
| 147 | 1.6%
| 129 | 1.4%
| 94 | 1.0%
| 90 | 1.0%
| 89 | 1.0%
| 88 | 0.9%
| 73 | 0.8%
Other values (280) | 2632 | 28.2%

Latin
Value | Count | Frequency (%)
A | 76 | 17.2%
B | 55 | 12.5%
T | 50 | 11.3%
S | 25 | 5.7%
E | 25 | 5.7%
W | 23 | 5.2%
V | 20 | 4.5%
R | 20 | 4.5%
O | 18 | 4.1%
I | 15 | 3.4%
Other values (18) | 114 | 25.9%

Common
Value | Count | Frequency (%)
1 | 5039 | 32.0%
0 | 3914 | 24.9%
2 | 1685 | 10.7%
3 | 1174 | 7.5%
4 | 943 | 6.0%
5 | 729 | 4.6%
6 | 684 | 4.3%
8 | 520 | 3.3%
7 | 367 | 2.3%
9 | 306 | 1.9%
Other values (7) | 369 | 2.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16169 | 63.4%
Hangul | 9329 | 36.6%
Number Forms | 2 | < 0.1%
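Python's standard library does not expose Unicode block names. The sketch below therefore uses a small hand-picked table of block ranges: Basic Latin (shown as ASCII in this report), Number Forms and Hangul Syllables. It is an illustration only. The report's "Hangul" group may also cover other Hangul blocks, such as Hangul Jamo, which this table omits. The Roman numeral in the sample input is hypothetical:

```python
from collections import Counter

# (first code point, last code point, label); a deliberately small subset of blocks.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Basic Latin
    (0x2150, 0x218F, "Number Forms"),  # includes Roman numerals such as U+2161
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables only
]

def block_of(ch):
    """Return the label of the listed block containing ch, or "Other"."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return "Other"  # anything outside the ranges listed above

def block_counts(values):
    """Count characters per block across all values."""
    return Counter(block_of(ch) for value in values for ch in value)

print(block_counts(["101동", "A동", "\u2161"]))  # U+2161 is ROMAN NUMERAL TWO
```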

Most frequent character per block

Hangul
Value | Count | Frequency (%)
동 | 5602 | 60.0%
| 219 | 2.3%
| 166 | 1.8%
| 147 | 1.6%
| 129 | 1.4%
| 94 | 1.0%
| 90 | 1.0%
| 89 | 1.0%
| 88 | 0.9%
| 73 | 0.8%
Other values (280) | 2632 | 28.2%

ASCII
Value | Count | Frequency (%)
1 | 5039 | 31.2%
0 | 3914 | 24.2%
2 | 1685 | 10.4%
3 | 1174 | 7.3%
4 | 943 | 5.8%
5 | 729 | 4.5%
6 | 684 | 4.2%
8 | 520 | 3.2%
7 | 367 | 2.3%
9 | 306 | 1.9%
Other values (33) | 808 | 5.0%

Number Forms
Value | Count | Frequency (%)
| 1 | 50.0%
| 1 | 50.0%

호_명
Text

Distinct: 1822
Distinct (%): 18.2%
Missing: 9
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 12
Mean length: 3.9887899
Min length: 1

Characters and Unicode

Total characters: 39852
Distinct characters: 80
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 1156
Unique (%): 11.6%

Sample

1st row: 605호
2nd row: 1004
3rd row: 708
4th row: 102호
5th row: 809호
Value | Count | Frequency (%)
301 | 211 | 2.1%
401 | 189 | 1.9%
201 | 187 | 1.9%
202 | 166 | 1.7%
302 | 159 | 1.6%
402 | 155 | 1.5%
501 | 148 | 1.5%
201호 | 136 | 1.4%
101 | 130 | 1.3%
301호 | 116 | 1.2%
Other values (1779) | 8456 | 84.1%

Most occurring characters

Value | Count | Frequency (%)
0 | 9331 | 23.4%
1 | 7517 | 18.9%
호 | 4909 | 12.3%
2 | 4449 | 11.2%
3 | 3027 | 7.6%
4 | 2400 | 6.0%
5 | 1902 | 4.8%
6 | 1371 | 3.4%
7 | 1147 | 2.9%
8 | 868 | 2.2%
Other values (70) | 2931 | 7.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 32842 | 82.4%
Other Letter | 6192 | 15.5%
Uppercase Letter | 381 | 1.0%
Dash Punctuation | 324 | 0.8%
Space Separator | 62 | 0.2%
Open Punctuation | 18 | < 0.1%
Close Punctuation | 18 | < 0.1%
Connector Punctuation | 8 | < 0.1%
Other Punctuation | 6 | < 0.1%
Lowercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
호 | 4909 | 79.3%
| 611 | 9.9%
| 188 | 3.0%
| 107 | 1.7%
| 53 | 0.9%
| 36 | 0.6%
| 32 | 0.5%
| 26 | 0.4%
| 24 | 0.4%
| 24 | 0.4%
Other values (36) | 182 | 2.9%

Uppercase Letter
Value | Count | Frequency (%)
B | 188 | 49.3%
A | 80 | 21.0%
S | 20 | 5.2%
E | 19 | 5.0%
T | 17 | 4.5%
W | 12 | 3.1%
C | 11 | 2.9%
F | 10 | 2.6%
O | 9 | 2.4%
G | 4 | 1.0%
Other values (5) | 11 | 2.9%

Decimal Number
Value | Count | Frequency (%)
0 | 9331 | 28.4%
1 | 7517 | 22.9%
2 | 4449 | 13.5%
3 | 3027 | 9.2%
4 | 2400 | 7.3%
5 | 1902 | 5.8%
6 | 1371 | 4.2%
7 | 1147 | 3.5%
8 | 868 | 2.6%
9 | 830 | 2.5%

Other Punctuation
Value | Count | Frequency (%)
. | 4 | 66.7%
: | 1 | 16.7%
, | 1 | 16.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 324 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 62 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 18 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 18 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 8 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
b | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 33278 | 83.5%
Hangul | 6192 | 15.5%
Latin | 382 | 1.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
호 | 4909 | 79.3%
| 611 | 9.9%
| 188 | 3.0%
| 107 | 1.7%
| 53 | 0.9%
| 36 | 0.6%
| 32 | 0.5%
| 26 | 0.4%
| 24 | 0.4%
| 24 | 0.4%
Other values (36) | 182 | 2.9%

Common
Value | Count | Frequency (%)
0 | 9331 | 28.0%
1 | 7517 | 22.6%
2 | 4449 | 13.4%
3 | 3027 | 9.1%
4 | 2400 | 7.2%
5 | 1902 | 5.7%
6 | 1371 | 4.1%
7 | 1147 | 3.4%
8 | 868 | 2.6%
9 | 830 | 2.5%
Other values (8) | 436 | 1.3%

Latin
Value | Count | Frequency (%)
B | 188 | 49.2%
A | 80 | 20.9%
S | 20 | 5.2%
E | 19 | 5.0%
T | 17 | 4.5%
W | 12 | 3.1%
C | 11 | 2.9%
F | 10 | 2.6%
O | 9 | 2.4%
G | 4 | 1.0%
Other values (6) | 12 | 3.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 33660 | 84.5%
Hangul | 6192 | 15.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 9331 | 27.7%
1 | 7517 | 22.3%
2 | 4449 | 13.2%
3 | 3027 | 9.0%
4 | 2400 | 7.1%
5 | 1902 | 5.7%
6 | 1371 | 4.1%
7 | 1147 | 3.4%
8 | 868 | 2.6%
9 | 830 | 2.5%
Other values (24) | 818 | 2.4%

Hangul
Value | Count | Frequency (%)
호 | 4909 | 79.3%
| 611 | 9.9%
| 188 | 3.0%
| 107 | 1.7%
| 53 | 0.9%
| 36 | 0.6%
| 32 | 0.5%
| 26 | 0.4%
| 24 | 0.4%
| 24 | 0.4%
Other values (36) | 182 | 2.9%

층_구분_코드
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
지상 (above ground): 9661
지하 (basement): 338
옥탑 (rooftop): 1

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 지상
2nd row: 지상
3rd row: 지상
4th row: 지상
5th row: 지상

Common Values

Value | Count | Frequency (%)
지상 | 9661 | 96.6%
지하 | 338 | 3.4%
옥탑 | 1 | < 0.1%

Length

[Figure] Histogram of lengths of the category

Common Values (Plot)

[Figure] Plot of the 층_구분_코드 value frequencies (지상 9661, 지하 338, 옥탑 1)

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
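Nullity correlation is the correlation between the missingness indicators of two columns (1 where a value is missing, 0 otherwise). With pandas this is roughly `df.isna().corr()`. The dependency-free sketch below computes the Pearson correlation of two such indicator lists. The inputs are hypothetical, not taken from this dataset:

```python
import math

def nullity_correlation(a_missing, b_missing):
    """Pearson correlation of two equal-length boolean missingness indicators."""
    if len(a_missing) != len(b_missing) or not a_missing:
        raise ValueError("indicators must be non-empty and the same length")
    xs = [1.0 if m else 0.0 for m in a_missing]
    ys = [1.0 if m else 0.0 for m in b_missing]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / math.sqrt(vx * vy)

# Hypothetical indicators for four rows.
print(nullity_correlation([True, True, False, False], [True, True, False, False]))  # 1.0
print(nullity_correlation([True, True, False, False], [False, False, True, True]))  # -1.0
```

In this dataset only 동명칭 and 호_명 have missing values. They are therefore the only pair of columns for which a nullity correlation is defined.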

Sample

관리_건축물대장_PK동명칭호_명층_구분_코드
3217911320-20542111동605호지상
7380711530-100243033804동1004지상
6166011380-100182917811동708지상
6943111170-75974<NA>102호지상
3995711350-92272104동809호지상
988011590-95572204동606지상
4233111590-100219793에이동202지상
6930811170-79517(2단지)202-2705지상
5453911350-9120610동507호지상
3948311440-58822<NA>아-502지상
관리_건축물대장_PK동명칭호_명층_구분_코드
8821711380-100200182332동1002지상
3255111470-89216<NA>402호지상
2909111350-95910203동401호지상
5988311170-64086<NA>209호지상
2173911710-152434상가3층1호지상
6741711470-113238106동309호지상
6799011230-100256270<NA>604지상
8080711230-100181318<NA>501지상
7508011380-100199474317동605지상
6597611560-69081<NA>1층마-8호지상