Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 9981
Missing cells (%): 16.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B
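
The summary figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations), that the missing cells are the sum of the per-column missing counts listed under Alerts, and that 1 KiB = 1024 B. The variable names are illustrative only.

```python
# Figures copied from the report; the names are illustrative, not part of any API.
n_variables = 6
n_observations = 10_000
missing_per_column = {"지역지구구역_코드": 4620, "기타_지역지구구역": 5361}
total_size_kib = 546.9

# Total missing cells and their share of all cells in the table.
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)               # 9981
print(round(missing_pct, 1))       # 16.6
print(round(avg_record_bytes, 1))  # 56.0
```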

Variable types

Text: 4
Categorical: 2

Dataset

Description: 관리_지역지구구역_pk, 관리_건축물대장_pk, 지역지구구역_구분_코드, 지역지구구역_코드, 대표_여부, 기타_지역지구구역
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15655/S/1/datasetView.do

Alerts

지역지구구역_구분_코드 is highly overall correlated with 대표_여부 [High correlation]
대표_여부 is highly overall correlated with 지역지구구역_구분_코드 [High correlation]
대표_여부 is highly imbalanced (62.2%) [Imbalance]
지역지구구역_코드 has 4620 (46.2%) missing values [Missing]
기타_지역지구구역 has 5361 (53.6%) missing values [Missing]
관리_지역지구구역_pk has unique values [Unique]
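
The 62.2% imbalance figure for 대표_여부 can be reproduced from the value counts reported for that variable (8339, 1422, 225, 14), assuming the score is one minus the Shannon entropy of the class distribution divided by its maximum, log2 of the number of classes. That definition is an assumption about how the profiler scores imbalance, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced; values near 1 mean one class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # convention chosen for this sketch: a single class scores 0
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

# Value counts of 대표_여부 as listed in this report.
score = imbalance_score([8339, 1422, 225, 14])
print(f"{score:.1%}")  # 62.2%
```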

Reproduction

Analysis started: 2024-05-11 05:48:59.033458
Analysis finished: 2024-05-11 05:49:00.421573
Duration: 1.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_지역지구구역_pk
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 13.3923
Min length: 8

Characters and Unicode

Total characters: 133923
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
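
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module over the five sample values of this variable. `category_counts` is a hypothetical helper, and the profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Pd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# The five sample rows shown for 관리_지역지구구역_pk in this report.
samples = [
    "11140-100041786",
    "11170-100059590",
    "11170-100060210",
    "11140-5982",
    "11140-100047719",
]

counts = category_counts(samples)
# 'Nd' is Decimal Number, 'Pd' is Dash Punctuation.
print(counts["Nd"], counts["Pd"])  # 65 5
```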

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11140-100041786
2nd row: 11170-100059590
3rd row: 11170-100060210
4th row: 11140-5982
5th row: 11140-100047719

Value | Count | Frequency (%)
11140-100041786 | 1 | < 0.1%
11200-18540 | 1 | < 0.1%
11140-3398 | 1 | < 0.1%
11140-3486 | 1 | < 0.1%
11170-100031400 | 1 | < 0.1%
11140-100044061 | 1 | < 0.1%
11170-100037971 | 1 | < 0.1%
11110-270 | 1 | < 0.1%
11170-100069740 | 1 | < 0.1%
11110-100027902 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 43616 | 32.6%
0 | 35088 | 26.2%
- | 10000 | 7.5%
4 | 8822 | 6.6%
7 | 7675 | 5.7%
2 | 5662 | 4.2%
3 | 5435 | 4.1%
5 | 5100 | 3.8%
6 | 4583 | 3.4%
8 | 3976 | 3.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 123923 | 92.5%
Dash Punctuation | 10000 | 7.5%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
1 | 43616 | 35.2%
0 | 35088 | 28.3%
4 | 8822 | 7.1%
7 | 7675 | 6.2%
2 | 5662 | 4.6%
3 | 5435 | 4.4%
5 | 5100 | 4.1%
6 | 4583 | 3.7%
8 | 3976 | 3.2%
9 | 3966 | 3.2%

Dash Punctuation

Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 133923 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
1 | 43616 | 32.6%
0 | 35088 | 26.2%
- | 10000 | 7.5%
4 | 8822 | 6.6%
7 | 7675 | 5.7%
2 | 5662 | 4.2%
3 | 5435 | 4.1%
5 | 5100 | 3.8%
6 | 4583 | 3.4%
8 | 3976 | 3.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 133923 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
1 | 43616 | 32.6%
0 | 35088 | 26.2%
- | 10000 | 7.5%
4 | 8822 | 6.6%
7 | 7675 | 5.7%
2 | 5662 | 4.2%
3 | 5435 | 4.1%
5 | 5100 | 3.8%
6 | 4583 | 3.4%
8 | 3976 | 3.0%

관리_건축물대장_pk
Text

Distinct: 9305
Distinct (%): 93.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 11.8075
Min length: 7

Characters and Unicode

Total characters: 118075
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8648
Unique (%): 86.5%
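
The gap between 9305 distinct values and 8648 unique ones suggests that Distinct counts every different value, while Unique counts only values that occur exactly once. A minimal sketch of that reading, using made-up identifiers (the list below is illustrative and not taken from the dataset):

```python
from collections import Counter

# Hypothetical identifiers: one appears three times, one twice, two once.
values = [
    "11110-1", "11110-1", "11110-1",
    "11140-2", "11140-2",
    "11170-3",
    "11200-4",
]

counts = Counter(values)
n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)  # 4 2
```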

Sample

1st row: 11140-9462
2nd row: 11170-1639
3rd row: 11170-23781
4th row: 11140-7259
5th row: 11140-24692

Value | Count | Frequency (%)
11110-27531 | 3 | < 0.1%
11110-100190272 | 3 | < 0.1%
11140-9970 | 3 | < 0.1%
11170-17548 | 3 | < 0.1%
11170-11419 | 3 | < 0.1%
11110-100216420 | 3 | < 0.1%
11110-26041 | 3 | < 0.1%
11140-2211 | 3 | < 0.1%
11170-2794 | 3 | < 0.1%
11170-100207339 | 3 | < 0.1%
Other values (9295) | 9970 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 41375 | 35.0%
0 | 22275 | 18.9%
- | 10000 | 8.5%
2 | 8644 | 7.3%
7 | 7574 | 6.4%
4 | 7384 | 6.3%
3 | 4450 | 3.8%
5 | 4236 | 3.6%
9 | 4186 | 3.5%
8 | 4049 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 108075 | 91.5%
Dash Punctuation | 10000 | 8.5%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
1 | 41375 | 38.3%
0 | 22275 | 20.6%
2 | 8644 | 8.0%
7 | 7574 | 7.0%
4 | 7384 | 6.8%
3 | 4450 | 4.1%
5 | 4236 | 3.9%
9 | 4186 | 3.9%
8 | 4049 | 3.7%
6 | 3902 | 3.6%

Dash Punctuation

Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 118075 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
1 | 41375 | 35.0%
0 | 22275 | 18.9%
- | 10000 | 8.5%
2 | 8644 | 7.3%
7 | 7574 | 6.4%
4 | 7384 | 6.3%
3 | 4450 | 3.8%
5 | 4236 | 3.6%
9 | 4186 | 3.5%
8 | 4049 | 3.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 118075 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
1 | 41375 | 35.0%
0 | 22275 | 18.9%
- | 10000 | 8.5%
2 | 8644 | 7.3%
7 | 7574 | 6.4%
4 | 7384 | 6.3%
3 | 4450 | 3.8%
5 | 4236 | 3.6%
9 | 4186 | 3.5%
8 | 4049 | 3.4%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
용도지역코드 | 5610
용도지구코드 | 3090
용도구역코드 | 1055
1 | 153
2 | 71

Length

Max length: 6
Median length: 6
Mean length: 5.8775
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 용도지역코드
3rd row: 용도지역코드
4th row: 용도지역코드
5th row: 용도지구코드

Common Values

Value | Count | Frequency (%)
용도지역코드 | 5610 | 56.1%
용도지구코드 | 3090 | 30.9%
용도구역코드 | 1055 | 10.5%
1 | 153 | 1.5%
2 | 71 | 0.7%
3 | 21 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
용도지역코드 | 5610 | 56.1%
용도지구코드 | 3090 | 30.9%
용도구역코드 | 1055 | 10.5%
1 | 153 | 1.5%
2 | 71 | 0.7%
3 | 21 | 0.2%

지역지구구역_코드
Text

Distinct: 68
Distinct (%): 1.3%
Missing: 4620
Missing (%): 46.2%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 9
Mean length: 7.6280669
Min length: 3

Characters and Unicode

Total characters: 41039
Distinct characters: 93
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19
Unique (%): 0.4%

Sample

1st row: 1023
2nd row: 일반상업지역
3rd row: 준주거지역
4th row: 제2종일반주거지역
5th row: 준주거지역

Value | Count | Frequency (%)
제2종일반주거지역 | 1861 | 34.3%
일반상업지역 | 1027 | 19.0%
제1종일반주거지역 | 660 | 12.2%
제3종일반주거지역 | 479 | 8.8%
준주거지역 | 222 | 4.1%
도시지역 | 170 | 3.1%
일반주거지역 | 160 | 3.0%
상대보호구역 | 101 | 1.9%
준공업지역 | 92 | 1.7%
제1종전용주거지역 | 86 | 1.6%
Other values (60) | 561 | 10.4%

Most occurring characters

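Value | Count | Frequency (%)
[character lost] | 5160 | 12.6%
[character lost] | 4948 | 12.1%
[character lost] | 4187 | 10.2%
[character lost] | 4187 | 10.2%
[character lost] | 3478 | 8.5%
[character lost] | 3478 | 8.5%
[character lost] | 3181 | 7.8%
[character lost] | 3091 | 7.5%
2 | 2066 | 5.0%
[character lost] | 1172 | 2.9%
Other values (83) | 6091 | 14.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 37040 | 90.3%
Decimal Number | 3927 | 9.6%
Space Separator | 39 | 0.1%
Uppercase Letter | 33 | 0.1%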

Most frequent character per category

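Other Letter

Value | Count | Frequency (%)
[character lost] | 5160 | 13.9%
[character lost] | 4948 | 13.4%
[character lost] | 4187 | 11.3%
[character lost] | 4187 | 11.3%
[character lost] | 3478 | 9.4%
[character lost] | 3478 | 9.4%
[character lost] | 3181 | 8.6%
[character lost] | 3091 | 8.3%
[character lost] | 1172 | 3.2%
[character lost] | 1142 | 3.1%
Other values (71) | 3016 | 8.1%

Decimal Number

Value | Count | Frequency (%)
2 | 2066 | 52.6%
1 | 1043 | 26.6%
3 | 541 | 13.8%
0 | 246 | 6.3%
4 | 21 | 0.5%
6 | 10 | 0.3%

Uppercase Letter

Value | Count | Frequency (%)
U | 11 | 33.3%
Q | 11 | 33.3%
A | 7 | 21.2%
G | 3 | 9.1%
F | 1 | 3.0%

Space Separator

Value | Count | Frequency (%)
(space) | 39 | 100.0%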

Most occurring scripts

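Value | Count | Frequency (%)
Hangul | 37040 | 90.3%
Common | 3966 | 9.7%
Latin | 33 | 0.1%

Most frequent character per script

Hangul

Value | Count | Frequency (%)
[character lost] | 5160 | 13.9%
[character lost] | 4948 | 13.4%
[character lost] | 4187 | 11.3%
[character lost] | 4187 | 11.3%
[character lost] | 3478 | 9.4%
[character lost] | 3478 | 9.4%
[character lost] | 3181 | 8.6%
[character lost] | 3091 | 8.3%
[character lost] | 1172 | 3.2%
[character lost] | 1142 | 3.1%
Other values (71) | 3016 | 8.1%

Common

Value | Count | Frequency (%)
2 | 2066 | 52.1%
1 | 1043 | 26.3%
3 | 541 | 13.6%
0 | 246 | 6.2%
(space) | 39 | 1.0%
4 | 21 | 0.5%
6 | 10 | 0.3%

Latin

Value | Count | Frequency (%)
U | 11 | 33.3%
Q | 11 | 33.3%
A | 7 | 21.2%
G | 3 | 9.1%
F | 1 | 3.0%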

Most occurring blocks

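Value | Count | Frequency (%)
Hangul | 37040 | 90.3%
ASCII | 3999 | 9.7%

Most frequent character per block

Hangul

Value | Count | Frequency (%)
[character lost] | 5160 | 13.9%
[character lost] | 4948 | 13.4%
[character lost] | 4187 | 11.3%
[character lost] | 4187 | 11.3%
[character lost] | 3478 | 9.4%
[character lost] | 3478 | 9.4%
[character lost] | 3181 | 8.6%
[character lost] | 3091 | 8.3%
[character lost] | 1172 | 3.2%
[character lost] | 1142 | 3.1%
Other values (71) | 3016 | 8.1%

ASCII

Value | Count | Frequency (%)
2 | 2066 | 51.7%
1 | 1043 | 26.1%
3 | 541 | 13.5%
0 | 246 | 6.2%
(space) | 39 | 1.0%
4 | 21 | 0.5%
U | 11 | 0.3%
Q | 11 | 0.3%
6 | 10 | 0.3%
A | 7 | 0.2%
Other values (2) | 4 | 0.1%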

대표_여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
대표 | 8339
<NA> | 1422
1 | 225
0 | 14

Length

Max length: 4
Median length: 2
Mean length: 2.2605
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 대표
3rd row: 대표
4th row: 대표
5th row: 대표

Common Values

Value | Count | Frequency (%)
대표 | 8339 | 83.4%
<NA> | 1422 | 14.2%
1 | 225 | 2.2%
0 | 14 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
대표 | 8339 | 83.4%
na | 1422 | 14.2%
1 | 225 | 2.2%
0 | 14 | 0.1%

기타_지역지구구역
Text

Distinct: 226
Distinct (%): 4.9%
Missing: 5361
Missing (%): 53.6%
Memory size: 156.2 KiB

Length

Max length: 37
Median length: 35
Mean length: 6.6667385
Min length: 2

Characters and Unicode

Total characters: 30927
Distinct characters: 163
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 101
Unique (%): 2.2%

Sample

1st row: 제2종일반주거지역
2nd row: 자연경관지구
3rd row: 방화지구
4th row: 일반상업지역
5th row: 일반주거지역

Value | Count | Frequency (%)
일반주거지역 | 592 | 12.3%
주차장정비지구 | 403 | 8.4%
제2종일반주거지역 | 379 | 7.9%
일반주거 | 323 | 6.7%
일반상업지역 | 246 | 5.1%
최고고도지구 | 185 | 3.8%
주거환경개선지구 | 159 | 3.3%
도시지역 | 158 | 3.3%
주차장정비 | 140 | 2.9%
제1종일반주거지역 | 134 | 2.8%
Other values (223) | 2093 | 43.5%

Most occurring characters

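Value | Count | Frequency (%)
[character lost] | 3600 | 11.6%
[character lost] | 2602 | 8.4%
[character lost] | 2494 | 8.1%
[character lost] | 2177 | 7.0%
[character lost] | 1957 | 6.3%
[character lost] | 1945 | 6.3%
[character lost] | 1930 | 6.2%
[character lost] | 959 | 3.1%
[character lost] | 871 | 2.8%
[character lost] | 648 | 2.1%
Other values (153) | 11744 | 38.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29505 | 95.4%
Decimal Number | 1079 | 3.5%
Space Separator | 173 | 0.6%
Open Punctuation | 55 | 0.2%
Close Punctuation | 55 | 0.2%
Other Punctuation | 31 | 0.1%
Dash Punctuation | 19 | 0.1%
Lowercase Letter | 6 | < 0.1%
Uppercase Letter | 4 | < 0.1%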

Most frequent character per category

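Other Letter

Value | Count | Frequency (%)
[character lost] | 3600 | 12.2%
[character lost] | 2602 | 8.8%
[character lost] | 2494 | 8.5%
[character lost] | 2177 | 7.4%
[character lost] | 1957 | 6.6%
[character lost] | 1945 | 6.6%
[character lost] | 1930 | 6.5%
[character lost] | 959 | 3.3%
[character lost] | 871 | 3.0%
[character lost] | 648 | 2.2%
Other values (134) | 10322 | 35.0%

Decimal Number

Value | Count | Frequency (%)
2 | 489 | 45.3%
1 | 406 | 37.6%
3 | 72 | 6.7%
4 | 69 | 6.4%
7 | 14 | 1.3%
0 | 14 | 1.3%
6 | 8 | 0.7%
5 | 5 | 0.5%
8 | 2 | 0.2%

Other Punctuation

Value | Count | Frequency (%)
, | 22 | 71.0%
/ | 4 | 12.9%
: | 3 | 9.7%
. | 2 | 6.5%

Space Separator

Value | Count | Frequency (%)
(space) | 173 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 55 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 55 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 19 | 100.0%

Lowercase Letter

Value | Count | Frequency (%)
m | 6 | 100.0%

Uppercase Letter

Value | Count | Frequency (%)
M | 4 | 100.0%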

Most occurring scripts

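Value | Count | Frequency (%)
Hangul | 29505 | 95.4%
Common | 1412 | 4.6%
Latin | 10 | < 0.1%

Most frequent character per script

Hangul

Value | Count | Frequency (%)
[character lost] | 3600 | 12.2%
[character lost] | 2602 | 8.8%
[character lost] | 2494 | 8.5%
[character lost] | 2177 | 7.4%
[character lost] | 1957 | 6.6%
[character lost] | 1945 | 6.6%
[character lost] | 1930 | 6.5%
[character lost] | 959 | 3.3%
[character lost] | 871 | 3.0%
[character lost] | 648 | 2.2%
Other values (134) | 10322 | 35.0%

Common

Value | Count | Frequency (%)
2 | 489 | 34.6%
1 | 406 | 28.8%
(space) | 173 | 12.3%
3 | 72 | 5.1%
4 | 69 | 4.9%
( | 55 | 3.9%
) | 55 | 3.9%
, | 22 | 1.6%
- | 19 | 1.3%
7 | 14 | 1.0%
Other values (7) | 38 | 2.7%

Latin

Value | Count | Frequency (%)
m | 6 | 60.0%
M | 4 | 40.0%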

Most occurring blocks

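Value | Count | Frequency (%)
Hangul | 29505 | 95.4%
ASCII | 1422 | 4.6%

Most frequent character per block

Hangul

Value | Count | Frequency (%)
[character lost] | 3600 | 12.2%
[character lost] | 2602 | 8.8%
[character lost] | 2494 | 8.5%
[character lost] | 2177 | 7.4%
[character lost] | 1957 | 6.6%
[character lost] | 1945 | 6.6%
[character lost] | 1930 | 6.5%
[character lost] | 959 | 3.3%
[character lost] | 871 | 3.0%
[character lost] | 648 | 2.2%
Other values (134) | 10322 | 35.0%

ASCII

Value | Count | Frequency (%)
2 | 489 | 34.4%
1 | 406 | 28.6%
(space) | 173 | 12.2%
3 | 72 | 5.1%
4 | 69 | 4.9%
( | 55 | 3.9%
) | 55 | 3.9%
, | 22 | 1.5%
- | 19 | 1.3%
7 | 14 | 1.0%
Other values (9) | 48 | 3.4%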

Correlations

Variable | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.999 | 0.942
지역지구구역_코드 | 0.999 | 1.000 | 0.960
대표_여부 | 0.942 | 0.960 | 1.000

Variable | 대표_여부 | 지역지구구역_구분_코드
대표_여부 | 1.000 | 0.707
지역지구구역_구분_코드 | 0.707 | 1.000

Variable | 지역지구구역_구분_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.707
대표_여부 | 0.707 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
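
Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 otherwise). The example below assumes that definition and uses the missingness pattern of 지역지구구역_코드 and 기타_지역지구구역 in the first ten sample rows of this report, with `None` standing in for `<NA>`. `nullity_correlation` is a hypothetical helper, not a library function.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation of the missingness indicators of two equal-length columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined if a column is never or always missing
    return cov / math.sqrt(var_a * var_b)

# 지역지구구역_코드 in the first ten sample rows (None marks <NA>).
code = ["1023", "일반상업지역", "준주거지역", "제2종일반주거지역", None,
        "준주거지역", "제2종일반주거지역", "제2종일반주거지역", None, None]
# 기타_지역지구구역 in the same rows.
other = [None] * 7 + ["제2종일반주거지역", "자연경관지구", "방화지구"]

r = nullity_correlation(code, other)
print(round(r, 3))  # -0.524
```

A negative value here means that, in these ten rows, one column tends to be filled where the other is missing.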

Sample

Index | 관리_지역지구구역_pk | 관리_건축물대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역
29755 | 11140-100041786 | 11140-9462 | 1 | 1023 | 1 | <NA>
74560 | 11170-100059590 | 11170-1639 | 용도지역코드 | 일반상업지역 | 대표 | <NA>
75126 | 11170-100060210 | 11170-23781 | 용도지역코드 | 준주거지역 | 대표 | <NA>
51696 | 11140-5982 | 11140-7259 | 용도지역코드 | 제2종일반주거지역 | 대표 | <NA>
35022 | 11140-100047719 | 11140-24692 | 용도지구코드 | <NA> | 대표 | <NA>
59119 | 11170-100031924 | 11170-12884 | 용도지역코드 | 준주거지역 | <NA> | <NA>
85697 | 11170-14629 | 11170-18925 | 용도지역코드 | 제2종일반주거지역 | 대표 | <NA>
92675 | 11200-100047734 | 11200-100244697 | 용도지역코드 | 제2종일반주거지역 | 대표 | 제2종일반주거지역
5049 | 11110-100041770 | 11110-100203337 | 용도지구코드 | <NA> | 대표 | 자연경관지구
21007 | 11110-7335 | 11110-14974 | 용도지구코드 | <NA> | <NA> | 방화지구

Index | 관리_지역지구구역_pk | 관리_건축물대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역
67700 | 11170-100050063 | 11170-12947 | 용도지역코드 | <NA> | <NA> | 개발행위허가제한지역
29373 | 11140-100041354 | 11140-8809 | 용도지역코드 | 제2종일반주거지역 | 대표 | <NA>
67006 | 11170-100045002 | 11170-680 | 용도구역코드 | <NA> | 대표 | <NA>
77959 | 11170-100064400 | 11170-270 | 용도지역코드 | 제3종일반주거지역 | 대표 | <NA>
94524 | 11200-12942 | 11200-13782 | 용도지역코드 | <NA> | 대표 | 일반주거
6167 | 11110-100047214 | 11110-100209676 | 용도구역코드 | <NA> | 대표 | 지구단위계획구역
46074 | 11140-18475 | 11140-23264 | 용도지구코드 | <NA> | 대표 | <NA>
68959 | 11170-100053554 | 11170-9943 | 용도지역코드 | 준주거지역 | 대표 | <NA>
5921 | 11110-100045872 | 11110-100208245 | 용도지역코드 | 과밀억제지역 | <NA> | 과밀억제지역
94033 | 11200-11781 | 11200-12693 | 용도지역코드 | 제2종일반주거지역 | 대표 | 일반주거