Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 50
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
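
The derived figures in this table can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is the missing cells divided by variables times observations, and that the average record size is the total memory divided by the number of observations. Both formulas are assumptions, although they do reproduce the reported values.

```python
# Figures copied from the "Dataset statistics" table of the report.
n_variables = 7
n_observations = 10_000
missing_cells = 50
total_memory_kib = 654.3

# Assumed formula: missing cells as a share of all cells in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumed formula: total memory (1 KiB = 1024 B) divided by the row count.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```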

Variable types

Text: 4
Categorical: 3

Dataset

Description: 관리_지역지구구역_pk, 관리_허가대장_pk, 지역지구구역_구분_코드, 지역지구구역_코드, 대표_여부, 주_동_구분_코드, 지역지구구역_명
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15667/S/1/datasetView.do

Alerts

지역지구구역_구분_코드 is highly overall correlated with 주_동_구분_코드 (High correlation)
대표_여부 is highly overall correlated with 주_동_구분_코드 (High correlation)
주_동_구분_코드 is highly overall correlated with 지역지구구역_구분_코드 and 1 other field (High correlation)
주_동_구분_코드 is highly imbalanced (78.9%) (Imbalance)
관리_지역지구구역_pk has unique values (Unique)
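
The 78.9% imbalance figure for 주_동_구분_코드 can be reproduced from that variable's value counts (0: 9666, <NA>: 334). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the maximum possible entropy. The function name `imbalance_score` is hypothetical, and the formula is an assumption that happens to match the reported value.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        # A single class has no distribution to be imbalanced over.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts taken from the report: value 0 occurs 9666 times, <NA> 334 times.
score = imbalance_score([9666, 334])
print(f"{score:.1%}")
```

A score of 0 means the classes are evenly distributed, and a score close to 1 means a single class dominates.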

Reproduction

Analysis started: 2024-05-03 23:34:47.123177
Analysis finished: 2024-05-03 23:34:49.793589
Duration: 2.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
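
The reported duration follows directly from the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section of the report.
started = datetime.fromisoformat("2024-05-03 23:34:47.123177")
finished = datetime.fromisoformat("2024-05-03 23:34:49.793589")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```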

Variables

관리_지역지구구역_pk
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 16.3506
Min length: 7

Characters and Unicode

Total characters: 163506
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
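
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the three input strings are sample values from this variable. The report's script and block breakdowns are not covered, because the standard library does not expose them.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample identifiers copied from the report.
sample = ["11170-100020409", "11110-100069912", "11140-694"]
counts = category_counts(sample)

# "Nd" is the code for Decimal Number, "Pd" for Dash Punctuation.
for code, n in counts.most_common():
    print(code, n)
```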

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11170-100020409
2nd row: 11110-100069912
3rd row: 11110-1000000000000000090558
4th row: 11140-694
5th row: 11110-100099204

Value                         Count  Frequency (%)
11170-100020409               1      < 0.1%
11140-100093241               1      < 0.1%
11110-100049929               1      < 0.1%
11110-100093095               1      < 0.1%
11110-4440                    1      < 0.1%
11170-1000000000000000655589  1      < 0.1%
11140-1000000000000000496664  1      < 0.1%
11110-100099223               1      < 0.1%
11110-100039518               1      < 0.1%
11110-100112428               1      < 0.1%
Other values (9990)           9990   99.9%

Most occurring characters

ValueCountFrequency (%)
0 59173
36.2%
1 50983
31.2%
- 10000
 
6.1%
4 8178
 
5.0%
7 5748
 
3.5%
2 5275
 
3.2%
3 4948
 
3.0%
9 4871
 
3.0%
8 4834
 
3.0%
5 4763
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 153506
93.9%
Dash Punctuation 10000
 
6.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 59173
38.5%
1 50983
33.2%
4 8178
 
5.3%
7 5748
 
3.7%
2 5275
 
3.4%
3 4948
 
3.2%
9 4871
 
3.2%
8 4834
 
3.1%
5 4763
 
3.1%
6 4733
 
3.1%
Dash Punctuation
ValueCountFrequency (%)
- 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 163506
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 59173
36.2%
1 50983
31.2%
- 10000
 
6.1%
4 8178
 
5.0%
7 5748
 
3.5%
2 5275
 
3.2%
3 4948
 
3.0%
9 4871
 
3.0%
8 4834
 
3.0%
5 4763
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 163506
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 59173
36.2%
1 50983
31.2%
- 10000
 
6.1%
4 8178
 
5.0%
7 5748
 
3.5%
2 5275
 
3.2%
3 4948
 
3.0%
9 4871
 
3.0%
8 4834
 
3.0%
5 4763
 
2.9%

관리_허가대장_pk
Text

Distinct: 7800
Distinct (%): 78.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 15.6822
Min length: 7

Characters and Unicode

Total characters: 156822
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6253
Unique (%): 62.5%

Sample

1st row: 11170-100022091
2nd row: 11110-100029060
3rd row: 11110-100056172
4th row: 11140-476
5th row: 11110-100046752

Value                         Count  Frequency (%)
11110-100061913               8      0.1%
11110-100035151               8      0.1%
11140-100079198               8      0.1%
11140-1000000000000000319010  8      0.1%
11110-100025597               8      0.1%
11000-1000000000000000236251  8      0.1%
11140-1000000000000000262122  7      0.1%
11110-100053933               7      0.1%
11110-100061514               7      0.1%
11140-100057454               6      0.1%
Other values (7790)           9925   99.2%

Most occurring characters

ValueCountFrequency (%)
0 53893
34.4%
1 49980
31.9%
- 10000
 
6.4%
4 8449
 
5.4%
2 6097
 
3.9%
3 5789
 
3.7%
7 5466
 
3.5%
5 5276
 
3.4%
9 4336
 
2.8%
8 3876
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 146822
93.6%
Dash Punctuation 10000
 
6.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 53893
36.7%
1 49980
34.0%
4 8449
 
5.8%
2 6097
 
4.2%
3 5789
 
3.9%
7 5466
 
3.7%
5 5276
 
3.6%
9 4336
 
3.0%
8 3876
 
2.6%
6 3660
 
2.5%
Dash Punctuation
ValueCountFrequency (%)
- 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 156822
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 53893
34.4%
1 49980
31.9%
- 10000
 
6.4%
4 8449
 
5.4%
2 6097
 
3.9%
3 5789
 
3.7%
7 5466
 
3.5%
5 5276
 
3.4%
9 4336
 
2.8%
8 3876
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 156822
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 53893
34.4%
1 49980
31.9%
- 10000
 
6.4%
4 8449
 
5.4%
2 6097
 
3.9%
3 5789
 
3.7%
7 5466
 
3.5%
5 5276
 
3.4%
9 4336
 
2.8%
8 3876
 
2.5%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Top values: 1 (4781), 3 (2719), 2 (2500)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      4781   47.8%
3      2719   27.2%
2      2500   25.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
1      4781   47.8%
3      2719   27.2%
2      2500   25.0%

지역지구구역_코드
Text

Distinct: 143
Distinct (%): 1.4%
Missing: 25
Missing (%): 0.2%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 5.3493734
Min length: 3

Characters and Unicode

Total characters: 53360
Distinct characters: 34
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 30
Unique (%): 0.3%

Sample

1st row: UQA220
2nd row: UQG110
3rd row: UQH100
4th row: 1120
5th row: ZA0014

Value               Count  Frequency (%)
uqa220              722    7.2%
uqa001              544    5.5%
uqq300              501    5.0%
zq0001              494    5.0%
uqa122              479    4.8%
uqi100              467    4.7%
1120                457    4.6%
uoa120              401    4.0%
uqq310              386    3.9%
uqa121              316    3.2%
Other values (133)  5208   52.2%

Most occurring characters

ValueCountFrequency (%)
0 13452
25.2%
1 10122
19.0%
Q 6456
12.1%
U 6343
11.9%
2 5733
10.7%
A 3774
 
7.1%
3 1873
 
3.5%
Z 1124
 
2.1%
O 552
 
1.0%
I 479
 
0.9%
Other values (24) 3452
 
6.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 32142
60.2%
Uppercase Letter 21218
39.8%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
Q 6456
30.4%
U 6343
29.9%
A 3774
17.8%
Z 1124
 
5.3%
O 552
 
2.6%
I 479
 
2.3%
H 375
 
1.8%
D 372
 
1.8%
G 367
 
1.7%
X 315
 
1.5%
Other values (14) 1061
 
5.0%
Decimal Number
ValueCountFrequency (%)
0 13452
41.9%
1 10122
31.5%
2 5733
17.8%
3 1873
 
5.8%
4 308
 
1.0%
6 259
 
0.8%
9 148
 
0.5%
7 121
 
0.4%
8 83
 
0.3%
5 43
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Common 32142
60.2%
Latin 21218
39.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
Q 6456
30.4%
U 6343
29.9%
A 3774
17.8%
Z 1124
 
5.3%
O 552
 
2.6%
I 479
 
2.3%
H 375
 
1.8%
D 372
 
1.8%
G 367
 
1.7%
X 315
 
1.5%
Other values (14) 1061
 
5.0%
Common
ValueCountFrequency (%)
0 13452
41.9%
1 10122
31.5%
2 5733
17.8%
3 1873
 
5.8%
4 308
 
1.0%
6 259
 
0.8%
9 148
 
0.5%
7 121
 
0.4%
8 83
 
0.3%
5 43
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 53360
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 13452
25.2%
1 10122
19.0%
Q 6456
12.1%
U 6343
11.9%
2 5733
10.7%
A 3774
 
7.1%
3 1873
 
3.5%
Z 1124
 
2.1%
O 552
 
1.0%
I 479
 
0.9%
Other values (24) 3452
 
6.5%

대표_여부
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Top values: 1 (6614), 0 (3052), <NA> (334)

Length

Max length: 4
Median length: 1
Mean length: 1.1002
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
1      6614   66.1%
0      3052   30.5%
<NA>   334    3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
1      6614   66.1%
0      3052   30.5%
na     334    3.3%

주_동_구분_코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Top values: 0 (9666), <NA> (334)

Length

Max length: 4
Median length: 1
Mean length: 1.1002
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      9666   96.7%
<NA>   334    3.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
0      9666   96.7%
na     334    3.3%

지역지구구역_명
Text

Distinct: 120
Distinct (%): 1.2%
Missing: 25
Missing (%): 0.2%
Memory size: 156.2 KiB

Length

Max length: 18
Median length: 15
Mean length: 6.9029574
Min length: 3

Characters and Unicode

Total characters: 68857
Distinct characters: 139
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 27
Unique (%): 0.3%

Sample

1st row: 일반상업지역
2nd row: 중심지미관지구
3rd row: 고도지구
4th row: 일반상업지역
5th row: 역사도심

Value                   Count  Frequency (%)
일반상업지역            1179   11.4%
도시지역                912    8.8%
제2종일반주거지역       720    6.9%
방화지구                691    6.7%
지구단위계획구역        528    5.1%
제1종지구단위계획구역   511    4.9%
중점경관관리구역        494    4.8%
중심지미관지구          442    4.3%
제1종일반주거지역       438    4.2%
최고고도지구            302    2.9%
Other values (116)      4144   40.0%

Most occurring characters

ValueCountFrequency (%)
8396
 
12.2%
7734
 
11.2%
6324
 
9.2%
2951
 
4.3%
2951
 
4.3%
2447
 
3.6%
2128
 
3.1%
2081
 
3.0%
2071
 
3.0%
1894
 
2.8%
Other values (129) 29880
43.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 66086
96.0%
Decimal Number 2254
 
3.3%
Space Separator 386
 
0.6%
Close Punctuation 55
 
0.1%
Open Punctuation 55
 
0.1%
Other Punctuation 21
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8396
 
12.7%
7734
 
11.7%
6324
 
9.6%
2951
 
4.5%
2951
 
4.5%
2447
 
3.7%
2128
 
3.2%
2081
 
3.1%
2071
 
3.1%
1894
 
2.9%
Other values (121) 27109
41.0%
Decimal Number
ValueCountFrequency (%)
1 1097
48.7%
2 748
33.2%
3 282
 
12.5%
4 127
 
5.6%
Space Separator
ValueCountFrequency (%)
386
100.0%
Close Punctuation
ValueCountFrequency (%)
) 55
100.0%
Open Punctuation
ValueCountFrequency (%)
( 55
100.0%
Other Punctuation
ValueCountFrequency (%)
? 21
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 66086
96.0%
Common 2771
 
4.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8396
 
12.7%
7734
 
11.7%
6324
 
9.6%
2951
 
4.5%
2951
 
4.5%
2447
 
3.7%
2128
 
3.2%
2081
 
3.1%
2071
 
3.1%
1894
 
2.9%
Other values (121) 27109
41.0%
Common
ValueCountFrequency (%)
1 1097
39.6%
2 748
27.0%
386
 
13.9%
3 282
 
10.2%
4 127
 
4.6%
) 55
 
2.0%
( 55
 
2.0%
? 21
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 66086
96.0%
ASCII 2771
 
4.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
8396
 
12.7%
7734
 
11.7%
6324
 
9.6%
2951
 
4.5%
2951
 
4.5%
2447
 
3.7%
2128
 
3.2%
2081
 
3.1%
2071
 
3.1%
1894
 
2.9%
Other values (121) 27109
41.0%
ASCII
ValueCountFrequency (%)
1 1097
39.6%
2 748
27.0%
386
 
13.9%
3 282
 
10.2%
4 127
 
4.6%
) 55
 
2.0%
( 55
 
2.0%
? 21
 
0.8%

Correlations

Variable                지역지구구역_구분_코드  대표_여부
지역지구구역_구분_코드  1.000                   0.047
대표_여부               0.047                   1.000

Variable                지역지구구역_구분_코드  대표_여부  주_동_구분_코드
지역지구구역_구분_코드  1.000                   0.077      1.000
대표_여부               0.077                   1.000      1.000
주_동_구분_코드         1.000                   1.000      1.000

Variable                지역지구구역_구분_코드  대표_여부  주_동_구분_코드
지역지구구역_구분_코드  1.000                   0.077      1.000
대표_여부               0.077                   1.000      1.000
주_동_구분_코드         1.000                   1.000      1.000
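
All three variables in these matrices are categorical, so an association measure such as Cramér's V is a natural way to read them. The sketch below is a plain, uncorrected Cramér's V in pure Python. Whether the report uses exactly this statistic, and with which correction or missing-value handling, is an assumption, and the code does not attempt to reproduce the values above. The function name `cramers_v` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        # One of the variables is constant, so no association can be measured.
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Toy data: the first pair is perfectly associated, the second is independent.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

Values near 1 indicate that one variable almost determines the other, which is what the High correlation alerts flag.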

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
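
Nullity correlation can be sketched as the Pearson correlation between the "is missing" indicators of two columns. The function name `nullity_correlation` and the toy columns below are hypothetical, with `None` standing in for a missing cell. This illustrates the idea only and is not the report's own implementation.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n

    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no nullity variation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy columns in which both values are missing on the same rows.
codes = ["UQA220", None, "1120", None]
names = ["일반상업지역", None, "일반상업지역", None]
r = nullity_correlation(codes, names)
print(r)
```

A value of 1 means the two columns are always missing together, and a value of -1 means one is present exactly when the other is missing.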

Sample

(index)  관리_지역지구구역_pk          관리_허가대장_pk  지역지구구역_구분_코드  지역지구구역_코드  대표_여부  주_동_구분_코드  지역지구구역_명
99711    11170-100020409               11170-100022091   1                       UQA220             0          0                일반상업지역
27491    11110-100069912               11110-100029060   2                       UQG110             0          0                중심지미관지구
1513     11110-1000000000000000090558  11110-100056172   2                       UQH100             1          0                고도지구
88434    11140-694                     11140-476         1                       1120               1          0                일반상업지역
36610    11110-100099204               11110-100046752   1                       ZA0014             0          0                역사도심
21121    11110-100043918               11110-100022049   3                       UOA110             <NA>       <NA>             절대정화구역
25315    11110-100060895               11110-100027389   1                       UQA220             1          0                일반상업지역
69205    11140-100042851               11140-100044752   2                       UQH110             1          0                최고고도지구
54587    11110-9130                    11110-4622        1                       1022               1          0                제2종일반주거지역
22005    11110-100047657               11110-100016192   2                       UQI100             0          0                방화지구

(index)  관리_지역지구구역_pk  관리_허가대장_pk  지역지구구역_구분_코드  지역지구구역_코드  대표_여부  주_동_구분_코드  지역지구구역_명
42442    11110-100113826       11110-100054272   1                       UQA430             1          0                자연녹지지역
51685    11110-6004            11110-3042        1                       1021               1          0                제1종일반주거지역
29605    11110-100077923       11110-100026919   2                       UQI100             1          0                방화지구
96705    11170-100003911       11170-100005980   1                       1022               0          0                제2종일반주거지역
78036    11140-100084485       11140-100077218   1                       ZA0014             0          0                역사도심
48781    11110-3193            11110-1440        1                       1021               1          0                제1종일반주거지역
82841    11140-1315            11140-1061        1                       1020               0          0                일반주거지역
9754     11110-100007695       11110-100005514   1                       1011               1          0                제1종전용주거지역
77780    11140-100083502       11140-100078119   1                       UQA220             1          0                일반상업지역
45419    11110-100121209       11110-100059732   1                       UQA111             1          0                제1종전용주거지역