Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 1848
Missing cells (%): 2.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
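
The derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, using only numbers that appear in this report; the variable names are illustrative, and the per-column missing counts are the ones listed under Alerts:

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 7
n_observations = 10_000
missing_cells = 1_848
total_size_kib = 654.3

# Missing cells (%) is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size is total memory divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Per-column missing counts reported in the Alerts section.
missing_by_column = {"지역지구구역_코드": 629, "기타_지역지구구역": 1219}
missing_from_alerts = sum(missing_by_column.values())

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells from alerts: {missing_from_alerts}")
```

The two columns flagged as Missing account for all 1848 missing cells.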

Variable types

Text: 4
Categorical: 2
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15655/S/1/datasetView.do

Alerts

대표_여부 is highly imbalanced (78.1%) [Imbalance]
지역지구구역_코드 has 629 (6.3%) missing values [Missing]
기타_지역지구구역 has 1219 (12.2%) missing values [Missing]
관리_지역지구구역 has unique values [Unique]
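
The 78.1% imbalance figure for 대표_여부 is consistent with a score of one minus the normalized Shannon entropy of the value counts listed for that variable (9383, 598, 19). The report does not state this definition, so treat it as an assumption; `imbalance_score` is a hypothetical helper name used only for this sketch.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits of the
    class distribution and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts of 대표_여부 as listed in this report: "1", "0", "<NA>".
counts = [9383, 598, 19]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, so a value near 0.78 reflects the 93.8% share of the value 1.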

Reproduction

Analysis started: 2024-05-04 02:56:47.097427
Analysis finished: 2024-05-04 02:56:50.057525
Duration: 2.96 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_지역지구구역
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 11.2789
Min length: 8

Characters and Unicode

Total characters: 112789
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
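
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below applies it to the five sample rows listed for this variable; it covers categories only, since the standard library has no script or block lookup.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for this variable in the report.
samples = ["11620-46961", "11620-35514", "11140-21174", "11380-7844", "11530-13240"]

# unicodedata.category returns a two-letter general category code,
# e.g. "Nd" (Decimal Number) or "Pd" (Dash Punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in "".join(samples))

for code, count in category_counts.most_common():
    print(code, count)
```

Only two categories appear, which matches the "Distinct categories" figure of 2 for this variable.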

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11620-46961
2nd row: 11620-35514
3rd row: 11140-21174
4th row: 11380-7844
5th row: 11530-13240

Value                 Count   Frequency (%)
11620-46961           1       < 0.1%
11620-33799           1       < 0.1%
11545-25718           1       < 0.1%
11320-13690           1       < 0.1%
11620-30413           1       < 0.1%
11140-12675           1       < 0.1%
11260-10786           1       < 0.1%
11545-11552           1       < 0.1%
11530-6175            1       < 0.1%
11290-18424           1       < 0.1%
Other values (9990)   9990    99.9%

Most occurring characters

Value   Count   Frequency (%)
1       31002   27.5%
0       16413   14.6%
5       10157   9.0%
-       10000   8.9%
2       9564    8.5%
4       7640    6.8%
3       7205    6.4%
6       6876    6.1%
9       4808    4.3%
7       4703    4.2%

Most occurring categories

Value               Count    Frequency (%)
Decimal Number      102789   91.1%
Dash Punctuation    10000    8.9%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       31002   30.2%
0       16413   16.0%
5       10157   9.9%
2       9564    9.3%
4       7640    7.4%
3       7205    7.0%
6       6876    6.7%
9       4808    4.7%
7       4703    4.6%
8       4421    4.3%

Dash Punctuation
Value   Count   Frequency (%)
-       10000   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   112789   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       31002   27.5%
0       16413   14.6%
5       10157   9.0%
-       10000   8.9%
2       9564    8.5%
4       7640    6.8%
3       7205    6.4%
6       6876    6.1%
9       4808    4.3%
7       4703    4.2%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   112789   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       31002   27.5%
0       16413   14.6%
5       10157   9.0%
-       10000   8.9%
2       9564    8.5%
4       7640    6.8%
3       7205    6.4%
6       6876    6.1%
9       4808    4.3%
7       4703    4.2%

관리_건축물대장
Text

Distinct: 8469
Distinct (%): 84.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 10.7347
Min length: 7

Characters and Unicode

Total characters: 107347
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7032
Unique (%): 70.3%
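
Distinct counts the different values in a column, while Unique counts only the values that occur exactly once, which is why this variable can have 8469 distinct values but 7032 unique ones. A minimal sketch of the two counts; the list below is made up for illustration and its frequencies are not taken from the dataset.

```python
from collections import Counter

# Illustrative values only: two identifiers repeat, two occur once.
values = [
    "11110-660", "11110-660",
    "11500-855",
    "11500-583", "11500-583",
    "11170-6854",
]

counts = Counter(values)

n_distinct = len(counts)                                # different values
n_unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

print(n_distinct, n_unique)
```

Unique can never exceed Distinct, and the two are equal only when no value repeats.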

Sample

1st row: 11620-30925
2nd row: 11620-23885
3rd row: 11140-412
4th row: 11380-23504
5th row: 11530-589

Value                 Count   Frequency (%)
11110-660             6       0.1%
11500-855             6       0.1%
11500-583             5       < 0.1%
11545-100184705       4       < 0.1%
11500-100205108       4       < 0.1%
11500-493             4       < 0.1%
11500-100214647       4       < 0.1%
11170-100200885       4       < 0.1%
11170-6854            4       < 0.1%
11500-100259112       4       < 0.1%
Other values (8459)   9955    99.6%

Most occurring characters

Value   Count   Frequency (%)
1       29894   27.8%
0       13457   12.5%
2       10616   9.9%
-       10000   9.3%
5       9128    8.5%
4       6813    6.3%
6       6792    6.3%
3       6486    6.0%
9       4840    4.5%
7       4753    4.4%

Most occurring categories

Value               Count    Frequency (%)
Decimal Number      97347    90.7%
Dash Punctuation    10000    9.3%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
1       29894   30.7%
0       13457   13.8%
2       10616   10.9%
5       9128    9.4%
4       6813    7.0%
6       6792    7.0%
3       6486    6.7%
9       4840    5.0%
7       4753    4.9%
8       4568    4.7%

Dash Punctuation
Value   Count   Frequency (%)
-       10000   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   107347   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
1       29894   27.8%
0       13457   12.5%
2       10616   9.9%
-       10000   9.3%
5       9128    8.5%
4       6813    6.3%
6       6792    6.3%
3       6486    6.0%
9       4840    4.5%
7       4753    4.4%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   107347   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
1       29894   27.8%
0       13457   12.5%
2       10616   9.9%
-       10000   9.3%
5       9128    8.5%
4       6813    6.3%
6       6792    6.3%
3       6486    6.0%
9       4840    4.5%
7       4753    4.4%

지역지구구역_구분_코드
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value   Count   Frequency (%)
1       6208    62.1%
2       3478    34.8%
3       314     3.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values (1: 6208, 2: 3478, 3: 314)]

지역지구구역_코드
Text

MISSING

Distinct: 110
Distinct (%): 1.2%
Missing: 629
Missing (%): 6.3%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 3.7970334
Min length: 2

Characters and Unicode

Total characters: 35582
Distinct characters: 31
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26
Unique (%): 0.3%

Sample

1st row: 1022
2nd row: 111
3rd row: 1120
4th row: 1020
5th row: 150

Value                Count   Frequency (%)
1020                 2079    22.2%
1022                 2011    21.5%
260                  1033    11.0%
990                  938     10.0%
1021                 498     5.3%
112                  432     4.6%
111                  303     3.2%
1023                 239     2.6%
uqa001               233     2.5%
uqa122               170     1.8%
Other values (100)   1435    15.3%

Most occurring characters

Value               Count   Frequency (%)
0                   10987   30.9%
2                   9229    25.9%
1                   9048    25.4%
9                   1917    5.4%
6                   1047    2.9%
3                   884     2.5%
U                   732     2.1%
Q                   610     1.7%
A                   564     1.6%
7                   110     0.3%
Other values (21)   454     1.3%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      33358   93.7%
Uppercase Letter    2224    6.3%

Most frequent character per category

Uppercase Letter
Value               Count   Frequency (%)
U                   732     32.9%
Q                   610     27.4%
A                   564     25.4%
N                   52      2.3%
H                   45      2.0%
O                   41      1.8%
G                   37      1.7%
E                   26      1.2%
L                   25      1.1%
M                   18      0.8%
Other values (11)   74      3.3%

Decimal Number
Value   Count   Frequency (%)
0       10987   32.9%
2       9229    27.7%
1       9048    27.1%
9       1917    5.7%
6       1047    3.1%
3       884     2.7%
7       110     0.3%
4       82      0.2%
5       41      0.1%
8       13      < 0.1%

Most occurring scripts

Value    Count   Frequency (%)
Common   33358   93.7%
Latin    2224    6.3%

Most frequent character per script

Latin
Value               Count   Frequency (%)
U                   732     32.9%
Q                   610     27.4%
A                   564     25.4%
N                   52      2.3%
H                   45      2.0%
O                   41      1.8%
G                   37      1.7%
E                   26      1.2%
L                   25      1.1%
M                   18      0.8%
Other values (11)   74      3.3%

Common
Value   Count   Frequency (%)
0       10987   32.9%
2       9229    27.7%
1       9048    27.1%
9       1917    5.7%
6       1047    3.1%
3       884     2.7%
7       110     0.3%
4       82      0.2%
5       41      0.1%
8       13      < 0.1%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   35582   100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   10987   30.9%
2                   9229    25.9%
1                   9048    25.4%
9                   1917    5.4%
6                   1047    2.9%
3                   884     2.5%
U                   732     2.1%
Q                   610     1.7%
A                   564     1.6%
7                   110     0.3%
Other values (21)   454     1.3%

대표_여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0057
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       9383    93.8%
0       598     6.0%
<NA>    19      0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values (1: 9383, 0: 598, <NA>: 19)]

기타_지역지구구역
Text

MISSING

Distinct: 211
Distinct (%): 2.4%
Missing: 1219
Missing (%): 12.2%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 18
Mean length: 7.2948411
Min length: 2

Characters and Unicode

Total characters: 64056
Distinct characters: 159
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 112
Unique (%): 1.3%

Sample

1st row: 2종일반주거지역
2nd row: 고도지구기타
3rd row: 일반주거지역
4th row: 공항고도지구<진입표면>
5th row: 고도지구기타

Value    Count    Frequency (%)
일반주거지역    1756    19.8%
2종일반주거지역    1051    11.9%
고도지구기타(공항고도지구    823    9.3%
일반주거    697    7.9%
제2종일반주거지역    612    6.9%
주차장정비지구    607    6.8%
주차장정비    471    5.3%
고도지구기타    427    4.8%
도시지역    325    3.7%
1종일반주거지역    240    2.7%
Other values (204)    1853    20.9%

Most occurring characters

Value                Count   Frequency (%)
[lost]               8309    13.0%
[lost]               6061    9.5%
[lost]               5039    7.9%
[lost]               4955    7.7%
[lost]               4755    7.4%
[lost]               4749    7.4%
[lost]               3727    5.8%
[lost]               2658    4.1%
[lost]               2437    3.8%
[lost]               2306    3.6%
Other values (149)   19060   29.8%

Most occurring categories

Value                Count   Frequency (%)
Other Letter         59742   93.3%
Decimal Number       2331    3.6%
Close Punctuation    940     1.5%
Open Punctuation     939     1.5%
Space Separator      81      0.1%
Other Punctuation    19      < 0.1%
Math Symbol          2       < 0.1%
Lowercase Letter     1       < 0.1%
Uppercase Letter     1       < 0.1%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
[lost]               8309    13.9%
[lost]               6061    10.1%
[lost]               5039    8.4%
[lost]               4955    8.3%
[lost]               4755    8.0%
[lost]               4749    7.9%
[lost]               3727    6.2%
[lost]               2658    4.4%
[lost]               2437    4.1%
[lost]               2306    3.9%
Other values (129)   14746   24.7%

Decimal Number
Value   Count   Frequency (%)
2       1703    73.1%
1       416     17.8%
3       192     8.2%
4       12      0.5%
7       5       0.2%
6       1       < 0.1%
5       1       < 0.1%
0       1       < 0.1%

Other Punctuation
Value   Count   Frequency (%)
,       13      68.4%
/       5       26.3%
?       1       5.3%

Close Punctuation
Value   Count   Frequency (%)
)       939     99.9%
]       1       0.1%

Open Punctuation
Value   Count   Frequency (%)
(       938     99.9%
[       1       0.1%

Math Symbol
Value   Count   Frequency (%)
<       1       50.0%
>       1       50.0%

Space Separator
Value     Count   Frequency (%)
(space)   81      100.0%

Lowercase Letter
Value   Count   Frequency (%)
m       1       100.0%

Uppercase Letter
Value   Count   Frequency (%)
M       1       100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   59738   93.3%
Common   4312    6.7%
Han      4       < 0.1%
Latin    2       < 0.1%

Most frequent character per script

Hangul
Value                Count   Frequency (%)
[lost]               8309    13.9%
[lost]               6061    10.1%
[lost]               5039    8.4%
[lost]               4955    8.3%
[lost]               4755    8.0%
[lost]               4749    7.9%
[lost]               3727    6.2%
[lost]               2658    4.4%
[lost]               2437    4.1%
[lost]               2306    3.9%
Other values (125)   14742   24.7%

Common
Value              Count   Frequency (%)
2                  1703    39.5%
)                  939     21.8%
(                  938     21.8%
1                  416     9.6%
3                  192     4.5%
(space)            81      1.9%
,                  13      0.3%
4                  12      0.3%
/                  5       0.1%
7                  5       0.1%
Other values (8)   8       0.2%

Han
Value    Count   Frequency (%)
[lost]   1       25.0%
[lost]   1       25.0%
[lost]   1       25.0%
[lost]   1       25.0%

Latin
Value   Count   Frequency (%)
m       1       50.0%
M       1       50.0%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   59738   93.3%
ASCII    4314    6.7%
CJK      4       < 0.1%

Most frequent character per block

Hangul
Value                Count   Frequency (%)
[lost]               8309    13.9%
[lost]               6061    10.1%
[lost]               5039    8.4%
[lost]               4955    8.3%
[lost]               4755    8.0%
[lost]               4749    7.9%
[lost]               3727    6.2%
[lost]               2658    4.4%
[lost]               2437    4.1%
[lost]               2306    3.9%
Other values (125)   14742   24.7%

ASCII
Value               Count   Frequency (%)
2                   1703    39.5%
)                   939     21.8%
(                   938     21.7%
1                   416     9.6%
3                   192     4.5%
(space)             81      1.9%
,                   13      0.3%
4                   12      0.3%
/                   5       0.1%
7                   5       0.1%
Other values (10)   10      0.2%

CJK
Value    Count   Frequency (%)
[lost]   1       25.0%
[lost]   1       25.0%
[lost]   1       25.0%
[lost]   1       25.0%

작업_일자
Real number (ℝ)

Distinct: 458
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20124887
Minimum: 20111227
Maximum: 20160730
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20111227
5-th percentile: 20111227
Q1: 20111227
median: 20120530
Q3: 20140724
95-th percentile: 20151202
Maximum: 20160730
Range: 49503
Interquartile range (IQR): 29497
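
Because 작업_일자 was profiled as a real number, the Range above is simply the difference between the integers 20160730 and 20111227, not a span of time. The values look like YYYYMMDD dates, although the report does not say so; under that assumption, a minimal sketch of converting them before measuring the span (`parse_yyyymmdd` is a hypothetical helper):

```python
from datetime import datetime


def parse_yyyymmdd(value):
    """Interpret an integer such as 20111227 as a calendar date (assumed YYYYMMDD)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Minimum and Maximum as reported for 작업_일자.
minimum, maximum = 20111227, 20160730

# What the profiler reports as Range: plain integer subtraction.
numeric_range = maximum - minimum

# The same two endpoints treated as dates.
span_days = (parse_yyyymmdd(maximum) - parse_yyyymmdd(minimum)).days

print("numeric range:", numeric_range)
print("span in days:", span_days)
```

The same caveat applies to the mean, standard deviation and other moments of this column, which are computed on the integer encoding.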

Descriptive statistics

Standard deviation: 16439.994
Coefficient of variation (CV): 0.00081689873
Kurtosis: -0.91818257
Mean: 20124887
Median Absolute Deviation (MAD): 9303
Skewness: 0.79037545
Sum: 2.0124887 × 10^11
Variance: 2.7027342 × 10^8
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count   Frequency (%)
20111227             4808    48.1%
20120825             807     8.1%
20120530             277     2.8%
20150116             214     2.1%
20121222             211     2.1%
20140724             197     2.0%
20150107             139     1.4%
20141115             109     1.1%
20141217             80      0.8%
20120920             80      0.8%
Other values (448)   3078    30.8%

Minimum 10 values

Value      Count   Frequency (%)
20111227   4808    48.1%
20120102   6       0.1%
20120104   1       < 0.1%
20120110   4       < 0.1%
20120111   1       < 0.1%
20120112   7       0.1%
20120113   2       < 0.1%
20120117   3       < 0.1%
20120119   1       < 0.1%
20120120   1       < 0.1%

Maximum 10 values

Value      Count   Frequency (%)
20160730   9       0.1%
20160727   1       < 0.1%
20160726   5       0.1%
20160723   15      0.1%
20160720   10      0.1%
20160716   1       < 0.1%
20160712   1       < 0.1%
20160709   11      0.1%
20160706   7       0.1%
20160702   3       < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable    지역지구구역_구분_코드    대표_여부    작업_일자
지역지구구역_구분_코드    1.000    0.079    0.193
대표_여부    0.079    1.000    0.106
작업_일자    0.193    0.106    1.000

[Figure: correlation heatmap]

Variable    지역지구구역_구분_코드    대표_여부
지역지구구역_구분_코드    1.000    0.132
대표_여부    0.132    1.000

[Figure: correlation heatmap]

Variable    작업_일자    지역지구구역_구분_코드    대표_여부
작업_일자    1.000    0.087    0.142
지역지구구역_구분_코드    0.087    1.000    0.132
대표_여부    0.142    0.132    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
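
One way to picture nullity correlation is as the Pearson correlation between per-column missingness indicators (1 where a value is missing, 0 otherwise). The sketch below works under that assumption; the six rows are made up for illustration, and `pearson` is a hypothetical helper rather than anything taken from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns NaN when either sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Illustrative rows: (지역지구구역_코드, 기타_지역지구구역); None marks a missing value.
rows = [
    ("1022", "2종일반주거지역"),
    ("1120", None),
    (None, "일반주거"),
    ("1020", "일반주거지역"),
    ("1022", None),
    ("990", "고도지구기타"),
]

code_missing = [1 if code is None else 0 for code, _ in rows]
other_missing = [1 if other is None else 0 for _, other in rows]

r = pearson(code_missing, other_missing)
print(f"nullity correlation: {r:.3f}")
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.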

Sample

Index    관리_지역지구구역    관리_건축물대장    지역지구구역_구분_코드    지역지구구역_코드    대표_여부    기타_지역지구구역    작업_일자
18753    11620-46961    11620-30925    1    1022    1    2종일반주거지역    20121222
27652    11620-35514    11620-23885    2    111    1    고도지구기타    20151202
15685    11140-21174    11140-412    1    1120    1    <NA>    20120825
10093    11380-7844    11380-23504    1    1020    1    일반주거지역    20111227
21326    11530-13240    11530-589    2    150    1    공항고도지구<진입표면>    20140521
18538    11620-30456    11620-20734    2    111    1    고도지구기타    20121222
25613    11110-10277    11110-20350    1    1022    1    2종일반주거지역    20150403
6165    11140-14590    11140-19227    1    <NA>    1    일반주거    20111227
16578    11380-2898    11380-10274    1    1020    1    일반주거지역    20120825
27740    11530-4490    11530-10875    1    1020    1    일반주거지역    20151216

Index    관리_지역지구구역    관리_건축물대장    지역지구구역_구분_코드    지역지구구역_코드    대표_여부    기타_지역지구구역    작업_일자
8152    11545-22766    11545-12370    2    990    1    고도지구기타(공항고도지구)    20111227
20794    11170-100058814    11170-19118    1    1022    1    <NA>    20140131
12187    11545-28382    11545-15097    1    1022    1    2종일반주거지역    20111227
21429    11140-18042    11140-22905    1    1120    1    <NA>    20140618
5234    11620-6874    11620-6100    1    1023    1    3종일반주거지역    20111227
19936    11380-2127    11380-6819    1    1020    1    일반주거지역    20131008
7775    11620-30936    11620-21029    2    112    1    고도지구기타    20111227
14268    11260-833    11260-2250    1    1020    1    일반주거    20120427
3581    11620-48947    11620-32129    3    301    1    제1종지구단위계획구역    20111227
26800    11260-9054    11260-16401    1    1020    1    일반주거지역    20150801