Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 2984
Missing cells (%): 4.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 634.8 KiB
Average record size in memory: 65.0 B
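The headline figures can be cross-checked with simple arithmetic. Missing cells (%) is measured against all cells (7 variables × 10000 observations = 70000). Because 기타_지역지구구역 is the only variable with a non-zero missing count, all 2984 missing cells come from that column. The sketch below reproduces both percentages and the average record size using only numbers copied from this report; the function names are illustrative.

```python
# Figures copied from the dataset statistics above.
N_VARIABLES = 7
N_OBSERVATIONS = 10_000
MISSING_CELLS = 2_984
TOTAL_SIZE_KIB = 634.8


def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Missing cells as a percentage of all (variables x observations) cells."""
    return 100 * missing / (n_vars * n_obs)


def avg_record_bytes(total_kib: float, n_obs: int) -> float:
    """Average in-memory size of one row in bytes (1 KiB = 1024 bytes)."""
    return total_kib * 1024 / n_obs


print(f"Missing cells (%): {missing_cells_pct(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS):.1f}%")
# Same 2984 cells measured against one column of 10000 rows (기타_지역지구구역).
print(f"Missing in one column (%): {missing_cells_pct(MISSING_CELLS, 1, N_OBSERVATIONS):.1f}%")
print(f"Average record size: {avg_record_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS):.1f} B")
```

The first line prints 4.3%, the second 29.8% (the per-column figure in the Alerts), and the third 65.0 B.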

Variable types

Text: 3
Categorical: 3
Numeric: 1

Dataset

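Description: 관리_지역지구구역_pk, 관리_폐쇄말소대장_pk, 지역지구구역_구분_코드, 지역지구구역_코드, 대표_여부, 기타_지역지구구역, 작업_일자 (zoning record key, closed/cancelled register key, zone classification code, zone code, representative flag, other zone (free text), work date)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15658/S/1/datasetView.do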

Alerts

High correlation: 지역지구구역_코드 is highly correlated overall with 지역지구구역_구분_코드 and 1 other field
High correlation: 대표_여부 is highly correlated overall with 지역지구구역_구분_코드 and 1 other field
High correlation: 지역지구구역_구분_코드 is highly correlated overall with 지역지구구역_코드 and 1 other field
Imbalance: 지역지구구역_코드 is highly imbalanced (52.7%)
Imbalance: 대표_여부 is highly imbalanced (63.5%)
Missing: 기타_지역지구구역 has 2984 (29.8%) missing values
Unique: 관리_지역지구구역_pk has unique values
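The imbalance percentages can be reproduced by assuming the score is one minus the normalized Shannon entropy of the category counts (0 means perfectly balanced; values near 1 mean one category dominates). This definition is an assumption about how ydata-profiling computes the alert, not something the report states. With the four 대표_여부 counts from the Variables section it yields 63.5%, matching the alert. The function name is illustrative.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - H / ln(k), where H is the Shannon entropy (natural log) of the
    category shares and k is the number of categories."""
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(len(counts))


# Counts of 대표_여부 (대표, <NA>, "1", "0") copied from the Variables section.
print(f"{imbalance_score([8014, 1976, 6, 4]):.1%}")  # 63.5%
```

Normalizing by ln(k) makes the score comparable across columns with different numbers of categories.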

Reproduction

Analysis started: 2024-05-10 23:20:51.030957
Analysis finished: 2024-05-10 23:20:53.698916
Duration: 2.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_지역지구구역_pk
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 15.6247
Min length: 8

Characters and Unicode

Total characters: 156247
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11230-4925
2nd row: 11530-100101164
3rd row: 11710-100004205
4th row: 11545-100024279
5th row: 11230-100165161

Value | Count | Frequency (%)
11230-4925 | 1 | < 0.1%
11680-31860 | 1 | < 0.1%
11620-1000000000000001315061 | 1 | < 0.1%
11230-100164344 | 1 | < 0.1%
11170-100136529 | 1 | < 0.1%
11110-100058736 | 1 | < 0.1%
11260-100102949 | 1 | < 0.1%
11530-100055404 | 1 | < 0.1%
11230-100160052 | 1 | < 0.1%
11290-15206 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%
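Every value of 관리_지역지구구역_pk contains only digits and hyphens, and the hyphen count (10000) equals the number of rows, which suggests each key has the shape `<5-digit prefix>-<serial>`. The prefixes in the sample (11110, 11230, 11680, ...) look like Seoul district codes, but that reading is an assumption. The stdlib sketch below splits such keys; the pattern and the name `split_key` are hypothetical.

```python
import re

# Assumed key shape: five digits, one hyphen, then one or more digits.
KEY_PATTERN = re.compile(r"([0-9]{5})-([0-9]+)")


def split_key(key: str) -> tuple[str, str]:
    """Split a key such as '11230-4925' into (prefix, serial), both kept as text."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return match.group(1), match.group(2)


for key in ["11230-4925", "11530-100101164", "11620-1000000000000001315061"]:
    prefix, serial = split_key(key)
    print(prefix, serial, len(serial))
```

Keeping the serial as text is deliberate: the longest keys carry a 22-digit serial (for example 11620-1000000000000001315061), which exceeds the range of a signed 64-bit integer.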

Most occurring characters

Value | Count | Frequency (%)
0 | 48830 | 31.3%
1 | 39486 | 25.3%
- | 10000 | 6.4%
2 | 9713 | 6.2%
5 | 8447 | 5.4%
3 | 7659 | 4.9%
4 | 7420 | 4.7%
6 | 7245 | 4.6%
7 | 5962 | 3.8%
8 | 5892 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 146247 | 93.6%
Dash Punctuation | 10000 | 6.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 48830 | 33.4%
1 | 39486 | 27.0%
2 | 9713 | 6.6%
5 | 8447 | 5.8%
3 | 7659 | 5.2%
4 | 7420 | 5.1%
6 | 7245 | 5.0%
7 | 5962 | 4.1%
8 | 5892 | 4.0%
9 | 5593 | 3.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 156247 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 48830 | 31.3%
1 | 39486 | 25.3%
- | 10000 | 6.4%
2 | 9713 | 6.2%
5 | 8447 | 5.4%
3 | 7659 | 4.9%
4 | 7420 | 4.7%
6 | 7245 | 4.6%
7 | 5962 | 3.8%
8 | 5892 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 156247 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 48830 | 31.3%
1 | 39486 | 25.3%
- | 10000 | 6.4%
2 | 9713 | 6.2%
5 | 8447 | 5.4%
3 | 7659 | 4.9%
4 | 7420 | 4.7%
6 | 7245 | 4.6%
7 | 5962 | 3.8%
8 | 5892 | 3.8%

관리_폐쇄말소대장_pk
Text

Distinct: 8690
Distinct (%): 86.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 16.0775
Min length: 8

Characters and Unicode

Total characters: 160775
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7504
Unique (%): 75.0%
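Distinct and Unique measure different things: Distinct counts different values, while Unique counts values that occur exactly once. For 관리_폐쇄말소대장_pk, 8690 distinct minus 7504 unique means 1186 keys occur more than once, and those keys account for 10000 − 7504 = 2496 rows. That is consistent with one closed or cancelled register carrying several zoning records, although the report does not say why keys repeat. A stdlib sketch of both counts follows; the toy keys and function names are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Distinct = number of different values; unique = values occurring exactly once."""
    counts = Counter(values)
    unique = sum(1 for c in counts.values() if c == 1)
    return len(counts), unique


def repeated_keys(n_rows: int, distinct: int, unique: int) -> tuple[int, int]:
    """(keys occurring more than once, rows that carry such keys)."""
    return distinct - unique, n_rows - unique


# Hypothetical toy column: "k1" repeats, "k2" and "k3" occur once.
print(distinct_and_unique(["k1", "k1", "k2", "k3"]))  # (3, 2)
# Figures for 관리_폐쇄말소대장_pk copied from this report.
print(repeated_keys(10_000, 8_690, 7_504))  # (1186, 2496)
```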

Sample

1st row: 11230-2613
2nd row: 11530-101111854
3rd row: 11710-100007081
4th row: 11545-100024393
5th row: 11230-100482561

Value | Count | Frequency (%)
11140-24215 | 4 | < 0.1%
11500-100060167 | 4 | < 0.1%
11500-100060197 | 4 | < 0.1%
11110-1000000000000001440830 | 4 | < 0.1%
11500-100311009 | 4 | < 0.1%
11110-1000000000000001431272 | 4 | < 0.1%
11500-100780413 | 4 | < 0.1%
11680-101106855 | 4 | < 0.1%
11650-100743657 | 3 | < 0.1%
11230-25735 | 3 | < 0.1%
Other values (8680) | 9962 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 51262 | 31.9%
1 | 37156 | 23.1%
2 | 10318 | 6.4%
- | 10000 | 6.2%
5 | 9103 | 5.7%
3 | 8990 | 5.6%
4 | 8585 | 5.3%
6 | 7158 | 4.5%
7 | 6324 | 3.9%
8 | 6264 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 150775 | 93.8%
Dash Punctuation | 10000 | 6.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 51262 | 34.0%
1 | 37156 | 24.6%
2 | 10318 | 6.8%
5 | 9103 | 6.0%
3 | 8990 | 6.0%
4 | 8585 | 5.7%
6 | 7158 | 4.7%
7 | 6324 | 4.2%
8 | 6264 | 4.2%
9 | 5615 | 3.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 160775 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 51262 | 31.9%
1 | 37156 | 23.1%
2 | 10318 | 6.4%
- | 10000 | 6.2%
5 | 9103 | 5.7%
3 | 8990 | 5.6%
4 | 8585 | 5.3%
6 | 7158 | 4.5%
7 | 6324 | 3.9%
8 | 6264 | 3.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 160775 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 51262 | 31.9%
1 | 37156 | 23.1%
2 | 10318 | 6.4%
- | 10000 | 6.2%
5 | 9103 | 5.7%
3 | 8990 | 5.6%
4 | 8585 | 5.3%
6 | 7158 | 4.5%
7 | 6324 | 3.9%
8 | 6264 | 3.9%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

용도지역코드 (use-zone code): 6062
용도지구코드 (use-district code): 2773
용도구역코드 (use-area code): 1155
1: 5
2: 3

Length

Max length: 6
Median length: 6
Mean length: 5.995
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 용도지역코드
2nd row: 용도지역코드
3rd row: 용도지역코드
4th row: 용도지역코드
5th row: 용도지구코드

Common Values

Value | Count | Frequency (%)
용도지역코드 | 6062 | 60.6%
용도지구코드 | 2773 | 27.7%
용도구역코드 | 1155 | 11.6%
1 | 5 | 0.1%
2 | 3 | < 0.1%
3 | 2 | < 0.1%
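Mean length is a count-weighted average of value lengths in characters. For 지역지구구역_구분_코드, the three six-character labels cover 9990 rows and the one-character values cover 10, so (6 × 9990 + 1 × 10) / 10000 = 5.995. The same calculation reproduces 2.3942 for 대표_여부 only if `<NA>` is counted as a four-character string, which is one indication that `<NA>` is stored as text rather than as a true missing value. A minimal sketch using the Common Values counts; the function name is illustrative.

```python
def mean_length(value_counts: dict[str, int]) -> float:
    """Count-weighted mean length of string values, measured in characters."""
    total = sum(value_counts.values())
    return sum(len(value) * count for value, count in value_counts.items()) / total


# Counts copied from the Common Values tables of this report.
gubun_code = {"용도지역코드": 6062, "용도지구코드": 2773, "용도구역코드": 1155, "1": 5, "2": 3, "3": 2}
representative = {"대표": 8014, "<NA>": 1976, "1": 6, "0": 4}

print(round(mean_length(gubun_code), 4))      # 5.995
print(round(mean_length(representative), 4))  # 2.3942
```

Python's `len` counts code points, so each Hangul syllable counts as one character, matching the report's lengths.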

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
용도지역코드 | 6062 | 60.6%
용도지구코드 | 2773 | 27.7%
용도구역코드 | 1155 | 11.6%
1 | 5 | 0.1%
2 | 3 | < 0.1%
3 | 2 | < 0.1%

지역지구구역_코드
Categorical

HIGH CORRELATION  IMBALANCE 

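Distinct: 49
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

<NA>: 4620
제2종일반주거지역 (Type 2 general residential zone): 1281
도시지역 (urban zone): 1273
일반주거지역 (general residential zone): 1095
일반상업지역 (general commercial zone): 382
Other values (44): 1349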

Length

Max length: 15
Median length: 4
Mean length: 5.4089
Min length: 3

Unique

Unique: 15
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: 제3종일반주거지역
3rd row: <NA>
4th row: 준주거지역
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 4620 | 46.2%
제2종일반주거지역 | 1281 | 12.8%
도시지역 | 1273 | 12.7%
일반주거지역 | 1095 | 10.9%
일반상업지역 | 382 | 3.8%
제3종일반주거지역 | 327 | 3.3%
제1종일반주거지역 | 248 | 2.5%
준주거지역 | 162 | 1.6%
자연녹지지역 | 156 | 1.6%
준공업지역 | 142 | 1.4%
Other values (39) | 314 | 3.1%
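지역지구구역_코드 reports Missing 0, yet its most common value is the literal string `<NA>` (4620 rows, 46.2%). 대표_여부 has 1976 such rows. If these are treated as real missing values, the missing-value picture changes substantially. One plausible origin, not confirmed by the source, is pandas' `pd.NA` being converted to text, because `str(pd.NA)` is `'<NA>'`. The sketch below assumes `<NA>` is the only sentinel string. It measures the sentinel share and maps sentinels to `None`. All names are illustrative.

```python
from typing import Optional

# Assumption: the literal string "<NA>" is the only missing-value marker.
SENTINELS = frozenset({"<NA>"})


def clean(value: str) -> Optional[str]:
    """Map a sentinel string to None; return every other value unchanged."""
    return None if value in SENTINELS else value


def sentinel_share(value_counts: dict[str, int]) -> float:
    """Fraction of rows whose value is a sentinel string."""
    total = sum(value_counts.values())
    return sum(c for v, c in value_counts.items() if v in SENTINELS) / total


# Header counts for 지역지구구역_코드; the 44 rarer values are lumped into one entry.
zone_code = {"<NA>": 4620, "제2종일반주거지역": 1281, "도시지역": 1273,
             "일반주거지역": 1095, "일반상업지역": 382, "Other values (44)": 1349}
representative = {"대표": 8014, "<NA>": 1976, "1": 6, "0": 4}

print(f"{sentinel_share(zone_code):.1%}")       # 46.2%
print(f"{sentinel_share(representative):.1%}")  # 19.8%
print(clean("<NA>"), clean("도시지역"))          # None 도시지역
```

In pandas, `Series.replace("<NA>", pd.NA)` performs the equivalent conversion on a whole column.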

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
na | 4620 | 46.1%
제2종일반주거지역 | 1281 | 12.8%
도시지역 | 1273 | 12.7%
일반주거지역 | 1095 | 10.9%
일반상업지역 | 382 | 3.8%
제3종일반주거지역 | 327 | 3.3%
제1종일반주거지역 | 248 | 2.5%
준주거지역 | 162 | 1.6%
자연녹지지역 | 156 | 1.6%
준공업지역 | 142 | 1.4%
Other values (42) | 333 | 3.3%

대표_여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

대표 (representative): 8014
<NA>: 1976
1: 6
0: 4

Length

Max length: 4
Median length: 2
Mean length: 2.3942
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대표
2nd row: 대표
3rd row: 대표
4th row: 대표
5th row: 대표

Common Values

Value | Count | Frequency (%)
대표 | 8014 | 80.1%
<NA> | 1976 | 19.8%
1 | 6 | 0.1%
0 | 4 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
대표 | 8014 | 80.1%
na | 1976 | 19.8%
1 | 6 | 0.1%
0 | 4 | < 0.1%

기타_지역지구구역
Text

MISSING

Distinct: 259
Distinct (%): 3.7%
Missing: 2984
Missing (%): 29.8%
Memory size: 156.2 KiB

Length

Max length: 59
Median length: 26
Mean length: 6.6506556
Min length: 2

Characters and Unicode

Total characters: 46661
Distinct characters: 177
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 1.6%

Sample

1st row: 제3종일반주거지역
2nd row: 주거지역
3rd row: 준주거지역
4th row: 공항시설보호지구
5th row: 일반주거지역

Value | Count | Frequency (%)
도시지역 | 1193 | 16.8%
일반주거지역 | 1184 | 16.7%
주차장정비지구 | 373 | 5.2%
제2종일반주거지역 | 333 | 4.7%
일반주거 | 322 | 4.5%
공항시설보호지구 | 240 | 3.4%
2종일반주거지역 | 223 | 3.1%
주차장정비 | 191 | 2.7%
일반상업지역 | 177 | 2.5%
준공업지역 | 150 | 2.1%
Other values (263) | 2724 | 38.3%

Most occurring characters

Value | Count | Frequency (%)
 | 6472 | 13.9%
 | 4749 | 10.2%
 | 3411 | 7.3%
 | 2822 | 6.0%
 | 2812 | 6.0%
 | 2808 | 6.0%
 | 2635 | 5.6%
 | 1887 | 4.0%
 | 1535 | 3.3%
 | 1319 | 2.8%
Other values (167) | 16211 | 34.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 44058 | 94.4%
Decimal Number | 1454 | 3.1%
Open Punctuation | 496 | 1.1%
Close Punctuation | 496 | 1.1%
Space Separator | 94 | 0.2%
Other Punctuation | 53 | 0.1%
Dash Punctuation | 4 | < 0.1%
Lowercase Letter | 4 | < 0.1%
Math Symbol | 2 | < 0.1%
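The category names above are Unicode general categories. Python's `unicodedata.category` returns them as two-letter codes (for example `Lo` for a Hangul syllable, `Nd` for a digit, `Pd` for the hyphen-minus). The long names in the sketch are written out by hand for the nine categories in this table. The sample string is taken from the Sample rows; the function name is illustrative.

```python
import unicodedata
from collections import Counter

# unicodedata returns two-letter codes; long names written out by hand for the
# categories that occur in 기타_지역지구구역.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ll": "Lowercase Letter",
    "Sm": "Math Symbol",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` per Unicode general category."""
    counts: Counter = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# A value from the Sample rows: 10 Hangul syllables plus "(", ",", ")".
print(category_counts("최고고도지구(수평,전이)"))
```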

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 6472 | 14.7%
 | 4749 | 10.8%
 | 3411 | 7.7%
 | 2822 | 6.4%
 | 2812 | 6.4%
 | 2808 | 6.4%
 | 2635 | 6.0%
 | 1887 | 4.3%
 | 1535 | 3.5%
 | 1319 | 3.0%
Other values (149) | 13608 | 30.9%

Decimal Number
Value | Count | Frequency (%)
2 | 692 | 47.6%
1 | 383 | 26.3%
3 | 224 | 15.4%
7 | 103 | 7.1%
4 | 37 | 2.5%
5 | 7 | 0.5%
6 | 7 | 0.5%
0 | 1 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
, | 46 | 86.8%
: | 3 | 5.7%
/ | 3 | 5.7%
. | 1 | 1.9%

Open Punctuation
Value | Count | Frequency (%)
( | 496 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 496 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 94 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 4 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 44058 | 94.4%
Common | 2599 | 5.6%
Latin | 4 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 6472 | 14.7%
 | 4749 | 10.8%
 | 3411 | 7.7%
 | 2822 | 6.4%
 | 2812 | 6.4%
 | 2808 | 6.4%
 | 2635 | 6.0%
 | 1887 | 4.3%
 | 1535 | 3.5%
 | 1319 | 3.0%
Other values (149) | 13608 | 30.9%

Common
Value | Count | Frequency (%)
2 | 692 | 26.6%
( | 496 | 19.1%
) | 496 | 19.1%
1 | 383 | 14.7%
3 | 224 | 8.6%
7 | 103 | 4.0%
(space) | 94 | 3.6%
, | 46 | 1.8%
4 | 37 | 1.4%
5 | 7 | 0.3%
Other values (7) | 21 | 0.8%

Latin
Value | Count | Frequency (%)
m | 4 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 44057 | 94.4%
ASCII | 2603 | 5.6%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 6472 | 14.7%
 | 4749 | 10.8%
 | 3411 | 7.7%
 | 2822 | 6.4%
 | 2812 | 6.4%
 | 2808 | 6.4%
 | 2635 | 6.0%
 | 1887 | 4.3%
 | 1535 | 3.5%
 | 1319 | 3.0%
Other values (148) | 13607 | 30.9%

ASCII
Value | Count | Frequency (%)
2 | 692 | 26.6%
( | 496 | 19.1%
) | 496 | 19.1%
1 | 383 | 14.7%
3 | 224 | 8.6%
7 | 103 | 4.0%
(space) | 94 | 3.6%
, | 46 | 1.8%
4 | 37 | 1.4%
5 | 7 | 0.3%
Other values (8) | 25 | 1.0%

Compat Jamo
Value | Count | Frequency (%)
 | 1 | 100.0%

작업_일자
Real number (ℝ)

Distinct: 178
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20220611
Minimum: 20201201
Maximum: 20240510
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20201201
5th percentile: 20210126
Q1: 20211029
Median: 20220426
Q3: 20230831
95th percentile: 20231124
Maximum: 20240510
Range: 39309
Interquartile range (IQR): 19802

Descriptive statistics

Standard deviation: 9644.9286
Coefficient of variation (CV): 0.00047698503
Kurtosis: -1.0386875
Mean: 20220611
Median Absolute Deviation (MAD): 9397
Skewness: 0.1861971
Sum: 2.0220611 × 10^11
Variance: 93024647
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
20211029 | 2903 | 29.0%
20231104 | 872 | 8.7%
20231124 | 574 | 5.7%
20231110 | 287 | 2.9%
20231028 | 222 | 2.2%
20221223 | 149 | 1.5%
20220107 | 143 | 1.4%
20220301 | 135 | 1.4%
20230321 | 128 | 1.3%
20220304 | 113 | 1.1%
Other values (168) | 4474 | 44.7%

Minimum 10 values
Value | Count | Frequency (%)
20201201 | 37 | 0.4%
20201204 | 26 | 0.3%
20201208 | 27 | 0.3%
20201216 | 55 | 0.5%
20201230 | 107 | 1.1%
20210106 | 49 | 0.5%
20210108 | 54 | 0.5%
20210119 | 76 | 0.8%
20210126 | 71 | 0.7%
20210130 | 39 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
20240510 | 13 | 0.1%
20240507 | 12 | 0.1%
20240425 | 11 | 0.1%
20240420 | 17 | 0.2%
20240417 | 4 | < 0.1%
20240411 | 9 | 0.1%
20240406 | 3 | < 0.1%
20240402 | 5 | 0.1%
20240330 | 44 | 0.4%
20240327 | 55 | 0.5%
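작업_일자 is profiled as a real number, but its values read as YYYYMMDD dates (20201201 to 20240510). Statistics computed on the raw integers are therefore not in days. The step from 20201231 to 20210101 counts as 8870 even though the two dates are one day apart, so the reported Range (39309) and IQR (19802) are not durations. The stdlib sketch below assumes every value is a valid YYYYMMDD date and converts the reported endpoints and quartiles to calendar dates; `parse_yyyymmdd` is an illustrative name.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20211029 as the calendar date 2021-10-29."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first, last = parse_yyyymmdd(20201201), parse_yyyymmdd(20240510)
q1, q3 = parse_yyyymmdd(20211029), parse_yyyymmdd(20230831)

print(20210101 - 20201231)       # 8870, although the dates are one day apart
print((last - first).days)       # 1256 days between minimum and maximum
print((q3 - q1).days)            # 671 days between Q1 and Q3
print(parse_yyyymmdd(20220426))  # 2022-04-26 (the reported median)
```

For a whole pandas column, `pd.to_datetime(series.astype(str), format="%Y%m%d")` performs the same conversion.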

Interactions

[Figure: variable interactions]

Correlations

 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 작업_일자
지역지구구역_구분_코드 | 1.000 | 1.000 | 0.943 | 0.108
지역지구구역_코드 | 1.000 | 1.000 | 1.000 | 0.304
대표_여부 | 0.943 | 1.000 | 1.000 | 0.049
작업_일자 | 0.108 | 0.304 | 0.049 | 1.000

 | 지역지구구역_코드 | 대표_여부 | 지역지구구역_구분_코드
지역지구구역_코드 | 1.000 | 0.996 | 0.987
대표_여부 | 0.996 | 1.000 | 0.712
지역지구구역_구분_코드 | 0.987 | 0.712 | 1.000

 | 작업_일자 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부
작업_일자 | 1.000 | 0.040 | 0.127 | 0.020
지역지구구역_구분_코드 | 0.040 | 1.000 | 0.987 | 0.712
지역지구구역_코드 | 0.127 | 0.987 | 1.000 | 0.996
대표_여부 | 0.020 | 0.712 | 0.996 | 1.000
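The report does not label which coefficient each matrix uses. For a pair of categorical columns, a standard association measure is Cramér's V, which ranges from 0 (no association) to 1 (perfect association). Values near 1 between 지역지구구역_코드 and 지역지구구역_구분_코드 are what one would expect if every zone code belongs to a single classification (for example, 제2종일반주거지역 under 용도지역코드), though that is an interpretation rather than something the report states. A stdlib sketch without bias correction, working from a contingency table of counts; the function name is illustrative.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V of an r x c contingency table of counts (no bias correction)."""
    n = sum(map(sum, table))
    k = min(len(table), len(table[0]))
    if n == 0 or k < 2:
        raise ValueError("need a non-empty table with at least 2 rows and 2 columns")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty rows/columns contribute nothing
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


print(cramers_v([[10, 0], [0, 10]]))    # 1.0: perfectly associated
print(cramers_v([[10, 10], [10, 10]]))  # 0.0: independent
```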

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

 | 관리_지역지구구역_pk | 관리_폐쇄말소대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
19442 | 11230-4925 | 11230-2613 | 용도지역코드 | <NA> | 대표 | <NA> | 20211029
38622 | 11530-100101164 | 11530-101111854 | 용도지역코드 | 제3종일반주거지역 | 대표 | 제3종일반주거지역 | 20211029
54964 | 11710-100004205 | 11710-100007081 | 용도지역코드 | <NA> | 대표 | 주거지역 | 20201204
39143 | 11545-100024279 | 11545-100024393 | 용도지역코드 | 준주거지역 | 대표 | 준주거지역 | 20220315
18792 | 11230-100165161 | 11230-100482561 | 용도지구코드 | <NA> | 대표 | <NA> | 20220614
31470 | 11500-100066711 | 11500-100054411 | 용도지구코드 | <NA> | <NA> | 공항시설보호지구 | 20211029
41744 | 11560-100137096 | 11560-100835359 | 용도지역코드 | 일반주거지역 | 대표 | 일반주거지역 | 20211029
4838 | 11170-100064113 | 11170-100123462 | 용도지역코드 | 일반상업지역 | 대표 | 일반상업지역 | 20231028
40019 | 11545-100114829 | 11545-100509335 | 용도지구코드 | <NA> | 대표 | 고도지구기타(공항고도지구) | 20211029
7721 | 11200-100101485 | 11200-100380949 | 용도지역코드 | 제2종일반주거지역 | 대표 | 일반주거 | 20220510

 | 관리_지역지구구역_pk | 관리_폐쇄말소대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
11231 | 11230-100087440 | 11230-100225444 | 용도지역코드 | 도시지역 | <NA> | 도시지역 | 20231104
31033 | 11500-1000000000000001759824 | 11500-1000000000000002642503 | 용도지구코드 | <NA> | 대표 | <NA> | 20231124
16556 | 11230-100158196 | 11230-100461489 | 용도지역코드 | 제2종일반주거지역 | 대표 | <NA> | 20231104
22520 | 11290-18105 | 11290-8603 | 용도지구코드 | <NA> | 대표 | <NA> | 20220107
7503 | 11200-100097590 | 11200-100365884 | 용도지역코드 | 준주거지역 | 대표 | 준주거 | 20211029
31680 | 11500-100073898 | 11500-100060174 | 용도지구코드 | <NA> | 대표 | 최고고도지구(수평,전이) | 20231104
45456 | 11620-100174806 | 11620-100322055 | 용도지역코드 | 일반상업지역 | 대표 | 일반상업지역 | 20211029
28095 | 11440-1000000000000000078967 | 11440-1000000000000000129920 | 용도지구코드 | <NA> | 대표 | 주차장정비지구 | 20220929
36843 | 11500-46859 | 11500-1000000000000002658703 | 용도지역코드 | 도시지역 | <NA> | 도시지역 | 20231124
42798 | 11590-1000000000000001032231 | 11590-21945 | 용도지구코드 | <NA> | 대표 | 주차장정비지구 | 20230331