Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 52
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 742.2 KiB
Average record size in memory: 76.0 B
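
The percentage and per-record figures above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming the report divides missing cells by variables × observations and total memory by the number of observations (the function names are illustrative and not part of the profiling tool):

```python
def missing_cell_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def avg_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Average bytes per row, given the total size in KiB."""
    return total_kib * 1024 / n_obs


# Figures taken from the dataset statistics above.
pct = missing_cell_pct(52, 8, 10000)
record = avg_record_size_bytes(742.2, 10000)
print(f"{pct:.3f}% missing cells, {record:.1f} B per record")
```

Under these assumptions the missing share is 0.065%, which the report shows rounded to 0.1%.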

Variable types

Text: 4
Categorical: 3
Numeric: 1

Dataset

Description: 관리_지역지구구역_pk, 관리_주택대장_pk, 지역지구구역_구분_코드, 지역지구구역_코드, 대표_여부, 동_구분_코드, 지역지구구역_명, 작업_일자
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15676/S/1/datasetView.do

Alerts

지역지구구역_구분_코드 is highly overall correlated with 동_구분_코드 (High correlation)
대표_여부 is highly overall correlated with 동_구분_코드 (High correlation)
동_구분_코드 is highly overall correlated with 작업_일자 and 2 other fields (High correlation)
작업_일자 is highly overall correlated with 동_구분_코드 (High correlation)
관리_지역지구구역_pk has unique values (Unique)

Reproduction

Analysis started: 2024-05-04 01:56:05.506756
Analysis finished: 2024-05-04 01:56:07.504561
Duration: 2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
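
The exact settings live in config.json and are not reproduced here. As a rough sketch of how a report like this one could be regenerated with ydata-profiling, assuming the dataset has been exported to a local CSV file (the path, the title, and the function name `build_report` are hypothetical):

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV export and write an HTML report (sketch only)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # csv_path is a hypothetical local export
    report = ProfileReport(
        df,
        title="Profiling report",  # placeholder title, not from the original run
        dataset={
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15676/S/1/datasetView.do",
        },
    )
    report.to_file(out_path)
```

Calling `build_report("export.csv")` would write `report.html` next to the script. Whether the original run sampled the data or changed any defaults cannot be told from this page.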

Variables

관리_지역지구구역_pk
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 9
Mean length: 11.2395
Min length: 7

Characters and Unicode

Total characters: 112395
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11500-444
2nd row: 11440-465
3rd row: 11740-10
4th row: 11530-745
5th row: 11530-1461
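
Every sampled key has the shape `<digits>-<digits>`, and the character counts (10000 dashes across 10000 values) are consistent with one dash per key. A small sketch of splitting such a key under that assumption; the names `prefix` and `serial` are illustrative, and what the two parts mean is not stated in this report:

```python
from typing import NamedTuple


class SplitKey(NamedTuple):
    prefix: str  # digits before the dash
    serial: str  # digits after the dash


def split_key(value: str) -> SplitKey:
    """Split 'prefix-serial' at its dash; raise if the shape differs."""
    prefix, sep, serial = value.partition("-")
    if not sep or not prefix.isdigit() or not serial.isdigit():
        raise ValueError(f"unexpected key format: {value!r}")
    return SplitKey(prefix, serial)


print(split_key("11500-444"))
```

Keeping both parts as strings avoids overflow or loss of leading zeros on long serials such as the 28-character key listed in this report.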
Value | Count | Frequency (%)
11500-444 | 1 | < 0.1%
11320-107 | 1 | < 0.1%
11740-100003345 | 1 | < 0.1%
11140-100001421 | 1 | < 0.1%
11530-1192 | 1 | < 0.1%
11590-33 | 1 | < 0.1%
11170-226 | 1 | < 0.1%
11500-792 | 1 | < 0.1%
11000-100001361 | 1 | < 0.1%
11110-1000000000000000107301 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 31305 | 27.9%
1 | 29544 | 26.3%
- | 10000 | 8.9%
2 | 7534 | 6.7%
5 | 7471 | 6.6%
3 | 6590 | 5.9%
4 | 5492 | 4.9%
6 | 4787 | 4.3%
7 | 3616 | 3.2%
8 | 3165 | 2.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 102395 | 91.1%
Dash Punctuation | 10000 | 8.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 31305 | 30.6%
1 | 29544 | 28.9%
2 | 7534 | 7.4%
5 | 7471 | 7.3%
3 | 6590 | 6.4%
4 | 5492 | 5.4%
6 | 4787 | 4.7%
7 | 3616 | 3.5%
8 | 3165 | 3.1%
9 | 2891 | 2.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 112395 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 31305 | 27.9%
1 | 29544 | 26.3%
- | 10000 | 8.9%
2 | 7534 | 6.7%
5 | 7471 | 6.6%
3 | 6590 | 5.9%
4 | 5492 | 4.9%
6 | 4787 | 4.3%
7 | 3616 | 3.2%
8 | 3165 | 2.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 112395 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 31305 | 27.9%
1 | 29544 | 26.3%
- | 10000 | 8.9%
2 | 7534 | 6.7%
5 | 7471 | 6.6%
3 | 6590 | 5.9%
4 | 5492 | 4.9%
6 | 4787 | 4.3%
7 | 3616 | 3.2%
8 | 3165 | 2.8%

관리_주택대장_pk
Text

Distinct: 2953
Distinct (%): 29.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 10.4708
Min length: 7

Characters and Unicode

Total characters: 104708
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1372
Unique (%): 13.7%
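
Distinct and Unique measure different things: Distinct counts the different values present, while Unique counts the values that occur exactly once. A short sketch with made-up keys (not taken from the dataset) that illustrates the difference:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical sample: "a-1" repeats, the other two values appear once each.
sample = ["a-1", "a-1", "a-1", "b-2", "c-3"]
print(distinct_and_unique(sample))
```

For a foreign-key-like column such as this one, a Unique count well below the Distinct count simply means that many referenced records are shared by several rows.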

Sample

1st row: 11500-18
2nd row: 11440-89
3rd row: 11740-5
4th row: 11530-17
5th row: 11530-38

Value | Count | Frequency (%)
11530-17 | 692 | 6.9%
11305-8 | 456 | 4.6%
11500-18 | 120 | 1.2%
11620-11 | 103 | 1.0%
11545-2 | 95 | 0.9%
11710-27 | 79 | 0.8%
11560-33 | 79 | 0.8%
11200-38 | 69 | 0.7%
11620-1 | 68 | 0.7%
11620-15 | 61 | 0.6%
Other values (2943) | 8178 | 81.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 29275 | 28.0%
0 | 26842 | 25.6%
- | 10000 | 9.6%
5 | 7256 | 6.9%
2 | 6063 | 5.8%
3 | 6046 | 5.8%
4 | 5005 | 4.8%
6 | 4531 | 4.3%
7 | 4014 | 3.8%
8 | 3262 | 3.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 94708 | 90.4%
Dash Punctuation | 10000 | 9.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 29275 | 30.9%
0 | 26842 | 28.3%
5 | 7256 | 7.7%
2 | 6063 | 6.4%
3 | 6046 | 6.4%
4 | 5005 | 5.3%
6 | 4531 | 4.8%
7 | 4014 | 4.2%
8 | 3262 | 3.4%
9 | 2414 | 2.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 104708 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 29275 | 28.0%
0 | 26842 | 25.6%
- | 10000 | 9.6%
5 | 7256 | 6.9%
2 | 6063 | 5.8%
3 | 6046 | 5.8%
4 | 5005 | 4.8%
6 | 4531 | 4.3%
7 | 4014 | 3.8%
8 | 3262 | 3.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 104708 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 29275 | 28.0%
0 | 26842 | 25.6%
- | 10000 | 9.6%
5 | 7256 | 6.9%
2 | 6063 | 5.8%
3 | 6046 | 5.8%
4 | 5005 | 4.8%
6 | 4531 | 4.3%
7 | 4014 | 3.8%
8 | 3262 | 3.1%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 6587
2 | 2451
3 | 962

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value | Count | Frequency (%)
1 | 6587 | 65.9%
2 | 2451 | 24.5%
3 | 962 | 9.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 6587 | 65.9%
2 | 2451 | 24.5%
3 | 962 | 9.6%

지역지구구역_코드
Text

Distinct: 148
Distinct (%): 1.5%
Missing: 26
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.2715059
Min length: 3

Characters and Unicode

Total characters: 42604
Distinct characters: 35
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 44
Unique (%): 0.4%

Sample

1st row: 980
2nd row: 1023
3rd row: 1020
4th row: 1020
5th row: 260

Value | Count | Frequency (%)
1020 | 3546 | 35.6%
uqa122 | 495 | 5.0%
112 | 482 | 4.8%
1022 | 414 | 4.2%
uqa123 | 373 | 3.7%
103 | 320 | 3.2%
uqa001 | 309 | 3.1%
1230 | 301 | 3.0%
1023 | 284 | 2.8%
0100 | 277 | 2.8%
Other values (138) | 3173 | 31.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 13574 | 31.9%
1 | 9563 | 22.4%
2 | 8262 | 19.4%
U | 2525 | 5.9%
Q | 2262 | 5.3%
3 | 2162 | 5.1%
A | 1725 | 4.0%
8 | 414 | 1.0%
6 | 357 | 0.8%
9 | 301 | 0.7%
Other values (25) | 1459 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 34922 | 82.0%
Uppercase Letter | 7682 | 18.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
U | 2525 | 32.9%
Q | 2262 | 29.4%
A | 1725 | 22.5%
G | 194 | 2.5%
D | 167 | 2.2%
Z | 120 | 1.6%
O | 110 | 1.4%
N | 98 | 1.3%
M | 86 | 1.1%
H | 77 | 1.0%
Other values (15) | 318 | 4.1%

Decimal Number
Value | Count | Frequency (%)
0 | 13574 | 38.9%
1 | 9563 | 27.4%
2 | 8262 | 23.7%
3 | 2162 | 6.2%
8 | 414 | 1.2%
6 | 357 | 1.0%
9 | 301 | 0.9%
5 | 187 | 0.5%
4 | 86 | 0.2%
7 | 16 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 34922 | 82.0%
Latin | 7682 | 18.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
U | 2525 | 32.9%
Q | 2262 | 29.4%
A | 1725 | 22.5%
G | 194 | 2.5%
D | 167 | 2.2%
Z | 120 | 1.6%
O | 110 | 1.4%
N | 98 | 1.3%
M | 86 | 1.1%
H | 77 | 1.0%
Other values (15) | 318 | 4.1%

Common
Value | Count | Frequency (%)
0 | 13574 | 38.9%
1 | 9563 | 27.4%
2 | 8262 | 23.7%
3 | 2162 | 6.2%
8 | 414 | 1.2%
6 | 357 | 1.0%
9 | 301 | 0.9%
5 | 187 | 0.5%
4 | 86 | 0.2%
7 | 16 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 42604 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 13574 | 31.9%
1 | 9563 | 22.4%
2 | 8262 | 19.4%
U | 2525 | 5.9%
Q | 2262 | 5.3%
3 | 2162 | 5.1%
A | 1725 | 4.0%
8 | 414 | 1.0%
6 | 357 | 0.8%
9 | 301 | 0.7%
Other values (25) | 1459 | 3.4%

대표_여부
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 7345
0 | 2649
<NA> | 6

Length

Max length: 4
Median length: 1
Mean length: 1.0018
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 0
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 7345 | 73.5%
0 | 2649 | 26.5%
<NA> | 6 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 7345 | 73.5%
0 | 2649 | 26.5%
na | 6 | 0.1%

동_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 5930
1 | 4070

Length

Max length: 4
Median length: 4
Mean length: 2.779
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: <NA>
4th row: <NA>
5th row: 1

Common Values

Value | Count | Frequency (%)
<NA> | 5930 | 59.3%
1 | 4070 | 40.7%
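
The length statistics suggest that `<NA>` is stored here as a literal four-character string rather than as a missing value, which would explain why Missing is 0 while `<NA>` is the most common value. A sketch that recomputes the mean length from the counts above under that assumption:

```python
def mean_length(value_counts: dict) -> float:
    """Mean string length, weighting each value by its count."""
    total = sum(value_counts.values())
    return sum(len(value) * n for value, n in value_counts.items()) / total


# Counts from the Common Values table; "<NA>" is treated as a plain string.
counts = {"<NA>": 5930, "1": 4070}
print(mean_length(counts))
```

The result agrees with the reported mean length of 2.779. The same calculation for 대표_여부 (7345, 2649, and 6 occurrences of `<NA>`) gives 1.0018, again matching the report.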

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 5930 | 59.3%
1 | 4070 | 40.7%

지역지구구역_명
Text

Distinct: 128
Distinct (%): 1.3%
Missing: 26
Missing (%): 0.3%
Memory size: 156.2 KiB

Length

Max length: 18
Median length: 6
Mean length: 6.6089834
Min length: 4

Characters and Unicode

Total characters: 65918
Distinct characters: 144
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 43
Unique (%): 0.4%

Sample

1st row: 용도구역미지정
2nd row: 제3종일반주거지역
3rd row: 일반주거지역
4th row: 일반주거지역
5th row: 주차장정비지구

Value | Count | Frequency (%)
일반주거지역 | 3556 | 35.3%
제2종일반주거지역 | 909 | 9.0%
제3종일반주거지역 | 657 | 6.5%
도시지역 | 604 | 6.0%
최고고도지구 | 526 | 5.2%
일반미관지구 | 404 | 4.0%
준공업지역 | 354 | 3.5%
주차장정비지구 | 235 | 2.3%
공항지구 | 218 | 2.2%
재개발구역 | 207 | 2.1%
Other values (128) | 2399 | 23.8%

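Most occurring characters

Value | Count | Frequency (%)
(not preserved) | 9571 | 14.5%
(not preserved) | 7739 | 11.7%
(not preserved) | 5741 | 8.7%
(not preserved) | 5741 | 8.7%
(not preserved) | 5632 | 8.5%
(not preserved) | 5374 | 8.2%
(not preserved) | 3708 | 5.6%
(not preserved) | 2001 | 3.0%
(not preserved) | 1856 | 2.8%
(not preserved) | 1316 | 2.0%
Other values (134) | 17239 | 26.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 63914 | 97.0%
Decimal Number | 1863 | 2.8%
Space Separator | 95 | 0.1%
Close Punctuation | 23 | < 0.1%
Open Punctuation | 23 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 9571 | 15.0%
(not preserved) | 7739 | 12.1%
(not preserved) | 5741 | 9.0%
(not preserved) | 5741 | 9.0%
(not preserved) | 5632 | 8.8%
(not preserved) | 5374 | 8.4%
(not preserved) | 3708 | 5.8%
(not preserved) | 2001 | 3.1%
(not preserved) | 1856 | 2.9%
(not preserved) | 1316 | 2.1%
Other values (125) | 15235 | 23.8%

Decimal Number
Value | Count | Frequency (%)
2 | 958 | 51.4%
3 | 661 | 35.5%
1 | 227 | 12.2%
4 | 10 | 0.5%
5 | 6 | 0.3%
6 | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 95 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 23 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 23 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 63914 | 97.0%
Common | 2004 | 3.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 9571 | 15.0%
(not preserved) | 7739 | 12.1%
(not preserved) | 5741 | 9.0%
(not preserved) | 5741 | 9.0%
(not preserved) | 5632 | 8.8%
(not preserved) | 5374 | 8.4%
(not preserved) | 3708 | 5.8%
(not preserved) | 2001 | 3.1%
(not preserved) | 1856 | 2.9%
(not preserved) | 1316 | 2.1%
Other values (125) | 15235 | 23.8%

Common
Value | Count | Frequency (%)
2 | 958 | 47.8%
3 | 661 | 33.0%
1 | 227 | 11.3%
(space) | 95 | 4.7%
) | 23 | 1.1%
( | 23 | 1.1%
4 | 10 | 0.5%
5 | 6 | 0.3%
6 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 63914 | 97.0%
ASCII | 2004 | 3.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 9571 | 15.0%
(not preserved) | 7739 | 12.1%
(not preserved) | 5741 | 9.0%
(not preserved) | 5741 | 9.0%
(not preserved) | 5632 | 8.8%
(not preserved) | 5374 | 8.4%
(not preserved) | 3708 | 5.8%
(not preserved) | 2001 | 3.1%
(not preserved) | 1856 | 2.9%
(not preserved) | 1316 | 2.1%
Other values (125) | 15235 | 23.8%

ASCII
Value | Count | Frequency (%)
2 | 958 | 47.8%
3 | 661 | 33.0%
1 | 227 | 11.3%
(space) | 95 | 4.7%
) | 23 | 1.1%
( | 23 | 1.1%
4 | 10 | 0.5%
5 | 6 | 0.3%
6 | 1 | < 0.1%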

작업_일자
Real number (ℝ)

HIGH CORRELATION 

Distinct: 485
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20141060
Minimum: 20111227
Maximum: 20240503
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20111227
5-th percentile: 20111227
Q1: 20111227
Median: 20111227
Q3: 20180927
95-th percentile: 20240208
Maximum: 20240503
Range: 129276
Interquartile range (IQR): 69700

Descriptive statistics

Standard deviation: 46221.045
Coefficient of variation (CV): 0.0022948666
Kurtosis: -0.40895471
Mean: 20141060
Median Absolute Deviation (MAD): 0
Skewness: 1.1389643
Sum: 2.014106 × 10^11
Variance: 2.136385 × 10^9
Monotonicity: Not monotonic
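
The values look like calendar dates encoded as YYYYMMDD integers, so statistics such as the mean (20141060, which has no valid day) and the range describe the integer encoding rather than elapsed time. A sketch of decoding them with the standard library, assuming that encoding holds for every row:

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Decode an integer such as 20111227 into a date; raise ValueError if invalid."""
    return datetime.strptime(str(value), "%Y%m%d").date()


earliest = parse_yyyymmdd(20111227)  # reported minimum
latest = parse_yyyymmdd(20240503)    # reported maximum
print((latest - earliest).days, "days between minimum and maximum")
```

Profiling the decoded dates instead of the raw integers would give quantiles and a range that can be read directly as time spans.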
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
20111227 | 6181 | 61.8%
20120207 | 481 | 4.8%
20191203 | 337 | 3.4%
20240503 | 330 | 3.3%
20180927 | 300 | 3.0%
20120208 | 136 | 1.4%
20211029 | 123 | 1.2%
20240208 | 99 | 1.0%
20240102 | 64 | 0.6%
20120605 | 31 | 0.3%
Other values (475) | 1918 | 19.2%

Minimum 10 values

Value | Count | Frequency (%)
20111227 | 6181 | 61.8%
20120112 | 1 | < 0.1%
20120113 | 1 | < 0.1%
20120207 | 481 | 4.8%
20120208 | 136 | 1.4%
20120222 | 15 | 0.1%
20120223 | 4 | < 0.1%
20120229 | 2 | < 0.1%
20120306 | 1 | < 0.1%
20120309 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20240503 | 330 | 3.3%
20240425 | 3 | < 0.1%
20240420 | 20 | 0.2%
20240417 | 1 | < 0.1%
20240416 | 4 | < 0.1%
20240411 | 3 | < 0.1%
20240406 | 4 | < 0.1%
20240402 | 6 | 0.1%
20240330 | 6 | 0.1%
20240327 | 19 | 0.2%

Interactions


Correlations

Variable | 지역지구구역_구분_코드 | 대표_여부 | 작업_일자
지역지구구역_구분_코드 | 1.000 | 0.069 | 0.212
대표_여부 | 0.069 | 1.000 | 0.118
작업_일자 | 0.212 | 0.118 | 1.000

Variable | 지역지구구역_구분_코드 | 대표_여부 | 동_구분_코드
지역지구구역_구분_코드 | 1.000 | 0.115 | 1.000
대표_여부 | 0.115 | 1.000 | 1.000
동_구분_코드 | 1.000 | 1.000 | 1.000

Variable | 작업_일자 | 지역지구구역_구분_코드 | 대표_여부 | 동_구분_코드
작업_일자 | 1.000 | 0.137 | 0.048 | 1.000
지역지구구역_구분_코드 | 0.137 | 1.000 | 0.115 | 1.000
대표_여부 | 0.048 | 0.115 | 1.000 | 1.000
동_구분_코드 | 1.000 | 1.000 | 1.000 | 1.000
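
For pairs of categorical variables, an association score between 0 and 1 is commonly computed with Cramér's V. The sketch below is the plain, uncorrected form of that statistic written from its textbook definition. It is not taken from the profiling tool, which may apply a bias correction or a different measure, and the toy data is made up:

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys) -> float:
    """Uncorrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    k = min(len(row), len(col)) - 1
    if n == 0 or k == 0:
        # Undefined when a variable has a single level; 0.0 is this sketch's convention.
        return 0.0
    chi2 = 0.0
    for r, r_n in row.items():
        for c, c_n in col.items():
            expected = r_n * c_n / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Toy labels: the first pair moves together, the second pair does not.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

A score of 1.000 against every other variable, as reported for 동_구분_코드, is worth checking by hand before it is treated as a real relationship, since that column holds only the value 1 and the literal `<NA>`.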

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 관리_지역지구구역_pk | 관리_주택대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 동_구분_코드 | 지역지구구역_명 | 작업_일자
8952 | 11500-444 | 11500-18 | 3 | 980 | 1 | 1 | 용도구역미지정 | 20111227
7131 | 11440-465 | 11440-89 | 1 | 1023 | 0 | 1 | 제3종일반주거지역 | 20111227
16497 | 11740-10 | 11740-5 | 1 | 1020 | 1 | <NA> | 일반주거지역 | 20120207
11298 | 11530-745 | 11530-17 | 1 | 1020 | 0 | <NA> | 일반주거지역 | 20111227
10345 | 11530-1461 | 11530-38 | 2 | 260 | 1 | 1 | 주차장정비지구 | 20111227
1438 | 11200-124 | 11200-7 | 1 | 1020 | 1 | 1 | 일반주거지역 | 20111227
10573 | 11530-1667 | 11530-61 | 2 | 990 | 1 | 1 | 기타지구 | 20111227
6427 | 11410-100002373 | 11410-100010284 | 2 | UQG130 | 1 | <NA> | 일반미관지구 | 20190112
3591 | 11290-354 | 11290-33 | 3 | 280 | 1 | 1 | 재개발구역 | 20111227
4911 | 11305-853 | 11305-10 | 1 | 1020 | 1 | 1 | 일반주거지역 | 20111227

Index | 관리_지역지구구역_pk | 관리_주택대장_pk | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 동_구분_코드 | 지역지구구역_명 | 작업_일자
3666 | 11290-421 | 11290-36 | 1 | 1020 | 1 | 1 | 일반주거지역 | 20111227
5298 | 11320-31 | 11320-24 | 3 | 050 | 1 | <NA> | 상세계획구역 | 20111227
12989 | 11590-330 | 11590-27 | 1 | 1020 | 1 | 1 | 일반주거지역 | 20111227
11631 | 11545-100001643 | 11545-100004362 | 1 | UQA001 | 0 | <NA> | 도시지역 | 20230929
7983 | 11470-86 | 11470-47 | 1 | 1020 | 1 | <NA> | 일반주거지역 | 20191203
16474 | 11710-8 | 11710-5 | 1 | 1020 | 1 | <NA> | 일반주거지역 | 20111227
1020 | 11170-100001622 | 11170-100005441 | 1 | 1022 | 1 | <NA> | 제2종일반주거지역 | 20161001
5605 | 11350-128 | 11350-308 | 2 | 160 | 1 | 1 | 택지개발지구 | 20111227
493 | 11000-100004342 | 11000-100006546 | 1 | UQA130 | 1 | <NA> | 준주거지역 | 20240503
5490 | 11350-1 | 11350-10 | 1 | 1020 | 1 | <NA> | 일반주거지역 | 20111227