Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 8
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 742.2 KiB
Average record size in memory: 76.0 B
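
The derived figures in this table can be cross-checked from the raw counts. The sketch below is only an illustration of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are made up for the example.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 8
n_observations = 10_000
missing_cells = 8
total_size_kib = 742.2

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.2f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share is well below 0.1% and the average record size rounds to 76.0 B, which agrees with the table.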

Variable types

Text: 4
Categorical: 4

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15667/S/1/datasetView.do

Alerts

주_동_구분_코드 has constant value "0" (Constant)
작업_일자 has constant value "20111227" (Constant)
대표_여부 is highly imbalanced (68.8%) (Imbalance)
관리_지역지구구역 has unique values (Unique)
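
The 68.8% imbalance figure can be reproduced from the value counts reported for 대표_여부 if the score is defined as one minus the normalized Shannon entropy. That definition is an assumption made for this sketch (the report itself does not state the formula), and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes. 0 means perfectly balanced,
    values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # A single class has no entropy to normalize; treat it as not scored.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Counts for 대표_여부 as listed in its "Common Values" table.
counts = {"1": 8931, "0": 1064, "<NA>": 5}
score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

With these three counts the score comes out at about 0.688, matching the alert.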

Reproduction

Analysis started: 2024-04-20 21:19:03.147797
Analysis finished: 2024-04-20 21:19:05.041256
Duration: 1.89 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
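
As a quick consistency check, the duration can be recomputed from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Assumed format of the timestamps as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-04-20 21:19:03.147797", FMT)
finished = datetime.strptime("2024-04-20 21:19:05.041256", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```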

Variables

관리_지역지구구역
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 10
Mean length: 10.5604
Min length: 7
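
Length statistics of this kind are computed over the character length of each value. The sketch below applies the same four summaries to the five sample values listed for this variable; it only illustrates the calculation and does not reproduce the full-column figures.

```python
from statistics import mean, median

# The five sample values shown for this variable in the report.
values = ["11380-5952", "11380-14346", "11305-2360", "11320-2938", "11410-5113"]
lengths = [len(v) for v in values]

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)
```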

Characters and Unicode

Total characters: 105604
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
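
A minimal sketch of the category part of this analysis, using Python's standard `unicodedata` module: it tallies the Unicode general category of every character in a list of strings. The two-letter codes map to the names used in this report (`Nd` is Decimal Number, `Pd` is Dash Punctuation, `Lo` is Other Letter). Scripts and blocks are not exposed by the standard library, so they are left out; `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters of all values."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)

# Sample identifiers from this report: digits plus one hyphen each.
id_counts = category_counts(["11380-5952", "11380-14346", "11305-2360"])
print(id_counts)

# A sample zone name from this report: Hangul syllables plus one digit.
name_counts = category_counts(["제1종일반주거지역"])
print(name_counts)
```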

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11380-5952
2nd row: 11380-14346
3rd row: 11305-2360
4th row: 11320-2938
5th row: 11410-5113

Value | Count | Frequency (%)
11380-5952 | 1 | < 0.1%
11380-14856 | 1 | < 0.1%
11380-7668 | 1 | < 0.1%
11305-100000888 | 1 | < 0.1%
11305-6174 | 1 | < 0.1%
11410-2917 | 1 | < 0.1%
11410-128 | 1 | < 0.1%
11380-13225 | 1 | < 0.1%
11440-1803 | 1 | < 0.1%
11350-2264 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 28797 | 27.3%
0 | 16832 | 15.9%
3 | 12721 | 12.0%
- | 10000 | 9.5%
8 | 7147 | 6.8%
2 | 6764 | 6.4%
5 | 6438 | 6.1%
4 | 6269 | 5.9%
6 | 4010 | 3.8%
9 | 3387 | 3.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 95604 | 90.5%
Dash Punctuation | 10000 | 9.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 28797 | 30.1%
0 | 16832 | 17.6%
3 | 12721 | 13.3%
8 | 7147 | 7.5%
2 | 6764 | 7.1%
5 | 6438 | 6.7%
4 | 6269 | 6.6%
6 | 4010 | 4.2%
9 | 3387 | 3.5%
7 | 3239 | 3.4%
Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 105604 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 28797 | 27.3%
0 | 16832 | 15.9%
3 | 12721 | 12.0%
- | 10000 | 9.5%
8 | 7147 | 6.8%
2 | 6764 | 6.4%
5 | 6438 | 6.1%
4 | 6269 | 5.9%
6 | 4010 | 3.8%
9 | 3387 | 3.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 105604 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 28797 | 27.3%
0 | 16832 | 15.9%
3 | 12721 | 12.0%
- | 10000 | 9.5%
8 | 7147 | 6.8%
2 | 6764 | 6.4%
5 | 6438 | 6.1%
4 | 6269 | 5.9%
6 | 4010 | 3.8%
9 | 3387 | 3.2%

관리_허가대장
Text

Distinct: 8717
Distinct (%): 87.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 10
Mean length: 10.2658
Min length: 7

Characters and Unicode

Total characters: 102658
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7537
Unique (%): 75.4%
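
For this variable the distinct count (8717) and the unique count (7537) differ, which fits the reading that "distinct" counts different values while "unique" counts values occurring exactly once. The sketch below illustrates that reading on toy data; the data and the helper name `distinct_and_unique` are hypothetical, and the interpretation is an assumption consistent with the figures in this report.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy data, not taken from the dataset.
values = ["a-1", "a-1", "a-2", "a-3", "a-3", "a-3", "a-4"]
distinct, unique = distinct_and_unique(values)
print(distinct, unique)
```

Here four different values appear, but only `a-2` and `a-4` occur once, so the unique count is lower than the distinct count.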

Sample

1st row: 11380-3828
2nd row: 11380-9878
3rd row: 11305-1523
4th row: 11320-2380
5th row: 11410-3899

Value | Count | Frequency (%)
11320-2423 | 4 | < 0.1%
11320-2601 | 4 | < 0.1%
11320-2398 | 4 | < 0.1%
11305-2914 | 4 | < 0.1%
11320-2067 | 4 | < 0.1%
11320-3001 | 4 | < 0.1%
11305-2054 | 4 | < 0.1%
11380-12445 | 4 | < 0.1%
11380-9813 | 4 | < 0.1%
11305-2847 | 3 | < 0.1%
Other values (8707) | 9961 | 99.6%

Most occurring characters

Value | Count | Frequency (%)
1 | 27553 | 26.8%
0 | 15741 | 15.3%
3 | 12158 | 11.8%
- | 10000 | 9.7%
8 | 7168 | 7.0%
2 | 7044 | 6.9%
5 | 6357 | 6.2%
4 | 5830 | 5.7%
6 | 3791 | 3.7%
9 | 3633 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 92658 | 90.3%
Dash Punctuation | 10000 | 9.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 27553 | 29.7%
0 | 15741 | 17.0%
3 | 12158 | 13.1%
8 | 7168 | 7.7%
2 | 7044 | 7.6%
5 | 6357 | 6.9%
4 | 5830 | 6.3%
6 | 3791 | 4.1%
9 | 3633 | 3.9%
7 | 3383 | 3.7%
Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 102658 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 27553 | 26.8%
0 | 15741 | 15.3%
3 | 12158 | 11.8%
- | 10000 | 9.7%
8 | 7168 | 7.0%
2 | 7044 | 6.9%
5 | 6357 | 6.2%
4 | 5830 | 5.7%
6 | 3791 | 3.7%
9 | 3633 | 3.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 102658 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 27553 | 26.8%
0 | 15741 | 15.3%
3 | 12158 | 11.8%
- | 10000 | 9.7%
8 | 7168 | 7.0%
2 | 7044 | 6.9%
5 | 6357 | 6.2%
4 | 5830 | 5.7%
6 | 3791 | 3.7%
9 | 3633 | 3.5%

지역지구구역_구분_코드
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
1: 7851
2: 1511
3: 638

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 3
3rd row: 2
4th row: 2
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 7851 | 78.5%
2 | 1511 | 15.1%
3 | 638 | 6.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 7851 | 78.5%
2 | 1511 | 15.1%
3 | 638 | 6.4%

지역지구구역_코드
Text

Distinct: 65
Distinct (%): 0.7%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.7854142
Min length: 3

Characters and Unicode

Total characters: 37839
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.1%

Sample

1st row: 1020
2nd row: 300
3rd row: 112
4th row: 102
5th row: 1021

Value | Count | Frequency (%)
1020 | 4208 | 42.1%
1022 | 1260 | 12.6%
0100 | 804 | 8.0%
260 | 421 | 4.2%
1030 | 419 | 4.2%
1023 | 380 | 3.8%
112 | 283 | 2.8%
1330 | 222 | 2.2%
1120 | 186 | 1.9%
103 | 184 | 1.8%
Other values (55) | 1629 | 16.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 16100 | 42.5%
1 | 9800 | 25.9%
2 | 8791 | 23.2%
3 | 1948 | 5.1%
6 | 431 | 1.1%
9 | 347 | 0.9%
7 | 222 | 0.6%
8 | 71 | 0.2%
4 | 52 | 0.1%
Z | 49 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 37790 | 99.9%
Uppercase Letter | 49 | 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 16100 | 42.6%
1 | 9800 | 25.9%
2 | 8791 | 23.3%
3 | 1948 | 5.2%
6 | 431 | 1.1%
9 | 347 | 0.9%
7 | 222 | 0.6%
8 | 71 | 0.2%
4 | 52 | 0.1%
5 | 28 | 0.1%
Uppercase Letter
Value | Count | Frequency (%)
Z | 49 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 37790 | 99.9%
Latin | 49 | 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 16100 | 42.6%
1 | 9800 | 25.9%
2 | 8791 | 23.3%
3 | 1948 | 5.2%
6 | 431 | 1.1%
9 | 347 | 0.9%
7 | 222 | 0.6%
8 | 71 | 0.2%
4 | 52 | 0.1%
5 | 28 | 0.1%
Latin
Value | Count | Frequency (%)
Z | 49 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 37839 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 16100 | 42.5%
1 | 9800 | 25.9%
2 | 8791 | 23.2%
3 | 1948 | 5.1%
6 | 431 | 1.1%
9 | 347 | 0.9%
7 | 222 | 0.6%
8 | 71 | 0.2%
4 | 52 | 0.1%
Z | 49 | 0.1%

대표_여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
1: 8931
0: 1064
<NA>: 5

Length

Max length: 4
Median length: 1
Mean length: 1.0015
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
1 | 8931 | 89.3%
0 | 1064 | 10.6%
<NA> | 5 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 8931 | 89.3%
0 | 1064 | 10.6%
<NA> | 5 | < 0.1%

주_동_구분_코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
0: 10000

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0 | 10000 | 100.0%

지역지구구역_명
Text

Distinct: 72
Distinct (%): 0.7%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 6
Mean length: 6.5084034
Min length: 4

Characters and Unicode

Total characters: 65058
Distinct characters: 81
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): 0.1%

Sample

1st row: 일반주거지역
2nd row: 지구단위계획구역
3rd row: 최고고도지구
4th row: 역사문화미관지구
5th row: 제1종일반주거지역

Value | Count | Frequency (%)
일반주거지역 | 4208 | 41.8%
제2종일반주거지역 | 1260 | 12.5%
도시지역 | 804 | 8.0%
주차장정비지구 | 421 | 4.2%
준주거지역 | 419 | 4.2%
제3종일반주거지역 | 380 | 3.8%
최고고도지구 | 283 | 2.8%
자연녹지지역 | 222 | 2.2%
일반상업지역 | 186 | 1.8%
일반미관지구 | 184 | 1.8%
Other values (62) | 1699 | 16.9%

Most occurring characters

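Value | Count | Frequency (%)
[character not preserved] | 9965 | 15.3%
[character not preserved] | 8635 | 13.3%
[character not preserved] | 6975 | 10.7%
[character not preserved] | 6554 | 10.1%
[character not preserved] | 6380 | 9.8%
[character not preserved] | 6380 | 9.8%
[character not preserved] | 2373 | 3.6%
[character not preserved] | 2192 | 3.4%
[character not preserved] | 2043 | 3.1%
2 | 1332 | 2.0%
Other values (71) | 12229 | 18.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 62940 | 96.7%
Decimal Number | 2048 | 3.1%
Space Separator | 70 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[character not preserved] | 9965 | 15.8%
[character not preserved] | 8635 | 13.7%
[character not preserved] | 6975 | 11.1%
[character not preserved] | 6554 | 10.4%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 2373 | 3.8%
[character not preserved] | 2192 | 3.5%
[character not preserved] | 2043 | 3.2%
[character not preserved] | 1181 | 1.9%
Other values (64) | 10262 | 16.3%
Decimal Number
Value | Count | Frequency (%)
2 | 1332 | 65.0%
3 | 380 | 18.6%
1 | 306 | 14.9%
4 | 25 | 1.2%
5 | 4 | 0.2%
6 | 1 | < 0.1%
Space Separator
Value | Count | Frequency (%)
(space) | 70 | 100.0%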

Most occurring scripts

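Value | Count | Frequency (%)
Hangul | 62940 | 96.7%
Common | 2118 | 3.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[character not preserved] | 9965 | 15.8%
[character not preserved] | 8635 | 13.7%
[character not preserved] | 6975 | 11.1%
[character not preserved] | 6554 | 10.4%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 2373 | 3.8%
[character not preserved] | 2192 | 3.5%
[character not preserved] | 2043 | 3.2%
[character not preserved] | 1181 | 1.9%
Other values (64) | 10262 | 16.3%
Common
Value | Count | Frequency (%)
2 | 1332 | 62.9%
3 | 380 | 17.9%
1 | 306 | 14.4%
(space) | 70 | 3.3%
4 | 25 | 1.2%
5 | 4 | 0.2%
6 | 1 | < 0.1%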

Most occurring blocks

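Value | Count | Frequency (%)
Hangul | 62940 | 96.7%
ASCII | 2118 | 3.3%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[character not preserved] | 9965 | 15.8%
[character not preserved] | 8635 | 13.7%
[character not preserved] | 6975 | 11.1%
[character not preserved] | 6554 | 10.4%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 6380 | 10.1%
[character not preserved] | 2373 | 3.8%
[character not preserved] | 2192 | 3.5%
[character not preserved] | 2043 | 3.2%
[character not preserved] | 1181 | 1.9%
Other values (64) | 10262 | 16.3%
ASCII
Value | Count | Frequency (%)
2 | 1332 | 62.9%
3 | 380 | 17.9%
1 | 306 | 14.4%
(space) | 70 | 3.3%
4 | 25 | 1.2%
5 | 4 | 0.2%
6 | 1 | < 0.1%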

작업_일자
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20111227: 10000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20111227
2nd row: 20111227
3rd row: 20111227
4th row: 20111227
5th row: 20111227

Common Values

Value | Count | Frequency (%)
20111227 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
20111227 | 10000 | 100.0%

Correlations

Variable | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 지역지구구역_명
지역지구구역_구분_코드 | 1.000 | 0.995 | 0.038 | 1.000
지역지구구역_코드 | 0.995 | 1.000 | 0.518 | 1.000
대표_여부 | 0.038 | 0.518 | 1.000 | 0.544
지역지구구역_명 | 1.000 | 1.000 | 0.544 | 1.000

Variable | 지역지구구역_구분_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.063
대표_여부 | 0.063 | 1.000

Variable | 지역지구구역_구분_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.063
대표_여부 | 0.063 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
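
One common way to compute a nullity correlation is to turn each column into a 0/1 "is missing" indicator and take the Pearson correlation between two such indicators. The sketch below follows that approach on toy rows; the rows, the column names `code` and `name`, and the helper names are hypothetical, and treating the heatmap as Pearson correlation of missing indicators is an assumption rather than something this report states.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns NaN when either sequence has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def nullity_correlation(rows, col_a, col_b):
    """Correlate the 'is missing' indicators of two columns."""
    a = [1 if r[col_a] is None else 0 for r in rows]
    b = [1 if r[col_b] is None else 0 for r in rows]
    return pearson(a, b)

# Toy rows, not taken from the dataset: both columns are missing together.
rows = [
    {"code": "1020", "name": "A"},
    {"code": None, "name": None},
    {"code": "300", "name": "B"},
    {"code": None, "name": None},
    {"code": "112", "name": "C"},
]
corr = nullity_correlation(rows, "code", "name")
print(corr)
```

A value near 1 means the two columns tend to be missing in the same rows; a column with no missing values (or only missing values) has no variance in its indicator, so the correlation is undefined and the sketch returns NaN.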

Sample

(index) | 관리_지역지구구역 | 관리_허가대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 주_동_구분_코드 | 지역지구구역_명 | 작업_일자
26570 | 11380-5952 | 11380-3828 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
19707 | 11380-14346 | 11380-9878 | 3 | 300 | 1 | 0 | 지구단위계획구역 | 20111227
6609 | 11305-2360 | 11305-1523 | 2 | 112 | 1 | 0 | 최고고도지구 | 20111227
2539 | 11320-2938 | 11320-2380 | 2 | 102 | 1 | 0 | 역사문화미관지구 | 20111227
24492 | 11410-5113 | 11410-3899 | 1 | 1021 | 0 | 0 | 제1종일반주거지역 | 20111227
5856 | 11305-100003449 | 11305-100007980 | 1 | 1022 | 1 | 0 | 제2종일반주거지역 | 20111227
15841 | 11380-13101 | 11380-8941 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
12914 | 11380-17202 | 11380-11981 | 2 | 103 | 1 | 0 | 일반미관지구 | 20111227
24312 | 11410-2043 | 11410-1525 | 1 | 0100 | 1 | 0 | 도시지역 | 20111227
3594 | 11350-2149 | 11350-1847 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227

(index) | 관리_지역지구구역 | 관리_허가대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 주_동_구분_코드 | 지역지구구역_명 | 작업_일자
8274 | 11320-100002806 | 11320-100005286 | 1 | 1022 | 1 | 0 | 제2종일반주거지역 | 20111227
24027 | 11440-1071 | 11440-1818 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
18858 | 11380-100013123 | 11380-12446 | 1 | 1030 | 1 | 0 | 준주거지역 | 20111227
3937 | 11350-2828 | 11350-2559 | 1 | 1330 | 1 | 0 | 자연녹지지역 | 20111227
13707 | 11380-6392 | 11380-4047 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
23471 | 11440-100003865 | 11440-100005536 | 2 | 370 | 1 | 0 | 주거환경개선지구 | 20111227
6073 | 11320-3476 | 11320-2809 | 1 | 1120 | 1 | 0 | 일반상업지역 | 20111227
3998 | 11320-2659 | 11320-2138 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
19919 | 11380-11280 | 11380-7457 | 1 | 1020 | 1 | 0 | 일반주거지역 | 20111227
17126 | 11380-10302 | 11380-6662 | 1 | 1030 | 1 | 0 | 준주거지역 | 20111227