Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 9343
Missing cells (%): 13.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
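
The two derived figures in the statistics above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Raw figures copied from the dataset statistics.
n_variables = 7
n_observations = 10_000
missing_cells = 9_343
total_size_kib = 654.3

# Share of missing cells over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values agree with the reported 13.3% and 67.0 B after rounding to one decimal place.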

Variable types

Text: 3
Categorical: 4

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15658/S/1/datasetView.do

Alerts

작업_일자 has constant value "20111227" [Constant]
지역지구구역_구분_코드 is highly overall correlated with 지역지구구역_코드 [High correlation]
지역지구구역_코드 is highly overall correlated with 지역지구구역_구분_코드 and 1 other field [High correlation]
대표_여부 is highly overall correlated with 지역지구구역_코드 [High correlation]
지역지구구역_코드 is highly imbalanced (87.7%) [Imbalance]
대표_여부 is highly imbalanced (98.1%) [Imbalance]
기타_지역지구구역 has 9343 (93.4%) missing values [Missing]
관리_지역지구구역 has unique values [Unique]
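
The imbalance percentages in the alerts can be checked by hand. The sketch below assumes the score is one minus the Shannon entropy of the value counts divided by the logarithm of the number of distinct values, which is an assumption about how the profiler scores imbalance and not something this report states. The function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform column, near 1 when one value dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0  # assumption: a single-valued column is scored 0
    # Shannon entropy of the observed value frequencies (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Value counts of 대표_여부 as listed in this report: "1", "0", "<NA>".
score = imbalance_score([9971, 27, 2])
print(f"{100 * score:.1f}%")
```

Under that assumption the three counts for 대표_여부 give 98.1%, matching the alert. The 87.7% for 지역지구구역_코드 cannot be rechecked from this report alone because only the ten most common values are listed.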

Reproduction

Analysis started: 2024-04-20 21:16:24.074126
Analysis finished: 2024-04-20 21:16:25.842555
Duration: 1.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
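
The reported duration is simply the difference between the two timestamps. A small sketch, assuming both timestamps are in the same time zone (the report does not say which):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2024-04-20 21:16:24.074126", FMT)
finished = datetime.strptime("2024-04-20 21:16:25.842555", FMT)

duration = finished - started
print(f"{duration.total_seconds():.2f} seconds")
```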

Variables

관리_지역지구구역
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 10.5154
Min length: 7

Characters and Unicode

Total characters: 105154
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
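
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values of this variable. The helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical names introduced here. The standard library exposes general categories but not scripts or blocks, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lo": "Other Letter",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows listed for this variable.
sample = ["11290-29008", "11230-7204", "11215-2577", "11290-26242", "11290-34809"]
print(category_counts(sample))
```

Every character in these identifiers is either a digit or a hyphen, which is why the report lists only two distinct categories for this variable.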

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11290-29008
2nd row: 11230-7204
3rd row: 11215-2577
4th row: 11290-26242
5th row: 11290-34809

Value | Count | Frequency (%)
11290-29008 | 1 | < 0.1%
11320-7063 | 1 | < 0.1%
11290-12026 | 1 | < 0.1%
11290-23658 | 1 | < 0.1%
11290-23341 | 1 | < 0.1%
11110-5583 | 1 | < 0.1%
11290-4804 | 1 | < 0.1%
11290-38324 | 1 | < 0.1%
11305-5470 | 1 | < 0.1%
11200-8895 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 28002 | 26.6%
0 | 14124 | 13.4%
2 | 13442 | 12.8%
- | 10000 | 9.5%
9 | 8726 | 8.3%
3 | 8610 | 8.2%
5 | 5534 | 5.3%
4 | 4680 | 4.5%
6 | 4607 | 4.4%
8 | 3790 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 95154 | 90.5%
Dash Punctuation | 10000 | 9.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 28002 | 29.4%
0 | 14124 | 14.8%
2 | 13442 | 14.1%
9 | 8726 | 9.2%
3 | 8610 | 9.0%
5 | 5534 | 5.8%
4 | 4680 | 4.9%
6 | 4607 | 4.8%
8 | 3790 | 4.0%
7 | 3639 | 3.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 105154 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 28002 | 26.6%
0 | 14124 | 13.4%
2 | 13442 | 12.8%
- | 10000 | 9.5%
9 | 8726 | 8.3%
3 | 8610 | 8.2%
5 | 5534 | 5.3%
4 | 4680 | 4.5%
6 | 4607 | 4.4%
8 | 3790 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 105154 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 28002 | 26.6%
0 | 14124 | 13.4%
2 | 13442 | 12.8%
- | 10000 | 9.5%
9 | 8726 | 8.3%
3 | 8610 | 8.2%
5 | 5534 | 5.3%
4 | 4680 | 4.5%
6 | 4607 | 4.4%
8 | 3790 | 3.6%

관리_폐쇄말소대장
Text

Distinct: 8571
Distinct (%): 85.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 10
Mean length: 10.2561
Min length: 7

Characters and Unicode

Total characters: 102561
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7226
Unique (%): 72.3%
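
The report distinguishes "distinct" from "unique": for this variable there are 8571 distinct values but only 7226 unique ones. The sketch below shows the difference, reading "distinct" as the number of different values and "unique" as the number of values that occur exactly once. The function name `distinct_and_unique` is hypothetical, and the input list is a toy arrangement of identifiers from this report, not the real column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy input: one identifier repeated three times plus two singletons.
values = ["11410-1422"] * 3 + ["11290-13273", "11230-3489"]
print(distinct_and_unique(values))
```

Here the toy list has three distinct values, of which two are unique, because the repeated identifier counts toward "distinct" but not toward "unique".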

Sample

1st row: 11290-13273
2nd row: 11230-3489
3rd row: 11215-1343
4th row: 11290-12097
5th row: 11290-15217

Value | Count | Frequency (%)
11410-1422 | 3 | < 0.1%
11110-4817 | 3 | < 0.1%
11290-15776 | 3 | < 0.1%
11110-1776 | 3 | < 0.1%
11290-2256 | 3 | < 0.1%
11290-1379 | 3 | < 0.1%
11230-7161 | 3 | < 0.1%
11215-2761 | 3 | < 0.1%
11260-804 | 3 | < 0.1%
11410-933 | 3 | < 0.1%
Other values (8561) | 9970 | 99.7%

Most occurring characters

Value | Count | Frequency (%)
1 | 28524 | 27.8%
0 | 13767 | 13.4%
2 | 11879 | 11.6%
- | 10000 | 9.8%
9 | 8313 | 8.1%
3 | 7109 | 6.9%
5 | 5632 | 5.5%
4 | 5027 | 4.9%
6 | 4841 | 4.7%
7 | 3767 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 92561 | 90.2%
Dash Punctuation | 10000 | 9.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 28524 | 30.8%
0 | 13767 | 14.9%
2 | 11879 | 12.8%
9 | 8313 | 9.0%
3 | 7109 | 7.7%
5 | 5632 | 6.1%
4 | 5027 | 5.4%
6 | 4841 | 5.2%
7 | 3767 | 4.1%
8 | 3702 | 4.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 102561 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 28524 | 27.8%
0 | 13767 | 13.4%
2 | 11879 | 11.6%
- | 10000 | 9.8%
9 | 8313 | 8.1%
3 | 7109 | 6.9%
5 | 5632 | 5.5%
4 | 5027 | 4.9%
6 | 4841 | 4.7%
7 | 3767 | 3.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 102561 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 28524 | 27.8%
0 | 13767 | 13.4%
2 | 11879 | 11.6%
- | 10000 | 9.8%
9 | 8313 | 8.1%
3 | 7109 | 6.9%
5 | 5632 | 5.5%
4 | 5027 | 4.9%
6 | 4841 | 4.7%
7 | 3767 | 3.7%

지역지구구역_구분_코드
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 3604
2 | 3268
3 | 3128

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
1 | 3604 | 36.0%
2 | 3268 | 32.7%
3 | 3128 | 31.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]
Value | Count | Frequency (%)
1 | 3604 | 36.0%
2 | 3268 | 32.7%
3 | 3128 | 31.3%

지역지구구역_코드
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 37
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 9208
1020 | 315
260 | 127
070 | 101
1022 | 61
Other values (32) | 188

Length

Max length: 4
Median length: 4
Mean length: 3.9683
Min length: 2

Unique

Unique: 10
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 1022
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9208 | 92.1%
1020 | 315 | 3.1%
260 | 127 | 1.3%
070 | 101 | 1.0%
1022 | 61 | 0.6%
1330 | 48 | 0.5%
1023 | 18 | 0.2%
103 | 18 | 0.2%
1120 | 14 | 0.1%
1021 | 10 | 0.1%
Other values (27) | 80 | 0.8%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
<NA> | 9208 | 92.1%
1020 | 315 | 3.1%
260 | 127 | 1.3%
070 | 101 | 1.0%
1022 | 61 | 0.6%
1330 | 48 | 0.5%
1023 | 18 | 0.2%
103 | 18 | 0.2%
1120 | 14 | 0.1%
1021 | 10 | 0.1%
Other values (27) | 80 | 0.8%

대표_여부
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 9971
0 | 27
<NA> | 2

Length

Max length: 4
Median length: 1
Mean length: 1.0006
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 9971 | 99.7%
0 | 27 | 0.3%
<NA> | 2 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]
Value | Count | Frequency (%)
1 | 9971 | 99.7%
0 | 27 | 0.3%
<NA> | 2 | < 0.1%

기타_지역지구구역
Text

MISSING

Distinct: 60
Distinct (%): 9.1%
Missing: 9343
Missing (%): 93.4%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 6
Mean length: 6.2572298
Min length: 2

Characters and Unicode

Total characters: 4111
Distinct characters: 68
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 5.3%

Sample

1st row: 2종일반주거지역
2nd row: 일반주거지역
3rd row: 일반주거지역
4th row: 일반주거지역
5th row: 주차장정비지구

Value | Count | Frequency (%)
일반주거지역 | 288 | 43.1%
주차장정비지구 | 102 | 15.3%
자연녹지지역 | 47 | 7.0%
개발제한구역 | 47 | 7.0%
일반주거 | 34 | 5.1%
주차장정비 | 17 | 2.5%
제2종일반주거지역 | 12 | 1.8%
2종일반주거지역 | 11 | 1.6%
일반상업지역 | 8 | 1.2%
4종미관,공원용지 | 6 | 0.9%
Other values (53) | 96 | 14.4%

Most occurring characters

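Value | Count | Frequency (%)
[missing] | 590 | 14.4%
[missing] | 497 | 12.1%
[missing] | 454 | 11.0%
[missing] | 379 | 9.2%
[missing] | 372 | 9.0%
[missing] | 372 | 9.0%
[missing] | 194 | 4.7%
[missing] | 125 | 3.0%
[missing] | 125 | 3.0%
[missing] | 125 | 3.0%
Other values (58) | 878 | 21.4%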

Most occurring categories

ValueCountFrequency (%)
Other Letter | 4031 | 98.1%
Decimal Number | 54 | 1.3%
Other Punctuation | 12 | 0.3%
Space Separator | 11 | 0.3%
Open Punctuation | 1 | < 0.1%
Lowercase Letter | 1 | < 0.1%
Close Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[missing] | 590 | 14.6%
[missing] | 497 | 12.3%
[missing] | 454 | 11.3%
[missing] | 379 | 9.4%
[missing] | 372 | 9.2%
[missing] | 372 | 9.2%
[missing] | 194 | 4.8%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
Other values (47) | 798 | 19.8%

Decimal Number
Value | Count | Frequency (%)
2 | 28 | 51.9%
4 | 9 | 16.7%
1 | 8 | 14.8%
3 | 8 | 14.8%
5 | 1 | 1.9%

Other Punctuation
Value | Count | Frequency (%)
, | 11 | 91.7%
/ | 1 | 8.3%

Space Separator
Value | Count | Frequency (%)
(space) | 11 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 1 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul | 4031 | 98.1%
Common | 79 | 1.9%
Latin | 1 | < 0.1%

Most frequent character per script

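Hangul
Value | Count | Frequency (%)
[missing] | 590 | 14.6%
[missing] | 497 | 12.3%
[missing] | 454 | 11.3%
[missing] | 379 | 9.4%
[missing] | 372 | 9.2%
[missing] | 372 | 9.2%
[missing] | 194 | 4.8%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
Other values (47) | 798 | 19.8%

Common
Value | Count | Frequency (%)
2 | 28 | 35.4%
(space) | 11 | 13.9%
, | 11 | 13.9%
4 | 9 | 11.4%
1 | 8 | 10.1%
3 | 8 | 10.1%
5 | 1 | 1.3%
( | 1 | 1.3%
) | 1 | 1.3%
/ | 1 | 1.3%

Latin
Value | Count | Frequency (%)
m | 1 | 100.0%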

Most occurring blocks

ValueCountFrequency (%)
Hangul | 4031 | 98.1%
ASCII | 80 | 1.9%

Most frequent character per block

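Hangul
Value | Count | Frequency (%)
[missing] | 590 | 14.6%
[missing] | 497 | 12.3%
[missing] | 454 | 11.3%
[missing] | 379 | 9.4%
[missing] | 372 | 9.2%
[missing] | 372 | 9.2%
[missing] | 194 | 4.8%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
[missing] | 125 | 3.1%
Other values (47) | 798 | 19.8%

ASCII
Value | Count | Frequency (%)
2 | 28 | 35.0%
(space) | 11 | 13.8%
, | 11 | 13.8%
4 | 9 | 11.2%
1 | 8 | 10.0%
3 | 8 | 10.0%
5 | 1 | 1.2%
( | 1 | 1.2%
m | 1 | 1.2%
) | 1 | 1.2%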

작업_일자
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
20111227 | 10000

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20111227
2nd row: 20111227
3rd row: 20111227
4th row: 20111227
5th row: 20111227

Common Values

Value | Count | Frequency (%)
20111227 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]
Value | Count | Frequency (%)
20111227 | 10000 | 100.0%

Correlations

 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역
지역지구구역_구분_코드 | 1.000 | 0.989 | 0.000 | 1.000
지역지구구역_코드 | 0.989 | 1.000 | 0.819 | 0.997
대표_여부 | 0.000 | 0.819 | 1.000 | 0.812
기타_지역지구구역 | 1.000 | 0.997 | 0.812 | 1.000

 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.874 | 0.000
지역지구구역_코드 | 0.874 | 1.000 | 0.667
대표_여부 | 0.000 | 0.667 | 1.000

 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부
지역지구구역_구분_코드 | 1.000 | 0.874 | 0.000
지역지구구역_코드 | 0.874 | 1.000 | 0.667
대표_여부 | 0.000 | 0.667 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index) | 관리_지역지구구역 | 관리_폐쇄말소대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
25334 | 11290-29008 | 11290-13273 | 1 | <NA> | 1 | <NA> | 20111227
994 | 11230-7204 | 11230-3489 | 2 | <NA> | 1 | <NA> | 20111227
30666 | 11215-2577 | 11215-1343 | 1 | 1022 | 1 | 2종일반주거지역 | 20111227
51952 | 11290-26242 | 11290-12097 | 2 | <NA> | 1 | <NA> | 20111227
51066 | 11290-34809 | 11290-15217 | 2 | <NA> | 1 | <NA> | 20111227
47656 | 11305-3137 | 11305-1589 | 2 | <NA> | 1 | <NA> | 20111227
51797 | 11305-4856 | 11305-2405 | 3 | <NA> | 1 | <NA> | 20111227
16642 | 11290-5963 | 11290-3402 | 3 | <NA> | 1 | <NA> | 20111227
55187 | 11410-2713 | 11410-1610 | 1 | <NA> | 1 | <NA> | 20111227
13313 | 11290-320 | 11290-1351 | 1 | <NA> | 1 | <NA> | 20111227

(index) | 관리_지역지구구역 | 관리_폐쇄말소대장 | 지역지구구역_구분_코드 | 지역지구구역_코드 | 대표_여부 | 기타_지역지구구역 | 작업_일자
56865 | 11350-4870 | 11350-2979 | 1 | <NA> | 1 | <NA> | 20111227
4301 | 11230-2176 | 11230-1565 | 2 | <NA> | 1 | <NA> | 20111227
3905 | 11260-2654 | 11260-1225 | 3 | <NA> | 1 | <NA> | 20111227
53139 | 11305-9847 | 11305-5018 | 2 | <NA> | 1 | <NA> | 20111227
45396 | 11320-2155 | 11320-34 | 1 | <NA> | 1 | <NA> | 20111227
43597 | 11320-6611 | 11320-2571 | 1 | <NA> | 1 | <NA> | 20111227
54455 | 11410-10119 | 11410-4600 | 1 | <NA> | 1 | <NA> | 20111227
20777 | 11290-10678 | 11290-5599 | 3 | <NA> | 1 | <NA> | 20111227
27291 | 11290-18278 | 11290-8661 | 1 | <NA> | 1 | <NA> | 20111227
24730 | 11290-38649 | 11290-16662 | 2 | <NA> | 1 | <NA> | 20111227