Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 1228
Missing cells (%): 3.1%
Duplicate rows: 844
Duplicate rows (%): 8.4%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
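The percentages and the average record size follow from the raw counts in this section. The short check below recomputes them; the variable names are illustrative, and it assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes.

```python
# Figures copied from the dataset statistics in this report.
n_vars, n_obs = 4, 10000
missing_cells, duplicate_rows = 1228, 844
total_kib = 390.6

# Assumption: missing cells are measured against all cells in the table.
missing_pct = missing_cells / (n_vars * n_obs)
duplicate_pct = duplicate_rows / n_obs
# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_obs

print(f"{missing_pct:.1%} {duplicate_pct:.1%} {avg_record_bytes:.1f} B")
```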

Variable types

Text: 2
Categorical: 2

Dataset

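Description: File data summarizing, for each district (gu) of Seoul Metropolitan City, the list of actual land categories of land subject to compensation. It covers the land classification, the land category, and the area of each parcel.
URL: https://www.data.go.kr/data/15118778/fileData.do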

Alerts

Dataset has 844 (8.4%) duplicate rows [Duplicates]
토지구분 is highly imbalanced (54.1%) [Imbalance]
장소 has 1228 (12.3%) missing values [Missing]
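The 54.1% imbalance figure for 토지구분 can be reproduced from the category counts listed under that variable's Common Values. The sketch assumes the score is one minus the Shannon entropy of the value distribution, normalized by the logarithm of the number of classes. That definition is an assumption about how the profiler scores imbalance, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts
    and k is the number of classes (assumed definition, not confirmed)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Category counts of 토지구분 as listed under "Common Values" in this report.
counts = [7067, 1497, 1226, 86, 80, 31, 13]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score of 0 would mean perfectly even classes, and a score near 1 means almost all rows fall into one class.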

Reproduction

Analysis started: 2023-12-12 21:30:42.300123
Analysis finished: 2023-12-12 21:30:42.801799
Duration: 0.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
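A report of this kind is typically produced by loading the file into a pandas DataFrame and passing it to ydata-profiling. The sketch below shows one way to do that. The function name `build_report`, the file paths, the `cp949` encoding, and reading every column as text are all assumptions. The settings actually used for this report are in the config.json mentioned above, which is not reproduced here.

```python
def build_report(csv_path, out_path="report.html"):
    """Profile a CSV file and write an HTML report (a sketch, not the original run)."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is CP949-encoded and all columns are read as strings.
    df = pd.read_csv(csv_path, dtype=str, encoding="cp949")
    report = ProfileReport(df, title="Overview")
    report.to_file(out_path)
    return out_path
```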

Variables

장소
Text

MISSING 

Distinct: 80
Distinct (%): 0.9%
Missing: 1228
Missing (%): 12.3%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 11
Mean length: 11.308254
Min length: 9

Characters and Unicode

Total characters: 99196
Distinct characters: 107
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 서울시 중구 입정동
2nd row: 서울시 의정부시 장암동
3rd row: 서울시 은평구 진관내동
4th row: 서울시 강서구 공항동
5th row: 서울시 마포구 성산동
Value  Count  Frequency (%)
서울시  8736  33.2%
은평구  2224  8.5%
진관외동  1254  4.8%
강서구  1070  4.1%
마포구  1061  4.0%
상암동  932  3.5%
중구  755  2.9%
송파구  752  2.9%
강동구  662  2.5%
진관내동  586  2.2%
Other values (95)  8284  31.5%
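The frequencies above appear to be counts of whitespace-separated tokens in 장소, with missing values skipped. That reading is an assumption based on the table contents. A minimal sketch of such a count, using the five sample rows from this report plus one missing value, with `token_frequencies` as a hypothetical helper name:

```python
from collections import Counter

def token_frequencies(values):
    """Count whitespace-separated tokens, skipping missing (None) values."""
    counter = Counter()
    for value in values:
        if value is None:
            continue
        counter.update(value.split())
    return counter

# The five 장소 sample rows from this report, plus one missing value.
sample = [
    "서울시 중구 입정동",
    "서울시 의정부시 장암동",
    "서울시 은평구 진관내동",
    "서울시 강서구 공항동",
    "서울시 마포구 성산동",
    None,
]
freq = token_frequencies(sample)
total = sum(freq.values())
for token, count in freq.most_common(3):
    print(token, count, f"{count / total:.1%}")
```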

Most occurring characters

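Value  Count  Frequency (%)
(space)  17544  17.7%
(unrecovered)  10361  10.4%
(unrecovered)  9419  9.5%
(unrecovered)  9145  9.2%
(unrecovered)  8892  9.0%
(unrecovered)  8736  8.8%
(unrecovered)  2237  2.3%
(unrecovered)  2224  2.2%
(unrecovered)  2174  2.2%
(unrecovered)  1885  1.9%
Other values (97)  26579  26.8%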

Most occurring categories

Value  Count  Frequency (%)
Other Letter  81086  81.7%
Space Separator  17544  17.7%
Decimal Number  566  0.6%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(unrecovered)  10361  12.8%
(unrecovered)  9419  11.6%
(unrecovered)  9145  11.3%
(unrecovered)  8892  11.0%
(unrecovered)  8736  10.8%
(unrecovered)  2237  2.8%
(unrecovered)  2224  2.7%
(unrecovered)  2174  2.7%
(unrecovered)  1885  2.3%
(unrecovered)  1840  2.3%
Other values (91)  24173  29.8%

Decimal Number
Value  Count  Frequency (%)
4  185  32.7%
2  152  26.9%
3  123  21.7%
1  101  17.8%
5  5  0.9%

Space Separator
Value  Count  Frequency (%)
(space)  17544  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  81086  81.7%
Common  18110  18.3%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(unrecovered)  10361  12.8%
(unrecovered)  9419  11.6%
(unrecovered)  9145  11.3%
(unrecovered)  8892  11.0%
(unrecovered)  8736  10.8%
(unrecovered)  2237  2.8%
(unrecovered)  2224  2.7%
(unrecovered)  2174  2.7%
(unrecovered)  1885  2.3%
(unrecovered)  1840  2.3%
Other values (91)  24173  29.8%

Common
Value  Count  Frequency (%)
(space)  17544  96.9%
4  185  1.0%
2  152  0.8%
3  123  0.7%
1  101  0.6%
5  5  < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  81086  81.7%
ASCII  18110  18.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  17544  96.9%
4  185  1.0%
2  152  0.8%
3  123  0.7%
1  101  0.6%
5  5  < 0.1%

Hangul
Value  Count  Frequency (%)
(unrecovered)  10361  12.8%
(unrecovered)  9419  11.6%
(unrecovered)  9145  11.3%
(unrecovered)  8892  11.0%
(unrecovered)  8736  10.8%
(unrecovered)  2237  2.8%
(unrecovered)  2224  2.7%
(unrecovered)  2174  2.7%
(unrecovered)  1885  2.3%
(unrecovered)  1840  2.3%
Other values (91)  24173  29.8%

토지구분
Categorical

IMBALANCE 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
사유지  7067
국,공유지(보상용지)  1497
국,공유지(무상귀속)  1226
존치  86
시유지(원가반영)  80
Other values (2)  44

Length

Max length: 11
Median length: 3
Mean length: 5.2362
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국,공유지(보상용지)
2nd row: 국,공유지(보상용지)
3rd row: 사유지
4th row: 국,공유지(무상귀속)
5th row: 시유지(원가반영)

Common Values

Value  Count  Frequency (%)
사유지  7067  70.7%
국,공유지(보상용지)  1497  15.0%
국,공유지(무상귀속)  1226  12.3%
존치  86  0.9%
시유지(원가반영)  80  0.8%
하천점용대상  31  0.3%
공사자산(원가반영)  13  0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
사유지  7067  70.7%
국,공유지(보상용지  1497  15.0%
국,공유지(무상귀속  1226  12.3%
존치  86  0.9%
시유지(원가반영  80  0.8%
하천점용대상  31  0.3%
공사자산(원가반영  13  0.1%

지목
Categorical

Distinct: 27
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
(unrecovered)  3168
도로  2024
(unrecovered)  1550
(unrecovered)  1432
임야  475
Other values (22)  1351

Length

Max length: 7
Median length: 1
Mean length: 1.5421
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: (unrecovered)
2nd row: 공장용지
3rd row: 도로
4th row: 구거
5th row: (unrecovered)

Common Values

Value  Count  Frequency (%)
(unrecovered)  3168  31.7%
도로  2024  20.2%
(unrecovered)  1550  15.5%
(unrecovered)  1432  14.3%
임야  475  4.8%
구거  346  3.5%
잡종지  331  3.3%
하천  176  1.8%
견사 및 건물  164  1.6%
철도용지  105  1.1%
Other values (17)  229  2.3%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
(unrecovered)  3168  30.7%
도로  2024  19.6%
(unrecovered)  1550  15.0%
(unrecovered)  1432  13.9%
임야  475  4.6%
구거  346  3.4%
잡종지  331  3.2%
하천  176  1.7%
견사  164  1.6%
(unrecovered)  164  1.6%
Other values (19)  498  4.8%

면적
Text

Distinct: 1871
Distinct (%): 18.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 6
Mean length: 2.6979
Min length: 1

Characters and Unicode

Total characters: 26979
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 907
Unique (%): 9.1%

Sample

1st row: 16
2nd row: 66
3rd row: 7
4th row: 6
5th row: 3
Value  Count  Frequency (%)
1  188  1.9%
2  143  1.4%
3  141  1.4%
7  121  1.2%
4  118  1.2%
17  100  1.0%
10  100  1.0%
6  99  1.0%
5  98  1.0%
9  86  0.9%
Other values (1861)  8806  88.1%

Most occurring characters

Value  Count  Frequency (%)
1  4935  18.3%
2  3350  12.4%
3  2873  10.6%
6  2408  8.9%
4  2284  8.5%
5  2169  8.0%
9  2002  7.4%
7  1916  7.1%
8  1888  7.0%
0  1813  6.7%
Other values (2)  1341  5.0%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  25638  95.0%
Other Punctuation  1341  5.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1  4935  19.2%
2  3350  13.1%
3  2873  11.2%
6  2408  9.4%
4  2284  8.9%
5  2169  8.5%
9  2002  7.8%
7  1916  7.5%
8  1888  7.4%
0  1813  7.1%

Other Punctuation
Value  Count  Frequency (%)
,  1338  99.8%
.  3  0.2%
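면적 is typed as Text, and its only non-digit characters are 1338 commas and 3 periods. That pattern suggests numbers written with thousands separators and a few decimal values, though this is an inference from the character counts and not something the report states. Under that assumption, a minimal sketch of converting the strings to numbers could look like this. `parse_area` is a hypothetical helper, and the inputs `"1,234"`, `"12.5"`, and `"n/a"` are made-up examples.

```python
from decimal import Decimal, InvalidOperation

def parse_area(text):
    """Convert a string such as '1,234' or '12.5' to Decimal, or None on failure.

    Assumes ',' is a thousands separator and '.' is a decimal point.
    """
    if text is None:
        return None
    cleaned = text.strip().replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None

# "16" and "66" come from the sample rows; the other inputs are illustrative.
values = ["16", "66", "1,234", "12.5", "n/a"]
parsed = [parse_area(v) for v in values]
print(parsed)
```

Once parsed, the column could be profiled as numeric, which would add summary statistics such as the minimum, maximum, and mean in place of character counts.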

Most occurring scripts

Value  Count  Frequency (%)
Common  26979  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1  4935  18.3%
2  3350  12.4%
3  2873  10.6%
6  2408  8.9%
4  2284  8.5%
5  2169  8.0%
9  2002  7.4%
7  1916  7.1%
8  1888  7.0%
0  1813  6.7%
Other values (2)  1341  5.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  26979  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1  4935  18.3%
2  3350  12.4%
3  2873  10.6%
6  2408  8.9%
4  2284  8.5%
5  2169  8.0%
9  2002  7.4%
7  1916  7.1%
8  1888  7.0%
0  1813  6.7%
Other values (2)  1341  5.0%

Correlations

          장소  토지구분  지목
장소  1.000  0.512  0.673
토지구분  0.512  1.000  0.584
지목  0.673  0.584  1.000

          지목  토지구분
지목  1.000  0.288
토지구분  0.288  1.000

          토지구분  지목
토지구분  1.000  0.288
지목  0.288  1.000
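The matrices above do not say which association measure produced them. For categorical columns such as 토지구분 and 지목, one common choice is Cramér's V, which rescales the chi-squared statistic of the contingency table to the range 0 to 1. The sketch below implements the plain, uncorrected form on toy data. It is not claimed to be the measure used in this report, and `cramers_v` is a hypothetical helper name.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    min_dim = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or min_dim == 0:
        return 0.0
    # Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells.
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * min_dim))

# Toy data (not taken from the dataset): perfectly associated vs. independent pairs.
land_type = ["사유지", "사유지", "존치", "존치"]
perfect = cramers_v(land_type, ["도로", "도로", "임야", "임야"])
independent = cramers_v(land_type, ["도로", "임야", "도로", "임야"])
print(perfect, independent)
```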

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  장소  토지구분  지목  면적
31477  서울시 중구 입정동  국,공유지(보상용지)  (unrecovered)  16
29886  서울시 의정부시 장암동  국,공유지(보상용지)  공장용지  66
25730  서울시 은평구 진관내동  사유지  도로  7
6801  서울시 강서구 공항동  국,공유지(무상귀속)  구거  6
31250  서울시 마포구 성산동  시유지(원가반영)  (unrecovered)  3
46437  <NA>  국,공유지(보상용지)  제방  25
35953  서울시 중구 인현동2가  사유지  (unrecovered)  36
26462  서울시 은평구 진관내동  사유지  (unrecovered)  78
2629  서울시 강남구 율현동  사유지  (unrecovered)  318
15585  서울시 서초구 내곡동  국,공유지(무상귀속)  도로  40

(index)  장소  토지구분  지목  면적
44971  <NA>  하천점용대상  (unrecovered)  1
14426  서울시 강서구 공항동  국,공유지(보상용지)  (unrecovered)  29
25914  서울시 은평구 진관외동  국,공유지(보상용지)  (unrecovered)  192
7487  서울시 강서구 가양동  사유지  (unrecovered)  2
2772  서울시 강남구 수서동  국,공유지(무상귀속)  도로  42
37370  서울시 구로구 항동  사유지  (unrecovered)  188
10243  서울시 은평구 구파발동  사유지  도로  6
25871  서울시 은평구 진관외동  사유지  (unrecovered)  202
16368  서울시 강동구 하일동  사유지  도로  2
410  서울시 마포구 상암동  국,공유지(보상용지)  구거  163

Duplicate rows

Most frequently occurring

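(index)  장소  토지구분  지목  면적  # duplicates
375  서울시 은평구 진관내동  사유지  (unrecovered)  165  26
30  서울시 강동구 하일동  사유지  (unrecovered)  43  25
546  서울시 은평구 진관외동  사유지  도로  2  18
551  서울시 은평구 진관외동  사유지  도로  3  17
17  서울시 강동구 하일동  사유지  (unrecovered)  21  16
28  서울시 강동구 하일동  사유지  (unrecovered)  40  15
450  서울시 은평구 진관외동  사유지  (unrecovered)  132  15
464  서울시 은평구 진관외동  사유지  (unrecovered)  165  15
31  서울시 강동구 하일동  사유지  (unrecovered)  46  14
376  서울시 은평구 진관내동  사유지  (unrecovered)  166  14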
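Duplicate rows can be counted in more than one way, and this report does not state which convention its figure of 844 follows. The sketch below shows two common ones on toy rows: the number of distinct rows that occur more than once, and the number of surplus rows beyond each first occurrence. The helper names are hypothetical and the row multiplicities are illustrative, not taken from the dataset.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: occurrences} for rows that occur more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

def count_surplus_rows(rows):
    """Count rows that repeat an earlier row (first occurrences excluded)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in counts.values())

# Toy rows in (장소, 토지구분, 지목, 면적) order; the repetition counts are made up.
rows = [
    ("서울시 은평구 진관외동", "사유지", "도로", "2"),
    ("서울시 은평구 진관외동", "사유지", "도로", "2"),
    ("서울시 은평구 진관외동", "사유지", "도로", "2"),
    ("서울시 강동구 하일동", "사유지", "도로", "2"),
]
groups = duplicate_groups(rows)
surplus = count_surplus_rows(rows)
print(len(groups), surplus)
```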