Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B
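
The average record size follows from the two memory figures above: total size in memory divided by the number of observations. A quick check of that arithmetic, assuming 1 KiB = 1024 bytes:

```python
# Figures taken from the dataset statistics above.
total_size_kib = 546.9
n_observations = 10_000

# Convert KiB to bytes (1 KiB = 1024 B) and divide by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # 56.0 B
```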

Variable types

Text: 3
Categorical: 3

Dataset

Description: 관리_부속_지번_PK, 관리_폐쇄말소대장_PK, 부속_대장_구분_코드, 부속_시군구_코드, 부속_법정동_코드, 부속_대지_구분_코드
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15399/S/1/datasetView.do

Alerts

부속_대지_구분_코드 is highly imbalanced (72.3%) [Imbalance]
관리_부속_지번_PK has unique values [Unique]
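
The 72.3% figure can be reproduced from the value counts this report lists for 부속_대지_구분_코드 (9043, 492, 461 and 4). The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. `imbalance_score` is an illustrative name, not necessarily the profiler's own function.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H/H_max, where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(n_classes)

# Value counts of 부속_대지_구분_코드 as listed in this report.
score = imbalance_score([9043, 492, 461, 4])
print(f"{score:.1%}")  # 72.3%
```

Under this assumption a perfectly balanced variable scores 0%, so 72.3% indicates that a single value (대지, 90.4% of rows) dominates.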

Reproduction

Analysis started: 2024-05-11 09:03:37.829162
Analysis finished: 2024-05-11 09:03:40.472109
Duration: 2.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_부속_지번_PK
Text

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 13.0919
Min length: 7
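
The length statistics are plain summaries of the string length of each value; the mean reported for this variable equals its total character count divided by the number of observations (130919 / 10000 = 13.0919). A small sketch of the same computation on a handful of values copied from this report's value table:

```python
from statistics import mean, median

# A few values of this variable copied from the report's value table.
values = ["11350-100036642", "11230-4930", "11710-1234", "11650-1589", "11140-100004952"]

lengths = [len(v) for v in values]
summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Mean length of the full column, from the totals reported for this variable.
report_mean = 130919 / 10000
print(report_mean)
```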

Characters and Unicode

Total characters: 130919
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
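
As an illustration of that idea, the general category of each character can be looked up with Python's standard `unicodedata` module. This is a minimal sketch on the five sample rows of this variable, not the profiler's own code; `category_counts` is a hypothetical helper name. `Nd` is the short code for Decimal Number and `Pd` for Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Pd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# The five sample rows of this variable shown in the report.
sample = [
    "11350-100036642",
    "11305-100003184",
    "11230-100052158",
    "11650-100012528",
    "11440-100007300",
]
counts = category_counts(sample)
print(counts)
```

The standard library exposes general categories only; script and block lookups would need a third-party package.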

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11350-100036642
2nd row: 11305-100003184
3rd row: 11230-100052158
4th row: 11650-100012528
5th row: 11440-100007300

Value                Count  Frequency (%)
11350-100036642      1      < 0.1%
11230-4930           1      < 0.1%
11470-100045714      1      < 0.1%
11470-100010865      1      < 0.1%
11710-1234           1      < 0.1%
11650-1589           1      < 0.1%
11140-100004952      1      < 0.1%
11440-100017707      1      < 0.1%
11530-100018400      1      < 0.1%
11560-100031305      1      < 0.1%
Other values (9990)  9990   99.9%

Most occurring characters

Value  Count  Frequency (%)
0      35456  27.1%
1      34459  26.3%
-      10000  7.6%
4      8249   6.3%
5      8010   6.1%
3      7600   5.8%
2      6854   5.2%
7      5534   4.2%
6      5305   4.1%
9      4753   3.6%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    120919  92.4%
Dash Punctuation  10000   7.6%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      35456  29.3%
1      34459  28.5%
4      8249   6.8%
5      8010   6.6%
3      7600   6.3%
2      6854   5.7%
7      5534   4.6%
6      5305   4.4%
9      4753   3.9%
8      4699   3.9%

Dash Punctuation
Value  Count  Frequency (%)
-      10000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  130919  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      35456  27.1%
1      34459  26.3%
-      10000  7.6%
4      8249   6.3%
5      8010   6.1%
3      7600   5.8%
2      6854   5.2%
7      5534   4.2%
6      5305   4.1%
9      4753   3.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  130919  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      35456  27.1%
1      34459  26.3%
-      10000  7.6%
4      8249   6.3%
5      8010   6.1%
3      7600   5.8%
2      6854   5.2%
7      5534   4.2%
6      5305   4.1%
9      4753   3.6%

관리_폐쇄말소대장_PK
Text

Distinct: 6111
Distinct (%): 61.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 13.3545
Min length: 7

Characters and Unicode

Total characters: 133545
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4802
Unique (%): 48.0%
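
Distinct and Unique measure different things: this variable takes 6111 different values, but only 4802 of them occur exactly once. A minimal sketch of that distinction on toy data (the values are illustrative, not taken from the dataset, and `distinct_and_unique` is a hypothetical helper name):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

toy = ["a", "a", "b", "c", "c", "c", "d"]
n_distinct, n_unique = distinct_and_unique(toy)
print(n_distinct, n_unique)  # 4 2
```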

Sample

1st row: 11350-100190375
2nd row: 11305-100009261
3rd row: 11230-100288092
4th row: 11650-100021301
5th row: 11440-100015365

Value                Count  Frequency (%)
11305-100124765      30     0.3%
11500-100412262      30     0.3%
11305-100124709      24     0.2%
11410-100030519      23     0.2%
11305-100124289      23     0.2%
11500-100399071      22     0.2%
11290-100009002      21     0.2%
11410-2516           20     0.2%
11305-100124606      20     0.2%
11305-100124546      19     0.2%
Other values (6101)  9768   97.7%

Most occurring characters

Value  Count  Frequency (%)
1      34970  26.2%
0      32423  24.3%
-      10000  7.5%
4      9286   7.0%
5      9183   6.9%
3      8300   6.2%
2      7497   5.6%
7      5980   4.5%
6      5634   4.2%
9      5470   4.1%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    123545  92.5%
Dash Punctuation  10000   7.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      34970  28.3%
0      32423  26.2%
4      9286   7.5%
5      9183   7.4%
3      8300   6.7%
2      7497   6.1%
7      5980   4.8%
6      5634   4.6%
9      5470   4.4%
8      4802   3.9%

Dash Punctuation
Value  Count  Frequency (%)
-      10000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  133545  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      34970  26.2%
0      32423  24.3%
-      10000  7.5%
4      9286   7.0%
5      9183   6.9%
3      8300   6.2%
2      7497   5.6%
7      5980   4.5%
6      5634   4.2%
9      5470   4.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  133545  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      34970  26.2%
0      32423  24.3%
-      10000  7.5%
4      9286   7.0%
5      9183   6.9%
3      8300   6.2%
2      7497   5.6%
7      5980   4.5%
6      5634   4.2%
9      5470   4.1%

부속_대장_구분_코드
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
일반  6601
집합  2826
<NA>  573

Length

Max length: 4
Median length: 2
Mean length: 2.1146
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 일반
4th row: 일반
5th row: 일반

Common Values

Value  Count  Frequency (%)
일반  6601  66.0%
집합  2826  28.3%
<NA>  573  5.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
일반  6601  66.0%
집합  2826  28.3%
na  573  5.7%

부속_시군구_코드
Categorical

Distinct: 26
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
마포구  961
동대문구  838
용산구  682
동작구  645
강북구  587
Other values (21)  6287

Length

Max length: 4
Median length: 3
Mean length: 3.113
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 노원구
2nd row: 강북구
3rd row: 동대문구
4th row: 서초구
5th row: 마포구

Common Values

Value  Count  Frequency (%)
마포구  961  9.6%
동대문구  838  8.4%
용산구  682  6.8%
동작구  645  6.5%
강북구  587  5.9%
종로구  534  5.3%
서초구  486  4.9%
강서구  480  4.8%
중구  467  4.7%
서대문구  448  4.5%
Other values (16)  3872  38.7%

Length

[Figure: Histogram of lengths of the category]

부속_법정동_코드
Text

Distinct: 415
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.2685
Min length: 1

Characters and Unicode

Total characters: 32685
Distinct characters: 205
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 51
Unique (%): 0.5%

Sample

1st row: 상계동
2nd row: 미아동
3rd row: 제기동
4th row: 방배동
5th row: 아현동

Value  Count  Frequency (%)
회기동  351  3.5%
상도동  289  2.9%
신문로2가  255  2.6%
미아동  249  2.5%
번동  249  2.5%
신정동  240  2.4%
합정동  238  2.4%
이문동  227  2.3%
당인동  225  2.3%
반포동  179  1.8%
Other values (404)  7470  74.9%

Most occurring characters

Value  Count  Frequency (%)
(unreadable)  9550  29.2%
(unreadable)  1455  4.5%
(unreadable)  1242  3.8%
(unreadable)  666  2.0%
(unreadable)  657  2.0%
(unreadable)  595  1.8%
(unreadable)  563  1.7%
2  554  1.7%
(unreadable)  497  1.5%
(unreadable)  485  1.5%
Other values (195)  16421  50.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  31368  96.0%
Decimal Number  1289  3.9%
Space Separator  28  0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(unreadable)  9550  30.4%
(unreadable)  1455  4.6%
(unreadable)  1242  4.0%
(unreadable)  666  2.1%
(unreadable)  657  2.1%
(unreadable)  595  1.9%
(unreadable)  563  1.8%
(unreadable)  497  1.6%
(unreadable)  485  1.5%
(unreadable)  455  1.5%
Other values (186)  15203  48.5%

Decimal Number
Value  Count  Frequency (%)
2  554  43.0%
3  312  24.2%
1  172  13.3%
5  105  8.1%
6  56  4.3%
4  50  3.9%
7  27  2.1%
8  13  1.0%

Space Separator
Value  Count  Frequency (%)
(space)  28  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  31368  96.0%
Common  1317  4.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(unreadable)  9550  30.4%
(unreadable)  1455  4.6%
(unreadable)  1242  4.0%
(unreadable)  666  2.1%
(unreadable)  657  2.1%
(unreadable)  595  1.9%
(unreadable)  563  1.8%
(unreadable)  497  1.6%
(unreadable)  485  1.5%
(unreadable)  455  1.5%
Other values (186)  15203  48.5%

Common
Value  Count  Frequency (%)
2  554  42.1%
3  312  23.7%
1  172  13.1%
5  105  8.0%
6  56  4.3%
4  50  3.8%
(space)  28  2.1%
7  27  2.1%
8  13  1.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  31368  96.0%
ASCII  1317  4.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(unreadable)  9550  30.4%
(unreadable)  1455  4.6%
(unreadable)  1242  4.0%
(unreadable)  666  2.1%
(unreadable)  657  2.1%
(unreadable)  595  1.9%
(unreadable)  563  1.8%
(unreadable)  497  1.6%
(unreadable)  485  1.5%
(unreadable)  455  1.5%
Other values (186)  15203  48.5%

ASCII
Value  Count  Frequency (%)
2  554  42.1%
3  312  23.7%
1  172  13.1%
5  105  8.0%
6  56  4.3%
4  50  3.8%
(space)  28  2.1%
7  27  2.1%
8  13  1.0%

부속_대지_구분_코드
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
대지  9043
(blank)  492
<NA>  461
블록  4

Length

Max length: 4
Median length: 2
Mean length: 2.043
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대지
2nd row: 대지
3rd row: 대지
4th row: 대지
5th row: 대지

Common Values

Value  Count  Frequency (%)
대지  9043  90.4%
(blank)  492  4.9%
<NA>  461  4.6%
블록  4  < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
대지  9043  90.4%
(blank)  492  4.9%
na  461  4.6%
블록  4  < 0.1%

Correlations

                      부속_대장_구분_코드  부속_시군구_코드  부속_대지_구분_코드
부속_대장_구분_코드  1.000  0.519  0.015
부속_시군구_코드  0.519  1.000  0.309
부속_대지_구분_코드  0.015  0.309  1.000

                      부속_대장_구분_코드  부속_대지_구분_코드  부속_시군구_코드
부속_대장_구분_코드  1.000  0.024  0.451
부속_대지_구분_코드  0.024  1.000  0.166
부속_시군구_코드  0.451  0.166  1.000

                      부속_대장_구분_코드  부속_시군구_코드  부속_대지_구분_코드
부속_대장_구분_코드  1.000  0.451  0.024
부속_시군구_코드  0.451  1.000  0.166
부속_대지_구분_코드  0.024  0.166  1.000
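
The matrices above are not labeled in this extract, so the measure behind each one is unknown. For pairs of categorical variables, one common association measure is Cramér's V. The sketch below shows the plain (uncorrected) statistic on toy data, as an assumption about the kind of measure involved rather than a reconstruction of the numbers reported here; `cramers_v` is a hypothetical helper name.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))

# Toy data: the second column is fully determined by the first.
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
```

Values range from 0 (no association) to 1 (one variable fully determines the other), which matches the 0 to 1 range of the coefficients shown.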

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]
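
The report counts 0 missing cells, yet two categorical variables list `<NA>` as an ordinary value (573 and 461 rows). That pattern is consistent with missing entries having been stored as the literal string `<NA>`, which is an inference and not something the report states. A minimal sketch of counting both real nulls and such placeholder strings per column, on hypothetical rows (`PLACEHOLDERS` and `nullity_by_column` are illustrative names):

```python
PLACEHOLDERS = {"<NA>", "na", ""}  # assumed stand-ins for missing data

def nullity_by_column(rows):
    """Return {column: (n_null, n_placeholder)} for a list of dict rows."""
    result = {}
    for row in rows:
        for col, value in row.items():
            n_null, n_placeholder = result.get(col, (0, 0))
            if value is None:
                n_null += 1
            elif isinstance(value, str) and value.strip() in PLACEHOLDERS:
                n_placeholder += 1
            result[col] = (n_null, n_placeholder)
    return result

# Hypothetical rows using two of the dataset's column names.
rows = [
    {"부속_대장_구분_코드": "일반", "부속_대지_구분_코드": "대지"},
    {"부속_대장_구분_코드": "<NA>", "부속_대지_구분_코드": " "},
    {"부속_대장_구분_코드": "집합", "부속_대지_구분_코드": None},
]
print(nullity_by_column(rows))
```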

Sample

(index)  관리_부속_지번_PK  관리_폐쇄말소대장_PK  부속_대장_구분_코드  부속_시군구_코드  부속_법정동_코드  부속_대지_구분_코드
85928  11350-100036642  11350-100190375  일반  노원구  상계동  대지
16948  11305-100003184  11305-100009261  일반  강북구  미아동  대지
73023  11230-100052158  11230-100288092  일반  동대문구  제기동  대지
47882  11650-100012528  11650-100021301  일반  서초구  방배동  대지
1  11440-100007300  11440-100015365  일반  마포구  아현동  대지
25865  11380-100001943  11380-100007388  일반  은평구  진관동  대지
29673  11440-5339  11440-10432  일반  마포구  상암동  대지
39586  11650-12  11650-477  일반  서초구  방배동  대지
30272  11680-1157  11680-7751  일반  강남구  논현동  대지
28168  11440-901  11440-233  일반  마포구  대흥동  대지

(index)  관리_부속_지번_PK  관리_폐쇄말소대장_PK  부속_대장_구분_코드  부속_시군구_코드  부속_법정동_코드  부속_대지_구분_코드
46855  11140-100069426  11140-100664231  집합  중구  신당동  대지
97455  11110-1000000000000001537574  11110-1000000000000001431394  일반  종로구  신문로2가  대지
10598  11440-156  11440-1364  집합  마포구  아현동  대지
56696  11440-100035088  11440-100095308  일반  마포구  용강동  대지
10267  11230-100081484  11230-100481245  일반  동대문구  청량리동  대지
95093  11140-100095890  11140-100959971  일반  중구  필동1가  대지
46406  11500-100009175  11500-100023514  일반  강서구  등촌동  대지
2607  11140-1515  11140-3311  집합  중구  신당동  대지
9333  11230-4806  11230-5956  일반  동대문구  이문동  대지
27103  11620-727  11620-7305  집합  관악구  신림동