Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 546.9 KiB
Average record size in memory: 56.0 B
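
The derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is total memory divided by the number of observations. The variable names are illustrative only.

```python
# Raw figures copied from the dataset statistics.
n_variables = 6
n_observations = 10_000
missing_cells = 0
total_size_kib = 546.9

# Assumption: percentages are taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = missing_cells / total_cells * 100

# Assumption: 1 KiB = 1024 bytes, averaged over rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"cells={total_cells} missing={missing_pct:.1f}% avg_record={avg_record_bytes:.1f} B")
```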

Variable types

Text: 3
Categorical: 3

Dataset

Description: 관리_부속_지번_PK, 관리_건축물대장_PK, 부속_대장_구분_코드, 부속_시군구_코드, 부속_법정동_코드, 부속_대지_구분_코드
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15394/S/1/datasetView.do

Alerts

Imbalance: 부속_대지_구분_코드 is highly imbalanced (87.3%)
Unique: 관리_부속_지번_PK has unique values
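
The 87.3% imbalance figure can be reproduced from the value counts listed for 부속_대지_구분_코드 (9637, 278, 84 and 1). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible entropy. The report does not state the formula, so treat `imbalance_score` as an illustrative helper, not the tool's documented API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/H_max, where H is the Shannon entropy of the class counts.

    Assumption: this normalized-entropy definition is what the alert uses.
    0.0 means perfectly balanced classes, 1.0 means a single dominant class.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Value counts reported for 부속_대지_구분_코드.
score = imbalance_score([9637, 278, 84, 1])
print(f"{score:.1%}")
```

A balanced column such as `[2500, 2500, 2500, 2500]` scores 0.0 under this definition.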

Reproduction

Analysis started: 2024-05-10 23:52:09.386097
Analysis finished: 2024-05-10 23:52:12.098846
Duration: 2.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_부속_지번_PK
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 12.8647
Min length: 7

Characters and Unicode

Total characters: 128647
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
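
As a rough illustration of this kind of analysis, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module, using four of the sample keys from this column. The `category_counts` helper is hypothetical. The standard library exposes categories but not scripts or blocks, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category (e.g. 'Nd', 'Pd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Sample values taken from the 2nd to 5th sample rows of 관리_부속_지번_PK.
samples = ["11650-100063467", "11380-100025825", "11560-100010627", "11230-5274"]
counts = category_counts(samples)

# 'Nd' is Decimal Number and 'Pd' is Dash Punctuation.
for code, n in counts.most_common():
    print(code, n)
```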

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11620-1000000000000000596578
2nd row: 11650-100063467
3rd row: 11380-100025825
4th row: 11560-100010627
5th row: 11230-5274

Value | Count | Frequency (%)
11620-1000000000000000596578 | 1 | < 0.1%
11230-1685 | 1 | < 0.1%
11170-900 | 1 | < 0.1%
11215-5458 | 1 | < 0.1%
11230-2592 | 1 | < 0.1%
11215-6224 | 1 | < 0.1%
11215-100024877 | 1 | < 0.1%
11410-100016643 | 1 | < 0.1%
11110-100012378 | 1 | < 0.1%
11290-100008864 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 35543 | 27.6%
0 | 34201 | 26.6%
- | 10000 | 7.8%
2 | 8612 | 6.7%
5 | 8161 | 6.3%
4 | 6633 | 5.2%
3 | 6580 | 5.1%
6 | 5581 | 4.3%
7 | 4500 | 3.5%
9 | 4450 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 118647 | 92.2%
Dash Punctuation | 10000 | 7.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 35543 | 30.0%
0 | 34201 | 28.8%
2 | 8612 | 7.3%
5 | 8161 | 6.9%
4 | 6633 | 5.6%
3 | 6580 | 5.5%
6 | 5581 | 4.7%
7 | 4500 | 3.8%
9 | 4450 | 3.8%
8 | 4386 | 3.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 128647 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 35543 | 27.6%
0 | 34201 | 26.6%
- | 10000 | 7.8%
2 | 8612 | 6.7%
5 | 8161 | 6.3%
4 | 6633 | 5.2%
3 | 6580 | 5.1%
6 | 5581 | 4.3%
7 | 4500 | 3.5%
9 | 4450 | 3.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 128647 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 35543 | 27.6%
0 | 34201 | 26.6%
- | 10000 | 7.8%
2 | 8612 | 6.7%
5 | 8161 | 6.3%
4 | 6633 | 5.2%
3 | 6580 | 5.1%
6 | 5581 | 4.3%
7 | 4500 | 3.5%
9 | 4450 | 3.5%

관리_건축물대장_PK
Text

Distinct: 6809
Distinct (%): 68.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 12.8016
Min length: 7

Characters and Unicode

Total characters: 128016
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5673
Unique (%): 56.7%
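
Distinct and Unique measure different things, which is why this column reports 6809 distinct values but only 5673 unique ones. The sketch below assumes the usual reading: Distinct counts the different values present, while Unique counts the values that occur exactly once. The `distinct_and_unique` helper and the toy input are illustrative, not taken from the profiler.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for count in freq.values() if count == 1)
    return distinct, unique

# Toy input: one key repeated twice, two keys appearing once.
values = ["11500-100346752", "11500-100346752", "11650-100284701", "11230-36131"]
distinct, unique = distinct_and_unique(values)
print(distinct, unique)
```

Under this reading, a primary-key column such as 관리_부속_지번_PK has Distinct equal to Unique equal to the row count.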

Sample

1st row: 11620-1000000000000000659900
2nd row: 11650-100284701
3rd row: 11380-100285515
4th row: 11560-100218676
5th row: 11230-36131

Value | Count | Frequency (%)
11500-100346752 | 42 | 0.4%
11500-100299412 | 26 | 0.3%
11650-100284655 | 23 | 0.2%
11500-100202802 | 22 | 0.2%
11650-100284707 | 22 | 0.2%
11215-100206904 | 22 | 0.2%
11650-100284608 | 21 | 0.2%
11650-100284558 | 21 | 0.2%
11215-100206905 | 21 | 0.2%
11215-15214 | 21 | 0.2%
Other values (6799) | 9759 | 97.6%

Most occurring characters

Value | Count | Frequency (%)
1 | 35403 | 27.7%
0 | 26614 | 20.8%
2 | 12534 | 9.8%
- | 10000 | 7.8%
5 | 8812 | 6.9%
4 | 7074 | 5.5%
3 | 6555 | 5.1%
6 | 5803 | 4.5%
8 | 5304 | 4.1%
9 | 5067 | 4.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 118016 | 92.2%
Dash Punctuation | 10000 | 7.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 35403 | 30.0%
0 | 26614 | 22.6%
2 | 12534 | 10.6%
5 | 8812 | 7.5%
4 | 7074 | 6.0%
3 | 6555 | 5.6%
6 | 5803 | 4.9%
8 | 5304 | 4.5%
9 | 5067 | 4.3%
7 | 4850 | 4.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 128016 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 35403 | 27.7%
0 | 26614 | 20.8%
2 | 12534 | 9.8%
- | 10000 | 7.8%
5 | 8812 | 6.9%
4 | 7074 | 5.5%
3 | 6555 | 5.1%
6 | 5803 | 4.5%
8 | 5304 | 4.1%
9 | 5067 | 4.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 128016 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 35403 | 27.7%
0 | 26614 | 20.8%
2 | 12534 | 9.8%
- | 10000 | 7.8%
5 | 8812 | 6.9%
4 | 7074 | 5.5%
3 | 6555 | 5.1%
6 | 5803 | 4.5%
8 | 5304 | 4.1%
9 | 5067 | 4.0%

부속_대장_구분_코드
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

일반: 6348
집합: 3577
<NA>: 75

Length

Max length: 4
Median length: 2
Mean length: 2.015
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 집합
4th row: 집합
5th row: 일반

Common Values

Value | Count | Frequency (%)
일반 | 6348 | 63.5%
집합 | 3577 | 35.8%
<NA> | 75 | 0.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
일반 | 6348 | 63.5%
집합 | 3577 | 35.8%
na | 75 | 0.8%

부속_시군구_코드
Categorical

Distinct: 30
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

광진구: 894
성북구: 812
서초구: 782
서대문구: 758
종로구: 588
Other values (25): 6166

Length

Max length: 4
Median length: 3
Mean length: 3.1099
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 관악구
2nd row: 서초구
3rd row: 은평구
4th row: 영등포구
5th row: 동대문구

Common Values

Value | Count | Frequency (%)
광진구 | 894 | 8.9%
성북구 | 812 | 8.1%
서초구 | 782 | 7.8%
서대문구 | 758 | 7.6%
종로구 | 588 | 5.9%
강서구 | 574 | 5.7%
은평구 | 566 | 5.7%
동대문구 | 485 | 4.9%
중구 | 451 | 4.5%
용산구 | 441 | 4.4%
Other values (20) | 3649 | 36.5%

Length

[Figure: Histogram of lengths of the category]

부속_법정동_코드
Text

Distinct: 441
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.2294
Min length: 1

Characters and Unicode

Total characters: 32294
Distinct characters: 210
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 51
Unique (%): 0.5%

Sample

1st row: 남현동
2nd row: 내곡동
3rd row: 응암동
4th row: 영등포동
5th row: 답십리동

Value | Count | Frequency (%)
내곡동 | 524 | 5.3%
광장동 | 328 | 3.3%
정릉동 | 224 | 2.3%
북아현동 | 206 | 2.1%
불광동 | 187 | 1.9%
화양동 | 181 | 1.8%
개화동 | 147 | 1.5%
미아동 | 146 | 1.5%
면목동 | 128 | 1.3%
군자동 | 123 | 1.2%
Other values (430) | 7753 | 77.9%

Most occurring characters

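Value | Count | Frequency (%)
(not shown) | 9813 | 30.4%
(not shown) | 1097 | 3.4%
(not shown) | 903 | 2.8%
(not shown) | 791 | 2.4%
(not shown) | 610 | 1.9%
(not shown) | 541 | 1.7%
(not shown) | 538 | 1.7%
(not shown) | 464 | 1.4%
(not shown) | 434 | 1.3%
(not shown) | 402 | 1.2%
Other values (200) | 16701 | 51.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 31328 | 97.0%
Decimal Number | 913 | 2.8%
Space Separator | 53 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 9813 | 31.3%
(not shown) | 1097 | 3.5%
(not shown) | 903 | 2.9%
(not shown) | 791 | 2.5%
(not shown) | 610 | 1.9%
(not shown) | 541 | 1.7%
(not shown) | 538 | 1.7%
(not shown) | 464 | 1.5%
(not shown) | 434 | 1.4%
(not shown) | 402 | 1.3%
Other values (191) | 15735 | 50.2%

Decimal Number
Value | Count | Frequency (%)
2 | 282 | 30.9%
1 | 185 | 20.3%
3 | 172 | 18.8%
5 | 133 | 14.6%
4 | 65 | 7.1%
6 | 53 | 5.8%
7 | 22 | 2.4%
8 | 1 | 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 53 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 31328 | 97.0%
Common | 966 | 3.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not shown) | 9813 | 31.3%
(not shown) | 1097 | 3.5%
(not shown) | 903 | 2.9%
(not shown) | 791 | 2.5%
(not shown) | 610 | 1.9%
(not shown) | 541 | 1.7%
(not shown) | 538 | 1.7%
(not shown) | 464 | 1.5%
(not shown) | 434 | 1.4%
(not shown) | 402 | 1.3%
Other values (191) | 15735 | 50.2%

Common
Value | Count | Frequency (%)
2 | 282 | 29.2%
1 | 185 | 19.2%
3 | 172 | 17.8%
5 | 133 | 13.8%
4 | 65 | 6.7%
6 | 53 | 5.5%
(space) | 53 | 5.5%
7 | 22 | 2.3%
8 | 1 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 31328 | 97.0%
ASCII | 966 | 3.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not shown) | 9813 | 31.3%
(not shown) | 1097 | 3.5%
(not shown) | 903 | 2.9%
(not shown) | 791 | 2.5%
(not shown) | 610 | 1.9%
(not shown) | 541 | 1.7%
(not shown) | 538 | 1.7%
(not shown) | 464 | 1.5%
(not shown) | 434 | 1.4%
(not shown) | 402 | 1.3%
Other values (191) | 15735 | 50.2%

ASCII
Value | Count | Frequency (%)
2 | 282 | 29.2%
1 | 185 | 19.2%
3 | 172 | 17.8%
5 | 133 | 13.8%
4 | 65 | 6.7%
6 | 53 | 5.5%
(space) | 53 | 5.5%
7 | 22 | 2.3%
8 | 1 | 0.1%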

부속_대지_구분_코드
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

대지: 9637
(blank): 278
<NA>: 84
블록: 1

Length

Max length: 4
Median length: 2
Mean length: 1.989
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 대지
2nd row: 대지
3rd row: 대지
4th row: 대지
5th row: 대지

Common Values

Value | Count | Frequency (%)
대지 | 9637 | 96.4%
(blank) | 278 | 2.8%
<NA> | 84 | 0.8%
블록 | 1 | < 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
대지 | 9637 | 96.4%
(blank) | 278 | 2.8%
na | 84 | 0.8%
블록 | 1 | < 0.1%

Correlations

(variable) | 부속_대장_구분_코드 | 부속_시군구_코드 | 부속_대지_구분_코드
부속_대장_구분_코드 | 1.000 | 0.366 | 0.059
부속_시군구_코드 | 0.366 | 1.000 | 0.232
부속_대지_구분_코드 | 0.059 | 0.232 | 1.000

(variable) | 부속_대지_구분_코드 | 부속_시군구_코드 | 부속_대장_구분_코드
부속_대지_구분_코드 | 1.000 | 0.120 | 0.098
부속_시군구_코드 | 0.120 | 1.000 | 0.313
부속_대장_구분_코드 | 0.098 | 0.313 | 1.000

(variable) | 부속_대장_구분_코드 | 부속_시군구_코드 | 부속_대지_구분_코드
부속_대장_구분_코드 | 1.000 | 0.313 | 0.098
부속_시군구_코드 | 0.313 | 1.000 | 0.120
부속_대지_구분_코드 | 0.098 | 0.120 | 1.000

Missing values

[Figure: count] A simple visualization of nullity by column.
[Figure: matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
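
The report counts zero missing cells, yet the value tables list a literal `<NA>` entry and a blank entry. The sketch below checks, from the counts reported for 부속_대장_구분_코드, that `<NA>` behaves like an ordinary 4-character string, and shows one way to flag such placeholders before computing nullity. The placeholder set is an assumption (the exact blank value, empty or a single space, is not visible in the report), and `is_placeholder` is a hypothetical helper.

```python
# Assumption: these strings stand in for absent data rather than real categories.
PLACEHOLDERS = {"<NA>", "", " "}

def is_placeholder(value):
    """True for None or for a string assumed to encode a missing value."""
    return value is None or value in PLACEHOLDERS

# Value counts reported for 부속_대장_구분_코드.
counts = {"일반": 6348, "집합": 3577, "<NA>": 75}
total = sum(counts.values())

# Counting "<NA>" as a 4-character string gives the mean length for this column.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

# Rows that would count as missing if placeholders were converted to nulls.
placeholder_rows = sum(n for value, n in counts.items() if is_placeholder(value))

print(mean_length, placeholder_rows, f"{placeholder_rows / total:.2%}")
```

If the placeholders were mapped to real nulls before profiling, the Missing figures and the nullity plots would presumably reflect these rows.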

Sample

(index) | 관리_부속_지번_PK | 관리_건축물대장_PK | 부속_대장_구분_코드 | 부속_시군구_코드 | 부속_법정동_코드 | 부속_대지_구분_코드
71614 | 11620-1000000000000000596578 | 11620-1000000000000000659900 | 일반 | 관악구 | 남현동 | 대지
81943 | 11650-100063467 | 11650-100284701 | 일반 | 서초구 | 내곡동 | 대지
34815 | 11380-100025825 | 11380-100285515 | 집합 | 은평구 | 응암동 | 대지
95221 | 11560-100010627 | 11560-100218676 | 집합 | 영등포구 | 영등포동 | 대지
45590 | 11230-5274 | 11230-36131 | 일반 | 동대문구 | 답십리동 | 대지
13569 | 11290-100025471 | 11290-100265106 | 일반 | 성북구 | 종암동 | 대지
53326 | 11680-2353 | 11680-1821 | 집합 | 강남구 | 도곡동 | (blank)
25942 | 11440-7086 | 11440-32432 | 집합 | 마포구 | 중동 | 대지
55545 | 11215-100027037 | 11215-100267291 | 집합 | 광진구 | 중곡동 | 대지
56232 | 11290-4326 | 11290-21469 | 일반 | 성북구 | 정릉동 | 대지

(index) | 관리_부속_지번_PK | 관리_건축물대장_PK | 부속_대장_구분_코드 | 부속_시군구_코드 | 부속_법정동_코드 | 부속_대지_구분_코드
62917 | 11440-3818 | 11440-18539 | <NA> | 마포구 | (blank) | <NA>
56484 | 11260-450 | 11260-13147 | 일반 | 중랑구 | 면목동 | 대지
69717 | 11215-100003360 | 11215-15241 | 일반 | 광진구 | 광장동 | 대지
7247 | 11620-2937 | 11620-28873 | 일반 | 관악구 | 봉천동 | 대지
80484 | 11350-100007689 | 11350-100201712 | 집합 | 노원구 | 상계동 | 대지
55730 | 11650-100027979 | 11650-100284563 | 일반 | 서초구 | 내곡동 | 대지
87148 | 11215-2411 | 11215-15222 | 일반 | 광진구 | 광장동 | 대지
93054 | 11380-2600 | 11380-12075 | 집합 | 은평구 | 불광동 | 대지
30029 | 11650-3825 | 11650-15518 | 일반 | 서초구 | 서초동 | 대지
34846 | 11740-100019777 | 11740-100274688 | 일반 | 강동구 | 고덕동 | 대지
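
In the sample rows, both key columns share a five-digit prefix before the hyphen, and that prefix appears to track the district in 부속_시군구_코드. The sketch below splits the keys and checks the mapping on a subset of the sample rows. It only covers these rows, so whether the pattern holds for the full dataset is an open assumption. `split_key` is a hypothetical helper.

```python
from collections import defaultdict

def split_key(pk):
    """Split a key such as '11230-5274' into (prefix, serial) at the first hyphen."""
    prefix, _, serial = pk.partition("-")
    return prefix, serial

# (관리_부속_지번_PK, 부속_시군구_코드) pairs copied from the sample rows.
sample_rows = [
    ("11650-100063467", "서초구"), ("11380-100025825", "은평구"),
    ("11560-100010627", "영등포구"), ("11230-5274", "동대문구"),
    ("11290-100025471", "성북구"), ("11680-2353", "강남구"),
    ("11440-7086", "마포구"), ("11215-100027037", "광진구"),
    ("11290-4326", "성북구"), ("11440-3818", "마포구"),
    ("11260-450", "중랑구"), ("11215-100003360", "광진구"),
    ("11620-2937", "관악구"), ("11350-100007689", "노원구"),
    ("11650-100027979", "서초구"), ("11215-2411", "광진구"),
    ("11380-2600", "은평구"), ("11650-3825", "서초구"),
    ("11740-100019777", "강동구"),
]

# Collect every district seen for each key prefix.
districts_by_prefix = defaultdict(set)
for pk, district in sample_rows:
    prefix, _serial = split_key(pk)
    districts_by_prefix[prefix].add(district)

# True when each prefix maps to exactly one district in this sample.
consistent = all(len(names) == 1 for names in districts_by_prefix.values())
print(consistent, len(districts_by_prefix))
```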