Overview

Dataset statistics

Number of variables: 7
Number of observations: 500
Missing cells: 153
Missing cells (%): 4.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.4 KiB
Average record size in memory: 58.3 B
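
The missing-cell percentage can be checked from the counts above. The sketch below assumes the percentage is taken over all cells (variables × observations); the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 500
missing_cells = 153

# Assumption: the percentage is relative to the total number of cells.
total_cells = n_variables * n_observations
missing_pct = missing_cells / total_cells * 100

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

Under that assumption the result rounds to the 4.4% reported above.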

Variable types

Numeric: 1
Categorical: 3
Text: 3

Dataset

Description: Sample data
Author: KB국민은행 (KB Kookmin Bank)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=1

Alerts

(unnamed variable) is highly imbalanced (83.4%) [Imbalance]
동읍면 has 8 (1.6%) missing values [Missing]
(unnamed variable) has 145 (29.0%) missing values [Missing]
법정동코드 has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 14:51:16.514600
Analysis finished: 2023-12-10 14:51:17.365609
Duration: 0.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
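
A report like this one could be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used: the CSV path and the report title are hypothetical, and only the dataset metadata is taken from the Dataset section above.

```python
# Dataset metadata as listed in the "Dataset" section of this report.
DATASET_META = {
    "description": "샘플 데이터",
    "author": "KB국민은행",
    "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=1",
}


def build_report(csv_path):
    """Build a profiling report for the CSV at csv_path (hypothetical input)."""
    # Imported lazily so the metadata can be inspected without the libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # the real file may need an explicit encoding
    # The title is an assumption; the original report title is not shown.
    return ProfileReport(df, title="Sample data profile", dataset=DATASET_META)


# Usage (not executed here; "sample.csv" is a placeholder):
# build_report("sample.csv").to_file("report.html")
```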

Variables

법정동코드
Real number (ℝ)

UNIQUE 

Distinct: 500
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.3360896 × 10^9
Minimum: 1.1170109 × 10^9
Maximum: 5.0130253 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1.1170109 × 10^9
5-th percentile: 2.7110157 × 10^9
Q1: 4.223025 × 10^9
Median: 4.4827817 × 10^9
Q3: 4.717039 × 10^9
95-th percentile: 4.8850321 × 10^9
Maximum: 5.0130253 × 10^9
Range: 3.8960144 × 10^9
Interquartile range (IQR): 4.94014 × 10^8

Descriptive statistics

Standard deviation: 6.799384 × 10^8
Coefficient of variation (CV): 0.15680912
Kurtosis: 8.1440326
Mean: 4.3360896 × 10^9
Median Absolute Deviation (MAD): 2.3725085 × 10^8
Skewness: -2.6750845
Sum: 2.1680448 × 10^12
Variance: 4.6231622 × 10^17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
4717025021              1  0.2%
4420010500              1  0.2%
4423036029              1  0.2%
4372037023              1  0.2%
4575035531              1  0.2%
4136025326              1  0.2%
4223036026              1  0.2%
4514012000              1  0.2%
3611034047              1  0.2%
4672039027              1  0.2%
Other values (490)    490  98.0%

Minimum 10 values

Value               Count  Frequency (%)
1117010900              1  0.2%
1117013100              1  0.2%
1123010900              1  0.2%
1141010500              1  0.2%
1141011700              1  0.2%
1144000000              1  0.2%
1147010300              1  0.2%
1156012200              1  0.2%
1159010600              1  0.2%
2614010100              1  0.2%

Maximum 10 values

Value               Count  Frequency (%)
5013025321              1  0.2%
5013012000              1  0.2%
5013000000              1  0.2%
5011025326              1  0.2%
5011012300              1  0.2%
4972025026              1  0.2%
4972025023              1  0.2%
4971025900              1  0.2%
4971025624              1  0.2%
4913011700              1  0.2%

시도
Categorical

Distinct: 18
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

경상북도               82
전라남도               67
경상남도               62
경기도                 61
충청남도               52
Other values (13)     176

Length

Max length: 7
Median length: 4
Mean length: 3.938
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전라남도
2nd row: 강원도
3rd row: 전라북도
4th row: 강원도
5th row: 경상남도

Common Values

Value               Count  Frequency (%)
경상북도               82  16.4%
전라남도               67  13.4%
경상남도               62  12.4%
경기도                 61  12.2%
충청남도               52  10.4%
강원도                 40  8.0%
전라북도               37  7.4%
충청북도               35  7.0%
서울특별시             12  2.4%
부산광역시              9  1.8%
Other values (8)       43  8.6%

Length

Histogram of lengths of the category


Text

Distinct: 160
Distinct (%): 32.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.006
Min length: 2

Characters and Unicode

Total characters: 1503
Distinct characters: 116
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 7.0%

Sample

1st row: 북제주군
2nd row: 청주시
3rd row: 제주시
4th row: 안산시
5th row: 중구

Value               Count  Frequency (%)
청주시                 12  2.4%
중구                   10  2.0%
포항시                 10  2.0%
창원시                  9  1.8%
영천시                  9  1.8%
상주시                  8  1.6%
제천시                  8  1.6%
화성시                  7  1.4%
경주시                  7  1.4%
예산군                  7  1.4%
Other values (150)    413  82.6%

Most occurring characters

Value               Count  Frequency (%)
                      235  15.6%
                      234  15.6%
                       78  5.2%
                       61  4.1%
                       49  3.3%
                       48  3.2%
                       46  3.1%
                       36  2.4%
                       32  2.1%
                       28  1.9%
Other values (106)    656  43.6%

Most occurring categories

Value               Count  Frequency (%)
Other Letter         1503  100.0%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                      235  15.6%
                      234  15.6%
                       78  5.2%
                       61  4.1%
                       49  3.3%
                       48  3.2%
                       46  3.1%
                       36  2.4%
                       32  2.1%
                       28  1.9%
Other values (106)    656  43.6%

Most occurring scripts

Value               Count  Frequency (%)
Hangul               1503  100.0%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                      235  15.6%
                      234  15.6%
                       78  5.2%
                       61  4.1%
                       49  3.3%
                       48  3.2%
                       46  3.1%
                       36  2.4%
                       32  2.1%
                       28  1.9%
Other values (106)    656  43.6%

Most occurring blocks

Value               Count  Frequency (%)
Hangul               1503  100.0%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                      235  15.6%
                      234  15.6%
                       78  5.2%
                       61  4.1%
                       49  3.3%
                       48  3.2%
                       46  3.1%
                       36  2.4%
                       32  2.1%
                       28  1.9%
Other values (106)    656  43.6%


Categorical

IMBALANCE 

Distinct: 19
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

<NA>                  461
남구                    5
상당구                  4
북구                    3
흥덕구                  3
Other values (14)      24

Length

Max length: 5
Median length: 4
Mean length: 3.92
Min length: 2

Unique

Unique: 8
Unique (%): 1.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value               Count  Frequency (%)
<NA>                  461  92.2%
남구                    5  1.0%
상당구                  4  0.8%
북구                    3  0.6%
흥덕구                  3  0.6%
서원구                  3  0.6%
처인구                  3  0.6%
동남구                  3  0.6%
마산합포구              3  0.6%
청원구                  2  0.4%
Other values (9)       10  2.0%
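
The 83.4% imbalance alert for this variable can be approximated from the counts above. The sketch below assumes the score is one minus the Shannon entropy of the value counts normalised by log2 of the number of distinct values; that formula is an assumption here, not something stated in the report. The nine "Other values" are expanded as one value with count 2 and eight with count 1, which is consistent with "Unique: 8" and the total of 10.

```python
from math import log2

# Counts from the Common Values table; the last nine entries expand
# "Other values (9) 10" under the assumption described in the lead-in.
counts = [461, 5, 4, 3, 3, 3, 3, 3, 3, 2] + [2] + [1] * 8


def imbalance_score(value_counts):
    """Return 1 - normalised Shannon entropy (0 = balanced, 1 = one value only)."""
    total = sum(value_counts)
    entropy = -sum(c / total * log2(c / total) for c in value_counts)
    return 1 - entropy / log2(len(value_counts))


score = imbalance_score(counts)
print(f"imbalance score: {score:.3f}")
```

Under these assumptions the score comes out close to the 83.4% shown in the Alerts section.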

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
na                    461  92.2%
남구                    5  1.0%
상당구                  4  0.8%
북구                    3  0.6%
흥덕구                  3  0.6%
서원구                  3  0.6%
처인구                  3  0.6%
동남구                  3  0.6%
마산합포구              3  0.6%
서북구                  2  0.4%
Other values (9)       10  2.0%

동읍면
Text

MISSING 

Distinct: 415
Distinct (%): 84.3%
Missing: 8
Missing (%): 1.6%
Memory size: 4.0 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.0223577
Min length: 2

Characters and Unicode

Total characters: 1487
Distinct characters: 209
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 356
Unique (%): 72.4%
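
The Distinct (%) and Unique (%) figures for 동읍면 appear to be computed over non-missing rows rather than all 500 observations. The sketch below checks that reading against the numbers reported for this variable; the helper name `pct` is illustrative.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(part / whole * 100, 1)


n_rows = 500
missing = 8
non_missing = n_rows - missing  # 492 rows with a value

distinct_pct = pct(415, non_missing)  # Distinct: 415
unique_pct = pct(356, non_missing)    # Unique: 356

# For comparison: the same count over all rows gives a different figure.
distinct_pct_all_rows = pct(415, n_rows)

print(distinct_pct, unique_pct, distinct_pct_all_rows)
```

Dividing by the non-missing count reproduces the reported 84.3% and 72.4%, whereas dividing by all rows does not.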

Sample

1st row: 가례면
2nd row: 한경면
3rd row: 용궁면
4th row: 반남면
5th row: 덕적면

Value               Count  Frequency (%)
성산읍                  5  1.0%
동면                    4  0.8%
남면                    4  0.8%
입장면                  4  0.8%
모동면                  3  0.6%
옥천읍                  3  0.6%
가덕면                  3  0.6%
현도면                  3  0.6%
광석면                  3  0.6%
옥산면                  3  0.6%
Other values (405)    457  92.9%

Most occurring characters

Value               Count  Frequency (%)
                      337  22.7%
                      112  7.5%
                       76  5.1%
                       34  2.3%
                       33  2.2%
                       27  1.8%
                       24  1.6%
                       22  1.5%
                       21  1.4%
                       20  1.3%
Other values (199)    781  52.5%

Most occurring categories

Value               Count  Frequency (%)
Other Letter         1480  99.5%
Decimal Number          7  0.5%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                      337  22.8%
                      112  7.6%
                       76  5.1%
                       34  2.3%
                       33  2.2%
                       27  1.8%
                       24  1.6%
                       22  1.5%
                       21  1.4%
                       20  1.4%
Other values (195)    774  52.3%

Decimal Number

Value               Count  Frequency (%)
2                       3  42.9%
1                       2  28.6%
7                       1  14.3%
3                       1  14.3%

Most occurring scripts

Value               Count  Frequency (%)
Hangul               1480  99.5%
Common                  7  0.5%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                      337  22.8%
                      112  7.6%
                       76  5.1%
                       34  2.3%
                       33  2.2%
                       27  1.8%
                       24  1.6%
                       22  1.5%
                       21  1.4%
                       20  1.4%
Other values (195)    774  52.3%

Common

Value               Count  Frequency (%)
2                       3  42.9%
1                       2  28.6%
7                       1  14.3%
3                       1  14.3%

Most occurring blocks

Value               Count  Frequency (%)
Hangul               1480  99.5%
ASCII                   7  0.5%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                      337  22.8%
                      112  7.6%
                       76  5.1%
                       34  2.3%
                       33  2.2%
                       27  1.8%
                       24  1.6%
                       22  1.5%
                       21  1.4%
                       20  1.4%
Other values (195)    774  52.3%

ASCII

Value               Count  Frequency (%)
2                       3  42.9%
1                       2  28.6%
7                       1  14.3%
3                       1  14.3%


Text

MISSING 

Distinct: 341
Distinct (%): 96.1%
Missing: 145
Missing (%): 29.0%
Memory size: 4.0 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.9915493
Min length: 2

Characters and Unicode

Total characters: 1062
Distinct characters: 181
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
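
As a small illustration of those character properties, the sketch below classifies the characters of three sample values of this variable (천장리, 금호리, 창선1리) with Python's standard `unicodedata` module. The Hangul check uses the Hangul Syllables block range U+AC00 to U+D7A3; this is an illustrative stand-in and not necessarily how the profiling tool assigns scripts or blocks.

```python
import unicodedata
from collections import Counter

# Sample values of this variable, taken from its Sample list.
samples = ["천장리", "금호리", "창선1리"]

# General category of every character: "Lo" = Other Letter, "Nd" = Decimal Number.
categories = Counter(unicodedata.category(ch) for s in samples for ch in s)


def in_hangul_syllables(ch):
    """True if ch lies in the Hangul Syllables block (U+AC00..U+D7A3)."""
    return 0xAC00 <= ord(ch) <= 0xD7A3


hangul_count = sum(in_hangul_syllables(ch) for s in samples for ch in s)

print(categories, hangul_count)
```

The single digit in 창선1리 is what introduces the second category (Decimal Number) alongside Other Letter.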

Unique

Unique: 328
Unique (%): 92.4%

Sample

1st row: 천장리
2nd row: 금호리
3rd row: 대감리
4th row: 부연리
5th row: 창선1리

Value               Count  Frequency (%)
중리                    3  0.8%
신대리                  2  0.6%
신영리                  2  0.6%
신흥리                  2  0.6%
용정리                  2  0.6%
황곡리                  2  0.6%
학산리                  2  0.6%
고산리                  2  0.6%
매화리                  2  0.6%
평지리                  2  0.6%
Other values (331)    334  94.1%

Most occurring characters

Value               Count  Frequency (%)
                      355  33.4%
                       30  2.8%
                       23  2.2%
                       20  1.9%
                       19  1.8%
                       19  1.8%
                       15  1.4%
                       15  1.4%
                       15  1.4%
                       14  1.3%
Other values (171)    537  50.6%

Most occurring categories

Value               Count  Frequency (%)
Other Letter         1059  99.7%
Decimal Number          3  0.3%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                      355  33.5%
                       30  2.8%
                       23  2.2%
                       20  1.9%
                       19  1.8%
                       19  1.8%
                       15  1.4%
                       15  1.4%
                       15  1.4%
                       14  1.3%
Other values (170)    534  50.4%

Decimal Number

Value               Count  Frequency (%)
1                       3  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Hangul               1059  99.7%
Common                  3  0.3%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                      355  33.5%
                       30  2.8%
                       23  2.2%
                       20  1.9%
                       19  1.8%
                       19  1.8%
                       15  1.4%
                       15  1.4%
                       15  1.4%
                       14  1.3%
Other values (170)    534  50.4%

Common

Value               Count  Frequency (%)
1                       3  100.0%

Most occurring blocks

Value               Count  Frequency (%)
Hangul               1059  99.7%
ASCII                   3  0.3%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                      355  33.5%
                       30  2.8%
                       23  2.2%
                       20  1.9%
                       19  1.8%
                       19  1.8%
                       15  1.4%
                       15  1.4%
                       15  1.4%
                       14  1.3%
Other values (170)    534  50.4%

ASCII

Value               Count  Frequency (%)
1                       3  100.0%

조회대상여부
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

1                     405
0                      95

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value               Count  Frequency (%)
1                     405  81.0%
0                      95  19.0%

Length

Histogram of lengths of the category

Common Values (Plot)


Interactions


Correlations

              법정동코드  시도    (unnamed)  조회대상여부
법정동코드    1.000       0.203   0.754      0.061
시도          0.203       1.000   0.000      0.135
(unnamed)     0.754       0.000   1.000      0.372
조회대상여부  0.061       0.135   0.372      1.000

              시도    (unnamed)  조회대상여부
시도          1.000   0.000      0.104
(unnamed)     0.000   1.000      0.195
조회대상여부  0.104   0.195      1.000

              법정동코드  시도    (unnamed)  조회대상여부
법정동코드    1.000       0.083   0.308      0.050
시도          0.083       1.000   0.000      0.104
(unnamed)     0.308       0.000   1.000      0.195
조회대상여부  0.050       0.104   0.195      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
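
Nullity correlation can be sketched as the Pearson correlation between the "is missing" indicators of two columns. The toy columns below are hypothetical and only illustrate the idea; they are not rows from this dataset, and the function is not the tool's own implementation.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Undefined (division by zero) if either column has no missing values
    or is entirely missing.
    """
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical toy columns; None marks a missing value.
col_x = ["a", None, "c", None, "e", "f"]
col_y = ["p", None, "r", None, None, "u"]

same = nullity_correlation(col_x, col_x)     # identical missingness pattern
partial = nullity_correlation(col_x, col_y)  # overlapping but not identical

print(round(same, 3), round(partial, 3))
```

A value near 1 means the two columns tend to be missing in the same rows; a value near 0 means their missingness is unrelated.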

Sample

법정동코드시도구시군동읍면조회대상여부
04717025021전라남도북제주군<NA><NA>천장리0
14481036030강원도청주시<NA>가례면금호리1
24311425324전라북도제주시<NA>한경면대감리1
34615033024강원도안산시<NA>용궁면부연리1
44873032036경상남도중구<NA>반남면창선1리1
54421033027경상남도제천시<NA>덕적면기지시리0
64480036025경기도장수군<NA>노곡면<NA>1
74886036027충청남도포항시<NA>음봉면월산리0
84816025023전라남도경산시<NA>부석면<NA>1
94885038022전라북도용인시<NA>중앙동<NA>1
법정동코드시도구시군동읍면조회대상여부
4904476034023충청북도서구<NA>광석면광령1리1
4914481040021경상북도구례군<NA>동면전동리1
4924683037029경상북도종로구<NA>거동동읍리1
4934575035522경상남도군위군<NA>함안면<NA>1
4944427011100전라남도홍성군<NA><NA>하추리1
4954574032026충청북도봉화군<NA>정남면옥계리1
4964311325000경상북도영천시<NA>영광읍대판리1
4974882036028부산광역시강진군<NA>용평면<NA>1
4984223034024경상남도성남시<NA>옥천면산양리1
4994572035000경기도영등포구<NA>고령읍송포리1