Overview

Dataset statistics

Number of variables: 5
Number of observations: 4934
Missing cells: 224
Missing cells (%): 0.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 197.7 KiB
Average record size in memory: 41.0 B
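
The percentage and per-record figures in these statistics follow from the raw counts. A minimal Python sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations), that all 224 missing cells sit in a single column (as the Missing alert for 구·읍·면·동·리 구분 indicates), and that 1 KiB = 1024 B. The variable names are illustrative only.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 5
n_observations = 4934
missing_cells = 224
total_size_kib = 197.7

# Assumption: the overall missing share is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: all missing cells belong to one column, so its own share
# is taken over the number of observations instead.
column_missing_pct = 100 * missing_cells / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Missing in the affected column (%): {column_missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The same counts give two different percentages depending on the denominator, which is why the overview and the per-column alert report different values.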

Variable types

Numeric: 1
Text: 3
Categorical: 1

Dataset

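Description: Information on North Korea's administrative divisions. Columns: 연번 (serial number), 행정구역명 (administrative division name), 시·도 구분 (province/city level), 시·군·구역·지구 구분 (city/county/district/zone level), 구·읍·면·동·리 구분 (ward/town/township/neighborhood/village level).
Author: 법무부 (Ministry of Justice)
URL: https://www.data.go.kr/data/15042255/fileData.do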

Alerts

연번 is highly overall correlated with 시·도 구분 (High correlation)
시·도 구분 is highly overall correlated with 연번 (High correlation)
구·읍·면·동·리 구분 has 224 (4.5%) missing values (Missing)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 01:05:31.083219
Analysis finished: 2023-12-12 01:05:32.065921
Duration: 0.98 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 4934
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2467.5
Minimum: 1
Maximum: 4934
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 43.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 247.65
Q1: 1234.25
Median: 2467.5
Q3: 3700.75
95-th percentile: 4687.35
Maximum: 4934
Range: 4933
Interquartile range (IQR): 2466.5

Descriptive statistics

Standard deviation: 1424.4674
Coefficient of variation (CV): 0.57729177
Kurtosis: -1.2
Mean: 2467.5
Median Absolute Deviation (MAD): 1233.5
Skewness: 0
Sum: 12174645
Variance: 2029107.5
Monotonicity: Strictly increasing
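
The quantile and descriptive statistics for 연번 are what one would expect from a plain run of consecutive integers. The sketch below assumes the column is exactly 1, 2, ..., 4934, which is consistent with the reported minimum, maximum, distinct count, and sum but is not stated outright in the report. It recomputes several figures with Python's standard `statistics` module as a cross-check. It is not the profiler's own implementation.

```python
import math
import statistics

# Assumption: 연번 is the consecutive integer sequence 1..4934.
values = list(range(1, 4935))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = math.sqrt(variance)
cv = std_dev / mean

# The "inclusive" method interpolates linearly between observed values,
# treating the data minimum and maximum as the 0th and 100th percentiles.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(f"Sum: {total}")
print(f"Mean: {mean}, Variance: {variance}, Std: {std_dev:.4f}, CV: {cv:.8f}")
print(f"Q1: {q1}, Median: {median}, Q3: {q3}, IQR: {iqr}")
print(f"MAD: {mad}")
```

The reported variance matches the sample (n - 1) form rather than the population form, which is worth knowing when comparing these figures against other tools.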
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | < 0.1%
3289 | 1 | < 0.1%
3296 | 1 | < 0.1%
3295 | 1 | < 0.1%
3294 | 1 | < 0.1%
3293 | 1 | < 0.1%
3292 | 1 | < 0.1%
3291 | 1 | < 0.1%
3290 | 1 | < 0.1%
3288 | 1 | < 0.1%
Other values (4924) | 4924 | 99.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4934 | 1 | < 0.1%
4933 | 1 | < 0.1%
4932 | 1 | < 0.1%
4931 | 1 | < 0.1%
4930 | 1 | < 0.1%
4929 | 1 | < 0.1%
4928 | 1 | < 0.1%
4927 | 1 | < 0.1%
4926 | 1 | < 0.1%
4925 | 1 | < 0.1%

행정구역명
Text

Distinct: 4929
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 38.7 KiB

Length

Max length: 27
Median length: 13
Mean length: 13.315363
Min length: 4
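
Mean length can be cross-checked against the character totals reported for each text column. The sketch below assumes mean length is total characters divided by the number of non-missing values. The counts are copied from this report, with figures attributed to columns by the order in which the variables appear, and the dictionary layout is illustrative.

```python
n_observations = 4934

# (total characters, missing values) per text column, as reported.
# Assumption: figures are attributed to columns by their order in the report.
text_columns = {
    "행정구역명": (65698, 0),
    "시·군·구역·지구 구분": (22174, 0),
    "구·읍·면·동·리 구분": (20007, 224),
}

mean_lengths = {}
for name, (total_chars, missing) in text_columns.items():
    # Assumption: missing values are excluded from the denominator.
    mean_lengths[name] = total_chars / (n_observations - missing)

for name, mean_len in mean_lengths.items():
    print(f"{name}: {mean_len:.6f}")
```

Under that assumption the three results agree with the reported mean lengths (13.315363, 4.4941224, and 4.2477707), which suggests the 224 missing values are left out of the length statistics rather than counted as zero-length strings.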

Characters and Unicode

Total characters: 65698
Distinct characters: 348
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
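
As an illustration of those character properties, the sketch below counts the characters of one sample value by Unicode general category using Python's standard `unicodedata` module. The helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical names introduced here, and this is not necessarily how the profiler computes its own tables.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(text: str) -> Counter:
    """Count the characters in `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        # Fall back to the two-letter code for categories not listed above.
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = "강원도 고산군 고산읍"  # third sample row of 행정구역명
print(category_counts(sample))
```

Hangul syllables fall under Other Letter and the separating spaces under Space Separator, which is why those two categories dominate the category tables in this report.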

Unique

Unique: 4924
Unique (%): 99.8%

Sample

1st row 강원도
2nd row 강원도 고산군
3rd row 강원도 고산군 고산읍
4th row 강원도 고산군 광명리
5th row 강원도 고산군 구령리
Value | Count | Frequency (%)
함경남도 | 702 | 4.8%
평안북도 | 651 | 4.4%
평안남도 | 521 | 3.5%
황해남도 | 497 | 3.4%
강원도 | 487 | 3.3%
평양특별시 | 427 | 2.9%
함경북도 | 419 | 2.8%
황해북도 | 362 | 2.5%
자강도 | 353 | 2.4%
양강도 | 263 | 1.8%
Other values (3693) | 10035 | 68.2%

Most occurring characters

ValueCountFrequency (%)
14717
22.4%
4364
 
6.6%
3363
 
5.1%
3276
 
5.0%
2135
 
3.2%
2014
 
3.1%
1876
 
2.9%
1787
 
2.7%
1562
 
2.4%
1451
 
2.2%
Other values (338) 29153
44.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 50918 | 77.5%
Space Separator | 14717 | 22.4%
Open Punctuation | 29 | < 0.1%
Close Punctuation | 29 | < 0.1%
Decimal Number | 5 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4364
 
8.6%
3363
 
6.6%
3276
 
6.4%
2135
 
4.2%
2014
 
4.0%
1876
 
3.7%
1787
 
3.5%
1562
 
3.1%
1451
 
2.8%
1411
 
2.8%
Other values (332) 27679
54.4%
Decimal Number
ValueCountFrequency (%)
1 3
60.0%
0 1
 
20.0%
5 1
 
20.0%
Space Separator
ValueCountFrequency (%)
14717
100.0%
Open Punctuation
ValueCountFrequency (%)
( 29
100.0%
Close Punctuation
ValueCountFrequency (%)
) 29
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 50918 | 77.5%
Common | 14780 | 22.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4364
 
8.6%
3363
 
6.6%
3276
 
6.4%
2135
 
4.2%
2014
 
4.0%
1876
 
3.7%
1787
 
3.5%
1562
 
3.1%
1451
 
2.8%
1411
 
2.8%
Other values (332) 27679
54.4%
Common
ValueCountFrequency (%)
14717
99.6%
( 29
 
0.2%
) 29
 
0.2%
1 3
 
< 0.1%
0 1
 
< 0.1%
5 1
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 50918 | 77.5%
ASCII | 14780 | 22.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14717
99.6%
( 29
 
0.2%
) 29
 
0.2%
1 3
 
< 0.1%
0 1
 
< 0.1%
5 1
 
< 0.1%
Hangul
ValueCountFrequency (%)
4364
 
8.6%
3363
 
6.6%
3276
 
6.4%
2135
 
4.2%
2014
 
4.0%
1876
 
3.7%
1787
 
3.5%
1562
 
3.1%
1451
 
2.8%
1411
 
2.8%
Other values (332) 27679
54.4%

시·도 구분
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 38.7 KiB
함경남도: 701
평안북도: 651
평안남도: 521
황해남도: 496
강원도: 487
Other values (13): 2078

Length

Max length: 6
Median length: 6
Mean length: 5.3145521
Min length: 4

Unique

Unique: 4
Unique (%): 0.1%

Sample

1st row 강원도
2nd row 강원도
3rd row 강원도
4th row 강원도
5th row 강원도

Common Values

Value | Count | Frequency (%)
함경남도 | 701 | 14.2%
평안북도 | 651 | 13.2%
평안남도 | 521 | 10.6%
황해남도 | 496 | 10.1%
강원도 | 487 | 9.9%
평양특별시 | 427 | 8.7%
함경북도 | 418 | 8.5%
황해북도 | 361 | 7.3%
자강도 | 353 | 7.2%
양강도 | 263 | 5.3%
Other values (8) | 256 | 5.2%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
함경남도 | 702 | 14.2%
평안북도 | 651 | 13.2%
평안남도 | 521 | 10.6%
황해남도 | 497 | 10.1%
강원도 | 487 | 9.9%
평양특별시 | 427 | 8.7%
함경북도 | 419 | 8.5%
황해북도 | 362 | 7.3%
자강도 | 353 | 7.2%
양강도 | 263 | 5.3%
Other values (4) | 252 | 5.1%

시·군·구역·지구 구분
Text

Distinct: 284
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Memory size: 38.7 KiB

Length

Max length: 10
Median length: 4
Mean length: 4.4941224
Min length: 1

Characters and Unicode

Total characters: 22174
Distinct characters: 149
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 74
Unique (%): 1.5%

Sample

1st row
2nd row 고산군
3rd row 고산군
4th row 고산군
5th row 고산군
Value | Count | Frequency (%)
청진시 | 117 | 2.3%
함흥시 | 81 | 1.6%
단천시 | 79 | 1.6%
원산시 | 61 | 1.2%
신의주시 | 59 | 1.2%
금야군 | 56 | 1.1%
흥남시 | 50 | 1.0%
정평군 | 46 | 0.9%
구성시 | 44 | 0.9%
북청군 | 43 | 0.8%
Other values (224) | 4423 | 87.4%

Most occurring characters

ValueCountFrequency (%)
6089
27.5%
3331
15.0%
1098
 
5.0%
752
 
3.4%
750
 
3.4%
561
 
2.5%
503
 
2.3%
393
 
1.8%
354
 
1.6%
321
 
1.4%
Other values (139) 8022
36.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 16085 | 72.5%
Space Separator | 6089 | 27.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3331
20.7%
1098
 
6.8%
752
 
4.7%
750
 
4.7%
561
 
3.5%
503
 
3.1%
393
 
2.4%
354
 
2.2%
321
 
2.0%
270
 
1.7%
Other values (138) 7752
48.2%
Space Separator
ValueCountFrequency (%)
6089
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 16085 | 72.5%
Common | 6089 | 27.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3331
20.7%
1098
 
6.8%
752
 
4.7%
750
 
4.7%
561
 
3.5%
503
 
3.1%
393
 
2.4%
354
 
2.2%
321
 
2.0%
270
 
1.7%
Other values (138) 7752
48.2%
Common
ValueCountFrequency (%)
6089
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 16085 | 72.5%
ASCII | 6089 | 27.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6089
100.0%
Hangul
ValueCountFrequency (%)
3331
20.7%
1098
 
6.8%
752
 
4.7%
750
 
4.7%
561
 
3.5%
503
 
3.1%
393
 
2.4%
354
 
2.2%
321
 
2.0%
270
 
1.7%
Other values (138) 7752
48.2%

구·읍·면·동·리 구분
Text

Distinct: 3631
Distinct (%): 77.1%
Missing: 224
Missing (%): 4.5%
Memory size: 38.7 KiB

Length

Max length: 18
Median length: 4
Mean length: 4.2477707
Min length: 1

Characters and Unicode

Total characters: 20007
Distinct characters: 344
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3064
Unique (%): 65.1%

Sample

1st row 고산읍
2nd row 광명리
3rd row 구령리
4th row 구읍리
5th row 금리
ValueCountFrequency (%)
66
 
1.4%
22
 
0.5%
신흥리 19
 
0.4%
은덕군 18
 
0.4%
역전동 17
 
0.4%
신풍리 12
 
0.2%
신성리 12
 
0.2%
룡산리 12
 
0.2%
오봉리 11
 
0.2%
로동자구 11
 
0.2%
Other values (3466) 4635
95.9%

Most occurring characters

ValueCountFrequency (%)
4334
21.7%
3216
 
16.1%
1639
 
8.2%
391
 
2.0%
358
 
1.8%
299
 
1.5%
299
 
1.5%
289
 
1.4%
275
 
1.4%
243
 
1.2%
Other values (334) 8664
43.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15610 | 78.0%
Space Separator | 4334 | 21.7%
Close Punctuation | 29 | 0.1%
Open Punctuation | 29 | 0.1%
Decimal Number | 5 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3216
 
20.6%
1639
 
10.5%
391
 
2.5%
358
 
2.3%
299
 
1.9%
299
 
1.9%
289
 
1.9%
275
 
1.8%
243
 
1.6%
229
 
1.5%
Other values (328) 8372
53.6%
Decimal Number
ValueCountFrequency (%)
1 3
60.0%
0 1
 
20.0%
5 1
 
20.0%
Space Separator
ValueCountFrequency (%)
4334
100.0%
Close Punctuation
ValueCountFrequency (%)
) 29
100.0%
Open Punctuation
ValueCountFrequency (%)
( 29
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15610 | 78.0%
Common | 4397 | 22.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3216
 
20.6%
1639
 
10.5%
391
 
2.5%
358
 
2.3%
299
 
1.9%
299
 
1.9%
289
 
1.9%
275
 
1.8%
243
 
1.6%
229
 
1.5%
Other values (328) 8372
53.6%
Common
ValueCountFrequency (%)
4334
98.6%
) 29
 
0.7%
( 29
 
0.7%
1 3
 
0.1%
0 1
 
< 0.1%
5 1
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15610 | 78.0%
ASCII | 4397 | 22.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4334
98.6%
) 29
 
0.7%
( 29
 
0.7%
1 3
 
0.1%
0 1
 
< 0.1%
5 1
 
< 0.1%
Hangul
ValueCountFrequency (%)
3216
 
20.6%
1639
 
10.5%
391
 
2.5%
358
 
2.3%
299
 
1.9%
299
 
1.9%
289
 
1.9%
275
 
1.8%
243
 
1.6%
229
 
1.5%
Other values (328) 8372
53.6%

Interactions


Correlations

 | 연번 | 시·도 구분
연번 | 1.000 | 0.972
시·도 구분 | 0.972 | 1.000

 | 연번 | 시·도 구분
연번 | 1.000 | 0.859
시·도 구분 | 0.859 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

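First rows

 | 연번 | 행정구역명 | 시·도 구분 | 시·군·구역·지구 구분 | 구·읍·면·동·리 구분
0 | 1 | 강원도 | 강원도 |  | <NA>
1 | 2 | 강원도 고산군 | 강원도 | 고산군 | <NA>
2 | 3 | 강원도 고산군 고산읍 | 강원도 | 고산군 | 고산읍
3 | 4 | 강원도 고산군 광명리 | 강원도 | 고산군 | 광명리
4 | 5 | 강원도 고산군 구령리 | 강원도 | 고산군 | 구령리
5 | 6 | 강원도 고산군 구읍리 | 강원도 | 고산군 | 구읍리
6 | 7 | 강원도 고산군 금리 | 강원도 | 고산군 | 금리
7 | 8 | 강원도 고산군 금천리 | 강원도 | 고산군 | 금천리
8 | 9 | 강원도 고산군 금풍리 | 강원도 | 고산군 | 금풍리
9 | 10 | 강원도 고산군 남산리 | 강원도 | 고산군 | 남산리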
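
Last rows

 | 연번 | 행정구역명 | 시·도 구분 | 시·군·구역·지구 구분 | 구·읍·면·동·리 구분
4924 | 4925 | 황해북도 황주군 장사리 | 황해북도 | 황주군 | 장사리
4925 | 4926 | 황해북도 황주군 장천리 | 황해북도 | 황주군 | 장천리
4926 | 4927 | 황해북도 황주군 천주리 | 황해북도 | 황주군 | 천주리
4927 | 4928 | 황해북도 황주군 철도리 | 황해북도 | 황주군 | 철도리
4928 | 4929 | 황해북도 황주군 청룡리 | 황해북도 | 황주군 | 청룡리
4929 | 4930 | 황해북도 황주군 청운리 | 황해북도 | 황주군 | 청운리
4930 | 4931 | 황해북도 황주군 침촌리 | 황해북도 | 황주군 | 침촌리
4931 | 4932 | 황해북도 황주군 포남리 | 황해북도 | 황주군 | 포남리
4932 | 4933 | 황해북도 황주군 황주읍 | 황해북도 | 황주군 | 황주읍
4933 | 4934 | 황해북도 황주군 흑교리 | 황해북도 | 황주군 | 흑교리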