Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 6534
Missing cells (%): 10.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
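
The missing-cell percentage above follows from the other figures in this block: missing cells divided by all cells (observations times variables). A minimal sketch of that arithmetic, where the function name `missing_cell_pct` is only illustrative and not part of the report:

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells among all cells, in percent."""
    total_cells = n_rows * n_cols
    return 100.0 * missing_cells / total_cells


# Figures taken from the dataset statistics above.
pct = missing_cell_pct(missing_cells=6534, n_rows=10000, n_cols=6)
print(f"{pct:.1f}%")  # 10.9%
```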

Variable types

Numeric: 1
Text: 4
Categorical: 1

Dataset

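Description: Postal codes together with province (도), city/county/district (시군구), neighborhood (동), and detailed address (상세주소), compiled to supply postal code data for the website and to publish the detailed addresses as public data.
Author: 동해시시설관리공단 (Donghae City Facilities Management Corporation)
URL: https://www.data.go.kr/data/15075517/fileData.do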

Alerts

고유번호 is highly overall correlated with [variable name missing] (High correlation)
[variable name missing] is highly overall correlated with 고유번호 (High correlation)
상세주소 has 6534 (65.3%) missing values (Missing)
고유번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 22:36:05.973053
Analysis finished: 2023-12-12 22:36:07.270811
Duration: 1.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
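
A report like this one can in principle be regenerated with ydata-profiling. The sketch below is not the command that produced this report: the CSV path, the encoding, and the report title are assumptions, and the original run may have used settings from config.json that are not shown here.

```python
def build_report(csv_path: str, out_path: str = "report.html",
                 encoding: str = "utf-8") -> None:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Third-party imports are kept local so the module loads without them.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    # The title is an assumption; the original report title is not known.
    report = ProfileReport(df, title="Postal code dataset profile")
    report.to_file(out_path)


# Example call (file name and encoding are hypothetical):
# build_report("postal_codes.csv", encoding="cp949")
```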

Variables

고유번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24921.046
Minimum: 14
Maximum: 49699
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 14
5th percentile: 2658.8
Q1: 12630.75
Median: 24939
Q3: 37295.5
95th percentile: 47231.4
Maximum: 49699
Range: 49685
Interquartile range (IQR): 24664.75
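
The range and IQR listed above are derived from the other quantile figures: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A short check using the reported values (the variable names are illustrative):

```python
# Reported quantile statistics for 고유번호.
minimum, q1, q3, maximum = 14, 12630.75, 37295.5, 49699

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range)  # 49685
print(iqr)          # 24664.75
```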

Descriptive statistics

Standard deviation: 14288.515
Coefficient of variation (CV): 0.57335135
Kurtosis: -1.1985577
Mean: 24921.046
Median Absolute Deviation (MAD): 12339.5
Skewness: 0.0045044353
Sum: 2.4921046 × 10^8
Variance: 2.0416167 × 10^8
Monotonicity: Not monotonic
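
Several of these descriptive statistics can be cross-checked against each other: the CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A small sketch using the rounded values printed in the report, so the last digits may differ slightly from the reported figures:

```python
n = 10000          # number of observations
mean = 24921.046   # reported mean
std = 14288.515    # reported standard deviation

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance from the standard deviation
total = mean * n       # sum from the mean and the row count

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
```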
Histogram with fixed size bins (bins=50)
Common values
Value                 Count   Frequency (%)
24172                 1       < 0.1%
49353                 1       < 0.1%
24439                 1       < 0.1%
36958                 1       < 0.1%
12326                 1       < 0.1%
19162                 1       < 0.1%
36420                 1       < 0.1%
40294                 1       < 0.1%
34261                 1       < 0.1%
12165                 1       < 0.1%
Other values (9990)   9990    99.9%
Minimum 10 values
Value   Count   Frequency (%)
14      1       < 0.1%
20      1       < 0.1%
26      1       < 0.1%
37      1       < 0.1%
49      1       < 0.1%
52      1       < 0.1%
55      1       < 0.1%
61      1       < 0.1%
65      1       < 0.1%
67      1       < 0.1%
Maximum 10 values
Value   Count   Frequency (%)
49699   1       < 0.1%
49698   1       < 0.1%
49693   1       < 0.1%
49691   1       < 0.1%
49686   1       < 0.1%
49685   1       < 0.1%
49680   1       < 0.1%
49678   1       < 0.1%
49674   1       < 0.1%
49667   1       < 0.1%

우편번호
Text

Distinct: 8486
Distinct (%): 84.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 70000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
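
The category counts in this part of the report can be approximated with Python's standard `unicodedata` module, which exposes the general category of each code point (`Nd` for Decimal Number, `Pd` for Dash Punctuation, `Sm` for Math Symbol). This is only a sketch of the idea on the five sample values shown in this report, not the profiler's own implementation; `category_counts` is an illustrative name.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)


# The five sample postal codes listed in this report.
samples = ["429-744", "703-840", "380-130", "718-702", "595-891"]
counts = category_counts(samples)
print(counts["Nd"], counts["Pd"])  # 30 5
```

The same call classifies the tilde operator (U+223C) that appears in the 상세주소 values as `Sm`, which matches the "Math Symbol" category reported for that column.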

Unique

Unique: 7315
Unique (%): 73.2%

Sample

1st row: 429-744
2nd row: 703-840
3rd row: 380-130
4th row: 718-702
5th row: 595-891
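
The length and character statistics for this column (every value 7 characters long, built from ten digits and a dash) suggest a fixed `ddd-ddd` layout. A hedged sketch of a format check under that assumption; the pattern and the function name `looks_like_postal_code` are inferred from the samples, not stated by the data provider.

```python
import re

# Assumed layout: three ASCII digits, a dash, three ASCII digits.
POSTAL_CODE = re.compile(r"\d{3}-\d{3}", re.ASCII)


def looks_like_postal_code(value: str) -> bool:
    """Return True if `value` matches the assumed ddd-ddd layout."""
    return POSTAL_CODE.fullmatch(value) is not None


samples = ["429-744", "703-840", "380-130", "718-702", "595-891"]
print(all(looks_like_postal_code(s) for s in samples))  # True
```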
Value                 Count   Frequency (%)
138-873               9       0.1%
701-819               8       0.1%
486-859               7       0.1%
476-809               7       0.1%
701-813               7       0.1%
482-869               6       0.1%
138-819               6       0.1%
706-808               6       0.1%
701-810               6       0.1%
702-843               5       < 0.1%
Other values (8476)   9933    99.3%

Most occurring characters

Value   Count   Frequency (%)
-       10000   14.3%
0       7625    10.9%
8       7507    10.7%
1       7152    10.2%
3       6297    9.0%
7       6288    9.0%
2       5618    8.0%
6       5420    7.7%
4       5392    7.7%
5       5321    7.6%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     60000   85.7%
Dash Punctuation   10000   14.3%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
0       7625    12.7%
8       7507    12.5%
1       7152    11.9%
3       6297    10.5%
7       6288    10.5%
2       5618    9.4%
6       5420    9.0%
4       5392    9.0%
5       5321    8.9%
9       3380    5.6%

Dash Punctuation
Value   Count   Frequency (%)
-       10000   100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   70000   100.0%

Most frequent character per script

Common
Value   Count   Frequency (%)
-       10000   14.3%
0       7625    10.9%
8       7507    10.7%
1       7152    10.2%
3       6297    9.0%
7       6288    9.0%
2       5618    8.0%
6       5420    7.7%
4       5392    7.7%
5       5321    7.6%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   70000   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
-       10000   14.3%
0       7625    10.9%
8       7507    10.7%
1       7152    10.2%
3       6297    9.0%
7       6288    9.0%
2       5618    8.0%
6       5420    7.7%
4       5392    7.7%
5       5321    7.6%


Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value               Count
경기                1593
서울                1491
경북                989
전남                799
경남                728
Other values (11)   4400

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기
2nd row: 대구
3rd row: 충북
4th row: 경북
5th row: 전북

Common Values

Value              Count   Frequency (%)
경기               1593    15.9%
서울               1491    14.9%
경북               989     9.9%
전남               799     8.0%
경남               728     7.3%
부산               670     6.7%
충남               639     6.4%
강원               540     5.4%
전북               536     5.4%
대구               512     5.1%
Other values (6)   1503    15.0%

Length

Histogram of lengths of the category

시군구
Text

Distinct: 225
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.2808
Min length: 2

Characters and Unicode

Total characters: 32808
Distinct characters: 140
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 시흥시
2nd row: 서구
3rd row: 충주시
4th row: 칠곡군
5th row: 순창군
Value                Count   Frequency (%)
남구                 322     3.0%
북구                 286     2.6%
중구                 251     2.3%
동구                 230     2.1%
서구                 221     2.0%
고양시               114     1.0%
용인시               113     1.0%
전주시               111     1.0%
강남구               104     1.0%
수성구               103     0.9%
Other values (223)   9043    83.0%

Most occurring characters

ValueCountFrequency (%)
4572
 
13.9%
4067
 
12.4%
2638
 
8.0%
1116
 
3.4%
974
 
3.0%
900
 
2.7%
898
 
2.7%
805
 
2.5%
784
 
2.4%
769
 
2.3%
Other values (130) 15285
46.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 31910
97.3%
Space Separator 898
 
2.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4572
 
14.3%
4067
 
12.7%
2638
 
8.3%
1116
 
3.5%
974
 
3.1%
900
 
2.8%
805
 
2.5%
784
 
2.5%
769
 
2.4%
738
 
2.3%
Other values (129) 14547
45.6%
Space Separator
ValueCountFrequency (%)
898
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 31910
97.3%
Common 898
 
2.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4572
 
14.3%
4067
 
12.7%
2638
 
8.3%
1116
 
3.5%
974
 
3.1%
900
 
2.8%
805
 
2.5%
784
 
2.5%
769
 
2.4%
738
 
2.3%
Other values (129) 14547
45.6%
Common
ValueCountFrequency (%)
898
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 31910
97.3%
ASCII 898
 
2.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4572
 
14.3%
4067
 
12.7%
2638
 
8.3%
1116
 
3.5%
974
 
3.1%
900
 
2.8%
805
 
2.5%
784
 
2.5%
769
 
2.4%
738
 
2.3%
Other values (129) 14547
45.6%
ASCII
ValueCountFrequency (%)
898
100.0%


Text

Distinct: 8192
Distinct (%): 81.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 27
Median length: 21
Mean length: 6.8186
Min length: 2

Characters and Unicode

Total characters: 68186
Distinct characters: 568
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7302
Unique (%): 73.0%

Sample

1st row: 정왕1동 금강아파트
2nd row: 평리3동
3rd row: 호암동
4th row: 석적읍 남율리 우방신천지아파트
5th row: 쌍치면 도고리
Value                  Count   Frequency (%)
사서함                 78      0.5%
주공아파트             75      0.5%
남면                   44      0.3%
서면                   43      0.3%
현대아파트             42      0.3%
북면                   29      0.2%
중동                   26      0.2%
금곡동                 23      0.1%
동면                   21      0.1%
서울중앙우체국사서함   21      0.1%
Other values (8033)    15874   97.5%

Most occurring characters

ValueCountFrequency (%)
6309
 
9.3%
6276
 
9.2%
4018
 
5.9%
3269
 
4.8%
1633
 
2.4%
1513
 
2.2%
1495
 
2.2%
1 1380
 
2.0%
2 1323
 
1.9%
1176
 
1.7%
Other values (558) 39794
58.4%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        57494   84.3%
Space Separator     6276    9.2%
Decimal Number      4147    6.1%
Uppercase Letter    160     0.2%
Open Punctuation    32      < 0.1%
Close Punctuation   32      < 0.1%
Other Punctuation   27      < 0.1%
Dash Punctuation    9       < 0.1%
Lowercase Letter    8       < 0.1%
Letter Number       1       < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6309
 
11.0%
4018
 
7.0%
3269
 
5.7%
1633
 
2.8%
1513
 
2.6%
1495
 
2.6%
1176
 
2.0%
1028
 
1.8%
918
 
1.6%
775
 
1.3%
Other values (518) 35360
61.5%
Uppercase Letter
ValueCountFrequency (%)
K 30
18.8%
T 25
15.6%
S 24
15.0%
G 19
11.9%
L 17
10.6%
C 11
 
6.9%
B 8
 
5.0%
I 7
 
4.4%
A 3
 
1.9%
D 2
 
1.2%
Other values (10) 14
8.8%
Decimal Number
ValueCountFrequency (%)
1 1380
33.3%
2 1323
31.9%
3 659
15.9%
4 315
 
7.6%
5 167
 
4.0%
6 113
 
2.7%
7 73
 
1.8%
8 54
 
1.3%
9 39
 
0.9%
0 24
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
e 6
75.0%
t 1
 
12.5%
h 1
 
12.5%
Other Punctuation
ValueCountFrequency (%)
. 24
88.9%
& 3
 
11.1%
Space Separator
ValueCountFrequency (%)
6276
100.0%
Open Punctuation
ValueCountFrequency (%)
( 32
100.0%
Close Punctuation
ValueCountFrequency (%)
) 32
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 9
100.0%
Letter Number
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   57494   84.3%
Common   10523   15.4%
Latin    169     0.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6309
 
11.0%
4018
 
7.0%
3269
 
5.7%
1633
 
2.8%
1513
 
2.6%
1495
 
2.6%
1176
 
2.0%
1028
 
1.8%
918
 
1.6%
775
 
1.3%
Other values (518) 35360
61.5%
Latin
ValueCountFrequency (%)
K 30
17.8%
T 25
14.8%
S 24
14.2%
G 19
11.2%
L 17
10.1%
C 11
 
6.5%
B 8
 
4.7%
I 7
 
4.1%
e 6
 
3.6%
A 3
 
1.8%
Other values (14) 19
11.2%
Common
ValueCountFrequency (%)
6276
59.6%
1 1380
 
13.1%
2 1323
 
12.6%
3 659
 
6.3%
4 315
 
3.0%
5 167
 
1.6%
6 113
 
1.1%
7 73
 
0.7%
8 54
 
0.5%
9 39
 
0.4%
Other values (6) 124
 
1.2%

Most occurring blocks

Value          Count   Frequency (%)
Hangul         57494   84.3%
ASCII          10691   15.7%
Number Forms   1       < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
6309
 
11.0%
4018
 
7.0%
3269
 
5.7%
1633
 
2.8%
1513
 
2.6%
1495
 
2.6%
1176
 
2.0%
1028
 
1.8%
918
 
1.6%
775
 
1.3%
Other values (518) 35360
61.5%
ASCII
ValueCountFrequency (%)
6276
58.7%
1 1380
 
12.9%
2 1323
 
12.4%
3 659
 
6.2%
4 315
 
2.9%
5 167
 
1.6%
6 113
 
1.1%
7 73
 
0.7%
8 54
 
0.5%
9 39
 
0.4%
Other values (29) 292
 
2.7%
Number Forms
ValueCountFrequency (%)
1
100.0%

상세주소
Text

MISSING 

Distinct: 2910
Distinct (%): 84.0%
Missing: 6534
Missing (%): 65.3%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 14
Mean length: 7.3098673
Min length: 1

Characters and Unicode

Total characters: 25336
Distinct characters: 28
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2701
Unique (%): 77.9%

Sample

1st row: 679∼687
2nd row: (101∼206동)
3rd row: 601∼1900
4th row: (51∼53동)
5th row: (201∼304동)
Value                 Count   Frequency (%)
101∼106동             28      0.8%
101∼108동             28      0.8%
101∼107동             25      0.7%
101∼105동             22      0.6%
101∼103동             20      0.6%
101∼104동             19      0.5%
101∼110동             15      0.4%
101∼109동             13      0.4%
101∼111동             12      0.3%
101∼113동             12      0.3%
Other values (2900)   3272    94.4%

Most occurring characters

Value               Count   Frequency (%)
1                   4497    17.7%
∼                   3104    12.3%
0                   2924    11.5%
2                   1854    7.3%
3                   1741    6.9%
5                   1539    6.1%
4                   1504    5.9%
9                   1475    5.8%
6                   1399    5.5%
7                   1277    5.0%
Other values (18)   4022    15.9%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      19353   76.4%
Math Symbol         3104    12.3%
Open Punctuation    910     3.6%
Close Punctuation   910     3.6%
Other Letter        854     3.4%
Dash Punctuation    199     0.8%
Uppercase Letter    6       < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
711
83.3%
127
 
14.9%
5
 
0.6%
3
 
0.4%
2
 
0.2%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 4497
23.2%
0 2924
15.1%
2 1854
9.6%
3 1741
 
9.0%
5 1539
 
8.0%
4 1504
 
7.8%
9 1475
 
7.6%
6 1399
 
7.2%
7 1277
 
6.6%
8 1143
 
5.9%
Uppercase Letter
ValueCountFrequency (%)
B 3
50.0%
A 2
33.3%
F 1
 
16.7%
Math Symbol
ValueCountFrequency (%)
3104
100.0%
Open Punctuation
ValueCountFrequency (%)
( 910
100.0%
Close Punctuation
ValueCountFrequency (%)
) 910
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 199
100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   24476   96.6%
Hangul   854     3.4%
Latin    6       < 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
1 4497
18.4%
3104
12.7%
0 2924
11.9%
2 1854
7.6%
3 1741
 
7.1%
5 1539
 
6.3%
4 1504
 
6.1%
9 1475
 
6.0%
6 1399
 
5.7%
7 1277
 
5.2%
Other values (4) 3162
12.9%
Hangul
ValueCountFrequency (%)
711
83.3%
127
 
14.9%
5
 
0.6%
3
 
0.4%
2
 
0.2%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%
Latin
ValueCountFrequency (%)
B 3
50.0%
A 2
33.3%
F 1
 
16.7%

Most occurring blocks

Value            Count   Frequency (%)
ASCII            21378   84.4%
Math Operators   3104    12.3%
Hangul           854     3.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 4497
21.0%
0 2924
13.7%
2 1854
8.7%
3 1741
 
8.1%
5 1539
 
7.2%
4 1504
 
7.0%
9 1475
 
6.9%
6 1399
 
6.5%
7 1277
 
6.0%
8 1143
 
5.3%
Other values (6) 2025
9.5%
Math Operators
ValueCountFrequency (%)
3104
100.0%
Hangul
ValueCountFrequency (%)
711
83.3%
127
 
14.9%
5
 
0.6%
3
 
0.4%
2
 
0.2%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%
1
 
0.1%

Interactions


Correlations

                   고유번호   [name missing]
고유번호           1.000      0.953
[name missing]     0.953      1.000

                   고유번호   [name missing]
고유번호           1.000      0.797
[name missing]     0.797      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)   고유번호   우편번호   [name missing]   시군구   [name missing]   상세주소
24171   24172   429-744   경기   시흥시   정왕1동 금강아파트   <NA>
12565   12566   703-840   대구   서구   평리3동   679∼687
48084   48085   380-130   충북   충주시   호암동   <NA>
36984   36985   718-702   경북   칠곡군   석적읍 남율리 우방신천지아파트   (101∼206동)
43286   43287   595-891   전북   순창군   쌍치면 도고리   <NA>
27109   27110   450-722   경기   평택시   비전1동 은행아파트   <NA>
45228   45229   336-768   충남   아산시   풍기동 주은아파트   <NA>
48328   48329   367-883   충북   괴산군   연풍면 율전리   <NA>
6522   6523   122-753   서울   은평구   불광1동 대한생명빌딩   <NA>
6836   6837   110-786   서울   종로구   신문로1가 흥국생명빌딩   <NA>

(index)   고유번호   우편번호   [name missing]   시군구   [name missing]   상세주소
14623   14624   406-715   인천   연수구   옥련동 서해아파트   <NA>
8653   8654   607-782   부산   동래구   온천3동 반도보라스카이뷰오피스텔   <NA>
47692   47693   360-823   충북   청주시 상당구   용암1동   1100∼1599
4077   4078   121-887   서울   마포구   합정동   430∼445
19505   19506   215-811   강원   양양군   서면 영덕리   <NA>
15257   15258   503-207   광주   남구   석정동   <NA>
25337   25338   487-809   경기   양주시   봉양동 사서함   118-(34∼38)
1398   1399   157-905   서울   강서구   화곡2동   861∼870
18766   18767   220-802   강원   원주시   문막읍 동화12리   <NA>
18563   18564   220-933   강원   원주시   관설동   산1