Overview

Dataset statistics

Number of variables: 2
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 486
Duplicate rows (%): 4.9%
Total size in memory: 244.1 KiB
Average record size in memory: 25.0 B
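The duplicate and memory figures above can be re-derived with a few lines of standard-library Python. This is a minimal sketch, not the profiler's own code: `duplicate_summary` and the toy rows are hypothetical. Whether the report counts distinct duplicated rows or extra copies is an assumption. The 주소 figures further down (9514 distinct minus 9028 unique = 486) are consistent with the first reading, but the report does not confirm it.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (duplicated_groups, extra_copies) for an iterable of hashable rows."""
    counts = Counter(rows)
    # Distinct rows that occur more than once.
    duplicated_groups = sum(1 for c in counts.values() if c > 1)
    # Rows beyond the first occurrence of each distinct row.
    extra_copies = sum(c - 1 for c in counts.values())
    return duplicated_groups, extra_copies


# Hypothetical toy rows (postal code, address); not taken from the dataset.
toy_rows = [
    (100230, "A"), (100230, "A"),
    (100282, "B"), (100282, "B"), (100282, "B"),
    (110150, "C"),
]
groups, extras = duplicate_summary(toy_rows)

# Figures reported above: 486 duplicate rows among 10000 observations,
# and 244.1 KiB spread over 10000 records.
report_dup_pct = round(100 * 486 / 10000, 1)
avg_record_bytes = round(244.1 * 1024 / 10000, 1)
```

The two return values differ as soon as a row occurs three or more times, so it is worth checking which definition a given tool uses before comparing duplicate counts.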

Variable types

Numeric: 1
Text: 1

Dataset

Description: Postal code (우편번호) and address (주소) fields from the postal code data provided by 한국의료기기안전정보원 (National Institute of Medical Device Safety Information)
URL: https://www.data.go.kr/data/15070441/fileData.do

Alerts

Dataset has 486 (4.9%) duplicate rows (alert type: Duplicates)

Reproduction

Analysis started: 2023-12-12 17:27:38.684495
Analysis finished: 2023-12-12 17:27:39.541710
Duration: 0.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

우편번호 (postal code)
Real number (ℝ)

Distinct: 8182
Distinct (%): 81.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 464227.46
Minimum: 100012
Maximum: 799822
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100012
5-th percentile: 133777.55
Q1: 314882
Median: 480081
Q3: 619735.75
95-th percentile: 750916
Maximum: 799822
Range: 699810
Interquartile range (IQR): 304853.75
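Range and IQR follow directly from the quantiles: range = maximum − minimum and IQR = Q3 − Q1. The sketch below checks that arithmetic against the reported figures and shows one way to compute quartiles with the standard library. The helper `quartile_summary` is hypothetical. The choice of `method="inclusive"` (linear interpolation between order statistics) is an assumption about the interpolation rule, not something the report states. The input is the ten smallest postal codes listed later in this report, used only as a small real sample.

```python
import statistics


def quartile_summary(values):
    """Quartiles, range and IQR using linear interpolation between order statistics."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# The ten smallest 우편번호 values listed in this report.
smallest = [100012, 100070, 100080, 100091, 100100,
            100102, 100110, 100141, 100192, 100193]
summary = quartile_summary(smallest)

# Consistency check on the reported full-dataset figures.
reported_range = 799822 - 100012
reported_iqr = 619735.75 - 314882
```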

Descriptive statistics

Standard deviation: 200375.88
Coefficient of variation (CV): 0.43163298
Kurtosis: -1.037123
Mean: 464227.46
Median Absolute Deviation (MAD): 148861
Skewness: -0.30735118
Sum: 4.6422746 × 10^9
Variance: 4.0150493 × 10^10
Monotonicity: Not monotonic
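Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below re-derives them from the reported values. It also shows a hypothetical helper, `median_abs_deviation`, which computes MAD as the median of absolute deviations from the median. That this is the exact definition the profiler uses is an assumption.

```python
import statistics

# Values reported in this section.
n = 10000
mean = 464227.46
std = 200375.88

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance from the standard deviation
total = mean * n         # sum from the mean


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# The ten smallest 우편번호 values listed in this report, as a small sample.
smallest = [100012, 100070, 100080, 100091, 100100,
            100102, 100110, 100141, 100192, 100193]
mad_small = median_abs_deviation(smallest)
```

The derived values agree with the reported CV, variance and sum to the precision shown in the report.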
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
601820               10     0.1%
138873               9      0.1%
706853               7      0.1%
252829               7      0.1%
705833               7      0.1%
601814               7      0.1%
482839               6      0.1%
209819               6      0.1%
469853               6      0.1%
719811               6      0.1%
Other values (8172)  9929   99.3%
Value   Count  Frequency (%)
100012  1      < 0.1%
100070  1      < 0.1%
100080  1      < 0.1%
100091  1      < 0.1%
100100  1      < 0.1%
100102  1      < 0.1%
100110  1      < 0.1%
100141  1      < 0.1%
100192  1      < 0.1%
100193  1      < 0.1%
Value   Count  Frequency (%)
799822  2      < 0.1%
799821  2      < 0.1%
799820  1      < 0.1%
799813  1      < 0.1%
799811  1      < 0.1%
799810  1      < 0.1%
799800  1      < 0.1%
791948  2      < 0.1%
791945  3      < 0.1%
791944  1      < 0.1%

주소 (address)
Text

Distinct: 9514
Distinct (%): 95.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 46
Median length: 41
Mean length: 20.2759
Min length: 10
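Length statistics of this kind are computed over the character count of each value. The following sketch applies the same idea to the five sample addresses shown in the Sample subsection of this variable. The helper `length_stats` is hypothetical, and counting Unicode code points with `len` is an assumption about how the profiler measures length.

```python
import statistics


def length_stats(texts):
    """Min, max, mean and median of the character counts of the given strings."""
    lengths = [len(t) for t in texts]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
    }


# The five sample rows of the 주소 variable shown in this report.
sample_rows = [
    "경상남도 창원시 진해구 회현동",
    "경기도 성남시 중원구 상대원3동 1979~2954",
    "서울특별시 은평구 대조동 삼성아파트",
    "경상북도 구미시 도량동 귀빈맨션 (101~103동)",
    "부산광역시 금정구 구서1동 420~440",
]
stats = length_stats(sample_rows)

# The reported mean length equals total characters / observations.
reported_mean = 202759 / 10000
```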

Characters and Unicode

Total characters: 202759
Distinct characters: 601
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
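As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one of the sample addresses in this report. The helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical, covering only the categories that appear in the tables of this report. Script and block are not available from `unicodedata`, so only categories are shown.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Sm": "Math Symbol", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation",
}


def category_counts(text):
    """Count characters per Unicode general category, after NFC normalization."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in normalized)


sample = "경상북도 구미시 도량동 귀빈맨션 (101~103동)"
counts = category_counts(sample)
named = {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}
```

Note that `~` falls under Math Symbol and Hangul syllables under Other Letter, which matches the category table for this variable.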

Unique

Unique: 9028
Unique (%): 90.3%

Sample

1st row: 경상남도 창원시 진해구 회현동
2nd row: 경기도 성남시 중원구 상대원3동 1979~2954
3rd row: 서울특별시 은평구 대조동 삼성아파트
4th row: 경상북도 구미시 도량동 귀빈맨션 (101~103동)
5th row: 부산광역시 금정구 구서1동 420~440
Value                 Count  Frequency (%)
경기도                1722   4.1%
서울특별시            1598   3.8%
경상북도              960    2.3%
전라남도              753    1.8%
부산광역시            724    1.7%
경상남도              700    1.7%
강원도                643    1.5%
전라북도              608    1.5%
대구광역시            570    1.4%
충청남도              439    1.1%
Other values (11021)  32850  79.0%
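The table above counts individual words across all addresses, so the province or city name that opens each address dominates. A minimal sketch of such a word count follows, using ten addresses from the Sample section of this report. Splitting on whitespace is an assumption; the profiler's actual tokenization (for example, how it treats punctuation) is not stated in the report.

```python
from collections import Counter

# Ten addresses taken from the Sample section of this report.
addresses = [
    "대구광역시 달서구 이곡2동 1270~1335",
    "경기도 안산시 단원구 초지동 605~610",
    "경기도 용인시 수지구 상현1동 상현마을현대2차아이파크아파트 (201~207동)",
    "경기도 여주군 강천면 부평리",
    "강원도 철원군 동송읍 양지리",
    "서울특별시 종로구 광화문우체국사서함 900~999",
    "전라남도 여수시 문수동 주공아파트",
    "서울특별시 은평구 갈현1동 292~301",
    "전라남도 목포시 부흥동 우미오션빌아파트",
    "경상남도 남해군 창선면 광천리",
]

# Whitespace tokenization (assumed), then a frequency count per token.
word_counts = Counter(word for address in addresses for word in address.split())
total_words = sum(word_counts.values())
top_word, top_count = word_counts.most_common(1)[0]
top_share = 100 * top_count / total_words
```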

Most occurring characters

Value               Count   Frequency (%)
(space)             31604   15.6%
(unrecovered)       8696    4.3%
(unrecovered)       8105    4.0%
1                   7332    3.6%
(unrecovered)       6776    3.3%
(unrecovered)       6049    3.0%
0                   3992    2.0%
2                   3746    1.8%
(unrecovered)       3725    1.8%
(unrecovered)       3655    1.8%
Other values (591)  119079  58.7%

Most occurring categories

Value              Count   Frequency (%)
Other Letter       135939  67.0%
Space Separator    31604   15.6%
Decimal Number     28396   14.0%
Math Symbol        3650    1.8%
Close Punctuation  1135    0.6%
Open Punctuation   1135    0.6%
Dash Punctuation   572     0.3%
Uppercase Letter   234     0.1%
Other Punctuation  61      < 0.1%
Lowercase Letter   31      < 0.1%

Most frequent character per category

Other Letter
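Value               Count  Frequency (%)
(unrecovered)       8696   6.4%
(unrecovered)       8105   6.0%
(unrecovered)       6776   5.0%
(unrecovered)       6049   4.4%
(unrecovered)       3725   2.7%
(unrecovered)       3655   2.7%
(unrecovered)       3228   2.4%
(unrecovered)       3198   2.4%
(unrecovered)       2995   2.2%
(unrecovered)       2981   2.2%
Other values (548)  86531  63.7%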
Uppercase Letter
Value              Count  Frequency (%)
K                  50     21.4%
S                  35     15.0%
T                  33     14.1%
C                  16     6.8%
L                  15     6.4%
I                  14     6.0%
B                  14     6.0%
G                  12     5.1%
A                  11     4.7%
N                  6      2.6%
Other values (11)  28     12.0%
Decimal Number
Value  Count  Frequency (%)
1      7332   25.8%
0      3992   14.1%
2      3746   13.2%
3      2714   9.6%
4      2037   7.2%
5      2028   7.1%
6      1813   6.4%
9      1775   6.3%
7      1558   5.5%
8      1401   4.9%
Other Punctuation
Value  Count  Frequency (%)
.      47     77.0%
,      10     16.4%
&      4      6.6%
Lowercase Letter
Value  Count  Frequency (%)
e      29     93.5%
d      1      3.2%
a      1      3.2%
Space Separator
Value    Count  Frequency (%)
(space)  31604  100.0%
Math Symbol
Value  Count  Frequency (%)
~      3650   100.0%
Close Punctuation
Value  Count  Frequency (%)
)      1135   100.0%
Open Punctuation
Value  Count  Frequency (%)
(      1135   100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      572    100.0%
Letter Number
Value          Count  Frequency (%)
(unrecovered)  2      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Hangul  135935  67.0%
Common  66553   32.8%
Latin   267     0.1%
Han     4       < 0.1%

Most frequent character per script

Hangul
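Value               Count  Frequency (%)
(unrecovered)       8696   6.4%
(unrecovered)       8105   6.0%
(unrecovered)       6776   5.0%
(unrecovered)       6049   4.4%
(unrecovered)       3725   2.7%
(unrecovered)       3655   2.7%
(unrecovered)       3228   2.4%
(unrecovered)       3198   2.4%
(unrecovered)       2995   2.2%
(unrecovered)       2981   2.2%
Other values (545)  86527  63.7%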
Latin
Value              Count  Frequency (%)
K                  50     18.7%
S                  35     13.1%
T                  33     12.4%
e                  29     10.9%
C                  16     6.0%
L                  15     5.6%
I                  14     5.2%
B                  14     5.2%
G                  12     4.5%
A                  11     4.1%
Other values (15)  38     14.2%
Common
Value             Count  Frequency (%)
(space)           31604  47.5%
1                 7332   11.0%
0                 3992   6.0%
2                 3746   5.6%
~                 3650   5.5%
3                 2714   4.1%
4                 2037   3.1%
5                 2028   3.0%
6                 1813   2.7%
9                 1775   2.7%
Other values (8)  5862   8.8%
Han
Value          Count  Frequency (%)
(unrecovered)  2      50.0%
(unrecovered)  1      25.0%
(unrecovered)  1      25.0%

Most occurring blocks

Value         Count   Frequency (%)
Hangul        135935  67.0%
ASCII         66818   33.0%
CJK           4       < 0.1%
Number Forms  2       < 0.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            31604  47.3%
1                  7332   11.0%
0                  3992   6.0%
2                  3746   5.6%
~                  3650   5.5%
3                  2714   4.1%
4                  2037   3.0%
5                  2028   3.0%
6                  1813   2.7%
9                  1775   2.7%
Other values (32)  6127   9.2%
Hangul
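Value               Count  Frequency (%)
(unrecovered)       8696   6.4%
(unrecovered)       8105   6.0%
(unrecovered)       6776   5.0%
(unrecovered)       6049   4.4%
(unrecovered)       3725   2.7%
(unrecovered)       3655   2.7%
(unrecovered)       3228   2.4%
(unrecovered)       3198   2.4%
(unrecovered)       2995   2.2%
(unrecovered)       2981   2.2%
Other values (545)  86527  63.7%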
Number Forms
Value          Count  Frequency (%)
(unrecovered)  2      100.0%
CJK
Value          Count  Frequency (%)
(unrecovered)  2      50.0%
(unrecovered)  1      25.0%
(unrecovered)  1      25.0%

Interactions

[Figure: interactions plot]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  우편번호  주소
19644    645130    경상남도 창원시 진해구 회현동
62184    462814    경기도 성남시 중원구 상대원3동 1979~2954
38991    122768    서울특별시 은평구 대조동 삼성아파트
19050    730758    경상북도 구미시 도량동 귀빈맨션 (101~103동)
22568    609853    부산광역시 금정구 구서1동 420~440
72013    618820    부산광역시 강서구 송정동 1700~1799
49370    423823    경기도 광명시 소하2동 294~420
81255    606041    부산광역시 영도구 영선동1가
65047    799822    경상북도 울릉군 북면 천부4리
75085    501082    광주광역시 동구 계림2동

(index)  우편번호  주소
74020    704941    대구광역시 달서구 이곡2동 1270~1335
59166    425865    경기도 안산시 단원구 초지동 605~610
58800    448525    경기도 용인시 수지구 상현1동 상현마을현대2차아이파크아파트 (201~207동)
56405    469863    경기도 여주군 강천면 부평리
1695     269813    강원도 철원군 동송읍 양지리
38352    110609    서울특별시 종로구 광화문우체국사서함 900~999
45899    550766    전라남도 여수시 문수동 주공아파트
91678    122805    서울특별시 은평구 갈현1동 292~301
96904    530768    전라남도 목포시 부흥동 우미오션빌아파트
69538    668852    경상남도 남해군 창선면 광천리

Duplicate rows

Most frequently occurring

(index)  우편번호  주소                                          # duplicates
0        100230    서울특별시 중구 수표동                        2
1        100282    서울특별시 중구 인현동2가                     2
2        100683    서울특별시 중구 서울중앙우체국사서함 8300~8399  2
3        100703    서울특별시 중구 남대문로2가 국민은행본점빌딩    2
4        110150    서울특별시 종로구 중학동                      2
5        110843    서울특별시 종로구 창신2동 688~700              2
6        110873    서울특별시 종로구 내수동 경희궁의아침4단지     2
7        120160    서울특별시 서대문구 대신동                    2
8        121756    서울특별시 마포구 동교동 상진빌딩              2
9        121871    서울특별시 마포구 염리동 20~50                 2