Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 322.3 KiB
Average record size in memory: 33.0 B
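The overview figures are simple aggregates over the rows. The sketch below computes them for a list of tuples. It assumes a cell is missing only when it is `None`, and it counts every repeat of an earlier row as one duplicate. Both are illustrative assumptions: ydata-profiling's own missing-value and duplicate-counting rules may differ. The function names are hypothetical. The last line checks that 322.3 KiB spread over 10000 rows gives the reported average record size.

```python
from collections import Counter

def overview(rows):
    """Basic dataset statistics for a list of equal-length tuples (hypothetical helper)."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    # Assumption: only None marks a missing cell.
    missing = sum(cell is None for row in rows for cell in row)
    # Assumption: each repeat of an earlier identical row counts as one duplicate.
    duplicates = sum(c - 1 for c in Counter(rows).values())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }

def average_record_size(total_bytes, n_obs):
    """Average in-memory bytes per row."""
    return total_bytes / n_obs

# Reported totals: 322.3 KiB for 10000 observations.
print(round(average_record_size(322.3 * 1024, 10000), 1))  # → 33.0
```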

Variable types

Categorical: 1
Numeric: 1
Text: 1

Dataset

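Description: Administrative-district code master data maintained in the Health Insurance Review & Assessment Service database
Author: Health Insurance Review & Assessment Service (건강보험심사평가원)
URL: https://www.data.go.kr/data/15067469/fileData.do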

Alerts

코드 (code) is highly overall correlated with 코드구분 (code type) [High correlation]
코드구분 is highly overall correlated with 코드 [High correlation]
코드구분 is highly imbalanced (97.3%) [Imbalance]
코드명 (code name) has unique values [Unique]
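The 97.3% imbalance figure matches an entropy-based score: one minus the Shannon entropy of the value distribution, normalised by log k for k classes. A uniform column then scores 0, and a column dominated by a single value approaches 1. The sketch below reproduces the figure from the 코드구분 value counts (9955 / 44 / 1). Treat the formula as our reading of how ydata-profiling scores imbalance, not its documented API. `imbalance_score` is a hypothetical name.

```python
import math

def imbalance_score(counts):
    """1 - H(p) / log(k): 0 for a uniform distribution, near 1 for one dominant class."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    # Shannon entropy in nats; zero counts contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Value counts of 코드구분: 우편번호, 지역(시군구)코드, 지역(시도)코드.
score = imbalance_score([9955, 44, 1])
print(f"{score:.1%}")  # → 97.3%
```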

Reproduction

Analysis started: 2023-12-12 05:19:01.360898
Analysis finished: 2023-12-12 05:19:02.169090
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
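The reported duration is the difference between the two timestamps above, rounded to two decimals. A minimal check with the standard library follows. It assumes both timestamps share the same (unstated) time zone. `duration_seconds` is a hypothetical helper.

```python
from datetime import datetime

def duration_seconds(started, finished):
    """Elapsed seconds between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()

elapsed = duration_seconds("2023-12-12 05:19:01.360898", "2023-12-12 05:19:02.169090")
print(f"{elapsed:.2f} seconds")  # → 0.81 seconds
```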

Variables

코드구분 (code type)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

우편번호 (postal code): 9955
지역(시군구)코드 (region code, city/county/district level): 44
지역(시도)코드 (region code, province level): 1

Length

Max length: 9
Median length: 4
Mean length: 4.0224
Min length: 4
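These length statistics follow directly from the value counts. 우편번호 has 4 characters, 지역(시군구)코드 has 9 and 지역(시도)코드 has 8, so the mean is (4 × 9955 + 9 × 44 + 8 × 1) / 10000 = 4.0224. The sketch below assumes lengths are measured in Unicode code points, so each Hangul syllable counts as one character. `length_stats` is a hypothetical helper.

```python
import statistics

def length_stats(value_counts):
    """Length summary of a categorical column given a {value: count} mapping."""
    lengths = []
    for value, count in value_counts.items():
        # len() counts code points, so one Hangul syllable = one character.
        lengths.extend([len(value)] * count)
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }

stats = length_stats({"우편번호": 9955, "지역(시군구)코드": 44, "지역(시도)코드": 1})
print(stats["mean"])  # → 4.0224
```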

Unique

Unique: 1
Unique (%): < 0.1%
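"Distinct" and "Unique" measure different things here. Distinct counts every different value (3). Unique counts only values that occur exactly once, and only 지역(시도)코드 does. The sketch below rebuilds the column from its value counts to show the difference. `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): all different values vs. values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Rebuild the 코드구분 column from its value counts.
column = ["우편번호"] * 9955 + ["지역(시군구)코드"] * 44 + ["지역(시도)코드"]
print(distinct_and_unique(column))  # → (3, 1)
```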

Sample

1st row: 우편번호
2nd row: 우편번호
3rd row: 우편번호
4th row: 우편번호
5th row: 우편번호

Common Values

Value  Count  Frequency (%)
우편번호  9955  99.6%
지역(시군구)코드  44  0.4%
지역(시도)코드  1  < 0.1%

Length

Histogram of lengths of the category


코드 (code)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8483
Distinct (%): 84.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 459590.72
Minimum: 31
Maximum: 799812
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 31
5th percentile: 134671.9
Q1: 325852.5
Median: 467852.5
Q3: 616837.25
95th percentile: 750892.45
Maximum: 799812
Range: 799781
Interquartile range (IQR): 290984.75
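Range is maximum minus minimum (799812 − 31 = 799781), and IQR is Q3 minus Q1 (616837.25 − 325852.5 = 290984.75). The sketch below computes the same quantile block with the standard library. It assumes the report uses linear interpolation between order statistics (pandas' default), which corresponds to `statistics.quantiles(..., method="inclusive")`. The toy input is illustrative only, and `quantile_summary` is a hypothetical helper.

```python
import statistics

def quantile_summary(values):
    """Quantile statistics with linear interpolation (method='inclusive')."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Cut points at 5%, 10%, ..., 95%; the first and last are the 5th/95th percentiles.
    ventiles = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": data[0],
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }

print(quantile_summary([1, 2, 3, 4, 5])["iqr"])  # → 2.0
```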

Descriptive statistics

Standard deviation: 197137.3
Coefficient of variation (CV): 0.42894099
Kurtosis: -1.0168527
Mean: 459590.72
Median Absolute Deviation (MAD): 146951
Skewness: -0.24040818
Sum: 4.5959072 × 10^9
Variance: 3.8863114 × 10^10
Monotonicity: Not monotonic
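The coefficient of variation is the standard deviation divided by the mean (197137.3 / 459590.72 ≈ 0.4289). The median absolute deviation is the median of |x − median|. The sketch below computes these together with skewness and excess kurtosis. It assumes the report takes skewness and kurtosis from pandas' `Series.skew` and `Series.kurt`, so it uses the same bias-corrected sample estimators. The toy input and the name `descriptive` are illustrative.

```python
import math
import statistics

def descriptive(values):
    """CV, MAD, bias-corrected skewness and excess kurtosis (needs n >= 4, non-constant)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    median = statistics.median(values)
    mad = statistics.median(abs(x - median) for x in values)
    dev = [x - mean for x in values]
    # Central moments divided by n.
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    # Adjusted Fisher-Pearson skewness (G1) and sample excess kurtosis (G2).
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {"cv": std / mean, "mad": mad, "skewness": skew, "kurtosis": kurt}

print(descriptive([1, 2, 3, 4, 5])["kurtosis"])  # approximately -1.2
```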
Histogram with fixed size bins (bins=50)
Common Values

Value  Count  Frequency (%)
138873  10  0.1%
482869  9  0.1%
209819  8  0.1%
430835  8  0.1%
601809  7  0.1%
701819  7  0.1%
250889  7  0.1%
487868  6  0.1%
482879  6  0.1%
138820  6  0.1%
Other values (8473)  9926  99.3%

Minimum 10 values

Value  Count  Frequency (%)
31  1  < 0.1%
100051  1  < 0.1%
100070  1  < 0.1%
100101  1  < 0.1%
100130  1  < 0.1%
100141  1  < 0.1%
100151  1  < 0.1%
100260  1  < 0.1%
100360  1  < 0.1%
100372  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
799812  1  < 0.1%
799811  2  < 0.1%
799810  1  < 0.1%
799805  1  < 0.1%
799803  1  < 0.1%
799801  1  < 0.1%
799800  1  < 0.1%
791948  1  < 0.1%
791947  1  < 0.1%
791945  2  < 0.1%
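The extreme-value tables list the smallest and largest distinct values of 코드 with their counts, ordered by value rather than by frequency. A minimal sketch follows. The toy input reuses a few values from the tables above and is not the full column. `extreme_values` is a hypothetical helper.

```python
from collections import Counter

def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    ordered = sorted(Counter(values).items())  # sorted by value, ascending
    smallest = ordered[:k]
    largest = ordered[::-1][:k]               # descending by value
    return smallest, largest

low, high = extreme_values([799811, 31, 799811, 100051, 799812], k=2)
print(low)   # → [(31, 1), (100051, 1)]
print(high)  # → [(799812, 1), (799811, 2)]
```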

코드명 (code name)
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 47
Median length: 42
Mean length: 18.2183
Min length: 2

Characters and Unicode

Total characters: 182183
Distinct characters: 596
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
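The general-category breakdown can be reproduced with the standard library's `unicodedata.category`, which returns two-letter codes such as `Lo` (Other Letter, which covers Hangul syllables) and `Sm` (Math Symbol, which covers `~`). The standard library does not expose scripts or blocks, so this sketch covers only the category table. The name mapping covers only the categories listed in this report, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Nd": "Decimal Number",
    "Sm": "Math Symbol", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Lu": "Uppercase Letter", "Po": "Other Punctuation",
    "Ll": "Lowercase Letter", "Nl": "Letter Number",
}

def category_counts(texts):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["부산 연제구 연산9동 135~150"]))
```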

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 부산 남구 용호1동 487
2nd row: 부산 연제구 연산9동 135~150
3rd row: 경북 영주시 휴천3동 산12~159
4th row: 경북 경주시 산내면 내일2리
5th row: 충북 괴산군 괴산읍 서부리
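
Words

Value  Count  Frequency (%)
경기 (Gyeonggi)  1597  3.8%
서울 (Seoul)  1509  3.6%
경북 (Gyeongbuk)  929  2.2%
전남 (Jeonnam)  725  1.7%
경남 (Gyeongnam)  697  1.7%
부산 (Busan)  657  1.6%
충남 (Chungnam)  589  1.4%
강원 (Gangwon)  589  1.4%
전북 (Jeonbuk)  519  1.2%
대구 (Daegu)  510  1.2%
Other values (11473)  33347  80.0%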
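The word table counts tokens rather than rows: the listed counts sum to about 41,668 tokens across 10000 addresses, and each percentage is relative to that token total. The sketch below approximates this by splitting on whitespace. ydata-profiling's exact tokenisation (case folding, punctuation and stop-word handling) may differ. The sample strings are taken from the Sample rows above, and `word_frequencies` is a hypothetical helper.

```python
from collections import Counter

def word_frequencies(texts, top=10):
    """Whitespace-token counts with each token's share of all tokens."""
    counts = Counter(word for text in texts for word in text.split())
    total = sum(counts.values())
    return [(w, c, 100.0 * c / total) for w, c in counts.most_common(top)]

rows = ["부산 남구 용호1동 487", "부산 연제구 연산9동 135~150", "경북 영주시 휴천3동 산12~159"]
for word, count, pct in word_frequencies(rows, top=2):
    print(word, count, f"{pct:.1f}%")
```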

Most occurring characters

Value  Count  Frequency (%)
(space)  31707  17.4%
[?]  8577  4.7%
1  7120  3.9%
[?]  5920  3.2%
[?]  4463  2.4%
[?]  3857  2.1%
0  3789  2.1%
2  3766  2.1%
[?]  3611  2.0%
~  3586  2.0%
Other values (586)  105787  58.1%
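Character frequencies are taken over all 182,183 characters: 10000 values × a mean length of 18.2183. Spaces are counted too, so the space character alone accounts for 31707 / 182183 ≈ 17.4%. A minimal sketch follows. `character_frequencies` is a hypothetical helper, and the toy input is a single sample row.

```python
from collections import Counter

def character_frequencies(texts, top=10):
    """Most common characters (spaces included) with their share of all characters."""
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts.values())
    return [(ch, c, 100.0 * c / total) for ch, c in counts.most_common(top)]

print(character_frequencies(["부산 남구 용호1동 487"], top=1))  # the space is most common here
print(f"{31707 / 182183:.1%}")  # → 17.4%
```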

Most occurring categories

Value  Count  Frequency (%)
Other Letter  116139  63.7%
Space Separator  31707  17.4%
Decimal Number  27592  15.1%
Math Symbol  3586  2.0%
Open Punctuation  1171  0.6%
Close Punctuation  1171  0.6%
Dash Punctuation  511  0.3%
Uppercase Letter  230  0.1%
Other Punctuation  50  < 0.1%
Lowercase Letter  24  < 0.1%

Most frequent character per category

Other Letter

Value  Count  Frequency (%)
[?]  8577  7.4%
[?]  5920  5.1%
[?]  4463  3.8%
[?]  3857  3.3%
[?]  3611  3.1%
[?]  3384  2.9%
[?]  3276  2.8%
[?]  2979  2.6%
[?]  2969  2.6%
[?]  2618  2.3%
Other values (541)  74485  64.1%

Uppercase Letter

Value  Count  Frequency (%)
K  32  13.9%
S  31  13.5%
T  30  13.0%
A  22  9.6%
L  15  6.5%
G  13  5.7%
C  13  5.7%
I  12  5.2%
B  12  5.2%
P  8  3.5%
Other values (12)  42  18.3%

Decimal Number

Value  Count  Frequency (%)
1  7120  25.8%
0  3789  13.7%
2  3766  13.6%
3  2617  9.5%
4  2051  7.4%
5  1962  7.1%
6  1736  6.3%
9  1676  6.1%
7  1534  5.6%
8  1341  4.9%

Other Punctuation

Value  Count  Frequency (%)
.  39  78.0%
,  9  18.0%
&  2  4.0%

Lowercase Letter

Value  Count  Frequency (%)
e  22  91.7%
w  1  4.2%
i  1  4.2%

Letter Number

Value  Count  Frequency (%)
[?]  1  50.0%
[?]  1  50.0%

Space Separator

Value  Count  Frequency (%)
(space)  31707  100.0%

Math Symbol

Value  Count  Frequency (%)
~  3586  100.0%

Open Punctuation

Value  Count  Frequency (%)
(  1171  100.0%

Close Punctuation

Value  Count  Frequency (%)
)  1171  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-  511  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  116135  63.7%
Common  65788  36.1%
Latin  256  0.1%
Han  4  < 0.1%

Most frequent character per script

Hangul

Value  Count  Frequency (%)
[?]  8577  7.4%
[?]  5920  5.1%
[?]  4463  3.8%
[?]  3857  3.3%
[?]  3611  3.1%
[?]  3384  2.9%
[?]  3276  2.8%
[?]  2979  2.6%
[?]  2969  2.6%
[?]  2618  2.3%
Other values (537)  74481  64.1%

Latin

Value  Count  Frequency (%)
K  32  12.5%
S  31  12.1%
T  30  11.7%
e  22  8.6%
A  22  8.6%
L  15  5.9%
G  13  5.1%
C  13  5.1%
I  12  4.7%
B  12  4.7%
Other values (17)  54  21.1%

Common

Value  Count  Frequency (%)
(space)  31707  48.2%
1  7120  10.8%
0  3789  5.8%
2  3766  5.7%
~  3586  5.5%
3  2617  4.0%
4  2051  3.1%
5  1962  3.0%
6  1736  2.6%
9  1676  2.5%
Other values (8)  5778  8.8%

Han

Value  Count  Frequency (%)
[?]  1  25.0%
[?]  1  25.0%
[?]  1  25.0%
[?]  1  25.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  116135  63.7%
ASCII  66042  36.3%
CJK  4  < 0.1%
Number Forms  2  < 0.1%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
(space)  31707  48.0%
1  7120  10.8%
0  3789  5.7%
2  3766  5.7%
~  3586  5.4%
3  2617  4.0%
4  2051  3.1%
5  1962  3.0%
6  1736  2.6%
9  1676  2.5%
Other values (33)  6032  9.1%

Hangul

Value  Count  Frequency (%)
[?]  8577  7.4%
[?]  5920  5.1%
[?]  4463  3.8%
[?]  3857  3.3%
[?]  3611  3.1%
[?]  3384  2.9%
[?]  3276  2.8%
[?]  2979  2.6%
[?]  2969  2.6%
[?]  2618  2.3%
Other values (537)  74481  64.1%

CJK

Value  Count  Frequency (%)
[?]  1  25.0%
[?]  1  25.0%
[?]  1  25.0%
[?]  1  25.0%

Number Forms

Value  Count  Frequency (%)
[?]  1  50.0%
[?]  1  50.0%

Interactions


Correlations

          코드구분  코드
코드구분  1.000     0.816
코드      0.816     1.000

          코드   코드구분
코드      1.000  0.714
코드구분  0.714  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index)  코드구분  코드  코드명
38335  우편번호  608837  부산 남구 용호1동 487
38741  우편번호  611812  부산 연제구 연산9동 135~150
50626  우편번호  750916  경북 영주시 휴천3동 산12~159
52507  우편번호  780882  경북 경주시 산내면 내일2리
16576  우편번호  367802  충북 괴산군 괴산읍 서부리
26336  우편번호  467831  경기 이천시 백사면 신대리
42665  우편번호  656871  경남 거제시 둔덕면 거림리
21243  우편번호  422801  경기 부천시 소사구 괴안동 80~89
35864  우편번호  579932  전북 부안군 백산면 덕신리
39299  우편번호  613809  부산 수영구 광안4동 731~769

Last rows

(index)  코드구분  코드  코드명
26003  우편번호  464808  경기 광주시 태전동 1~275
3582  우편번호  135809  서울 강남구 개포4동 우성6차아파트 (1~8동)
32579  우편번호  539844  전남 진도군 임회면 죽림리
2047  우편번호  130829  서울 동대문구 이문1동 257
40815  우편번호  626860  경남 양산시 하북면
2614  우편번호  132822  서울 도봉구 도봉2동 627~641
1253  우편번호  120861  서울 서대문구 홍제1동 457~458
49868  우편번호  740979  경북 김천시 신음동 1272~1284
50508  우편번호  750871  경북 영주시 안정면 용산리
35683  우편번호  576933  전북 김제시 황산면 진흥리