Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 2172
Missing cells (%): 2.7%
Duplicate rows: 1353
Duplicate rows (%): 13.5%
Total size in memory: 722.7 KiB
Average record size in memory: 74.0 B
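
The percentages and the average record size follow from the raw counts in this block. A minimal sketch of that arithmetic is below; the variable names are illustrative, and the figures are copied from the statistics listed here.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 8
n_observations = 10_000
missing_cells = 2_172
duplicate_rows = 1_353
total_size_kib = 722.7

# Missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Note that the 2.7% of missing cells is a share of all 80000 cells, whereas the 21.7% reported for 여행지 is a share of that single column's 10000 values.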

Variable types

DateTime: 1
Categorical: 5
Text: 1
Numeric: 1

Dataset

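Description: Yellow fever and cholera vaccination records, with the fields 접종일 (vaccination date), 접종종류 (vaccine type), 접종구분 (vaccination category), 여행지 (travel destination), 접종금액 (vaccination fee), 증명서발급금액 (certificate issuance fee), 관할검역소 (competent quarantine station), and 외부접종기관 (external vaccination institution)
Author: 질병관리청 (Korea Disease Control and Prevention Agency)
URL: https://www.data.go.kr/data/3068230/fileData.do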

Alerts

Dataset has 1353 (13.5%) duplicate rows [Duplicates]
관할검역소 is highly overall correlated with 외부접종기관 [High correlation]
외부접종기관 is highly overall correlated with 관할검역소 [High correlation]
접종금액 is highly overall correlated with 접종구분 [High correlation]
접종종류 is highly overall correlated with 접종구분 and 1 other field [High correlation]
접종구분 is highly overall correlated with 접종금액 and 1 other field [High correlation]
증명서발급금액 is highly overall correlated with 접종종류 [High correlation]
접종종류 is highly imbalanced (73.0%) [Imbalance]
접종구분 is highly imbalanced (64.9%) [Imbalance]
증명서발급금액 is highly imbalanced (90.8%) [Imbalance]
여행지 has 2172 (21.7%) missing values [Missing]
접종금액 has 2527 (25.3%) zeros [Zeros]
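
The report does not say how the imbalance percentages are computed. One formula that reproduces all three figures from the value counts listed in this report is one minus the Shannon entropy of the class shares divided by the maximum possible entropy. The sketch below assumes that formula; the function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalise
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts as listed in this report.
vaccine_type = [9537, 463]                      # 접종종류
vaccination_category = [8058, 1733, 190, 17, 2] # 접종구분
certificate_fee = [9730, 263, 6, 1]             # 증명서발급금액

for name, counts in [("접종종류", vaccine_type),
                     ("접종구분", vaccination_category),
                     ("증명서발급금액", certificate_fee)]:
    print(name, f"{100 * imbalance_score(counts):.1f}%")
```

A score of 0 means the classes are evenly spread and a score near 1 means a single class dominates, which is why 증명서발급금액 (97.3% of rows share one value) scores highest.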

Reproduction

Analysis started: 2023-12-12 10:31:56.838456
Analysis finished: 2023-12-12 10:31:58.252620
Duration: 1.41 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
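
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal example rather than the configuration actually used here: the file paths, the title, and the encoding default are assumptions, and the `config.json` referenced above is not reproduced. CSV files from Korean public data portals are often encoded as `cp949`, so the encoding may need to be changed.

```python
def build_report(csv_path, html_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so this module still loads where the libraries are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # 접종일 is the date column named in this report; parse it as a datetime.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["접종일"])
    report = ProfileReport(df, title="Yellow fever and cholera vaccination data")
    report.to_file(html_path)
    return html_path
```

Calling `build_report("vaccinations.csv", encoding="cp949")` with a hypothetical local copy of the dataset would write `report.html` to the working directory.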

Variables

접종일 (vaccination date)
DateTime

Distinct: 294
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2022-01-03 00:00:00
Maximum: 2022-12-31 00:00:00
Histogram with fixed size bins (bins=50)

접종종류 (vaccine type)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
황열: 9537
콜레라: 463

Length

Max length: 3
Median length: 2
Mean length: 2.0463
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 황열
2nd row: 황열
3rd row: 황열
4th row: 황열
5th row: 황열

Common Values

Value  Count  Frequency (%)
황열 (yellow fever)  9537  95.4%
콜레라 (cholera)  463  4.6%

Length

Histogram of lengths of the category


접종구분 (vaccination category)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
신규: 8058
재교부: 1733
재접종: 190
면제증명: 17
확인날인: 2

Length

Max length: 4
Median length: 2
Mean length: 2.1961
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 신규
2nd row: 신규
3rd row: 신규
4th row: 신규
5th row: 신규

Common Values

Value  Count  Frequency (%)
신규 (new)  8058  80.6%
재교부 (reissue)  1733  17.3%
재접종 (revaccination)  190  1.9%
면제증명 (exemption certificate)  17  0.2%
확인날인 (confirmation stamp)  2  < 0.1%

Length

Histogram of lengths of the category


여행지 (travel destination)
Text

MISSING

Distinct: 1072
Distinct (%): 13.7%
Missing: 2172
Missing (%): 21.7%
Memory size: 156.2 KiB

Length

Max length: 58
Median length: 56
Mean length: 5.3431272
Min length: 1

Characters and Unicode

Total characters: 41826
Distinct characters: 500
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
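
As an illustration of that idea, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a few 여행지 values that appear in this report; the helper name is illustrative, and `unicodedata` returns two-letter codes (`Lo`, `Lu`, `Po`, `Zs`) where the report prints the long names (Other Letter, Uppercase Letter, Other Punctuation, Space Separator).

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# A few destination strings that appear in this report.
sample = ["남미", "탄자니아", "오만/싱가포르", "UAE, 오만"]
counts = category_counts(sample)
print(counts)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category table for this column.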

Unique

Unique: 810
Unique (%): 10.3%

Sample

1st row: 남미
2nd row: 탄자니아
3rd row: 오만/싱가포르
4th row: 아프리카
5th row: 말라위

Value  Count  Frequency (%)
브라질 (Brazil)  611  6.0%
탄자니아 (Tanzania)  503  4.9%
케냐 (Kenya)  451  4.4%
미정 (undecided)  385  3.8%
볼리비아 (Bolivia)  353  3.5%
우간다 (Uganda)  285  2.8%
선원실습 (seafarer training)  266  2.6%
남미 (South America)  219  2.1%
에티오피아 (Ethiopia)  219  2.1%
서울특별시 (Seoul)  205  2.0%
Other values (1716)  6714  65.8%

Most occurring characters

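Value  Count  Frequency (%)
(not shown)  2553  6.1%
(space)  2395  5.7%
(not shown)  1212  2.9%
,  1165  2.8%
(not shown)  948  2.3%
(not shown)  880  2.1%
(not shown)  812  1.9%
(not shown)  733  1.8%
(not shown)  711  1.7%
(not shown)  687  1.6%
Other values (490)  29730  71.1%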

Most occurring categories

Value  Count  Frequency (%)
Other Letter  34882  83.4%
Space Separator  2395  5.7%
Decimal Number  1830  4.4%
Other Punctuation  1318  3.2%
Uppercase Letter  552  1.3%
Close Punctuation  346  0.8%
Open Punctuation  346  0.8%
Dash Punctuation  129  0.3%
Lowercase Letter  26  0.1%
Modifier Symbol  1  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not shown)  2553  7.3%
(not shown)  1212  3.5%
(not shown)  948  2.7%
(not shown)  880  2.5%
(not shown)  812  2.3%
(not shown)  733  2.1%
(not shown)  711  2.0%
(not shown)  687  2.0%
(not shown)  657  1.9%
(not shown)  636  1.8%
Other values (437)  25053  71.8%

Uppercase Letter
Value  Count  Frequency (%)
A  165  29.9%
U  150  27.2%
E  148  26.8%
R  13  2.4%
D  11  2.0%
T  7  1.3%
I  6  1.1%
H  6  1.1%
N  6  1.1%
S  6  1.1%
Other values (11)  34  6.2%

Lowercase Letter
Value  Count  Frequency (%)
i  5  19.2%
a  4  15.4%
e  3  11.5%
h  2  7.7%
n  2  7.7%
p  2  7.7%
u  2  7.7%
s  1  3.8%
o  1  3.8%
t  1  3.8%
Other values (3)  3  11.5%

Decimal Number
Value  Count  Frequency (%)
1  424  23.2%
2  284  15.5%
0  241  13.2%
3  209  11.4%
4  139  7.6%
5  128  7.0%
7  120  6.6%
6  112  6.1%
8  88  4.8%
9  85  4.6%

Other Punctuation
Value  Count  Frequency (%)
,  1165  88.4%
/  120  9.1%
.  33  2.5%

Space Separator
Value  Count  Frequency (%)
(space)  2395  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  346  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  346  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  129  100.0%

Modifier Symbol
Value  Count  Frequency (%)
`  1  100.0%

Letter Number
Value  Count  Frequency (%)
(not shown)  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  34882  83.4%
Common  6365  15.2%
Latin  579  1.4%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not shown)  2553  7.3%
(not shown)  1212  3.5%
(not shown)  948  2.7%
(not shown)  880  2.5%
(not shown)  812  2.3%
(not shown)  733  2.1%
(not shown)  711  2.0%
(not shown)  687  2.0%
(not shown)  657  1.9%
(not shown)  636  1.8%
Other values (437)  25053  71.8%

Latin
Value  Count  Frequency (%)
A  165  28.5%
U  150  25.9%
E  148  25.6%
R  13  2.2%
D  11  1.9%
T  7  1.2%
I  6  1.0%
H  6  1.0%
N  6  1.0%
S  6  1.0%
Other values (25)  61  10.5%

Common
Value  Count  Frequency (%)
(space)  2395  37.6%
,  1165  18.3%
1  424  6.7%
)  346  5.4%
(  346  5.4%
2  284  4.5%
0  241  3.8%
3  209  3.3%
4  139  2.2%
-  129  2.0%
Other values (8)  687  10.8%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  34873  83.4%
ASCII  6943  16.6%
Compat Jamo  9  < 0.1%
Number Forms  1  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(not shown)  2553  7.3%
(not shown)  1212  3.5%
(not shown)  948  2.7%
(not shown)  880  2.5%
(not shown)  812  2.3%
(not shown)  733  2.1%
(not shown)  711  2.0%
(not shown)  687  2.0%
(not shown)  657  1.9%
(not shown)  636  1.8%
Other values (436)  25044  71.8%

ASCII
Value  Count  Frequency (%)
(space)  2395  34.5%
,  1165  16.8%
1  424  6.1%
)  346  5.0%
(  346  5.0%
2  284  4.1%
0  241  3.5%
3  209  3.0%
A  165  2.4%
U  150  2.2%
Other values (42)  1218  17.5%

Compat Jamo
Value  Count  Frequency (%)
(not shown)  9  100.0%

Number Forms
Value  Count  Frequency (%)
(not shown)  1  100.0%

접종금액 (vaccination fee)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25935.65
Minimum: 0
Maximum: 39000
Zeros: 2527
Zeros (%): 25.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 34700
Q3: 34700
95-th percentile: 34700
Maximum: 39000
Range: 39000
Interquartile range (IQR): 34700

Descriptive statistics

Standard deviation: 15092.762
Coefficient of variation (CV): 0.58193114
Kurtosis: -0.7078314
Mean: 25935.65
Median Absolute Deviation (MAD): 0
Skewness: -1.1358108
Sum: 2.593565 × 10^8
Variance: 2.2779148 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
By frequency:
Value  Count  Frequency (%)
34700  7416  74.2%
0  2527  25.3%
39000  36  0.4%
31560  14  0.1%
1000  2  < 0.1%
34900  2  < 0.1%
31460  1  < 0.1%
35700  1  < 0.1%
36500  1  < 0.1%

By value, ascending:
Value  Count  Frequency (%)
0  2527  25.3%
1000  2  < 0.1%
31460  1  < 0.1%
31560  14  0.1%
34700  7416  74.2%
34900  2  < 0.1%
35700  1  < 0.1%
36500  1  < 0.1%
39000  36  0.4%

By value, descending:
Value  Count  Frequency (%)
39000  36  0.4%
36500  1  < 0.1%
35700  1  < 0.1%
34900  2  < 0.1%
34700  7416  74.2%
31560  14  0.1%
31460  1  < 0.1%
1000  2  < 0.1%
0  2527  25.3%
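
Because 접종금액 takes only nine distinct values, its summary statistics can be re-derived from the value counts alone. The sketch below does so with the standard library. The sample-variance denominator (n - 1) is an assumption, chosen because it matches the variance and standard deviation reported for this variable.

```python
import math

# Value counts for 접종금액 as listed in this report.
value_counts = {
    34700: 7416, 0: 2527, 39000: 36, 31560: 14, 1000: 2,
    34900: 2, 31460: 1, 35700: 1, 36500: 1,
}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(count * (value - mean) ** 2
               for value, count in value_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
zeros_pct = 100 * value_counts[0] / n

print(f"n={n}, sum={total}, mean={mean:.2f}")
print(f"variance={variance:.4e}, std={std:.3f}, cv={cv:.8f}")
print(f"zeros={zeros_pct:.1f}%")
```

With three quarters of the rows at 34700 and a quarter at 0, the median and both upper quantiles sit at 34700 while the mean is pulled down to 25935.65, which explains the negative skewness.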

증명서발급금액 (certificate issuance fee)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
1000: 9730
0: 263
34700: 6
10000: 1

Length

Max length: 5
Median length: 4
Mean length: 3.9218
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 1000
2nd row: 1000
3rd row: 1000
4th row: 1000
5th row: 1000

Common Values

Value  Count  Frequency (%)
1000  9730  97.3%
0  263  2.6%
34700  6  0.1%
10000  1  < 0.1%

Length

Histogram of lengths of the category


관할검역소 (competent quarantine station)
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
국립인천공항검역소: 3318
국립인천검역소: 2098
국립부산검역소: 1208
국립목포검역소: 693
국립마산검역소: 677
Other values (8): 2006

Length

Max length: 9
Median length: 7
Mean length: 7.6674
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국립군산검역소
2nd row: 국립인천공항검역소
3rd row: 국립군산검역소
4th row: 국립마산검역소
5th row: 국립인천공항검역소

Common Values

Value  Count  Frequency (%)
국립인천공항검역소  3318  33.2%
국립인천검역소  2098  21.0%
국립부산검역소  1208  12.1%
국립목포검역소  693  6.9%
국립마산검역소  677  6.8%
국립군산검역소  636  6.4%
국립포항검역소  480  4.8%
국립울산검역소  272  2.7%
국립평택검역소  241  2.4%
국립동해검역소  144  1.4%
Other values (3)  233  2.3%

Length

Histogram of lengths of the category

외부접종기관 (external vaccination institution)
Categorical

HIGH CORRELATION

Distinct: 45
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 1162
국립중앙의료원: 1070
분당서울대학교병원: 559
순천향대학교 서울병원: 537
인천의료원: 518
Other values (40): 6154

Length

Max length: 15
Median length: 11
Mean length: 6.8982
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 충남대학교병원
2nd row: 국립중앙의료원
3rd row: 충남대학교병원
4th row: 해군해양의료원
5th row: 순천향대학교 서울병원

Common Values

Value  Count  Frequency (%)
<NA>  1162  11.6%
국립중앙의료원  1070  10.7%
분당서울대학교병원  559  5.6%
순천향대학교 서울병원  537  5.4%
인천의료원  518  5.2%
대구의료원  391  3.9%
해군해양의료원  374  3.7%
목포시의료원  365  3.6%
구포성심병원  348  3.5%
가톨릭서울성모병원  316  3.2%
Other values (35)  4360  43.6%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
na  1162  10.1%
국립중앙의료원  1070  9.3%
분당서울대학교병원  559  4.9%
순천향대학교  537  4.7%
서울병원  537  4.7%
인천의료원  518  4.5%
대구의료원  391  3.4%
해군해양의료원  374  3.3%
목포시의료원  365  3.2%
구포성심병원  348  3.0%
Other values (44)  5597  48.8%

Interactions


Correlations

  접종종류  접종구분  접종금액  증명서발급금액  관할검역소  외부접종기관
접종종류  1.000  0.523  0.268  0.917  0.236  0.504
접종구분  0.523  1.000  0.631  0.541  0.372  0.500
접종금액  0.268  0.631  1.000  0.229  0.531  0.720
증명서발급금액  0.917  0.541  0.229  1.000  0.252  0.502
관할검역소  0.236  0.372  0.531  0.252  1.000  1.000
외부접종기관  0.504  0.500  0.720  0.502  1.000  1.000

  관할검역소  접종종류  증명서발급금액  외부접종기관  접종구분
관할검역소  1.000  0.220  0.149  0.997  0.212
접종종류  0.220  1.000  0.740  0.402  0.634
증명서발급금액  0.149  0.740  1.000  0.260  0.468
외부접종기관  0.997  0.402  0.260  1.000  0.255
접종구분  0.212  0.634  0.468  0.255  1.000

  접종금액  접종종류  접종구분  증명서발급금액  관할검역소  외부접종기관
접종금액  1.000  0.435  0.592  0.218  0.354  0.484
접종종류  0.435  1.000  0.634  0.740  0.220  0.402
접종구분  0.592  0.634  1.000  0.468  0.212  0.255
증명서발급금액  0.218  0.740  0.468  1.000  0.149  0.260
관할검역소  0.354  0.220  0.212  0.149  1.000  0.997
외부접종기관  0.484  0.402  0.255  0.260  0.997  1.000
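
The report does not name the association measures behind these matrices. Cramér's V is one common measure for pairs of categorical variables, and the sketch below implements its plain, uncorrected form to show why a pair such as 관할검역소 and 외부접종기관 can score close to 1: if every institution reports to exactly one quarantine station, knowing the institution fixes the station. The contingency counts are hypothetical, not taken from the dataset.

```python
import math

def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 and n > 0 else 0.0

# Hypothetical counts: rows are three institutions, columns are two stations.
# Each institution's rows all fall under a single station.
nested = [[30, 0], [0, 20], [0, 50]]
# Hypothetical counts with no association at all.
independent = [[5, 5], [5, 5]]

print(cramers_v(nested), cramers_v(independent))
```

A value this high usually signals a structural dependency between two columns rather than an analytical finding, so one of the two fields adds little information once the other is known.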

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  접종일  접종종류  접종구분  여행지  접종금액  증명서발급금액  관할검역소  외부접종기관
16233  2022-12-15  황열  신규  남미  34700  1000  국립군산검역소  충남대학교병원
16248  2022-12-15  황열  신규  <NA>  34700  1000  국립인천공항검역소  국립중앙의료원
15073  2022-11-25  황열  신규  탄자니아  34700  1000  국립군산검역소  충남대학교병원
13549  2022-11-07  황열  신규  오만/싱가포르  0  1000  국립마산검역소  해군해양의료원
8929  2022-08-08  황열  신규  아프리카  34700  1000  국립인천공항검역소  순천향대학교 서울병원
3147  2022-03-28  황열  재교부  <NA>  0  1000  국립부산검역소  <NA>
13786  2022-11-08  황열  재교부  말라위  0  1000  국립인천공항검역소  강동경희대병원
2654  2022-03-10  황열  신규  이집트  34700  1000  국립마산검역소  의료법인대우의료재단 대우병원
16968  2022-12-28  콜레라  재접종  <NA>  0  0  국립평택검역소  세종충남대학교병원
7636  2022-07-14  황열  신규  브라질  34700  1000  국립목포검역소  전남대병원

Last rows

Index  접종일  접종종류  접종구분  여행지  접종금액  증명서발급금액  관할검역소  외부접종기관
262  2022-01-14  황열  신규  케냐  34700  1000  국립인천검역소  분당서울대학교병원
1782  2022-02-24  황열  신규  미정  34700  1000  국립울산검역소  혜명심의료재단 울산병원
112  2022-01-07  황열  신규  세네갈  34700  1000  국립인천검역소  분당서울대학교병원
1977  2022-02-25  황열  신규  미정  34700  1000  국립부산검역소  구포성심병원
9070  2022-08-10  황열  신규  모잠비크  34700  1000  국립인천검역소  분당서울대학교병원
11414  2022-09-30  황열  신규  선원실습  34700  1000  국립목포검역소  목포시의료원
12266  2022-10-14  황열  신규  애디오피아  34700  1000  국립여수검역소  여수전남병원
15634  2022-12-06  황열  재교부  <NA>  0  1000  국립마산검역소  <NA>
15509  2022-12-05  황열  재교부  <NA>  0  1000  국립인천공항검역소  국립중앙의료원
5056  2022-05-16  황열  신규  카메룬  34700  1000  국립인천공항검역소  가톨릭서울성모병원

Duplicate rows

Most frequently occurring

Index  접종일  접종종류  접종구분  여행지  접종금액  증명서발급금액  관할검역소  외부접종기관  # duplicates
167  2022-02-28  황열  신규  UAE, 오만  0  1000  국립마산검역소  해군해양의료원  139
583  2022-07-18  황열  신규  소말리아  0  1000  국립마산검역소  해군해양의료원  118
1042  2022-11-07  황열  신규  오만/싱가포르  0  1000  국립마산검역소  해군해양의료원  106
160  2022-02-25  황열  신규  남수단  34700  1000  국립인천검역소  인천의료원  81
1054  2022-11-09  황열  신규  남수단공화국  34700  1000  국립인천검역소  인천의료원  49
128  2022-02-21  황열  신규  남수단  34700  1000  국립인천검역소  인천의료원  47
1062  2022-11-10  황열  신규  남수단공화국  34700  1000  국립인천검역소  인천의료원  46
1068  2022-11-11  황열  신규  남수단공화국  34700  1000  국립인천검역소  인천의료원  42
900  2022-10-06  황열  신규  선원실습  34700  1000  국립목포검역소  목포시의료원  30
913  2022-10-11  황열  신규  선원실습  34700  1000  국립목포검역소  목포시의료원  30
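
A table like this can be built by treating each record as a tuple and counting identical tuples. The sketch below uses only the standard library; the function name and the toy rows are illustrative. It reports both the number of distinct repeated rows and the number of surplus copies, since the report does not state which of the two its "Duplicate rows" figure counts.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (distinct repeated rows, surplus copies, Counter of repeated rows)."""
    counts = Counter(rows)
    repeated = Counter({row: c for row, c in counts.items() if c > 1})
    surplus = sum(c - 1 for c in repeated.values())
    return len(repeated), surplus, repeated

# Toy records in the report's column order, shortened to four fields for brevity.
rows = [
    ("2022-02-28", "황열", "신규", "UAE, 오만"),
    ("2022-02-28", "황열", "신규", "UAE, 오만"),
    ("2022-02-28", "황열", "신규", "UAE, 오만"),
    ("2022-07-18", "황열", "신규", "소말리아"),
    ("2022-10-06", "황열", "신규", "선원실습"),
    ("2022-10-06", "황열", "신규", "선원실습"),
]
n_groups, n_surplus, repeated = duplicate_summary(rows)
for row, count in repeated.most_common():
    print(count, row)
```

Large groups that share a date, destination, station, and institution, as in the top rows of the table, are more likely to be group vaccinations than data-entry errors, so they should be inspected before any rows are dropped.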