Overview

Dataset statistics

Number of variables: 9
Number of observations: 744
Missing cells: 429
Missing cells (%): 6.4%
Duplicate rows: 3
Duplicate rows (%): 0.4%
Total size in memory: 53.9 KiB
Average record size in memory: 74.2 B
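Both percentages are shares of a larger whole. Missing cells are counted over all 744 × 9 = 6,696 cells, so 429 / 6,696 ≈ 6.4%. The average record size is the total memory divided by the number of rows. Below is a minimal Python sketch that uses the per-column missing counts reported under Variables. The helper names are hypothetical.

```python
# Per-column missing counts copied from the Variables section of this report.
MISSING_PER_COLUMN = {
    "집계년도": 0,
    "시군명": 0,
    "발생일자": 134,
    "최초신고일자": 295,
    "발생지역명1": 0,
    "발생지역명2": 0,
    "통계자료원인시설구분명": 0,
    "통계원인물질역학결과내역": 0,
    "통계환자수": 0,
}
N_ROWS = 744


def missing_cell_share(missing_per_column, n_rows):
    """Fraction of all cells (rows x columns) that are missing."""
    total_cells = n_rows * len(missing_per_column)
    return sum(missing_per_column.values()) / total_cells


def average_record_size(total_bytes, n_rows):
    """Mean in-memory size of one row, in bytes."""
    return total_bytes / n_rows


share = missing_cell_share(MISSING_PER_COLUMN, N_ROWS)
print(f"Missing cells: {sum(MISSING_PER_COLUMN.values())} ({share:.1%})")
# 53.9 KiB is itself rounded, so this only approximates the reported 74.2 B.
print(f"Average record size: {average_record_size(53.9 * 1024, N_ROWS):.1f} B")
```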

Variable types

Numeric: 2
Categorical: 4
DateTime: 2
Text: 1

Alerts

발생지역명1 has constant value "경기" [Constant]
Dataset has 3 (0.4%) duplicate rows [Duplicates]
통계환자수 is highly overall correlated with 시군명 [High correlation]
시군명 is highly overall correlated with 통계환자수 [High correlation]
발생일자 has 134 (18.0%) missing values [Missing]
최초신고일자 has 295 (39.7%) missing values [Missing]
통계환자수 is highly skewed (γ1 = 21.23225029) [Skewed]
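The Constant, Missing and Skewed alerts follow directly from the per-variable statistics reported below. Here is a minimal Python sketch of those three rules. It assumes a skewness threshold of 20, which is believed to match ydata-profiling's default `vars.num.skewness_threshold`; treat that value as an assumption. The function name `simple_alerts` is hypothetical, and the sketch does not model the duplicate or correlation alerts.

```python
# Assumption: 20 mirrors ydata-profiling's default vars.num.skewness_threshold.
SKEWNESS_THRESHOLD = 20


def simple_alerts(n_rows, n_distinct, n_missing, skewness=None):
    """Return alert labels for one column using simplified rules
    (a sketch, not the library's full alert logic)."""
    alerts = []
    if n_distinct == 1:
        alerts.append("Constant")
    if n_missing > 0:
        alerts.append(f"Missing ({n_missing / n_rows:.1%})")
    if skewness is not None and abs(skewness) > SKEWNESS_THRESHOLD:
        alerts.append("Skewed")
    return alerts


# Figures taken from the Variables section of this report
print(simple_alerts(744, n_distinct=1, n_missing=0))                         # 발생지역명1
print(simple_alerts(744, n_distinct=506, n_missing=134))                     # 발생일자
print(simple_alerts(744, n_distinct=93, n_missing=0, skewness=21.23225029))  # 통계환자수
print(simple_alerts(744, n_distinct=10, n_missing=0, skewness=0.27358194))   # 집계년도
```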

Reproduction

Analysis started: 2023-12-10 22:03:39.279465
Analysis finished: 2023-12-10 22:03:40.118144
Duration: 0.84 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

집계년도 (reporting year)
Real number (ℝ)

Distinct: 10
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.8253
Minimum: 2012
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.7 KiB

Quantile statistics

Minimum: 2012
5th percentile: 2012
Q1: 2014
Median: 2016
Q3: 2018
95th percentile: 2020.85
Maximum: 2021
Range: 9
Interquartile range (IQR): 4
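These quantiles are consistent with linear interpolation between order statistics, which is the default in pandas `Series.quantile`. For the 95th percentile, the position (744 - 1) × 0.95 = 705.85 falls between the 706th value (the last 2020) and the 707th (the first 2021), giving 2020.85. The sketch below rebuilds the 744 values of 집계년도 from the frequency table in this report and recomputes the quantiles. The helper names are hypothetical.

```python
import math

# Value counts of 집계년도, copied from the frequency table in this report.
YEAR_COUNTS = {2012: 90, 2013: 71, 2014: 100, 2015: 102, 2016: 97,
               2017: 73, 2018: 77, 2019: 56, 2020: 40, 2021: 38}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return [value for value in sorted(counts) for _ in range(counts[value])]


def quantile_linear(sorted_values, q):
    """Interpolate linearly between the order statistics around (n - 1) * q."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


years = expand(YEAR_COUNTS)
quantiles = {q: quantile_linear(years, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
for q, value in quantiles.items():
    print(f"{q:>4}: {value:g}")
print("IQR:", quantiles[0.75] - quantiles[0.25])
```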

Descriptive statistics

Standard deviation: 2.6027434
Coefficient of variation (CV): 0.0012911553
Kurtosis: -0.88636896
Mean: 2015.8253
Median absolute deviation (MAD): 2
Skewness: 0.27358194
Sum: 1499774
Variance: 6.7742731
Monotonicity: Decreasing
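The descriptive statistics above agree with pandas-style sample formulas:

- variance and standard deviation with n - 1 in the denominator,
- the adjusted Fisher-Pearson skewness, and
- the bias-corrected excess kurtosis, so a normal distribution scores 0.

The stdlib sketch below rebuilds 집계년도 from its frequency table and should reproduce the reported figures to their printed precision. That these are exactly the formulas ydata-profiling uses internally is an assumption.

```python
import math
import statistics

# Value counts of 집계년도, copied from the frequency table in this report.
YEAR_COUNTS = {2012: 90, 2013: 71, 2014: 100, 2015: 102, 2016: 97,
               2017: 73, 2018: 77, 2019: 56, 2020: 40, 2021: 38}


def describe(values):
    """Sample statistics using bias-corrected (pandas-style) formulas."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev) / n  # population central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    variance = m2 * n / (n - 1)  # sample variance (ddof = 1)
    std = math.sqrt(variance)
    # Adjusted Fisher-Pearson skewness (G1) and bias-corrected excess kurtosis (G2)
    skewness = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    g2 = m4 / m2 ** 2 - 3
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    return {
        "mean": mean, "variance": variance, "std": std, "cv": std / mean,
        "skewness": skewness, "kurtosis": kurtosis, "mad": mad, "sum": sum(values),
    }


years = [year for year in sorted(YEAR_COUNTS) for _ in range(YEAR_COUNTS[year])]
for name, value in describe(years).items():
    print(f"{name}: {value:.8g}")
```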
[Figure: histogram with fixed size bins (bins=10)]

Common Values

Value  Count  Frequency (%)
2015  102  13.7%
2014  100  13.4%
2016  97  13.0%
2012  90  12.1%
2018  77  10.3%
2017  73  9.8%
2013  71  9.5%
2019  56  7.5%
2020  40  5.4%
2021  38  5.1%

Minimum 10 values

Value  Count  Frequency (%)
2012  90  12.1%
2013  71  9.5%
2014  100  13.4%
2015  102  13.7%
2016  97  13.0%
2017  73  9.8%
2018  77  10.3%
2019  56  7.5%
2020  40  5.4%
2021  38  5.1%

Maximum 10 values

Value  Count  Frequency (%)
2021  38  5.1%
2020  40  5.4%
2019  56  7.5%
2018  77  10.3%
2017  73  9.8%
2016  97  13.0%
2015  102  13.7%
2014  100  13.4%
2013  71  9.5%
2012  90  12.1%

시군명 (city/county name)
Categorical

HIGH CORRELATION 

Distinct: 31
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
용인시  83
수원시  80
부천시  60
화성시  57
안산시  50
Other values (26)  414

Length

Max length: 4
Median length: 3
Mean length: 3.0524194
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%
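In this report, "Distinct" counts how many different values occur, while "Unique" counts the values that occur in exactly one row. So 시군명 has 31 different city names, and only one of them appears in a single row. Here is a small Python sketch of both counts; the function name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): how many different values occur, and how
    many of those occur exactly once."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Hypothetical toy column, not taken from the dataset
cities = ["용인시", "용인시", "수원시", "수원시", "고양시"]
print(distinct_and_unique(cities))

# Report arithmetic for 시군명: 31 distinct and 1 unique out of 744 rows
print(f"{31 / 744:.1%}", f"{1 / 744:.1%}")
```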

Sample

1st row: 고양시
2nd row: 고양시
3rd row: 고양시
4th row: 광명시
5th row: 광명시

Common Values

Value  Count  Frequency (%)
용인시  83  11.2%
수원시  80  10.8%
부천시  60  8.1%
화성시  57  7.7%
안산시  50  6.7%
김포시  39  5.2%
평택시  37  5.0%
성남시  36  4.8%
고양시  33  4.4%
남양주시  27  3.6%
Other values (21)  242  32.5%

Length

[Figure: histogram of lengths of the category]

발생일자 (outbreak date)
Date

MISSING 

Distinct: 506
Distinct (%): 83.0%
Missing: 134
Missing (%): 18.0%
Memory size: 5.9 KiB
Minimum: 2012-01-11 00:00:00
Maximum: 2018-12-10 00:00:00
[Figure: histogram with fixed size bins (bins=50)]
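The Distinct (%) of 83.0% for 발생일자 is computed over non-missing values only: 506 / (744 - 134) = 506 / 610 ≈ 82.95%. Dividing by all 744 rows would give about 68.0% instead. 최초신고일자 follows the same rule: 360 / 449 ≈ 80.2%. Below is a minimal sketch, assuming `None` marks a missing entry; the helper name is hypothetical.

```python
def distinct_percentage(values):
    """Share of distinct values among the non-missing entries (None = missing)."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100 * len(set(present)) / len(present)


# Hypothetical toy column, not taken from the dataset
dates = ["2012-01-11", None, "2012-01-11", "2018-12-10", None]
print(f"{distinct_percentage(dates):.1f}%")

# Report arithmetic for the two date columns
print(f"발생일자: {100 * 506 / (744 - 134):.1f}%")
print(f"최초신고일자: {100 * 360 / (744 - 295):.1f}%")
```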

최초신고일자 (first report date)
Date

MISSING 

Distinct: 360
Distinct (%): 80.2%
Missing: 295
Missing (%): 39.7%
Memory size: 5.9 KiB
Minimum: 2014-01-03 00:00:00
Maximum: 2018-12-11 00:00:00
[Figure: histogram with fixed size bins (bins=50)]

발생지역명1 (outbreak region name 1)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
경기  744

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기
2nd row: 경기
3rd row: 경기
4th row: 경기
5th row: 경기

Common Values

Value  Count  Frequency (%)
경기  744  100.0%

Length

[Figure: histogram of lengths of the category]


발생지역명2 (outbreak region name 2)
Text

Distinct: 77
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB

Length

Max length: 11
Median length: 5
Mean length: 5.2634409
Min length: 2

Characters and Unicode

Total characters: 3916
Distinct characters: 67
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
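As an illustration, Python's `unicodedata.category` returns the two-letter general category behind the category tables below. Hangul syllables are "Lo" (Other Letter), the space is "Zs" (Space Separator), and "(" and ")" are "Ps" and "Pe". The block lookup in this sketch is a hypothetical simplification that covers only the two ranges seen in this column. The report itself probably relies on a fuller Unicode database.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this column
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(texts):
    """Count characters by Unicode general category across all strings."""
    counts = Counter(unicodedata.category(ch) for text in texts for ch in text)
    return {CATEGORY_NAMES.get(cat, cat): n for cat, n in counts.items()}


def simple_block(ch):
    """Simplified block lookup covering only the ranges seen in this column."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"  # U+0000..U+007F (Basic Latin)
    if 0xAC00 <= cp <= 0xD7A3:
        return "Hangul"  # Hangul Syllables
    return "Other"


sample = ["경기 고양", "경기 광명", "(가)"]  # "(가)" is a hypothetical value
print(category_counts(sample))
print(Counter(simple_block(ch) for text in sample for ch in text))
```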

Unique

Unique: 10
Unique (%): 1.3%

Sample

1st row: 경기 고양
2nd row: 경기 고양
3rd row: 경기 고양
4th row: 경기 광명
5th row: 경기 광명

Word counts (each value is split into words, so the counts total 1,524 rather than 744)

Value  Count  Frequency (%)
경기  654  42.9%
용인  83  5.4%
수원  79  5.2%
부천  60  3.9%
화성  52  3.4%
안산  50  3.3%
김포  39  2.6%
성남  36  2.4%
고양  33  2.2%
평택  30  2.0%
Other values (47)  408  26.8%
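The word counts for this column total 1,524. That equals the 744 values plus the 780 space characters counted in this column, which is what splitting each value on whitespace would produce: "경기 고양" contributes "경기" and "고양". Below is a minimal sketch of such a word count. The helper name is hypothetical. The library may also normalise case or drop configured stop words, which this sketch does not do.

```python
from collections import Counter


def word_counts(values):
    """Count whitespace-separated words across all values of a text column."""
    return Counter(word for value in values for word in value.split())


# Illustrative values in the style of this column (some rows lack the "경기 " prefix)
values = ["경기 고양", "경기 고양", "경기 광명", "의정부", "화성"]
counts = word_counts(values)
print(counts.most_common())

# With single-space separators, total words = number of values + number of spaces
n_spaces = sum(v.count(" ") for v in values)
print(sum(counts.values()), len(values) + n_spaces)
```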

Most occurring characters

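Value  Count  Frequency (%)
(space)  780  19.9%
[not preserved]  664  17.0%
[not preserved]  654  16.7%
[not preserved]  109  2.8%
[not preserved]  104  2.7%
[not preserved]  103  2.6%
[not preserved]  102  2.6%
[not preserved]  102  2.6%
[not preserved]  91  2.3%
[not preserved]  90  2.3%
Other values (57)  1117  28.5%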

Most occurring categories

Value  Count  Frequency (%)
Other Letter  3116  79.6%
Space Separator  780  19.9%
Close Punctuation  10  0.3%
Open Punctuation  10  0.3%

Most frequent character per category

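Other Letter
Value  Count  Frequency (%)
[not preserved]  664  21.3%
[not preserved]  654  21.0%
[not preserved]  109  3.5%
[not preserved]  104  3.3%
[not preserved]  103  3.3%
[not preserved]  102  3.3%
[not preserved]  102  3.3%
[not preserved]  91  2.9%
[not preserved]  90  2.9%
[not preserved]  84  2.7%
Other values (54)  1013  32.5%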
Space Separator
Value  Count  Frequency (%)
(space)  780  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  10  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  10  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  3116  79.6%
Common  800  20.4%

Most frequent character per script

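Hangul
Value  Count  Frequency (%)
[not preserved]  664  21.3%
[not preserved]  654  21.0%
[not preserved]  109  3.5%
[not preserved]  104  3.3%
[not preserved]  103  3.3%
[not preserved]  102  3.3%
[not preserved]  102  3.3%
[not preserved]  91  2.9%
[not preserved]  90  2.9%
[not preserved]  84  2.7%
Other values (54)  1013  32.5%
Common
Value  Count  Frequency (%)
(space)  780  97.5%
)  10  1.2%
(  10  1.2%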

Most occurring blocks

Value  Count  Frequency (%)
Hangul  3116  79.6%
ASCII  800  20.4%

Most frequent character per block

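ASCII
Value  Count  Frequency (%)
(space)  780  97.5%
)  10  1.2%
(  10  1.2%
Hangul
Value  Count  Frequency (%)
[not preserved]  664  21.3%
[not preserved]  654  21.0%
[not preserved]  109  3.5%
[not preserved]  104  3.3%
[not preserved]  103  3.3%
[not preserved]  102  3.3%
[not preserved]  102  3.3%
[not preserved]  91  2.9%
[not preserved]  90  2.9%
[not preserved]  84  2.7%
Other values (54)  1013  32.5%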

통계자료원인시설구분명 (cause facility category)
Categorical

Distinct: 7
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
음식점  430
기타  112
학교 외 집단급식  75
학교급식  59
불명  40
Other values (2)  28

Length

Max length: 9
Median length: 3
Mean length: 3.4663978
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 음식점
2nd row: 음식점
3rd row: 기타
4th row: 기타
5th row: 음식점

Common Values

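Value  Count  Frequency (%)
음식점 (restaurant)  430  57.8%
기타 (other)  112  15.1%
학교 외 집단급식 (group catering other than schools)  75  10.1%
학교급식 (school meals)  59  7.9%
불명 (unknown)  40  5.4%
가정집 (private home)  18  2.4%
학교 (school)  10  1.3%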

Length

[Figure: histogram of lengths of the category]

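
Word counts (multi-word values such as 학교 외 집단급식 contribute each word, so the counts total 894)

Value  Count  Frequency (%)
음식점  430  48.1%
기타  112  12.5%
학교  85  9.5%
외  75  8.4%
집단급식  75  8.4%
학교급식  59  6.6%
불명  40  4.5%
가정집  18  2.0%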

통계원인물질역학결과내역 (causative agent from epidemiological investigation)
Categorical

Distinct: 19
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
불검출  189
불명  149
노로바이러스  141
병원성대장균  68
살모넬라  43
Other values (14)  154

Length

Max length: 15
Median length: 14
Mean length: 4.1518817
Min length: 2

Unique

Unique: 4
Unique (%): 0.5%

Sample

1st row: 살모넬라
2nd row: 살모넬라
3rd row: 리스테리아 모노사이토제네스
4th row: 노로바이러스
5th row: 살모넬라

Common Values

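Value  Count  Frequency (%)
불검출 (not detected)  189  25.4%
불명 (unknown)  149  20.0%
노로바이러스 (norovirus)  141  19.0%
병원성대장균 (pathogenic E. coli)  68  9.1%
살모넬라 (Salmonella)  43  5.8%
퍼프린젠스 (Clostridium perfringens)  43  5.8%
원충 (protozoa)  33  4.4%
캠필로박터제주니 (Campylobacter jejuni)  30  4.0%
진행중 (investigation in progress)  13  1.7%
장염비브리오 (Vibrio parahaemolyticus)  10  1.3%
Other values (9)  25  3.4%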

Length

[Figure: histogram of lengths of the category]

Word counts (multi-word values contribute each word, so the counts total 747)

Value  Count  Frequency (%)
불검출  189  25.3%
불명  149  19.9%
노로바이러스  142  19.0%
병원성대장균  68  9.1%
살모넬라  43  5.8%
퍼프린젠스  43  5.8%
원충  33  4.4%
캠필로박터제주니  30  4.0%
진행중  13  1.7%
장염비브리오  10  1.3%
Other values (11)  27  3.6%

통계환자수 (number of patients)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 93
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.719086
Minimum: 2
Maximum: 2975
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.7 KiB

Quantile statistics

Minimum: 2
5th percentile: 2
Q1: 3
Median: 6
Q3: 17
95th percentile: 72.7
Maximum: 2975
Range: 2973
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 119.46461
Coefficient of variation (CV): 5.258337
Kurtosis: 509.41261
Mean: 22.719086
Median absolute deviation (MAD): 3
Skewness: 21.23225
Sum: 16903
Variance: 14271.793
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value  Count  Frequency (%)
3  123  16.5%
2  98  13.2%
4  85  11.4%
6  70  9.4%
5  62  8.3%
7  18  2.4%
12  17  2.3%
8  16  2.2%
17  15  2.0%
11  12  1.6%
Other values (83)  228  30.6%

Minimum 10 values

Value  Count  Frequency (%)
2  98  13.2%
3  123  16.5%
4  85  11.4%
5  62  8.3%
6  70  9.4%
7  18  2.4%
8  16  2.2%
9  11  1.5%
10  10  1.3%
11  12  1.6%

Maximum 10 values

Value  Count  Frequency (%)
2975  1  0.1%
1001  1  0.1%
356  1  0.1%
330  1  0.1%
305  1  0.1%
291  1  0.1%
263  1  0.1%
211  1  0.1%
192  1  0.1%
188  1  0.1%

Interactions

[Figures: interaction plots (4 panels)]

Correlations

[Figure: correlation heatmap]
Variable | 집계년도 | 시군명 | 발생지역명2 | 통계자료원인시설구분명 | 통계원인물질역학결과내역 | 통계환자수
집계년도 | 1.000 | 0.357 | 0.663 | 0.348 | 0.637 | 0.000
시군명 | 0.357 | 1.000 | 1.000 | 0.253 | 0.343 | 0.819
발생지역명2 | 0.663 | 1.000 | 1.000 | 0.494 | 0.000 | 0.814
통계자료원인시설구분명 | 0.348 | 0.253 | 0.494 | 1.000 | 0.518 | 0.100
통계원인물질역학결과내역 | 0.637 | 0.343 | 0.000 | 0.518 | 1.000 | 0.000
통계환자수 | 0.000 | 0.819 | 0.814 | 0.100 | 0.000 | 1.000
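In this matrix, the strongest off-diagonal entry pairs 시군명 with 발생지역명2 (1.000), which is expected because both name the same city. Both of those columns also correlate strongly with 통계환자수 (0.819 and 0.814). The short Python sketch below lists the pairs at or above a cut-off. The 0.8 cut-off and the helper name are illustrative assumptions, not the threshold the report uses for its High correlation alerts.

```python
COLUMNS = ["집계년도", "시군명", "발생지역명2", "통계자료원인시설구분명",
           "통계원인물질역학결과내역", "통계환자수"]
# Matrix copied from the correlation table above (rows and columns in COLUMNS order)
MATRIX = [
    [1.000, 0.357, 0.663, 0.348, 0.637, 0.000],
    [0.357, 1.000, 1.000, 0.253, 0.343, 0.819],
    [0.663, 1.000, 1.000, 0.494, 0.000, 0.814],
    [0.348, 0.253, 0.494, 1.000, 0.518, 0.100],
    [0.637, 0.343, 0.000, 0.518, 1.000, 0.000],
    [0.000, 0.819, 0.814, 0.100, 0.000, 1.000],
]


def strong_pairs(columns, matrix, threshold):
    """Return upper-triangle pairs whose coefficient is >= threshold, strongest first."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if matrix[i][j] >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


for a, b, r in strong_pairs(COLUMNS, MATRIX, threshold=0.8):
    print(f"{a} ~ {b}: {r:.3f}")
```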

[Figure: correlation heatmap]
Variable | 통계자료원인시설구분명 | 통계원인물질역학결과내역 | 시군명
통계자료원인시설구분명 | 1.000 | 0.258 | 0.106
통계원인물질역학결과내역 | 0.258 | 1.000 | 0.097
시군명 | 0.106 | 0.097 | 1.000

[Figure: correlation heatmap]
Variable | 집계년도 | 통계환자수 | 시군명 | 통계자료원인시설구분명 | 통계원인물질역학결과내역
집계년도 | 1.000 | 0.100 | 0.143 | 0.224 | 0.292
통계환자수 | 0.100 | 1.000 | 0.571 | 0.069 | 0.000
시군명 | 0.143 | 0.571 | 1.000 | 0.106 | 0.097
통계자료원인시설구분명 | 0.224 | 0.069 | 0.106 | 1.000 | 0.258
통계원인물질역학결과내역 | 0.292 | 0.000 | 0.097 | 0.258 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
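A common way to compute nullity correlation, used for example by the missingno library's heatmap, is the Pearson correlation between the columns' is-missing indicators. A value of 1 means two columns are always missing together, and -1 means one is missing exactly when the other is present. Below is a stdlib sketch under that assumption, with `None` marking a missing value; the toy columns are hypothetical.

```python
import math


def nullity_correlation(a, b):
    """Pearson correlation between the is-missing indicators of two columns.

    Returns NaN when either column is always or never missing, because its
    indicator then has zero variance.
    """
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy columns (None = missing), not taken from the dataset
occurred = [None, None, "2016-04-20", "2012-06-09"]
reported = [None, None, "2016-04-21", None]
print(round(nullity_correlation(occurred, reported), 3))
```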

Sample

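First rows

Index | 집계년도 | 시군명 | 발생일자 | 최초신고일자 | 발생지역명1 | 발생지역명2 | 통계자료원인시설구분명 | 통계원인물질역학결과내역 | 통계환자수
0 | 2021 | 고양시 | <NA> | <NA> | 경기 | 경기 고양 | 음식점 | 살모넬라 | 106
1 | 2021 | 고양시 | <NA> | <NA> | 경기 | 경기 고양 | 음식점 | 살모넬라 | 5
2 | 2021 | 고양시 | <NA> | <NA> | 경기 | 경기 고양 | 기타 | 리스테리아 모노사이토제네스 | 17
3 | 2021 | 광명시 | <NA> | <NA> | 경기 | 경기 광명 | 기타 | 노로바이러스 | 6
4 | 2021 | 광명시 | <NA> | <NA> | 경기 | 경기 광명 | 음식점 | 살모넬라 | 2
5 | 2021 | 광주시 | <NA> | <NA> | 경기 | 경기 광주 | 학교 외 집단급식 | 병원성대장균 | 19
6 | 2021 | 광주시 | <NA> | <NA> | 경기 | 경기 광주 | 음식점 | 불명 | 10
7 | 2021 | 군포시 | <NA> | <NA> | 경기 | 경기 군포 | 기타 | 살모넬라 | 4
8 | 2021 | 남양주시 | <NA> | <NA> | 경기 | 경기 남양주 | 학교 | 클로스트리디움퍼프린젠스 | 46
9 | 2021 | 성남시 | <NA> | <NA> | 경기 | 경기 성남 | 음식점 | 살모넬라 | 192

Last rows

Index | 집계년도 | 시군명 | 발생일자 | 최초신고일자 | 발생지역명1 | 발생지역명2 | 통계자료원인시설구분명 | 통계원인물질역학결과내역 | 통계환자수
734 | 2012 | 의정부시 | 2012-06-09 | <NA> | 경기 | 의정부 | 음식점 | 살모넬라 | 10
735 | 2012 | 이천시 | 2012-12-24 | <NA> | 경기 | 이천 | 음식점 | 노로바이러스 | 21
736 | 2012 | 파주시 | 2012-08-08 | <NA> | 경기 | 파주 | 음식점 | 불검출 | 5
737 | 2012 | 평택시 | 2012-05-17 | <NA> | 경기 | 평택 | 불명 | 불검출 | 12
738 | 2012 | 포천시 | 2012-09-10 | <NA> | 경기 | 포천 | 기타 | 진행중 | 4
739 | 2012 | 화성시 | 2012-12-27 | <NA> | 경기 | 화성 | 음식점 | 진행중 | 4
740 | 2012 | 화성시 | 2012-12-07 | <NA> | 경기 | 화성 | 음식점 | 불검출 | 22
741 | 2012 | 화성시 | 2012-12-03 | <NA> | 경기 | 화성 | 음식점 | 불검출 | 6
742 | 2012 | 화성시 | 2012-09-09 | <NA> | 경기 | 화성 | 음식점 | 장염비브리오 | 3
743 | 2012 | 화성시 | 2012-02-20 | <NA> | 경기 | 화성 | 음식점 | 불검출 | 8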

Duplicate rows

Most frequently occurring

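Index | 집계년도 | 시군명 | 발생일자 | 최초신고일자 | 발생지역명1 | 발생지역명2 | 통계자료원인시설구분명 | 통계원인물질역학결과내역 | 통계환자수 | # duplicates
0 | 2016 | 김포시 | 2016-04-20 | 2016-04-21 | 경기 | 경기 김포 | 음식점 | 원충 | 3 | 2
1 | 2019 | 화성시 | <NA> | <NA> | 경기 | 경기 화성 | 학교 외 집단급식 | 노로바이러스 | 17 | 2
2 | 2020 | 용인시 | <NA> | <NA> | 경기 | 경기 용인 | 음식점 | 불명 | 6 | 2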
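Each group in the duplicate-rows table occurs twice (# duplicates = 2). The overview's count of 3 duplicate rows (0.4%) is consistent with counting only the extra copies: 3 groups × (2 - 1) = 3. This is how pandas `DataFrame.duplicated()` counts with its default `keep="first"`. Below is a stdlib sketch of that counting; the function name and the toy rows are hypothetical.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (extra copies, {row: occurrences}) for rows that repeat exactly."""
    counts = Counter(tuple(row) for row in rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    extra_copies = sum(c - 1 for c in repeated.values())
    return extra_copies, repeated


# Hypothetical toy table modelled on the first duplicate group above (None = <NA>)
row_a = (2016, "김포시", "2016-04-20", "2016-04-21", "경기", "경기 김포", "음식점", "원충", 3)
row_b = (2021, "고양시", None, None, "경기", "경기 고양", "음식점", "살모넬라", 5)
rows = [row_a, row_b, row_a]
n_dup, groups = duplicate_summary(rows)
print(n_dup, f"{n_dup / len(rows):.1%}")
for row, c in groups.items():
    print(c, row)

# Report arithmetic: three groups of two rows each, out of 744 rows
print(3 * (2 - 1), f"{3 / 744:.1%}")
```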