Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 55
Missing cells (%): 0.1%
Duplicate rows: 357
Duplicate rows (%): 3.6%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B
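
The percentages and the average record size follow from the raw counts. The short Python sketch below recomputes them from the figures in this report. The variable names are illustrative, and two things are assumed rather than confirmed by the report: that the missing-cell share is taken over all cells (variables × observations), and that 1 KiB is 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 6
n_observations = 10_000
missing_cells = 55
duplicate_rows = 357
total_size_kib = 566.4

# Assumption: the missing-cell share is taken over all cells
# (variables x observations).
missing_pct = 100 * missing_cells / (n_variables * n_observations)
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the rounded results agree with the reported 0.1%, 3.6% and 58.0 B.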

Variable types

Text: 1
Categorical: 3
Numeric: 1
Boolean: 1

Dataset

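Description: Information on the dates on which counselors viewed screening records and related details for each person subject to post-screening health management. Fields: (1) health follow-up management number; (2) lookup date (shown per viewing screen); (3) branch code; (4) deletion flag (Y: deleted, N: not deleted). Data coverage: the most recent one month by lookup date (July 28, 2023 to August 28, 2023).
URL: https://www.data.go.kr/data/15120942/fileData.do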

Alerts

발췌년도 (extraction year) has constant value "2023" [Constant]
발췌년월 (extraction year-month) has constant value "2023-08" [Constant]
삭제여부 (deletion flag) has constant value "False" [Constant]
Dataset has 357 (3.6%) duplicate rows [Duplicates]

Reproduction

Analysis started: 2023-12-12 15:08:35.247650
Analysis finished: 2023-12-12 15:08:35.836820
Duration: 0.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

건강사후관리번호 (health follow-up management number)
Text

Distinct: 3195
Distinct (%): 31.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 12.2772
Min length: 11

Characters and Unicode

Total characters: 122772
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
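
As a rough illustration of how such character properties can be looked up, the sketch below uses Python's standard `unicodedata` module to tally the general category of every character in two of this variable's sample values. It is a minimal example, not the profiler's own implementation. The two-letter codes correspond to the category names used in this report (Nd = Decimal Number, Lu = Uppercase Letter, Po = Other Punctuation, Sm = Math Symbol).

```python
import unicodedata
from collections import Counter

# Two values taken from the sample rows of this variable.
samples = ["5.02202E+14", "A01202301306493"]

# Count characters by their two-letter Unicode general category code.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

# Nd = Decimal Number, Lu = Uppercase Letter,
# Po = Other Punctuation, Sm = Math Symbol.
for code, count in category_counts.most_common():
    print(code, count)
```

The two samples are 11 and 15 characters long, which matches the minimum and maximum lengths reported for this variable.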

Unique

Unique: 3193
Unique (%): 31.9%

Sample

1st row: 5.02202E+14
2nd row: 5.02202E+14
3rd row: A01202301306493
4th row: 5.02202E+14
5th row: A01202301316368
Value                Count  Frequency (%)
5.02202e+14          3422   34.2%
5.01202e+14          3385   33.9%
a01202301312497      1      < 0.1%
a01202301332645      1      < 0.1%
a01202301304199      1      < 0.1%
a01202301304114      1      < 0.1%
a01202301325792      1      < 0.1%
a01202301308907      1      < 0.1%
a01202301319596      1      < 0.1%
a01202301309107      1      < 0.1%
Other values (3185)  3185   31.9%
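
Two values, 5.02202e+14 and 5.01202e+14, account for 6807 of the 10000 rows. Their shape suggests that numeric identifiers were converted to scientific notation before the file was exported, which would collapse many distinct IDs into the same string. That reading is an assumption, not something the report states. The sketch below flags such values with a regular expression; the pattern and the function name `looks_like_sci_notation` are illustrative.

```python
import re

# Matches strings such as "5.02202E+14":
# digits, optional fraction, E, optionally signed exponent.
SCI_NOTATION = re.compile(r"^\d+(?:\.\d+)?[eE][+-]?\d+$")


def looks_like_sci_notation(value: str) -> bool:
    """Return True when an ID string has the shape of a float in E-notation."""
    return bool(SCI_NOTATION.match(value.strip()))


# The five sample values reported for this variable.
ids = [
    "5.02202E+14",
    "5.02202E+14",
    "A01202301306493",
    "5.02202E+14",
    "A01202301316368",
]
flagged = [v for v in ids if looks_like_sci_notation(v)]
print(f"{len(flagged)} of {len(ids)} sample IDs are in E-notation")
```

If the original identifiers are needed, the source file would have to be re-exported with this column stored as text, since the lost digits cannot be recovered from the rounded values.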

Most occurring characters

Value             Count  Frequency (%)
2                 25761  21.0%
0                 25230  20.6%
1                 18905  15.4%
5                 8074   6.6%
4                 8053   6.6%
3                 7951   6.5%
.                 6807   5.5%
E                 6807   5.5%
+                 6807   5.5%
A                 3193   2.6%
Other values (4)  5184   4.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     99158  80.8%
Uppercase Letter   10000  8.1%
Other Punctuation  6807   5.5%
Math Symbol        6807   5.5%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      25761  26.0%
0      25230  25.4%
1      18905  19.1%
5      8074   8.1%
4      8053   8.1%
3      7951   8.0%
6      1307   1.3%
8      1294   1.3%
7      1292   1.3%
9      1291   1.3%

Uppercase Letter
Value  Count  Frequency (%)
E      6807   68.1%
A      3193   31.9%

Other Punctuation
Value  Count  Frequency (%)
.      6807   100.0%

Math Symbol
Value  Count  Frequency (%)
+      6807   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  112772  91.9%
Latin   10000   8.1%

Most frequent character per script

Common
Value             Count  Frequency (%)
2                 25761  22.8%
0                 25230  22.4%
1                 18905  16.8%
5                 8074   7.2%
4                 8053   7.1%
3                 7951   7.1%
.                 6807   6.0%
+                 6807   6.0%
6                 1307   1.2%
8                 1294   1.1%
Other values (2)  2583   2.3%

Latin
Value  Count  Frequency (%)
E      6807   68.1%
A      3193   31.9%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  122772  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
2                 25761  21.0%
0                 25230  20.6%
1                 18905  15.4%
5                 8074   6.6%
4                 8053   6.6%
3                 7951   6.5%
.                 6807   5.5%
E                 6807   5.5%
+                 6807   5.5%
A                 3193   2.6%
Other values (4)  5184   4.2%

건강사후업무구분코드 (health follow-up task type code)
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
502    3422
501    3385
A01    3193

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 502
2nd row: 502
3rd row: A01
4th row: 502
5th row: A01

Common Values

Value  Count  Frequency (%)
502    3422   34.2%
501    3385   33.9%
A01    3193   31.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
502    3422   34.2%
501    3385   33.9%
a01    3193   31.9%

수행지사코드 (performing branch code)
Real number (ℝ)

Distinct: 179
Distinct (%): 1.8%
Missing: 55
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 387.14781
Minimum: 101
Maximum: 9998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 101
5-th percentile: 109
Q1: 211
Median: 313
Q3: 562
95-th percentile: 755
Maximum: 9998
Range: 9897
Interquartile range (IQR): 351

Descriptive statistics

Standard deviation: 431.40286
Coefficient of variation (CV): 1.1143105
Kurtosis: 368.91229
Mean: 387.14781
Median Absolute Deviation (MAD): 173
Skewness: 16.729079
Sum: 3850185
Variance: 186108.42
Monotonicity: Not monotonic
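
Several of these figures are derived from others, which makes them easy to cross-check. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum for 수행지사코드 from the reported minimum, maximum, quartiles, mean and standard deviation. It assumes the sum is taken over the 9945 non-missing observations (10000 minus the 55 missing values). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Figures copied from the quantile and descriptive statistics of 수행지사코드.
minimum, maximum = 101, 9998
q1, q3 = 211, 562
mean = 387.14781
std_dev = 431.40286

# Assumption: statistics are computed over non-missing values only.
n_non_missing = 10_000 - 55

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std_dev / mean               # Coefficient of variation (CV)
variance = std_dev ** 2           # Variance
total = mean * n_non_missing      # Sum

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.2f}")
print(f"Sum: {total:.0f}")
```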
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
112                 168    1.7%
321                 142    1.4%
232                 141    1.4%
312                 132    1.3%
235                 119    1.2%
551                 113    1.1%
251                 113    1.1%
767                 112    1.1%
342                 111    1.1%
131                 108    1.1%
Other values (169)  8686   86.9%

Minimum 10 values

Value  Count  Frequency (%)
101    70     0.7%
103    48     0.5%
104    56     0.6%
105    77     0.8%
106    83     0.8%
107    54     0.5%
108    56     0.6%
109    65     0.7%
110    64     0.6%
111    103    1.0%

Maximum 10 values

Value  Count  Frequency (%)
9998   15     0.1%
802    28     0.3%
801    60     0.6%
771    51     0.5%
769    27     0.3%
767    112    1.1%
765    35     0.4%
762    22     0.2%
759    25     0.2%
757    57     0.6%
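
The value 9998 sits far above every other code in the table above (the next largest is 802), which is consistent with the high skewness and kurtosis reported for this variable. It may be a special or placeholder branch code, but that is only a guess. One common way to flag such values is Tukey's 1.5 × IQR rule, sketched below using the quartiles reported for 수행지사코드. The choice of rule is an illustration, not something the profiler applies here.

```python
# Quartiles copied from the quantile statistics of 수행지사코드.
q1, q3 = 211, 562
iqr = q3 - q1

# Tukey's rule: values more than 1.5 * IQR beyond the quartiles
# are treated as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The four largest codes listed in the table above.
largest_codes = [9998, 802, 801, 771]
outliers = [code for code in largest_codes if code > upper_fence]

print(f"upper fence = {upper_fence}, flagged = {outliers}")
```

Because branch codes are identifiers and not measurements, a flagged value is worth checking against the code list before excluding it.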

발췌년도 (extraction year)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
2023   10000

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023
2nd row: 2023
3rd row: 2023
4th row: 2023
5th row: 2023

Common Values

Value  Count  Frequency (%)
2023   10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
2023   10000  100.0%

발췌년월 (extraction year-month)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value    Count
2023-08  10000

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-08
2nd row: 2023-08
3rd row: 2023-08
4th row: 2023-08
5th row: 2023-08

Common Values

Value    Count  Frequency (%)
2023-08  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count  Frequency (%)
2023-08  10000  100.0%

삭제여부 (deletion flag)
Boolean

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 87.9 KiB

Value  Count  Frequency (%)
False  10000  100.0%

Interactions


Correlations

                      건강사후업무구분코드  수행지사코드
건강사후업무구분코드  1.000                 0.016
수행지사코드          0.016                 1.000

                      수행지사코드  건강사후업무구분코드
수행지사코드          1.000         0.027
건강사후업무구분코드  0.027         1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row    건강사후관리번호  건강사후업무구분코드  수행지사코드  발췌년도  발췌년월  삭제여부
83493  5.02202E+14       502                   129           2023      2023-08   N
61400  5.02202E+14       502                   756           2023      2023-08   N
205    A01202301306493   A01                   251           2023      2023-08   N
80691  5.02202E+14       502                   251           2023      2023-08   N
9275   A01202301316368   A01                   721           2023      2023-08   N
82527  5.02202E+14       502                   652           2023      2023-08   N
84351  5.02202E+14       502                   704           2023      2023-08   N
19291  A01202301318210   A01                   716           2023      2023-08   N
49151  5.01202E+14       501                   314           2023      2023-08   N
36004  5.02202E+14       502                   108           2023      2023-08   N

Last rows

Row    건강사후관리번호  건강사후업무구분코드  수행지사코드  발췌년도  발췌년월  삭제여부
9995   A01202301325681   A01                   305           2023      2023-08   N
72408  5.02202E+14       502                   751           2023      2023-08   N
26270  A01202301326909   A01                   237           2023      2023-08   N
24096  A01202301312497   A01                   264           2023      2023-08   N
32492  5.02202E+14       502                   321           2023      2023-08   N
28801  5.01202E+14       501                   130           2023      2023-08   N
20156  A01202301328538   A01                   254           2023      2023-08   N
10150  A01202301304833   A01                   140           2023      2023-08   N
43069  A01202301330182   A01                   318           2023      2023-08   N
78164  5.01202E+14       501                   221           2023      2023-08   N

Duplicate rows

Most frequently occurring

Row  건강사후관리번호  건강사후업무구분코드  수행지사코드  발췌년도  발췌년월  삭제여부  # duplicates
9    5.01202E+14       501                   111           2023      2023-08   N         66
188  5.02202E+14       502                   112           2023      2023-08   N         65
81   5.01202E+14       501                   321           2023      2023-08   N         61
10   5.01202E+14       501                   112           2023      2023-08   N         60
172  5.01202E+14       501                   767           2023      2023-08   N         59
227  5.02202E+14       502                   232           2023      2023-08   N         52
229  5.02202E+14       502                   235           2023      2023-08   N         52
56   5.01202E+14       501                   251           2023      2023-08   N         51
48   5.01202E+14       501                   232           2023      2023-08   N         50
65   5.01202E+14       501                   302           2023      2023-08   N         50
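
The ten counts above already add up to 566, more than the 357 duplicate rows reported in the overview. This suggests that 357 is the number of distinct row patterns that repeat, not the number of rows involved. The sketch below shows the difference between the two measures using `collections.Counter` on a handful of made-up rows; the row values and variable names are illustrative only.

```python
from collections import Counter

# Toy rows (ID, task type code, branch code); the values are illustrative only.
rows = [
    ("5.01202E+14", "501", 111),
    ("5.01202E+14", "501", 111),
    ("5.01202E+14", "501", 111),
    ("5.02202E+14", "502", 112),
    ("5.02202E+14", "502", 112),
    ("A01202301306493", "A01", 251),
]

counts = Counter(rows)

# Row patterns that occur more than once, with their occurrence counts.
duplicated = {row: n for row, n in counts.items() if n > 1}

n_patterns = len(duplicated)                # distinct repeated patterns
n_rows_involved = sum(duplicated.values())  # rows belonging to those patterns

print(f"{n_patterns} repeated patterns covering {n_rows_involved} rows")
```

Every row listed above has an ID in scientific notation, so many of these apparent duplicates may be distinct records whose identifiers were rounded to the same value. That is a hypothesis to check against the source file, not a finding of the report.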