Overview

Dataset statistics

Number of variables: 7
Number of observations: 6783
Missing cells: 594
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 397.6 KiB
Average record size in memory: 60.0 B
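
The derived figures in this block can be reproduced from the raw counts. The sketch below assumes that "Missing cells (%)" is the missing count divided by rows times columns, and that the average record size is the total memory divided by the number of rows. Both definitions are assumptions about how the report derives them, although they do reproduce the printed values.

```python
# Values copied from the dataset statistics in this report.
n_variables = 7
n_observations = 6783
missing_cells = 594
total_memory_kib = 397.6

# Assumed definition: share of missing cells among all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumed definition: total memory in bytes divided by the number of rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```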

Variable types

Categorical: 2
Numeric: 3
DateTime: 1
Text: 1

Dataset

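Description:
o (Content) Records of issued lists of persons subject to workplace health checkups
o (Scope) Workplaces enrolled in health insurance that have persons subject to a health checkup in the current year
o (Variable layout)
  1 작성차수 (compilation round)
  2 사업장기호 (workplace code)
  3 단위사업장기호 (unit workplace code)
  4 검진년도 (checkup year)
  5 EDI진행상태 (EDI processing status; 1: unprocessed, 2: processed, 3: returned, 4: deleted)
  6 신청일시 (date and time of the EDI application, e.g. 20200612110446 = 2020-06-12 11:04:46)
  7 송수신파일명 (transmitted file name, which contains the branch, the workplace code, and the application/issuance timestamp)
o (Data coverage) The most recent one month as of the query date (2023-07-28 to 2023-08-28)
URL: https://www.data.go.kr/data/15121843/fileData.do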
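
The variable layout in the description gives a 14-digit timestamp format for 신청일시 and four status codes for EDI진행상태. A minimal sketch of decoding both is shown here; the class name, the function name, and the English member names are illustrative choices, not part of the dataset.

```python
from datetime import datetime
from enum import IntEnum


class EdiStatus(IntEnum):
    """EDI진행상태 codes from the variable layout (English names are illustrative)."""
    UNPROCESSED = 1  # 미처리
    PROCESSED = 2    # 처리완료
    RETURNED = 3     # 반송
    DELETED = 4      # 삭제


def parse_application_timestamp(raw: str) -> datetime:
    """Parse a 14-digit 신청일시 string (YYYYMMDDhhmmss) into a datetime."""
    return datetime.strptime(raw, "%Y%m%d%H%M%S")


# The example value quoted in the dataset description.
ts = parse_application_timestamp("20200612110446")
print(ts.isoformat())
print(EdiStatus(3).name)
```

Note that in the profiled data the 신청일시 column already appears as a parsed DateTime, so the parser is mainly useful for raw exports that still carry the 14-digit form.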

Alerts

EDI진행상태 is highly imbalanced (74.4%) [Imbalance]
송수신파일명 has 594 (8.8%) missing values [Missing]
단위사업장기호 is highly skewed (γ1 = 50.13889188) [Skewed]
단위사업장기호 has 6737 (99.3%) zeros [Zeros]
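
The 74.4% imbalance figure is consistent with an entropy-based score, one minus the Shannon entropy of the class distribution divided by its maximum. The report does not state the formula, so treating it this way is an assumption; the sketch below simply shows that the category counts of EDI진행상태 listed in this report reproduce the number.

```python
import math

# Category counts for EDI진행상태, copied from the Common Values table of this report.
counts = {"2": 6185, "3": 477, "4": 77, "1": 44}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0  # convention chosen here; the score is not meaningful for a single class
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```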

Reproduction

Analysis started: 2023-12-12 23:21:21.588396
Analysis finished: 2023-12-12 23:21:23.017391
Duration: 1.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

작성차수
Categorical

Distinct: 31
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 53.1 KiB

20230822ZZ: 439
20230817ZZ: 426
20230810ZZ: 411
20230821ZZ: 398
20230816ZZ: 398
Other values (26): 4711

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 20230728ZZ
2nd row: 20230728ZZ
3rd row: 20230728ZZ
4th row: 20230728ZZ
5th row: 20230728ZZ

Common Values

Value | Count | Frequency (%)
20230822ZZ | 439 | 6.5%
20230817ZZ | 426 | 6.3%
20230810ZZ | 411 | 6.1%
20230821ZZ | 398 | 5.9%
20230816ZZ | 398 | 5.9%
20230824ZZ | 374 | 5.5%
20230828ZZ | 372 | 5.5%
20230823ZZ | 371 | 5.5%
20230818ZZ | 344 | 5.1%
20230825ZZ | 329 | 4.9%
Other values (21) | 2921 | 43.1%

Length

[Figure: histogram of lengths of the category]

Value | Count | Frequency (%)
20230822zz | 439 | 6.5%
20230817zz | 426 | 6.3%
20230810zz | 411 | 6.1%
20230821zz | 398 | 5.9%
20230816zz | 398 | 5.9%
20230824zz | 374 | 5.5%
20230828zz | 372 | 5.5%
20230823zz | 371 | 5.5%
20230818zz | 344 | 5.1%
20230825zz | 329 | 4.9%
Other values (21) | 2921 | 43.1%

사업장기호
Real number (ℝ)

Distinct: 5411
Distinct (%): 79.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72453903
Minimum: 10000021
Maximum: 79720343
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 59.7 KiB

Quantile statistics

Minimum: 10000021
5-th percentile: 70065399
Q1: 71043966
Median: 74670090
Q3: 77330700
95-th percentile: 79325956
Maximum: 79720343
Range: 69720322
Interquartile range (IQR): 6286733.5

Descriptive statistics

Standard deviation: 11022652
Coefficient of variation (CV): 0.1521333
Kurtosis: 23.098609
Mean: 72453903
Median Absolute Deviation (MAD): 3372853
Skewness: -4.7191784
Sum: 4.9145482 × 10^11
Variance: 1.2149885 × 10^14
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
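
Several of the 사업장기호 statistics are simple functions of the others, which makes them easy to sanity-check. The sketch below recomputes them from the printed values using the usual textbook definitions; because the report shows rounded inputs, the results only match to within rounding.

```python
# Values copied from the 사업장기호 statistics in this report.
n = 6783
mean = 72453903
std = 11022652
minimum, maximum = 10000021, 79720343
q1, q3 = 71043966, 77330700  # quartiles as displayed (rounded)

cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # off by up to 1 from 6286733.5 because Q1/Q3 are rounded
total = mean * n                 # approximate sum, since the displayed mean is rounded
variance = std ** 2              # approximate variance, since the displayed std is rounded

print(f"CV       = {cv:.7f}")
print(f"range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"sum      = {total:.7e}")
print(f"variance = {variance:.7e}")
```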
Value | Count | Frequency (%)
71108887 | 13 | 0.2%
71106811 | 12 | 0.2%
71114420 | 12 | 0.2%
72296310 | 11 | 0.2%
76384358 | 10 | 0.1%
70123482 | 10 | 0.1%
71098879 | 10 | 0.1%
76755096 | 9 | 0.1%
71103263 | 9 | 0.1%
75605310 | 8 | 0.1%
Other values (5401) | 6679 | 98.5%

Value | Count | Frequency (%)
10000021 | 1 | < 0.1%
10000070 | 2 | < 0.1%
10000158 | 1 | < 0.1%
10000164 | 1 | < 0.1%
10000186 | 1 | < 0.1%
10000224 | 1 | < 0.1%
10000229 | 2 | < 0.1%
10000354 | 1 | < 0.1%
10000583 | 1 | < 0.1%
10000600 | 1 | < 0.1%

Value | Count | Frequency (%)
79720343 | 1 | < 0.1%
79719144 | 1 | < 0.1%
79718714 | 1 | < 0.1%
79718423 | 1 | < 0.1%
79717774 | 1 | < 0.1%
79717673 | 1 | < 0.1%
79717043 | 1 | < 0.1%
79715829 | 3 | < 0.1%
79715777 | 1 | < 0.1%
79715458 | 1 | < 0.1%

단위사업장기호
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.058823529
Minimum: 0
Maximum: 117
Zeros: 6737
Zeros (%): 99.3%
Negative: 0
Negative (%): 0.0%
Memory size: 59.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 117
Range: 117
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.0808464
Coefficient of variation (CV): 35.374389
Kurtosis: 2666.5059
Mean: 0.058823529
Median Absolute Deviation (MAD): 0
Skewness: 50.138892
Sum: 399
Variance: 4.3299218
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=14)]
Value | Count | Frequency (%)
0 | 6737 | 99.3%
1 | 33 | 0.5%
5 | 2 | < 0.1%
13 | 1 | < 0.1%
18 | 1 | < 0.1%
3 | 1 | < 0.1%
117 | 1 | < 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%
110 | 1 | < 0.1%
Other values (4) | 4 | 0.1%

Value | Count | Frequency (%)
0 | 6737 | 99.3%
1 | 33 | 0.5%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 2 | < 0.1%
7 | 1 | < 0.1%
13 | 1 | < 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%

Value | Count | Frequency (%)
117 | 1 | < 0.1%
110 | 1 | < 0.1%
49 | 1 | < 0.1%
18 | 1 | < 0.1%
17 | 1 | < 0.1%
16 | 1 | < 0.1%
13 | 1 | < 0.1%
7 | 1 | < 0.1%
5 | 2 | < 0.1%
4 | 1 | < 0.1%
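
Taken together, the three frequency tables for 단위사업장기호 list all 14 distinct values (the four "Other values" are 2, 4, 7 and 49), so the summary statistics can be recomputed from counts alone. The sketch below does this; it assumes the report's variance is the unbiased (n - 1) estimate and its skewness is the adjusted Fisher-Pearson coefficient, which is an inference from the numbers matching rather than something the report states.

```python
import math

# Full value -> count table for 단위사업장기호, assembled from the frequency tables
# in this report (14 distinct values; counts add up to the 6783 observations).
freq = {0: 6737, 1: 33, 2: 1, 3: 1, 4: 1, 5: 2, 7: 1, 13: 1,
        16: 1, 17: 1, 18: 1, 49: 1, 110: 1, 117: 1}

n = sum(freq.values())
total = sum(v * c for v, c in freq.items())
mean = total / n

# Second and third central moments computed from the frequency table.
m2 = sum(c * (v - mean) ** 2 for v, c in freq.items()) / n
m3 = sum(c * (v - mean) ** 3 for v, c in freq.items()) / n

sample_variance = m2 * n / (n - 1)  # unbiased (n - 1) variance, assumed
# Adjusted Fisher-Pearson skewness, assumed to be the estimator behind the report's value.
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
zeros_pct = 100 * freq[0] / n

print(n, total)
print(f"mean={mean:.9f} variance={sample_variance:.7f} "
      f"skewness={skewness:.4f} zeros={zeros_pct:.1f}%")
```

With 99.3% of rows at zero and a handful of values above 100, a skewness near 50 is what one would expect: a few large values dominate the third moment.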

검진년도
Real number (ℝ)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2022.8664
Minimum: 2018
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 59.7 KiB

Quantile statistics

Minimum: 2018
5-th percentile: 2022
Q1: 2023
Median: 2023
Q3: 2023
95-th percentile: 2023
Maximum: 2023
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.55339843
Coefficient of variation (CV): 0.00027357141
Kurtosis: 27.009929
Mean: 2022.8664
Median Absolute Deviation (MAD): 0
Skewness: -5.0054495
Sum: 13721103
Variance: 0.30624982
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=6)]
Value | Count | Frequency (%)
2023 | 6281 | 92.6%
2022 | 281 | 4.1%
2021 | 95 | 1.4%
2020 | 71 | 1.0%
2019 | 53 | 0.8%
2018 | 2 | < 0.1%

Value | Count | Frequency (%)
2018 | 2 | < 0.1%
2019 | 53 | 0.8%
2020 | 71 | 1.0%
2021 | 95 | 1.4%
2022 | 281 | 4.1%
2023 | 6281 | 92.6%

Value | Count | Frequency (%)
2023 | 6281 | 92.6%
2022 | 281 | 4.1%
2021 | 95 | 1.4%
2020 | 71 | 1.0%
2019 | 53 | 0.8%
2018 | 2 | < 0.1%

EDI진행상태
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 53.1 KiB

2: 6185
3: 477
4: 77
1: 44

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 6185 | 91.2%
3 | 477 | 7.0%
4 | 77 | 1.1%
1 | 44 | 0.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value | Count | Frequency (%)
2 | 6185 | 91.2%
3 | 477 | 7.0%
4 | 77 | 1.1%
1 | 44 | 0.6%

신청일시
DateTime

Distinct: 6758
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Memory size: 53.1 KiB
Minimum: 2023-07-28 07:52:34
Maximum: 2023-08-28 20:55:20
[Figure: histogram with fixed size bins (bins=50)]

송수신파일명
Text

MISSING 

Distinct: 5883
Distinct (%): 95.1%
Missing: 594
Missing (%): 8.8%
Memory size: 53.1 KiB

Length

Max length: 41
Median length: 41
Mean length: 41
Min length: 41

Characters and Unicode

Total characters: 253749
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5620
Unique (%): 90.8%

Sample

1st row: 000s2_00_0208_10001482_20230731095439.dat
2nd row: 000s2_00_0402_10003720_20230728174529.dat
3rd row: 000s2_00_0226_10006082_20230728152602.dat
4th row: 000s2_00_0701_10006277_20230728162344.dat
5th row: 000s2_01_0762_10007554_20230801103140.dat
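
The sample file names share a fixed 41-character layout of five underscore-separated fields followed by `.dat`. The sketch below parses that layout. The pattern is inferred from the sample rows only, and the field labels are assumptions: the dataset description says the name contains the branch, the workplace code and a timestamp, but it does not say which position holds the branch, and the meaning of the first two fields is not documented. `TransferFile` and `parse_filename` are hypothetical names used for illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Pattern inferred from the sample rows; group names are illustrative labels,
# not official field names.
FILENAME_RE = re.compile(
    r"^(?P<prefix>[0-9a-z]{5})_(?P<flag>\d{2})_(?P<branch>\d{4})_"
    r"(?P<workplace>\d{8})_(?P<stamp>\d{14})\.dat$"
)


class TransferFile(NamedTuple):
    prefix: str          # meaning not documented
    flag: str            # meaning not documented
    branch: str          # assumed to be the branch (지사) code
    workplace_code: int  # equals 사업장기호 in the sample rows
    issued_at: datetime  # 14-digit timestamp in the file name


def parse_filename(name: str) -> Optional[TransferFile]:
    """Return the parsed fields, or None if the name does not fit the inferred pattern."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return TransferFile(
        m["prefix"], m["flag"], m["branch"], int(m["workplace"]),
        datetime.strptime(m["stamp"], "%Y%m%d%H%M%S"),
    )


parsed = parse_filename("000s2_00_0208_10001482_20230731095439.dat")
print(parsed)
```

Returning `None` for non-matching names, rather than raising, is a design choice that keeps the parser usable on the 594 rows where the file name is missing or might not follow the pattern.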
Value | Count | Frequency (%)
000s2_00_0132_70123482_20230828163750.dat | 10 | 0.2%
000s2_00_0327_71449412_20230807144641.dat | 5 | 0.1%
000s2_00_0301_72770264_20230811174530.dat | 5 | 0.1%
000s2_00_0312_75690029_20230810153707.dat | 4 | 0.1%
000s2_00_0332_70535795_20230822162249.dat | 4 | 0.1%
000s2_00_0105_77031219_20230814085032.dat | 4 | 0.1%
000s2_00_0316_78743271_20230811140342.dat | 4 | 0.1%
000s2_00_0416_78594442_20230828105249.dat | 4 | 0.1%
000s2_00_0320_76947695_20230821130726.dat | 3 | < 0.1%
000s2_00_0302_78989756_20230822111137.dat | 3 | < 0.1%
Other values (5873) | 6143 | 99.3%

Most occurring characters

Value | Count | Frequency (%)
0 | 64129 | 25.3%
2 | 32333 | 12.7%
_ | 24756 | 9.8%
1 | 21626 | 8.5%
3 | 17707 | 7.0%
7 | 14291 | 5.6%
8 | 13071 | 5.2%
5 | 10529 | 4.1%
4 | 10283 | 4.1%
6 | 7562 | 3.0%
Other values (6) | 37462 | 14.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 198048 | 78.0%
Connector Punctuation | 24756 | 9.8%
Lowercase Letter | 24756 | 9.8%
Other Punctuation | 6189 | 2.4%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 64129 | 32.4%
2 | 32333 | 16.3%
1 | 21626 | 10.9%
3 | 17707 | 8.9%
7 | 14291 | 7.2%
8 | 13071 | 6.6%
5 | 10529 | 5.3%
4 | 10283 | 5.2%
6 | 7562 | 3.8%
9 | 6517 | 3.3%

Lowercase Letter
Value | Count | Frequency (%)
s | 6189 | 25.0%
d | 6189 | 25.0%
a | 6189 | 25.0%
t | 6189 | 25.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 24756 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 6189 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 228993 | 90.2%
Latin | 24756 | 9.8%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 64129 | 28.0%
2 | 32333 | 14.1%
_ | 24756 | 10.8%
1 | 21626 | 9.4%
3 | 17707 | 7.7%
7 | 14291 | 6.2%
8 | 13071 | 5.7%
5 | 10529 | 4.6%
4 | 10283 | 4.5%
6 | 7562 | 3.3%
Other values (2) | 12706 | 5.5%

Latin
Value | Count | Frequency (%)
s | 6189 | 25.0%
d | 6189 | 25.0%
a | 6189 | 25.0%
t | 6189 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 253749 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 64129 | 25.3%
2 | 32333 | 12.7%
_ | 24756 | 9.8%
1 | 21626 | 8.5%
3 | 17707 | 7.0%
7 | 14291 | 5.6%
8 | 13071 | 5.2%
5 | 10529 | 4.1%
4 | 10283 | 4.1%
6 | 7562 | 3.0%
Other values (6) | 37462 | 14.8%

Interactions

[Figures: nine interaction plots; images not reproduced]

Correlations

[Figure: correlation heatmap]
 | 작성차수 | 사업장기호 | 단위사업장기호 | 검진년도 | EDI진행상태
작성차수 | 1.000 | 0.144 | 0.000 | 0.412 | 0.191
사업장기호 | 0.144 | 1.000 | 0.000 | 0.063 | 0.071
단위사업장기호 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
검진년도 | 0.412 | 0.063 | 0.000 | 1.000 | 0.121
EDI진행상태 | 0.191 | 0.071 | 0.000 | 0.121 | 1.000

[Figure: correlation heatmap]
 | EDI진행상태 | 작성차수
EDI진행상태 | 1.000 | 0.100
작성차수 | 0.100 | 1.000

[Figure: correlation heatmap]
 | 사업장기호 | 단위사업장기호 | 검진년도 | 작성차수 | EDI진행상태
사업장기호 | 1.000 | -0.091 | 0.063 | 0.064 | 0.046
단위사업장기호 | -0.091 | 1.000 | -0.061 | 0.000 | 0.000
검진년도 | 0.063 | -0.061 | 1.000 | 0.185 | 0.109
작성차수 | 0.064 | 0.000 | 0.185 | 1.000 | 0.100
EDI진행상태 | 0.046 | 0.000 | 0.109 | 0.100 | 1.000

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index) | 작성차수 | 사업장기호 | 단위사업장기호 | 검진년도 | EDI진행상태 | 신청일시 | 송수신파일명
0 | 20230728ZZ | 10001482 | 0 | 2023 | 2 | 2023-07-28 11:54:02 | 000s2_00_0208_10001482_20230731095439.dat
1 | 20230728ZZ | 10003720 | 0 | 2023 | 2 | 2023-07-28 15:35:07 | 000s2_00_0402_10003720_20230728174529.dat
2 | 20230728ZZ | 10006082 | 0 | 2023 | 2 | 2023-07-28 15:18:10 | 000s2_00_0226_10006082_20230728152602.dat
3 | 20230728ZZ | 10006277 | 0 | 2023 | 2 | 2023-07-28 16:20:00 | 000s2_00_0701_10006277_20230728162344.dat
4 | 20230728ZZ | 10007554 | 0 | 2023 | 2 | 2023-07-28 09:40:33 | 000s2_01_0762_10007554_20230801103140.dat
5 | 20230728ZZ | 10008193 | 0 | 2023 | 2 | 2023-07-28 13:51:22 | 000s2_00_0210_10008193_20230803080649.dat
6 | 20230728ZZ | 70015432 | 0 | 2023 | 2 | 2023-07-28 14:39:37 | 000s2_00_0113_70015432_20230728154129.dat
7 | 20230728ZZ | 70015692 | 0 | 2023 | 2 | 2023-07-28 13:02:24 | 000s2_00_0134_70015692_20230728130631.dat
8 | 20230728ZZ | 70038740 | 0 | 2023 | 2 | 2023-07-28 11:29:07 | 000s2_00_0112_70038740_20230728130100.dat
9 | 20230728ZZ | 70053235 | 0 | 2023 | 2 | 2023-07-28 15:17:03 | 000s2_00_0328_70053235_20230728155639.dat

(index) | 작성차수 | 사업장기호 | 단위사업장기호 | 검진년도 | EDI진행상태 | 신청일시 | 송수신파일명
6773 | 20230828ZZ | 71175550 | 0 | 2022 | 2 | 2023-08-28 14:21:35 | 000s2_00_0551_71175550_20230828144412.dat
6774 | 20230828ZZ | 79318289 | 0 | 2023 | 2 | 2023-08-28 14:53:02 | 000s2_00_0769_79318289_20230828150135.dat
6775 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 14:34:45 | 000s2_00_0132_70123482_20230828163750.dat
6776 | 20230828ZZ | 70563862 | 0 | 2023 | 2 | 2023-08-28 11:20:43 | 000s2_00_0111_70563862_20230828112746.dat
6777 | 20230828ZZ | 71175550 | 0 | 2023 | 2 | 2023-08-28 14:21:39 | 000s2_00_0551_71175550_20230828144419.dat
6778 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 14:35:34 | 000s2_00_0132_70123482_20230828163750.dat
6779 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 14:36:14 | 000s2_00_0132_70123482_20230828163750.dat
6780 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 14:38:28 | 000s2_00_0132_70123482_20230828163750.dat
6781 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 15:39:47 | 000s2_00_0132_70123482_20230828163750.dat
6782 | 20230828ZZ | 70123482 | 0 | 2023 | 2 | 2023-08-28 15:41:57 | 000s2_00_0132_70123482_20230828163750.dat
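
In the sample rows, the workplace code embedded in 송수신파일명 equals the 사업장기호 column, and the timestamp in the file name is never earlier than 신청일시. The sketch below checks both properties on three rows copied from the sample. Whether they hold for every row is not established by the report, so this is a validation idea rather than a documented rule, and `check_row` is a hypothetical helper.

```python
from datetime import datetime

# Three rows copied from the sample: (사업장기호, 신청일시, 송수신파일명).
rows = [
    (10001482, "2023-07-28 11:54:02", "000s2_00_0208_10001482_20230731095439.dat"),
    (10007554, "2023-07-28 09:40:33", "000s2_01_0762_10007554_20230801103140.dat"),
    (70123482, "2023-08-28 15:41:57", "000s2_00_0132_70123482_20230828163750.dat"),
]


def check_row(workplace_code, applied_at, filename):
    """Return (code_matches, stamped_at_or_after_application) for one row."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    file_code = int(parts[3])                                 # fourth field: workplace code
    file_stamp = datetime.strptime(parts[4], "%Y%m%d%H%M%S")  # fifth field: timestamp
    applied = datetime.strptime(applied_at, "%Y-%m-%d %H:%M:%S")
    return file_code == workplace_code, file_stamp >= applied


results = [check_row(*row) for row in rows]
print(results)
```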