Overview

Dataset statistics

Number of variables: 12
Number of observations: 10000
Missing cells: 29116
Missing cells (%): 24.3%
Duplicate rows: 2391
Duplicate rows (%): 23.9%
Total size in memory: 1.1 MiB
Average record size in memory: 113.0 B
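The two percentages above follow from the raw counts: missing cells are measured against all cells (variables × observations), duplicate rows against all observations. A quick arithmetic check in plain Python; the helper name `pct` is introduced here for illustration and is not part of the report:

```python
# Counts copied from the Dataset statistics listed above.
n_variables = 12
n_observations = 10_000
missing_cells = 29_116
duplicate_rows = 2_391

def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

total_cells = n_variables * n_observations            # every cell in the table
missing_pct = pct(missing_cells, total_cells)         # share of empty cells
duplicate_pct = pct(duplicate_rows, n_observations)   # share of duplicate rows

print(missing_pct, duplicate_pct)
```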

Variable types

Categorical: 2
Numeric: 5
Unsupported: 2
Boolean: 1
DateTime: 2

Dataset

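Description: Basic safety and health education implementation plan information (기초안전보건교육 실시계획정보)
Author: Korea Occupational Safety and Health Agency (한국산업안전보건공단)
URL: https://www.data.go.kr/data/15066095/fileData.do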

Alerts

DELETE_AT has constant value "" (Constant)
Dataset has 2391 (23.9%) duplicate rows (Duplicates)
PLAN_SN is highly overall correlated with EDC_DE and 2 other fields (High correlation)
EDC_DE is highly overall correlated with PLAN_SN and 2 other fields (High correlation)
EDC_BEGIN_TIME is highly overall correlated with EDC_END_TIME (High correlation)
EDC_END_TIME is highly overall correlated with EDC_BEGIN_TIME (High correlation)
DIPLOMA_ISSU_DE is highly overall correlated with PLAN_SN and 2 other fields (High correlation)
EDU_YEAR is highly overall correlated with PLAN_SN and 2 other fields (High correlation)
FILE_EXTSN has 10000 (100.0%) missing values (Missing)
FILE_SIZE has 10000 (100.0%) missing values (Missing)
LAST_UPDUSR_PNTTM has 9099 (91.0%) missing values (Missing)
FILE_EXTSN is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
FILE_SIZE is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
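
One possible way to act on the Missing, Constant and Duplicates alerts is to drop columns that carry no information and then drop repeated rows. The sketch below uses plain Python on a list of dictionaries. The function name `clean` and the toy rows are made up for illustration; this is not how the report itself was produced.

```python
from typing import Any

def clean(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop all-missing and constant columns, then drop exact duplicate rows."""
    if not rows:
        return []
    keep = []
    for col in rows[0]:
        present = {r[col] for r in rows if r[col] is not None}
        # 0 distinct values: entirely missing; 1: constant (ignoring gaps).
        if len(present) > 1:
            keep.append(col)
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in keep)
        if key not in seen:              # keep only the first copy of each row
            seen.add(key)
            out.append({c: r[c] for c in keep})
    return out

# Toy rows (hypothetical values, not taken from the dataset).
rows = [
    {"PLAN_SN": 1590, "SEXDSTN": "2", "FILE_SIZE": None, "DELETE_AT": "N"},
    {"PLAN_SN": 1590, "SEXDSTN": "2", "FILE_SIZE": None, "DELETE_AT": "N"},
    {"PLAN_SN": 1069, "SEXDSTN": "1", "FILE_SIZE": None, "DELETE_AT": None},
]
cleaned = clean(rows)
print(cleaned)
```

Duplicates are removed after the uninformative columns are dropped, so rows that differed only in a dropped column also collapse into one.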

Reproduction

Analysis started: 2023-12-12 20:31:38.094344
Analysis finished: 2023-12-12 20:31:42.867013
Duration: 4.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

EDU_YEAR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2015  3907
2016  3893
2017  2200

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016
2nd row: 2015
3rd row: 2015
4th row: 2016
5th row: 2016

Common Values

Value  Count  Frequency (%)
2015   3907   39.1%
2016   3893   38.9%
2017   2200   22.0%

Length

Histogram of lengths of the category

Common Values (Plot)

PLAN_SN
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1761
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3408.2516
Minimum: 820
Maximum: 6589
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 820
5-th percentile: 1041
Q1: 1599
Median: 3871.5
Q3: 5027
95-th percentile: 5928
Maximum: 6589
Range: 5769
Interquartile range (IQR): 3428

Descriptive statistics

Standard deviation: 1743.1857
Coefficient of variation (CV): 0.51146038
Kurtosis: -1.5546027
Mean: 3408.2516
Median Absolute Deviation (MAD): 1700.5
Skewness: -0.01954985
Sum: 34082516
Variance: 3038696.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1964                 16     0.2%
1686                 15     0.1%
1666                 15     0.1%
5442                 15     0.1%
5581                 14     0.1%
4702                 14     0.1%
1423                 14     0.1%
1069                 13     0.1%
4394                 13     0.1%
5928                 13     0.1%
Other values (1751)  9858   98.6%

Value  Count  Frequency (%)
820    1      < 0.1%
845    2      < 0.1%
879    9      0.1%
880    5      0.1%
881    3      < 0.1%
882    7      0.1%
883    8      0.1%
884    11     0.1%
885    6      0.1%
886    6      0.1%

Value  Count  Frequency (%)
6589   3      < 0.1%
6537   1      < 0.1%
6495   7      0.1%
6492   6      0.1%
6236   4      < 0.1%
6235   5      0.1%
6232   7      0.1%
6229   10     0.1%
6227   7      0.1%
6226   7      0.1%
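
Several of the PLAN_SN statistics are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short check in plain Python, using the rounded figures printed in this report, so the last digits can differ slightly:

```python
import math

# Figures copied from the PLAN_SN statistics in this report.
minimum, maximum = 820, 6589
q1, q3 = 1599, 5027
mean, std = 3408.2516, 1743.1857

value_range = maximum - minimum   # reported as Range
iqr = q3 - q1                     # reported as Interquartile range (IQR)
cv = std / mean                   # reported as Coefficient of variation (CV)
variance = std ** 2               # reported as Variance

print(value_range, iqr, round(cv, 6), round(variance, 1))
# Compare against the reported CV and variance, allowing for rounding.
print(math.isclose(cv, 0.51146038, abs_tol=1e-6),
      math.isclose(variance, 3038696.2, rel_tol=1e-6))
```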

EDC_DE
Real number (ℝ)

HIGH CORRELATION 

Distinct: 489
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20159056
Minimum: 20141231
Maximum: 20170701
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20141231
5-th percentile: 20150709
Q1: 20151109
Median: 20160613
Q3: 20161103
95-th percentile: 20170525
Maximum: 20170701
Range: 29470
Interquartile range (IQR): 9994

Descriptive statistics

Standard deviation: 7444.5654
Coefficient of variation (CV): 0.00036929137
Kurtosis: -1.2358241
Mean: 20159056
Median Absolute Deviation (MAD): 9496
Skewness: 0.28856889
Sum: 2.0159056 × 10^11
Variance: 55421554
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20151127            88     0.9%
20150923            88     0.9%
20151124            81     0.8%
20170408            76     0.8%
20151117            74     0.7%
20150924            74     0.7%
20151126            73     0.7%
20151130            73     0.7%
20151120            70     0.7%
20170318            70     0.7%
Other values (479)  9233   92.3%

Value     Count  Frequency (%)
20141231  1      < 0.1%
20150528  4      < 0.1%
20150529  3      < 0.1%
20150601  8      0.1%
20150602  21     0.2%
20150603  9      0.1%
20150604  5      0.1%
20150605  6      0.1%
20150606  7      0.1%
20150609  4      < 0.1%

Value     Count  Frequency (%)
20170701  7      0.1%
20170630  9      0.1%
20170629  8      0.1%
20170627  1      < 0.1%
20170624  7      0.1%
20170623  1      < 0.1%
20170622  7      0.1%
20170621  19     0.2%
20170620  12     0.1%
20170619  1      < 0.1%
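
The EDC_DE values look like calendar dates packed into integers as YYYYMMDD. This is an inference from values such as 20151127, not something the report states. If that reading is right, the numeric Range of 29470 is a difference between packed integers, not a number of days. A minimal sketch of unpacking the values with the standard library; the helper name `parse_yyyymmdd` is hypothetical:

```python
from datetime import date

def parse_yyyymmdd(value: int) -> date:
    """Split an integer such as 20151127 into year, month and day."""
    year, rest = divmod(value, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)   # raises ValueError for impossible dates

# Minimum and maximum of EDC_DE as reported above.
first = parse_yyyymmdd(20141231)
last = parse_yyyymmdd(20170701)
span_days = (last - first).days     # calendar days between the two dates

print(first.isoformat(), last.isoformat(), span_days)
```

Parsing the column as dates before profiling would make statistics such as the range and the histogram bins reflect elapsed time rather than digit patterns.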

EDC_BEGIN_TIME
Real number (ℝ)

HIGH CORRELATION 

Distinct: 45
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1314.9538
Minimum: 400
Maximum: 1930
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 400
5-th percentile: 900
Q1: 1100
Median: 1400
Q3: 1400
95-th percentile: 1743
Maximum: 1930
Range: 1530
Interquartile range (IQR): 300

Descriptive statistics

Standard deviation: 268.57688
Coefficient of variation (CV): 0.20424815
Kurtosis: -0.55678601
Mean: 1314.9538
Median Absolute Deviation (MAD): 100
Skewness: -0.1817971
Sum: 13149538
Variance: 72133.542
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
Value              Count  Frequency (%)
1400               3147   31.5%
900                1513   15.1%
1300               1108   11.1%
1330               811    8.1%
930                507    5.1%
1000               412    4.1%
1600               358    3.6%
1730               299    3.0%
1830               291    2.9%
1630               286    2.9%
Other values (35)  1268   12.7%

Value  Count  Frequency (%)
400    10     0.1%
730    18     0.2%
830    11     0.1%
900    1513   15.1%
901    2      < 0.1%
903    2      < 0.1%
908    2      < 0.1%
910    1      < 0.1%
913    1      < 0.1%
914    1      < 0.1%

Value  Count  Frequency (%)
1930   4      < 0.1%
1900   25     0.2%
1850   7      0.1%
1830   291    2.9%
1800   173    1.7%
1740   13     0.1%
1730   299    3.0%
1720   268    2.7%
1700   23     0.2%
1630   286    2.9%

EDC_END_TIME
Real number (ℝ)

HIGH CORRELATION 

Distinct: 33
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1604.505
Minimum: 700
Maximum: 2230
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 700
5-th percentile: 1200
Q1: 1500
Median: 1700
Q3: 1700
95-th percentile: 2043
Maximum: 2230
Range: 1530
Interquartile range (IQR): 200

Descriptive statistics

Standard deviation: 256.50184
Coefficient of variation (CV): 0.15986353
Kurtosis: -0.39130953
Mean: 1604.505
Median Absolute Deviation (MAD): 100
Skewness: -0.27296423
Sum: 16045050
Variance: 65793.194
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)
Value              Count  Frequency (%)
1700               3127   31.3%
1200               1516   15.2%
1600               1110   11.1%
1630               727    7.3%
1230               495    5.0%
1300               397    4.0%
1900               382    3.8%
2130               295    2.9%
1530               290    2.9%
1930               270    2.7%
Other values (23)  1391   13.9%

Value  Count  Frequency (%)
700    10     0.1%
1000   7      0.1%
1030   24     0.2%
1100   5      0.1%
1130   11     0.1%
1200   1516   15.2%
1230   495    5.0%
1300   397    4.0%
1330   7      0.1%
1400   9      0.1%

Value  Count  Frequency (%)
2230   4      < 0.1%
2200   25     0.2%
2150   7      0.1%
2130   295    2.9%
2100   169    1.7%
2040   13     0.1%
2030   68     0.7%
2000   21     0.2%
1930   270    2.7%
1900   382    3.8%
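
EDC_BEGIN_TIME and EDC_END_TIME appear to hold clock times written as HHMM integers, for example 1330 for 13:30. This is an inference from the values, not something the report states. Under that assumption, means such as 1314.9538 and 1604.505 are not clock times, and a session length has to be computed in minutes rather than by subtracting the raw integers. A minimal sketch; both function names are hypothetical:

```python
def hhmm_to_minutes(value: int) -> int:
    """Convert an HHMM integer such as 1330 into minutes after midnight."""
    hours, minutes = divmod(value, 100)
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError(f"not a valid HHMM value: {value}")
    return hours * 60 + minutes

def session_minutes(begin: int, end: int) -> int:
    """Length of a session in minutes, assuming it ends on the same day."""
    return hhmm_to_minutes(end) - hhmm_to_minutes(begin)

# Begin/end pairs as they appear in the Sample section of this report.
print(session_minutes(1330, 1630))
print(session_minutes(900, 1200))
print(session_minutes(1730, 1830))
```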

DIPLOMA_ISSU_DE
Real number (ℝ)

HIGH CORRELATION 

Distinct: 488
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20159057
Minimum: 20150528
Maximum: 20170701
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20150528
5-th percentile: 20150709
Q1: 20151109
Median: 20160613
Q3: 20161103
95-th percentile: 20170525
Maximum: 20170701
Range: 20173
Interquartile range (IQR): 9994

Descriptive statistics

Standard deviation: 7442.5925
Coefficient of variation (CV): 0.00036919349
Kurtosis: -1.2374855
Mean: 20159057
Median Absolute Deviation (MAD): 9496
Skewness: 0.28966142
Sum: 2.0159057 × 10^11
Variance: 55392183
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20150923            88     0.9%
20151127            88     0.9%
20151124            81     0.8%
20170408            76     0.8%
20151117            74     0.7%
20150924            74     0.7%
20151130            73     0.7%
20151126            73     0.7%
20151123            70     0.7%
20170318            70     0.7%
Other values (478)  9233   92.3%

Value     Count  Frequency (%)
20150528  4      < 0.1%
20150529  3      < 0.1%
20150601  8      0.1%
20150602  21     0.2%
20150603  9      0.1%
20150604  5      0.1%
20150605  6      0.1%
20150606  7      0.1%
20150611  15     0.1%
20150612  17     0.2%

Value     Count  Frequency (%)
20170701  7      0.1%
20170630  9      0.1%
20170629  8      0.1%
20170627  1      < 0.1%
20170624  7      0.1%
20170623  1      < 0.1%
20170622  7      0.1%
20170621  19     0.2%
20170620  12     0.1%
20170619  1      < 0.1%

SEXDSTN
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2  5900
1  4100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      5900   59.0%
1      4100   41.0%

Length

Histogram of lengths of the category

Common Values (Plot)

FILE_EXTSN
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

FILE_SIZE
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

DELETE_AT
Boolean

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 6
Missing (%): 0.1%
Memory size: 97.7 KiB

Value      Count  Frequency (%)
False      9994   99.9%
(Missing)  6      0.1%

FRST_REGISTER_PNTTM
Date

Distinct: 3578
Distinct (%): 35.8%
Missing: 11
Missing (%): 0.1%
Memory size: 156.2 KiB
Minimum: 2015-05-28 09:00:00
Maximum: 2017-07-05 09:36:54
Histogram with fixed size bins (bins=50)

LAST_UPDUSR_PNTTM
Date

MISSING 

Distinct: 574
Distinct (%): 63.7%
Missing: 9099
Missing (%): 91.0%
Memory size: 156.2 KiB
Minimum: 2015-11-13 15:54:30
Maximum: 2017-07-25 17:08:59
Histogram with fixed size bins (bins=50)

Interactions


Correlations

Variable         EDU_YEAR  PLAN_SN  EDC_DE  EDC_BEGIN_TIME  EDC_END_TIME  DIPLOMA_ISSU_DE  SEXDSTN
EDU_YEAR         1.000     0.935    1.000   0.273           0.366         0.970            0.060
PLAN_SN          0.935     1.000    0.935   0.330           0.308         0.948            0.164
EDC_DE           1.000     0.935    1.000   0.273           0.366         1.000            0.060
EDC_BEGIN_TIME   0.273     0.330    0.273   1.000           0.961         0.353            0.314
EDC_END_TIME     0.366     0.308    0.366   0.961           1.000         0.244            0.297
DIPLOMA_ISSU_DE  0.970     0.948    1.000   0.353           0.244         1.000            0.197
SEXDSTN          0.060     0.164    0.060   0.314           0.297         0.197            1.000

Variable  SEXDSTN  EDU_YEAR
SEXDSTN   1.000    0.099
EDU_YEAR  0.099    1.000

Variable         PLAN_SN  EDC_DE  EDC_BEGIN_TIME  EDC_END_TIME  DIPLOMA_ISSU_DE  EDU_YEAR  SEXDSTN
PLAN_SN          1.000    0.987   0.004           -0.003        0.987            0.915     0.126
EDC_DE           0.987    1.000   0.004           -0.003        1.000            1.000     0.099
EDC_BEGIN_TIME   0.004    0.004   1.000           0.979         0.004            0.182     0.237
EDC_END_TIME     -0.003   -0.003  0.979           1.000         -0.004           0.175     0.297
DIPLOMA_ISSU_DE  0.987    1.000   0.004           -0.004        1.000            1.000     0.131
EDU_YEAR         0.915    1.000   0.182           0.175         1.000            1.000     0.099
SEXDSTN          0.126    0.099   0.237           0.297         0.131            0.099     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
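
Nullity correlation can be read as the ordinary Pearson correlation between the "is present" indicators of two columns: +1 when two columns are always missing together, -1 when one is present exactly where the other is missing. The plain-Python sketch below illustrates the idea on toy columns. The function name `nullity_correlation` and the data are made up for illustration, and the plotting code behind the report may differ in detail.

```python
from math import sqrt
from typing import Optional, Sequence

def nullity_correlation(a: Sequence[Optional[object]],
                        b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the presence indicators of two columns."""
    x = [0 if v is None else 1 for v in a]
    y = [0 if v is None else 1 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        # A column that is always present (or always missing) has no variance.
        raise ValueError("nullity correlation is undefined for this column")
    return cov / sqrt(var_x * var_y)

# Toy columns: None marks a missing value.
col_a = [1, None, 3, None]
col_b = ["x", None, "y", None]   # missing in the same rows as col_a
col_c = [None, 5, None, 7]       # missing exactly where col_a is present

print(nullity_correlation(col_a, col_b))
print(nullity_correlation(col_a, col_c))
```

Columns with no missing values, or with only missing values, have no defined nullity correlation, which is why the sketch raises an error for them.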

Sample

(index)  EDU_YEAR  PLAN_SN  EDC_DE    EDC_BEGIN_TIME  EDC_END_TIME  DIPLOMA_ISSU_DE  SEXDSTN  FILE_EXTSN  FILE_SIZE  DELETE_AT  FRST_REGISTER_PNTTM  LAST_UPDUSR_PNTTM
46904    2016      4789     20160928  1330            1630          20160928         2        <NA>        <NA>       N          2016-09-27 17:21:40  NaT
13538    2015      1590     20150922  1400            1700          20150922         2        <NA>        <NA>       N          2015-09-22 09:00:00  NaT
20612    2015      1691     20151127  1300            1600          20151127         1        <NA>        <NA>       N          2015-11-26 17:57:41  NaT
34256    2016      4067     20160701  1400            1700          20160701         2        <NA>        <NA>       N          2016-07-07 14:39:20  NaT
26684    2016      3392     20160526  1730            2030          20160526         2        <NA>        <NA>       N          2016-05-27 13:41:32  NaT
39756    2016      4476     20160820  900             1200          20160820         2        <NA>        <NA>       N          2016-08-20 11:31:34  NaT
16869    2015      1607     20150924  1330            1630          20150924         2        <NA>        <NA>       N          2015-09-24 09:00:00  NaT
56015    2017      5402     20170328  1300            1600          20170328         2        <NA>        <NA>       N          2017-03-30 19:49:01  2017-03-30 19:49:46
7201     2015      1232     20150601  900             1200          20150601         1        <NA>        <NA>       N          2015-06-01 09:00:00  NaT
43934    2016      4781     20160924  930             1230          20160924         2        <NA>        <NA>       N          2016-09-30 14:58:17  2016-09-30 14:58:49

(index)  EDU_YEAR  PLAN_SN  EDC_DE    EDC_BEGIN_TIME  EDC_END_TIME  DIPLOMA_ISSU_DE  SEXDSTN  FILE_EXTSN  FILE_SIZE  DELETE_AT  FRST_REGISTER_PNTTM  LAST_UPDUSR_PNTTM
52924    2017      5350     20170324  1300            1600          20170324         2        <NA>        <NA>       N          2017-03-28 18:49:49  2017-03-28 18:50:36
41303    2016      4223     20160721  1330            1630          20160721         1        <NA>        <NA>       N          2016-07-22 15:40:24  NaT
19533    2015      1945     20151121  900             1200          20151121         2        <NA>        <NA>       N          2015-11-24 18:35:30  NaT
39496    2016      4054     20160706  1430            1730          20160706         2        <NA>        <NA>       N          2016-07-08 09:25:03  NaT
47723    2016      5025     20161030  1300            1600          20161030         2        <NA>        <NA>       N          2016-10-30 14:45:23  NaT
38166    2016      4309     20160818  1730            1830          20160818         1        <NA>        <NA>       N          2016-08-22 10:26:21  NaT
53019    2017      5582     20170415  1000            1300          20170415         2        <NA>        <NA>       N          2017-04-19 12:28:57  2017-04-19 14:33:54
48520    2015      1314     20151027  900             1200          20151027         1        <NA>        <NA>       N          2016-10-28 17:41:48  NaT
31162    2016      2312     20160415  1300            1600          20160415         2        <NA>        <NA>       N          2016-04-18 14:02:21  NaT
22794    2015      2166     20151201  1300            1600          20151201         2        <NA>        <NA>       N          2015-12-01 18:38:49  NaT

Duplicate rows

Most frequently occurring

(index)  EDU_YEAR  PLAN_SN  EDC_DE    EDC_BEGIN_TIME  EDC_END_TIME  DIPLOMA_ISSU_DE  SEXDSTN  DELETE_AT  FRST_REGISTER_PNTTM  LAST_UPDUSR_PNTTM  # duplicates
522      2015      1590     20150922  1400            1700          20150922         2        N          2015-09-22 09:00:00  NaT                12
134      2015      1069     20150625  1400            1700          20150625         1        N          2015-06-25 09:00:00  NaT                11
364      2015      1423     20150923  1830            2130          20150923         2        N          2015-09-23 09:00:00  NaT                11
9        2015      884      20151027  1000            1300          20151027         2        N          2015-11-05 00:00:00  NaT                10
64       2015      1014     20150714  1400            1700          20150714         2        N          2015-07-14 09:00:00  NaT                10
66       2015      1016     20150716  1400            1700          20150716         2        N          2015-07-16 09:00:00  NaT                10
70       2015      1020     20150722  1400            1700          20150722         2        N          2015-07-22 09:00:00  NaT                10
171      2015      1091     20150728  1400            1700          20150728         1        N          2015-07-28 09:00:00  NaT                10
189      2015      1103     20150703  1400            1700          20150703         1        N          2015-07-03 09:00:00  NaT                10
212      2015      1120     20150814  1400            1700          20150814         1        N          2015-08-14 09:00:00  NaT                10
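
A table of the most frequently repeated rows can be built by counting identical rows and keeping those that occur more than once. The sketch below does this in plain Python with `collections.Counter`. The rows are shortened toy tuples with hypothetical contents, and the report does not state how it counts duplicates, so treat this as one possible approach only.

```python
from collections import Counter

# Toy rows: (EDU_YEAR, PLAN_SN, EDC_DE, EDC_BEGIN_TIME, EDC_END_TIME).
rows = [
    (2015, 1590, 20150922, 1400, 1700),
    (2015, 1590, 20150922, 1400, 1700),
    (2015, 1069, 20150625, 1400, 1700),
    (2015, 1590, 20150922, 1400, 1700),
    (2016, 4789, 20160928, 1330, 1630),
]

counts = Counter(rows)   # identical tuples collapse into one key
# Keep rows that occur more than once, most frequent first.
duplicated = [(row, n) for row, n in counts.most_common() if n > 1]

for row, n in duplicated:
    print(row, n)
```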