Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 10000
Missing cells (%): 33.3%
Duplicate rows: 935
Duplicate rows (%): 9.3%
Total size in memory: 332.0 KiB
Average record size in memory: 34.0 B
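
The percentages and the average record size follow from the raw counts. A minimal sketch of that arithmetic, using only the figures listed above (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the dataset statistics listed above.
n_variables = 3
n_observations = 10_000
missing_cells = 10_000
duplicate_rows = 935
total_size_kib = 332.0

total_cells = n_variables * n_observations              # 30,000 cells in total
missing_pct = 100 * missing_cells / total_cells         # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations   # share of duplicated rows
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells:   {missing_pct:.1f}%")
print(f"duplicate rows:  {duplicate_pct:.2f}%")
print(f"avg record size: {avg_record_bytes:.1f} B")
```

The duplicate share works out to 9.35%, which the report displays as 9.3%.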

Variable types

Numeric: 2
Categorical: 1

Dataset

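Description: Provides data on learners' learning activity on the smart vocational training platform (STEP) of the Online Lifelong Education Institute at 한국기술교육대학교 (Korea University of Technology and Education).
Author: 한국기술교육대학교 (Korea University of Technology and Education)
URL: https://www.data.go.kr/data/15091062/fileData.do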

Alerts

Dataset has 935 (9.3%) duplicate rows [Duplicates]
과목 배움 콘텐츠 아이디 is highly overall correlated with 마이그레이션 원천 구분 [High correlation]
과목 평가항복 아이디 is highly overall correlated with 마이그레이션 원천 구분 [High correlation]
마이그레이션 원천 구분 is highly overall correlated with 과목 배움 콘텐츠 아이디 and 1 other field [High correlation]
과목 배움 콘텐츠 아이디 has 2609 (26.1%) missing values [Missing]
과목 평가항복 아이디 has 7391 (73.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 16:58:57.611614
Analysis finished: 2023-12-12 16:58:59.129637
Duration: 1.52 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
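
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal example rather than the exact configuration used here: the function name `build_profile`, the file paths, and the report title are hypothetical, and the source file is assumed to be a CSV that `pandas.read_csv` can read with default settings.

```python
def build_profile(csv_path: str, html_path: str = "report.html") -> None:
    """Profile a CSV file and write the result as an HTML report."""
    # Imported inside the function so the definition loads even where
    # pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: default CSV parsing works; a downloaded file may need an
    # explicit encoding argument.
    df = pd.read_csv(csv_path)

    # Hypothetical title; the original report's settings live in config.json.
    report = ProfileReport(df, title="STEP learner activity")
    report.to_file(html_path)
```

Calling `build_profile("step_activity.csv")` (a hypothetical file name) would write `report.html` to the working directory.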

Variables

과목 배움 콘텐츠 아이디 (course learning content ID)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 4833
Distinct (%): 65.4%
Missing: 2609
Missing (%): 26.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1613249.6
Minimum: 1033
Maximum: 9078931
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1033
5-th percentile: 14294
Q1: 447592
Median: 970376
Q3: 2159797.5
95-th percentile: 5602201
Maximum: 9078931
Range: 9077898
Interquartile range (IQR): 1712205.5

Descriptive statistics

Standard deviation: 1773780.2
Coefficient of variation (CV): 1.0995076
Kurtosis: 3.2692741
Mean: 1613249.6
Median Absolute Deviation (MAD): 796131
Skewness: 1.8411746
Sum: 1.1923528 × 10^10
Variance: 3.146296 × 10^12
Monotonicity: Not monotonic
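
Several of these figures are derived from others in the same section. The sketch below is a consistency check using only the reported, rounded values (the variable names are illustrative): range and IQR are differences of quantiles, CV is the standard deviation divided by the mean, variance is the squared standard deviation, and the sum is the mean times the number of non-missing observations.

```python
import math

# Reported values for this variable (rounded as shown in the report).
minimum, maximum = 1033, 9078931
q1, q3 = 447592, 2159797.5
mean, std = 1613249.6, 1773780.2
n_present = 10_000 - 2609   # observations minus missing values

value_range = maximum - minimum   # reported: 9077898
iqr = q3 - q1                     # reported: 1712205.5
cv = std / mean                   # reported: 1.0995076
variance = std ** 2               # reported: 3.146296e12
total = mean * n_present          # reported: 1.1923528e10

# Inputs are rounded, so compare with a small relative tolerance.
print(value_range, iqr)
print(math.isclose(cv, 1.0995076, rel_tol=1e-4))
print(math.isclose(variance, 3.146296e12, rel_tol=1e-4))
print(math.isclose(total, 1.1923528e10, rel_tol=1e-4))
```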
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
14188 | 168 | 1.7%
14485 | 123 | 1.2%
1001194 | 112 | 1.1%
14394 | 94 | 0.9%
14294 | 79 | 0.8%
2211568 | 48 | 0.5%
2185703 | 48 | 0.5%
468568 | 47 | 0.5%
14068 | 43 | 0.4%
393565 | 42 | 0.4%
Other values (4823) | 6587 | 65.9%
(Missing) | 2609 | 26.1%

Value | Count | Frequency (%)
1033 | 1 | < 0.1%
1480 | 1 | < 0.1%
2253 | 1 | < 0.1%
2280 | 1 | < 0.1%
2287 | 3 | < 0.1%
2288 | 1 | < 0.1%
3010 | 1 | < 0.1%
3889 | 1 | < 0.1%
5374 | 1 | < 0.1%
6739 | 1 | < 0.1%

Value | Count | Frequency (%)
9078931 | 1 | < 0.1%
8745478 | 1 | < 0.1%
8740876 | 1 | < 0.1%
8734285 | 1 | < 0.1%
8733394 | 1 | < 0.1%
8732665 | 1 | < 0.1%
8731012 | 1 | < 0.1%
8727565 | 1 | < 0.1%
8717332 | 1 | < 0.1%
8707222 | 1 | < 0.1%

과목 평가항복 아이디 (course evaluation item ID)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 1685
Distinct (%): 64.6%
Missing: 7391
Missing (%): 73.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 119530.85
Minimum: 5943
Maximum: 367300
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 5943
5-th percentile: 24173.4
Q1: 63220
Median: 108958
Q3: 157936
95-th percentile: 270909
Maximum: 367300
Range: 361357
Interquartile range (IQR): 94716

Descriptive statistics

Standard deviation: 75713.27
Coefficient of variation (CV): 0.63342031
Kurtosis: 0.83372719
Mean: 119530.85
Median Absolute Deviation (MAD): 46351
Skewness: 0.98478179
Sum: 3.11856 × 10^8
Variance: 5.7324993 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
115313 | 90 | 0.9%
251629 | 31 | 0.3%
108958 | 27 | 0.3%
39413 | 26 | 0.3%
96418 | 22 | 0.2%
255730 | 19 | 0.2%
162838 | 19 | 0.2%
103657 | 18 | 0.2%
27610 | 16 | 0.2%
48725 | 15 | 0.1%
Other values (1675) | 2326 | 23.3%
(Missing) | 7391 | 73.9%

Value | Count | Frequency (%)
5943 | 2 | < 0.1%
6247 | 1 | < 0.1%
6248 | 1 | < 0.1%
6250 | 1 | < 0.1%
6258 | 1 | < 0.1%
6260 | 1 | < 0.1%
6263 | 1 | < 0.1%
6266 | 3 | < 0.1%
6268 | 1 | < 0.1%
6402 | 1 | < 0.1%

Value | Count | Frequency (%)
367300 | 1 | < 0.1%
367165 | 1 | < 0.1%
366688 | 1 | < 0.1%
366349 | 1 | < 0.1%
366163 | 1 | < 0.1%
366121 | 1 | < 0.1%
365974 | 1 | < 0.1%
365725 | 1 | < 0.1%
365680 | 1 | < 0.1%
365347 | 1 | < 0.1%

마이그레이션 원천 구분 (migration source type)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
OLEIPORTAL: 7902
<NA>: 2098

Length

Max length: 10
Median length: 10
Mean length: 8.7412
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OLEIPORTAL
2nd row: <NA>
3rd row: OLEIPORTAL
4th row: OLEIPORTAL
5th row: OLEIPORTAL

Common Values

Value | Count | Frequency (%)
OLEIPORTAL | 7902 | 79.0%
<NA> | 2098 | 21.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
oleiportal | 7902 | 79.0%
na | 2098 | 21.0%

Interactions

[Four interaction plots; images not preserved in this text.]

Correlations

Variable | 과목 배움 콘텐츠 아이디 | 과목 평가항복 아이디
과목 배움 콘텐츠 아이디 | 1.000 | NaN
과목 평가항복 아이디 | NaN | 1.000

Variable | 과목 배움 콘텐츠 아이디 | 과목 평가항복 아이디 | 마이그레이션 원천 구분
과목 배움 콘텐츠 아이디 | 1.000 | NaN | 1.000
과목 평가항복 아이디 | NaN | 1.000 | 1.000
마이그레이션 원천 구분 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

과목 배움 콘텐츠 아이디 | 과목 평가항복 아이디 | 마이그레이션 원천 구분
32482700535<NA>OLEIPORTAL
96426<NA>109816<NA>
78124886009<NA>OLEIPORTAL
64477447592<NA>OLEIPORTAL
3632314188<NA>OLEIPORTAL
6551314188<NA>OLEIPORTAL
66651<NA>190315OLEIPORTAL
23630<NA>28006OLEIPORTAL
692431665298<NA>OLEIPORTAL
40631<NA>73874OLEIPORTAL
과목 배움 콘텐츠 아이디 | 과목 평가항복 아이디 | 마이그레이션 원천 구분
2457314188<NA>OLEIPORTAL
792334748967<NA><NA>
874492668643<NA><NA>
8963314068<NA><NA>
50650<NA>55686OLEIPORTAL
40207888818<NA>OLEIPORTAL
3440<NA>115319OLEIPORTAL
69946733143<NA>OLEIPORTAL
54689122504<NA>OLEIPORTAL
7274497718<NA>OLEIPORTAL

Duplicate rows

Most frequently occurring

Index | 과목 배움 콘텐츠 아이디 | 과목 평가항복 아이디 | 마이그레이션 원천 구분 | # duplicates
12 | 14188 | <NA> | OLEIPORTAL | 168
401 | 1001194 | <NA> | OLEIPORTAL | 112
27 | 14485 | <NA> | OLEIPORTAL | 105
828 | <NA> | 115313 | <NA> | 90
23 | 14394 | <NA> | OLEIPORTAL | 86
16 | 14294 | <NA> | OLEIPORTAL | 73
197 | 468568 | <NA> | OLEIPORTAL | 47
566 | 2211568 | <NA> | <NA> | 47
201 | 468631 | <NA> | OLEIPORTAL | 41
148 | 393565 | <NA> | OLEIPORTAL | 40