Overview

Dataset statistics

Number of variables             3
Number of observations          8748
Missing cells                   0
Missing cells (%)               0.0%
Duplicate rows                  1520
Duplicate rows (%)              17.4%
Total size in memory            213.7 KiB
Average record size in memory   25.0 B

Variable types

Text          1
Numeric       1
Categorical   1

Dataset

Description   Information on lecture content provided by HRD4U (www.hrd4u.or.kr), including lecture titles, running times, and related details.
URL           https://www.data.go.kr/data/15121940/fileData.do

Alerts

Dataset has 1520 (17.4%) duplicate rows  [Duplicates]
강의시간(분) is highly overall correlated with 업데이트일자  [High correlation]
업데이트일자 is highly overall correlated with 강의시간(분)  [High correlation]
업데이트일자 is highly imbalanced (60.8%)  [Imbalance]
강의시간(분) is highly skewed (γ1 = 31.0172257)  [Skewed]

Reproduction

Analysis started         2023-12-12 06:05:28.260656
Analysis finished        2023-12-12 06:05:29.472868
Duration                 1.21 seconds
Software version         ydata-profiling v4.5.1
Download configuration   config.json

Variables

콘텐츠명 (content title)
Text

Distinct       6418
Distinct (%)   73.4%
Missing        0
Missing (%)    0.0%
Memory size    68.5 KiB

Length

Max length      86
Median length   79
Mean length     16.646662
Min length      2
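The length figures can be reproduced from per-row character counts. Mean length is total characters divided by rows, and 145625 / 8748 ≈ 16.65 matches the reported mean. A plain median of row lengths cannot be 79 here. If half of the 8748 rows had 79 or more characters, the mean would be at least 39.5, so the reported median length is presumably computed by a different rule, which this sketch does not try to reproduce. The `length_summary` helper is illustrative and not part of the profiler. It assumes the column is available as a list of strings.

```python
import statistics


def length_summary(values: list[str]) -> dict[str, float]:
    """Per-row character-length statistics for a text column (illustrative helper)."""
    lengths = [len(v) for v in values]  # len() counts Unicode code points
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),  # equals total characters / rows
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


# A few titles taken from the report's sample rows
titles = ["13.고객응대 기술", "14.아웃바운드 고객응대 실전", "부가자료_용접부 검사 방법"]
print(length_summary(titles))
```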

Characters and Unicode

Total characters      145625
Distinct characters   869
Distinct categories   16
Distinct scripts      4
Distinct blocks       7

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
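Python's standard `unicodedata` module returns the general category of each code point as a two-letter code such as `Lo`, `Zs` or `Nd`. The sketch below tallies those categories and maps the codes to the long names used in this report. The mapping is written out by hand and covers only the 16 categories listed here. Script and block lookups are not in the standard library, so they are left out. The profiler's own Unicode tables may also differ from `unicodedata` in edge cases. `category_counts` is an illustrative helper name.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in the report
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Nd": "Decimal Number",
    "Po": "Other Punctuation", "Lu": "Uppercase Letter", "Pc": "Connector Punctuation",
    "Ll": "Lowercase Letter", "Pe": "Close Punctuation", "Ps": "Open Punctuation",
    "Pd": "Dash Punctuation", "Nl": "Letter Number", "Sm": "Math Symbol",
    "Cc": "Control", "Pf": "Final Punctuation", "So": "Other Symbol",
    "Pi": "Initial Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all rows."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for categories not in the table
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["13.고객응대 기술"]))
```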

Unique

Unique       4329
Unique (%)   49.5%
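Distinct and unique measure different things. Distinct counts how many different titles occur (6418, or 73.4% of 8748 rows). Unique counts titles that occur exactly once (4329, or 49.5%). The sketch below computes both, assuming the column is a plain list of strings. `distinct_and_unique` is an illustrative helper name.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


print(distinct_and_unique(["04.트루용접스토리", "04.트루용접스토리", "03_3D 비밀노트", "02.굽힘시험 방법 학습"]))
```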

Sample

1st row   13.고객응대 기술
2nd row   14.아웃바운드 고객응대 실전
3rd row   15.고객거절, 반론, 불만 극복법
4th row   16.아웃바운드 판매촉진 실무
5th row   17.아웃바운드 콜마케팅 혁신

Most occurring words

Value                 Count   Frequency (%)
                      701     2.3%
2                     308     1.0%
1                     306     1.0%
가공                  255     0.8%
실습                  249     0.8%
3                     199     0.7%
대한                  168     0.5%
작성                  165     0.5%
이해                  160     0.5%
이론                  146     0.5%
Other values (8529)   27947   91.3%
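The frequencies in this table are shares of all tokens, not of rows. The ten listed counts sum to 2657, and adding the 27947 other values gives 30604 tokens, so 701 / 30604 ≈ 2.3%. The sketch below counts words the same way, assuming lowercasing plus whitespace splitting. The report does not show the profiler's actual tokenizer, including its handling of punctuation or stop words, so it may differ. `word_frequencies` is an illustrative helper name.

```python
from collections import Counter


def word_frequencies(values, top=10):
    """Count whitespace-separated tokens across all rows (assumed tokenization)."""
    counter = Counter()
    for text in values:
        counter.update(text.lower().split())
    total = sum(counter.values())
    # Each frequency is a share of all tokens, not of rows
    return [(word, n, 100 * n / total) for word, n in counter.most_common(top)], total


rows = ["02.밀링에 대한 컴퓨터응용가공산업기사 관련 문제풀이", "01.가공품 고정 및 모듈러 공구에 대한 이론"]
top, total = word_frequencies(rows, top=3)
print(total, top)
```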

Most occurring characters

Value                Count   Frequency (%)
(space)              26207   18.0%
.                    4605    3.2%
0                    3363    2.3%
1                    3098    2.1%
                     2819    1.9%
2                    2466    1.7%
                     2105    1.4%
                     2086    1.4%
_                    2075    1.4%
                     2049    1.4%
Other values (859)   94752   65.1%

Most occurring categories

Value                   Count   Frequency (%)
Other Letter            92234   63.3%
Space Separator         26207   18.0%
Decimal Number          13199   9.1%
Other Punctuation       5833    4.0%
Uppercase Letter        2950    2.0%
Connector Punctuation   2075    1.4%
Lowercase Letter        1458    1.0%
Close Punctuation       582     0.4%
Open Punctuation        522     0.4%
Dash Punctuation        368     0.3%
Other values (6)        197     0.1%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
                     2819    3.1%
                     2105    2.3%
                     2086    2.3%
                     2049    2.2%
                     1875    2.0%
                     1566    1.7%
                     1490    1.6%
                     1301    1.4%
                     1224    1.3%
                     1094    1.2%
Other values (770)   74625   80.9%

Uppercase Letter
Value                Count   Frequency (%)
C                    511     17.3%
D                    228     7.7%
S                    189     6.4%
O                    183     6.2%
T                    175     5.9%
W                    173     5.9%
A                    167     5.7%
P                    152     5.2%
M                    142     4.8%
N                    139     4.7%
Other values (16)    891     30.2%

Lowercase Letter
Value                Count   Frequency (%)
e                    189     13.0%
t                    178     12.2%
a                    118     8.1%
i                    117     8.0%
r                    109     7.5%
o                    95      6.5%
n                    85      5.8%
s                    81      5.6%
l                    69      4.7%
p                    63      4.3%
Other values (16)    354     24.3%

Other Punctuation
Value                Count   Frequency (%)
.                    4605    78.9%
,                    692     11.9%
·                    242     4.1%
?                    111     1.9%
!                    73      1.3%
/                    60      1.0%
:                    24      0.4%
&                    9       0.2%
'                    7       0.1%
"                    5       0.1%

Decimal Number
Value                Count   Frequency (%)
0                    3363    25.5%
1                    3098    23.5%
2                    2466    18.7%
3                    1792    13.6%
4                    940     7.1%
5                    451     3.4%
6                    303     2.3%
7                    281     2.1%
8                    264     2.0%
9                    241     1.8%

Letter Number
Value                Count   Frequency (%)
                     9       45.0%
                     9       45.0%
                     2       10.0%

Close Punctuation
Value                Count   Frequency (%)
)                    458     78.7%
]                    124     21.3%

Open Punctuation
Value                Count   Frequency (%)
(                    398     76.2%
[                    124     23.8%

Math Symbol
Value                Count   Frequency (%)
~                    27      90.0%
+                    3       10.0%

Space Separator
Value                Count   Frequency (%)
(space)              26207   100.0%

Connector Punctuation
Value                Count   Frequency (%)
_                    2075    100.0%

Dash Punctuation
Value                Count   Frequency (%)
-                    368     100.0%

Control
Value                Count   Frequency (%)
                     127     100.0%

Final Punctuation
Value                Count   Frequency (%)
                     9       100.0%

Other Symbol
Value                Count   Frequency (%)
                     8       100.0%

Initial Punctuation
Value                Count   Frequency (%)
                     3       100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   92223   63.3%
Common   48955   33.6%
Latin    4428    3.0%
Han      19      < 0.1%

Most frequent character per script

Hangul
Value                Count   Frequency (%)
                     2819    3.1%
                     2105    2.3%
                     2086    2.3%
                     2049    2.2%
                     1875    2.0%
                     1566    1.7%
                     1490    1.6%
                     1301    1.4%
                     1224    1.3%
                     1094    1.2%
Other values (765)   74614   80.9%

Latin
Value                Count   Frequency (%)
C                    511     11.5%
D                    228     5.1%
e                    189     4.3%
S                    189     4.3%
O                    183     4.1%
t                    178     4.0%
T                    175     4.0%
W                    173     3.9%
A                    167     3.8%
P                    152     3.4%
Other values (45)    2283    51.6%

Common
Value                Count   Frequency (%)
(space)              26207   53.5%
.                    4605    9.4%
0                    3363    6.9%
1                    3098    6.3%
2                    2466    5.0%
_                    2075    4.2%
3                    1792    3.7%
4                    940     1.9%
,                    692     1.4%
)                    458     0.9%
Other values (23)    3259    6.7%

Han
Value                Count   Frequency (%)
                     11      57.9%
                     2       10.5%
                     2       10.5%
                     2       10.5%
                     1       5.3%
                     1       5.3%

Most occurring blocks

Value          Count   Frequency (%)
Hangul         92213   63.3%
ASCII          53109   36.5%
None           250     0.2%
Number Forms   20      < 0.1%
CJK            19      < 0.1%
Punctuation    12      < 0.1%
Compat Jamo    2       < 0.1%

Most frequent character per block

ASCII
Value                Count   Frequency (%)
(space)              26207   49.3%
.                    4605    8.7%
0                    3363    6.3%
1                    3098    5.8%
2                    2466    4.6%
_                    2075    3.9%
3                    1792    3.4%
4                    940     1.8%
,                    692     1.3%
C                    511     1.0%
Other values (72)    7360    13.9%

Hangul
Value                Count   Frequency (%)
                     2819    3.1%
                     2105    2.3%
                     2086    2.3%
                     2049    2.2%
                     1875    2.0%
                     1566    1.7%
                     1490    1.6%
                     1301    1.4%
                     1224    1.3%
                     1094    1.2%
Other values (763)   74604   80.9%

None
Value                Count   Frequency (%)
·                    242     96.8%
                     8       3.2%

CJK
Value                Count   Frequency (%)
                     11      57.9%
                     2       10.5%
                     2       10.5%
                     2       10.5%
                     1       5.3%
                     1       5.3%

Number Forms
Value                Count   Frequency (%)
                     9       45.0%
                     9       45.0%
                     2       10.0%

Punctuation
Value                Count   Frequency (%)
                     9       75.0%
                     3       25.0%

Compat Jamo
Value                Count   Frequency (%)
                     2       100.0%
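Unicode blocks are fixed code-point ranges, so a block lookup is a range search. The sketch below hard-codes only the blocks seen in this report. It uses the standard Unicode ranges, labels them with the report's short names, and returns `None` for anything else. With this table, the middle dot `·` (U+00B7, in Latin-1 Supplement) falls through to `None`, which is consistent with its appearance under the report's None block. Whether the profiler builds its block table this way is an assumption. `BLOCKS`, `block_of` and `block_counts` are illustrative names.

```python
from collections import Counter

# Partial block table: (first code point, last code point, label used in the report).
# Ranges are the standard Unicode block ranges; the table is intentionally incomplete.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Basic Latin
    (0x2000, 0x206F, "Punctuation"),   # General Punctuation
    (0x2150, 0x218F, "Number Forms"),
    (0x3130, 0x318F, "Compat Jamo"),   # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),           # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables
]


def block_of(ch):
    """Return the block label for one character, or None if no listed range contains it."""
    cp = ord(ch)
    for first, last, label in BLOCKS:
        if first <= cp <= last:
            return label
    return None


def block_counts(values):
    """Count characters per block across all rows; unmatched characters count under None."""
    return Counter(block_of(ch) for text in values for ch in text)


print(block_counts(["19.나의 약점, 갭 테라피"]))
```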

강의시간(분) (lecture duration in minutes)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct       66
Distinct (%)   0.8%
Missing        0
Missing (%)    0.0%
Infinite       0
Infinite (%)   0.0%
Mean           13.074188
Minimum        1
Maximum        1024
Zeros          0
Zeros (%)      0.0%
Negative       0
Negative (%)   0.0%
Memory size    77.0 KiB

Quantile statistics

Minimum                     1
5th percentile              1
Q1                          5
Median                      9
Q3                          24
95th percentile             29
Maximum                     1024
Range                       1023
Interquartile range (IQR)   19
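The quantiles above fit together as IQR = Q3 - Q1 = 24 - 5 = 19 and range = 1024 - 1 = 1023. The stdlib sketch below produces the same summary, assuming linear interpolation between order statistics. That is `statistics.quantiles` with `method="inclusive"`, which corresponds to NumPy's default linear rule. Whether the profiler uses exactly this interpolation rule is an assumption, and `quantile_summary` is an illustrative helper name.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics for a numeric column (illustrative helper)."""
    # 99 cut points at 1%, 2%, ..., 99%, linearly interpolated between sorted values
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    q1, median, q3 = cuts[24], cuts[49], cuts[74]
    return {
        "min": min(values),
        "p5": cuts[4],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": cuts[94],
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


print(quantile_summary([1, 2, 3, 4, 5]))
```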

Descriptive statistics

Standard deviation                15.890253
Coefficient of variation (CV)     1.2153912
Kurtosis                          1886.7094
Mean                              13.074188
Median Absolute Deviation (MAD)   6
Skewness                          31.017226
Sum                               114373
Variance                          252.50015
Monotonicity                      Not monotonic
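The descriptive statistics are linked. Variance is the square of the standard deviation. CV = standard deviation / mean = 15.890253 / 13.074188 ≈ 1.2154, and mean = sum / count = 114373 / 8748 ≈ 13.074.

The sketch below computes these values using the bias-adjusted skewness (G1) and excess kurtosis (G2) estimators that pandas uses for `Series.skew()` and `Series.kurt()`. That the profiler reports exactly these estimators is an assumption, and `describe` is an illustrative helper name.

The same arithmetic shows why the column is flagged as skewed. The single 1024-minute row has a z-score of about (1024 - 13.07) / 15.89 ≈ 63.6, so it alone contributes roughly 63.6³ / 8748 ≈ 29 of the reported skewness of 31.0.

```python
import math
import statistics


def describe(values):
    """Moment-based summary of a numeric column (illustrative helper, needs n >= 4)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for bias-adjusted kurtosis")
    mean = sum(values) / n
    variance = statistics.variance(values)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev)
    if m2 == 0:
        raise ValueError("constant column: skewness and kurtosis are undefined")
    m3 = sum(d ** 3 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    # Bias-adjusted Fisher-Pearson skewness (G1)
    skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    # Bias-adjusted excess kurtosis (G2); 0 for a normal distribution
    kurt = (n * (n + 1) * (n - 1) * m4) / ((n - 2) * (n - 3) * m2 ** 2) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])  # median absolute deviation
    return {
        "mean": mean, "std": std, "variance": variance, "cv": std / mean,
        "skewness": skew, "kurtosis": kurt, "mad": mad, "sum": sum(values),
    }


print(describe([1, 2, 3, 4, 5]))
```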
[Figure: Histogram with fixed size bins (bins=50)]
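With fixed-size bins, every bin has width (max - min) / bins = (1024 - 1) / 50 = 20.46 minutes. All durations from 1 up to 21 therefore share the first bin, and the lone 1024-minute row sits alone in the last. The sketch below does equal-width binning in the style of `numpy.histogram`: bins are half-open, except the last, which is closed so the maximum is counted. `equal_width_histogram` is an illustrative helper, and its floating-point edge cases may differ from NumPy's. The usage line feeds it the report's quantile values as toy input.

```python
def equal_width_histogram(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min, max] (illustrative helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lo, hi], [len(values)]  # degenerate case: everything in a single bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bins are [edge_k, edge_k+1); the maximum is clamped into the last bin
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


edges, counts = equal_width_histogram([1, 5, 9, 24, 29, 1024], bins=50)
print(edges[1], counts[0], counts[-1])
```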
Common values

Value               Count   Frequency (%)
1                   1172    13.4%
8                   645     7.4%
9                   635     7.3%
7                   580     6.6%
6                   538     6.1%
27                  408     4.7%
25                  406     4.6%
29                  392     4.5%
5                   390     4.5%
26                  343     3.9%
Other values (56)   3239    37.0%

Minimum 10 values

Value               Count   Frequency (%)
1                   1172    13.4%
2                   276     3.2%
3                   208     2.4%
4                   289     3.3%
5                   390     4.5%
6                   538     6.1%
7                   580     6.6%
8                   645     7.4%
9                   635     7.3%
10                  336     3.8%

Maximum 10 values

Value               Count   Frequency (%)
1024                1       < 0.1%
220                 1       < 0.1%
180                 8       0.1%
135                 1       < 0.1%
129                 1       < 0.1%
81                  1       < 0.1%
73                  1       < 0.1%
71                  1       < 0.1%
68                  1       < 0.1%
66                  1       < 0.1%

업데이트일자 (update date)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct       40
Distinct (%)   0.5%
Missing        0
Missing (%)    0.0%
Memory size    68.5 KiB

<NA>                6129
2010/07/30          479
2012/06/27          357
2012/05/21          314
2013/08/22          219
Other values (35)   1250

Length

Max length      10
Median length   4
Mean length     5.7962963
Min length      4
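The length statistics suggest how missing dates are stored. The min and median length of 4 match the four characters of `<NA>`. The formula (6129 × 4 + 2619 × 10) / 8748 ≈ 5.796 also reproduces the mean length. So the 6129 `<NA>` entries appear to be literal strings rather than nulls, which would explain why Missing is reported as 0. If that reading is right, converting the placeholder before profiling would report those rows as missing.

The sketch below does that conversion, assuming the column arrives as a list of strings. Treating `<NA>` as the only placeholder is an assumption, and both helper names are illustrative.

```python
def normalize_placeholders(values, placeholders=frozenset({"<NA>"})):
    """Replace placeholder strings with None (the placeholder set is an assumption)."""
    return [None if v in placeholders else v for v in values]


def missing_summary(values):
    """Return (number of None entries, percentage of rows)."""
    n = len(values)
    missing = sum(1 for v in values if v is None)
    return missing, 100 * missing / n


dates = ["2010/07/30", "<NA>", "2012/05/21", "<NA>"]
print(missing_summary(dates))                          # (0, 0.0): strings are not nulls
print(missing_summary(normalize_placeholders(dates)))  # (2, 50.0)
```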

Unique

Unique       5
Unique (%)   0.1%

Sample

1st row   2010/07/30
2nd row   2010/07/30
3rd row   2010/07/30
4th row   2010/07/30
5th row   2010/07/30

Common Values

Value               Count   Frequency (%)
<NA>                6129    70.1%
2010/07/30          479     5.5%
2012/06/27          357     4.1%
2012/05/21          314     3.6%
2013/08/22          219     2.5%
2013/09/27          193     2.2%
2013/10/01          185     2.1%
2013/07/22          80      0.9%
2013/07/23          75      0.9%
2013/05/31          71      0.8%
Other values (30)   646     7.4%
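One plausible reading of the 60.8% imbalance figure is a normalized-entropy score: 1 - H / ln(k), where H is the Shannon entropy of the category shares and k is the number of categories. Treating this as the profiler's definition is an assumption.

Using the visible counts and spreading the remaining 646 rows evenly over the other 30 dates, this formula gives about 0.60. That is a lower bound, since an uneven split of the remainder lowers the entropy, and it is in line with the reported 60.8%. `imbalance` is an illustrative helper name.

```python
import math


def imbalance(counts):
    """1 - normalized Shannon entropy of category counts (assumed definition)."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single category is not scored as imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    # 0 = perfectly balanced; values near 1 = one category dominates
    return 1 - entropy / math.log(k)


# Visible counts from the table, with the remaining 646 rows spread evenly
# over the other 30 dates (a simplifying assumption, not the real split).
visible = [6129, 479, 357, 314, 219, 193, 185, 80, 75, 71]
print(round(imbalance(visible + [646 / 30] * 30), 3))
```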

Length

[Figure: Histogram of lengths of the category]

Interactions


Correlations

               강의시간(분)   업데이트일자
강의시간(분)   1.000          NaN
업데이트일자   NaN            1.000

               강의시간(분)   업데이트일자
강의시간(분)   1.000          1.000
업데이트일자   1.000          1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

      콘텐츠명                                              강의시간(분)   업데이트일자
0     13.고객응대 기술                                      7              2010/07/30
1     14.아웃바운드 고객응대 실전                           2              2010/07/30
2     15.고객거절, 반론, 불만 극복법                        5              2010/07/30
3     16.아웃바운드 판매촉진 실무                           5              2010/07/30
4     17.아웃바운드 콜마케팅 혁신                           3              2010/07/30
5     18.아웃바운드 콜 생산성관리                           4              2010/07/30
6     19.나의 약점, 갭 테라피                               5              2010/07/30
7     20.아웃바운드 성과혁신은 이렇게                       5              2010/07/30
8     1.성과관리의 why와 what                               15             2012/05/21
9     2.목표관리제도(MBO) 바로 알기                         10             2012/05/21

Last rows

      콘텐츠명                                              강의시간(분)   업데이트일자
8738  01.용접부 검사 방법 이론 학습                         6              <NA>
8739  02.굽힘시험 방법 학습                                 6              <NA>
8740  부가자료_용접부 검사 방법                             1              <NA>
8741  01.가공품 고정 및 모듈러 공구에 대한 이론             10             <NA>
8742  02.밀링에 대한 컴퓨터응용가공산업기사 관련 문제풀이   10             <NA>
8743  03.작업공정계획표 작성 실습                           3              <NA>
8744  04.작업공정계획표 점검                                2              <NA>
8745  부가자료_가공품 고정, 작업공정계획표 작성, 점검       1              <NA>
8746  01.마스터 캠 프로그램 활용에 대한 이론                6              <NA>
8747  02.밀링에 대한 컴퓨터응용가공산업기사 관련 문제풀이   10             <NA>

Duplicate rows

Most frequently occurring

      콘텐츠명                                              강의시간(분)   업데이트일자   # duplicates
791   03_GTAW 히든카드                                      1              <NA>           36
1181  부가자료_GTAW 용접사용 설명서                         1              <NA>           32
786   03_3D 비밀노트                                        1              <NA>           20
1154  부가자료_3D U                                         1              <NA>           20
1005  04.트루용접스토리                                     6              <NA>           16
1015  04_3D 비밀노트                                        1              <NA>           14
448   02.컴퓨터응용가공산업기사 관련 문제풀이               9              <NA>           10
1004  04.트루용접스토리                                     5              <NA>           10
1017  04_GTAW 히든카드                                      1              <NA>           10
366   02.밀링에 대한 컴퓨터응용가공산업기사 관련 문제풀이   8              <NA>           8
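The # duplicates column counts how many times each identical (콘텐츠명, 강의시간(분), 업데이트일자) row occurs. "Duplicate rows" can be tallied in more than one way:

- as extra copies beyond the first occurrence,
- as all rows that have an identical twin, or
- as the number of distinct duplicated rows.

The report does not say which rule yields its 1520. The sketch below computes all three on rows represented as tuples. The helper and the toy rows are illustrative, with titles borrowed from the tables above.

```python
from collections import Counter


def duplicate_summary(rows):
    """Tally fully identical rows; each row is a tuple of column values."""
    counts = Counter(rows)
    groups = {row: c for row, c in counts.items() if c > 1}
    return {
        "distinct_duplicated_rows": len(groups),
        "extra_copies": sum(c - 1 for c in groups.values()),  # rows repeating an earlier row
        "rows_with_a_twin": sum(groups.values()),             # every row in a duplicate group
        # Largest groups first, like the "# duplicates" column
        "most_frequent": sorted(groups.items(), key=lambda kv: kv[1], reverse=True),
    }


rows = (
    [("03_GTAW 히든카드", 1, None)] * 3
    + [("04.트루용접스토리", 6, None)] * 2
    + [("01.용접부 검사 방법 이론 학습", 6, None)]
)
summary = duplicate_summary(rows)
print(summary["distinct_duplicated_rows"], summary["extra_copies"], summary["rows_with_a_twin"])  # 2 3 5
```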