Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 242
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
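The two derived figures follow from the raw counts. This is a minimal Python check built on two assumptions. First, Missing cells (%) is taken over all variables × observations. Second, the average record size is total memory divided by the number of rows. Both assumptions agree with the numbers shown. The 390.6 KiB input is the already-rounded display value.

```python
# Figures copied from the overview above.
n_variables = 4
n_observations = 10_000
missing_cells = 242
total_size_kib = 390.6  # as displayed, i.e. already rounded

# Missing cells (%) is taken over every cell: variables x observations.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average record size is the total in-memory size divided by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")          # Missing cells (%): 0.6%
print(f"Average record size: {avg_record_bytes:.1f} B")  # Average record size: 40.0 B
```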

Variable types

Text: 2
DateTime: 1
Categorical: 1

Dataset

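Description: Provides information on the company names of users of the Smart Vocational Training Platform (STEP) operated by the Online Lifelong Education Institute of Korea University of Technology and Education (한국기술교육대학교 온라인평생교육원).
Author: 한국기술교육대학교 (Korea University of Technology and Education)
URL: https://www.data.go.kr/data/15091047/fileData.do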

Alerts

Imbalance: 등록 국가 (registration country) is highly imbalanced (88.5%)
Missing: 회사명 (company name) has 242 (2.4%) missing values

Reproduction

Analysis started: 2024-04-17 09:54:08.630972
Analysis finished: 2024-04-17 09:54:09.192588
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
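The reported duration is the difference between the two timestamps above. This minimal check uses only the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-04-17 09:54:08.630972")
finished = datetime.fromisoformat("2024-04-17 09:54:09.192588")

# Subtracting two datetimes gives a timedelta; convert it to seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 0.56 seconds
```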

Variables

회사명 (company name)
Text

MISSING 

Distinct: 5116
Distinct (%): 52.4%
Missing: 242
Missing (%): 2.4%
Memory size: 156.2 KiB

Length

Max length: 41
Median length: 28
Mean length: 6.9821685
Min length: 1
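Mean length is total characters divided by non-missing values: 68,132 / (10,000 - 242) ≈ 6.98.

The reported median length of 28 cannot be the ordinary median of row lengths. An ordinary median of 28 would require at least half of the 9,758 names (4,879) to have 28 or more characters. That is at least 136,612 characters, roughly twice the 68,132 counted. ydata-profiling therefore presumably derives its median differently.

The sketch below uses a hypothetical `length_summary` helper and the plain median, so it will not reproduce that figure:

```python
import statistics

def length_summary(values):
    """Max/median/mean/min string length over non-missing values (None is skipped)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }

# The five sample rows shown for this column, plus one missing value.
sample = ["(주)덕산코트랜", "어보브반도체", "제일교육학원제일요양보호사교육원",
          "마두간호학원", "(주)진화이앤씨", None]
print(length_summary(sample))
# {'max': 16, 'median': 8, 'mean': 8.8, 'min': 6}

# Reported mean length = total characters / non-missing values.
print(round(68132 / (10_000 - 242), 7))  # 6.9821685
```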

Characters and Unicode

Total characters: 68132
Distinct characters: 748
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
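Python's standard `unicodedata` module exposes the same Unicode general categories that this section reports. The sketch below tallies characters by category. It assumes the category names in the report are the long forms of the two-letter codes (Lo = Other Letter, Ps = Open Punctuation, and so on). It is not ydata-profiling's own implementation.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Pc": "Connector Punctuation",
    "Zs": "Space Separator", "So": "Other Symbol",
}

def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["(주)덕산코트랜", "ABB코리아", "114-81-31024"]))
```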

Unique

Unique: 4137
Unique (%): 42.4%
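Distinct counts the different values. Unique counts only the values that occur exactly once. Both percentages are taken over the 9,758 non-missing rows:

- 5,116 / 9,758 ≈ 52.4%
- 4,137 / 9,758 ≈ 42.4%

Here is a small sketch with a hypothetical `distinct_unique` helper:

```python
from collections import Counter

def distinct_unique(values):
    """Distinct = number of different values; Unique = values seen exactly once.
    Percentages use the non-missing count as the denominator."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    total = len(present)
    return distinct, unique, 100 * distinct / total, 100 * unique / total

print(distinct_unique(["KT", "KT", "세메스", "케이씨텍", None]))
# (3, 2, 75.0, 50.0)
```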

Sample

1st row: (주)덕산코트랜
2nd row: 어보브반도체
3rd row: 제일교육학원제일요양보호사교육원
4th row: 마두간호학원
5th row: (주)진화이앤씨

Most occurring words

Value | Count | Frequency (%)
㈜삼성디스플레이 | 291 | 2.7%
인천교통공사 | 208 | 2.0%
주식회사 | 180 | 1.7%
주)미래컴퍼니 | 113 | 1.1%
세메스 | 88 | 0.8%
유라코퍼레이션 | 84 | 0.8%
주)테라세미콘 | 80 | 0.8%
에스에프에이 | 73 | 0.7%
케이씨텍 | 68 | 0.6%
주)캐스트이즈 | 62 | 0.6%
Other values (5237) | 9413 | 88.3%
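Several features of this table suggest the counts are whitespace-separated words rather than whole values:

- Entries such as 주)미래컴퍼니 have only the opening parenthesis removed.
- The standalone token 주식회사 ("Co., Ltd.") appears as its own entry.
- The counts add up to 10,660, more than the 9,758 non-missing rows.

The sketch below assumes lowercasing, whitespace splitting and stripping of ASCII punctuation from token ends. This rule is inferred from the output, not taken from ydata-profiling's source. `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter

def word_counts(values):
    """Lowercase, split on whitespace, strip ASCII punctuation from token ends."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:
                counts[word] += 1
    return counts

names = ["(주)미래컴퍼니", "주식회사 테라세미콘", "(주)테라세미콘", "KT", None]
print(word_counts(names).most_common())
```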

Most occurring characters

Value | Count | Frequency (%)
 | 2740 | 4.0%
 | 2411 | 3.5%
 | 2201 | 3.2%
) | 1938 | 2.8%
( | 1898 | 2.8%
 | 1304 | 1.9%
 | 1249 | 1.8%
 | 1239 | 1.8%
 | 1238 | 1.8%
 | 1021 | 1.5%
Other values (738) | 50893 | 74.7%
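Percentages in this table are relative to all 68,132 characters (2,740 / 68,132 ≈ 4.0%). The "Other values (738)" row covers the remaining 738 of the 748 distinct characters: 50,893 characters, or 74.7%. The sketch below uses a hypothetical `char_frequencies` helper that builds a table of the same shape:

```python
from collections import Counter

def char_frequencies(values, top=10):
    """Top characters with their share of all characters, then an 'Other values' row."""
    counts = Counter(ch for v in values if v is not None for ch in v)
    total = sum(counts.values())
    top_items = counts.most_common(top)
    rows = [(ch, n, 100 * n / total) for ch, n in top_items]
    remaining = len(counts) - len(top_items)
    if remaining:
        rest = total - sum(n for _, n in top_items)
        rows.append((f"Other values ({remaining})", rest, 100 * rest / total))
    return rows

print(char_frequencies(["aab", "ac", None], top=1))
# [('a', 3, 60.0), ('Other values (2)', 2, 40.0)]
```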

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 60121 | 88.2%
Close Punctuation | 1938 | 2.8%
Open Punctuation | 1898 | 2.8%
Uppercase Letter | 1809 | 2.7%
Space Separator | 965 | 1.4%
Lowercase Letter | 821 | 1.2%
Other Symbol | 393 | 0.6%
Decimal Number | 122 | 0.2%
Other Punctuation | 47 | 0.1%
Dash Punctuation | 15 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2740 | 4.6%
 | 2411 | 4.0%
 | 2201 | 3.7%
 | 1304 | 2.2%
 | 1249 | 2.1%
 | 1239 | 2.1%
 | 1238 | 2.1%
 | 1021 | 1.7%
 | 1019 | 1.7%
 | 987 | 1.6%
Other values (665) | 44712 | 74.4%
Uppercase Letter
Value | Count | Frequency (%)
S | 211 | 11.7%
T | 165 | 9.1%
C | 163 | 9.0%
K | 151 | 8.3%
E | 116 | 6.4%
H | 105 | 5.8%
M | 104 | 5.7%
I | 100 | 5.5%
A | 94 | 5.2%
B | 91 | 5.0%
Other values (16) | 509 | 28.1%
Lowercase Letter
Value | Count | Frequency (%)
e | 116 | 14.1%
s | 98 | 11.9%
t | 61 | 7.4%
n | 58 | 7.1%
o | 58 | 7.1%
c | 52 | 6.3%
m | 48 | 5.8%
r | 43 | 5.2%
i | 42 | 5.1%
a | 41 | 5.0%
Other values (15) | 204 | 24.8%
Decimal Number
Value | Count | Frequency (%)
1 | 45 | 36.9%
2 | 25 | 20.5%
0 | 15 | 12.3%
3 | 13 | 10.7%
7 | 7 | 5.7%
5 | 6 | 4.9%
9 | 4 | 3.3%
6 | 4 | 3.3%
4 | 2 | 1.6%
8 | 1 | 0.8%
Other Punctuation
Value | Count | Frequency (%)
. | 19 | 40.4%
& | 10 | 21.3%
/ | 10 | 21.3%
, | 4 | 8.5%
: | 2 | 4.3%
' | 2 | 4.3%
Close Punctuation
Value | Count | Frequency (%)
) | 1938 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1898 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 965 | 100.0%
Other Symbol
Value | Count | Frequency (%)
㈜ | 393 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 15 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 3 | 100.0%
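In these per-category tables, each percentage is a share of that category, not of all characters. Some examples:

- The Hangul character counted 2,740 times is 4.0% of all 68,132 characters but 4.6% of the 60,121 Other Letter characters.
- S (211) is 11.7% of the 1,809 uppercase letters.
- The same S is 8.0% of the 2,630 Latin-script characters and 2.8% of the 7,618 ASCII characters.

The sketch below groups characters with `unicodedata.category`. `top_per_category` is a hypothetical helper.

```python
import unicodedata
from collections import Counter, defaultdict

def top_per_category(values, top=10):
    """For each Unicode general category, the most common characters and
    their share within that category (not of all characters)."""
    by_category = defaultdict(Counter)
    for value in values:
        for ch in value:
            by_category[unicodedata.category(ch)][ch] += 1
    result = {}
    for category, counts in by_category.items():
        total = sum(counts.values())
        result[category] = [(ch, n, 100 * n / total) for ch, n in counts.most_common(top)]
    return result

table = top_per_category(["SK", "S1-2", "s"], top=3)
print(table["Pd"])  # [('-', 1, 100.0)]
```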

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 60513 | 88.8%
Common | 4988 | 7.3%
Latin | 2630 | 3.9%
Han | 1 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2740 | 4.5%
 | 2411 | 4.0%
 | 2201 | 3.6%
 | 1304 | 2.2%
 | 1249 | 2.1%
 | 1239 | 2.0%
 | 1238 | 2.0%
 | 1021 | 1.7%
 | 1019 | 1.7%
 | 987 | 1.6%
Other values (665) | 45104 | 74.5%
Latin
Value | Count | Frequency (%)
S | 211 | 8.0%
T | 165 | 6.3%
C | 163 | 6.2%
K | 151 | 5.7%
e | 116 | 4.4%
E | 116 | 4.4%
H | 105 | 4.0%
M | 104 | 4.0%
I | 100 | 3.8%
s | 98 | 3.7%
Other values (41) | 1301 | 49.5%
Common
Value | Count | Frequency (%)
) | 1938 | 38.9%
( | 1898 | 38.1%
(space) | 965 | 19.3%
1 | 45 | 0.9%
2 | 25 | 0.5%
. | 19 | 0.4%
0 | 15 | 0.3%
- | 15 | 0.3%
3 | 13 | 0.3%
& | 10 | 0.2%
Other values (11) | 45 | 0.9%
Han
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 60120 | 88.2%
ASCII | 7618 | 11.2%
None | 393 | 0.6%
CJK | 1 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2740 | 4.6%
 | 2411 | 4.0%
 | 2201 | 3.7%
 | 1304 | 2.2%
 | 1249 | 2.1%
 | 1239 | 2.1%
 | 1238 | 2.1%
 | 1021 | 1.7%
 | 1019 | 1.7%
 | 987 | 1.6%
Other values (664) | 44711 | 74.4%
ASCII
Value | Count | Frequency (%)
) | 1938 | 25.4%
( | 1898 | 24.9%
(space) | 965 | 12.7%
S | 211 | 2.8%
T | 165 | 2.2%
C | 163 | 2.1%
K | 151 | 2.0%
e | 116 | 1.5%
E | 116 | 1.5%
H | 105 | 1.4%
Other values (62) | 1790 | 23.5%
None
Value | Count | Frequency (%)
㈜ | 393 | 100.0%
CJK
Value | Count | Frequency (%)
 | 1 | 100.0%
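The standard library has no block lookup, but a Unicode block is just a code-point range. The sketch below covers only the ranges that occur in this report:

- ASCII: U+0000 to U+007F
- Hangul Syllables: U+AC00 to U+D7AF
- Hangul Compatibility Jamo: U+3130 to U+318F
- CJK Unified Ideographs: U+4E00 to U+9FFF

The short labels mirror the report. Anything else falls through to "None", which is presumably how the report's 393 occurrences of ㈜ end up in that row.

As a sanity check, the ASCII total of 7,618 equals the parentheses, spaces, Latin letters, digits and ASCII punctuation counted above: 1,938 + 1,898 + 965 + 1,809 + 821 + 122 + 47 + 15 + 3. `block_of` and `block_counts` are hypothetical helpers.

```python
from collections import Counter

def block_of(ch):
    """Short block label for one character, covering only the blocks seen in this report."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"        # Basic Latin
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"       # Hangul Syllables
    if 0x3130 <= cp <= 0x318F:
        return "Compat Jamo"  # Hangul Compatibility Jamo
    if 0x4E00 <= cp <= 0x9FFF:
        return "CJK"          # CJK Unified Ideographs
    return "None"             # any block this sketch does not map

def block_counts(values):
    """Count characters per block label across all non-missing strings."""
    return Counter(block_of(ch) for v in values if v is not None for ch in v)

print(block_counts(["(주)KT", "\u321c삼성"]))
# Counter({'ASCII': 4, 'Hangul': 3, 'None': 1})
```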

사업자 등록 번호 (business registration number)
Text

Distinct: 5478
Distinct (%): 54.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 10
Mean length: 10.4519
Min length: 1

Characters and Unicode

Total characters: 104519
Distinct characters: 17
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4470
Unique (%): 44.7%

Sample

1st row: 5048135674
2nd row: 1208526960
3rd row: 6169206236
4th row: 1289260072
5th row: 114-81-31024

Most occurring words

Value | Count | Frequency (%)
1398202409 | 141 | 1.4%
1248194031 | 131 | 1.3%
2118196221 | 107 | 1.1%
6098135227 | 69 | 0.7%
139-82-02409 | 67 | 0.7%
2208193209 | 62 | 0.6%
312-81-13969 | 62 | 0.6%
1428145237 | 59 | 0.6%
1248198532 | 59 | 0.6%
3148117170 | 53 | 0.5%
Other values (5484) | 9218 | 91.9%
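This table lists 1398202409 (141) and 139-82-02409 (67) as separate entries, and the Dash Punctuation category below shows 4,613 hyphens. The same registration number can therefore appear in more than one spelling. If the column is to be used as a key, one option is to normalize it first.

The sketch below assumes a valid number is exactly 10 digits once hyphens and whitespace are removed. The hypothetical `normalize_brn` helper returns None for anything else, such as the few values containing "." or "E".

```python
import re
from collections import Counter

def normalize_brn(raw):
    """Hypothetical helper: drop hyphens and whitespace, keep only 10-digit results."""
    if raw is None:
        return None
    digits = re.sub(r"[\s-]", "", raw)
    return digits if re.fullmatch(r"[0-9]{10}", digits) else None

values = ["1398202409", "139-82-02409", "114-81-31024", "1.2E9", "1"]
counts = Counter(normalize_brn(v) for v in values)
print(counts["1398202409"])  # 2
```

Under this assumption the two spellings at the top of the table would merge into a single entry with 141 + 67 = 208 occurrences.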

Most occurring characters

Value | Count | Frequency (%)
1 | 19560 | 18.7%
8 | 13240 | 12.7%
2 | 12356 | 11.8%
0 | 11894 | 11.4%
3 | 8879 | 8.5%
4 | 7943 | 7.6%
5 | 7087 | 6.8%
6 | 7031 | 6.7%
9 | 6870 | 6.6%
7 | 4991 | 4.8%
Other values (7) | 4668 | 4.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 99851 | 95.5%
Dash Punctuation | 4613 | 4.4%
Space Separator | 30 | < 0.1%
Other Punctuation | 13 | < 0.1%
Uppercase Letter | 11 | < 0.1%
Other Letter | 1 | < 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 19560 | 19.6%
8 | 13240 | 13.3%
2 | 12356 | 12.4%
0 | 11894 | 11.9%
3 | 8879 | 8.9%
4 | 7943 | 8.0%
5 | 7087 | 7.1%
6 | 7031 | 7.0%
9 | 6870 | 6.9%
7 | 4991 | 5.0%
Other Punctuation
Value | Count | Frequency (%)
. | 10 | 76.9%
* | 3 | 23.1%
Uppercase Letter
Value | Count | Frequency (%)
E | 10 | 90.9%
S | 1 | 9.1%
Dash Punctuation
Value | Count | Frequency (%)
- | 4613 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 30 | 100.0%
Other Letter
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 104507 | > 99.9%
Latin | 11 | < 0.1%
Hangul | 1 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 19560 | 18.7%
8 | 13240 | 12.7%
2 | 12356 | 11.8%
0 | 11894 | 11.4%
3 | 8879 | 8.5%
4 | 7943 | 7.6%
5 | 7087 | 6.8%
6 | 7031 | 6.7%
9 | 6870 | 6.6%
7 | 4991 | 4.8%
Other values (4) | 4656 | 4.5%
Latin
Value | Count | Frequency (%)
E | 10 | 90.9%
S | 1 | 9.1%
Hangul
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 104518 | > 99.9%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 19560 | 18.7%
8 | 13240 | 12.7%
2 | 12356 | 11.8%
0 | 11894 | 11.4%
3 | 8879 | 8.5%
4 | 7943 | 7.6%
5 | 7087 | 6.8%
6 | 7031 | 6.7%
9 | 6870 | 6.6%
7 | 4991 | 4.8%
Other values (6) | 4667 | 4.5%
Compat Jamo
Value | Count | Frequency (%)
 | 1 | 100.0%

등록 일시 (registration date and time)
DateTime

Distinct: 8104
Distinct (%): 81.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2014-09-15 09:59:20
Maximum: 2019-09-11 21:39:54
Histogram with fixed size bins (bins=50)
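With 50 fixed-size bins, each bar of the histogram covers one fiftieth of the range between the minimum and maximum timestamps. The range is 1,822 days and 11:40:34, so each bin spans roughly 36.4 days. The sketch below computes the bin edges, assuming equal-width bins over [minimum, maximum] as the caption states:

```python
from datetime import datetime

minimum = datetime.fromisoformat("2014-09-15 09:59:20")
maximum = datetime.fromisoformat("2019-09-11 21:39:54")
bins = 50

span = maximum - minimum
width = span / bins  # every bin covers the same time interval
edges = [minimum + i * width for i in range(bins + 1)]  # 51 edges for 50 bins

print(span)              # 1822 days, 11:40:34
print(edges[0], edges[-1])
```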

등록 국가 (registration country)
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
KR | 9571
US | 414
UNKNOWN | 12
CN | 2
GB | 1

Length

Max length: 7
Median length: 2
Mean length: 2.006
Min length: 2
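Because a categorical column has few distinct values, its length statistics follow directly from the value counts. UNKNOWN is the only seven-character code and every other code has two characters, so the mean is (2 × 9,988 + 7 × 12) / 10,000 = 2.006. The same weighted mean in Python:

```python
country_counts = {"KR": 9571, "US": 414, "UNKNOWN": 12, "CN": 2, "GB": 1}

total = sum(country_counts.values())
# Weighted mean: each code's length counts once per row it appears in.
mean_length = sum(len(code) * n for code, n in country_counts.items()) / total
print(total, mean_length)  # 10000 2.006
```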

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: KR
2nd row: KR
3rd row: KR
4th row: KR
5th row: KR

Common Values

Value | Count | Frequency (%)
KR | 9571 | 95.7%
US | 414 | 4.1%
UNKNOWN | 12 | 0.1%
CN | 2 | < 0.1%
GB | 1 | < 0.1%
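The 88.5% imbalance figure in the alerts is not the share of the majority class, since KR alone is 95.7%. It is consistent with one minus the Shannon entropy of these five counts, normalized by log(number of classes), which gives about 0.885.

The sketch below assumes that definition. It reproduces the reported value, but it is not taken from ydata-profiling's source. `imbalance_score` is a hypothetical helper.

```python
import math

def imbalance_score(counts):
    """1 - H / log(k): H is the Shannon entropy of the class frequencies, k the number
    of classes. 0 means all classes are equally frequent; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 1.0  # single class: treated as fully imbalanced (a convention of this sketch)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

country_counts = {"KR": 9571, "US": 414, "UNKNOWN": 12, "CN": 2, "GB": 1}
score = imbalance_score(list(country_counts.values()))
print(f"{100 * score:.1f}%")  # 88.5%
```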

Length

Histogram of lengths of the category

Common Values (Plot)

Most occurring words

Value | Count | Frequency (%)
kr | 9571 | 95.7%
us | 414 | 4.1%
unknown | 12 | 0.1%
cn | 2 | < 0.1%
gb | 1 | < 0.1%

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
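All 242 missing cells in the dataset belong to 회사명, and every other column reports zero missing values. The count chart and nullity matrix can be sketched in text form with a hypothetical `nullity` helper.

The input rows are illustrative. The first is taken from the Sample section; the second is a made-up row with a missing company name.

```python
def nullity(rows, columns):
    """Missing count per column, plus a text nullity matrix ('#' = present, '.' = missing)."""
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    matrix = ["".join("#" if r.get(c) is not None else "." for c in columns) for r in rows]
    return missing, matrix

columns = ["회사명", "사업자 등록 번호", "등록 일시", "등록 국가"]
rows = [
    # From the Sample section.
    {"회사명": "KT", "사업자 등록 번호": "1028142945", "등록 일시": "2017-05-12 10:51:35", "등록 국가": "KR"},
    # Hypothetical row with a missing company name.
    {"회사명": None, "사업자 등록 번호": "0000000000", "등록 일시": "2017-01-01 00:00:00", "등록 국가": "KR"},
]
missing, matrix = nullity(rows, columns)
print(missing)  # {'회사명': 1, '사업자 등록 번호': 0, '등록 일시': 0, '등록 국가': 0}
print("\n".join(matrix))
```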

Sample

Index | 회사명 (company name) | 사업자 등록 번호 (business registration number) | 등록 일시 (registered at) | 등록 국가 (registration country)
4860 | (주)덕산코트랜 | 5048135674 | 2016-07-12 10:00:09 | KR
6075 | 어보브반도체 | 1208526960 | 2017-01-19 17:27:59 | KR
7521 | 제일교육학원제일요양보호사교육원 | 6169206236 | 2017-05-25 18:19:07 | KR
5743 | 마두간호학원 | 1289260072 | 2016-11-17 14:14:39 | KR
14129 | (주)진화이앤씨 | 114-81-31024 | 2019-08-21 10:00:12 | KR
595 | 에이치에스엘 일렉트로닉스 | 5048135320 | 2015-06-12 08:37:02 | KR
3064 | (주)진넷시스템 | 1138647173 | 2015-12-30 08:58:21 | KR
3872 | 한국산업경영자문 | 4098184021 | 2016-03-08 20:18:35 | KR
1633 | 대한문화 | 2012967774 | 2015-10-05 11:43:44 | KR
2284 | ㈜삼성디스플레이 | 1428145449 | 2015-11-05 11:27:52 | KR
Index | 회사명 (company name) | 사업자 등록 번호 (business registration number) | 등록 일시 (registered at) | 등록 국가 (registration country)
9678 | 엠오에스충청 | 3148171058 | 2018-01-30 13:39:05 | KR
7933 | 이너비즈 | 3968100711 | 2017-06-15 10:41:15 | KR
7136 | KT | 1028142945 | 2017-05-12 10:51:35 | KR
2455 | ABB코리아 | 1208104589 | 2015-11-05 17:29:23 | KR
6432 | 주식회사 테라세미콘 | 124-81-94031 | 2017-02-28 13:26:13 | KR
3943 | 이시다매뉴팩쳐링코리아 | 1308167198 | 2016-03-14 09:52:27 | KR
7909 | 월정(주) 나무소리 | 409-86-32547 | 2017-06-13 15:12:10 | KR
2683 | 한국다우코닝 | 1068121414 | 2015-11-25 20:28:13 | KR
9341 | 이오시스템 | 1378109108 | 2017-12-11 11:12:57 | KR
480 | 피에스케이 | 1258106879 | 2015-05-22 11:26:14 | KR