Overview

Dataset statistics

Number of variables: 7
Number of observations: 3571
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 205.9 KiB
Average record size in memory: 59.0 B
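
The average record size follows from the total size and the row count. The sketch below redoes that arithmetic with the figures printed above; the variable names are illustrative, and it assumes 1 KiB = 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block.
total_size_kib = 205.9
n_observations = 3571
n_variables = 7
missing_cells = 0

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Missing cells are reported relative to all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells: {missing_pct:.1f}%")
```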

Variable types

Numeric: 3
Text: 2
Categorical: 2

Dataset

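Description: Status of the companies and workers participating in the 내일채움공제 (Naeil Chaeum Mutual Aid) vacation support program, which 중소벤처기업진흥공단 (Korea SMEs and Startups Agency) manages to support long-term employment and asset building for employees of small, medium, and venture businesses.
URL: https://www.data.go.kr/data/15102337/fileData.do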

Alerts

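참여자연령 (participant age) is highly overall correlated with 공제참여 유형 (mutual-aid participation type) [High correlation]
공제참여 유형 (mutual-aid participation type) is highly overall correlated with 참여자연령 (participant age) [High correlation]
순번 (serial number) has unique values [Unique]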

Reproduction

Analysis started: 2023-12-12 06:24:55.388881
Analysis finished: 2023-12-12 06:24:57.443213
Duration: 2.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
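
The reported duration is the difference between the two timestamps. A minimal check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 06:24:55.388881")
finished = datetime.fromisoformat("2023-12-12 06:24:57.443213")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```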

Variables

순번 (serial number)
Real number (ℝ)

UNIQUE

Distinct: 3571
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1786
Minimum: 1
Maximum: 3571
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 179.5
Q1: 893.5
Median: 1786
Q3: 2678.5
95-th percentile: 3392.5
Maximum: 3571
Range: 3570
Interquartile range (IQR): 1785

Descriptive statistics

Standard deviation: 1031.0032
Coefficient of variation (CV): 0.57726945
Kurtosis: -1.2
Mean: 1786
Median Absolute Deviation (MAD): 893
Skewness: 0
Sum: 6377806
Variance: 1062967.7
Monotonicity: Strictly increasing
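
Because 순번 is unique, strictly increasing, and runs from 1 to 3571, these statistics can be reproduced from the sequence 1..n. The sketch below assumes the report uses the sample variance (divisor n - 1) and linearly interpolated quantiles, which is consistent with the printed figures but is not stated in the report.

```python
import statistics

n = 3571
values = list(range(1, n + 1))  # the serial numbers 1..n

mean = statistics.mean(values)            # closed form: (n + 1) / 2
total = sum(values)                       # closed form: n * (n + 1) / 2
variance = statistics.variance(values)    # sample variance: n * (n + 1) / 12
std = statistics.stdev(values)
cv = std / mean                           # coefficient of variation

# "inclusive" interpolates linearly between the observed minimum and maximum.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(mean, total, round(variance, 1), round(std, 4), round(cv, 8))
print(q1, median, q3, iqr)
```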
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2387 | 1 | < 0.1%
2376 | 1 | < 0.1%
2377 | 1 | < 0.1%
2378 | 1 | < 0.1%
2379 | 1 | < 0.1%
2380 | 1 | < 0.1%
2381 | 1 | < 0.1%
2382 | 1 | < 0.1%
2383 | 1 | < 0.1%
Other values (3561) | 3561 | 99.7%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
3571 | 1 | < 0.1%
3570 | 1 | < 0.1%
3569 | 1 | < 0.1%
3568 | 1 | < 0.1%
3567 | 1 | < 0.1%
3566 | 1 | < 0.1%
3565 | 1 | < 0.1%
3564 | 1 | < 0.1%
3563 | 1 | < 0.1%
3562 | 1 | < 0.1%

기업명 (company name)
Text

Distinct: 892
Distinct (%): 25.0%
Missing: 0
Missing (%): 0.0%
Memory size: 28.0 KiB

Length

Max length: 21
Median length: 16
Mean length: 8.1820218
Min length: 2

Characters and Unicode

Total characters: 29218
Distinct characters: 512
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
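
As an illustration of those character properties, the sketch below classifies the characters of one sample company name with Python's `unicodedata` module. The mapping from two-letter general-category codes to the names used in this report is a hand-written assumption covering only the categories that appear here.

```python
import unicodedata
from collections import Counter

# Hand-written mapping from Unicode general-category codes to the
# category names shown in this report (an assumption, not report output).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
    "So": "Other Symbol",
}

sample = "(주)스카이테라퓨틱스"  # one of the sample rows of 기업명

# Count characters per general category.
codes = Counter(unicodedata.category(ch) for ch in sample)
for code, count in codes.items():
    print(CATEGORY_NAMES.get(code, code), count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the table of most occurring categories.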

Unique

Unique: 314
Unique (%): 8.8%

Sample

1st row: 지엔터프라이즈주식회사
2nd row: 주식회사 바이오톡스텍
3rd row: 주식회사 바이오톡스텍
4th row: 김기수수안과의원
5th row: (주)스카이테라퓨틱스
Value | Count | Frequency (%)
주식회사 | 654 | 15.4%
선인 | 94 | 2.2%
광주글로벌모터스 | 79 | 1.9%
바이오톡스텍 | 74 | 1.7%
위세아이텍 | 52 | 1.2%
주)코리아보드게임즈 | 47 | 1.1%
공간종합건축사사무소 | 42 | 1.0%
테라핀 | 41 | 1.0%
제이제이툴스 | 33 | 0.8%
주에어코드 | 30 | 0.7%
Other values (892) | 3112 | 73.1%

Most occurring characters

ValueCountFrequency (%)
2540
 
8.7%
1448
 
5.0%
) 1284
 
4.4%
( 1281
 
4.4%
1142
 
3.9%
1071
 
3.7%
1008
 
3.4%
981
 
3.4%
687
 
2.4%
534
 
1.8%
Other values (502) 17242
59.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 25603 | 87.6%
Close Punctuation | 1284 | 4.4%
Open Punctuation | 1281 | 4.4%
Space Separator | 687 | 2.4%
Uppercase Letter | 156 | 0.5%
Lowercase Letter | 136 | 0.5%
Other Punctuation | 44 | 0.2%
Decimal Number | 20 | 0.1%
Other Symbol | 6 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2540
 
9.9%
1448
 
5.7%
1142
 
4.5%
1071
 
4.2%
1008
 
3.9%
981
 
3.8%
534
 
2.1%
433
 
1.7%
371
 
1.4%
331
 
1.3%
Other values (462) 15744
61.5%
Uppercase Letter
ValueCountFrequency (%)
C 35
22.4%
E 16
10.3%
S 15
9.6%
M 13
 
8.3%
B 10
 
6.4%
K 9
 
5.8%
T 9
 
5.8%
G 8
 
5.1%
L 7
 
4.5%
R 7
 
4.5%
Other values (7) 27
17.3%
Lowercase Letter
ValueCountFrequency (%)
o 44
32.4%
r 23
16.9%
p 20
14.7%
t 14
 
10.3%
l 11
 
8.1%
c 9
 
6.6%
m 7
 
5.1%
i 4
 
2.9%
f 4
 
2.9%
Decimal Number
ValueCountFrequency (%)
3 4
20.0%
2 4
20.0%
4 3
15.0%
5 3
15.0%
6 3
15.0%
0 2
10.0%
1 1
 
5.0%
Other Punctuation
ValueCountFrequency (%)
. 37
84.1%
& 7
 
15.9%
Close Punctuation
ValueCountFrequency (%)
) 1284
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1281
100.0%
Space Separator
ValueCountFrequency (%)
687
100.0%
Other Symbol
ValueCountFrequency (%)
6
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 25604 | 87.6%
Common | 3317 | 11.4%
Latin | 292 | 1.0%
Han | 5 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2540
 
9.9%
1448
 
5.7%
1142
 
4.5%
1071
 
4.2%
1008
 
3.9%
981
 
3.8%
534
 
2.1%
433
 
1.7%
371
 
1.4%
331
 
1.3%
Other values (462) 15745
61.5%
Latin
ValueCountFrequency (%)
o 44
15.1%
C 35
 
12.0%
r 23
 
7.9%
p 20
 
6.8%
E 16
 
5.5%
S 15
 
5.1%
t 14
 
4.8%
M 13
 
4.5%
l 11
 
3.8%
B 10
 
3.4%
Other values (16) 91
31.2%
Common
ValueCountFrequency (%)
) 1284
38.7%
( 1281
38.6%
687
20.7%
. 37
 
1.1%
& 7
 
0.2%
3 4
 
0.1%
2 4
 
0.1%
4 3
 
0.1%
5 3
 
0.1%
6 3
 
0.1%
Other values (3) 4
 
0.1%
Han
ValueCountFrequency (%)
5
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 25598 | 87.6%
ASCII | 3609 | 12.4%
None | 6 | < 0.1%
CJK | 5 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2540
 
9.9%
1448
 
5.7%
1142
 
4.5%
1071
 
4.2%
1008
 
3.9%
981
 
3.8%
534
 
2.1%
433
 
1.7%
371
 
1.4%
331
 
1.3%
Other values (461) 15739
61.5%
ASCII
ValueCountFrequency (%)
) 1284
35.6%
( 1281
35.5%
687
19.0%
o 44
 
1.2%
. 37
 
1.0%
C 35
 
1.0%
r 23
 
0.6%
p 20
 
0.6%
E 16
 
0.4%
S 15
 
0.4%
Other values (29) 167
 
4.6%
None
ValueCountFrequency (%)
6
100.0%
CJK
ValueCountFrequency (%)
5
100.0%

지역 구분 (region)
Categorical

Distinct: 17
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 28.0 KiB

서울: 1474
경기: 627
대전: 194
충남: 191
광주: 179
Other values (12): 906

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 충북
3rd row: 충북
4th row: 제주
5th row: 서울

Common Values

Value | Count | Frequency (%)
서울 | 1474 | 41.3%
경기 | 627 | 17.6%
대전 | 194 | 5.4%
충남 | 191 | 5.3%
광주 | 179 | 5.0%
경남 | 160 | 4.5%
충북 | 143 | 4.0%
인천 | 140 | 3.9%
부산 | 136 | 3.8%
울산 | 77 | 2.2%
Other values (7) | 250 | 7.0%
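
The frequency column is each count divided by the 3571 observations, and the "Other values" row is whatever remains after the ten listed regions. A short check using the counts above (the dictionary is copied by hand from the table):

```python
n_rows = 3571

# Counts of the ten most common regions, copied from the table.
top_regions = {
    "서울": 1474, "경기": 627, "대전": 194, "충남": 191, "광주": 179,
    "경남": 160, "충북": 143, "인천": 140, "부산": 136, "울산": 77,
}

# Share of each region in percent, rounded as in the report.
shares = {name: round(100 * count / n_rows, 1) for name, count in top_regions.items()}

# The remaining 7 regions are lumped together as "Other values".
other_count = n_rows - sum(top_regions.values())
other_share = round(100 * other_count / n_rows, 1)

print(shares)
print(other_count, other_share)
```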

Length

Histogram of lengths of the category

사업자번호 (business registration number)
Real number (ℝ)

Distinct: 895
Distinct (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2509567 × 10^9
Minimum: 1.0101548 × 10^9
Maximum: 8.9395002 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.5 KiB

Quantile statistics

Minimum: 1.0101548 × 10^9
5-th percentile: 1.0588137 × 10^9
Q1: 1.2986937 × 10^9
Median: 2.4681007 × 10^9
Q3: 4.3786015 × 10^9
95-th percentile: 7.7433003 × 10^9
Maximum: 8.9395002 × 10^9
Range: 7.9293453 × 10^9
Interquartile range (IQR): 3.0799078 × 10^9

Descriptive statistics

Standard deviation: 2.1531991 × 10^9
Coefficient of variation (CV): 0.66232783
Kurtosis: -0.22742792
Mean: 3.2509567 × 10^9
Median Absolute Deviation (MAD): 1.2668906 × 10^9
Skewness: 0.92763004
Sum: 1.1609166 × 10^13
Variance: 4.6362662 × 10^18
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3128123021 | 94 | 2.6%
4378601471 | 79 | 2.2%
3018145781 | 74 | 2.1%
2148121976 | 52 | 1.5%
1288191230 | 47 | 1.3%
1018104991 | 42 | 1.2%
2618801446 | 41 | 1.1%
1198627364 | 33 | 0.9%
1058197094 | 30 | 0.8%
2208153250 | 29 | 0.8%
Other values (885) | 3050 | 85.4%

Value | Count | Frequency (%)
1010154811 | 1 | < 0.1%
1014295362 | 2 | 0.1%
1018104991 | 42 | 1.2%
1018603978 | 6 | 0.2%
1018608274 | 2 | 0.1%
1018623407 | 1 | < 0.1%
1018639200 | 25 | 0.7%
1018649028 | 1 | < 0.1%
1018664053 | 10 | 0.3%
1018683711 | 1 | < 0.1%

Value | Count | Frequency (%)
8939500150 | 3 | 0.1%
8938600503 | 12 | 0.3%
8898100270 | 4 | 0.1%
8848801026 | 2 | 0.1%
8848601599 | 5 | 0.1%
8832100375 | 2 | 0.1%
8828501373 | 1 | < 0.1%
8818801672 | 3 | 0.1%
8818601038 | 5 | 0.1%
8808602183 | 1 | < 0.1%

참여 근로자명 (participating worker name)
Text

Distinct: 96
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 28.0 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 10713
Distinct characters: 97
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): 0.6%

Sample

1st row: 제OO
2nd row: 임OO
3rd row: 장OO
4th row: 변OO
5th row: 신OO
Value | Count | Frequency (%)
김oo | 747 | 20.9%
이oo | 479 | 13.4%
박oo | 277 | 7.8%
정oo | 193 | 5.4%
최oo | 182 | 5.1%
조oo | 93 | 2.6%
임oo | 92 | 2.6%
강oo | 82 | 2.3%
윤oo | 80 | 2.2%
유oo | 78 | 2.2%
Other values (86) | 1268 | 35.5%

Most occurring characters

ValueCountFrequency (%)
O 7142
66.7%
747
 
7.0%
479
 
4.5%
277
 
2.6%
193
 
1.8%
182
 
1.7%
93
 
0.9%
92
 
0.9%
82
 
0.8%
80
 
0.7%
Other values (87) 1346
 
12.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 7144 | 66.7%
Other Letter | 3569 | 33.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
747
20.9%
479
 
13.4%
277
 
7.8%
193
 
5.4%
182
 
5.1%
93
 
2.6%
92
 
2.6%
82
 
2.3%
80
 
2.2%
78
 
2.2%
Other values (84) 1266
35.5%
Uppercase Letter
ValueCountFrequency (%)
O 7142
> 99.9%
K 1
 
< 0.1%
L 1
 
< 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 7144 | 66.7%
Hangul | 3569 | 33.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
747
20.9%
479
 
13.4%
277
 
7.8%
193
 
5.4%
182
 
5.1%
93
 
2.6%
92
 
2.6%
82
 
2.3%
80
 
2.2%
78
 
2.2%
Other values (84) 1266
35.5%
Latin
ValueCountFrequency (%)
O 7142
> 99.9%
K 1
 
< 0.1%
L 1
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7144 | 66.7%
Hangul | 3569 | 33.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
O 7142
> 99.9%
K 1
 
< 0.1%
L 1
 
< 0.1%
Hangul
ValueCountFrequency (%)
747
20.9%
479
 
13.4%
277
 
7.8%
193
 
5.4%
182
 
5.1%
93
 
2.6%
92
 
2.6%
82
 
2.3%
80
 
2.2%
78
 
2.2%
Other values (84) 1266
35.5%

참여자연령 (participant age)
Real number (ℝ)

HIGH CORRELATION

Distinct: 49
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.488379
Minimum: 19
Maximum: 71
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 31.5 KiB

Quantile statistics

Minimum: 19
5-th percentile: 24
Q1: 28
Median: 31
Q3: 36
95-th percentile: 46
Maximum: 71
Range: 52
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 6.8381093
Coefficient of variation (CV): 0.21047863
Kurtosis: 2.798473
Mean: 32.488379
Median Absolute Deviation (MAD): 4
Skewness: 1.376545
Sum: 116016
Variance: 46.759739
Monotonicity: Not monotonic
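
Several of these figures are derived from one another, so they can be cross-checked without the raw data. The sketch below uses only numbers printed in this section; the variable names are illustrative.

```python
# Figures copied from the statistics for 참여자연령.
n_rows = 3571
total_age = 116016          # Sum
std = 6.8381093             # Standard deviation
q1, q3 = 28, 36             # Q1 and Q3
minimum, maximum = 19, 71

mean = total_age / n_rows   # Mean = Sum / number of observations
cv = std / mean             # Coefficient of variation = std / mean
variance = std ** 2         # Variance = std squared
iqr = q3 - q1               # Interquartile range
value_range = maximum - minimum

print(round(mean, 6), round(cv, 8), round(variance, 6), iqr, value_range)
```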
Histogram with fixed size bins (bins=49)
Value | Count | Frequency (%)
27 | 294 | 8.2%
28 | 248 | 6.9%
31 | 248 | 6.9%
30 | 245 | 6.9%
29 | 244 | 6.8%
26 | 231 | 6.5%
32 | 221 | 6.2%
33 | 190 | 5.3%
34 | 184 | 5.2%
35 | 176 | 4.9%
Other values (39) | 1290 | 36.1%

Value | Count | Frequency (%)
19 | 1 | < 0.1%
20 | 8 | 0.2%
21 | 21 | 0.6%
22 | 21 | 0.6%
23 | 46 | 1.3%
24 | 108 | 3.0%
25 | 146 | 4.1%
26 | 231 | 6.5%
27 | 294 | 8.2%
28 | 248 | 6.9%

Value | Count | Frequency (%)
71 | 1 | < 0.1%
68 | 1 | < 0.1%
67 | 1 | < 0.1%
65 | 2 | 0.1%
63 | 4 | 0.1%
62 | 1 | < 0.1%
61 | 4 | 0.1%
60 | 4 | 0.1%
59 | 2 | 0.1%
58 | 4 | 0.1%

공제참여 유형 (mutual-aid participation type)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 28.0 KiB

청년재직자 내일채움공제: 1655
청년내일채움공제: 1409
내일채움공제: 507

Length

Max length: 12
Median length: 8
Mean length: 9.5698684
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 청년내일채움공제
2nd row: 청년재직자 내일채움공제
3rd row: 청년내일채움공제
4th row: 청년재직자 내일채움공제
5th row: 청년재직자 내일채움공제

Common Values

Value | Count | Frequency (%)
청년재직자 내일채움공제 | 1655 | 46.3%
청년내일채움공제 | 1409 | 39.5%
내일채움공제 | 507 | 14.2%
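
With only three categories, the length statistics for this variable can be recomputed from the counts alone. The sketch below weights each label's character length by its count; it assumes the labels are stored as precomposed Hangul syllables, so that `len` counts one unit per syllable.

```python
# Category counts copied from the common-values table.
counts = {
    "청년재직자 내일채움공제": 1655,
    "청년내일채움공제": 1409,
    "내일채움공제": 507,
}

n_rows = sum(counts.values())

# Mean length: label lengths weighted by how often each label occurs.
mean_length = sum(len(label) * count for label, count in counts.items()) / n_rows

# Median length: expand to one length per row and take the middle element.
lengths = sorted(len(label) for label, count in counts.items() for _ in range(count))
median_length = lengths[n_rows // 2]  # n_rows is odd, so this is the exact median

print(n_rows, round(mean_length, 7), median_length, min(lengths), max(lengths))
```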

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
내일채움공제 | 2162 | 41.4%
청년재직자 | 1655 | 31.7%
청년내일채움공제 | 1409 | 27.0%
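
These counts appear to be word frequencies obtained by splitting each label on whitespace, which would explain why 내일채움공제 reaches 2162 (1655 + 507). A sketch under that assumption:

```python
from collections import Counter

# Category counts copied from the common-values table for 공제참여 유형.
counts = {
    "청년재직자 내일채움공제": 1655,
    "청년내일채움공제": 1409,
    "내일채움공제": 507,
}

# Split every label on whitespace and add the label's count to each word.
words = Counter()
for label, count in counts.items():
    for token in label.split():
        words[token] += count

total_words = sum(words.values())
for token, count in words.most_common():
    print(token, count, f"{100 * count / total_words:.1f}%")
```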

Interactions

[Interaction plots: 9 images, not preserved in this text extract]

Correlations

Variable | 순번 | 지역 구분 | 사업자번호 | 참여 근로자명 | 참여자연령 | 공제참여 유형
순번 | 1.000 | 0.261 | 0.223 | 0.165 | 0.080 | 0.089
지역 구분 | 0.261 | 1.000 | 0.751 | 0.000 | 0.153 | 0.319
사업자번호 | 0.223 | 0.751 | 1.000 | 0.000 | 0.189 | 0.227
참여 근로자명 | 0.165 | 0.000 | 0.000 | 1.000 | 0.000 | 0.059
참여자연령 | 0.080 | 0.153 | 0.189 | 0.000 | 1.000 | 0.773
공제참여 유형 | 0.089 | 0.319 | 0.227 | 0.059 | 0.773 | 1.000

Variable | 지역 구분 | 공제참여 유형
지역 구분 | 1.000 | 0.181
공제참여 유형 | 0.181 | 1.000

Variable | 순번 | 사업자번호 | 참여자연령 | 지역 구분 | 공제참여 유형
순번 | 1.000 | 0.096 | -0.041 | 0.104 | 0.053
사업자번호 | 0.096 | 1.000 | -0.057 | 0.412 | 0.139
참여자연령 | -0.041 | -0.057 | 1.000 | 0.060 | 0.652
지역 구분 | 0.104 | 0.412 | 0.060 | 1.000 | 0.181
공제참여 유형 | 0.053 | 0.139 | 0.652 | 0.181 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

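(index) | 순번 | 기업명 | 지역 구분 | 사업자번호 | 참여 근로자명 | 참여자연령 | 공제참여 유형
0 | 1 | 지엔터프라이즈주식회사 | 서울 | 2368800397 | 제OO | 27 | 청년내일채움공제
1 | 2 | 주식회사 바이오톡스텍 | 충북 | 3018145781 | 임OO | 31 | 청년재직자 내일채움공제
2 | 3 | 주식회사 바이오톡스텍 | 충북 | 3018145781 | 장OO | 25 | 청년내일채움공제
3 | 4 | 김기수수안과의원 | 제주 | 6169277783 | 변OO | 32 | 청년재직자 내일채움공제
4 | 5 | (주)스카이테라퓨틱스 | 서울 | 1948101619 | 신OO | 35 | 청년재직자 내일채움공제
5 | 6 | 주식회사 테라핀 | 서울 | 2618801446 | 김OO | 34 | 청년내일채움공제
6 | 7 | 김기수수안과의원 | 제주 | 6169277783 | 박OO | 37 | 청년재직자 내일채움공제
7 | 8 | (주)에스피구조안전기술사사무소 | 서울 | 1078633238 | 김OO | 31 | 청년재직자 내일채움공제
8 | 9 | 주식회사앱포스터 | 서울 | 1078739753 | 김OO | 25 | 청년내일채움공제
9 | 10 | 주식회사앱포스터 | 서울 | 1078739753 | 예OO | 28 | 청년내일채움공제

(index) | 순번 | 기업명 | 지역 구분 | 사업자번호 | 참여 근로자명 | 참여자연령 | 공제참여 유형
3561 | 3562 | (주)동수기계 | 경기 | 5188601111 | 최OO | 25 | 내일채움공제
3562 | 3563 | 권승훈치과의원 | 경북 | 1809400794 | 민OO | 31 | 청년재직자 내일채움공제
3563 | 3564 | 성모용정형외과 | 서울 | 1329893083 | 이OO | 28 | 청년내일채움공제
3564 | 3565 | 서울베스트치과 | 서울 | 4313400169 | 이OO | 30 | 청년재직자 내일채움공제
3565 | 3566 | 지수치과교정과치과의원 | 대구 | 3059252295 | 김OO | 26 | 청년재직자 내일채움공제
3566 | 3567 | (주)바이텍정보통신 | 경기 | 1208139623 | 문OO | 30 | 청년내일채움공제
3567 | 3568 | (주)쎌텍 | 충남 | 1588801077 | 박OO | 35 | 청년내일채움공제
3568 | 3569 | (주)세연이앤에스 | 광주 | 4098653125 | 김OO | 30 | 청년재직자 내일채움공제
3569 | 3570 | 잼손 | 경기 | 6758801965 | 권OO | 48 | 내일채움공제
3570 | 3571 | 주식회사나아바코리아 | 경기 | 4288600949 | 채OO | 27 | 청년내일채움공제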
35703571주식회사나아바코리아경기4288600949채OO27청년내일채움공제