Overview

Dataset statistics

Number of variables: 6
Number of observations: 1194
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 56.1 KiB
Average record size in memory: 48.1 B
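
The derived figures above follow from the raw counts. A minimal Python sketch, assuming the missing-cell percentage is taken over variables × observations and that 1 KiB = 1024 bytes (the variable names are illustrative and do not come from the report):

```python
# Raw figures copied from the "Dataset statistics" block.
n_variables = 6
n_observations = 1194
missing_cells = 3
total_size_kib = 56.1

# Assumption: the percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share stays below 0.1% and the average record size rounds to 48.1 B, consistent with the figures listed.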

Variable types

Text: 3
Categorical: 3

Dataset

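Description: Foreign-language phrases (Chinese) commonly used by foreign workers, published by the Human Resources Development Service of Korea (한국산업인력공단). The dataset provides Chinese sentences that foreign workers frequently use.
URL: https://www.data.go.kr/data/15050964/fileData.do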

Alerts

대분류 is highly overall correlated with 대분류코드 and 1 other field [High correlation]
대분류코드 is highly overall correlated with 대분류 and 1 other field [High correlation]
소분류 is highly overall correlated with 대분류코드 and 1 other field [High correlation]
대분류코드 is highly imbalanced (99.0%) [Imbalance]
연번 has unique values [Unique]
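
The report does not state how the imbalance figure is derived. One formula that reproduces the 99.0% shown for 대분류코드 is one minus the normalized Shannon entropy of the value counts. The sketch below assumes that definition; `imbalance_score` is a hypothetical helper name, not a documented API.

```python
from math import log2

def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no split to measure
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)

# Value counts for 대분류코드 in this report: 1193 rows with "4", 1 row with a stray text value.
score = imbalance_score([1193, 1])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, so a value this close to 1 signals that almost every row shares one code.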

Reproduction

Analysis started: 2023-12-12 23:06:08.890372
Analysis finished: 2023-12-12 23:06:09.886922
Duration: 1 second
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Text

UNIQUE 

Distinct: 1194
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 9.5 KiB

Length

Max length: 9
Median length: 3
Mean length: 3.0778894
Min length: 1

Characters and Unicode

Total characters: 3675
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
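
As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for one sample value from the 한국어 column. The `CATEGORY_NAMES` mapping is hand-written for illustration and covers only a few categories.

```python
import unicodedata
from collections import Counter

# Hand-written subset of general-category abbreviations and their long names.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

print(category_counts("고맙습니다."))
```

Hangul syllables fall under "Other Letter", which is why that category dominates the text columns in this report.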

Unique

Unique: 1194
Unique (%): 100.0%

Sample

1st row: 1
2nd row: 2
3rd row: 3
4th row: 4
5th row: 5

Value | Count | Frequency (%)
1 | 1 | 0.1%
801 | 1 | 0.1%
799 | 1 | 0.1%
798 | 1 | 0.1%
797 | 1 | 0.1%
796 | 1 | 0.1%
795 | 1 | 0.1%
794 | 1 | 0.1%
793 | 1 | 0.1%
792 | 1 | 0.1%
Other values (1185) | 1185 | 99.2%

Most occurring characters

Value | Count | Frequency (%)
1 | 630 | 17.1%
2 | 340 | 9.3%
4 | 339 | 9.2%
3 | 339 | 9.2%
5 | 339 | 9.2%
8 | 339 | 9.2%
7 | 339 | 9.2%
6 | 338 | 9.2%
9 | 334 | 9.1%
0 | 329 | 9.0%
Other values (9) | 9 | 0.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3666 | 99.8%
Other Letter | 8 | 0.2%
Space Separator | 1 | < 0.1%

Most frequent character per category

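Decimal Number
Value | Count | Frequency (%)
1 | 630 | 17.2%
2 | 340 | 9.3%
4 | 339 | 9.2%
3 | 339 | 9.2%
5 | 339 | 9.2%
8 | 339 | 9.2%
7 | 339 | 9.2%
6 | 338 | 9.2%
9 | 334 | 9.1%
0 | 329 | 9.0%

Other Letter
Value | Count | Frequency (%)
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%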

Most occurring scripts

Value | Count | Frequency (%)
Common | 3667 | 99.8%
Hangul | 8 | 0.2%

Most frequent character per script

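Common
Value | Count | Frequency (%)
1 | 630 | 17.2%
2 | 340 | 9.3%
4 | 339 | 9.2%
3 | 339 | 9.2%
5 | 339 | 9.2%
8 | 339 | 9.2%
7 | 339 | 9.2%
6 | 338 | 9.2%
9 | 334 | 9.1%
0 | 329 | 9.0%

Hangul
Value | Count | Frequency (%)
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%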

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3667 | 99.8%
Hangul | 8 | 0.2%

Most frequent character per block

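ASCII
Value | Count | Frequency (%)
1 | 630 | 17.2%
2 | 340 | 9.3%
4 | 339 | 9.2%
3 | 339 | 9.2%
5 | 339 | 9.2%
8 | 339 | 9.2%
7 | 339 | 9.2%
6 | 338 | 9.2%
9 | 334 | 9.1%
0 | 329 | 9.0%

Hangul
Value | Count | Frequency (%)
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%
(not preserved) | 1 | 12.5%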

대분류코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 9.5 KiB

Value | Count
4 | 1193
휴게휴일에 관한 규정은 적용받지 아니함." | 1

Length

Max length: 24
Median length: 1
Mean length: 1.019263
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 4
2nd row: 4
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value | Count | Frequency (%)
4 | 1193 | 99.9%
휴게휴일에 관한 규정은 적용받지 아니함." | 1 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
4 | 1193 | 99.6%
휴게휴일에 | 1 | 0.1%
관한 | 1 | 0.1%
규정은 | 1 | 0.1%
적용받지 | 1 | 0.1%
아니함 | 1 | 0.1%

대분류
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 9.5 KiB

Value | Count
일상생활 | 608
작업지시 | 209
근로관련 | 186
기숙사및식당 | 121
고용관련신고 | 69

Length

Max length: 68
Median length: 4
Mean length: 4.3718593
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 일상생활
2nd row: 일상생활
3rd row: 일상생활
4th row: 일상생활
5th row: 일상생활

Common Values

Value | Count | Frequency (%)
일상생활 | 608 | 50.9%
작업지시 | 209 | 17.5%
근로관련 | 186 | 15.6%
기숙사및식당 | 121 | 10.1%
고용관련신고 | 69 | 5.8%
라오뚱 뾰오준파 띠61탸오 눙린, 쒸찬, 양찬, 쉬이찬 쓰예더 칭쾅, 부쓰융 퉁 파뤼쌍더 라오뚱 스잰, 쓔시르 썅관 뀌이띵 | 1 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
일상생활 | 608 | 50.2%
작업지시 | 209 | 17.3%
근로관련 | 186 | 15.4%
기숙사및식당 | 121 | 10.0%
고용관련신고 | 69 | 5.7%
라오뚱 | 2 | 0.2%
칭쾅 | 1 | 0.1%
썅관 | 1 | 0.1%
쓔시르 | 1 | 0.1%
스잰 | 1 | 0.1%
Other values (11) | 11 | 0.9%

소분류
Categorical

HIGH CORRELATION 

Distinct: 30
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 9.5 KiB

Value | Count
기타 | 334
근무태도 | 84
건강,병원 | 83
급여,수당 | 66
기숙사규칙 | 58
Other values (25) | 569

Length

Max length: 7
Median length: 6
Mean length: 4.1289782
Min length: 2

Unique

Unique: 3
Unique (%): 0.3%

Sample

1st row: 인사,소개
2nd row: 인사,소개
3rd row: 인사,소개
4th row: 인사,소개
5th row: 인사,소개

Common Values

Value | Count | Frequency (%)
기타 | 334 | 28.0%
근무태도 | 84 | 7.0%
건강,병원 | 83 | 7.0%
급여,수당 | 66 | 5.5%
기숙사규칙 | 58 | 4.9%
시장,교통 | 47 | 3.9%
안전규칙 | 46 | 3.9%
음식,식생활 | 46 | 3.9%
작업규칙등기타 | 45 | 3.8%
기타사항 | 44 | 3.7%
Other values (20) | 341 | 28.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
기타 | 334 | 28.0%
근무태도 | 84 | 7.0%
건강,병원 | 83 | 7.0%
급여,수당 | 66 | 5.5%
기숙사규칙 | 58 | 4.9%
시장,교통 | 47 | 3.9%
안전규칙 | 46 | 3.9%
음식,식생활 | 46 | 3.9%
작업규칙등기타 | 45 | 3.8%
기타사항 | 44 | 3.7%
Other values (20) | 341 | 28.6%

한국어
Text

Distinct: 1190
Distinct (%): 99.7%
Missing: 1
Missing (%): 0.1%
Memory size: 9.5 KiB

Length

Max length: 63
Median length: 34
Mean length: 14.173512
Min length: 1

Characters and Unicode

Total characters: 16909
Distinct characters: 615
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1187
Unique (%): 99.5%

Sample

1st row: 고맙습니다.
2nd row: 그동안 수고하셨습니다
3rd row: 다음에 또 뵈요
4th row: 다음에 오겠습니다.
5th row: 당신을 잊지 못할 것 입니다

Value | Count | Frequency (%)
(not preserved) | 47 | 1.2%
합니다 | 36 | 0.9%
있습니다 | 34 | 0.9%
(not preserved) | 32 | 0.8%
마세요 | 28 | 0.7%
하세요 | 28 | 0.7%
입니다 | 26 | 0.7%
(not preserved) | 21 | 0.5%
안됩니다 | 19 | 0.5%
(not preserved) | 18 | 0.5%
Other values (2400) | 3597 | 92.6%

Most occurring characters

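Value | Count | Frequency (%)
(space) | 2969 | 17.6%
(not preserved) | 572 | 3.4%
(not preserved) | 522 | 3.1%
. | 472 | 2.8%
(not preserved) | 463 | 2.7%
(not preserved) | 359 | 2.1%
(not preserved) | 314 | 1.9%
(not preserved) | 293 | 1.7%
(not preserved) | 261 | 1.5%
(not preserved) | 251 | 1.5%
Other values (605) | 10433 | 61.7%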

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 13005 | 76.9%
Space Separator | 2969 | 17.6%
Other Punctuation | 738 | 4.4%
Decimal Number | 138 | 0.8%
Uppercase Letter | 27 | 0.2%
Math Symbol | 11 | 0.1%
Close Punctuation | 7 | < 0.1%
Open Punctuation | 7 | < 0.1%
Dash Punctuation | 4 | < 0.1%
Lowercase Letter | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 572 | 4.4%
(not preserved) | 522 | 4.0%
(not preserved) | 463 | 3.6%
(not preserved) | 359 | 2.8%
(not preserved) | 314 | 2.4%
(not preserved) | 293 | 2.3%
(not preserved) | 261 | 2.0%
(not preserved) | 251 | 1.9%
(not preserved) | 224 | 1.7%
(not preserved) | 215 | 1.7%
Other values (573) | 9531 | 73.3%

Decimal Number
Value | Count | Frequency (%)
0 | 45 | 32.6%
1 | 26 | 18.8%
2 | 17 | 12.3%
9 | 10 | 7.2%
3 | 10 | 7.2%
4 | 8 | 5.8%
5 | 7 | 5.1%
8 | 6 | 4.3%
6 | 6 | 4.3%
7 | 3 | 2.2%

Other Punctuation
Value | Count | Frequency (%)
. | 472 | 64.0%
? | 233 | 31.6%
, | 22 | 3.0%
/ | 6 | 0.8%
* | 1 | 0.1%
(not preserved) | 1 | 0.1%
% | 1 | 0.1%
' | 1 | 0.1%
! | 1 | 0.1%

Uppercase Letter
Value | Count | Frequency (%)
O | 22 | 81.5%
C | 2 | 7.4%
E | 1 | 3.7%
A | 1 | 3.7%
S | 1 | 3.7%

Lowercase Letter
Value | Count | Frequency (%)
v | 1 | 33.3%
t | 1 | 33.3%
d | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 2969 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 11 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 7 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 7 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 13005 | 76.9%
Common | 3874 | 22.9%
Latin | 30 | 0.2%

Most frequent character per script

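Hangul
Value | Count | Frequency (%)
(not preserved) | 572 | 4.4%
(not preserved) | 522 | 4.0%
(not preserved) | 463 | 3.6%
(not preserved) | 359 | 2.8%
(not preserved) | 314 | 2.4%
(not preserved) | 293 | 2.3%
(not preserved) | 261 | 2.0%
(not preserved) | 251 | 1.9%
(not preserved) | 224 | 1.7%
(not preserved) | 215 | 1.7%
Other values (573) | 9531 | 73.3%

Common
Value | Count | Frequency (%)
(space) | 2969 | 76.6%
. | 472 | 12.2%
? | 233 | 6.0%
0 | 45 | 1.2%
1 | 26 | 0.7%
, | 22 | 0.6%
2 | 17 | 0.4%
~ | 11 | 0.3%
9 | 10 | 0.3%
3 | 10 | 0.3%
Other values (14) | 59 | 1.5%

Latin
Value | Count | Frequency (%)
O | 22 | 73.3%
C | 2 | 6.7%
v | 1 | 3.3%
t | 1 | 3.3%
E | 1 | 3.3%
d | 1 | 3.3%
A | 1 | 3.3%
S | 1 | 3.3%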

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 12866 | 76.1%
ASCII | 3903 | 23.1%
Compat Jamo | 139 | 0.8%
Punctuation | 1 | < 0.1%

Most frequent character per block

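ASCII
Value | Count | Frequency (%)
(space) | 2969 | 76.1%
. | 472 | 12.1%
? | 233 | 6.0%
0 | 45 | 1.2%
1 | 26 | 0.7%
, | 22 | 0.6%
O | 22 | 0.6%
2 | 17 | 0.4%
~ | 11 | 0.3%
9 | 10 | 0.3%
Other values (21) | 76 | 1.9%

Hangul
Value | Count | Frequency (%)
(not preserved) | 572 | 4.4%
(not preserved) | 522 | 4.1%
(not preserved) | 463 | 3.6%
(not preserved) | 359 | 2.8%
(not preserved) | 314 | 2.4%
(not preserved) | 293 | 2.3%
(not preserved) | 261 | 2.0%
(not preserved) | 251 | 2.0%
(not preserved) | 224 | 1.7%
(not preserved) | 215 | 1.7%
Other values (572) | 9392 | 73.0%

Compat Jamo
Value | Count | Frequency (%)
(not preserved) | 139 | 100.0%

Punctuation
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%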

발음
Text

Distinct: 1179
Distinct (%): 98.9%
Missing: 2
Missing (%): 0.2%
Memory size: 9.5 KiB

Length

Max length: 1023
Median length: 36
Mean length: 12.34396
Min length: 1

Characters and Unicode

Total characters: 14714
Distinct characters: 384
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1166
Unique (%): 97.8%

Sample

1st row: 쎄쎄
2nd row: 쩌똰 스잰 신쿠러
3rd row: 쌰츠 짜이잰
4th row: 쌰츠 짜이라이
5th row: 워 부후이 왕찌 니더

Value | Count | Frequency (%)
(not preserved) | 238 | 5.4%
(not preserved) | 202 | 4.6%
짜이 | 65 | 1.5%
궁즈 | 45 | 1.0%
(not preserved) | 38 | 0.9%
(not preserved) | 37 | 0.8%
부요우 | 36 | 0.8%
후이 | 31 | 0.7%
궁쭤 | 30 | 0.7%
쉬요우 | 30 | 0.7%
Other values (1926) | 3620 | 82.8%

Most occurring characters

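Value | Count | Frequency (%)
(space) | 4221 | 28.7%
(not preserved) | 867 | 5.9%
(not preserved) | 406 | 2.8%
(not preserved) | 317 | 2.2%
(not preserved) | 251 | 1.7%
? | 234 | 1.6%
(not preserved) | 192 | 1.3%
O | 187 | 1.3%
(not preserved) | 181 | 1.2%
(not preserved) | 175 | 1.2%
Other values (374) | 7683 | 52.2%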

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 9831 | 66.8%
Space Separator | 4221 | 28.7%
Other Punctuation | 372 | 2.5%
Uppercase Letter | 196 | 1.3%
Decimal Number | 61 | 0.4%
Math Symbol | 11 | 0.1%
Close Punctuation | 9 | 0.1%
Open Punctuation | 9 | 0.1%
Dash Punctuation | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 867 | 8.8%
(not preserved) | 406 | 4.1%
(not preserved) | 317 | 3.2%
(not preserved) | 251 | 2.6%
(not preserved) | 192 | 2.0%
(not preserved) | 181 | 1.8%
(not preserved) | 175 | 1.8%
(not preserved) | 168 | 1.7%
(not preserved) | 167 | 1.7%
(not preserved) | 137 | 1.4%
Other values (346) | 6970 | 70.9%

Decimal Number
Value | Count | Frequency (%)
1 | 14 | 23.0%
0 | 14 | 23.0%
2 | 10 | 16.4%
9 | 8 | 13.1%
4 | 4 | 6.6%
6 | 3 | 4.9%
3 | 3 | 4.9%
8 | 2 | 3.3%
5 | 2 | 3.3%
7 | 1 | 1.6%

Other Punctuation
Value | Count | Frequency (%)
? | 234 | 62.9%
, | 115 | 30.9%
. | 13 | 3.5%
/ | 7 | 1.9%
% | 1 | 0.3%
* | 1 | 0.3%
! | 1 | 0.3%

Uppercase Letter
Value | Count | Frequency (%)
O | 187 | 95.4%
N | 3 | 1.5%
H | 3 | 1.5%
E | 1 | 0.5%
S | 1 | 0.5%
A | 1 | 0.5%

Space Separator
Value | Count | Frequency (%)
(space) | 4221 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 11 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 9 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 9 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 9831 | 66.8%
Common | 4687 | 31.9%
Latin | 196 | 1.3%

Most frequent character per script

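Hangul
Value | Count | Frequency (%)
(not preserved) | 867 | 8.8%
(not preserved) | 406 | 4.1%
(not preserved) | 317 | 3.2%
(not preserved) | 251 | 2.6%
(not preserved) | 192 | 2.0%
(not preserved) | 181 | 1.8%
(not preserved) | 175 | 1.8%
(not preserved) | 168 | 1.7%
(not preserved) | 167 | 1.7%
(not preserved) | 137 | 1.4%
Other values (346) | 6970 | 70.9%

Common
Value | Count | Frequency (%)
(space) | 4221 | 90.1%
? | 234 | 5.0%
, | 115 | 2.5%
1 | 14 | 0.3%
0 | 14 | 0.3%
. | 13 | 0.3%
~ | 11 | 0.2%
2 | 10 | 0.2%
) | 9 | 0.2%
( | 9 | 0.2%
Other values (12) | 37 | 0.8%

Latin
Value | Count | Frequency (%)
O | 187 | 95.4%
N | 3 | 1.5%
H | 3 | 1.5%
E | 1 | 0.5%
S | 1 | 0.5%
A | 1 | 0.5%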

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 9831 | 66.8%
ASCII | 4883 | 33.2%

Most frequent character per block

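ASCII
Value | Count | Frequency (%)
(space) | 4221 | 86.4%
? | 234 | 4.8%
O | 187 | 3.8%
, | 115 | 2.4%
1 | 14 | 0.3%
0 | 14 | 0.3%
. | 13 | 0.3%
~ | 11 | 0.2%
2 | 10 | 0.2%
) | 9 | 0.2%
Other values (18) | 55 | 1.1%

Hangul
Value | Count | Frequency (%)
(not preserved) | 867 | 8.8%
(not preserved) | 406 | 4.1%
(not preserved) | 317 | 3.2%
(not preserved) | 251 | 2.6%
(not preserved) | 192 | 2.0%
(not preserved) | 181 | 1.8%
(not preserved) | 175 | 1.8%
(not preserved) | 168 | 1.7%
(not preserved) | 167 | 1.7%
(not preserved) | 137 | 1.4%
Other values (346) | 6970 | 70.9%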

Correlations

Variable | 대분류코드 | 대분류 | 소분류
대분류코드 | 1.000 | 1.000 | NaN
대분류 | 1.000 | 1.000 | 0.994
소분류 | NaN | 0.994 | 1.000

Variable | 대분류 | 대분류코드 | 소분류
대분류 | 1.000 | 0.998 | 0.962
대분류코드 | 0.998 | 1.000 | 1.000
소분류 | 0.962 | 1.000 | 1.000

Variable | 대분류코드 | 대분류 | 소분류
대분류코드 | 1.000 | 0.998 | 1.000
대분류 | 0.998 | 1.000 | 0.962
소분류 | 1.000 | 0.962 | 1.000
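
The report does not name the association measure behind these matrices. Cramér's V is one common choice for pairs of categorical variables. The sketch below implements the uncorrected form (no bias correction) on toy data, so it illustrates the idea rather than reproducing the values above. `cramers_v` is an illustrative helper name.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With a constant column there is nothing to associate, so return NaN.
    return sqrt(chi2 / (n * k)) if k > 0 else float("nan")

# Toy data: each major category maps to exactly one code, so the association is perfect.
major = ["a", "a", "b", "b"]
code = ["x", "x", "y", "y"]
print(cramers_v(major, code))
```

When one variable is a relabelling of the other, as a code column typically is of its label column, the score approaches 1, which fits the near-1.000 entries between 대분류코드 and 대분류.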

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
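
Nullity correlation can be sketched as the Pearson correlation between per-column missing-value indicators. The example below assumes missing entries are represented as `None` and uses toy columns rather than the dataset itself. `nullity_correlation` is an illustrative helper, not part of the report's tooling.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column with no (or only) missing values has no variance
    return cov / sqrt(var_a * var_b)

# Toy columns: values go missing in exactly the same rows.
sentence = ["x", None, "y", None]
reading = ["p", None, "q", None]
print(nullity_correlation(sentence, reading))
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one is present where the other is absent.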

Sample

First rows

Index | 연번 | 대분류코드 | 대분류 | 소분류 | 한국어 | 발음
0 | 1 | 4 | 일상생활 | 인사,소개 | 고맙습니다. | 쎄쎄
1 | 2 | 4 | 일상생활 | 인사,소개 | 그동안 수고하셨습니다 | 쩌똰 스잰 신쿠러
2 | 3 | 4 | 일상생활 | 인사,소개 | 다음에 또 뵈요 | 쌰츠 짜이잰
3 | 4 | 4 | 일상생활 | 인사,소개 | 다음에 오겠습니다. | 쌰츠 짜이라이
4 | 5 | 4 | 일상생활 | 인사,소개 | 당신을 잊지 못할 것 입니다 | 워 부후이 왕찌 니더
5 | 6 | 4 | 일상생활 | 인사,소개 | 당신의 상사는 ㅇㅇㅇ 입니다. | 니더 쌍스 쓰 OOO
6 | 7 | 4 | 일상생활 | 인사,소개 | 당신의 성공을 기원합니다 | 쭈니 청궁
7 | 8 | 4 | 일상생활 | 인사,소개 | 동료 | 퉁쓰
8 | 9 | 4 | 일상생활 | 인사,소개 | 만나서 반갑습니다. | 짼따오 니 헌 까오씽
9 | 10 | 4 | 일상생활 | 인사,소개 | 맛있게 드세요 | 칭만융

Last rows

Index | 연번 | 대분류코드 | 대분류 | 소분류 | 한국어 | 발음
1184 | 1185 | 4 | 고용관련신고 | 기타사항 | 이곳에 도장을 찍어주세요 | 짜이 쩌리 까이장
1185 | 1186 | 4 | 고용관련신고 | 기타사항 | 이곳에 서류를 접수하세요 | 짜이 쩌리 빤리 예우
1186 | 1187 | 4 | 고용관련신고 | 기타사항 | 재고용근로자 안내문을 잘읽어보세요. | 칭 류란 짜이 쮸예 라오뚱저 즈난
1187 | 1188 | 4 | 고용관련신고 | 기타사항 | 재발급을 원합니다. | 샹 충신 파팡
1188 | 1189 | 4 | 고용관련신고 | 기타사항 | 체류기간 연장을 해야 합니다. | 쉬요우 얜창 쯔류치쌘
1189 | 1190 | 4 | 고용관련신고 | 기타사항 | 체류기간이 만료되었습니다. | 쯔류치쌘 따오치
1190 | 1191 | 4 | 고용관련신고 | 기타사항 | 체류지 변경시 반드시 전입신고를 해야 합니다 | 뺀껑 쯔류띠스, 쉬요우 찐싱 좐루 선빠오
1191 | 1192 | 4 | 고용관련신고 | 기타사항 | 출국시 외국인등록증은 공항 출입국관리사무소에 반납해야 합니다. | 추궈스, 칭 바 와이궈런 떵두쩡 쨔오게이 찌창 추루찡 관리 쓰우쒀
1192 | 1193 | 4 | 고용관련신고 | 기타사항 | 출국예정일 변경 또는 재입국을 포기하고자할때는 한국산업인력공단 해외주재사무소로반드시 연락하여야합니다. | 뺀껑 추궈 위띵르 훠 팡치 짜이 루찡스, 삐쉬 랜씨 한궈 찬예 런리 궁퇀 쭈와이 빤쓰추
1193 | 1194 | 4 | 고용관련신고 | 기타사항 | 한국체류기간이 얼마나 남았습니까? | 한궈 쯔류치쌘 하이썽 둬사오 탠 ?