Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 1156
Missing cells (%): 1.7%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
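
The percentages and the average record size follow from the raw counts. A minimal sketch that re-derives them, assuming the missing cells are spread evenly over the four variables flagged as Missing in the Alerts section (variable names here are illustrative):

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 7
n_observations = 10_000
missing_cells = 1_156
total_size_kib = 644.5

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

# Assumption: four variables carry the missing cells in equal shares.
missing_per_column = missing_cells // 4

print(f"Missing cells (%): {missing_pct:.1f}%")          # 1.7%
print(f"Average record size: {avg_record_bytes:.1f} B")  # 66.0 B
print(f"Missing per affected column: {missing_per_column}")  # 289
```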

Variable types

Numeric: 2
DateTime: 1
Categorical: 3
Text: 1

Dataset

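Description: Employment status of persons with disabilities, Korea Employment Agency for Persons with Disabilities (columns: 순번 [sequence number], 취업일자 [employment date], 연령 [age], 장애유형 [disability type], 중증여부 [severity], 근무지역 [work region], 취업직종대분류 [major occupation category])
Author: Korea Employment Agency for Persons with Disabilities (한국장애인고용공단)
URL: https://www.data.go.kr/data/15088956/fileData.do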

Alerts

Dataset has 1 (< 0.1%) duplicate row [Duplicates]
연령 is highly overall correlated with 중증여부 [High correlation]
장애유형 is highly overall correlated with 중증여부 [High correlation]
중증여부 is highly overall correlated with 연령 and 1 other field [High correlation]
순번 has 289 (2.9%) missing values [Missing]
취업일자 has 289 (2.9%) missing values [Missing]
연령 has 289 (2.9%) missing values [Missing]
근무지역 has 289 (2.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 23:00:43.898138
Analysis finished: 2023-12-12 23:00:45.451775
Duration: 1.55 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
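
A report of this kind can be regenerated with ydata-profiling. The sketch below is one plausible way to do it, not the configuration actually used: the file path, the `cp949` encoding, the report title, and the function name `build_report` are all assumptions.

```python
def build_report(csv_path, output_html="report.html", encoding="cp949"):
    """Profile a CSV export and write an HTML report.

    csv_path, output_html and encoding are assumptions, not values taken
    from this report.
    """
    # Imported lazily so defining the function does not require the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # 취업일자 is reported as a Date variable, so parse it on load.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["취업일자"])
    report = ProfileReport(df, title="Employment status profile")  # hypothetical title
    report.to_file(output_html)
    return report
```

Calling `build_report("employment.csv")` with a local copy of the file (a hypothetical name) would write `report.html` to the working directory.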

Variables

순번
Real number (ℝ)

MISSING 

Distinct: 9711
Distinct (%): 100.0%
Missing: 289
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 22502.442
Minimum: 7
Maximum: 45116
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 7
5-th percentile: 2050
Q1: 11363.5
Median: 22476
Q3: 33828.5
95-th percentile: 42779.5
Maximum: 45116
Range: 45109
Interquartile range (IQR): 22465

Descriptive statistics

Standard deviation: 13024.919
Coefficient of variation (CV): 0.57882246
Kurtosis: -1.1847378
Mean: 22502.442
Median Absolute Deviation (MAD): 11234
Skewness: 0.0019305164
Sum: 2.1852122 × 10^8
Variance: 1.6964852 × 10^8
Monotonicity: Not monotonic
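
Several of the 순번 statistics are derived from others, which gives a quick consistency check. A small sketch using the rounded figures printed in this report, so agreement is only expected to within rounding:

```python
import math

# Values copied from the 순번 statistics in this report.
minimum, maximum = 7, 45116
q1, q3 = 11363.5, 33828.5
mean, std = 22502.442, 13024.919
n_present = 10_000 - 289  # observations minus missing values

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n_present          # Sum

print(value_range, iqr)
print(f"{cv:.6f}  {variance:.7e}  {total:.7e}")
```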
Histogram with fixed size bins (bins=50)
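
Fixed-size binning splits the range from minimum to maximum into equal-width intervals. A minimal sketch of the idea, assuming the maximum is placed in the last bin (the convention `numpy.histogram` follows); the helper name `fixed_bin_counts` is illustrative, and the sample values are a handful of 순번 values listed in this report:

```python
def fixed_bin_counts(values, lo, hi, bins=50):
    """Count values in `bins` equal-width intervals covering [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that v == hi falls into the last bin instead of overflowing.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

sample = [7, 10572, 22401, 13876, 42708, 16389, 23213, 34280, 7736, 40539, 22773, 45116]
counts = fixed_bin_counts(sample, lo=7, hi=45116)
print(len(counts), sum(counts))
```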
Value                Count  Frequency (%)
10572                1      < 0.1%
22401                1      < 0.1%
13876                1      < 0.1%
42708                1      < 0.1%
16389                1      < 0.1%
23213                1      < 0.1%
34280                1      < 0.1%
7736                 1      < 0.1%
40539                1      < 0.1%
22773                1      < 0.1%
Other values (9701)  9701   97.0%
(Missing)            289    2.9%

Value  Count  Frequency (%)
7      1      < 0.1%
20     1      < 0.1%
25     1      < 0.1%
28     1      < 0.1%
32     1      < 0.1%
34     1      < 0.1%
35     1      < 0.1%
45     1      < 0.1%
47     1      < 0.1%
48     1      < 0.1%

Value  Count  Frequency (%)
45116  1      < 0.1%
45113  1      < 0.1%
45109  1      < 0.1%
45107  1      < 0.1%
45105  1      < 0.1%
45104  1      < 0.1%
45102  1      < 0.1%
45088  1      < 0.1%
45085  1      < 0.1%
45073  1      < 0.1%

취업일자
Date

MISSING 

Distinct: 316
Distinct (%): 3.3%
Missing: 289
Missing (%): 2.9%
Memory size: 156.2 KiB
Minimum: 2022-01-01 00:00:00
Maximum: 2022-12-30 00:00:00
Histogram with fixed size bins (bins=50)

연령
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 69
Distinct (%): 0.7%
Missing: 289
Missing (%): 2.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.684584
Minimum: 17
Maximum: 85
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 17
5-th percentile: 20
Q1: 28
Median: 46
Q3: 62
95-th percentile: 74
Maximum: 85
Range: 68
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 17.992506
Coefficient of variation (CV): 0.39384195
Kurtosis: -1.310383
Mean: 45.684584
Median Absolute Deviation (MAD): 17
Skewness: 0.10263758
Sum: 443643
Variance: 323.73027
Monotonicity: Not monotonic
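
The Median Absolute Deviation is the median of the absolute distances from the median. A short sketch of the calculation on the ten 연령 values that appear in the first Sample table of this report; because it uses only ten rows, the result differs from the full-column MAD of 17:

```python
from statistics import median

# 연령 values from the first ten sample rows shown in this report.
ages = [67, 60, 28, 70, 21, 41, 62, 50, 22, 61]

med = median(ages)
mad = median(abs(a - med) for a in ages)

print(med, mad)  # 55.0 13.0
```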
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
22                 277    2.8%
20                 254    2.5%
23                 253    2.5%
25                 251    2.5%
24                 249    2.5%
27                 249    2.5%
62                 247    2.5%
21                 241    2.4%
28                 240    2.4%
26                 230    2.3%
Other values (59)  7220   72.2%
(Missing)          289    2.9%

Value  Count  Frequency (%)
17     2      < 0.1%
18     46     0.5%
19     210    2.1%
20     254    2.5%
21     241    2.4%
22     277    2.8%
23     253    2.5%
24     249    2.5%
25     251    2.5%
26     230    2.3%

Value  Count  Frequency (%)
85     4      < 0.1%
84     3      < 0.1%
83     16     0.2%
82     11     0.1%
81     18     0.2%
80     33     0.3%
79     45     0.4%
78     41     0.4%
77     57     0.6%
76     63     0.6%

장애유형
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

지체장애  3136
지적장애  2736
시각장애  1004
청각장애  850
뇌병변장애  699
Other values (13)  1575

Length

Max length: 10
Median length: 4
Mean length: 4.1048
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 청각장애
2nd row: 뇌병변장애
3rd row: 지적장애
4th row: 지체장애
5th row: 자폐성장애

Common Values

Value  Count  Frequency (%)
지체장애  3136  31.4%
지적장애  2736  27.4%
시각장애  1004  10.0%
청각장애  850  8.5%
뇌병변장애  699  7.0%
정신장애  532  5.3%
<NA>  289  2.9%
자폐성장애  282  2.8%
신장장애  196  2.0%
언어장애  84  0.8%
Other values (8)  192  1.9%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
지체장애  3136  31.4%
지적장애  2736  27.4%
시각장애  1004  10.0%
청각장애  850  8.5%
뇌병변장애  699  7.0%
정신장애  532  5.3%
na  289  2.9%
자폐성장애  282  2.8%
신장장애  196  2.0%
언어장애  84  0.8%
Other values (9)  193  1.9%

중증여부
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

중증  5659
경증  4052
<NA>  289

Length

Max length: 4
Median length: 2
Mean length: 2.0578
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경증
2nd row: 중증
3rd row: 중증
4th row: 경증
5th row: 중증

Common Values

Value  Count  Frequency (%)
중증  5659  56.6%
경증  4052  40.5%
<NA>  289  2.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
중증  5659  56.6%
경증  4052  40.5%
na  289  2.9%

근무지역
Text

MISSING 

Distinct: 264
Distinct (%): 2.7%
Missing: 289
Missing (%): 2.9%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 6
Mean length: 6.2810215
Min length: 5

Characters and Unicode

Total characters: 60995
Distinct characters: 149
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
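
As a small illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one 근무지역 sample value from this report; it covers categories only, since script and block lookups are not part of the standard library, and the helper name `category_counts` is illustrative.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. Lo, Zs) over the characters of text."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "경남 창원시 의창구"  # third sample row of 근무지역
counts = category_counts(sample)

# Lo = Other Letter, Zs = Space Separator, Ps/Pe = Open/Close Punctuation
print(counts)
print(unicodedata.category("("), unicodedata.category(")"))
```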

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 경기 수원시
2nd row: 서울 영등포구
3rd row: 경남 창원시 의창구
4th row: 광주 북구
5th row: 서울 강서구
Value  Count  Frequency (%)
서울  2131  10.6%
경기  1953  9.7%
부산  591  2.9%
경남  509  2.5%
경북  501  2.5%
인천  478  2.4%
충북  449  2.2%
충남  430  2.1%
대구  406  2.0%
전남  381  1.9%
Other values (246)  12296  61.1%
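
The counts in this table sum to 20125, more than the number of rows, which is consistent with each 근무지역 value being split into whitespace-separated words before counting. A minimal sketch of that kind of word count on the five sample rows shown in this report; the exact tokenization used by the profiler is an assumption here:

```python
from collections import Counter

# The five 근무지역 sample rows from this report.
regions = ["경기 수원시", "서울 영등포구", "경남 창원시 의창구", "광주 북구", "서울 강서구"]

# Assumption: words are obtained by splitting on whitespace.
word_counts = Counter(word for region in regions for word in region.split())

total_words = sum(word_counts.values())
print(word_counts.most_common(1), total_words)  # [('서울', 2)] 11
```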

Most occurring characters

Value                Count  Frequency (%)
(space)              10414  17.1%
                     5794   9.5%
                     4490   7.4%
                     3080   5.0%
                     2938   4.8%
                     2488   4.1%
                     2291   3.8%
                     2013   3.3%
                     1734   2.8%
                     1704   2.8%
Other values (139)   24049  39.4%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       50453  82.7%
Space Separator    10414  17.1%
Open Punctuation   64     0.1%
Close Punctuation  64     0.1%

Most frequent character per category

Other Letter
Value                Count  Frequency (%)
                     5794   11.5%
                     4490   8.9%
                     3080   6.1%
                     2938   5.8%
                     2488   4.9%
                     2291   4.5%
                     2013   4.0%
                     1734   3.4%
                     1704   3.4%
                     1637   3.2%
Other values (136)   22284  44.2%

Space Separator
Value    Count  Frequency (%)
(space)  10414  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      64     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      64     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  50453  82.7%
Common  10542  17.3%

Most frequent character per script

Hangul
Value                Count  Frequency (%)
                     5794   11.5%
                     4490   8.9%
                     3080   6.1%
                     2938   5.8%
                     2488   4.9%
                     2291   4.5%
                     2013   4.0%
                     1734   3.4%
                     1704   3.4%
                     1637   3.2%
Other values (136)   22284  44.2%

Common
Value    Count  Frequency (%)
(space)  10414  98.8%
(        64     0.6%
)        64     0.6%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  50453  82.7%
ASCII   10542  17.3%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)  10414  98.8%
(        64     0.6%
)        64     0.6%

Hangul
Value                Count  Frequency (%)
                     5794   11.5%
                     4490   8.9%
                     3080   6.1%
                     2938   5.8%
                     2488   4.9%
                     2291   4.5%
                     2013   4.0%
                     1734   3.4%
                     1704   3.4%
                     1637   3.2%
Other values (136)   22284  44.2%

취업직종대분류
Categorical

Distinct: 34
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

경영·행정·사무직  3672
청소 및 기타 개인서비스직  2536
제조 단순직  672
경호·경비직  478
보건·의료직  439
Other values (29)  2203

Length

Max length: 33
Median length: 23
Mean length: 10.092
Min length: 3

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 경영·행정·사무직
2nd row: 스포츠·레크리에이션직
3rd row: 제조 단순직
4th row: 건설·채굴직
5th row: 스포츠·레크리에이션직

Common Values

Value  Count  Frequency (%)
경영·행정·사무직  3672  36.7%
청소 및 기타 개인서비스직  2536  25.4%
제조 단순직  672  6.7%
경호·경비직  478  4.8%
보건·의료직  439  4.4%
음식 서비스직  328  3.3%
<NA>  289  2.9%
인쇄·목재·공예 및 기타 설치·정비·생산직  197  2.0%
돌봄 서비스직(간병·육아)  191  1.9%
영업·판매직  153  1.5%
Other values (24)  1045  10.4%

Length

Histogram of lengths of the category
Value               Count  Frequency (%)
경영·행정·사무직  3672  18.4%
                    2800   14.0%
기타  2733  13.7%
개인서비스직  2536  12.7%
청소  2536  12.7%
제조  703  3.5%
단순직  672  3.4%
경호·경비직  478  2.4%
보건·의료직  439  2.2%
설치·정비·생산직  405  2.0%
Other values (37)  3003  15.0%

Interactions


Correlations

                순번     연령     장애유형  중증여부  취업직종대분류
순번            1.000   0.155   0.127   0.109   0.461
연령            0.155   1.000   0.553   0.665   0.415
장애유형        0.127   0.553   1.000   0.732   0.470
중증여부        0.109   0.665   0.732   1.000   0.375
취업직종대분류  0.461   0.415   0.470   0.375   1.000

                장애유형  중증여부  취업직종대분류
장애유형        1.000   0.673   0.147
중증여부        0.673   1.000   0.318
취업직종대분류  0.147   0.318   1.000

                순번     연령     장애유형  중증여부  취업직종대분류
순번            1.000   -0.077  0.050   0.083   0.181
연령            -0.077  1.000   0.251   0.517   0.159
장애유형        0.050   0.251   1.000   0.673   0.147
중증여부        0.083   0.517   0.673   1.000   0.318
취업직종대분류  0.181   0.159   0.147   0.318   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
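
Nullity correlation can be read as the ordinary Pearson correlation between per-column presence indicators (1 if a value is present, 0 if missing). A minimal sketch on made-up rows; the column names `a`, `b`, `c`, the data, and the helper names are hypothetical and are not taken from this dataset:

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def nullity(rows, col):
    """1 where the column is present, 0 where it is missing."""
    return [0 if row[col] is None else 1 for row in rows]

# Hypothetical rows: a and b are always missing together, c is not.
rows = [
    {"a": 1, "b": "x", "c": 3.0},
    {"a": None, "b": None, "c": 1.0},
    {"a": 2, "b": "y", "c": None},
    {"a": None, "b": None, "c": 2.0},
    {"a": 5, "b": "z", "c": 4.0},
]

corr_ab = pearson(nullity(rows, "a"), nullity(rows, "b"))
corr_ac = pearson(nullity(rows, "a"), nullity(rows, "c"))
print(round(corr_ab, 3), round(corr_ac, 3))
```

Columns that are always missing in the same rows, like `a` and `b` here, have a nullity correlation of 1.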

Sample

(index)  순번  취업일자  연령  장애유형  중증여부  근무지역  취업직종대분류
1936  1937  2022-01-01  67  청각장애  경증  경기 수원시  경영·행정·사무직
6536  6537  2022-09-07  60  뇌병변장애  중증  서울 영등포구  스포츠·레크리에이션직
42210  42211  2022-02-28  28  지적장애  중증  경남 창원시 의창구  제조 단순직
17494  17495  2022-05-09  70  지체장애  경증  광주 북구  건설·채굴직
3373  3374  2022-04-11  21  자폐성장애  중증  서울 강서구  스포츠·레크리에이션직
30965  30966  2022-02-17  41  지적장애  중증  전남 담양군  인쇄·목재·공예 및 기타 설치·정비·생산직
25261  25262  2022-01-01  62  간장애  경증  경북 구미시  청소 및 기타 개인서비스직
31401  31402  2022-08-09  50  지적장애  중증  광주 서구  제조 단순직
13472  13473  2022-10-11  22  자폐성장애  중증  서울 강남구  영업·판매직
36763  36764  2022-08-08  61  청각장애  중증  서울 성동구  청소 및 기타 개인서비스직

(index)  순번  취업일자  연령  장애유형  중증여부  근무지역  취업직종대분류
20781  20782  2022-02-01  66  지체장애  경증  서울 금천구  경호·경비직
37954  37955  2022-07-04  43  정신장애  중증  경북 포항시 북구  제조 단순직
38982  38983  2022-11-11  23  지적장애  중증  서울 종로구  스포츠·레크리에이션직
15733  15734  2022-09-01  55  지체장애  경증  서울 종로구  경호·경비직
42388  42389  2022-01-01  52  지적장애  중증  경북 경산시  제조 단순직
17896  17897  2022-01-03  65  지체장애  경증  전북 남원시  경영·행정·사무직
22722  22723  2022-03-01  23  지적장애  중증  인천 서구  음식 서비스직
40884  40885  2022-08-24  35  지적장애  중증  인천 계양구  제조 단순직
2205  2206  2022-01-01  51  심장장애  경증  대구 동구  경영·행정·사무직
9242  9243  2022-08-01  24  지적장애  중증  서울 영등포구  청소 및 기타 개인서비스직

Duplicate rows

Most frequently occurring

(index)  순번  취업일자  연령  장애유형  중증여부  근무지역  취업직종대분류  # duplicates
0  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  289
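
The overview counts 1 duplicate row, while this table lists 289 occurrences of a single all-missing row. That is consistent with counting distinct duplicated patterns separately from how often each occurs, though this reading is an inference from the numbers rather than something the report states. A small sketch of the distinction on made-up rows (the three-column tuples are hypothetical):

```python
from collections import Counter

# Hypothetical rows: one all-missing pattern repeated three times.
rows = [
    (1937, "2022-01-01", 67),
    (None, None, None),
    (None, None, None),
    (6537, "2022-09-07", 60),
    (None, None, None),
]

occurrences = Counter(rows)
duplicated = {row: n for row, n in occurrences.items() if n > 1}

print(len(duplicated))                 # distinct duplicated patterns: 1
print(duplicated[(None, None, None)])  # occurrences of that pattern: 3
```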