Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
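
The two memory figures agree with each other, which a quick check confirms. This is a minimal sketch, assuming 1 KiB = 1024 bytes and that the average is the total divided by the number of observations; the variable names are illustrative.

```python
# Figures copied from the dataset statistics above.
total_kib = 898.4
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes, average = total bytes / number of rows.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```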

Variable types

Numeric: 3
Categorical: 7

Dataset

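Description: Provides information for analyzing the status of applicants for the national sanitarian (위생사) licensing examination: year (연도), occupation (직종), exam round (회차), sex (성별), age group (연령대), exam region (응시지역), graduation status (졸업여부), pass/fail result (합격여부), and school location (학교소재지), in a form that cannot be used to identify individuals.
URL: https://www.data.go.kr/data/15083488/fileData.do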

Alerts

직종 has constant value "위생사" (Constant)
연도 is highly overall correlated with 회차 and 1 other field (High correlation)
회차 is highly overall correlated with 연도 and 1 other field (High correlation)
일련번호 is highly overall correlated with 연도 and 1 other field (High correlation)
응시지역 is highly overall correlated with 학교소재지 (High correlation)
학교소재지 is highly overall correlated with 응시지역 (High correlation)
연령대 is highly imbalanced (70.0%) (Imbalance)
일련번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 17:11:29.389916
Analysis finished: 2023-12-12 17:11:31.861220
Duration: 2.47 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
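
A report like this one can typically be regenerated with ydata-profiling's `ProfileReport`. The following is a minimal sketch, assuming the CSV from the URL above has been saved locally. The file names and the `build_profile` helper are hypothetical, and the exact settings used for this report are in config.json, which is not reproduced here.

```python
from pathlib import Path


def build_profile(csv_path: str, html_path: str, title: str = "Profile report") -> Path:
    """Profile a CSV file and write the HTML report (hypothetical helper)."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title=title)
    report.to_file(html_path)
    return Path(html_path)


# Example call (paths are placeholders, not taken from the report):
# build_profile("exam_applicants.csv", "exam_applicants_profile.html")
```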

Variables

연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2004.1472
Minimum: 2000
Maximum: 2008
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2000
Q1: 2002
Median: 2004
Q3: 2006
95-th percentile: 2008
Maximum: 2008
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.3525487
Coefficient of variation (CV): 0.0011738403
Kurtosis: -1.0849946
Mean: 2004.1472
Median Absolute Deviation (MAD): 2
Skewness: -0.016500458
Sum: 20041472
Variance: 5.5344856
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
2002 | 1797 | 18.0%
2006 | 1314 | 13.1%
2005 | 1278 | 12.8%
2004 | 1204 | 12.0%
2007 | 1123 | 11.2%
2003 | 1084 | 10.8%
2008 | 878 | 8.8%
2000 | 671 | 6.7%
2001 | 651 | 6.5%
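
The descriptive statistics for 연도 can be recomputed from the frequency table alone. This is a minimal sketch using only the standard library. It assumes the reported variance is the sample variance (n - 1 in the denominator), and the variable names are illustrative.

```python
import math

# Value counts for 연도, copied from the frequency table above.
year_counts = {
    2000: 671, 2001: 651, 2002: 1797, 2003: 1084, 2004: 1204,
    2005: 1278, 2006: 1314, 2007: 1123, 2008: 878,
}

n = sum(year_counts.values())  # number of observations
total = sum(year * c for year, c in year_counts.items())
mean = total / n

# Assumption: sample variance (n - 1 denominator).
variance = sum(c * (year - mean) ** 2 for year, c in year_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total, round(mean, 4), round(variance, 4), round(std, 4))
```

Under that assumption the results agree with the reported Sum, Mean, Variance, Standard deviation, and CV to the precision shown.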

직종
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
위생사: 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 위생사
2nd row: 위생사
3rd row: 위생사
4th row: 위생사
5th row: 위생사

Common Values

Value | Count | Frequency (%)
위생사 | 10000 | 100.0%

Length

Histogram of lengths of the category


회차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.927
Minimum: 21
Maximum: 30
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 26
Q3: 28
95-th percentile: 30
Maximum: 30
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.6552921
Coefficient of variation (CV): 0.10241417
Kurtosis: -0.99589018
Mean: 25.927
Median Absolute Deviation (MAD): 2
Skewness: -0.23495161
Sum: 259270
Variance: 7.0505761
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
28 | 1314 | 13.1%
27 | 1278 | 12.8%
26 | 1204 | 12.0%
29 | 1123 | 11.2%
25 | 1084 | 10.8%
24 | 917 | 9.2%
23 | 880 | 8.8%
30 | 878 | 8.8%
21 | 671 | 6.7%
22 | 651 | 6.5%
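
The quartiles reported for 회차 follow directly from the frequency table. This is a minimal sketch with the standard library's `statistics.quantiles`. It assumes linear interpolation between order statistics (`method="inclusive"`) is an acceptable stand-in for whatever quantile rule the report used.

```python
import statistics

# Value counts for 회차, copied from the frequency table above.
round_counts = {21: 671, 22: 651, 23: 880, 24: 917, 25: 1084,
                26: 1204, 27: 1278, 28: 1314, 29: 1123, 30: 878}

# Expand the counts back into one sorted value per observation.
values = [v for v, c in sorted(round_counts.items()) for _ in range(c)]

# method="inclusive" interpolates linearly between order statistics,
# treating the data as the whole population (an assumption here).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
print(q1, median, q3, iqr)
```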

일련번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47263.767
Minimum: 27
Maximum: 95282
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 27
5-th percentile: 4656.1
Q1: 23793.75
Median: 46996.5
Q3: 70327.5
95-th percentile: 90618.65
Maximum: 95282
Range: 95255
Interquartile range (IQR): 46533.75

Descriptive statistics

Standard deviation: 27331.413
Coefficient of variation (CV): 0.57827412
Kurtosis: -1.1751485
Mean: 47263.767
Median Absolute Deviation (MAD): 23301.5
Skewness: 0.022988525
Sum: 4.7263767 × 10^8
Variance: 7.4700615 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
82826 | 1 | < 0.1%
92386 | 1 | < 0.1%
43004 | 1 | < 0.1%
80307 | 1 | < 0.1%
45851 | 1 | < 0.1%
94101 | 1 | < 0.1%
71853 | 1 | < 0.1%
24789 | 1 | < 0.1%
31374 | 1 | < 0.1%
39218 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

10 smallest values:

Value | Count | Frequency (%)
27 | 1 | < 0.1%
47 | 1 | < 0.1%
72 | 1 | < 0.1%
79 | 1 | < 0.1%
84 | 1 | < 0.1%
105 | 1 | < 0.1%
111 | 1 | < 0.1%
119 | 1 | < 0.1%
133 | 1 | < 0.1%
134 | 1 | < 0.1%

10 largest values:

Value | Count | Frequency (%)
95282 | 1 | < 0.1%
95278 | 1 | < 0.1%
95276 | 1 | < 0.1%
95258 | 1 | < 0.1%
95256 | 1 | < 0.1%
95223 | 1 | < 0.1%
95222 | 1 | < 0.1%
95177 | 1 | < 0.1%
95159 | 1 | < 0.1%
95155 | 1 | < 0.1%

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
8289
1711

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
(not shown) | 8289 | 82.9%
(not shown) | 1711 | 17.1%

Length

Histogram of lengths of the category


연령대
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20: 8893
30: 864
40: 212
50: 31

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: 40
5th row: 20

Common Values

Value | Count | Frequency (%)
20 | 8893 | 88.9%
30 | 864 | 8.6%
40 | 212 | 2.1%
50 | 31 | 0.3%
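
The 70.0% imbalance figure in the alert for 연령대 can be approximated from these counts. This is a minimal sketch, assuming the score is one minus the Shannon entropy of the category distribution divided by its maximum possible value. The report does not state the formula, so this definition is an assumption that comes out close to the reported number.

```python
import math

# Category counts for 연령대, copied from the Common Values table above.
age_counts = {"20": 8893, "30": 864, "40": 212, "50": 31}

n = sum(age_counts.values())
probs = [c / n for c in age_counts.values()]

# Shannon entropy of the category distribution.
entropy = -sum(p * math.log(p) for p in probs if p > 0)

# Normalise by the maximum entropy (uniform over k categories) and invert:
# 0 means perfectly balanced, 1 means a single dominant category.
imbalance = 1 - entropy / math.log(len(probs))
print(f"{imbalance:.1%}")
```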

Length

Histogram of lengths of the category


응시지역
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 4233
부산광역시: 1906
대구광역시: 1517
대전광역시: 1077
광주광역시: 859
Other values (2): 408

Length

Max length: 5
Median length: 5
Mean length: 4.8776
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대구광역시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 부산광역시

Common Values

Value | Count | Frequency (%)
서울특별시 | 4233 | 42.3%
부산광역시 | 1906 | 19.1%
대구광역시 | 1517 | 15.2%
대전광역시 | 1077 | 10.8%
광주광역시 | 859 | 8.6%
전주 | 298 | 3.0%
원주 | 110 | 1.1%
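
The length statistics for 응시지역 are character counts weighted by how often each value occurs. This is a minimal sketch that recomputes the mean length from the Common Values table, assuming each Hangul syllable is stored as a single precomposed character; the variable names are illustrative.

```python
# Category counts for 응시지역, copied from the Common Values table above.
region_counts = {
    "서울특별시": 4233, "부산광역시": 1906, "대구광역시": 1517,
    "대전광역시": 1077, "광주광역시": 859, "전주": 298, "원주": 110,
}

n = sum(region_counts.values())

# len() on a Python str counts Unicode code points, so each precomposed
# Hangul syllable contributes 1 to the length.
mean_length = sum(len(name) * c for name, c in region_counts.items()) / n
lengths = sorted(len(name) for name in region_counts)
print(mean_length, lengths[0], lengths[-1])
```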

Length

Histogram of lengths of the category


졸업여부
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
졸업예정: 6544
졸업: 3254
(blank): 202

Length

Max length: 4
Median length: 4
Mean length: 3.2886
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: 졸업
3rd row: 졸업예정
4th row: 졸업
5th row: 졸업

Common Values

Value | Count | Frequency (%)
졸업예정 | 6544 | 65.4%
졸업 | 3254 | 32.5%
(blank) | 202 | 2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
졸업예정 | 6544 | 66.8%
졸업 | 3254 | 33.2%

The percentages in this plot table are shares of the 9798 rows shown (6544 + 3254), which is why they differ from the Common Values table.

합격여부
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
불합격: 4552
합격: 3222
결시: 2226

Length

Max length: 3
Median length: 2
Mean length: 2.4552
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 결시
3rd row: 결시
4th row: 합격
5th row: 결시

Common Values

Value | Count | Frequency (%)
불합격 | 4552 | 45.5%
합격 | 3222 | 32.2%
결시 | 2226 | 22.3%

Length

Histogram of lengths of the category


학교소재지
Categorical

HIGH CORRELATION 

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
경기도: 1625
서울특별시: 1424
부산광역시: 1091
경상북도: 991
경상남도: 714
Other values (14): 4155

Length

Max length: 5
Median length: 4
Mean length: 4.2115
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 경상북도
2nd row: 서울특별시
3rd row: 충청북도
4th row: 대전광역시
5th row: 경상남도

Common Values

Value | Count | Frequency (%)
경기도 | 1625 | 16.2%
서울특별시 | 1424 | 14.2%
부산광역시 | 1091 | 10.9%
경상북도 | 991 | 9.9%
경상남도 | 714 | 7.1%
대구광역시 | 618 | 6.2%
대전광역시 | 551 | 5.5%
전라북도 | 532 | 5.3%
광주광역시 | 509 | 5.1%
충청남도 | 459 | 4.6%
Other values (9) | 1486 | 14.9%

Length

Histogram of lengths of the category

Interactions

[9 interaction plots]

Correlations

(variable) | 연도 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 1.000 | 0.954 | 0.063 | 0.074 | 0.212 | 0.244 | 0.195 | 0.172
회차 | 1.000 | 1.000 | 0.989 | 0.074 | 0.063 | 0.227 | 0.258 | 0.230 | 0.170
일련번호 | 0.954 | 0.989 | 1.000 | 0.074 | 0.053 | 0.313 | 0.257 | 0.176 | 0.259
성별 | 0.063 | 0.074 | 0.074 | 1.000 | 0.144 | 0.016 | 0.000 | 0.020 | 0.156
연령대 | 0.074 | 0.063 | 0.053 | 0.144 | 1.000 | 0.000 | 0.241 | 0.088 | 0.285
응시지역 | 0.212 | 0.227 | 0.313 | 0.016 | 0.000 | 1.000 | 0.178 | 0.074 | 0.941
졸업여부 | 0.244 | 0.258 | 0.257 | 0.000 | 0.241 | 0.178 | 1.000 | 0.390 | 0.241
합격여부 | 0.195 | 0.230 | 0.176 | 0.020 | 0.088 | 0.074 | 0.390 | 1.000 | 0.115
학교소재지 | 0.172 | 0.170 | 0.259 | 0.156 | 0.285 | 0.941 | 0.241 | 0.115 | 1.000

(variable) | 학교소재지 | 합격여부 | 응시지역 | 성별 | 연령대 | 졸업여부
학교소재지 | 1.000 | 0.060 | 0.792 | 0.138 | 0.159 | 0.130
합격여부 | 0.060 | 1.000 | 0.049 | 0.032 | 0.083 | 0.142
응시지역 | 0.792 | 0.049 | 1.000 | 0.018 | 0.000 | 0.121
성별 | 0.138 | 0.032 | 0.018 | 1.000 | 0.095 | 0.000
연령대 | 0.159 | 0.083 | 0.000 | 0.095 | 1.000 | 0.230
졸업여부 | 0.130 | 0.142 | 0.121 | 0.000 | 0.230 | 1.000

(variable) | 연도 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 0.998 | 0.992 | 0.055 | 0.037 | 0.116 | 0.158 | 0.128 | 0.068
회차 | 0.998 | 1.000 | 0.994 | 0.057 | 0.038 | 0.116 | 0.159 | 0.141 | 0.064
일련번호 | 0.992 | 0.994 | 1.000 | 0.057 | 0.032 | 0.164 | 0.158 | 0.106 | 0.100
성별 | 0.055 | 0.057 | 0.057 | 1.000 | 0.095 | 0.018 | 0.000 | 0.032 | 0.138
연령대 | 0.037 | 0.038 | 0.032 | 0.095 | 1.000 | 0.000 | 0.230 | 0.083 | 0.159
응시지역 | 0.116 | 0.116 | 0.164 | 0.018 | 0.000 | 1.000 | 0.121 | 0.049 | 0.792
졸업여부 | 0.158 | 0.159 | 0.158 | 0.000 | 0.230 | 0.121 | 1.000 | 0.142 | 0.130
합격여부 | 0.128 | 0.141 | 0.106 | 0.032 | 0.083 | 0.049 | 0.142 | 1.000 | 0.060
학교소재지 | 0.068 | 0.064 | 0.100 | 0.138 | 0.159 | 0.792 | 0.130 | 0.060 | 1.000
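
The report does not state which association measure underlies each matrix. For pairs of categorical variables such as 응시지역 and 학교소재지, Cramér's V is one common choice. The sketch below is a plain, uncorrected version run on made-up counts; it illustrates the idea and is not the computation behind the values above. The function name and the example tables are hypothetical.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts, not taken from the dataset.
perfect = [[10, 0], [0, 10]]    # each row category maps to one column
independent = [[5, 5], [5, 5]]  # rows carry no information about columns
print(cramers_v(perfect), cramers_v(independent))
```

A value near 1 means one variable almost determines the other; a value near 0 means the two are close to independent.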

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

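Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
82827 | 2007 | 위생사 | 29 | 82826 |  | 20 | 대구광역시 | 졸업예정 | 합격 | 경상북도
32988 | 2003 | 위생사 | 25 | 32987 |  | 20 | 서울특별시 | 졸업 | 결시 | 서울특별시
16089 | 2002 | 위생사 | 23 | 14408 |  | 20 | 서울특별시 | 졸업예정 | 결시 | 충청북도
63634 | 2006 | 위생사 | 28 | 63633 |  | 40 | 서울특별시 | 졸업 | 합격 | 대전광역시
8794 | 2001 | 위생사 | 22 | 8795 |  | 20 | 부산광역시 | 졸업 | 결시 | 경상남도
50102 | 2004 | 위생사 | 26 | 50101 |  | 20 | 대전광역시 | 졸업예정 | 결시 | 대전광역시
8852 | 2001 | 위생사 | 22 | 8853 |  | 20 | 부산광역시 | 졸업예정 | 합격 | 경상남도
86194 | 2007 | 위생사 | 29 | 86193 |  | 20 | 전주 | 졸업예정 | 결시 | 전라북도
37933 | 2003 | 위생사 | 25 | 37932 |  | 20 | 광주광역시 | 졸업예정 | 결시 | 광주광역시
25866 | 2002 | 위생사 | 23 | 19434 |  | 20 | 대전광역시 | 졸업예정 | 결시 | 대전광역시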
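
Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
75195 | 2006 | 위생사 | 28 | 75194 |  | 20 | 원주 | 졸업예정 | 불합격 | 경상북도
1666 | 2000 | 위생사 | 21 | 1667 |  | 20 | 서울특별시 | 졸업 | 불합격 | 서울특별시
90086 | 2008 | 위생사 | 30 | 90085 |  | 20 | 서울특별시 | 졸업예정 | 불합격 | 서울특별시
17525 | 2002 | 위생사 | 23 | 15126 |  | 20 | 서울특별시 | 졸업예정 | 불합격 | 경기도
11285 | 2001 | 위생사 | 22 | 11286 |  | 20 | 대전광역시 | 졸업예정 | 합격 | 전라북도
81241 | 2007 | 위생사 | 29 | 81240 |  | 20 | 부산광역시 | 졸업예정 | 합격 | 부산광역시
37286 | 2003 | 위생사 | 25 | 37285 |  | 20 | 광주광역시 | 졸업예정 | 합격 | 전라남도
35325 | 2003 | 위생사 | 25 | 35324 |  | 20 | 부산광역시 | 졸업예정 | 불합격 | 부산광역시
2398 | 2000 | 위생사 | 21 | 2399 |  | 20 | 부산광역시 | 졸업 | 불합격 | 부산광역시
80215 | 2007 | 위생사 | 29 | 80214 |  | 20 | 서울특별시 | 졸업 | 불합격 | 서울특별시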