Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
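
The average record size follows from the total size and the row count. A minimal arithmetic check, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics in this report.
total_size_kib = 898.4
n_observations = 10_000

# Convert KiB to bytes, then divide by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```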

Variable types

Numeric: 3
Categorical: 7

Dataset

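Description: Provides information for analyzing the status of candidates for the national nurse licensing examination (year, occupation, exam round, sex, age group, exam region, graduation status, pass/fail result) in a form that cannot identify individuals.
URL: https://www.data.go.kr/data/15060459/fileData.do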

Alerts

직종 has constant value "간호사" (Constant)
학교소재지 is highly overall correlated with 응시지역 (High correlation)
응시지역 is highly overall correlated with 학교소재지 (High correlation)
연도 is highly overall correlated with 회차 and 1 other field (High correlation)
회차 is highly overall correlated with 연도 and 1 other field (High correlation)
일련번호 is highly overall correlated with 연도 and 1 other field (High correlation)
성별 is highly imbalanced (87.6%) (Imbalance)
연령대 is highly imbalanced (90.0%) (Imbalance)
졸업여부 is highly imbalanced (69.6%) (Imbalance)
합격여부 is highly imbalanced (71.0%) (Imbalance)
일련번호 has unique values (Unique)
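
The imbalance percentages in the alerts are consistent with a normalized-entropy score: one minus the Shannon entropy of the class frequencies divided by the maximum entropy for that number of classes. The sketch below is inferred from the numbers in this report rather than taken from the tool's documentation, so treat the formula as an assumption. The function name `imbalance_score` is hypothetical, and the counts are the value counts listed for each variable in this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no distribution to compare
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts copied from the per-variable sections of this report.
value_counts = {
    "성별": [9831, 169],
    "연령대": [9711, 268, 20, 1],
    "졸업여부": [9033, 929, 38],
    "합격여부": [8869, 978, 141, 12],
}

# Express each score as a percentage rounded to one decimal place.
scores = {name: round(100 * imbalance_score(c), 1) for name, c in value_counts.items()}
print(scores)
```

A score near 100% means almost all rows fall into one class; a score of 0% means the classes are equally frequent.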

Reproduction

Analysis started: 2023-12-12 07:15:41.353650
Analysis finished: 2023-12-12 07:15:43.780332
Duration: 2.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
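
The reported duration can be checked from the two timestamps. A small sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the reproduction details in this report.
started = datetime.fromisoformat("2023-12-12 07:15:41.353650")
finished = datetime.fromisoformat("2023-12-12 07:15:43.780332")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```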

Variables

연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2003.3101
Minimum: 2000
Maximum: 2007
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2000
Q1: 2001
Median: 2002
Q3: 2005
95-th percentile: 2007
Maximum: 2007
Range: 7
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.2957493
Coefficient of variation (CV): 0.001145978
Kurtosis: -1.3300487
Mean: 2003.3101
Median Absolute Deviation (MAD): 2
Skewness: 0.13373445
Sum: 20033101
Variance: 5.270465
Monotonicity: Not monotonic
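
The sum, mean, variance, standard deviation, and coefficient of variation can be recomputed from the value counts listed for 연도. This is a sketch under one assumption: the variance uses the sample (n - 1) denominator, which is the choice that matches the reported figure. Variable names are illustrative.

```python
import math

# Value counts for 연도 as listed in this report (2003 does not occur).
year_counts = {2000: 1269, 2001: 1258, 2002: 2530, 2004: 1262,
               2005: 1372, 2006: 1288, 2007: 1021}

n = sum(year_counts.values())                        # number of observations
total = sum(v * c for v, c in year_counts.items())   # "Sum" in the report
mean = total / n

# Sample variance (assumption: n - 1 in the denominator).
variance = sum(c * (v - mean) ** 2 for v, c in year_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(total, mean, variance, std, cv)
```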
Histogram with fixed size bins (bins=7)

| Value | Count | Frequency (%) |
|---|---|---|
| 2002 | 2530 | 25.3% |
| 2005 | 1372 | 13.7% |
| 2006 | 1288 | 12.9% |
| 2000 | 1269 | 12.7% |
| 2004 | 1262 | 12.6% |
| 2001 | 1258 | 12.6% |
| 2007 | 1021 | 10.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2000 | 1269 | 12.7% |
| 2001 | 1258 | 12.6% |
| 2002 | 2530 | 25.3% |
| 2004 | 1262 | 12.6% |
| 2005 | 1372 | 13.7% |
| 2006 | 1288 | 12.9% |
| 2007 | 1021 | 10.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 2007 | 1021 | 10.2% |
| 2006 | 1288 | 12.9% |
| 2005 | 1372 | 13.7% |
| 2004 | 1262 | 12.6% |
| 2002 | 2530 | 25.3% |
| 2001 | 1258 | 12.6% |
| 2000 | 1269 | 12.7% |

직종
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
간호사: 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 간호사
2nd row: 간호사
3rd row: 간호사
4th row: 간호사
5th row: 간호사

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 간호사 | 10000 | 100.0% |

Length

Histogram of lengths of the category

회차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.4378
Minimum: 40
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 40
5-th percentile: 40
Q1: 41
Median: 43
Q3: 45
95-th percentile: 47
Maximum: 47
Range: 7
Interquartile range (IQR): 4
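
The quartiles, IQR, and MAD for 회차 can be recovered from its value counts. The sketch below assumes linear interpolation between order statistics (what Python's `statistics.quantiles` calls `method="inclusive"`); the report does not state which method it uses, although this choice matches the reported figures. Variable names are illustrative.

```python
import statistics

# Value counts for 회차 as listed in this report.
round_counts = {40: 1269, 41: 1258, 42: 1253, 43: 1277,
                44: 1262, 45: 1372, 46: 1288, 47: 1021}

# Expand the counts into a sorted list of 10,000 observations.
values = [v for v, c in sorted(round_counts.items()) for _ in range(c)]

# Quartiles with linear interpolation between neighbouring order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(v - median) for v in values)

print(q1, median, q3, iqr, mad)
```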

Descriptive statistics

Standard deviation: 2.2466054
Coefficient of variation (CV): 0.051720055
Kurtosis: -1.2124304
Mean: 43.4378
Median Absolute Deviation (MAD): 2
Skewness: -0.0030425279
Sum: 434378
Variance: 5.0472359
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

| Value | Count | Frequency (%) |
|---|---|---|
| 45 | 1372 | 13.7% |
| 46 | 1288 | 12.9% |
| 43 | 1277 | 12.8% |
| 40 | 1269 | 12.7% |
| 44 | 1262 | 12.6% |
| 41 | 1258 | 12.6% |
| 42 | 1253 | 12.5% |
| 47 | 1021 | 10.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 40 | 1269 | 12.7% |
| 41 | 1258 | 12.6% |
| 42 | 1253 | 12.5% |
| 43 | 1277 | 12.8% |
| 44 | 1262 | 12.6% |
| 45 | 1372 | 13.7% |
| 46 | 1288 | 12.9% |
| 47 | 1021 | 10.2% |

| Value | Count | Frequency (%) |
|---|---|---|
| 47 | 1021 | 10.2% |
| 46 | 1288 | 12.9% |
| 45 | 1372 | 13.7% |
| 44 | 1262 | 12.6% |
| 43 | 1277 | 12.8% |
| 42 | 1253 | 12.5% |
| 41 | 1258 | 12.6% |
| 40 | 1269 | 12.7% |

일련번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47589.881
Minimum: 4
Maximum: 95282
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 4
5-th percentile: 4574.7
Q1: 23563.5
Median: 47921.5
Q3: 71327.5
95-th percentile: 90535.2
Maximum: 95282
Range: 95278
Interquartile range (IQR): 47764

Descriptive statistics

Standard deviation: 27521.211
Coefficient of variation (CV): 0.57829964
Kurtosis: -1.197012
Mean: 47589.881
Median Absolute Deviation (MAD): 23832.5
Skewness: -0.011114193
Sum: 4.7589881 × 10^8
Variance: 7.5741705 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

| Value | Count | Frequency (%) |
|---|---|---|
| 55015 | 1 | < 0.1% |
| 62040 | 1 | < 0.1% |
| 53472 | 1 | < 0.1% |
| 28207 | 1 | < 0.1% |
| 37130 | 1 | < 0.1% |
| 71508 | 1 | < 0.1% |
| 22194 | 1 | < 0.1% |
| 873 | 1 | < 0.1% |
| 11124 | 1 | < 0.1% |
| 65257 | 1 | < 0.1% |
| Other values (9990) | 9990 | 99.9% |

| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 22 | 1 | < 0.1% |
| 27 | 1 | < 0.1% |
| 34 | 1 | < 0.1% |
| 47 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |

| Value | Count | Frequency (%) |
|---|---|---|
| 95282 | 1 | < 0.1% |
| 95280 | 1 | < 0.1% |
| 95277 | 1 | < 0.1% |
| 95273 | 1 | < 0.1% |
| 95269 | 1 | < 0.1% |
| 95266 | 1 | < 0.1% |
| 95263 | 1 | < 0.1% |
| 95250 | 1 | < 0.1% |
| 95249 | 1 | < 0.1% |
| 95248 | 1 | < 0.1% |

성별
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
9831
169

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
|  | 9831 | 98.3% |
|  | 169 | 1.7% |

Length

Histogram of lengths of the category

연령대
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20: 9711
30: 268
40: 20
50: 1

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: 20
5th row: 20

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 20 | 9711 | 97.1% |
| 30 | 268 | 2.7% |
| 40 | 20 | 0.2% |
| 50 | 1 | < 0.1% |

Length

Histogram of lengths of the category

응시지역
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 3929
대구광역시: 1840
광주광역시: 1552
부산광역시: 1340
대전광역시: 889
Other values (3): 450

Length

Max length: 5
Median length: 5
Mean length: 4.8757
Min length: 2
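
The length statistics are a count-weighted summary of the string lengths of the category values. A minimal sketch, assuming length is measured in Unicode code points (which is what Python's `len` returns for these precomposed Hangul strings); the counts are those listed for 응시지역 in this report, and the variable names are illustrative:

```python
# Value counts for 응시지역 as listed in this report.
region_counts = {
    "서울특별시": 3929, "대구광역시": 1840, "광주광역시": 1552,
    "부산광역시": 1340, "대전광역시": 889,
    "전주": 219, "강릉": 124, "제주도": 107,
}

n = sum(region_counts.values())

# Mean length weights each value's length by how often it occurs.
mean_length = sum(len(value) * count for value, count in region_counts.items()) / n
max_length = max(len(value) for value in region_counts)
min_length = min(len(value) for value in region_counts)

# The three least frequent values are what the overview groups as "Other values (3)".
other_values = sum(sorted(region_counts.values())[:3])

print(mean_length, max_length, min_length, other_values)
```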

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대구광역시
2nd row: 광주광역시
3rd row: 광주광역시
4th row: 서울특별시
5th row: 서울특별시

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 서울특별시 | 3929 | 39.3% |
| 대구광역시 | 1840 | 18.4% |
| 광주광역시 | 1552 | 15.5% |
| 부산광역시 | 1340 | 13.4% |
| 대전광역시 | 889 | 8.9% |
| 전주 | 219 | 2.2% |
| 강릉 | 124 | 1.2% |
| 제주도 | 107 | 1.1% |

Length

Histogram of lengths of the category

졸업여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
졸업(수습)예정: 9033
졸업(수습): 929
<NA>: 38

Length

Max length: 8
Median length: 8
Mean length: 7.799
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업(수습)
2nd row: 졸업(수습)예정
3rd row: 졸업(수습)예정
4th row: 졸업(수습)예정
5th row: 졸업(수습)예정

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 졸업(수습)예정 | 9033 | 90.3% |
| 졸업(수습) | 929 | 9.3% |
| <NA> | 38 | 0.4% |

Length

Histogram of lengths of the category

합격여부
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
합격: 8869
불합격: 978
결시: 141
응시결격: 12

Length

Max length: 4
Median length: 2
Mean length: 2.1002
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 합격

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 합격 | 8869 | 88.7% |
| 불합격 | 978 | 9.8% |
| 결시 | 141 | 1.4% |
| 응시결격 | 12 | 0.1% |

Length

Histogram of lengths of the category

학교소재지
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 3929
대구광역시: 1840
광주광역시: 1552
부산광역시: 1340
대전광역시: 889
Other values (3): 450

Length

Max length: 5
Median length: 5
Mean length: 4.8757
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대구광역시
2nd row: 광주광역시
3rd row: 광주광역시
4th row: 서울특별시
5th row: 서울특별시

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 서울특별시 | 3929 | 39.3% |
| 대구광역시 | 1840 | 18.4% |
| 광주광역시 | 1552 | 15.5% |
| 부산광역시 | 1340 | 13.4% |
| 대전광역시 | 889 | 8.9% |
| 전주 | 219 | 2.2% |
| 강릉 | 124 | 1.2% |
| 제주도 | 107 | 1.1% |

Length

Histogram of lengths of the category

Correlations

| | 연도 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지 |
|---|---|---|---|---|---|---|---|---|---|
| 연도 | 1.000 | 1.000 | 0.978 | 0.098 | 0.062 | 0.246 | 0.044 | 0.061 | 0.246 |
| 회차 | 1.000 | 1.000 | 0.935 | 0.106 | 0.063 | 0.350 | 0.068 | 0.088 | 0.350 |
| 일련번호 | 0.978 | 0.935 | 1.000 | 0.107 | 0.047 | 0.485 | 0.037 | 0.080 | 0.485 |
| 성별 | 0.098 | 0.106 | 0.107 | 1.000 | 0.167 | 0.121 | 0.024 | 0.054 | 0.121 |
| 연령대 | 0.062 | 0.063 | 0.047 | 0.167 | 1.000 | 0.028 | 0.235 | 0.214 | 0.028 |
| 응시지역 | 0.246 | 0.350 | 0.485 | 0.121 | 0.028 | 1.000 | 0.127 | 0.164 | 1.000 |
| 졸업여부 | 0.044 | 0.068 | 0.037 | 0.024 | 0.235 | 0.127 | 1.000 | 0.691 | 0.127 |
| 합격여부 | 0.061 | 0.088 | 0.080 | 0.054 | 0.214 | 0.164 | 0.691 | 1.000 | 0.164 |
| 학교소재지 | 0.246 | 0.350 | 0.485 | 0.121 | 0.028 | 1.000 | 0.127 | 0.164 | 1.000 |

| | 졸업여부 | 합격여부 | 학교소재지 | 연령대 | 응시지역 | 성별 |
|---|---|---|---|---|---|---|
| 졸업여부 | 1.000 | 0.489 | 0.095 | 0.156 | 0.095 | 0.015 |
| 합격여부 | 0.489 | 1.000 | 0.074 | 0.086 | 0.074 | 0.036 |
| 학교소재지 | 0.095 | 0.074 | 1.000 | 0.013 | 1.000 | 0.091 |
| 연령대 | 0.156 | 0.086 | 0.013 | 1.000 | 0.013 | 0.111 |
| 응시지역 | 0.095 | 0.074 | 1.000 | 0.013 | 1.000 | 0.091 |
| 성별 | 0.015 | 0.036 | 0.091 | 0.111 | 0.091 | 1.000 |

| | 연도 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지 |
|---|---|---|---|---|---|---|---|---|---|
| 연도 | 1.000 | 0.994 | 0.986 | 0.080 | 0.030 | 0.130 | 0.052 | 0.040 | 0.130 |
| 회차 | 0.994 | 1.000 | 0.981 | 0.080 | 0.028 | 0.123 | 0.051 | 0.040 | 0.123 |
| 일련번호 | 0.986 | 0.981 | 1.000 | 0.082 | 0.028 | 0.256 | 0.029 | 0.048 | 0.256 |
| 성별 | 0.080 | 0.080 | 0.082 | 1.000 | 0.111 | 0.091 | 0.015 | 0.036 | 0.091 |
| 연령대 | 0.030 | 0.028 | 0.028 | 0.111 | 1.000 | 0.013 | 0.156 | 0.086 | 0.013 |
| 응시지역 | 0.130 | 0.123 | 0.256 | 0.091 | 0.013 | 1.000 | 0.095 | 0.074 | 1.000 |
| 졸업여부 | 0.052 | 0.051 | 0.029 | 0.015 | 0.156 | 0.095 | 1.000 | 0.489 | 0.095 |
| 합격여부 | 0.040 | 0.040 | 0.048 | 0.036 | 0.086 | 0.074 | 0.489 | 1.000 | 0.074 |
| 학교소재지 | 0.130 | 0.123 | 0.256 | 0.091 | 0.013 | 1.000 | 0.095 | 0.074 | 1.000 |

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

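| | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지 |
|---|---|---|---|---|---|---|---|---|---|---|
| 55014 | 2004 | 간호사 | 44 | 55015 |  | 20 | 대구광역시 | 졸업(수습) | 합격 | 대구광역시 |
| 57810 | 2004 | 간호사 | 44 | 57811 |  | 20 | 광주광역시 | 졸업(수습)예정 | 합격 | 광주광역시 |
| 81451 | 2006 | 간호사 | 46 | 81452 |  | 20 | 광주광역시 | 졸업(수습)예정 | 합격 | 광주광역시 |
| 75899 | 2006 | 간호사 | 46 | 75900 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 28714 | 2002 | 간호사 | 43 | 28715 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 50131 | 2004 | 간호사 | 44 | 50132 |  | 20 | 서울특별시 | 졸업(수습) | 결시 | 서울특별시 |
| 30960 | 2002 | 간호사 | 42 | 30961 |  | 20 | 서울특별시 | 졸업(수습) | 합격 | 서울특별시 |
| 56149 | 2004 | 간호사 | 44 | 56150 |  | 20 | 대구광역시 | 졸업(수습)예정 | 합격 | 대구광역시 |
| 30085 | 2002 | 간호사 | 43 | 30086 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 25636 | 2002 | 간호사 | 43 | 25637 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |

| | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지 |
|---|---|---|---|---|---|---|---|---|---|---|
| 73798 | 2006 | 간호사 | 46 | 73799 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 47901 | 2002 | 간호사 | 42 | 47902 |  | 20 | 광주광역시 | 졸업(수습)예정 | 합격 | 광주광역시 |
| 63003 | 2005 | 간호사 | 45 | 63004 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 77112 | 2006 | 간호사 | 46 | 77113 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 15710 | 2001 | 간호사 | 41 | 15711 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 89263 | 2007 | 간호사 | 47 | 89264 |  | 20 | 서울특별시 | 졸업(수습)예정 | 합격 | 서울특별시 |
| 3517 | 2000 | 간호사 | 40 | 3518 |  | 20 | 서울특별시 | 졸업(수습) | 불합격 | 서울특별시 |
| 40768 | 2002 | 간호사 | 43 | 40769 |  | 20 | 대구광역시 | 졸업(수습)예정 | 합격 | 대구광역시 |
| 68204 | 2005 | 간호사 | 45 | 68205 |  | 20 | 대구광역시 | 졸업(수습)예정 | 합격 | 대구광역시 |
| 93957 | 2007 | 간호사 | 47 | 93958 |  | 20 | 대구광역시 | 졸업(수습)예정 | 합격 | 대구광역시 |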