Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 918.0 KiB
Average record size in memory: 94.0 B
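
The average record size follows from the two memory figures above. A minimal check, assuming that 1 KiB is 1024 bytes and that the average is the total size divided by the number of observations:

```python
# Figures copied from the dataset statistics above.
total_kib = 918.0
n_rows = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B")
```

Under these assumptions the result rounds to the reported 94.0 B.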

Variable types

Categorical: 7
Numeric: 3

Dataset

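Description: Provides information for analyzing the results of candidates who sat the national medical licensing examination: year (연도), occupation (직종), exam round (회차), serial number (일련번호), subject name (과목명), score per subject (과목별점수), total score (총점), pass/fail result (합격여부), sex (성별), and age group (연령대).
URL: https://www.data.go.kr/data/15060446/fileData.do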

Alerts

직종 has constant value "의사" (Constant)
회차 is highly overall correlated with 일련번호 and 2 other fields (High correlation)
연도 is highly overall correlated with 일련번호 and 2 other fields (High correlation)
일련번호 is highly overall correlated with 연도 and 1 other field (High correlation)
과목별점수 is highly overall correlated with 과목명 (High correlation)
총점 is highly overall correlated with 합격여부 (High correlation)
과목명 is highly overall correlated with 과목별점수 and 2 other fields (High correlation)
합격여부 is highly overall correlated with 총점 (High correlation)
합격여부 is highly imbalanced (69.5%) (Imbalance)
연령대 is highly imbalanced (68.2%) (Imbalance)
과목별점수 has 136 (1.4%) zeros (Zeros)
총점 has 131 (1.3%) zeros (Zeros)
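
The two imbalance percentages are consistent with one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition (it is inferred from the numbers, not stated in this report) and uses the category counts listed under 합격여부 and 연령대. The function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    if len(shares) < 2:
        return 0.0  # a single class has no distribution to be imbalanced
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1.0 - entropy / math.log2(len(shares))

pass_fail = [8761, 1108, 104, 27]     # 합격, 불합격, 결시, 응시결격
age_group = [8279, 1587, 120, 12, 2]  # 20, 30, 40, 50, 60

print(f"{imbalance_score(pass_fail):.1%}")
print(f"{imbalance_score(age_group):.1%}")
```

With these counts the scores come out at 69.5% and 68.2%, matching the alerts. A score of 0 means perfectly balanced classes; values near 1 mean one class dominates.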

Reproduction

Analysis started: 2023-12-12 17:09:53.131164
Analysis finished: 2023-12-12 17:09:55.748675
Duration: 2.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
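
The reported duration can be checked from the two timestamps. A small sketch, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 17:09:53.131164", FMT)
finished = datetime.strptime("2023-12-12 17:09:55.748675", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```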

Variables

연도 (year)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2003: 3830
2002: 3696
2001: 2474

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2003
2nd row: 2003
3rd row: 2003
4th row: 2001
5th row: 2001

Common Values

Value | Count | Frequency (%)
2003 | 3830 | 38.3%
2002 | 3696 | 37.0%
2001 | 2474 | 24.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

직종 (occupation)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
의사 (physician): 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 의사
2nd row: 의사
3rd row: 의사
4th row: 의사
5th row: 의사

Common Values

Value | Count | Frequency (%)
의사 | 10000 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

회차 (exam round)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
67: 3830
66: 3696
65: 2474

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 67
2nd row: 67
3rd row: 67
4th row: 65
5th row: 65

Common Values

Value | Count | Frequency (%)
67 | 3830 | 38.3%
66 | 3696 | 37.0%
65 | 2474 | 24.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

일련번호 (serial number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 6512
Distinct (%): 65.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5566.0937
Minimum: 1
Maximum: 10192
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 655.95
Q1: 3321.5
median: 5718.5
Q3: 8089.5
95-th percentile: 9774.15
Maximum: 10192
Range: 10191
Interquartile range (IQR): 4768

Descriptive statistics

Standard deviation: 2884.4749
Coefficient of variation (CV): 0.51822248
Kurtosis: -1.1068471
Mean: 5566.0937
Median Absolute Deviation (MAD): 2383.5
Skewness: -0.1927618
Sum: 55660937
Variance: 8320195.3
Monotonicity: Not monotonic
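
Several of these statistics are derived from others in the same list. The sketch below recomputes them from the reported figures for 일련번호, using the standard definitions (range = maximum minus minimum, IQR = Q3 minus Q1, mean = sum over count, CV = standard deviation over mean). The variable names are illustrative.

```python
# Figures copied from the 일련번호 statistics above.
n = 10_000
minimum, maximum = 1, 10192
q1, q3 = 3321.5, 8089.5
total = 55_660_937
std = 2884.4749

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation

print(value_range, iqr, mean, round(cv, 4))
```

The results agree with the reported Range (10191), IQR (4768), Mean (5566.0937) and CV (about 0.5182). The same relations hold for 과목별점수 and 총점.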
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
7088 | 6 | 0.1%
8496 | 6 | 0.1%
10046 | 5 | 0.1%
7052 | 5 | 0.1%
3575 | 5 | 0.1%
9299 | 5 | 0.1%
9980 | 5 | 0.1%
8854 | 5 | 0.1%
6960 | 5 | 0.1%
9086 | 5 | 0.1%
Other values (6502) | 9948 | 99.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
3 | 2 | < 0.1%
7 | 2 | < 0.1%
8 | 1 | < 0.1%
10 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
15 | 1 | < 0.1%
16 | 1 | < 0.1%
17 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
10192 | 2 | < 0.1%
10191 | 3 | < 0.1%
10190 | 4 | < 0.1%
10188 | 1 | < 0.1%
10187 | 1 | < 0.1%
10186 | 2 | < 0.1%
10185 | 2 | < 0.1%
10181 | 1 | < 0.1%
10180 | 3 | < 0.1%
10179 | 1 | < 0.1%

과목명 (subject name)
Categorical

HIGH CORRELATION

Distinct: 20
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
보건의약관계 법규: 1142
의학각론2: 739
의학각론4: 724
의학각론3: 712
의학각론5: 704
Other values (15): 5979

Length

Max length: 9
Median length: 5
Mean length: 5.6481
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 의학각론1(R형)
2nd row: 의학총론2(R형)
3rd row: 의학총론2(R형)
4th row: 예방의학
5th row: 산부인과

Common Values

Value | Count | Frequency (%)
보건의약관계 법규 | 1142 | 11.4%
의학각론2 | 739 | 7.4%
의학각론4 | 724 | 7.2%
의학각론3 | 712 | 7.1%
의학각론5 | 704 | 7.0%
의학총론2 | 699 | 7.0%
의학총론1 | 699 | 7.0%
의학각론2(R형) | 375 | 3.8%
의학각론1 | 372 | 3.7%
의학각론1(R형) | 362 | 3.6%
Other values (10) | 3472 | 34.7%
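
The "Other values" row is the remainder after the listed categories, and each frequency is a count divided by the number of observations. A short check using the counts from the table above (variable names are illustrative):

```python
# Counts of the ten most common 과목명 values, copied from the table above.
top_counts = [1142, 739, 724, 712, 704, 699, 699, 375, 372, 362]
n_rows = 10_000  # number of observations in the dataset

# Frequency (%) of the most common value: its count over all observations.
top_share = top_counts[0] / n_rows

# "Other values" collects every row not covered by the listed categories.
other_count = n_rows - sum(top_counts)
other_share = other_count / n_rows

print(f"{top_share:.1%}", other_count, f"{other_share:.1%}")
```

This reproduces the 11.4% share of 보건의약관계 법규 and the 3472 rows (34.7%) spread over the ten remaining subjects.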

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
보건의약관계 | 1142 | 10.2%
법규 | 1142 | 10.2%
의학각론2 | 739 | 6.6%
의학각론4 | 724 | 6.5%
의학각론3 | 712 | 6.4%
의학각론5 | 704 | 6.3%
의학총론2 | 699 | 6.3%
의학총론1 | 699 | 6.3%
의학각론2(r형 | 375 | 3.4%
의학각론1 | 372 | 3.3%
Other values (11) | 3834 | 34.4%

과목별점수 (score per subject)
Real number (ℝ)

HIGH CORRELATION  ZEROS

Distinct: 139
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.1104
Minimum: 0
Maximum: 145
Zeros: 136
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 16
median: 34
Q3: 46
95-th percentile: 57
Maximum: 145
Range: 145
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 21.750587
Coefficient of variation (CV): 0.65691102
Kurtosis: 3.2906015
Mean: 33.1104
Median Absolute Deviation (MAD): 13
Skewness: 1.1590389
Sum: 331104
Variance: 473.08802
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
46.0 | 281 | 2.8%
45.0 | 280 | 2.8%
44.0 | 279 | 2.8%
43.0 | 262 | 2.6%
42.0 | 261 | 2.6%
8.0 | 258 | 2.6%
48.0 | 258 | 2.6%
47.0 | 256 | 2.6%
41.0 | 243 | 2.4%
7.0 | 242 | 2.4%
Other values (129) | 7380 | 73.8%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 136 | 1.4%
1.0 | 23 | 0.2%
2.0 | 133 | 1.3%
3.0 | 165 | 1.7%
4.0 | 116 | 1.2%
4.5 | 10 | 0.1%
5.0 | 127 | 1.3%
5.5 | 14 | 0.1%
6.0 | 180 | 1.8%
6.5 | 83 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
145.0 | 1 | < 0.1%
138.0 | 1 | < 0.1%
137.0 | 1 | < 0.1%
135.0 | 1 | < 0.1%
134.0 | 2 | < 0.1%
133.0 | 3 | < 0.1%
132.0 | 1 | < 0.1%
131.0 | 2 | < 0.1%
130.0 | 2 | < 0.1%
129.0 | 3 | < 0.1%

총점 (total score)
Real number (ℝ)

HIGH CORRELATION  ZEROS

Distinct: 449
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 314.94945
Minimum: 0
Maximum: 430.5
Zeros: 131
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 246.5
Q1: 294
median: 321
Q3: 344.5
95-th percentile: 377.5
Maximum: 430.5
Range: 430.5
Interquartile range (IQR): 50.5

Descriptive statistics

Standard deviation: 52.955226
Coefficient of variation (CV): 0.16813881
Kurtosis: 14.665391
Mean: 314.94945
Median Absolute Deviation (MAD): 25.5
Skewness: -2.8533447
Sum: 3149494.5
Variance: 2804.256
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0.0 | 131 | 1.3%
316.0 | 76 | 0.8%
322.5 | 73 | 0.7%
335.0 | 73 | 0.7%
319.5 | 72 | 0.7%
337.0 | 72 | 0.7%
339.5 | 72 | 0.7%
333.5 | 71 | 0.7%
313.5 | 70 | 0.7%
330.5 | 69 | 0.7%
Other values (439) | 9221 | 92.2%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 131 | 1.3%
19.0 | 1 | < 0.1%
23.0 | 1 | < 0.1%
55.0 | 1 | < 0.1%
84.5 | 2 | < 0.1%
122.5 | 2 | < 0.1%
138.0 | 1 | < 0.1%
140.0 | 2 | < 0.1%
157.5 | 1 | < 0.1%
162.0 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
430.5 | 1 | < 0.1%
426.0 | 1 | < 0.1%
424.0 | 1 | < 0.1%
421.5 | 2 | < 0.1%
421.0 | 3 | < 0.1%
420.0 | 1 | < 0.1%
419.5 | 3 | < 0.1%
418.5 | 1 | < 0.1%
417.5 | 2 | < 0.1%
415.5 | 3 | < 0.1%

합격여부 (pass/fail result)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
합격 (pass): 8761
불합격 (fail): 1108
결시 (absent): 104
응시결격 (ineligible): 27

Length

Max length: 4
Median length: 2
Mean length: 2.1162
Min length: 2
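
The length statistics describe the character length of each category label, weighted by how often the label occurs. A minimal sketch that recomputes them from the four category counts, assuming length is counted in Unicode characters:

```python
# Category counts copied from the 합격여부 summary above.
counts = {"합격": 8761, "불합격": 1108, "결시": 104, "응시결격": 27}

n_rows = sum(counts.values())
# Mean length is the count-weighted average of each label's character length.
mean_length = sum(len(label) * count for label, count in counts.items()) / n_rows
max_length = max(len(label) for label in counts)
min_length = min(len(label) for label in counts)

print(n_rows, mean_length, max_length, min_length)
```

The weighted mean comes out at 2.1162, with a maximum of 4 (응시결격) and a minimum of 2, as reported.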

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 합격

Common Values

Value | Count | Frequency (%)
합격 | 8761 | 87.6%
불합격 | 1108 | 11.1%
결시 | 104 | 1.0%
응시결격 | 27 | 0.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

성별 (sex)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
7295
2705

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 7295 | 73.0%
 | 2705 | 27.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

연령대 (age group)
Categorical

IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20: 8279
30: 1587
40: 120
50: 12
60: 2

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value | Count | Frequency (%)
20 | 8279 | 82.8%
30 | 1587 | 15.9%
40 | 120 | 1.2%
50 | 12 | 0.1%
60 | 2 | < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; counts as in the Common Values table above]

Interactions


Correlations

Variable | 연도 | 회차 | 일련번호 | 과목명 | 과목별점수 | 총점 | 합격여부 | 성별 | 연령대
연도 | 1.000 | 1.000 | 0.952 | 0.899 | 0.373 | 0.446 | 0.090 | 0.000 | 0.013
회차 | 1.000 | 1.000 | 0.952 | 0.899 | 0.373 | 0.446 | 0.090 | 0.000 | 0.013
일련번호 | 0.952 | 0.952 | 1.000 | 0.759 | 0.346 | 0.443 | 0.158 | 0.074 | 0.270
과목명 | 0.899 | 0.899 | 0.759 | 1.000 | 0.905 | 0.323 | 0.097 | 0.039 | 0.000
과목별점수 | 0.373 | 0.373 | 0.346 | 0.905 | 1.000 | 0.439 | 0.302 | 0.118 | 0.171
총점 | 0.446 | 0.446 | 0.443 | 0.323 | 0.439 | 1.000 | 0.870 | 0.294 | 0.495
합격여부 | 0.090 | 0.090 | 0.158 | 0.097 | 0.302 | 0.870 | 1.000 | 0.196 | 0.275
성별 | 0.000 | 0.000 | 0.074 | 0.039 | 0.118 | 0.294 | 0.196 | 1.000 | 0.093
연령대 | 0.013 | 0.013 | 0.270 | 0.000 | 0.171 | 0.495 | 0.275 | 0.093 | 1.000

Variable | 성별 | 과목명 | 연령대 | 회차 | 합격여부 | 연도
성별 | 1.000 | 0.031 | 0.114 | 0.000 | 0.130 | 0.000
과목명 | 0.031 | 1.000 | 0.000 | 0.772 | 0.046 | 0.772
연령대 | 0.114 | 0.000 | 1.000 | 0.009 | 0.228 | 0.009
회차 | 0.000 | 0.772 | 0.009 | 1.000 | 0.085 | 1.000
합격여부 | 0.130 | 0.046 | 0.228 | 0.085 | 1.000 | 0.085
연도 | 0.000 | 0.772 | 0.009 | 1.000 | 0.085 | 1.000

Variable | 일련번호 | 과목별점수 | 총점 | 연도 | 회차 | 과목명 | 합격여부 | 성별 | 연령대
일련번호 | 1.000 | -0.102 | 0.368 | 0.946 | 0.946 | 0.346 | 0.095 | 0.057 | 0.115
과목별점수 | -0.102 | 1.000 | 0.175 | 0.240 | 0.240 | 0.541 | 0.188 | 0.086 | 0.071
총점 | 0.368 | 0.175 | 1.000 | 0.300 | 0.300 | 0.107 | 0.730 | 0.225 | 0.227
연도 | 0.946 | 0.240 | 0.300 | 1.000 | 1.000 | 0.772 | 0.085 | 0.000 | 0.009
회차 | 0.946 | 0.240 | 0.300 | 1.000 | 1.000 | 0.772 | 0.085 | 0.000 | 0.009
과목명 | 0.346 | 0.541 | 0.107 | 0.772 | 0.772 | 1.000 | 0.046 | 0.031 | 0.000
합격여부 | 0.095 | 0.188 | 0.730 | 0.085 | 0.085 | 0.046 | 1.000 | 0.130 | 0.228
성별 | 0.057 | 0.086 | 0.225 | 0.000 | 0.000 | 0.031 | 0.130 | 1.000 | 0.114
연령대 | 0.115 | 0.071 | 0.227 | 0.009 | 0.009 | 0.000 | 0.228 | 0.114 | 1.000
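
The 1.000 entries between 연도 and 회차 point to a one-to-one mapping between the two columns: their category counts are identical (3830, 3696, 2474), and the sample rows pair 2003 with 67, 2002 with 66, and 2001 with 65. As an illustration, the sketch below computes Cramér's V, a common association measure for two categorical variables, on the diagonal contingency table implied by that mapping. Whether the report uses exactly this statistic (or a bias-corrected variant) is an assumption, and `cramers_v` is an illustrative helper, not a library function.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Rows: 연도 2003, 2002, 2001; columns: 회차 67, 66, 65 (assumed diagonal).
year_by_round = [
    [3830, 0, 0],
    [0, 3696, 0],
    [0, 0, 2474],
]
print(round(cramers_v(year_by_round), 3))
```

A perfectly diagonal table gives a value of 1, while a table whose rows are proportional to each other gives 0. Because 연도 and 회차 carry the same information, one of the two can be dropped before modeling.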

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

연도직종회차일련번호과목명과목별점수총점합격여부성별연령대
794302003의사678751의학각론1(R형)13.0378.5합격20
914952003의사679847의학총론2(R형)5.0330.0합격20
609812003의사677073의학총론2(R형)5.0321.0합격20
98882001의사651413예방의학33.0314.0합격20
108732001의사651554산부인과47.0330.0합격20
100172001의사651432내과학119.0300.5합격20
750012003의사678348의학각론549.0348.5합격20
96262001의사651376보건의약관계 법규8.0325.0합격20
341482002의사664406의학각론348.0309.0합격20
221872001의사653170예방의학28.0328.0합격20
연도직종회차일련번호과목명과목별점수총점합격여부성별연령대
744422003의사678297의학각론737.0326.5합격20
260522002의사663596의학총론222.0318.0합격20
661432003의사677543의학각론224.0327.5합격20
666092003의사677585의학각론629.0214.5불합격30
802022003의사678821의학각론334.0305.0합격20
238142002의사663373보건의약관계 법규5.5296.5합격20
530312002의사666294의학총론140.0315.5합격20
267912002의사663670의학총론137.0338.5합격30
252062002의사663512의학각론227.0316.0합격20
543682002의사666428의학각론345.0332.5합격20