Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
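Every figure in this panel is a plain aggregate over the rows. Below is a minimal pure-Python sketch. The helper names are hypothetical and are not ydata-profiling's API. Rows are assumed to be equal-length tuples, and only `None` counts as missing. The last line checks the reported average record size: 898.4 KiB × 1024 B/KiB ÷ 10,000 records ≈ 92.0 B.

```python
from typing import Any, Dict, Sequence


def dataset_overview(rows: Sequence[Sequence[Any]]) -> Dict[str, float]:
    """Overview figures in the style of the "Dataset statistics" panel.

    Assumptions (ours, for illustration): every row has the same number of
    columns, and a cell counts as missing only when it is None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)

    # A duplicate is any row identical to one seen earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


def average_record_size(total_kib: float, n_obs: int) -> float:
    """Average bytes per record from a total given in KiB (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_obs


# 898.4 KiB spread over 10,000 records is about 92.0 B per record.
print(round(average_record_size(898.4, 10000), 1))
```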

Variable types

Numeric: 3
Categorical: 7

Dataset

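Description: Information for analyzing applicants to the national licensing examination for physicians, provided in a form that does not identify individuals. The fields are year, occupation, session number, sex, age group, examination region, graduation status, pass/fail status and school location.
URL: https://www.data.go.kr/data/15060455/fileData.do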

Alerts

연도 is highly overall correlated with 회차 and 2 other fields [High correlation]
회차 is highly overall correlated with 연도 and 2 other fields [High correlation]
일련번호 is highly overall correlated with 연도 and 2 other fields [High correlation]
직종 is highly overall correlated with 연도 and 2 other fields [High correlation]
응시지역 is highly overall correlated with 학교소재지 [High correlation]
학교소재지 is highly overall correlated with 응시지역 [High correlation]
연령대 is highly imbalanced (58.6%) [Imbalance]
졸업여부 is highly imbalanced (64.3%) [Imbalance]
합격여부 is highly imbalanced (82.7%) [Imbalance]
일련번호 has unique values [Unique]
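The imbalance percentages are consistent with a normalized-entropy score, 1 − H/ln k. Here H is the Shannon entropy of the category shares and k is the number of categories. The sketch below is our reconstruction, not code taken from ydata-profiling. Fed the category counts reported in the variable sections, it reproduces all three alert figures (58.6%, 64.3% and 82.7%).

```python
import math
from typing import Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Normalized-entropy imbalance: 1 - H / ln(k).

    H is the Shannon entropy of the class shares and k the number of classes.
    The score is 0 for a perfectly even split and tends to 1 as a single
    class dominates; the ratio H / ln(k) does not depend on the log base.
    """
    n = sum(counts)
    k = len(counts)
    if k < 2 or n == 0:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Category counts copied from the variable sections of this report.
reported = {
    "연령대": [7189, 2649, 128, 27, 7],
    "졸업여부": [9325, 675],
    "합격여부": [9363, 520, 59, 49, 9],
}
for name, counts in reported.items():
    print(f"{name}: {imbalance(counts):.1%}")
```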

Reproduction

Analysis started: 2023-12-12 13:57:22.415796
Analysis finished: 2023-12-12 13:57:24.839306
Duration: 2.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도 (year)
Real number (ℝ)

HIGH CORRELATION

Distinct: 23
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.8152
Minimum: 2001
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2001
5th percentile: 2002
Q1: 2007
Median: 2012
Q3: 2016
95th percentile: 2022
Maximum: 2023
Range: 22
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.9381794
Coefficient of variation (CV): 0.0029516525
Kurtosis: -0.82540647
Mean: 2011.8152
Median Absolute Deviation (MAD): 4
Skewness: 0.020265671
Sum: 20118152
Variance: 35.261975
Monotonicity: Not monotonic
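연도 takes only 23 distinct values, so the quantile and descriptive statistics above can be recomputed from the value counts in this variable's tables. The sketch below makes three assumptions: quantiles use linear interpolation (NumPy's and pandas' default), MAD is unscaled, and variance is the sample variance with n − 1 in the denominator. Under these choices it reproduces the reported 5th percentile, Q1, median, Q3, 95th percentile, MAD, mean and variance.

```python
import math
import statistics

# 연도 value counts, copied from this report's value tables.
year_counts = {
    2001: 331, 2002: 355, 2003: 359, 2004: 405, 2005: 396, 2006: 337,
    2007: 403, 2008: 396, 2009: 383, 2010: 673, 2011: 659, 2012: 698,
    2013: 693, 2014: 657, 2015: 652, 2016: 348, 2017: 329, 2018: 308,
    2019: 358, 2020: 318, 2021: 282, 2022: 335, 2023: 325,
}


def expand(counts):
    """Rebuild the sorted column from a value -> count mapping."""
    return [value for value in sorted(counts) for _ in range(counts[value])]


def quantile(xs, q):
    """Linear-interpolation quantile of sorted data (NumPy's default method)."""
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


years = expand(year_counts)
median = statistics.median(years)
summary = {
    "5th percentile": quantile(years, 0.05),
    "Q1": quantile(years, 0.25),
    "median": median,
    "Q3": quantile(years, 0.75),
    "95th percentile": quantile(years, 0.95),
    # Unscaled MAD: the median of absolute deviations from the median.
    "MAD": statistics.median(abs(y - median) for y in years),
    "mean": statistics.fmean(years),
    "variance": statistics.variance(years),  # sample variance, n - 1
}
print(summary)
```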
[Figure: Histogram with fixed size bins (bins=23)]
Common values

Value | Count | Frequency (%)
2012 | 698 | 7.0%
2013 | 693 | 6.9%
2010 | 673 | 6.7%
2011 | 659 | 6.6%
2014 | 657 | 6.6%
2015 | 652 | 6.5%
2004 | 405 | 4.0%
2007 | 403 | 4.0%
2005 | 396 | 4.0%
2008 | 396 | 4.0%
Other values (13) | 4368 | 43.7%
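The common-values table lists the ten most frequent values and folds the rest into a single "Other values (13)" row. Below is a sketch of that summary. The helper `top_values` is hypothetical. Ordering ties by value is an assumption; it happens to match the 2005/2008 tie (both 396) above.

```python
from typing import Dict, List, Tuple

# 연도 value counts, copied from this report's value tables.
year_counts = {
    2001: 331, 2002: 355, 2003: 359, 2004: 405, 2005: 396, 2006: 337,
    2007: 403, 2008: 396, 2009: 383, 2010: 673, 2011: 659, 2012: 698,
    2013: 693, 2014: 657, 2015: 652, 2016: 348, 2017: 329, 2018: 308,
    2019: 358, 2020: 318, 2021: 282, 2022: 335, 2023: 325,
}


def top_values(counts: Dict[int, int], k: int = 10) -> List[Tuple[str, int]]:
    """Top-k (value, count) rows plus one "Other values (m)" row for the rest."""
    # Highest count first; equal counts ordered by value (our assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = [(str(value), count) for value, count in ranked[:k]]
    rest = ranked[k:]
    if rest:
        rows.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    return rows


total = sum(year_counts.values())
for value, count in top_values(year_counts):
    print(value, count, f"{count / total:.1%}")
```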
Minimum 10 values

Value | Count | Frequency (%)
2001 | 331 | 3.3%
2002 | 355 | 3.5%
2003 | 359 | 3.6%
2004 | 405 | 4.0%
2005 | 396 | 4.0%
2006 | 337 | 3.4%
2007 | 403 | 4.0%
2008 | 396 | 4.0%
2009 | 383 | 3.8%
2010 | 673 | 6.7%
Maximum 10 values

Value | Count | Frequency (%)
2023 | 325 | 3.2%
2022 | 335 | 3.4%
2021 | 282 | 2.8%
2020 | 318 | 3.2%
2019 | 358 | 3.6%
2018 | 308 | 3.1%
2017 | 329 | 3.3%
2016 | 348 | 3.5%
2015 | 652 | 6.5%
2014 | 657 | 6.6%

직종 (occupation)
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
의사(필기): 4616
의사: 3365
의사(실기): 2019

Length

Max length: 6
Median length: 6
Mean length: 4.654
Min length: 2
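The length statistics treat each row's string length as a number, weighted by how often the value occurs. Below is a sketch using a hypothetical helper `length_stats`. Python's `len` counts Unicode code points, so the Hangul value 의사(필기) has length 6, as in the report. With the 직종 counts it gives max 6, median 6, mean 4.654 and min 2.

```python
import statistics
from typing import Dict


def length_stats(counts: Dict[str, int]) -> Dict[str, float]:
    """Max/median/mean/min string length, weighting each value by its count."""
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# 직종 counts copied from this report.
print(length_stats({"의사(필기)": 4616, "의사": 3365, "의사(실기)": 2019}))
```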

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 의사(필기)
2nd row: 의사(실기)
3rd row: 의사
4th row: 의사(필기)
5th row: 의사

Common Values

Value | Count | Frequency (%)
의사(필기) (physician, written exam) | 4616 | 46.2%
의사 (physician) | 3365 | 33.7%
의사(실기) (physician, practical exam) | 2019 | 20.2%

Length

[Figure: Histogram of lengths of the category]


회차 (exam session number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 23
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.8152
Minimum: 65
Maximum: 87
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 65
5th percentile: 66
Q1: 71
Median: 76
Q3: 80
95th percentile: 86
Maximum: 87
Range: 22
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.9381794
Coefficient of variation (CV): 0.078324392
Kurtosis: -0.82540647
Mean: 75.8152
Median Absolute Deviation (MAD): 4
Skewness: 0.020265671
Sum: 758152
Variance: 35.261975
Monotonicity: Not monotonic
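The statistics for 회차 mirror those for 연도 because 회차 is 연도 shifted by a constant. Each session number equals the exam year minus 1936: session 65 carries the same count as 2001, and session 87 the same count as 2023. The difference of the two means confirms this (2011.8152 − 75.8152 = 1936). A shift leaves the standard deviation, variance, skewness, kurtosis, MAD and IQR unchanged and gives a correlation of 1.000. It does change the mean, the sum and the coefficient of variation. Below is a short check using values copied from this report. The constant `OFFSET` is inferred from the tables, not documented by the data provider.

```python
# Counts for matching rows of the 연도 and 회차 extreme-value tables.
year_counts = {2001: 331, 2002: 355, 2003: 359, 2022: 335, 2023: 325}
session_counts = {65: 331, 66: 355, 67: 359, 86: 335, 87: 325}

OFFSET = 1936  # 회차 = 연도 - OFFSET, inferred from the two tables
assert all(session_counts[y - OFFSET] == c for y, c in year_counts.items())


def coefficient_of_variation(std: float, mean: float) -> float:
    """CV = std / mean. A shift changes the mean but not the std."""
    return std / mean


std = 5.9381794  # reported for both 연도 and 회차
print(coefficient_of_variation(std, 2011.8152))  # 연도
print(coefficient_of_variation(std, 75.8152))    # 회차
```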
[Figure: Histogram with fixed size bins (bins=23)]
Common values

Value | Count | Frequency (%)
76 | 698 | 7.0%
77 | 693 | 6.9%
74 | 673 | 6.7%
75 | 659 | 6.6%
78 | 657 | 6.6%
79 | 652 | 6.5%
68 | 405 | 4.0%
71 | 403 | 4.0%
69 | 396 | 4.0%
72 | 396 | 4.0%
Other values (13) | 4368 | 43.7%
Minimum 10 values

Value | Count | Frequency (%)
65 | 331 | 3.3%
66 | 355 | 3.5%
67 | 359 | 3.6%
68 | 405 | 4.0%
69 | 396 | 4.0%
70 | 337 | 3.4%
71 | 403 | 4.0%
72 | 396 | 4.0%
73 | 383 | 3.8%
74 | 673 | 6.7%
Maximum 10 values

Value | Count | Frequency (%)
87 | 325 | 3.2%
86 | 335 | 3.4%
85 | 282 | 2.8%
84 | 318 | 3.2%
83 | 358 | 3.6%
82 | 308 | 3.1%
81 | 329 | 3.3%
80 | 348 | 3.5%
79 | 652 | 6.5%
78 | 657 | 6.6%

일련번호 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49894.037
Minimum: 10
Maximum: 99986
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 10
5th percentile: 5107.35
Q1: 25067
Median: 49932.5
Q3: 74630
95th percentile: 94495.55
Maximum: 99986
Range: 99976
Interquartile range (IQR): 49563

Descriptive statistics

Standard deviation: 28688.819
Coefficient of variation (CV): 0.57499496
Kurtosis: -1.1873194
Mean: 49894.037
Median Absolute Deviation (MAD): 24775
Skewness: -0.00086459543
Sum: 4.9894037 × 10⁸
Variance: 8.2304835 × 10⁸
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
70122 | 1 | < 0.1%
16951 | 1 | < 0.1%
38419 | 1 | < 0.1%
20717 | 1 | < 0.1%
98659 | 1 | < 0.1%
30380 | 1 | < 0.1%
26253 | 1 | < 0.1%
29840 | 1 | < 0.1%
87556 | 1 | < 0.1%
58549 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%
Minimum 10 values

Value | Count | Frequency (%)
10 | 1 | < 0.1%
11 | 1 | < 0.1%
14 | 1 | < 0.1%
27 | 1 | < 0.1%
28 | 1 | < 0.1%
76 | 1 | < 0.1%
78 | 1 | < 0.1%
80 | 1 | < 0.1%
85 | 1 | < 0.1%
87 | 1 | < 0.1%
Maximum 10 values

Value | Count | Frequency (%)
99986 | 1 | < 0.1%
99957 | 1 | < 0.1%
99954 | 1 | < 0.1%
99952 | 1 | < 0.1%
99951 | 1 | < 0.1%
99944 | 1 | < 0.1%
99941 | 1 | < 0.1%
99934 | 1 | < 0.1%
99920 | 1 | < 0.1%
99907 | 1 | < 0.1%
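The frequency columns print shares to one decimal place. A share that is non-zero but would round to 0.0% appears as "< 0.1%"; every single 일련번호 value, for example, is 1/10,000 = 0.01%. Below is a sketch of a formatter with that behaviour. The function name is hypothetical, and the rounding rule is inferred from this report rather than copied from ydata-profiling. For instance, 5 of 10,000 prints as 0.1% under Distinct (%) for 연령대, while 3 of 10,000 prints as < 0.1% for 직종.

```python
def fmt_percent(fraction: float) -> str:
    """Format a 0-1 share with one decimal, guarding the two edge cases.

    Non-zero shares that would display as 0.0% become "< 0.1%", and shares
    just below 1 that would display as 100.0% become "> 99.9%".
    """
    if fraction > 0 and round(fraction, 3) == 0:
        return "< 0.1%"
    if fraction < 1 and round(fraction, 3) == 1:
        return "> 99.9%"
    return f"{fraction * 100:.1f}%"


n = 10000
print(fmt_percent(1 / n))      # a single 일련번호 value
print(fmt_percent(9990 / n))   # "Other values (9990)"
print(fmt_percent(10000 / n))  # Distinct (%) of 일련번호
```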

성별 (sex)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
6611 
3389 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 6611 | 66.1%
 | 3389 | 33.9%

Length

[Figure: Histogram of lengths of the category]


연령대 (age group)
Categorical

IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20: 7189
30: 2649
40: 128
50: 27
60: 7

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 30
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value | Count | Frequency (%)
20 (20s) | 7189 | 71.9%
30 (30s) | 2649 | 26.5%
40 (40s) | 128 | 1.3%
50 (50s) | 27 | 0.3%
60 (60s) | 7 | 0.1%

Length

[Figure: Histogram of lengths of the category]


응시지역 (examination region)
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 6160
부산광역시: 1118
대구광역시: 871
광주광역시: 710
대전광역시: 653
Other values (2): 488

Length

Max length: 5
Median length: 5
Mean length: 4.8541
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
서울특별시 (Seoul) | 6160 | 61.6%
부산광역시 (Busan) | 1118 | 11.2%
대구광역시 (Daegu) | 871 | 8.7%
광주광역시 (Gwangju) | 710 | 7.1%
대전광역시 (Daejeon) | 653 | 6.5%
전주 (Jeonju) | 483 | 4.8%
제주도 (Jeju) | 5 | 0.1%

Length

[Figure: Histogram of lengths of the category]


졸업여부 (graduation status)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
졸업예정: 9325
졸업: 675

Length

Max length: 4
Median length: 4
Mean length: 3.865
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: 졸업예정
3rd row: 졸업예정
4th row: 졸업예정
5th row: 졸업예정

Common Values

Value | Count | Frequency (%)
졸업예정 (expected to graduate) | 9325 | 93.2%
졸업 (graduated) | 675 | 6.8%

Length

[Figure: Histogram of lengths of the category]


합격여부 (pass/fail status)
Categorical

IMBALANCE

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
합격: 9363
불합격: 520
결시: 59
응시결격: 49
면제포기: 9

Length

Max length: 4
Median length: 2
Mean length: 2.0636
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 응시결격

Common Values

Value | Count | Frequency (%)
합격 (pass) | 9363 | 93.6%
불합격 (fail) | 520 | 5.2%
결시 (absent) | 59 | 0.6%
응시결격 (ineligible to sit) | 49 | 0.5%
면제포기 (exemption waived) | 9 | 0.1%

Length

[Figure: Histogram of lengths of the category]


학교소재지 (school location)
Categorical

HIGH CORRELATION

Distinct: 33
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 3121
부산광역시: 844
광주광역시: 809
강원도: 764
대구광역시: 644
Other values (28): 3818

Length

Max length: 8
Median length: 5
Mean length: 4.3969
Min length: 2

Unique

Unique: 7
Unique (%): 0.1%
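"Distinct" counts how many different values a column holds. "Unique" counts only the values that occur exactly once. For 학교소재지 this means 7 of the 33 distinct locations appear in a single row, while 일련번호 is flagged UNIQUE because all 10,000 of its values are distinct. Below is a minimal sketch of both counts. The helper name and the toy column are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def distinct_and_unique(values: Iterable[Hashable]) -> Tuple[int, int]:
    """Return (distinct, unique): how many different values occur, and how
    many of those values occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column: three locations, one of which appears only once.
column = ["서울특별시", "서울특별시", "강원도", "강원도", "제주도"]
print(distinct_and_unique(column))  # (3, 1)
```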

Sample

1st row: 서울특별시
2nd row: 대구광역시
3rd row: 강원도
4th row: 경상북도
5th row: 강원도

Common Values

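Value | Count | Frequency (%)
서울특별시 (Seoul) | 3121 | 31.2%
부산광역시 (Busan) | 844 | 8.4%
광주광역시 (Gwangju) | 809 | 8.1%
강원도 (Gangwon) | 764 | 7.6%
대구광역시 (Daegu) | 644 | 6.4%
전라북도 (North Jeolla) | 596 | 6.0%
충청남도 (South Chungcheong) | 509 | 5.1%
대전광역시 (Daejeon) | 471 | 4.7%
경기도 (Gyeonggi) | 467 | 4.7%
경상남도 (South Gyeongsang) | 422 | 4.2%
Other values (23) | 1353 | 13.5%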

Length

[Figure: Histogram of lengths of the category]

Interactions

[Figure: nine interaction plots between the numeric variables 연도, 회차 and 일련번호]

Correlations

[Figure: correlation heatmap]

Variable | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 0.863 | 1.000 | 0.984 | 0.087 | 0.218 | 0.211 | 0.097 | 0.165 | 0.223
직종 | 0.863 | 1.000 | 0.863 | 0.961 | 0.041 | 0.147 | 0.389 | 0.040 | 0.123 | 0.280
회차 | 1.000 | 0.863 | 1.000 | 0.985 | 0.098 | 0.224 | 0.216 | 0.092 | 0.183 | 0.225
일련번호 | 0.984 | 0.961 | 0.985 | 1.000 | 0.096 | 0.220 | 0.329 | 0.094 | 0.183 | 0.229
성별 | 0.087 | 0.041 | 0.098 | 0.096 | 1.000 | 0.034 | 0.026 | 0.154 | 0.089 | 0.164
연령대 | 0.218 | 0.147 | 0.224 | 0.220 | 0.034 | 1.000 | 0.016 | 0.302 | 0.444 | 0.516
응시지역 | 0.211 | 0.389 | 0.216 | 0.329 | 0.026 | 0.016 | 1.000 | 0.042 | 0.021 | 0.925
졸업여부 | 0.097 | 0.040 | 0.092 | 0.094 | 0.154 | 0.302 | 0.042 | 1.000 | 0.330 | 0.345
합격여부 | 0.165 | 0.123 | 0.183 | 0.183 | 0.089 | 0.444 | 0.021 | 0.330 | 1.000 | 0.203
학교소재지 | 0.223 | 0.280 | 0.225 | 0.229 | 0.164 | 0.516 | 0.925 | 0.345 | 0.203 | 1.000
[Figure: correlation heatmap (categorical variables)]

Variable | 연령대 | 성별 | 학교소재지 | 합격여부 | 직종 | 응시지역 | 졸업여부
연령대 | 1.000 | 0.041 | 0.274 | 0.179 | 0.111 | 0.010 | 0.369
성별 | 0.041 | 1.000 | 0.139 | 0.108 | 0.068 | 0.028 | 0.098
학교소재지 | 0.274 | 0.139 | 1.000 | 0.097 | 0.134 | 0.711 | 0.293
합격여부 | 0.179 | 0.108 | 0.097 | 1.000 | 0.092 | 0.013 | 0.403
직종 | 0.111 | 0.068 | 0.134 | 0.092 | 1.000 | 0.284 | 0.067
응시지역 | 0.010 | 0.028 | 0.711 | 0.013 | 0.284 | 1.000 | 0.045
졸업여부 | 0.369 | 0.098 | 0.293 | 0.403 | 0.067 | 0.045 | 1.000
[Figure: correlation heatmap]

Variable | 연도 | 회차 | 일련번호 | 직종 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 1.000 | 0.755 | 0.789 | 0.075 | 0.095 | 0.111 | 0.070 | 0.077 | 0.081
회차 | 1.000 | 1.000 | 0.755 | 0.789 | 0.075 | 0.095 | 0.111 | 0.070 | 0.077 | 0.081
일련번호 | 0.755 | 0.755 | 1.000 | 0.961 | 0.073 | 0.093 | 0.173 | 0.072 | 0.077 | 0.083
직종 | 0.789 | 0.789 | 0.961 | 1.000 | 0.068 | 0.111 | 0.284 | 0.067 | 0.092 | 0.134
성별 | 0.075 | 0.075 | 0.073 | 0.068 | 1.000 | 0.041 | 0.028 | 0.098 | 0.108 | 0.139
연령대 | 0.095 | 0.095 | 0.093 | 0.111 | 0.041 | 1.000 | 0.010 | 0.369 | 0.179 | 0.274
응시지역 | 0.111 | 0.111 | 0.173 | 0.284 | 0.028 | 0.010 | 1.000 | 0.045 | 0.013 | 0.711
졸업여부 | 0.070 | 0.070 | 0.072 | 0.067 | 0.098 | 0.369 | 0.045 | 1.000 | 0.403 | 0.293
합격여부 | 0.077 | 0.077 | 0.077 | 0.092 | 0.108 | 0.179 | 0.013 | 0.403 | 1.000 | 0.097
학교소재지 | 0.081 | 0.081 | 0.083 | 0.134 | 0.139 | 0.274 | 0.711 | 0.293 | 0.097 | 1.000
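The six high-correlation alerts in the Alerts section match the result of thresholding the last matrix above at an absolute value of 0.5. We believe 0.5 is ydata-profiling's default threshold for this check, but treat that as an assumption. The sketch applies this rule to the reported values and formats messages in the report's wording. Its function name is hypothetical. Thresholding the first 10 × 10 matrix instead would also flag 연령대/학교소재지 (0.516), which the report does not do.

```python
from typing import List

names = ["연도", "회차", "일련번호", "직종", "성별",
         "연령대", "응시지역", "졸업여부", "합격여부", "학교소재지"]
# Values copied from the last correlation matrix in this report.
matrix = [
    [1.000, 1.000, 0.755, 0.789, 0.075, 0.095, 0.111, 0.070, 0.077, 0.081],
    [1.000, 1.000, 0.755, 0.789, 0.075, 0.095, 0.111, 0.070, 0.077, 0.081],
    [0.755, 0.755, 1.000, 0.961, 0.073, 0.093, 0.173, 0.072, 0.077, 0.083],
    [0.789, 0.789, 0.961, 1.000, 0.068, 0.111, 0.284, 0.067, 0.092, 0.134],
    [0.075, 0.075, 0.073, 0.068, 1.000, 0.041, 0.028, 0.098, 0.108, 0.139],
    [0.095, 0.095, 0.093, 0.111, 0.041, 1.000, 0.010, 0.369, 0.179, 0.274],
    [0.111, 0.111, 0.173, 0.284, 0.028, 0.010, 1.000, 0.045, 0.013, 0.711],
    [0.070, 0.070, 0.072, 0.067, 0.098, 0.369, 0.045, 1.000, 0.403, 0.293],
    [0.077, 0.077, 0.077, 0.092, 0.108, 0.179, 0.013, 0.403, 1.000, 0.097],
    [0.081, 0.081, 0.083, 0.134, 0.139, 0.274, 0.711, 0.293, 0.097, 1.000],
]


def correlation_alerts(names: List[str], matrix: List[List[float]],
                       threshold: float = 0.5) -> List[str]:
    """One message per variable with |r| > threshold against another variable."""
    alerts = []
    for i, name in enumerate(names):
        partners = [names[j] for j, r in enumerate(matrix[i])
                    if j != i and abs(r) > threshold]
        if not partners:
            continue
        msg = f"{name} is highly overall correlated with {partners[0]}"
        others = len(partners) - 1
        if others:
            msg += f" and {others} other field{'s' if others > 1 else ''}"
        alerts.append(msg)
    return alerts


for line in correlation_alerts(names, matrix):
    print(line)
```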

Missing values

[Figure: count of values per column] A simple visualization of nullity by column.
[Figure: nullity matrix] A nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

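First rows

(index) | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
70121 | 2021 | 의사(필기) | 85 | 70122 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
97806 | 2015 | 의사(실기) | 79 | 97807 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 대구광역시
18878 | 2006 | 의사 | 70 | 18879 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 강원도
69876 | 2021 | 의사(필기) | 85 | 69877 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 경상북도
7680 | 2003 | 의사 | 67 | 7681 |  | 20 | 서울특별시 | 졸업예정 | 응시결격 | 강원도
87230 | 2012 | 의사(실기) | 76 | 87231 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 경상북도
76983 | 2023 | 의사(필기) | 87 | 76984 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
45569 | 2013 | 의사(필기) | 77 | 45570 |  | 20 | 부산광역시 | 졸업예정 | 합격 | 부산광역시
12268 | 2004 | 의사 | 68 | 12269 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
25534 | 2007 | 의사 | 71 | 25535 |  | 30 | 대전광역시 | 졸업예정 | 합격 | 충청북도

Last rows

(index) | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
53510 | 2015 | 의사(필기) | 79 | 53511 |  | 20 | 전주 | 졸업예정 | 합격 | 전라북도
56912 | 2017 | 의사(필기) | 81 | 56913 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 강원도
66220 | 2019 | 의사(필기) | 83 | 66221 |  | 20 | 대전광역시 | 졸업예정 | 합격 | 대전광역시
44363 | 2013 | 의사(필기) | 77 | 44364 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
81377 | 2010 | 의사(실기) | 74 | 81378 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 대전광역시
90201 | 2013 | 의사(실기) | 77 | 90202 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 경기도
81727 | 2010 | 의사(실기) | 74 | 81728 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 부산광역시
76811 | 2023 | 의사(필기) | 87 | 76812 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
15805 | 2005 | 의사 | 69 | 15806 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 강원도
57473 | 2017 | 의사(필기) | 81 | 57474 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시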