Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
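
As a quick consistency check, the average record size follows from the total size in memory divided by the number of observations. The sketch below assumes that 1 KiB means 1024 bytes; the variable names are illustrative.

```python
# Values copied from the dataset statistics above.
total_size_kib = 898.4
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
bytes_per_record = total_size_kib * 1024 / n_observations

print(f"{bytes_per_record:.1f} B")  # 92.0 B
```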

Variable types

Numeric: 3
Categorical: 7

Dataset

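Description: Provides information for analyzing applicants to the national dentist licensing examination (year 연도, occupation 직종, exam round 회차, sex 성별, age group 연령대, exam region 응시지역, graduation status 졸업여부, pass/fail result 합격여부, and school location 학교소재지) in a form that cannot identify individuals.
URL: https://www.data.go.kr/data/15060456/fileData.do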

Alerts

연도 is highly overall correlated with 회차 and 2 other fields (High correlation)
회차 is highly overall correlated with 연도 and 2 other fields (High correlation)
일련번호 is highly overall correlated with 연도 and 2 other fields (High correlation)
직종 is highly overall correlated with 연도 and 2 other fields (High correlation)
졸업여부 is highly overall correlated with 학교소재지 (High correlation)
학교소재지 is highly overall correlated with 졸업여부 (High correlation)
직종 is highly imbalanced (55.2%) (Imbalance)
응시지역 is highly imbalanced (57.5%) (Imbalance)
졸업여부 is highly imbalanced (64.3%) (Imbalance)
합격여부 is highly imbalanced (72.2%) (Imbalance)
일련번호 has unique values (Unique)
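
The imbalance percentages above are consistent with one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition (it is inferred from the numbers in this report, not taken from the library source) and applies it to the counts this report lists for 직종 and 합격여부. The function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


job_counts = [8640, 701, 659]         # 직종 category counts
result_counts = [8934, 932, 116, 18]  # 합격여부 category counts

print(f"{imbalance_score(job_counts):.1%}")     # 55.2%
print(f"{imbalance_score(result_counts):.1%}")  # 72.2%
```

A perfectly uniform variable scores 0 under this definition, and a variable dominated by a single category approaches 1.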

Reproduction

Analysis started: 2023-12-12 04:54:24.760893
Analysis finished: 2023-12-12 04:54:27.265713
Duration: 2.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.5051
Minimum: 2000
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2001
Q1: 2005
Median: 2011
Q3: 2018
95-th percentile: 2023
Maximum: 2023
Range: 23
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 7.4081889
Coefficient of variation (CV): 0.0036829083
Kurtosis: -1.3214149
Mean: 2011.5051
Median Absolute Deviation (MAD): 7
Skewness: 0.062865916
Sum: 20115051
Variance: 54.881262
Monotonicity: Not monotonic
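
Two of the descriptive statistics above are derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The short check below recomputes both from the reported values; the variable names are illustrative.

```python
# Values copied from the 연도 statistics above.
mean = 2011.5051
std = 7.4081889

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance as the squared standard deviation

print(f"CV = {cv:.10f}")
print(f"Variance = {variance:.6f}")
```

Because the mean is a calendar year, the CV is tiny and says little about spread here; the standard deviation and IQR are the more informative figures for this variable.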
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%)
2023 | 694 | 6.9%
2022 | 666 | 6.7%
2001 | 495 | 5.0%
2005 | 492 | 4.9%
2000 | 481 | 4.8%
2003 | 476 | 4.8%
2004 | 460 | 4.6%
2002 | 442 | 4.4%
2007 | 420 | 4.2%
2008 | 394 | 3.9%
Other values (14) | 4980 | 49.8%

Value | Count | Frequency (%)
2000 | 481 | 4.8%
2001 | 495 | 5.0%
2002 | 442 | 4.4%
2003 | 476 | 4.8%
2004 | 460 | 4.6%
2005 | 492 | 4.9%
2006 | 381 | 3.8%
2007 | 420 | 4.2%
2008 | 394 | 3.9%
2009 | 336 | 3.4%

Value | Count | Frequency (%)
2023 | 694 | 6.9%
2022 | 666 | 6.7%
2021 | 330 | 3.3%
2020 | 346 | 3.5%
2019 | 385 | 3.9%
2018 | 349 | 3.5%
2017 | 363 | 3.6%
2016 | 357 | 3.6%
2015 | 328 | 3.3%
2014 | 358 | 3.6%

직종
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

치과의사: 8640
치과의사(필기): 701
치과의사(실기): 659

Length

Max length: 8
Median length: 4
Mean length: 4.544
Min length: 4
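
The mean length reported above can be reproduced as a count-weighted average of the category string lengths. The sketch below uses the three 직종 values and counts from the Common Values table in this section and measures length in Unicode code points with Python's `len`, which is an assumption about how the report counts characters.

```python
# Category counts for 직종, copied from the report.
counts = {
    "치과의사": 8640,
    "치과의사(필기)": 701,
    "치과의사(실기)": 659,
}

total = sum(counts.values())
# Weight each category's character length by how often it occurs.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(mean_length)  # 4.544
```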

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 치과의사
2nd row: 치과의사
3rd row: 치과의사
4th row: 치과의사(실기)
5th row: 치과의사

Common Values

Value | Count | Frequency (%)
치과의사 | 8640 | 86.4%
치과의사(필기) | 701 | 7.0%
치과의사(실기) | 659 | 6.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
치과의사 | 8640 | 86.4%
치과의사(필기) | 701 | 7.0%
치과의사(실기) | 659 | 6.6%

회차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.5051
Minimum: 52
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 52
5-th percentile: 53
Q1: 57
Median: 63
Q3: 70
95-th percentile: 75
Maximum: 75
Range: 23
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 7.4081889
Coefficient of variation (CV): 0.11665502
Kurtosis: -1.3214149
Mean: 63.5051
Median Absolute Deviation (MAD): 7
Skewness: 0.062865916
Sum: 635051
Variance: 54.881262
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%)
75 | 694 | 6.9%
74 | 666 | 6.7%
53 | 495 | 5.0%
57 | 492 | 4.9%
52 | 481 | 4.8%
55 | 476 | 4.8%
56 | 460 | 4.6%
54 | 442 | 4.4%
59 | 420 | 4.2%
60 | 394 | 3.9%
Other values (14) | 4980 | 49.8%

Value | Count | Frequency (%)
52 | 481 | 4.8%
53 | 495 | 5.0%
54 | 442 | 4.4%
55 | 476 | 4.8%
56 | 460 | 4.6%
57 | 492 | 4.9%
58 | 381 | 3.8%
59 | 420 | 4.2%
60 | 394 | 3.9%
61 | 336 | 3.4%

Value | Count | Frequency (%)
75 | 694 | 6.9%
74 | 666 | 6.7%
73 | 330 | 3.3%
72 | 346 | 3.5%
71 | 385 | 3.9%
70 | 349 | 3.5%
69 | 363 | 3.6%
68 | 357 | 3.6%
67 | 328 | 3.3%
66 | 358 | 3.6%
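
The statistics for 회차 mirror those for 연도 shifted by a constant: the minimum, maximum, and mean each differ by 1948, while the standard deviation is identical. That pattern suggests 회차 = 연도 - 1948 across the sample, which would explain the correlation of 1.000 between the two variables. The check below is only a sketch over the summary values and a few (연도, 회차) pairs read from the sample rows at the end of this report; the offset is inferred from the numbers, not documented by the data provider.

```python
# (연도, 회차) summary values copied from the two variable sections.
summary = {
    "minimum": (2000, 52),
    "maximum": (2023, 75),
    "mean": (2011.5051, 63.5051),
}

# (연도, 회차) pairs read from the sample rows at the end of the report.
sample_pairs = [(2010, 62), (2012, 64), (2006, 58), (2023, 75), (2000, 52)]

OFFSET = 1948  # inferred constant, not stated in the source

summary_ok = all(abs((year - rnd) - OFFSET) < 1e-6 for year, rnd in summary.values())
sample_ok = all(year - rnd == OFFSET for year, rnd in sample_pairs)

print(summary_ok, sample_ok)  # True True
```

If the relationship holds for every row, one of the two columns is redundant for modeling purposes.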

일련번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11474.475
Minimum: 3
Maximum: 22859
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 1130.9
Q1: 5803.75
Median: 11582.5
Q3: 17193.75
95-th percentile: 21685.15
Maximum: 22859
Range: 22856
Interquartile range (IQR): 11390

Descriptive statistics

Standard deviation: 6598.6811
Coefficient of variation (CV): 0.57507476
Kurtosis: -1.2006728
Mean: 11474.475
Median Absolute Deviation (MAD): 5695
Skewness: -0.018126775
Sum: 1.1474475 × 10^8
Variance: 43542592
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10191 | 1 | < 0.1%
1711 | 1 | < 0.1%
5891 | 1 | < 0.1%
1322 | 1 | < 0.1%
5966 | 1 | < 0.1%
2051 | 1 | < 0.1%
20407 | 1 | < 0.1%
16092 | 1 | < 0.1%
19019 | 1 | < 0.1%
13456 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
4 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
15 | 1 | < 0.1%

Value | Count | Frequency (%)
22859 | 1 | < 0.1%
22852 | 1 | < 0.1%
22848 | 1 | < 0.1%
22840 | 1 | < 0.1%
22833 | 1 | < 0.1%
22832 | 1 | < 0.1%
22831 | 1 | < 0.1%
22830 | 1 | < 0.1%
22828 | 1 | < 0.1%
22826 | 1 | < 0.1%

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
6801 
3199 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
6801
68.0%
3199
32.0%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
6801
68.0%
3199
32.0%

연령대
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

20: 5731
30: 3759
40: 455
50: 44
60: 11

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30
2nd row: 20
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value | Count | Frequency (%)
20 | 5731 | 57.3%
30 | 3759 | 37.6%
40 | 455 | 4.5%
50 | 44 | 0.4%
60 | 11 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
20 | 5731 | 57.3%
30 | 3759 | 37.6%
40 | 455 | 4.5%
50 | 44 | 0.4%
60 | 11 | 0.1%

응시지역
Categorical

IMBALANCE 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

서울특별시: 7812
광주광역시: 673
전주: 604
부산광역시: 344
대구광역시: 268
Other values (3): 299

Length

Max length: 5
Median length: 5
Mean length: 4.7297
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
서울특별시 | 7812 | 78.1%
광주광역시 | 673 | 6.7%
전주 | 604 | 6.0%
부산광역시 | 344 | 3.4%
대구광역시 | 268 | 2.7%
원주 | 167 | 1.7%
성남 | 130 | 1.3%
대전광역시 | 2 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
서울특별시 | 7812 | 78.1%
광주광역시 | 673 | 6.7%
전주 | 604 | 6.0%
부산광역시 | 344 | 3.4%
대구광역시 | 268 | 2.7%
원주 | 167 | 1.7%
성남 | 130 | 1.3%
대전광역시 | 2 | < 0.1%

졸업여부
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

졸업예정: 8795
졸업: 1144
(blank): 61

Length

Max length: 4
Median length: 4
Mean length: 3.7529
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: 졸업예정
3rd row: 졸업예정
4th row: 졸업예정
5th row: 졸업예정

Common Values

Value | Count | Frequency (%)
졸업예정 | 8795 | 87.9%
졸업 | 1144 | 11.4%
(blank) | 61 | 0.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
졸업예정 | 8795 | 88.5%
졸업 | 1144 | 11.5%

합격여부
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

합격: 8934
불합격: 932
결시: 116
응시결격: 18

Length

Max length: 4
Median length: 2
Mean length: 2.0968
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 합격

Common Values

Value | Count | Frequency (%)
합격 | 8934 | 89.3%
불합격 | 932 | 9.3%
결시 | 116 | 1.2%
응시결격 | 18 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
합격 | 8934 | 89.3%
불합격 | 932 | 9.3%
결시 | 116 | 1.2%
응시결격 | 18 | 0.2%

학교소재지
Categorical

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

서울특별시: 2839
광주광역시: 1888
전라북도: 1314
부산광역시: 907
충청남도: 891
Other values (19): 2161

Length

Max length: 5
Median length: 5
Mean length: 4.4536
Min length: 2

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 서울특별시
2nd row: 광주광역시
3rd row: 부산광역시
4th row: 광주광역시
5th row: 대구광역시

Common Values

Value | Count | Frequency (%)
서울특별시 | 2839 | 28.4%
광주광역시 | 1888 | 18.9%
전라북도 | 1314 | 13.1%
부산광역시 | 907 | 9.1%
충청남도 | 891 | 8.9%
대구광역시 | 735 | 7.3%
필리핀 | 522 | 5.2%
강원도 | 460 | 4.6%
전주 | 236 | 2.4%
미국 | 110 | 1.1%
Other values (14) | 98 | 1.0%

Length

Histogram of lengths of the category

Interactions


Correlations

 | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 0.750 | 1.000 | 0.981 | 0.083 | 0.353 | 0.482 | 0.282 | 0.251 | 0.338
직종 | 0.750 | 1.000 | 0.750 | 0.806 | 0.025 | 0.098 | 0.503 | 0.184 | 0.066 | 0.186
회차 | 1.000 | 0.750 | 1.000 | 0.982 | 0.089 | 0.354 | 0.485 | 0.294 | 0.265 | 0.347
일련번호 | 0.981 | 0.806 | 0.982 | 1.000 | 0.085 | 0.340 | 0.528 | 0.326 | 0.275 | 0.376
성별 | 0.083 | 0.025 | 0.089 | 0.085 | 1.000 | 0.097 | 0.048 | 0.071 | 0.191 | 0.175
연령대 | 0.353 | 0.098 | 0.354 | 0.340 | 0.097 | 1.000 | 0.384 | 0.423 | 0.314 | 0.553
응시지역 | 0.482 | 0.503 | 0.485 | 0.528 | 0.048 | 0.384 | 1.000 | 0.130 | 0.147 | 0.851
졸업여부 | 0.282 | 0.184 | 0.294 | 0.326 | 0.071 | 0.423 | 0.130 | 1.000 | 0.458 | 0.798
합격여부 | 0.251 | 0.066 | 0.265 | 0.275 | 0.191 | 0.314 | 0.147 | 0.458 | 1.000 | 0.643
학교소재지 | 0.338 | 0.186 | 0.347 | 0.376 | 0.175 | 0.553 | 0.851 | 0.798 | 0.643 | 1.000

 | 직종 | 응시지역 | 성별 | 합격여부 | 학교소재지 | 졸업여부 | 연령대
직종 | 1.000 | 0.370 | 0.042 | 0.062 | 0.086 | 0.057 | 0.073
응시지역 | 0.370 | 1.000 | 0.036 | 0.066 | 0.490 | 0.082 | 0.247
성별 | 0.042 | 0.036 | 1.000 | 0.126 | 0.139 | 0.117 | 0.119
합격여부 | 0.062 | 0.066 | 0.126 | 1.000 | 0.357 | 0.453 | 0.261
학교소재지 | 0.086 | 0.490 | 0.139 | 0.357 | 1.000 | 0.539 | 0.305
졸업여부 | 0.057 | 0.082 | 0.117 | 0.453 | 0.539 | 1.000 | 0.354
연령대 | 0.073 | 0.247 | 0.119 | 0.261 | 0.305 | 0.354 | 1.000

 | 연도 | 회차 | 일련번호 | 직종 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 1.000 | 0.998 | 0.621 | 0.068 | 0.155 | 0.256 | 0.184 | 0.161 | 0.134
회차 | 1.000 | 1.000 | 0.998 | 0.621 | 0.068 | 0.155 | 0.256 | 0.184 | 0.161 | 0.134
일련번호 | 0.998 | 0.998 | 1.000 | 0.700 | 0.065 | 0.148 | 0.286 | 0.207 | 0.167 | 0.147
직종 | 0.621 | 0.621 | 0.700 | 1.000 | 0.042 | 0.073 | 0.370 | 0.057 | 0.062 | 0.086
성별 | 0.068 | 0.068 | 0.065 | 0.042 | 1.000 | 0.119 | 0.036 | 0.117 | 0.126 | 0.139
연령대 | 0.155 | 0.155 | 0.148 | 0.073 | 0.119 | 1.000 | 0.247 | 0.354 | 0.261 | 0.305
응시지역 | 0.256 | 0.256 | 0.286 | 0.370 | 0.036 | 0.247 | 1.000 | 0.082 | 0.066 | 0.490
졸업여부 | 0.184 | 0.184 | 0.207 | 0.057 | 0.117 | 0.354 | 0.082 | 1.000 | 0.453 | 0.539
합격여부 | 0.161 | 0.161 | 0.167 | 0.062 | 0.126 | 0.261 | 0.066 | 0.453 | 1.000 | 0.357
학교소재지 | 0.134 | 0.134 | 0.147 | 0.086 | 0.139 | 0.305 | 0.490 | 0.539 | 0.357 | 1.000
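
The High correlation alerts near the top of this report can be reproduced from the last 10×10 matrix above by flagging every off-diagonal coefficient at or above a cutoff. The sketch below assumes a cutoff of 0.5; that value is an assumption that happens to match the alerts, since the configuration file itself is not included here. The helper name `high_correlations` is illustrative.

```python
columns = ["연도", "회차", "일련번호", "직종", "성별",
           "연령대", "응시지역", "졸업여부", "합격여부", "학교소재지"]

# Rows of the last correlation matrix in this section, in column order.
matrix = [
    [1.000, 1.000, 0.998, 0.621, 0.068, 0.155, 0.256, 0.184, 0.161, 0.134],
    [1.000, 1.000, 0.998, 0.621, 0.068, 0.155, 0.256, 0.184, 0.161, 0.134],
    [0.998, 0.998, 1.000, 0.700, 0.065, 0.148, 0.286, 0.207, 0.167, 0.147],
    [0.621, 0.621, 0.700, 1.000, 0.042, 0.073, 0.370, 0.057, 0.062, 0.086],
    [0.068, 0.068, 0.065, 0.042, 1.000, 0.119, 0.036, 0.117, 0.126, 0.139],
    [0.155, 0.155, 0.148, 0.073, 0.119, 1.000, 0.247, 0.354, 0.261, 0.305],
    [0.256, 0.256, 0.286, 0.370, 0.036, 0.247, 1.000, 0.082, 0.066, 0.490],
    [0.184, 0.184, 0.207, 0.057, 0.117, 0.354, 0.082, 1.000, 0.453, 0.539],
    [0.161, 0.161, 0.167, 0.062, 0.126, 0.261, 0.066, 0.453, 1.000, 0.357],
    [0.134, 0.134, 0.147, 0.086, 0.139, 0.305, 0.490, 0.539, 0.357, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not confirmed by the report


def high_correlations(columns, matrix, threshold):
    """Map each column to the other columns whose coefficient meets the threshold."""
    flagged = {}
    for i, name in enumerate(columns):
        partners = [columns[j] for j, value in enumerate(matrix[i])
                    if j != i and value >= threshold]
        if partners:
            flagged[name] = partners
    return flagged


flagged = high_correlations(columns, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```

Under this cutoff, 응시지역 and 학교소재지 (0.490) fall just short of an alert, which agrees with the alert list even though the first matrix in this section reports 0.851 for the same pair.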

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

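Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
10190 | 2010 | 치과의사 | 62 | 10191 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
12478 | 2012 | 치과의사 | 64 | 12479 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 광주광역시
7158 | 2006 | 치과의사 | 58 | 7159 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 부산광역시
22279 | 2023 | 치과의사(실기) | 75 | 22280 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 광주광역시
7715 | 2007 | 치과의사 | 59 | 7716 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 대구광역시
6107 | 2005 | 치과의사 | 57 | 6108 |  | 20 | 서울특별시 | 졸업 | 합격 | 충청남도
16419 | 2017 | 치과의사 | 69 | 16420 |  | 20 | 전주 | 졸업예정 | 합격 | 전라북도
13160 | 2013 | 치과의사 | 65 | 13161 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 전주
10102 | 2009 | 치과의사 | 61 | 10103 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
280 | 2000 | 치과의사 | 52 | 281 |  | 40 | 서울특별시 |  | 불합격 | 필리핀

Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
19029 | 2021 | 치과의사 | 73 | 19030 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 충청남도
19262 | 2021 | 치과의사 | 73 | 19263 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
14305 | 2015 | 치과의사 | 67 | 14306 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 충청남도
2038 | 2001 | 치과의사 | 53 | 2039 |  | 20 | 서울특별시 | 졸업예정 | 불합격 | 전라북도
4391 | 2004 | 치과의사 | 56 | 4392 |  | 40 | 서울특별시 | 졸업 | 결시 | 전라북도
14710 | 2015 | 치과의사 | 67 | 14711 |  | 30 | 광주광역시 | 졸업예정 | 합격 | 광주광역시
2205 | 2002 | 치과의사 | 54 | 2206 |  | 30 | 서울특별시 | 졸업 | 불합격 | 필리핀
2953 | 2002 | 치과의사 | 54 | 2954 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
17313 | 2018 | 치과의사 | 70 | 17314 |  | 20 | 전주 | 졸업예정 | 합격 | 전라북도
14503 | 2015 | 치과의사 | 67 | 14504 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시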