Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
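The derived rows in this block are plain arithmetic on the counts. The sketch below reproduces them from the displayed figures. It assumes two things: "Missing cells (%)" is taken over all observations × variables cells, and the average record size is the total in-memory size divided by the row count. With those assumptions, 898.4 KiB spread over 10000 rows gives the displayed 92.0 B.

```python
# Figures copied from the "Dataset statistics" block.
N_VARIABLES = 10
N_OBSERVATIONS = 10_000
MISSING_CELLS = 0
TOTAL_MEMORY_KIB = 898.4  # displayed value, rounded to 0.1 KiB


def missing_cells_pct(missing: int, n_obs: int, n_vars: int) -> float:
    """Missing cells as a percentage of all n_obs * n_vars cells."""
    return 100.0 * missing / (n_obs * n_vars)


def average_record_size_bytes(total_kib: float, n_obs: int) -> float:
    """Total in-memory size spread evenly over the rows (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_obs


print(f"Missing cells (%): {missing_cells_pct(MISSING_CELLS, N_OBSERVATIONS, N_VARIABLES):.1f}%")
print(f"Average record size: {average_record_size_bytes(TOTAL_MEMORY_KIB, N_OBSERVATIONS):.1f} B")
```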

Variable types

Numeric: 3
Categorical: 7

Dataset

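Description: Provides information for analyzing the candidates for the Korean medicine doctor (한의사) national licensing examination (year, occupation, exam session, sex, age group, exam region, graduation status, pass/fail result, school location), in a form in which individuals cannot be identified.
URL: https://www.data.go.kr/data/15060457/fileData.do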

Alerts

직종 has constant value "한의사" [Constant]
연도 is highly overall correlated with 회차 and 1 other field [High correlation]
회차 is highly overall correlated with 연도 and 1 other field [High correlation]
일련번호 is highly overall correlated with 연도 and 1 other field [High correlation]
응시지역 is highly overall correlated with 학교소재지 [High correlation]
졸업여부 is highly overall correlated with 합격여부 [High correlation]
합격여부 is highly overall correlated with 졸업여부 [High correlation]
학교소재지 is highly overall correlated with 응시지역 [High correlation]
졸업여부 is highly imbalanced (62.3%) [Imbalance]
합격여부 is highly imbalanced (80.2%) [Imbalance]
일련번호 has unique values [Unique]
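The two imbalance percentages match 1 minus the Shannon entropy of a column's value counts, normalized by ln(k) for k categories. This is the imbalance score as ydata-profiling describes it. The sketch below assumes that definition and applies it to the counts from the Common Values tables of 졸업여부 and 합격여부. The handling of a single-category column is a hypothetical convention, since ln(1) = 0.

```python
import math


def imbalance(counts: list[int]) -> float:
    """1 - H / ln(k): 0.0 is perfectly balanced, values near 1.0 mean one category dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # hypothetical convention: the formula is undefined for one category
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Category counts taken from the Common Values tables of this report.
print(f"졸업여부: {imbalance([9270, 730]):.1%}")          # 졸업여부: 62.3%
print(f"합격여부: {imbalance([9327, 581, 91, 1]):.1%}")   # 합격여부: 80.2%
```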

Reproduction

Analysis started: 2023-12-12 04:32:45.078907
Analysis finished: 2023-12-12 04:32:47.618685
Duration: 2.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
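The reported duration is the difference between the two timestamps above, rounded to two decimals. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 04:32:45.078907")
finished = datetime.fromisoformat("2023-12-12 04:32:47.618685")

# Subtracting two datetimes yields a timedelta; total_seconds() keeps the microseconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 2.54 seconds
```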

Variables

연도 (year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.1601
Minimum: 2000
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2000
5th percentile: 2001
Q1: 2005
Median: 2011
Q3: 2017
95th percentile: 2022
Maximum: 2023
Range: 23
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.7830722
Coefficient of variation (CV): 0.0033727162
Kurtosis: -1.1685061
Mean: 2011.1601
Median absolute deviation (MAD): 6
Skewness: 0.099448615
Sum: 20111601
Variance: 46.010069
Monotonicity: Not monotonic
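Several rows in the descriptive statistics are tied together by simple identities:

- variance is the square of the standard deviation;
- the coefficient of variation is std / mean;
- Sum is mean × number of observations.

The check below uses only the reported 연도 values, not the raw data. It reproduces the displayed CV, Sum and Variance.

```python
import math

# Summary values reported for 연도 (year).
N = 10_000
MEAN = 2011.1601
STD = 6.7830722
VARIANCE = 46.010069

cv = STD / MEAN    # coefficient of variation: spread relative to the mean
total = MEAN * N   # the "Sum" row is the mean times the number of observations

print(f"CV: {cv:.7f}")                                  # CV: 0.0033727
print(f"Sum: {total:.0f}")                              # Sum: 20111601
print(math.isclose(STD ** 2, VARIANCE, rel_tol=1e-6))   # True
```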
[Figure: Histogram with fixed size bins (bins=24)]
Common values

Value               Count   Frequency (%)
2003                  556   5.6%
2009                  480   4.8%
2005                  468   4.7%
2007                  458   4.6%
2008                  456   4.6%
2002                  451   4.5%
2013                  441   4.4%
2004                  436   4.4%
2006                  436   4.4%
2010                  432   4.3%
Other values (14)    5386   53.9%

Minimum 10 values

Value   Count   Frequency (%)
2000      372   3.7%
2001      322   3.2%
2002      451   4.5%
2003      556   5.6%
2004      436   4.4%
2005      468   4.7%
2006      436   4.4%
2007      458   4.6%
2008      456   4.6%
2009      480   4.8%

Maximum 10 values

Value   Count   Frequency (%)
2023      388   3.9%
2022      360   3.6%
2021      391   3.9%
2020      372   3.7%
2019      368   3.7%
2018      404   4.0%
2017      406   4.1%
2016      359   3.6%
2015      416   4.2%
2014      406   4.1%

직종 (occupation)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
한의사: 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 한의사
2nd row: 한의사
3rd row: 한의사
4th row: 한의사
5th row: 한의사

Common Values

Value     Count   Frequency (%)
한의사    10000   100.0%

[Figure: Histogram of lengths of the category]


회차 (exam session number)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.1601
Minimum: 55
Maximum: 78
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 55
5th percentile: 56
Q1: 60
Median: 66
Q3: 72
95th percentile: 77
Maximum: 78
Range: 23
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 6.7830722
Coefficient of variation (CV): 0.10252512
Kurtosis: -1.1685061
Mean: 66.1601
Median absolute deviation (MAD): 6
Skewness: 0.099448615
Sum: 661601
Variance: 46.010069
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=24)]
Common values

Value               Count   Frequency (%)
58                    556   5.6%
64                    480   4.8%
60                    468   4.7%
62                    458   4.6%
63                    456   4.6%
57                    451   4.5%
68                    441   4.4%
59                    436   4.4%
61                    436   4.4%
65                    432   4.3%
Other values (14)    5386   53.9%

Minimum 10 values

Value   Count   Frequency (%)
55        372   3.7%
56        322   3.2%
57        451   4.5%
58        556   5.6%
59        436   4.4%
60        468   4.7%
61        436   4.4%
62        458   4.6%
63        456   4.6%
64        480   4.8%

Maximum 10 values

Value   Count   Frequency (%)
78        388   3.9%
77        360   3.6%
76        391   3.9%
75        372   3.7%
74        368   3.7%
73        404   4.0%
72        406   4.1%
71        359   3.6%
70        416   4.2%
69        406   4.1%
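The 회차 tables line up value for value with those of 연도: 2000 pairs with 55 and 2023 with 78, with identical counts. This is consistent with 회차 = 연도 - 1945. That relationship explains the correlation of 1.000 between them, and also explains why the standard deviation, skewness, kurtosis, MAD, range and IQR are identical while the mean, sum and CV differ. The offset 1945 is inferred from the tables here, not from any documentation of the dataset. The sketch below checks it against the reported numbers.

```python
# Counts per value, copied from the "Minimum 10 values" tables of 연도 and 회차.
year_counts = {2000: 372, 2001: 322, 2002: 451, 2003: 556, 2004: 436,
               2005: 468, 2006: 436, 2007: 458, 2008: 456, 2009: 480}
session_counts = {55: 372, 56: 322, 57: 451, 58: 556, 59: 436,
                  60: 468, 61: 436, 62: 458, 63: 456, 64: 480}

OFFSET = 1945  # inferred from the data (2000 - 55), not from any documentation

# Each year lines up with session = year - OFFSET and carries the same count.
matches = all(session_counts[y - OFFSET] == c for y, c in year_counts.items())

# A constant shift moves the mean and the sum but leaves std, skewness,
# kurtosis, MAD, range and IQR unchanged.
mean_gap = round(2011.1601 - 66.1601, 4)
sum_gap = 20111601 - 661601
print(matches, mean_gap, sum_gap == OFFSET * 10_000)  # True 1945.0 True
```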

일련번호 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10365.039
Minimum: 1
Maximum: 20713
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1096.9
Q1: 5177.75
Median: 10318.5
Q3: 15557.25
95th percentile: 19636.15
Maximum: 20713
Range: 20712
Interquartile range (IQR): 10379.5
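Every 일련번호 is an integer, yet Q1 = 5177.75 and the median = 10318.5. That fits linear interpolation between neighbouring sorted values, which is the default `interpolation="linear"` of pandas `Series.quantile`. Assuming the profiler relies on that default, with n = 10000 the Q1 position is (n - 1) × 0.25 = 2499.75, so Q1 lies three quarters of the way from the 2500th to the 2501st smallest value. The sketch implements that rule. It uses the ten smallest 일련번호 values listed in this report purely as a toy sample, and cross-checks against the standard library's "inclusive" method.

```python
import math
import statistics


def quantile_linear(sorted_values: list[float], q: float) -> float:
    """Quantile by linear interpolation at position (n - 1) * q of sorted data."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


# Toy sample: the ten smallest 일련번호 values listed in this report.
sample = [1, 3, 6, 8, 9, 12, 15, 19, 22, 23]
q1 = quantile_linear(sample, 0.25)
q3 = quantile_linear(sample, 0.75)
print(q1, q3, q3 - q1)  # 6.5 18.0 11.5

# The standard library's "inclusive" method applies the same rule to the quartiles.
print(statistics.quantiles(sample, n=4, method="inclusive"))  # [6.5, 10.5, 18.0]
```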

Descriptive statistics

Standard deviation: 5958.2481
Coefficient of variation (CV): 0.57484087
Kurtosis: -1.2022473
Mean: 10365.039
Median absolute deviation (MAD): 5192
Skewness: 0.0050898889
Sum: 1.0365039 × 10^8
Variance: 35500720
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value                 Count   Frequency (%)
5375                      1   < 0.1%
977                       1   < 0.1%
10305                     1   < 0.1%
10489                     1   < 0.1%
4792                      1   < 0.1%
10610                     1   < 0.1%
8611                      1   < 0.1%
1785                      1   < 0.1%
13169                     1   < 0.1%
3291                      1   < 0.1%
Other values (9990)    9990   99.9%

Minimum 10 values

Value   Count   Frequency (%)
1           1   < 0.1%
3           1   < 0.1%
6           1   < 0.1%
8           1   < 0.1%
9           1   < 0.1%
12          1   < 0.1%
15          1   < 0.1%
19          1   < 0.1%
22          1   < 0.1%
23          1   < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
20713       1   < 0.1%
20712       1   < 0.1%
20711       1   < 0.1%
20708       1   < 0.1%
20706       1   < 0.1%
20705       1   < 0.1%
20704       1   < 0.1%
20702       1   < 0.1%
20698       1   < 0.1%
20697       1   < 0.1%

성별 (sex)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
7162 
2838 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value   Count   Frequency (%)
         7162   71.6%
         2838   28.4%

[Figure: Histogram of lengths of the category]


연령대 (age group, by decade)
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
20: 6761
30: 2656
40: 513
50: 56
60: 14

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 30
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value   Count   Frequency (%)
20       6761   67.6%
30       2656   26.6%
40        513   5.1%
50         56   0.6%
60         14   0.1%

[Figure: Histogram of lengths of the category]


응시지역 (exam region)
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 7241
전주: 599
대구광역시: 560
원주: 499
부산광역시: 460
Other values (2): 641
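The short summary keeps the five most frequent 응시지역 values and folds the remaining two (대전광역시 389 and 광주광역시 252) into one "Other values (2)" row of 641. A minimal sketch of such a top-n summary follows. The helper name `top_with_other` is hypothetical and is not a ydata-profiling function.

```python
from collections import Counter


def top_with_other(counts: Counter, n: int) -> list[tuple[str, int]]:
    """Keep the n most common values and fold the rest into one summary row."""
    ranked = counts.most_common()
    top = ranked[:n]
    rest = ranked[n:]
    if rest:
        top.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    return top


# 응시지역 (exam region) counts from the Common Values table.
region_counts = Counter({"서울특별시": 7241, "전주": 599, "대구광역시": 560, "원주": 499,
                         "부산광역시": 460, "대전광역시": 389, "광주광역시": 252})
for value, count in top_with_other(region_counts, 5):
    print(value, count)
```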

Length

Max length: 5
Median length: 5
Mean length: 4.6706
Min length: 2
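The length statistics count characters, not bytes. A Hangul syllable such as 서 counts as 1, although it takes 3 bytes in UTF-8. Weighting each value's length by its count reproduces the Mean length: (5 × 8902 + 2 × 1098) / 10000 = 4.6706, where 8902 rows have five-character region names and 1098 rows have two-character names. The sketch below assumes Python's `len`, which counts code points, matches the profiler's notion of length. That assumption holds for these precomposed Hangul strings.

```python
def length_stats(counts: dict[str, int]) -> tuple[int, float, int]:
    """(min, mean, max) length in characters, each value weighted by its count."""
    total = sum(counts.values())
    weighted = sum(len(value) * count for value, count in counts.items())
    lengths = [len(value) for value in counts]
    return min(lengths), weighted / total, max(lengths)


# 응시지역 (exam region) counts from the Common Values table.
region_counts = {"서울특별시": 7241, "전주": 599, "대구광역시": 560, "원주": 499,
                 "부산광역시": 460, "대전광역시": 389, "광주광역시": 252}
print(length_stats(region_counts))  # (2, 4.6706, 5)
```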

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 서울특별시
4th row: 서울특별시
5th row: 부산광역시

Common Values

Value        Count   Frequency (%)
서울특별시    7241   72.4%
전주           599   6.0%
대구광역시     560   5.6%
원주           499   5.0%
부산광역시     460   4.6%
대전광역시     389   3.9%
광주광역시     252   2.5%

[Figure: Histogram of lengths of the category]


졸업여부 (graduation status)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
졸업예정: 9270
졸업: 730

Length

Max length: 4
Median length: 4
Mean length: 3.854
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: 졸업예정
3rd row: 졸업예정
4th row: 졸업예정
5th row: 졸업예정

Common Values

Value                          Count   Frequency (%)
졸업예정 (expected graduate)    9270   92.7%
졸업 (graduated)                 730   7.3%

[Figure: Histogram of lengths of the category]


합격여부 (pass/fail result)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
합격: 9327
불합격: 581
결시: 91
응시결격: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0583
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%
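In these statistics, "Distinct" counts the different values in a column. "Unique" counts the values that occur exactly once. Here the only such value is 응시결격, with a single row, hence Unique = 1. The same definition gives 일련번호 10000 distinct values that are all unique. The sketch computes both from the 합격여부 counts in the Common Values table.

```python
from collections import Counter

# 합격여부 (pass/fail result) counts from the Common Values table.
result_counts = Counter({"합격": 9327, "불합격": 581, "결시": 91, "응시결격": 1})

n = sum(result_counts.values())
distinct = len(result_counts)                               # different values
unique = sum(1 for c in result_counts.values() if c == 1)   # values seen exactly once
print(distinct, unique, f"{unique / n:.2%}")  # 4 1 0.01%
```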

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 합격
5th row: 합격

Common Values

Value                   Count   Frequency (%)
합격 (pass)              9327   93.3%
불합격 (fail)             581   5.8%
결시 (absent)              91   0.9%
응시결격 (ineligible)       1   < 0.1%

[Figure: Histogram of lengths of the category]


학교소재지 (school location)
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시: 2520
전라북도: 1695
경상북도: 1398
대전광역시: 1044
부산광역시: 836
Other values (6): 2507

Length

Max length: 5
Median length: 4
Mean length: 4.3432
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전라북도
2nd row: 전라북도
3rd row: 서울특별시
4th row: 전라남도
5th row: 부산광역시

Common Values

Value        Count   Frequency (%)
서울특별시    2520   25.2%
전라북도      1695   17.0%
경상북도      1398   14.0%
대전광역시    1044   10.4%
부산광역시     836   8.4%
강원도         812   8.1%
전라남도       625   6.2%
충청북도       574   5.7%
경기도         320   3.2%
대구광역시     172   1.7%

[Figure: Histogram of lengths of the category]

Interactions

[Figures: pairwise interaction plots of the numeric variables 연도, 회차 and 일련번호 (9 plots)]

Correlations

            연도   회차   일련번호  성별   연령대  응시지역  졸업여부  합격여부  학교소재지
연도        1.000  1.000  0.991    0.193  0.181   0.547    0.124    0.095    0.303
회차        1.000  1.000  0.991    0.202  0.179   0.549    0.123    0.097    0.304
일련번호    0.991  0.991  1.000    0.204  0.192   0.544    0.117    0.103    0.326
성별        0.193  0.202  0.204    1.000  0.093   0.092    0.144    0.143    0.072
연령대      0.181  0.179  0.192    0.093  1.000   0.129    0.195    0.153    0.121
응시지역    0.547  0.549  0.544    0.092  0.129   1.000    0.071    0.046    0.820
졸업여부    0.124  0.123  0.117    0.144  0.195   0.071    1.000    0.714    0.108
합격여부    0.095  0.097  0.103    0.143  0.153   0.046    0.714    1.000    0.099
학교소재지  0.303  0.304  0.326    0.072  0.121   0.820    0.108    0.099    1.000

            응시지역  성별   합격여부  학교소재지  졸업여부  연령대
응시지역    1.000     0.098  0.032     0.594       0.076     0.082
성별        0.098     1.000  0.095     0.069       0.092     0.114
합격여부    0.032     0.095  1.000     0.060       0.509     0.125
학교소재지  0.594     0.069  0.060     1.000       0.103     0.066
졸업여부    0.076     0.092  0.509     0.103       1.000     0.239
연령대      0.082     0.114  0.125     0.066       0.239     1.000
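The six-variable matrix above contains only categorical columns. The extracted report does not name the coefficient behind it. For pairs of categorical columns, ydata-profiling's documentation describes a Cramér's V statistic that is bias-corrected. The sketch below computes the plain, uncorrected Cramér's V, sqrt(chi² / (n · (k - 1))) with k = min(rows, columns), on two hypothetical 2 × 2 tables. It only illustrates what the statistic measures and does not reproduce the numbers above.

```python
import math


def cramers_v(table: list[list[int]]) -> float:
    """Uncorrected Cramér's V for an r x c contingency table of counts.

    Assumes every row and column total is positive (no empty categories).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical 2 x 2 tables, not taken from this dataset.
print(cramers_v([[50, 0], [0, 50]]))    # 1.0 (perfect association)
print(cramers_v([[25, 25], [25, 25]]))  # 0.0 (independence)
```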

            연도   회차   일련번호  성별   연령대  응시지역  졸업여부  합격여부  학교소재지
연도        1.000  1.000  0.999    0.155  0.075   0.317    0.094    0.058    0.134
회차        1.000  1.000  0.999    0.155  0.075   0.317    0.094    0.058    0.134
일련번호    0.999  0.999  1.000    0.157  0.081   0.314    0.090    0.062    0.145
성별        0.155  0.155  0.157    1.000  0.114   0.098    0.092    0.095    0.069
연령대      0.075  0.075  0.081    0.114  1.000   0.082    0.239    0.125    0.066
응시지역    0.317  0.317  0.314    0.098  0.082   1.000    0.076    0.032    0.594
졸업여부    0.094  0.094  0.090    0.092  0.239   0.076    1.000    0.509    0.103
합격여부    0.058  0.058  0.062    0.095  0.125   0.032    0.509    1.000    0.060
학교소재지  0.134  0.134  0.145    0.069  0.066   0.594    0.103    0.060    1.000

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
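The nullity-by-column view summarizes how many cells in each column are missing or present. In this dataset every column is complete (Missing cells: 0). The toy records below are hypothetical and only show how the per-column counts are formed, with None as the assumed marker for a missing cell.

```python
# Hypothetical toy records (not from this dataset); None marks a missing cell.
rows = [
    {"연도": 2005, "합격여부": "합격"},
    {"연도": 2021, "합격여부": None},
    {"연도": None, "합격여부": "불합격"},
]
columns = ["연도", "합격여부"]

# Per-column counts of missing and present cells.
nulls = {col: sum(1 for row in rows if row.get(col) is None) for col in columns}
non_null = {col: len(rows) - nulls[col] for col in columns}
print(nulls, non_null)  # {'연도': 1, '합격여부': 1} {'연도': 2, '합격여부': 2}
```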

Sample

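First rows

      | 연도 | 직종   | 회차 | 일련번호 | 성별 | 연령대 | 응시지역   | 졸업여부 | 합격여부 | 학교소재지
5374  | 2005 | 한의사 | 60   | 5375     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라북도
3615  | 2004 | 한의사 | 59   | 3616     |      | 30     | 서울특별시 | 졸업예정 | 합격     | 전라북도
9764  | 2010 | 한의사 | 65   | 9765     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 서울특별시
5679  | 2006 | 한의사 | 61   | 5680     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라남도
13822 | 2015 | 한의사 | 70   | 13823    |      | 20     | 부산광역시 | 졸업예정 | 합격     | 부산광역시
8934  | 2009 | 한의사 | 64   | 8935     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라북도
5144  | 2005 | 한의사 | 60   | 5145     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라북도
18156 | 2020 | 한의사 | 75   | 18157    |      | 20     | 전주       | 졸업예정 | 합격     | 전라북도
2899  | 2003 | 한의사 | 58   | 2900     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 충청북도
15932 | 2017 | 한의사 | 72   | 15933    |      | 40     | 원주       | 졸업예정 | 합격     | 충청북도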
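
Last rows

      | 연도 | 직종   | 회차 | 일련번호 | 성별 | 연령대 | 응시지역   | 졸업여부 | 합격여부 | 학교소재지
2882  | 2003 | 한의사 | 58   | 2883     |      | 40     | 서울특별시 | 졸업예정 | 합격     | 충청북도
18391 | 2021 | 한의사 | 76   | 18392    |      | 40     | 서울특별시 | 졸업     | 불합격   | 서울특별시
204   | 2000 | 한의사 | 55   | 205      |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라남도
13780 | 2015 | 한의사 | 70   | 13781    |      | 20     | 서울특별시 | 졸업예정 | 합격     | 서울특별시
3273  | 2003 | 한의사 | 58   | 3274     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 강원도
17716 | 2020 | 한의사 | 75   | 17717    |      | 20     | 서울특별시 | 졸업예정 | 합격     | 서울특별시
7870  | 2008 | 한의사 | 63   | 7871     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 서울특별시
3919  | 2004 | 한의사 | 59   | 3920     |      | 20     | 서울특별시 | 졸업예정 | 합격     | 강원도
19361 | 2022 | 한의사 | 77   | 19362    |      | 20     | 서울특별시 | 졸업예정 | 합격     | 경기도
10782 | 2011 | 한의사 | 66   | 10783    |      | 20     | 서울특별시 | 졸업예정 | 합격     | 전라북도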