Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 898.4 KiB
Average record size in memory: 92.0 B
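
The average record size follows from the total size and the row count. A minimal sketch of that arithmetic, using only the figures listed above (the function names are illustrative and are not part of ydata-profiling):

```python
# Figures copied from the "Dataset statistics" table.
TOTAL_SIZE_KIB = 898.4
N_OBSERVATIONS = 10_000
N_VARIABLES = 10
MISSING_CELLS = 0


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Convert KiB to bytes and divide by the number of rows."""
    return total_kib * 1024 / n_rows


def missing_cell_share(missing: int, n_rows: int, n_cols: int) -> float:
    """Missing cells as a percentage of all cells (rows x columns)."""
    return 100.0 * missing / (n_rows * n_cols)


avg = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
share = missing_cell_share(MISSING_CELLS, N_OBSERVATIONS, N_VARIABLES)
print(f"{avg:.1f} B per record, {share:.1f}% missing cells")
```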

Variable types

Numeric: 4
Categorical: 6

Dataset

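Description: Provides information for analyzing the status of candidates for the national pharmacist licensing examination (연도 year, 직종 occupation type, 회차 exam round, 성별 sex, 연령대 age group, 응시지역 exam region, 졸업여부 graduation status, 합격여부 pass/fail result, 학교소재지 school location) in a form that cannot identify individuals.
URL: https://www.data.go.kr/data/15060460/fileData.do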

Alerts

연도 is highly overall correlated with 회차 and 2 other fields | High correlation
회차 is highly overall correlated with 연도 and 2 other fields | High correlation
일련번호 is highly overall correlated with 연도 and 2 other fields | High correlation
직종 is highly overall correlated with 연도 and 3 other fields | High correlation
응시지역 is highly overall correlated with 직종 and 1 other field | High correlation
학교소재지 is highly overall correlated with 응시지역 | High correlation
응시지역 is highly imbalanced (52.3%) | Imbalance
졸업여부 is highly imbalanced (55.1%) | Imbalance
합격여부 is highly imbalanced (58.9%) | Imbalance
일련번호 has unique values | Unique
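
The imbalance percentages in the alerts are consistent with one minus the normalized Shannon entropy of the category counts. The sketch below assumes that definition (the report itself does not state the formula) and applies it to the counts listed in the Variables section; `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits)
    of the class proportions and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Category counts copied from the Variables section of this report.
category_counts = {
    "응시지역": [8029, 608, 469, 457, 437],
    "졸업여부": [8218, 1728, 54],
    "합격여부": [8265, 1235, 493, 7],
}

for name, counts in category_counts.items():
    print(f"{name}: {imbalance_score(counts):.1%}")
```

Under this assumption the scores should land close to the 52.3%, 55.1%, and 58.9% reported in the alerts.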

Reproduction

Analysis started: 2023-12-12 19:11:45.375066
Analysis finished: 2023-12-12 19:11:49.340506
Duration: 3.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
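
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the configuration actually used here: the CSV file name, the `cp949` encoding, and the report title are assumptions, and `build_report` is a hypothetical helper.

```python
def build_report(csv_path: str, output_html: str = "report.html") -> None:
    """Profile a CSV file and write the result as an HTML report.

    Assumptions (not stated in the report): the input is a CSV file and
    its encoding is cp949, which is common for Korean public datasets.
    """
    # Imported lazily so this module loads even without the libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding="cp949")
    profile = ProfileReport(df, title="Pharmacist exam candidates")
    profile.to_file(output_html)


# Usage (hypothetical file name; replace with the downloaded file):
# build_report("pharmacist_exam.csv")
```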

Variables

연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.7886
Minimum: 2000
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2000
5th percentile: 2000
Q1: 2005
Median: 2012
Q3: 2019
95th percentile: 2023
Maximum: 2023
Range: 23
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 7.3426359
Coefficient of variation (CV): 0.0036498049
Kurtosis: -1.3233147
Mean: 2011.7886
Median Absolute Deviation (MAD): 7
Skewness: -0.082270538
Sum: 20117886
Variance: 53.914301
Monotonicity: Not monotonic
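
Several of the 연도 statistics are derived from one another. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the other reported values, as a consistency check rather than a description of how the tool computes them.

```python
# Values copied from the 연도 statistics above.
n = 10_000
mean = 2011.7886
std = 7.3426359
minimum, q1, q3, maximum = 2000, 2005, 2019, 2023

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(value_range, iqr)
print(f"CV={cv:.10f} variance={variance:.6f} sum={total:.0f}")
```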
Histogram with fixed size bins (bins=24)

Value | Count | Frequency (%)
2000 | 700 | 7.0%
2020 | 570 | 5.7%
2023 | 508 | 5.1%
2018 | 504 | 5.0%
2019 | 500 | 5.0%
2021 | 481 | 4.8%
2017 | 475 | 4.8%
2012 | 475 | 4.8%
2022 | 462 | 4.6%
2016 | 451 | 4.5%
Other values (14) | 4874 | 48.7%

Value | Count | Frequency (%)
2000 | 700 | 7.0%
2001 | 368 | 3.7%
2002 | 359 | 3.6%
2003 | 371 | 3.7%
2004 | 382 | 3.8%
2005 | 365 | 3.6%
2006 | 422 | 4.2%
2007 | 393 | 3.9%
2008 | 386 | 3.9%
2009 | 401 | 4.0%

Value | Count | Frequency (%)
2023 | 508 | 5.1%
2022 | 462 | 4.6%
2021 | 481 | 4.8%
2020 | 570 | 5.7%
2019 | 500 | 5.0%
2018 | 504 | 5.0%
2017 | 475 | 4.8%
2016 | 451 | 4.5%
2015 | 426 | 4.3%
2014 | 82 | 0.8%

직종
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
약사(4년제) | 5677
약사(6년제) | 4323

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 약사(4년제)
2nd row: 약사(4년제)
3rd row: 약사(6년제)
4th row: 약사(4년제)
5th row: 약사(4년제)

Common Values

Value | Count | Frequency (%)
약사(4년제) | 5677 | 56.8%
약사(6년제) | 4323 | 43.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated above]

회차
Real number (ℝ)

HIGH CORRELATION 

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.7477
Minimum: 50
Maximum: 74
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 50
5th percentile: 51
Q1: 56
Median: 63
Q3: 70
95th percentile: 74
Maximum: 74
Range: 24
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 7.4106637
Coefficient of variation (CV): 0.11810256
Kurtosis: -1.2857631
Mean: 62.7477
Median Absolute Deviation (MAD): 7
Skewness: -0.10903165
Sum: 627477
Variance: 54.917937
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=25)

Value | Count | Frequency (%)
71 | 570 | 5.7%
74 | 508 | 5.1%
69 | 504 | 5.0%
70 | 500 | 5.0%
72 | 481 | 4.8%
63 | 475 | 4.8%
68 | 475 | 4.8%
73 | 462 | 4.6%
67 | 451 | 4.5%
66 | 426 | 4.3%
Other values (15) | 5148 | 51.5%

Value | Count | Frequency (%)
50 | 409 | 4.1%
51 | 291 | 2.9%
52 | 368 | 3.7%
53 | 359 | 3.6%
54 | 371 | 3.7%
55 | 382 | 3.8%
56 | 365 | 3.6%
57 | 422 | 4.2%
58 | 393 | 3.9%
59 | 386 | 3.9%

Value | Count | Frequency (%)
74 | 508 | 5.1%
73 | 462 | 4.6%
72 | 481 | 4.8%
71 | 570 | 5.7%
70 | 500 | 5.0%
69 | 504 | 5.0%
68 | 475 | 4.8%
67 | 451 | 4.5%
66 | 426 | 4.3%
65 | 82 | 0.8%

일련번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20925.55
Minimum: 3
Maximum: 41599
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3
5th percentile: 2060.8
Q1: 10580.5
Median: 20881
Q3: 31438.25
95th percentile: 39602.25
Maximum: 41599
Range: 41596
Interquartile range (IQR): 20857.75

Descriptive statistics

Standard deviation: 12031.986
Coefficient of variation (CV): 0.57499018
Kurtosis: -1.2043298
Mean: 20925.55
Median Absolute Deviation (MAD): 10425.5
Skewness: -0.0088931455
Sum: 2.092555 × 10^8
Variance: 1.4476869 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
12885 | 1 | < 0.1%
33302 | 1 | < 0.1%
40997 | 1 | < 0.1%
16732 | 1 | < 0.1%
30492 | 1 | < 0.1%
39560 | 1 | < 0.1%
15112 | 1 | < 0.1%
11507 | 1 | < 0.1%
32401 | 1 | < 0.1%
9827 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
5 | 1 | < 0.1%
7 | 1 | < 0.1%
10 | 1 | < 0.1%
11 | 1 | < 0.1%
19 | 1 | < 0.1%
20 | 1 | < 0.1%
22 | 1 | < 0.1%
24 | 1 | < 0.1%
41 | 1 | < 0.1%

Value | Count | Frequency (%)
41599 | 1 | < 0.1%
41598 | 1 | < 0.1%
41595 | 1 | < 0.1%
41590 | 1 | < 0.1%
41588 | 1 | < 0.1%
41579 | 1 | < 0.1%
41569 | 1 | < 0.1%
41564 | 1 | < 0.1%
41560 | 1 | < 0.1%
41558 | 1 | < 0.1%

성별
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
6023
3977

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 6023 | 60.2%
 | 3977 | 39.8%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated above]

연령대
Real number (ℝ)

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.566
Minimum: 20
Maximum: 70
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20
5th percentile: 20
Q1: 20
Median: 20
Q3: 30
95th percentile: 30
Maximum: 70
Range: 50
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 5.9587914
Coefficient of variation (CV): 0.25285544
Kurtosis: 4.5708362
Mean: 23.566
Median Absolute Deviation (MAD): 0
Skewness: 1.86007
Sum: 235660
Variance: 35.507195
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Value | Count | Frequency (%)
20 | 6941 | 69.4%
30 | 2654 | 26.5%
40 | 320 | 3.2%
50 | 70 | 0.7%
60 | 13 | 0.1%
70 | 2 | < 0.1%

Value | Count | Frequency (%)
20 | 6941 | 69.4%
30 | 2654 | 26.5%
40 | 320 | 3.2%
50 | 70 | 0.7%
60 | 13 | 0.1%
70 | 2 | < 0.1%

Value | Count | Frequency (%)
70 | 2 | < 0.1%
60 | 13 | 0.1%
50 | 70 | 0.7%
40 | 320 | 3.2%
30 | 2654 | 26.5%
20 | 6941 | 69.4%
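
Because 연령대 takes only six distinct values, its mean, sum, variance, and standard deviation can be recomputed directly from the frequency table. The sketch below assumes the reported variance is the sample variance (divisor n - 1), which is the pandas default; the report does not say so explicitly.

```python
import math

# Frequency table for 연령대 copied from above: value -> count.
freq = {20: 6941, 30: 2654, 40: 320, 50: 70, 60: 13, 70: 2}

n = sum(freq.values())
total = sum(value * count for value, count in freq.items())
mean = total / n

# Sample variance (divisor n - 1); an assumption about the report's method.
sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = sum_sq_dev / (n - 1)
std = math.sqrt(variance)

print(f"n={n} sum={total} mean={mean:.3f}")
print(f"variance={variance:.6f} std={std:.7f}")
```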

응시지역
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시 | 8029
대전광역시 | 608
광주광역시 | 469
부산광역시 | 457
대구광역시 | 437

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울특별시
2nd row: 서울특별시
3rd row: 부산광역시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
서울특별시 | 8029 | 80.3%
대전광역시 | 608 | 6.1%
광주광역시 | 469 | 4.7%
부산광역시 | 457 | 4.6%
대구광역시 | 437 | 4.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated above]

졸업여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
졸업예정 | 8218
졸업 | 1728
 | 54

Length

Max length: 4
Median length: 4
Mean length: 3.6382
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 졸업예정
2nd row: 졸업예정
3rd row: 졸업예정
4th row: 졸업
5th row: 졸업예정

Common Values

Value | Count | Frequency (%)
졸업예정 | 8218 | 82.2%
졸업 | 1728 | 17.3%
 | 54 | 0.5%
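
The third 졸업여부 category has no visible label, yet the column reports 0 missing values and a minimum length of 1. One reading, which is an assumption and not something the report states, is that those 54 cells hold a single whitespace character. The sketch below checks that this reading reproduces the reported mean length and shows one way to count such cells; `count_blank` and the miniature `sample` list are hypothetical.

```python
# Labelled counts copied from the table above.
counts = {"졸업예정": 8218, "졸업": 1728}
blank_count = 54
blank_len = 1  # assumption: one character, per "Min length: 1"

total_len = sum(len(label) * c for label, c in counts.items())
total_len += blank_len * blank_count
mean_len = total_len / (sum(counts.values()) + blank_count)
print(f"mean length = {mean_len:.4f}")  # compare with the reported mean length


def count_blank(values):
    """Count entries that are None or consist only of whitespace."""
    return sum(1 for v in values if v is None or str(v).strip() == "")


# Hypothetical miniature column; " " stands in for the unlabelled category.
sample = ["졸업예정", "졸업", " ", "졸업예정", " "]
print(count_blank(sample))
```

If the reading holds, such cells could be treated as missing before profiling, which would raise the missing-cell count for this column above zero.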

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
졸업예정 | 8218 | 82.6%
졸업 | 1728 | 17.4%

합격여부
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
합격 | 8265
불합격 | 1235
결시 | 493
응시결격 | 7

Length

Max length: 4
Median length: 2
Mean length: 2.1249
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 합격
2nd row: 합격
3rd row: 합격
4th row: 결시
5th row: 합격

Common Values

Value | Count | Frequency (%)
합격 | 8265 | 82.7%
불합격 | 1235 | 12.3%
결시 | 493 | 4.9%
응시결격 | 7 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values tabulated above]

학교소재지
Categorical

HIGH CORRELATION 

Distinct: 30
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
서울특별시 | 4192
광주광역시 | 917
경상북도 | 840
부산광역시 | 752
경기도 | 707
Other values (25) | 2592

Length

Max length: 5
Median length: 5
Mean length: 4.4644
Min length: 2

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 충청북도
2nd row: 대전광역시
3rd row: 부산광역시
4th row: 서울특별시
5th row: 서울특별시

Common Values

Value | Count | Frequency (%)
서울특별시 | 4192 | 41.9%
광주광역시 | 917 | 9.2%
경상북도 | 840 | 8.4%
부산광역시 | 752 | 7.5%
경기도 | 707 | 7.1%
전라북도 | 600 | 6.0%
충청북도 | 361 | 3.6%
대전광역시 | 291 | 2.9%
강원도 | 272 | 2.7%
필리핀 | 204 | 2.0%
Other values (20) | 864 | 8.6%

Length

Histogram of lengths of the category

Interactions

[Figure: 16 interaction plots]

Correlations

Variable | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 0.999 | 0.994 | 0.985 | 0.113 | 0.151 | 0.568 | 0.247 | 0.113 | 0.366
직종 | 0.999 | 1.000 | 0.999 | 0.997 | 0.008 | 0.238 | 0.467 | 0.095 | 0.302 | 0.416
회차 | 0.994 | 0.999 | 1.000 | 0.991 | 0.126 | 0.160 | 0.578 | 0.256 | 0.314 | 0.374
일련번호 | 0.985 | 0.997 | 0.991 | 1.000 | 0.121 | 0.157 | 0.581 | 0.258 | 0.330 | 0.361
성별 | 0.113 | 0.008 | 0.126 | 0.121 | 1.000 | 0.288 | 0.079 | 0.063 | 0.169 | 0.383
연령대 | 0.151 | 0.238 | 0.160 | 0.157 | 0.288 | 1.000 | 0.156 | 0.518 | 0.285 | 0.571
응시지역 | 0.568 | 0.467 | 0.578 | 0.581 | 0.079 | 0.156 | 1.000 | 0.133 | 0.106 | 0.914
졸업여부 | 0.247 | 0.095 | 0.256 | 0.258 | 0.063 | 0.518 | 0.133 | 1.000 | 0.385 | 0.634
합격여부 | 0.113 | 0.302 | 0.314 | 0.330 | 0.169 | 0.285 | 0.106 | 0.385 | 1.000 | 0.419
학교소재지 | 0.366 | 0.416 | 0.374 | 0.361 | 0.383 | 0.571 | 0.914 | 0.634 | 0.419 | 1.000

Variable | 성별 | 직종 | 학교소재지 | 합격여부 | 졸업여부 | 응시지역
성별 | 1.000 | 0.005 | 0.304 | 0.112 | 0.104 | 0.096
직종 | 0.005 | 1.000 | 0.331 | 0.201 | 0.158 | 0.567
학교소재지 | 0.304 | 0.331 | 1.000 | 0.229 | 0.374 | 0.658
합격여부 | 0.112 | 0.201 | 0.229 | 1.000 | 0.376 | 0.086
졸업여부 | 0.104 | 0.158 | 0.374 | 0.376 | 1.000 | 0.100
응시지역 | 0.096 | 0.567 | 0.658 | 0.086 | 0.100 | 1.000

Variable | 연도 | 회차 | 일련번호 | 연령대 | 직종 | 성별 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
연도 | 1.000 | 1.000 | 0.999 | 0.147 | 0.976 | 0.097 | 0.277 | 0.145 | 0.170 | 0.128
회차 | 1.000 | 1.000 | 0.999 | 0.146 | 0.977 | 0.096 | 0.277 | 0.158 | 0.192 | 0.127
일련번호 | 0.999 | 0.999 | 1.000 | 0.146 | 0.950 | 0.093 | 0.278 | 0.159 | 0.202 | 0.122
연령대 | 0.147 | 0.146 | 0.146 | 1.000 | 0.171 | 0.207 | 0.106 | 0.249 | 0.187 | 0.268
직종 | 0.976 | 0.977 | 0.950 | 0.171 | 1.000 | 0.005 | 0.567 | 0.158 | 0.201 | 0.331
성별 | 0.097 | 0.096 | 0.093 | 0.207 | 0.005 | 1.000 | 0.096 | 0.104 | 0.112 | 0.304
응시지역 | 0.277 | 0.277 | 0.278 | 0.106 | 0.567 | 0.096 | 1.000 | 0.100 | 0.086 | 0.658
졸업여부 | 0.145 | 0.158 | 0.159 | 0.249 | 0.158 | 0.104 | 0.100 | 1.000 | 0.376 | 0.374
합격여부 | 0.170 | 0.192 | 0.202 | 0.187 | 0.201 | 0.112 | 0.086 | 0.376 | 1.000 | 0.229
학교소재지 | 0.128 | 0.127 | 0.122 | 0.268 | 0.331 | 0.304 | 0.658 | 0.374 | 0.229 | 1.000
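
This section does not say which coefficient each matrix uses. For pairs of categorical variables, a common choice is Cramér's V, computed from the chi-squared statistic of a contingency table. The sketch below implements the plain (uncorrected) version on hypothetical tables; it is an illustration of the measure, not a reproduction of the values above, and profiling tools may apply a bias correction that this code omits.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums)) - 1
    if k <= 0 or n <= 0:
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Hypothetical 2x2 tables: perfect association and no association.
perfect = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
print(cramers_v(perfect), cramers_v(independent))
```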

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

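Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
12884 | 2007 | 약사(4년제) | 58 | 12885 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 충청북도
17620 | 2010 | 약사(4년제) | 61 | 17621 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 대전광역시
26584 | 2016 | 약사(6년제) | 67 | 26585 |  | 20 | 부산광역시 | 졸업예정 | 합격 | 부산광역시
18332 | 2010 | 약사(4년제) | 61 | 18333 |  | 40 | 서울특별시 | 졸업 | 결시 | 서울특별시
5390 | 2002 | 약사(4년제) | 53 | 5391 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
23377 | 2014 | 약사(4년제) | 65 | 23378 |  | 30 | 서울특별시 | 졸업 | 합격 | 서울특별시
30105 | 2018 | 약사(6년제) | 69 | 30106 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
2475 | 2000 | 약사(4년제) | 51 | 2476 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 경기도
39734 | 2023 | 약사(6년제) | 74 | 39735 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 강원도
34645 | 2020 | 약사(6년제) | 71 | 34646 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시

Index | 연도 | 직종 | 회차 | 일련번호 | 성별 | 연령대 | 응시지역 | 졸업여부 | 합격여부 | 학교소재지
14015 | 2007 | 약사(4년제) | 58 | 14016 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
3730 | 2001 | 약사(4년제) | 52 | 3731 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 강원도
30254 | 2018 | 약사(6년제) | 69 | 30255 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
2104 | 2000 | 약사(4년제) | 51 | 2105 |  | 30 | 서울특별시 | 졸업 | 합격 | 광주광역시
28861 | 2017 | 약사(6년제) | 68 | 28862 |  | 20 | 광주광역시 | 졸업예정 | 합격 | 광주광역시
39862 | 2023 | 약사(6년제) | 74 | 39863 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 충청북도
20666 | 2011 | 약사(4년제) | 62 | 20667 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
13576 | 2007 | 약사(4년제) | 58 | 13577 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 광주광역시
21524 | 2012 | 약사(4년제) | 63 | 21525 |  | 20 | 서울특별시 | 졸업예정 | 합격 | 서울특별시
20143 | 2011 | 약사(4년제) | 62 | 20144 |  | 30 | 서울특별시 | 졸업예정 | 합격 | 충청북도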