Overview

Dataset statistics

Number of variables: 12
Number of observations: 2518
Missing cells: 4787
Missing cells (%): 15.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 241.1 KiB
Average record size in memory: 98.1 B
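The missing-cell total is the sum of the per-variable Missing counts in the Variables section. The percentage divides that total by variables × observations (12 × 2518 = 30216 cells). A quick check, with the per-column counts copied from this report:

```python
# Per-column missing counts as reported in the Variables section of this profile.
missing_by_column = {
    "Unnamed: 0": 2518,
    "지역": 0,
    "설립": 0,
    "대학명": 1,
    "학과분류": 0,
    "Unnamed: 5": 0,
    "Unnamed: 6": 4,
    "Unnamed: 7": 0,
    "학제": 0,
    "입학정원": 15,
    "Unnamed: 10": 2249,
    "Unnamed: 11": 0,
}

n_observations = 2518
n_cells = n_observations * len(missing_by_column)  # 12 variables x 2518 rows
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / n_cells

print(missing_cells, f"{missing_pct:.1f}%")  # 4787 15.8%
```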

Variable types

Unsupported: 4
Categorical: 5
Text: 3

Dataset

Description: Admissions-related information for junior colleges nationwide
Author: 한국전문대학교육협의회 (Korean Council for University College Education)
URL: https://www.data.go.kr/data/3077936/fileData.do

Alerts

학과분류 is highly correlated overall with Unnamed: 5 [High correlation]
Unnamed: 5 is highly correlated overall with 학과분류 and 1 other field [High correlation]
학제 is highly correlated overall with Unnamed: 5 [High correlation]
설립 is highly imbalanced (87.1%) [Imbalance]
Unnamed: 0 has 2518 (100.0%) missing values [Missing]
Unnamed: 10 has 2249 (89.3%) missing values [Missing]
Unnamed: 0 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
입학정원 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
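The 87.1% imbalance figure can be reproduced from the 설립 value counts (2422, 83, 12, 1). This assumes ydata-profiling scores imbalance as one minus the normalised Shannon entropy, that is, the entropy divided by the log of the number of classes. A minimal sketch under that assumption:

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for perfectly balanced classes, approaching 1
    as a single class dominates (assumed to match ydata-profiling's measure)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # A single class has nothing to balance; 0.0 here is a convention of this sketch.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# 설립 value counts from this report: 사립, 공립, 국립, <NA>
score = imbalance_score([2422, 83, 12, 1])
print(f"{score:.1%}")  # 87.1%
```

Because the ratio H/log(k) is independent of the logarithm base, using natural logs gives the same score as base-2 entropy.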

Reproduction

Analysis started: 2023-12-12 21:30:24.241993
Analysis finished: 2023-12-12 21:30:25.180832
Duration: 0.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2518
Missing (%): 100.0%
Memory size: 22.3 KiB

지역 (region)
Categorical

Distinct: 18
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

경기: 703
경북: 215
부산: 175
서울: 173
전남: 150
Other values (13): 1102

Length

Max length: 4
Median length: 2
Mean length: 2.0007943
Min length: 2
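The reported mean implies that 2517 entries are two characters long and exactly one is four: (2517 × 2 + 4) / 2518 = 5038 / 2518 ≈ 2.0007943. The four-character entry appears to be the first row's "<NA>". It is listed under Unique and Sample even though Missing reports 0, and 설립 and 학제 show the same pattern. The sketch below recomputes the length summary from a length-to-count histogram. The histograms are derived from the counts in this report:

```python
import statistics

def length_summary(length_counts):
    """Summarise string lengths from a {length: number_of_values} histogram."""
    # Expand the histogram back into one length per value (2518 values here).
    lengths = [length for length, count in sorted(length_counts.items()) for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": round(statistics.mean(lengths), 7),
        "min": min(lengths),
    }

# 지역: 2517 two-character region names plus the four-character "<NA>" entry.
print(length_summary({2: 2517, 4: 1}))  # max 4, median 2, mean 2.0007943, min 2

# 학과분류: 인문사회계열 and 자연과학계열 (6 characters), 예체능계열 (5),
# 공학계열 (4) and the header label 대분류 (3), with counts from this report.
print(length_summary({6: 772 + 683, 5: 381, 4: 681, 3: 1}))  # max 6, median 6, mean 5.3065925, min 3
```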

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: 경북
3rd row: 경북
4th row: 경북
5th row: 경북

Common Values

Value | Count | Frequency (%)
경기 | 703 | 27.9%
경북 | 215 | 8.5%
부산 | 175 | 6.9%
서울 | 173 | 6.9%
전남 | 150 | 6.0%
대구 | 147 | 5.8%
대전 | 130 | 5.2%
경남 | 126 | 5.0%
충북 | 121 | 4.8%
강원 | 120 | 4.8%
Other values (8) | 458 | 18.2%

Length

Histogram of lengths of the category

설립 (establishment type)
Categorical

IMBALANCE

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

사립 (private): 2422
공립 (public): 83
국립 (national): 12
<NA>: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0007943
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: 사립
3rd row: 사립
4th row: 사립
5th row: 사립

Common Values

Value | Count | Frequency (%)
사립 | 2422 | 96.2%
공립 | 83 | 3.3%
국립 | 12 | 0.5%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category


대학명 (college name)
Text

Distinct: 136
Distinct (%): 5.4%
Missing: 1
Missing (%): < 0.1%
Memory size: 19.8 KiB

Length

Max length: 9
Median length: 8
Mean length: 6.2777116
Min length: 4

Characters and Unicode

Total characters: 15801
Distinct characters: 121
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
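The categories counted here are Unicode general categories. Python's standard unicodedata module returns them as two-letter codes (Lo, Lu, Po, and so on). Mapping those codes to the long names used in this report lets you reproduce the category counts; for instance, every Hangul syllable in 대학명 is an Other Letter (Lo). unicodedata does not expose scripts or blocks, so this sketch covers categories only. The name table lists just the codes that occur in this report, and the second sample value below is hypothetical:

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
    "Cc": "Control",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["가톨릭상지대학교", "간호학과 (야간)"]))
```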

Unique

Unique: 7
Unique (%): 0.3%

Sample

1st row: 가톨릭상지대학교
2nd row: 가톨릭상지대학교
3rd row: 가톨릭상지대학교
4th row: 가톨릭상지대학교
5th row: 가톨릭상지대학교

Value | Count | Frequency (%)
대덕대학교 | 36 | 1.4%
두원공과대학교 | 36 | 1.4%
충청대학교 | 33 | 1.3%
대전과학기술대학교 | 33 | 1.3%
신구대학교 | 33 | 1.3%
동서울대학교 | 31 | 1.2%
연성대학교 | 31 | 1.2%
수원과학대학교 | 31 | 1.2%
장안대학교 | 31 | 1.2%
우송정보대학 | 31 | 1.2%
Other values (126) | 2191 | 87.0%

Most occurring characters

Value | Count | Frequency (%)
| 2828 | 17.9%
| 2755 | 17.4%
| 2358 | 14.9%
| 353 | 2.2%
| 323 | 2.0%
| 283 | 1.8%
| 258 | 1.6%
| 248 | 1.6%
| 242 | 1.5%
| 222 | 1.4%
Other values (111) | 5931 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15801 | 100.0%


Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15801 | 100.0%


Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15801 | 100.0%


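학과분류 (department classification, top level; first-row label: 대분류)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

인문사회계열 (humanities and social sciences): 772
자연과학계열 (natural sciences): 683
공학계열 (engineering): 681
예체능계열 (arts and physical education): 381
대분류 (header label): 1

Length

Max length: 6
Median length: 6
Mean length: 5.3065925
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 대분류
2nd row: 공학계열
3rd row: 공학계열
4th row: 공학계열
5th row: 공학계열

Common Values

Value | Count | Frequency (%)
인문사회계열 | 772 | 30.7%
자연과학계열 | 683 | 27.1%
공학계열 | 681 | 27.0%
예체능계열 | 381 | 15.1%
대분류 | 1 | < 0.1%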

Length

Histogram of lengths of the category


Unnamed: 5 (first-row label: 중분류, mid-level field)
Categorical

HIGH CORRELATION

Distinct: 50
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

사회과학: 340
간호보건: 294
경영,경제: 232
보건: 162
기계: 146
Other values (45): 1344

Length

Max length: 10
Median length: 9
Mean length: 3.9583002
Min length: 2

Unique

Unique: 9
Unique (%): 0.4%

Sample

1st row: 중분류
2nd row: 기계
3rd row: 기계
4th row: 전기,전자
5th row: 전기,전자,컴퓨터

Common Values

Value | Count | Frequency (%)
사회과학 | 340 | 13.5%
간호보건 | 294 | 11.7%
경영,경제 | 232 | 9.2%
보건 | 162 | 6.4%
기계 | 146 | 5.8%
컴퓨터,통신 | 140 | 5.6%
교육 | 139 | 5.5%
전기,전자,컴퓨터 | 128 | 5.1%
외식,영양 | 108 | 4.3%
응용예술 | 99 | 3.9%
Other values (40) | 730 | 29.0%

Length

Histogram of lengths of the category

Unnamed: 6 (first-row label: 소분류, sub-field)
Text

Distinct: 182
Distinct (%): 7.2%
Missing: 4
Missing (%): 0.2%
Memory size: 19.8 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.0632458
Min length: 2

Characters and Unicode

Total characters: 10215
Distinct characters: 191
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 23
Unique (%): 0.9%

Sample

1st row: 소분류
2nd row: 기계
3rd row: 자동차
4th row: 전기
5th row: 전자공학

Value | Count | Frequency (%)
사회복지 | 112 | 4.5%
유아교육학 | 101 | 4.0%
임상보건 | 80 | 3.2%
피부미용 | 76 | 3.0%
관광학 | 63 | 2.5%
식품조리 | 56 | 2.2%
전기 | 51 | 2.0%
방송공연 | 51 | 2.0%
보건행정 | 50 | 2.0%
간호 | 49 | 1.9%
Other values (172) | 1825 | 72.6%

Most occurring characters

Value | Count | Frequency (%)
| 734 | 7.2%
| 355 | 3.5%
| 328 | 3.2%
| 287 | 2.8%
| 227 | 2.2%
| 219 | 2.1%
| 201 | 2.0%
| 197 | 1.9%
| 196 | 1.9%
| 189 | 1.9%
Other values (181) | 7282 | 71.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 10032 | 98.2%
Other Punctuation | 175 | 1.7%
Lowercase Letter | 4 | < 0.1%
Dash Punctuation | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 734 | 7.3%
| 355 | 3.5%
| 328 | 3.3%
| 287 | 2.9%
| 227 | 2.3%
| 219 | 2.2%
| 201 | 2.0%
| 197 | 2.0%
| 196 | 2.0%
| 189 | 1.9%
Other values (178) | 7099 | 70.8%

Other Punctuation
Value | Count | Frequency (%)
, | 175 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 4 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 10032 | 98.2%
Common | 179 | 1.8%
Latin | 4 | < 0.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
, | 175 | 97.8%
- | 4 | 2.2%

Latin
Value | Count | Frequency (%)
e | 4 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 10032 | 98.2%
ASCII | 183 | 1.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
, | 175 | 95.6%
e | 4 | 2.2%
- | 4 | 2.2%

Unnamed: 7 (first-row label: 모집단위, recruitment unit)
Text

Distinct: 1188
Distinct (%): 47.2%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

Length

Max length: 21
Median length: 20
Mean length: 6.1199365
Min length: 3

Characters and Unicode

Total characters: 15410
Distinct characters: 342
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4

Unique

Unique: 939
Unique (%): 37.3%

Sample

1st row: 모집단위
2nd row: 철도운전시스템과
3rd row: 자동차모터스포츠과
4th row: 철도전기과
5th row: 부사관과

Value | Count | Frequency (%)
유아교육과 | 87 | 3.4%
간호학과 | 77 | 3.0%
사회복지과 | 73 | 2.8%
치위생과 | 53 | 2.1%
물리치료과 | 37 | 1.4%
안경광학과 | 30 | 1.2%
보건행정과 | 29 | 1.1%
작업치료과 | 29 | 1.1%
세무회계과 | 26 | 1.0%
임상병리과 | 25 | 1.0%
Other values (1180) | 2119 | 82.0%

Most occurring characters

Value | Count | Frequency (%)
| 2276 | 14.8%
| 397 | 2.6%
| 352 | 2.3%
| 335 | 2.2%
| 320 | 2.1%
| 302 | 2.0%
| 274 | 1.8%
| 271 | 1.8%
| 260 | 1.7%
| 256 | 1.7%
Other values (332) | 10367 | 67.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15100 | 98.0%
Uppercase Letter | 90 | 0.6%
Space Separator | 66 | 0.4%
Open Punctuation | 38 | 0.2%
Close Punctuation | 38 | 0.2%
Other Punctuation | 34 | 0.2%
Lowercase Letter | 21 | 0.1%
Dash Punctuation | 13 | 0.1%
Decimal Number | 9 | 0.1%
Control | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 2276 | 15.1%
| 397 | 2.6%
| 352 | 2.3%
| 335 | 2.2%
| 320 | 2.1%
| 302 | 2.0%
| 274 | 1.8%
| 271 | 1.8%
| 260 | 1.7%
| 256 | 1.7%
Other values (296) | 10057 | 66.6%

Uppercase Letter
Value | Count | Frequency (%)
I | 21 | 23.3%
T | 20 | 22.2%
P | 8 | 8.9%
K | 7 | 7.8%
C | 6 | 6.7%
D | 5 | 5.6%
S | 5 | 5.6%
V | 4 | 4.4%
O | 3 | 3.3%
R | 2 | 2.2%
Other values (6) | 9 | 10.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 8 | 38.1%
l | 4 | 19.0%
i | 2 | 9.5%
a | 1 | 4.8%
c | 1 | 4.8%
n | 1 | 4.8%
v | 1 | 4.8%
r | 1 | 4.8%
u | 1 | 4.8%
o | 1 | 4.8%

Other Punctuation
Value | Count | Frequency (%)
· | 22 | 64.7%
& | 10 | 29.4%
. | 1 | 2.9%
; | 1 | 2.9%

Space Separator
Value | Count | Frequency (%)
(space) | 66 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 38 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 38 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Decimal Number
Value | Count | Frequency (%)
3 | 9 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15100 | 98.0%
Common | 199 | 1.3%
Latin | 111 | 0.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
I | 21 | 18.9%
T | 20 | 18.0%
P | 8 | 7.2%
e | 8 | 7.2%
K | 7 | 6.3%
C | 6 | 5.4%
D | 5 | 4.5%
S | 5 | 4.5%
V | 4 | 3.6%
l | 4 | 3.6%
Other values (16) | 23 | 20.7%

Common
Value | Count | Frequency (%)
(space) | 66 | 33.2%
( | 38 | 19.1%
) | 38 | 19.1%
· | 22 | 11.1%
- | 13 | 6.5%
& | 10 | 5.0%
3 | 9 | 4.5%
(control character) | 1 | 0.5%
. | 1 | 0.5%
; | 1 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15095 | 98.0%
ASCII | 288 | 1.9%
None | 22 | 0.1%
Compat Jamo | 5 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 2276 | 15.1%
| 397 | 2.6%
| 352 | 2.3%
| 335 | 2.2%
| 320 | 2.1%
| 302 | 2.0%
| 274 | 1.8%
| 271 | 1.8%
| 260 | 1.7%
| 256 | 1.7%
Other values (295) | 10052 | 66.6%

ASCII
Value | Count | Frequency (%)
(space) | 66 | 22.9%
( | 38 | 13.2%
) | 38 | 13.2%
I | 21 | 7.3%
T | 20 | 6.9%
- | 13 | 4.5%
& | 10 | 3.5%
3 | 9 | 3.1%
P | 8 | 2.8%
e | 8 | 2.8%
Other values (25) | 57 | 19.8%

None
Value | Count | Frequency (%)
· | 22 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
| 5 | 100.0%
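The character tables for Unnamed: 7 flag two things worth cleaning. One character falls in the Control category, and five characters come from the Hangul Compatibility Jamo block (U+3130 to U+318F). Compatibility jamo are standalone consonants or vowels (such as ㄱ or ㅏ) rather than composed syllables; treating them as probable typos in department names is an assumption. None of these characters is visible in this export. A minimal sketch that reports such characters in a value; the sample inputs are hypothetical:

```python
import unicodedata

def suspect_characters(value):
    """Return (index, code point, reason) for control characters and
    Hangul Compatibility Jamo found in a string."""
    findings = []
    for i, ch in enumerate(value):
        if unicodedata.category(ch) == "Cc":
            findings.append((i, f"U+{ord(ch):04X}", "control"))
        elif 0x3130 <= ord(ch) <= 0x318F:
            findings.append((i, f"U+{ord(ch):04X}", "compatibility jamo"))
    return findings

def clean(value):
    """Drop control characters and surrounding whitespace. Jamo are only
    reported, because the intended syllable cannot be inferred automatically."""
    return "".join(ch for ch in value if unicodedata.category(ch) != "Cc").strip()

print(suspect_characters("간호학과\n"))   # hypothetical value with a trailing newline
print(suspect_characters("ㄱ간호학과"))   # hypothetical value with a stray jamo
print(clean("간호학과\n"))
```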

학제 (program length in years)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB

2: 1680
3: 746
4: 91
<NA>: 1

Length

Max length: 4
Median length: 1
Mean length: 1.0011914
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: 2
3rd row: 2
4th row: 3
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 1680 | 66.7%
3 | 746 | 29.6%
4 | 91 | 3.6%
<NA> | 1 | < 0.1%

Length

Histogram of lengths of the category


입학정원 (admission quota)
Unsupported

REJECTED  UNSUPPORTED

Missing: 15
Missing (%): 0.6%
Memory size: 19.8 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2249
Missing (%): 89.3%
Memory size: 19.8 KiB

Unnamed: 11
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 19.8 KiB
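pandas names a column "Unnamed: <position>" when its header cell is blank, which accounts for Unnamed: 0, 5, 6, 7, 10 and 11. The Sample section shows that row 0 holds a second header row (대분류, 중분류, 소분류, 모집단위 under 학과분류 and Unnamed: 5 to 7). When the file is read with a single header row, those label strings end up in the data. That is one plausible reason several columns are reported as unsupported, although the report does not confirm it. The sketch below merges two header rows into one name per column. The two rows are a hypothetical reconstruction from this report; the second-row labels for the last three columns are not visible here, so they are left blank:

```python
def merge_header_rows(top, sub):
    """Combine two header rows into one name per column.

    Both labels -> "top/sub"; one label -> that label; neither -> the
    pandas-style "Unnamed: <position>" so names line up with this report.
    """
    names = []
    for position, (upper, lower) in enumerate(zip(top, sub)):
        upper, lower = upper.strip(), lower.strip()
        if upper and lower:
            names.append(f"{upper}/{lower}")
        elif upper or lower:
            names.append(upper or lower)
        else:
            names.append(f"Unnamed: {position}")
    return names

# Hypothetical header rows inferred from this report.
top = ["", "지역", "설립", "대학명", "학과분류", "", "", "", "학제", "입학정원", "", ""]
sub = ["", "", "", "", "대분류", "중분류", "소분류", "모집단위", "", "", "", ""]
print(merge_header_rows(top, sub))
```

With pandas itself, `read_csv(..., header=[0, 1])` reads a two-row header into a MultiIndex, which can then be flattened the same way.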

Correlations

| 지역 | 설립 | 학과분류 | Unnamed: 5 | 학제
지역 | 1.000 | 0.357 | 0.312 | 0.468 | 0.289
설립 | 0.357 | 1.000 | 0.070 | 0.263 | 0.266
학과분류 | 0.312 | 0.070 | 1.000 | 0.999 | 0.292
Unnamed: 5 | 0.468 | 0.263 | 0.999 | 1.000 | 0.821
학제 | 0.289 | 0.266 | 0.292 | 0.821 | 1.000

| 설립 | 학과분류 | 지역 | Unnamed: 5 | 학제
설립 | 1.000 | 0.066 | 0.206 | 0.124 | 0.087
학과분류 | 0.066 | 1.000 | 0.178 | 0.982 | 0.280
지역 | 0.206 | 0.178 | 1.000 | 0.140 | 0.162
Unnamed: 5 | 0.124 | 0.982 | 0.140 | 1.000 | 0.569
학제 | 0.087 | 0.280 | 0.162 | 0.569 | 1.000

| 지역 | 설립 | 학과분류 | Unnamed: 5 | 학제
지역 | 1.000 | 0.206 | 0.178 | 0.140 | 0.162
설립 | 0.206 | 1.000 | 0.066 | 0.124 | 0.087
학과분류 | 0.178 | 0.066 | 1.000 | 0.982 | 0.280
Unnamed: 5 | 0.140 | 0.124 | 0.982 | 1.000 | 0.569
학제 | 0.162 | 0.087 | 0.280 | 0.569 | 1.000
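The association between 학과분류 and Unnamed: 5 (0.999 and 0.982 in the matrices above) is largely structural. Unnamed: 5 holds the mid-level field (중분류), and mid-level fields fall under top-level fields (대분류), so one nearly determines the other. For pairs of categorical variables, ydata-profiling's auto correlation is documented as using Cramér's V. The sketch below computes the plain, uncorrected statistic on toy data. The library's exact variant (for example, any bias correction) may differ, so the sketch is not meant to reproduce the numbers above:

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_total in x_counts.items():
        for y, y_total in y_counts.items():
            expected = x_total * y_total / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data: every mid-level field belongs to exactly one top-level field.
field = ["공학계열", "공학계열", "인문사회계열", "인문사회계열"]
mid_field = ["기계", "전기,전자", "교육", "사회과학"]
print(cramers_v(field, mid_field))  # 1.0 for a perfectly nested pair
```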

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
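The nullity heatmap correlates missingness rather than values. A minimal sketch, assuming a missingno-style definition: the Pearson correlation of the 0/1 "is missing" flags of two columns. The correlation is left undefined when a column is never or always missing, as Unnamed: 0 is here. The two columns below are toy data, with None marking a missing cell:

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation between the missingness flags of two columns.

    Returns None when either column is never or always missing, because the
    correlation is undefined for a constant indicator.
    """
    fa = [1.0 if v is None else 0.0 for v in a]
    fb = [1.0 if v is None else 0.0 for v in b]
    n = len(fa)
    ma, mb = sum(fa) / n, sum(fb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    va = sum((x - ma) ** 2 for x in fa)
    vb = sum((y - mb) ** 2 for y in fb)
    if va == 0 or vb == 0:
        return None
    return cov / math.sqrt(va * vb)

# Toy columns: a cell missing in one column tends to be present in the other.
col_a = [30, None, 80, 20, None]
col_b = [None, 20, 25, None, 20]
print(nullity_correlation(col_a, col_b))
```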

Sample

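| Unnamed: 0 | 지역 | 설립 | 대학명 | 학과분류 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | 학제 | 입학정원 | Unnamed: 10 | Unnamed: 11
0 | <NA> | <NA> | <NA> | <NA> | 대분류 | 중분류 | 소분류 | 모집단위 | <NA> |  |  |
1 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 공학계열 | 기계 | 기계 | 철도운전시스템과 | 2 | 30 | NaN | 30
2 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 공학계열 | 기계 | 자동차 | 자동차모터스포츠과 | 2 | 30 | NaN | 30
3 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 공학계열 | 전기,전자 | 전기 | 철도전기과 | 3 | 30 | NaN | 30
4 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 공학계열 | 전기,전자,컴퓨터 | 전자공학 | 부사관과 | 2 | 30 | NaN | 30
5 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 공학계열 | 컴퓨터,통신 | 정보통신 | 전자통신과 | 2 | 30 | NaN | 30
6 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 인문사회계열 | 경영,경제 | 경영 | 경영과 | 2 | NaN | 20 | 20
7 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 인문사회계열 | 경영,경제 | 세무회계 | 전산세무회계과 | 2 | 25 | NaN | 25
8 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 인문사회계열 | 교육 | 유아교육학 | 유아교육과 | 3 | 83 | NaN | 83
9 | <NA> | 경북 | 사립 | 가톨릭상지대학교 | 인문사회계열 | 사회과학 | 사회복지 | 사회복지과 | 2 | 80 | 25 | 105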
Unnamed: 0지역설립대학명학과분류Unnamed: 5Unnamed: 6Unnamed: 7학제입학정원Unnamed: 10Unnamed: 11
2508<NA>경북사립호산대학교예체능계열뷰티피부헤어뷰티디자인과230NaN30
2509<NA>경북사립호산대학교예체능계열응용예술방송공연뮤지컬과230NaN30
2510<NA>경북사립호산대학교예체능계열응용예술방송공연연기과320NaN20
2511<NA>경북사립호산대학교인문사회계열교육복지휴먼복지서비스학부275130205
2512<NA>경북사립호산대학교인문사회계열교육유아교육학유아교육과345NaN45
2513<NA>경북사립호산대학교자연과학계열간호간호학간호학과4148NaN148
2514<NA>경북사립호산대학교자연과학계열보건보건관리의료융합학부330NaN30
2515<NA>경북사립호산대학교자연과학계열보건임상보건물리치료과330NaN30
2516<NA>경북사립호산대학교자연과학계열보건임상보건방사선과326NaN26
2517<NA>경북사립호산대학교자연과학계열외식,영양식품조리호텔외식조리과235NaN35
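In every data row of the sample (rows 1 to 9 and 2508 to 2517), Unnamed: 11 equals 입학정원 plus Unnamed: 10 when a missing entry is treated as 0. For example, 80 + 25 = 105 in row 9 and 75 + 130 = 205 in row 2511. That suggests, though the report does not say so, that 입학정원 holds in-quota places, Unnamed: 10 out-of-quota places and Unnamed: 11 the total. A sketch that checks this rule row by row, using values copied from the sample; treating blanks and NaN as zero is an assumption:

```python
def to_count(value):
    """Parse a quota cell; blanks, NaN and <NA> count as zero places (an assumption)."""
    text = "" if value is None else str(value).strip()
    if text.lower() in {"", "nan", "<na>"}:
        return 0
    return int(float(text))

def total_matches(in_quota, out_of_quota, total):
    """True when the total column equals in-quota plus out-of-quota places."""
    return to_count(in_quota) + to_count(out_of_quota) == to_count(total)

# (row index, 입학정원, Unnamed: 10, Unnamed: 11) copied from the sample above.
rows = [
    (1, 30, float("nan"), 30),
    (6, float("nan"), 20, 20),
    (9, 80, 25, 105),
    (2511, 75, 130, 205),
    (2513, 148, float("nan"), 148),
]
mismatches = [idx for idx, a, b, t in rows if not total_matches(a, b, t)]
print(mismatches)  # [] : every sampled row satisfies the rule
```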