Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 712.9 KiB
Average record size in memory: 73.0 B

Variable types

Text: 2
Numeric: 1
Categorical: 5

Dataset

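Description: School information for elementary and secondary schools nationwide, including school code, school name, year of establishment, and coeducation classification.
Author: Korea Education and Research Information Service (한국교육학술정보원)
URL: https://www.data.go.kr/data/15123572/fileData.do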

Alerts

설립년도 is highly overall correlated with 남녀공학구분명 and 1 other field (High correlation)
남녀공학구분명 is highly overall correlated with 설립년도 and 1 other field (High correlation)
주야구분명 is highly overall correlated with 설립년도 and 1 other field (High correlation)
남녀공학구분명 is highly imbalanced (61.9%) (Imbalance)
주야구분명 is highly imbalanced (87.4%) (Imbalance)
학교종류구분명 is highly imbalanced (52.8%) (Imbalance)
설립구분명 is highly imbalanced (64.9%) (Imbalance)
설립년도 is highly skewed (γ1 = 30.87181237) (Skewed)
학교코드 has unique values (Unique)
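
The report does not say how the imbalance percentages are derived. One definition that reproduces the figures above for 남녀공학구분명 and 주야구분명 is one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition; the function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables of this report.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This definition is an assumption, not taken from the report itself.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Class counts copied from the report's Common Values tables.
coed_counts = [8402, 736, 486, 366, 10]    # 남녀공학구분명
day_night_counts = [9566, 358, 63, 10, 3]  # 주야구분명

print(f"{imbalance_score(coed_counts):.1%}")       # 61.9%
print(f"{imbalance_score(day_night_counts):.1%}")  # 87.4%
```

A score this high for 주야구분명 simply reflects that 95.7% of rows share the value 주간, so the column carries little information for most analyses.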

Reproduction

Analysis started: 2023-12-12 02:20:29.313458
Analysis finished: 2023-12-12 02:20:30.768616
Duration: 1.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

학교코드
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 100000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
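
As a rough illustration of how such character properties can be read per code point, the sketch below uses Python's standard `unicodedata` module to count general categories in a few of the sample school codes. The helper name `category_counts` is hypothetical; `Lu` and `Nd` are the Unicode short names for the "Uppercase Letter" and "Decimal Number" categories listed in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample school codes taken from the report's Sample rows.
codes = ["A000016591", "A000014938", "A000003725"]

totals = Counter()
for code in codes:
    totals.update(category_counts(code))

# Each code is one uppercase letter followed by nine decimal digits.
print(totals)
```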

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: A000016591
2nd row: A000014938
3rd row: A000003725
4th row: A000005104
5th row: A000022131
Value  Count  Frequency (%)
a000016591  1  < 0.1%
a000007495  1  < 0.1%
a000019308  1  < 0.1%
a000020780  1  < 0.1%
a000012532  1  < 0.1%
a000017183  1  < 0.1%
a000006739  1  < 0.1%
a000017431  1  < 0.1%
a000013133  1  < 0.1%
a000003797  1  < 0.1%
Other values (9990)  9990  99.9%

Most occurring characters

ValueCountFrequency (%)
0 47151
47.2%
A 10000
 
10.0%
1 8766
 
8.8%
2 5914
 
5.9%
3 4226
 
4.2%
6 4036
 
4.0%
8 4030
 
4.0%
5 4002
 
4.0%
9 3978
 
4.0%
7 3964
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 90000
90.0%
Uppercase Letter 10000
 
10.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 47151
52.4%
1 8766
 
9.7%
2 5914
 
6.6%
3 4226
 
4.7%
6 4036
 
4.5%
8 4030
 
4.5%
5 4002
 
4.4%
9 3978
 
4.4%
7 3964
 
4.4%
4 3933
 
4.4%
Uppercase Letter
ValueCountFrequency (%)
A 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 90000
90.0%
Latin 10000
 
10.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 47151
52.4%
1 8766
 
9.7%
2 5914
 
6.6%
3 4226
 
4.7%
6 4036
 
4.5%
8 4030
 
4.5%
5 4002
 
4.4%
9 3978
 
4.4%
7 3964
 
4.4%
4 3933
 
4.4%
Latin
ValueCountFrequency (%)
A 10000
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 100000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 47151
47.2%
A 10000
 
10.0%
1 8766
 
8.8%
2 5914
 
5.9%
3 4226
 
4.2%
6 4036
 
4.0%
8 4030
 
4.0%
5 4002
 
4.0%
9 3978
 
4.0%
7 3964
 
4.0%

학교명
Text

Distinct: 8936
Distinct (%): 89.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.6599
Min length: 4

Characters and Unicode

Total characters: 76599
Distinct characters: 564
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8201
Unique (%): 82.0%
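
The gap between Distinct (8936) and Unique (8201) is easier to read with the two notions side by side. The sketch below assumes the usual reading, in which "distinct" counts how many different values occur and "unique" counts values that occur exactly once. The helper name and the toy list are illustrative only, not the real column.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy list for illustration: one name is repeated, three appear once.
names = ["자연유치원", "자연유치원", "대진고등학교", "태릉고등학교", "수암유치원"]
print(distinct_and_unique(names))  # (4, 3)
```

Under that reading, the 735 distinct names that are not unique (8936 - 8201) are names shared by more than one school.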

Sample

1st row: 봉덕초등학교병설유치원
2nd row: 대진고등학교
3rd row: 태릉고등학교
4th row: 수암유치원
5th row: 해오름유치원
Value  Count  Frequency (%)
자연유치원  14  0.1%
학력인정  13  0.1%
이화유치원  10  0.1%
중앙유치원  10  0.1%
예일유치원  9  0.1%
사랑유치원  9  0.1%
무지개유치원  9  0.1%
성모유치원  8  0.1%
한빛유치원  7  0.1%
소화유치원  7  0.1%
Other values (8928)  9922  99.0%

Most occurring characters

ValueCountFrequency (%)
8669
 
11.3%
8648
 
11.3%
6599
 
8.6%
5381
 
7.0%
4216
 
5.5%
3895
 
5.1%
3844
 
5.0%
2264
 
3.0%
2202
 
2.9%
1925
 
2.5%
Other values (554) 28956
37.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 76461
99.8%
Open Punctuation 32
 
< 0.1%
Close Punctuation 32
 
< 0.1%
Decimal Number 25
 
< 0.1%
Space Separator 18
 
< 0.1%
Uppercase Letter 15
 
< 0.1%
Lowercase Letter 14
 
< 0.1%
Other Punctuation 1
 
< 0.1%
Dash Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8669
 
11.3%
8648
 
11.3%
6599
 
8.6%
5381
 
7.0%
4216
 
5.5%
3895
 
5.1%
3844
 
5.0%
2264
 
3.0%
2202
 
2.9%
1925
 
2.5%
Other values (525) 28818
37.7%
Uppercase Letter
ValueCountFrequency (%)
M 2
13.3%
Y 2
13.3%
C 2
13.3%
A 2
13.3%
B 2
13.3%
E 1
6.7%
G 1
6.7%
L 1
6.7%
I 1
6.7%
T 1
6.7%
Lowercase Letter
ValueCountFrequency (%)
s 4
28.6%
i 2
14.3%
e 2
14.3%
n 2
14.3%
g 1
 
7.1%
l 1
 
7.1%
h 1
 
7.1%
u 1
 
7.1%
Decimal Number
ValueCountFrequency (%)
2 14
56.0%
3 4
 
16.0%
1 2
 
8.0%
0 2
 
8.0%
4 2
 
8.0%
5 1
 
4.0%
Open Punctuation
ValueCountFrequency (%)
( 32
100.0%
Close Punctuation
ValueCountFrequency (%)
) 32
100.0%
Space Separator
ValueCountFrequency (%)
18
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 76461
99.8%
Common 109
 
0.1%
Latin 29
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8669
 
11.3%
8648
 
11.3%
6599
 
8.6%
5381
 
7.0%
4216
 
5.5%
3895
 
5.1%
3844
 
5.0%
2264
 
3.0%
2202
 
2.9%
1925
 
2.5%
Other values (525) 28818
37.7%
Latin
ValueCountFrequency (%)
s 4
13.8%
M 2
 
6.9%
Y 2
 
6.9%
C 2
 
6.9%
A 2
 
6.9%
B 2
 
6.9%
i 2
 
6.9%
e 2
 
6.9%
n 2
 
6.9%
E 1
 
3.4%
Other values (8) 8
27.6%
Common
ValueCountFrequency (%)
( 32
29.4%
) 32
29.4%
18
16.5%
2 14
12.8%
3 4
 
3.7%
1 2
 
1.8%
0 2
 
1.8%
4 2
 
1.8%
. 1
 
0.9%
- 1
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
Hangul 76461
99.8%
ASCII 138
 
0.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
8669
 
11.3%
8648
 
11.3%
6599
 
8.6%
5381
 
7.0%
4216
 
5.5%
3895
 
5.1%
3844
 
5.0%
2264
 
3.0%
2202
 
2.9%
1925
 
2.5%
Other values (525) 28818
37.7%
ASCII
ValueCountFrequency (%)
( 32
23.2%
) 32
23.2%
18
13.0%
2 14
10.1%
3 4
 
2.9%
s 4
 
2.9%
M 2
 
1.4%
Y 2
 
1.4%
C 2
 
1.4%
A 2
 
1.4%
Other values (19) 26
18.8%

설립년도
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 137
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1989.3251
Minimum: 1882
Maximum: 9999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1882
5-th percentile: 1925
Q1: 1965
Median: 1985
Q3: 2003
95-th percentile: 2017
Maximum: 9999
Range: 8117
Interquartile range (IQR): 38

Descriptive statistics

Standard deviation: 255.37903
Coefficient of variation (CV): 0.12837471
Kurtosis: 965.37694
Mean: 1989.3251
Median Absolute Deviation (MAD): 19
Skewness: 30.871812
Sum: 19893251
Variance: 65218.451
Monotonicity: Not monotonic
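
Several of these figures are derived from others, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum from the reported minimum, maximum, quartiles, mean and standard deviation. The variable names are illustrative, and small differences in the last digits are expected because the report rounds its inputs.

```python
# Figures copied from the report's 설립년도 summary.
minimum, maximum = 1882, 9999
q1, q3 = 1965, 2003
mean, std, n = 1989.3251, 255.37903, 10000

value_range = maximum - minimum  # reported Range: 8117
iqr = q3 - q1                    # reported IQR: 38
cv = std / mean                  # reported CV: 0.12837471
variance = std ** 2              # reported Variance: 65218.451
total = mean * n                 # reported Sum: 19893251

print(value_range, iqr, cv, variance, round(total))
```

The contrast between the IQR (38 years) and the standard deviation (about 255 years) is itself a hint that a handful of extreme values dominate the spread.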
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
1981  647  6.5%
2002  356  3.6%
1984  269  2.7%
1982  223  2.2%
1985  214  2.1%
2005  183  1.8%
2004  180  1.8%
2007  179  1.8%
1983  175  1.8%
1994  170  1.7%
Other values (127)  7404  74.0%

Value  Count  Frequency (%)
1882  1  < 0.1%
1885  2  < 0.1%
1887  2  < 0.1%
1891  1  < 0.1%
1892  1  < 0.1%
1894  2  < 0.1%
1895  6  0.1%
1896  6  0.1%
1897  4  < 0.1%
1898  1  < 0.1%

Value  Count  Frequency (%)
9999  10  0.1%
2999  2  < 0.1%
2027  3  < 0.1%
2023  26  0.3%
2022  57  0.6%
2021  85  0.9%
2020  114  1.1%
2019  102  1.0%
2018  75  0.8%
2017  111  1.1%
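
The largest values (9999 and 2999) look like placeholder codes rather than real years, and a few such values are enough to produce a very large skewness. The sketch below illustrates the effect on a small toy list, not the real column. The cutoff of 2100, the helper names, and the use of the plain moment-based skewness (which may differ from the estimator the report uses) are all assumptions made for illustration.

```python
def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def drop_placeholder_years(years, max_plausible=2100):
    """Keep only years at or below a cutoff; the cutoff is a hypothetical choice."""
    return [y for y in years if y <= max_plausible]


# Toy data for illustration: eight ordinary years plus two placeholder codes.
years = [1965, 1981, 1981, 1985, 1994, 2002, 2003, 2017, 2999, 9999]
cleaned = drop_placeholder_years(years)

print(moment_skewness(years))    # strongly positive
print(moment_skewness(cleaned))  # close to zero
```

Whether values such as 2027 denote planned openings or data-entry errors cannot be determined from the report, so any cutoff should be confirmed against the source documentation before rows are filtered.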

남녀공학구분명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 3.8936
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 남여공학
2nd row: 남여공학
3rd row: 남여공학
4th row: 남여공학
5th row: 남여공학

Common Values

ValueCountFrequency (%)
남여공학 8402
84.0%
원천코드오류 736
 
7.4%
486
 
4.9%
366
 
3.7%
원천코드없음 10
 
0.1%

Length

Histogram of lengths of the category


주야구분명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 2
Mean length: 2.1535
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 주간
2nd row: 주간
3rd row: 주간
4th row: 주간
5th row: 주간

Common Values

Value  Count  Frequency (%)
주간  9566  95.7%
원천코드오류  358  3.6%
주야간  63  0.6%
원천코드없음  10  0.1%
야간  3  < 0.1%

Length

Histogram of lengths of the category


우편번호시도명
Categorical

Distinct: 19
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 6
Mean length: 4.136
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 충청북도
2nd row: 강원도
3rd row: 서울특별시
4th row: 서울특별시
5th row: 경상북도

Common Values

Value  Count  Frequency (%)
경기도  2246  22.5%
서울특별시  1076  10.8%
경상북도  791  7.9%
경상남도  738  7.4%
전라남도  671  6.7%
충청남도  595  5.9%
전라북도  582  5.8%
부산광역시  509  5.1%
강원도  460  4.6%
인천광역시  456  4.6%
Other values (9)  1876  18.8%

Length

Histogram of lengths of the category

학교종류구분명
Categorical

IMBALANCE 

Distinct: 20
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 3
Mean length: 3.5109
Min length: 3

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 유치원
2nd row: 고등학교
3rd row: 고등학교
4th row: 유치원
5th row: 유치원

Common Values

Value  Count  Frequency (%)
유치원  3818  38.2%
초등학교  3135  31.4%
중학교  1642  16.4%
고등학교  1172  11.7%
특수학교  100  1.0%
각종학교(고)  25  0.2%
각종학교(중)  22  0.2%
방송통신고등학교  19  0.2%
외국인학교  17  0.2%
평생학교(고)-2년6학기  12  0.1%
Other values (10)  38  0.4%

Length

Histogram of lengths of the category

설립구분명
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 2
Mean length: 2.0004
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 공립
2nd row: 공립
3rd row: 공립
4th row: 사립
5th row: 사립

Common Values

Value  Count  Frequency (%)
공립  7651  76.5%
사립  2312  23.1%
국립  33  0.3%
기타  3  < 0.1%
원천코드없음  1  < 0.1%

Length

Histogram of lengths of the category


Interactions

[Figure: interactions plot]

Correlations

Variable  설립년도  남녀공학구분명  주야구분명  우편번호시도명  학교종류구분명  설립구분명
설립년도  1.000  0.718  0.718  0.000  0.072  0.000
남녀공학구분명  0.718  1.000  0.918  0.338  0.562  0.331
주야구분명  0.718  0.918  1.000  0.380  0.754  0.226
우편번호시도명  0.000  0.338  0.380  1.000  0.141  0.252
학교종류구분명  0.072  0.562  0.754  0.141  1.000  0.548
설립구분명  0.000  0.331  0.226  0.252  0.548  1.000

Variable  주야구분명  우편번호시도명  남녀공학구분명  학교종류구분명  설립구분명
주야구분명  1.000  0.199  0.605  0.431  0.086
우편번호시도명  0.199  1.000  0.175  0.041  0.128
남녀공학구분명  0.605  0.175  1.000  0.277  0.129
학교종류구분명  0.431  0.041  0.277  1.000  0.268
설립구분명  0.086  0.128  0.129  0.268  1.000

Variable  설립년도  남녀공학구분명  주야구분명  우편번호시도명  학교종류구분명  설립구분명
설립년도  1.000  0.707  0.707  0.000  0.037  0.000
남녀공학구분명  0.707  1.000  0.605  0.175  0.277  0.129
주야구분명  0.707  0.605  1.000  0.199  0.431  0.086
우편번호시도명  0.000  0.175  0.199  1.000  0.041  0.128
학교종류구분명  0.037  0.277  0.431  0.041  1.000  0.268
설립구분명  0.000  0.129  0.086  0.128  0.268  1.000
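
The extracted text does not say which coefficient each matrix reports. For pairs of categorical columns, Cramér's V is one common association measure. The sketch below computes its plain (uncorrected) form from a list of value pairs. Whether the report uses this form, a bias-corrected variant, or another measure is an assumption not confirmed here, and the toy pairs are illustrative rather than real rows.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (category_a, category_b) tuples."""
    n = len(pairs)
    row_totals = Counter(a for a, _ in pairs)
    col_totals = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k < 1:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy pairs: the second value is fully determined by the first.
dependent = [("주간", "공립")] * 5 + [("야간", "사립")] * 5
# Toy pairs: every combination occurs equally often.
independent = [("주간", "공립"), ("주간", "사립"), ("야간", "공립"), ("야간", "사립")]

print(cramers_v(dependent))    # 1.0
print(cramers_v(independent))  # 0.0
```

With heavily imbalanced columns such as 주야구분명, a high coefficient can be driven by a few rare categories (for example the shared 원천코드없음 rows), so the pairings are worth inspecting directly before drawing conclusions.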

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index  학교코드  학교명  설립년도  남녀공학구분명  주야구분명  우편번호시도명  학교종류구분명  설립구분명
13096  A000016591  봉덕초등학교병설유치원  2004  남여공학  주간  충청북도  유치원  공립
11443  A000014938  대진고등학교  1980  남여공학  주간  강원도  고등학교  공립
230  A000003725  태릉고등학교  1984  남여공학  주간  서울특별시  고등학교  공립
1609  A000005104  수암유치원  2000  남여공학  주간  서울특별시  유치원  사립
18636  A000022131  해오름유치원  2017  남여공학  주간  경상북도  유치원  사립
10979  A000014474  능실유치원  2017  남여공학  주간  경기도  유치원  공립
2827  A000006322  중현초등학교  1996  남여공학  주간  부산광역시  초등학교  공립
18748  A000022243  경남간호고등학교  1972  남여공학  주간  경상남도  고등학교  사립
6623  A000010118  어진중학교  2014  남여공학  주간  세종특별자치시  중학교  공립
18387  A000021882  상희학교  1986  남여공학  주간  경상북도  특수학교  공립

index  학교코드  학교명  설립년도  남녀공학구분명  주야구분명  우편번호시도명  학교종류구분명  설립구분명
42  A000003537  춘천교육대학교부설초등학교  1939  남여공학  주간  강원도  초등학교  국립
10059  A000013554  흥덕고등학교  2010  남여공학  주간  경기도  고등학교  공립
2092  A000005587  정암유치원  2016  남여공학  주간  서울특별시  유치원  사립
2047  A000005542  서울용답초등학교병설유치원  2014  남여공학  주간  서울특별시  유치원  공립
2067  A000005562  아현중학교부설방송통신중학교  2015  남여공학  주간  서울특별시  방송통신중학교  공립
2302  A000005797  학력인정부산미용고등학교  1999  남여공학  주간  부산광역시  평생학교(고)-3년6학기  사립
18772  A000022267  진주기계공업고등학교  1962  남여공학  주간  경상남도  고등학교  공립
2253  A000005748  금정전자고등학교  1973  남여공학  주간  부산광역시  고등학교  사립
645  A000004140  서울등양초등학교  1994  남여공학  주간  서울특별시  초등학교  공립
16413  A000019908  대덕초등학교병설유치원  1981  남여공학  주간  전라남도  유치원  공립