Overview

Dataset statistics

Number of variables: 6
Number of observations: 2987
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 151.8 KiB
Average record size in memory: 52.0 B
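
The derived figures in this table can be re-computed from the raw counts. The sketch below is only a sanity check: it assumes that 1 KiB is 1024 bytes and that the average record size is the total memory size divided by the number of observations, which the report does not state explicitly.

```python
# Values copied from the dataset statistics in this report.
n_variables = 6
n_observations = 2987
missing_cells = 0
total_size_kib = 151.8

# Total number of cells in the table (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, average = total size / row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(total_cells)                  # 17922
print(f"{missing_pct:.1f}%")        # 0.0%
print(f"{avg_record_bytes:.1f} B")  # 52.0 B
```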

Variable types

Numeric: 4
Categorical: 1
Text: 1

Dataset

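Description: One of the statistical indices that Gimhae-si (김해시) developed to assess the state of the city from statistics. It consists of the statistical year (통계연도), province name (시도명), city/county/district name (시군구명), advancement rate of high school graduates in % (고등학교 졸업자 진학률(%)), number of graduates (졸업자 수(명)), and number of graduates who advanced to further education (진학자 수(명)). Because the index is centered on Gimhae-si, data for regions outside Gimhae-si may be missing owing to difficulties in data collection and processing.
Author: Gimhae-si, Gyeongsangnam-do (경상남도 김해시)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15110181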

Alerts

졸업자 수(명) is highly overall correlated with 진학자 수(명) (High correlation)
진학자 수(명) is highly overall correlated with 졸업자 수(명) (High correlation)
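
To illustrate what these alerts mean, the sketch below computes a Pearson correlation between 졸업자 수(명) and 진학자 수(명) on the first ten rows listed in the Sample section of this report. Two assumptions apply: Pearson is used here for illustration and is not necessarily the coefficient behind the alert, and ten rows cannot reproduce the values the report computed on all 2987 observations.

```python
from math import sqrt

# (졸업자 수, 진학자 수) pairs read from the first ten sample rows of this report.
sample = [
    (11078, 7351), (3150, 2220), (4641, 2488), (4676, 3668), (5189, 4235),
    (830, 681), (5876, 4644), (1546, 1139), (760, 579), (3781, 2913),
]

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

r = pearson(sample)
print(f"Pearson r on the 10 sample rows: {r:.3f}")
```

A value close to 1 is expected because the number of graduates who advance is bounded by, and scales with, the number of graduates in each district.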

Reproduction

Analysis started: 2023-12-10 22:45:46.177176
Analysis finished: 2023-12-10 22:45:47.887733
Duration: 1.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
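
A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, not the configuration actually used here (that is in config.json): the function name `build_report`, the file paths, and the report title are hypothetical, and it assumes `pandas` and `ydata-profiling` are installed.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (hypothetical paths)."""
    # Imported lazily so this file can be loaded without the optional packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # The title is an assumption; the original report's title is not known.
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(html_path)
    return report
```

Calling `build_report("data.csv")` with a local copy of the dataset (file name assumed) would write `report.html` next to the script.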

Variables

통계연도
Real number (ℝ)

Distinct: 13
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2014.9856
Minimum: 2009
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.4 KiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2012
Median: 2015
Q3: 2018
95-th percentile: 2021
Maximum: 2021
Range: 12
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.7455211
Coefficient of variation (CV): 0.0018588327
Kurtosis: -1.2160447
Mean: 2014.9856
Median Absolute Deviation (MAD): 3
Skewness: 0.0043759646
Sum: 6018762
Variance: 14.028929
Monotonicity: Not monotonic
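
Several of these statistics are simple functions of the others, so they can be cross-checked. The sketch below uses only the numbers reported for 통계연도 and the standard definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean); the tolerances are an assumption that allows for the rounding of the printed values.

```python
# Values copied from the 통계연도 statistics in this report.
n = 2987
total = 6018762
std = 3.7455211
minimum, q1, q3, maximum = 2009, 2012, 2018, 2021

mean = total / n
variance = std ** 2
cv = std / mean
value_range = maximum - minimum
iqr = q3 - q1

# Compare against the reported figures, allowing for rounding.
assert abs(mean - 2014.9856) < 1e-4
assert abs(variance - 14.028929) < 1e-4
assert abs(cv - 0.0018588327) < 1e-7
assert value_range == 12 and iqr == 6

print(f"mean={mean:.4f} variance={variance:.4f} cv={cv:.7f}")
```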
Histogram with fixed size bins (bins=13)

Common values

Value             Count  Frequency (%)
2009              232    7.8%
2010              232    7.8%
2014              230    7.7%
2011              230    7.7%
2013              230    7.7%
2012              230    7.7%
2019              229    7.7%
2016              229    7.7%
2021              229    7.7%
2015              229    7.7%
Other values (3)  687    23.0%

Minimum 10 values

Value  Count  Frequency (%)
2009   232    7.8%
2010   232    7.8%
2011   230    7.7%
2012   230    7.7%
2013   230    7.7%
2014   230    7.7%
2015   229    7.7%
2016   229    7.7%
2017   229    7.7%
2018   229    7.7%

Maximum 10 values

Value  Count  Frequency (%)
2021   229    7.7%
2020   229    7.7%
2019   229    7.7%
2018   229    7.7%
2017   229    7.7%
2016   229    7.7%
2015   229    7.7%
2014   230    7.7%
2013   230    7.7%
2012   230    7.7%
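
The frequency tables for 통계연도 can be checked for internal consistency: the per-year counts should add up to the 2987 observations, and each percentage should equal count / 2987. The sketch below rebuilds the counts from the tables in this report; the grouping of 2017, 2018, and 2020 under "Other values (3)" is inferred from the years that do not appear in the first table.

```python
# Per-year counts read from the value tables for 통계연도.
counts = {2009: 232, 2010: 232}
counts.update({year: 230 for year in range(2011, 2015)})
counts.update({year: 229 for year in range(2015, 2022)})

n_observations = 2987
# Every observation belongs to exactly one year.
assert sum(counts.values()) == n_observations

for year, count in sorted(counts.items()):
    print(year, count, f"{100 * count / n_observations:.1f}%")

# Inferred: the three years missing from the "common values" table.
other = counts[2017] + counts[2018] + counts[2020]
print("Other values (3)", other, f"{100 * other / n_observations:.1f}%")  # 687 23.0%
```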

시도명
Categorical

Distinct: 17
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 23.5 KiB

Value              Count
경기도             403
서울특별시         325
경상북도           299
전라남도           286
경상남도           238
Other values (12)  1436

Length

Max length: 7
Median length: 5
Mean length: 4.1439571
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기도
2nd row: 인천광역시
3rd row: 서울특별시
4th row: 제주특별자치도
5th row: 경기도

Common Values

Value             Count  Frequency (%)
경기도            403    13.5%
서울특별시        325    10.9%
경상북도          299    10.0%
전라남도          286    9.6%
경상남도          238    8.0%
강원도            234    7.8%
부산광역시        208    7.0%
충청남도          199    6.7%
전라북도          182    6.1%
충청북도          149    5.0%
Other values (7)  464    15.5%

Length

Histogram of lengths of the category

시군구명
Text

Distinct: 213
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Memory size: 23.5 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.9280214
Min length: 2

Characters and Unicode

Total characters: 8746
Distinct characters: 133
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부천시
2nd row: 계양구
3rd row: 서초구
4th row: 제주시
5th row: 의정부시

Value               Count  Frequency (%)
동구                78     2.6%
중구                78     2.6%
서구                65     2.2%
남구                62     2.1%
북구                52     1.7%
강서구              26     0.9%
고성군              26     0.9%
영등포구            13     0.4%
음성군              13     0.4%
양산시              13     0.4%
Other values (203)  2561   85.7%

Most occurring characters

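Value               Count  Frequency (%)
(not preserved)     1126   12.9%
(not preserved)     1016   11.6%
(not preserved)     962    11.0%
(not preserved)     286    3.3%
(not preserved)     260    3.0%
(not preserved)     234    2.7%
(not preserved)     234    2.7%
(not preserved)     221    2.5%
(not preserved)     210    2.4%
(not preserved)     169    1.9%
Other values (123)  4028   46.1%

Most occurring categories

Value         Count  Frequency (%)
Other Letter  8746   100.0%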

Most occurring scripts

Value   Count  Frequency (%)
Hangul  8746   100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  8746   100.0%


고등학교 졸업자 진학률(퍼센트)
Real number (ℝ)

Distinct: 2001
Distinct (%): 67.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.910402
Minimum: 34.81
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.4 KiB

Quantile statistics

Minimum: 34.81
5-th percentile: 56.622
Q1: 68.83
Median: 75.98
Q3: 81.93
95-th percentile: 89.224
Maximum: 100
Range: 65.19
Interquartile range (IQR): 13.1

Descriptive statistics

Standard deviation: 9.8234938
Coefficient of variation (CV): 0.13113658
Kurtosis: 0.12907255
Mean: 74.910402
Median Absolute Deviation (MAD): 6.43
Skewness: -0.51728079
Sum: 223757.37
Variance: 96.501031
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
75.0                 8      0.3%
78.76                6      0.2%
81.77                6      0.2%
75.48                6      0.2%
80.71                6      0.2%
77.45                5      0.2%
79.56                5      0.2%
72.61                5      0.2%
75.27                5      0.2%
78.57                5      0.2%
Other values (1991)  2930   98.1%

Minimum 10 values

Value  Count  Frequency (%)
34.81  1      < 0.1%
34.87  1      < 0.1%
36.19  1      < 0.1%
41.09  1      < 0.1%
41.12  1      < 0.1%
41.94  1      < 0.1%
42.96  1      < 0.1%
43.72  1      < 0.1%
44.57  1      < 0.1%
45.04  1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
100.0  1      < 0.1%
99.25  1      < 0.1%
98.97  1      < 0.1%
98.15  1      < 0.1%
97.54  1      < 0.1%
97.24  1      < 0.1%
96.21  1      < 0.1%
96.2   1      < 0.1%
96.07  1      < 0.1%
95.66  1      < 0.1%

졸업자 수(명)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2188
Distinct (%): 73.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2549.7744
Minimum: 21
Maximum: 17486
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.4 KiB

Quantile statistics

Minimum: 21
5-th percentile: 196
Q1: 546
Median: 1558
Q3: 3627.5
95-th percentile: 8284.5
Maximum: 17486
Range: 17465
Interquartile range (IQR): 3081.5

Descriptive statistics

Standard deviation: 2741.984
Coefficient of variation (CV): 1.075383
Kurtosis: 4.5302329
Mean: 2549.7744
Median Absolute Deviation (MAD): 1190
Skewness: 1.9285531
Sum: 7616176
Variance: 7518476.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
328                  8      0.3%
385                  6      0.2%
534                  6      0.2%
196                  6      0.2%
457                  6      0.2%
206                  6      0.2%
188                  6      0.2%
522                  6      0.2%
221                  6      0.2%
215                  5      0.2%
Other values (2178)  2926   98.0%

Minimum 10 values

Value  Count  Frequency (%)
21     1      < 0.1%
31     1      < 0.1%
37     1      < 0.1%
41     1      < 0.1%
42     1      < 0.1%
48     1      < 0.1%
51     1      < 0.1%
52     1      < 0.1%
53     1      < 0.1%
54     2      0.1%

Maximum 10 values

Value  Count  Frequency (%)
17486  1      < 0.1%
16977  1      < 0.1%
16807  1      < 0.1%
16797  1      < 0.1%
16622  1      < 0.1%
16471  1      < 0.1%
16247  1      < 0.1%
15875  1      < 0.1%
15726  1      < 0.1%
15525  1      < 0.1%

진학자 수(명)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2023
Distinct (%): 67.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1852.4279
Minimum: 14
Maximum: 13890
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 26.4 KiB

Quantile statistics

Minimum: 14
5-th percentile: 150
Q1: 422
Median: 1183
Q3: 2600
95-th percentile: 5752.6
Maximum: 13890
Range: 13876
Interquartile range (IQR): 2178

Descriptive statistics

Standard deviation: 1974.575
Coefficient of variation (CV): 1.0659389
Kurtosis: 5.0846362
Mean: 1852.4279
Median Absolute Deviation (MAD): 883
Skewness: 1.9896093
Sum: 5533202
Variance: 3898946.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
143                  8      0.3%
139                  7      0.2%
151                  7      0.2%
315                  7      0.2%
357                  7      0.2%
907                  6      0.2%
150                  6      0.2%
237                  6      0.2%
231                  6      0.2%
451                  6      0.2%
Other values (2013)  2921   97.8%

Minimum 10 values

Value  Count  Frequency (%)
14     1      < 0.1%
25     1      < 0.1%
32     1      < 0.1%
33     1      < 0.1%
35     1      < 0.1%
42     2      0.1%
44     1      < 0.1%
45     1      < 0.1%
47     1      < 0.1%
48     2      0.1%

Maximum 10 values

Value  Count  Frequency (%)
13890  1      < 0.1%
13115  1      < 0.1%
13112  1      < 0.1%
12996  1      < 0.1%
12665  1      < 0.1%
12407  1      < 0.1%
12319  1      < 0.1%
11969  1      < 0.1%
11852  1      < 0.1%
11620  1      < 0.1%

Interactions

(16 interaction plots, one per ordered pair of the four numeric variables; images omitted)

Correlations

                                통계연도  시도명  고등학교 졸업자 진학률(퍼센트)  졸업자 수(명)  진학자 수(명)
통계연도                        1.000     0.000   0.341                           0.000          0.000
시도명                          0.000     1.000   0.598                           0.552          0.523
고등학교 졸업자 진학률(퍼센트)  0.341     0.598   1.000                           0.327          0.218
졸업자 수(명)                   0.000     0.552   0.327                           1.000          0.962
진학자 수(명)                   0.000     0.523   0.218                           0.962          1.000

                                통계연도  고등학교 졸업자 진학률(퍼센트)  졸업자 수(명)  진학자 수(명)  시도명
통계연도                        1.000     -0.238                          -0.061         -0.086         0.000
고등학교 졸업자 진학률(퍼센트)  -0.238    1.000                           -0.232         -0.126         0.281
졸업자 수(명)                   -0.061    -0.232                          1.000          0.993          0.250
진학자 수(명)                   -0.086    -0.126                          0.993          1.000          0.233
시도명                          0.000     0.281                           0.250          0.233          1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      통계연도  시도명          시군구명  고등학교 졸업자 진학률(퍼센트)  졸업자 수(명)  진학자 수(명)
0     2014      경기도          부천시    66.36                           11078          7351
1     2019      인천광역시      계양구    70.48                           3150           2220
2     2016      서울특별시      서초구    53.61                           4641           2488
3     2021      제주특별자치도  제주시    78.44                           4676           3668
4     2009      경기도          의정부시  81.61                           5189           4235
5     2011      부산광역시      강서구    82.05                           830            681
6     2015      울산광역시      남구      79.03                           5876           4644
7     2015      부산광역시      수영구    73.67                           1546           1139
8     2017      경상북도        문경시    76.18                           760            579
9     2016      전라북도        익산시    77.04                           3781           2913

Last rows

      통계연도  시도명          시군구명  고등학교 졸업자 진학률(퍼센트)  졸업자 수(명)  진학자 수(명)
2977  2020      인천광역시      동구      51.47                           746            384
2978  2012      서울특별시      동작구    54.86                           3022           1658
2979  2015      강원도          평창군    87.01                           354            308
2980  2009      경상남도        고성군    90.28                           535            483
2981  2014      경기도          양평군    67.85                           1123           762
2982  2019      서울특별시      서대문구  59.85                           2142           1282
2983  2016      전라북도        고창군    74.3                            852            633
2984  2019      서울특별시      마포구    53.43                           2392           1278
2985  2021      경기도          의왕시    72.89                           1055           769
2986  2014      전라북도        완주군    70.07                           1146           803
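
The sample rows suggest that the advancement rate is the number of advancing graduates divided by the number of graduates, times 100. The sketch below checks that relation on the first ten sample rows. The formula is an inference from the data, not something the dataset description states, and the 0.01 tolerance is an assumption that allows for the rate being rounded to two decimals.

```python
# (reported rate in %, 졸업자 수, 진학자 수) read from the first ten sample rows.
rows = [
    (66.36, 11078, 7351),
    (70.48, 3150, 2220),
    (53.61, 4641, 2488),
    (78.44, 4676, 3668),
    (81.61, 5189, 4235),
    (82.05, 830, 681),
    (79.03, 5876, 4644),
    (73.67, 1546, 1139),
    (76.18, 760, 579),
    (77.04, 3781, 2913),
]

for reported, graduates, enrolled in rows:
    # Inferred definition: rate = advancing graduates / graduates * 100.
    computed = 100 * enrolled / graduates
    # The reported rate is rounded, so compare with a small tolerance.
    assert abs(computed - reported) < 0.01, (reported, computed)
    print(f"reported={reported:6.2f}  computed={computed:7.3f}")
```

If the relation holds across the whole dataset, the rate column is redundant given the two count columns, which is consistent with the high correlation reported between 졸업자 수(명) and 진학자 수(명).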