Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 3
Duplicate rows (%): < 0.1%
Total size in memory: 673.8 KiB
Average record size in memory: 69.0 B
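The derived figures in this block follow from the raw ones by simple arithmetic. The sketch below is a rough check, assuming that KiB means 1024 bytes and that non-zero percentages below 0.1 are displayed as "< 0.1%"; the helper name `fmt_pct` is hypothetical and not part of any library.

```python
# Figures copied from the Dataset statistics block.
n_rows = 10_000
n_duplicate_rows = 3
total_kib = 673.8

# Average record size: total bytes divided by the number of observations.
avg_record_bytes = total_kib * 1024 / n_rows


def fmt_pct(fraction: float) -> str:
    """Format a fraction as a percentage; tiny non-zero values collapse to '< 0.1%' (assumed display rule)."""
    pct = fraction * 100
    if 0 < pct < 0.1:
        return "< 0.1%"
    return f"{pct:.1f}%"


duplicate_pct = fmt_pct(n_duplicate_rows / n_rows)
print(round(avg_record_bytes, 1), duplicate_pct)
```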

Variable types

Numeric: 5
Categorical: 1
Text: 1

Alerts

Dataset has 3 (< 0.1%) duplicate rows [Duplicates]
남자 평균연령 is highly overall correlated with 여자 평균연령 and 1 other field [High correlation]
여자 평균연령 is highly overall correlated with 남자 평균연령 and 1 other field [High correlation]
평균연령 is highly overall correlated with 남자 평균연령 and 1 other field [High correlation]
행정구역구분명 is highly imbalanced (75.5%) [Imbalance]
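The 75.5% imbalance figure can be reproduced from the category counts reported for 행정구역구분명 (9197, 486, 302, 15). The sketch below assumes the score is one minus the normalised Shannon entropy of the class distribution; that formula is an assumption that happens to match the reported value, and the function name `imbalance_score` is hypothetical.

```python
import math

# Category counts for 행정구역구분명, copied from the report's value counts.
counts = [9197, 486, 302, 15]


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, approaching 1 as one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    # Shannon entropy in bits, skipping empty classes.
    entropy_bits = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy_bits / math.log2(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, so a value near 0.76 reflects the 92% share of the largest category.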

Reproduction

Analysis started: 2024-04-11 02:51:30.354429
Analysis finished: 2024-04-11 02:51:36.789617
Duration: 6.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도 (year)
Real number (ℝ)

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.967
Minimum: 2010
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2011
Q1: 2014
median: 2017
Q3: 2020
95-th percentile: 2023
Maximum: 2024
Range: 14
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.944473
Coefficient of variation (CV): 0.0019556458
Kurtosis: -1.1543117
Mean: 2016.967
Median Absolute Deviation (MAD): 3
Skewness: -0.014727699
Sum: 20169670
Variance: 15.558867
Monotonicity: Not monotonic
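Several of these statistics are derived from one another, which gives a quick consistency check. The sketch below uses only the 연도 figures reported here and assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum).

```python
# Values copied from the 연도 statistics.
mean = 2016.967
std = 3.944473
q1, q3 = 2014, 2020
minimum, maximum = 2010, 2024

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.10f} variance={variance:.6f} IQR={iqr} range={value_range}")
```

The results agree with the reported CV (0.0019556458), variance (15.558867), IQR (6) and range (14) up to rounding.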
Histogram with fixed size bins (bins=15)
Common values
Value | Count | Frequency (%)
2018 | 777 | 7.8%
2017 | 763 | 7.6%
2022 | 757 | 7.6%
2016 | 756 | 7.6%
2014 | 755 | 7.5%
2013 | 740 | 7.4%
2023 | 740 | 7.4%
2019 | 732 | 7.3%
2020 | 725 | 7.2%
2021 | 695 | 7.0%
Other values (5) | 2560 | 25.6%

Minimum 10 values
Value | Count | Frequency (%)
2010 | 308 | 3.1%
2011 | 695 | 7.0%
2012 | 683 | 6.8%
2013 | 740 | 7.4%
2014 | 755 | 7.5%
2015 | 683 | 6.8%
2016 | 756 | 7.6%
2017 | 763 | 7.6%
2018 | 777 | 7.8%
2019 | 732 | 7.3%

Maximum 10 values
Value | Count | Frequency (%)
2024 | 191 | 1.9%
2023 | 740 | 7.4%
2022 | 757 | 7.6%
2021 | 695 | 7.0%
2020 | 725 | 7.2%
2019 | 732 | 7.3%
2018 | 777 | 7.8%
2017 | 763 | 7.6%
2016 | 756 | 7.6%
2015 | 683 | 6.8%


(variable name missing in source; integer values 1 to 12)
Real number (ℝ)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5309
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.466792
Coefficient of variation (CV): 0.53082914
Kurtosis: -1.2405914
Mean: 6.5309
Median Absolute Deviation (MAD): 3
Skewness: -0.019012059
Sum: 65309
Variance: 12.018647
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values
Value | Count | Frequency (%)
3 | 961 | 9.6%
8 | 865 | 8.6%
9 | 865 | 8.6%
11 | 854 | 8.5%
7 | 849 | 8.5%
10 | 848 | 8.5%
12 | 834 | 8.3%
1 | 824 | 8.2%
2 | 804 | 8.0%
5 | 780 | 7.8%
Other values (2) | 1516 | 15.2%

Minimum 10 values
Value | Count | Frequency (%)
1 | 824 | 8.2%
2 | 804 | 8.0%
3 | 961 | 9.6%
4 | 766 | 7.7%
5 | 780 | 7.8%
6 | 750 | 7.5%
7 | 849 | 8.5%
8 | 865 | 8.6%
9 | 865 | 8.6%
10 | 848 | 8.5%

Maximum 10 values
Value | Count | Frequency (%)
12 | 834 | 8.3%
11 | 854 | 8.5%
10 | 848 | 8.5%
9 | 865 | 8.6%
8 | 865 | 8.6%
7 | 849 | 8.5%
6 | 750 | 7.5%
5 | 780 | 7.8%
4 | 766 | 7.7%
3 | 961 | 9.6%

행정구역구분명 (administrative district type)
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Category counts (bar chart): 읍면동 9197, 시군 486, (label missing) 302, (label missing) 15

Length

Max length: 3
Median length: 3
Mean length: 2.888
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 읍면동
2nd row: 읍면동
3rd row: 읍면동
4th row: 읍면동
5th row: (label missing)

Common Values

Value | Count | Frequency (%)
읍면동 | 9197 | 92.0%
시군 | 486 | 4.9%
(label missing) | 302 | 3.0%
(label missing) | 15 | 0.1%

Length

Histogram of lengths of the category

행정구역명 (administrative district name)
Text

Distinct: 962
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 12.8903
Min length: 3

Characters and Unicode

Total characters: 128903
Distinct characters: 214
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
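As an illustration of these character properties, the sketch below classifies the characters of one 행정구역명 value from the report's sample rows using Python's standard `unicodedata` module. It shows only the general category (Lo = Other Letter, Zs = Space Separator, Nd = Decimal Number); scripts and blocks are not exposed by the standard library and are left out.

```python
import unicodedata
from collections import Counter

# One 행정구역명 value taken from the report's sample rows.
text = "경기도 안성시 안성3동"

# Count characters by Unicode general category.
category_counts = Counter(unicodedata.category(ch) for ch in text)

for category, count in category_counts.most_common():
    print(category, count)
```

Hangul syllables fall under Lo, which is why Other Letter dominates the category table for this variable.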

Unique

Unique: 156
Unique (%): 1.6%

Sample

1st row: 경기도 안성시 안성3동
2nd row: 경기도 양평군 양서면
3rd row: 경기도 고양시 덕양구 주교동
4th row: 경기도 성남시 중원구 상대원1동
5th row: 경기도 성남시 중원구
Value | Count | Frequency (%)
경기도 | 10000 | 30.0%
성남시 | 864 | 2.6%
수원시 | 774 | 2.3%
고양시 | 715 | 2.1%
안양시 | 566 | 1.7%
용인시 | 561 | 1.7%
부천시 | 488 | 1.5%
안산시 | 485 | 1.5%
화성시 | 462 | 1.4%
분당구 | 382 | 1.1%
Other values (668) | 18017 | 54.1%

Most occurring characters

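Value | Count | Frequency (%)
(space) | 24287 | 18.8%
(missing) | 10226 | 7.9%
(missing) | 10157 | 7.9%
(missing) | 10020 | 7.8%
(missing) | 9778 | 7.6%
(missing) | 7827 | 6.1%
(missing) | 4425 | 3.4%
(missing) | 2819 | 2.2%
(missing) | 2493 | 1.9%
(missing) | 1923 | 1.5%
Other values (204) | 44948 | 34.9%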

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 101225 | 78.5%
Space Separator | 24287 | 18.8%
Decimal Number | 3391 | 2.6%

Most frequent character per category

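Other Letter
Value | Count | Frequency (%)
(missing) | 10226 | 10.1%
(missing) | 10157 | 10.0%
(missing) | 10020 | 9.9%
(missing) | 9778 | 9.7%
(missing) | 7827 | 7.7%
(missing) | 4425 | 4.4%
(missing) | 2819 | 2.8%
(missing) | 2493 | 2.5%
(missing) | 1923 | 1.9%
(missing) | 1732 | 1.7%
Other values (194) | 39825 | 39.3%

Decimal Number
Value | Count | Frequency (%)
1 | 1293 | 38.1%
2 | 1209 | 35.7%
3 | 558 | 16.5%
4 | 167 | 4.9%
6 | 46 | 1.4%
5 | 41 | 1.2%
7 | 36 | 1.1%
8 | 26 | 0.8%
9 | 15 | 0.4%

Space Separator
Value | Count | Frequency (%)
(space) | 24287 | 100.0%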

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 101225 | 78.5%
Common | 27678 | 21.5%

Most frequent character per script

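Hangul
Value | Count | Frequency (%)
(missing) | 10226 | 10.1%
(missing) | 10157 | 10.0%
(missing) | 10020 | 9.9%
(missing) | 9778 | 9.7%
(missing) | 7827 | 7.7%
(missing) | 4425 | 4.4%
(missing) | 2819 | 2.8%
(missing) | 2493 | 2.5%
(missing) | 1923 | 1.9%
(missing) | 1732 | 1.7%
Other values (194) | 39825 | 39.3%

Common
Value | Count | Frequency (%)
(space) | 24287 | 87.7%
1 | 1293 | 4.7%
2 | 1209 | 4.4%
3 | 558 | 2.0%
4 | 167 | 0.6%
6 | 46 | 0.2%
5 | 41 | 0.1%
7 | 36 | 0.1%
8 | 26 | 0.1%
9 | 15 | 0.1%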

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 101225 | 78.5%
ASCII | 27678 | 21.5%

Most frequent character per block

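ASCII
Value | Count | Frequency (%)
(space) | 24287 | 87.7%
1 | 1293 | 4.7%
2 | 1209 | 4.4%
3 | 558 | 2.0%
4 | 167 | 0.6%
6 | 46 | 0.2%
5 | 41 | 0.1%
7 | 36 | 0.1%
8 | 26 | 0.1%
9 | 15 | 0.1%

Hangul
Value | Count | Frequency (%)
(missing) | 10226 | 10.1%
(missing) | 10157 | 10.0%
(missing) | 10020 | 9.9%
(missing) | 9778 | 9.7%
(missing) | 7827 | 7.7%
(missing) | 4425 | 4.4%
(missing) | 2819 | 2.8%
(missing) | 2493 | 2.5%
(missing) | 1923 | 1.9%
(missing) | 1732 | 1.7%
Other values (194) | 39825 | 39.3%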

남자 평균연령 (male mean age)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 283
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.55281
Minimum: 29.2
Maximum: 61.3
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 29.2
5-th percentile: 34
Q1: 37.1
median: 39.8
Q3: 43.225
95-th percentile: 49.7
Maximum: 61.3
Range: 32.1
Interquartile range (IQR): 6.125

Descriptive statistics

Standard deviation: 4.7816107
Coefficient of variation (CV): 0.11791071
Kurtosis: 0.43535603
Mean: 40.55281
Median Absolute Deviation (MAD): 3
Skewness: 0.74527776
Sum: 405528.1
Variance: 22.8638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
39.4 | 111 | 1.1%
38.0 | 105 | 1.1%
37.4 | 103 | 1.0%
40.8 | 103 | 1.0%
39.7 | 102 | 1.0%
36.8 | 102 | 1.0%
37.8 | 101 | 1.0%
36.9 | 98 | 1.0%
40.0 | 98 | 1.0%
36.5 | 98 | 1.0%
Other values (273) | 8979 | 89.8%

Minimum 10 values
Value | Count | Frequency (%)
29.2 | 1 | < 0.1%
29.3 | 2 | < 0.1%
29.4 | 2 | < 0.1%
29.6 | 1 | < 0.1%
29.7 | 1 | < 0.1%
29.8 | 1 | < 0.1%
30.1 | 4 | < 0.1%
30.2 | 3 | < 0.1%
30.4 | 4 | < 0.1%
30.5 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
61.3 | 1 | < 0.1%
61.0 | 1 | < 0.1%
60.5 | 1 | < 0.1%
60.3 | 1 | < 0.1%
59.3 | 1 | < 0.1%
59.0 | 1 | < 0.1%
58.3 | 1 | < 0.1%
58.0 | 1 | < 0.1%
57.9 | 1 | < 0.1%
57.8 | 1 | < 0.1%

여자 평균연령 (female mean age)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 315
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42.64002
Minimum: 29.9
Maximum: 63.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 29.9
5-th percentile: 35.3
Q1: 38.7
median: 41.7
Q3: 45.7
95-th percentile: 53.4
Maximum: 63.5
Range: 33.6
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.4652749
Coefficient of variation (CV): 0.12817243
Kurtosis: 0.38115579
Mean: 42.64002
Median Absolute Deviation (MAD): 3.4
Skewness: 0.7618138
Sum: 426400.2
Variance: 29.869229
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
41.1 | 101 | 1.0%
40.4 | 101 | 1.0%
39.6 | 96 | 1.0%
40.8 | 93 | 0.9%
40.9 | 92 | 0.9%
38.2 | 92 | 0.9%
39.5 | 92 | 0.9%
38.7 | 91 | 0.9%
39.9 | 89 | 0.9%
39.3 | 89 | 0.9%
Other values (305) | 9064 | 90.6%

Minimum 10 values
Value | Count | Frequency (%)
29.9 | 1 | < 0.1%
30.2 | 1 | < 0.1%
30.4 | 1 | < 0.1%
30.5 | 2 | < 0.1%
30.6 | 3 | < 0.1%
30.7 | 4 | < 0.1%
30.8 | 1 | < 0.1%
30.9 | 2 | < 0.1%
31.0 | 1 | < 0.1%
31.1 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
63.5 | 1 | < 0.1%
62.6 | 1 | < 0.1%
62.5 | 2 | < 0.1%
62.2 | 2 | < 0.1%
61.9 | 1 | < 0.1%
61.8 | 1 | < 0.1%
61.5 | 3 | < 0.1%
61.3 | 1 | < 0.1%
61.2 | 1 | < 0.1%
61.1 | 2 | < 0.1%

평균연령 (mean age)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 296
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.57275
Minimum: 29.8
Maximum: 61.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 29.8
5-th percentile: 34.7
Q1: 37.9
median: 40.8
Q3: 44.4
95-th percentile: 51.5
Maximum: 61.1
Range: 31.3
Interquartile range (IQR): 6.5

Descriptive statistics

Standard deviation: 5.0740086
Coefficient of variation (CV): 0.12205131
Kurtosis: 0.35391857
Mean: 41.57275
Median Absolute Deviation (MAD): 3.2
Skewness: 0.73656361
Sum: 415727.5
Variance: 25.745563
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
39.4 | 112 | 1.1%
40.0 | 106 | 1.1%
38.7 | 102 | 1.0%
39.7 | 100 | 1.0%
37.4 | 99 | 1.0%
38.8 | 99 | 1.0%
39.1 | 96 | 1.0%
38.6 | 96 | 1.0%
40.3 | 96 | 1.0%
41.1 | 95 | 0.9%
Other values (286) | 8999 | 90.0%

Minimum 10 values
Value | Count | Frequency (%)
29.8 | 2 | < 0.1%
29.9 | 3 | < 0.1%
30.0 | 2 | < 0.1%
30.2 | 1 | < 0.1%
30.3 | 2 | < 0.1%
30.4 | 1 | < 0.1%
30.5 | 1 | < 0.1%
30.6 | 1 | < 0.1%
30.7 | 3 | < 0.1%
30.8 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
61.1 | 1 | < 0.1%
59.8 | 1 | < 0.1%
59.3 | 1 | < 0.1%
59.2 | 2 | < 0.1%
59.1 | 2 | < 0.1%
59.0 | 2 | < 0.1%
58.9 | 2 | < 0.1%
58.7 | 1 | < 0.1%
58.6 | 3 | < 0.1%
58.5 | 2 | < 0.1%

Interactions

[Interaction plots: 25 pairwise plots of the numeric variables; images not included]

Correlations

Variable | 연도 | (name missing) | 행정구역구분명 | 남자 평균연령 | 여자 평균연령 | 평균연령
연도 | 1.000 | 0.124 | 0.000 | 0.498 | 0.461 | 0.479
(name missing) | 0.124 | 1.000 | 0.021 | 0.000 | 0.000 | 0.015
행정구역구분명 | 0.000 | 0.021 | 1.000 | 0.116 | 0.130 | 0.127
남자 평균연령 | 0.498 | 0.000 | 0.116 | 1.000 | 0.969 | 0.978
여자 평균연령 | 0.461 | 0.000 | 0.130 | 0.969 | 1.000 | 0.995
평균연령 | 0.479 | 0.015 | 0.127 | 0.978 | 0.995 | 1.000

Variable | 연도 | (name missing) | 남자 평균연령 | 여자 평균연령 | 평균연령 | 행정구역구분명
연도 | 1.000 | -0.094 | 0.468 | 0.425 | 0.448 | 0.000
(name missing) | -0.094 | 1.000 | -0.012 | -0.012 | -0.012 | 0.012
남자 평균연령 | 0.468 | -0.012 | 1.000 | 0.989 | 0.997 | 0.070
여자 평균연령 | 0.425 | -0.012 | 0.989 | 1.000 | 0.997 | 0.078
평균연령 | 0.448 | -0.012 | 0.997 | 0.997 | 1.000 | 0.076
행정구역구분명 | 0.000 | 0.012 | 0.070 | 0.078 | 0.076 | 1.000
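
The High correlation alerts near the top of the report can be related to these numbers. The sketch below applies a cutoff of 0.5 to the first correlation matrix; the cutoff is an assumption (the report does not state it), and `high_correlations` is a hypothetical helper. With that cutoff each of the three age variables is flagged against the other two, which agrees with the "and 1 other field" wording of the alerts, while 연도 at 0.498 falls just below it.

```python
# Labels and values copied from the first correlation matrix.
# "(name missing)" stands for the column whose name is absent in this text.
labels = ["연도", "(name missing)", "행정구역구분명",
          "남자 평균연령", "여자 평균연령", "평균연령"]
matrix = [
    [1.000, 0.124, 0.000, 0.498, 0.461, 0.479],
    [0.124, 1.000, 0.021, 0.000, 0.000, 0.015],
    [0.000, 0.021, 1.000, 0.116, 0.130, 0.127],
    [0.498, 0.000, 0.116, 1.000, 0.969, 0.978],
    [0.461, 0.000, 0.130, 0.969, 1.000, 0.995],
    [0.479, 0.015, 0.127, 0.978, 0.995, 1.000],
]
THRESHOLD = 0.5  # assumed cutoff for a "high correlation" alert


def high_correlations(labels, matrix, threshold):
    """Map each variable to the other variables whose |correlation| exceeds the threshold."""
    flagged = {}
    for i, name in enumerate(labels):
        partners = [labels[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) > threshold]
        if partners:
            flagged[name] = partners
    return flagged


flagged = high_correlations(labels, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```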

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

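(index) | 연도 | (name missing) | 행정구역구분명 | 행정구역명 | 남자 평균연령 | 여자 평균연령 | 평균연령
61870 | 2015 | 11 | 읍면동 | 경기도 안성시 안성3동 | 37.3 | 39.6 | 38.5
44246 | 2018 | 4 | 읍면동 | 경기도 양평군 양서면 | 47.0 | 48.9 | 48.0
96987 | 2010 | 12 | 읍면동 | 경기도 고양시 덕양구 주교동 | 36.9 | 39.1 | 38.0
81694 | 2013 | 2 | 읍면동 | 경기도 성남시 중원구 상대원1동 | 37.2 | 38.8 | 38.0
46515 | 2017 | 12 | (label missing) | 경기도 성남시 중원구 | 41.0 | 42.8 | 41.9
40663 | 2018 | 10 | 읍면동 | 경기도 의정부시 송산2동 | 36.1 | 37.8 | 36.9
42016 | 2018 | 7 | 읍면동 | 경기도 가평군 청평면 | 46.7 | 49.7 | 48.2
24542 | 2021 | 1 | 읍면동 | 경기도 하남시 감북동 | 47.8 | 50.3 | 48.9
56733 | 2016 | 7 | 읍면동 | 경기도 군포시 금정동 | 40.5 | 42.2 | 41.3
34410 | 2019 | 8 | 읍면동 | 경기도 안산시 상록구 해양동 | 37.9 | 38.9 | 38.4

(index) | 연도 | (name missing) | 행정구역구분명 | 행정구역명 | 남자 평균연령 | 여자 평균연령 | 평균연령
65335 | 2015 | 5 | 읍면동 | 경기도 부천시 소사구 송내1동 | 38.8 | 40.8 | 39.8
60509 | 2016 | 1 | 읍면동 | 경기도 성남시 분당구 정자1동 | 38.6 | 38.7 | 38.7
66431 | 2015 | 3 | 읍면동 | 경기도 고양시 일산동구 식사동 | 35.0 | 35.9 | 35.5
16829 | 2022 | 1 | 읍면동 | 경기도 광명시 광명5동 | 43.8 | 46.8 | 45.3
19790 | 2021 | 9 | 읍면동 | 경기도 화성시 향남읍 | 36.9 | 37.6 | 37.2
77637 | 2013 | 9 | 읍면동 | 경기도 안양시 만안구 석수3동 | 36.2 | 38.3 | 37.2
32468 | 2019 | 11 | 읍면동 | 경기도 남양주시 화도읍 | 40.0 | 42.0 | 41.0
57844 | 2016 | 6 | 읍면동 | 경기도 화성시 양감면 | 47.3 | 51.4 | 49.0
10665 | 2022 | 11 | 읍면동 | 경기도 고양시 덕양구 흥도동 | 39.6 | 40.4 | 40.0
86087 | 2012 | 7 | 읍면동 | 경기도 용인시 기흥구 동백동 | 32.1 | 33.7 | 32.9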

Duplicate rows

Most frequently occurring

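(index) | 연도 | (name missing) | 행정구역구분명 | 행정구역명 | 남자 평균연령 | 여자 평균연령 | 평균연령 | # duplicates
0 | 2023 | 3 | 읍면동 | 경기도 고양시 일산서구 주엽1동 | 43.2 | 46.1 | 44.7 | 2
1 | 2023 | 3 | 읍면동 | 경기도 파주시 금촌1동 | 45.3 | 48.3 | 46.7 | 2
2 | 2023 | 3 | 읍면동 | 경기도 화성시 매송면 | 49.9 | 53.4 | 51.6 | 2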
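
A count like the one reported here can be obtained by grouping identical records. The sketch below uses only the Python standard library and the three duplicated records listed in this section, plus one non-duplicated record from the Sample section for contrast. It computes two plausible readings of "duplicate rows" (distinct repeated records, and surplus copies); which one the report uses is not stated here, and both give 3 for this data.

```python
from collections import Counter

# The three duplicated records listed in this section (each occurs twice).
dup_rows = [
    (2023, 3, "읍면동", "경기도 고양시 일산서구 주엽1동", 43.2, 46.1, 44.7),
    (2023, 3, "읍면동", "경기도 파주시 금촌1동", 45.3, 48.3, 46.7),
    (2023, 3, "읍면동", "경기도 화성시 매송면", 49.9, 53.4, 51.6),
]
# One non-duplicated record from the Sample section, for contrast.
unique_row = (2015, 11, "읍면동", "경기도 안성시 안성3동", 37.3, 39.6, 38.5)
rows = dup_rows * 2 + [unique_row]

counts = Counter(rows)
duplicate_groups = {row: n for row, n in counts.items() if n > 1}

n_groups = len(duplicate_groups)                         # distinct records that repeat
n_extra = sum(n - 1 for n in duplicate_groups.values())  # copies beyond the first
print(n_groups, n_extra)
```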