Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 3456
Missing cells (%): 4.9%
Duplicate rows: 131
Duplicate rows (%): 1.3%
Total size in memory: 673.8 KiB
Average record size in memory: 69.0 B
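
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell share is taken over all cells (rows × columns) and that 1 KiB is 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Raw counts copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 3_456
duplicate_rows = 131
total_size_kib = 673.8

# Assumption: the missing-cell share is taken over every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# The duplicate share is taken over rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}% missing, {duplicate_pct:.1f}% duplicates, "
      f"{avg_record_bytes:.1f} B per record")
```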

Variable types

Numeric: 5
Categorical: 1
Text: 1

Alerts

Dataset has 131 (1.3%) duplicate rows [Duplicates]
총 말소 인구수 is highly overall correlated with 남자 말소 인구수 and 2 other fields [High correlation]
남자 말소 인구수 is highly overall correlated with 총 말소 인구수 and 2 other fields [High correlation]
여자 말소 인구수 is highly overall correlated with 총 말소 인구수 and 2 other fields [High correlation]
행정구역구분명 is highly overall correlated with 총 말소 인구수 and 2 other fields [High correlation]
행정구역구분명 is highly imbalanced (75.0%) [Imbalance]
남자 말소 인구수 has 1728 (17.3%) missing values [Missing]
여자 말소 인구수 has 1728 (17.3%) missing values [Missing]
총 말소 인구수 is highly skewed (γ1 = 24.80766038) [Skewed]
남자 말소 인구수 is highly skewed (γ1 = 24.85112386) [Skewed]
여자 말소 인구수 is highly skewed (γ1 = 24.7628446) [Skewed]
총 말소 인구수 has 173 (1.7%) zeros [Zeros]
남자 말소 인구수 has 424 (4.2%) zeros [Zeros]
여자 말소 인구수 has 504 (5.0%) zeros [Zeros]
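
For reference, γ1 in the skewness alerts is the standardized third moment. A minimal sketch using the plain moment definition on made-up numbers; the report's own estimator may apply a bias correction, so this illustrates the quantity and does not reproduce the values above:

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: mostly small counts plus one very large value,
# mimicking the long right tail flagged by the alerts.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 100]

print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_tailed))  # positive: the tail pulls to the right
```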

Reproduction

Analysis started: 2024-04-11 02:12:20.589029
Analysis finished: 2024-04-11 02:12:25.035371
Duration: 4.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
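
The reported duration is the difference between the two timestamps. A short check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-04-11 02:12:20.589029")
finished = datetime.fromisoformat("2024-04-11 02:12:25.035371")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```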

Variables

연도 (year)
Real number (ℝ)

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2017.7166
Minimum: 2011
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2011
5-th percentile: 2012
Q1: 2015
median: 2018
Q3: 2021
95-th percentile: 2023
Maximum: 2024
Range: 13
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.7641867
Coefficient of variation (CV): 0.0018655676
Kurtosis: -1.2222806
Mean: 2017.7166
Median Absolute Deviation (MAD): 3
Skewness: -0.13539365
Sum: 20177166
Variance: 14.169101
Monotonicity: Not monotonic
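
Several of these statistics are derived from others in the same block. A minimal sketch of those relationships using the values reported for 연도; the variable names are illustrative:

```python
# Values copied from the statistics for 연도 above.
minimum, maximum = 2011, 2024
q1, q3 = 2015, 2021
mean = 2017.7166
std = 3.7641867
n = 10_000

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr, cv, variance, round(total))
```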
Histogram with fixed size bins (bins=14)

Common values
Value             Count  Frequency (%)
2022              1335   13.4%
2015              781    7.8%
2023              771    7.7%
2021              770    7.7%
2018              748    7.5%
2019              729    7.3%
2017              726    7.3%
2020              723    7.2%
2016              717    7.2%
2014              708    7.1%
Other values (4)  1992   19.9%

Minimum 10 values
Value  Count  Frequency (%)
2011   405    4.0%
2012   686    6.9%
2013   694    6.9%
2014   708    7.1%
2015   781    7.8%
2016   717    7.2%
2017   726    7.3%
2018   748    7.5%
2019   729    7.3%
2020   723    7.2%

Maximum 10 values
Value  Count  Frequency (%)
2024   207    2.1%
2023   771    7.7%
2022   1335   13.4%
2021   770    7.7%
2020   723    7.2%
2019   729    7.3%
2018   748    7.5%
2017   726    7.3%
2016   717    7.2%
2015   781    7.8%


(unnamed column; values 1 to 12, presumably the month)
Real number (ℝ)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.7467
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 4
median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.5555595
Coefficient of variation (CV): 0.5270072
Kurtosis: -1.2925189
Mean: 6.7467
Median Absolute Deviation (MAD): 3
Skewness: -0.10304788
Sum: 67467
Variance: 12.642003
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values
Value             Count  Frequency (%)
11                1411   14.1%
1                 875    8.8%
6                 813    8.1%
12                811    8.1%
4                 798    8.0%
9                 791    7.9%
2                 766    7.7%
7                 759    7.6%
3                 753    7.5%
8                 747    7.5%
Other values (2)  1476   14.8%

Minimum 10 values
Value  Count  Frequency (%)
1      875    8.8%
2      766    7.7%
3      753    7.5%
4      798    8.0%
5      738    7.4%
6      813    8.1%
7      759    7.6%
8      747    7.5%
9      791    7.9%
10     738    7.4%

Maximum 10 values
Value  Count  Frequency (%)
12     811    8.1%
11     1411   14.1%
10     738    7.4%
9      791    7.9%
8      747    7.5%
7      759    7.6%
6      813    8.1%
5      738    7.4%
4      798    8.0%
3      753    7.5%

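행정구역구분명 (administrative division type)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
읍면동 (eup/myeon/dong): 9175
시군 (si/gun): 499
[label not preserved]: 311
[label not preserved]: 15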

Length

Max length: 3
Median length: 3
Mean length: 2.8849
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 읍면동
2nd row: 읍면동
3rd row: 읍면동
4th row: 읍면동
5th row: 읍면동

Common Values

Value                  Count  Frequency (%)
읍면동                 9175   91.8%
시군                   499    5.0%
[label not preserved]  311    3.1%
[label not preserved]  15     0.1%
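
The 75.0% imbalance figure reported for this variable is consistent with an entropy-based score: one minus the Shannon entropy of the class frequencies divided by its maximum value. A minimal sketch under that assumption; `imbalance_score` is an illustrative name, not the library's API:

```python
import math

def imbalance_score(counts):
    """1 - H / H_max: 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        raise ValueError("need at least two non-empty classes")
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))

# Category counts from the Common Values table.
counts = [9175, 499, 311, 15]
print(f"{100 * imbalance_score(counts):.1f}%")
```

A perfectly balanced column such as `[5, 5, 5, 5]` scores 0 under this definition.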

Length

Histogram of lengths of the category

Common Values (Plot)

Value                  Count  Frequency (%)
읍면동                 9175   91.8%
시군                   499    5.0%
[label not preserved]  311    3.1%
[label not preserved]  15     0.1%

행정구역명 (administrative district name)
Text

Distinct: 1012
Distinct (%): 10.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 12.9082
Min length: 3

Characters and Unicode

Total characters: 129082
Distinct characters: 213
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
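
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two strings are taken from the Sample list of this variable; the counts it prints cover only these two strings, not the whole column:

```python
import unicodedata
from collections import Counter

# Two sample values from this variable.
samples = ["경기도 광명시 광명4동", "경기도 시흥시 매화동"]

categories = Counter()
for text in samples:
    for ch in text:
        # unicodedata.category returns the two-letter general category,
        # e.g. "Lo" (Other Letter), "Zs" (Space Separator), "Nd" (Decimal Number).
        categories[unicodedata.category(ch)] += 1

print(dict(categories))
```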

Unique

Unique: 152
Unique (%): 1.5%

Sample

1st row: 경기도 시흥시 매화동
2nd row: 경기도 광명시 광명4동
3rd row: 경기도 고양시 일산서구 일산3동
4th row: 경기도 성남시 분당구 야탑2동
5th row: 경기도 성남시 분당구 정자동

Value               Count  Frequency (%)
경기도              10000  30.0%
성남시              845    2.5%
수원시              772    2.3%
고양시              752    2.3%
용인시              609    1.8%
안양시              571    1.7%
안산시              465    1.4%
부천시              464    1.4%
평택시              433    1.3%
화성시              421    1.3%
Other values (675)  17971  54.0%

Most occurring characters

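Value               Count  Frequency (%)
(space)             24428  18.9%
[not preserved]     10267  8.0%
[not preserved]     10139  7.9%
[not preserved]     10019  7.8%
[not preserved]     9709   7.5%
[not preserved]     7760   6.0%
[not preserved]     4413   3.4%
[not preserved]     2866   2.2%
[not preserved]     2491   1.9%
[not preserved]     1893   1.5%
Other values (203)  45097  34.9%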

Most occurring categories

Value            Count   Frequency (%)
Other Letter     101273  78.5%
Space Separator  24428   18.9%
Decimal Number   3381    2.6%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[not preserved]     10267  10.1%
[not preserved]     10139  10.0%
[not preserved]     10019  9.9%
[not preserved]     9709   9.6%
[not preserved]     7760   7.7%
[not preserved]     4413   4.4%
[not preserved]     2866   2.8%
[not preserved]     2491   2.5%
[not preserved]     1893   1.9%
[not preserved]     1735   1.7%
Other values (193)  39981  39.5%

Decimal Number
Value  Count  Frequency (%)
2      1262   37.3%
1      1209   35.8%
3      584    17.3%
4      151    4.5%
5      48     1.4%
7      47     1.4%
6      39     1.2%
8      24     0.7%
9      17     0.5%

Space Separator
Value    Count  Frequency (%)
(space)  24428  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Hangul  101273  78.5%
Common  27809   21.5%

Most frequent character per script

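Hangul
Value               Count  Frequency (%)
[not preserved]     10267  10.1%
[not preserved]     10139  10.0%
[not preserved]     10019  9.9%
[not preserved]     9709   9.6%
[not preserved]     7760   7.7%
[not preserved]     4413   4.4%
[not preserved]     2866   2.8%
[not preserved]     2491   2.5%
[not preserved]     1893   1.9%
[not preserved]     1735   1.7%
Other values (193)  39981  39.5%

Common
Value    Count  Frequency (%)
(space)  24428  87.8%
2        1262   4.5%
1        1209   4.3%
3        584    2.1%
4        151    0.5%
5        48     0.2%
7        47     0.2%
6        39     0.1%
8        24     0.1%
9        17     0.1%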

Most occurring blocks

Value   Count   Frequency (%)
Hangul  101273  78.5%
ASCII   27809   21.5%

Most frequent character per block

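ASCII
Value    Count  Frequency (%)
(space)  24428  87.8%
2        1262   4.5%
1        1209   4.3%
3        584    2.1%
4        151    0.5%
5        48     0.2%
7        47     0.2%
6        39     0.1%
8        24     0.1%
9        17     0.1%

Hangul
Value               Count  Frequency (%)
[not preserved]     10267  10.1%
[not preserved]     10139  10.0%
[not preserved]     10019  9.9%
[not preserved]     9709   9.6%
[not preserved]     7760   7.7%
[not preserved]     4413   4.4%
[not preserved]     2866   2.8%
[not preserved]     2491   2.5%
[not preserved]     1893   1.9%
[not preserved]     1735   1.7%
Other values (193)  39981  39.5%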

총 말소 인구수 (total deregistered population)
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS

Distinct: 324
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.76
Minimum: 0
Maximum: 6579
Zeros: 173
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
median: 9
Q3: 14
95-th percentile: 104
Maximum: 6579
Range: 6579
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 212.15549
Coefficient of variation (CV): 7.3767555
Kurtosis: 658.50342
Mean: 28.76
Median Absolute Deviation (MAD): 4
Skewness: 24.80766
Sum: 287600
Variance: 45009.951
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
6                   749    7.5%
5                   722    7.2%
7                   716    7.2%
8                   710    7.1%
4                   663    6.6%
9                   606    6.1%
10                  584    5.8%
3                   545    5.5%
11                  518    5.2%
12                  414    4.1%
Other values (314)  3773   37.7%

Minimum 10 values
Value  Count  Frequency (%)
0      173    1.7%
1      269    2.7%
2      388    3.9%
3      545    5.5%
4      663    6.6%
5      722    7.2%
6      749    7.5%
7      716    7.2%
8      710    7.1%
9      606    6.1%

Maximum 10 values
Value  Count  Frequency (%)
6579   1      < 0.1%
6482   1      < 0.1%
6298   1      < 0.1%
6297   1      < 0.1%
6139   1      < 0.1%
5381   1      < 0.1%
5375   1      < 0.1%
5174   1      < 0.1%
5128   1      < 0.1%
4797   1      < 0.1%

남자 말소 인구수 (male deregistered population)
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED  ZEROS

Distinct: 211
Distinct (%): 2.6%
Missing: 1728
Missing (%): 17.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.895551
Minimum: 0
Maximum: 3573
Zeros: 424
Zeros (%): 4.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 5
Q3: 8
95-th percentile: 58
Maximum: 3573
Range: 3573
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 118.32707
Coefficient of variation (CV): 7.4440367
Kurtosis: 654.75042
Mean: 15.895551
Median Absolute Deviation (MAD): 3
Skewness: 24.851124
Sum: 131488
Variance: 14001.295
Monotonicity: Not monotonic
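
The mean reported here is consistent with dividing the sum by the number of non-missing observations instead of by all 10000 rows. A minimal check under that assumption, using the figures reported for 남자 말소 인구수:

```python
# Values copied from the statistics for 남자 말소 인구수 above.
n_observations = 10_000
n_missing = 1_728
total = 131_488

# Assumption: missing rows are skipped, so the mean divides by the
# number of non-missing observations.
n_valid = n_observations - n_missing
mean = total / n_valid

print(n_valid, mean)
```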
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
3                   1044   10.4%
4                   935    9.3%
2                   859    8.6%
5                   758    7.6%
1                   685    6.9%
6                   677    6.8%
7                   569    5.7%
8                   440    4.4%
0                   424    4.2%
9                   288    2.9%
Other values (201)  1593   15.9%
(Missing)           1728   17.3%

Minimum 10 values
Value  Count  Frequency (%)
0      424    4.2%
1      685    6.9%
2      859    8.6%
3      1044   10.4%
4      935    9.3%
5      758    7.6%
6      677    6.8%
7      569    5.7%
8      440    4.4%
9      288    2.9%

Maximum 10 values
Value  Count  Frequency (%)
3573   1      < 0.1%
3443   1      < 0.1%
3411   1      < 0.1%
3370   1      < 0.1%
3336   1      < 0.1%
2948   1      < 0.1%
2908   1      < 0.1%
2834   1      < 0.1%
2759   1      < 0.1%
2599   1      < 0.1%

여자 말소 인구수 (female deregistered population)
Real number (ℝ)

HIGH CORRELATION  MISSING  SKEWED  ZEROS

Distinct: 188
Distinct (%): 2.3%
Missing: 1728
Missing (%): 17.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.699105
Minimum: 0
Maximum: 3039
Zeros: 504
Zeros (%): 5.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 4
Q3: 7
95-th percentile: 49
Maximum: 3039
Range: 3039
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 101.03012
Coefficient of variation (CV): 7.3749429
Kurtosis: 652.06874
Mean: 13.699105
Median Absolute Deviation (MAD): 2
Skewness: 24.762845
Sum: 113319
Variance: 10207.085
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
3                   1145   11.5%
2                   1064   10.6%
4                   994    9.9%
1                   856    8.6%
5                   802    8.0%
6                   610    6.1%
0                   504    5.0%
7                   446    4.5%
8                   336    3.4%
9                   234    2.3%
Other values (178)  1281   12.8%
(Missing)           1728   17.3%

Minimum 10 values
Value  Count  Frequency (%)
0      504    5.0%
1      856    8.6%
2      1064   10.6%
3      1145   11.5%
4      994    9.9%
5      802    8.0%
6      610    6.1%
7      446    4.5%
8      336    3.4%
9      234    2.3%

Maximum 10 values
Value  Count  Frequency (%)
3039   1      < 0.1%
3006   1      < 0.1%
2928   1      < 0.1%
2886   1      < 0.1%
2803   1      < 0.1%
2473   1      < 0.1%
2427   1      < 0.1%
2369   1      < 0.1%
2340   1      < 0.1%
2209   1      < 0.1%

Interactions

[Figure: 5 × 5 grid of pairwise scatter plots for the five numeric variables]

Correlations

                  연도    (unnamed)  행정구역구분명  총 말소 인구수  남자 말소 인구수  여자 말소 인구수
연도              1.000   0.338      0.000           0.032           0.016             0.021
(unnamed)         0.338   1.000      0.000           0.000           0.018             0.027
행정구역구분명    0.000   0.000      1.000           0.709           0.741             0.648
총 말소 인구수    0.032   0.000      0.709           1.000           1.000             0.997
남자 말소 인구수  0.016   0.018      0.741           1.000           1.000             0.944
여자 말소 인구수  0.021   0.027      0.648           0.997           0.944             1.000

                  연도    (unnamed)  총 말소 인구수  남자 말소 인구수  여자 말소 인구수  행정구역구분명
연도              1.000   -0.025     0.159           0.136             0.161             0.000
(unnamed)         -0.025  1.000      0.010           0.007             0.005             0.000
총 말소 인구수    0.159   0.010      1.000           0.913             0.877             0.577
남자 말소 인구수  0.136   0.007      0.913           1.000             0.648             0.578
여자 말소 인구수  0.161   0.005      0.877           0.648             1.000             0.578
행정구역구분명    0.000   0.000      0.577           0.578             0.578             1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
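
Nullity correlation can be sketched as an ordinary Pearson correlation computed on missing/present indicators instead of on the values themselves. The rows below are hypothetical and chosen so that both columns are missing together; whether the two columns in this dataset are missing in the same rows is not stated in the text above:

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical rows: (male, female) counts, None marks a missing cell.
rows = [(1, 4), (None, None), (3, 10), (None, None), (5, 4), (0, 1)]

# Nullity indicators: 1 where the value is missing, 0 where present.
male_null = [int(m is None) for m, _ in rows]
female_null = [int(f is None) for _, f in rows]

# Close to 1 when the two columns are always missing together.
print(pearson(male_null, female_null))
```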

Sample

First rows
(index)  연도  (unnamed)  행정구역구분명  행정구역명                      총 말소 인구수  남자 말소 인구수  여자 말소 인구수
72507    2015  3          읍면동          경기도 시흥시 매화동            5               1                 4
30423    2020  12         읍면동          경기도 광명시 광명4동           8               4                 4
46020    2018  10         읍면동          경기도 고양시 일산서구 일산3동  13              3                 10
38892    2019  10         읍면동          경기도 성남시 분당구 야탑2동    9               5                 4
46794    2018  9          읍면동          경기도 성남시 분당구 정자동     1               0                 1
67648    2015  11         시군            경기도 안산시                   196             98                98
44756    2018  12         읍면동          경기도 가평군 청평면            13              6                 7
67577    2015  11         읍면동          경기도 성남시 중원구 상대원2동  10              8                 2
61884    2016  8          시군            경기도 과천시                   16              8                 8
46734    2018  9          읍면동          경기도 동두천시 중앙동          8               6                 1

Last rows
(index)  연도  (unnamed)  행정구역구분명  행정구역명                      총 말소 인구수  남자 말소 인구수  여자 말소 인구수
46822    2018  9          읍면동          경기도 성남시 중원구 은행1동    3               1                 2
35560    2020  4          읍면동          경기도 연천군 청산면            2               2                 0
30       2024  3          읍면동          경기도 고양시 덕양구 흥도동     9               5                 4
59249    2017  1          읍면동          경기도 용인시 수지구 풍덕천1동  11              5                 6
61733    2016  9          읍면동          경기도 이천시 중리동            5               2                 3
81339    2013  12         읍면동          경기도 과천시 별양동            8               1                 7
45674    2018  11         읍면동          경기도 안산시 단원구 선부1동    4               1                 3
18969    2022  6          읍면동          경기도 성남시 수정구 신흥1동    15              9                 6
43435    2019  3          읍면동          경기도 파주시 운정1동           14              8                 6
45360    2018  12         읍면동          경기도 화성시 화산동            8               5                 3

Duplicate rows

Most frequently occurring

(index)  연도  (unnamed)  행정구역구분명   행정구역명                      총 말소 인구수  남자 말소 인구수  여자 말소 인구수  # duplicates
82       2022  11         읍면동           경기도 안양시 동안구 귀인동     4               1                 3                 4
7        2022  11         [not preserved]  경기도 성남시 수정구            128             70                58                3
10       2022  11         시군             경기도 성남시                   436             223               213               3
21       2022  11         읍면동           경기도 고양시 일산동구 백석2동  11              5                 6                 3
23       2022  11         읍면동           경기도 고양시 일산서구 덕이동   16              7                 9                 3
27       2022  11         읍면동           경기도 광주시 경안동            16              11 or 1           5 or 15           3
38       2022  11         읍면동           경기도 김포시 풍무동            26              13                13                3
44       2022  11         읍면동           경기도 성남시 분당구 수내2동    1               0                 1                 3
57       2022  11         읍면동           경기도 수원시 영통구 광교2동    6               4                 2                 3
68       2022  11         읍면동           경기도 시흥시 정왕본동          12              8                 4                 3
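
A duplicate count of this kind can be sketched by tallying identical rows. The example below uses a handful of hypothetical tuples shaped like the table above (year, unnamed 1 to 12 column, division type, district name, total, male, female); it illustrates the idea and is not the report's own implementation:

```python
from collections import Counter

# Hypothetical rows shaped like the table above.
rows = [
    (2022, 11, "읍면동", "경기도 안양시 동안구 귀인동", 4, 1, 3),
    (2022, 11, "읍면동", "경기도 안양시 동안구 귀인동", 4, 1, 3),
    (2022, 11, "읍면동", "경기도 시흥시 정왕본동", 12, 8, 4),
    (2022, 11, "읍면동", "경기도 안양시 동안구 귀인동", 4, 1, 3),
]

occurrences = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in occurrences.most_common() if n > 1]

for row, n in duplicates:
    print(n, row)
```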