Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 4836
Missing cells (%): 6.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 673.8 KiB
Average record size in memory: 69.0 B
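
The summary figures above can be cross-checked with a little arithmetic. The sketch below assumes that the missing-cell percentage is taken over all cells (observations × variables) and that the average record size is the total memory divided by the number of observations. The variable names are illustrative and are not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 4_836
total_memory_kib = 673.8

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total memory / number of rows (1 KiB = 1024 B).
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report (6.9% and 69.0 B).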

Variable types

Numeric: 5
Categorical: 1
Text: 1

Alerts

총 등록 인구수 is highly overall correlated with 남자 등록 인구수 and 2 other fields (High correlation)
남자 등록 인구수 is highly overall correlated with 총 등록 인구수 and 2 other fields (High correlation)
여자 등록 인구수 is highly overall correlated with 총 등록 인구수 and 2 other fields (High correlation)
행정구역구분명 is highly overall correlated with 총 등록 인구수 and 2 other fields (High correlation)
행정구역구분명 is highly imbalanced (74.5%) (Imbalance)
남자 등록 인구수 has 2418 (24.2%) missing values (Missing)
여자 등록 인구수 has 2418 (24.2%) missing values (Missing)
총 등록 인구수 is highly skewed (γ1 = 22.50062442) (Skewed)
남자 등록 인구수 is highly skewed (γ1 = 21.27294731) (Skewed)
여자 등록 인구수 is highly skewed (γ1 = 21.34806904) (Skewed)
총 등록 인구수 has 539 (5.4%) zeros (Zeros)
남자 등록 인구수 has 835 (8.3%) zeros (Zeros)
여자 등록 인구수 has 831 (8.3%) zeros (Zeros)
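
The γ1 values in the skewness alerts are skewness coefficients. As a rough illustration, the sketch below computes the moment-based sample skewness g1 = m3 / m2^1.5. This is one common definition; the profiling tool may apply a bias correction, so this function is not guaranteed to reproduce the reported values exactly. The `long_tail` list is hypothetical data chosen to mimic a column with many small counts and one very large one.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
long_tail = [0, 1, 1, 2, 2, 3, 5, 12, 165, 11501]  # hypothetical values

print(skewness(symmetric))  # symmetric data gives zero skewness
print(skewness(long_tail))  # a heavy right tail gives a large positive value
```

A single extreme value is enough to push the coefficient well above zero, which is the pattern the alerts flag for the three population columns.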

Reproduction

Analysis started: 2024-04-11 04:52:03.172632
Analysis finished: 2024-04-11 04:52:07.410315
Duration: 4.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도
Real number (ℝ)

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.929
Minimum: 2010
Maximum: 2024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2010
5-th percentile: 2011
Q1: 2014
Median: 2017
Q3: 2020
95-th percentile: 2023
Maximum: 2024
Range: 14
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.9779569
Coefficient of variation (CV): 0.0019722841
Kurtosis: -1.1666139
Mean: 2016.929
Median Absolute Deviation (MAD): 3
Skewness: -0.015576905
Sum: 20169290
Variance: 15.824141
Monotonicity: Not monotonic
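
Several of the statistics for 연도 are simple functions of the others. The sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation, using the standard definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared). The variable names are illustrative.

```python
# Values copied from the 연도 statistics above.
minimum, q1, q3, maximum = 2010, 2014, 2020, 2024
mean, std = 2016.929, 3.9779569

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50%
cv = std / mean                  # standard deviation relative to the mean
variance = std ** 2              # squared standard deviation

print(value_range, iqr)
print(f"CV = {cv:.10f}")
print(f"Variance = {variance:.6f}")
```

The results agree with the reported Range (14), IQR (6), CV (0.0019722841) and Variance (15.824141) up to rounding.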
Histogram with fixed size bins (bins=15)
Common values
Value             Count  Frequency (%)
2022                767   7.7%
2018                749   7.5%
2019                747   7.5%
2016                742   7.4%
2014                736   7.4%
2015                732   7.3%
2020                731   7.3%
2023                722   7.2%
2011                716   7.2%
2017                712   7.1%
Other values (5)   2646  26.5%

Minimum 10 values
Value  Count  Frequency (%)
2010     350  3.5%
2011     716  7.2%
2012     704  7.0%
2013     699  7.0%
2014     736  7.4%
2015     732  7.3%
2016     742  7.4%
2017     712  7.1%
2018     749  7.5%
2019     747  7.5%

Maximum 10 values
Value  Count  Frequency (%)
2024     197  2.0%
2023     722  7.2%
2022     767  7.7%
2021     696  7.0%
2020     731  7.3%
2019     747  7.5%
2018     749  7.5%
2017     712  7.1%
2016     742  7.4%
2015     732  7.3%


[name lost]
Real number (ℝ)

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.5598
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.4834519
Coefficient of variation (CV): 0.5310302
Kurtosis: -1.2439961
Mean: 6.5598
Median Absolute Deviation (MAD): 3
Skewness: -0.033797065
Sum: 65598
Variance: 12.134437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Common values
Value             Count  Frequency (%)
3                   920   9.2%
10                  872   8.7%
12                  868   8.7%
9                   863   8.6%
8                   857   8.6%
11                  846   8.5%
1                   838   8.4%
7                   836   8.4%
2                   820   8.2%
6                   792   7.9%
Other values (2)   1488  14.9%

Minimum 10 values
Value  Count  Frequency (%)
1        838  8.4%
2        820  8.2%
3        920  9.2%
4        749  7.5%
5        739  7.4%
6        792  7.9%
7        836  8.4%
8        857  8.6%
9        863  8.6%
10       872  8.7%

Maximum 10 values
Value  Count  Frequency (%)
12       868  8.7%
11       846  8.5%
10       872  8.7%
9        863  8.6%
8        857  8.6%
7        836  8.4%
6        792  7.9%
5        739  7.4%
4        749  7.5%
3        920  9.2%

행정구역구분명
Categorical

Alerts: High correlation, Imbalance

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 3
Median length: 3
Mean length: 2.8838
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 읍면동
2nd row: 읍면동
3rd row: 읍면동
4th row: 읍면동
5th row: 시군

Common Values

Value         Count  Frequency (%)
읍면동         9154  91.5%
시군            530   5.3%
[label lost]    297   3.0%
[label lost]     19   0.2%
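
The 74.5% imbalance reported for 행정구역구분명 is consistent with a normalized-entropy score, 1 − H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of categories. The sketch below is an assumption about how such a score can be derived, checked only against the four counts in this table; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 for one dominant class."""
    k = len(counts)
    if k <= 1:
        return 0.0  # a single class cannot be imbalanced
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
    return 1 - entropy / math.log2(k)


# Category counts copied from the Common Values table above.
counts = [9154, 530, 297, 19]
print(f"{100 * imbalance_score(counts):.1f}%")
```

With one category holding 91.5% of the rows, the entropy is only about a quarter of its maximum, which is why the score is high.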

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values tabulated above]

행정구역명
Text

Distinct: 1004
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 12.8475
Min length: 3

Characters and Unicode

Total characters: 128475
Distinct characters: 212
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 130
Unique (%): 1.3%

Sample

1st row: 경기도 고양시 덕양구 신도동
2nd row: 경기도 고양시 일산동구 마두2동
3rd row: 경기도 고양시 일산동구 마두1동
4th row: 경기도 파주시 운정2동
5th row: 경기도 이천시

Value               Count  Frequency (%)
경기도              10000  30.0%
성남시                889   2.7%
수원시                777   2.3%
고양시                767   2.3%
용인시                603   1.8%
안양시                551   1.7%
부천시                482   1.4%
화성시                425   1.3%
안산시                412   1.2%
분당구                401   1.2%
Other values (678)  17978  54.0%

Most occurring characters

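Value               Count  Frequency (%)
(space)             23865  18.6%
[label lost]        10279   8.0%
[label lost]        10141   7.9%
[label lost]        10011   7.8%
[label lost]         9673   7.5%
[label lost]         7743   6.0%
[label lost]         4431   3.4%
[label lost]         2897   2.3%
[label lost]         2392   1.9%
[label lost]         1890   1.5%
Other values (202)  45153  35.1%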

Most occurring categories

Value             Count  Frequency (%)
Other Letter     101190  78.8%
Space Separator   23865  18.6%
Decimal Number     3420   2.7%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[label lost]        10279  10.2%
[label lost]        10141  10.0%
[label lost]        10011   9.9%
[label lost]         9673   9.6%
[label lost]         7743   7.7%
[label lost]         4431   4.4%
[label lost]         2897   2.9%
[label lost]         2392   2.4%
[label lost]         1890   1.9%
[label lost]         1758   1.7%
Other values (192)  39975  39.5%

Decimal Number
Value  Count  Frequency (%)
2       1261  36.9%
1       1255  36.7%
3        563  16.5%
4        163   4.8%
5         46   1.3%
7         45   1.3%
6         42   1.2%
9         24   0.7%
8         21   0.6%

Space Separator
Value    Count  Frequency (%)
(space)  23865  100.0%

Most occurring scripts

Value    Count  Frequency (%)
Hangul  101190  78.8%
Common   27285  21.2%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[label lost]        10279  10.2%
[label lost]        10141  10.0%
[label lost]        10011   9.9%
[label lost]         9673   9.6%
[label lost]         7743   7.7%
[label lost]         4431   4.4%
[label lost]         2897   2.9%
[label lost]         2392   2.4%
[label lost]         1890   1.9%
[label lost]         1758   1.7%
Other values (192)  39975  39.5%

Common
Value    Count  Frequency (%)
(space)  23865  87.5%
2         1261   4.6%
1         1255   4.6%
3          563   2.1%
4          163   0.6%
5           46   0.2%
7           45   0.2%
6           42   0.2%
9           24   0.1%
8           21   0.1%

Most occurring blocks

Value    Count  Frequency (%)
Hangul  101190  78.8%
ASCII    27285  21.2%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)  23865  87.5%
2         1261   4.6%
1         1255   4.6%
3          563   2.1%
4          163   0.6%
5           46   0.2%
7           45   0.2%
6           42   0.2%
9           24   0.1%
8           21   0.1%

Hangul
Value               Count  Frequency (%)
[label lost]        10279  10.2%
[label lost]        10141  10.0%
[label lost]        10011   9.9%
[label lost]         9673   9.6%
[label lost]         7743   7.7%
[label lost]         4431   4.4%
[label lost]         2897   2.9%
[label lost]         2392   2.4%
[label lost]         1890   1.9%
[label lost]         1758   1.7%
Other values (192)  39975  39.5%

총 등록 인구수
Real number (ℝ)

Alerts: High correlation, Skewed, Zeros

Distinct: 472
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.2429
Minimum: 0
Maximum: 11501
Zeros: 539
Zeros (%): 5.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 12
Q3: 25
95-th percentile: 165
Maximum: 11501
Range: 11501
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 373.06409
Coefficient of variation (CV): 7.5759975
Kurtosis: 547.01789
Mean: 49.2429
Median Absolute Deviation (MAD): 9
Skewness: 22.500624
Sum: 492429
Variance: 139176.81
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                     539   5.4%
1                     520   5.2%
2                     485   4.9%
3                     404   4.0%
5                     388   3.9%
4                     388   3.9%
6                     381   3.8%
9                     347   3.5%
8                     339   3.4%
10                    335   3.4%
Other values (462)   5874  58.7%

Minimum 10 values
Value  Count  Frequency (%)
0        539  5.4%
1        520  5.2%
2        485  4.9%
3        404  4.0%
4        388  3.9%
5        388  3.9%
6        381  3.8%
7        334  3.3%
8        339  3.4%
9        347  3.5%

Maximum 10 values
Value  Count  Frequency (%)
11501      1  < 0.1%
10378      1  < 0.1%
10275      1  < 0.1%
9696       1  < 0.1%
9555       1  < 0.1%
9519       1  < 0.1%
9234       1  < 0.1%
9044       1  < 0.1%
8453       1  < 0.1%
8401       1  < 0.1%

남자 등록 인구수
Real number (ℝ)

Alerts: High correlation, Missing, Skewed, Zeros

Distinct: 269
Distinct (%): 3.5%
Missing: 2418
Missing (%): 24.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.041414
Minimum: 0
Maximum: 4992
Zeros: 835
Zeros (%): 8.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 6
Q3: 12
95-th percentile: 75
Maximum: 4992
Range: 4992
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 187.88749
Coefficient of variation (CV): 7.8151596
Kurtosis: 481.41772
Mean: 24.041414
Median Absolute Deviation (MAD): 4
Skewness: 21.272947
Sum: 182282
Variance: 35301.708
Monotonicity: Not monotonic
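
Because 남자 등록 인구수 has missing values, it helps to see which denominators reproduce the reported figures. The sketch below assumes the mean and the distinct percentage are computed over non-missing observations only, while the missing percentage is computed over all observations. These assumptions match the numbers above, but they are inferred from the figures rather than stated in the report.

```python
# Figures copied from the 남자 등록 인구수 statistics above.
n_observations = 10_000
n_missing = 2_418
column_sum = 182_282
n_distinct = 269

# Assumption: missing values are excluded before averaging and counting distincts.
n_present = n_observations - n_missing
mean = column_sum / n_present
distinct_pct = 100 * n_distinct / n_present

# Assumption: the missing percentage uses every observation as the denominator.
missing_pct = 100 * n_missing / n_observations

print(f"Non-missing rows: {n_present}")
print(f"Mean: {mean:.6f}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Missing (%): {missing_pct:.1f}%")
```

Dividing the sum by all 10000 rows instead would give about 18.2, not the reported 24.041414, so the mean clearly ignores the 2418 missing entries.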
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                     835   8.3%
1                     715   7.1%
3                     593   5.9%
2                     567   5.7%
4                     515   5.1%
5                     484   4.8%
6                     452   4.5%
7                     383   3.8%
8                     346   3.5%
9                     299   3.0%
Other values (259)   2393  23.9%
(Missing)            2418  24.2%

Minimum 10 values
Value  Count  Frequency (%)
0        835  8.3%
1        715  7.1%
2        567  5.7%
3        593  5.9%
4        515  5.1%
5        484  4.8%
6        452  4.5%
7        383  3.8%
8        346  3.5%
9        299  3.0%

Maximum 10 values
Value  Count  Frequency (%)
4992       1  < 0.1%
4859       1  < 0.1%
4842       1  < 0.1%
4653       1  < 0.1%
4640       1  < 0.1%
4337       1  < 0.1%
4253       1  < 0.1%
4213       1  < 0.1%
3763       1  < 0.1%
3675       1  < 0.1%

여자 등록 인구수
Real number (ℝ)

Alerts: High correlation, Missing, Skewed, Zeros

Distinct: 266
Distinct (%): 3.5%
Missing: 2418
Missing (%): 24.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.935109
Minimum: 0
Maximum: 4713
Zeros: 831
Zeros (%): 8.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 5
Q3: 11
95-th percentile: 71
Maximum: 4713
Range: 4713
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 179.36262
Coefficient of variation (CV): 7.8204388
Kurtosis: 485.88298
Mean: 22.935109
Median Absolute Deviation (MAD): 4
Skewness: 21.348069
Sum: 173894
Variance: 32170.949
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                     831   8.3%
1                     736   7.4%
2                     643   6.4%
4                     577   5.8%
3                     557   5.6%
5                     490   4.9%
6                     463   4.6%
7                     389   3.9%
8                     346   3.5%
9                     314   3.1%
Other values (256)   2236  22.4%
(Missing)            2418  24.2%

Minimum 10 values
Value  Count  Frequency (%)
0        831  8.3%
1        736  7.4%
2        643  6.4%
3        557  5.6%
4        577  5.8%
5        490  4.9%
6        463  4.6%
7        389  3.9%
8        346  3.5%
9        314  3.1%

Maximum 10 values
Value  Count  Frequency (%)
4713       1  < 0.1%
4704       1  < 0.1%
4660       1  < 0.1%
4581       1  < 0.1%
4404       1  < 0.1%
4200       1  < 0.1%
4064       1  < 0.1%
4002       1  < 0.1%
3654       1  < 0.1%
3427       1  < 0.1%

Interactions

[Figures: 25 interaction plots, one for each pairing of the five numeric variables]

Correlations

                  연도   [name lost]  행정구역구분명  총 등록 인구수  남자 등록 인구수  여자 등록 인구수
연도              1.000  0.121        0.011           0.041           0.019             0.019
[name lost]       0.121  1.000        0.000           0.000           0.000             0.000
행정구역구분명    0.011  0.000        1.000           0.887           0.709             0.709
총 등록 인구수    0.041  0.000        0.887           1.000           0.946             0.946
남자 등록 인구수  0.019  0.000        0.709           0.946           1.000             1.000
여자 등록 인구수  0.019  0.000        0.709           0.946           1.000             1.000

                  연도    [name lost]  총 등록 인구수  남자 등록 인구수  여자 등록 인구수  행정구역구분명
연도               1.000  -0.081       -0.184          -0.134            -0.135            0.010
[name lost]       -0.081   1.000       -0.009          -0.015            -0.021            0.000
총 등록 인구수    -0.184  -0.009        1.000           0.967             0.958            0.577
남자 등록 인구수  -0.134  -0.015        0.967           1.000             0.865            0.578
여자 등록 인구수  -0.135  -0.021        0.958           0.865             1.000            0.578
행정구역구분명     0.010   0.000        0.577           0.578             0.578            1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

(index)  연도  [name lost]  행정구역구분명  행정구역명                        총 등록 인구수  남자 등록 인구수  여자 등록 인구수
74527    2014  1            읍면동          경기도 고양시 덕양구 신도동       5               2                 3
55084    2016  9            읍면동          경기도 고양시 일산동구 마두2동    4               2                 2
33       2024  3            읍면동          경기도 고양시 일산동구 마두1동    7               6                 1
34811    2019  7            읍면동          경기도 파주시 운정2동             64              31                33
73803    2014  3            시군            경기도 이천시                     159             79                80
1555     2024  1            읍면동          경기도 성남시 수정구 수진2동      3               1                 2
81031    2013  3            읍면동          경기도 파주시 월롱면              4               <NA>              <NA>
92002    2011  8            읍면동          경기도 남양주시 호평동            38              <NA>              <NA>
44219    2018  3            읍면동          경기도 남양주시 화도읍동부출장소  6               3                 3
4825     2023  8            읍면동          경기도 안성시 양성면              0               0                 0

Last rows

(index)  연도  [name lost]  행정구역구분명  행정구역명                   총 등록 인구수  남자 등록 인구수  여자 등록 인구수
56294    2016  7            읍면동          경기도 고양시 덕양구 화전동  1               0                 1
83614    2012  10           읍면동          경기도 군포시 오금동         48              <NA>              <NA>
27938    2020  6            읍면동          경기도 부천시 대산동         40              22                18
33032    2019  10           읍면동          경기도 이천시 창전동         8               5                 3
95957    2011  2            읍면동          경기도 평택시 고덕면         14              <NA>              <NA>
8983     2023  1            읍면동          경기도 김포시 양촌읍         14              9                 5
61786    2015  10           읍면동          경기도 가평군 하면           11              4                 7
12120    2022  8            읍면동          경기도 부천시 상동           49              22                27
70850    2014  8            읍면동          경기도 포천시 일동면         5               3                 2
69010    2014  11           읍면동          경기도 평택시 신장2동        4               0                 4
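
In the sample rows, 총 등록 인구수 appears to equal 남자 등록 인구수 plus 여자 등록 인구수 whenever both are present. The sketch below checks that on a few (total, male, female) triples read from the sample. It assumes the trailing digits of each row split as total, male, female, and it treats `<NA>` as `None`. The helper name `is_consistent` is illustrative.

```python
# (total, male, female) triples read from the sample rows; None stands for <NA>.
rows = [
    (5, 2, 3),
    (4, 2, 2),
    (7, 6, 1),
    (64, 31, 33),
    (159, 79, 80),
    (4, None, None),
    (38, None, None),
]


def is_consistent(total, male, female):
    """Return True/False when the check is possible, None when a part is missing."""
    if male is None or female is None:
        return None  # cannot verify a row with a missing component
    return total == male + female


results = [is_consistent(*row) for row in rows]
print(results)
```

Rows with a missing component cannot be verified either way, so they are reported separately rather than counted as failures.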