Overview

Dataset statistics

Number of variables: 7
Number of observations: 2494
Missing cells: 63
Missing cells (%): 0.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 143.8 KiB
Average record size in memory: 59.1 B
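
The missing-cell figures can be cross-checked against the per-variable missing counts reported in the Variables section (57, 1 and 5). The sketch below assumes the percentage is missing cells divided by variables × observations; the attribution of the counts 1 and 5 to 시군구명 and 동리명 is an inference from the sample values in those sections, not something the report states.

```python
# Per-variable missing counts as reported in this profile.
# Assumption: the counts 1 and 5 belong to 시군구명 and 동리명 respectively.
missing_by_variable = {"읍면동명": 57, "시군구명": 1, "동리명": 5}

n_variables = 7
n_observations = 2494

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_variable.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 63
print(f"{missing_pct:.1f}%")  # 0.4%
```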

Variable types

Numeric: 3
Categorical: 1
Text: 3

Alerts

시도명 has constant value "경기도" (Constant)
행정동코드 is highly overall correlated with 법정동코드 (High correlation)
법정동코드 is highly overall correlated with 행정동코드 (High correlation)
읍면동명 has 57 (2.3%) missing values (Missing)

Reproduction

Analysis started: 2024-04-11 02:40:36.827152
Analysis finished: 2024-04-11 02:40:41.152194
Duration: 4.33 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
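
A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch rather than the exact call used here: the input file name, the output path and the title are hypothetical, and the report's own config.json may set options that this sketch leaves at their defaults.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling Report")
    profile.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("input.csv")` with a real file name would write the HTML next to the script; the timing and memory figures will differ from those listed above.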

Variables

행정동코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 659
Distinct (%): 26.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1489014 × 10^9
Minimum: 4.1 × 10^9
Maximum: 4.183041 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.0 KiB

Quantile statistics

Minimum: 4.1 × 10^9
5-th percentile: 4.1135543 × 10^9
Q1: 4.1310533 × 10^9
Median: 4.150036 × 10^9
Q3: 4.1625152 × 10^9
95-th percentile: 4.182035 × 10^9
Maximum: 4.183041 × 10^9
Range: 83041000
Interquartile range (IQR): 31461975

Descriptive statistics

Standard deviation: 20245213
Coefficient of variation (CV): 0.0048796564
Kurtosis: -0.82514999
Mean: 4.1489014 × 10^9
Median Absolute Deviation (MAD): 14002000
Skewness: -0.18337714
Sum: 1.034736 × 10^13
Variance: 4.0986865 × 10^14
Monotonicity: Increasing
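
Several of the figures above are derived from one another. As a rough consistency check, the sketch below recomputes the coefficient of variation, variance, range and IQR from the rounded values printed in this report, so small differences from the reported numbers are expected.

```python
# Rounded values copied from the statistics above.
std = 20245213
mean = 4.1489014e9
minimum, maximum = 4.1e9, 4.183041e9
q1, q3 = 4.1310533e9, 4.1625152e9

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"{cv:.7f}")        # 0.0048797
print(f"{variance:.4e}")  # 4.0987e+14
print(int(value_range))   # 83041000
print(int(iqr))           # close to the reported 31461975
```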
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
4148039000          28     1.1%
4122025000          26     1.0%
4180033000          22     0.9%
4159025900          22     0.9%
4167025000          21     0.8%
4167035000          20     0.8%
4159041000          20     0.8%
4155031000          20     0.8%
4155034000          19     0.8%
4180035000          19     0.8%
Other values (649)  2277   91.3%

Minimum 10 values

Value       Count  Frequency (%)
4100000000  1      < 0.1%
4111000000  1      < 0.1%
4111100000  1      < 0.1%
4111156000  2      0.1%
4111156600  2      0.1%
4111157100  1      < 0.1%
4111157200  1      < 0.1%
4111157300  2      0.1%
4111158000  1      < 0.1%
4111159100  1      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
4183041000  14     0.6%
4183040000  14     0.6%
4183039500  11     0.4%
4183038000  9      0.4%
4183037000  10     0.4%
4183036000  10     0.4%
4183035000  9      0.4%
4183034000  5      0.2%
4183033000  12     0.5%
4183032000  7      0.3%

시도명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기도
2nd row: 경기도
3rd row: 경기도
4th row: 경기도
5th row: 경기도

Common Values

Value   Count  Frequency (%)
경기도  2494   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

시군구명
Text

Distinct: 56
Distinct (%): 2.2%
Missing: 1
Missing (%): < 0.1%
Memory size: 19.6 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.8604091
Min length: 3

Characters and Unicode

Total characters: 9624
Distinct characters: 66
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
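
As an illustration of these character properties, the sketch below counts Unicode general categories for one sample value using Python's standard `unicodedata` module. The standard library exposes the general category but not the script or block, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter

sample = "수원시 장안구"  # 2nd sample row of this variable

# unicodedata.category returns the two-letter general category code,
# e.g. "Lo" (Other Letter) or "Zs" (Space Separator).
category_counts = Counter(unicodedata.category(ch) for ch in sample)

print(category_counts)  # Counter({'Lo': 6, 'Zs': 1})
```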

Unique

Unique: 12
Unique (%): 0.5%

Sample

1st row: 수원시
2nd row: 수원시 장안구
3rd row: 수원시 장안구
4th row: 수원시 장안구
5th row: 수원시 장안구

Value              Count  Frequency (%)
화성시             212    7.1%
안성시             204    6.8%
평택시             162    5.4%
파주시             159    5.3%
여주시             156    5.2%
이천시             143    4.8%
용인시             126    4.2%
양평군             123    4.1%
연천군             114    3.8%
포천시             101    3.4%
Other values (46)  1488   49.8%

Most occurring characters

Value              Count  Frequency (%)
                   2227   23.1%
                   539    5.6%
                   511    5.3%
(space)            495    5.1%
                   492    5.1%
                   441    4.6%
                   433    4.5%
                   353    3.7%
                   339    3.5%
                   325    3.4%
Other values (56)  3469   36.0%

Most occurring categories

Value            Count  Frequency (%)
Other Letter     9129   94.9%
Space Separator  495    5.1%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
                   2227   24.4%
                   539    5.9%
                   511    5.6%
                   492    5.4%
                   441    4.8%
                   433    4.7%
                   353    3.9%
                   339    3.7%
                   325    3.6%
                   216    2.4%
Other values (55)  3253   35.6%

Space Separator

Value    Count  Frequency (%)
(space)  495    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  9129   94.9%
Common  495    5.1%

Most frequent character per script

Hangul

Value              Count  Frequency (%)
                   2227   24.4%
                   539    5.9%
                   511    5.6%
                   492    5.4%
                   441    4.8%
                   433    4.7%
                   353    3.9%
                   339    3.7%
                   325    3.6%
                   216    2.4%
Other values (55)  3253   35.6%

Common

Value    Count  Frequency (%)
(space)  495    100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  9129   94.9%
ASCII   495    5.1%

Most frequent character per block

Hangul

Value              Count  Frequency (%)
                   2227   24.4%
                   539    5.9%
                   511    5.6%
                   492    5.4%
                   441    4.8%
                   433    4.7%
                   353    3.9%
                   339    3.7%
                   325    3.6%
                   216    2.4%
Other values (55)  3253   35.6%

ASCII

Value    Count  Frequency (%)
(space)  495    100.0%

읍면동명
Text

MISSING 

Distinct: 577
Distinct (%): 23.7%
Missing: 57
Missing (%): 2.3%
Memory size: 19.6 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.1764465
Min length: 2

Characters and Unicode

Total characters: 7741
Distinct characters: 213
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 251
Unique (%): 10.3%

Sample

1st row: 파장동
2nd row: 파장동
3rd row: 율천동
4th row: 율천동
5th row: 정자1동

Value               Count  Frequency (%)
장단면              28     1.1%
팽성읍              26     1.1%
향남읍              22     0.9%
백학면              22     0.9%
가남읍              21     0.9%
대신면              20     0.8%
보개면              20     0.8%
정남면              20     0.8%
미양면              19     0.8%
왕징면              19     0.8%
Other values (567)  2220   91.1%

Most occurring characters

Value               Count  Frequency (%)
                    1124   14.5%
                    968    12.5%
                    431    5.6%
                    157    2.0%
                    156    2.0%
1                   151    2.0%
                    147    1.9%
2                   133    1.7%
                    130    1.7%
                    115    1.5%
Other values (203)  4229   54.6%

Most occurring categories

Value           Count  Frequency (%)
Other Letter    7361   95.1%
Decimal Number  380    4.9%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                    1124   15.3%
                    968    13.2%
                    431    5.9%
                    157    2.1%
                    156    2.1%
                    147    2.0%
                    130    1.8%
                    115    1.6%
                    110    1.5%
                    106    1.4%
Other values (194)  3917   53.2%

Decimal Number

Value  Count  Frequency (%)
1      151    39.7%
2      133    35.0%
3      56     14.7%
4      12     3.2%
5      10     2.6%
6      9      2.4%
7      4      1.1%
9      3      0.8%
8      2      0.5%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  7361   95.1%
Common  380    4.9%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                    1124   15.3%
                    968    13.2%
                    431    5.9%
                    157    2.1%
                    156    2.1%
                    147    2.0%
                    130    1.8%
                    115    1.6%
                    110    1.5%
                    106    1.4%
Other values (194)  3917   53.2%

Common

Value  Count  Frequency (%)
1      151    39.7%
2      133    35.0%
3      56     14.7%
4      12     3.2%
5      10     2.6%
6      9      2.4%
7      4      1.1%
9      3      0.8%
8      2      0.5%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  7361   95.1%
ASCII   380    4.9%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                    1124   15.3%
                    968    13.2%
                    431    5.9%
                    157    2.1%
                    156    2.1%
                    147    2.0%
                    130    1.8%
                    115    1.6%
                    110    1.5%
                    106    1.4%
Other values (194)  3917   53.2%

ASCII

Value  Count  Frequency (%)
1      151    39.7%
2      133    35.0%
3      56     14.7%
4      12     3.2%
5      10     2.6%
6      9      2.4%
7      4      1.1%
9      3      0.8%
8      2      0.5%

법정동코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 2218
Distinct (%): 88.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.1488851 × 10^9
Minimum: 4.1 × 10^9
Maximum: 4.183041 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.0 KiB

Quantile statistics

Minimum: 4.1 × 10^9
5-th percentile: 4.1135103 × 10^9
Q1: 4.1310104 × 10^9
Median: 4.150035 × 10^9
Q3: 4.1625093 × 10^9
95-th percentile: 4.182035 × 10^9
Maximum: 4.183041 × 10^9
Range: 83041033
Interquartile range (IQR): 31498857

Descriptive statistics

Standard deviation: 20258449
Coefficient of variation (CV): 0.0048828657
Kurtosis: -0.82537335
Mean: 4.1488851 × 10^9
Median Absolute Deviation (MAD): 14008669
Skewness: -0.18375657
Sum: 1.0347319 × 10^13
Variance: 4.1040474 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
4117110100           11     0.4%
4121010100           7      0.3%
4141010400           6      0.2%
4139013200           6      0.2%
4117310400           6      0.2%
4119210800           6      0.2%
4117310100           5      0.2%
4119210900           5      0.2%
4131010400           5      0.2%
4122011200           4      0.2%
Other values (2208)  2433   97.6%

Minimum 10 values

Value       Count  Frequency (%)
4100000000  1      < 0.1%
4111000000  1      < 0.1%
4111100000  1      < 0.1%
4111112900  1      < 0.1%
4111113000  3      0.1%
4111113100  1      < 0.1%
4111113200  1      < 0.1%
4111113300  2      0.1%
4111113400  1      < 0.1%
4111113500  1      < 0.1%

Maximum 10 values

Value       Count  Frequency (%)
4183041033  1      < 0.1%
4183041032  1      < 0.1%
4183041031  1      < 0.1%
4183041030  1      < 0.1%
4183041029  1      < 0.1%
4183041028  1      < 0.1%
4183041027  1      < 0.1%
4183041026  1      < 0.1%
4183041025  1      < 0.1%
4183041024  1      < 0.1%

동리명
Text

Distinct: 1961
Distinct (%): 78.8%
Missing: 5
Missing (%): 0.2%
Memory size: 19.6 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.0265167
Min length: 2

Characters and Unicode

Total characters: 7533
Distinct characters: 317
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1630
Unique (%): 65.5%

Sample

1st row: 경기도
2nd row: 수원시
3rd row: 수원시장안구
4th row: 파장동
5th row: 이목동

Value                Count  Frequency (%)
안양동               11     0.4%
중동                 10     0.4%
광명동               7      0.3%
정자동               7      0.3%
호계동               6      0.2%
상동                 6      0.2%
산본동               6      0.2%
내리                 6      0.2%
정왕동               6      0.2%
도곡리               6      0.2%
Other values (1951)  2418   97.1%

Most occurring characters

Value               Count  Frequency (%)
                    1424   18.9%
                    961    12.8%
                    182    2.4%
                    133    1.8%
                    113    1.5%
                    104    1.4%
                    104    1.4%
                    96     1.3%
                    91     1.2%
                    87     1.2%
Other values (307)  4238   56.3%

Most occurring categories

Value           Count  Frequency (%)
Other Letter    7527   99.9%
Decimal Number  6      0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                    1424   18.9%
                    961    12.8%
                    182    2.4%
                    133    1.8%
                    113    1.5%
                    104    1.4%
                    104    1.4%
                    96     1.3%
                    91     1.2%
                    87     1.2%
Other values (304)  4232   56.2%

Decimal Number

Value  Count  Frequency (%)
2      2      33.3%
3      2      33.3%
1      2      33.3%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  7527   99.9%
Common  6      0.1%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                    1424   18.9%
                    961    12.8%
                    182    2.4%
                    133    1.8%
                    113    1.5%
                    104    1.4%
                    104    1.4%
                    96     1.3%
                    91     1.2%
                    87     1.2%
Other values (304)  4232   56.2%

Common

Value  Count  Frequency (%)
2      2      33.3%
3      2      33.3%
1      2      33.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  7527   99.9%
ASCII   6      0.1%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                    1424   18.9%
                    961    12.8%
                    182    2.4%
                    133    1.8%
                    113    1.5%
                    104    1.4%
                    104    1.4%
                    96     1.3%
                    91     1.2%
                    87     1.2%
Other values (304)  4232   56.2%

ASCII

Value  Count  Frequency (%)
2      2      33.3%
3      2      33.3%
1      2      33.3%

생성일자
Real number (ℝ)

Distinct: 148
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20013546
Minimum: 19880423
Maximum: 20240101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 22.0 KiB

Quantile statistics

Minimum: 19880423
5-th percentile: 19880423
Q1: 19950510
Median: 20010321
Q3: 20060102
95-th percentile: 20211231
Maximum: 20240101
Range: 359678
Interquartile range (IQR): 109592

Descriptive statistics

Standard deviation: 99511.356
Coefficient of variation (CV): 0.0049722001
Kurtosis: -0.38235451
Mean: 20013546
Median Absolute Deviation (MAD): 59811
Skewness: 0.60959943
Sum: 4.9913784 × 10^10
Variance: 9.9025099 × 10^9
Monotonicity: Not monotonic
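
The values of this variable look like dates encoded as YYYYMMDD integers (for example 19880423), which the profiler has treated as plain numbers. Under that assumption, which the report itself does not state, statistics such as the range describe differences between integers rather than elapsed time. A minimal sketch of the distinction:

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 19880423 as a date (assumed format)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


earliest = parse_yyyymmdd(19880423)  # reported minimum
latest = parse_yyyymmdd(20240101)    # reported maximum

numeric_range = 20240101 - 19880423  # what the reported "Range" measures
elapsed_days = (latest - earliest).days

print(numeric_range)  # 359678
print(elapsed_days)   # 13036
```

Parsing the column as dates before profiling would make the mean, quantiles and histogram interpretable on a calendar scale.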
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
19880423            322    12.9%
19960301            273    10.9%
19980401            231    9.3%
20010321            164    6.6%
20031019            159    6.4%
20130923            142    5.7%
19950510            115    4.6%
20051031            80     3.2%
19890101            80     3.2%
20240101            72     2.9%
Other values (138)  856    34.3%

Minimum 10 values

Value     Count  Frequency (%)
19880423  322    12.9%
19880701  2      0.1%
19890101  80     3.2%
19890401  6      0.2%
19890501  30     1.2%
19900101  1      < 0.1%
19900520  2      0.1%
19910107  1      < 0.1%
19910907  4      0.2%
19910916  7      0.3%

Maximum 10 values

Value     Count  Frequency (%)
20240101  72     2.9%
20231010  4      0.2%
20230703  1      < 0.1%
20230701  2      0.1%
20230501  1      < 0.1%
20230429  2      0.1%
20230109  11     0.4%
20220901  7      0.3%
20220103  13     0.5%
20220102  3      0.1%

Interactions

[Figure: nine pairwise scatter plots, apparently a 3 × 3 grid over the numeric variables 행정동코드, 법정동코드 and 생성일자]

Correlations

            행정동코드  시군구명  법정동코드  생성일자
행정동코드  1.000       1.000     1.000       0.778
시군구명    1.000       1.000     1.000       0.934
법정동코드  1.000       1.000     1.000       0.778
생성일자    0.778       0.934     0.778       1.000

            행정동코드  법정동코드  생성일자
행정동코드  1.000       0.998       -0.062
법정동코드  0.998       1.000       -0.069
생성일자    -0.062      -0.069      1.000
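
The report does not label which coefficient each matrix uses. To illustrate why the two code columns come out so strongly correlated, the sketch below computes a Pearson coefficient by hand on the first ten rows of the Sample section; the choice of Pearson is an assumption, and ten rows will not reproduce the full-data values above.

```python
import math

# First ten rows from the Sample section of this report.
admin_codes = [4100000000, 4111000000, 4111100000, 4111156000, 4111156000,
               4111156600, 4111156600, 4111157100, 4111157200, 4111157300]
legal_codes = [4100000000, 4111000000, 4111100000, 4111112900, 4111113100,
               4111113200, 4111113300, 4111113000, 4111113000, 4111113000]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(admin_codes, legal_codes)
print(f"{r:.3f}")
```

Both codes share the same leading digits for province and district, so they rise together almost in lockstep, which is what the High correlation alert reflects.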

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

   행정동코드  시도명  시군구명       읍면동명  법정동코드  동리명        생성일자
0  4100000000  경기도  <NA>           <NA>      4100000000  경기도        19880423
1  4111000000  경기도  수원시         <NA>      4111000000  수원시        19880423
2  4111100000  경기도  수원시 장안구  <NA>      4111100000  수원시장안구  19880701
3  4111156000  경기도  수원시 장안구  파장동    4111112900  파장동        20031124
4  4111156000  경기도  수원시 장안구  파장동    4111113100  이목동        20031124
5  4111156600  경기도  수원시 장안구  율천동    4111113200  율전동        20031124
6  4111156600  경기도  수원시 장안구  율천동    4111113300  천천동        20031124
7  4111157100  경기도  수원시 장안구  정자1동   4111113000  정자동        20031124
8  4111157200  경기도  수원시 장안구  정자2동   4111113000  정자동        20031124
9  4111157300  경기도  수원시 장안구  정자3동   4111113000  정자동        20031124

Last rows

      행정동코드  시도명  시군구명  읍면동명  법정동코드  동리명    생성일자
2484  4183041000  경기도  양평군    개군면    4183041031  주읍리    19880423
2485  4183041000  경기도  양평군    개군면    4183041032  계전리    19880423
2486  4183041000  경기도  양평군    개군면    4183041033  상자포리  19880423
2487  4183041000  경기도  양평군    개군면    4183041000  개군면    19880423
2488  4183041000  경기도  양평군    개군면    4183041021  하자포리  19880423
2489  4183041000  경기도  양평군    개군면    4183041022  구미리    19880423
2490  4183041000  경기도  양평군    개군면    4183041023  앙덕리    19880423
2491  4183041000  경기도  양평군    개군면    4183041024  석장리    19880423
2492  4183041000  경기도  양평군    개군면    4183041025  공세리    19880423
2493  4183041000  경기도  양평군    개군면    4183041026  불곡리    19880423
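
The sample rows suggest that the two code columns are not in one-to-one correspondence: one 행정동코드 can cover several 법정동코드 values, and one 법정동코드 can be split across several 행정동코드 values. The sketch below groups rows 3 to 9 of the Sample section in both directions to show this; it uses only those seven rows and says nothing about the rest of the dataset.

```python
from collections import defaultdict

# (행정동코드, 읍면동명, 법정동코드, 동리명) from rows 3-9 of the Sample section.
rows = [
    (4111156000, "파장동", 4111112900, "파장동"),
    (4111156000, "파장동", 4111113100, "이목동"),
    (4111156600, "율천동", 4111113200, "율전동"),
    (4111156600, "율천동", 4111113300, "천천동"),
    (4111157100, "정자1동", 4111113000, "정자동"),
    (4111157200, "정자2동", 4111113000, "정자동"),
    (4111157300, "정자3동", 4111113000, "정자동"),
]

legal_by_admin = defaultdict(set)
admin_by_legal = defaultdict(set)
for admin_code, _, legal_code, _ in rows:
    legal_by_admin[admin_code].add(legal_code)
    admin_by_legal[legal_code].add(admin_code)

print(len(legal_by_admin[4111156000]))  # 2
print(len(admin_by_legal[4111113000]))  # 3
```

This many-to-many structure is consistent with the differing distinct counts of the two code columns (659 versus 2218).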