Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 1065
Missing cells (%): 2.7%
Duplicate rows: 2125
Duplicate rows (%): 21.2%
Total size in memory: 419.9 KiB
Average record size in memory: 43.0 B
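
The percentages and the average record size above can be re-derived from the raw counts. The sketch below is only an arithmetic check; it assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes. The variable names are illustrative and do not come from the report.

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 4
n_observations = 10_000
missing_cells = 1_065
duplicate_rows = 2_125
total_size_kib = 419.9

# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate percentage is relative to the number of rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.2f}%")
print(f"Duplicate rows: {duplicate_pct:.2f}%")
print(f"Average record size: {avg_record_bytes:.2f} B")
```

Under these assumptions the results agree with the reported 2.7%, 21.2% and 43.0 B to the displayed precision.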

Variable types

Categorical: 2
Text: 1
Numeric: 1

Dataset

Description: Gyeonggi-do Gyeonggi Statistics System annotations (경기도 경기통계시스템 주석)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=IIFR8SCIUSBQJF8B2C1S33484661&infSeq=1

Alerts

조직번호 has constant value "210" (Constant)
Dataset has 2125 (21.2%) duplicate rows (Duplicates)
표항목인식번호 is highly imbalanced (99.9%) (Imbalance)
최종변경일 has 1065 (10.7%) missing values (Missing)
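
The 99.9% imbalance figure can be reproduced with an entropy-based score. The sketch below assumes the score is defined as one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes; this definition is an assumption here, not something the report states. The counts 9999 and 1 are the 표항목인식번호 value counts listed later in the report, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - normalized Shannon entropy of the class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. Returning 0.0 for fewer than two classes is a convention
    chosen for this sketch.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Value counts of 표항목인식번호: "<NA>" appears 9999 times, "16" once.
score = imbalance_score([9999, 1])
print(f"Imbalance: {100 * score:.1f}%")
```

With this definition, two equally frequent classes score 0, so the alert indicates that the column carries almost no information.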

Reproduction

Analysis started: 2023-12-10 21:50:34.844346
Analysis finished: 2023-12-10 21:50:35.451538
Duration: 0.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

조직번호 (organization number)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Category frequency: 210 (10000)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 210
2nd row: 210
3rd row: 210
4th row: 210
5th row: 210

Common Values

Value  Count  Frequency (%)
210    10000  100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

테이블아이디 (table ID)
Text

Distinct: 3410
Distinct (%): 34.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 22
Mean length: 13.0853
Min length: 6

Characters and Unicode

Total characters: 130853
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
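
As a small illustration of that statement, the sketch below counts Unicode general categories for the five sample 테이블아이디 values shown in this report, using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The standard library exposes general categories (for example `Lu`, `Nd`, `Pc`) but not script or block names, so only the category breakdown is sketched here.

```python
import unicodedata
from collections import Counter

# The five sample values of 테이블아이디 listed in this report.
sample_ids = [
    "TX_210040196_BAK12",
    "TX_210050414",
    "DT_1N00003",
    "DT_GGSTAT_A066",
    "DT_210011_075",
]


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(sample_ids)
for category, n in counts.most_common():
    print(category, n)
```

The codes `Nd`, `Lu` and `Pc` correspond to the "Decimal Number", "Uppercase Letter" and "Connector Punctuation" categories used in the tables of this section.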

Unique

Unique: 1332
Unique (%): 13.3%

Sample

1st row: TX_210040196_BAK12
2nd row: TX_210050414
3rd row: DT_1N00003
4th row: DT_GGSTAT_A066
5th row: DT_210011_075

Value                Count  Frequency (%)
dt_21002_m022        35     0.4%
dt_gg0001            35     0.4%
dt_21002_m021        26     0.3%
dt_1j00002_bk        25     0.2%
dt_21002_m021_bk     24     0.2%
dt_21002_m022_bk     21     0.2%
dt_1r00003           21     0.2%
dt_1r00005_2         21     0.2%
dt_1p00005           20     0.2%
dt_1p00009           20     0.2%
Other values (3400)  9752   97.5%

Most occurring characters

Value              Count  Frequency (%)
0                  32496  24.8%
_                  17968  13.7%
1                  15184  11.6%
2                  13931  10.6%
T                  10714  8.2%
D                  8690   6.6%
K                  4197   3.2%
B                  3663   2.8%
4                  2241   1.7%
3                  2224   1.7%
Other values (25)  19545  14.9%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         74725  57.1%
Uppercase Letter       38160  29.2%
Connector Punctuation  17968  13.7%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
T                  10714  28.1%
D                  8690   22.8%
K                  4197   11.0%
B                  3663   9.6%
X                  1554   4.1%
A                  1454   3.8%
G                  1218   3.2%
E                  1057   2.8%
M                  974    2.6%
P                  822    2.2%
Other values (14)  3817   10.0%

Decimal Number
Value  Count  Frequency (%)
0      32496  43.5%
1      15184  20.3%
2      13931  18.6%
4      2241   3.0%
3      2224   3.0%
5      2093   2.8%
7      1928   2.6%
8      1892   2.5%
6      1624   2.2%
9      1112   1.5%

Connector Punctuation
Value  Count  Frequency (%)
_      17968  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  92693  70.8%
Latin   38160  29.2%

Most frequent character per script

Latin
Value              Count  Frequency (%)
T                  10714  28.1%
D                  8690   22.8%
K                  4197   11.0%
B                  3663   9.6%
X                  1554   4.1%
A                  1454   3.8%
G                  1218   3.2%
E                  1057   2.8%
M                  974    2.6%
P                  822    2.2%
Other values (14)  3817   10.0%

Common
Value  Count  Frequency (%)
0      32496  35.1%
_      17968  19.4%
1      15184  16.4%
2      13931  15.0%
4      2241   2.4%
3      2224   2.4%
5      2093   2.3%
7      1928   2.1%
8      1892   2.0%
6      1624   1.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  130853  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
0                  32496  24.8%
_                  17968  13.7%
1                  15184  11.6%
2                  13931  10.6%
T                  10714  8.2%
D                  8690   6.6%
K                  4197   3.2%
B                  3663   2.8%
4                  2241   1.7%
3                  2224   1.7%
Other values (25)  19545  14.9%

표항목인식번호 (table item identification number)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Category frequency: <NA> (9999), 16 (1)

Length

Max length: 4
Median length: 4
Mean length: 3.9998
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   9999   > 99.9%
16     1      < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

최종변경일 (last modified date)
Real number (ℝ)

MISSING 

Distinct: 508
Distinct (%): 5.7%
Missing: 1065
Missing (%): 10.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 20167477
Minimum: 20090128
Maximum: 20230913
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20090128
5-th percentile: 20110108
Q1: 20150324
median: 20150803
Q3: 20210208
95-th percentile: 20230118
Maximum: 20230913
Range: 140785
Interquartile range (IQR): 59884
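
The values of 최종변경일 look like calendar dates packed into integers in YYYYMMDD form, which the profiler has treated as plain numbers. Assuming that reading is correct, the quantiles remain valid because the encoding preserves date order, but differences such as the range and IQR are not day counts. A minimal sketch of converting the reported quantiles to dates with the standard library follows; `parse_yyyymmdd` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20090128 as the date 2009-01-28."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Quantiles copied from the "Quantile statistics" block above.
quantiles = {
    "min": 20090128,
    "q1": 20150324,
    "median": 20150803,
    "q3": 20210208,
    "max": 20230913,
}

as_dates = {name: parse_yyyymmdd(value) for name, value in quantiles.items()}

# Differences between dates are real day counts, unlike integer subtraction.
span_days = (as_dates["max"] - as_dates["min"]).days
iqr_days = (as_dates["q3"] - as_dates["q1"]).days

for name, d in as_dates.items():
    print(name, d.isoformat())
print("span in days:", span_days)
print("IQR in days:", iqr_days)
```

If the column is a date, converting it before profiling would give date-aware summaries instead of the integer mean and standard deviation reported here.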

Descriptive statistics

Standard deviation: 38672.393
Coefficient of variation (CV): 0.0019175623
Kurtosis: -1.2755686
Mean: 20167477
Median Absolute Deviation (MAD): 30183
Skewness: 0.040355397
Sum: 1.8019641 × 10^11
Variance: 1.495554 × 10^9
Monotonicity: Not monotonic
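
Several of these figures are tied together by simple identities, which makes it possible to check that the fused exponents in the extracted text are read correctly (10^11 for the sum, 10^9 for the variance). The sketch below assumes the usual definitions: range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of non-missing values. All inputs are rounded values copied from this report, so the comparisons are approximate.

```python
# Values copied from the 최종변경일 section of this report.
mean = 20167477
std = 38672.393
minimum, maximum = 20090128, 20230913
q1, q3 = 20150324, 20210208
n_observations = 10_000
n_missing = 1_065

value_range = maximum - minimum          # reported: 140785
iqr = q3 - q1                            # reported: 59884
cv = std / mean                          # reported: 0.0019175623
variance = std ** 2                      # reported: 1.495554e9
n_present = n_observations - n_missing   # rows with a value
approx_sum = mean * n_present            # reported: 1.8019641e11

print("range:", value_range)
print("IQR:", iqr)
print(f"CV: {cv:.10f}")
print(f"variance: {variance:.6e}")
print(f"sum: {approx_sum:.7e}")
```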
[Figure: histogram with fixed size bins (bins=50)]

Common values
Value               Count  Frequency (%)
20150625            865    8.6%
20210209            826    8.3%
20210208            481    4.8%
20101203            363    3.6%
20150624            278    2.8%
20121122            251    2.5%
20150626            243    2.4%
20121121            168    1.7%
20200421            146    1.5%
20150728            136    1.4%
Other values (498)  5178   51.8%
(Missing)           1065   10.7%

Minimum 10 values
Value     Count  Frequency (%)
20090128  5      0.1%
20101203  363    3.6%
20101220  5      0.1%
20101230  3      < 0.1%
20101231  2      < 0.1%
20110103  3      < 0.1%
20110104  2      < 0.1%
20110105  2      < 0.1%
20110106  28     0.3%
20110107  21     0.2%

Maximum 10 values
Value     Count  Frequency (%)
20230913  33     0.3%
20230912  25     0.2%
20230911  15     0.1%
20230908  6      0.1%
20230901  30     0.3%
20230831  18     0.2%
20230824  19     0.2%
20230822  8      0.1%
20230810  15     0.1%
20230809  7      0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

            최종변경일
최종변경일  1.000

[Figure: correlation heatmap]

                최종변경일  표항목인식번호
최종변경일      1.000       NaN
표항목인식번호  NaN         1.000

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index)  조직번호  테이블아이디        표항목인식번호  최종변경일
3979     210       TX_210040196_BAK12  <NA>            20121122
7674     210       TX_210050414        <NA>            <NA>
10614    210       DT_1N00003          <NA>            20150805
14933    210       DT_GGSTAT_A066      <NA>            20150924
15776    210       DT_210011_075       <NA>            20220207
19552    210       DT_20114_2018015    <NA>            20200421
17282    210       DT_21002I001_BK     <NA>            20210208
8099     210       DT_21002_J006_BK    <NA>            20210209
12366    210       DT_1M00007          <NA>            20150806
14756    210       DT_1R00003          <NA>            20170829

(index)  조직번호  테이블아이디        표항목인식번호  최종변경일
10330    210       DT_1C00007_BK       <NA>            20150625
2038     210       TX_210020647        <NA>            <NA>
22040    210       DT_21002_M028_BK    <NA>            20210209
4197     210       DT_1K00082          <NA>            20150806
9890     210       DT_1G00009          <NA>            20150713
16733    210       DT_21002_P024_1     <NA>            20180329
5000     210       DT_GSIA007          <NA>            20140109
284      210       DT_1N00017          <NA>            20101203
4521     210       TX_210020554        <NA>            <NA>
20721    210       DT_220007_I008      <NA>            20220328

Duplicate rows

Most frequently occurring

(index)  조직번호  테이블아이디       표항목인식번호  최종변경일  # duplicates
247      210       DT_1J00002_BK      <NA>            20150625    25
1611     210       DT_GG0001          <NA>            20110411    25
1380     210       DT_21002_M021_BK   <NA>            20210209    24
1388     210       DT_21002_M022_BK   <NA>            20210209    21
498      210       DT_1P00004A_BK     <NA>            20150716    18
499      210       DT_1P00004_BK      <NA>            20150625    18
585      210       DT_1Q00013_4       <NA>            20180328    18
874      210       DT_21002B003_BK    <NA>            20210208    18
501      210       DT_1P00005         <NA>            20150803    17
610      210       DT_1R00005_2       <NA>            20170829    17
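
A table like the one above can be approximated by counting identical rows. The sketch below is a minimal stdlib illustration and not the profiler's own implementation; the rows are a small hypothetical sample that reuses identifiers from this report, and the resulting counts are illustrative only. How the report defines "# duplicates" and the 2125 total is not stated, so the sketch simply reports how many times each repeated row occurs.

```python
from collections import Counter

# Hypothetical rows: (조직번호, 테이블아이디, 표항목인식번호, 최종변경일).
# None stands in for a missing value; the counts here are not the real ones.
rows = [
    ("210", "DT_GG0001", None, 20110411),
    ("210", "DT_GG0001", None, 20110411),
    ("210", "DT_1J00002_BK", None, 20150625),
    ("210", "DT_GG0001", None, 20110411),
    ("210", "DT_1P00005", None, 20150803),
]

# Count every distinct row, then keep only those seen more than once.
row_counts = Counter(rows)
duplicated = {row: n for row, n in row_counts.items() if n > 1}

# List the most frequently repeated rows first.
for row, n in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(row, n)
```

Because 조직번호 is constant and 표항목인식번호 is almost always <NA>, a row is effectively identified by 테이블아이디 and 최종변경일, which helps explain why so many rows repeat.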