Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 4470
Missing cells (%): 8.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
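
The missing-cell percentage can be checked from the counts above: 4470 missing cells out of 10000 × 5 cells in total. A minimal sketch of that arithmetic follows; the function name `missing_cell_pct` is illustrative and not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Return missing cells as a percentage of all cells in the table."""
    total_cells = n_rows * n_cols
    return 100.0 * missing_cells / total_cells


# Figures taken from the dataset statistics above.
pct = missing_cell_pct(missing_cells=4470, n_rows=10000, n_cols=5)
print(f"{pct:.1f}%")
```

All 4470 missing cells sit in a single column (최종변경일), which is why the per-column missing rate reported later is 44.7% while the table-wide rate is much lower.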

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

Description: Gyeonggi-do Gyeonggi Statistics System (경기통계시스템) periods
Author: Gyeonggi-do
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=9JRU77ZKD8EZ22XTHZ8A33463563&infSeq=1

Alerts

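조직번호 (organization number) has constant value "210" [Constant]
수록시점 (recorded period) is highly overall correlated with 최종변경일 (last modified date) [High correlation]
최종변경일 (last modified date) is highly overall correlated with 수록시점 (recorded period) [High correlation]
주기구분 (period type) is highly imbalanced (57.2%) [Imbalance]
최종변경일 (last modified date) has 4470 (44.7%) missing values [Missing]
수록시점 (recorded period) is highly skewed (γ1 = 36.415637) [Skewed]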

Reproduction

Analysis started: 2023-12-10 21:23:26.551835
Analysis finished: 2023-12-10 21:23:27.427539
Duration: 0.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

조직번호 (organization number)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
210 | 10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 210
2nd row: 210
3rd row: 210
4th row: 210
5th row: 210

Common Values

Value | Count | Frequency (%)
210 | 10000 | 100.0%

Length

Histogram of lengths of the category

통계표ID (statistical table ID)
Text

Distinct: 3532
Distinct (%): 35.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 22
Mean length: 12.6128
Min length: 6

Characters and Unicode

Total characters: 126128
Distinct characters: 35
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
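
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the report's own implementation, and the helper name `category_counts` is hypothetical. The category codes map onto the names used in this report: `Lu` is Uppercase Letter, `Nd` is Decimal Number, and `Pc` is Connector Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Nd', 'Pc') over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# Two IDs that appear in this variable's sample rows.
sample_ids = ["DT_1E00032", "DT_1K00004_BK"]
counts = category_counts(sample_ids)
print(counts)
```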

Unique

Unique: 1503
Unique (%): 15.0%

Sample

1st row: DT_21002_P013_BK
2nd row: DT_1K00004_BK
3rd row: DT_1E00032
4th row: DT_1C00003_BK
5th row: DT_21002_P012_BK

Value | Count | Frequency (%)
dt_1a00004 | 69 | 0.7%
dt_21002h005_4_bk | 61 | 0.6%
dt_21002h006_4_bk | 54 | 0.5%
dt_21002h010_bk | 48 | 0.5%
dt_2020037_006 | 48 | 0.5%
dt_21002b011_bk | 47 | 0.5%
dt_21002h003_bk | 46 | 0.5%
dt_21002h009_bk | 44 | 0.4%
dt_21002_j001_bk | 44 | 0.4%
dt_1f00007 | 44 | 0.4%
Other values (3522) | 9495 | 95.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 31122 | 24.7%
_ | 16575 | 13.1%
1 | 14544 | 11.5%
2 | 11783 | 9.3%
T | 11002 | 8.7%
D | 9834 | 7.8%
B | 3799 | 3.0%
K | 3753 | 3.0%
3 | 2368 | 1.9%
4 | 2279 | 1.8%
Other values (25) | 19069 | 15.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 68347 | 54.2%
Uppercase Letter | 41206 | 32.7%
Connector Punctuation | 16575 | 13.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
T | 11002 | 26.7%
D | 9834 | 23.9%
B | 3799 | 9.2%
K | 3753 | 9.1%
A | 1792 | 4.3%
S | 1182 | 2.9%
G | 1174 | 2.8%
E | 1059 | 2.6%
I | 936 | 2.3%
M | 923 | 2.2%
Other values (14) | 5752 | 14.0%

Decimal Number
Value | Count | Frequency (%)
0 | 31122 | 45.5%
1 | 14544 | 21.3%
2 | 11783 | 17.2%
3 | 2368 | 3.5%
4 | 2279 | 3.3%
5 | 1594 | 2.3%
6 | 1255 | 1.8%
7 | 1253 | 1.8%
8 | 1215 | 1.8%
9 | 934 | 1.4%

Connector Punctuation
Value | Count | Frequency (%)
_ | 16575 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 84922 | 67.3%
Latin | 41206 | 32.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
T | 11002 | 26.7%
D | 9834 | 23.9%
B | 3799 | 9.2%
K | 3753 | 9.1%
A | 1792 | 4.3%
S | 1182 | 2.9%
G | 1174 | 2.8%
E | 1059 | 2.6%
I | 936 | 2.3%
M | 923 | 2.2%
Other values (14) | 5752 | 14.0%

Common
Value | Count | Frequency (%)
0 | 31122 | 36.6%
_ | 16575 | 19.5%
1 | 14544 | 17.1%
2 | 11783 | 13.9%
3 | 2368 | 2.8%
4 | 2279 | 2.7%
5 | 1594 | 1.9%
6 | 1255 | 1.5%
7 | 1253 | 1.5%
8 | 1215 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 126128 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 31122 | 24.7%
_ | 16575 | 13.1%
1 | 14544 | 11.5%
2 | 11783 | 9.3%
T | 11002 | 8.7%
D | 9834 | 7.8%
B | 3799 | 3.0%
K | 3753 | 3.0%
3 | 2368 | 1.9%
4 | 2279 | 1.8%
Other values (25) | 19069 | 15.1%

주기구분 (period type)
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
Y | 7517
M | 2117
F | 243
Q | 119
H | 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: Y
3rd row: Y
4th row: Y
5th row: M

Common Values

Value | Count | Frequency (%)
Y | 7517 | 75.2%
M | 2117 | 21.2%
F | 243 | 2.4%
Q | 119 | 1.2%
H | 4 | < 0.1%
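
The 57.2% imbalance figure in the alert for this variable can be reproduced from the counts above as one minus the Shannon entropy of the class shares, normalized by the logarithm of the number of classes. The sketch below shows that calculation. That ydata-profiling computes its score exactly this way is an assumption here, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / ln(k), where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance; treated as 0 here
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Counts of Y, M, F, Q, H from the Common Values table above.
score = imbalance_score([7517, 2117, 243, 119, 4])
print(f"{score:.1%}")
```

A score of 0 corresponds to perfectly balanced classes and a score near 1 to a single dominant class.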

Length

Histogram of lengths of the category

수록시점 (recorded period)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 567
Distinct (%): 5.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60545.299
Minimum: 1925
Maximum: 20050401
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1925
5-th percentile: 1971
Q1: 1997
median: 2007
Q3: 2020
95-th percentile: 201608
Maximum: 20050401
Range: 20048476
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 535168.44
Coefficient of variation (CV): 8.8391412
Kurtosis: 1356.6712
Mean: 60545.299
Median Absolute Deviation (MAD): 11
Skewness: 36.415637
Sum: 6.0545299 × 10^8
Variance: 2.8640526 × 10^11
Monotonicity: Not monotonic
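
Several of the figures above are derived from others, so they can be cross-checked with simple arithmetic. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the reported minimum, maximum, quartiles, mean, and standard deviation. It assumes all 10000 rows are non-missing, as the report states for this variable.

```python
import math

# Values copied from the quantile and descriptive statistics above.
minimum, maximum = 1925, 20050401
q1, q3 = 1997, 2020
mean, std = 60545.299, 535168.44
n_rows = 10000  # no missing values are reported for this variable

value_range = maximum - minimum  # reported Range: 20048476
iqr = q3 - q1                    # reported IQR: 23
cv = std / mean                  # reported CV: 8.8391412
variance = std ** 2              # reported Variance: 2.8640526e11
total = mean * n_rows            # reported Sum: 6.0545299e8

print(value_range, iqr, round(cv, 4), f"{variance:.4e}", f"{total:.4e}")
```

The gap between the IQR (23) and the range (about 2 × 10^7) shows that a small number of very large values dominate the mean and standard deviation.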
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
2003 | 316 | 3.2%
2001 | 306 | 3.1%
2004 | 298 | 3.0%
2002 | 297 | 3.0%
2010 | 293 | 2.9%
2008 | 273 | 2.7%
2007 | 271 | 2.7%
2009 | 260 | 2.6%
2005 | 257 | 2.6%
2000 | 245 | 2.5%
Other values (557) | 7184 | 71.8%

Minimum 10 values
Value | Count | Frequency (%)
1925 | 1 | < 0.1%
1951 | 1 | < 0.1%
1957 | 1 | < 0.1%
1958 | 1 | < 0.1%
1960 | 29 | 0.3%
1961 | 38 | 0.4%
1962 | 41 | 0.4%
1963 | 53 | 0.5%
1964 | 40 | 0.4%
1965 | 49 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
20050401 | 2 | < 0.1%
20040101 | 1 | < 0.1%
20030401 | 3 | < 0.1%
20020401 | 1 | < 0.1%
202306 | 1 | < 0.1%
202305 | 2 | < 0.1%
202304 | 5 | 0.1%
202303 | 5 | 0.1%
202302 | 4 | < 0.1%
202301 | 5 | 0.1%
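
The extreme values suggest that 수록시점 mixes codes of different lengths: 4-digit values such as 1925, 6-digit values such as 202306, and 8-digit values such as 20050401. Read as plain numbers, these differ by orders of magnitude, which would explain the extreme skewness reported for this variable. The sketch below splits such codes into parts, assuming 4 digits mean YYYY, 6 digits mean YYYYMM, and 8 digits mean YYYYMMDD. That reading is inferred from the values and is not stated by the source (for some period types the last two digits of a 6-digit code might denote something other than a month). `parse_period` is a hypothetical helper.

```python
def parse_period(value: int):
    """Split a period code into (year, month, day); missing parts are None.

    Assumes 4 digits = YYYY, 6 digits = YYYYMM, 8 digits = YYYYMMDD.
    """
    text = str(value)
    if len(text) == 4:
        return int(text), None, None
    if len(text) == 6:
        return int(text[:4]), int(text[4:]), None
    if len(text) == 8:
        return int(text[:4]), int(text[4:6]), int(text[6:])
    raise ValueError(f"unexpected period code: {value!r}")


# One code of each length from the extreme-value lists above.
for code in (1925, 202306, 20050401):
    print(code, "->", parse_period(code))
```

Comparing the extracted years rather than the raw integers would put all three encodings on a common scale.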

최종변경일 (last modified date)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 380
Distinct (%): 6.9%
Missing: 4470
Missing (%): 44.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 20174121
Minimum: 20070830
Maximum: 20230912
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20070830
5-th percentile: 20101203
Q1: 20150324
median: 20200924
Q3: 20210209
95-th percentile: 20220728
Maximum: 20230912
Range: 160082
Interquartile range (IQR): 59885

Descriptive statistics

Standard deviation: 44594.109
Coefficient of variation (CV): 0.0022104611
Kurtosis: -1.2682978
Mean: 20174121
Median Absolute Deviation (MAD): 19601
Skewness: -0.56885699
Sum: 1.1156289 × 10^11
Variance: 1.9886346 × 10^9
Monotonicity: Not monotonic
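
Because this variable is profiled as a real number, the reported Range (160082) and IQR (59885) are differences between eight-digit integers, not durations. The sketch below converts the minimum, maximum, Q1, and Q3 to calendar dates and takes the differences in days. It assumes the values are dates encoded as YYYYMMDD, which the values suggest but the source does not state. `to_date` is an illustrative helper.

```python
from datetime import date, datetime


def to_date(yyyymmdd: int) -> date:
    """Parse an integer such as 20230912 as a calendar date (assumed YYYYMMDD)."""
    return datetime.strptime(str(yyyymmdd), "%Y%m%d").date()


# Minimum/maximum and Q1/Q3 from the quantile statistics above.
span_days = (to_date(20230912) - to_date(20070830)).days
iqr_days = (to_date(20210209) - to_date(20150324)).days
print(span_days, iqr_days)
```

Parsing the column as dates before profiling would make the quantile and dispersion statistics interpretable as time spans.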
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
20210208 | 1130 | 11.3%
20101203 | 834 | 8.3%
20210209 | 662 | 6.6%
20150625 | 461 | 4.6%
20150624 | 166 | 1.7%
20150626 | 141 | 1.4%
20121121 | 120 | 1.2%
20211015 | 111 | 1.1%
20121122 | 105 | 1.1%
20150324 | 71 | 0.7%
Other values (370) | 1729 | 17.3%
(Missing) | 4470 | 44.7%

Minimum 10 values
Value | Count | Frequency (%)
20070830 | 5 | 0.1%
20070902 | 1 | < 0.1%
20080227 | 6 | 0.1%
20080228 | 3 | < 0.1%
20090309 | 1 | < 0.1%
20090327 | 15 | 0.1%
20100217 | 1 | < 0.1%
20100326 | 1 | < 0.1%
20100330 | 4 | < 0.1%
20100408 | 5 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
20230912 | 2 | < 0.1%
20230908 | 2 | < 0.1%
20230901 | 1 | < 0.1%
20230830 | 1 | < 0.1%
20230824 | 3 | < 0.1%
20230809 | 3 | < 0.1%
20230728 | 3 | < 0.1%
20230710 | 9 | 0.1%
20230707 | 4 | < 0.1%
20230706 | 1 | < 0.1%

Correlations

Variable | 주기구분 | 수록시점 | 최종변경일
주기구분 | 1.000 | 0.136 | 0.551
수록시점 | 0.136 | 1.000 | NaN
최종변경일 | 0.551 | NaN | 1.000

Variable | 수록시점 | 최종변경일 | 주기구분
수록시점 | 1.000 | 0.620 | 0.167
최종변경일 | 0.620 | 1.000 | 0.260
주기구분 | 0.167 | 0.260 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | 조직번호 | 통계표ID | 주기구분 | 수록시점 | 최종변경일
50472 | 210 | DT_21002_P013_BK | M | 201001 | 20210209
48325 | 210 | DT_1K00004_BK | Y | 1996 | 20150625
5929 | 210 | DT_1E00032 | Y | 1987 | 20101203
12526 | 210 | DT_1C00003_BK | Y | 2010 | 20150625
28002 | 210 | DT_21002_P012_BK | M | 201211 | 20210209
60162 | 210 | DT_21002H006_4_BK | M | 200411 | 20210208
5231 | 210 | TX_210020633 | Y | 2004 | <NA>
34501 | 210 | DT_2020037_005 | M | 202012 | 20220824
6629 | 210 | DT_1K00331 | Y | 1997 | 20101203
35233 | 210 | DT_1E00033_BK | Y | 1994 | 20150728

(index) | 조직번호 | 통계표ID | 주기구분 | 수록시점 | 최종변경일
57409 | 210 | DT_21002H005_BK | M | 201003 | 20210208
54417 | 210 | DT_21002I009_BK | Y | 2016 | 20210209
36165 | 210 | DT_21002_K002 | Y | 2014 | <NA>
37496 | 210 | DT_2021057_1_8 | F | 2015 | 20210405
60748 | 210 | DT_21002_J016_BK | Y | 2015 | 20210209
56020 | 210 | DT_21002H005_4_BK | M | 199905 | 20210208
36405 | 210 | DT_210J0013 | Y | 2000 | <NA>
49535 | 210 | DT_21002C002_BK | Q | 200601 | 20210208
48797 | 210 | DT_21002E048 | Y | 2019 | 20210426
49257 | 210 | DT_1MB0001 | Y | 1975 | <NA>