Overview

Dataset statistics

Number of variables: 7
Number of observations: 209
Missing cells: 10
Missing cells (%): 0.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 12.6 KiB
Average record size in memory: 61.6 B
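
The missing-cell percentage above can be checked by hand. The sketch below assumes the percentage is the number of missing cells divided by the total cell count (variables times observations); the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 209
missing_cells = 10

# Assumption: "Missing cells (%)" = missing cells / (variables * observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells, round(missing_pct, 1))
```

Under that assumption the 10 missing cells out of 1463 round to the reported 0.7%.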

Variable types

Categorical: 4
Text: 1
Numeric: 2

Dataset

Description: Statistical table recording points extracted from the Gyeonggi-do Gyeonggi Statistics System
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=TCCJ2MYDU86J86GP1V1133521570&infSeq=1

Alerts

조직번호 (organization number) has constant value "210" [Constant]
공표구분 (publication type) is highly overall correlated with 수록시점 and 3 other fields [High correlation]
수집유형 (collection type) is highly overall correlated with 수록시점 and 3 other fields [High correlation]
주기구분 (period type) is highly overall correlated with 수록시점 and 2 other fields [High correlation]
수록시점 (recording time point) is highly overall correlated with 주기구분 and 2 other fields [High correlation]
최종수정일 (last modified date) is highly overall correlated with 수집유형 and 1 other field [High correlation]
수집유형 is highly imbalanced (83.7%) [Imbalance]
공표구분 is highly imbalanced (83.7%) [Imbalance]
최종수정일 has 10 (4.8%) missing values [Missing]
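
The report does not say how the imbalance figure is computed. One definition that reproduces 83.7% for the value counts reported later (204 rows versus 5 rows) is one minus the normalised Shannon entropy. The sketch below assumes that definition; `imbalance_score` is a hypothetical helper name, not a ydata-profiling function.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 minus the normalised Shannon entropy of the class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This definition is an assumption, not taken from the report.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Counts reported for both 수집유형 and 공표구분: one value 204 times, "<NA>" 5 times.
score = imbalance_score([204, 5])
print(f"{score:.1%}")
```

Two equally frequent classes would score 0 under this definition, so a value above 80% signals that almost all rows share a single category.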

Reproduction

Analysis started: 2023-12-10 21:16:15.164808
Analysis finished: 2023-12-10 21:16:15.801203
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

조직번호
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
Value counts: 210 (209)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 210
2nd row: 210
3rd row: 210
4th row: 210
5th row: 210

Common Values

Value  Count  Frequency (%)
210    209    100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of common values; 210 accounts for 209 rows (100.0%)]

통계표 테이블 ID
Text

Distinct: 112
Distinct (%): 53.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Length

Max length: 23
Median length: 19
Mean length: 13.736842
Min length: 11

Characters and Unicode

Total characters: 2871
Distinct characters: 30
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
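
As a minimal illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to count general categories in one table ID taken from this report's sample rows. It is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# One table ID from the report's sample rows.
sample = "DT_21002_M023"

# unicodedata.category returns the two-letter general category of a character:
# Lu = Uppercase Letter, Nd = Decimal Number, Pc = Connector Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in sample)

for category, count in category_counts.most_common():
    print(category, count)
```

The three categories found in this single ID are the same three the report lists for the whole column.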

Unique

Unique: 91
Unique (%): 43.5%

Sample

1st row: DT_21002_M023
2nd row: DT_21002_M023
3rd row: DT_21002_M023
4th row: DT_21002_M023
5th row: DT_21002_M023

Value               Count  Frequency (%)
dt_21002a004_1      28     13.4%
dt_21002_k010       9      4.3%
dt_21002_m023       8      3.8%
dt_21002_p001       8      3.8%
dt_statm_0008       8      3.8%
dt_statm_0021       8      3.8%
dt_21002_j010       7      3.3%
dt_2020037_005      5      2.4%
dt_21002_j001_1     5      2.4%
dt_2020037_006      5      2.4%
Other values (102)  118    56.5%

Most occurring characters

Value              Count  Frequency (%)
0                  761    26.5%
_                  450    15.7%
2                  401    14.0%
1                  327    11.4%
T                  241    8.4%
D                  214    7.5%
7                  73     2.5%
4                  48     1.7%
5                  48     1.7%
A                  46     1.6%
Other values (20)  262    9.1%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         1754   61.1%
Uppercase Letter       667    23.2%
Connector Punctuation  450    15.7%

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
T                 241    36.1%
D                 214    32.1%
A                 46     6.9%
M                 36     5.4%
K                 20     3.0%
S                 16     2.4%
J                 16     2.4%
P                 15     2.2%
N                 13     1.9%
E                 9      1.3%
Other values (9)  41     6.1%

Decimal Number

Value  Count  Frequency (%)
0      761    43.4%
2      401    22.9%
1      327    18.6%
7      73     4.2%
4      48     2.7%
5      48     2.7%
3      42     2.4%
8      23     1.3%
6      20     1.1%
9      11     0.6%

Connector Punctuation

Value  Count  Frequency (%)
_      450    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2204   76.8%
Latin   667    23.2%

Most frequent character per script

Latin

Value             Count  Frequency (%)
T                 241    36.1%
D                 214    32.1%
A                 46     6.9%
M                 36     5.4%
K                 20     3.0%
S                 16     2.4%
J                 16     2.4%
P                 15     2.2%
N                 13     1.9%
E                 9      1.3%
Other values (9)  41     6.1%

Common

Value  Count  Frequency (%)
0      761    34.5%
_      450    20.4%
2      401    18.2%
1      327    14.8%
7      73     3.3%
4      48     2.2%
5      48     2.2%
3      42     1.9%
8      23     1.0%
6      20     0.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2871   100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
0                  761    26.5%
_                  450    15.7%
2                  401    14.0%
1                  327    11.4%
T                  241    8.4%
D                  214    7.5%
7                  73     2.5%
4                  48     1.7%
5                  48     1.7%
A                  46     1.6%
Other values (20)  262    9.1%

주기구분
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
Value counts: Y (115), F (48), M (42), Q (3), H (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.5%
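
The figures in this report are consistent with "Distinct" counting the different values in a column and "Unique" counting the values that occur exactly once. The sketch below applies that reading to the value counts reported for 주기구분; the interpretation is an assumption inferred from the numbers rather than something the report states.

```python
from collections import Counter

# Value counts reported for 주기구분 in this report.
counts = Counter({"Y": 115, "F": 48, "M": 42, "Q": 3, "H": 1})

n_rows = sum(counts.values())
distinct = len(counts)                              # different values present
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_rows, distinct, unique)
print(f"{distinct / n_rows:.1%}", f"{unique / n_rows:.1%}")
```

Only H occurs a single time, which matches the reported Unique count of 1 against a Distinct count of 5.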

Sample

1st row: Y
2nd row: Y
3rd row: Y
4th row: Y
5th row: Y

Common Values

Value  Count  Frequency (%)
Y      115    55.0%
F      48     23.0%
M      42     20.1%
Q      3      1.4%
H      1      0.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of common values: Y 115 (55.0%), F 48 (23.0%), M 42 (20.1%), Q 3 (1.4%), H 1 (0.5%)]

수록시점
Real number (ℝ)

HIGH CORRELATION 

Distinct: 61
Distinct (%): 29.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45934.081
Minimum: 2001
Maximum: 202211
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 2001
5-th percentile: 2004.4
Q1: 2018
Median: 2020
Q3: 2021
95-th percentile: 202011.6
Maximum: 202211
Range: 200210
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 82869.759
Coefficient of variation (CV): 1.8041018
Kurtosis: -0.14914667
Mean: 45934.081
Median Absolute Deviation (MAD): 2
Skewness: 1.3609996
Sum: 9600223
Variance: 6.867397 × 10^9
Monotonicity: Not monotonic
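
Several of the 수록시점 statistics are derived from others, and the arithmetic can be checked directly. The sketch below assumes the standard definitions (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared) and uses only numbers printed in this report.

```python
# Values copied from the 수록시점 statistics in this report.
minimum, maximum = 2001, 202211
q1, q3 = 2018, 2021
mean, std = 45934.081, 82869.759

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 4), f"{variance:.4e}")
```

The column holds both 4-digit values (minimum 2001) and 6-digit values (maximum 202211), which is why the mean and standard deviation are so far from the median of 2020 and the IQR of 3.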
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
2020               48     23.0%
2021               24     11.5%
2018               24     11.5%
2019               13     6.2%
2010               5      2.4%
2011               5      2.4%
2022               5      2.4%
2017               5      2.4%
2005               4      1.9%
2004               4      1.9%
Other values (51)  72     34.4%

Minimum 10 values

Value  Count  Frequency (%)
2001   2      1.0%
2002   2      1.0%
2003   3      1.4%
2004   4      1.9%
2005   4      1.9%
2006   1      0.5%
2007   2      1.0%
2008   4      1.9%
2009   3      1.4%
2010   5      2.4%

Maximum 10 values

Value   Count  Frequency (%)
202211  2      1.0%
202210  2      1.0%
202209  2      1.0%
202208  2      1.0%
202207  2      1.0%
202012  1      0.5%
202011  1      0.5%
202010  1      0.5%
202009  1      0.5%
202008  1      0.5%
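
The smallest values of 수록시점 are 4-digit codes and the largest are 6-digit codes, so treating the column as one numeric scale is misleading. A minimal sketch of splitting the codes before analysis follows. It assumes 4-digit codes are years and 6-digit codes are a year followed by a two-digit sub-period such as a month; the report does not document the encoding, and `parse_period` is a hypothetical helper.

```python
from typing import Optional, Tuple


def parse_period(code: int) -> Tuple[int, Optional[int]]:
    """Split a period code into (year, sub_period).

    Assumption: 4 digits = YYYY, 6 digits = YYYY plus a two-digit sub-period.
    Any other length is rejected instead of being guessed.
    """
    text = str(code)
    if len(text) == 4:
        return int(text), None
    if len(text) == 6:
        return int(text[:4]), int(text[4:])
    raise ValueError(f"unexpected period code: {code!r}")


# A few codes that appear in the tables above.
codes = [2001, 2020, 202207, 202211]
parsed = [parse_period(c) for c in codes]
years = sorted({year for year, _ in parsed})
print(parsed)
print(years)
```

Once split this way, the year component can be compared across rows regardless of which period type a row uses.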

수집유형
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
Value counts: 1212713 (204), <NA> (5)

Length

Max length: 7
Median length: 7
Mean length: 6.9282297
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1212713
2nd row: 1212713
3rd row: 1212713
4th row: 1212713
5th row: 1212713

Common Values

Value    Count  Frequency (%)
1212713  204    97.6%
<NA>     5      2.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of common values: 1212713 204 (97.6%), <NA> 5 (2.4%)]

공표구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
Value counts: 1210110 (204), <NA> (5)

Length

Max length: 7
Median length: 7
Mean length: 6.9282297
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1210110
2nd row: 1210110
3rd row: 1210110
4th row: 1210110
5th row: 1210110

Common Values

Value    Count  Frequency (%)
1210110  204    97.6%
<NA>     5      2.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of common values: 1210110 204 (97.6%), <NA> 5 (2.4%)]

최종수정일
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 46
Distinct (%): 23.1%
Missing: 10
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 20205068
Minimum: 20131231
Maximum: 20230412
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 KiB

Quantile statistics

Minimum: 20131231
5-th percentile: 20131231
Q1: 20200509
Median: 20220603
Q3: 20221168
95-th percentile: 20230406
Maximum: 20230412
Range: 99181
Interquartile range (IQR): 20659

Descriptive statistics

Standard deviation: 31744.737
Coefficient of variation (CV): 0.0015711274
Kurtosis: 1.3334663
Mean: 20205068
Median Absolute Deviation (MAD): 9720
Skewness: -1.6709603
Sum: 4.0208086 × 10^9
Variance: 1.0077283 × 10^9
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=46)]

Common values

Value              Count  Frequency (%)
20131231           28     13.4%
20220603           18     8.6%
20230406           17     8.1%
20200509           12     5.7%
20220602           11     5.3%
20211221           9      4.3%
20230323           9      4.3%
20220919           8      3.8%
20200508           8      3.8%
20221212           7      3.3%
Other values (36)  72     34.4%
(Missing)          10     4.8%

Minimum 10 values

Value     Count  Frequency (%)
20131231  28     13.4%
20170216  2      1.0%
20190417  1      0.5%
20200508  8      3.8%
20200509  12     5.7%
20200511  2      1.0%
20200515  1      0.5%
20200623  1      0.5%
20200831  6      2.9%
20201113  2      1.0%

Maximum 10 values

Value     Count  Frequency (%)
20230412  1      0.5%
20230406  17     8.1%
20230329  4      1.9%
20230328  1      0.5%
20230323  9      4.3%
20230103  1      0.5%
20221220  1      0.5%
20221219  1      0.5%
20221213  1      0.5%
20221212  7      3.3%
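
The 최종수정일 values look like dates written as YYYYMMDD numbers, which would explain why statistics such as the Range (99181) or the interpolated Q3 (20221168) are not meaningful as dates. The sketch below assumes that encoding and converts values with the standard library; `parse_yyyymmdd` is a hypothetical helper, and values that are not real calendar dates (including NaN for missing cells) come back as `None`.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: float) -> Optional[date]:
    """Return a date for a YYYYMMDD number, or None if it is not a valid date."""
    try:
        # int() raises ValueError for NaN, so missing values also return None.
        return datetime.strptime(str(int(value)), "%Y%m%d").date()
    except ValueError:
        return None


earliest = parse_yyyymmdd(20131231.0)   # reported Minimum
latest = parse_yyyymmdd(20230412.0)     # reported Maximum
span_days = (latest - earliest).days    # calendar span between the two dates
q3_as_date = parse_yyyymmdd(20221168)   # reported Q3; day 68 does not exist

print(earliest, latest, span_days, q3_as_date)
```

Parsing before profiling would let the column be summarised as a date, so quantiles and ranges are expressed in days.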

Interactions

[Figures: four interaction plots]

Correlations

[Figure: correlation heatmap]

            주기구분  수록시점  최종수정일
주기구분    1.000     1.000     0.553
수록시점    1.000     1.000     0.766
최종수정일  0.553     0.766     1.000

[Figure: correlation heatmap]

          공표구분  수집유형  주기구분
공표구분  1.000     1.000     1.000
수집유형  1.000     1.000     1.000
주기구분  1.000     1.000     1.000

[Figure: correlation heatmap]

            수록시점  최종수정일  주기구분  수집유형  공표구분
수록시점    1.000     0.025       0.993     1.000     1.000
최종수정일  0.025     1.000       0.495     1.000     1.000
주기구분    0.993     0.495       1.000     1.000     1.000
수집유형    1.000     1.000       1.000     1.000     1.000
공표구분    1.000     1.000       1.000     1.000     1.000
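
The report does not state which coefficient each matrix uses. For two categorical columns, one common association measure is Cramér's V. The sketch below computes the plain (not bias-corrected) version for a hypothetical cross-tabulation of 수집유형 and 공표구분 that assumes their five "<NA>" rows fall on the same records. That alignment is an assumption made for illustration, not something the report shows.

```python
import math
from typing import List


def cramers_v(table: List[List[int]]) -> float:
    """Plain (not bias-corrected) Cramér's V for an r x c contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical table: rows = 수집유형 (value, <NA>), columns = 공표구분 (value, <NA>).
aligned = [[204, 0], [0, 5]]
v_aligned = cramers_v(aligned)

# For contrast, a table with no association at all.
v_independent = cramers_v([[10, 10], [10, 10]])

print(round(v_aligned, 3), round(v_independent, 3))
```

If the two columns are redundant in this way, a coefficient of 1.000 between them says little beyond the fact that they are missing together.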

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  조직번호  통계표 테이블 ID  주기구분  수록시점  수집유형  공표구분  최종수정일
0      210       DT_21002_M023     Y         2005      1212713   1210110   20221102
1      210       DT_21002_M023     Y         2008      1212713   1210110   20221102
2      210       DT_21002_M023     Y         2010      1212713   1210110   20221103
3      210       DT_21002_M023     Y         2011      1212713   1210110   20221103
4      210       DT_21002_M023     Y         2019      1212713   1210110   20221210
5      210       DT_21002_M023     Y         2020      1212713   1210110   20221210
6      210       DT_21002_N001     Y         2008      1212713   1210110   20221104
7      210       DT_2021057_2_1    F         2021      1212713   1210110   20230406
8      210       DT_21002_N001     Y         2019      1212713   1210110   20221104
9      210       DT_21002_N001     Y         2020      1212713   1210110   20221210

Last rows

Index  조직번호  통계표 테이블 ID  주기구분  수록시점  수집유형  공표구분  최종수정일
199    210       DT_21002_J001_1   Y         2009      1212713   1210110   20220919
200    210       DT_21002_J001_1   Y         2010      1212713   1210110   20220919
201    210       DT_21002_O005     Y         2018      1212713   1210110   20221212
202    210       DT_21002_J001_1   Y         2011      1212713   1210110   20220919
203    210       DT_21002_J001_1   Y         2012      1212713   1210110   20220919
204    210       DT_21002_J001_1   Y         2013      1212713   1210110   20220919
205    210       DT_21002_O005     Y         2019      1212713   1210110   20221212
206    210       DT_21002_O005     Y         2020      1212713   1210110   20221212
207    210       DT_21002_M023     Y         2003      1212713   1210110   20221102
208    210       DT_21002_M023     Y         2004      1212713   1210110   20221102