Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 10000 |
Missing cells | 1065 |
Missing cells (%) | 2.7% |
Duplicate rows | 2125 |
Duplicate rows (%) | 21.2% |
Total size in memory | 419.9 KiB |
Average record size in memory | 43.0 B |
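The derived figures in the table above can be recomputed from the raw counts. The sketch below does that arithmetic; the helper name `pct` is illustrative and is not part of ydata-profiling, and one-decimal rounding is assumed to match the report.

```python
# Recompute the derived figures in the dataset statistics table.
# All inputs are copied from the report; `pct` is a hypothetical helper.

def pct(part: float, whole: float) -> str:
    """Format part/whole as a percentage with one decimal place."""
    return f"{part / whole:.1%}"

n_vars = 4
n_obs = 10_000
missing_cells = 1_065
duplicate_rows = 2_125
total_kib = 419.9

total_cells = n_vars * n_obs                  # every variable in every row
missing_pct = pct(missing_cells, total_cells)
duplicate_pct = pct(duplicate_rows, n_obs)
avg_record_bytes = total_kib * 1024 / n_obs   # KiB to bytes, then per row

print(missing_pct, duplicate_pct, f"{avg_record_bytes:.1f} B")
```

Note that the missing-cell percentage is taken over all cells (rows times variables), which is why it is lower than the 10.7% reported for the single variable that contains the missing values.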
Variable types
Categorical | 2 |
---|---|
Text | 1 |
Numeric | 1 |
Dataset
Description | Gyeonggi-do Gyeonggi Statistics System annotations (경기도 경기통계시스템 주석) |
---|---|
Author | Gyeonggi-do (경기도) |
URL | https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=IIFR8SCIUSBQJF8B2C1S33484661&infSeq=1 |
조직번호 has constant value "210" | Constant |
Dataset has 2125 (21.2%) duplicate rows | Duplicates |
표항목인식번호 is highly imbalanced (99.9%) | Imbalance |
최종변경일 has 1065 (10.7%) missing values | Missing |
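The 99.9% imbalance figure is consistent with a score defined as one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition; it is not confirmed by the report itself, and `imbalance_score` is a hypothetical name.

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """Return 1 - (Shannon entropy / maximum entropy) for class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class
    dominates. This definition is an assumption, not taken from the report.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class has no balance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Counts reported for 표항목인식번호: "<NA>" occurs 9999 times, "16" once.
score = imbalance_score([9999, 1])
print(f"{score:.1%}")
```

Under this definition two equally frequent classes score 0.0, so a value this close to 1 indicates that the column carries almost no information.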
Reproduction
Analysis started | 2023-12-10 21:50:34.844346 |
---|---|
Analysis finished | 2023-12-10 21:50:35.451538 |
Duration | 0.61 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
조직번호 (organization number)
Categorical
CONSTANT
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Value | Count |
---|---|
210 | 10000 |
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 3 |
Min length | 3 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 210 |
---|---|
2nd row | 210 |
3rd row | 210 |
4th row | 210 |
5th row | 210 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
210 | 10000 | 100.0% |
테이블아이디 (table ID)
Text
Distinct | 3410 |
---|---|
Distinct (%) | 34.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 26 |
---|---|
Median length | 22 |
Mean length | 13.0853 |
Min length | 6 |
Characters and Unicode
Total characters | 130853 |
---|---|
Distinct characters | 35 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 1332 |
---|---|
Unique (%) | 13.3% |
Sample
1st row | TX_210040196_BAK12 |
---|---|
2nd row | TX_210050414 |
3rd row | DT_1N00003 |
4th row | DT_GGSTAT_A066 |
5th row | DT_210011_075 |
Value | Count | Frequency (%) |
---|---|---|
dt_21002_m022 | 35 | 0.4% |
dt_gg0001 | 35 | 0.4% |
dt_21002_m021 | 26 | 0.3% |
dt_1j00002_bk | 25 | 0.2% |
dt_21002_m021_bk | 24 | 0.2% |
dt_21002_m022_bk | 21 | 0.2% |
dt_1r00003 | 21 | 0.2% |
dt_1r00005_2 | 21 | 0.2% |
dt_1p00005 | 20 | 0.2% |
dt_1p00009 | 20 | 0.2% |
Other values (3400) | 9752 | 97.5% |
Most occurring characters
Value | Count | Frequency (%) |
---|---|---|
0 | 32496 | 24.8% |
_ | 17968 | 13.7% |
1 | 15184 | 11.6% |
2 | 13931 | 10.6% |
T | 10714 | 8.2% |
D | 8690 | 6.6% |
K | 4197 | 3.2% |
B | 3663 | 2.8% |
4 | 2241 | 1.7% |
3 | 2224 | 1.7% |
Other values (25) | 19545 | 14.9% |
Most occurring categories
Value | Count | Frequency (%) |
---|---|---|
Decimal Number | 74725 | 57.1% |
Uppercase Letter | 38160 | 29.2% |
Connector Punctuation | 17968 | 13.7% |
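The category names in this table correspond to Unicode general categories. A minimal sketch of how such counts could be reproduced with Python's standard `unicodedata` module follows; the function name `category_counts` is hypothetical, and only two table IDs from the Sample rows are used rather than the full column.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# Two table IDs taken from the Sample rows of 테이블아이디.
sample_ids = ["TX_210040196_BAK12", "DT_1N00003"]
counts = category_counts(sample_ids)

# Lu = Uppercase Letter, Nd = Decimal Number, Pc = Connector Punctuation
print(counts)
```

Running the same function over the whole column would be expected to yield the three categories listed above, since the report states that only 35 distinct ASCII characters occur.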
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
---|---|---|
T | 10714 | 28.1% |
D | 8690 | 22.8% |
K | 4197 | 11.0% |
B | 3663 | 9.6% |
X | 1554 | 4.1% |
A | 1454 | 3.8% |
G | 1218 | 3.2% |
E | 1057 | 2.8% |
M | 974 | 2.6% |
P | 822 | 2.2% |
Other values (14) | 3817 | 10.0% |
Decimal Number
Value | Count | Frequency (%) |
---|---|---|
0 | 32496 | 43.5% |
1 | 15184 | 20.3% |
2 | 13931 | 18.6% |
4 | 2241 | 3.0% |
3 | 2224 | 3.0% |
5 | 2093 | 2.8% |
7 | 1928 | 2.6% |
8 | 1892 | 2.5% |
6 | 1624 | 2.2% |
9 | 1112 | 1.5% |
Connector Punctuation
Value | Count | Frequency (%) |
---|---|---|
_ | 17968 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
---|---|---|
Common | 92693 | 70.8% |
Latin | 38160 | 29.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
---|---|---|
T | 10714 | 28.1% |
D | 8690 | 22.8% |
K | 4197 | 11.0% |
B | 3663 | 9.6% |
X | 1554 | 4.1% |
A | 1454 | 3.8% |
G | 1218 | 3.2% |
E | 1057 | 2.8% |
M | 974 | 2.6% |
P | 822 | 2.2% |
Other values (14) | 3817 | 10.0% |
Common
Value | Count | Frequency (%) |
---|---|---|
0 | 32496 | 35.1% |
_ | 17968 | 19.4% |
1 | 15184 | 16.4% |
2 | 13931 | 15.0% |
4 | 2241 | 2.4% |
3 | 2224 | 2.4% |
5 | 2093 | 2.3% |
7 | 1928 | 2.1% |
8 | 1892 | 2.0% |
6 | 1624 | 1.8% |
Most occurring blocks
Value | Count | Frequency (%) |
---|---|---|
ASCII | 130853 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
---|---|---|
0 | 32496 | 24.8% |
_ | 17968 | 13.7% |
1 | 15184 | 11.6% |
2 | 13931 | 10.6% |
T | 10714 | 8.2% |
D | 8690 | 6.6% |
K | 4197 | 3.2% |
B | 3663 | 2.8% |
4 | 2241 | 1.7% |
3 | 2224 | 1.7% |
Other values (25) | 19545 | 14.9% |
표항목인식번호 (table item identification number)
Categorical
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Value | Count |
---|---|
<NA> | 9999 |
16 | 1 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 3.9998 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | <NA> |
---|---|
2nd row | <NA> |
3rd row | <NA> |
4th row | <NA> |
5th row | <NA> |
Common Values
Value | Count | Frequency (%) |
---|---|---|
<NA> | 9999 | > 99.9% |
16 | 1 | < 0.1% |
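This variable reports 0 missing values even though nearly every row shows `<NA>`. The length statistics (max 4, min 2, mean 3.9998) suggest that `<NA>` is stored as a literal four-character string. The sketch below checks that reading against the reported mean and shows one way to map the placeholder to a real missing value; `PLACEHOLDER`, `mean_length`, and `to_missing` are illustrative names, and treating `<NA>` as a placeholder is an assumption.

```python
# The literal string "<NA>" is assumed to be a placeholder, not real data.
PLACEHOLDER = "<NA>"

def mean_length(value_counts: dict[str, int]) -> float:
    """Mean string length weighted by how often each value occurs."""
    total = sum(value_counts.values())
    return sum(len(v) * n for v, n in value_counts.items()) / total

def to_missing(value: str):
    """Map the placeholder string to None; leave other values unchanged."""
    return None if value == PLACEHOLDER else value

# Counts reported for this variable: "<NA>" 9999 times, "16" once.
value_counts = {"<NA>": 9999, "16": 1}
print(round(mean_length(value_counts), 4))

cleaned = [to_missing(v) for v in ["<NA>", "16", "<NA>"]]
n_missing = sum(v is None for v in cleaned)
```

If the placeholder were converted before profiling, the column would likely be reported as almost entirely missing rather than as an imbalanced categorical.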
최종변경일 (last modified date)
Real number (ℝ)
MISSING
Distinct | 508 |
---|---|
Distinct (%) | 5.7% |
Missing | 1065 |
Missing (%) | 10.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20167477 |
Minimum | 20090128 |
---|---|
Maximum | 20230913 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 20090128 |
---|---|
5-th percentile | 20110108 |
Q1 | 20150324 |
median | 20150803 |
Q3 | 20210208 |
95-th percentile | 20230118 |
Maximum | 20230913 |
Range | 140785 |
Interquartile range (IQR) | 59884 |
Descriptive statistics
Standard deviation | 38672.393 |
---|---|
Coefficient of variation (CV) | 0.0019175623 |
Kurtosis | -1.2755686 |
Mean | 20167477 |
Median Absolute Deviation (MAD) | 30183 |
Skewness | 0.040355397 |
Sum | 1.8019641 × 10^11 |
Variance | 1.495554 × 10^9 |
Monotonicity | Not monotonic |
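The values of this variable look like calendar dates encoded as YYYYMMDD integers, so statistics such as the mean, range, and standard deviation describe the encoded numbers rather than elapsed time. The sketch below assumes that encoding and compares the numeric range with the actual number of days between the reported minimum and maximum; `parse_yyyymmdd` is a hypothetical helper.

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Parse an integer such as 20150625 into a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

# Minimum and maximum copied from the quantile statistics.
first = parse_yyyymmdd(20090128)
last = parse_yyyymmdd(20230913)

numeric_range = 20230913 - 20090128   # the figure the report lists as Range
calendar_days = (last - first).days   # elapsed days between the two dates

print(first, last, numeric_range, calendar_days)
```

Parsing the column into dates before profiling would let the tool treat it as a date-time variable, which is usually more informative than numeric summaries of the encoded integers.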
Common Values
Value | Count | Frequency (%) |
---|---|---|
20150625 | 865 | 8.6% |
20210209 | 826 | 8.3% |
20210208 | 481 | 4.8% |
20101203 | 363 | 3.6% |
20150624 | 278 | 2.8% |
20121122 | 251 | 2.5% |
20150626 | 243 | 2.4% |
20121121 | 168 | 1.7% |
20200421 | 146 | 1.5% |
20150728 | 136 | 1.4% |
Other values (498) | 5178 | 51.8% |
(Missing) | 1065 | 10.7% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
20090128 | 5 | 0.1% |
20101203 | 363 | 3.6% |
20101220 | 5 | 0.1% |
20101230 | 3 | < 0.1% |
20101231 | 2 | < 0.1% |
20110103 | 3 | < 0.1% |
20110104 | 2 | < 0.1% |
20110105 | 2 | < 0.1% |
20110106 | 28 | 0.3% |
20110107 | 21 | 0.2% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
20230913 | 33 | 0.3% |
20230912 | 25 | 0.2% |
20230911 | 15 | 0.2% |
20230908 | 6 | 0.1% |
20230901 | 30 | 0.3% |
20230831 | 18 | 0.2% |
20230824 | 19 | 0.2% |
20230822 | 8 | 0.1% |
20230810 | 15 | 0.2% |
20230809 | 7 | 0.1% |
Variable | 최종변경일 |
---|---|
최종변경일 | 1.000 |
Variable | 최종변경일 | 표항목인식번호 |
---|---|---|
최종변경일 | 1.000 | NaN |
표항목인식번호 | NaN | 1.000 |
First rows
Row index | 조직번호 | 테이블아이디 | 표항목인식번호 | 최종변경일 |
---|---|---|---|---|
3979 | 210 | TX_210040196_BAK12 | <NA> | 20121122 |
7674 | 210 | TX_210050414 | <NA> | <NA> |
10614 | 210 | DT_1N00003 | <NA> | 20150805 |
14933 | 210 | DT_GGSTAT_A066 | <NA> | 20150924 |
15776 | 210 | DT_210011_075 | <NA> | 20220207 |
19552 | 210 | DT_20114_2018015 | <NA> | 20200421 |
17282 | 210 | DT_21002I001_BK | <NA> | 20210208 |
8099 | 210 | DT_21002_J006_BK | <NA> | 20210209 |
12366 | 210 | DT_1M00007 | <NA> | 20150806 |
14756 | 210 | DT_1R00003 | <NA> | 20170829 |
Last rows
Row index | 조직번호 | 테이블아이디 | 표항목인식번호 | 최종변경일 |
---|---|---|---|---|
10330 | 210 | DT_1C00007_BK | <NA> | 20150625 |
2038 | 210 | TX_210020647 | <NA> | <NA> |
22040 | 210 | DT_21002_M028_BK | <NA> | 20210209 |
4197 | 210 | DT_1K00082 | <NA> | 20150806 |
9890 | 210 | DT_1G00009 | <NA> | 20150713 |
16733 | 210 | DT_21002_P024_1 | <NA> | 20180329 |
5000 | 210 | DT_GSIA007 | <NA> | 20140109 |
284 | 210 | DT_1N00017 | <NA> | 20101203 |
4521 | 210 | TX_210020554 | <NA> | <NA> |
20721 | 210 | DT_220007_I008 | <NA> | 20220328 |
Most frequently occurring
Row index | 조직번호 | 테이블아이디 | 표항목인식번호 | 최종변경일 | # duplicates |
---|---|---|---|---|---|
247 | 210 | DT_1J00002_BK | <NA> | 20150625 | 25 |
1611 | 210 | DT_GG0001 | <NA> | 20110411 | 25 |
1380 | 210 | DT_21002_M021_BK | <NA> | 20210209 | 24 |
1388 | 210 | DT_21002_M022_BK | <NA> | 20210209 | 21 |
498 | 210 | DT_1P00004A_BK | <NA> | 20150716 | 18 |
499 | 210 | DT_1P00004_BK | <NA> | 20150625 | 18 |
585 | 210 | DT_1Q00013_4 | <NA> | 20180328 | 18 |
874 | 210 | DT_21002B003_BK | <NA> | 20210208 | 18 |
501 | 210 | DT_1P00005 | <NA> | 20150803 | 17 |
610 | 210 | DT_1R00005_2 | <NA> | 20170829 | 17 |
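A table like the one above can be produced by grouping identical rows and keeping the groups that occur more than once. The sketch below uses a small made-up input built from values that appear in this report; the function name `duplicate_groups` is hypothetical, and how the report itself defines its duplicate count is not stated here.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical miniature input; each row is
# (조직번호, 테이블아이디, 표항목인식번호, 최종변경일).
rows = [
    ("210", "DT_GG0001", "<NA>", 20110411),
    ("210", "DT_GG0001", "<NA>", 20110411),
    ("210", "DT_1P00005", "<NA>", 20150803),
    ("210", "DT_GG0001", "<NA>", 20110411),
    ("210", "TX_210050414", "<NA>", None),
]

groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

Because 조직번호 is constant and 표항목인식번호 is almost always `<NA>`, duplicates in this dataset are effectively decided by the 테이블아이디 and 최종변경일 pair alone.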