Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 100 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 5.2 KiB |
Average record size in memory | 53.3 B |
Variable types
Categorical | 4 |
---|---|
Text | 1 |
Numeric | 1 |
Dataset
Description | Sample |
---|---|
Author | National Library of Korea (국립중앙도서관) |
URL | https://www.bigdata-culture.kr/bigdata/user/data_market/detail.do?id=043fb520-1525-11ec-bbc0-d7035fffebeb |
Alerts
Alert | Type |
---|---|
anals_trget_day is highly overall correlated with anals_trget_year and 2 other fields | High correlation |
anals_trget_mt is highly overall correlated with anals_trget_year and 2 other fields | High correlation |
anals_trget_year is highly overall correlated with anals_trget_mt and 2 other fields | High correlation |
one_area_nm is highly overall correlated with anals_trget_year and 2 other fields | High correlation |
anals_trget_year is highly imbalanced (80.6%) | Imbalance |
anals_trget_mt is highly imbalanced (80.6%) | Imbalance |
anals_trget_day is highly imbalanced (80.6%) | Imbalance |
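The 80.6% imbalance figure is consistent with a normalized-entropy score: one minus the Shannon entropy of the class frequencies divided by its maximum, log2 of the number of classes. The sketch below checks this against the 97/3 split reported for each date field. The function name `imbalance_score` is illustrative, and the formula is an assumption inferred from the reported numbers rather than something the report states.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts.

    0 means perfectly balanced; values near 1 mean one class dominates.
    Assumption: this normalized-entropy form is what the report uses.
    """
    n_classes = len(counts)
    if n_classes < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# 97 observations in one class, 3 in the other (counts from the report).
score = imbalance_score([97, 3])
print(f"{score:.1%}")  # prints 80.6%
```

A perfectly balanced two-class field (50/50) scores 0 under the same formula, so the three date fields sit close to the single-value end of the scale.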
Reproduction
Analysis started | 2023-12-10 10:08:04.397987 |
---|---|
Analysis finished | 2023-12-10 10:08:05.656715 |
Duration | 1.26 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
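A report like this one can be regenerated with ydata-profiling's `ProfileReport` class. The following is a minimal sketch, not the configuration actually used here: the file paths are hypothetical, the title is borrowed from the "Description" field above, and the settings stored in `config.json` are not reproduced.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report.

    Both paths are hypothetical placeholders supplied by the caller.
    """
    # Imports are deferred so that defining the function does not require
    # the third-party packages (pandas, ydata-profiling) to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # Default settings; the report's own config.json is not applied here.
    report = ProfileReport(df, title="Sample")
    report.to_file(html_path)
    return html_path

# Example call (file names are assumptions, not taken from the report):
# build_report("sample.csv", "report.html")
```

With default settings the inferred variable types may differ from those listed above, for example if the date parts are read as integers rather than categories.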
anals_trget_year
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count |
---|---|
2018 | 97 |
2020 | 3 |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2018 |
---|---|
2nd row | 2020 |
3rd row | 2018 |
4th row | 2018 |
5th row | 2018 |
Common Values
Value | Count | Frequency (%) |
2018 | 97 | 97.0% |
2020 | 3 | 3.0% |
anals_trget_mt
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count |
---|---|
1 | 97 |
12 | 3 |
Length
Max length | 2 |
---|---|
Median length | 1 |
Mean length | 1.03 |
Min length | 1 |
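The length statistics follow directly from the value counts, because the field is treated as text: "1" has length 1 and "12" has length 2. A small check, assuming the counts of 97 and 3 reported for this variable:

```python
import statistics

# Value counts for anals_trget_mt as reported: "1" x 97, "12" x 3.
value_counts = {"1": 97, "12": 3}

# Expand the counts into one string length per observation.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
print(summary)
```

The mean of 1.03 is simply (97 × 1 + 3 × 2) / 100.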
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 12 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 97 | 97.0% |
12 | 3 | 3.0% |
anals_trget_day
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count |
---|---|
31 | 97 |
30 | 3 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 31 |
---|---|
2nd row | 30 |
3rd row | 31 |
4th row | 31 |
5th row | 31 |
Common Values
Value | Count | Frequency (%) |
31 | 97 | 97.0% |
30 | 3 | 3.0% |
one_area_nm
Categorical
HIGH CORRELATION
Distinct | 9 |
---|---|
Distinct (%) | 9.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count |
---|---|
경기도 | 36 |
강원도 | 16 |
부산광역시 | 15 |
경상남도 | 13 |
경상북도 | 10 |
Other values (4) | 10 |
Length
Max length | 5 |
---|---|
Median length | 3 |
Mean length | 3.7 |
Min length | 3 |
Unique
Unique | 2 |
---|---|
Unique (%) | 2.0% |
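"Distinct" and "Unique" measure different things here: the figures are consistent with distinct meaning the number of different values and unique meaning the number of values that occur exactly once. A short sketch using the counts reported for `one_area_nm` (the interpretation of "Unique" is an assumption inferred from those counts):

```python
# Value counts for one_area_nm, copied from the report's Common Values table.
counts = {
    "경기도": 36, "강원도": 16, "부산광역시": 15, "경상남도": 13, "경상북도": 10,
    "광주광역시": 5, "충청북도": 3, "대구광역시": 1, "서울특별시": 1,
}

n_obs = sum(counts.values())                          # total observations
distinct = len(counts)                                # different values
unique = [v for v, c in counts.items() if c == 1]     # values seen exactly once

print(n_obs, distinct, len(unique), unique)
```

The two values that appear once, 대구광역시 and 서울특별시, account for the 2.0% unique share.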
Sample
1st row | 강원도 |
---|---|
2nd row | 충청북도 |
3rd row | 강원도 |
4th row | 강원도 |
5th row | 강원도 |
Common Values
Value | Count | Frequency (%) |
경기도 | 36 | 36.0% |
강원도 | 16 | 16.0% |
부산광역시 | 15 | 15.0% |
경상남도 | 13 | 13.0% |
경상북도 | 10 | 10.0% |
광주광역시 | 5 | 5.0% |
충청북도 | 3 | 3.0% |
대구광역시 | 1 | 1.0% |
서울특별시 | 1 | 1.0% |
two_area_nm
Text
Distinct | 95 |
---|---|
Distinct (%) | 95.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
창원시 | 5 | 4.0% |
수원시 | 4 | 3.2% |
고양시 | 3 | 2.4% |
용인시 | 3 | 2.4% |
북구 | 3 | 2.4% |
남구 | 3 | 2.4% |
안양시 | 2 | 1.6% |
안산시 | 2 | 1.6% |
성남시 | 2 | 1.6% |
청주시 | 2 | 1.6% |
Other values (92) | 96 | 76.8% |
Most occurring characters
Value | Count | Frequency (%) |
시 | 61 | 15.3% |
구 | 49 | 12.3% |
(space) | 25 | 6.3% |
군 | 18 | 4.5% |
원 | 15 | 3.8% |
양 | 14 | 3.5% |
산 | 12 | 3.0% |
주 | 11 | 2.8% |
천 | 9 | 2.3% |
창 | 8 | 2.0% |
Other values (84) | 177 | 44.4% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 374 | 93.7% |
Space Separator | 25 | 6.3% |
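"Other Letter" and "Space Separator" are the Unicode general categories `Lo` and `Zs`. The sketch below counts them with Python's standard `unicodedata` module over the ten `two_area_nm` values visible in the report's first sample rows; it illustrates the classification only and does not reproduce the full-column totals of 374 and 25.

```python
import unicodedata
from collections import Counter

# two_area_nm values from the first ten sample rows of the report.
names = [
    "강릉시", "청주시 청원구", "동해시", "삼척시", "속초시",
    "양구군", "양양군", "청주시 흥덕구", "원주시", "인제군",
]

# Tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
print(category_counts)
```

Hangul syllables fall under `Lo`, and the only `Zs` characters are the spaces separating a city from its district, as in "청주시 청원구".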
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
시 | 61 | 16.3% |
구 | 49 | 13.1% |
군 | 18 | 4.8% |
원 | 15 | 4.0% |
양 | 14 | 3.7% |
산 | 12 | 3.2% |
주 | 11 | 2.9% |
천 | 9 | 2.4% |
창 | 8 | 2.1% |
안 | 8 | 2.1% |
Other values (83) | 169 | 45.2% |
Space Separator
Value | Count | Frequency (%) |
(space) | 25 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 374 | 93.7% |
Common | 25 | 6.3% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
시 | 61 | 16.3% |
구 | 49 | 13.1% |
군 | 18 | 4.8% |
원 | 15 | 4.0% |
양 | 14 | 3.7% |
산 | 12 | 3.2% |
주 | 11 | 2.9% |
천 | 9 | 2.4% |
창 | 8 | 2.1% |
안 | 8 | 2.1% |
Other values (83) | 169 | 45.2% |
Common
Value | Count | Frequency (%) |
(space) | 25 | 100.0% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 374 | 93.7% |
ASCII | 25 | 6.3% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
시 | 61 | 16.3% |
구 | 49 | 13.1% |
군 | 18 | 4.8% |
원 | 15 | 4.0% |
양 | 14 | 3.7% |
산 | 12 | 3.2% |
주 | 11 | 2.9% |
천 | 9 | 2.4% |
창 | 8 | 2.1% |
안 | 8 | 2.1% |
Other values (83) | 169 | 45.2% |
ASCII
Value | Count | Frequency (%) |
(space) | 25 | 100.0% |
lon_co
Real number (ℝ)
Distinct | 99 |
---|---|
Distinct (%) | 99.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2287.56 |
Minimum | 1 |
Maximum | 11049 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 136.9 |
Q1 | 581 |
median | 1422.5 |
Q3 | 3253.5 |
95-th percentile | 8550.2 |
Maximum | 11049 |
Range | 11048 |
Interquartile range (IQR) | 2672.5 |
Descriptive statistics
Standard deviation | 2482.1448 |
---|---|
Coefficient of variation (CV) | 1.0850622 |
Kurtosis | 2.6925771 |
Mean | 2287.56 |
Median Absolute Deviation (MAD) | 1060 |
Skewness | 1.7307779 |
Sum | 228756 |
Variance | 6161043 |
Monotonicity | Not monotonic |
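Several of the `lon_co` statistics are derived from others in the same tables, which gives a quick consistency check. The sketch below recomputes them from the reported values only; the variable names are illustrative, and small differences are expected because the inputs are already rounded.

```python
# Values copied from the report's tables for lon_co.
n, total = 100, 228756
std = 2482.1448
q1, q3 = 581, 3253.5
minimum, maximum = 1, 11049

mean = total / n                    # sum / number of observations
cv = std / mean                     # coefficient of variation
variance = std ** 2                 # square of the standard deviation
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum     # max - min

print(mean, round(cv, 7), round(variance), iqr, value_range)
```

A coefficient of variation above 1 together with a mean (2287.56) well above the median (1422.5) matches the positive skewness reported for this field.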
Value | Count | Frequency (%) |
3243 | 2 | 2.0% |
587 | 1 | 1.0% |
1004 | 1 | 1.0% |
233 | 1 | 1.0% |
238 | 1 | 1.0% |
679 | 1 | 1.0% |
973 | 1 | 1.0% |
3101 | 1 | 1.0% |
1881 | 1 | 1.0% |
842 | 1 | 1.0% |
Other values (89) | 89 | 89.0% |
Minimum 10 values
Value | Count | Frequency (%) |
1 | 1 | 1.0% |
67 | 1 | 1.0% |
79 | 1 | 1.0% |
86 | 1 | 1.0% |
97 | 1 | 1.0% |
139 | 1 | 1.0% |
157 | 1 | 1.0% |
166 | 1 | 1.0% |
168 | 1 | 1.0% |
217 | 1 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
11049 | 1 | 1.0% |
10361 | 1 | 1.0% |
9220 | 1 | 1.0% |
8937 | 1 | 1.0% |
8782 | 1 | 1.0% |
8538 | 1 | 1.0% |
7758 | 1 | 1.0% |
7227 | 1 | 1.0% |
6211 | 1 | 1.0% |
6177 | 1 | 1.0% |
Variable | anals_trget_year | anals_trget_mt | anals_trget_day | one_area_nm | two_area_nm | lon_co |
---|---|---|---|---|---|---|
anals_trget_year | 1.000 | 0.963 | 0.963 | 1.000 | 1.000 | 0.000 |
anals_trget_mt | 0.963 | 1.000 | 0.963 | 1.000 | 1.000 | 0.000 |
anals_trget_day | 0.963 | 0.963 | 1.000 | 1.000 | 1.000 | 0.000 |
one_area_nm | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.377 |
two_area_nm | 1.000 | 1.000 | 1.000 | 0.000 | 1.000 | 1.000 |
lon_co | 0.000 | 0.000 | 0.000 | 0.377 | 1.000 | 1.000 |
Variable | anals_trget_day | anals_trget_mt | anals_trget_year | one_area_nm |
---|---|---|---|---|
anals_trget_day | 1.000 | 0.826 | 0.826 | 0.964 |
anals_trget_mt | 0.826 | 1.000 | 0.826 | 0.964 |
anals_trget_year | 0.826 | 0.826 | 1.000 | 0.964 |
one_area_nm | 0.964 | 0.964 | 0.964 | 1.000 |
Variable | lon_co | anals_trget_year | anals_trget_mt | anals_trget_day | one_area_nm |
---|---|---|---|---|---|
lon_co | 1.000 | 0.000 | 0.000 | 0.000 | 0.178 |
anals_trget_year | 0.000 | 1.000 | 0.826 | 0.826 | 0.964 |
anals_trget_mt | 0.000 | 0.826 | 1.000 | 0.826 | 0.964 |
anals_trget_day | 0.000 | 0.826 | 0.826 | 1.000 | 0.964 |
one_area_nm | 0.178 | 0.964 | 0.964 | 0.964 | 1.000 |
First rows
Index | anals_trget_year | anals_trget_mt | anals_trget_day | one_area_nm | two_area_nm | lon_co |
---|---|---|---|---|---|---|
0 | 2018 | 1 | 31 | 강원도 | 강릉시 | 587 |
1 | 2020 | 12 | 30 | 충청북도 | 청주시 청원구 | 1943 |
2 | 2018 | 1 | 31 | 강원도 | 동해시 | 990 |
3 | 2018 | 1 | 31 | 강원도 | 삼척시 | 446 |
4 | 2018 | 1 | 31 | 강원도 | 속초시 | 802 |
5 | 2018 | 1 | 31 | 강원도 | 양구군 | 270 |
6 | 2018 | 1 | 31 | 강원도 | 양양군 | 79 |
7 | 2020 | 12 | 30 | 충청북도 | 청주시 흥덕구 | 1761 |
8 | 2018 | 1 | 31 | 강원도 | 원주시 | 3316 |
9 | 2018 | 1 | 31 | 강원도 | 인제군 | 168 |
Last rows
Index | anals_trget_year | anals_trget_mt | anals_trget_day | one_area_nm | two_area_nm | lon_co |
---|---|---|---|---|---|---|
90 | 2018 | 1 | 31 | 부산광역시 | 부산진구 | 3314 |
91 | 2018 | 1 | 31 | 부산광역시 | 북구 | 2957 |
92 | 2018 | 1 | 31 | 부산광역시 | 사상구 | 495 |
93 | 2018 | 1 | 31 | 부산광역시 | 사하구 | 391 |
94 | 2018 | 1 | 31 | 부산광역시 | 서구 | 352 |
95 | 2018 | 1 | 31 | 부산광역시 | 연제구 | 688 |
96 | 2018 | 1 | 31 | 부산광역시 | 영도구 | 783 |
97 | 2018 | 1 | 31 | 부산광역시 | 중구 | 387 |
98 | 2018 | 1 | 31 | 부산광역시 | 해운대구 | 4922 |
99 | 2018 | 1 | 31 | 서울특별시 | 강남구 | 8782 |
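The year, month, and day fields always change together in the sample rows, which is one way to read the high-correlation alerts on those three variables. Combining them into a single date makes this visible. The sketch below uses the date parts of the first ten rows shown above; treating the three fields as one calendar date is an assumption about their meaning.

```python
from datetime import date

# (year, month, day) for rows 0-9, copied from the sample table.
rows = [
    (2018, 1, 31), (2020, 12, 30), (2018, 1, 31), (2018, 1, 31), (2018, 1, 31),
    (2018, 1, 31), (2018, 1, 31), (2020, 12, 30), (2018, 1, 31), (2018, 1, 31),
]

# Build one date per row, then list the distinct dates in order.
dates = [date(year, month, day) for year, month, day in rows]
distinct_dates = sorted(set(dates))

print([d.isoformat() for d in distinct_dates])
```

Only two dates appear, 2018-01-31 and 2020-12-30, so each of the three fields carries the same two-way split.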