Dataset statistics
Number of variables | 10 |
---|---|
Number of observations | 100 |
Missing cells | 28 |
Missing cells (%) | 2.8% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 8.5 KiB |
Average record size in memory | 87.3 B |
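The percentage and size figures above follow from simple arithmetic on the other rows. A minimal sketch, assuming the total size is approximately the average record size times the number of observations (the variable names are illustrative, not from the report):

```python
# Figures copied from the Dataset statistics table.
n_variables = 10
n_observations = 100
missing_cells = 28
avg_record_bytes = 87.3

# Missing cells (%) is measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size ~ average record size * number of rows.
total_kib = avg_record_bytes * n_observations / 1024

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```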
Variable types
Categorical | 8 |
---|---|
Text | 1 |
Numeric | 1 |
Dataset
Description | Sample |
---|---|
Author | 부산정보산업진흥원 |
URL | https://www.bigdata-culture.kr/bigdata/user/data_market/detail.do?id=ed611b55-4ee2-4345-bb1e-6f801589c7a8 |
Alerts
base_month has constant value "1" | Constant |
---|---|
ticket_dt is highly overall correlated with base_year and 2 other fields | High correlation |
base_day is highly overall correlated with ticket_dt and 2 other fields | High correlation |
base_year is highly overall correlated with ticket_dt and 2 other fields | High correlation |
sido_dc is highly overall correlated with age_dc and 1 other field | High correlation |
gugun_dc is highly overall correlated with ticket_dt and 3 other fields | High correlation |
age_dc is highly overall correlated with sido_dc | High correlation |
ticket_dt is highly imbalanced (80.6%) | Imbalance |
base_year is highly imbalanced (80.6%) | Imbalance |
base_day is highly imbalanced (80.6%) | Imbalance |
sido_dc is highly imbalanced (75.8%) | Imbalance |
ticket_cnt is highly imbalanced (64.2%) | Imbalance |
dong_dc has 28 (28.0%) missing values | Missing |
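The imbalance percentages in the alerts can be reproduced from the value counts reported for each variable. The sketch below assumes the score is one minus the Shannon entropy normalized by the log of the number of classes. That formula matches the three distinct figures above (80.6%, 75.8%, 64.2%), but the report itself does not confirm it is what the tool computes. The function name is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the
    class frequencies and k is the number of classes (assumed formula)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts copied from the per-variable Common Values tables.
ticket_dt_score = imbalance_score([97, 3])
sido_dc_score = imbalance_score([96, 4])
ticket_cnt_score = imbalance_score([86, 11, 2, 1])

for name, score in [("ticket_dt", ticket_dt_score),
                    ("sido_dc", sido_dc_score),
                    ("ticket_cnt", ticket_cnt_score)]:
    print(f"{name}: {100 * score:.1f}%")
```

A score of 0% would mean perfectly balanced classes, and a score near 100% means almost all rows share one value.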
Reproduction
Analysis started | 2023-12-10 10:00:45.333598 |
---|---|
Analysis finished | 2023-12-10 10:00:47.177762 |
Duration | 1.84 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
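The reported duration is the difference between the two timestamps above. A small sketch of that check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction table.
started = datetime.strptime("2023-12-10 10:00:45.333598", FMT)
finished = datetime.strptime("2023-12-10 10:00:47.177762", FMT)

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```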
ticket_dt
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 20180118 |
---|---|
2nd row | 20200121 |
3rd row | 20180118 |
4th row | 20180118 |
5th row | 20180118 |
Common Values
Value | Count | Frequency (%) |
20180118 | 97 | 97.0% |
20200121 | 3 | 3.0% |
base_year
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 4 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2018 |
---|---|
2nd row | 2020 |
3rd row | 2018 |
4th row | 2018 |
5th row | 2018 |
Common Values
Value | Count | Frequency (%) |
2018 | 97 | 97.0% |
2020 | 3 | 3.0% |
base_month
Categorical
CONSTANT
Distinct | 1 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 100 | 100.0% |
base_day
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 18 |
---|---|
2nd row | 21 |
3rd row | 18 |
4th row | 18 |
5th row | 18 |
Common Values
Value | Count | Frequency (%) |
18 | 97 | 97.0% |
21 | 3 | 3.0% |
sido_dc
Categorical
HIGH CORRELATION, IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 4.96 |
Min length | 4 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 경상남도 |
---|---|
2nd row | 부산광역시 |
3rd row | 경상남도 |
4th row | 경상남도 |
5th row | 경상남도 |
Common Values
Value | Count | Frequency (%) |
부산광역시 | 96 | 96.0% |
경상남도 | 4 | 4.0% |
gugun_dc
Categorical
HIGH CORRELATION
Distinct | 11 |
---|---|
Distinct (%) | 11.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 7 |
---|---|
Median length | 4 |
Mean length | 3.09 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 김해시 |
---|---|
2nd row | 해운대구 |
3rd row | 김해시 |
4th row | 김해시 |
5th row | 창원시 성산구 |
Common Values
Value | Count | Frequency (%) |
부산진구 | 32 | 32.0% |
남구 | 21 | 21.0% |
동래구 | 18 | 18.0% |
금정구 | 8 | 8.0% |
동구 | 6 | 6.0% |
김해시 | 3 | 3.0% |
해운대구 | 3 | 3.0% |
기장군 | 3 | 3.0% |
북구 | 3 | 3.0% |
강서구 | 2 | 2.0% |
Other values (1) | 1 | 1.0% |
dong_dc
Text
MISSING
Distinct | 51 |
---|---|
Distinct (%) | 70.8% |
Missing | 28 |
Missing (%) | 28.0% |
Memory size | 932.0 B |
Value | Count | Frequency (%) |
안락제2동 | 3 | 4.2% |
부암제1동 | 3 | 4.2% |
양정제1동 | 3 | 4.2% |
초읍동 | 3 | 4.2% |
용호제2동 | 3 | 4.2% |
온천제3동 | 2 | 2.8% |
사직제2동 | 2 | 2.8% |
범천제2동 | 2 | 2.8% |
안락제1동 | 2 | 2.8% |
양정제2동 | 2 | 2.8% |
Other values (41) | 47 | 65.3% |
Most occurring characters
Value | Count | Frequency (%) |
동 | 70 | 20.8% |
제 | 61 | 18.1% |
2 | 26 | 7.7% |
1 | 22 | 6.5% |
3 | 10 | 3.0% |
범 | 8 | 2.4% |
대 | 7 | 2.1% |
부 | 6 | 1.8% |
연 | 6 | 1.8% |
안 | 6 | 1.8% |
Other values (40) | 115 | 34.1% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 275 | 81.6% |
Decimal Number | 62 | 18.4% |
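The two categories above are Unicode general categories: Hangul syllables fall under "Lo" (Other Letter) and ASCII digits under "Nd" (Decimal Number). A minimal sketch of how such a tally can be made with Python's standard `unicodedata` module, using three `dong_dc` values copied from the value table; it illustrates the classification only and is not the tool's own code:

```python
import unicodedata
from collections import Counter

# Three dong_dc values copied from the report's value table.
values = ["안락제2동", "부암제1동", "초읍동"]

# Normalize to NFC so each Hangul syllable is a single code point,
# then tally the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch)
    for value in values
    for ch in unicodedata.normalize("NFC", value)
)

print(category_counts)
```

Applied to all 72 non-missing `dong_dc` values, the same tally would give the 275 letters and 62 digits shown in the table.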
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
동 | 70 | 25.5% |
제 | 61 | 22.2% |
범 | 8 | 2.9% |
대 | 7 | 2.5% |
부 | 6 | 2.2% |
연 | 6 | 2.2% |
안 | 6 | 2.2% |
전 | 6 | 2.2% |
양 | 5 | 1.8% |
정 | 5 | 1.8% |
Other values (34) | 95 | 34.5% |
Decimal Number
Value | Count | Frequency (%) |
2 | 26 | 41.9% |
1 | 22 | 35.5% |
3 | 10 | 16.1% |
4 | 2 | 3.2% |
5 | 1 | 1.6% |
6 | 1 | 1.6% |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 275 | 81.6% |
Common | 62 | 18.4% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
동 | 70 | 25.5% |
제 | 61 | 22.2% |
범 | 8 | 2.9% |
대 | 7 | 2.5% |
부 | 6 | 2.2% |
연 | 6 | 2.2% |
안 | 6 | 2.2% |
전 | 6 | 2.2% |
양 | 5 | 1.8% |
정 | 5 | 1.8% |
Other values (34) | 95 | 34.5% |
Common
Value | Count | Frequency (%) |
2 | 26 | 41.9% |
1 | 22 | 35.5% |
3 | 10 | 16.1% |
4 | 2 | 3.2% |
5 | 1 | 1.6% |
6 | 1 | 1.6% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 275 | 81.6% |
ASCII | 62 | 18.4% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
동 | 70 | 25.5% |
제 | 61 | 22.2% |
범 | 8 | 2.9% |
대 | 7 | 2.5% |
부 | 6 | 2.2% |
연 | 6 | 2.2% |
안 | 6 | 2.2% |
전 | 6 | 2.2% |
양 | 5 | 1.8% |
정 | 5 | 1.8% |
Other values (34) | 95 | 34.5% |
ASCII
Value | Count | Frequency (%) |
2 | 26 | 41.9% |
1 | 22 | 35.5% |
3 | 10 | 16.1% |
4 | 2 | 3.2% |
5 | 1 | 1.6% |
6 | 1 | 1.6% |
sex_dc
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 2.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 남 |
---|---|
2nd row | 남 |
3rd row | 남 |
4th row | 남 |
5th row | 남 |
Common Values
Value | Count | Frequency (%) |
여 | 54 | 54.0% |
남 | 46 | 46.0% |
age_dc
Real number (ℝ)
HIGH CORRELATION
Distinct | 9 |
---|---|
Distinct (%) | 9.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 46.05 |
Minimum | 25 |
---|---|
Maximum | 65 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.0 KiB |
Quantile statistics
Minimum | 25 |
---|---|
5-th percentile | 25 |
Q1 | 40 |
median | 45 |
Q3 | 55 |
95-th percentile | 60 |
Maximum | 65 |
Range | 40 |
Interquartile range (IQR) | 15 |
Descriptive statistics
Standard deviation | 9.8804725 |
---|---|
Coefficient of variation (CV) | 0.21455966 |
Kurtosis | -0.32824197 |
Mean | 46.05 |
Median Absolute Deviation (MAD) | 5 |
Skewness | -0.33056118 |
Sum | 4605 |
Variance | 97.623737 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
50 | 21 | 21.0% |
45 | 17 | 17.0% |
40 | 16 | 16.0% |
55 | 15 | 15.0% |
35 | 11 | 11.0% |
60 | 9 | 9.0% |
25 | 7 | 7.0% |
65 | 3 | 3.0% |
30 | 1 | 1.0% |
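The descriptive statistics for `age_dc` can be recomputed from the frequency table above. The sketch below uses Python's standard `statistics` module and assumes the reported variance uses the sample (n - 1) denominator, which matches the reported figures:

```python
import statistics

# Value -> count, copied from the age_dc frequency table.
age_counts = {50: 21, 45: 17, 40: 16, 55: 15, 35: 11, 60: 9, 25: 7, 65: 3, 30: 1}

# Expand the table back into the 100 individual observations.
ages = [age for age, count in age_counts.items() for _ in range(count)]

mean = statistics.mean(ages)
median = statistics.median(ages)
variance = statistics.variance(ages)  # sample variance, n - 1 denominator
std_dev = statistics.stdev(ages)      # square root of the sample variance
cv = std_dev / mean                   # coefficient of variation

print(f"n={len(ages)} sum={sum(ages)} mean={mean:.2f} median={median}")
print(f"variance={variance:.4f} std={std_dev:.4f} cv={cv:.4f}")
```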
ticket_cnt
Categorical
IMBALANCE
Distinct | 4 |
---|---|
Distinct (%) | 4.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 932.0 B |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | 1.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
1 | 86 | 86.0% |
2 | 11 | 11.0% |
3 | 2 | 2.0% |
5 | 1 | 1.0% |
Correlations
Variable | ticket_dt | base_year | base_day | sido_dc | gugun_dc | dong_dc | sex_dc | age_dc | ticket_cnt |
---|---|---|---|---|---|---|---|---|---|
ticket_dt | 1.000 | 0.963 | 0.963 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 |
base_year | 0.963 | 1.000 | 0.963 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 |
base_day | 0.963 | 0.963 | 1.000 | 0.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000 |
sido_dc | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.215 | 0.527 | 0.000 |
gugun_dc | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 0.105 | 0.464 | 0.000 |
dong_dc | 1.000 | 1.000 | 1.000 | 1.000 | 0.999 | 1.000 | 0.163 | 0.000 | 0.000 |
sex_dc | 0.000 | 0.000 | 0.000 | 0.215 | 0.105 | 0.163 | 1.000 | 0.295 | 0.000 |
age_dc | 0.000 | 0.000 | 0.000 | 0.527 | 0.464 | 0.000 | 0.295 | 1.000 | 0.000 |
ticket_cnt | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
Variable | ticket_dt | ticket_cnt | base_day | base_year | sido_dc | sex_dc | gugun_dc |
---|---|---|---|---|---|---|---|
ticket_dt | 1.000 | 0.000 | 0.826 | 0.826 | 0.000 | 0.000 | 0.953 |
ticket_cnt | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
base_day | 0.826 | 0.000 | 1.000 | 0.826 | 0.000 | 0.000 | 0.953 |
base_year | 0.826 | 0.000 | 0.826 | 1.000 | 0.000 | 0.000 | 0.953 |
sido_dc | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.138 | 0.953 |
sex_dc | 0.000 | 0.000 | 0.000 | 0.000 | 0.138 | 1.000 | 0.091 |
gugun_dc | 0.953 | 0.000 | 0.953 | 0.953 | 0.953 | 0.091 | 1.000 |
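The 0.826 entries in the categorical-only matrix above are consistent with a bias-corrected Cramér's V computed from a chi-squared statistic with Yates' continuity correction on 2x2 tables. The sketch below is a pure-Python illustration of that calculation. That the tool uses exactly this procedure is an assumption, and the function name is hypothetical. The contingency table assumes every 20180118 row has base_year 2018 and every 20200121 row has base_year 2020, as the sample rows suggest.

```python
import math

def cramers_v(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.
    Yates' continuity correction is applied only when the table is 2x2."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    yates = (r - 1) * (k - 1) == 1

    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)
            chi2 += diff ** 2 / expected

    # Bias correction of phi^2 and of the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# ticket_dt (rows) vs base_year (columns): 97 rows for 2018, 3 rows for 2020.
v = cramers_v([[97, 0], [0, 3]])
print(f"{v:.3f}")
```

The two variables are perfectly associated, yet the coefficient stays below 1 because the continuity correction shrinks the chi-squared statistic when one class has only three rows.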
Variable | age_dc | ticket_dt | base_year | base_day | sido_dc | gugun_dc | sex_dc | ticket_cnt |
---|---|---|---|---|---|---|---|---|
age_dc | 1.000 | 0.000 | 0.000 | 0.000 | 0.509 | 0.226 | 0.283 | 0.000 |
ticket_dt | 0.000 | 1.000 | 0.826 | 0.826 | 0.000 | 0.953 | 0.000 | 0.000 |
base_year | 0.000 | 0.826 | 1.000 | 0.826 | 0.000 | 0.953 | 0.000 | 0.000 |
base_day | 0.000 | 0.826 | 0.826 | 1.000 | 0.000 | 0.953 | 0.000 | 0.000 |
sido_dc | 0.509 | 0.000 | 0.000 | 0.000 | 1.000 | 0.953 | 0.138 | 0.000 |
gugun_dc | 0.226 | 0.953 | 0.953 | 0.953 | 0.953 | 1.000 | 0.091 | 0.000 |
sex_dc | 0.283 | 0.000 | 0.000 | 0.000 | 0.138 | 0.091 | 1.000 | 0.000 |
ticket_cnt | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
First rows
Row | ticket_dt | base_year | base_month | base_day | sido_dc | gugun_dc | dong_dc | sex_dc | age_dc | ticket_cnt |
---|---|---|---|---|---|---|---|---|---|---|
0 | 20180118 | 2018 | 1 | 18 | 경상남도 | 김해시 | <NA> | 남 | 35 | 1 |
1 | 20200121 | 2020 | 1 | 21 | 부산광역시 | 해운대구 | 재송제1동 | 남 | 55 | 1 |
2 | 20180118 | 2018 | 1 | 18 | 경상남도 | 김해시 | 부원동 | 남 | 35 | 1 |
3 | 20180118 | 2018 | 1 | 18 | 경상남도 | 김해시 | 활천동 | 남 | 35 | 1 |
4 | 20180118 | 2018 | 1 | 18 | 경상남도 | 창원시 성산구 | 사파동 | 남 | 35 | 1 |
5 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 강서구 | <NA> | 여 | 25 | 1 |
6 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 강서구 | 대저2동 | 여 | 25 | 1 |
7 | 20200121 | 2020 | 1 | 21 | 부산광역시 | 해운대구 | 재송제1동 | 여 | 40 | 1 |
8 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 금정구 | <NA> | 남 | 55 | 1 |
9 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 금정구 | <NA> | 여 | 50 | 2 |
Last rows
Row | ticket_dt | base_year | base_month | base_day | sido_dc | gugun_dc | dong_dc | sex_dc | age_dc | ticket_cnt |
---|---|---|---|---|---|---|---|---|---|---|
90 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 양정제2동 | 여 | 45 | 1 |
91 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 전포제1동 | 남 | 40 | 1 |
92 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 전포제2동 | 여 | 35 | 1 |
93 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 전포제3동 | 남 | 40 | 1 |
94 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 초읍동 | 남 | 60 | 1 |
95 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 초읍동 | 여 | 35 | 1 |
96 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 부산진구 | 초읍동 | 여 | 45 | 1 |
97 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 북구 | <NA> | 남 | 40 | 1 |
98 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 북구 | <NA> | 남 | 55 | 1 |
99 | 20180118 | 2018 | 1 | 18 | 부산광역시 | 북구 | <NA> | 여 | 45 | 2 |