Overview

Dataset statistics

Number of variables: 10
Number of observations: 100
Missing cells: 28
Missing cells (%): 2.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.5 KiB
Average record size in memory: 87.3 B
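The missing-cell and duplicate-row figures are simple counts over the whole table. The sketch below is a minimal plain-Python version. It assumes the data is held as a list of row tuples with `None` marking a missing cell. The function name `overview_stats` is hypothetical. It also counts a row as a duplicate when an identical row appeared earlier, which is the pandas `duplicated()` default; the report's exact definition may differ.

```python
from typing import Any, Sequence


def overview_stats(rows: Sequence[tuple[Any, ...]]) -> dict[str, float]:
    """Recompute the 'Dataset statistics' counts for rows stored as tuples.

    Assumes every row has the same number of columns and None marks a missing cell.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    # Count every None cell across all rows and columns.
    missing = sum(cell is None for row in rows for cell in row)
    # A row is a duplicate if an identical row was already seen.
    seen: set[tuple[Any, ...]] = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }
```

For this dataset (10 columns, 100 rows, and 28 missing dong_dc cells), the missing-cell share works out to 28 / 1000 = 2.8%.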

Variable types

Categorical: 8
Text: 1
Numeric: 1

Alerts

Constant: base_month has constant value "1"
High correlation: ticket_dt is highly overall correlated with base_year and 2 other fields
High correlation: base_day is highly overall correlated with ticket_dt and 2 other fields
High correlation: base_year is highly overall correlated with ticket_dt and 2 other fields
High correlation: sido_dc is highly overall correlated with age_dc and 1 other field
High correlation: gugun_dc is highly overall correlated with ticket_dt and 3 other fields
High correlation: age_dc is highly overall correlated with sido_dc
Imbalance: ticket_dt is highly imbalanced (80.6%)
Imbalance: base_year is highly imbalanced (80.6%)
Imbalance: base_day is highly imbalanced (80.6%)
Imbalance: sido_dc is highly imbalanced (75.8%)
Imbalance: ticket_cnt is highly imbalanced (64.2%)
Missing: dong_dc has 28 (28.0%) missing values
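The high-correlation alerts appear to come from the last of the three correlation tables in the Correlations section. In that table, every pair named in an alert has a coefficient above 0.5, and no unnamed pair does. The sketch below regenerates the alert wording from that table. It assumes a strict threshold of 0.5 and lists partners in the table's column order. Both the threshold and the helper `correlation_alerts` are assumptions, not taken from the report.

```python
# Column order and coefficients copied from the last correlation table in the report.
COLUMNS = ["age_dc", "ticket_dt", "base_year", "base_day",
           "sido_dc", "gugun_dc", "sex_dc", "ticket_cnt"]
MATRIX = [
    [1.000, 0.000, 0.000, 0.000, 0.509, 0.226, 0.283, 0.000],
    [0.000, 1.000, 0.826, 0.826, 0.000, 0.953, 0.000, 0.000],
    [0.000, 0.826, 1.000, 0.826, 0.000, 0.953, 0.000, 0.000],
    [0.000, 0.826, 0.826, 1.000, 0.000, 0.953, 0.000, 0.000],
    [0.509, 0.000, 0.000, 0.000, 1.000, 0.953, 0.138, 0.000],
    [0.226, 0.953, 0.953, 0.953, 0.953, 1.000, 0.091, 0.000],
    [0.283, 0.000, 0.000, 0.000, 0.138, 0.091, 1.000, 0.000],
    [0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000],
]


def correlation_alerts(columns, matrix, threshold=0.5):
    """Return one alert string per column that has at least one partner above threshold."""
    alerts = {}
    for i, name in enumerate(columns):
        # Partners in table column order, skipping the diagonal entry.
        partners = [columns[j] for j, value in enumerate(matrix[i])
                    if j != i and value > threshold]
        if not partners:
            continue
        text = f"{name} is highly overall correlated with {partners[0]}"
        extra = len(partners) - 1
        if extra == 1:
            text += " and 1 other field"
        elif extra > 1:
            text += f" and {extra} other fields"
        alerts[name] = text
    return alerts


for message in correlation_alerts(COLUMNS, MATRIX).values():
    print(message)
```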

Reproduction

Analysis started: 2023-12-10 10:00:45.333598
Analysis finished: 2023-12-10 10:00:47.177762
Duration: 1.84 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

ticket_dt
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 20180118 (97), 20200121 (3)

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20180118
2nd row: 20200121
3rd row: 20180118
4th row: 20180118
5th row: 20180118

Common Values

Value     Count  Frequency (%)
20180118  97     97.0%
20200121  3      3.0%
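A common-values table like this one is a frequency count, sorted most frequent first, with each count shown as a percentage of the non-missing values. The sketch below builds one with `collections.Counter`. The `top` cutoff and the "Other values (k)" row imitate the report's layout. The function `common_values` is hypothetical.

```python
from collections import Counter


def common_values(values, top=10):
    """Return (label, count, percent) rows, most frequent first.

    Missing values (None) are excluded from both counts and percentages.
    Values beyond the first `top` are folded into one 'Other values (k)' row.
    """
    present = [v for v in values if v is not None]
    total = len(present)
    ranked = Counter(present).most_common()
    rows = [(str(v), c, round(100 * c / total, 1)) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        rest_count = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", rest_count,
                     round(100 * rest_count / total, 1)))
    return rows


print(common_values(["20180118"] * 97 + ["20200121"] * 3))
```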

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

base_year
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 2018 (97), 2020 (3)

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018
2nd row: 2020
3rd row: 2018
4th row: 2018
5th row: 2018

Common Values

Value  Count  Frequency (%)
2018   97     97.0%
2020   3      3.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

base_month
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 1 (100)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      100    100.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

base_day
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 18 (97), 21 (3)

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 18
2nd row: 21
3rd row: 18
4th row: 18
5th row: 18

Common Values

Value  Count  Frequency (%)
18     97     97.0%
21     3      3.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

sido_dc
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 부산광역시 (96), 경상남도 (4)

Length

Max length: 5
Median length: 5
Mean length: 4.96
Min length: 4
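These length statistics count characters (Unicode code points) per value. The sketch below recomputes them with the standard `statistics` module. Python's `len` counts code points, so each precomposed Hangul syllable counts as one character. With that convention the sketch reproduces the sido_dc figures above. The helper name `length_stats` is hypothetical.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min of the character length of non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# sido_dc: 96 rows of 부산광역시 (5 characters) and 4 rows of 경상남도 (4 characters).
sido = ["부산광역시"] * 96 + ["경상남도"] * 4
print(length_stats(sido))
```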

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경상남도
2nd row: 부산광역시
3rd row: 경상남도
4th row: 경상남도
5th row: 경상남도

Common Values

Value       Count  Frequency (%)
부산광역시  96     96.0%
경상남도    4      4.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

gugun_dc
Categorical

HIGH CORRELATION 

Distinct: 11
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 부산진구 (32), 남구 (21), 동래구 (18), 금정구 (8), 동구 (6), 6 other values (15)

Length

Max length: 7
Median length: 4
Mean length: 3.09
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 김해시
2nd row: 해운대구
3rd row: 김해시
4th row: 김해시
5th row: 창원시 성산구

Common Values

Value     Count  Frequency (%)
부산진구  32     32.0%
남구      21     21.0%
동래구    18     18.0%
금정구    8      8.0%
동구      6      6.0%
김해시    3      3.0%
해운대구  3      3.0%
기장군    3      3.0%
북구      3      3.0%
강서구    2      2.0%

Length

[Figure: Histogram of lengths of the category]

Words (frequencies are out of 101 whitespace-separated words, since 창원시 성산구 counts as two)

Value             Count  Frequency (%)
부산진구          32     31.7%
남구              21     20.8%
동래구            18     17.8%
금정구            8      7.9%
동구              6      5.9%
김해시            3      3.0%
해운대구          3      3.0%
기장군            3      3.0%
북구              3      3.0%
강서구            2      2.0%
Other values (2)  2      2.0%
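The word table differs from the common-values table only in its denominator. Splitting each value on whitespace turns 창원시 성산구 into two words, for 101 words in total. As a result, the 32 occurrences of 부산진구 become 31.7% rather than 32.0%. The sketch below performs that count. The whitespace split is an assumption that reproduces the reported percentages, and `word_frequencies` is a hypothetical name.

```python
from collections import Counter


def word_frequencies(values):
    """Count whitespace-separated words across non-missing values as (word, count, percent)."""
    words = [w for v in values if v is not None for w in v.split()]
    total = len(words)
    return [(w, c, round(100 * c / total, 1)) for w, c in Counter(words).most_common()]


# gugun_dc value counts taken from the report.
counts = {"부산진구": 32, "남구": 21, "동래구": 18, "금정구": 8, "동구": 6,
          "김해시": 3, "해운대구": 3, "기장군": 3, "북구": 3, "강서구": 2,
          "창원시 성산구": 1}
gugun = [value for value, n in counts.items() for _ in range(n)]
table = word_frequencies(gugun)
print(table[:3])
```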

dong_dc
Text

MISSING 

Distinct: 51
Distinct (%): 70.8%
Missing: 28
Missing (%): 28.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.6805556
Min length: 3

Characters and Unicode

Total characters: 337
Distinct characters: 50
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 48.6%
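For dong_dc, the distinct and unique percentages are taken over the 72 non-missing values rather than all 100 rows: 51 / 72 = 70.8% and 35 / 72 = 48.6%. Here "unique" means a value that occurs exactly once. The sketch below computes both measures. `distinct_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_unique(values):
    """Distinct and unique counts, with percentages over non-missing values only."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": round(100 * distinct / n, 1),
        "unique": unique,
        "unique_pct": round(100 * unique / n, 1),
    }


# ticket_cnt counts from the report: one distinct value (5) appears exactly once.
print(distinct_unique(["1"] * 86 + ["2"] * 11 + ["3"] * 2 + ["5"]))
```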

Sample

1st row: 재송제1동
2nd row: 부원동
3rd row: 활천동
4th row: 사파동
5th row: 대저2동

Words (frequencies are out of the 72 non-missing values)

Value              Count  Frequency (%)
안락제2동          3      4.2%
부암제1동          3      4.2%
양정제1동          3      4.2%
초읍동             3      4.2%
용호제2동          3      4.2%
온천제3동          2      2.8%
사직제2동          2      2.8%
범천제2동          2      2.8%
안락제1동          2      2.8%
양정제2동          2      2.8%
Other values (41)  47     65.3%

Most occurring characters

Value              Count  Frequency (%)
                   70     20.8%
                   61     18.1%
2                  26     7.7%
1                  22     6.5%
3                  10     3.0%
                   8      2.4%
                   7      2.1%
                   6      1.8%
                   6      1.8%
                   6      1.8%
Other values (40)  115    34.1%

Most occurring categories

Value           Count  Frequency (%)
Other Letter    275    81.6%
Decimal Number  62     18.4%
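The category names are the long forms of Unicode general categories. Hangul syllables fall under `Lo` (Other Letter), and ASCII digits under `Nd` (Decimal Number). Python's `unicodedata.category` returns only the short codes. The mapping to long names below is written by hand and covers just the two categories present here. Any other code is passed through unchanged.

```python
import unicodedata
from collections import Counter

# Hand-written subset of long names for Unicode general category codes.
CATEGORY_NAMES = {"Lo": "Other Letter", "Nd": "Decimal Number"}


def category_counts(values):
    """Count characters of non-missing values by Unicode general category."""
    codes = Counter(unicodedata.category(ch)
                    for v in values if v is not None for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


print(category_counts(["재송제1동", "대저2동", None]))
```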

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   70     25.5%
                   61     22.2%
                   8      2.9%
                   7      2.5%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   5      1.8%
                   5      1.8%
Other values (34)  95     34.5%

Decimal Number
Value  Count  Frequency (%)
2      26     41.9%
1      22     35.5%
3      10     16.1%
4      2      3.2%
5      1      1.6%
6      1      1.6%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  275    81.6%
Common  62     18.4%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   70     25.5%
                   61     22.2%
                   8      2.9%
                   7      2.5%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   5      1.8%
                   5      1.8%
Other values (34)  95     34.5%

Common
Value  Count  Frequency (%)
2      26     41.9%
1      22     35.5%
3      10     16.1%
4      2      3.2%
5      1      1.6%
6      1      1.6%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  275    81.6%
ASCII   62     18.4%
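Python's standard library has no lookup for Unicode blocks. The two blocks seen here, however, are fixed code-point ranges. ASCII (Basic Latin) spans U+0000 to U+007F, and the Hangul Syllables block, labeled "Hangul" in the report, spans U+AC00 to U+D7AF. The sketch below buckets characters using only those two ranges and reports anything else as "Other". A full implementation would need the complete block table. `block_of` and `block_counts` are hypothetical names.

```python
from collections import Counter

# Only the two blocks that occur in dong_dc; anything else falls into "Other".
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),   # Basic Latin
    ("Hangul", 0xAC00, 0xD7AF),  # Hangul Syllables
]


def block_of(ch):
    """Name of the (partial) block containing ch, or 'Other'."""
    code = ord(ch)
    for name, start, end in BLOCKS:
        if start <= code <= end:
            return name
    return "Other"


def block_counts(values):
    """Count characters of non-missing values per block."""
    return dict(Counter(block_of(ch) for v in values if v is not None for ch in v))


print(block_counts(["재송제1동", "대저2동", None]))
```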

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   70     25.5%
                   61     22.2%
                   8      2.9%
                   7      2.5%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   6      2.2%
                   5      1.8%
                   5      1.8%
Other values (34)  95     34.5%

ASCII
Value  Count  Frequency (%)
2      26     41.9%
1      22     35.5%
3      10     16.1%
4      2      3.2%
5      1      1.6%
6      1      1.6%

sex_dc
Categorical

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 54, 46

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row:
2nd row:
3rd row:
4th row:
5th row:

Common Values

Value  Count  Frequency (%)
       54     54.0%
       46     46.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

age_dc
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.05
Minimum: 25
Maximum: 65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 25
5th percentile: 25
Q1: 40
Median: 45
Q3: 55
95th percentile: 60
Maximum: 65
Range: 40
Interquartile range (IQR): 15
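These quantiles can be checked against the age_dc value counts listed further below (25: 7, 30: 1, 35: 11, 40: 16, 45: 17, 50: 21, 55: 15, 60: 9, 65: 3). The sketch below uses `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics, the same rule as NumPy's default percentile. That the report used this rule is an assumption, though it reproduces every value above.

```python
import statistics

# age_dc value counts, copied from the report.
AGE_COUNTS = {25: 7, 30: 1, 35: 11, 40: 16, 45: 17, 50: 21, 55: 15, 60: 9, 65: 3}
ages = [age for age, n in AGE_COUNTS.items() for _ in range(n)]

# Cut points at 25/50/75% and at every 5%; "inclusive" means linear interpolation.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")
twentieths = statistics.quantiles(ages, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

summary = {
    "min": min(ages), "p5": p5, "q1": q1, "median": median,
    "q3": q3, "p95": p95, "max": max(ages),
    "range": max(ages) - min(ages), "iqr": q3 - q1,
}
print(summary)
```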

Descriptive statistics

Standard deviation: 9.8804725
Coefficient of variation (CV): 0.21455966
Kurtosis: -0.32824197
Mean: 46.05
Median Absolute Deviation (MAD): 5
Skewness: -0.33056118
Sum: 4605
Variance: 97.623737
Monotonicity: Not monotonic
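The descriptive statistics can also be recomputed from the age_dc value counts listed below. The sketch below makes three assumptions. The variance is the sample variance (divisor n - 1). Skewness is the bias-adjusted Fisher-Pearson coefficient. Kurtosis is excess kurtosis with the matching small-sample adjustment, as pandas computes it. Under these assumptions the sketch reproduces the reported figures to the printed precision. `describe` is a hypothetical helper.

```python
import math
import statistics

# age_dc value counts, copied from the report.
AGE_COUNTS = {25: 7, 30: 1, 35: 11, 40: 16, 45: 17, 50: 21, 55: 15, 60: 9, 65: 3}
ages = [age for age, n in AGE_COUNTS.items() for _ in range(n)]


def describe(xs):
    n = len(xs)
    mean = statistics.fmean(xs)
    var = statistics.variance(xs)  # sample variance, divisor n - 1
    sd = math.sqrt(var)
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)  # median absolute deviation
    # Central moments with divisor n, used for the bias-adjusted shape statistics.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return {"mean": mean, "sum": sum(xs), "variance": var, "std": sd,
            "cv": sd / mean, "mad": mad, "skewness": skew, "kurtosis": kurt}


stats = describe(ages)
print(stats)
```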

[Figure: Histogram with fixed size bins (bins=9)]

Common values

Value  Count  Frequency (%)
50     21     21.0%
45     17     17.0%
40     16     16.0%
55     15     15.0%
35     11     11.0%
60     9      9.0%
25     7      7.0%
65     3      3.0%
30     1      1.0%

Smallest values

Value  Count  Frequency (%)
25     7      7.0%
30     1      1.0%
35     11     11.0%
40     16     16.0%
45     17     17.0%
50     21     21.0%
55     15     15.0%
60     9      9.0%
65     3      3.0%

Largest values

Value  Count  Frequency (%)
65     3      3.0%
60     9      9.0%
55     15     15.0%
50     21     21.0%
45     17     17.0%
40     16     16.0%
35     11     11.0%
30     1      1.0%
25     7      7.0%

ticket_cnt
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Value counts: 1 (86), 2 (11), 3 (2), 5 (1)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      86     86.0%
2      11     11.0%
3      2      2.0%
5      1      1.0%
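The imbalance percentages in the alerts are consistent with one minus the Shannon entropy of the value frequencies, normalized by the logarithm of the number of distinct values. For ticket_cnt's four values (86, 11, 2, 1) this gives 64.2%. For the 97/3 split shared by ticket_dt, base_year and base_day it gives 80.6%, and for sido_dc's 96/4 split it gives 75.8%. The sketch below follows that assumption. `imbalance_score` is a hypothetical name, not the library's function.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy of the counts; 0 when there is only one class."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


print(round(100 * imbalance_score([86, 11, 2, 1]), 1))
```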

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Interactions

[Figure: interaction plot]

Correlations

[Figure: correlation heatmap]

            ticket_dt  base_year  base_day   sido_dc    gugun_dc   dong_dc    sex_dc     age_dc     ticket_cnt
ticket_dt   1.000      0.963      0.963      0.000      1.000      1.000      0.000      0.000      0.000
base_year   0.963      1.000      0.963      0.000      1.000      1.000      0.000      0.000      0.000
base_day    0.963      0.963      1.000      0.000      1.000      1.000      0.000      0.000      0.000
sido_dc     0.000      0.000      0.000      1.000      1.000      1.000      0.215      0.527      0.000
gugun_dc    1.000      1.000      1.000      1.000      1.000      0.999      0.105      0.464      0.000
dong_dc     1.000      1.000      1.000      1.000      0.999      1.000      0.163      0.000      0.000
sex_dc      0.000      0.000      0.000      0.215      0.105      0.163      1.000      0.295      0.000
age_dc      0.000      0.000      0.000      0.527      0.464      0.000      0.295      1.000      0.000
ticket_cnt  0.000      0.000      0.000      0.000      0.000      0.000      0.000      0.000      1.000

[Figure: correlation heatmap]

            ticket_dt  ticket_cnt base_day   base_year  sido_dc    sex_dc     gugun_dc
ticket_dt   1.000      0.000      0.826      0.826      0.000      0.000      0.953
ticket_cnt  0.000      1.000      0.000      0.000      0.000      0.000      0.000
base_day    0.826      0.000      1.000      0.826      0.000      0.000      0.953
base_year   0.826      0.000      0.826      1.000      0.000      0.000      0.953
sido_dc     0.000      0.000      0.000      0.000      1.000      0.138      0.953
sex_dc      0.000      0.000      0.000      0.000      0.138      1.000      0.091
gugun_dc    0.953      0.000      0.953      0.953      0.953      0.091      1.000

[Figure: correlation heatmap]

            age_dc     ticket_dt  base_year  base_day   sido_dc    gugun_dc   sex_dc     ticket_cnt
age_dc      1.000      0.000      0.000      0.000      0.509      0.226      0.283      0.000
ticket_dt   0.000      1.000      0.826      0.826      0.000      0.953      0.000      0.000
base_year   0.000      0.826      1.000      0.826      0.000      0.953      0.000      0.000
base_day    0.000      0.826      0.826      1.000      0.000      0.953      0.000      0.000
sido_dc     0.509      0.000      0.000      0.000      1.000      0.953      0.138      0.000
gugun_dc    0.226      0.953      0.953      0.953      0.953      1.000      0.091      0.000
sex_dc      0.283      0.000      0.000      0.000      0.138      0.091      1.000      0.000
ticket_cnt  0.000      0.000      0.000      0.000      0.000      0.000      0.000      1.000
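The report does not say which coefficient produced each of these tables. For pairs of categorical variables, one common choice is Cramér's V. For an r × c contingency table, V is the square root of the chi-squared statistic divided by n × (min(r, c) - 1). The sketch below is the uncorrected textbook version. It is not guaranteed to reproduce the report's numbers, because a profiling tool may apply a bias correction or use a different coefficient altogether. `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected under independence
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# ticket_dt and base_year pair one-to-one in this dataset (97 x 2018, 3 x 2020).
ticket_dt = ["20180118"] * 97 + ["20200121"] * 3
base_year = ["2018"] * 97 + ["2020"] * 3
print(cramers_v(ticket_dt, base_year))
```

Because ticket_dt and base_year correspond one-to-one here, the uncorrected V is 1. The tables above instead show 0.963 and 0.826 for that pair, which suggests that the report applies some correction or a different measure.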

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
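A rough text version of both nullity views can be printed with one symbol per cell. The sketch below assumes rows are tuples with `None` for missing cells. The `#`/`.` symbols are an arbitrary choice, and `nullity_matrix` is a hypothetical helper. The example rows are the first three sample rows, restricted to three columns.

```python
def nullity_matrix(columns, rows):
    """Per-column non-missing counts and a text matrix ('#' present, '.' missing)."""
    counts = {name: sum(row[i] is not None for row in rows)
              for i, name in enumerate(columns)}
    lines = ["".join("." if cell is None else "#" for cell in row) for row in rows]
    return counts, lines


cols = ("gugun_dc", "dong_dc", "age_dc")
rows = [("김해시", None, 35), ("해운대구", "재송제1동", 55), ("김해시", "부원동", 35)]
counts, lines = nullity_matrix(cols, rows)
print(counts)
print("\n".join(lines))
```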

Sample

First rows

    ticket_dt  base_year  base_month  base_day  sido_dc     gugun_dc       dong_dc    sex_dc  age_dc  ticket_cnt
0   20180118   2018       1           18        경상남도    김해시         <NA>               35      1
1   20200121   2020       1           21        부산광역시  해운대구       재송제1동          55      1
2   20180118   2018       1           18        경상남도    김해시         부원동             35      1
3   20180118   2018       1           18        경상남도    김해시         활천동             35      1
4   20180118   2018       1           18        경상남도    창원시 성산구  사파동             35      1
5   20180118   2018       1           18        부산광역시  강서구         <NA>               25      1
6   20180118   2018       1           18        부산광역시  강서구         대저2동            25      1
7   20200121   2020       1           21        부산광역시  해운대구       재송제1동          40      1
8   20180118   2018       1           18        부산광역시  금정구         <NA>               55      1
9   20180118   2018       1           18        부산광역시  금정구         <NA>               50      2

Last rows

    ticket_dt  base_year  base_month  base_day  sido_dc     gugun_dc       dong_dc    sex_dc  age_dc  ticket_cnt
90  20180118   2018       1           18        부산광역시  부산진구       양정제2동          45      1
91  20180118   2018       1           18        부산광역시  부산진구       전포제1동          40      1
92  20180118   2018       1           18        부산광역시  부산진구       전포제2동          35      1
93  20180118   2018       1           18        부산광역시  부산진구       전포제3동          40      1
94  20180118   2018       1           18        부산광역시  부산진구       초읍동             60      1
95  20180118   2018       1           18        부산광역시  부산진구       초읍동             35      1
96  20180118   2018       1           18        부산광역시  부산진구       초읍동             45      1
97  20180118   2018       1           18        부산광역시  북구           <NA>               40      1
98  20180118   2018       1           18        부산광역시  북구           <NA>               55      1
99  20180118   2018       1           18        부산광역시  북구           <NA>               45      2