Overview

Dataset statistics

Number of variables: 11
Number of observations: 100
Missing cells: 23
Missing cells (%): 2.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.2 KiB
Average record size in memory: 94.3 B
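
The missing-cell percentage is taken over all cells (variables × observations), not over rows. A minimal sketch of that arithmetic, using only the figures reported above; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 11
n_observations = 100
missing_cells = 23

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

This is why the dataset-level figure (2.1%) is much smaller than the 23.0% reported for dong_dc alone: all 23 missing cells sit in that one column.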

Variable types

Numeric: 3
Categorical: 7
Text: 1

Alerts

ticket_ty is highly overall correlated with base_day and 2 other fields [High correlation]
show_ty is highly overall correlated with base_day and 2 other fields [High correlation]
gnr_ty is highly overall correlated with base_day and 2 other fields [High correlation]
show_dt is highly overall correlated with base_day and 2 other fields [High correlation]
base_day is highly overall correlated with show_dt and 5 other fields [High correlation]
base_year is highly overall correlated with show_dt and 2 other fields [High correlation]
base_month is highly overall correlated with show_dt and 2 other fields [High correlation]
sido_dc is highly overall correlated with gugun_dc [High correlation]
gugun_dc is highly overall correlated with sido_dc [High correlation]
base_year is highly imbalanced (80.6%) [Imbalance]
base_month is highly imbalanced (80.6%) [Imbalance]
sido_dc is highly imbalanced (65.4%) [Imbalance]
dong_dc has 23 (23.0%) missing values [Missing]
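
The imbalance percentages can be reproduced from the category counts listed later in the report. The sketch below assumes the score is one minus the Shannon entropy of the category distribution divided by its maximum possible value, log(k) for k categories. The report does not state the formula, so this is an assumption, but it does yield the figures shown in the alerts. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no entropy to normalise
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Counts taken from the base_year (2017: 97, 2020: 3) and
# sido_dc (89 / 10 / 1) sections of this report.
base_year_score = imbalance_score([97, 3])
sido_dc_score = imbalance_score([89, 10, 1])
print(f"{base_year_score:.1%}  {sido_dc_score:.1%}")
```

A score of 0% would mean perfectly balanced categories and 100% a single dominant category, so a 97/3 split scores high even though only two values occur.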

Reproduction

Analysis started: 2023-12-10 09:47:33.313527
Analysis finished: 2023-12-10 09:47:37.140097
Duration: 3.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

show_dt
Real number (ℝ)

HIGH CORRELATION 

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20171018
Minimum: 20170109
Maximum: 20200207
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 20170109
5-th percentile: 20170109
Q1: 20170114
median: 20170114
Q3: 20170116
95-th percentile: 20170120
Maximum: 20200207
Range: 30098
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 5159.1845
Coefficient of variation (CV): 0.00025577215
Kurtosis: 29.897755
Mean: 20171018
Median Absolute Deviation (MAD): 0
Skewness: 5.5946468
Sum: 2.0171018 × 10^9
Variance: 26617184
Monotonicity: Not monotonic
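
The profiler treats show_dt as a real number, but its values look like calendar dates packed into integers as YYYYMMDD. That reading is an inference from the values and from the base_year, base_month and base_day columns, not something the report states. Under that assumption, numeric statistics such as the range of 30098 do not measure elapsed time. A minimal sketch of decoding the values first; the helper name `parse_yyyymmdd` is hypothetical:

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Decode an integer such as 20170114 into a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first = parse_yyyymmdd(20170109)   # reported minimum of show_dt
last = parse_yyyymmdd(20200207)    # reported maximum of show_dt

# Elapsed days between the earliest and latest show date.
span_days = (last - first).days
print(first, last, span_days)
```

The decoded span is a little over three years, whereas subtracting the raw integers gives 30098, a number with no calendar meaning.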
Histogram with fixed size bins (bins=6)

Most frequent values

Value     Count  Frequency (%)
20170114  61     61.0%
20170120  11     11.0%
20170119  10     10.0%
20170109  8      8.0%
20170116  7      7.0%
20200207  3      3.0%

Smallest values (ascending)

Value     Count  Frequency (%)
20170109  8      8.0%
20170114  61     61.0%
20170116  7      7.0%
20170119  10     10.0%
20170120  11     11.0%
20200207  3      3.0%

Largest values (descending)

Value     Count  Frequency (%)
20200207  3      3.0%
20170120  11     11.0%
20170119  10     10.0%
20170116  7      7.0%
20170114  61     61.0%
20170109  8      8.0%

base_year
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
2017  97
2020  3

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2017
2nd row: 2020
3rd row: 2017
4th row: 2017
5th row: 2017

Common Values

Value  Count  Frequency (%)
2017   97     97.0%
2020   3      3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

base_month
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
1  97
2  3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      97     97.0%
2      3      3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

base_day
Real number (ℝ)

HIGH CORRELATION 

Distinct: 6
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.69
Minimum: 7
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 7
5-th percentile: 9
Q1: 14
median: 14
Q3: 16
95-th percentile: 20
Maximum: 20
Range: 13
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.1356375
Coefficient of variation (CV): 0.21345388
Kurtosis: 0.36030513
Mean: 14.69
Median Absolute Deviation (MAD): 0
Skewness: -0.13197855
Sum: 1469
Variance: 9.8322222
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Most frequent values

Value  Count  Frequency (%)
14     61     61.0%
20     11     11.0%
19     10     10.0%
9      8      8.0%
16     7      7.0%
7      3      3.0%

Smallest values (ascending)

Value  Count  Frequency (%)
7      3      3.0%
9      8      8.0%
14     61     61.0%
16     7      7.0%
19     10     10.0%
20     11     11.0%

Largest values (descending)

Value  Count  Frequency (%)
20     11     11.0%
19     10     10.0%
16     7      7.0%
14     61     61.0%
9      8      8.0%
7      3      3.0%
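
Because all six distinct values of base_day are listed with their counts, the descriptive statistics can be recomputed from the frequency table alone. The sketch below uses Python's standard `statistics` module and assumes the reported variance is the sample variance (n - 1 denominator), which is what matches the figures in this section.

```python
import statistics

# Value counts as reported for base_day: {value: count}.
base_day_counts = {14: 61, 20: 11, 19: 10, 9: 8, 16: 7, 7: 3}

# Expand the counts back into the 100 individual observations.
values = [v for v, n in base_day_counts.items() for _ in range(n)]

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)
cv = std_dev / mean                      # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, round(variance, 7), round(std_dev, 7), round(cv, 8), mad)
```

The MAD of 0 follows directly from the table: 61 of the 100 observations equal the median (14), so more than half of the absolute deviations are zero.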

sido_dc
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
부산광역시  89
경상남도    10
대구광역시  1

Length

Max length: 5
Median length: 5
Mean length: 4.9
Min length: 4

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 부산광역시
2nd row: 부산광역시
3rd row: 부산광역시
4th row: 부산광역시
5th row: 부산광역시

Common Values

Value       Count  Frequency (%)
부산광역시  89     89.0%
경상남도    10     10.0%
대구광역시  1      1.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

gugun_dc
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 17.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
해운대구           18
남구               11
수영구             9
금정구             9
양산시             7
Other values (12)  46

Length

Max length: 4
Median length: 3
Mean length: 3.01
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 남구
2nd row: 해운대구
3rd row: 북구
4th row: 수영구
5th row: 수영구

Common Values

Value             Count  Frequency (%)
해운대구          18     18.0%
남구              11     11.0%
수영구            9      9.0%
금정구            9      9.0%
양산시            7      7.0%
동래구            7      7.0%
부산진구          7      7.0%
서구              5      5.0%
영도구            5      5.0%
연제구            5      5.0%
Other values (7)  17     17.0%

Length

Histogram of lengths of the category

dong_dc
Text

MISSING 

Distinct: 66
Distinct (%): 85.7%
Missing: 23
Missing (%): 23.0%
Memory size: 932.0 B

Length

Max length: 6
Median length: 5
Mean length: 4.4805195
Min length: 2

Characters and Unicode

Total characters: 345
Distinct characters: 74
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 74.0%

Sample

1st row: 문현제4동
2nd row: 화명제1동
3rd row: 남천제1동
4th row: 남항동
5th row: 영선제1동

Value              Count  Frequency (%)
대연제3동          3      3.9%
좌제2동            3      3.9%
문현제4동          2      2.6%
우제1동            2      2.6%
장전제1동          2      2.6%
명륜동             2      2.6%
남천제1동          2      2.6%
화명제1동          2      2.6%
좌제1동            2      2.6%
대청동             1      1.3%
Other values (56)  56     72.7%
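
The percentages for dong_dc use the 77 non-missing values as the denominator, not all 100 observations. That is an inference from the reported numbers rather than something the report states, but the arithmetic below reproduces the figures above. The variable names are illustrative.

```python
n_observations = 100
n_missing = 23
n_present = n_observations - n_missing    # values that are not <NA>

distinct_pct = 100 * 66 / n_present       # 66 distinct values
unique_pct = 100 * 57 / n_present         # 57 values occurring exactly once
mean_length = 345 / n_present             # 345 characters in total

print(f"{n_present} {distinct_pct:.1f}% {unique_pct:.1f}% {mean_length:.7f}")
```

The same denominator explains the frequency column of the value table: a count of 3 is shown as 3.9% (3 of 77), not 3.0%.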

Most occurring characters

Value              Count  Frequency (%)
[?]                77     22.3%
[?]                61     17.7%
1                  23     6.7%
2                  20     5.8%
3                  12     3.5%
[?]                9      2.6%
[?]                8      2.3%
[?]                6      1.7%
[?]                6      1.7%
[?]                6      1.7%
Other values (64)  117    33.9%

Most occurring categories

Value           Count  Frequency (%)
Other Letter    283    82.0%
Decimal Number  62     18.0%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
[?]                77     27.2%
[?]                61     21.6%
[?]                9      3.2%
[?]                8      2.8%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                5      1.8%
Other values (56)  93     32.9%

Decimal Number

Value  Count  Frequency (%)
1      23     37.1%
2      20     32.3%
3      12     19.4%
5      2      3.2%
4      2      3.2%
9      1      1.6%
8      1      1.6%
6      1      1.6%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  283    82.0%
Common  62     18.0%

Most frequent character per script

Hangul

Value              Count  Frequency (%)
[?]                77     27.2%
[?]                61     21.6%
[?]                9      3.2%
[?]                8      2.8%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                5      1.8%
Other values (56)  93     32.9%

Common

Value  Count  Frequency (%)
1      23     37.1%
2      20     32.3%
3      12     19.4%
5      2      3.2%
4      2      3.2%
9      1      1.6%
8      1      1.6%
6      1      1.6%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  283    82.0%
ASCII   62     18.0%

Most frequent character per block

Hangul

Value              Count  Frequency (%)
[?]                77     27.2%
[?]                61     21.6%
[?]                9      3.2%
[?]                8      2.8%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                6      2.1%
[?]                5      1.8%
Other values (56)  93     32.9%

ASCII

Value  Count  Frequency (%)
1      23     37.1%
2      20     32.3%
3      12     19.4%
5      2      3.2%
4      2      3.2%
9      1      1.6%
8      1      1.6%
6      1      1.6%

ticket_ty
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
공연  74
강좌  26

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강좌
2nd row: 강좌
3rd row: 강좌
4th row: 강좌
5th row: 강좌

Common Values

Value  Count  Frequency (%)
공연   74     74.0%
강좌   26     26.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

show_ty
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
기획공연    61
기획강좌    26
예술단공연  13

Length

Max length: 5
Median length: 4
Mean length: 4.13
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기획강좌
2nd row: 기획강좌
3rd row: 기획강좌
4th row: 기획강좌
5th row: 기획강좌

Common Values

Value       Count  Frequency (%)
기획공연    61     61.0%
기획강좌    26     26.0%
예술단공연  13     13.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

gnr_ty
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
클래식  72
강좌    26
무용    2

Length

Max length: 3
Median length: 3
Mean length: 2.72
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강좌
2nd row: 강좌
3rd row: 강좌
4th row: 강좌
5th row: 강좌

Common Values

Value   Count  Frequency (%)
클래식  72     72.0%
강좌    26     26.0%
무용    2      2.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]

view_cnt
Real number (ℝ)

Distinct: 13
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.52
Minimum: 1
Maximum: 34
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 4
95-th percentile: 12.05
Maximum: 34
Range: 33
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 4.4551185
Coefficient of variation (CV): 1.2656587
Kurtosis: 23.135755
Mean: 3.52
Median Absolute Deviation (MAD): 1
Skewness: 4.2042667
Sum: 352
Variance: 19.848081
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)

Most frequent values

Value             Count  Frequency (%)
2                 37     37.0%
1                 27     27.0%
4                 12     12.0%
3                 6      6.0%
5                 4      4.0%
8                 3      3.0%
7                 3      3.0%
17                2      2.0%
6                 2      2.0%
12                1      1.0%
Other values (3)  3      3.0%

Smallest values (ascending)

Value  Count  Frequency (%)
1      27     27.0%
2      37     37.0%
3      6      6.0%
4      12     12.0%
5      4      4.0%
6      2      2.0%
7      3      3.0%
8      3      3.0%
12     1      1.0%
13     1      1.0%

Largest values (descending)

Value  Count  Frequency (%)
34     1      1.0%
17     2      2.0%
15     1      1.0%
13     1      1.0%
12     1      1.0%
8      3      3.0%
7      3      3.0%
6      2      2.0%
5      4      4.0%
4      12     12.0%
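
The non-integer 95th percentile (12.05) for an integer-valued column is what linear interpolation between neighbouring order statistics produces. The sketch below assumes that interpolation rule, which the report does not name, and rebuilds the 100 observations from the value counts in the smallest-values and largest-values tables of this section. The helper name `percentile` is hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolation percentile; q is a fraction between 0 and 1."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Value counts for view_cnt: {value: count}, all 13 distinct values.
view_cnt_counts = {1: 27, 2: 37, 3: 6, 4: 12, 5: 4, 6: 2, 7: 3, 8: 3,
                   12: 1, 13: 1, 15: 1, 17: 2, 34: 1}
values = sorted(v for v, n in view_cnt_counts.items() for _ in range(n))

q1, median, q3 = (percentile(values, q) for q in (0.25, 0.5, 0.75))
p95 = percentile(values, 0.95)
print(q1, median, q3, q3 - q1, round(p95, 2))
```

With 100 sorted observations the 95th percentile falls at position 94.05, between the 95th value (12) and the 96th value (13), which gives 12 + 0.05 × 1 = 12.05.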

Interactions

[Figure: nine interaction plots]

Correlations

[Figure: correlation heatmap]

            show_dt  base_year  base_month  base_day  sido_dc  gugun_dc  dong_dc  ticket_ty  show_ty  gnr_ty  view_cnt
show_dt     1.000    0.963      0.963       1.000     0.000    0.000     0.000    0.422      0.211    0.211   0.000
base_year   0.963    1.000      0.963       1.000     0.000    0.000     0.000    0.321      0.160    0.160   0.000
base_month  0.963    0.963      1.000       1.000     0.000    0.000     0.000    0.321      0.160    0.160   0.000
base_day    1.000    1.000      1.000       1.000     0.247    0.349     0.000    0.721      0.813    0.657   0.000
sido_dc     0.000    0.000      0.000       0.247     1.000    0.892     1.000    0.093      0.709    0.205   0.000
gugun_dc    0.000    0.000      0.000       0.349     0.892    1.000     1.000    0.214      0.608    0.458   0.119
dong_dc     0.000    0.000      0.000       0.000     1.000    1.000     1.000    0.000      0.000    0.000   0.942
ticket_ty   0.422    0.321      0.321       0.721     0.093    0.214     0.000    1.000      1.000    1.000   0.235
show_ty     0.211    0.160      0.160       0.813     0.709    0.608     0.000    1.000      1.000    0.957   0.406
gnr_ty      0.211    0.160      0.160       0.657     0.205    0.458     0.000    1.000      0.957    1.000   0.000
view_cnt    0.000    0.000      0.000       0.000     0.000    0.119     0.942    0.235      0.406    0.000   1.000

[Figure: correlation heatmap]

            ticket_ty  base_year  base_month  show_ty  sido_dc  gugun_dc  gnr_ty
ticket_ty   1.000      0.208      0.208       0.995    0.153    0.172     0.995
base_year   0.208      1.000      0.826       0.262    0.000    0.000     0.262
base_month  0.208      0.826      1.000       0.262    0.000    0.000     0.262
show_ty     0.995      0.262      0.262       1.000    0.365    0.377     0.746
sido_dc     0.153      0.000      0.000       0.365    1.000    0.715     0.063
gugun_dc    0.172      0.000      0.000       0.377    0.715    1.000     0.255
gnr_ty      0.995      0.262      0.262       0.746    0.063    0.255     1.000
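
The report does not name the coefficient behind this categorical matrix. The 0.995 for ticket_ty against show_ty is consistent with a bias-corrected Cramér's V, so the sketch below assumes that statistic. The contingency table is also an assumption: it is inferred from the marginal counts (강좌 26 = 기획강좌 26, 공연 74 = 기획공연 61 + 예술단공연 13) and the sample rows, since the report does not print the joint counts. The function name `cramers_v_corrected` is hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-square statistic (no continuity correction).
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Bias correction of phi^2 and of the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

# Rows: ticket_ty (공연, 강좌); columns: show_ty (기획공연, 기획강좌, 예술단공연).
# Assumed joint counts, inferred from the marginals as described above.
ticket_vs_show = [[61, 0, 13],
                  [0, 26, 0]]
v = cramers_v_corrected(ticket_vs_show)
print(round(v, 3))
```

The bias correction is why two perfectly associated columns score slightly below 1 with only 100 rows.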

[Figure: correlation heatmap]

            show_dt  base_day  view_cnt  base_year  base_month  sido_dc  gugun_dc  ticket_ty  show_ty  gnr_ty
show_dt     1.000    0.773     -0.131    0.826      0.826       0.000    0.000     0.208      0.262    0.262
base_day    0.773    1.000     -0.021    0.985      0.985       0.189    0.171     0.842      0.829    0.619
view_cnt    -0.131   -0.021    1.000     0.000      0.000       0.000    0.033     0.164      0.180    0.000
base_year   0.826    0.985     0.000     1.000      0.826       0.000    0.000     0.208      0.262    0.262
base_month  0.826    0.985     0.000     0.826      1.000       0.000    0.000     0.208      0.262    0.262
sido_dc     0.000    0.189     0.000     0.000      0.000       1.000    0.715     0.153      0.365    0.063
gugun_dc    0.000    0.171     0.033     0.000      0.000       0.715    1.000     0.172      0.377    0.255
ticket_ty   0.208    0.842     0.164     0.208      0.208       0.153    0.172     1.000      0.995    0.995
show_ty     0.262    0.829     0.180     0.262      0.262       0.365    0.377     0.995      1.000    0.746
gnr_ty      0.262    0.619     0.000     0.262      0.262       0.063    0.255     0.995      0.746    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

    show_dt   base_year  base_month  base_day  sido_dc     gugun_dc  dong_dc    ticket_ty  show_ty   gnr_ty  view_cnt
0   20170109  2017       1           9         부산광역시  남구      문현제4동  강좌       기획강좌  강좌    1
1   20200207  2020       2           7         부산광역시  해운대구  <NA>       강좌       기획강좌  강좌    1
2   20170109  2017       1           9         부산광역시  북구      화명제1동  강좌       기획강좌  강좌    1
3   20170109  2017       1           9         부산광역시  수영구    <NA>       강좌       기획강좌  강좌    1
4   20170109  2017       1           9         부산광역시  수영구    남천제1동  강좌       기획강좌  강좌    1
5   20170109  2017       1           9         부산광역시  영도구    남항동     강좌       기획강좌  강좌    1
6   20170109  2017       1           9         부산광역시  영도구    영선제1동  강좌       기획강좌  강좌    1
7   20200207  2020       2           7         부산광역시  해운대구  반여제1동  강좌       기획강좌  강좌    1
8   20170109  2017       1           9         부산광역시  해운대구  재송제1동  강좌       기획강좌  강좌    1
9   20170109  2017       1           9         부산광역시  해운대구  좌제1동    강좌       기획강좌  강좌    2

Last rows

    show_dt   base_year  base_month  base_day  sido_dc     gugun_dc  dong_dc     ticket_ty  show_ty     gnr_ty  view_cnt
90  20170120  2017       1           20        경상남도    양산시    <NA>        공연       예술단공연  클래식  3
91  20170120  2017       1           20        경상남도    양산시    강서동      공연       예술단공연  클래식  2
92  20170120  2017       1           20        경상남도    양산시    덕계동      공연       예술단공연  클래식  1
93  20170120  2017       1           20        경상남도    양산시    상북면      공연       예술단공연  클래식  2
94  20170120  2017       1           20        경상남도    양산시    평산동      공연       예술단공연  클래식  1
95  20170120  2017       1           20        대구광역시  북구      무태조야동  공연       예술단공연  클래식  2
96  20170120  2017       1           20        부산광역시  강서구    명지2동     공연       예술단공연  클래식  5
97  20170120  2017       1           20        부산광역시  금정구    <NA>        공연       예술단공연  클래식  34
98  20170120  2017       1           20        부산광역시  금정구    구서제1동   공연       예술단공연  클래식  8
99  20170120  2017       1           20        부산광역시  금정구    구서제2동   공연       예술단공연  클래식  15