Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1963
Duplicate rows (%): 19.6%
Total size in memory: 498.0 KiB
Average record size in memory: 51.0 B
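
The two derived figures in this table follow from the raw counts. A minimal sketch, assuming the percentage is duplicate rows over observations and the average record size is total memory over observations (variable names are illustrative):

```python
# Values copied from the Dataset statistics table.
n_rows = 10_000
duplicate_rows = 1_963
total_memory_kib = 498.0

# Share of duplicate rows, as a percentage.
duplicate_pct = duplicate_rows / n_rows * 100

# Average bytes per record: total KiB converted to bytes, divided by row count.
avg_record_bytes = total_memory_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 19.6%
print(f"{avg_record_bytes:.1f} B")  # 51.0 B
```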

Variable types

Numeric: 2
DateTime: 1
Categorical: 2

Dataset

Description: Data on betting amounts and game counts generated by electronic gaming machines
Author: 그랜드코리아레저(주) (Grand Korea Leisure Co., Ltd.)
URL: https://www.data.go.kr/data/15072467/fileData.do

Alerts

Dataset has 1963 (19.6%) duplicate rows [Duplicates]
번호 is highly imbalanced (52.2%) [Imbalance]
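
The 52.2% imbalance figure can be reproduced from the class counts of 번호 (8484, 951, 565). This is a sketch, assuming the score is one minus the Shannon entropy of the class distribution divided by the maximum entropy log(k). The function name is illustrative and is not the library's API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (0 means perfectly balanced)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # A single class has no distribution to compare; treat as 0 by assumption.
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(k)

score = imbalance_score([8484, 951, 565])
print(f"{score:.1%}")  # 52.2%
```

A perfectly balanced variable scores 0, so 타입 (5015 vs 4985) stays far below any alert threshold.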

Reproduction

Analysis started: 2023-12-12 05:55:26.721503
Analysis finished: 2023-12-12 05:55:28.031041
Duration: 1.31 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
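
The reported duration is the difference between the two timestamps. A minimal check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 05:55:26.721503")
finished = datetime.fromisoformat("2023-12-12 05:55:28.031041")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} s")  # 1.31 s
```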

Variables

구분
Real number (ℝ)

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1927.9776
Minimum: 14
Maximum: 4534
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 14
5-th percentile: 66
Q1: 922
Median: 1380
Q3: 3782
95-th percentile: 4486
Maximum: 4534
Range: 4520
Interquartile range (IQR): 2860

Descriptive statistics

Standard deviation: 1563.1211
Coefficient of variation (CV): 0.81075687
Kurtosis: -1.2041118
Mean: 1927.9776
Median Absolute Deviation (MAD): 1190
Skewness: 0.5462399
Sum: 19279776
Variance: 2443347.5
Monotonicity: Not monotonic
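
Several of these statistics are tied together by simple identities. The sketch below assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations) and checks them against the reported values. Small differences come from the rounding of the displayed figures.

```python
import math

# Values copied from the report for 구분.
n = 10_000
mean = 1927.9776
std = 1563.1211

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance from the standard deviation
total = mean * n         # sum of all values

print(f"{cv:.4f}")       # 0.8108
print(round(total))      # 19279776

# The reported variance (2443347.5) agrees up to display rounding.
print(math.isclose(variance, 2443347.5, rel_tol=1e-6))  # True
```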
Histogram with fixed size bins (bins=50)
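
As an illustration of what fixed-size bins mean for this variable, the sketch below assumes 50 equal-width bins spanning the reported minimum (14) to maximum (4534), with the maximum counted in the last bin. The helper names are hypothetical.

```python
def fixed_width_bins(minimum, maximum, bins=50):
    """Return the bin width and the bins + 1 edges of equal-width bins."""
    width = (maximum - minimum) / bins
    edges = [minimum + i * width for i in range(bins + 1)]
    return width, edges

def bin_index(value, minimum, maximum, bins=50):
    """Map a value to its 0-based bin; the maximum falls in the last bin."""
    width = (maximum - minimum) / bins
    return min(int((value - minimum) / width), bins - 1)

width, edges = fixed_width_bins(14, 4534)
print(width)                      # 90.4
print(bin_index(1220, 14, 4534))  # 13
```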
Most frequent values

Value              Count  Frequency (%)
1220               502    5.0%
4278               407    4.1%
4294               404    4.0%
1516               386    3.9%
1328               380    3.8%
4302               378    3.8%
1196               374    3.7%
4286               358    3.6%
1276               274    2.7%
1528               250    2.5%
Other values (75)  6287   62.9%

Smallest 10 values

Value  Count  Frequency (%)
14     20     0.2%
22     84     0.8%
34     58     0.6%
42     67     0.7%
50     129    1.3%
58     95     0.9%
66     107    1.1%
74     62     0.6%
82     109    1.1%
90     91     0.9%

Largest 10 values

Value  Count  Frequency (%)
4534   90     0.9%
4530   37     0.4%
4518   49     0.5%
4510   8      0.1%
4506   25     0.2%
4498   84     0.8%
4486   210    2.1%
4314   61     0.6%
4302   378    3.8%
4294   404    4.0%

일자
Date

Distinct: 597
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2019-04-26 00:00:00
Maximum: 2019-04-26 09:57:00
Histogram with fixed size bins (bins=50)

번호
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Counts by category:
0: 8484
1: 951
2: 565

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      8484   84.8%
1      951    9.5%
2      565    5.7%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts for 번호: 0 (8484, 84.8%), 1 (951, 9.5%), 2 (565, 5.7%)]

타입
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Counts by category:
금액 (amount): 5015
횟수 (count): 4985

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 금액
2nd row: 횟수
3rd row: 횟수
4th row: 횟수
5th row: 횟수

Common Values

Value  Count  Frequency (%)
금액   5015   50.1%
횟수   4985   49.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the value counts for 타입: 금액 (5015, 50.1%), 횟수 (4985, 49.9%)]

결과값
Real number (ℝ)

Distinct: 148
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8245.124
Minimum: 1
Maximum: 850000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 100
Q3: 4000
95-th percentile: 50000
Maximum: 850000
Range: 849999
Interquartile range (IQR): 3999

Descriptive statistics

Standard deviation: 25534.611
Coefficient of variation (CV): 3.0969347
Kurtosis: 161.70627
Mean: 8245.124
Median Absolute Deviation (MAD): 99
Skewness: 8.673068
Sum: 82451240
Variance: 6.5201636 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value               Count  Frequency (%)
1                   4981   49.8%
20000               416    4.2%
4000                402    4.0%
500                 351    3.5%
2000                289    2.9%
100                 284    2.8%
5000                247    2.5%
50000               179    1.8%
1500                174    1.7%
720                 172    1.7%
Other values (138)  2505   25.1%

Smallest 10 values

Value  Count  Frequency (%)
1      4981   49.8%
2      3      < 0.1%
3      1      < 0.1%
50     4      < 0.1%
100    284    2.8%
180    42     0.4%
200    17     0.2%
250    33     0.3%
280    13     0.1%
300    5      0.1%

Largest 10 values

Value   Count  Frequency (%)
850000  1      < 0.1%
450000  1      < 0.1%
370000  1      < 0.1%
355000  1      < 0.1%
350000  2      < 0.1%
345000  1      < 0.1%
290000  1      < 0.1%
280000  1      < 0.1%
260000  1      < 0.1%
250000  2      < 0.1%

Interactions

[Four interaction plots; axes not recoverable from the extracted text]

Correlations

        구분    번호    타입    결과값
구분    1.000   0.562   0.000   0.264
번호    0.562   1.000   0.000   0.212
타입    0.000   0.000   1.000   0.137
결과값  0.264   0.212   0.137   1.000

        번호    타입
번호    1.000   0.000
타입    0.000   1.000

        구분    결과값  번호    타입
구분    1.000   -0.133  0.452   0.000
결과값  -0.133  1.000   0.145   0.146
번호    0.452   0.145   1.000   0.000
타입    0.000   0.146   0.000   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  구분  일자                 번호  타입  결과값
72766    4530  2019-04-26 08:02:00  0     금액  1900
66154    3810  2019-04-26 07:07:00  0     횟수  1
16401    3810  2019-04-26 01:32:00  0     횟수  1
52113    902   2019-04-26 05:02:00  0     횟수  1
71289    4486  2019-04-26 07:54:00  0     횟수  1
40826    1192  2019-04-26 03:53:00  0     금액  4000
33920    1276  2019-04-26 03:05:00  0     금액  5000
59702    4294  2019-04-26 06:05:00  0     횟수  1
85562    1524  2019-04-26 09:18:00  0     금액  6000
73919    1220  2019-04-26 08:08:00  0     횟수  1

(index)  구분  일자                 번호  타입  결과값
69347    162   2019-04-26 07:38:00  0     금액  80000
3167     34    2019-04-26 00:20:00  2     횟수  1
22508    3810  2019-04-26 02:04:00  0     횟수  1
11807    198   2019-04-26 01:05:00  1     횟수  1
81007    1220  2019-04-26 08:50:00  0     횟수  1
77105    1220  2019-04-26 08:24:00  0     횟수  1
33259    1276  2019-04-26 03:02:00  0     금액  5000
17772    198   2019-04-26 01:39:00  0     횟수  1
90747    1328  2019-04-26 09:39:00  0     횟수  1
23532    4302  2019-04-26 02:08:00  0     금액  480

Duplicate rows

Most frequently occurring

(index)  구분  일자                 번호  타입  결과값  # duplicates
255      1196  2019-04-26 00:29:00  0     금액  4000    8
20       902   2019-04-26 05:02:00  0     횟수  1       7
765      1344  2019-04-26 09:42:00  0     횟수  1       7
103      922   2019-04-26 09:32:00  0     금액  6000    6
219      1192  2019-04-26 03:54:00  0     금액  4000    6
283      1196  2019-04-26 04:43:00  0     금액  4000    6
502      1220  2019-04-26 08:50:00  0     금액  4000    6
602      1276  2019-04-26 03:15:00  0     횟수  1       6
677      1328  2019-04-26 04:15:00  0     횟수  1       6
722      1328  2019-04-26 09:39:00  0     횟수  1       6
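
A "# duplicates" column like the one above can be derived by counting identical rows. The sketch below assumes a duplicate means every field matches exactly. The handful of rows and their counts are illustrative only, not taken from the full dataset.

```python
from collections import Counter

# Illustrative rows as (구분, 일자, 번호, 타입, 결과값) tuples.
rows = [
    (1196, "2019-04-26 00:29:00", 0, "금액", 4000),
    (1196, "2019-04-26 00:29:00", 0, "금액", 4000),
    (902, "2019-04-26 05:02:00", 0, "횟수", 1),
    (902, "2019-04-26 05:02:00", 0, "횟수", 1),
    (902, "2019-04-26 05:02:00", 0, "횟수", 1),
    (1220, "2019-04-26 08:50:00", 0, "금액", 4000),
]

# Count how often each distinct row occurs, most frequent first.
counts = Counter(rows)

# Keep only rows that occur more than once.
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]

for row, n in duplicates:
    print(row, n)
```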