Overview

Dataset statistics

Number of variables: 13
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.8 KiB
Average record size in memory: 112.3 B

Variable types

Categorical: 5
Numeric: 6
Text: 2

Dataset

Description: Sample data
Author: Seoul Metropolitan Government (smart card company)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=70

Alerts

사용자구분(BILL_USER) is highly imbalanced (50.3%) [Imbalance]
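
The 50.3% in the alert can be reproduced from the 사용자구분(BILL_USER) category counts listed later in this report. The sketch below assumes the score is one minus the Shannon entropy of the class shares divided by its maximum, log2 of the number of classes. The report text does not state the formula, so treat the definition as an assumption that happens to match the reported value.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the entropy (in bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Category counts for BILL_USER as reported in the Common Values table.
bill_user_counts = [348, 108, 23, 15, 4, 2]
score = imbalance_score(bill_user_counts)
print(f"{score:.1%}")
```

A score of 0 means perfectly balanced classes and a score of 1 means a single class holds every row.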

Reproduction

Analysis started: 2024-01-14 06:50:23.350733
Analysis finished: 2024-01-14 06:50:28.695972
Duration: 5.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
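
A report like this one is typically produced with a few lines of Python. The sketch below is a minimal example, not the exact invocation used here: the CSV path, output path, and report title are hypothetical, and the `dataset` metadata keys are assumed to map to the Description, Author, and URL fields shown in the Dataset section.

```python
# Original (Korean) metadata strings from the Dataset section of this report.
DATASET_METADATA = {
    "description": "샘플 데이터",
    "author": "서울시(스마트카드사)",
    "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=70",
}


def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported here so this module loads even without the optional dependencies.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling Report", dataset=DATASET_METADATA)
    profile.to_file(html_path)
    return profile
```

Calling `build_report("sample.csv", "report.html")` would require `pandas` and `ydata-profiling` to be installed; both file names are placeholders.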

Variables

년(YEAR)
Categorical

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2019
2nd row: 2021
3rd row: 2020
4th row: 2018
5th row: 2019

Common Values

Value | Count | Frequency (%)
2018 | 122 | 24.4%
2019 | 102 | 20.4%
2020 | 99 | 19.8%
2017 | 92 | 18.4%
2021 | 85 | 17.0%

Length

Histogram of lengths of the category

월(MONTH)
Real number (ℝ)

Distinct: 12
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.226
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6
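
The quartiles for 월(MONTH) can be recomputed from the value counts given in this variable's frequency tables. The sketch below rebuilds the 500 observations from those counts and assumes linear interpolation between order statistics, which is what `method="inclusive"` does in Python's `statistics` module. The report does not state its interpolation rule, so that choice is an assumption.

```python
from statistics import quantiles

# Value -> count for MONTH, taken from the frequency tables in this report.
month_counts = {1: 43, 2: 43, 3: 54, 4: 38, 5: 42, 6: 41,
                7: 52, 8: 32, 9: 48, 10: 47, 11: 28, 12: 32}

# Expand the counts back into the sorted list of observations.
values = sorted(v for v, c in month_counts.items() for _ in range(c))

# n=4 returns the three cut points Q1, median, Q3.
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
value_range = max(values) - min(values)
print(q1, median, q3, iqr, value_range)
```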

Descriptive statistics

Standard deviation: 3.3599856
Coefficient of variation (CV): 0.53967002
Kurtosis: -1.1802568
Mean: 6.226
Median Absolute Deviation (MAD): 3
Skewness: 0.067244765
Sum: 3113
Variance: 11.289503
Monotonicity: Not monotonic
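
Several of the descriptive statistics for 월(MONTH) follow directly from the value counts in this variable's frequency tables. The sketch below assumes the variance uses the sample (n - 1) denominator, that CV is the standard deviation divided by the mean, and that MAD is the median of absolute deviations from the median. Under those assumptions the results agree with the reported figures.

```python
from statistics import mean, median, stdev, variance

# Value -> count for MONTH, taken from the frequency tables in this report.
month_counts = {1: 43, 2: 43, 3: 54, 4: 38, 5: 42, 6: 41,
                7: 52, 8: 32, 9: 48, 10: 47, 11: 28, 12: 32}
values = [v for v, c in month_counts.items() for _ in range(c)]

avg = mean(values)
var = variance(values)  # sample variance, divides by n - 1
std = stdev(values)     # square root of the sample variance
cv = std / avg          # coefficient of variation
med = median(values)
mad = median(abs(v - med) for v in values)

print(sum(values), avg, round(var, 6), round(std, 7), round(cv, 8), mad)
```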
Histogram with fixed size bins (bins=12)

Common values

Value | Count | Frequency (%)
3 | 54 | 10.8%
7 | 52 | 10.4%
9 | 48 | 9.6%
10 | 47 | 9.4%
2 | 43 | 8.6%
1 | 43 | 8.6%
5 | 42 | 8.4%
6 | 41 | 8.2%
4 | 38 | 7.6%
8 | 32 | 6.4%
Other values (2) | 60 | 12.0%
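
The frequency table above lists the ten most common values and folds the rest into an "Other values (n)" row. A minimal sketch of that aggregation follows, assuming a top-10 cutoff and frequencies computed as count divided by the number of rows. The helper name `common_values` is hypothetical, and the order of tied values may differ from the report.

```python
from collections import Counter

# Value -> count for MONTH, taken from the frequency tables in this report.
month_counts = {1: 43, 2: 43, 3: 54, 4: 38, 5: 42, 6: 41,
                7: 52, 8: 32, 9: 48, 10: 47, 11: 28, 12: 32}
n_rows = sum(month_counts.values())


def common_values(counts, top_n=10):
    """Return the top_n (label, count) rows plus one aggregated 'Other values' row."""
    ranked = Counter(counts).most_common()  # sorted by count, descending
    head, tail = ranked[:top_n], ranked[top_n:]
    rows = [(str(value), count) for value, count in head]
    if tail:
        rows.append((f"Other values ({len(tail)})", sum(c for _, c in tail)))
    return rows


rows = common_values(month_counts)
for label, count in rows:
    print(f"{label} | {count} | {count / n_rows:.1%}")
```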
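
Extreme values

Minimum 10 values

Value | Count | Frequency (%)
1 | 43 | 8.6%
2 | 43 | 8.6%
3 | 54 | 10.8%
4 | 38 | 7.6%
5 | 42 | 8.4%
6 | 41 | 8.2%
7 | 52 | 10.4%
8 | 32 | 6.4%
9 | 48 | 9.6%
10 | 47 | 9.4%

Maximum 10 values

Value | Count | Frequency (%)
12 | 32 | 6.4%
11 | 28 | 5.6%
10 | 47 | 9.4%
9 | 48 | 9.6%
8 | 32 | 6.4%
7 | 52 | 10.4%
6 | 41 | 8.2%
5 | 42 | 8.4%
4 | 38 | 7.6%
3 | 54 | 10.8%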

일(DAY)
Real number (ℝ)

Distinct: 31
Distinct (%): 6.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.552
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 9
median: 17
Q3: 24
95-th percentile: 30
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.7837124
Coefficient of variation (CV): 0.53067378
Kurtosis: -1.1597745
Mean: 16.552
Median Absolute Deviation (MAD): 8
Skewness: -0.060814941
Sum: 8276
Variance: 77.153603
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=31)

Common values

Value | Count | Frequency (%)
20 | 25 | 5.0%
9 | 25 | 5.0%
12 | 21 | 4.2%
14 | 21 | 4.2%
21 | 20 | 4.0%
27 | 19 | 3.8%
30 | 19 | 3.8%
19 | 18 | 3.6%
25 | 18 | 3.6%
8 | 18 | 3.6%
Other values (21) | 296 | 59.2%

Extreme values

Minimum 10 values

Value | Count | Frequency (%)
1 | 15 | 3.0%
2 | 15 | 3.0%
3 | 9 | 1.8%
4 | 16 | 3.2%
5 | 7 | 1.4%
6 | 15 | 3.0%
7 | 18 | 3.6%
8 | 18 | 3.6%
9 | 25 | 5.0%
10 | 15 | 3.0%

Maximum 10 values

Value | Count | Frequency (%)
31 | 15 | 3.0%
30 | 19 | 3.8%
29 | 17 | 3.4%
28 | 17 | 3.4%
27 | 19 | 3.8%
26 | 17 | 3.4%
25 | 18 | 3.6%
24 | 14 | 2.8%
23 | 11 | 2.2%
22 | 16 | 3.2%

시간(HOUR)
Real number (ℝ)

Distinct: 19
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.75
Minimum: 5
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 5
5-th percentile: 6
Q1: 9
median: 14
Q3: 18
95-th percentile: 22
Maximum: 23
Range: 18
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.0197506
Coefficient of variation (CV): 0.36507277
Kurtosis: -1.213784
Mean: 13.75
Median Absolute Deviation (MAD): 4
Skewness: -0.047035598
Sum: 6875
Variance: 25.197896
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=19)

Common values

Value | Count | Frequency (%)
18 | 42 | 8.4%
8 | 41 | 8.2%
16 | 40 | 8.0%
17 | 35 | 7.0%
7 | 33 | 6.6%
9 | 32 | 6.4%
13 | 30 | 6.0%
15 | 27 | 5.4%
12 | 27 | 5.4%
19 | 25 | 5.0%
Other values (9) | 168 | 33.6%

Extreme values

Minimum 10 values

Value | Count | Frequency (%)
5 | 12 | 2.4%
6 | 24 | 4.8%
7 | 33 | 6.6%
8 | 41 | 8.2%
9 | 32 | 6.4%
10 | 20 | 4.0%
11 | 18 | 3.6%
12 | 27 | 5.4%
13 | 30 | 6.0%
14 | 20 | 4.0%

Maximum 10 values

Value | Count | Frequency (%)
23 | 4 | 0.8%
22 | 22 | 4.4%
21 | 25 | 5.0%
20 | 23 | 4.6%
19 | 25 | 5.0%
18 | 42 | 8.4%
17 | 35 | 7.0%
16 | 40 | 8.0%
15 | 27 | 5.4%
14 | 20 | 4.0%

분_30분단위(HALF_HOUR)
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 2
Median length: 1
Mean length: 1.476
Min length: 1
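
The length statistics describe the string length of each category value, weighted by how often it occurs. The sketch below recomputes them from the two category counts reported in this variable's Common Values table. It assumes the values are stored as the strings "0" and "30".

```python
from statistics import mean, median

# Category -> count for HALF_HOUR, as reported in its Common Values table.
half_hour_counts = {"0": 262, "30": 238}

# One string length per observation.
lengths = [len(value) for value, count in half_hour_counts.items() for _ in range(count)]

max_len, median_len, mean_len, min_len = max(lengths), median(lengths), mean(lengths), min(lengths)
print(max_len, median_len, mean_len, min_len)
```

With 262 one-character values and 238 two-character values, the mean length is (262 × 1 + 238 × 2) / 500 = 1.476.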

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30
2nd row: 30
3rd row: 0
4th row: 30
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 262 | 52.4%
30 | 238 | 47.6%

Length

Histogram of lengths of the category

사용자구분(BILL_USER)
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.104
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 일반
4th row: 일반
5th row: 경로

Common Values

Value | Count | Frequency (%)
일반 (general) | 348 | 69.6%
경로 (senior) | 108 | 21.6%
장애인 (disabled) | 23 | 4.6%
청소년 (youth) | 15 | 3.0%
국가유공자 (national merit recipient) | 4 | 0.8%
어린이 (child) | 2 | 0.4%

Length

Histogram of lengths of the category

승차역ID(GETON_STATION_ID)
Real number (ℝ)

Distinct: 233
Distinct (%): 46.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1676.432
Minimum: 150
Maximum: 4710
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 150
5-th percentile: 198.9
Q1: 311
median: 2519
Q3: 2715
95-th percentile: 4116.05
Maximum: 4710
Range: 4560
Interquartile range (IQR): 2404

Descriptive statistics

Standard deviation: 1373.8533
Coefficient of variation (CV): 0.81951033
Kurtosis: -1.3160076
Mean: 1676.432
Median Absolute Deviation (MAD): 1608
Skewness: 0.22451864
Sum: 838216
Variance: 1887473
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
311 | 6 | 1.2%
221 | 5 | 1.0%
220 | 5 | 1.0%
219 | 5 | 1.0%
229 | 5 | 1.0%
201 | 5 | 1.0%
340 | 5 | 1.0%
212 | 5 | 1.0%
206 | 5 | 1.0%
2718 | 5 | 1.0%
Other values (223) | 449 | 89.8%

Extreme values

Minimum 10 values

Value | Count | Frequency (%)
150 | 4 | 0.8%
151 | 2 | 0.4%
152 | 3 | 0.6%
153 | 2 | 0.4%
154 | 2 | 0.4%
155 | 4 | 0.8%
156 | 2 | 0.4%
157 | 4 | 0.8%
159 | 2 | 0.4%
201 | 5 | 1.0%

Maximum 10 values

Value | Count | Frequency (%)
4710 | 2 | 0.4%
4709 | 3 | 0.6%
4708 | 1 | 0.2%
4706 | 1 | 0.2%
4703 | 1 | 0.2%
4138 | 1 | 0.2%
4136 | 2 | 0.4%
4134 | 1 | 0.2%
4131 | 1 | 0.2%
4129 | 2 | 0.4%

승차_호선명(GETON_LINE_NM)
Categorical

Distinct: 11
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.138
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9호선
2nd row: 8호선
3rd row: 5호선
4th row: 5호선
5th row: 4호선

Common Values

Value | Count | Frequency (%)
2호선 | 106 | 21.2%
5호선 | 81 | 16.2%
7호선 | 64 | 12.8%
3호선 | 57 | 11.4%
6호선 | 49 | 9.8%
4호선 | 48 | 9.6%
8호선 | 30 | 6.0%
9호선 | 27 | 5.4%
1호선 | 20 | 4.0%
9호선2~3단계 | 11 | 2.2%

Length

Histogram of lengths of the category

승차_역명(GETON_STATION_NM)
Text

Distinct: 210
Distinct (%): 42.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.51
Min length: 2

Characters and Unicode

Total characters: 2255
Distinct characters: 232
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
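
As an illustration of these character properties, the sketch below tallies the Unicode general category of every character in the five sample station names shown for this variable. It relies on Python's standard `unicodedata` module; the two-letter codes it returns (`Lo`, `Ps`, `Pe`) correspond to the category names Other Letter, Open Punctuation, and Close Punctuation. The report's own implementation may differ.

```python
import unicodedata
from collections import Counter

# The five sample rows listed for 승차_역명(GETON_STATION_NM).
samples = ["마천", "홍대입구", "흑석(중앙대입구)", "건대입구", "경복궁(정부서울청사)"]

# General category code for each character, e.g. 'Lo' for a Hangul syllable.
category_counts = Counter(
    unicodedata.category(ch) for name in samples for ch in name
)

total_chars = sum(category_counts.values())
for code, count in category_counts.most_common():
    print(f"{code} | {count} | {count / total_chars:.1%}")
```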

Unique

Unique: 81
Unique (%): 16.2%

Sample

1st row: 마천
2nd row: 홍대입구
3rd row: 흑석(중앙대입구)
4th row: 건대입구
5th row: 경복궁(정부서울청사)

Words

Value | Count | Frequency (%)
잠실(송파구청 | 13 | 2.6%
종로3가 | 13 | 2.6%
당산 | 9 | 1.8%
고속터미널 | 8 | 1.6%
강남 | 6 | 1.2%
홍대입구 | 6 | 1.2%
공덕 | 6 | 1.2%
서울역 | 6 | 1.2%
구로디지털단지 | 6 | 1.2%
신림 | 6 | 1.2%
Other values (200) | 421 | 84.2%

Most occurring characters

Value | Count | Frequency (%)
( | 120 | 5.3%
) | 120 | 5.3%
(not preserved) | 103 | 4.6%
(not preserved) | 76 | 3.4%
(not preserved) | 67 | 3.0%
(not preserved) | 56 | 2.5%
(not preserved) | 52 | 2.3%
(not preserved) | 43 | 1.9%
(not preserved) | 42 | 1.9%
(not preserved) | 37 | 1.6%
Other values (222) | 1539 | 68.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1968 | 87.3%
Open Punctuation | 120 | 5.3%
Close Punctuation | 120 | 5.3%
Decimal Number | 23 | 1.0%
Uppercase Letter | 15 | 0.7%
Other Punctuation | 9 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 103 | 5.2%
(not preserved) | 76 | 3.9%
(not preserved) | 67 | 3.4%
(not preserved) | 56 | 2.8%
(not preserved) | 52 | 2.6%
(not preserved) | 43 | 2.2%
(not preserved) | 42 | 2.1%
(not preserved) | 37 | 1.9%
(not preserved) | 36 | 1.8%
(not preserved) | 34 | 1.7%
Other values (211) | 1422 | 72.3%

Decimal Number
Value | Count | Frequency (%)
3 | 16 | 69.6%
4 | 4 | 17.4%
5 | 1 | 4.3%
9 | 1 | 4.3%
1 | 1 | 4.3%

Uppercase Letter
Value | Count | Frequency (%)
D | 10 | 66.7%
P | 5 | 33.3%

Other Punctuation
Value | Count | Frequency (%)
. | 8 | 88.9%
· | 1 | 11.1%

Open Punctuation
Value | Count | Frequency (%)
( | 120 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 120 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1968 | 87.3%
Common | 272 | 12.1%
Latin | 15 | 0.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 103 | 5.2%
(not preserved) | 76 | 3.9%
(not preserved) | 67 | 3.4%
(not preserved) | 56 | 2.8%
(not preserved) | 52 | 2.6%
(not preserved) | 43 | 2.2%
(not preserved) | 42 | 2.1%
(not preserved) | 37 | 1.9%
(not preserved) | 36 | 1.8%
(not preserved) | 34 | 1.7%
Other values (211) | 1422 | 72.3%

Common
Value | Count | Frequency (%)
( | 120 | 44.1%
) | 120 | 44.1%
3 | 16 | 5.9%
. | 8 | 2.9%
4 | 4 | 1.5%
5 | 1 | 0.4%
9 | 1 | 0.4%
· | 1 | 0.4%
1 | 1 | 0.4%

Latin
Value | Count | Frequency (%)
D | 10 | 66.7%
P | 5 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1968 | 87.3%
ASCII | 286 | 12.7%
None | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
( | 120 | 42.0%
) | 120 | 42.0%
3 | 16 | 5.6%
D | 10 | 3.5%
. | 8 | 2.8%
P | 5 | 1.7%
4 | 4 | 1.4%
5 | 1 | 0.3%
9 | 1 | 0.3%
1 | 1 | 0.3%

Hangul
Value | Count | Frequency (%)
(not preserved) | 103 | 5.2%
(not preserved) | 76 | 3.9%
(not preserved) | 67 | 3.4%
(not preserved) | 56 | 2.8%
(not preserved) | 52 | 2.6%
(not preserved) | 43 | 2.2%
(not preserved) | 42 | 2.1%
(not preserved) | 37 | 1.9%
(not preserved) | 36 | 1.8%
(not preserved) | 34 | 1.7%
Other values (211) | 1422 | 72.3%

None
Value | Count | Frequency (%)
· | 1 | 100.0%

하차역ID(GETOFF_STATION_ID)
Real number (ℝ)

Distinct: 235
Distinct (%): 47.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1699.51
Minimum: 150
Maximum: 4709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 150
5-th percentile: 202
Q1: 242.75
median: 2533.5
Q3: 2721
95-th percentile: 4117.05
Maximum: 4709
Range: 4559
Interquartile range (IQR): 2478.25

Descriptive statistics

Standard deviation: 1380.0865
Coefficient of variation (CV): 0.81204967
Kurtosis: -1.4234782
Mean: 1699.51
Median Absolute Deviation (MAD): 1589.5
Skewness: 0.16435701
Sum: 849755
Variance: 1904638.8
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
239 | 7 | 1.4%
327 | 6 | 1.2%
420 | 6 | 1.2%
211 | 6 | 1.2%
216 | 6 | 1.2%
214 | 6 | 1.2%
331 | 5 | 1.0%
225 | 5 | 1.0%
329 | 5 | 1.0%
2816 | 5 | 1.0%
Other values (225) | 443 | 88.6%

Extreme values

Minimum 10 values

Value | Count | Frequency (%)
150 | 3 | 0.6%
151 | 3 | 0.6%
153 | 2 | 0.4%
154 | 1 | 0.2%
155 | 2 | 0.4%
157 | 4 | 0.8%
158 | 3 | 0.6%
159 | 1 | 0.2%
201 | 3 | 0.6%
202 | 4 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
4709 | 3 | 0.6%
4138 | 2 | 0.4%
4136 | 3 | 0.6%
4134 | 3 | 0.6%
4133 | 1 | 0.2%
4132 | 1 | 0.2%
4127 | 1 | 0.2%
4125 | 3 | 0.6%
4124 | 2 | 0.4%
4123 | 2 | 0.4%

하차_호선명(GETOFF_LINE_NM)
Categorical

Distinct: 11
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 8
Median length: 3
Mean length: 3.12
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6호선
2nd row: 4호선
3rd row: 2호선
4th row: 6호선
5th row: 2호선

Common Values

Value | Count | Frequency (%)
2호선 | 123 | 24.6%
5호선 | 75 | 15.0%
7호선 | 59 | 11.8%
3호선 | 56 | 11.2%
4호선 | 48 | 9.6%
6호선 | 42 | 8.4%
9호선 | 33 | 6.6%
8호선 | 26 | 5.2%
1호선 | 20 | 4.0%
우이신설선 | 10 | 2.0%

Length

Histogram of lengths of the category

하차_역명(GETOFF_STATION_NM)
Text

Distinct: 223
Distinct (%): 44.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.332
Min length: 2

Characters and Unicode

Total characters: 2166
Distinct characters: 238
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 86
Unique (%): 17.2%

Sample

1st row: 을지로입구
2nd row: 신당
3rd row: 상월곡(한국과학기술연구원)
4th row: 고려대(종암)
5th row: 청량리(서울시립대입구)

Words

Value | Count | Frequency (%)
압구정 | 8 | 1.6%
을지로3가 | 8 | 1.6%
건대입구 | 7 | 1.4%
강남 | 7 | 1.4%
을지로입구 | 6 | 1.2%
신당 | 6 | 1.2%
당산 | 6 | 1.2%
동대문역사문화공원(ddp | 6 | 1.2%
양재(서초구청 | 6 | 1.2%
종로3가 | 6 | 1.2%
Other values (213) | 434 | 86.8%

Most occurring characters

Value | Count | Frequency (%)
) | 104 | 4.8%
( | 104 | 4.8%
(not preserved) | 90 | 4.2%
(not preserved) | 82 | 3.8%
(not preserved) | 67 | 3.1%
(not preserved) | 50 | 2.3%
(not preserved) | 43 | 2.0%
(not preserved) | 41 | 1.9%
(not preserved) | 38 | 1.8%
(not preserved) | 35 | 1.6%
Other values (228) | 1512 | 69.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1910 | 88.2%
Close Punctuation | 104 | 4.8%
Open Punctuation | 104 | 4.8%
Decimal Number | 22 | 1.0%
Uppercase Letter | 18 | 0.8%
Other Punctuation | 8 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 90 | 4.7%
(not preserved) | 82 | 4.3%
(not preserved) | 67 | 3.5%
(not preserved) | 50 | 2.6%
(not preserved) | 43 | 2.3%
(not preserved) | 41 | 2.1%
(not preserved) | 38 | 2.0%
(not preserved) | 35 | 1.8%
(not preserved) | 33 | 1.7%
(not preserved) | 32 | 1.7%
Other values (217) | 1399 | 73.2%

Decimal Number
Value | Count | Frequency (%)
3 | 14 | 63.6%
4 | 3 | 13.6%
5 | 3 | 13.6%
1 | 1 | 4.5%
9 | 1 | 4.5%

Uppercase Letter
Value | Count | Frequency (%)
D | 12 | 66.7%
P | 6 | 33.3%

Other Punctuation
Value | Count | Frequency (%)
. | 6 | 75.0%
· | 2 | 25.0%

Close Punctuation
Value | Count | Frequency (%)
) | 104 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 104 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1910 | 88.2%
Common | 238 | 11.0%
Latin | 18 | 0.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 90 | 4.7%
(not preserved) | 82 | 4.3%
(not preserved) | 67 | 3.5%
(not preserved) | 50 | 2.6%
(not preserved) | 43 | 2.3%
(not preserved) | 41 | 2.1%
(not preserved) | 38 | 2.0%
(not preserved) | 35 | 1.8%
(not preserved) | 33 | 1.7%
(not preserved) | 32 | 1.7%
Other values (217) | 1399 | 73.2%

Common
Value | Count | Frequency (%)
) | 104 | 43.7%
( | 104 | 43.7%
3 | 14 | 5.9%
. | 6 | 2.5%
4 | 3 | 1.3%
5 | 3 | 1.3%
· | 2 | 0.8%
1 | 1 | 0.4%
9 | 1 | 0.4%

Latin
Value | Count | Frequency (%)
D | 12 | 66.7%
P | 6 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1910 | 88.2%
ASCII | 254 | 11.7%
None | 2 | 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
) | 104 | 40.9%
( | 104 | 40.9%
3 | 14 | 5.5%
D | 12 | 4.7%
P | 6 | 2.4%
. | 6 | 2.4%
4 | 3 | 1.2%
5 | 3 | 1.2%
1 | 1 | 0.4%
9 | 1 | 0.4%

Hangul
Value | Count | Frequency (%)
(not preserved) | 90 | 4.7%
(not preserved) | 82 | 4.3%
(not preserved) | 67 | 3.5%
(not preserved) | 50 | 2.6%
(not preserved) | 43 | 2.3%
(not preserved) | 41 | 2.1%
(not preserved) | 38 | 2.0%
(not preserved) | 35 | 1.8%
(not preserved) | 33 | 1.7%
(not preserved) | 32 | 1.7%
Other values (217) | 1399 | 73.2%

None
Value | Count | Frequency (%)
· | 2 | 100.0%

인원합계(PASSN_CNT)
Real number (ℝ)

Distinct: 22
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.644
Minimum: 1
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1
Q3: 2
95-th percentile: 8.05
Maximum: 47
Range: 46
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 4.2162564
Coefficient of variation (CV): 1.5946507
Kurtosis: 40.376813
Mean: 2.644
Median Absolute Deviation (MAD): 0
Skewness: 5.4702155
Sum: 1322
Variance: 17.776818
Monotonicity: Not monotonic
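
인원합계(PASSN_CNT) is strongly right-skewed: most rows carry a count of 1, while a few reach into the 30s and 40s. The full distribution can be pieced together from this variable's frequency tables, so the skewness can be recomputed. The sketch below assumes the bias-adjusted Fisher-Pearson coefficient (the sample-size correction commonly applied by statistics packages); the report does not name its estimator, but this one agrees with the reported value.

```python
from math import sqrt
from statistics import mean, median

# Value -> count for PASSN_CNT, assembled from its frequency tables.
passn_counts = {1: 303, 2: 75, 3: 36, 4: 23, 5: 15, 6: 9, 7: 5, 8: 9,
                9: 2, 10: 2, 11: 3, 12: 3, 13: 1, 14: 2, 15: 1, 16: 2,
                18: 2, 20: 2, 23: 2, 34: 1, 35: 1, 47: 1}
values = [v for v, c in passn_counts.items() for _ in range(c)]

n = len(values)
avg = mean(values)
m2 = sum((v - avg) ** 2 for v in values) / n  # second central moment
m3 = sum((v - avg) ** 3 for v in values) / n  # third central moment
g1 = m3 / m2 ** 1.5                           # biased skewness
skew = g1 * sqrt(n * (n - 1)) / (n - 2)       # sample-size adjustment

med = median(values)
mad = median(abs(v - med) for v in values)
print(round(skew, 4), med, mad)
```

Because more than half of the rows equal the median of 1, the median absolute deviation is 0 even though the standard deviation exceeds 4.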
Histogram with fixed size bins (bins=22)

Common values

Value | Count | Frequency (%)
1 | 303 | 60.6%
2 | 75 | 15.0%
3 | 36 | 7.2%
4 | 23 | 4.6%
5 | 15 | 3.0%
8 | 9 | 1.8%
6 | 9 | 1.8%
7 | 5 | 1.0%
11 | 3 | 0.6%
12 | 3 | 0.6%
Other values (12) | 19 | 3.8%

Extreme values

Minimum 10 values

Value | Count | Frequency (%)
1 | 303 | 60.6%
2 | 75 | 15.0%
3 | 36 | 7.2%
4 | 23 | 4.6%
5 | 15 | 3.0%
6 | 9 | 1.8%
7 | 5 | 1.0%
8 | 9 | 1.8%
9 | 2 | 0.4%
10 | 2 | 0.4%

Maximum 10 values

Value | Count | Frequency (%)
47 | 1 | 0.2%
35 | 1 | 0.2%
34 | 1 | 0.2%
23 | 2 | 0.4%
20 | 2 | 0.4%
18 | 2 | 0.4%
16 | 2 | 0.4%
15 | 1 | 0.2%
14 | 2 | 0.4%
13 | 1 | 0.2%

Interactions

[Interaction scatter plots (36 images) not preserved in this text version]

Correlations

Variable | 년(YEAR) | 월(MONTH) | 일(DAY) | 시간(HOUR) | 분_30분단위(HALF_HOUR) | 사용자구분(BILL_USER) | 승차역ID(GETON_STATION_ID) | 승차_호선명(GETON_LINE_NM) | 하차역ID(GETOFF_STATION_ID) | 하차_호선명(GETOFF_LINE_NM) | 인원합계(PASSN_CNT)
년(YEAR) | 1.000 | 0.245 | 0.000 | 0.000 | 0.098 | 0.000 | 0.081 | 0.000 | 0.000 | 0.145 | 0.000
월(MONTH) | 0.245 | 1.000 | 0.000 | 0.357 | 0.000 | 0.048 | 0.000 | 0.000 | 0.065 | 0.121 | 0.000
일(DAY) | 0.000 | 0.000 | 1.000 | 0.049 | 0.010 | 0.061 | 0.135 | 0.132 | 0.135 | 0.000 | 0.092
시간(HOUR) | 0.000 | 0.357 | 0.049 | 1.000 | 0.000 | 0.000 | 0.000 | 0.113 | 0.052 | 0.000 | 0.049
분_30분단위(HALF_HOUR) | 0.098 | 0.000 | 0.010 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.129 | 0.074 | 0.000
사용자구분(BILL_USER) | 0.000 | 0.048 | 0.061 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
승차역ID(GETON_STATION_ID) | 0.081 | 0.000 | 0.135 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.263 | 0.000 | 0.080
승차_호선명(GETON_LINE_NM) | 0.000 | 0.000 | 0.132 | 0.113 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.000
하차역ID(GETOFF_STATION_ID) | 0.000 | 0.065 | 0.135 | 0.052 | 0.129 | 0.000 | 0.263 | 0.000 | 1.000 | 0.096 | 0.000
하차_호선명(GETOFF_LINE_NM) | 0.145 | 0.121 | 0.000 | 0.000 | 0.074 | 0.000 | 0.000 | 0.000 | 0.096 | 1.000 | 0.000
인원합계(PASSN_CNT) | 0.000 | 0.000 | 0.092 | 0.049 | 0.000 | 0.000 | 0.080 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 년(YEAR) | 분_30분단위(HALF_HOUR) | 승차_호선명(GETON_LINE_NM) | 사용자구분(BILL_USER) | 하차_호선명(GETOFF_LINE_NM)
년(YEAR) | 1.000 | 0.119 | 0.000 | 0.000 | 0.079
분_30분단위(HALF_HOUR) | 0.119 | 1.000 | 0.000 | 0.000 | 0.070
승차_호선명(GETON_LINE_NM) | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
사용자구분(BILL_USER) | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
하차_호선명(GETOFF_LINE_NM) | 0.079 | 0.070 | 0.000 | 0.000 | 1.000

Variable | 월(MONTH) | 일(DAY) | 시간(HOUR) | 승차역ID(GETON_STATION_ID) | 하차역ID(GETOFF_STATION_ID) | 인원합계(PASSN_CNT) | 년(YEAR) | 분_30분단위(HALF_HOUR) | 사용자구분(BILL_USER) | 승차_호선명(GETON_LINE_NM) | 하차_호선명(GETOFF_LINE_NM)
월(MONTH) | 1.000 | 0.019 | 0.013 | 0.011 | 0.081 | -0.045 | 0.104 | 0.000 | 0.024 | 0.000 | 0.051
일(DAY) | 0.019 | 1.000 | -0.029 | 0.035 | 0.049 | 0.016 | 0.000 | 0.000 | 0.000 | 0.104 | 0.000
시간(HOUR) | 0.013 | -0.029 | 1.000 | -0.083 | 0.009 | -0.069 | 0.000 | 0.000 | 0.000 | 0.040 | 0.000
승차역ID(GETON_STATION_ID) | 0.011 | 0.035 | -0.083 | 1.000 | -0.035 | -0.027 | 0.067 | 0.000 | 0.000 | 0.000 | 0.000
하차역ID(GETOFF_STATION_ID) | 0.081 | 0.049 | 0.009 | -0.035 | 1.000 | 0.057 | 0.000 | 0.085 | 0.000 | 0.000 | 0.052
인원합계(PASSN_CNT) | -0.045 | 0.016 | -0.069 | -0.027 | 0.057 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
년(YEAR) | 0.104 | 0.000 | 0.000 | 0.067 | 0.000 | 0.000 | 1.000 | 0.119 | 0.000 | 0.000 | 0.079
분_30분단위(HALF_HOUR) | 0.000 | 0.000 | 0.000 | 0.000 | 0.085 | 0.000 | 0.119 | 1.000 | 0.000 | 0.000 | 0.070
사용자구분(BILL_USER) | 0.024 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
승차_호선명(GETON_LINE_NM) | 0.000 | 0.104 | 0.040 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
하차_호선명(GETOFF_LINE_NM) | 0.051 | 0.000 | 0.000 | 0.000 | 0.052 | 0.000 | 0.079 | 0.070 | 0.000 | 0.000 | 1.000
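
The five-variable matrix in this section covers only the categorical columns, and its values are all non-negative. The report text does not name the association measure behind it. One standard choice for pairs of categorical variables is Cramér's V, sketched below in its plain (uncorrected) form as an illustration only; the toy inputs are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: perfectly associated vs. independent pairs.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

Values near 0 indicate no association and values near 1 indicate that one variable determines the other, which is consistent with the mostly near-zero entries in the matrices above.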

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

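First rows

Row | 년(YEAR) | 월(MONTH) | 일(DAY) | 시간(HOUR) | 분_30분단위(HALF_HOUR) | 사용자구분(BILL_USER) | 승차역ID(GETON_STATION_ID) | 승차_호선명(GETON_LINE_NM) | 승차_역명(GETON_STATION_NM) | 하차역ID(GETOFF_STATION_ID) | 하차_호선명(GETOFF_LINE_NM) | 하차_역명(GETOFF_STATION_NM) | 인원합계(PASSN_CNT)
0 | 2019 | 2 | 23 | 9 | 30 | 일반 | 233 | 9호선 | 마천 | 412 | 6호선 | 을지로입구 | 3
1 | 2021 | 6 | 13 | 14 | 30 | 일반 | 243 | 8호선 | 홍대입구 | 228 | 4호선 | 신당 | 2
2 | 2020 | 5 | 17 | 22 | 0 | 일반 | 2726 | 5호선 | 흑석(중앙대입구) | 207 | 2호선 | 상월곡(한국과학기술연구원) | 1
3 | 2018 | 4 | 18 | 17 | 30 | 일반 | 425 | 5호선 | 건대입구 | 2523 | 6호선 | 고려대(종암) | 2
4 | 2019 | 7 | 10 | 18 | 0 | 경로 | 226 | 4호선 | 경복궁(정부서울청사) | 420 | 2호선 | 청량리(서울시립대입구) | 2
5 | 2018 | 3 | 7 | 15 | 30 | 일반 | 2539 | 9호선 | 잠실(송파구청) | 226 | 3호선 | 종각 | 1
6 | 2020 | 3 | 20 | 22 | 30 | 일반 | 204 | 5호선 | 여의도 | 2550 | 2호선 | 답십리 | 1
7 | 2020 | 2 | 2 or 22 | 21 or 13 | 30 or 0 | 경로 | 331 | 3호선 | 홍대입구 | 2620 | 2호선 | 종로3가 | 16
8 | 2019 | 1 | 6 | 9 | 0 | 일반 | 2644 | 7호선 | 망원 | 4110 | 2호선 | 공덕 | 1
9 | 2019 | 11 | 22 | 14 | 30 | 일반 | 2628 | 7호선 | 압구정 | 2630 | 2호선 | 개롱 | 5

Last rows

Row | 년(YEAR) | 월(MONTH) | 일(DAY) | 시간(HOUR) | 분_30분단위(HALF_HOUR) | 사용자구분(BILL_USER) | 승차역ID(GETON_STATION_ID) | 승차_호선명(GETON_LINE_NM) | 승차_역명(GETON_STATION_NM) | 하차역ID(GETOFF_STATION_ID) | 하차_호선명(GETOFF_LINE_NM) | 하차_역명(GETOFF_STATION_NM) | 인원합계(PASSN_CNT)
490 | 2020 | 4 | 9 | 18 | 30 | 일반 | 2736 | 8호선 | 종로3가 | 250 | 8호선 | 염창 | 1
491 | 2020 | 7 | 30 | 15 | 30 | 일반 | 248 | 2호선 | 학동 | 2565 | 5호선 | 홍제 | 1
492 | 2020 | 10 | 5 | 8 | 0 | 일반 | 2740 | 2호선 | 동대문역사문화공원(DDP) | 219 | 9호선 | 우장산 | 1
493 | 2018 | 6 | 2 or 22 | 21 or 13 | 30 or 0 | 일반 | 4110 | 6호선 | 천왕 | 420 | 5호선 | 상월곡(한국과학기술연구원) | 1
494 | 2018 | 12 | 20 | 22 | 30 | 일반 | 157 | 6호선 | 충무로 | 4117 | 2호선 | 종로5가 | 1
495 | 2021 | 6 | 29 | 18 | 0 | 일반 | 420 | 5호선 | 신목동 | 214 | 3호선 | 명동 | 13
496 | 2021 | 7 | 23 | 12 | 0 | 경로 | 2548 | 6호선 | 오금 | 2753 | 6호선 | 사당 | 2
497 | 2019 | 3 | 12 | 21 | 0 | 일반 | 230 | 7호선 | 올림픽공원(한국체대) | 2757 | 4호선 | 증산(명지대앞) | 3
498 | 2019 | 12 | 14 | 6 | 0 | 일반 | 2827 | 2호선 | 봉천 | 2627 | 6호선 | 강변(동서울터미널) | 4
499 | 2021 | 10 | 9 | 21 | 0 | 일반 | 4123 | 7호선 | 잠실(송파구청) | 2640 | 7호선 | 을지로3가 | 1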