Overview

Dataset statistics

Number of variables: 9
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 37.7 KiB
Average record size in memory: 77.3 B
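
The average record size follows from the total memory size divided by the number of observations. The short Python sketch below redoes that arithmetic from the rounded figures shown above; the variable names are illustrative, and the half-unit rounding band on the 37.7 KiB total is an assumption about how the report rounds.

```python
KIB = 1024           # bytes per KiB
total_kib = 37.7     # "Total size in memory", as printed (already rounded)
n_rows = 500         # "Number of observations"

# Point estimate from the rounded total.
avg_bytes = total_kib * KIB / n_rows

# Assumed rounding band: the true total lies within +/- 0.05 KiB of 37.7.
low = (total_kib - 0.05) * KIB / n_rows
high = (total_kib + 0.05) * KIB / n_rows

print(f"estimate {avg_bytes:.1f} B, plausible range {low:.1f} to {high:.1f} B")
```

The reported 77.3 B sits inside that range, so the two overview figures agree up to rounding.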

Variable types

DateTime: 1
Numeric: 4
Categorical: 3
Text: 1

Dataset

Description: Sample data
Author: Seoul Metropolitan Government (smart card company)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=14

Alerts

호선ID is highly overall correlated with 역ID and 1 other field (High correlation)
역ID is highly overall correlated with 호선ID and 1 other field (High correlation)
승차총승객수 is highly overall correlated with 하차총승객수 (High correlation)
하차총승객수 is highly overall correlated with 승차총승객수 (High correlation)
호선 is highly overall correlated with 호선ID and 1 other field (High correlation)
승차총승객수 has 8 (1.6%) zeros (Zeros)
하차총승객수 has 15 (3.0%) zeros (Zeros)

Reproduction

Analysis started: 2024-04-16 11:26:20.862027
Analysis finished: 2024-04-16 11:26:22.978626
Duration: 2.12 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
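
The reported duration is the difference between the two analysis timestamps. A minimal Python check, using only the standard library and the timestamps exactly as printed above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2024-04-16 11:26:20.862027")
finished = datetime.fromisoformat("2024-04-16 11:26:22.978626")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```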

Variables

운행일자
DateTime

Distinct: 360
Distinct (%): 72.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
Minimum: 2014-01-02 00:00:00
Maximum: 2015-10-31 00:00:00
Histogram with fixed size bins (bins=50)

호선ID
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 149.978
Minimum: 1
Maximum: 404
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 205
Q3: 207
95-th percentile: 401
Maximum: 404
Range: 403
Interquartile range (IQR): 204

Descriptive statistics

Standard deviation: 128.22538
Coefficient of variation (CV): 0.85496129
Kurtosis: -0.76439235
Mean: 149.978
Median Absolute Deviation (MAD): 3
Skewness: 0.26144135
Sum: 74989
Variance: 16441.749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value  Count  Frequency (%)
205    79     15.8%
2      78     15.6%
207    77     15.4%
206    75     15.0%
3      60     12.0%
401    39     7.8%
4      32     6.4%
208    29     5.8%
1      18     3.6%
404    13     2.6%

Minimum 10 values
Value  Count  Frequency (%)
1      18     3.6%
2      78     15.6%
3      60     12.0%
4      32     6.4%
205    79     15.8%
206    75     15.0%
207    77     15.4%
208    29     5.8%
401    39     7.8%
404    13     2.6%

Maximum 10 values
Value  Count  Frequency (%)
404    13     2.6%
401    39     7.8%
208    29     5.8%
207    77     15.4%
206    75     15.0%
205    79     15.8%
4      32     6.4%
3      60     12.0%
2      78     15.6%
1      18     3.6%
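
Because 호선ID has only ten distinct values, its descriptive statistics can be re-derived from the value counts alone. The Python sketch below does this with the standard library. It assumes the reported variance and standard deviation use the sample (n - 1) denominator, which is consistent with the figures in this section.

```python
import math

# Value -> count, copied from the 호선ID frequency table.
freq = {205: 79, 2: 78, 207: 77, 206: 75, 3: 60,
        401: 39, 4: 32, 208: 29, 1: 18, 404: 13}

n = sum(freq.values())                               # number of observations
total = sum(value * count for value, count in freq.items())
mean = total / n

# Sample variance (n - 1 denominator, assumed to match the report).
variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                      # coefficient of variation

print(n, total, round(mean, 3), round(variance, 3), round(std, 5), round(cv, 5))
```

The recomputed sum, mean, variance, standard deviation and CV agree with the reported values to the printed precision.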

호선
Categorical

HIGH CORRELATION 

Distinct: 10
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
5호선  79
2호선  78
7호선  77
6호선  75
3호선  60
Other values (5)  131

Length

Max length: 6
Median length: 3
Mean length: 3.078
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 6호선
2nd row: 8호선
3rd row: 6호선
4th row: 6호선
5th row: 2호선

Common Values

Value       Count  Frequency (%)
5호선       79     15.8%
2호선       78     15.6%
7호선       77     15.4%
6호선       75     15.0%
3호선       60     12.0%
9호선       39     7.8%
4호선       32     6.4%
8호선       29     5.8%
1호선       18     3.6%
9호선2단계  13     2.6%
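
The value counts of 호선 match the value counts of 호선ID one for one, which suggests the two columns encode the same information and would explain the high-correlation alert. The Python sketch below pairs each ID with the label that has the same count. This pairing is a heuristic that works only because all ten counts are distinct; it is an inference from the report, not a documented code table.

```python
# Counts copied from the 호선ID and 호선 frequency tables.
id_counts = {205: 79, 2: 78, 207: 77, 206: 75, 3: 60,
             401: 39, 4: 32, 208: 29, 1: 18, 404: 13}
label_counts = {"5호선": 79, "2호선": 78, "7호선": 77, "6호선": 75, "3호선": 60,
                "9호선": 39, "4호선": 32, "8호선": 29, "1호선": 18, "9호선2단계": 13}

# Every count occurs exactly once, so a count identifies a single label.
label_by_count = {count: label for label, count in label_counts.items()}
assert len(label_by_count) == len(label_counts)

# Inferred mapping: ID -> label with the same number of rows.
inferred = {line_id: label_by_count[count] for line_id, count in id_counts.items()}

for line_id in sorted(inferred):
    print(line_id, inferred[line_id])
```

Confirming the mapping properly would require checking it row by row against the underlying data.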

Length

Histogram of lengths of the category

역ID
Real number (ℝ)

HIGH CORRELATION 

Distinct: 243
Distinct (%): 48.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1914.57
Minimum: 150
Maximum: 4130
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 150
5-th percentile: 203
Q1: 324
Median: 2547.5
Q3: 2730.5
95-th percentile: 4117.05
Maximum: 4130
Range: 3980
Interquartile range (IQR): 2406.5

Descriptive statistics

Standard deviation: 1341.879
Coefficient of variation (CV): 0.7008775
Kurtosis: -1.3523634
Mean: 1914.57
Median Absolute Deviation (MAD): 273.5
Skewness: -0.13006505
Sum: 957285
Variance: 1800639.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
309                 6      1.2%
2537                5      1.0%
4117                5      1.0%
2730                5      1.0%
226                 5      1.0%
2526                5      1.0%
4126                4      0.8%
4110                4      0.8%
424                 4      0.8%
4125                4      0.8%
Other values (233)  453    90.6%

Minimum 10 values
Value  Count  Frequency (%)
150    1      0.2%
151    3      0.6%
153    2      0.4%
155    1      0.2%
156    1      0.2%
157    3      0.6%
158    4      0.8%
159    3      0.6%
201    3      0.6%
202    3      0.6%

Maximum 10 values
Value  Count  Frequency (%)
4130   2      0.4%
4129   4      0.8%
4128   2      0.4%
4127   1      0.2%
4126   4      0.8%
4125   4      0.8%
4124   1      0.2%
4123   2      0.4%
4122   2      0.4%
4121   1      0.2%


Text

Distinct: 217
Distinct (%): 43.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 10
Median length: 9
Mean length: 3.178
Min length: 2

Characters and Unicode

Total characters: 1589
Distinct characters: 201
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 76
Unique (%): 15.2%

Sample

1st row: 대흥
2nd row: 수진
3rd row: 화랑대
4th row: 버티고개
5th row: 도림천
Value               Count  Frequency (%)
동대문역사문화공원  9      1.8%
영등포구청          8      1.6%
천호                7      1.4%
고속터미널          7      1.4%
을지로3가           6      1.2%
지축                6      1.2%
시청                6      1.2%
가락시장            6      1.2%
신길                5      1.0%
종로3가             5      1.0%
Other values (207)  435    87.0%

Most occurring characters

Value               Count  Frequency (%)
                    56     3.5%
                    53     3.3%
                    50     3.1%
                    49     3.1%
                    37     2.3%
                    35     2.2%
                    35     2.2%
                    32     2.0%
                    27     1.7%
                    26     1.6%
Other values (191)  1189   74.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1552   97.7%
Decimal Number     13     0.8%
Open Punctuation   12     0.8%
Close Punctuation  12     0.8%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    56     3.6%
                    53     3.4%
                    50     3.2%
                    49     3.2%
                    37     2.4%
                    35     2.3%
                    35     2.3%
                    32     2.1%
                    27     1.7%
                    26     1.7%
Other values (187)  1152   74.2%

Decimal Number
Value  Count  Frequency (%)
3      11     84.6%
4      2      15.4%

Open Punctuation
Value  Count  Frequency (%)
(      12     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      12     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1552   97.7%
Common  37     2.3%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    56     3.6%
                    53     3.4%
                    50     3.2%
                    49     3.2%
                    37     2.4%
                    35     2.3%
                    35     2.3%
                    32     2.1%
                    27     1.7%
                    26     1.7%
Other values (187)  1152   74.2%

Common
Value  Count  Frequency (%)
(      12     32.4%
)      12     32.4%
3      11     29.7%
4      2      5.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1552   97.7%
ASCII   37     2.3%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    56     3.6%
                    53     3.4%
                    50     3.2%
                    49     3.2%
                    37     2.4%
                    35     2.3%
                    35     2.3%
                    32     2.1%
                    27     1.7%
                    26     1.7%
Other values (187)  1152   74.2%

ASCII
Value  Count  Frequency (%)
(      12     32.4%
)      12     32.4%
3      11     29.7%
4      2      5.4%

승차시간구분
Categorical

Distinct: 22
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
05:00:00~05:59:59  35
00:00:00~00:59:59  29
14:00:00~14:59:59  29
23:00:00~23:59:59  28
08:00:00~08:59:59  27
Other values (17)  352

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 21:00:00~21:59:59
2nd row: 22:00:00~22:59:59
3rd row: 17:00:00~17:59:59
4th row: 12:00:00~12:59:59
5th row: 09:00:00~09:59:59
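
Each value in this column is a fixed-width, 17-character label of the form start~end. A minimal Python sketch for turning such a label into a pair of `datetime.time` objects is shown below; the helper name `parse_bucket` is hypothetical and is not part of the report or of ydata-profiling.

```python
from datetime import time


def parse_bucket(label: str) -> tuple[time, time]:
    """Split an 'HH:MM:SS~HH:MM:SS' label into (start, end) times."""
    start_text, end_text = label.split("~")
    return time.fromisoformat(start_text), time.fromisoformat(end_text)


# Labels copied from the sample rows above.
for label in ["21:00:00~21:59:59", "09:00:00~09:59:59"]:
    start, end = parse_bucket(label)
    print(label, "->", start.hour, end.isoformat())
```

Parsing the label this way makes it possible to treat the hour as an ordered numeric feature instead of an unordered category.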

Common Values

Value              Count  Frequency (%)
05:00:00~05:59:59  35     7.0%
00:00:00~00:59:59  29     5.8%
14:00:00~14:59:59  29     5.8%
23:00:00~23:59:59  28     5.6%
08:00:00~08:59:59  27     5.4%
21:00:00~21:59:59  27     5.4%
22:00:00~22:59:59  26     5.2%
19:00:00~19:59:59  26     5.2%
16:00:00~16:59:59  25     5.0%
17:00:00~17:59:59  25     5.0%
Other values (12)  223    44.6%

Length

Histogram of lengths of the category

30분시간구간ID
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB
0   263
30  237

Length

Max length: 2
Median length: 1
Mean length: 1.474
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30
2nd row: 0
3rd row: 30
4th row: 0
5th row: 30

Common Values

Value  Count  Frequency (%)
0      263    52.6%
30     237    47.4%

Length

Histogram of lengths of the category

승차총승객수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 366
Distinct (%): 73.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 415.954
Minimum: 0
Maximum: 5463
Zeros: 8
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10.95
Q1: 84
Median: 241.5
Q3: 492.5
95-th percentile: 1480.15
Maximum: 5463
Range: 5463
Interquartile range (IQR): 408.5

Descriptive statistics

Standard deviation: 574.5776
Coefficient of variation (CV): 1.3813489
Kurtosis: 21.003083
Mean: 415.954
Median Absolute Deviation (MAD): 175.5
Skewness: 3.7541424
Sum: 207977
Variance: 330139.41
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                   8      1.6%
1                   7      1.4%
66                  7      1.4%
23                  6      1.2%
240                 5      1.0%
226                 4      0.8%
203                 4      0.8%
41                  4      0.8%
277                 3      0.6%
181                 3      0.6%
Other values (356)  449    89.8%

Minimum 10 values
Value  Count  Frequency (%)
0      8      1.6%
1      7      1.4%
2      1      0.2%
3      2      0.4%
4      1      0.2%
5      1      0.2%
7      2      0.4%
8      1      0.2%
10     2      0.4%
11     1      0.2%

Maximum 10 values
Value  Count  Frequency (%)
5463   1      0.2%
4309   1      0.2%
3889   1      0.2%
3350   1      0.2%
2999   1      0.2%
2771   1      0.2%
2641   1      0.2%
2481   1      0.2%
2197   1      0.2%
2138   1      0.2%

하차총승객수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 373
Distinct (%): 74.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 399.718
Minimum: 0
Maximum: 5554
Zeros: 15
Zeros (%): 3.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.95
Q1: 92.75
Median: 233
Q3: 499.5
95-th percentile: 1335.1
Maximum: 5554
Range: 5554
Interquartile range (IQR): 406.75

Descriptive statistics

Standard deviation: 512.14107
Coefficient of variation (CV): 1.281256
Kurtosis: 25.033933
Mean: 399.718
Median Absolute Deviation (MAD): 171.5
Skewness: 3.7949991
Sum: 199859
Variance: 262288.47
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0                   15     3.0%
31                  5      1.0%
197                 4      0.8%
79                  4      0.8%
295                 4      0.8%
69                  4      0.8%
25                  4      0.8%
495                 3      0.6%
14                  3      0.6%
59                  3      0.6%
Other values (363)  451    90.2%

Minimum 10 values
Value  Count  Frequency (%)
0      15     3.0%
1      3      0.6%
3      3      0.6%
4      2      0.4%
6      1      0.2%
7      1      0.2%
8      2      0.4%
9      1      0.2%
10     1      0.2%
12     2      0.4%

Maximum 10 values
Value  Count  Frequency (%)
5554   1      0.2%
3390   1      0.2%
3009   1      0.2%
2659   1      0.2%
2318   1      0.2%
2087   1      0.2%
2069   1      0.2%
2059   1      0.2%
1961   1      0.2%
1935   1      0.2%

Correlations

Variable  호선ID  호선  역ID  승차시간구분  30분시간구간ID  승차총승객수  하차총승객수
호선ID  1.000  1.000  1.000  0.113  0.000  0.383  0.293
호선  1.000  1.000  0.971  0.128  0.000  0.260  0.270
역ID  1.000  0.971  1.000  0.100  0.000  0.197  0.313
승차시간구분  0.113  0.128  0.100  1.000  0.202  0.281  0.199
30분시간구간ID  0.000  0.000  0.000  0.202  1.000  0.000  0.044
승차총승객수  0.383  0.260  0.197  0.281  0.000  1.000  0.732
하차총승객수  0.293  0.270  0.313  0.199  0.044  0.732  1.000

Variable  호선  승차시간구분  30분시간구간ID
호선  1.000  0.046  0.000
승차시간구분  0.046  1.000  0.156
30분시간구간ID  0.000  0.156  1.000

Variable  호선ID  역ID  승차총승객수  하차총승객수  호선  승차시간구분  30분시간구간ID
호선ID  1.000  0.991  -0.142  -0.125  0.993  0.056  0.000
역ID  0.991  1.000  -0.143  -0.127  0.909  0.053  0.000
승차총승객수  -0.142  -0.143  1.000  0.751  0.120  0.111  0.000
하차총승객수  -0.125  -0.127  0.751  1.000  0.131  0.080  0.033
호선  0.993  0.909  0.120  0.131  1.000  0.046  0.000
승차시간구분  0.056  0.053  0.111  0.080  0.046  1.000  0.156
30분시간구간ID  0.000  0.000  0.000  0.033  0.000  0.156  1.000
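
The high-correlation alerts in the overview can be cross-checked against the last matrix above. The Python sketch below lists every variable pair whose absolute coefficient reaches a cut-off. The 0.5 threshold is an assumption; the threshold actually used is not stated in this report, and the helper name `high_pairs` is illustrative.

```python
# Column order and values copied from the last correlation matrix.
columns = ["호선ID", "역ID", "승차총승객수", "하차총승객수",
           "호선", "승차시간구분", "30분시간구간ID"]
matrix = [
    [1.000, 0.991, -0.142, -0.125, 0.993, 0.056, 0.000],
    [0.991, 1.000, -0.143, -0.127, 0.909, 0.053, 0.000],
    [-0.142, -0.143, 1.000, 0.751, 0.120, 0.111, 0.000],
    [-0.125, -0.127, 0.751, 1.000, 0.131, 0.080, 0.033],
    [0.993, 0.909, 0.120, 0.131, 1.000, 0.046, 0.000],
    [0.056, 0.053, 0.111, 0.080, 0.046, 1.000, 0.156],
    [0.000, 0.000, 0.000, 0.033, 0.000, 0.156, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report


def high_pairs(names, values, threshold):
    """Return (a, b, coefficient) for each pair at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if abs(values[i][j]) >= threshold:
                pairs.append((names[i], names[j], values[i][j]))
    return pairs


for a, b, coefficient in high_pairs(columns, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {coefficient:.3f}")
```

Under this assumption the selected pairs are 호선ID/역ID, 호선ID/호선, 역ID/호선 and 승차총승객수/하차총승객수, which matches the variables named in the alerts.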

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

운행일자호선ID호선역ID승차시간구분30분시간구간ID승차총승객수하차총승객수
02014-05-212066호선2626대흥21:00:00~21:59:5930110143
12015-01-152088호선2826수진22:00:00~22:59:59069165
22014-07-182066호선2647화랑대17:00:00~17:59:5930544414
32015-01-192066호선2633버티고개12:00:00~12:59:5905946
42014-06-1822호선247도림천09:00:00~09:59:59303567
52015-05-082066호선2617새절11:00:00~11:59:5930260122
62014-12-082055호선2534광화문22:00:00~22:59:5930977175
72014-02-162088호선2814몽촌토성16:00:00~16:59:590240218
82014-02-0322호선213구의05:00:00~05:59:5901160
92014-08-0122호선237당산21:00:00~21:59:590320475
운행일자호선ID호선역ID승차시간구분30분시간구간ID승차총승객수하차총승객수
4902015-04-192077호선2749철산20:00:00~20:59:5903971252
4912015-06-1411호선151시청19:00:00~19:59:5901957491
4922014-12-3133호선329고속터미널22:00:00~22:59:5901144633
4932015-10-272055호선2548천호15:00:00~15:59:5930648687
4942014-05-252066호선2616구산13:00:00~13:59:5930194114
4952015-07-192077호선2735반포22:00:00~22:59:59019069
4962015-06-072055호선2559개롱17:00:00~17:59:590115214
4972015-01-244049호선2단계4126언주06:00:00~06:59:5902579
4982014-07-282077호선2727군자17:00:00~17:59:5930316321
4992014-07-242066호선2640안암17:00:00~17:59:590954392