Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 585.9 KiB
Average record size in memory: 60.0 B
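The missing-cell and duplicate-row counts can be reproduced on any row-oriented table. This sketch assumes rows are plain Python dicts keyed by the report's column names, with None marking a missing cell. The helper names are hypothetical and the toy rows are not taken from the dataset. The last line checks the memory figures: 585.9 KiB spread over 10,000 records is about 60.0 B per record.

```python
from typing import Any

# Hypothetical row type: one dict per record, None marks a missing cell.
Record = dict[str, Any]


def missing_cells(rows: list[Record], columns: list[str]) -> tuple[int, float]:
    """Count None cells across all rows and columns, plus their share in percent."""
    total = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    pct = 100.0 * missing / total if total else 0.0
    return missing, pct


def duplicate_rows(rows: list[Record], columns: list[str]) -> int:
    """Count rows that exactly repeat an earlier row (all columns equal)."""
    seen = set()
    dups = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups


# 585.9 KiB * 1024 bytes/KiB / 10000 records, roughly 60.0 bytes per record.
avg_record_bytes = 585.9 * 1024 / 10000
print(round(avg_record_bytes, 1))  # 60.0
```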

Variable types

Numeric: 4
Categorical: 1
Text: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-12914/S/1/datasetView.do

Alerts

사용일자 is highly overall correlated with 등록일자 (High correlation)
승차총승객수 is highly overall correlated with 하차총승객수 (High correlation)
하차총승객수 is highly overall correlated with 승차총승객수 (High correlation)
등록일자 is highly overall correlated with 사용일자 (High correlation)

Reproduction

Analysis started: 2024-05-11 06:20:45.211863
Analysis finished: 2024-05-11 06:20:49.388067
Duration: 4.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
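The reported duration is the difference between the two timestamps, rounded to two decimals. Here is a minimal check with the standard library:

```python
from datetime import datetime

# Timestamps copied from "Analysis started" and "Analysis finished".
started = datetime.fromisoformat("2024-05-11 06:20:45.211863")
finished = datetime.fromisoformat("2024-05-11 06:20:49.388067")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 4.18 seconds
```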

Variables

사용일자 (date of use)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 170
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20190348
Minimum: 20190101
Maximum: 20190619
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20190101
5th percentile: 20190109
Q1: 20190211
Median: 20190326
Q3: 20190507
95th percentile: 20190610
Maximum: 20190619
Range: 518
Interquartile range (IQR): 296
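사용일자 stores calendar dates as yyyymmdd integers, so the profiler treats it as a real number. The Range (518) and IQR (296) are differences between those integers, not day counts. Parsing the minimum and maximum as dates gives a span of 169 days, or 170 calendar dates inclusive. That matches the 170 distinct values and suggests every date in the period appears at least once. parse_yyyymmdd is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20190101 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


lo, hi = 20190101, 20190619  # minimum and maximum of 사용일자 from the report

integer_range = hi - lo                                     # what the profiler reports
day_span = (parse_yyyymmdd(hi) - parse_yyyymmdd(lo)).days   # actual calendar distance
calendar_days = day_span + 1                                # inclusive count of dates
print(integer_range, day_span, calendar_days)  # 518 169 170
```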

Descriptive statistics

Standard deviation: 162.0542
Coefficient of variation (CV): 8.0263204 × 10^-6
Kurtosis: -1.2124412
Mean: 20190348
Median absolute deviation (MAD): 122
Skewness: 0.021460156
Sum: 2.0190348 × 10^11
Variance: 26261.564
Monotonicity: Not monotonic
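A kurtosis of about -1.21 together with near-zero skewness is close to the excess kurtosis of a uniform distribution (-1.2). This is consistent with ride dates being spread fairly evenly over the period. The sketch below assumes the bias-corrected sample estimators used by pandas' Series.skew() and Series.kurt(). The report does not state which estimator it applies.

```python
import math


def _central_moment(xs: list[float], k: int) -> float:
    """k-th central moment with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n = len(xs)
    m2, m3 = _central_moment(xs, 2), _central_moment(xs, 3)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def excess_kurtosis(xs: list[float]) -> float:
    """Bias-corrected excess kurtosis G2 (needs n >= 4)."""
    n = len(xs)
    m2, m4 = _central_moment(xs, 2), _central_moment(xs, 4)
    g2 = m4 / m2 ** 2 - 3.0
    return ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))


# Evenly spaced values behave like a uniform distribution.
data = [float(i) for i in range(1, 6)]
print(skewness(data), excess_kurtosis(data))  # 0.0 and approximately -1.2
```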
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value               Count  Frequency (%)
20190405            79     0.8%
20190420            77     0.8%
20190130            77     0.8%
20190310            75     0.8%
20190413            74     0.7%
20190128            72     0.7%
20190430            72     0.7%
20190216            72     0.7%
20190307            72     0.7%
20190219            71     0.7%
Other values (160)  9259   92.6%

Minimum 10 values
Value     Count  Frequency (%)
20190101  58     0.6%
20190102  62     0.6%
20190103  46     0.5%
20190104  59     0.6%
20190105  58     0.6%
20190106  58     0.6%
20190107  66     0.7%
20190108  54     0.5%
20190109  49     0.5%
20190110  60     0.6%

Maximum 10 values
Value     Count  Frequency (%)
20190619  4      < 0.1%
20190618  54     0.5%
20190617  65     0.7%
20190616  51     0.5%
20190615  54     0.5%
20190614  58     0.6%
20190613  66     0.7%
20190612  67     0.7%
20190611  61     0.6%
20190610  51     0.5%
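"Histogram with fixed size bins (bins=50)" means the value range is cut into 50 intervals of equal width. For 사용일자 that width is 518 / 50 = 10.36 integer units. Because of the yyyymmdd encoding, this does not correspond to a fixed number of days. The minimal sketch of equal-width binning below assumes the numpy-style convention in which the last bin also includes the maximum. fixed_bin_histogram is a hypothetical name.

```python
def fixed_bin_histogram(xs: list[float], bins: int) -> tuple[list[int], list[float]]:
    """Counts per equal-width bin; the last bin is closed so the maximum is counted."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in xs:
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1  # clamp so x == hi lands in the last bin
    return counts, edges


print((20190619 - 20190101) / 50)  # bin width for 사용일자: 10.36
counts, edges = fixed_bin_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(counts)  # [2, 2, 2, 2, 3]
```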

노선명 (line name)
Categorical

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
5호선: 896
7호선: 845
2호선: 816
경부선: 658
6호선: 619
Other values (20): 6166

Length

Max length: 8
Median length: 3
Mean length: 3.2735
Min length: 3
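Length statistics count characters (Unicode code points), not bytes. The sketch below uses a few 노선명 values that appear in this report's sample rows. One of them, 9호선2~3단계, has 8 characters and so matches the reported maximum length. The list is illustrative, not the full column.

```python
import statistics

# A few 노선명 values from the sample rows (illustrative subset only).
names = ["4호선", "경인선", "2호선", "8호선", "경부선", "9호선2~3단계"]

# len() counts code points, so each Hangul syllable counts as 1 character.
lengths = [len(name) for name in names]
print(max(lengths), min(lengths), statistics.median(lengths))  # 8 3 3.0
print(statistics.mean(lengths))
```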

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4호선
2nd row: 경인선
3rd row: 2호선
4th row: 8호선
5th row: 경부선

Common Values

Value              Count  Frequency (%)
5호선              896    9.0%
7호선              845    8.5%
2호선              816    8.2%
경부선             658    6.6%
6호선              619    6.2%
분당선             608    6.1%
3호선              598    6.0%
경원선             489    4.9%
경의선             431    4.3%
4호선              417    4.2%
Other values (15)  3623   36.2%
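The Common Values table keeps the 10 most frequent categories and folds the remainder into an "Other values (k)" row. Distinct counts how many different values exist, and Unique counts values that occur exactly once. Here is a sketch with collections.Counter. The function names are hypothetical and the data is a toy example. The last line applies the same arithmetic to the overview figures: 10000 minus the top five counts leaves the 6166 rows in "Other values (20)".

```python
from collections import Counter


def frequency_table(values: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Top-`top` values with count and percent, plus an 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    rows = [(v, c, 100.0 * c / n) for v, c in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    return rows


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Distinct = number of different values; Unique = values seen exactly once."""
    counts = Counter(values)
    return len(counts), sum(1 for c in counts.values() if c == 1)


values = ["5호선"] * 3 + ["7호선"] * 2 + ["2호선"]
print(frequency_table(values, top=1))  # [('5호선', 3, 50.0), ('Other values (2)', 3, 50.0)]
print(distinct_and_unique(values))     # (3, 1)
print(10000 - sum([896, 845, 816, 658, 619]))  # 6166
```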

Length

[Figure: Histogram of lengths of the category]
Words
Value              Count  Frequency (%)
5호선              896    8.7%
7호선              845    8.2%
2호선              816    8.0%
경부선             658    6.4%
6호선              619    6.0%
분당선             608    5.9%
3호선              598    5.8%
경원선             489    4.8%
경의선             431    4.2%
1호선              422    4.1%
Other values (15)  3872   37.8%

역명 (station name)
Text

Distinct: 509
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.5768
Min length: 2

Characters and Unicode

Total characters: 35768
Distinct characters: 292
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
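The category names used in the character tables are Unicode general categories. Hangul syllables are Lo (Other Letter), parentheses are Ps and Pe (Open and Close Punctuation), digits are Nd (Decimal Number), and the period is Po (Other Punctuation). Python's unicodedata module exposes the same property. The sketch below tallies categories for three station names from this report. It maps codes to the report's labels with a hand-written dictionary that covers only the five categories seen here.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the labels used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
}


def category_counts(texts: list[str]) -> Counter:
    """Count characters per Unicode general category across all texts."""
    counts: Counter = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


stations = ["충무로", "왕십리(성동구청)", "종로3가"]
print(category_counts(stations))
```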

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 충무로
2nd row: 부평
3rd row: 왕십리(성동구청)
4th row: 잠실(송파구청)
5th row: 독산

Value               Count  Frequency (%)
서울역              87     0.9%
공덕                74     0.7%
김포공항            55     0.5%
디지털미디어시티    54     0.5%
종로3가             53     0.5%
왕십리(성동구청     52     0.5%
신설동              52     0.5%
홍대입구            52     0.5%
고속터미널          48     0.5%
오금                44     0.4%
Other values (499)  9429   94.3%

Most occurring characters

Value               Count  Frequency (%)
                    1251   3.5%
)                   1181   3.3%
(                   1181   3.3%
                    1177   3.3%
                    846    2.4%
                    748    2.1%
                    747    2.1%
                    706    2.0%
                    697    1.9%
                    563    1.6%
Other values (282)  26671  74.6%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       33110  92.6%
Close Punctuation  1181   3.3%
Open Punctuation   1181   3.3%
Decimal Number     233    0.7%
Other Punctuation  63     0.2%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    1251   3.8%
                    1177   3.6%
                    846    2.6%
                    748    2.3%
                    747    2.3%
                    706    2.1%
                    697    2.1%
                    563    1.7%
                    561    1.7%
                    528    1.6%
Other values (273)  25286  76.4%

Decimal Number
Value  Count  Frequency (%)
3      88     37.8%
4      56     24.0%
1      36     15.5%
5      20     8.6%
2      20     8.6%
9      13     5.6%

Close Punctuation
Value  Count  Frequency (%)
)      1181   100.0%

Open Punctuation
Value  Count  Frequency (%)
(      1181   100.0%

Other Punctuation
Value  Count  Frequency (%)
.      63     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  33110  92.6%
Common  2658   7.4%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    1251   3.8%
                    1177   3.6%
                    846    2.6%
                    748    2.3%
                    747    2.3%
                    706    2.1%
                    697    2.1%
                    563    1.7%
                    561    1.7%
                    528    1.6%
Other values (273)  25286  76.4%

Common
Value  Count  Frequency (%)
)      1181   44.4%
(      1181   44.4%
3      88     3.3%
.      63     2.4%
4      56     2.1%
1      36     1.4%
5      20     0.8%
2      20     0.8%
9      13     0.5%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  33110  92.6%
ASCII   2658   7.4%
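The script and block tables group the same characters in two ways: by Unicode script (Hangul vs Common) and by code-point block (Hangul vs ASCII). For these station names the two groupings coincide, because every non-Hangul character is an ASCII digit or punctuation mark. Below is a sketch of a coarse block lookup by code-point range. block_of is a hypothetical helper that only knows the ASCII range (U+0000 to U+007F) and the Hangul Syllables block (U+AC00 to U+D7AF). It assumes all Hangul characters here are precomposed syllables.

```python
from collections import Counter


def block_of(ch: str) -> str:
    """Coarse block lookup covering only the ranges needed for 역명."""
    cp = ord(ch)
    if cp <= 0x007F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables block
        return "Hangul"
    return "Other"


def block_counts(texts: list[str]) -> Counter:
    """Count characters per block across all texts."""
    return Counter(block_of(ch) for text in texts for ch in text)


print(block_counts(["왕십리(성동구청)", "종로3가"]))  # Hangul: 10, ASCII: 3
print(f"{100 * 2658 / 35768:.1f}%")  # ASCII share of all 35768 characters: 7.4%
```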

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    1251   3.8%
                    1177   3.6%
                    846    2.6%
                    748    2.3%
                    747    2.3%
                    706    2.1%
                    697    2.1%
                    563    1.7%
                    561    1.7%
                    528    1.6%
Other values (273)  25286  76.4%

ASCII
Value  Count  Frequency (%)
)      1181   44.4%
(      1181   44.4%
3      88     3.3%
.      63     2.4%
4      56     2.1%
1      36     1.4%
5      20     0.8%
2      20     0.8%
9      13     0.5%

승차총승객수 (total boarding passengers)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8221
Distinct (%): 82.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12497.362
Minimum: 1
Maximum: 125284
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1079.6
Q1: 4121.5
Median: 8836.5
Q3: 16307.25
95th percentile: 37424.45
Maximum: 125284
Range: 125283
Interquartile range (IQR): 12185.75
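Quantiles of a finite sample need an interpolation rule. Fractional values such as 1079.6 and 16307.25 in an integer-valued column are consistent with linear interpolation between neighbouring order statistics, which is the default in numpy and pandas. The report does not say which rule it uses, so treat this as an assumption. The sketch below implements that rule (quantile_linear is a hypothetical name). It cross-checks the result against the standard library's statistics.quantiles with method="inclusive", which interpolates the same way.

```python
import statistics


def quantile_linear(xs: list[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics at position q * (n - 1)."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


data = [1, 2, 3, 4]
print([quantile_linear(data, q) for q in (0.25, 0.5, 0.75)])  # [1.75, 2.5, 3.25]
print(statistics.quantiles(data, n=4, method="inclusive"))     # [1.75, 2.5, 3.25]
```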

Descriptive statistics

Standard deviation: 12881.787
Coefficient of variation (CV): 1.0307605
Kurtosis: 10.948277
Mean: 12497.362
Median absolute deviation (MAD): 5481
Skewness: 2.6530986
Sum: 1.2497362 × 10^8
Variance: 1.6594043 × 10^8
Monotonicity: Not monotonic
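Several of these statistics are linked by simple identities. The coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the IQR is Q3 minus Q1. Recomputing them from the rounded figures reported for 승차총승객수 needs no raw data, and the results agree with the report up to rounding. A CV slightly above 1 means the spread of daily counts is about as large as their average.

```python
# Figures copied from the 승차총승객수 summary.
mean = 12497.362
std = 12881.787
q1, q3 = 4121.5, 16307.25

cv = std / mean       # coefficient of variation: spread relative to the mean
variance = std ** 2   # variance is the square of the standard deviation
iqr = q3 - q1         # interquartile range

print(f"CV = {cv:.7f}, variance = {variance:.7e}")
print(iqr)  # 12185.75
```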
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
1                    47     0.5%
2                    15     0.1%
3                    7      0.1%
2178                 6      0.1%
1374                 5      0.1%
4411                 5      0.1%
4557                 4      < 0.1%
8547                 4      < 0.1%
2179                 4      < 0.1%
4030                 4      < 0.1%
Other values (8211)  9899   99.0%

Minimum 10 values
Value  Count  Frequency (%)
1      47     0.5%
2      15     0.1%
3      7      0.1%
4      1      < 0.1%
5      2      < 0.1%
6      1      < 0.1%
20     1      < 0.1%
26     1      < 0.1%
29     1      < 0.1%
32     1      < 0.1%

Maximum 10 values
Value   Count  Frequency (%)
125284  1      < 0.1%
124516  1      < 0.1%
120636  1      < 0.1%
119912  1      < 0.1%
119816  1      < 0.1%
118278  1      < 0.1%
115657  1      < 0.1%
113431  1      < 0.1%
110394  1      < 0.1%
108878  1      < 0.1%
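The extreme-value tables list the ten smallest and ten largest distinct values with their counts, not individual rows. For example, the value 1 alone accounts for 47 rows. Below is a sketch with heapq over a Counter. extreme_values is a hypothetical helper, and the data is a toy example.

```python
import heapq
from collections import Counter


def extreme_values(values: list[int], k: int = 10) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """k smallest and k largest distinct values, each paired with its count."""
    counts = Counter(values)
    smallest = [(v, counts[v]) for v in heapq.nsmallest(k, counts)]
    largest = [(v, counts[v]) for v in heapq.nlargest(k, counts)]
    return smallest, largest


toy = [1, 1, 1, 2, 2, 3, 5, 8, 13]
print(extreme_values(toy, k=2))  # ([(1, 3), (2, 2)], [(13, 1), (8, 1)])
```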

하차총승객수 (total alighting passengers)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8208
Distinct (%): 82.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12447.271
Minimum: 0
Maximum: 129588
Zeros: 73
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1034.95
Q1: 3951.5
Median: 8590.5
Q3: 16277.25
95th percentile: 38668.15
Maximum: 129588
Range: 129588
Interquartile range (IQR): 12325.75

Descriptive statistics

Standard deviation: 13079.112
Coefficient of variation (CV): 1.0507614
Kurtosis: 10.613678
Mean: 12447.271
Median absolute deviation (MAD): 5347
Skewness: 2.6205803
Sum: 1.2447271 × 10^8
Variance: 1.7106317 × 10^8
Monotonicity: Not monotonic
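하차총승객수 is strongly right-skewed (skewness 2.62). As a result, its MAD (5347) is far smaller than its standard deviation (13079.112), because a median-based measure is barely affected by the long upper tail. Below is a sketch of the unscaled MAD. It assumes the report does not apply the 1.4826 normal-consistency factor that some libraries multiply in.

```python
import statistics


def median_absolute_deviation(xs: list[float]) -> float:
    """Median of absolute deviations from the median (no scaling factor)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


# One extreme value barely moves the MAD, unlike the standard deviation.
sample = [1, 2, 3, 4, 100]
print(median_absolute_deviation(sample))  # 1
print(statistics.stdev(sample))
```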
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
0                    73     0.7%
988                  5      0.1%
10757                5      0.1%
7617                 5      0.1%
4945                 5      0.1%
2142                 4      < 0.1%
12743                4      < 0.1%
3902                 4      < 0.1%
4953                 4      < 0.1%
4890                 4      < 0.1%
Other values (8198)  9887   98.9%

Minimum 10 values
Value  Count  Frequency (%)
0      73     0.7%
25     2      < 0.1%
26     1      < 0.1%
29     1      < 0.1%
30     1      < 0.1%
31     1      < 0.1%
33     3      < 0.1%
34     4      < 0.1%
35     3      < 0.1%
36     1      < 0.1%

Maximum 10 values
Value   Count  Frequency (%)
129588  1      < 0.1%
125097  1      < 0.1%
124399  1      < 0.1%
121176  1      < 0.1%
120632  1      < 0.1%
120374  1      < 0.1%
118613  1      < 0.1%
114449  1      < 0.1%
111249  1      < 0.1%
109494  1      < 0.1%

등록일자 (registration date)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 170
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20190357
Minimum: 20190104
Maximum: 20190622
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20190104
5th percentile: 20190112
Q1: 20190214
Median: 20190329
Q3: 20190510
95th percentile: 20190613
Maximum: 20190622
Range: 518
Interquartile range (IQR): 296

Descriptive statistics

Standard deviation: 162.44891
Coefficient of variation (CV): 8.0458662 × 10^-6
Kurtosis: -1.2073239
Mean: 20190357
Median absolute deviation (MAD): 124
Skewness: -0.0002721537
Sum: 2.0190357 × 10^11
Variance: 26389.648
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value               Count  Frequency (%)
20190408            79     0.8%
20190423            77     0.8%
20190202            77     0.8%
20190313            75     0.8%
20190416            74     0.7%
20190131            72     0.7%
20190503            72     0.7%
20190219            72     0.7%
20190310            72     0.7%
20190222            71     0.7%
Other values (160)  9259   92.6%

Minimum 10 values
Value     Count  Frequency (%)
20190104  58     0.6%
20190105  62     0.6%
20190106  46     0.5%
20190107  59     0.6%
20190108  58     0.6%
20190109  58     0.6%
20190110  66     0.7%
20190111  54     0.5%
20190112  49     0.5%
20190113  60     0.6%

Maximum 10 values
Value     Count  Frequency (%)
20190622  4      < 0.1%
20190621  54     0.5%
20190620  65     0.7%
20190619  51     0.5%
20190618  54     0.5%
20190617  58     0.6%
20190616  66     0.7%
20190615  67     0.7%
20190614  61     0.6%
20190613  51     0.5%

Interactions

[Figures: 16 interaction plots, one for each ordered pair of the four numeric variables]

Correlations

              사용일자  노선명  승차총승객수  하차총승객수  등록일자
사용일자      1.000     0.000   0.075         0.074         0.997
노선명        0.000     1.000   0.545         0.536         0.000
승차총승객수  0.075     0.545   1.000         0.986         0.073
하차총승객수  0.074     0.536   0.986         1.000         0.070
등록일자      0.997     0.000   0.073         0.070         1.000

              사용일자  승차총승객수  하차총승객수  등록일자  노선명
사용일자      1.000     0.061         0.061         1.000     0.000
승차총승객수  0.061     1.000         0.993         0.061     0.224
하차총승객수  0.061     0.993         1.000         0.061     0.219
등록일자      1.000     0.061         0.061         1.000     0.000
노선명        0.000     0.224         0.219         0.000     1.000
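This export does not label the two correlation matrices with their methods. Suppose the second is a rank correlation such as Spearman's. Then its exact 1.000 between 사용일자 and 등록일자 is what a fixed registration lag would produce, because shifting every date by the same number of days preserves their order. A linear (Pearson) coefficient on the raw yyyymmdd integers stays slightly below 1, since the shift jumps across month boundaries (20190130 maps to 20190202). The sketch uses the 사용일자/등록일자 pairs from the first ten sample rows. The helper names are hypothetical.

```python
import statistics


def average_ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


use_date = [20190222, 20190124, 20190202, 20190315, 20190606,
            20190130, 20190608, 20190306, 20190522, 20190408]
reg_date = [20190225, 20190127, 20190205, 20190318, 20190609,
            20190202, 20190611, 20190309, 20190525, 20190411]
print(spearman(use_date, reg_date))  # 1.0 up to floating-point rounding
print(pearson(use_date, reg_date))   # below 1
```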

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
       사용일자  노선명        역명              승차총승객수  하차총승객수  등록일자
30943  20190222  4호선         충무로            36974         37778         20190225
13765  20190124  경인선        부평              38638         41983         20190127
19098  20190202  2호선         왕십리(성동구청)  14282         11979         20190205
43673  20190315  8호선         잠실(송파구청)    17595         20007         20190318
92421  20190606  경부선        독산              9165          9350          20190609
17551  20190130  5호선         왕십리(성동구청)  5917          6788          20190202
93684  20190608  과천선        인덕원            21885         20937         20190611
37907  20190306  3호선         일원              12024         12206         20190309
83868  20190522  7호선         군자(능동)        17168         13542         20190525
57706  20190408  수인선        소래포구          5523          5098          20190411

Last rows
       사용일자  노선명        역명              승차총승객수  하차총승객수  등록일자
96630  20190613  분당선        선정릉            9248          10084         20190616
28348  20190217  9호선2~3단계  종합운동장        3040          2305          20190220
66907  20190424  3호선         매봉              14114         13470         20190427
93267  20190607  5호선         마천              5591          5528          20190610
8094   20190114  5호선         천호(풍납토성)    20489         21380         20190117
36312  20190303  중앙선        회기              21344         22536         20190306
31062  20190222  과천선        인덕원            30956         30641         20190225
98760  20190617  1호선         신설동            18116         17639         20190620
95098  20190610  6호선         역촌              4597          5399          20190613
22825  20190208  분당선        개포동            3742          3881          20190211
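Every sample row shows 등록일자 exactly three days after 사용일자. The same shift lines up the minimum (20190101 vs 20190104), the maximum (20190619 vs 20190622) and the per-date counts (20190405 and 20190408 both occur 79 times). The check below uses datetime arithmetic, so month boundaries are handled correctly. It covers only the 20 sample rows, so it supports a fixed three-day registration lag for the whole table but does not prove it. to_date is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta


def to_date(value: int) -> date:
    """Parse a yyyymmdd integer into a date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# (사용일자, 등록일자) pairs copied from the 20 sample rows.
pairs = [
    (20190222, 20190225), (20190124, 20190127), (20190202, 20190205),
    (20190315, 20190318), (20190606, 20190609), (20190130, 20190202),
    (20190608, 20190611), (20190306, 20190309), (20190522, 20190525),
    (20190408, 20190411), (20190613, 20190616), (20190217, 20190220),
    (20190424, 20190427), (20190607, 20190610), (20190114, 20190117),
    (20190303, 20190306), (20190222, 20190225), (20190617, 20190620),
    (20190610, 20190613), (20190208, 20190211),
]

# Collect every distinct registration lag seen in the sample.
offsets = {to_date(reg) - to_date(use) for use, reg in pairs}
print(offsets)  # {datetime.timedelta(days=3)}
```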