Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 585.9 KiB
Average record size in memory: 60.0 B

Variable types

Numeric: 4
Categorical: 1
Text: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-12914/S/1/datasetView.do

Alerts

High correlation: 사용일자 is highly overall correlated with 등록일자
High correlation: 승차총승객수 is highly overall correlated with 하차총승객수
High correlation: 하차총승객수 is highly overall correlated with 승차총승객수
High correlation: 등록일자 is highly overall correlated with 사용일자
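
Both date columns are stored as integers in YYYYMMDD form, so the alert between 사용일자 and 등록일자 most likely reflects a short lag between the date of use and the date of registration. The sketch below checks that reading on a few date pairs copied from rows in the Sample section of this report; the helper name `parse_yyyymmdd` is hypothetical and not part of the profiling tool.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20170115 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# (사용일자, 등록일자) pairs copied from rows in the Sample section.
pairs = [
    (20170115, 20170123),
    (20170310, 20170314),
    (20170404, 20170407),
    (20170102, 20170110),
    (20170608, 20170611),
]

# Registration lag in calendar days for each pair.
lags = [(parse_yyyymmdd(reg) - parse_yyyymmdd(used)).days for used, reg in pairs]
print(lags)
```

On these rows the lag is only a few days, which would be enough to make the two columns move almost in lockstep.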

Reproduction

Analysis started: 2024-05-11 06:22:26.243520
Analysis finished: 2024-05-11 06:22:31.425092
Duration: 5.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사용일자 (date of use)
Real number (ℝ)

HIGH CORRELATION

Distinct: 177
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20170361
Minimum: 20170101
Maximum: 20170626
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20170101
5th percentile: 20170109
Q1: 20170215
Median: 20170330
Q3: 20170513
95th percentile: 20170618
Maximum: 20170626
Range: 525
Interquartile range (IQR): 298
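
Because 사용일자 is profiled as a real number, the range and IQR above are differences between YYYYMMDD integers, not calendar durations. A minimal sketch of the difference, assuming the values really are dates encoded as YYYYMMDD (the helper `to_date` is hypothetical):

```python
from datetime import date


def to_date(value: int) -> date:
    """Split a YYYYMMDD integer into year, month and day."""
    year, rest = divmod(value, 10_000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


# Quantile values copied from the report.
minimum, q1, q3, maximum = 20170101, 20170215, 20170513, 20170626

integer_range = maximum - minimum                          # what the report shows
calendar_range = (to_date(maximum) - to_date(minimum)).days
integer_iqr = q3 - q1                                      # what the report shows
calendar_iqr = (to_date(q3) - to_date(q1)).days

print(integer_range, calendar_range)  # 525 vs 176 days
print(integer_iqr, calendar_iqr)      # 298 vs 87 days
```

Under this reading, the 177 distinct values correspond to one value per calendar day from 2017-01-01 to 2017-06-26.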

Descriptive statistics

Standard deviation: 167.67586
Coefficient of variation (CV): 8.3129824 × 10⁻⁶
Kurtosis: -1.2317959
Mean: 20170361
Median Absolute Deviation (MAD): 128
Skewness: -0.0068351297
Sum: 2.0170361 × 10¹¹
Variance: 28115.193
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20170221            75     0.8%
20170405            74     0.7%
20170220            74     0.7%
20170523            74     0.7%
20170322            73     0.7%
20170608            72     0.7%
20170415            69     0.7%
20170620            69     0.7%
20170123            68     0.7%
20170320            68     0.7%
Other values (167)  9284   92.8%

Value               Count  Frequency (%)
20170101            49     0.5%
20170102            61     0.6%
20170103            57     0.6%
20170104            57     0.6%
20170105            64     0.6%
20170106            61     0.6%
20170107            62     0.6%
20170108            49     0.5%
20170109            45     0.4%
20170110            56     0.6%

Value               Count  Frequency (%)
20170626            48     0.5%
20170625            62     0.6%
20170624            60     0.6%
20170623            46     0.5%
20170622            61     0.6%
20170621            49     0.5%
20170620            69     0.7%
20170619            62     0.6%
20170618            64     0.6%
20170617            52     0.5%

노선명 (line name)
Categorical

Distinct: 24
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
7호선              894
5호선              893
2호선              860
경부선             695
6호선              652
Other values (19)  6006

Length

Max length: 8
Median length: 3
Mean length: 3.1297
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 과천선
2nd row: 과천선
3rd row: 3호선
4th row: 3호선
5th row: 7호선

Common Values

Value              Count  Frequency (%)
7호선              894    8.9%
5호선              893    8.9%
2호선              860    8.6%
경부선             695    7.0%
6호선              652    6.5%
3호선              624    6.2%
분당선             602    6.0%
경원선             528    5.3%
경의선             450    4.5%
9호선              442    4.4%
Other values (14)  3360   33.6%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
7호선              894    8.8%
5호선              893    8.7%
2호선              860    8.4%
경부선             695    6.8%
6호선              652    6.4%
3호선              624    6.1%
분당선             602    5.9%
경원선             528    5.2%
경의선             450    4.4%
9호선              442    4.3%
Other values (14)  3569   35.0%

역명 (station name)
Text

Distinct: 492
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.5229
Min length: 2

Characters and Unicode

Total characters: 35229
Distinct characters: 283
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
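
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two station names that appear in this report; it is not the profiling tool's own implementation, and the helper `category_counts` is hypothetical. The two-letter codes map to the category names used in the tables that follow: `Lo` is Other Letter, `Nd` is Decimal Number, `Ps` is Open Punctuation, `Pe` is Close Punctuation, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. 'Lo', 'Nd') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)


# Station names taken from this report.
with_parentheses = category_counts("남부터미널(예술의전당)")
with_digit = category_counts("종로3가")

print(dict(with_parentheses))
print(dict(with_digit))
```

`unicodedata` does not expose the script or block properties, so the script and block breakdowns further down would need a different data source.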

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 범계
2nd row: 경마공원
3rd row: 홍제
4th row: 잠원
5th row: 가산디지털단지

Value               Count  Frequency (%)
서울역              85     0.9%
공덕                63     0.6%
홍대입구            63     0.6%
김포공항            59     0.6%
고속터미널          58     0.6%
디지털미디어시티    57     0.6%
왕십리(성동구청     53     0.5%
종로3가             51     0.5%
신도림              48     0.5%
동대문역사문화공원  45     0.4%
Other values (482)  9418   94.2%

Most occurring characters

ValueCountFrequency (%)
1305
 
3.7%
) 1168
 
3.3%
( 1168
 
3.3%
1156
 
3.3%
864
 
2.5%
816
 
2.3%
716
 
2.0%
700
 
2.0%
670
 
1.9%
596
 
1.7%
Other values (273) 26070
74.0%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       32698  92.8%
Close Punctuation  1168   3.3%
Open Punctuation   1168   3.3%
Decimal Number     135    0.4%
Other Punctuation  60     0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1305
 
4.0%
1156
 
3.5%
864
 
2.6%
816
 
2.5%
716
 
2.2%
700
 
2.1%
670
 
2.0%
596
 
1.8%
547
 
1.7%
526
 
1.6%
Other values (267) 24802
75.9%
Decimal Number
Value  Count  Frequency (%)
3      83     61.5%
4      31     23.0%
5      21     15.6%
Close Punctuation
Value  Count  Frequency (%)
)      1168   100.0%
Open Punctuation
Value  Count  Frequency (%)
(      1168   100.0%
Other Punctuation
Value  Count  Frequency (%)
.      60     100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  32698  92.8%
Common  2531   7.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1305
 
4.0%
1156
 
3.5%
864
 
2.6%
816
 
2.5%
716
 
2.2%
700
 
2.1%
670
 
2.0%
596
 
1.8%
547
 
1.7%
526
 
1.6%
Other values (267) 24802
75.9%
Common
Value  Count  Frequency (%)
)      1168   46.1%
(      1168   46.1%
3      83     3.3%
.      60     2.4%
4      31     1.2%
5      21     0.8%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  32698  92.8%
ASCII   2531   7.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1305
 
4.0%
1156
 
3.5%
864
 
2.6%
816
 
2.5%
716
 
2.2%
700
 
2.1%
670
 
2.0%
596
 
1.8%
547
 
1.7%
526
 
1.6%
Other values (267) 24802
75.9%
ASCII
Value  Count  Frequency (%)
)      1168   46.1%
(      1168   46.1%
3      83     3.3%
.      60     2.4%
4      31     1.2%
5      21     0.8%

승차총승객수 (total boarding passengers)
Real number (ℝ)

HIGH CORRELATION

Distinct: 8273
Distinct (%): 82.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13001.64
Minimum: 1
Maximum: 133544
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5th percentile: 1150.95
Q1: 4355.75
Median: 9139.5
Q3: 16756.25
95th percentile: 38595.3
Maximum: 133544
Range: 133543
Interquartile range (IQR): 12400.5

Descriptive statistics

Standard deviation: 13397.265
Coefficient of variation (CV): 1.0304289
Kurtosis: 11.126119
Mean: 13001.64
Median Absolute Deviation (MAD): 5683
Skewness: 2.672859
Sum: 1.300164 × 10⁸
Variance: 1.7948671 × 10⁸
Monotonicity: Not monotonic
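
The derived figures above can be cross-checked from the reported mean and standard deviation: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A small sketch using only the rounded values printed in this report, so agreement is expected only to within rounding:

```python
import math

# Values copied from the report for 승차총승객수.
mean = 13001.64
std = 13397.265

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance

print(f"CV = {cv:.7f}")
print(f"variance = {variance:.7e}")

# Compare against the reported figures, allowing for rounding.
cv_matches = math.isclose(cv, 1.0304289, rel_tol=1e-5)
variance_matches = math.isclose(variance, 1.7948671e8, rel_tol=1e-5)
```

A CV slightly above 1 means the spread is about as large as the mean itself, which fits the strong right skew reported for this variable.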
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1                    15     0.1%
1381                 5      0.1%
4995                 5      0.1%
14244                5      0.1%
5859                 5      0.1%
5379                 4      < 0.1%
2132                 4      < 0.1%
1989                 4      < 0.1%
4607                 4      < 0.1%
1942                 4      < 0.1%
Other values (8263)  9945   99.5%

Value                Count  Frequency (%)
1                    15     0.1%
2                    4      < 0.1%
3                    2      < 0.1%
4                    1      < 0.1%
6                    1      < 0.1%
19                   1      < 0.1%
29                   2      < 0.1%
30                   2      < 0.1%
31                   1      < 0.1%
33                   2      < 0.1%

Value                Count  Frequency (%)
133544               1      < 0.1%
133213               1      < 0.1%
130761               1      < 0.1%
126409               1      < 0.1%
121943               1      < 0.1%
118336               1      < 0.1%
116124               1      < 0.1%
113622               1      < 0.1%
112907               1      < 0.1%
112615               1      < 0.1%

하차총승객수 (total alighting passengers)
Real number (ℝ)

HIGH CORRELATION

Distinct: 8257
Distinct (%): 82.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12949.72
Minimum: 0
Maximum: 140234
Zeros: 23
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 1064.95
Q1: 4180.5
Median: 8879
Q3: 16889.25
95th percentile: 39001.55
Maximum: 140234
Range: 140234
Interquartile range (IQR): 12708.75

Descriptive statistics

Standard deviation: 13573.186
Coefficient of variation (CV): 1.0481452
Kurtosis: 10.596298
Mean: 12949.72
Median Absolute Deviation (MAD): 5540.5
Skewness: 2.6188468
Sum: 1.294972 × 10⁸
Variance: 1.8423138 × 10⁸
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    23     0.2%
3635                 5      0.1%
3562                 5      0.1%
8562                 5      0.1%
7472                 5      0.1%
2403                 4      < 0.1%
6317                 4      < 0.1%
4905                 4      < 0.1%
8249                 4      < 0.1%
2582                 4      < 0.1%
Other values (8247)  9937   99.4%

Value                Count  Frequency (%)
0                    23     0.2%
19                   1      < 0.1%
21                   1      < 0.1%
22                   3      < 0.1%
23                   1      < 0.1%
25                   2      < 0.1%
26                   1      < 0.1%
27                   2      < 0.1%
28                   1      < 0.1%
29                   2      < 0.1%

Value                Count  Frequency (%)
140234               1      < 0.1%
135578               1      < 0.1%
133806               1      < 0.1%
129981               1      < 0.1%
125765               1      < 0.1%
115499               1      < 0.1%
112512               1      < 0.1%
112420               1      < 0.1%
111507               1      < 0.1%
111487               1      < 0.1%

등록일자 (registration date)
Real number (ℝ)

HIGH CORRELATION

Distinct: 172
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20170376
Minimum: 20170109
Maximum: 20170629
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20170109
5th percentile: 20170117
Q1: 20170223
Median: 20170402
Q3: 20170516
95th percentile: 20170621
Maximum: 20170629
Range: 520
Interquartile range (IQR): 293

Descriptive statistics

Standard deviation: 161.60483
Coefficient of variation (CV): 8.0119888 × 10⁻⁶
Kurtosis: -1.1571544
Mean: 20170376
Median Absolute Deviation (MAD): 120
Skewness: -0.028271264
Sum: 2.0170376 × 10¹¹
Variance: 26116.12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
20170314            340    3.4%
20170301            75     0.8%
20170408            74     0.7%
20170526            74     0.7%
20170228            74     0.7%
20170325            73     0.7%
20170611            72     0.7%
20170623            69     0.7%
20170418            69     0.7%
20170323            68     0.7%
Other values (162)  9012   90.1%

Value               Count  Frequency (%)
20170109            49     0.5%
20170110            61     0.6%
20170111            57     0.6%
20170112            57     0.6%
20170113            64     0.6%
20170114            61     0.6%
20170115            62     0.6%
20170116            49     0.5%
20170117            45     0.4%
20170118            56     0.6%

Value               Count  Frequency (%)
20170629            48     0.5%
20170628            62     0.6%
20170627            60     0.6%
20170626            46     0.5%
20170625            61     0.6%
20170624            49     0.5%
20170623            69     0.7%
20170622            62     0.6%
20170621            64     0.6%
20170620            52     0.5%

Interactions

[16 interaction plots, one per ordered pair of the four numeric variables]

Correlations

[Figure: correlation heatmap]

              사용일자  노선명  승차총승객수  하차총승객수  등록일자
사용일자      1.000     0.000   0.054         0.060         0.989
노선명        0.000     1.000   0.518         0.505         0.000
승차총승객수  0.054     0.518   1.000         0.985         0.067
하차총승객수  0.060     0.505   0.985         1.000         0.067
등록일자      0.989     0.000   0.067         0.067         1.000

[Figure: correlation heatmap]

              사용일자  승차총승객수  하차총승객수  등록일자  노선명
사용일자      1.000     0.058         0.055         1.000     0.000
승차총승객수  0.058     1.000         0.991         0.058     0.217
하차총승객수  0.055     0.991         1.000         0.055     0.210
등록일자      1.000     0.058         0.055         1.000     0.000
노선명        0.000     0.217         0.210         0.000     1.000
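
The report does not state which coefficient each matrix uses. As one illustration of how a value such as the 승차총승객수 / 하차총승객수 entry can arise, the sketch below computes a plain Pearson coefficient on ten rows from the Sample section. Two things here are assumptions: the boarding and alighting counts are fused in the extracted text, so the split into two numbers is a best reading, and `pearson` is a hypothetical helper, not the tool's implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Last ten sample rows; the digit split between the two columns is assumed.
boardings = [555, 18123, 17664, 4669, 21246, 22564, 27493, 4348, 22233, 1732]
alightings = [604, 18909, 17515, 4688, 21512, 23052, 25133, 3956, 23870, 1794]

r = pearson(boardings, alightings)
print(f"r = {r:.3f}")
```

Ten rows will not reproduce the full-data coefficient, but they show the same pattern: stations with many boardings have correspondingly many alightings.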

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  사용일자  노선명          역명                    승차총승객수  하차총승객수  등록일자
7903     20170115  과천선          범계                    19086         18666         20170123
38762    20170310  과천선          경마공원                7924          8079          20170314
41738    20170315  3호선           홍제                    24186         23196         20170318
8979     20170116  3호선           잠원                    6618          6314          20170124
29190    20170221  7호선           가산디지털단지          51771         51964         20170301
80177    20170522  5호선           청구                    4938          4561          20170525
86830    20170603  3호선           남부터미널(예술의전당)  34086         33926         20170606
80715    20170523  7호선           마들                    14194         13263         20170526
81033    20170524  공항철도 1호선  김포공항                10821         7360          20170527
18676    20170203  4호선           동작(현충원)            3136          3276          20170211

(index)  사용일자  노선명          역명                    승차총승객수  하차총승객수  등록일자
52602    20170404  경춘선          김유정                  555           604           20170407
616      20170102  5호선           서대문                  18123         18909         20170110
85738    20170601  7호선           사가정                  17664         17515         20170604
52610    20170404  경춘선          마석                    4669          4688          20170407
80516    20170523  1호선           제기동                  21246         21512         20170526
65633    20170427  1호선           제기동                  22564         23052         20170430
89720    20170608  2호선           봉천                    27493         25133         20170611
30044    20170223  안산선          대야미                  4348          3956          20170303
45918    20170323  5호선           천호(풍납토성)          22233         23870         20170326
18212    20170202  수인선          신포                    1732          1794          20170210