Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 585.9 KiB
Average record size in memory: 60.0 B
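
The average record size appears to be the total in-memory size divided by the number of observations. The sketch below checks that arithmetic; the helper name `average_record_size` is hypothetical and is not part of ydata-profiling.

```python
KIB = 1024  # bytes in one kibibyte

def average_record_size(total_kib: float, n_rows: int) -> float:
    """Mean in-memory size of one row, in bytes (hypothetical helper)."""
    return total_kib * KIB / n_rows

# Figures taken from the dataset statistics in this report.
size_bytes = average_record_size(585.9, 10_000)
print(f"{size_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 60.0 B reported.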

Variable types

Numeric: 4
Categorical: 1
Text: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-12914/S/1/datasetView.do

Alerts

사용일자 is highly overall correlated with 등록일자 (High correlation)
승차총승객수 is highly overall correlated with 하차총승객수 (High correlation)
하차총승객수 is highly overall correlated with 승차총승객수 (High correlation)
등록일자 is highly overall correlated with 사용일자 (High correlation)

Reproduction

Analysis started: 2024-05-11 06:20:56.699837
Analysis finished: 2024-05-11 06:21:01.711073
Duration: 5.01 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
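
The reported duration can be recovered from the two timestamps. This is a minimal sketch using only the standard library; `duration_seconds` is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp layout used in this report

def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()

elapsed = duration_seconds("2024-05-11 06:20:56.699837",
                           "2024-05-11 06:21:01.711073")
print(f"{elapsed:.2f} seconds")
```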

Variables

사용일자 (usage date)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 172
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20180352
Minimum: 20180101
Maximum: 20180621
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20180101
5-th percentile: 20180109
Q1: 20180213
median: 20180327
Q3: 20180509
95-th percentile: 20180613
Maximum: 20180621
Range: 520
Interquartile range (IQR): 296
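
Because this column is profiled as a real number, the range of 520 is a difference between integers, not a number of days. Assuming the values are dates encoded as YYYYMMDD (suggested by the column name, not stated in the report), the sketch below compares the two readings; `parse_yyyymmdd` is a hypothetical helper.

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20180101 as a calendar date (assumed encoding)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first, last = 20180101, 20180621          # minimum and maximum from the report
integer_range = last - first              # what the report calls "Range"
calendar_days = (parse_yyyymmdd(last) - parse_yyyymmdd(first)).days

print(integer_range, calendar_days, calendar_days + 1)
```

Under that assumption the period covers 172 calendar days inclusive, which agrees with the 172 distinct values reported for this variable.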

Descriptive statistics

Standard deviation: 164.09481
Coefficient of variation (CV): 8.1314146 × 10^-6
Kurtosis: -1.2163644
Mean: 20180352
Median Absolute Deviation (MAD): 123
Skewness: 0.028961866
Sum: 2.0180352 × 10^11
Variance: 26927.107
Monotonicity: Not monotonic
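
Two of these figures follow from the others under the usual definitions: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A small check, with hypothetical helper names:

```python
def coefficient_of_variation(std: float, mean: float) -> float:
    """Standard deviation relative to the mean (usual definition)."""
    return std / mean

std, mean = 164.09481, 20180352   # values from this report
cv = coefficient_of_variation(std, mean)
variance = std ** 2

print(f"CV = {cv:.4e}")
print(f"variance = {variance:.1f}")
```

The very small CV reflects the integer encoding: the values differ by a few hundred around a mean of roughly twenty million.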
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
20180223             76      0.8%
20180313             74      0.7%
20180419             72      0.7%
20180210             72      0.7%
20180226             72      0.7%
20180324             72      0.7%
20180311             71      0.7%
20180530             71      0.7%
20180510             71      0.7%
20180323             70      0.7%
Other values (162)   9279    92.8%

Value                Count   Frequency (%)
20180101             61      0.6%
20180102             53      0.5%
20180103             45      0.4%
20180104             49      0.5%
20180105             68      0.7%
20180106             65      0.7%
20180107             52      0.5%
20180108             54      0.5%
20180109             55      0.5%
20180110             62      0.6%

Value                Count   Frequency (%)
20180621             56      0.6%
20180620             65      0.7%
20180619             60      0.6%
20180618             47      0.5%
20180617             58      0.6%
20180616             54      0.5%
20180615             62      0.6%
20180614             59      0.6%
20180613             56      0.6%
20180612             51      0.5%

노선명 (line name)
Categorical

Distinct: 25
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2호선: 876
5호선: 867
7호선: 855
경부선: 659
6호선: 650
Other values (20): 6093

Length

Max length: 8
Median length: 3
Mean length: 3.1831
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3호선
2nd row: 중앙선
3rd row: 경부선
4th row: 3호선
5th row: 4호선
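
"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique is understood here as counting the values that occur exactly once. The sketch below applies both to the five sample rows of this variable; `distinct_and_unique` is a hypothetical helper, not a ydata-profiling function.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

sample = ["3호선", "중앙선", "경부선", "3호선", "4호선"]  # the five sample rows
print(distinct_and_unique(sample))
```

On the full column every one of the 25 line names repeats, which is why the report shows 25 distinct values but 0 unique ones.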

Common Values

Value               Count   Frequency (%)
2호선               876     8.8%
5호선               867     8.7%
7호선               855     8.6%
경부선              659     6.6%
6호선               650     6.5%
3호선               612     6.1%
분당선              573     5.7%
경원선              561     5.6%
4호선               445     4.5%
경의선              443     4.4%
Other values (15)   3459    34.6%
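
The frequency column is the count divided by the 10000 observations, shown to one decimal place. A minimal sketch, with `frequency_pct` as a hypothetical helper:

```python
def frequency_pct(count: int, total: int) -> str:
    """Share of rows holding a value, formatted like the report's column."""
    return f"{100 * count / total:.1f}%"

total_rows = 10_000  # number of observations in this report
counts = {"2호선": 876, "5호선": 867, "경부선": 659, "Other values (15)": 3459}

for name, count in counts.items():
    print(name, frequency_pct(count, total_rows))
```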

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
2호선               876     8.6%
5호선               867     8.5%
7호선               855     8.4%
경부선              659     6.4%
6호선               650     6.4%
3호선               612     6.0%
분당선              573     5.6%
경원선              561     5.5%
4호선               445     4.3%
경의선              443     4.3%
Other values (15)   3693    36.1%

역명 (station name)
Text

Distinct: 503
Distinct (%): 5.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.5903
Min length: 2

Characters and Unicode

Total characters: 35903
Distinct characters: 289
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
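
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories in one station name from this column; it is not the profiler's own implementation, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Two-letter Unicode general categories and the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(text: str) -> Counter:
    """Count characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

counts = category_counts("회현(남대문시장)")
for code, n in counts.items():
    print(CATEGORY_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the character statistics for this variable.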

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 충무로
2nd row: 회기
3rd row: 천안
4th row: 독립문
5th row: 회현(남대문시장)

Value                Count   Frequency (%)
서울역               103     1.0%
공덕                 72      0.7%
왕십리(성동구청      59      0.6%
신설동               58      0.6%
김포공항             55      0.5%
디지털미디어시티     54      0.5%
홍대입구             51      0.5%
고속터미널           50      0.5%
충정로(경기대입구    46      0.5%
건대입구             46      0.5%
Other values (493)   9406    94.1%

Most occurring characters

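Value                Count   Frequency (%)
(not preserved)      1306    3.6%
(not preserved)      1243    3.5%
)                    1217    3.4%
(                    1217    3.4%
(not preserved)      864     2.4%
(not preserved)      812     2.3%
(not preserved)      805     2.2%
(not preserved)      717     2.0%
(not preserved)      714     2.0%
(not preserved)      577     1.6%
Other values (279)   26431   73.6%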

Most occurring categories

Value               Count   Frequency (%)
Other Letter        33181   92.4%
Close Punctuation   1217    3.4%
Open Punctuation    1217    3.4%
Decimal Number      209     0.6%
Other Punctuation   79      0.2%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
(not preserved)      1306    3.9%
(not preserved)      1243    3.7%
(not preserved)      864     2.6%
(not preserved)      812     2.4%
(not preserved)      805     2.4%
(not preserved)      717     2.2%
(not preserved)      714     2.2%
(not preserved)      577     1.7%
(not preserved)      572     1.7%
(not preserved)      532     1.6%
Other values (270)   25039   75.5%

Decimal Number
Value   Count   Frequency (%)
3       81      38.8%
4       39      18.7%
1       38      18.2%
2       20      9.6%
5       17      8.1%
9       14      6.7%

Close Punctuation
Value   Count   Frequency (%)
)       1217    100.0%

Open Punctuation
Value   Count   Frequency (%)
(       1217    100.0%

Other Punctuation
Value   Count   Frequency (%)
.       79      100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   33181   92.4%
Common   2722    7.6%

Most frequent character per script

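Hangul
Value                Count   Frequency (%)
(not preserved)      1306    3.9%
(not preserved)      1243    3.7%
(not preserved)      864     2.6%
(not preserved)      812     2.4%
(not preserved)      805     2.4%
(not preserved)      717     2.2%
(not preserved)      714     2.2%
(not preserved)      577     1.7%
(not preserved)      572     1.7%
(not preserved)      532     1.6%
Other values (270)   25039   75.5%

Common
Value   Count   Frequency (%)
)       1217    44.7%
(       1217    44.7%
3       81      3.0%
.       79      2.9%
4       39      1.4%
1       38      1.4%
2       20      0.7%
5       17      0.6%
9       14      0.5%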

Most occurring blocks

Value    Count   Frequency (%)
Hangul   33181   92.4%
ASCII    2722    7.6%

Most frequent character per block

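Hangul
Value                Count   Frequency (%)
(not preserved)      1306    3.9%
(not preserved)      1243    3.7%
(not preserved)      864     2.6%
(not preserved)      812     2.4%
(not preserved)      805     2.4%
(not preserved)      717     2.2%
(not preserved)      714     2.2%
(not preserved)      577     1.7%
(not preserved)      572     1.7%
(not preserved)      532     1.6%
Other values (270)   25039   75.5%

ASCII
Value   Count   Frequency (%)
)       1217    44.7%
(       1217    44.7%
3       81      3.0%
.       79      2.9%
4       39      1.4%
1       38      1.4%
2       20      0.7%
5       17      0.6%
9       14      0.5%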

승차총승객수 (total boarding passengers)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8240
Distinct (%): 82.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12536.301
Minimum: 1
Maximum: 133098
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1097.7
Q1: 4042.75
median: 8855
Q3: 16404.25
95-th percentile: 38017.2
Maximum: 133098
Range: 133097
Interquartile range (IQR): 12361.5
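
The range and interquartile range follow directly from the other quantiles: range is maximum minus minimum, and IQR is Q3 minus Q1. A short check with a hypothetical helper `spread`:

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range and interquartile range from four quantile statistics."""
    return {"range": maximum - minimum, "iqr": q3 - q1}

# Quantile statistics reported for this variable.
result = spread(minimum=1, q1=4042.75, q3=16404.25, maximum=133098)
print(result)
```

The IQR is about a tenth of the full range, consistent with the long right tail indicated by the skewness and kurtosis.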

Descriptive statistics

Standard deviation: 12942.538
Coefficient of variation (CV): 1.0324049
Kurtosis: 9.7814071
Mean: 12536.301
Median Absolute Deviation (MAD): 5593
Skewness: 2.5616248
Sum: 1.2536301 × 10^8
Variance: 1.675093 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
1                     36      0.4%
2                     8       0.1%
2076                  6       0.1%
2374                  5       0.1%
1218                  5       0.1%
4660                  5       0.1%
4245                  4       < 0.1%
3205                  4       < 0.1%
2060                  4       < 0.1%
1587                  4       < 0.1%
Other values (8230)   9919    99.2%

Value   Count   Frequency (%)
1       36      0.4%
2       8       0.1%
3       3       < 0.1%
4       2       < 0.1%
5       2       < 0.1%
6       2       < 0.1%
8       1       < 0.1%
19      1       < 0.1%
31      1       < 0.1%
32      1       < 0.1%

Value    Count   Frequency (%)
133098   1       < 0.1%
121007   1       < 0.1%
118692   1       < 0.1%
118128   1       < 0.1%
116410   1       < 0.1%
111586   1       < 0.1%
108915   1       < 0.1%
101418   1       < 0.1%
101308   1       < 0.1%
99000    1       < 0.1%

하차총승객수 (total alighting passengers)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8165
Distinct (%): 81.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12466.186
Minimum: 0
Maximum: 136675
Zeros: 54
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1019.95
Q1: 3889
median: 8598
Q3: 16220.25
95-th percentile: 38471.6
Maximum: 136675
Range: 136675
Interquartile range (IQR): 12331.25

Descriptive statistics

Standard deviation: 13182.22
Coefficient of variation (CV): 1.0574381
Kurtosis: 10.041702
Mean: 12466.186
Median Absolute Deviation (MAD): 5456
Skewness: 2.5883346
Sum: 1.2466186 × 10^8
Variance: 1.7377093 × 10^8
Monotonicity: Not monotonic
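
As a consistency check, the sum should equal the mean multiplied by the number of observations (10000 in this report), and the CV should equal the standard deviation divided by the mean. A minimal sketch, with `summary_checks` as a hypothetical helper:

```python
def summary_checks(mean: float, std: float, n: int) -> tuple[float, float]:
    """Return (sum implied by the mean, coefficient of variation)."""
    return mean * n, std / mean

# Mean and standard deviation reported for this variable.
total, cv = summary_checks(mean=12466.186, std=13182.22, n=10_000)
print(f"sum = {total:.7e}")
print(f"CV  = {cv:.7f}")
```

A CV slightly above 1 means the spread is about as large as the mean itself, which fits a heavily right-skewed passenger count.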
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     54      0.5%
1963                  5       0.1%
2431                  5       0.1%
1749                  5       0.1%
1823                  5       0.1%
2524                  5       0.1%
2803                  4       < 0.1%
2334                  4       < 0.1%
4967                  4       < 0.1%
4369                  4       < 0.1%
Other values (8155)   9905    99.1%

Value   Count   Frequency (%)
0       54      0.5%
17      1       < 0.1%
19      1       < 0.1%
20      1       < 0.1%
21      1       < 0.1%
24      1       < 0.1%
25      2       < 0.1%
26      2       < 0.1%
27      1       < 0.1%
28      1       < 0.1%

Value    Count   Frequency (%)
136675   1       < 0.1%
124113   1       < 0.1%
120685   1       < 0.1%
119053   1       < 0.1%
116245   1       < 0.1%
114779   1       < 0.1%
112640   1       < 0.1%
111583   1       < 0.1%
111411   1       < 0.1%
106722   1       < 0.1%

등록일자 (registration date)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 172
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20180361
Minimum: 20180104
Maximum: 20180624
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20180104
5-th percentile: 20180112
Q1: 20180216
median: 20180330
Q3: 20180512
95-th percentile: 20180616
Maximum: 20180624
Range: 520
Interquartile range (IQR): 296

Descriptive statistics

Standard deviation: 164.49149
Coefficient of variation (CV): 8.1510676 × 10^-6
Kurtosis: -1.2096439
Mean: 20180361
Median Absolute Deviation (MAD): 125
Skewness: 0.0016875755
Sum: 2.0180361 × 10^11
Variance: 27057.45
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
20180226             76      0.8%
20180316             74      0.7%
20180422             72      0.7%
20180213             72      0.7%
20180301             72      0.7%
20180327             72      0.7%
20180314             71      0.7%
20180602             71      0.7%
20180513             71      0.7%
20180326             70      0.7%
Other values (162)   9279    92.8%

Value                Count   Frequency (%)
20180104             61      0.6%
20180105             53      0.5%
20180106             45      0.4%
20180107             49      0.5%
20180108             68      0.7%
20180109             65      0.7%
20180110             52      0.5%
20180111             54      0.5%
20180112             55      0.5%
20180113             62      0.6%

Value                Count   Frequency (%)
20180624             56      0.6%
20180623             65      0.7%
20180622             60      0.6%
20180621             47      0.5%
20180620             58      0.6%
20180619             54      0.5%
20180618             62      0.6%
20180617             59      0.6%
20180616             56      0.6%
20180615             51      0.5%

Interactions

[Figures: 16 interaction plots, images not included]

Correlations

               사용일자   노선명   승차총승객수   하차총승객수   등록일자
사용일자       1.000      0.000    0.059          0.048          0.997
노선명         0.000      1.000    0.530          0.519          0.000
승차총승객수   0.059      0.530    1.000          0.978          0.061
하차총승객수   0.048      0.519    0.978          1.000          0.020
등록일자       0.997      0.000    0.061          0.020          1.000

               사용일자   승차총승객수   하차총승객수   등록일자   노선명
사용일자       1.000      0.034          0.031          1.000      0.000
승차총승객수   0.034      1.000          0.991          0.034      0.216
하차총승객수   0.031      0.991          1.000          0.031      0.210
등록일자       1.000      0.034          0.031          1.000      0.000
노선명         0.000      0.216          0.210          0.000      1.000
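
The report does not say which coefficient each table uses. As a rough illustration only, the sketch below computes a plain Pearson correlation between boarding and alighting counts for ten rows transcribed from the Sample section of this report. Both the choice of Pearson and the transcription of those counts are assumptions, and ten rows will not reproduce the coefficients above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Counts as read from the first ten sample rows (assumed transcription).
boarding = [4, 26749, 8720, 8786, 37744, 13208, 7415, 10489, 6892, 12166]
alighting = [0, 26030, 7240, 8803, 40079, 15138, 7609, 10374, 6782, 11673]

r = pearson(boarding, alighting)
print(f"r = {r:.3f}")
```

Even on this small subset the two counts move together closely, in line with the high-correlation alert for this pair.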

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

        사용일자   노선명          역명                 승차총승객수   하차총승객수   등록일자
53550   20180403   3호선           충무로               4              0              20180406
13050   20180123   중앙선          회기                 26749          26030          20180126
78675   20180516   경부선          천안                 8720           7240           20180519
10518   20180119   3호선           독립문               8786           8803           20180122
71062   20180503   4호선           회현(남대문시장)     37744          40079          20180506
63789   20180420   5호선           군자(능동)           13208          15138          20180423
20392   20180205   5호선           개롱                 7415           7609           20180208
17233   20180130   경원선          방학                 10489          10374          20180202
1479    20180103   경의선          금촌                 6892           6782           20180106
66492   20180425   경원선          방학                 12166          11673          20180428

        사용일자   노선명          역명                 승차총승객수   하차총승객수   등록일자
94847   20180613   1호선           신설동               10457          9932           20180616
73262   20180506   공항철도 1호선  운서                 4801           5185           20180509
8412    20180115   중앙선          오빈                 371            335            20180118
98493   20180619   6호선           불광                 5652           5707           20180622
56945   20180408   공항철도 1호선  검암                 5871           6360           20180411
90980   20180606   안산선          고잔                 6141           6062           20180609
18226   20180201   경강선          곤지암               1954           1956           20180204
68068   20180428   2호선           강변(동서울터미널)   56170          58228          20180501
30555   20180222   분당선          미금                 22982          24706          20180225
40842   20180312   경인선          오류동               13285          12430          20180315
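
In these sample rows the registration date (등록일자) trails the usage date (사용일자) by a fixed number of days, which would explain why the two columns are flagged as highly correlated. The sketch below measures that lag for a few of the rows, assuming both columns are YYYYMMDD-encoded dates; `lag_days` is a hypothetical helper.

```python
from datetime import datetime

def lag_days(used: int, registered: int) -> int:
    """Days between a usage date and its registration date (YYYYMMDD integers)."""
    fmt = "%Y%m%d"
    delta = datetime.strptime(str(registered), fmt) - datetime.strptime(str(used), fmt)
    return delta.days

# (사용일자, 등록일자) pairs taken from the sample rows, including month boundaries.
pairs = [
    (20180403, 20180406),
    (20180130, 20180202),
    (20180428, 20180501),
    (20180613, 20180616),
]
print({lag_days(used, registered) for used, registered in pairs})
```

Whether the lag is constant across all 10000 rows cannot be confirmed from the sample alone.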