Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 742.2 KiB
Average record size in memory: 76.0 B
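
The average record size follows from the total memory figure and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and using the two values reported above (the variable names are illustrative only):

```python
# Values copied from the "Dataset statistics" block.
total_size_kib = 742.2
n_observations = 10_000

# Convert KiB to bytes, then divide by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, this agrees with the 76.0 B shown in the report.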

Variable types

Numeric: 4
DateTime: 1
Categorical: 2
Text: 1

Dataset

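Description: Daily ridership for Seoul Metro (서울교통공사) Lines 1-8, broken down by station and passenger type and including transfer inflow. The data consists of serial number (연번), date (날짜), line (호선), station number (역번호), station name (역명), passenger type (승객유형), boarding count (승차인원), and transfer inflow count (환승유입인원).
URL: https://www.data.go.kr/data/15104835/fileData.do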

Alerts

역번호 is highly overall correlated with 호선 (High correlation)
승차인원 is highly overall correlated with 환승유입인원 (High correlation)
환승유입인원 is highly overall correlated with 승차인원 (High correlation)
호선 is highly overall correlated with 역번호 (High correlation)
연번 has unique values (Unique)
승차인원 has 436 (4.4%) zeros (Zeros)
환승유입인원 has 1120 (11.2%) zeros (Zeros)
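
The percentages in the two Zeros alerts can be checked against the row count. A small sketch, using only the counts listed above and the 10000 observations from the overview (the dictionary and variable names are illustrative):

```python
# Zero counts copied from the alerts; row count from the dataset statistics.
n_rows = 10_000
zero_counts = {"승차인원": 436, "환승유입인원": 1120}

# Express each zero count as a share of all rows, to one decimal place.
zero_share = {name: f"{count / n_rows:.1%}" for name, count in zero_counts.items()}

for name, share in zero_share.items():
    print(f"{name} has {zero_counts[name]} ({share}) zeros")
```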

Reproduction

Analysis started: 2023-12-12 21:56:56.612958
Analysis finished: 2023-12-12 21:56:59.561406
Duration: 2.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
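
The reported duration can be recomputed from the start and finish timestamps. A short sketch with the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2023-12-12 21:56:56.612958")
finished = datetime.fromisoformat("2023-12-12 21:56:59.561406")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()

print(f"{duration_s:.2f} seconds")
```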

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47516.656
Minimum: 21
Maximum: 95274
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 21
5-th percentile: 4772.7
Q1: 23435.75
Median: 47384
Q3: 71382
95-th percentile: 90499.1
Maximum: 95274
Range: 95253
Interquartile range (IQR): 47946.25

Descriptive statistics

Standard deviation: 27605.836
Coefficient of variation (CV): 0.58097177
Kurtosis: -1.2069809
Mean: 47516.656
Median Absolute Deviation (MAD): 23979.5
Skewness: 0.012799
Sum: 4.7516656 × 10^8
Variance: 7.6208217 × 10^8
Monotonicity: Not monotonic
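
Several of the figures for 연번 are derived from others in the same block. The sketch below recomputes the range, the IQR, the coefficient of variation, and the variance from the reported minimum, maximum, quartiles, mean, and standard deviation. Because the inputs are rounded, only approximate agreement is expected.

```python
# Inputs copied from the quantile and descriptive statistics for 연번.
minimum, maximum = 21, 95274
q1, q3 = 23435.75, 71382
mean, std = 47516.656, 27605.836

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values
cv = std / mean                   # standard deviation relative to the mean
variance = std ** 2               # square of the standard deviation

print(value_range, iqr, round(cv, 6), f"{variance:.4e}")
```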
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
82755  1  < 0.1%
63500  1  < 0.1%
63185  1  < 0.1%
70734  1  < 0.1%
52793  1  < 0.1%
87546  1  < 0.1%
72040  1  < 0.1%
92823  1  < 0.1%
13270  1  < 0.1%
61001  1  < 0.1%
Other values (9990)  9990  99.9%

Minimum 10 values

Value  Count  Frequency (%)
21  1  < 0.1%
22  1  < 0.1%
30  1  < 0.1%
55  1  < 0.1%
72  1  < 0.1%
86  1  < 0.1%
88  1  < 0.1%
111  1  < 0.1%
115  1  < 0.1%
145  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
95274  1  < 0.1%
95270  1  < 0.1%
95262  1  < 0.1%
95243  1  < 0.1%
95239  1  < 0.1%
95212  1  < 0.1%
95207  1  < 0.1%
95200  1  < 0.1%
95190  1  < 0.1%
95189  1  < 0.1%

날짜
Date

Distinct: 55
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2022-01-01 00:00:00
Maximum: 2022-02-24 00:00:00
Histogram with fixed size bins (bins=50)

호선
Categorical

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
2호선  2084
5호선  1931
7호선  1456
6호선  1272
3호선  1239
Other values (3)  2018

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5호선
2nd row: 4호선
3rd row: 6호선
4th row: 7호선
5th row: 7호선

Common Values

Value  Count  Frequency (%)
2호선  2084  20.8%
5호선  1931  19.3%
7호선  1456  14.6%
6호선  1272  12.7%
3호선  1239  12.4%
4호선  1025  10.2%
8호선  619  6.2%
1호선  374  3.7%

Length

Histogram of lengths of the category


역번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 279
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1532.9507
Minimum: 150
Maximum: 2828
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 150
5-th percentile: 203
Q1: 310
Median: 2519
Q3: 2635
95-th percentile: 2814
Maximum: 2828
Range: 2678
Interquartile range (IQR): 2325

Descriptive statistics

Standard deviation: 1181.0751
Coefficient of variation (CV): 0.77045861
Kurtosis: -1.9633683
Mean: 1532.9507
Median Absolute Deviation (MAD): 301
Skewness: -0.10750144
Sum: 15329507
Variance: 1394938.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
424  63  0.6%
240  57  0.6%
206  57  0.6%
230  55  0.5%
318  54  0.5%
2534  53  0.5%
239  52  0.5%
2624  52  0.5%
208  52  0.5%
420  52  0.5%
Other values (269)  9453  94.5%

Minimum 10 values

Value  Count  Frequency (%)
150  40  0.4%
151  51  0.5%
152  41  0.4%
153  42  0.4%
154  25  0.2%
155  42  0.4%
156  33  0.3%
157  36  0.4%
158  34  0.3%
159  30  0.3%

Maximum 10 values

Value  Count  Frequency (%)
2828  30  0.3%
2827  28  0.3%
2826  36  0.4%
2825  36  0.4%
2824  27  0.3%
2823  36  0.4%
2822  33  0.3%
2821  29  0.3%
2820  38  0.4%
2819  33  0.3%

역명
Text

Distinct: 244
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.3939
Min length: 2

Characters and Unicode

Total characters: 43939
Distinct characters: 241
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 양평
2nd row: 혜화
3rd row: 청구
4th row: 장승배기
5th row: 철산

Value  Count  Frequency (%)
종로3가  137  1.4%
동대문역사문화공원(ddp  124  1.2%
시청  102  1.0%
신당  94  0.9%
고속터미널  94  0.9%
합정  93  0.9%
왕십리(성동구청  88  0.9%
공덕  88  0.9%
동대문  83  0.8%
태릉입구  82  0.8%
Other values (234)  9015  90.1%

Most occurring characters

ValueCountFrequency (%)
( 2258
 
5.1%
) 2258
 
5.1%
1834
 
4.2%
1722
 
3.9%
1234
 
2.8%
1111
 
2.5%
941
 
2.1%
803
 
1.8%
783
 
1.8%
723
 
1.6%
Other values (231) 30272
68.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  38608  87.9%
Open Punctuation  2258  5.1%
Close Punctuation  2258  5.1%
Uppercase Letter  372  0.8%
Decimal Number  308  0.7%
Other Punctuation  135  0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1834
 
4.8%
1722
 
4.5%
1234
 
3.2%
1111
 
2.9%
941
 
2.4%
803
 
2.1%
783
 
2.0%
723
 
1.9%
707
 
1.8%
633
 
1.6%
Other values (222) 28117
72.8%
Decimal Number
Value  Count  Frequency (%)
3  219  71.1%
4  64  20.8%
5  25  8.1%
Uppercase Letter
Value  Count  Frequency (%)
D  248  66.7%
P  124  33.3%
Other Punctuation
Value  Count  Frequency (%)
.  96  71.1%
·  39  28.9%
Open Punctuation
Value  Count  Frequency (%)
(  2258  100.0%
Close Punctuation
Value  Count  Frequency (%)
)  2258  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  38608  87.9%
Common  4959  11.3%
Latin  372  0.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1834
 
4.8%
1722
 
4.5%
1234
 
3.2%
1111
 
2.9%
941
 
2.4%
803
 
2.1%
783
 
2.0%
723
 
1.9%
707
 
1.8%
633
 
1.6%
Other values (222) 28117
72.8%
Common
Value  Count  Frequency (%)
(  2258  45.5%
)  2258  45.5%
3  219  4.4%
.  96  1.9%
4  64  1.3%
·  39  0.8%
5  25  0.5%
Latin
Value  Count  Frequency (%)
D  248  66.7%
P  124  33.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  38608  87.9%
ASCII  5292  12.0%
None  39  0.1%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(  2258  42.7%
)  2258  42.7%
D  248  4.7%
3  219  4.1%
P  124  2.3%
.  96  1.8%
4  64  1.2%
5  25  0.5%
Hangul
ValueCountFrequency (%)
1834
 
4.8%
1722
 
4.5%
1234
 
3.2%
1111
 
2.9%
941
 
2.4%
803
 
2.1%
783
 
2.0%
723
 
1.9%
707
 
1.8%
633
 
1.6%
Other values (222) 28117
72.8%
None
Value  Count  Frequency (%)
·  39  100.0%

승객유형
Categorical

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
우대권  1587
어린이  1579
직원  1570
일반  1541
청소년  1497
Other values (7)  2226

Length

Max length: 7
Median length: 3
Mean length: 2.9355
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 어린이
2nd row: 직원
3rd row: 우대권
4th row: 어린이
5th row: 직원

Common Values

Value  Count  Frequency (%)
우대권  1587  15.9%
어린이  1579  15.8%
직원  1570  15.7%
일반  1541  15.4%
청소년  1497  15.0%
중고생  1196  12.0%
영어 일반  604  6.0%
중국어 일반  247  2.5%
영어 어린이  102  1.0%
일어 일반  43  0.4%
Other values (2)  34  0.3%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
일반  2435  22.1%
어린이  1715  15.5%
우대권  1587  14.4%
직원  1570  14.2%
청소년  1497  13.6%
중고생  1196  10.8%
영어  706  6.4%
중국어  270  2.4%
일어  54  0.5%
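
The last table for 승객유형 appears to count whitespace-separated words rather than whole category labels: a label such as 영어 일반 contributes to both 영어 and 일반. The sketch below rebuilds word counts from the ten labels named in the Common Values table, assuming a simple split on spaces. The two categories grouped under "Other values (2)" are not named in the report, so they are left out.

```python
from collections import Counter

# Category counts copied from the Common Values table for 승객유형.
# The two unnamed "Other values" categories (34 rows in total) are omitted.
category_counts = {
    "우대권": 1587,
    "어린이": 1579,
    "직원": 1570,
    "일반": 1541,
    "청소년": 1497,
    "중고생": 1196,
    "영어 일반": 604,
    "중국어 일반": 247,
    "영어 어린이": 102,
    "일어 일반": 43,
}

# Credit each word of a label with that label's full row count.
word_counts = Counter()
for label, count in category_counts.items():
    for word in label.split():
        word_counts[word] += count

for word, count in word_counts.most_common():
    print(word, count)
```

Under this assumption, 일반 and 영어 match the word table exactly (2435 and 706). 어린이, 중국어, and 일어 fall short by 34, 23, and 11 respectively, which is consistent with the 34 rows in the two unnamed categories.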

승차인원
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 3091
Distinct (%): 30.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1975.7037
Minimum: 0
Maximum: 81847
Zeros: 436
Zeros (%): 4.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 14
Median: 93
Q3: 1283.5
95-th percentile: 11151.3
Maximum: 81847
Range: 81847
Interquartile range (IQR): 1269.5

Descriptive statistics

Standard deviation: 5416.6099
Coefficient of variation (CV): 2.7416104
Kurtosis: 45.286366
Mean: 1975.7037
Median Absolute Deviation (MAD): 92
Skewness: 5.6597064
Sum: 19757037
Variance: 29339662
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
1  890  8.9%
0  436  4.4%
2  407  4.1%
3  231  2.3%
4  114  1.1%
5  74  0.7%
6  64  0.6%
23  58  0.6%
36  55  0.5%
28  55  0.5%
Other values (3081)  7616  76.2%

Minimum 10 values

Value  Count  Frequency (%)
0  436  4.4%
1  890  8.9%
2  407  4.1%
3  231  2.3%
4  114  1.1%
5  74  0.7%
6  64  0.6%
7  53  0.5%
8  33  0.3%
9  32  0.3%

Maximum 10 values

Value  Count  Frequency (%)
81847  1  < 0.1%
81791  1  < 0.1%
77976  1  < 0.1%
71488  1  < 0.1%
59240  1  < 0.1%
58197  1  < 0.1%
56506  1  < 0.1%
56045  1  < 0.1%
55197  1  < 0.1%
54623  1  < 0.1%

환승유입인원
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 2525
Distinct (%): 25.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1016.7829
Minimum: 0
Maximum: 34544
Zeros: 1120
Zeros (%): 11.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
Median: 27
Q3: 606
95-th percentile: 5881.75
Maximum: 34544
Range: 34544
Interquartile range (IQR): 602

Descriptive statistics

Standard deviation: 2817.5008
Coefficient of variation (CV): 2.7709955
Kurtosis: 36.432653
Mean: 1016.7829
Median Absolute Deviation (MAD): 27
Skewness: 5.255116
Sum: 10167829
Variance: 7938310.9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value  Count  Frequency (%)
0  1120  11.2%
1  768  7.7%
2  341  3.4%
3  199  2.0%
7  191  1.9%
4  187  1.9%
6  187  1.9%
5  184  1.8%
8  168  1.7%
9  166  1.7%
Other values (2515)  6489  64.9%

Minimum 10 values

Value  Count  Frequency (%)
0  1120  11.2%
1  768  7.7%
2  341  3.4%
3  199  2.0%
4  187  1.9%
5  184  1.8%
6  187  1.9%
7  191  1.9%
8  168  1.7%
9  166  1.7%

Maximum 10 values

Value  Count  Frequency (%)
34544  1  < 0.1%
33715  1  < 0.1%
33295  1  < 0.1%
31479  1  < 0.1%
30888  1  < 0.1%
30069  1  < 0.1%
29835  1  < 0.1%
29650  1  < 0.1%
29410  1  < 0.1%
28243  1  < 0.1%

Interactions

[Figure: 16 pairwise interaction plots]

Correlations

Variable  연번  날짜  호선  역번호  승객유형  승차인원  환승유입인원
연번  1.000  0.998  0.064  0.062  0.000  0.063  0.069
날짜  0.998  1.000  0.059  0.094  0.000  0.098  0.084
호선  0.064  0.059  1.000  0.995  0.136  0.145  0.135
역번호  0.062  0.094  0.995  1.000  0.143  0.124  0.126
승객유형  0.000  0.000  0.136  0.143  1.000  0.460  0.496
승차인원  0.063  0.098  0.145  0.124  0.460  1.000  0.938
환승유입인원  0.069  0.084  0.135  0.126  0.496  0.938  1.000

Variable  호선  승객유형
호선  1.000  0.058
승객유형  0.058  1.000

Variable  연번  역번호  승차인원  환승유입인원  호선  승객유형
연번  1.000  -0.001  0.009  0.006  0.030  0.000
역번호  -0.001  1.000  0.006  0.021  0.906  0.067
승차인원  0.009  0.006  1.000  0.977  0.069  0.212
환승유입인원  0.006  0.021  0.977  1.000  0.065  0.234
호선  0.030  0.906  0.069  0.065  1.000  0.058
승객유형  0.000  0.067  0.212  0.234  0.058  1.000
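
One way to read a correlation matrix like the first one above is to list the variable pairs whose coefficient exceeds a cut-off. The sketch below does this with a hypothetical threshold of 0.9, chosen only for illustration. The report does not state which threshold or extra rules its own "High correlation" alerts use, and `strong_pairs` is an illustrative helper, not part of ydata-profiling.

```python
# Coefficients copied from the first (7 x 7) correlation matrix.
names = ["연번", "날짜", "호선", "역번호", "승객유형", "승차인원", "환승유입인원"]
matrix = [
    [1.000, 0.998, 0.064, 0.062, 0.000, 0.063, 0.069],
    [0.998, 1.000, 0.059, 0.094, 0.000, 0.098, 0.084],
    [0.064, 0.059, 1.000, 0.995, 0.136, 0.145, 0.135],
    [0.062, 0.094, 0.995, 1.000, 0.143, 0.124, 0.126],
    [0.000, 0.000, 0.136, 0.143, 1.000, 0.460, 0.496],
    [0.063, 0.098, 0.145, 0.124, 0.460, 1.000, 0.938],
    [0.069, 0.084, 0.135, 0.126, 0.496, 0.938, 1.000],
]

THRESHOLD = 0.9  # hypothetical cut-off, not taken from the report


def strong_pairs(names, matrix, threshold):
    """Return (name_a, name_b, coefficient) for each pair above the threshold."""
    pairs = []
    for i in range(len(names)):
        # Visit only the upper triangle so each pair is reported once.
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


flagged = strong_pairs(names, matrix, THRESHOLD)
for a, b, value in flagged:
    print(f"{a} / {b}: {value:.3f}")
```

With this cut-off the sketch lists three pairs: 연번 / 날짜, 호선 / 역번호, and 승차인원 / 환승유입인원. The Alerts section of the report lists only the latter two, so its alert logic evidently differs from this simple filter.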

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

연번날짜호선역번호역명승객유형승차인원환승유입인원
82754827552022-02-175호선2523양평어린이92
73870738712022-02-124호선420혜화직원10814
32834328352022-01-196호선2635청구우대권600350
64303643042022-02-067호선2742장승배기어린이3910
78269782702022-02-147호선2749철산직원10755
66522665232022-02-082호선233대림(구로구청)직원12535
34581345822022-01-206호선2638창신청소년6939
80828808292022-02-164호선417길음일반141898217
43267432682022-01-256호선2635청구우대권662358
25015250162022-01-153호선320을지로3가어린이194
연번날짜호선역번호역명승객유형승차인원환승유입인원
23749237502022-01-145호선2539신금호직원2611
10389103902022-01-067호선2741상도직원3915
25682256832022-01-156호선2611응암어린이11041
922892292022-01-062호선248양천구청직원12642
25432254332022-01-155호선2527여의도일반113866367
90429904302022-02-217호선2733학동중고생10
68908689092022-02-095호선2538청구청소년6532
58168581692022-02-034호선413쌍문어린이9912
55307553082022-02-016호선2623합정청소년11430
47384473852022-01-282호선222강남중고생188