Overview

Dataset statistics

Number of variables: 6
Number of observations: 3420
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 173.8 KiB
Average record size in memory: 52.0 B
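
As a quick consistency check, the average record size can be recovered from the total memory size and the number of observations. The sketch below is illustrative only: it assumes 1 KiB = 1024 bytes and works from the rounded figures printed above, so it reproduces the reported value only approximately.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 173.8
n_observations = 3420
n_variables = 6
missing_cells = 0

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

# Share of missing cells over all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells: {missing_pct:.1f}%")
```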

Variable types

Numeric: 4
Text: 1
DateTime: 1

Dataset

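Description: Monthly boarding and alighting passenger counts from Seoul Metro (서울교통공사). The data consists of the serial number (연번), line (호선), station number (역번호), station name (역명), and monthly boarding and alighting passenger counts. The data is published on an annual basis, with files uploaded up to the one current as of December 2022.
URL: https://www.data.go.kr/data/15044249/fileData.do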

Alerts

High correlation: 호선 is highly overall correlated with 고유역번호(외부역코드)
High correlation: 고유역번호(외부역코드) is highly overall correlated with 호선
Unique: 연번 has unique values

Reproduction

Analysis started: 2023-12-12 23:21:42.746152
Analysis finished: 2023-12-12 23:21:45.081020
Duration: 2.33 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
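
A report of this kind could be regenerated along the following lines. This is a minimal sketch, not the configuration actually used for this report: the CSV file name in the usage note, the `cp949` encoding, and the report title are assumptions, and `dataset_metadata` and `build_report` are hypothetical helper names.

```python
from pathlib import Path


def dataset_metadata() -> dict:
    """Dataset description and URL, as shown in the Dataset section."""
    return {
        "description": "Monthly boarding and alighting counts from Seoul Metro",
        "url": "https://www.data.go.kr/data/15044249/fileData.do",
    }


def build_report(csv_path: Path, out_path: Path, encoding: str = "cp949") -> None:
    """Profile a CSV file and write an HTML report.

    Third-party imports live inside the function so this module can be
    imported even where pandas and ydata-profiling are not installed.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file has a 수송연월 column holding year-month strings.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["수송연월"])
    report = ProfileReport(
        df,
        title="Seoul Metro monthly ridership",  # assumed title
        dataset=dataset_metadata(),
    )
    report.to_file(out_path)
```

With a local copy of the data, a call such as `build_report(Path("ridership_2022.csv"), Path("report.html"))` would write the report; the file names here are placeholders.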

Variables

연번
Real number (ℝ)

UNIQUE

Distinct: 3420
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1710.5
Minimum: 1
Maximum: 3420
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 171.95
Q1: 855.75
Median: 1710.5
Q3: 2565.25
95-th percentile: 3249.05
Maximum: 3420
Range: 3419
Interquartile range (IQR): 1709.5

Descriptive statistics

Standard deviation: 987.41329
Coefficient of variation (CV): 0.57726588
Kurtosis: -1.2
Mean: 1710.5
Median Absolute Deviation (MAD): 855
Skewness: 0
Sum: 5849910
Variance: 974985
Monotonicity: Strictly increasing
Figure: Histogram with fixed size bins (bins=50)
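
Because 연번 takes every integer from 1 to 3420 exactly once, the statistics above can be checked directly. The sketch below is an illustration, not the profiler's own code; it assumes percentiles are computed by linear interpolation between closest ranks and that the variance uses the sample (n - 1) denominator, both of which agree with the figures reported here. The `percentile` helper is a hypothetical name.

```python
import statistics

n = 3420
values = list(range(1, n + 1))  # 연번: 1, 2, ..., n, each exactly once


def percentile(sorted_values, q):
    """Percentile by linear interpolation between the two closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return float(sorted_values[lower])
    step = sorted_values[lower + 1] - sorted_values[lower]
    return sorted_values[lower] + frac * step


mean = statistics.mean(values)
total = sum(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean

p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)

print(mean, total, variance)
print(round(std, 5), round(cv, 8))
print(round(p05, 2), round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

For such a sequence the closed forms are mean = (n + 1) / 2, sum = n(n + 1) / 2, and sample variance = n(n + 1) / 12.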
Common values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2274                  1       < 0.1%
2276                  1       < 0.1%
2277                  1       < 0.1%
2278                  1       < 0.1%
2279                  1       < 0.1%
2280                  1       < 0.1%
2281                  1       < 0.1%
2282                  1       < 0.1%
2283                  1       < 0.1%
Other values (3410)   3410    99.7%

Minimum 10 values

Value   Count   Frequency (%)
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%
10      1       < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
3420    1       < 0.1%
3419    1       < 0.1%
3418    1       < 0.1%
3417    1       < 0.1%
3416    1       < 0.1%
3415    1       < 0.1%
3414    1       < 0.1%
3413    1       < 0.1%
3412    1       < 0.1%
3411    1       < 0.1%

호선
Real number (ℝ)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8070175
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1624999
Coefficient of variation (CV): 0.44986312
Kurtosis: -0.99241848
Mean: 4.8070175
Median Absolute Deviation (MAD): 2
Skewness: 0.064046985
Sum: 16440
Variance: 4.6764058
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=9)
Common values

Value   Count   Frequency (%)
5       672     19.6%
2       600     17.5%
7       504     14.7%
6       444     13.0%
3       396     11.6%
4       312     9.1%
8       216     6.3%
9       156     4.6%
1       120     3.5%

Minimum 10 values

Value   Count   Frequency (%)
1       120     3.5%
2       600     17.5%
3       396     11.6%
4       312     9.1%
5       672     19.6%
6       444     13.0%
7       504     14.7%
8       216     6.3%
9       156     4.6%

Maximum 10 values

Value   Count   Frequency (%)
9       156     4.6%
8       216     6.3%
7       504     14.7%
6       444     13.0%
5       672     19.6%
4       312     9.1%
3       396     11.6%
2       600     17.5%
1       120     3.5%
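
The 호선 counts are all multiples of 12, which fits a layout of one row per station per month. Under that assumption (inferred from the report, which shows 12 distinct months and 12 rows per station code, rather than stated by the data provider), the number of stations on each line can be sketched as follows:

```python
# 호선 value counts copied from the frequency table: line -> number of rows.
line_counts = {
    1: 120, 2: 600, 3: 396, 4: 312, 5: 672,
    6: 444, 7: 504, 8: 216, 9: 156,
}

# Assumption: every station contributes one row per month of 2022.
months = 12

stations_per_line = {line: rows // months for line, rows in line_counts.items()}
total_rows = sum(line_counts.values())
total_stations = sum(stations_per_line.values())

# Mean of 호선 recovered from the frequency table.
mean_line = sum(line * rows for line, rows in line_counts.items()) / total_rows

for line, stations in stations_per_line.items():
    print(f"Line {line}: {stations} stations")
print(total_rows, total_stations, round(mean_line, 7))
```

The totals agree with the rest of the report: 3420 rows, 285 station entries (the distinct count of 고유역번호(외부역코드)), and a mean of 4.8070175.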

고유역번호(외부역코드)
Real number (ℝ)

HIGH CORRELATION

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1730.4456
Minimum: 150
Maximum: 4138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 150
5-th percentile: 205
Q1: 320
Median: 2534
Q3: 2712
95-th percentile: 2827
Maximum: 4138
Range: 3988
Interquartile range (IQR): 2392

Descriptive statistics

Standard deviation: 1260.5169
Coefficient of variation (CV): 0.72843483
Kurtosis: -1.5105945
Mean: 1730.4456
Median Absolute Deviation (MAD): 284
Skewness: -0.10072894
Sum: 5918124
Variance: 1588902.7
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
150                  12      0.4%
2625                 12      0.4%
2631                 12      0.4%
2630                 12      0.4%
2629                 12      0.4%
2628                 12      0.4%
2627                 12      0.4%
2626                 12      0.4%
2624                 12      0.4%
2633                 12      0.4%
Other values (275)   3300    96.5%

Minimum 10 values

Value   Count   Frequency (%)
150     12      0.4%
151     12      0.4%
152     12      0.4%
153     12      0.4%
154     12      0.4%
155     12      0.4%
156     12      0.4%
157     12      0.4%
158     12      0.4%
159     12      0.4%

Maximum 10 values

Value   Count   Frequency (%)
4138    12      0.4%
4137    12      0.4%
4136    12      0.4%
4135    12      0.4%
4134    12      0.4%
4133    12      0.4%
4132    12      0.4%
4131    12      0.4%
4130    12      0.4%
4129    12      0.4%

역명
Text

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB

Length

Max length: 12
Median length: 11
Mean length: 3.877193
Min length: 2

Characters and Unicode

Total characters: 13260
Distinct characters: 223
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
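
To illustrate how these character properties are obtained, the sketch below classifies the characters of the five station names shown in this variable's Sample rows by Unicode general category, using Python's standard `unicodedata` module. It is an illustration on five values only, not the profiler's implementation; the category codes `Lo`, `Nd`, `Ps`, and `Pe` correspond to Other Letter, Decimal Number, Open Punctuation, and Close Punctuation.

```python
import unicodedata
from collections import Counter

# The five station names listed under Sample for this variable.
sample_names = ["서울역(1)", "시청(1)", "종각", "종로3가(1)", "종로5가"]

# Count characters by Unicode general category.
category_counts = Counter(
    unicodedata.category(ch) for name in sample_names for ch in name
)

total_chars = sum(len(name) for name in sample_names)
mean_length = total_chars / len(sample_names)

print(dict(category_counts))
print(total_chars, mean_length)

# The reported mean length follows the same formula on the full column:
# total characters / number of observations.
reported_mean_length = 13260 / 3420
print(round(reported_mean_length, 6))
```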

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울역(1)
2nd row: 시청(1)
3rd row: 종각
4th row: 종로3가(1)
5th row: 종로5가

Value                Count   Frequency (%)
서울역(1              12      0.4%
약수(6                12      0.4%
이태원                12      0.4%
녹사평                12      0.4%
삼각지(6              12      0.4%
효창공원앞            12      0.4%
공덕(6                12      0.4%
대흥                  12      0.4%
상수                  12      0.4%
구산                  12      0.4%
Other values (275)   3300    96.5%

Most occurring characters

Value                Count   Frequency (%)
(                    1044    7.9%
)                    1044    7.9%
[illegible]          384     2.9%
[illegible]          336     2.5%
[illegible]          276     2.1%
[illegible]          264     2.0%
[illegible]          228     1.7%
5                    216     1.6%
[illegible]          192     1.4%
2                    192     1.4%
Other values (213)   9084    68.5%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        10044   75.7%
Decimal Number      1128    8.5%
Open Punctuation    1044    7.9%
Close Punctuation   1044    7.9%

Most frequent character per category

Other Letter

Value                Count   Frequency (%)
[illegible]          384     3.8%
[illegible]          336     3.3%
[illegible]          276     2.7%
[illegible]          264     2.6%
[illegible]          228     2.3%
[illegible]          192     1.9%
[illegible]          180     1.8%
[illegible]          180     1.8%
[illegible]          168     1.7%
[illegible]          168     1.7%
Other values (202)   7668    76.3%

Decimal Number

Value   Count   Frequency (%)
5       216     19.1%
2       192     17.0%
3       168     14.9%
6       132     11.7%
7       132     11.7%
4       108     9.6%
8       72      6.4%
1       72      6.4%
9       36      3.2%

Open Punctuation

Value   Count   Frequency (%)
(       1044    100.0%

Close Punctuation

Value   Count   Frequency (%)
)       1044    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   10044   75.7%
Common   3216    24.3%

Most frequent character per script

Hangul

Value                Count   Frequency (%)
[illegible]          384     3.8%
[illegible]          336     3.3%
[illegible]          276     2.7%
[illegible]          264     2.6%
[illegible]          228     2.3%
[illegible]          192     1.9%
[illegible]          180     1.8%
[illegible]          180     1.8%
[illegible]          168     1.7%
[illegible]          168     1.7%
Other values (202)   7668    76.3%

Common

Value   Count   Frequency (%)
(       1044    32.5%
)       1044    32.5%
5       216     6.7%
2       192     6.0%
3       168     5.2%
6       132     4.1%
7       132     4.1%
4       108     3.4%
8       72      2.2%
1       72      2.2%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   10044   75.7%
ASCII    3216    24.3%

Most frequent character per block

ASCII

Value   Count   Frequency (%)
(       1044    32.5%
)       1044    32.5%
5       216     6.7%
2       192     6.0%
3       168     5.2%
6       132     4.1%
7       132     4.1%
4       108     3.4%
8       72      2.2%
1       72      2.2%

Hangul

Value                Count   Frequency (%)
[illegible]          384     3.8%
[illegible]          336     3.3%
[illegible]          276     2.7%
[illegible]          264     2.6%
[illegible]          228     2.3%
[illegible]          192     1.9%
[illegible]          180     1.8%
[illegible]          180     1.8%
[illegible]          168     1.7%
[illegible]          168     1.7%
Other values (202)   7668    76.3%

수송연월
DateTime

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB
Minimum: 2022-01-01 00:00:00
Maximum: 2022-12-01 00:00:00
Figure: Histogram with fixed size bins (bins=12)

승하차인원수
Real number (ℝ)

Distinct: 3411
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 840326.24
Minimum: 29363
Maximum: 5046738
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 29363
5-th percentile: 172470.2
Q1: 385222.5
Median: 660951.5
Q3: 1038360.5
95-th percentile: 2191778.8
Maximum: 5046738
Range: 5017375
Interquartile range (IQR): 653138

Descriptive statistics

Standard deviation: 680615.8
Coefficient of variation (CV): 0.80994234
Kurtosis: 6.2607341
Mean: 840326.24
Median Absolute Deviation (MAD): 310237.5
Skewness: 2.1433149
Sum: 2.8739157 × 10^9
Variance: 4.6323787 × 10^11
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
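
Several of the 승하차인원수 statistics are derived from others, so they can be cross-checked with simple arithmetic. The sketch below uses only the rounded figures printed in this report, so agreement is approximate rather than exact; the variable names are illustrative.

```python
# Figures copied from the 승하차인원수 statistics.
n = 3420
mean = 840326.24
std = 680615.8
q1, q3 = 385222.5, 1038360.5
minimum, maximum = 29363, 5046738

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all observations
variance = std ** 2              # variance from the standard deviation

print(f"CV       = {cv:.8f}")
print(f"IQR      = {iqr:.0f}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.7e}")
print(f"Variance = {variance:.7e}")
```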
Common values

Value                 Count   Frequency (%)
448376                2       0.1%
558113                2       0.1%
180575                2       0.1%
536152                2       0.1%
555894                2       0.1%
1020863               2       0.1%
221018                2       0.1%
779387                2       0.1%
773837                2       0.1%
38524                 1       < 0.1%
Other values (3401)   3401    99.4%

Minimum 10 values

Value   Count   Frequency (%)
29363   1       < 0.1%
32073   1       < 0.1%
37913   1       < 0.1%
38524   1       < 0.1%
39941   1       < 0.1%
41144   1       < 0.1%
41199   1       < 0.1%
41861   1       < 0.1%
42998   1       < 0.1%
43000   1       < 0.1%

Maximum 10 values

Value     Count   Frequency (%)
5046738   1       < 0.1%
4783674   1       < 0.1%
4726082   1       < 0.1%
4710233   1       < 0.1%
4628444   1       < 0.1%
4459450   1       < 0.1%
4453941   1       < 0.1%
4407896   1       < 0.1%
4397082   1       < 0.1%
4380830   1       < 0.1%

Interactions

Figure: Pairwise interaction plots for the four numeric variables (16 panels)

Correlations

                         연번     호선     고유역번호(외부역코드)   수송연월   승하차인원수
연번                     1.000    0.204    0.181                   0.959      0.000
호선                     0.204    1.000    0.942                   0.000      0.481
고유역번호(외부역코드)   0.181    0.942    1.000                   0.000      0.476
수송연월                 0.959    0.000    0.000                   1.000      0.000
승하차인원수             0.000    0.481    0.476                   0.000      1.000

                         연번     호선     고유역번호(외부역코드)   승하차인원수
연번                     1.000    0.082    0.083                   0.070
호선                     0.082    1.000    0.989                   -0.400
고유역번호(외부역코드)   0.083    0.989    1.000                   -0.425
승하차인원수             0.070    -0.400   -0.425                  1.000
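
The strong association between 호선 and 고유역번호(외부역코드) can be illustrated on the twenty rows listed in the Sample section, where line 1 carries codes 150 to 159 and line 9 carries codes 4129 to 4138. The sketch below computes a plain Pearson coefficient on those rows only. It is not the measure behind either matrix above (the report text does not say which coefficients they show), and the `pearson` helper is a hypothetical name.

```python
import math

# (호선, 고유역번호) pairs from the first and last rows of the Sample section.
pairs = [(1, code) for code in range(150, 160)]
pairs += [(9, code) for code in range(4129, 4139)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


lines = [line for line, _ in pairs]
codes = [code for _, code in pairs]
r = pearson(lines, codes)
print(f"Pearson r on the 20 sample rows: {r:.6f}")
```

Since these rows cover only the two extreme lines, the coefficient is close to 1 by construction; the full-data values in the matrices above are the ones to rely on.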

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index   연번   호선   고유역번호(외부역코드)   역명           수송연월   승하차인원수
0       1      1      150                     서울역(1)      2022-01    2139094
1       2      1      151                     시청(1)        2022-01    1016011
2       3      1      152                     종각           2022-01    1626882
3       4      1      153                     종로3가(1)     2022-01    1258090
4       5      1      154                     종로5가        2022-01    1112653
5       6      1      155                     동대문(1)      2022-01    579385
6       7      1      156                     신설동(1)      2022-01    689637
7       8      1      157                     제기동         2022-01    971569
8       9      1      158                     청량리         2022-01    1083534
9       10     1      159                     동묘앞(1)      2022-01    502544

Last rows

Index   연번   호선   고유역번호(외부역코드)   역명             수송연월   승하차인원수
3410    3411   9      4129                    봉은사           2022-12    715674
3411    3412   9      4130                    종합운동장(9)    2022-12    173308
3412    3413   9      4131                    삼전             2022-12    263099
3413    3414   9      4132                    석촌고분         2022-12    241773
3414    3415   9      4133                    석촌(9)          2022-12    324957
3415    3416   9      4134                    송파나루         2022-12    217354
3416    3417   9      4135                    한성백제         2022-12    105457
3417    3418   9      4136                    올림픽공원(9)    2022-12    315912
3418    3419   9      4137                    둔촌오륜         2022-12    44869
3419    3420   9      4138                    중앙보훈병원     2022-12    373547