Overview

Dataset statistics

Number of variables: 6
Number of observations: 3420
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 173.8 KiB
Average record size in memory: 52.0 B
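
The average record size follows from the total memory size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 173.8      # total size in memory, KiB
n_observations = 3420       # number of rows

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```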

Variable types

Numeric: 4
Text: 1
Categorical: 1

Dataset

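Description: Monthly ridership data from Seoul Metro (서울교통공사). The data consists of a serial number (연번), line (호선), station number, and the monthly number of passengers carried. It is published as yearly files, uploaded up to the December 2022 file.
URL: https://www.data.go.kr/data/15044253/fileData.do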

Alerts

연번 is highly overall correlated with 수송연월 (High correlation)
호선 is highly overall correlated with 고유역번호(외부역코드) (High correlation)
고유역번호(외부역코드) is highly overall correlated with 호선 (High correlation)
수송연월 is highly overall correlated with 연번 (High correlation)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 06:00:34.651547
Analysis finished: 2023-12-12 06:00:37.484431
Duration: 2.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
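
The reported duration is the difference between the two timestamps. A small sketch using only the standard library, with the timestamps copied from the block above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 06:00:34.651547")
finished = datetime.fromisoformat("2023-12-12 06:00:37.484431")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```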

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 3420
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1710.5
Minimum: 1
Maximum: 3420
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 171.95
Q1: 855.75
Median: 1710.5
Q3: 2565.25
95-th percentile: 3249.05
Maximum: 3420
Range: 3419
Interquartile range (IQR): 1709.5
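
These quantiles are consistent with linear interpolation between order statistics, which is the default in pandas `quantile`. Whether the report used exactly that method is an assumption here, as is 연번 being the integer sequence 1..3420. The helper `percentile_linear` below is a hypothetical illustration, not a function from the profiling library:

```python
def percentile_linear(sorted_values, p):
    """Percentile p (0..1) using linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * p          # fractional zero-based index
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

# Assumption: 연번 is exactly 1, 2, ..., 3420.
serial = list(range(1, 3421))

q05 = percentile_linear(serial, 0.05)
q1 = percentile_linear(serial, 0.25)
median = percentile_linear(serial, 0.50)
q3 = percentile_linear(serial, 0.75)
q95 = percentile_linear(serial, 0.95)
iqr = q3 - q1

print(q05, q1, median, q3, q95, iqr)
```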

Descriptive statistics

Standard deviation: 987.41329
Coefficient of variation (CV): 0.57726588
Kurtosis: -1.2
Mean: 1710.5
Median Absolute Deviation (MAD): 855
Skewness: 0
Sum: 5849910
Variance: 974985
Monotonicity: Strictly increasing
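
Because 연번 is strictly increasing with 3420 distinct values between 1 and 3420, it behaves like the sequence 1..n, and the figures above match the closed forms for that sequence. A minimal sketch, assuming 연번 is exactly 1..n and that the reported variance is the sample variance (divisor n - 1):

```python
import math

n = 3420  # number of observations; 연번 is assumed to be exactly 1..n

total = n * (n + 1) // 2        # sum of 1..n
mean = (n + 1) / 2              # arithmetic mean of 1..n
variance = n * (n + 1) / 12     # sample variance of 1..n (divisor n - 1)
std = math.sqrt(variance)
cv = std / mean                 # coefficient of variation

print(total, mean, variance, round(std, 5), round(cv, 8))
```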
Figure: histogram with fixed-size bins (bins=50)

Common values
Value  Count  Frequency (%)
1      1      < 0.1%
2274   1      < 0.1%
2276   1      < 0.1%
2277   1      < 0.1%
2278   1      < 0.1%
2279   1      < 0.1%
2280   1      < 0.1%
2281   1      < 0.1%
2282   1      < 0.1%
2283   1      < 0.1%
Other values (3410)  3410  99.7%

Smallest values
Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Largest values
Value  Count  Frequency (%)
3420   1      < 0.1%
3419   1      < 0.1%
3418   1      < 0.1%
3417   1      < 0.1%
3416   1      < 0.1%
3415   1      < 0.1%
3414   1      < 0.1%
3413   1      < 0.1%
3412   1      < 0.1%
3411   1      < 0.1%

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8070175
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1624999
Coefficient of variation (CV): 0.44986312
Kurtosis: -0.99241848
Mean: 4.8070175
Median Absolute Deviation (MAD): 2
Skewness: 0.064046985
Sum: 16440
Variance: 4.6764058
Monotonicity: Not monotonic
Figure: histogram with fixed-size bins (bins=9)

Common values
Value  Count  Frequency (%)
5      672    19.6%
2      600    17.5%
7      504    14.7%
6      444    13.0%
3      396    11.6%
4      312    9.1%
8      216    6.3%
9      156    4.6%
1      120    3.5%

Smallest values
Value  Count  Frequency (%)
1      120    3.5%
2      600    17.5%
3      396    11.6%
4      312    9.1%
5      672    19.6%
6      444    13.0%
7      504    14.7%
8      216    6.3%
9      156    4.6%

Largest values
Value  Count  Frequency (%)
9      156    4.6%
8      216    6.3%
7      504    14.7%
6      444    13.0%
5      672    19.6%
4      312    9.1%
3      396    11.6%
2      600    17.5%
1      120    3.5%
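
The row counts per line can be turned into frequency shares and, under one assumption, into the number of station codes per line. The sketch below assumes every station code contributes exactly one row for each of the 12 months in 수송연월, which is consistent with 285 distinct codes and 3420 rows but is not stated in the report:

```python
# Row counts per line (호선), copied from the frequency table above.
rows_per_line = {1: 120, 2: 600, 3: 396, 4: 312, 5: 672,
                 6: 444, 7: 504, 8: 216, 9: 156}
months = 12  # distinct values of 수송연월

total_rows = sum(rows_per_line.values())

# Share of all rows per line, in percent, rounded as in the report.
share_pct = {line: round(100 * rows / total_rows, 1)
             for line, rows in rows_per_line.items()}

# Assumption: one row per station code per month.
stations_per_line = {line: rows // months
                     for line, rows in rows_per_line.items()}

print(total_rows, sum(stations_per_line.values()))
print(stations_per_line)
```

Under that assumption the per-line counts add up to the 285 distinct station codes reported for 고유역번호(외부역코드).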

고유역번호(외부역코드)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1730.4456
Minimum: 150
Maximum: 4138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 150
5-th percentile: 205
Q1: 320
Median: 2534
Q3: 2712
95-th percentile: 2827
Maximum: 4138
Range: 3988
Interquartile range (IQR): 2392

Descriptive statistics

Standard deviation: 1260.5169
Coefficient of variation (CV): 0.72843483
Kurtosis: -1.5105945
Mean: 1730.4456
Median Absolute Deviation (MAD): 284
Skewness: -0.10072894
Sum: 5918124
Variance: 1588902.7
Monotonicity: Not monotonic
Figure: histogram with fixed-size bins (bins=50)

Common values
Value  Count  Frequency (%)
150    12     0.4%
2625   12     0.4%
2631   12     0.4%
2630   12     0.4%
2629   12     0.4%
2628   12     0.4%
2627   12     0.4%
2626   12     0.4%
2624   12     0.4%
2633   12     0.4%
Other values (275)  3300  96.5%

Smallest values
Value  Count  Frequency (%)
150    12     0.4%
151    12     0.4%
152    12     0.4%
153    12     0.4%
154    12     0.4%
155    12     0.4%
156    12     0.4%
157    12     0.4%
158    12     0.4%
159    12     0.4%

Largest values
Value  Count  Frequency (%)
4138   12     0.4%
4137   12     0.4%
4136   12     0.4%
4135   12     0.4%
4134   12     0.4%
4133   12     0.4%
4132   12     0.4%
4131   12     0.4%
4130   12     0.4%
4129   12     0.4%

역명
Text

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB

Length

Max length: 12
Median length: 11
Mean length: 3.877193
Min length: 2

Characters and Unicode

Total characters: 13260
Distinct characters: 223
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
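
As an illustration of those character properties, the sketch below tallies the Unicode general category of each character in the five sample station names listed for this variable, using Python's standard `unicodedata` module. The category codes `Lo`, `Nd`, `Ps` and `Pe` correspond to Other Letter, Decimal Number, Open Punctuation and Close Punctuation. It covers only those five names, not the full column:

```python
import unicodedata
from collections import Counter

# The five sample station names listed for this variable.
names = ["서울역(1)", "시청(1)", "종각", "종로3가(1)", "종로5가"]

# Tally the Unicode general category of every character.
categories = Counter(unicodedata.category(ch) for name in names for ch in name)

# Mean length in characters over the five names.
mean_length = sum(len(name) for name in names) / len(names)

print(dict(categories))
print(mean_length)
```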

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울역(1)
2nd row: 시청(1)
3rd row: 종각
4th row: 종로3가(1)
5th row: 종로5가

Value  Count  Frequency (%)
서울역(1  12  0.4%
약수(6  12  0.4%
이태원  12  0.4%
녹사평  12  0.4%
삼각지(6  12  0.4%
효창공원앞  12  0.4%
공덕(6  12  0.4%
대흥  12  0.4%
상수  12  0.4%
구산  12  0.4%
Other values (275)  3300  96.5%

Most occurring characters

Value      Count  Frequency (%)
(          1044   7.9%
)          1044   7.9%
(missing)  384    2.9%
(missing)  336    2.5%
(missing)  276    2.1%
(missing)  264    2.0%
(missing)  228    1.7%
5          216    1.6%
(missing)  192    1.4%
2          192    1.4%
Other values (213)  9084  68.5%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       10044  75.7%
Decimal Number     1128   8.5%
Open Punctuation   1044   7.9%
Close Punctuation  1044   7.9%

Most frequent character per category

Other Letter
Value      Count  Frequency (%)
(missing)  384    3.8%
(missing)  336    3.3%
(missing)  276    2.7%
(missing)  264    2.6%
(missing)  228    2.3%
(missing)  192    1.9%
(missing)  180    1.8%
(missing)  180    1.8%
(missing)  168    1.7%
(missing)  168    1.7%
Other values (202)  7668  76.3%

Decimal Number
Value  Count  Frequency (%)
5      216    19.1%
2      192    17.0%
3      168    14.9%
6      132    11.7%
7      132    11.7%
4      108    9.6%
8      72     6.4%
1      72     6.4%
9      36     3.2%

Open Punctuation
Value  Count  Frequency (%)
(      1044   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1044   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  10044  75.7%
Common  3216   24.3%

Most frequent character per script

Hangul
Value      Count  Frequency (%)
(missing)  384    3.8%
(missing)  336    3.3%
(missing)  276    2.7%
(missing)  264    2.6%
(missing)  228    2.3%
(missing)  192    1.9%
(missing)  180    1.8%
(missing)  180    1.8%
(missing)  168    1.7%
(missing)  168    1.7%
Other values (202)  7668  76.3%

Common
Value  Count  Frequency (%)
(      1044   32.5%
)      1044   32.5%
5      216    6.7%
2      192    6.0%
3      168    5.2%
6      132    4.1%
7      132    4.1%
4      108    3.4%
8      72     2.2%
1      72     2.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  10044  75.7%
ASCII   3216   24.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(      1044   32.5%
)      1044   32.5%
5      216    6.7%
2      192    6.0%
3      168    5.2%
6      132    4.1%
7      132    4.1%
4      108    3.4%
8      72     2.2%
1      72     2.2%

Hangul
Value      Count  Frequency (%)
(missing)  384    3.8%
(missing)  336    3.3%
(missing)  276    2.7%
(missing)  264    2.6%
(missing)  228    2.3%
(missing)  192    1.9%
(missing)  180    1.8%
(missing)  180    1.8%
(missing)  168    1.7%
(missing)  168    1.7%
Other values (202)  7668  76.3%

수송연월
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB

Value    Count
2022-01  285
2022-02  285
2022-03  285
2022-04  285
2022-05  285
Other values (7)  1995

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-01
2nd row: 2022-01
3rd row: 2022-01
4th row: 2022-01
5th row: 2022-01

Common Values

Value    Count  Frequency (%)
2022-01  285    8.3%
2022-02  285    8.3%
2022-03  285    8.3%
2022-04  285    8.3%
2022-05  285    8.3%
2022-06  285    8.3%
2022-07  285    8.3%
2022-08  285    8.3%
2022-09  285    8.3%
2022-10  285    8.3%
Other values (2)  570  16.7%

Length

Figure: histogram of lengths of the category

수송인원수
Real number (ℝ)

Distinct: 3415
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 646515.9
Minimum: 35710
Maximum: 3487771
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 35710
5-th percentile: 131222.95
Q1: 316197.5
Median: 514526
Q3: 803657.5
95-th percentile: 1702785
Maximum: 3487771
Range: 3452061
Interquartile range (IQR): 487460

Descriptive statistics

Standard deviation: 504783.03
Coefficient of variation (CV): 0.78077435
Kurtosis: 5.0127787
Mean: 646515.9
Median Absolute Deviation (MAD): 226265.5
Skewness: 1.9727126
Sum: 2.2110844 × 10^9
Variance: 2.5480591 × 10^11
Monotonicity: Not monotonic
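
Several of the 수송인원수 figures are derived from others in the same tables. The sketch below recomputes the range, IQR, coefficient of variation, sum and variance from the reported (rounded) inputs, so the results agree with the report only up to rounding:

```python
# Inputs copied from the statistics above (rounded as reported).
n = 3420
minimum, maximum = 35710, 3487771
q1, q3 = 316197.5, 803657.5
mean, std = 646515.9, 504783.03

value_range = maximum - minimum     # Range
iqr = q3 - q1                       # Interquartile range
cv = std / mean                     # Coefficient of variation
total = mean * n                    # Sum, approximately 2.2110844e9
variance = std ** 2                 # Variance, approximately 2.5480591e11

print(value_range, iqr, round(cv, 6), f"{total:.7e}", f"{variance:.7e}")
```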
Figure: histogram with fixed-size bins (bins=50)

Common values
Value    Count  Frequency (%)
419526   2      0.1%
215912   2      0.1%
381652   2      0.1%
743545   2      0.1%
399622   2      0.1%
1817867  1      < 0.1%
879118   1      < 0.1%
47304    1      < 0.1%
485532   1      < 0.1%
2301246  1      < 0.1%
Other values (3405)  3405  99.6%

Smallest values
Value  Count  Frequency (%)
35710  1      < 0.1%
39500  1      < 0.1%
41259  1      < 0.1%
43648  1      < 0.1%
45579  1      < 0.1%
47192  1      < 0.1%
47304  1      < 0.1%
47879  1      < 0.1%
48396  1      < 0.1%
48800  1      < 0.1%

Largest values
Value    Count  Frequency (%)
3487771  1      < 0.1%
3329938  1      < 0.1%
3324704  1      < 0.1%
3292051  1      < 0.1%
3283757  1      < 0.1%
3198271  1      < 0.1%
3192214  1      < 0.1%
3099704  1      < 0.1%
3058742  1      < 0.1%
3057157  1      < 0.1%

Interactions

Figure: pairwise scatter plots of the numeric variables (16 panels)

Correlations

Figure: correlation heatmap

Variable  연번  호선  고유역번호(외부역코드)  수송연월  수송인원수
연번  1.000  0.204  0.181  0.959  0.065
호선  0.204  1.000  0.942  0.000  0.452
고유역번호(외부역코드)  0.181  0.942  1.000  0.000  0.445
수송연월  0.959  0.000  0.000  1.000  0.034
수송인원수  0.065  0.452  0.445  0.034  1.000

Figure: correlation heatmap

Variable  연번  호선  고유역번호(외부역코드)  수송인원수  수송연월
연번  1.000  0.082  0.083  0.080  0.838
호선  0.082  1.000  0.989  -0.349  0.000
고유역번호(외부역코드)  0.083  0.989  1.000  -0.374  0.000
수송인원수  0.080  -0.349  -0.374  1.000  0.015
수송연월  0.838  0.000  0.000  0.015  1.000
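
The "High correlation" alerts near the top of the report can be related to these coefficients by flagging every pair whose absolute value reaches a cut-off. The sketch below uses the off-diagonal values of the first correlation table. The threshold of 0.5 is an assumption, since the report does not state the cut-off it applied, and the dictionary layout is purely illustrative:

```python
# Off-diagonal entries (upper triangle) of the first correlation table above.
corr = {
    ("연번", "호선"): 0.204,
    ("연번", "고유역번호(외부역코드)"): 0.181,
    ("연번", "수송연월"): 0.959,
    ("연번", "수송인원수"): 0.065,
    ("호선", "고유역번호(외부역코드)"): 0.942,
    ("호선", "수송연월"): 0.000,
    ("호선", "수송인원수"): 0.452,
    ("고유역번호(외부역코드)", "수송연월"): 0.000,
    ("고유역번호(외부역코드)", "수송인원수"): 0.445,
    ("수송연월", "수송인원수"): 0.034,
}

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

# Keep the pairs whose absolute coefficient reaches the threshold.
flagged = [pair for pair, r in corr.items() if abs(r) >= THRESHOLD]
for a, b in flagged:
    print(f"{a} is highly correlated with {b}")
```

With this assumed threshold, the two flagged pairs are the same ones named in the alerts: 연번 with 수송연월, and 호선 with 고유역번호(외부역코드).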

Missing values

Figure: a simple visualization of nullity by column.
Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
(index)  연번  호선  고유역번호(외부역코드)  역명  수송연월  수송인원수
0  1  1  150  서울역(1)  2022-01  1817867
1  2  1  151  시청(1)  2022-01  907397
2  3  1  152  종각  2022-01  1510584
3  4  1  153  종로3가(1)  2022-01  1021499
4  5  1  154  종로5가  2022-01  1009266
5  6  1  155  동대문(1)  2022-01  473477
6  7  1  156  신설동(1)  2022-01  568506
7  8  1  157  제기동  2022-01  845594
8  9  1  158  청량리  2022-01  931116
9  10  1  159  동묘앞(1)  2022-01  412234

Last rows
(index)  연번  호선  고유역번호(외부역코드)  역명  수송연월  수송인원수
3410  3411  9  4129  봉은사  2022-12  1144553
3411  3412  9  4130  종합운동장(9)  2022-12  268140
3412  3413  9  4131  삼전  2022-12  363282
3413  3414  9  4132  석촌고분  2022-12  345068
3414  3415  9  4133  석촌(9)  2022-12  450396
3415  3416  9  4134  송파나루  2022-12  303843
3416  3417  9  4135  한성백제  2022-12  131958
3417  3418  9  4136  올림픽공원(9)  2022-12  487406
3418  3419  9  4137  둔촌오륜  2022-12  56621
3419  3420  9  4138  중앙보훈병원  2022-12  527143