Overview

Dataset statistics

Number of variables: 6
Number of observations: 3420
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 173.8 KiB
Average record size in memory: 52.0 B
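
The memory figures above agree with each other, and the row count agrees with the distinct counts reported for the station and month variables further down. A small arithmetic sketch, with every input copied from this report (the variable names are only illustrative):

```python
# Figures copied from this report.
total_kib = 173.8      # Total size in memory (KiB)
n_rows = 3420          # Number of observations
n_stations = 285       # distinct station codes / station names
n_months = 12          # distinct months in the date variable

# Average record size = total bytes / number of rows
avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B per record")

# One row per station per month would give exactly n_rows rows.
rows_if_complete = n_stations * n_months
print(rows_if_complete == n_rows)
```

The second check only shows that the counts are compatible with one row per station per month; the report itself does not state that layout.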

Variable types

Numeric: 4
Text: 1
DateTime: 1

Dataset

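Description: Monthly alighting passenger counts published by Seoul Metro (서울교통공사). The data consists of a serial number (연번), line (호선), station number (역번호), and station name (역명). It is released as yearly data, and files are uploaded up to the December 2022 edition.
URL: https://www.data.go.kr/data/15044247/fileData.do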

Alerts

호선 is highly overall correlated with 고유역번호(외부역코드) [High correlation]
고유역번호(외부역코드) is highly overall correlated with 호선 [High correlation]
연번 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 21:01:31.392845
Analysis finished: 2023-12-12 21:01:33.979716
Duration: 2.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
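
A report like this one can be regenerated with the `ProfileReport` class from ydata-profiling. The following is a minimal sketch, not the configuration actually used here: the file name, the `cp949` encoding, and the report title are assumptions, and `build_report` is a hypothetical helper.

```python
def build_report(csv_path: str, out_path: str = "report.html",
                 encoding: str = "cp949") -> None:
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so that defining this function does not require
    # the third-party packages to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the month column is parsed as a date, which would explain
    # why the report lists one DateTime variable.
    df = pd.read_csv(csv_path, encoding=encoding, parse_dates=["수송연월"])
    profile = ProfileReport(df, title="Seoul Metro monthly alighting passengers, 2022")
    profile.to_file(out_path)
```

A call such as `build_report("seoul_metro_alighting_2022.csv")` would write `report.html` next to the script; the CSV name is a placeholder for the file downloaded from the URL above.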

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 3420
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1710.5
Minimum: 1
Maximum: 3420
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 171.95
Q1: 855.75
Median: 1710.5
Q3: 2565.25
95-th percentile: 3249.05
Maximum: 3420
Range: 3419
Interquartile range (IQR): 1709.5
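
The fractional quantiles above are what linear interpolation between neighbouring order statistics produces. The sketch below assumes that definition and assumes 연번 is exactly the sequence 1..3420 (which follows from 3420 distinct integer values between 1 and 3420); `percentile` is an illustrative helper, not a function of the profiling library.

```python
def percentile(sorted_values, q):
    """q-th percentile (0-100) of an ascending sequence, linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

serial = range(1, 3421)  # assumed contents of 연번
q1, q3 = percentile(serial, 25), percentile(serial, 75)
print(percentile(serial, 5), q1, percentile(serial, 50), q3, q3 - q1)
```

Up to floating-point rounding, this reproduces the 5-th percentile, Q1, median, Q3, and IQR listed above.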

Descriptive statistics

Standard deviation: 987.41329
Coefficient of variation (CV): 0.57726588
Kurtosis: -1.2
Mean: 1710.5
Median Absolute Deviation (MAD): 855
Skewness: 0
Sum: 5849910
Variance: 974985
Monotonicity: Strictly increasing
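
Because 연번 is a plain row counter, its descriptive statistics can be recomputed from scratch. The sketch below uses only the standard library and assumes the column is exactly 1..3420; it also assumes the variance is the sample variance (divisor n - 1), which matches the reported value.

```python
import statistics

serial = range(1, 3421)                  # assumed contents of 연번
n = len(serial)

total = sum(serial)                      # closed form: n(n+1)/2
mean = statistics.mean(serial)           # closed form: (n+1)/2
variance = statistics.variance(serial)   # sample variance: n(n+1)/12
std = statistics.stdev(serial)
cv = std / mean                          # coefficient of variation
median = statistics.median(serial)
# Median absolute deviation: median distance from the median
mad = statistics.median(abs(x - median) for x in serial)

print(total, mean, variance, round(std, 5), round(cv, 8), mad)
```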
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1                    1      < 0.1%
2274                 1      < 0.1%
2276                 1      < 0.1%
2277                 1      < 0.1%
2278                 1      < 0.1%
2279                 1      < 0.1%
2280                 1      < 0.1%
2281                 1      < 0.1%
2282                 1      < 0.1%
2283                 1      < 0.1%
Other values (3410)  3410   99.7%

Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%

Value  Count  Frequency (%)
3420   1      < 0.1%
3419   1      < 0.1%
3418   1      < 0.1%
3417   1      < 0.1%
3416   1      < 0.1%
3415   1      < 0.1%
3414   1      < 0.1%
3413   1      < 0.1%
3412   1      < 0.1%
3411   1      < 0.1%

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.8070175
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1624999
Coefficient of variation (CV): 0.44986312
Kurtosis: -0.99241848
Mean: 4.8070175
Median Absolute Deviation (MAD): 2
Skewness: 0.064046985
Sum: 16440
Variance: 4.6764058
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value  Count  Frequency (%)
5      672    19.6%
2      600    17.5%
7      504    14.7%
6      444    13.0%
3      396    11.6%
4      312    9.1%
8      216    6.3%
9      156    4.6%
1      120    3.5%

Value  Count  Frequency (%)
1      120    3.5%
2      600    17.5%
3      396    11.6%
4      312    9.1%
5      672    19.6%
6      444    13.0%
7      504    14.7%
8      216    6.3%
9      156    4.6%

Value  Count  Frequency (%)
9      156    4.6%
8      216    6.3%
7      504    14.7%
6      444    13.0%
5      672    19.6%
4      312    9.1%
3      396    11.6%
2      600    17.5%
1      120    3.5%
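
Since 호선 takes only nine values, its sum, mean, and variance can be recovered from the frequency table alone. A minimal sketch, with the counts copied from the table above and the sample variance (divisor n - 1) assumed:

```python
# (value: count) pairs copied from the 호선 frequency table
counts = {1: 120, 2: 600, 3: 396, 4: 312, 5: 672, 6: 444, 7: 504, 8: 216, 9: 156}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n
# Sample variance computed from grouped data
variance = sum(count * (value - mean) ** 2 for value, count in counts.items()) / (n - 1)
# Percentage share of each line, rounded as in the table
share = {value: round(100 * count / n, 1) for value, count in counts.items()}

print(n, total, round(mean, 7), round(variance, 7))
```

The results agree with the reported Sum, Mean, and Variance, which confirms the frequency table and the descriptive statistics describe the same column.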

고유역번호(외부역코드)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1730.4456
Minimum: 150
Maximum: 4138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 150
5-th percentile: 205
Q1: 320
Median: 2534
Q3: 2712
95-th percentile: 2827
Maximum: 4138
Range: 3988
Interquartile range (IQR): 2392

Descriptive statistics

Standard deviation: 1260.5169
Coefficient of variation (CV): 0.72843483
Kurtosis: -1.5105945
Mean: 1730.4456
Median Absolute Deviation (MAD): 284
Skewness: -0.10072894
Sum: 5918124
Variance: 1588902.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
150                 12     0.4%
2625                12     0.4%
2631                12     0.4%
2630                12     0.4%
2629                12     0.4%
2628                12     0.4%
2627                12     0.4%
2626                12     0.4%
2624                12     0.4%
2633                12     0.4%
Other values (275)  3300   96.5%

Value  Count  Frequency (%)
150    12     0.4%
151    12     0.4%
152    12     0.4%
153    12     0.4%
154    12     0.4%
155    12     0.4%
156    12     0.4%
157    12     0.4%
158    12     0.4%
159    12     0.4%

Value  Count  Frequency (%)
4138   12     0.4%
4137   12     0.4%
4136   12     0.4%
4135   12     0.4%
4134   12     0.4%
4133   12     0.4%
4132   12     0.4%
4131   12     0.4%
4130   12     0.4%
4129   12     0.4%

역명
Text

Distinct: 285
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB

Length

Max length: 12
Median length: 11
Mean length: 3.877193
Min length: 2

Characters and Unicode

Total characters: 13260
Distinct characters: 223
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
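
The categories reported for this variable (Other Letter, Decimal Number, Open Punctuation, Close Punctuation) correspond to the Unicode general categories `Lo`, `Nd`, `Ps`, and `Pe`. A minimal sketch with the standard `unicodedata` module, applied only to the five station names shown in the Sample block, so the counts are for those five strings and not for the whole column:

```python
import unicodedata
from collections import Counter

# First five 역명 values listed in this report's Sample block
names = ["서울역(1)", "시청(1)", "종각", "종로3가(1)", "종로5가"]

categories = Counter(
    unicodedata.category(ch)
    for name in names
    # NFC keeps each Hangul syllable as a single code point
    for ch in unicodedata.normalize("NFC", name)
)
print(categories)
```

Hangul syllables fall under `Lo`, the digits under `Nd`, and the parentheses under `Ps` and `Pe`, which is why the report finds exactly four categories.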

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울역(1)
2nd row: 시청(1)
3rd row: 종각
4th row: 종로3가(1)
5th row: 종로5가
Value                 Count  Frequency (%)
서울역(1              12     0.4%
약수(6                12     0.4%
이태원                12     0.4%
녹사평                12     0.4%
삼각지(6              12     0.4%
효창공원앞            12     0.4%
공덕(6                12     0.4%
대흥                  12     0.4%
상수                  12     0.4%
구산                  12     0.4%
Other values (275)    3300   96.5%

Most occurring characters

Value               Count  Frequency (%)
(                   1044   7.9%
)                   1044   7.9%
                    384    2.9%
                    336    2.5%
                    276    2.1%
                    264    2.0%
                    228    1.7%
5                   216    1.6%
                    192    1.4%
2                   192    1.4%
Other values (213)  9084   68.5%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       10044  75.7%
Decimal Number     1128   8.5%
Open Punctuation   1044   7.9%
Close Punctuation  1044   7.9%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    384    3.8%
                    336    3.3%
                    276    2.7%
                    264    2.6%
                    228    2.3%
                    192    1.9%
                    180    1.8%
                    180    1.8%
                    168    1.7%
                    168    1.7%
Other values (202)  7668   76.3%

Decimal Number
Value  Count  Frequency (%)
5      216    19.1%
2      192    17.0%
3      168    14.9%
6      132    11.7%
7      132    11.7%
4      108    9.6%
8      72     6.4%
1      72     6.4%
9      36     3.2%

Open Punctuation
Value  Count  Frequency (%)
(      1044   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1044   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  10044  75.7%
Common  3216   24.3%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    384    3.8%
                    336    3.3%
                    276    2.7%
                    264    2.6%
                    228    2.3%
                    192    1.9%
                    180    1.8%
                    180    1.8%
                    168    1.7%
                    168    1.7%
Other values (202)  7668   76.3%

Common
Value  Count  Frequency (%)
(      1044   32.5%
)      1044   32.5%
5      216    6.7%
2      192    6.0%
3      168    5.2%
6      132    4.1%
7      132    4.1%
4      108    3.4%
8      72     2.2%
1      72     2.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  10044  75.7%
ASCII   3216   24.3%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(      1044   32.5%
)      1044   32.5%
5      216    6.7%
2      192    6.0%
3      168    5.2%
6      132    4.1%
7      132    4.1%
4      108    3.4%
8      72     2.2%
1      72     2.2%

Hangul
Value               Count  Frequency (%)
                    384    3.8%
                    336    3.3%
                    276    2.7%
                    264    2.6%
                    228    2.3%
                    192    1.9%
                    180    1.8%
                    180    1.8%
                    168    1.7%
                    168    1.7%
Other values (202)  7668   76.3%

수송연월
DateTime

Distinct: 12
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 26.8 KiB
Minimum: 2022-01-01 00:00:00
Maximum: 2022-12-01 00:00:00
Histogram with fixed size bins (bins=12)
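
The 수송연월 values appear as `YYYY-MM` strings in the sample rows, while the minimum and maximum above are full timestamps at midnight on the first of the month. A minimal sketch of that conversion with the standard library (`parse_month` is an illustrative helper, and the twelve month strings are generated here rather than read from the file):

```python
from datetime import datetime

def parse_month(text: str) -> datetime:
    """Parse a 'YYYY-MM' string into a datetime on the first day of that month."""
    return datetime.strptime(text, "%Y-%m")

# The twelve months of 2022, matching the 12 distinct values reported above
months = [parse_month(f"2022-{m:02d}") for m in range(1, 13)]
print(min(months), max(months), len(set(months)))
```

Parsing fills the missing day and time with their defaults, which is how a monthly label ends up as `2022-01-01 00:00:00`.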

하차인원수
Real number (ℝ)

Distinct: 3414
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 416697.87
Minimum: 7458
Maximum: 2507715
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: 7458
5-th percentile: 51653.7
Q1: 189706.75
Median: 325663
Q3: 526504.75
95-th percentile: 1119433.7
Maximum: 2507715
Range: 2500257
Interquartile range (IQR): 336798

Descriptive statistics

Standard deviation: 345840.1
Coefficient of variation (CV): 0.82995407
Kurtosis: 5.8257419
Mean: 416697.87
Median Absolute Deviation (MAD): 160034.5
Skewness: 2.0611188
Sum: 1.4251067 × 10^9
Variance: 1.1960537 × 10^11
Monotonicity: Not monotonic
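
The sum and variance of 하차인원수 are shown in scientific notation (about 1.43 × 10^9 and 1.20 × 10^11), so it is worth confirming they agree with the mean, standard deviation, and row count. A small consistency check, with all inputs copied from this report and a tolerance chosen to absorb the rounding of the printed values:

```python
import math

n = 3420                          # Number of observations
mean = 416697.87
std = 345840.1
reported_sum = 1.4251067e9
reported_variance = 1.1960537e11
reported_cv = 0.82995407

print(math.isclose(mean * n, reported_sum, rel_tol=1e-6))        # Sum = Mean x n
print(math.isclose(std ** 2, reported_variance, rel_tol=1e-6))   # Variance = std^2
print(math.isclose(std / mean, reported_cv, rel_tol=1e-6))       # CV = std / Mean
```

The mean (416697.87) also sits well above the median (325663), in line with the positive skewness of 2.06.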
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
802419               2      0.1%
346966               2      0.1%
140896               2      0.1%
135989               2      0.1%
197660               2      0.1%
234184               2      0.1%
43495                1      < 0.1%
9659                 1      < 0.1%
57410                1      < 0.1%
1338840              1      < 0.1%
Other values (3404)  3404   99.5%

Value  Count  Frequency (%)
7458   1      < 0.1%
8098   1      < 0.1%
9659   1      < 0.1%
9678   1      < 0.1%
9904   1      < 0.1%
10223  1      < 0.1%
10236  1      < 0.1%
10419  1      < 0.1%
10429  1      < 0.1%
10500  1      < 0.1%

Value    Count  Frequency (%)
2507715  1      < 0.1%
2369384  1      < 0.1%
2364056  1      < 0.1%
2333099  1      < 0.1%
2280881  1      < 0.1%
2242735  1      < 0.1%
2214810  1      < 0.1%
2212989  1      < 0.1%
2182162  1      < 0.1%
2176484  1      < 0.1%

Interactions


Correlations

                        연번   호선   고유역번호(외부역코드)  수송연월  하차인원수
연번                    1.000  0.204  0.181                   0.959     0.000
호선                    0.204  1.000  0.942                   0.000     0.492
고유역번호(외부역코드)  0.181  0.942  1.000                   0.000     0.489
수송연월                0.959  0.000  0.000                   1.000     0.000
하차인원수              0.000  0.492  0.489                   0.000     1.000

                        연번   호선    고유역번호(외부역코드)  하차인원수
연번                    1.000  0.082   0.083                   0.062
호선                    0.082  1.000   0.989                   -0.426
고유역번호(외부역코드)  0.083  0.989   1.000                   -0.450
하차인원수              0.062  -0.426  -0.450                  1.000
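
The strong association between 호선 and 고유역번호(외부역코드) is visible even in the 20 rows listed in the Sample section. The sketch below computes a Pearson coefficient on those rows only. Pearson is an assumption made for illustration, since the report text does not name the coefficient behind either matrix, so the result is not expected to match the matrix entries.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# 호선 and 고유역번호(외부역코드) from the first and last ten sample rows
line = [1] * 10 + [9] * 10
station_code = list(range(150, 160)) + list(range(4129, 4139))

r = pearson(line, station_code)
print(round(r, 4))
```

The coefficient is close to 1 because the station code jumps from the 150s to the 4100s as the line number moves from 1 to 9, which is the pattern behind the high-correlation alert.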

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

      연번  호선  고유역번호(외부역코드)  역명            수송연월  하차인원수
0     1     1     150                     서울역(1)       2022-01   1050145
1     2     1     151                     시청(1)         2022-01   508237
2     3     1     152                     종각            2022-01   796150
3     4     1     153                     종로3가(1)      2022-01   601965
4     5     1     154                     종로5가         2022-01   552920
5     6     1     155                     동대문(1)       2022-01   281274
6     7     1     156                     신설동(1)       2022-01   338965
7     8     1     157                     제기동          2022-01   492913
8     9     1     158                     청량리          2022-01   544757
9     10    1     159                     동묘앞(1)       2022-01   255518

      연번  호선  고유역번호(외부역코드)  역명            수송연월  하차인원수
3410  3411  9     4129                    봉은사          2022-12   104712
3411  3412  9     4130                    종합운동장(9)   2022-12   24132
3412  3413  9     4131                    삼전            2022-12   50872
3413  3414  9     4132                    석촌고분        2022-12   45030
3414  3415  9     4133                    석촌(9)         2022-12   55644
3415  3416  9     4134                    송파나루        2022-12   48506
3416  3417  9     4135                    한성백제        2022-12   23233
3417  3418  9     4136                    올림픽공원(9)   2022-12   50955
3418  3419  9     4137                    둔촌오륜        2022-12   10561
3419  3420  9     4138                    중앙보훈병원    2022-12   60826