Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 664.1 KiB
Average record size in memory: 68.0 B

Variable types

Numeric: 4
Text: 1
Categorical: 2

Dataset

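Description: Boarding counts from Seoul Metro (서울교통공사) by ticket type (prepaid, postpaid, commuter pass, concession pass, single-journey ticket, group ticket), reported by month, station, and line. The data is updated through December 2022.
URL: https://www.data.go.kr/data/15044254/fileData.do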

Alerts

연번 is highly overall correlated with 수송연월 (High correlation)
호선 is highly overall correlated with 고유역번호(외부역코드) (High correlation)
고유역번호(외부역코드) is highly overall correlated with 호선 (High correlation)
수송연월 is highly overall correlated with 연번 (High correlation)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 07:04:57.649897
Analysis finished: 2023-12-12 07:05:00.615852
Duration: 2.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
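A report like this one can be regenerated with the ydata-profiling package. The sketch below is a minimal, hedged example: the function name `build_report`, the file paths, the report title, and the `cp949` encoding are assumptions for illustration and are not taken from the report. Only `ProfileReport(df, title=...)` and `to_file(...)` are the library calls relied on.

```python
def build_report(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write the HTML report (illustrative helper).

    The heavy third-party imports are kept inside the function so that
    importing this module does not require them to be installed.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Build the profile and write it out as a standalone HTML page.
    profile = ProfileReport(df, title="Seoul Metro boardings by ticket type")
    profile.to_file(html_path)
    return profile
```

A call such as `build_report("boardings.csv", "report.html")` (hypothetical file names) would produce the HTML page. Pinning the package to the version listed above keeps the layout comparable.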

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10321.404
Minimum: 3
Maximum: 20520
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 3
5-th percentile: 1070.9
Q1: 5183.75
median: 10341.5
Q3: 15454.25
95-th percentile: 19573.05
Maximum: 20520
Range: 20517
Interquartile range (IQR): 10270.5

Descriptive statistics

Standard deviation: 5927.4076
Coefficient of variation (CV): 0.57428306
Kurtosis: -1.1996573
Mean: 10321.404
Median Absolute Deviation (MAD): 5137
Skewness: -0.0099087209
Sum: 1.0321404 × 10^8
Variance: 35134161
Monotonicity: Not monotonic
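Several of the 연번 figures above are derived from others, which allows a quick consistency check. The sketch below recomputes them from the reported values, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The variable names are illustrative only.

```python
# Values copied from the 연번 statistics above.
std = 5927.4076
mean = 10321.404
q1, q3 = 5183.75, 15454.25
minimum, maximum = 3, 20520

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.8f} variance={variance:.0f} IQR={iqr} range={value_range}")
```

Small differences in the last digit are expected, because the inputs are already rounded in the report.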
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
18964 | 1 | < 0.1%
1729 | 1 | < 0.1%
249 | 1 | < 0.1%
1544 | 1 | < 0.1%
7634 | 1 | < 0.1%
7346 | 1 | < 0.1%
4623 | 1 | < 0.1%
7024 | 1 | < 0.1%
17858 | 1 | < 0.1%
11061 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Value | Count | Frequency (%)
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
13 | 1 | < 0.1%
15 | 1 | < 0.1%
17 | 1 | < 0.1%
18 | 1 | < 0.1%

Value | Count | Frequency (%)
20520 | 1 | < 0.1%
20518 | 1 | < 0.1%
20517 | 1 | < 0.1%
20514 | 1 | < 0.1%
20513 | 1 | < 0.1%
20510 | 1 | < 0.1%
20508 | 1 | < 0.1%
20507 | 1 | < 0.1%
20506 | 1 | < 0.1%
20505 | 1 | < 0.1%

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.804
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
median: 5
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1585759
Coefficient of variation (CV): 0.44932887
Kurtosis: -0.98593469
Mean: 4.804
Median Absolute Deviation (MAD): 2
Skewness: 0.066218605
Sum: 48040
Variance: 4.6594499
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Value | Count | Frequency (%)
5 | 1964 | 19.6%
2 | 1726 | 17.3%
7 | 1496 | 15.0%
6 | 1297 | 13.0%
3 | 1198 | 12.0%
4 | 902 | 9.0%
8 | 601 | 6.0%
9 | 461 | 4.6%
1 | 355 | 3.5%

Value | Count | Frequency (%)
1 | 355 | 3.5%
2 | 1726 | 17.3%
3 | 1198 | 12.0%
4 | 902 | 9.0%
5 | 1964 | 19.6%
6 | 1297 | 13.0%
7 | 1496 | 15.0%
8 | 601 | 6.0%
9 | 461 | 4.6%

Value | Count | Frequency (%)
9 | 461 | 4.6%
8 | 601 | 6.0%
7 | 1496 | 15.0%
6 | 1297 | 13.0%
5 | 1964 | 19.6%
4 | 902 | 9.0%
3 | 1198 | 12.0%
2 | 1726 | 17.3%
1 | 355 | 3.5%
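Because the 호선 frequency table lists all nine values and covers every row, the summary statistics for this variable can be recomputed from the counts alone. The sketch below does this in plain Python. Dividing by n - 1 (the sample variance) is an assumption, chosen because it reproduces the reported variance. The variable names are illustrative.

```python
# Counts per 호선 value, copied from the frequency table above.
counts = {1: 355, 2: 1726, 3: 1198, 4: 902, 5: 1964,
          6: 1297, 7: 1496, 8: 601, 9: 461}

n = sum(counts.values())                                  # number of rows
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance with n - 1 in the denominator (assumed convention).
squared_dev = sum(count * (value - mean) ** 2 for value, count in counts.items())
variance = squared_dev / (n - 1)

print(n, total, mean, variance)
```

The results agree with the reported Sum (48040), Mean (4.804), and Variance (about 4.6594), and the counts add up to the 10000 observations.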

고유역번호(외부역코드)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 285
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1729.5503
Minimum: 150
Maximum: 4138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 150
5-th percentile: 205
Q1: 319.75
median: 2534
Q3: 2712
95-th percentile: 2827
Maximum: 4138
Range: 3988
Interquartile range (IQR): 2392.25

Descriptive statistics

Standard deviation: 1261.1245
Coefficient of variation (CV): 0.72916325
Kurtosis: -1.5083272
Mean: 1729.5503
Median Absolute Deviation (MAD): 284
Skewness: -0.09647139
Sum: 17295503
Variance: 1590435.1
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2647 | 47 | 0.5%
2519 | 46 | 0.5%
2542 | 45 | 0.4%
2738 | 45 | 0.4%
2642 | 44 | 0.4%
2546 | 43 | 0.4%
428 | 43 | 0.4%
156 | 43 | 0.4%
333 | 42 | 0.4%
4135 | 42 | 0.4%
Other values (275) | 9560 | 95.6%

Value | Count | Frequency (%)
150 | 32 | 0.3%
151 | 34 | 0.3%
152 | 35 | 0.4%
153 | 37 | 0.4%
154 | 38 | 0.4%
155 | 30 | 0.3%
156 | 43 | 0.4%
157 | 37 | 0.4%
158 | 30 | 0.3%
159 | 39 | 0.4%

Value | Count | Frequency (%)
4138 | 36 | 0.4%
4137 | 35 | 0.4%
4136 | 30 | 0.3%
4135 | 42 | 0.4%
4134 | 32 | 0.3%
4133 | 37 | 0.4%
4132 | 40 | 0.4%
4131 | 35 | 0.4%
4130 | 38 | 0.4%
4129 | 34 | 0.3%

역명
Text

Distinct: 285
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 12
Median length: 11
Mean length: 3.8641
Min length: 2

Characters and Unicode

Total characters: 38641
Distinct characters: 223
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 군자(5)
2nd row: 신답
3rd row: 청구(5)
4th row: 충무로(4)
5th row: 상왕십리
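Some 역명 values in the sample carry a parenthesised digit, for example 군자(5) and 충무로(4), while others such as 신답 do not. In the sample rows of this report the digit matches the 호선 column, so it appears to be a line number, though that reading is an assumption. A minimal sketch for separating the two parts follows; `split_station` and the pattern are illustrative names, not part of the dataset or of ydata-profiling.

```python
import re

# A name followed by an optional "(digits)" suffix at the end of the string.
SUFFIX = re.compile(r"^(?P<name>.+)\((?P<line>\d+)\)$")

def split_station(label):
    """Return (name, line) where line is None when no suffix is present."""
    match = SUFFIX.match(label)
    if match is None:
        return label, None
    return match.group("name"), int(match.group("line"))

for label in ["군자(5)", "신답", "충무로(4)"]:
    print(split_station(label))
```

Splitting the suffix off makes it possible to group rows by the bare station name.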
Value | Count | Frequency (%)
화랑대 | 47 | 0.5%
까치산(5 | 46 | 0.5%
마장 | 45 | 0.4%
이수(7 | 45 | 0.4%
월곡 | 44 | 0.4%
신설동(1 | 43 | 0.4%
아차산 | 43 | 0.4%
삼각지(4 | 43 | 0.4%
무악재 | 42 | 0.4%
한성백제 | 42 | 0.4%
Other values (275) | 9560 | 95.6%

Most occurring characters

ValueCountFrequency (%)
( 3016
 
7.8%
) 3016
 
7.8%
1122
 
2.9%
1003
 
2.6%
826
 
2.1%
753
 
1.9%
693
 
1.8%
5 651
 
1.7%
571
 
1.5%
2 541
 
1.4%
Other values (213) 26449
68.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 29356 | 76.0%
Decimal Number | 3253 | 8.4%
Open Punctuation | 3016 | 7.8%
Close Punctuation | 3016 | 7.8%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1122
 
3.8%
1003
 
3.4%
826
 
2.8%
753
 
2.6%
693
 
2.4%
571
 
1.9%
523
 
1.8%
503
 
1.7%
498
 
1.7%
480
 
1.6%
Other values (202) 22384
76.3%
Decimal Number
Value | Count | Frequency (%)
5 | 651 | 20.0%
2 | 541 | 16.6%
3 | 480 | 14.8%
7 | 385 | 11.8%
6 | 367 | 11.3%
4 | 316 | 9.7%
1 | 215 | 6.6%
8 | 193 | 5.9%
9 | 105 | 3.2%
Open Punctuation
Value | Count | Frequency (%)
( | 3016 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 3016 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 29356 | 76.0%
Common | 9285 | 24.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1122
 
3.8%
1003
 
3.4%
826
 
2.8%
753
 
2.6%
693
 
2.4%
571
 
1.9%
523
 
1.8%
503
 
1.7%
498
 
1.7%
480
 
1.6%
Other values (202) 22384
76.3%
Common
Value | Count | Frequency (%)
( | 3016 | 32.5%
) | 3016 | 32.5%
5 | 651 | 7.0%
2 | 541 | 5.8%
3 | 480 | 5.2%
7 | 385 | 4.1%
6 | 367 | 4.0%
4 | 316 | 3.4%
1 | 215 | 2.3%
8 | 193 | 2.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 29356 | 76.0%
ASCII | 9285 | 24.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
( | 3016 | 32.5%
) | 3016 | 32.5%
5 | 651 | 7.0%
2 | 541 | 5.8%
3 | 480 | 5.2%
7 | 385 | 4.1%
6 | 367 | 4.0%
4 | 316 | 3.4%
1 | 215 | 2.3%
8 | 193 | 2.1%
Hangul
ValueCountFrequency (%)
1122
 
3.8%
1003
 
3.4%
826
 
2.8%
753
 
2.6%
693
 
2.4%
571
 
1.9%
523
 
1.8%
503
 
1.7%
498
 
1.7%
480
 
1.6%
Other values (202) 22384
76.3%

수송연월
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count
2022-09 | 863
2022-10 | 848
2022-12 | 843
2022-02 | 839
2022-11 | 838
Other values (7) | 5769

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-12
2nd row: 2022-02
3rd row: 2022-12
4th row: 2022-10
5th row: 2022-12

Common Values

Value | Count | Frequency (%)
2022-09 | 863 | 8.6%
2022-10 | 848 | 8.5%
2022-12 | 843 | 8.4%
2022-02 | 839 | 8.4%
2022-11 | 838 | 8.4%
2022-06 | 834 | 8.3%
2022-03 | 830 | 8.3%
2022-07 | 828 | 8.3%
2022-05 | 827 | 8.3%
2022-08 | 823 | 8.2%
Other values (2) | 1627 | 16.3%

Length

Histogram of lengths of the category

권종
Categorical

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count
정기권 | 1694
1회권 | 1694
우대권 | 1665
선불 | 1662
단체권 | 1656

Length

Max length: 3
Median length: 3
Mean length: 2.6709
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 선불
2nd row: 후불
3rd row: 선불
4th row: 후불
5th row: 선불

Common Values

Value | Count | Frequency (%)
정기권 | 1694 | 16.9%
1회권 | 1694 | 16.9%
우대권 | 1665 | 16.7%
선불 | 1662 | 16.6%
단체권 | 1656 | 16.6%
후불 | 1629 | 16.3%

Length

Histogram of lengths of the category

Common Values (Plot)


승차인원수
Real number (ℝ)

Distinct: 8585
Distinct (%): 85.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70171.198
Minimum: 0
Maximum: 1611119
Zeros: 73
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 699.95
Q1: 2425.25
median: 16639.5
Q3: 81729.75
95-th percentile: 299199.4
Maximum: 1611119
Range: 1611119
Interquartile range (IQR): 79304.5

Descriptive statistics

Standard deviation: 131932.13
Coefficient of variation (CV): 1.8801464
Kurtosis: 28.071962
Mean: 70171.198
Median Absolute Deviation (MAD): 15829.5
Skewness: 4.3544723
Sum: 7.0171198 × 10^8
Variance: 1.7406087 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 73 | 0.7%
1088 | 6 | 0.1%
1722 | 5 | 0.1%
757 | 5 | 0.1%
1362 | 5 | 0.1%
2744 | 5 | 0.1%
659 | 5 | 0.1%
1265 | 5 | 0.1%
1774 | 5 | 0.1%
1537 | 5 | 0.1%
Other values (8575) | 9881 | 98.8%

Value | Count | Frequency (%)
0 | 73 | 0.7%
87 | 1 | < 0.1%
90 | 1 | < 0.1%
103 | 1 | < 0.1%
108 | 1 | < 0.1%
116 | 1 | < 0.1%
127 | 2 | < 0.1%
129 | 1 | < 0.1%
132 | 1 | < 0.1%
133 | 2 | < 0.1%

Value | Count | Frequency (%)
1611119 | 1 | < 0.1%
1610601 | 1 | < 0.1%
1524155 | 1 | < 0.1%
1520522 | 1 | < 0.1%
1508910 | 1 | < 0.1%
1473243 | 1 | < 0.1%
1448127 | 1 | < 0.1%
1442599 | 1 | < 0.1%
1432465 | 1 | < 0.1%
1427873 | 1 | < 0.1%

Interactions


Correlations

 | 연번 | 호선 | 고유역번호(외부역코드) | 수송연월 | 권종 | 승차인원수
연번 | 1.000 | 0.010 | 0.009 | 0.959 | 0.237 | 0.041
호선 | 0.010 | 1.000 | 0.942 | 0.000 | 0.000 | 0.186
고유역번호(외부역코드) | 0.009 | 0.942 | 1.000 | 0.000 | 0.000 | 0.176
수송연월 | 0.959 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000
권종 | 0.237 | 0.000 | 0.000 | 0.000 | 1.000 | 0.500
승차인원수 | 0.041 | 0.186 | 0.176 | 0.000 | 0.500 | 1.000
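The first correlation matrix above can be scanned for strongly related column pairs. The sketch below is a minimal plain-Python example. The threshold of 0.9 is chosen only for illustration and is not the value ydata-profiling used to raise the alerts. `high_pairs` is a hypothetical helper name.

```python
columns = ["연번", "호선", "고유역번호(외부역코드)", "수송연월", "권종", "승차인원수"]

# Values copied from the first correlation matrix above (row order = columns).
matrix = [
    [1.000, 0.010, 0.009, 0.959, 0.237, 0.041],
    [0.010, 1.000, 0.942, 0.000, 0.000, 0.186],
    [0.009, 0.942, 1.000, 0.000, 0.000, 0.176],
    [0.959, 0.000, 0.000, 1.000, 0.000, 0.000],
    [0.237, 0.000, 0.000, 0.000, 1.000, 0.500],
    [0.041, 0.186, 0.176, 0.000, 0.500, 1.000],
]

def high_pairs(columns, matrix, threshold):
    """List each unordered column pair whose |correlation| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # upper triangle only
            if abs(matrix[i][j]) >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs

for first, second, value in high_pairs(columns, matrix, threshold=0.9):
    print(first, second, value)
```

With this illustrative threshold the two pairs found are 연번 with 수송연월 and 호선 with 고유역번호(외부역코드), the same pairs named in the Alerts section.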

 | 권종 | 수송연월
권종 | 1.000 | 0.000
수송연월 | 0.000 | 1.000

 | 연번 | 호선 | 고유역번호(외부역코드) | 승차인원수 | 수송연월 | 권종
연번 | 1.000 | 0.008 | 0.008 | -0.022 | 0.839 | 0.126
호선 | 0.008 | 1.000 | 0.989 | -0.139 | 0.000 | 0.000
고유역번호(외부역코드) | 0.008 | 0.989 | 1.000 | -0.148 | 0.000 | 0.000
승차인원수 | -0.022 | -0.139 | -0.148 | 1.000 | 0.000 | 0.291
수송연월 | 0.839 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000
권종 | 0.126 | 0.000 | 0.000 | 0.291 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

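(index) | 연번 | 호선 | 고유역번호(외부역코드) | 역명 | 수송연월 | 권종 | 승차인원수
18963 | 18964 | 5 | 2545 | 군자(5) | 2022-12 | 선불 | 74133
2049 | 2050 | 2 | 245 | 신답 | 2022-02 | 후불 | 20657
18956 | 18957 | 5 | 2538 | 청구(5) | 2022-12 | 선불 | 22215
15782 | 15783 | 4 | 423 | 충무로(4) | 2022-10 | 후불 | 459185
18826 | 18827 | 2 | 207 | 상왕십리 | 2022-12 | 선불 | 89956
8764 | 8765 | 7 | 2713 | 수락산 | 2022-06 | 선불 | 58974
19420 | 19421 | 2 | 231 | 신대방 | 2022-12 | 정기권 | 30238
6052 | 6053 | 3 | 316 | 독립문 | 2022-04 | 우대권 | 63130
20415 | 20416 | 6 | 2617 | 새절 | 2022-12 | 단체권 | 1835
15261 | 15262 | 5 | 2548 | 천호(5) | 2022-09 | 단체권 | 1947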
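
(index) | 연번 | 호선 | 고유역번호(외부역코드) | 역명 | 수송연월 | 권종 | 승차인원수
3318 | 3319 | 6 | 2620 | 월드컵경기장 | 2022-02 | 단체권 | 2957
7086 | 7087 | 7 | 2745 | 신풍 | 2022-05 | 선불 | 70811
12754 | 12755 | 7 | 2713 | 수락산 | 2022-08 | 정기권 | 14963
8946 | 8947 | 4 | 427 | 숙대입구 | 2022-06 | 후불 | 241695
8589 | 8590 | 2 | 230 | 신림 | 2022-06 | 선불 | 356010
6313 | 6314 | 2 | 234 | 신도림 | 2022-04 | 1회권 | 4004
591 | 592 | 2 | 212 | 건대입구(2) | 2022-01 | 정기권 | 22827
9027 | 9028 | 6 | 2629 | 삼각지(6) | 2022-06 | 후불 | 145117
18811 | 18812 | 1 | 151 | 시청(1) | 2022-12 | 선불 | 193420
12832 | 12833 | 1 | 157 | 제기동 | 2022-08 | 우대권 | 261885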