Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
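The average record size follows from the total memory figure divided by the number of observations. A quick check in Python, using only the numbers from the dataset statistics above (the variable names are illustrative):

```python
# Figures copied from the dataset statistics in this report.
total_kib = 488.3   # total size in memory, in KiB
n_rows = 10_000     # number of observations

# 1 KiB = 1024 bytes; dividing by the row count gives the per-record average.
avg_bytes = total_kib * 1024 / n_rows
print(f"{avg_bytes:.1f} B")
```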

Variable types

DateTime: 1
Text: 1
Categorical: 1
Numeric: 2

Dataset

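Description: Korea Railroad Corporation (한국철도공사) data on the number of boarding and alighting passengers by stop station (Seoul, Yongsan, Yeongdeungpo, Anyang, Suwon, Osan, Seojeong-ri, Pyeongtaek, Seonghwan, Cheonan, and others) and by direction (상행/하행, up-bound/down-bound).
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/15088873/fileData.do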

Alerts

승차인원수 (boarding count) is highly overall correlated with 하차인원수 (alighting count) [High correlation]
하차인원수 (alighting count) is highly overall correlated with 승차인원수 (boarding count) [High correlation]
승차인원수 has 629 (6.3%) zeros [Zeros]
하차인원수 has 618 (6.2%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 15:42:51.955601
Analysis finished: 2023-12-12 15:42:52.928007
Duration: 0.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
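A report of this kind is typically produced with a few lines of Python. The following is a minimal sketch, not the exact script behind this report: the function name `build_report`, the CSV path, the encoding, and the title are all assumptions, and only the dataset metadata is taken from the report itself.

```python
def build_report(csv_path, out_path="report.html"):
    """Profile a CSV file with ydata-profiling and write an HTML report."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is UTF-8. Korean public datasets are often CP949,
    # in which case encoding="cp949" would be needed instead.
    df = pd.read_csv(csv_path, encoding="utf-8")

    report = ProfileReport(
        df,
        title="KORAIL boarding and alighting counts",  # hypothetical title
        dataset={
            "description": "Boarding and alighting counts by station and direction",
            "author": "한국철도공사",
            "url": "https://www.data.go.kr/data/15088873/fileData.do",
        },
    )
    report.to_file(out_path)
    return out_path
```

Calling `build_report("korail.csv")` (a hypothetical file name) would write `report.html` next to the script.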

Variables

운행일자
DateTime

Distinct: 234
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2020-01-01 00:00:00
Maximum: 2020-08-21 00:00:00
Histogram with fixed size bins (bins=50)
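"Fixed size bins" means the range between the minimum and maximum is cut into equal-width intervals and each value is counted into one of them. A minimal sketch of that idea in plain Python follows; the function name and the sample values are illustrative and not taken from the dataset, and dates would first be converted to numbers such as timestamps.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value falls into the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits exactly on the right edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Illustrative values only.
sample = [0, 1, 2, 8, 42, 240, 1000]
counts = fixed_width_histogram(sample, bins=5)
print(counts)
```

With skewed data like this, most values land in the first bin, which is also why the histograms of the passenger counts in this report are dominated by their leftmost bars.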
정차역
Text

Distinct: 225
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2347
Min length: 2

Characters and Unicode

Total characters: 22347
Distinct characters: 176
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
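These character properties can be inspected with Python's standard `unicodedata` module. The sketch below classifies the characters of the five sample station names shown in this report. The standard library does not expose the Script property directly, so the first word of each character name is used here as a rough stand-in; that shortcut is an assumption of this example, not how the report computes scripts.

```python
import unicodedata
from collections import Counter

# Station names taken from the sample rows of this report.
names = ["영등포", "목포", "임성리", "예산", "포항"]

# Tally the Unicode general category of every character.
# "Lo" is the code for "Other Letter".
categories = Counter(unicodedata.category(ch) for name in names for ch in name)
print(categories)

# Rough script hint: the first word of the official character name.
scripts = {unicodedata.name(ch).split()[0] for name in names for ch in name}
print(scripts)
```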

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 영등포
2nd row: 목포
3rd row: 임성리
4th row: 예산
5th row: 포항
Value                 Count   Frequency (%)
아산                  63      0.6%
공주                  62      0.6%
논산                  60      0.6%
화본                  60      0.6%
신해운대              59      0.6%
일로                  59      0.6%
포항                  59      0.6%
옥천                  59      0.6%
서울                  59      0.6%
다시                  58      0.6%
Other values (215)    9402    94.0%

Most occurring characters

Value                 Count   Frequency (%)
(lost)                1128    5.0%
(lost)                831     3.7%
(lost)                791     3.5%
(lost)                572     2.6%
(lost)                566     2.5%
(lost)                563     2.5%
(lost)                546     2.4%
(lost)                524     2.3%
(lost)                436     2.0%
(lost)                379     1.7%
Other values (166)    16011   71.6%

Most occurring categories

Value          Count   Frequency (%)
Other Letter   22347   100.0%

Most frequent character per category

Other Letter
Value                 Count   Frequency (%)
(lost)                1128    5.0%
(lost)                831     3.7%
(lost)                791     3.5%
(lost)                572     2.6%
(lost)                566     2.5%
(lost)                563     2.5%
(lost)                546     2.4%
(lost)                524     2.3%
(lost)                436     2.0%
(lost)                379     1.7%
Other values (166)    16011   71.6%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   22347   100.0%

Most frequent character per script

Hangul
Value                 Count   Frequency (%)
(lost)                1128    5.0%
(lost)                831     3.7%
(lost)                791     3.5%
(lost)                572     2.6%
(lost)                566     2.5%
(lost)                563     2.5%
(lost)                546     2.4%
(lost)                524     2.3%
(lost)                436     2.0%
(lost)                379     1.7%
Other values (166)    16011   71.6%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   22347   100.0%

Most frequent character per block

Hangul
Value                 Count   Frequency (%)
(lost)                1128    5.0%
(lost)                831     3.7%
(lost)                791     3.5%
(lost)                572     2.6%
(lost)                566     2.5%
(lost)                563     2.5%
(lost)                546     2.4%
(lost)                524     2.3%
(lost)                436     2.0%
(lost)                379     1.7%
Other values (166)    16011   71.6%

상행하행구분
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
상행: 5039
하행: 4961

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 하행
2nd row: 상행
3rd row: 상행
4th row: 하행
5th row: 하행

Common Values

Value   Count   Frequency (%)
상행    5039    50.4%
하행    4961    49.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
상행    5039    50.4%
하행    4961    49.6%

승차인원수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1796
Distinct (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 570.0901
Minimum: 0
Maximum: 51992
Zeros: 629
Zeros (%): 6.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 8
Median: 42
Q3: 240
95-th percentile: 2779.05
Maximum: 51992
Range: 51992
Interquartile range (IQR): 232

Descriptive statistics

Standard deviation: 2238.5145
Coefficient of variation (CV): 3.9265978
Kurtosis: 150.39725
Mean: 570.0901
Median Absolute Deviation (MAD): 41
Skewness: 10.440243
Sum: 5700901
Variance: 5010947.3
Monotonicity: Not monotonic
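Several of these figures are derived from one another, which gives a quick consistency check. The sketch below recomputes the mean, coefficient of variation, variance, IQR, and range for 승차인원수 from the other numbers in this report; the variable names are illustrative.

```python
# Figures copied from the 승차인원수 statistics in this report.
n = 10_000
total = 5_700_901
std = 2238.5145
q1, q3 = 8, 240
minimum, maximum = 0, 51_992

mean = total / n                  # sum divided by number of observations
cv = std / mean                   # coefficient of variation
variance = std ** 2               # agrees with the report up to rounding of std
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum

print(mean, round(cv, 4), round(variance, 1), iqr, value_range)
```

A standard deviation almost four times the mean, together with a median of 42 against a mean of about 570, points to a strongly right-skewed distribution, which matches the reported skewness and kurtosis.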
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     629     6.3%
1                     431     4.3%
2                     321     3.2%
3                     296     3.0%
4                     237     2.4%
5                     208     2.1%
7                     188     1.9%
6                     179     1.8%
8                     171     1.7%
12                    135     1.4%
Other values (1786)   7205    72.0%
Value   Count   Frequency (%)
0       629     6.3%
1       431     4.3%
2       321     3.2%
3       296     3.0%
4       237     2.4%
5       208     2.1%
6       179     1.8%
7       188     1.9%
8       171     1.7%
9       128     1.3%
Value   Count   Frequency (%)
51992   1       < 0.1%
46215   1       < 0.1%
43194   1       < 0.1%
39519   1       < 0.1%
38177   1       < 0.1%
36596   1       < 0.1%
35662   1       < 0.1%
34846   1       < 0.1%
33816   1       < 0.1%
33791   1       < 0.1%

하차인원수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1810
Distinct (%): 18.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 595.5279
Minimum: 0
Maximum: 73214
Zeros: 618
Zeros (%): 6.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 7
Median: 41
Q3: 242.25
95-th percentile: 2756.3
Maximum: 73214
Range: 73214
Interquartile range (IQR): 235.25

Descriptive statistics

Standard deviation: 2638.8877
Coefficient of variation (CV): 4.431174
Kurtosis: 222.80421
Mean: 595.5279
Median Absolute Deviation (MAD): 40
Skewness: 12.832177
Sum: 5955279
Variance: 6963728.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     618     6.2%
1                     477     4.8%
2                     378     3.8%
3                     315     3.1%
4                     245     2.5%
5                     209     2.1%
6                     191     1.9%
7                     177     1.8%
9                     155     1.6%
8                     151     1.5%
Other values (1800)   7084    70.8%
Value   Count   Frequency (%)
0       618     6.2%
1       477     4.8%
2       378     3.8%
3       315     3.1%
4       245     2.5%
5       209     2.1%
6       191     1.9%
7       177     1.8%
8       151     1.5%
9       155     1.6%
Value   Count   Frequency (%)
73214   1       < 0.1%
64258   1       < 0.1%
51563   1       < 0.1%
51542   1       < 0.1%
51052   1       < 0.1%
46348   1       < 0.1%
46329   1       < 0.1%
45967   1       < 0.1%
43895   1       < 0.1%
41431   1       < 0.1%

Interactions


Correlations

                상행하행구분   승차인원수   하차인원수
상행하행구분    1.000          0.053        0.050
승차인원수      0.053          1.000        0.233
하차인원수      0.050          0.233        1.000

                승차인원수   하차인원수   상행하행구분
승차인원수      1.000        0.623        0.040
하차인원수      0.623        1.000        0.038
상행하행구분    0.040        0.038        1.000
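The two matrices above report different values for the same pair of count columns, and the report text does not say which coefficient each one uses. As a general illustration of why two correlation measures can disagree on heavily skewed counts, here is a minimal sketch that compares a value-based coefficient (Pearson) with a rank-based one (Spearman). The data are made up, and the choice of these two measures is an assumption of the example, not a statement about how the matrices were computed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up, heavy-tailed counts (not taken from the dataset).
boarding = [0, 1, 2, 8, 42, 240, 50000]
alighting = [1, 0, 3, 7, 40, 5000, 300]

r_pearson = pearson(boarding, alighting)
r_spearman = spearman(boarding, alighting)
print(round(r_pearson, 3), round(r_spearman, 3))
```

In this toy data the two largest observations disagree, so the value-based coefficient is pulled down by them while the rank-based coefficient stays high.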

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
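Nullity by column is simply a count of missing entries per column. A minimal sketch in plain Python, using made-up records that reuse this report's column names (the actual dataset has no missing cells):

```python
def missing_per_column(rows):
    """Count None values per column across a list of dict records."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts

# Made-up records for illustration only.
rows = [
    {"정차역": "서울", "승차인원수": 10, "하차인원수": None},
    {"정차역": "목포", "승차인원수": None, "하차인원수": 3},
    {"정차역": "포항", "승차인원수": 5, "하차인원수": 8},
]

missing = missing_per_column(rows)
print(missing)
```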

Sample

(index)   운행일자     정차역   상행하행구분   승차인원수 and 하차인원수 (digits run together)
73120     2020-06-19   영등포   하행           1040018
23322     2020-02-23   목포     상행           16120
43248     2020-04-10   임성리   상행           120
65620     2020-06-01   예산     하행           108590
63226     2020-05-26   포항     하행           1461852
96124     2020-08-12   곡성     상행           14684
85437     2020-07-17   함안     하행           2075
86825     2020-07-21   부강     하행           2578
71265     2020-06-14   도계     상행           4238
28332     2020-03-06   서경주   하행           3838

(index)   운행일자     정차역   상행하행구분   승차인원수 and 하차인원수 (digits run together)
23172     2020-02-23   신탄진   하행           250927
70639     2020-06-13   매곡     하행           113
40697     2020-04-04   여천     상행           5542
29181     2020-03-08   논산     하행           124441
44560     2020-04-13   오근장   상행           70168
12845     2020-01-30   영주     하행           193323
30979     2020-03-12   용궁     상행           11
74507     2020-06-22   반곡     상행           709
27955     2020-03-05   임성리   상행           120
33149     2020-03-17   태백     상행           12121