Overview

Dataset statistics

Number of variables: 4
Number of observations: 236
Missing cells: 40
Missing cells (%): 4.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.2 KiB
Average record size in memory: 35.6 B
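
The derived figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the reported 8.2 KiB is itself a rounded value, so the recomputed record size only approximately reproduces the report:

```python
# Recompute the derived overview figures from the reported counts.
n_variables = 4
n_observations = 236
missing_cells = 40
total_size_kib = 8.2  # reported value, already rounded by the report

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```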

Variable types

Text: 1
Numeric: 3

Dataset

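Description: Passenger boarding and alighting figures by station for upbound Mugunghwa-ho trains. The data consists of boardings (승차), alightings (하차), and passenger-kilometers (인키로) for each station (역명).
Author: 한국철도공사 (Korea Railroad Corporation, KORAIL)
URL: https://www.data.go.kr/data/15068479/fileData.do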

Alerts

승차 is highly correlated overall with 하차 and 1 other field (High correlation)
하차 is highly correlated overall with 승차 and 1 other field (High correlation)
인키로 is highly correlated overall with 승차 and 1 other field (High correlation)
승차 has 16 (6.8%) missing values (Missing)
하차 has 12 (5.1%) missing values (Missing)
인키로 has 12 (5.1%) missing values (Missing)
역명 has unique values (Unique)
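
The missing-value percentages in these alerts can be checked against the overview totals. A small sketch, assuming only the counts reported in this document (236 observations and the per-variable missing counts):

```python
# Missing-value counts per variable, as reported in the alerts.
n_observations = 236
missing = {"역명": 0, "승차": 16, "하차": 12, "인키로": 12}

# Percentage of rows with a missing value, per variable.
missing_pct = {name: round(100 * count / n_observations, 1)
               for name, count in missing.items()}
# Should agree with the "Missing cells" figure in the overview.
total_missing = sum(missing.values())

print(missing_pct)
print(total_missing)
```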

Reproduction

Analysis started: 2023-12-12 07:42:08.074243
Analysis finished: 2023-12-12 07:42:09.456329
Duration: 1.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
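
A report of this kind is typically produced by passing a pandas DataFrame to ydata-profiling. The sketch below is an assumption about how such a run might look, not the configuration actually used here: the function name `build_report`, the title, and the output path are hypothetical, and the real settings live in the downloadable config.json.

```python
def build_report(df, output_path="report.html"):
    """Profile a pandas DataFrame and write an HTML report (hypothetical helper)."""
    # Imported lazily so the sketch can be read without ydata-profiling installed.
    from ydata_profiling import ProfileReport

    report = ProfileReport(
        df,
        title="Profiling report",  # hypothetical title
        # Dataset metadata shown in the Overview section of the report.
        dataset={
            "description": "역별 무궁화 상행 여객 승하차 실적 입니다.",
            "author": "한국철도공사",
            "url": "https://www.data.go.kr/data/15068479/fileData.do",
        },
    )
    report.to_file(output_path)  # writes the HTML report to disk
    return report
```

Calling `build_report(df)` requires a DataFrame loaded from the source file; the file name and its encoding are not stated in this report and would need to be supplied by the reader.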

Variables

역명
Text

UNIQUE 

Distinct: 236
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2288136
Min length: 2

Characters and Unicode

Total characters: 526
Distinct characters: 181
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
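
As an illustration of these character properties, the sketch below inspects the first five station names listed under Sample using Python's standard `unicodedata` module. It is a sketch under stated assumptions, not the profiler's own implementation: the standard library exposes the general category (the report's "Other Letter" corresponds to `Lo`) but has no script or block lookup, so only the category is shown.

```python
import unicodedata

# First five station names from the Sample list in this report.
names = ["가평", "각계", "간석", "강경", "강구"]

chars = "".join(names)
distinct_chars = set(chars)
# General category of each code point; "Lo" means "Letter, other".
categories = {unicodedata.category(ch) for ch in distinct_chars}

print(len(chars), len(distinct_chars), categories)
print(unicodedata.name("가"))
```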

Unique

Unique: 236
Unique (%): 100.0%

Sample

1st row: 가평
2nd row: 각계
3rd row: 간석
4th row: 강경
5th row: 강구
Value               Count  Frequency (%)
가평                1      0.4%
의정부              1      0.4%
전곡                1      0.4%
용궁                1      0.4%
용문                1      0.4%
용산                1      0.4%
웅천                1      0.4%
원동                1      0.4%
원주                1      0.4%
월내                1      0.4%
Other values (226)  226    95.8%

Most occurring characters

Value               Count  Frequency (%)
                    26     4.9%
                    18     3.4%
                    17     3.2%
                    16     3.0%
                    12     2.3%
                    12     2.3%
                    11     2.1%
                    11     2.1%
                    8      1.5%
                    8      1.5%
Other values (171)  387    73.6%

Most occurring categories

Value               Count  Frequency (%)
Other Letter        526    100.0%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    26     4.9%
                    18     3.4%
                    17     3.2%
                    16     3.0%
                    12     2.3%
                    12     2.3%
                    11     2.1%
                    11     2.1%
                    8      1.5%
                    8      1.5%
Other values (171)  387    73.6%

Most occurring scripts

Value               Count  Frequency (%)
Hangul              526    100.0%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    26     4.9%
                    18     3.4%
                    17     3.2%
                    16     3.0%
                    12     2.3%
                    12     2.3%
                    11     2.1%
                    11     2.1%
                    8      1.5%
                    8      1.5%
Other values (171)  387    73.6%

Most occurring blocks

Value               Count  Frequency (%)
Hangul              526    100.0%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    26     4.9%
                    18     3.4%
                    17     3.2%
                    16     3.0%
                    12     2.3%
                    12     2.3%
                    11     2.1%
                    11     2.1%
                    8      1.5%
                    8      1.5%
Other values (171)  387    73.6%

승차
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 220
Distinct (%): 100.0%
Missing: 16
Missing (%): 6.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 127359.32
Minimum: 1
Maximum: 1998931
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 360.45
Q1: 4382.5
Median: 23487
Q3: 90827.25
95-th percentile: 520678.1
Maximum: 1998931
Range: 1998930
Interquartile range (IQR): 86444.75

Descriptive statistics

Standard deviation: 296613.89
Coefficient of variation (CV): 2.3289531
Kurtosis: 17.926595
Mean: 127359.32
Median Absolute Deviation (MAD): 22665.5
Skewness: 4.0607666
Sum: 28019051
Variance: 8.7979799 × 10^10
Monotonicity: Not monotonic
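
Several of the 승차 statistics are simple functions of the others. The sketch below recomputes them from the reported values as a consistency check; because the inputs are already rounded by the report, the results are expected to agree only approximately.

```python
# Reported figures for 승차 (already rounded by the report).
count = 236 - 16                 # non-missing observations
minimum, maximum = 1, 1998931
q1, q3 = 4382.5, 90827.25
mean, std = 127359.32, 296613.89

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance
total = mean * count             # Sum

print(value_range, iqr)
print(round(cv, 4), f"{variance:.4e}", round(total))
```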
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
14070               1      0.4%
86572               1      0.4%
1                   1      0.4%
44161               1      0.4%
34775               1      0.4%
548078              1      0.4%
13749               1      0.4%
8544                1      0.4%
49537               1      0.4%
23223               1      0.4%
Other values (210)  210    89.0%
(Missing)           16     6.8%

Value               Count  Frequency (%)
1                   1      0.4%
3                   1      0.4%
27                  1      0.4%
123                 1      0.4%
129                 1      0.4%
186                 1      0.4%
246                 1      0.4%
316                 1      0.4%
321                 1      0.4%
322                 1      0.4%

Value               Count  Frequency (%)
1998931             1      0.4%
1675082             1      0.4%
1609755             1      0.4%
1607107             1      0.4%
1473786             1      0.4%
1229098             1      0.4%
1148855             1      0.4%
1016564             1      0.4%
834551              1      0.4%
576776              1      0.4%

하차
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 223
Distinct (%): 99.6%
Missing: 12
Missing (%): 5.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 125085.05
Minimum: 8
Maximum: 3436201
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB

Quantile statistics

Minimum: 8
5-th percentile: 321.9
Q1: 1776
Median: 11618
Q3: 56035
95-th percentile: 622734.55
Maximum: 3436201
Range: 3436193
Interquartile range (IQR): 54259

Descriptive statistics

Standard deviation: 387961.34
Coefficient of variation (CV): 3.1015804
Kurtosis: 33.934072
Mean: 125085.05
Median Absolute Deviation (MAD): 11052.5
Skewness: 5.3592677
Sum: 28019051
Variance: 1.50514 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
464                 2      0.8%
1047                1      0.4%
1534323             1      0.4%
9654                1      0.4%
49562               1      0.4%
132203              1      0.4%
15200               1      0.4%
4224                1      0.4%
18438               1      0.4%
19671               1      0.4%
Other values (213)  213    90.3%
(Missing)           12     5.1%

Value               Count  Frequency (%)
8                   1      0.4%
11                  1      0.4%
28                  1      0.4%
32                  1      0.4%
130                 1      0.4%
157                 1      0.4%
188                 1      0.4%
210                 1      0.4%
217                 1      0.4%
243                 1      0.4%

Value               Count  Frequency (%)
3436201             1      0.4%
2362352             1      0.4%
2310984             1      0.4%
1534323             1      0.4%
1484660             1      0.4%
1376504             1      0.4%
1243330             1      0.4%
1123420             1      0.4%
1114498             1      0.4%
686187              1      0.4%

인키로
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 224
Distinct (%): 100.0%
Missing: 12
Missing (%): 5.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 12292736
Minimum: 670
Maximum: 4.0051176 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB

Quantile statistics

Minimum: 670
5-th percentile: 40562
Q1: 191303.25
Median: 1043941
Q3: 5123953.8
95-th percentile: 48054888
Maximum: 4.0051176 × 10^8
Range: 4.0051109 × 10^8
Interquartile range (IQR): 4932650.5

Descriptive statistics

Standard deviation: 43691033
Coefficient of variation (CV): 3.5542154
Kurtosis: 43.037645
Mean: 12292736
Median Absolute Deviation (MAD): 980054.5
Skewness: 6.137737
Sum: 2.7535729 × 10^9
Variance: 1.9089063 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
344526              1      0.4%
2269072             1      0.4%
211406103           1      0.4%
1230520             1      0.4%
1634821             1      0.4%
12716595            1      0.4%
683841              1      0.4%
92360               1      0.4%
1258145             1      0.4%
1859815             1      0.4%
Other values (214)  214    90.7%
(Missing)           12     5.1%

Value               Count  Frequency (%)
670                 1      0.4%
1943                1      0.4%
2032                1      0.4%
8688                1      0.4%
10525               1      0.4%
15813               1      0.4%
19855               1      0.4%
22386               1      0.4%
25311               1      0.4%
25394               1      0.4%

Value               Count  Frequency (%)
400511758           1      0.4%
323093118           1      0.4%
211406103           1      0.4%
209344144           1      0.4%
189366065           1      0.4%
117556137           1      0.4%
109212451           1      0.4%
101519313           1      0.4%
95354220            1      0.4%
65108216            1      0.4%

Interactions

[Figures: interaction plots for each pair of 승차, 하차, and 인키로]

Correlations

[Figure: correlation heatmap]
        승차    하차    인키로
승차    1.000   0.837   0.756
하차    0.837   1.000   0.922
인키로  0.756   0.922   1.000
[Figure: correlation heatmap]
        승차    하차    인키로
승차    1.000   0.748   0.741
하차    0.748   1.000   0.975
인키로  0.741   0.975   1.000
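
The report does not state which coefficient each matrix uses. As background only, the sketch below shows how two common choices, Pearson and Spearman rank correlation, can differ on skewed data. The toy values are hypothetical and are not taken from this dataset, and the helper names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy data with one large outlier, not from the dataset.
x = [1, 2, 3, 4, 100]
y = [2, 4, 5, 9, 10]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Because Spearman depends only on rank order, a monotone relationship scores 1 even when an outlier pulls the Pearson value below 1, which is one reason the two kinds of matrix can disagree on heavily skewed variables such as these.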

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

역명승차하차인키로
0가평2542<NA><NA>
1각계405815813
2간석<NA>919187922
3강경100493246452637473
4강구1758562225311
5강릉35018275782111313
6개포3217855332
7건천237551322386
8경산57677643537431974705
9경주25327426079716523852
역명승차하차인키로
226현동12392988540
227호계16501619040714257792
228홍성372565571305149612
229화명48082330983374444
230화본55646520611561
231화순57865958603650
232황간28328118331267928
233횡천605371362096
234효천53032309267198
235희방사12121308688