Overview

Dataset statistics

Number of variables: 4
Number of observations: 269
Missing cells: 102
Missing cells (%): 9.5%
Duplicate rows: 1
Duplicate rows (%): 0.4%
Total size in memory: 9.1 KiB
Average record size in memory: 34.5 B

Variable types

Text: 1
Categorical: 1
Numeric: 2

Dataset

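Description: Boarding and alighting counts by station (station name 역명, boardings 승차인원, alightings 하차인원, etc.) as recorded in the annually published Statistical Yearbook of Railroads. Only mainline passenger ridership is included.
URL: https://www.data.go.kr/data/15029727/fileData.do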

Alerts

Constant: 단위 has constant value ""
Duplicates: Dataset has 1 (0.4%) duplicate rows
High correlation: 승차인원 is highly overall correlated with 하차인원
High correlation: 하차인원 is highly overall correlated with 승차인원
Missing: 역명 has 34 (12.6%) missing values
Missing: 승차인원 has 34 (12.6%) missing values
Missing: 하차인원 has 34 (12.6%) missing values
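
The alert percentages follow directly from the counts in the overview. The sketch below reproduces that arithmetic in Python. It assumes that the same 34 rows are empty in all three affected columns, which the Duplicate rows table at the end of the report suggests; the variable names are illustrative only.

```python
# Figures copied from the Overview and Alerts sections of this report.
n_rows = 269
n_columns = 4
missing_per_column = 34      # 역명, 승차인원 and 하차인원 each miss 34 values
columns_with_missing = 3     # 단위 has no missing values

# Total missing cells and their share of all cells in the table.
missing_cells = missing_per_column * columns_with_missing
missing_cell_share = missing_cells / (n_rows * n_columns)

# Share of rows that are missing within a single affected column.
missing_row_share = missing_per_column / n_rows

print(missing_cells)                  # 102
print(f"{missing_cell_share:.1%}")    # 9.5%
print(f"{missing_row_share:.1%}")     # 12.6%
```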

Reproduction

Analysis started: 2023-12-11 23:49:14.074742
Analysis finished: 2023-12-11 23:49:14.889215
Duration: 0.81 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
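
A report like this one can be regenerated with ydata-profiling. The following is a minimal sketch, not the configuration actually used here (that is in config.json). The function name `build_report`, the report title, the output path and the `cp949` encoding are all assumptions made for illustration; check the encoding against the file you download.

```python
from pathlib import Path

# Dataset metadata as shown in the "Dataset" section of this report.
DATASET_META = {
    "description": "Boarding and alighting counts by station (mainline passengers only)",
    "url": "https://www.data.go.kr/data/15029727/fileData.do",
}


def build_report(csv_path: str, out_path: str = "report.html",
                 encoding: str = "cp949") -> Path:
    """Profile the CSV at csv_path and write an HTML report to out_path.

    The default encoding is an assumption, not something this report states.
    """
    # Imported lazily so this module loads even without the optional packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="Station boardings and alightings",  # hypothetical title
        dataset=DATASET_META,
    )
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("stations.csv")` (a hypothetical file name) would write `report.html` to the working directory.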

Variables

역명
Text

MISSING 

Distinct: 235
Distinct (%): 100.0%
Missing: 34
Missing (%): 12.6%
Memory size: 2.2 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2765957
Min length: 2

Characters and Unicode

Total characters: 535
Distinct characters: 179
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 235
Unique (%): 100.0%

Sample

1st row: 서울
2nd row: 용산
3rd row: 행신
4th row: 일산
5th row: 도라산

Value               Count  Frequency (%)
미군기지            1      0.4%
북영천              1      0.4%
영덕                1      0.4%
명봉                1      0.4%
김천                1      0.4%
구미                1      0.4%
약목                1      0.4%
왜관                1      0.4%
신동                1      0.4%
대구                1      0.4%
Other values (225)  225    95.7%

Most occurring characters

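Value               Count  Frequency (%)
(not preserved)     26     4.9%
(not preserved)     20     3.7%
(not preserved)     20     3.7%
(not preserved)     14     2.6%
(not preserved)     13     2.4%
(not preserved)     13     2.4%
(not preserved)     12     2.2%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
Other values (169)  387    72.3%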

Most occurring categories

Value         Count  Frequency (%)
Other Letter  535    100.0%

Most frequent character per category

Other Letter
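Value               Count  Frequency (%)
(not preserved)     26     4.9%
(not preserved)     20     3.7%
(not preserved)     20     3.7%
(not preserved)     14     2.6%
(not preserved)     13     2.4%
(not preserved)     13     2.4%
(not preserved)     12     2.2%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
Other values (169)  387    72.3%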

Most occurring scripts

Value   Count  Frequency (%)
Hangul  535    100.0%

Most frequent character per script

Hangul
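Value               Count  Frequency (%)
(not preserved)     26     4.9%
(not preserved)     20     3.7%
(not preserved)     20     3.7%
(not preserved)     14     2.6%
(not preserved)     13     2.4%
(not preserved)     13     2.4%
(not preserved)     12     2.2%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
Other values (169)  387    72.3%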

Most occurring blocks

Value   Count  Frequency (%)
Hangul  535    100.0%

Most frequent character per block

Hangul
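Value               Count  Frequency (%)
(not preserved)     26     4.9%
(not preserved)     20     3.7%
(not preserved)     20     3.7%
(not preserved)     14     2.6%
(not preserved)     13     2.4%
(not preserved)     13     2.4%
(not preserved)     12     2.2%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
(not preserved)     10     1.9%
Other values (169)  387    72.3%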

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value  Count  Frequency (%)
""     269    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; the single value "" accounts for 269 rows (100.0%)]

승차인원
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 235
Distinct (%): 100.0%
Missing: 34
Missing (%): 12.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 516620.87
Minimum: 0
Maximum: 16701060
Zeros: 1
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 349
Q1: 5878
Median: 43905
Q3: 324954.5
95-th percentile: 2415907.7
Maximum: 16701060
Range: 16701060
Interquartile range (IQR): 319076.5

Descriptive statistics

Standard deviation: 1563528.9
Coefficient of variation (CV): 3.0264532
Kurtosis: 53.86825
Mean: 516620.87
Median Absolute Deviation (MAD): 42580
Skewness: 6.4076305
Sum: 1.214059 × 10^8
Variance: 2.4446226 × 10^12
Monotonicity: Not monotonic
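
Several of these figures are derived from one another, which gives a quick way to sanity-check the report. The sketch below recomputes the coefficient of variation, variance, sum and IQR for 승차인원 from the reported mean, standard deviation and quartiles. It assumes the statistics are computed over the 235 non-missing rows (269 observations minus 34 missing); the variable names are illustrative.

```python
import math

# Values copied from the 승차인원 section of this report.
n_valid = 269 - 34          # non-missing observations (assumption: stats skip NA)
mean = 516620.87
std = 1563528.9
q1, q3 = 5878, 324954.5

cv = std / mean             # coefficient of variation
variance = std ** 2         # variance is the squared standard deviation
total = mean * n_valid      # sum of all non-missing values
iqr = q3 - q1               # interquartile range

# Compare against the reported figures, allowing for display rounding.
print(math.isclose(cv, 3.0264532, rel_tol=1e-6))
print(math.isclose(variance, 2.4446226e12, rel_tol=1e-6))
print(math.isclose(total, 1.214059e8, rel_tol=1e-6))
print(math.isclose(iqr, 319076.5))
```

Each comparison should print `True`; a mismatch would point to a transcription error in the exported figures.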
[Figure: histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
345168              1      0.4%
1325                1      0.4%
764427              1      0.4%
1978650             1      0.4%
44916               1      0.4%
651879              1      0.4%
2980                1      0.4%
1834877             1      0.4%
8263464             1      0.4%
958041              1      0.4%
Other values (225)  225    83.6%
(Missing)           34     12.6%

Minimum 10 values

Value  Count  Frequency (%)
0      1      0.4%
2      1      0.4%
3      1      0.4%
10     1      0.4%
49     1      0.4%
58     1      0.4%
66     1      0.4%
82     1      0.4%
112    1      0.4%
115    1      0.4%

Maximum 10 values

Value     Count  Frequency (%)
16701060  1      0.4%
8263464   1      0.4%
7190808   1      0.4%
6397281   1      0.4%
6000682   1      0.4%
5260330   1      0.4%
4941467   1      0.4%
3797768   1      0.4%
3740829   1      0.4%
3440680   1      0.4%

하차인원
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 234
Distinct (%): 99.6%
Missing: 34
Missing (%): 12.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 516620.87
Minimum: 0
Maximum: 16649827
Zeros: 2
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 180.6
Q1: 6282.5
Median: 46488
Q3: 308881.5
95-th percentile: 2414734.8
Maximum: 16649827
Range: 16649827
Interquartile range (IQR): 302599

Descriptive statistics

Standard deviation: 1566097
Coefficient of variation (CV): 3.0314241
Kurtosis: 53.076
Mean: 516620.87
Median Absolute Deviation (MAD): 45133
Skewness: 6.3690686
Sum: 1.214059 × 10^8
Variance: 2.4526597 × 10^12
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
0                   2      0.7%
343225              1      0.4%
1355                1      0.4%
766632              1      0.4%
1994872             1      0.4%
51498               1      0.4%
644679              1      0.4%
3525                1      0.4%
1836905             1      0.4%
8300091             1      0.4%
Other values (224)  224    83.3%
(Missing)           34     12.6%

Minimum 10 values

Value  Count  Frequency (%)
0      2      0.7%
1      1      0.4%
3      1      0.4%
16     1      0.4%
24     1      0.4%
32     1      0.4%
58     1      0.4%
82     1      0.4%
91     1      0.4%
104    1      0.4%

Maximum 10 values

Value     Count  Frequency (%)
16649827  1      0.4%
8300091   1      0.4%
7274191   1      0.4%
6400902   1      0.4%
6157505   1      0.4%
5323229   1      0.4%
4892871   1      0.4%
3735966   1      0.4%
3715260   1      0.4%
3516441   1      0.4%

Interactions

[Figures: four pairwise interaction plots for 승차인원 and 하차인원]

Correlations

[Figure: correlation heatmap]

          승차인원  하차인원
승차인원  1.000     1.000
하차인원  1.000     1.000

[Figure: correlation heatmap]

          승차인원  하차인원
승차인원  1.000     0.999
하차인원  0.999     1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

     역명    단위  승차인원  하차인원
0    서울    ""    16701060  16649827
1    용산    ""    6000682   6157505
2    행신    ""    914436    878286
3    일산    ""    457       464
4    도라산  ""    853       853
5    서빙고  ""    10        0
6    영등포  ""    3440680   3516441
7    안양    ""    192182    166519
8    수원    ""    5260330   5323229
9    오산    ""    82513     90305

Last rows

     역명    단위  승차인원  하차인원
259  <NA>    ""    <NA>      <NA>
260  <NA>    ""    <NA>      <NA>
261  <NA>    ""    <NA>      <NA>
262  <NA>    ""    <NA>      <NA>
263  <NA>    ""    <NA>      <NA>
264  <NA>    ""    <NA>      <NA>
265  <NA>    ""    <NA>      <NA>
266  <NA>    ""    <NA>      <NA>
267  <NA>    ""    <NA>      <NA>
268  <NA>    ""    <NA>      <NA>

Duplicate rows

Most frequently occurring

     역명    단위  승차인원  하차인원  # duplicates
0    <NA>    ""    <NA>      <NA>      34
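
The duplicated row is simply an empty record repeated 34 times, so removing rows with no station name and no counts would clear both the duplicate and the missing-value alerts. The sketch below shows one way to do that with plain Python on a list of dictionaries. The helper `drop_empty_rows` and the tiny `rows` sample are illustrative assumptions, with `None` standing in for `<NA>`; in pandas, `df.dropna(how="all", subset=[...])` serves the same purpose.

```python
from typing import Dict, List

# Columns that are empty together in the 34 blank rows.
KEY_COLUMNS = ("역명", "승차인원", "하차인원")


def drop_empty_rows(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep rows that have at least one non-missing value in KEY_COLUMNS."""
    return [
        row for row in rows
        if any(row.get(col) is not None for col in KEY_COLUMNS)
    ]


# Illustrative input: one real record and two blank ones.
rows = [
    {"역명": "서울", "단위": "", "승차인원": 16701060, "하차인원": 16649827},
    {"역명": None, "단위": "", "승차인원": None, "하차인원": None},
    {"역명": None, "단위": "", "승차인원": None, "하차인원": None},
]

cleaned = drop_empty_rows(rows)
print(len(cleaned))  # 1
```

The constant 단위 column is deliberately left out of `KEY_COLUMNS`: it is filled in on every row, including the blank ones, so it cannot be used to tell them apart.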