Overview

Dataset statistics

Number of variables: 4
Number of observations: 51
Missing cells: 9
Missing cells (%): 4.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 37.6 B
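
The percentages in this overview follow directly from the counts. A minimal sketch of the arithmetic, using only figures copied from this report (the variable names are illustrative, and the per-column missing counts are taken from the Alerts section):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 51
missing_cells = 9

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# Per-column missing counts as reported in the Alerts section.
missing_by_column = {"역명": 0, "승차": 5, "하차": 2, "인키로": 2}

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
for name, count in missing_by_column.items():
    print(f"{name}: {100 * count / n_observations:.1f}% missing")
```

The per-column counts sum to the 9 missing cells reported for the whole dataset.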

Variable types

Text: 1
Numeric: 3

Dataset

Description: KTX down-line (outbound) passenger boarding and alighting figures by station.
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/15068468/fileData.do

Alerts

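하차 (alighting) is highly correlated overall with 인키로 (passenger-kilometres) [High correlation]
인키로 (passenger-kilometres) is highly correlated overall with 하차 (alighting) [High correlation]
승차 (boarding) has 5 (9.8%) missing values [Missing]
하차 (alighting) has 2 (3.9%) missing values [Missing]
인키로 (passenger-kilometres) has 2 (3.9%) missing values [Missing]
역명 (station name) has unique values [Unique]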

Reproduction

Analysis started: 2023-12-12 13:05:15.302579
Analysis finished: 2023-12-12 13:05:17.106975
Duration: 1.8 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

역명
Text

UNIQUE 

Distinct: 51
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 540.0 B

Length

Max length: 6
Median length: 2
Mean length: 2.4509804
Min length: 2

Characters and Unicode

Total characters: 125
Distinct characters: 70
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
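
As an illustration of these character properties, the sketch below looks up the general category of each character in the first five station names from this report. It uses Python's standard `unicodedata` module, which exposes the general category and character name but not the script or block, so the name prefix is used as a rough stand-in for the script. This is not necessarily how the profiling tool derives its counts.

```python
import unicodedata
from collections import Counter

# First five station names from the report's sample rows.
names = ["강릉", "검암", "경산", "계룡", "곡성"]

# General category of every character: "Lo" is "Other Letter".
categories = Counter(unicodedata.category(ch) for name in names for ch in name)

# The character name of a Hangul syllable begins with "HANGUL SYLLABLE".
first_name = unicodedata.name("강")

print(categories)
print(first_name)

# The two ASCII characters the report lists separately:
# "Lu" is "Uppercase Letter", "Nd" is "Decimal Number".
print(unicodedata.category("T"), unicodedata.category("2"))
```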

Unique

Unique: 51
Unique (%): 100.0%

Sample

1st row: 강릉
2nd row: 검암
3rd row: 경산
4th row: 계룡
5th row: 곡성

Value               Count  Frequency (%)
강릉                    1  2.0%
순천                    1  2.0%
양평                    1  2.0%
여수엑스포              1  2.0%
여천                    1  2.0%
영등포                  1  2.0%
오송                    1  2.0%
용산                    1  2.0%
울산                    1  2.0%
익산                    1  2.0%
Other values (41)      41  80.4%

Most occurring characters

Value               Count  Frequency (%)
                        8  6.4%
                        6  4.8%
                        6  4.8%
                        5  4.0%
                        5  4.0%
                        4  3.2%
                        3  2.4%
                        3  2.4%
                        3  2.4%
                        3  2.4%
Other values (60)      79  63.2%

Most occurring categories

Value               Count  Frequency (%)
Other Letter          123  98.4%
Uppercase Letter        1  0.8%
Decimal Number          1  0.8%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
                        8  6.5%
                        6  4.9%
                        6  4.9%
                        5  4.1%
                        5  4.1%
                        4  3.3%
                        3  2.4%
                        3  2.4%
                        3  2.4%
                        3  2.4%
Other values (58)      77  62.6%

Uppercase Letter

Value               Count  Frequency (%)
T                       1  100.0%

Decimal Number

Value               Count  Frequency (%)
2                       1  100.0%

Most occurring scripts

Value               Count  Frequency (%)
Hangul                123  98.4%
Latin                   1  0.8%
Common                  1  0.8%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
                        8  6.5%
                        6  4.9%
                        6  4.9%
                        5  4.1%
                        5  4.1%
                        4  3.3%
                        3  2.4%
                        3  2.4%
                        3  2.4%
                        3  2.4%
Other values (58)      77  62.6%

Latin

Value               Count  Frequency (%)
T                       1  100.0%

Common

Value               Count  Frequency (%)
2                       1  100.0%

Most occurring blocks

Value               Count  Frequency (%)
Hangul                123  98.4%
ASCII                   2  1.6%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
                        8  6.5%
                        6  4.9%
                        6  4.9%
                        5  4.1%
                        5  4.1%
                        4  3.3%
                        3  2.4%
                        3  2.4%
                        3  2.4%
                        3  2.4%
Other values (58)      77  62.6%

ASCII

Value               Count  Frequency (%)
T                       1  50.0%
2                       1  50.0%

승차
Real number (ℝ)

MISSING 

Distinct: 46
Distinct (%): 100.0%
Missing: 5
Missing (%): 9.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 723732.61
Minimum: 1
Maximum: 13780902
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 1
5-th percentile: 947.75
Q1: 12429.25
Median: 35051
Q3: 233942.25
95-th percentile: 3813676
Maximum: 13780902
Range: 13780901
Interquartile range (IQR): 221513

Descriptive statistics

Standard deviation: 2216096.6
Coefficient of variation (CV): 3.0620378
Kurtosis: 27.945271
Mean: 723732.61
Median Absolute Deviation (MAD): 33306
Skewness: 5.0065796
Sum: 33291700
Variance: 4.911084 × 10^12
Monotonicity: Not monotonic
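
Several of the 승차 figures above are derived from one another, so they can be cross-checked. A minimal sketch using only values copied from this report (the variable names are illustrative, and small differences are expected because the inputs are rounded):

```python
import math

# Values copied from the 승차 statistics in this report.
count = 51 - 5                      # observations minus missing values
mean = 723732.61
std = 2216096.6
q1, q3 = 12429.25, 233942.25
minimum, maximum = 1, 13780902

cv = std / mean                     # coefficient of variation
variance = std ** 2                 # variance is the squared standard deviation
iqr = q3 - q1                       # interquartile range
value_range = maximum - minimum
total = mean * count                # sum of all non-missing values

print(f"CV       = {cv:.5f}")
print(f"Variance = {variance:.4e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.0f}")
```

A CV above 3 together with a skewness of about 5 indicates that a handful of very busy stations dominate the mean, which is why the median (35051) is so much smaller than the mean.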
Histogram with fixed size bins (bins=46)

Value               Count  Frequency (%)
130290                  1  2.0%
32307                   1  2.0%
476                     1  2.0%
147009                  1  2.0%
932884                  1  2.0%
4908924                 1  2.0%
178781                  1  2.0%
382861                  1  2.0%
59722                   1  2.0%
14696                   1  2.0%
Other values (36)      36  70.6%
(Missing)               5  9.8%

Value               Count  Frequency (%)
1                       1  2.0%
476                     1  2.0%
830                     1  2.0%
1301                    1  2.0%
2189                    1  2.0%
2572                    1  2.0%
4015                    1  2.0%
5371                    1  2.0%
8335                    1  2.0%
8570                    1  2.0%

Value               Count  Frequency (%)
13780902                1  2.0%
4908924                 1  2.0%
4527436                 1  2.0%
1672396                 1  2.0%
1555762                 1  2.0%
1527006                 1  2.0%
948233                  1  2.0%
932884                  1  2.0%
770568                  1  2.0%
548810                  1  2.0%
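
The 5-th and 95-th percentiles reported for 승차 can be reproduced from the ten smallest and ten largest values listed for this variable. The sketch below assumes linear interpolation between neighbouring order statistics (the default in NumPy and pandas); that rule is an assumption chosen because it matches the reported figures, not something the report states. The helper names are hypothetical.

```python
import math

# Ten smallest and ten largest 승차 values, copied from this report.
smallest = [1, 476, 830, 1301, 2189, 2572, 4015, 5371, 8335, 8570]
largest = [13780902, 4908924, 4527436, 1672396, 1555762,
           1527006, 948233, 932884, 770568, 548810]
n = 46  # non-missing observations


def order_statistic(i):
    """Return the i-th smallest value (0-based) if the report lists it."""
    if i < len(smallest):
        return smallest[i]
    if i >= n - len(largest):
        return largest[n - 1 - i]
    raise IndexError("value not shown in the report")


def quantile(q):
    """Linear interpolation between the two nearest order statistics."""
    pos = q * (n - 1)
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return float(order_statistic(lo))
    low, high = order_statistic(lo), order_statistic(lo + 1)
    return low + frac * (high - low)


p05 = quantile(0.05)  # position 2.25, between 830 and 1301
p95 = quantile(0.95)  # position 42.75, between 1672396 and 4527436

print(round(p05, 2), round(p95, 2))
```

The quartiles and the median cannot be checked the same way, because the order statistics they depend on are not among the listed values.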

하차
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 679422.45
Minimum: 64
Maximum: 5742556
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 64
5-th percentile: 610.6
Q1: 52820
Median: 206069
Q3: 711150
95-th percentile: 2751429.4
Maximum: 5742556
Range: 5742492
Interquartile range (IQR): 658330

Descriptive statistics

Standard deviation: 1152269.9
Coefficient of variation (CV): 1.695955
Kurtosis: 9.1208762
Mean: 679422.45
Median Absolute Deviation (MAD): 180712
Skewness: 2.8804005
Sum: 33291700
Variance: 1.3277259 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Value               Count  Frequency (%)
735713                  1  2.0%
536578                  1  2.0%
70357                   1  2.0%
527737                  1  2.0%
212298                  1  2.0%
259                     1  2.0%
1799898                 1  2.0%
2653                    1  2.0%
1930224                 1  2.0%
959664                  1  2.0%
Other values (39)      39  76.5%
(Missing)               2  3.9%

Value               Count  Frequency (%)
64                      1  2.0%
125                     1  2.0%
259                     1  2.0%
1138                    1  2.0%
2653                    1  2.0%
2669                    1  2.0%
33887                   1  2.0%
34021                   1  2.0%
36883                   1  2.0%
37215                   1  2.0%

Value               Count  Frequency (%)
5742556                 1  2.0%
4468446                 1  2.0%
3245951                 1  2.0%
2009647                 1  2.0%
1930224                 1  2.0%
1801251                 1  2.0%
1799898                 1  2.0%
1561776                 1  2.0%
1085362                 1  2.0%
959664                  1  2.0%

인키로
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6355107 × 10^8
Minimum: 18483
Maximum: 2.0313909 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 18483
5-th percentile: 126299.6
Q1: 8981125
Median: 49928653
Q3: 1.6737027 × 10^8
95-th percentile: 5.2963179 × 10^8
Maximum: 2.0313909 × 10^9
Range: 2.0313724 × 10^9
Interquartile range (IQR): 1.5838914 × 10^8

Descriptive statistics

Standard deviation: 3.2833977 × 10^8
Coefficient of variation (CV): 2.0075672
Kurtosis: 22.421849
Mean: 1.6355107 × 10^8
Median Absolute Deviation (MAD): 49254337
Skewness: 4.3297967
Sum: 8.0140024 × 10^9
Variance: 1.07807 × 10^17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Value               Count  Frequency (%)
148773512               1  2.0%
139645375               1  2.0%
6731440                 1  2.0%
167370269               1  2.0%
71421641                1  2.0%
75646                   1  2.0%
265643316               1  2.0%
674316                  1  2.0%
522852816               1  2.0%
164423391               1  2.0%
Other values (39)      39  76.5%
(Missing)               2  3.9%

Value               Count  Frequency (%)
18483                   1  2.0%
28216                   1  2.0%
75646                   1  2.0%
202280                  1  2.0%
534305                  1  2.0%
674316                  1  2.0%
5646955                 1  2.0%
6731440                 1  2.0%
7324595                 1  2.0%
7364105                 1  2.0%

Value               Count  Frequency (%)
2031390876              1  2.0%
945454740               1  2.0%
534151106               1  2.0%
522852816               1  2.0%
520210887               1  2.0%
330951936               1  2.0%
289659833               1  2.0%
265643316               1  2.0%
256916706               1  2.0%
211995418               1  2.0%

Interactions

[Figures: nine pairwise interaction plots for 승차, 하차, and 인키로]

Correlations

          역명     승차     하차     인키로
역명      1.000   1.000   1.000   1.000
승차      1.000   1.000   0.558   0.364
하차      1.000   0.558   1.000   0.915
인키로    1.000   0.364   0.915   1.000
          승차     하차     인키로
승차      1.000   0.085   0.053
하차      0.085   1.000   0.979
인키로    0.053   0.979   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
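
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between missing-value indicators for the 20 sample rows shown in this report (rows 0 to 9 and 41 to 50). It covers only those rows, so the numbers will differ from the heatmap over all 51 observations, and the `pearson` helper is an illustrative implementation rather than the tool's own code.

```python
import math

# Missing-value flags (1 = missing) for the 20 sample rows, in row order:
# indices 0-9 followed by 41-50.
missing = {
    "승차": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0,   # row 0 (강릉) is missing
             0, 1, 0, 0, 0, 0, 0, 1, 0, 0],  # rows 42 (진주) and 48 (포항)
    "하차": [0] * 18 + [1, 0],               # row 49 (행신)
    "인키로": [0] * 18 + [1, 0],             # row 49 (행신)
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


r_alight_pkm = pearson(missing["하차"], missing["인키로"])
r_board_alight = pearson(missing["승차"], missing["하차"])

print(f"하차 vs 인키로: {r_alight_pkm:.3f}")
print(f"승차 vs 하차:  {r_board_alight:.3f}")
```

In these rows 하차 and 인키로 are missing together, so their nullity correlation is 1, while 승차 is missing in different rows and correlates slightly negatively with 하차.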

Sample

    역명        승차      하차       인키로
0   강릉        <NA>      1561776    330951936
1   검암        21662     64         18483
2   경산        8335      36883      8791331
3   계룡        11342     98720      18797364
4   곡성        5371      37215      7364105
5   공주        31053     52820      7682837
6   광명        4527436   164049     41477521
7   광주송정    111645    2009647    534151106
8   구례구      2189      34021      9303339
9   구포        2572      486056     145176697

    역명        승차      하차       인키로
41  진영        830       99760      29759961
42  진주        <NA>      164421     48877496
43  창원        1301      179394     55730230
44  창원중앙    13185     763394     211995418
45  천안아산    1555762   1801251    256916706
46  청량리      948233    2669       534305
47  평창        23990     100967     12532567
48  포항        <NA>      1085362    289659833
49  행신        770568    <NA>       <NA>
50  횡성        20533     63901      7324595