Overview

Dataset statistics

Number of variables: 4
Number of observations: 51
Missing cells: 14
Missing cells (%): 6.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 37.6 B
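
The percentages and sizes above can be cross-checked with simple arithmetic. The sketch below assumes the missing-cell percentage is taken over all cells (variables × observations) and that the total size is roughly the average record size times the number of observations; neither definition is stated in the report, but both reproduce the figures shown.

```python
# Figures copied from the dataset statistics above.
n_variables = 4
n_observations = 51
missing_cells = 14
avg_record_bytes = 37.6

total_cells = n_variables * n_observations             # 4 * 51 = 204 cells
missing_pct = 100 * missing_cells / total_cells        # share of empty cells
total_kib = avg_record_bytes * n_observations / 1024   # bytes converted to KiB

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```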

Variable types

Text: 1
Numeric: 3

Dataset

Description: KTX upbound passenger boarding and alighting figures by station.
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/15068465/fileData.do

Alerts

하차 is highly overall correlated with 인키로 (High correlation)
인키로 is highly overall correlated with 하차 (High correlation)
승차 has 2 (3.9%) missing values (Missing)
하차 has 6 (11.8%) missing values (Missing)
인키로 has 6 (11.8%) missing values (Missing)
역명 has unique values (Unique)
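
The missing-value alerts can be reproduced from the per-column counts. This is a minimal sketch using only the counts quoted in the alerts; the dictionary name is illustrative and not part of the report.

```python
n_observations = 51

# Missing counts per column, as quoted in the alerts (역명 has none).
missing_by_column = {"역명": 0, "승차": 2, "하차": 6, "인키로": 6}

for column, n_missing in missing_by_column.items():
    pct = 100 * n_missing / n_observations
    print(f"{column}: {n_missing} missing ({pct:.1f}%)")

# The per-column counts should add up to the dataset-level "Missing cells".
total_missing = sum(missing_by_column.values())
print("Total missing cells:", total_missing)
```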

Reproduction

Analysis started: 2023-12-12 04:50:56.323703
Analysis finished: 2023-12-12 04:50:58.272611
Duration: 1.95 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
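
The reported duration follows from the two timestamps. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2023-12-12 04:50:56.323703")
finished = datetime.fromisoformat("2023-12-12 04:50:58.272611")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```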

Variables

역명 (station name)
Text

UNIQUE 

Distinct: 51
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 540.0 B

Length

Max length: 6
Median length: 2
Mean length: 2.4509804
Min length: 2

Characters and Unicode

Total characters: 125
Distinct characters: 70
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
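
As an illustration of these character properties, the sketch below looks up the Unicode general category of each character with Python's standard `unicodedata` module. The five names are the ones shown in this variable's sample rows; the script and block properties are not available in the standard library, so only categories are covered. The mean length is recomputed from the totals reported above.

```python
import unicodedata
from collections import Counter

# Station names taken from the sample rows of this variable.
names = ["강릉", "검암", "경산", "계룡", "곡성"]

# General category per character: Hangul syllables are "Lo" (Other Letter).
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)
print(category_counts)

# Mean length = total characters / number of values (figures from the report).
total_characters = 125
n_values = 51
mean_length = total_characters / n_values
print(f"Mean length: {mean_length:.7f}")
```

An uppercase Latin letter such as `T` falls in category `Lu` (Uppercase Letter) and a digit such as `2` in `Nd` (Decimal Number), which are the two other categories the report lists.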

Unique

Unique: 51
Unique (%): 100.0%

Sample

1st row: 강릉
2nd row: 검암
3rd row: 경산
4th row: 계룡
5th row: 곡성

Value | Count | Frequency (%)
강릉 | 1 | 2.0%
순천 | 1 | 2.0%
양평 | 1 | 2.0%
여수엑스포 | 1 | 2.0%
여천 | 1 | 2.0%
영등포 | 1 | 2.0%
오송 | 1 | 2.0%
용산 | 1 | 2.0%
울산 | 1 | 2.0%
익산 | 1 | 2.0%
Other values (41) | 41 | 80.4%

Most occurring characters

Rank | Count | Frequency (%)
1 | 8 | 6.4%
2 | 6 | 4.8%
3 | 6 | 4.8%
4 | 5 | 4.0%
5 | 5 | 4.0%
6 | 4 | 3.2%
7 | 3 | 2.4%
8 | 3 | 2.4%
9 | 3 | 2.4%
10 | 3 | 2.4%
Other values (60) | 79 | 63.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 123 | 98.4%
Uppercase Letter | 1 | 0.8%
Decimal Number | 1 | 0.8%

Most frequent character per category

Other Letter
Rank | Count | Frequency (%)
1 | 8 | 6.5%
2 | 6 | 4.9%
3 | 6 | 4.9%
4 | 5 | 4.1%
5 | 5 | 4.1%
6 | 4 | 3.3%
7 | 3 | 2.4%
8 | 3 | 2.4%
9 | 3 | 2.4%
10 | 3 | 2.4%
Other values (58) | 77 | 62.6%

Uppercase Letter
Value | Count | Frequency (%)
T | 1 | 100.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 123 | 98.4%
Latin | 1 | 0.8%
Common | 1 | 0.8%

Most frequent character per script

Hangul
Rank | Count | Frequency (%)
1 | 8 | 6.5%
2 | 6 | 4.9%
3 | 6 | 4.9%
4 | 5 | 4.1%
5 | 5 | 4.1%
6 | 4 | 3.3%
7 | 3 | 2.4%
8 | 3 | 2.4%
9 | 3 | 2.4%
10 | 3 | 2.4%
Other values (58) | 77 | 62.6%

Latin
Value | Count | Frequency (%)
T | 1 | 100.0%

Common
Value | Count | Frequency (%)
2 | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 123 | 98.4%
ASCII | 2 | 1.6%

Most frequent character per block

Hangul
Rank | Count | Frequency (%)
1 | 8 | 6.5%
2 | 6 | 4.9%
3 | 6 | 4.9%
4 | 5 | 4.1%
5 | 5 | 4.1%
6 | 4 | 3.3%
7 | 3 | 2.4%
8 | 3 | 2.4%
9 | 3 | 2.4%
10 | 3 | 2.4%
Other values (58) | 77 | 62.6%

ASCII
Value | Count | Frequency (%)
T | 1 | 50.0%
2 | 1 | 50.0%

승차 (boardings)
Real number (ℝ)

MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 688765.18
Minimum: 76
Maximum: 5827046
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 76
5th percentile: 985.4
Q1: 54376
Median: 216129
Q3: 706879
95th percentile: 2727537
Maximum: 5827046
Range: 5826970
Interquartile range (IQR): 652503

Descriptive statistics

Standard deviation: 1165156.2
Coefficient of variation (CV): 1.6916595
Kurtosis: 9.2130342
Mean: 688765.18
Median Absolute Deviation (MAD): 188812
Skewness: 2.8883152
Sum: 33749494
Variance: 1.3575889 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Common values
Value | Count | Frequency (%)
752729 | 1 | 2.0%
541243 | 1 | 2.0%
90776 | 1 | 2.0%
506540 | 1 | 2.0%
219640 | 1 | 2.0%
249 | 1 | 2.0%
1907081 | 1 | 2.0%
14101 | 1 | 2.0%
1957545 | 1 | 2.0%
989507 | 1 | 2.0%
Other values (39) | 39 | 76.5%
(Missing) | 2 | 3.9%

Minimum 10 values
Value | Count | Frequency (%)
76 | 1 | 2.0%
249 | 1 | 2.0%
431 | 1 | 2.0%
1817 | 1 | 2.0%
6852 | 1 | 2.0%
14101 | 1 | 2.0%
32114 | 1 | 2.0%
33366 | 1 | 2.0%
34533 | 1 | 2.0%
40624 | 1 | 2.0%

Maximum 10 values
Value | Count | Frequency (%)
5827046 | 1 | 2.0%
4527555 | 1 | 2.0%
3215735 | 1 | 2.0%
1995240 | 1 | 2.0%
1957545 | 1 | 2.0%
1907081 | 1 | 2.0%
1771169 | 1 | 2.0%
1634532 | 1 | 2.0%
1115682 | 1 | 2.0%
989507 | 1 | 2.0%
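
The 5th and 95th percentiles reported for 승차 can be recovered from the smallest and largest values listed above. The sketch assumes linear interpolation between neighbouring order statistics (the default in NumPy and pandas); the report does not state its method, but this assumption reproduces both figures. The `known` mapping holds only the four order statistics needed, indexed by zero-based rank among the 49 non-missing values.

```python
import math

n = 49  # non-missing 승차 values: 51 observations minus 2 missing

# Sorted values by zero-based rank, copied from the extreme-value tables:
# ranks 2 and 3 from the smallest values, ranks 45 and 46 from the largest.
known = {2: 431, 3: 1817, 45: 1995240, 46: 3215735}

def interpolated_quantile(q):
    """Linear interpolation at fractional rank q * (n - 1)."""
    position = q * (n - 1)
    lower = math.floor(position)
    fraction = position - lower
    return known[lower] + fraction * (known[lower + 1] - known[lower])

p05 = interpolated_quantile(0.05)  # between the 3rd and 4th smallest values
p95 = interpolated_quantile(0.95)  # between the 4th and 3rd largest values
print(f"5th percentile: {p05:.1f}")
print(f"95th percentile: {p95:.0f}")
```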

하차 (alightings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 45
Distinct (%): 100.0%
Missing: 6
Missing (%): 11.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 749988.76
Minimum: 697
Maximum: 14147344
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 697
5th percentile: 1528.6
Q1: 11046
Median: 37335
Q3: 245550
95th percentile: 3991137.2
Maximum: 14147344
Range: 14146647
Interquartile range (IQR): 234504
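
The range and interquartile range are simple differences of the quantiles listed above, as this short check shows:

```python
# 하차 quantile figures copied from the list above.
minimum, q1, median, q3, maximum = 697, 11046, 37335, 245550, 14147344

value_range = maximum - minimum  # full spread of the data
iqr = q3 - q1                    # spread of the middle 50% of values

print("Range:", value_range)
print("IQR:", iqr)
```

The median lies much closer to Q1 than to Q3, which is consistent with the strongly positive skewness reported for this variable.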

Descriptive statistics

Standard deviation: 2297930.2
Coefficient of variation (CV): 3.0639529
Kurtosis: 27.435851
Mean: 749988.76
Median Absolute Deviation (MAD): 35910
Skewness: 4.9682504
Sum: 33749494
Variance: 5.2804833 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)

Common values
Value | Count | Frequency (%)
123266 | 1 | 2.0%
32011 | 1 | 2.0%
697 | 1 | 2.0%
149077 | 1 | 2.0%
943731 | 1 | 2.0%
5134368 | 1 | 2.0%
199254 | 1 | 2.0%
373365 | 1 | 2.0%
74752 | 1 | 2.0%
13990 | 1 | 2.0%
Other values (35) | 35 | 68.6%
(Missing) | 6 | 11.8%

Minimum 10 values
Value | Count | Frequency (%)
697 | 1 | 2.0%
1294 | 1 | 2.0%
1425 | 1 | 2.0%
1943 | 1 | 2.0%
3640 | 1 | 2.0%
4086 | 1 | 2.0%
5448 | 1 | 2.0%
5500 | 1 | 2.0%
8058 | 1 | 2.0%
10015 | 1 | 2.0%

Maximum 10 values
Value | Count | Frequency (%)
14147344 | 1 | 2.0%
5134368 | 1 | 2.0%
4574649 | 1 | 2.0%
1657090 | 1 | 2.0%
1582696 | 1 | 2.0%
1475801 | 1 | 2.0%
943731 | 1 | 2.0%
818527 | 1 | 2.0%
721378 | 1 | 2.0%
578420 | 1 | 2.0%

인키로 (passenger-kilometres)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 45
Distinct (%): 100.0%
Missing: 6
Missing (%): 11.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8035968 × 10^8
Minimum: 234486
Maximum: 3.7494927 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 234486
5th percentile: 460416.4
Q1: 2205368
Median: 6308384
Q3: 53973277
95th percentile: 9.9228424 × 10^8
Maximum: 3.7494927 × 10^9
Range: 3.7492582 × 10^9
Interquartile range (IQR): 51767909

Descriptive statistics

Standard deviation: 6.0283636 × 10^8
Coefficient of variation (CV): 3.3424121
Kurtosis: 29.292309
Mean: 1.8035968 × 10^8
Median Absolute Deviation (MAD): 5922364
Skewness: 5.1734486
Sum: 8.1161854 × 10^9
Variance: 3.6341167 × 10^17
Monotonicity: Not monotonic
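
Several of these descriptive statistics are tied together by simple identities: the mean is the sum divided by the number of non-missing values, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks them with the rounded figures shown above, so agreement is only expected to the displayed precision.

```python
# 인키로 figures copied from the report (rounded as displayed).
n_values = 51 - 6            # observations minus missing values
total = 8.1161854e9          # Sum
std_dev = 6.0283636e8        # Standard deviation

mean = total / n_values      # arithmetic mean of the non-missing values
cv = std_dev / mean          # coefficient of variation (unitless)
variance = std_dev ** 2      # variance is the square of the standard deviation

print(f"Mean: {mean:.7e}")
print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.7e}")
```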
Histogram with fixed size bins (bins=45)

Common values
Value | Count | Frequency (%)
24926453 | 1 | 2.0%
3062668 | 1 | 2.0%
234486 | 1 | 2.0%
43540653 | 1 | 2.0%
139283355 | 1 | 2.0%
1305008712 | 1 | 2.0%
53973277 | 1 | 2.0%
63970243 | 1 | 2.0%
16873772 | 1 | 2.0%
3765912 | 1 | 2.0%
Other values (35) | 35 | 68.6%
(Missing) | 6 | 11.8%

Minimum 10 values
Value | Count | Frequency (%)
234486 | 1 | 2.0%
386020 | 1 | 2.0%
442688 | 1 | 2.0%
531330 | 1 | 2.0%
969741 | 1 | 2.0%
1087206 | 1 | 2.0%
1088340 | 1 | 2.0%
1298570 | 1 | 2.0%
1321289 | 1 | 2.0%
1674647 | 1 | 2.0%

Maximum 10 values
Value | Count | Frequency (%)
3749492675 | 1 | 2.0%
1305008712 | 1 | 2.0%
1156636746 | 1 | 2.0%
334874235 | 1 | 2.0%
265572789 | 1 | 2.0%
210497000 | 1 | 2.0%
204827799 | 1 | 2.0%
163860144 | 1 | 2.0%
155406325 | 1 | 2.0%
139283355 | 1 | 2.0%

Interactions

[Figure: pairwise scatter plots of the numeric variables 승차, 하차, and 인키로]

Correlations

Variable | 역명 | 승차 | 하차 | 인키로
역명 | 1.000 | 1.000 | 1.000 | 1.000
승차 | 1.000 | 1.000 | 0.484 | 0.000
하차 | 1.000 | 0.484 | 1.000 | 1.000
인키로 | 1.000 | 0.000 | 1.000 | 1.000
Variable | 승차 | 하차 | 인키로
승차 | 1.000 | 0.182 | 0.199
하차 | 0.182 | 1.000 | 0.981
인키로 | 0.199 | 0.981 | 1.000
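
To illustrate how a strong 하차/인키로 association like the one above arises, the sketch below computes a Spearman rank correlation on ten (하차, 인키로) pairs read from the report's sample rows. This is an assumption-laden illustration: the report does not say which coefficient each matrix uses, the ten pairs are only a subset of the 45 complete rows, and the helper functions are hypothetical names written for this example. The result therefore need not match the matrix exactly.

```python
def ranks(values):
    """Rank values from 1 to n (assumes no ties, which holds for these pairs)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# (하차, 인키로) pairs for 경산, 곡성, 구례구, 구포, 진영, 창원, 광명,
# 천안아산, 청량리, 행신, as read from the sample rows.
pairs = [
    (5448, 1298570), (5500, 1088340), (1943, 531330), (3640, 1087206),
    (1294, 386020), (1425, 442688), (4574649, 1156636746),
    (1475801, 210497000), (818527, 163860144), (721378, 204827799),
]
alightings = [a for a, _ in pairs]
passenger_km = [k for _, k in pairs]

rho = spearman(alightings, passenger_km)
print(f"Spearman rho on the sample pairs: {rho:.3f}")
```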

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

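First rows
Index | 역명 | 승차 | 하차 | 인키로
0 | 강릉 | 1634532 | <NA> | <NA>
1 | 검암 | 76 | 20052 | 5791027
2 | 경산 | 40624 | 5448 | 1298570
3 | 계룡 | 115373 | 10513 | 2001790
4 | 곡성 | 32114 | 5500 | 1088340
5 | 공주 | 54376 | 30285 | 4405050
6 | 광명 | 169092 | 4574649 | 1156636746
7 | 광주송정 | 1995240 | 114212 | 30356807
8 | 구례구 | 33366 | 1943 | 531330
9 | 구포 | 476842 | 3640 | 1087206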
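
Last rows
Index | 역명 | 승차 | 하차 | 인키로
41 | 진영 | 109294 | 1294 | 386020
42 | 진주 | 157353 | <NA> | <NA>
43 | 창원 | 182919 | 1425 | 442688
44 | 창원중앙 | 750092 | 11773 | 3269376
45 | 천안아산 | 1771169 | 1475801 | 210497000
46 | 청량리 | 6852 | 818527 | 163860144
47 | 평창 | 92503 | 24671 | 3062297
48 | 포항 | 1115682 | <NA> | <NA>
49 | 행신 | <NA> | 721378 | 204827799
50 | 횡성 | 66002 | 19240 | 2205368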