Overview

Dataset statistics

Number of variables: 6
Number of observations: 5330
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 270.8 KiB
Average record size in memory: 52.0 B
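
The average record size can be checked against the total memory figure. The sketch below assumes that 1 KiB is 1024 bytes and that the report divides total memory by the number of observations; the variable names are illustrative.

```python
# Figures copied from the dataset statistics in this report.
total_size_kib = 270.8
n_observations = 5330

# Assumption: 1 KiB = 1024 bytes, and the average is total bytes / rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")  # 52.0 B
```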

Variable types

Text: 2
Numeric: 4

Dataset

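Description: Statistics drawn from Korea Railroad Corporation (한국철도공사) passenger train information, covering KTX passengers boarding and alighting at each station on up (상행) and down (하행) trains from 2010 to 2020.
Author: 한국철도공사 (Korea Railroad Corporation)
URL: https://www.data.go.kr/data/15108119/fileData.do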

Alerts

하행_승차인원수 is highly overall correlated with 상행_하차인원수 (High correlation)
하행_하차인원수 is highly overall correlated with 상행_승차인원수 (High correlation)
상행_승차인원수 is highly overall correlated with 하행_하차인원수 (High correlation)
상행_하차인원수 is highly overall correlated with 하행_승차인원수 (High correlation)
하행_승차인원수 has 644 (12.1%) zeros (Zeros)
하행_하차인원수 has 179 (3.4%) zeros (Zeros)
상행_승차인원수 has 183 (3.4%) zeros (Zeros)
상행_하차인원수 has 630 (11.8%) zeros (Zeros)
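
The zero percentages in these alerts are the zero counts divided by the 5330 observations. A minimal sketch of that arithmetic follows; the helper name `zero_share` is hypothetical, and rounding to one decimal place is assumed from the way the figures are displayed.

```python
# Zero counts copied from the alerts; 5330 is the number of observations.
n_observations = 5330
zero_counts = {
    "하행_승차인원수": 644,
    "하행_하차인원수": 179,
    "상행_승차인원수": 183,
    "상행_하차인원수": 630,
}


def zero_share(count, total):
    """Return the percentage of zero-valued records, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: zero_share(c, n_observations) for name, c in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}%")
```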

Reproduction

Analysis started: 2023-12-12 16:11:06.538761
Analysis finished: 2023-12-12 16:11:09.133829
Duration: 2.6 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

운행년월 (year and month of operation)
Text

Distinct: 132
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 41.8 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 47970
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
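
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the input is a single sample value from this column rather than the full dataset.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the values."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# One sample value from this column: six digits, two Hangul syllables, one space.
counts = category_counts(["2010년 01월"])
print(dict(counts))  # {'Nd': 6, 'Lo': 2, 'Zs': 1}
```

Scaled by the 5330 rows, this gives 31980 decimal numbers (Nd), 10660 other letters (Lo) and 5330 space separators (Zs), which matches the category counts reported for this variable.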

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2010년 01월
2nd row: 2010년 01월
3rd row: 2010년 01월
4th row: 2010년 01월
5th row: 2010년 01월
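
Values in this column follow a fixed nine-character pattern: a four-digit year, 년, a space, a two-digit month, 월. A minimal parsing sketch follows, assuming every value matches the sample rows above; the function name `parse_year_month` is hypothetical.

```python
import re

# Assumed format, based on the sample rows: "YYYY년 MM월".
YEAR_MONTH = re.compile(r"(\d{4})년 (\d{2})월")


def parse_year_month(text):
    """Parse a value such as '2010년 01월' into a (year, month) tuple."""
    match = YEAR_MONTH.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected 운행년월 value: {text!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_year_month("2010년 01월"))  # (2010, 1)
```

Eleven years (2010 to 2020) of twelve months each give 132 year-month combinations, which agrees with the distinct count reported for this variable.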
Value                 Count   Frequency (%)
2020년                630     5.9%
2019년                584     5.5%
2018년                576     5.4%
2017년                477     4.5%
2013년                471     4.4%
2015년                471     4.4%
2016년                469     4.4%
2014년                468     4.4%
12월                  463     4.3%
2012년                455     4.3%
Other values (13)     5596    52.5%

Most occurring characters

Value                 Count   Frequency (%)
0                     10703   22.3%
1                     7364    15.4%
2                     7316    15.3%
년                    5330    11.1%
월                    5330    11.1%
(space)               5330    11.1%
9                     1026    2.1%
8                     1014    2.1%
7                     914     1.9%
6                     914     1.9%
Other values (3)      2729    5.7%

Most occurring categories

Value                 Count   Frequency (%)
Decimal Number        31980   66.7%
Other Letter          10660   22.2%
Space Separator       5330    11.1%

Most frequent character per category

Decimal Number
Value                 Count   Frequency (%)
0                     10703   33.5%
1                     7364    23.0%
2                     7316    22.9%
9                     1026    3.2%
8                     1014    3.2%
7                     914     2.9%
6                     914     2.9%
3                     911     2.8%
5                     910     2.8%
4                     908     2.8%

Other Letter
Value                 Count   Frequency (%)
년                    5330    50.0%
월                    5330    50.0%

Space Separator
Value                 Count   Frequency (%)
(space)               5330    100.0%

Most occurring scripts

Value                 Count   Frequency (%)
Common                37310   77.8%
Hangul                10660   22.2%

Most frequent character per script

Common
Value                 Count   Frequency (%)
0                     10703   28.7%
1                     7364    19.7%
2                     7316    19.6%
(space)               5330    14.3%
9                     1026    2.7%
8                     1014    2.7%
7                     914     2.4%
6                     914     2.4%
3                     911     2.4%
5                     910     2.4%

Hangul
Value                 Count   Frequency (%)
년                    5330    50.0%
월                    5330    50.0%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 37310   77.8%
Hangul                10660   22.2%

Most frequent character per block

ASCII
Value                 Count   Frequency (%)
0                     10703   28.7%
1                     7364    19.7%
2                     7316    19.6%
(space)               5330    14.3%
9                     1026    2.7%
8                     1014    2.7%
7                     914     2.4%
6                     914     2.4%
3                     911     2.4%
5                     910     2.4%

Hangul
Value                 Count   Frequency (%)
년                    5330    50.0%
월                    5330    50.0%

정차역 (station)
Text

Distinct: 55
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 41.8 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.3803002
Min length: 2

Characters and Unicode

Total characters: 12687
Distinct characters: 71
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울
2nd row: 용산
3rd row: 영등포
4th row: 수원
5th row: 대전
Value                 Count   Frequency (%)
서울                  132     2.5%
광명                  132     2.5%
행신                  132     2.5%
익산                  132     2.5%
용산                  132     2.5%
나주                  132     2.5%
창원                  132     2.5%
논산                  132     2.5%
정읍                  132     2.5%
계룡                  132     2.5%
Other values (45)     4010    75.2%

Most occurring characters

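Value                 Count   Frequency (%)
(not preserved)       1030    8.1%
(not preserved)       678     5.3%
(not preserved)       610     4.8%
(not preserved)       574     4.5%
(not preserved)       500     3.9%
(not preserved)       492     3.9%
(not preserved)       396     3.1%
(not preserved)       384     3.0%
(not preserved)       290     2.3%
(not preserved)       274     2.2%
Other values (61)     7459    58.8%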

Most occurring categories

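Value                 Count   Frequency (%)
Other Letter          12687   100.0%

Most frequent character per category

Other Letter
Value                 Count   Frequency (%)
(not preserved)       1030    8.1%
(not preserved)       678     5.3%
(not preserved)       610     4.8%
(not preserved)       574     4.5%
(not preserved)       500     3.9%
(not preserved)       492     3.9%
(not preserved)       396     3.1%
(not preserved)       384     3.0%
(not preserved)       290     2.3%
(not preserved)       274     2.2%
Other values (61)     7459    58.8%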

Most occurring scripts

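Value                 Count   Frequency (%)
Hangul                12687   100.0%

Most frequent character per script

Hangul
Value                 Count   Frequency (%)
(not preserved)       1030    8.1%
(not preserved)       678     5.3%
(not preserved)       610     4.8%
(not preserved)       574     4.5%
(not preserved)       500     3.9%
(not preserved)       492     3.9%
(not preserved)       396     3.1%
(not preserved)       384     3.0%
(not preserved)       290     2.3%
(not preserved)       274     2.2%
Other values (61)     7459    58.8%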

Most occurring blocks

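Value                 Count   Frequency (%)
Hangul                12687   100.0%

Most frequent character per block

Hangul
Value                 Count   Frequency (%)
(not preserved)       1030    8.1%
(not preserved)       678     5.3%
(not preserved)       610     4.8%
(not preserved)       574     4.5%
(not preserved)       500     3.9%
(not preserved)       492     3.9%
(not preserved)       396     3.1%
(not preserved)       384     3.0%
(not preserved)       290     2.3%
(not preserved)       274     2.2%
Other values (61)     7459    58.8%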

하행_승차인원수 (down trains: passengers boarding)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 3546
Distinct (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57711.536
Minimum: 0
Maximum: 1408589
Zeros: 644
Zeros (%): 12.1%
Negative: 0
Negative (%): 0.0%
Memory size: 47.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 182
Median: 1591.5
Q3: 19791
95-th percentile: 288362.2
Maximum: 1408589
Range: 1408589
Interquartile range (IQR): 19609

Descriptive statistics

Standard deviation: 180947.93
Coefficient of variation (CV): 3.1353858
Kurtosis: 26.843396
Mean: 57711.536
Median Absolute Deviation (MAD): 1591.5
Skewness: 5.0236399
Sum: 3.0760248 × 10^8
Variance: 3.2742154 × 10^10
Monotonicity: Not monotonic
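
Several of these figures are derived from the others. The sketch below recomputes them from the reported values for 하행_승차인원수 as a consistency check. It uses the standard definitions (IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count) and is not the profiler's own code.

```python
# Reported values for 하행_승차인원수, copied from this report.
mean = 57711.536
std = 180947.93
q1, q3 = 182, 19791
minimum, maximum = 0, 1408589
n_observations = 5330

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
cv = std / mean                # coefficient of variation
variance = std ** 2
total = mean * n_observations  # sum of all values

print(iqr, value_range)
print(f"CV = {cv:.4f}")
print(f"variance = {variance:.4e}, sum = {total:.4e}")
```

The results agree with the reported statistics up to the rounding of the inputs.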
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
0                     644     12.1%
1                     50      0.9%
2                     18      0.3%
5                     12      0.2%
9                     11      0.2%
41                    11      0.2%
37                    10      0.2%
6                     10      0.2%
38                    9       0.2%
8                     9       0.2%
Other values (3536)   4546    85.3%

Minimum 10 values

Value                 Count   Frequency (%)
0                     644     12.1%
1                     50      0.9%
2                     18      0.3%
3                     7       0.1%
4                     8       0.2%
5                     12      0.2%
6                     10      0.2%
7                     5       0.1%
8                     9       0.2%
9                     11      0.2%

Maximum 10 values

Value                 Count   Frequency (%)
1408589               1       < 0.1%
1354245               1       < 0.1%
1351184               1       < 0.1%
1345961               1       < 0.1%
1331063               1       < 0.1%
1311093               1       < 0.1%
1297578               1       < 0.1%
1291957               1       < 0.1%
1284399               1       < 0.1%
1277939               1       < 0.1%

하행_하차인원수 (down trains: passengers alighting)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 4740
Distinct (%): 88.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57347.875
Minimum: 0
Maximum: 693168
Zeros: 179
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 47.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 3676.25
Median: 15646
Q3: 52105
95-th percentile: 284314.5
Maximum: 693168
Range: 693168
Interquartile range (IQR): 48428.75

Descriptive statistics

Standard deviation: 108433.16
Coefficient of variation (CV): 1.8907964
Kurtosis: 10.745773
Mean: 57347.875
Median Absolute Deviation (MAD): 15046
Skewness: 3.1801588
Sum: 3.0566417 × 10^8
Variance: 1.1757749 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
0                     179     3.4%
1                     31      0.6%
2                     17      0.3%
11                    11      0.2%
10                    10      0.2%
8                     10      0.2%
12                    8       0.2%
16                    7       0.1%
13                    7       0.1%
7                     6       0.1%
Other values (4730)   5044    94.6%

Minimum 10 values

Value                 Count   Frequency (%)
0                     179     3.4%
1                     31      0.6%
2                     17      0.3%
3                     5       0.1%
4                     3       0.1%
5                     5       0.1%
6                     3       0.1%
7                     6       0.1%
8                     10      0.2%
9                     6       0.1%

Maximum 10 values

Value                 Count   Frequency (%)
693168                1       < 0.1%
679711                1       < 0.1%
671027                1       < 0.1%
667878                1       < 0.1%
665728                1       < 0.1%
665158                1       < 0.1%
664584                1       < 0.1%
656957                1       < 0.1%
650780                1       < 0.1%
644080                1       < 0.1%

상행_승차인원수 (up trains: passengers boarding)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 4762
Distinct (%): 89.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58006.363
Minimum: 0
Maximum: 710283
Zeros: 183
Zeros (%): 3.4%
Negative: 0
Negative (%): 0.0%
Memory size: 47.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.45
Q1: 3825.75
Median: 16208.5
Q3: 52103.25
95-th percentile: 285638.1
Maximum: 710283
Range: 710283
Interquartile range (IQR): 48277.5

Descriptive statistics

Standard deviation: 109591.26
Coefficient of variation (CV): 1.8892972
Kurtosis: 10.765298
Mean: 58006.363
Median Absolute Deviation (MAD): 15435.5
Skewness: 3.1838495
Sum: 3.0917391 × 10^8
Variance: 1.2010244 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
0                     183     3.4%
1                     36      0.7%
2                     12      0.2%
9                     10      0.2%
4                     10      0.2%
10                    9       0.2%
5                     9       0.2%
11                    9       0.2%
8                     8       0.2%
13                    8       0.2%
Other values (4752)   5036    94.5%

Minimum 10 values

Value                 Count   Frequency (%)
0                     183     3.4%
1                     36      0.7%
2                     12      0.2%
3                     7       0.1%
4                     10      0.2%
5                     9       0.2%
6                     5       0.1%
7                     5       0.1%
8                     8       0.2%
9                     10      0.2%

Maximum 10 values

Value                 Count   Frequency (%)
710283                1       < 0.1%
703494                1       < 0.1%
703384                1       < 0.1%
672877                1       < 0.1%
669181                1       < 0.1%
665730                1       < 0.1%
664317                1       < 0.1%
664195                1       < 0.1%
662751                1       < 0.1%
661586                1       < 0.1%

상행_하차인원수 (up trains: passengers alighting)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 3530
Distinct (%): 66.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58379.617
Minimum: 0
Maximum: 1411126
Zeros: 630
Zeros (%): 11.8%
Negative: 0
Negative (%): 0.0%
Memory size: 47.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 163
Median: 1537.5
Q3: 19048.75
95-th percentile: 293824.5
Maximum: 1411126
Range: 1411126
Interquartile range (IQR): 18885.75

Descriptive statistics

Standard deviation: 184341.26
Coefficient of variation (CV): 3.1576306
Kurtosis: 26.768975
Mean: 58379.617
Median Absolute Deviation (MAD): 1537.5
Skewness: 5.0207773
Sum: 3.1116336 × 10^8
Variance: 3.3981702 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
0                     630     11.8%
1                     67      1.3%
2                     30      0.6%
3                     16      0.3%
4                     13      0.2%
10                    10      0.2%
6                     10      0.2%
5                     10      0.2%
99                    9       0.2%
14                    8       0.2%
Other values (3520)   4527    84.9%

Minimum 10 values

Value                 Count   Frequency (%)
0                     630     11.8%
1                     67      1.3%
2                     30      0.6%
3                     16      0.3%
4                     13      0.2%
5                     10      0.2%
6                     10      0.2%
7                     7       0.1%
8                     6       0.1%
9                     7       0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1411126               1       < 0.1%
1379432               1       < 0.1%
1372094               1       < 0.1%
1365292               1       < 0.1%
1360470               1       < 0.1%
1349659               1       < 0.1%
1327643               1       < 0.1%
1322281               1       < 0.1%
1301637               1       < 0.1%
1293797               1       < 0.1%

Interactions

[16 interaction plots, one per pairing of the four numeric variables; images not included]

Correlations

                  정차역   하행_승차인원수   하행_하차인원수   상행_승차인원수   상행_하차인원수
정차역            1.000    0.844            0.910            0.912            0.850
하행_승차인원수    0.844    1.000            0.456            0.449            0.962
하행_하차인원수    0.910    0.456            1.000            0.996            0.335
상행_승차인원수    0.912    0.449            0.996            1.000            0.337
상행_하차인원수    0.850    0.962            0.335            0.337            1.000

                  하행_승차인원수   하행_하차인원수   상행_승차인원수   상행_하차인원수
하행_승차인원수    1.000            0.061            0.066            0.997
하행_하차인원수    0.061            1.000            0.998            0.064
상행_승차인원수    0.066            0.998            1.000            0.069
상행_하차인원수    0.997            0.064            0.069            1.000
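
The strongly correlated pairs can be read off the four-variable matrix programmatically. The sketch below is an illustration only: the 0.9 threshold is an assumption chosen for this example, not the profiler's configured value, the helper name `high_pairs` is hypothetical, and the report does not state which correlation coefficient the matrix contains.

```python
from itertools import combinations

# Values copied from the four-variable correlation matrix in this report.
columns = ["하행_승차인원수", "하행_하차인원수", "상행_승차인원수", "상행_하차인원수"]
matrix = [
    [1.000, 0.061, 0.066, 0.997],
    [0.061, 1.000, 0.998, 0.064],
    [0.066, 0.998, 1.000, 0.069],
    [0.997, 0.064, 0.069, 1.000],
]

THRESHOLD = 0.9  # illustrative assumption, not taken from the report


def high_pairs(names, values, threshold):
    """Return (name_a, name_b, coefficient) for each pair at or above threshold."""
    return [
        (names[i], names[j], values[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(values[i][j]) >= threshold
    ]


pairs = high_pairs(columns, matrix, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

With this threshold the two pairs found are the same ones named in the High correlation alerts: boarding on down trains with alighting on up trains, and alighting on down trains with boarding on up trains.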

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index   운행년월      정차역     하행_승차인원수 하행_하차인원수 상행_승차인원수 상행_하차인원수
0       2010년 01월   서울       837659103263893007
1       2010년 01월   용산       1837991986202844
2       2010년 01월   영등포     0001
3       2010년 01월   수원       1004
4       2010년 01월   대전       127187214317226219129055
5       2010년 01월   동대구     8512941607543683381403
6       2010년 01월   경산       0030
7       2010년 01월   밀양       573945833492644778
8       2010년 01월   구포       336125795126044265
9       2010년 01월   부산       14558494735461

Index   운행년월      정차역     하행_승차인원수 하행_하차인원수 상행_승차인원수 상행_하차인원수
5320    2020년 12월   신경주     236020738197842391
5321    2020년 12월   울산       810069541656118666
5322    2020년 12월   상봉       701774837972
5323    2020년 12월   창원중앙   6683100631489527
5324    2020년 12월   공주       1326259723591143
5325    2020년 12월   포항       051350510160
5326    2020년 12월   횡성       68139043803669
5327    2020년 12월   둔내       17021111929182
5328    2020년 12월   평창       74940193794726
5329    2020년 12월   진부       52951354941542