Overview

Dataset statistics

Number of variables: 5
Number of observations: 51
Missing cells: 9
Missing cells (%): 3.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.3 KiB
Average record size in memory: 46.6 B
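"Missing cells (%)" divides the missing-cell count by the total number of cells, which is rows × columns. For this dataset that is 9 / (51 × 5) = 9 / 255 ≈ 3.5%. The sketch below shows that calculation in plain Python. It assumes the table is held as a list of row dicts with `None` marking a missing cell. That is an illustrative convention, not how ydata-profiling stores data, and `missing_cell_stats` is a hypothetical helper.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def missing_cell_stats(rows: Sequence[Mapping[str, Any]]) -> tuple[int, float]:
    """Return (missing cells, missing cells as % of all cells).

    Assumes every row dict has the same keys and that None marks a missing cell.
    """
    if not rows:
        return 0, 0.0
    total_cells = len(rows) * len(rows[0])  # rows x columns
    missing = sum(1 for row in rows for value in row.values() if value is None)
    return missing, 100.0 * missing / total_cells


# Two rows x two columns with one missing cell -> (1, 25.0)
print(missing_cell_stats([{"a": None, "b": 1}, {"a": 2, "b": 3}]))
```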

Variable types

Text: 1
Numeric: 3
Categorical: 1

Dataset

Description: Passenger boardings and alightings by station, KTX downbound (역별 여객 승하차 실적(KTX 하행))
Author: Chungcheongnam-do (충청남도)
URL: https://alldam.chungnam.go.kr/bigdata/collect/view.chungnam?menuCd=DOM_000000201001001000&apiIdx=2621

Alerts

기준년도 has constant value "2018" [Constant]
인킬로미터 is highly overall correlated with 하차 [High correlation]
하차 is highly overall correlated with 인킬로미터 [High correlation]
인킬로미터 has 2 (3.9%) missing values [Missing]
승차 has 5 (9.8%) missing values [Missing]
하차 has 2 (3.9%) missing values [Missing]
역명 has unique values [Unique]
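The Constant, Unique and Missing alerts follow from simple counts over a single column. The sketch below is a simplified reading of those three rules, not ydata-profiling's code. The library's real checks and thresholds are configurable and are not reproduced here, and the correlation alert is left out. `column_alerts` is a hypothetical helper, and `None` stands for a missing value.

```python
from collections.abc import Sequence
from typing import Any


def column_alerts(name: str, values: Sequence[Any]) -> list[str]:
    """Simplified Constant / Unique / Missing checks (not ydata-profiling's code).

    None marks a missing value. Correlation alerts are not covered.
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    n_distinct = len(set(present))
    alerts = []
    if n_distinct == 1:  # a single distinct non-missing value
        alerts.append(f'{name} has constant value "{present[0]}"')
    if n_missing == 0 and n_distinct == len(values):  # every row differs
        alerts.append(f"{name} has unique values")
    if n_missing:  # percentage is relative to all rows
        alerts.append(f"{name} has {n_missing} ({100.0 * n_missing / len(values):.1f}%) missing values")
    return alerts


print(column_alerts("기준년도", ["2018"] * 51))  # ['기준년도 has constant value "2018"']
```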

Reproduction

Analysis started: 2024-01-09 23:19:44.019412
Analysis finished: 2024-01-09 23:19:45.102489
Duration: 1.08 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

역명 (station name)
Text

UNIQUE 

Distinct: 51
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 540.0 B

Length

Max length: 6
Median length: 2
Mean length: 2.4509804
Min length: 2

Characters and Unicode

Total characters: 125
Distinct characters: 70
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
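The per-category counts can be approximated with Python's standard `unicodedata` module. This is an illustration, not the library's implementation. The sketch applies `unicodedata.category` to every character and maps the three codes that occur in this report (`Lo`, `Lu`, `Nd`) to their long names. In the example, the lone strings "T" and "2" stand in for the two non-Hangul characters the report counts.

```python
import unicodedata
from collections import Counter

# Long names of the general-category codes that occur in this report.
CATEGORY_NAMES = {"Lo": "Other Letter", "Lu": "Uppercase Letter", "Nd": "Decimal Number"}


def category_counts(texts):
    """Count the Unicode general category of every character in every string."""
    counts = Counter(unicodedata.category(ch) for text in texts for ch in text)
    # Fall back to the two-letter code for categories not in the table above.
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


print(category_counts(["강릉", "여수엑스포", "T", "2"]))
# {'Other Letter': 7, 'Uppercase Letter': 1, 'Decimal Number': 1}
```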

Unique

Unique: 51
Unique (%): 100.0%

Sample

1st row: 강릉
2nd row: 검암
3rd row: 경산
4th row: 계룡
5th row: 곡성
Value              Count  Frequency (%)
강릉               1      2.0%
순천               1      2.0%
양평               1      2.0%
여수엑스포         1      2.0%
여천               1      2.0%
영등포             1      2.0%
오송               1      2.0%
용산               1      2.0%
울산               1      2.0%
익산               1      2.0%
Other values (41)  41     80.4%

Most occurring characters

Value              Count  Frequency (%)
                   8      6.4%
                   6      4.8%
                   6      4.8%
                   5      4.0%
                   5      4.0%
                   4      3.2%
                   3      2.4%
                   3      2.4%
                   3      2.4%
                   3      2.4%
Other values (60)  79     63.2%
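Each frequency table lists the top 10 values and folds the rest into one "Other values (m)" row. The character table above is consistent with that: 46 characters in the top 10 plus 79 others gives 125 in total, and 79 / 125 = 63.2%. For the numeric columns, the percentages use all 51 rows, missing ones included (for example 1 / 51 ≈ 2.0% and 39 / 51 ≈ 76.5% for 인킬로미터). The sketch below therefore takes an optional denominator. `top_k_with_other` is a hypothetical helper, not a ydata-profiling function.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def top_k_with_other(items: Iterable[Hashable], k: int = 10, total: Optional[int] = None):
    """Top-k value counts plus an 'Other values (m)' row, as (value, count, %) tuples.

    `total` is the denominator for the percentages. It defaults to the number of
    items but can be set to the full row count (missing values included).
    """
    counts = Counter(items)
    n_items = sum(counts.values())
    denom = total if total is not None else n_items
    top = counts.most_common(k)
    rows = [(value, n, round(100.0 * n / denom, 1)) for value, n in top]
    rest = n_items - sum(n for _, n in top)
    if rest:  # fold everything outside the top k into one row
        rows.append((f"Other values ({len(counts) - len(top)})", rest, round(100.0 * rest / denom, 1)))
    return rows


# Character counts over two station names from the report, keeping the top 2.
print(top_k_with_other("강릉" + "순천", k=2))
# [('강', 1, 25.0), ('릉', 1, 25.0), ('Other values (2)', 2, 50.0)]
```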

Most occurring categories

Value              Count  Frequency (%)
Other Letter       123    98.4%
Uppercase Letter   1      0.8%
Decimal Number     1      0.8%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
                   8      6.5%
                   6      4.9%
                   6      4.9%
                   5      4.1%
                   5      4.1%
                   4      3.3%
                   3      2.4%
                   3      2.4%
                   3      2.4%
                   3      2.4%
Other values (58)  77     62.6%

Uppercase Letter

Value              Count  Frequency (%)
T                  1      100.0%

Decimal Number

Value              Count  Frequency (%)
2                  1      100.0%

Most occurring scripts

Value              Count  Frequency (%)
Hangul             123    98.4%
Latin              1      0.8%
Common             1      0.8%

Most frequent character per script

Hangul

Value              Count  Frequency (%)
                   8      6.5%
                   6      4.9%
                   6      4.9%
                   5      4.1%
                   5      4.1%
                   4      3.3%
                   3      2.4%
                   3      2.4%
                   3      2.4%
                   3      2.4%
Other values (58)  77     62.6%

Latin

Value              Count  Frequency (%)
T                  1      100.0%

Common

Value              Count  Frequency (%)
2                  1      100.0%

Most occurring blocks

Value              Count  Frequency (%)
Hangul             123    98.4%
ASCII              2      1.6%

Most frequent character per block

Hangul

Value              Count  Frequency (%)
                   8      6.5%
                   6      4.9%
                   6      4.9%
                   5      4.1%
                   5      4.1%
                   4      3.3%
                   3      2.4%
                   3      2.4%
                   3      2.4%
                   3      2.4%
Other values (58)  77     62.6%

ASCII

Value              Count  Frequency (%)
T                  1      50.0%
2                  1      50.0%
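The block and script tables differ only for the two ASCII characters. `T` belongs to the Latin script and `2` to the Common script, yet both sit in the same block. Python's standard library exposes general categories but not blocks or scripts. The sketch below therefore hard-codes a few code-point ranges from the Unicode Blocks data. `block_of` is a hypothetical helper that covers only what appears in this report. Its "ASCII" label mirrors the report's name for the Basic Latin block (U+0000 to U+007F).

```python
from collections import Counter


def block_of(ch: str) -> str:
    """Coarse Unicode block name for one character (hypothetical helper).

    Only the ranges needed for this report are listed; the complete table is
    Blocks.txt in the Unicode Character Database.
    """
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Basic Latin block, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables
        return "Hangul"
    if 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:  # Hangul Jamo blocks
        return "Hangul"
    return "Other"


print(Counter(block_of(ch) for ch in "강릉T2"))  # Counter({'Hangul': 2, 'ASCII': 2})
```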

인킬로미터 (passenger-kilometers)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6355107 × 10^8
Minimum: 18483
Maximum: 2.0313909 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 18483
5th percentile: 126299.6
Q1: 8981125
Median: 49928653
Q3: 1.6737027 × 10^8
95th percentile: 5.2963179 × 10^8
Maximum: 2.0313909 × 10^9
Range: 2.0313724 × 10^9
Interquartile range (IQR): 1.5838914 × 10^8
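The quantiles above are consistent with linear interpolation between order statistics, which is pandas' default method. That ydata-profiling uses that default is an assumption. For 인킬로미터 there are n = 49 non-missing values, so the 5th percentile sits at position 48 × 0.05 = 2.4. That falls between the third- and fourth-smallest values, 75646 and 202280, giving 75646 + 0.4 × (202280 − 75646) = 126299.6. Q1 and Q3 fall on whole positions (12 and 36), so the IQR is simply 167370269 − 8981125 = 158389144 ≈ 1.5838914 × 10^8. `linear_quantile` and `iqr` are hypothetical helper names.

```python
import math
from collections.abc import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) by linear interpolation between order statistics.

    The target position in the sorted data is (n - 1) * q, the same rule as the
    default ("linear") method in NumPy and pandas. Drop missing values first.
    """
    data = sorted(values)
    pos = (len(data) - 1) * q
    j = math.floor(pos)
    if j + 1 >= len(data):  # q == 1 or a single value: nothing to interpolate
        return float(data[-1])
    frac = pos - j
    return data[j] + frac * (data[j + 1] - data[j])


def iqr(values: Sequence[float]) -> float:
    """Interquartile range, Q3 - Q1."""
    return linear_quantile(values, 0.75) - linear_quantile(values, 0.25)


print(linear_quantile([1, 2, 3, 4], 0.5))  # 2.5
print(iqr([1, 2, 3, 4, 5]))                # 2.0
```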

Descriptive statistics

Standard deviation: 3.2833977 × 10^8
Coefficient of variation (CV): 2.0075672
Kurtosis: 22.421849
Mean: 1.6355107 × 10^8
Median Absolute Deviation (MAD): 49254337
Skewness: 4.3297967
Sum: 8.0140024 × 10^9
Variance: 1.07807 × 10^17
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Common values

Value              Count  Frequency (%)
148773512          1      2.0%
139645375          1      2.0%
6731440            1      2.0%
167370269          1      2.0%
71421641           1      2.0%
75646              1      2.0%
265643316          1      2.0%
674316             1      2.0%
522852816          1      2.0%
164423391          1      2.0%
Other values (39)  39     76.5%
(Missing)          2      3.9%

Minimum 10 values

Value              Count  Frequency (%)
18483              1      2.0%
28216              1      2.0%
75646              1      2.0%
202280             1      2.0%
534305             1      2.0%
674316             1      2.0%
5646955            1      2.0%
6731440            1      2.0%
7324595            1      2.0%
7364105            1      2.0%

Maximum 10 values

Value              Count  Frequency (%)
2031390876         1      2.0%
945454740          1      2.0%
534151106          1      2.0%
522852816          1      2.0%
520210887          1      2.0%
330951936          1      2.0%
289659833          1      2.0%
265643316          1      2.0%
256916706          1      2.0%
211995418          1      2.0%

승차 (boardings)
Real number (ℝ)

MISSING 

Distinct: 46
Distinct (%): 100.0%
Missing: 5
Missing (%): 9.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 723732.61
Minimum: 1
Maximum: 13780902
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 1
5th percentile: 947.75
Q1: 12429.25
Median: 35051
Q3: 233942.25
95th percentile: 3813676
Maximum: 13780902
Range: 13780901
Interquartile range (IQR): 221513

Descriptive statistics

Standard deviation: 2216096.6
Coefficient of variation (CV): 3.0620378
Kurtosis: 27.945271
Mean: 723732.61
Median Absolute Deviation (MAD): 33306
Skewness: 5.0065796
Sum: 33291700
Variance: 4.911084 × 10^12
Monotonicity: Not monotonic
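Several of these figures are derived from one another. CV is the standard deviation divided by the mean, the variance is the standard deviation squared, and the mean times the 46 non-missing values gives the sum. The sketch below computes them with the standard `statistics` module. It uses the sample standard deviation (n − 1 denominator), which is the pandas default. Whether the report uses that same convention is an assumption, as is reading MAD as the unscaled median of absolute deviations from the median. `spread_summary` is a hypothetical helper.

```python
import statistics
from collections.abc import Sequence


def spread_summary(values: Sequence[float]) -> dict[str, float]:
    """Dispersion measures named in the report, for non-missing values only.

    Uses the sample standard deviation (n - 1 denominator), the pandas default;
    that the report uses the same convention is an assumption.
    """
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    median = statistics.median(values)
    # MAD here is the median of absolute deviations from the median (unscaled).
    mad = statistics.median([abs(x - median) for x in values])
    return {"mean": mean, "std": std, "variance": std ** 2, "cv": std / mean, "mad": mad}


# Consistency checks using the 승차 figures printed above:
print(round(2216096.6 / 723732.61, 4))  # CV -> 3.062
print(round(723732.61 * 46))            # mean x 46 values -> 33291700 (the Sum)
```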
Histogram with fixed size bins (bins=46)

Common values

Value              Count  Frequency (%)
130290             1      2.0%
32307              1      2.0%
476                1      2.0%
147009             1      2.0%
932884             1      2.0%
4908924            1      2.0%
178781             1      2.0%
382861             1      2.0%
59722              1      2.0%
14696              1      2.0%
Other values (36)  36     70.6%
(Missing)          5      9.8%

Minimum 10 values

Value              Count  Frequency (%)
1                  1      2.0%
476                1      2.0%
830                1      2.0%
1301               1      2.0%
2189               1      2.0%
2572               1      2.0%
4015               1      2.0%
5371               1      2.0%
8335               1      2.0%
8570               1      2.0%

Maximum 10 values

Value              Count  Frequency (%)
13780902           1      2.0%
4908924            1      2.0%
4527436            1      2.0%
1672396            1      2.0%
1555762            1      2.0%
1527006            1      2.0%
948233             1      2.0%
932884             1      2.0%
770568             1      2.0%
548810             1      2.0%

하차 (alightings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 49
Distinct (%): 100.0%
Missing: 2
Missing (%): 3.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 679422.45
Minimum: 64
Maximum: 5742556
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 591.0 B

Quantile statistics

Minimum: 64
5th percentile: 610.6
Q1: 52820
Median: 206069
Q3: 711150
95th percentile: 2751429.4
Maximum: 5742556
Range: 5742492
Interquartile range (IQR): 658330

Descriptive statistics

Standard deviation: 1152269.9
Coefficient of variation (CV): 1.695955
Kurtosis: 9.1208762
Mean: 679422.45
Median Absolute Deviation (MAD): 180712
Skewness: 2.8804005
Sum: 33291700
Variance: 1.3277259 × 10^12
Monotonicity: Not monotonic
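The report prints skewness and kurtosis without naming the estimator. A reasonable guess is that ydata-profiling relies on pandas' `Series.skew()` and `Series.kurt()`. That is our understanding of its pandas backend and is not confirmed by this report. If so, the values are the bias-adjusted sample skewness G1 and the excess kurtosis G2, where m2, m3 and m4 are central moments with denominator n. The standard-library sketch below implements those two formulas. `adjusted_skew_kurt` is a hypothetical name.

```python
import math
from collections.abc import Sequence


def adjusted_skew_kurt(values: Sequence[float]) -> tuple[float, float]:
    """Bias-adjusted skewness (G1) and excess kurtosis (G2).

    These are the estimators behind pandas' Series.skew() and Series.kurt();
    that the report uses them is an assumption. Needs n >= 4 and nonzero variance.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # population (biased) skewness
    g2 = m4 / m2 ** 2 - 3.0      # population (biased) excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return skew, kurt


# Symmetric data: skewness 0.0, excess kurtosis about -1.2
print(adjusted_skew_kurt([1, 2, 3, 4, 5]))
```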
Histogram with fixed size bins (bins=49)

Common values

Value              Count  Frequency (%)
735713             1      2.0%
536578             1      2.0%
70357              1      2.0%
527737             1      2.0%
212298             1      2.0%
259                1      2.0%
1799898            1      2.0%
2653               1      2.0%
1930224            1      2.0%
959664             1      2.0%
Other values (39)  39     76.5%
(Missing)          2      3.9%

Minimum 10 values

Value              Count  Frequency (%)
64                 1      2.0%
125                1      2.0%
259                1      2.0%
1138               1      2.0%
2653               1      2.0%
2669               1      2.0%
33887              1      2.0%
34021              1      2.0%
36883              1      2.0%
37215              1      2.0%

Maximum 10 values

Value              Count  Frequency (%)
5742556            1      2.0%
4468446            1      2.0%
3245951            1      2.0%
2009647            1      2.0%
1930224            1      2.0%
1801251            1      2.0%
1799898            1      2.0%
1561776            1      2.0%
1085362            1      2.0%
959664             1      2.0%

기준년도 (reference year)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 540.0 B
2018: 51

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2018
2nd row: 2018
3rd row: 2018
4th row: 2018
5th row: 2018

Common Values

Value              Count  Frequency (%)
2018               51     100.0%

Length

Histogram of lengths of the category


Interactions

Correlations

            역명        인킬로미터  승차        하차
역명        1.000       1.000       1.000       1.000
인킬로미터  1.000       1.000       0.364       0.915
승차        1.000       0.364       1.000       0.558
하차        1.000       0.915       0.558       1.000

            인킬로미터  승차        하차
인킬로미터  1.000       0.053       0.979
승차        0.053       1.000       0.085
하차        0.979       0.085       1.000
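The second matrix covers only the three numeric columns. It puts 인킬로미터 against 하차 at 0.979, the pair flagged in the alerts. The report text does not say which coefficient either matrix uses. Spearman's rank correlation is a common choice for heavily right-skewed counts like these, so the sketch below computes it. It drops rows where either value is missing (pairwise deletion). Treat both the coefficient and the missing-value handling as assumptions. `average_ranks`, `pearson` and `spearman` are hypothetical helpers.

```python
from collections.abc import Sequence
from typing import Optional


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Spearman's rho over rows where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys))


print(spearman([1, 2, 3, None], [10, 20, 30, 5]))  # 1.0 (the row with None is dropped)
```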

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
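Nullity correlation compares missingness patterns rather than values. The sketch below assumes it is the Pearson correlation of 0/1 "is missing" indicators. That is our understanding of how the missingno library's nullity heatmap works, and it is not confirmed by this report. The example uses only the missing positions visible in the Sample section that follows (rows 0 to 9 and 41 to 50), so its results describe those 20 rows, not all 51. `nullity_correlation` is a hypothetical helper.

```python
from collections.abc import Sequence
from typing import Optional


def nullity_correlation(a_missing: Sequence[bool], b_missing: Sequence[bool]) -> Optional[float]:
    """Pearson correlation of two 0/1 'is missing' indicators.

    Returns None if either column is never (or always) missing, because the
    correlation of a constant indicator is undefined.
    """
    x = [1.0 if m else 0.0 for m in a_missing]
    y = [1.0 if m else 0.0 for m in b_missing]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


# Missing positions visible in the Sample rows (indices 0-9 and 41-50 only).
SAMPLE_ROWS = list(range(10)) + list(range(41, 51))
MISSING_IN_SAMPLE = {"인킬로미터": {49}, "승차": {0, 42, 48, 49}, "하차": {49}}
flags = {col: [r in rows for r in SAMPLE_ROWS] for col, rows in MISSING_IN_SAMPLE.items()}

print(nullity_correlation(flags["인킬로미터"], flags["하차"]))  # identical pattern: 1.0 up to rounding
print(nullity_correlation(flags["인킬로미터"], flags["승차"]))  # weaker positive value, about 0.46
```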

Sample

    역명      인킬로미터  승차     하차     기준년도
0   강릉      330951936   <NA>     1561776  2018
1   검암      18483       21662    64       2018
2   경산      8791331     8335     36883    2018
3   계룡      18797364    11342    98720    2018
4   곡성      7364105     5371     37215    2018
5   공주      7682837     31053    52820    2018
6   광명      41477521    4527436  164049   2018
7   광주송정  534151106   111645   2009647  2018
8   구례구    9303339     2189     34021    2018
9   구포      145176697   2572     486056   2018

    역명      인킬로미터  승차     하차     기준년도
41  진영      29759961    830      99760    2018
42  진주      48877496    <NA>     164421   2018
43  창원      55730230    1301     179394   2018
44  창원중앙  211995418   13185    763394   2018
45  천안아산  256916706   1555762  1801251  2018
46  청량리    534305      948233   2669     2018
47  평창      12532567    23990    100967   2018
48  포항      289659833   <NA>     1085362  2018
49  행신      <NA>        770568   <NA>     2018
50  횡성      7324595     20533    63901    2018