Overview

Dataset statistics

Number of variables: 4
Number of observations: 59
Missing cells: 13
Missing cells (%): 5.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.1 KiB
Average record size in memory: 37.2 B

Variable types

Text: 1
Numeric: 3

Dataset

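Description: Passenger boarding and alighting figures by station for ITX-Saemaeul down-line (하행) services.
Author: 한국철도공사 (Korea Railroad Corporation, KORAIL)
URL: https://www.data.go.kr/data/15068477/fileData.do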

Alerts

승차 is highly overall correlated with 하차 and 1 other field (High correlation)
하차 is highly overall correlated with 승차 and 1 other field (High correlation)
인키로 is highly overall correlated with 승차 and 1 other field (High correlation)
승차 has 7 (11.9%) missing values (Missing)
하차 has 3 (5.1%) missing values (Missing)
인키로 has 3 (5.1%) missing values (Missing)
역명 has unique values (Unique)
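The missing-value percentages in the alerts and in the dataset statistics follow directly from the counts. The sketch below recomputes them from figures copied out of this report; the helper name `pct` is introduced here for illustration only.

```python
# Figures copied from the report: 59 observations, 4 variables,
# and the per-column missing counts listed in the alerts.
N_ROWS = 59
N_COLS = 4
missing_counts = {"역명": 0, "승차": 7, "하차": 3, "인키로": 3}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Per-column share of missing values (denominator: number of rows).
per_column_pct = {col: pct(n, N_ROWS) for col, n in missing_counts.items()}

# Share of missing cells (denominator: rows x columns).
total_missing = sum(missing_counts.values())
missing_cells_pct = pct(total_missing, N_ROWS * N_COLS)

print(per_column_pct)
print(total_missing, missing_cells_pct)
```

The per-column figures use the row count as the denominator, while "Missing cells (%)" uses the full cell count of 59 × 4 = 236.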

Reproduction

Analysis started: 2023-12-12 16:42:57.839774
Analysis finished: 2023-12-12 16:42:59.190046
Duration: 1.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
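A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used for this run: the CSV path, the CP949 encoding, the report title and the function name `build_report` are all assumptions, and the `dataset` metadata keys should be checked against the installed ydata-profiling version.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so this module loads without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is CP949-encoded, which is common for
    # data.go.kr downloads; adjust if the file is UTF-8.
    df = pd.read_csv(csv_path, encoding="cp949")

    report = ProfileReport(
        df,
        title="ITX-Saemaeul down-line ridership by station",  # hypothetical title
        dataset={
            "description": "역별 ITX 새마을 하행 여객 승하차 실적 입니다.",
            "author": "한국철도공사",
            "url": "https://www.data.go.kr/data/15068477/fileData.do",
        },
    )
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("itx_saemaeul_down.csv")` (a hypothetical file name) would write `report.html` next to the script.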

Variables

역명
Text

UNIQUE 

Distinct: 59
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 604.0 B

Length

Max length: 5
Median length: 2
Mean length: 2.2711864
Min length: 2
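The mean length is the total character count divided by the number of station names. A short check, using the report's own totals (134 characters over 59 names) and the five sample names listed under "Sample":

```python
total_characters = 134  # "Total characters" in the report
n_names = 59            # one station name per observation

mean_length = total_characters / n_names
print(f"{mean_length:.7f}")  # prints 2.2711864

# The same length statistics on the five sample names shown in the report.
sample = ["강경", "경산", "계룡", "곡성", "광주"]
lengths = [len(name) for name in sample]
print(min(lengths), max(lengths), sum(lengths) / len(lengths))
```

`len()` counts code points, so each precomposed Hangul syllable counts as one character.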

Characters and Unicode

Total characters: 134
Distinct characters: 72
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
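As an illustration of those character properties, the standard library's `unicodedata` module exposes the general category of each code point. The sketch below applies it to the five sample station names from this report. `unicodedata` has no script property, so the first word of the character name is used as a rough stand-in; that shortcut is an assumption of this example, not how the profiler derives scripts.

```python
import unicodedata
from collections import Counter

# The five sample station names shown in the report.
sample = ["강경", "경산", "계룡", "곡성", "광주"]
chars = [ch for name in sample for ch in name]

# General category per character; "Lo" is "Letter, other" (Other Letter).
categories = Counter(unicodedata.category(ch) for ch in chars)

# Rough script stand-in: first word of the Unicode character name.
name_prefixes = Counter(unicodedata.name(ch).split()[0] for ch in chars)

# Character frequencies, analogous to "Most occurring characters".
char_counts = Counter(chars)

print(categories)
print(name_prefixes)
print(char_counts.most_common(3))
```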

Unique

Unique: 59
Unique (%): 100.0%

Sample

1st row: 강경
2nd row: 경산
3rd row: 계룡
4th row: 곡성
5th row: 광주
Value | Count | Frequency (%)
강경 | 1 | 1.7%
양평 | 1 | 1.7%
여천 | 1 | 1.7%
영동 | 1 | 1.7%
영등포 | 1 | 1.7%
영주 | 1 | 1.7%
왜관 | 1 | 1.7%
용문 | 1 | 1.7%
용산 | 1 | 1.7%
원동 | 1 | 1.7%
Other values (49) | 49 | 83.1%

Most occurring characters

Value | Count | Frequency (%)
| 7 | 5.2%
| 7 | 5.2%
| 6 | 4.5%
| 6 | 4.5%
| 5 | 3.7%
| 5 | 3.7%
| 4 | 3.0%
| 4 | 3.0%
| 4 | 3.0%
| 3 | 2.2%
Other values (62) | 83 | 61.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 134 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 7 | 5.2%
| 7 | 5.2%
| 6 | 4.5%
| 6 | 4.5%
| 5 | 3.7%
| 5 | 3.7%
| 4 | 3.0%
| 4 | 3.0%
| 4 | 3.0%
| 3 | 2.2%
Other values (62) | 83 | 61.9%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 134 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 7 | 5.2%
| 7 | 5.2%
| 6 | 4.5%
| 6 | 4.5%
| 5 | 3.7%
| 5 | 3.7%
| 4 | 3.0%
| 4 | 3.0%
| 4 | 3.0%
| 3 | 2.2%
Other values (62) | 83 | 61.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 134 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 7 | 5.2%
| 7 | 5.2%
| 6 | 4.5%
| 6 | 4.5%
| 5 | 3.7%
| 5 | 3.7%
| 4 | 3.0%
| 4 | 3.0%
| 4 | 3.0%
| 3 | 2.2%
Other values (62) | 83 | 61.9%

승차
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 52
Distinct (%): 100.0%
Missing: 7
Missing (%): 11.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 75749
Minimum: 52
Maximum: 652819
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 663.0 B

Quantile statistics

Minimum: 52
5-th percentile: 175
Q1: 1659.25
median: 7837.5
Q3: 72483.75
95-th percentile: 425197.35
Maximum: 652819
Range: 652767
Interquartile range (IQR): 70824.5
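Range and IQR are simple differences of the quantiles listed above: range = maximum − minimum and IQR = Q3 − Q1. The sketch below recomputes both for the three numeric variables from the quantile values printed in this report; the helper name `spread` is illustrative.

```python
# (minimum, Q1, Q3, maximum) as reported for each numeric variable.
quantiles = {
    "승차": (52, 1659.25, 72483.75, 652819),
    "하차": (35, 9973.75, 75821.5, 371786),
    "인키로": (25062, 1241776.8, 10036954, 54443135),
}


def spread(minimum, q1, q3, maximum):
    """Return (range, interquartile range) for one variable."""
    return maximum - minimum, q3 - q1


for name, values in quantiles.items():
    value_range, iqr = spread(*values)
    print(name, value_range, round(iqr, 2))
```

For 승차 the IQR (70824.5) is roughly nine times the median (7837.5), which reflects how right-skewed the boarding counts are.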

Descriptive statistics

Standard deviation: 149920.56
Coefficient of variation (CV): 1.9791755
Kurtosis: 6.9364556
Mean: 75749
Median Absolute Deviation (MAD): 7432
Skewness: 2.6703941
Sum: 3938948
Variance: 2.2476175 × 10^10
Monotonicity: Not monotonic
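Several of these descriptive statistics are tied together by simple identities: mean = sum / n over the non-missing values, CV = standard deviation / mean, and variance = standard deviation squared. A quick consistency check for 승차, using only numbers printed in this report:

```python
n_valid = 59 - 7   # observations minus missing values for 승차
total = 3938948    # "Sum"
std = 149920.56    # "Standard deviation" (rounded in the report)

mean = total / n_valid
cv = std / mean
variance = std ** 2

print(mean)
print(round(cv, 4))
print(f"{variance:.4e}")
```

Because the standard deviation is printed to two decimals, the recomputed CV and variance agree with the report only up to rounding.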
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
131 | 1 | 1.7%
614924 | 1 | 1.7%
15924 | 1 | 1.7%
1726 | 1 | 1.7%
282977 | 1 | 1.7%
1070 | 1 | 1.7%
6949 | 1 | 1.7%
97240 | 1 | 1.7%
52 | 1 | 1.7%
3371 | 1 | 1.7%
Other values (42) | 42 | 71.2%
(Missing) | 7 | 11.9%
Minimum 10 values
Value | Count | Frequency (%)
52 | 1 | 1.7%
124 | 1 | 1.7%
131 | 1 | 1.7%
211 | 1 | 1.7%
261 | 1 | 1.7%
309 | 1 | 1.7%
502 | 1 | 1.7%
965 | 1 | 1.7%
1022 | 1 | 1.7%
1070 | 1 | 1.7%

Maximum 10 values
Value | Count | Frequency (%)
652819 | 1 | 1.7%
614924 | 1 | 1.7%
465338 | 1 | 1.7%
392355 | 1 | 1.7%
282977 | 1 | 1.7%
202350 | 1 | 1.7%
189137 | 1 | 1.7%
172449 | 1 | 1.7%
164732 | 1 | 1.7%
113944 | 1 | 1.7%

하차
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 56
Distinct (%): 100.0%
Missing: 3
Missing (%): 5.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 70338.357
Minimum: 35
Maximum: 371786
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 663.0 B

Quantile statistics

Minimum: 35
5-th percentile: 3125.5
Q1: 9973.75
median: 29067.5
Q3: 75821.5
95-th percentile: 277075.25
Maximum: 371786
Range: 371751
Interquartile range (IQR): 65847.75

Descriptive statistics

Standard deviation: 95944.951
Coefficient of variation (CV): 1.3640488
Kurtosis: 2.3873251
Mean: 70338.357
Median Absolute Deviation (MAD): 20599
Skewness: 1.81144
Sum: 3938948
Variance: 9.2054336 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
6609 | 1 | 1.7%
11985 | 1 | 1.7%
60216 | 1 | 1.7%
1187 | 1 | 1.7%
14307 | 1 | 1.7%
9070 | 1 | 1.7%
3449 | 1 | 1.7%
35 | 1 | 1.7%
24199 | 1 | 1.7%
103937 | 1 | 1.7%
Other values (46) | 46 | 78.0%
(Missing) | 3 | 5.1%
Minimum 10 values
Value | Count | Frequency (%)
35 | 1 | 1.7%
1187 | 1 | 1.7%
2155 | 1 | 1.7%
3449 | 1 | 1.7%
5165 | 1 | 1.7%
5925 | 1 | 1.7%
6117 | 1 | 1.7%
6609 | 1 | 1.7%
8189 | 1 | 1.7%
8748 | 1 | 1.7%

Maximum 10 values
Value | Count | Frequency (%)
371786 | 1 | 1.7%
359589 | 1 | 1.7%
307355 | 1 | 1.7%
266982 | 1 | 1.7%
265317 | 1 | 1.7%
218381 | 1 | 1.7%
209910 | 1 | 1.7%
202642 | 1 | 1.7%
196250 | 1 | 1.7%
159109 | 1 | 1.7%

인키로
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 56
Distinct (%): 100.0%
Missing: 3
Missing (%): 5.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 9306861.4
Minimum: 25062
Maximum: 54443135
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 663.0 B

Quantile statistics

Minimum: 25062
5-th percentile: 253027
Q1: 1241776.8
median: 4854909
Q3: 10036954
95-th percentile: 36116554
Maximum: 54443135
Range: 54418073
Interquartile range (IQR): 8795177.2

Descriptive statistics

Standard deviation: 12699324
Coefficient of variation (CV): 1.3645119
Kurtosis: 4.5485541
Mean: 9306861.4
Median Absolute Deviation (MAD): 3704099.5
Skewness: 2.1971654
Sum: 5.2118424 × 10^8
Variance: 1.6127282 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
157032 | 1 | 1.7%
3955950 | 1 | 1.7%
7170204 | 1 | 1.7%
164503 | 1 | 1.7%
1687717 | 1 | 1.7%
806447 | 1 | 1.7%
282535 | 1 | 1.7%
25062 | 1 | 1.7%
2217838 | 1 | 1.7%
10506793 | 1 | 1.7%
Other values (46) | 46 | 78.0%
(Missing) | 3 | 5.1%
Minimum 10 values
Value | Count | Frequency (%)
25062 | 1 | 1.7%
157032 | 1 | 1.7%
164503 | 1 | 1.7%
282535 | 1 | 1.7%
530876 | 1 | 1.7%
685418 | 1 | 1.7%
806447 | 1 | 1.7%
834550 | 1 | 1.7%
918991 | 1 | 1.7%
951555 | 1 | 1.7%

Maximum 10 values
Value | Count | Frequency (%)
54443135 | 1 | 1.7%
50741736 | 1 | 1.7%
46888154 | 1 | 1.7%
32526020 | 1 | 1.7%
30612428 | 1 | 1.7%
26020783 | 1 | 1.7%
24169617 | 1 | 1.7%
23750141 | 1 | 1.7%
22659402 | 1 | 1.7%
15224689 | 1 | 1.7%

Interactions

[Figures: nine pairwise scatter plots of the numeric variables 승차, 하차 and 인키로]

Correlations

[Figure: correlation heatmap]
| 역명 | 승차 | 하차 | 인키로
역명 | 1.000 | 1.000 | 1.000 | 1.000
승차 | 1.000 | 1.000 | 0.779 | 0.706
하차 | 1.000 | 0.779 | 1.000 | 0.856
인키로 | 1.000 | 0.706 | 0.856 | 1.000

[Figure: correlation heatmap]
| 승차 | 하차 | 인키로
승차 | 1.000 | 0.772 | 0.707
하차 | 0.772 | 1.000 | 0.959
인키로 | 0.707 | 0.959 | 1.000
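The "High correlation" alerts near the top of the report can be related to these matrices by listing every variable pair whose coefficient exceeds a cut-off. The sketch below does this for the numeric-only matrix. The 0.5 threshold is an assumption about the profiler's default, not something stated in this report, and `high_pairs` is an illustrative helper.

```python
import itertools

# Numeric-only correlation matrix copied from the report.
labels = ["승차", "하차", "인키로"]
matrix = [
    [1.000, 0.772, 0.707],
    [0.772, 1.000, 0.959],
    [0.707, 0.959, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off for a "high correlation" alert


def high_pairs(labels, matrix, threshold):
    """Return (a, b, coefficient) for each pair at or above the threshold."""
    pairs = []
    for i, j in itertools.combinations(range(len(labels)), 2):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs


for a, b, r in high_pairs(labels, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {r:.3f}")
```

Under that assumed threshold all three pairs qualify, so each numeric variable is flagged against the two others, which is consistent with the alert wording "하차 and 1 other field".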

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
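Nullity correlation can be sketched as the Pearson correlation between per-column missingness indicators (1 = missing, 0 = present). The example below uses only the 20 sample rows printed in this report (indices 0 to 9 and 49 to 58), so its values illustrate the idea and will differ from the heatmap computed on all 59 rows. The `pearson` helper is written out by hand to keep the example dependency-free.

```python
import math

# Row indices of the 20 sample rows shown in the report.
rows = list(range(0, 10)) + list(range(49, 59))

# Rows where each column shows <NA> in that sample.
missing_rows = {
    "승차": {4, 49},  # 광주, 진주
    "하차": {54},     # 청량리
    "인키로": {54},   # 청량리
}

# 1 where the value is missing, 0 where it is present.
indicators = {
    col: [1 if r in miss else 0 for r in rows]
    for col, miss in missing_rows.items()
}


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(round(pearson(indicators["하차"], indicators["인키로"]), 3))
print(round(pearson(indicators["승차"], indicators["하차"]), 3))
```

In this sample 하차 and 인키로 are missing on exactly the same row, so their nullity correlation is 1, while 승차 is missing on different rows and correlates slightly negatively with both.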

Sample

역명승차하차인키로
0강경2527183351569804
1경산24001301384104249
2계룡8964366835476976
3곡성200892491202920
4광주<NA>7528313942901
5광주송정8726319845468174
6구례구102251651254729
7구미39235520991022659402
8구포516020264232526020
9김제14700493596462663
역명승차하차인키로
49진주<NA>161644629693
50창원1200149583139901
51창원중앙3981325436297130
52천안18913726698226020783
53청도594011909959543
54청량리66664<NA><NA>
55평택8994315910915224689
56풍기2119108965992
57함안3096117834550
58함평26181891640924