Overview

Dataset statistics

Number of variables: 4
Number of observations: 140
Missing cells: 70
Missing cells (%): 12.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.9 KiB
Average record size in memory: 35.9 B

Variable types

Text: 1
Numeric: 3

Dataset

Description: Boarding and alighting passenger counts by station for upbound Saemaeul trains.
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/15068472/fileData.do

Alerts

승차 is highly overall correlated with 하차 and 1 other field (High correlation)
하차 is highly overall correlated with 승차 and 1 other field (High correlation)
인키로 is highly overall correlated with 승차 and 1 other field (High correlation)
승차 has 22 (15.7%) missing values (Missing)
하차 has 24 (17.1%) missing values (Missing)
인키로 has 24 (17.1%) missing values (Missing)
역명 has unique values (Unique)
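
The missing-value alerts can be tied back to the dataset-level figures (70 missing cells, 12.5%) with a little arithmetic. The following is a minimal sketch, not the profiler's own code; the function name `missing_summary` is hypothetical, and the counts are the ones reported in this document.

```python
# Per-column missing counts as reported in this document.
missing_per_column = {"역명": 0, "승차": 22, "하차": 24, "인키로": 24}
n_rows = 140  # number of observations


def missing_summary(missing, n_rows):
    """Return (total missing cells, overall share, per-column share)."""
    total_cells = n_rows * len(missing)
    total_missing = sum(missing.values())
    per_column = {name: count / n_rows for name, count in missing.items()}
    return total_missing, total_missing / total_cells, per_column


total_missing, overall_share, per_column_share = missing_summary(
    missing_per_column, n_rows
)
print(total_missing)                      # 70
print(f"{overall_share:.1%}")             # 12.5%
print(f"{per_column_share['승차']:.1%}")   # 15.7%
print(f"{per_column_share['하차']:.1%}")   # 17.1%
```

The overall share uses all 560 cells (140 rows × 4 columns) as the denominator, while each per-column share uses the 140 rows only.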

Reproduction

Analysis started: 2023-12-12 07:14:23.357169
Analysis finished: 2023-12-12 07:14:24.857771
Duration: 1.5 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
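
A report of this kind could be regenerated along the following lines. This is only a sketch under stated assumptions: the CSV file name, the `cp949` encoding, the report title, and the helper name `build_report` are all hypothetical and do not come from the report; only the description, author, and URL are taken from the Dataset section above.

```python
def build_report(csv_path, encoding="cp949", output_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported here so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    report = ProfileReport(
        df,
        title="Saemaeul upbound boardings and alightings by station",  # hypothetical title
        dataset={
            "description": "역별 새마을 상행 여객 승하차 실적 입니다.",
            "author": "한국철도공사",
            "url": "https://www.data.go.kr/data/15068472/fileData.do",
        },
    )
    report.to_file(output_path)
    return report


# Example call (hypothetical file name):
# build_report("saemaeul_upbound.csv")
```

Results may differ from the figures in this document if a different ydata-profiling version or configuration is used.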

Variables

역명 (station name)
Text

UNIQUE 

Distinct: 140
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2785714
Min length: 2

Characters and Unicode

Total characters: 319
Distinct characters: 134
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
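
As an illustration of these character properties, the sketch below inspects the five sample station names listed in this report using Python's standard `unicodedata` module. The helper name `describe_characters` is hypothetical, and taking the first word of the character name is only a rough stand-in for the Unicode script property, which the standard library does not expose directly.

```python
import unicodedata

# First five values of 역명, as listed in the Sample subsection of this variable.
sample_names = ["가평", "간석", "경산", "경주", "곡성"]


def describe_characters(names):
    """Return (total characters, general categories, first word of each name)."""
    chars = "".join(names)
    categories = {unicodedata.category(ch) for ch in chars}
    # Heuristic: 'HANGUL SYLLABLE GA' -> 'HANGUL'. Not the real script property.
    name_prefixes = {unicodedata.name(ch).split(" ")[0] for ch in chars}
    return len(chars), categories, name_prefixes


total, categories, prefixes = describe_characters(sample_names)
print(total)       # 10
print(categories)  # {'Lo'}  ('Lo' is the "Other Letter" category)
print(prefixes)    # {'HANGUL'}

# Mean length is total characters divided by the number of rows.
print(round(319 / 140, 7))  # 2.2785714
```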

Unique

Unique: 140
Unique (%): 100.0%

Sample

1st row: 가평
2nd row: 간석
3rd row: 경산
4th row: 경주
5th row: 곡성
Value               Count  Frequency (%)
가평                1      0.7%
오근장              1      0.7%
전곡                1      0.7%
장항                1      0.7%
장성                1      0.7%
임진강              1      0.7%
익산                1      0.7%
의정부              1      0.7%
의성                1      0.7%
예천                1      0.7%
Other values (130)  130    92.9%

Most occurring characters

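Value               Count  Frequency (%)
(unreadable)        21     6.6%
(unreadable)        14     4.4%
(unreadable)        12     3.8%
(unreadable)        8      2.5%
(unreadable)        8      2.5%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        6      1.9%
(unreadable)        6      1.9%
Other values (124)  223    69.9%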

Most occurring categories

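Value               Count  Frequency (%)
Other Letter        319    100.0%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(unreadable)        21     6.6%
(unreadable)        14     4.4%
(unreadable)        12     3.8%
(unreadable)        8      2.5%
(unreadable)        8      2.5%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        6      1.9%
(unreadable)        6      1.9%
Other values (124)  223    69.9%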

Most occurring scripts

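Value               Count  Frequency (%)
Hangul              319    100.0%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(unreadable)        21     6.6%
(unreadable)        14     4.4%
(unreadable)        12     3.8%
(unreadable)        8      2.5%
(unreadable)        8      2.5%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        6      1.9%
(unreadable)        6      1.9%
Other values (124)  223    69.9%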

Most occurring blocks

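Value               Count  Frequency (%)
Hangul              319    100.0%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
(unreadable)        21     6.6%
(unreadable)        14     4.4%
(unreadable)        12     3.8%
(unreadable)        8      2.5%
(unreadable)        8      2.5%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        7      2.2%
(unreadable)        6      1.9%
(unreadable)        6      1.9%
Other values (124)  223    69.9%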

승차 (boardings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 114
Distinct (%): 96.6%
Missing: 22
Missing (%): 15.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 10991.669
Minimum: 2
Maximum: 142570
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 28.35
Q1: 330.5
Median: 861.5
Q3: 3240.5
95-th percentile: 66567.45
Maximum: 142570
Range: 142568
Interquartile range (IQR): 2910

Descriptive statistics

Standard deviation: 27238.301
Coefficient of variation (CV): 2.4780859
Kurtosis: 11.228797
Mean: 10991.669
Median Absolute Deviation (MAD): 794.5
Skewness: 3.3326379
Sum: 1297017
Variance: 7.4192505 × 10^8
Monotonicity: Not monotonic
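
Several of the 승차 statistics are derived from others, which gives a quick way to sanity-check them. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the reported (rounded) figures; the dictionary keys and the `derived_statistics` helper are hypothetical names chosen for this example.

```python
# Reported values for 승차; n is the number of non-missing observations.
reported = {
    "n": 140 - 22,
    "mean": 10991.669,
    "std": 27238.301,
    "min": 2,
    "max": 142570,
    "q1": 330.5,
    "q3": 3240.5,
}


def derived_statistics(s):
    """Recompute statistics that follow directly from other reported ones."""
    return {
        "range": s["max"] - s["min"],   # maximum minus minimum
        "iqr": s["q3"] - s["q1"],       # third quartile minus first quartile
        "cv": s["std"] / s["mean"],     # standard deviation relative to the mean
        "variance": s["std"] ** 2,      # square of the standard deviation
        "sum": s["mean"] * s["n"],      # mean times number of observations
    }


d = derived_statistics(reported)
print(d["range"])       # 142568
print(d["iqr"])         # 2910.0
print(round(d["sum"]))  # 1297017
# cv and variance land close to the reported 2.4780859 and 7.4192505 × 10^8;
# small differences come from the rounding of the inputs.
print(d["cv"], d["variance"])
```

A coefficient of variation well above 1, together with a mean far above the median, is consistent with the strong right skew reported for this variable.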
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
9                   3      2.1%
39                  2      1.4%
335                 2      1.4%
60096               1      0.7%
56                  1      0.7%
192                 1      0.7%
18557               1      0.7%
232                 1      0.7%
757                 1      0.7%
71330               1      0.7%
Other values (104)  104    74.3%
(Missing)           22     15.7%
Minimum 10 values
Value               Count  Frequency (%)
2                   1      0.7%
8                   1      0.7%
9                   3      2.1%
19                  1      0.7%
30                  1      0.7%
36                  1      0.7%
39                  2      1.4%
47                  1      0.7%
50                  1      0.7%
52                  1      0.7%
Maximum 10 values
Value               Count  Frequency (%)
142570              1      0.7%
138928              1      0.7%
118062              1      0.7%
110782              1      0.7%
88268               1      0.7%
71330               1      0.7%
65727               1      0.7%
63091               1      0.7%
60096               1      0.7%
53179               1      0.7%

하차 (alightings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 111
Distinct (%): 95.7%
Missing: 24
Missing (%): 17.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 11181.181
Minimum: 2
Maximum: 296107
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 12.75
Q1: 146
Median: 539.5
Q3: 2938.75
95-th percentile: 39084.25
Maximum: 296107
Range: 296105
Interquartile range (IQR): 2792.75

Descriptive statistics

Standard deviation: 42687.721
Coefficient of variation (CV): 3.8178186
Kurtosis: 30.790851
Mean: 11181.181
Median Absolute Deviation (MAD): 507.5
Skewness: 5.4717771
Sum: 1297017
Variance: 1.8222415 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
269                 2      1.4%
11                  2      1.4%
2                   2      1.4%
150                 2      1.4%
297                 2      1.4%
304                 1      0.7%
34                  1      0.7%
362                 1      0.7%
8934                1      0.7%
772                 1      0.7%
Other values (101)  101    72.1%
(Missing)           24     17.1%
Minimum 10 values
Value               Count  Frequency (%)
2                   2      1.4%
10                  1      0.7%
11                  2      1.4%
12                  1      0.7%
13                  1      0.7%
19                  1      0.7%
20                  1      0.7%
24                  1      0.7%
26                  1      0.7%
30                  1      0.7%
Maximum 10 values
Value               Count  Frequency (%)
296107              1      0.7%
258496              1      0.7%
218871              1      0.7%
104412              1      0.7%
58699               1      0.7%
46183               1      0.7%
36718               1      0.7%
34824               1      0.7%
25771               1      0.7%
22894               1      0.7%

인키로 (passenger-kilometres)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 116
Distinct (%): 100.0%
Missing: 24
Missing (%): 17.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1480503.7
Minimum: 402
Maximum: 33241899
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 402
5-th percentile: 2475.25
Q1: 31042.75
Median: 127892
Q3: 538203.75
95-th percentile: 5497905
Maximum: 33241899
Range: 33241497
Interquartile range (IQR): 507161

Descriptive statistics

Standard deviation: 5068720.5
Coefficient of variation (CV): 3.4236459
Kurtosis: 25.202908
Mean: 1480503.7
Median Absolute Deviation (MAD): 121553
Skewness: 4.9607025
Sum: 1.7173843 × 10^8
Variance: 2.5691927 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
590191              1      0.7%
1810772             1      0.7%
682564              1      0.7%
7737                1      0.7%
9544                1      0.7%
697211              1      0.7%
204737              1      0.7%
17287               1      0.7%
1150153             1      0.7%
280487              1      0.7%
Other values (106)  106    75.7%
(Missing)           24     17.1%
Minimum 10 values
Value               Count  Frequency (%)
402                 1      0.7%
666                 1      0.7%
838                 1      0.7%
943                 1      0.7%
1980                1      0.7%
2146                1      0.7%
2585                1      0.7%
3151                1      0.7%
3699                1      0.7%
4349                1      0.7%
Maximum 10 values
Value               Count  Frequency (%)
33241899            1      0.7%
29910355            1      0.7%
23596447            1      0.7%
20095664            1      0.7%
9280010             1      0.7%
9132696             1      0.7%
4286308             1      0.7%
2937441             1      0.7%
2798996             1      0.7%
2686387             1      0.7%

Interactions

[Figure: 3 × 3 grid of interaction plots for 승차, 하차 and 인키로]

Correlations

          승차     하차     인키로
승차      1.000   0.963   0.770
하차      0.963   1.000   0.937
인키로    0.770   0.937   1.000
          승차     하차     인키로
승차      1.000   0.588   0.560
하차      0.588   1.000   0.945
인키로    0.560   0.945   1.000
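
The two matrices above are not labelled in this extract, so which coefficient each one uses is unknown. One possible explanation for such differences, offered here only as an assumption, is that one matrix is rank-based (such as Spearman) and the other linear (such as Pearson). The sketch below uses made-up toy data, not values from this dataset, to show how far the two can diverge on heavily skewed data.

```python
import math


def pearson(xs, ys):
    """Pearson (linear) correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n. This simple version assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data: y always increases with x, but one value is an extreme outlier.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]

print(spearman(x, y))  # essentially 1: the relationship is perfectly monotonic
print(pearson(x, y))   # noticeably lower: the outlier breaks linearity
```

With skewness above 3 in all three numeric columns, a gap between linear and rank-based coefficients would not be surprising here.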

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
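
Nullity correlation can be illustrated by correlating missing-value indicators. The sketch below is a simplified stand-in, not the profiler's implementation: it encodes only the missing-value pattern of the 20 rows listed in the Sample section (1 = missing, 0 = present), so its results describe those rows and not the full 140-row dataset. The names `sample_nullity` and `nullity_correlation` are hypothetical.

```python
import math

# (승차, 하차, 인키로) missing flags for the 20 rows in the Sample section.
sample_nullity = (
    [(0, 1, 1), (1, 0, 0), (0, 1, 1)] + [(0, 0, 0)] * 7   # rows 0-9
    + [(0, 0, 0)] * 3                                       # rows 130-132
    + [(0, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 0),          # rows 133-136
       (0, 0, 0), (1, 0, 0), (1, 1, 1)]                     # rows 137-139
)


def nullity_correlation(flags_a, flags_b):
    """Pearson correlation between two 0/1 missing-value indicator columns."""
    n = len(flags_a)
    ma, mb = sum(flags_a) / n, sum(flags_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(flags_a, flags_b))
    sa = math.sqrt(sum((a - ma) ** 2 for a in flags_a))
    sb = math.sqrt(sum((b - mb) ** 2 for b in flags_b))
    return cov / (sa * sb)


boarding, alighting, passenger_km = (list(col) for col in zip(*sample_nullity))

# 하차 and 인키로 are always missing together in these rows.
print(nullity_correlation(alighting, passenger_km))
# 승차 and 하차 missingness shows no linear association in these rows.
print(nullity_correlation(boarding, alighting))
```

In these 20 rows the first value is 1 (up to floating-point error) and the second is 0 (up to floating-point error). The identical missing counts for 하차 and 인키로 (24 each) are consistent with the two columns being missing together across the full dataset as well.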

Sample

     역명      승차 하차 인키로
0    가평      1601 <NA> <NA>
1    간석      <NA> 38297907
2    경산      2 <NA> <NA>
3    경주      325102585
4    곡성      32741604188226
5    광양      118418267036
6    광주송정  146615535927
7    광천      404204425449064
8    구례구    1692427107259
9    구미      794607121154

     역명      승차 하차 인키로
130  포항      148115036242
131  풍기      189713730979
132  하동      220129738531
133  하양      30 <NA> <NA>
134  함안      329 <NA> <NA>
135  함평      235111980
136  호계      <NA> 3111870901
137  홍성      142570228942017034
138  화명      <NA> 4912708
139  화본      <NA> <NA> <NA>