Overview

Dataset statistics

Number of variables: 4
Number of observations: 236
Missing cells: 39
Missing cells (%): 4.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.2 KiB
Average record size in memory: 35.6 B
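The overview percentages are plain ratios. Missing cells are counted over every cell of the table (236 rows × 4 variables = 944 cells, and 39 / 944 ≈ 4.1%), and the average record size is the total memory divided by the row count (8.2 KiB = 8396.8 B, and 8396.8 / 236 ≈ 35.6 B). A minimal sketch in plain Python, assuming the table is held as a list of row tuples with None marking a missing cell; dataset_stats is a hypothetical helper, not part of ydata-profiling.

```python
from typing import Any, Optional, Sequence

def dataset_stats(rows: Sequence[Sequence[Optional[Any]]]) -> dict:
    """Overview counts for a table stored as rows, with None marking a missing cell."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Count every row that repeats an earlier row exactly.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }

# Reproduce the reported figures from the totals above.
print(round(100 * 39 / (236 * 4), 1))  # Missing cells (%): 4.1
print(round(8.2 * 1024 / 236, 1))      # Average record size in memory (B): 35.6
```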

Variable types

Text: 1
Numeric: 3

Dataset

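Description: Station-level passenger boarding and alighting records for downbound Mugunghwa trains (역별 무궁화 하행 여객 승하차 실적).
Author: 한국철도공사 (Korea Railroad Corporation, Korail)
URL: https://www.data.go.kr/data/15068480/fileData.do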

Alerts

승차 is highly correlated overall with 하차 and 1 other field (High correlation)
하차 is highly correlated overall with 승차 and 1 other field (High correlation)
인키로 is highly correlated overall with 승차 and 1 other field (High correlation)
승차 has 9 (3.8%) missing values (Missing)
하차 has 14 (5.9%) missing values (Missing)
인키로 has 16 (6.8%) missing values (Missing)
역명 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 06:02:02.115492
Analysis finished: 2023-12-12 06:02:03.572877
Duration: 1.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
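The reported duration is the difference between the start and finish timestamps (1.457385 seconds, shown rounded to 1.46). A quick check with Python's standard datetime module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 06:02:02.115492")
finished = datetime.fromisoformat("2023-12-12 06:02:03.572877")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 1.46 seconds
```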

Variables

역명 (station name)
Text

UNIQUE

Distinct: 236
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2288136
Min length: 2
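Lengths are counted in characters (Unicode code points). Each precomposed Hangul syllable is a single code point, so 가평 has length 2, and the mean length is the total character count divided by the number of names: 526 / 236 ≈ 2.2288136. A minimal sketch, assuming missing names are represented as None and skipped; length_stats is a hypothetical helper.

```python
from statistics import mean, median

def length_stats(values):
    """Character-length summary of a text column, skipping missing entries (None)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

print(length_stats(["가평", "의정부", "전곡", "희방사"]))
```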

Characters and Unicode

Total characters: 526
Distinct characters: 181
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
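All 526 characters fall into a single Unicode general category (Lo, which the report calls "Other Letter") and into Hangul, because every station name is written with precomposed Hangul syllables. Python's unicodedata module exposes the general category but has no script or block lookup, so this sketch checks the Hangul Syllables code-point range (U+AC00 to U+D7A3) directly as a stand-in; that is an assumption, not necessarily how the report classified scripts and blocks. unicode_profile is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Hangul Syllables block: U+AC00 to U+D7A3 (precomposed syllables such as 가 or 평).
HANGUL_SYLLABLES = range(0xAC00, 0xD7A4)

def unicode_profile(values):
    """Character total, distinct characters, general categories and Hangul-block count."""
    chars = [ch for v in values if v is not None for ch in v]
    # "Lo" is the general category the report labels "Other Letter".
    categories = Counter(unicodedata.category(ch) for ch in chars)
    in_block = sum(ord(ch) in HANGUL_SYLLABLES for ch in chars)
    return len(chars), len(set(chars)), categories, in_block

total, distinct, cats, hangul = unicode_profile(["가평", "각계", "간석"])
print(total, distinct, dict(cats), hangul)  # 6 6 {'Lo': 6} 6
```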

Unique

Unique: 236
Unique (%): 100.0%
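Distinct counts the different values, while Unique counts the values that occur exactly once. For 역명 both are 236 (100%) because every station name appears once. They diverge when a value repeats: the 승차 column has 224 distinct values among 227 non-missing entries, and its common-values table shows 480, 8271 and 464 appearing twice each, so by this definition only 221 of its values would be unique. A minimal sketch, assuming missing entries are None and percentages are taken over the non-missing values; distinct_and_unique is a hypothetical helper.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) over the non-missing values."""
    counts = Counter(v for v in values if v is not None)
    n = sum(counts.values())
    distinct = len(counts)                                 # different values
    unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n

print(distinct_and_unique(["가평", "각계", "가평", "간석", None]))
```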

Sample

1st row: 가평
2nd row: 각계
3rd row: 간석
4th row: 강경
5th row: 강구

Value               Count  Frequency (%)
가평                1      0.4%
의정부              1      0.4%
전곡                1      0.4%
용궁                1      0.4%
용문                1      0.4%
용산                1      0.4%
웅천                1      0.4%
원동                1      0.4%
원주                1      0.4%
월내                1      0.4%
Other values (226)  226    95.8%

Most occurring characters

Value               Count  Frequency (%)
                    26     4.9%
                    18     3.4%
                    17     3.2%
                    16     3.0%
                    12     2.3%
                    12     2.3%
                    11     2.1%
                    11     2.1%
                    8      1.5%
                    8      1.5%
Other values (171)  387    73.6%
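Character frequencies are taken over all 526 characters (26 / 526 ≈ 4.9%), and the 171 distinct characters outside the top ten (181 - 10) are lumped into one row: 387 / 526 ≈ 73.6%. A minimal sketch of that top-k-plus-remainder table, assuming plain counting with collections.Counter; top_with_other is a hypothetical helper.

```python
from collections import Counter

def top_with_other(items, k=10):
    """Top-k counts plus an aggregated 'Other values (n)' row; frequencies over all items."""
    counts = Counter(items)
    total = sum(counts.values())
    top = counts.most_common(k)
    rows = [(value, n, round(100 * n / total, 1)) for value, n in top]
    n_other = len(counts) - len(top)
    if n_other:
        rest = total - sum(n for _, n in top)
        rows.append((f"Other values ({n_other})", rest, round(100 * rest / total, 1)))
    return rows

names = ["가평", "각계", "간석", "강경", "강구"]  # first five station names
chars = [ch for name in names for ch in name]    # every character of every name
print(top_with_other(chars, k=3))
```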

Most occurring categories

Value         Count  Frequency (%)
Other Letter  526    100.0%

Most frequent character per category

Other Letter: identical to the most occurring characters table, since all 526 characters fall in this single category.

Most occurring scripts

Value   Count  Frequency (%)
Hangul  526    100.0%

Most frequent character per script

Hangul: identical to the most occurring characters table, since all 526 characters are in the Hangul script.

Most occurring blocks

Value   Count  Frequency (%)
Hangul  526    100.0%

Most frequent character per block

Hangul: identical to the most occurring characters table, since all 526 characters are in the Hangul block.

승차 (boardings)
Real number (ℝ)

HIGH CORRELATION  MISSING

Distinct: 224
Distinct (%): 98.7%
Missing: 9
Missing (%): 3.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 125916.1
Minimum: 11
Maximum: 3538377
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB

Quantile statistics

Minimum: 11
5th percentile: 373.8
Q1: 1731.5
Median: 12325
Q3: 57348
95th percentile: 642973.2
Maximum: 3538377
Range: 3538366
Interquartile range (IQR): 55616.5
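Q1, the median and Q3 are the 25th, 50th and 75th percentiles; the IQR is Q3 - Q1 (57348 - 1731.5 = 55616.5) and the range is Maximum - Minimum (3538377 - 11 = 3538366). A minimal sketch, assuming linear interpolation between sorted values (pandas' default, which statistics.quantiles reproduces with method="inclusive") and None for missing entries; quantile_stats is a hypothetical helper.

```python
from statistics import quantiles

def quantile_stats(values):
    """Quantile summary of the non-missing values (None = missing); needs at least 2 values."""
    xs = sorted(v for v in values if v is not None)
    # method="inclusive" interpolates linearly between order statistics,
    # the same rule as the default ("linear") quantile in NumPy and pandas.
    pct = quantiles(xs, n=100, method="inclusive")  # cut points for the 1st..99th percentiles
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    return {
        "min": xs[0],
        "p5": pct[4],
        "q1": q1,
        "median": med,
        "q3": q3,
        "p95": pct[94],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }

print(quantile_stats([11, 27, 54, 157, 188, None, 210]))
```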

Descriptive statistics

Standard deviation: 396715.35
Coefficient of variation (CV): 3.1506324
Kurtosis: 35.005259
Mean: 125916.1
Median Absolute Deviation (MAD): 11730
Skewness: 5.4469172
Sum: 28582955
Variance: 1.5738307 × 10^11
Monotonicity: Not monotonic
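The coefficient of variation is the standard deviation divided by the mean (396715.35 / 125916.1 ≈ 3.1506), and the MAD here is the median absolute deviation from the median, not the mean absolute deviation. A sketch of these statistics, assuming the report follows pandas' conventions: sample standard deviation and variance with an n - 1 denominator, bias-adjusted skewness, and excess kurtosis (0 for a normal distribution). descriptive_stats is a hypothetical helper.

```python
import math
from statistics import mean, median, stdev, variance

def descriptive_stats(values):
    """Summary statistics of the non-missing values (None = missing); needs at least 4 values."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mu = mean(xs)
    med = median(xs)
    # Central moments with an n denominator (assumes the values are not all equal).
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    # Bias-adjusted sample skewness (G1) and excess kurtosis (G2), the formulas pandas uses.
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    kurt = ((n + 1) * (m4 / m2 ** 2 - 3) + 6) * (n - 1) / ((n - 2) * (n - 3))
    sd = stdev(xs)  # sample standard deviation (n - 1 denominator)
    return {
        "mean": mu,
        "std": sd,
        "variance": variance(xs),
        "cv": sd / mu,
        "mad": median(abs(x - med) for x in xs),  # median absolute deviation
        "skewness": skew,
        "kurtosis": kurt,
        "sum": sum(xs),
    }

print(descriptive_stats([11, 27, 54, 157, 3538377, None]))
```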
Histogram with fixed-size bins (bins=50)

Common values

Value               Count  Frequency (%)
480                 2      0.8%
8271                2      0.8%
464                 2      0.8%
28870               1      0.4%
1560978             1      0.4%
8497                1      0.4%
45700               1      0.4%
136267              1      0.4%
16953               1      0.4%
4986                1      0.4%
Other values (214)  214    90.7%
(Missing)           9      3.8%
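Frequencies in the common-values table are taken over all 236 rows, missing ones included: 2 / 236 ≈ 0.8%, whereas 2 / 227 non-missing values would round to 0.9%. A quick consistency check with the counts from the 승차 table:

```python
# Counts copied from the 승차 common-values table.
shown = [2, 2, 2, 1, 1, 1, 1, 1, 1, 1]  # the ten listed values
other, missing, n_rows = 214, 9, 236

# Listed values + "Other values" + missing rows must cover every row.
assert sum(shown) + other + missing == n_rows

# Frequency (%) is taken over all 236 rows, missing ones included.
print([round(100 * c / n_rows, 1) for c in (2, 1, other, missing)])  # [0.8, 0.4, 90.7, 3.8]
```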
Minimum 10 values

Value  Count  Frequency (%)
11     1      0.4%
27     1      0.4%
54     1      0.4%
157    1      0.4%
188    1      0.4%
210    1      0.4%
216    1      0.4%
218    1      0.4%
242    1      0.4%
264    1      0.4%
Maximum 10 values

Value    Count  Frequency (%)
3538377  1      0.4%
2475664  1      0.4%
2401883  1      0.4%
1560978  1      0.4%
1488261  1      0.4%
1400512  1      0.4%
1266720  1      0.4%
1132277  1      0.4%
1112067  1      0.4%
703420   1      0.4%
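The extreme-value tables list the ten smallest and ten largest distinct non-missing values, each with its count and its frequency over all 236 rows (1 / 236 ≈ 0.4%). A minimal sketch using heapq, assuming missing entries are None; extreme_values is a hypothetical helper.

```python
import heapq
from collections import Counter

def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct non-missing values with counts."""
    n_rows = len(values)  # frequency denominator: all rows, missing included
    counts = Counter(v for v in values if v is not None)

    def row(v):
        return (v, counts[v], round(100 * counts[v] / n_rows, 1))

    lowest = [row(v) for v in heapq.nsmallest(k, counts)]
    highest = [row(v) for v in heapq.nlargest(k, counts)]
    return lowest, highest

print(extreme_values([480, 11, None, 3538377, 480, 27], k=2))
```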

하차 (alightings)
Real number (ℝ)

HIGH CORRELATION  MISSING

Distinct: 221
Distinct (%): 99.5%
Missing: 14
Missing (%): 5.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 128752.05
Minimum: 3
Maximum: 2105604
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB
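The percentages use different denominators: Missing (%) is taken over all 236 rows, while Distinct (%) and the mean use only the 222 non-missing values. Checking the 하차 figures with that arithmetic:

```python
n_rows, missing, distinct = 236, 14, 221   # from the 하차 statistics
total = 28582955                           # Sum
count = n_rows - missing                   # 222 non-missing values

print(round(100 * missing / n_rows, 1))    # Missing (%):  5.9
print(round(100 * distinct / count, 1))    # Distinct (%): 99.5
print(round(total / count, 2))             # Mean:         128752.05
```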

Quantile statistics

Minimum: 3
5th percentile: 432
Q1: 3786.75
Median: 25844
Q3: 89442.5
95th percentile: 542673.95
Maximum: 2105604
Range: 2105601
Interquartile range (IQR): 85655.75

Descriptive statistics

Standard deviation: 305464.43
Coefficient of variation (CV): 2.3725015
Kurtosis: 18.657911
Mean: 128752.05
Median Absolute Deviation (MAD): 24760
Skewness: 4.1437176
Sum: 28582955
Variance: 9.3308518 × 10^10
Monotonicity: Not monotonic
Histogram with fixed-size bins (bins=50)

Common values

Value               Count  Frequency (%)
3988                2      0.8%
378                 1      0.4%
190212              1      0.4%
906                 1      0.4%
79409               1      0.4%
390                 1      0.4%
42281               1      0.4%
37624               1      0.4%
543879              1      0.4%
14363               1      0.4%
Other values (211)  211    89.4%
(Missing)           14     5.9%
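The histogram splits the interval from Minimum to Maximum into 50 bins of equal width; for 하차 each bin spans (2105604 - 3) / 50 = 42112.02 alightings. A minimal sketch of equal-width binning, assuming numpy.histogram-style half-open bins with the last bin closed and None for missing entries; fixed_bins is a hypothetical helper.

```python
def fixed_bins(values, bins=50):
    """Equal-width histogram over [min, max], skipping missing values (None)."""
    xs = [v for v in values if v is not None]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    if width == 0:
        # All values equal: put everything in a single bin.
        return [lo, hi], [len(xs)]
    counts = [0] * bins
    for x in xs:
        # Bins are half-open [left, right) except the last, which also takes the maximum.
        i = min(int((x - lo) / width), bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

print(round((2105604 - 3) / 50, 2))  # bin width for 하차: 42112.02
```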
Minimum 10 values

Value  Count  Frequency (%)
3      1      0.4%
36     1      0.4%
125    1      0.4%
132    1      0.4%
188    1      0.4%
285    1      0.4%
325    1      0.4%
330    1      0.4%
350    1      0.4%
378    1      0.4%
Maximum 10 values

Value    Count  Frequency (%)
2105604  1      0.4%
1728405  1      0.4%
1667795  1      0.4%
1611316  1      0.4%
1532307  1      0.4%
1292506  1      0.4%
1219318  1      0.4%
1045925  1      0.4%
871090   1      0.4%
569717   1      0.4%

인키로 (passenger-kilometres)
Real number (ℝ)

HIGH CORRELATION  MISSING

Distinct: 220
Distinct (%): 100.0%
Missing: 16
Missing (%): 6.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 12289912
Minimum: 5203
Maximum: 2.3893947 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.2 KiB

Quantile statistics

Minimum: 5203
5th percentile: 58285
Q1: 404207.75
Median: 2379865.5
Q3: 9044927.8
95th percentile: 52554805
Maximum: 2.3893947 × 10^8
Range: 2.3893426 × 10^8
Interquartile range (IQR): 8640720

Descriptive statistics

Standard deviation: 29840010
Coefficient of variation (CV): 2.4280084
Kurtosis: 24.362273
Mean: 12289912
Median Absolute Deviation (MAD): 2231148.5
Skewness: 4.5349621
Sum: 2.7037807 × 10^9
Variance: 8.9042617 × 10^14
Monotonicity: Not monotonic
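Values in this report appear to be printed with eight significant digits, switching to scientific notation once the magnitude reaches 10^8: the mean 12289912 stays in fixed-point form, while the maximum 238939466 becomes 2.3893947 × 10^8. This rule is inferred from the numbers shown, not documented in the export. Python's "g" format with precision 8 behaves the same way, as this sketch shows; report_number is a hypothetical helper.

```python
import re

def report_number(x: float) -> str:
    """Format with 8 significant digits, writing any exponent as '× 10^n'."""
    s = format(x, ".8g")  # e.g. '2.3893947e+08' or '12289912'
    m = re.fullmatch(r"(-?[\d.]+)e([+-]\d+)", s)
    if m is None:
        return s  # already in fixed-point form
    return f"{m.group(1)} × 10^{int(m.group(2))}"

print(report_number(238939466))   # 2.3893947 × 10^8
print(report_number(12289912.3))  # 12289912
```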
Histogram with fixed-size bins (bins=50)

Common values

Value               Count  Frequency (%)
298060              1      0.4%
116753              1      0.4%
5093562             1      0.4%
53736               1      0.4%
5389230             1      0.4%
1241042             1      0.4%
52315673            1      0.4%
646184              1      0.4%
153015              1      0.4%
3680815             1      0.4%
Other values (210)  210    89.0%
(Missing)           16     6.8%
Minimum 10 values

Value  Count  Frequency (%)
5203   1      0.4%
11825  1      0.4%
12581  1      0.4%
18713  1      0.4%
24689  1      0.4%
28611  1      0.4%
30817  1      0.4%
39307  1      0.4%
51833  1      0.4%
53736  1      0.4%
Maximum 10 values

Value      Count  Frequency (%)
238939466  1      0.4%
190739472  1      0.4%
136332243  1      0.4%
133692590  1      0.4%
131565790  1      0.4%
125432319  1      0.4%
103493900  1      0.4%
88122053   1      0.4%
86626547   1      0.4%
58472706   1      0.4%

Interactions


Correlations

        승차   하차   인키로
승차    1.000  0.761  0.677
하차    0.761  1.000  0.941
인키로  0.677  0.941  1.000

        승차   하차   인키로
승차    1.000  0.725  0.679
하차    0.725  1.000  0.969
인키로  0.679  0.969  1.000
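Each matrix is symmetric with 1.000 on the diagonal. The export does not say which coefficient each matrix holds; ydata-profiling can compute several, such as Pearson and Spearman. A minimal sketch of those two in plain Python, assuming rows with a missing value in either column are dropped pair by pair; pearson, spearman, ranks and pairwise_complete are hypothetical helpers, and Spearman is computed as the Pearson correlation of average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(x):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def pairwise_complete(x, y):
    """Keep only positions where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]

boardings, alightings = pairwise_complete([11, 27, None, 157, 188], [36, None, 125, 132, 330])
print(pearson(boardings, alightings), spearman(boardings, alightings))
```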

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
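Nullity correlation compares where values are missing rather than the values themselves. A minimal sketch, assuming it is the Pearson correlation between the 0/1 missingness indicators of two columns (the approach used by the missingno library's heatmap): +1 means the two columns are always missing together, -1 means one is missing exactly when the other is present. nullity_correlation is a hypothetical helper.

```python
def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators (None = missing) of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None  # a column that is never (or always) missing has no defined correlation
    return cov / (va * vb) ** 0.5

# 각계 is missing in both columns; 간석 is missing only in the second.
print(nullity_correlation([None, 919, 184209], [None, None, 346210]))
```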

Sample

First rows

역명  승차  하차  인키로
0가평<NA>3319589437
1각계<NA><NA><NA>
2간석919<NA><NA>
3강경184209346210002172
4강구43813037530516
5강릉16868389062978561
6개포6429311825
7건천14283475151638
8경산45167656971741841114
9경주21577322905014512392

Last rows

역명  승차  하차  인키로
226현동90013212581
227호계20183916269612182775
228홍성5638336190032621121
229화명32238320783270452
230화본54472499234400
231화순52997109720267
232황간12332277162969822
233횡천6886455562171
234효천36264467516922
235희방사2422680179105