Overview

Dataset statistics

Number of variables: 4
Number of observations: 140
Missing cells: 60
Missing cells (%): 10.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.9 KiB
Average record size in memory: 35.9 B
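
The derived figures in the statistics above follow from the raw counts. A minimal sketch, assuming the missing-cell percentage is missing cells divided by rows × columns and the average record size is total memory divided by the number of rows (the variable names are illustrative and not part of the report):

```python
# Counts copied from the dataset statistics above.
n_variables = 4
n_observations = 140
missing_cells = 60
total_size_kib = 4.9  # rounded figure as reported

# Share of all cells (rows x columns) that are missing.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximate bytes per record. The input memory size is already rounded,
# so this estimate can differ slightly from the reported 35.9 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")  # 10.7%
print(f"{avg_record_bytes:.1f} B")
```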

Variable types

Text: 1
Numeric: 3

Dataset

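Description: Boardings, alightings, and other performance figures by station for 새마을 (Saemaeul) down-line trains.
Author: 한국철도공사 (Korea Railroad Corporation, KORAIL)
URL: https://www.data.go.kr/data/15068474/fileData.do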

Alerts

승차 is highly overall correlated with 하차 and 1 other field (High correlation)
하차 is highly overall correlated with 승차 and 1 other field (High correlation)
인키로 is highly overall correlated with 승차 and 1 other field (High correlation)
승차 has 20 (14.3%) missing values (Missing)
하차 has 20 (14.3%) missing values (Missing)
인키로 has 20 (14.3%) missing values (Missing)
역명 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 02:11:03.828484
Analysis finished: 2023-12-12 02:11:05.860210
Duration: 2.03 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
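
The reported duration can be checked against the two timestamps. A small sketch using only the standard library, assuming both timestamps share the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2023-12-12 02:11:03.828484")
finished = datetime.fromisoformat("2023-12-12 02:11:05.860210")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 2.03 seconds
```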

Variables

역명 (station name)
Text

UNIQUE 

Distinct: 140
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.2785714
Min length: 2

Characters and Unicode

Total characters: 319
Distinct characters: 134
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
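
As an illustration of these character properties, the sketch below looks up the general category of each character in the five sample station names shown in this report. It uses Python's standard `unicodedata` module, which exposes the category and character name but not the script or block, so the Hangul check via the character name is an approximation chosen for this example.

```python
import unicodedata

# Sample station names taken from the 역명 column of this report.
names = ["가평", "간석", "경산", "경주", "곡성"]

# General category of every character; "Lo" means "Letter, other".
categories = {unicodedata.category(ch) for name in names for ch in name}
print(categories)  # {'Lo'}

# Approximate script check: precomposed Hangul syllables have names
# that start with "HANGUL SYLLABLE".
all_hangul = all(
    unicodedata.name(ch).startswith("HANGUL SYLLABLE")
    for name in names
    for ch in name
)
print(all_hangul)  # True
```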

Unique

Unique: 140
Unique (%): 100.0%

Sample

1st row: 가평
2nd row: 간석
3rd row: 경산
4th row: 경주
5th row: 곡성

Value | Count | Frequency (%)
가평 | 1 | 0.7%
오근장 | 1 | 0.7%
전곡 | 1 | 0.7%
장항 | 1 | 0.7%
장성 | 1 | 0.7%
임진강 | 1 | 0.7%
익산 | 1 | 0.7%
의정부 | 1 | 0.7%
의성 | 1 | 0.7%
예천 | 1 | 0.7%
Other values (130) | 130 | 92.9%

Most occurring characters

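Value | Count | Frequency (%)
(not captured) | 21 | 6.6%
(not captured) | 14 | 4.4%
(not captured) | 12 | 3.8%
(not captured) | 8 | 2.5%
(not captured) | 8 | 2.5%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 6 | 1.9%
(not captured) | 6 | 1.9%
Other values (124) | 223 | 69.9%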

Most occurring categories

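Value | Count | Frequency (%)
Other Letter | 319 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not captured) | 21 | 6.6%
(not captured) | 14 | 4.4%
(not captured) | 12 | 3.8%
(not captured) | 8 | 2.5%
(not captured) | 8 | 2.5%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 6 | 1.9%
(not captured) | 6 | 1.9%
Other values (124) | 223 | 69.9%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 319 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not captured) | 21 | 6.6%
(not captured) | 14 | 4.4%
(not captured) | 12 | 3.8%
(not captured) | 8 | 2.5%
(not captured) | 8 | 2.5%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 6 | 1.9%
(not captured) | 6 | 1.9%
Other values (124) | 223 | 69.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 319 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not captured) | 21 | 6.6%
(not captured) | 14 | 4.4%
(not captured) | 12 | 3.8%
(not captured) | 8 | 2.5%
(not captured) | 8 | 2.5%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 7 | 2.2%
(not captured) | 6 | 1.9%
(not captured) | 6 | 1.9%
Other values (124) | 223 | 69.9%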

승차 (boardings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 116
Distinct (%): 96.7%
Missing: 20
Missing (%): 14.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 10981.425
Minimum: 2
Maximum: 264515
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 9.95
Q1: 158.75
Median: 557
Q3: 3269.25
95th percentile: 46265.05
Maximum: 264515
Range: 264513
Interquartile range (IQR): 3110.5
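
The 95th percentile above is consistent with linear interpolation between neighbouring order statistics, which is the default quantile rule in pandas and NumPy. The sketch below reproduces it from the listing of the largest 승차 values later in this report; that the report used this exact rule is an assumption, and the variable names are illustrative.

```python
import math

n = 120   # non-missing 승차 values (140 observations minus 20 missing)
q = 0.95

# Fractional 0-based position of the quantile in the sorted data.
pos = q * (n - 1)
lower_idx = math.floor(pos)  # 113
frac = pos - lower_idx       # about 0.05

# Index 119 is the largest value, so index 113 is the 7th largest
# and index 114 is the 6th largest.
lower_val = 46236
upper_val = 46817

p95 = lower_val + frac * (upper_val - lower_val)
print(f"{p95:.2f}")  # 46265.05

# The interquartile range is simply Q3 - Q1.
iqr = 3269.25 - 158.75
print(iqr)  # 3110.5
```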

Descriptive statistics

Standard deviation: 40105.48
Coefficient of variation (CV): 3.6521198
Kurtosis: 28.143808
Mean: 10981.425
Median Absolute Deviation (MAD): 528.5
Skewness: 5.2323423
Sum: 1317771
Variance: 1.6084495 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
6 | 2 | 1.4%
9 | 2 | 1.4%
269 | 2 | 1.4%
2 | 2 | 1.4%
4717 | 1 | 0.7%
5993 | 1 | 0.7%
34 | 1 | 0.7%
380 | 1 | 0.7%
6161 | 1 | 0.7%
861 | 1 | 0.7%
Other values (106) | 106 | 75.7%
(Missing) | 20 | 14.3%

Minimum 10 values
Value | Count | Frequency (%)
2 | 2 | 1.4%
6 | 2 | 1.4%
9 | 2 | 1.4%
10 | 1 | 0.7%
11 | 1 | 0.7%
15 | 1 | 0.7%
18 | 1 | 0.7%
19 | 1 | 0.7%
24 | 1 | 0.7%
27 | 1 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
264515 | 1 | 0.7%
247151 | 1 | 0.7%
215176 | 1 | 0.7%
120300 | 1 | 0.7%
66330 | 1 | 0.7%
46817 | 1 | 0.7%
46236 | 1 | 0.7%
42839 | 1 | 0.7%
35089 | 1 | 0.7%
27089 | 1 | 0.7%

하차 (alightings)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 118
Distinct (%): 98.3%
Missing: 20
Missing (%): 14.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 10981.425
Minimum: 2
Maximum: 162739
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 2
5th percentile: 29.2
Q1: 333.75
Median: 938
Q3: 3295.5
95th percentile: 66185.75
Maximum: 162739
Range: 162737
Interquartile range (IQR): 2961.75

Descriptive statistics

Standard deviation: 27744.848
Coefficient of variation (CV): 2.5265253
Kurtosis: 14.276541
Mean: 10981.425
Median Absolute Deviation (MAD): 765
Skewness: 3.6071349
Sum: 1317771
Variance: 7.6977661 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
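
A fixed-size binning with 50 bins can be sketched as follows. This assumes equal-width bins spanning the observed minimum and maximum of 하차, with the maximum placed in the last bin (the convention `numpy.histogram` follows); the helper `bin_index` is hypothetical and not part of the profiling tool.

```python
def bin_index(value, lo=2, hi=162739, bins=50):
    """Return the 0-based index of the equal-width bin containing value."""
    width = (hi - lo) / bins
    # The maximum sits on the right edge, so clamp it into the last bin.
    return min(int((value - lo) / width), bins - 1)

# With a range this wide, each bin covers roughly 3,255 alightings, so
# most stations (median 938) fall into the very first bin.
print(bin_index(2))       # 0
print(bin_index(938))     # 0
print(bin_index(162739))  # 49
```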

Common values
Value | Count | Frequency (%)
335 | 2 | 1.4%
686 | 2 | 1.4%
2054 | 1 | 0.7%
16030 | 1 | 0.7%
192 | 1 | 0.7%
22210 | 1 | 0.7%
197 | 1 | 0.7%
1130 | 1 | 0.7%
79785 | 1 | 0.7%
437 | 1 | 0.7%
Other values (108) | 108 | 77.1%
(Missing) | 20 | 14.3%

Minimum 10 values
Value | Count | Frequency (%)
2 | 1 | 0.7%
5 | 1 | 0.7%
9 | 1 | 0.7%
12 | 1 | 0.7%
13 | 1 | 0.7%
14 | 1 | 0.7%
30 | 1 | 0.7%
36 | 1 | 0.7%
39 | 1 | 0.7%
43 | 1 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
162739 | 1 | 0.7%
156560 | 1 | 0.7%
96562 | 1 | 0.7%
95639 | 1 | 0.7%
82788 | 1 | 0.7%
79785 | 1 | 0.7%
65470 | 1 | 0.7%
64966 | 1 | 0.7%
58917 | 1 | 0.7%
55406 | 1 | 0.7%

인키로 (passenger-kilometers)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 120
Distinct (%): 100.0%
Missing: 20
Missing (%): 14.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 1512584.8
Minimum: 161
Maximum: 32592545
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 161
5th percentile: 3019.8
Q1: 64831.5
Median: 195457
Q3: 576723.75
95th percentile: 7610851
Maximum: 32592545
Range: 32592384
Interquartile range (IQR): 511892.25

Descriptive statistics

Standard deviation: 3939030.5
Coefficient of variation (CV): 2.6041717
Kurtosis: 34.523384
Mean: 1512584.8
Median Absolute Deviation (MAD): 173502
Skewness: 5.1933887
Sum: 1.8151018 × 10^8
Variance: 1.5515961 × 10^13
Monotonicity: Not monotonic
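
Two of the descriptive statistics above are derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A short check using the rounded 인키로 figures as printed, so the results agree only up to rounding:

```python
# Figures copied from the descriptive statistics for 인키로.
std = 3939030.5
mean = 1512584.8

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance as the square of the standard deviation

print(f"{cv:.4f}")        # 2.6042
print(f"{variance:.4e}")  # 1.5516e+13
```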
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
68674 | 1 | 0.7%
43622 | 1 | 0.7%
2817275 | 1 | 0.7%
44830 | 1 | 0.7%
29791 | 1 | 0.7%
6226435 | 1 | 0.7%
181929 | 1 | 0.7%
79352 | 1 | 0.7%
182001 | 1 | 0.7%
2401519 | 1 | 0.7%
Other values (110) | 110 | 78.6%
(Missing) | 20 | 14.3%

Minimum 10 values
Value | Count | Frequency (%)
161 | 1 | 0.7%
390 | 1 | 0.7%
400 | 1 | 0.7%
833 | 1 | 0.7%
1788 | 1 | 0.7%
2028 | 1 | 0.7%
3072 | 1 | 0.7%
3495 | 1 | 0.7%
4120 | 1 | 0.7%
6720 | 1 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
32592545 | 1 | 0.7%
16964026 | 1 | 0.7%
13793434 | 1 | 0.7%
8925361 | 1 | 0.7%
8500276 | 1 | 0.7%
8034133 | 1 | 0.7%
7588573 | 1 | 0.7%
7423940 | 1 | 0.7%
6226435 | 1 | 0.7%
5479949 | 1 | 0.7%

Interactions

[Figures: nine pairwise interaction plots for the numeric variables 승차, 하차, and 인키로]

Correlations

Variable | 승차 | 하차 | 인키로
승차 | 1.000 | 0.682 | 0.758
하차 | 0.682 | 1.000 | 0.841
인키로 | 0.758 | 0.841 | 1.000

Variable | 승차 | 하차 | 인키로
승차 | 1.000 | 0.583 | 0.559
하차 | 0.583 | 1.000 | 0.894
인키로 | 0.559 | 0.894 | 1.000
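
This extract does not say which coefficient each matrix holds. As a general illustration only, the sketch below computes two common choices, Pearson (linear) and Spearman (rank-based), on hypothetical values that are not taken from the dataset. With heavily skewed counts like the ones profiled here, the two can differ noticeably, because Pearson is dominated by the largest stations while Spearman depends only on the ordering.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1 to n (assumes no ties, for brevity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical boardings and alightings for five stations.
boardings = [10, 200, 3000, 40000, 250000]
alightings = [15, 150, 4000, 30000, 160000]

print(round(spearman(boardings, alightings), 3))  # 1.0
print(pearson(boardings, alightings) < 1.0)       # True
```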

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
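
Nullity correlation can be sketched as the Pearson correlation between missingness indicators. The example below uses only the sample rows of this report whose `<NA>` markers are visible, plus one complete row, so its values describe this small subset rather than the full dataset; the helper `nullity_corr` is illustrative.

```python
from math import sqrt

# Missingness pattern (True = missing) per station: (승차, 하차, 인키로).
rows = {
    "가평": (False, True, True),
    "간석": (False, True, True),
    "경산": (True, False, False),
    "하양": (True, False, False),
    "함안": (True, False, False),
    "호계": (False, True, True),
    "홍성": (False, False, False),
    "화명": (False, True, True),
}

def nullity_corr(col_a, col_b):
    """Pearson correlation of two 0/1 missingness indicator columns."""
    xs = [float(r[col_a]) for r in rows.values()]
    ys = [float(r[col_b]) for r in rows.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# 하차 and 인키로 are missing together in this subset (correlation near 1),
# while 승차 tends to be missing when 하차 is present (negative value).
print(round(nullity_corr(1, 2), 3))  # 1.0
print(round(nullity_corr(0, 1), 3))  # -0.775
```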

Sample

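First rows
Index | 역명 | 승차 | 하차 | 인키로
0 | 가평 | 315 | <NA> | <NA>
1 | 간석 | 382 | <NA> | <NA>
2 | 경산 | <NA> | 2 | 400
3 | 경주 | 1034689447
4 | 곡성 | 9864161488286
5 | 광양 | 182992365383
6 | 광주송정 | 1781428330992
7 | 광천 | 5379449134557922
8 | 구례구 | 2861268318511
9 | 구미 | 524721143908

Last rows
Index | 역명 | 승차 | 하차 | 인키로
130 | 포항 | 5601866450846
131 | 풍기 | 1521652373561
132 | 하동 | 4701928250126
133 | 하양 | <NA> | 30 | 6720
134 | 함안 | <NA> | 330 | 80441
135 | 함평 | 933760645
136 | 호계 | 29 | <NA> | <NA>
137 | 홍성 | 27089 | 156560 | 13793434
138 | 화명 | 49 | <NA> | <NA>
139 | 화본 | 34134176214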