Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 140 |
Missing cells | 70 |
Missing cells (%) | 12.5% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 4.9 KiB |
Average record size in memory | 35.9 B |
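The missing-cell percentage can be checked from the counts in this report: the number of cells is observations × variables, and the 70 missing cells are the sum of the per-variable missing counts listed in the alerts (22, 24, 24, and 0 for the fully populated 역명 column). A minimal Python sketch, with variable names chosen here purely for illustration:

```python
# Figures copied from this report; the dictionary keys are the dataset's column names.
n_observations = 140
n_variables = 4
missing_per_variable = {"역명": 0, "승차": 22, "하차": 24, "인키로": 24}

total_cells = n_observations * n_variables          # observations x variables
missing_cells = sum(missing_per_variable.values())  # sum of per-variable gaps
missing_pct = 100 * missing_cells / total_cells

print(f"Missing cells: {missing_cells} of {total_cells} ({missing_pct:.1f}%)")
for name, n_missing in missing_per_variable.items():
    # Per-variable percentages are relative to the number of observations.
    print(f"{name}: {100 * n_missing / n_observations:.1f}% missing")
```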
Variable types
Text | 1 |
---|---|
Numeric | 3 |
Dataset
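Description | Passenger boarding and alighting figures by station for up-line (상행) Saemaeul (새마을) trains. |
---|---|
Author | 한국철도공사 (Korea Railroad Corporation, KORAIL) |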
URL | https://www.data.go.kr/data/15068472/fileData.do |
Alerts
승차 (boardings) is highly overall correlated with 하차 (alightings) and 1 other field | High correlation |
하차 (alightings) is highly overall correlated with 승차 (boardings) and 1 other field | High correlation |
인키로 (passenger-kilometres) is highly overall correlated with 승차 (boardings) and 1 other field | High correlation |
승차 has 22 (15.7%) missing values | Missing |
하차 has 24 (17.1%) missing values | Missing |
인키로 has 24 (17.1%) missing values | Missing |
역명 has unique values | Unique |
Reproduction
Analysis started | 2023-12-12 07:14:23.357169 |
---|---|
Analysis finished | 2023-12-12 07:14:24.857771 |
Duration | 1.5 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
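A report of this kind is typically regenerated by loading the source file into a pandas DataFrame and passing it to ydata-profiling. The sketch below is an assumption-laden outline rather than the exact procedure used for this report: the file paths, the function name `build_report`, and the report title are hypothetical, and the configuration in config.json is not applied.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (sketch; paths are hypothetical)."""
    # Imported inside the function so it can be defined even where the
    # third-party packages (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file decodes with pandas' default (UTF-8). If it does not,
    # passing an explicit encoding such as encoding="cp949" may be needed.
    df = pd.read_csv(csv_path)

    profile = ProfileReport(df, title="Saemaeul up-line ridership by station")
    profile.to_file(out_path)
```

Calling `build_report("saemaeul_up.csv")` with a local copy of the file (name hypothetical) would write `report.html` to the working directory.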
역명 (station name)
Text
UNIQUE
Distinct | 140 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 1.2 KiB |
Value | Count | Frequency (%) |
가평 | 1 | 0.7% |
오근장 | 1 | 0.7% |
전곡 | 1 | 0.7% |
장항 | 1 | 0.7% |
장성 | 1 | 0.7% |
임진강 | 1 | 0.7% |
익산 | 1 | 0.7% |
의정부 | 1 | 0.7% |
의성 | 1 | 0.7% |
예천 | 1 | 0.7% |
Other values (130) | 130 | 92.9% |
Most occurring characters
Value | Count | Frequency (%) |
천 | 21 | 6.6% |
산 | 14 | 4.4% |
주 | 12 | 3.8% |
동 | 8 | 2.5% |
원 | 8 | 2.5% |
평 | 7 | 2.2% |
성 | 7 | 2.2% |
양 | 7 | 2.2% |
영 | 6 | 1.9% |
진 | 6 | 1.9% |
Other values (124) | 223 | 69.9% |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 319 | 100.0% |
Most frequent character per category
Other Letter: the same ten characters and counts as in the "Most occurring characters" table, because all 319 characters fall in this category.
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 319 | 100.0% |
Most frequent character per script
Hangul: the same ten characters and counts as in the "Most occurring characters" table, because all 319 characters are in the Hangul script.
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 319 | 100.0% |
Most frequent character per block
Hangul: the same ten characters and counts as in the "Most occurring characters" table, because all 319 characters are in the block labeled Hangul.
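The character, category, and script counts for a text column can be approximated with the Python standard library. The sketch below uses only the ten station names visible in the value table for 역명, not all 140, so its counts differ from the report's; taking the first word of the Unicode character name as a script label is an assumption made here for illustration.

```python
from collections import Counter
import unicodedata

# The ten station names listed in the 역명 value table (a subset of the 140).
names = ["가평", "오근장", "전곡", "장항", "장성", "임진강", "익산", "의정부", "의성", "예천"]

chars = [ch for name in names for ch in name]
char_counts = Counter(chars)

# General category per character; "Lo" is the Unicode code for "Other Letter".
category_counts = Counter(unicodedata.category(ch) for ch in chars)

# Assumption: the first word of the character name (e.g. "HANGUL SYLLABLE GA")
# serves as a rough script label.
script_counts = Counter(unicodedata.name(ch).split()[0] for ch in chars)

print(char_counts.most_common(3))
print(category_counts)
print(script_counts)
```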
승차 (boardings)
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 114 |
---|---|
Distinct (%) | 96.6% |
Missing | 22 |
Missing (%) | 15.7% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 10991.669 |
Minimum | 2 |
---|---|
Maximum | 142570 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.4 KiB |
Quantile statistics
Minimum | 2 |
---|---|
5-th percentile | 28.35 |
Q1 | 330.5 |
median | 861.5 |
Q3 | 3240.5 |
95-th percentile | 66567.45 |
Maximum | 142570 |
Range | 142568 |
Interquartile range (IQR) | 2910 |
Descriptive statistics
Standard deviation | 27238.301 |
---|---|
Coefficient of variation (CV) | 2.4780859 |
Kurtosis | 11.228797 |
Mean | 10991.669 |
Median Absolute Deviation (MAD) | 794.5 |
Skewness | 3.3326379 |
Sum | 1297017 |
Variance | 7.4192505 × 10⁸ |
Monotonicity | Not monotonic |
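Several of the 승차 figures are derived from others and can be cross-checked by hand: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of non-missing observations. A minimal sketch using the values printed in this report (the dictionary layout is illustrative only):

```python
# Values copied from the 승차 tables in this report.
stats = {
    "n_rows": 140, "missing": 22,
    "mean": 10991.669, "std": 27238.301, "sum": 1297017,
    "min": 2, "max": 142570, "q1": 330.5, "q3": 3240.5,
}

n_valid = stats["n_rows"] - stats["missing"]  # non-missing observations

derived = {
    "range": stats["max"] - stats["min"],
    "iqr": stats["q3"] - stats["q1"],
    "cv": stats["std"] / stats["mean"],
    "variance": stats["std"] ** 2,
    "mean_from_sum": stats["sum"] / n_valid,
}

for key, value in derived.items():
    print(f"{key}: {value:,.4f}")
```

The same relations apply to the 하차 and 인키로 sections when their own values are substituted.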
Common values
Value | Count | Frequency (%) |
9 | 3 | 2.1% |
39 | 2 | 1.4% |
335 | 2 | 1.4% |
60096 | 1 | 0.7% |
56 | 1 | 0.7% |
192 | 1 | 0.7% |
18557 | 1 | 0.7% |
232 | 1 | 0.7% |
757 | 1 | 0.7% |
71330 | 1 | 0.7% |
Other values (104) | 104 | 74.3% |
(Missing) | 22 | 15.7% |
Minimum 10 values
Value | Count | Frequency (%) |
2 | 1 | 0.7% |
8 | 1 | 0.7% |
9 | 3 | 2.1% |
19 | 1 | 0.7% |
30 | 1 | 0.7% |
36 | 1 | 0.7% |
39 | 2 | 1.4% |
47 | 1 | 0.7% |
50 | 1 | 0.7% |
52 | 1 | 0.7% |
Maximum 10 values
Value | Count | Frequency (%) |
142570 | 1 | 0.7% |
138928 | 1 | 0.7% |
118062 | 1 | 0.7% |
110782 | 1 | 0.7% |
88268 | 1 | 0.7% |
71330 | 1 | 0.7% |
65727 | 1 | 0.7% |
63091 | 1 | 0.7% |
60096 | 1 | 0.7% |
53179 | 1 | 0.7% |
하차 (alightings)
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 111 |
---|---|
Distinct (%) | 95.7% |
Missing | 24 |
Missing (%) | 17.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 11181.181 |
Minimum | 2 |
---|---|
Maximum | 296107 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.4 KiB |
Quantile statistics
Minimum | 2 |
---|---|
5-th percentile | 12.75 |
Q1 | 146 |
median | 539.5 |
Q3 | 2938.75 |
95-th percentile | 39084.25 |
Maximum | 296107 |
Range | 296105 |
Interquartile range (IQR) | 2792.75 |
Descriptive statistics
Standard deviation | 42687.721 |
---|---|
Coefficient of variation (CV) | 3.8178186 |
Kurtosis | 30.790851 |
Mean | 11181.181 |
Median Absolute Deviation (MAD) | 507.5 |
Skewness | 5.4717771 |
Sum | 1297017 |
Variance | 1.8222415 × 10⁹ |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
269 | 2 | 1.4% |
11 | 2 | 1.4% |
2 | 2 | 1.4% |
150 | 2 | 1.4% |
297 | 2 | 1.4% |
304 | 1 | 0.7% |
34 | 1 | 0.7% |
362 | 1 | 0.7% |
8934 | 1 | 0.7% |
772 | 1 | 0.7% |
Other values (101) | 101 | 72.1% |
(Missing) | 24 | 17.1% |
Minimum 10 values
Value | Count | Frequency (%) |
2 | 2 | 1.4% |
10 | 1 | 0.7% |
11 | 2 | 1.4% |
12 | 1 | 0.7% |
13 | 1 | 0.7% |
19 | 1 | 0.7% |
20 | 1 | 0.7% |
24 | 1 | 0.7% |
26 | 1 | 0.7% |
30 | 1 | 0.7% |
Maximum 10 values
Value | Count | Frequency (%) |
296107 | 1 | 0.7% |
258496 | 1 | 0.7% |
218871 | 1 | 0.7% |
104412 | 1 | 0.7% |
58699 | 1 | 0.7% |
46183 | 1 | 0.7% |
36718 | 1 | 0.7% |
34824 | 1 | 0.7% |
25771 | 1 | 0.7% |
22894 | 1 | 0.7% |
인키로 (passenger-kilometres)
Real number (ℝ)
HIGH CORRELATION, MISSING
Distinct | 116 |
---|---|
Distinct (%) | 100.0% |
Missing | 24 |
Missing (%) | 17.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1480503.7 |
Minimum | 402 |
---|---|
Maximum | 33241899 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 1.4 KiB |
Quantile statistics
Minimum | 402 |
---|---|
5-th percentile | 2475.25 |
Q1 | 31042.75 |
median | 127892 |
Q3 | 538203.75 |
95-th percentile | 5497905 |
Maximum | 33241899 |
Range | 33241497 |
Interquartile range (IQR) | 507161 |
Descriptive statistics
Standard deviation | 5068720.5 |
---|---|
Coefficient of variation (CV) | 3.4236459 |
Kurtosis | 25.202908 |
Mean | 1480503.7 |
Median Absolute Deviation (MAD) | 121553 |
Skewness | 4.9607025 |
Sum | 1.7173843 × 10⁸ |
Variance | 2.5691927 × 10¹³ |
Monotonicity | Not monotonic |
Common values
Value | Count | Frequency (%) |
590191 | 1 | 0.7% |
1810772 | 1 | 0.7% |
682564 | 1 | 0.7% |
7737 | 1 | 0.7% |
9544 | 1 | 0.7% |
697211 | 1 | 0.7% |
204737 | 1 | 0.7% |
17287 | 1 | 0.7% |
1150153 | 1 | 0.7% |
280487 | 1 | 0.7% |
Other values (106) | 106 | 75.7% |
(Missing) | 24 | 17.1% |
Minimum 10 values
Value | Count | Frequency (%) |
402 | 1 | 0.7% |
666 | 1 | 0.7% |
838 | 1 | 0.7% |
943 | 1 | 0.7% |
1980 | 1 | 0.7% |
2146 | 1 | 0.7% |
2585 | 1 | 0.7% |
3151 | 1 | 0.7% |
3699 | 1 | 0.7% |
4349 | 1 | 0.7% |
Maximum 10 values
Value | Count | Frequency (%) |
33241899 | 1 | 0.7% |
29910355 | 1 | 0.7% |
23596447 | 1 | 0.7% |
20095664 | 1 | 0.7% |
9280010 | 1 | 0.7% |
9132696 | 1 | 0.7% |
4286308 | 1 | 0.7% |
2937441 | 1 | 0.7% |
2798996 | 1 | 0.7% |
2686387 | 1 | 0.7% |
Correlations
Variable | 승차 | 하차 | 인키로 |
---|---|---|---|
승차 | 1.000 | 0.963 | 0.770 |
하차 | 0.963 | 1.000 | 0.937 |
인키로 | 0.770 | 0.937 | 1.000 |
Variable | 승차 | 하차 | 인키로 |
---|---|---|---|
승차 | 1.000 | 0.588 | 0.560 |
하차 | 0.588 | 1.000 | 0.945 |
인키로 | 0.560 | 0.945 | 1.000 |
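This extract does not say which coefficient each of the two matrices reports. As a hedged illustration of why two correlation measures can disagree on data this skewed, the sketch below computes a linear (Pearson) and a rank-based (Spearman) coefficient for the 12 sample rows of this report in which both 승차 and 하차 are present. It uses only that subset, so the results are not expected to match either matrix, and the helper functions are written here for illustration.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# (승차, 하차) pairs from the sample rows of this report with both values present.
pairs = [(325, 10), (3274, 1604), (1184, 182), (1466, 155), (40420, 4425), (1692, 427),
         (794, 607), (1481, 150), (1897, 137), (2201, 297), (235, 11), (142570, 22894)]
boardings, alightings = zip(*pairs)

print(f"Pearson:  {pearson(boardings, alightings):.3f}")
print(f"Spearman: {spearman(boardings, alightings):.3f}")
```

On this subset the linear coefficient is dominated by the two largest stations, while the rank-based one weights every station equally, which is one plausible reason for the gap between the two matrices.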
First rows
Row | 역명 | 승차 | 하차 | 인키로 |
---|---|---|---|---|
0 | 가평 | 1601 | <NA> | <NA> |
1 | 간석 | <NA> | 382 | 97907 |
2 | 경산 | 2 | <NA> | <NA> |
3 | 경주 | 325 | 10 | 2585 |
4 | 곡성 | 3274 | 1604 | 188226 |
5 | 광양 | 1184 | 182 | 67036 |
6 | 광주송정 | 1466 | 155 | 35927 |
7 | 광천 | 40420 | 4425 | 449064 |
8 | 구례구 | 1692 | 427 | 107259 |
9 | 구미 | 794 | 607 | 121154 |
Last rows
Row | 역명 | 승차 | 하차 | 인키로 |
---|---|---|---|---|
130 | 포항 | 1481 | 150 | 36242 |
131 | 풍기 | 1897 | 137 | 30979 |
132 | 하동 | 2201 | 297 | 38531 |
133 | 하양 | 30 | <NA> | <NA> |
134 | 함안 | 329 | <NA> | <NA> |
135 | 함평 | 235 | 11 | 1980 |
136 | 호계 | <NA> | 311 | 1870901 |
137 | 홍성 | 142570 | 22894 | 2017034 |
138 | 화명 | <NA> | 49 | 12708 |
139 | 화본 | <NA> | <NA> | <NA> |
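The `<NA>` markers in the sample rows stand for missing values. When working with rows in this printed form, one option is to map the marker to `None` before doing arithmetic. The sketch below parses four of the rows shown above and counts missing values per column; the helper name `parse_row` and the choice of rows are illustrative assumptions.

```python
from typing import Optional

NA = "<NA>"  # missing-value marker as printed in the sample tables

def parse_row(line: str) -> tuple[str, list[Optional[int]]]:
    """Split 'index | name | v1 | v2 | v3 |' into the name and its numeric values."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, values = cells[1], cells[2:]
    return name, [None if v == NA else int(v) for v in values]

# Four rows copied from the sample tables in this report.
sample = [
    "0 | 가평 | 1601 | <NA> | <NA> |",
    "1 | 간석 | <NA> | 382 | 97907 |",
    "3 | 경주 | 325 | 10 | 2585 |",
    "139 | 화본 | <NA> | <NA> | <NA> |",
]

rows = [parse_row(line) for line in sample]
columns = ["승차", "하차", "인키로"]

# Count missing entries per column across the parsed rows.
missing = {col: sum(vals[i] is None for _, vals in rows) for i, col in enumerate(columns)}
print(missing)
```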