Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 111
Missing cells (%): 0.2%
Duplicate rows: 14
Duplicate rows (%): 0.1%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 2
Numeric: 1
Categorical: 2

Dataset

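Description: Station-to-station passenger departure and arrival volumes for the Seoul metropolitan area electric railway, taken from the statistical yearbook published annually by Korea Railroad Corporation (KORAIL). The dataset provides the fields departure station, arrival station, number of passengers, and unit.
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/3037647/fileData.do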

Alerts

단위 (unit) has constant value "" [Constant]
연도 (year) has constant value "" [Constant]
Dataset has 14 (0.1%) duplicate rows [Duplicates]
도착역 (arrival station) has 111 (1.1%) missing values [Missing]
인원 (passenger count) has 2026 (20.3%) zeros [Zeros]
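
The alerts listed here can be re-derived from raw rows. The following is a minimal sketch using only the Python standard library. The helper name `column_alerts`, the toy rows, the use of `None` for a missing value, and the rule of counting each distinct repeated row once are all illustrative assumptions, not ydata-profiling's actual implementation.

```python
from collections import Counter

def column_alerts(rows, columns):
    """Return alert strings for a list of row tuples (hypothetical helper)."""
    alerts = []
    n = len(rows)
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        present = [v for v in values if v is not None]
        missing = n - len(present)
        if missing:
            alerts.append(f"{name} has {missing} ({missing / n:.1%}) missing values")
        if present and len(set(present)) == 1:
            alerts.append(f"{name} has constant value {present[0]!r}")
        zeros = sum(1 for v in present if isinstance(v, (int, float)) and v == 0)
        if zeros:
            alerts.append(f"{name} has {zeros} ({zeros / n:.1%}) zeros")
    # Assumption: each distinct row that occurs more than once counts as one duplicate.
    duplicates = sum(1 for count in Counter(rows).values() if count > 1)
    if duplicates:
        alerts.append(f"Dataset has {duplicates} ({duplicates / n:.1%}) duplicate rows")
    return alerts

# Toy rows shaped like the profiled table, for illustration only.
columns = ["출발", "도착역", "인원", "연도"]
rows = [
    ("송내", "오류동", 51630, "2022"),
    ("강남", None, 0, "2022"),
    ("강남", None, 0, "2022"),
    ("안국", "도봉산", 7846, "2022"),
]
alerts = column_alerts(rows, columns)
for alert in alerts:
    print(alert)
```

On the toy rows this reports missing values for 도착역, zeros for 인원, a constant 연도, and one repeated row.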

Reproduction

Analysis started: 2023-12-12 21:51:27.629325
Analysis finished: 2023-12-12 21:51:28.189260
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

출발
Text

Distinct: 337
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 2
Mean length: 3.2565
Min length: 2

Characters and Unicode

Total characters: 32565
Distinct characters: 237
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
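
As an illustration of those character properties, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are assumptions made for this example; the input strings are the five sample values shown for this variable. The standard library does not expose the script or block properties, so only categories are covered.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values):
    """Count characters across all strings by Unicode general category."""
    counts = Counter()
    for value in values:
        # Normalise to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

values = ["송내", "안국", "충무로(4)", "옥수(3)", "오산대"]
counts = category_counts(values)
print(counts)
```

Hangul syllables fall under Other Letter, the line-number digits under Decimal Number, and the parentheses under Open and Close Punctuation, which matches the category names in the tables of this section.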

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 송내
2nd row: 안국
3rd row: 충무로(4)
4th row: 옥수(3)
5th row: 오산대
Value | Count | Frequency (%)
수서 | 46 | 0.5%
선릉(2 | 45 | 0.4%
종합운동장 | 44 | 0.4%
기흥 | 43 | 0.4%
봉명 | 43 | 0.4%
창동(4 | 42 | 0.4%
김유정 | 42 | 0.4%
용답 | 42 | 0.4%
가좌 | 41 | 0.4%
당산 | 41 | 0.4%
Other values (327) | 9571 | 95.7%

Most occurring characters

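Value | Count | Frequency (%)
( | 1778 | 5.5%
) | 1778 | 5.5%
(not recovered) | 1153 | 3.5%
(not recovered) | 807 | 2.5%
(not recovered) | 771 | 2.4%
(not recovered) | 638 | 2.0%
(not recovered) | 620 | 1.9%
(not recovered) | 606 | 1.9%
(not recovered) | 578 | 1.8%
(not recovered) | 578 | 1.8%
Other values (227) | 23258 | 71.4%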

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 27291 | 83.8%
Open Punctuation | 1778 | 5.5%
Close Punctuation | 1778 | 5.5%
Decimal Number | 1544 | 4.7%
Uppercase Letter | 174 | 0.5%

Most frequent character per category

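Other Letter
Value | Count | Frequency (%)
(not recovered) | 1153 | 4.2%
(not recovered) | 807 | 3.0%
(not recovered) | 771 | 2.8%
(not recovered) | 638 | 2.3%
(not recovered) | 620 | 2.3%
(not recovered) | 606 | 2.2%
(not recovered) | 578 | 2.1%
(not recovered) | 578 | 2.1%
(not recovered) | 550 | 2.0%
(not recovered) | 486 | 1.8%
Other values (218) | 20504 | 75.1%

Decimal Number
Value | Count | Frequency (%)
2 | 533 | 34.5%
3 | 519 | 33.6%
4 | 299 | 19.4%
1 | 169 | 10.9%
5 | 24 | 1.6%

Uppercase Letter
Value | Count | Frequency (%)
D | 116 | 66.7%
P | 58 | 33.3%

Open Punctuation
Value | Count | Frequency (%)
( | 1778 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1778 | 100.0%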

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 27291 | 83.8%
Common | 5100 | 15.7%
Latin | 174 | 0.5%

Most frequent character per script

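Hangul
Value | Count | Frequency (%)
(not recovered) | 1153 | 4.2%
(not recovered) | 807 | 3.0%
(not recovered) | 771 | 2.8%
(not recovered) | 638 | 2.3%
(not recovered) | 620 | 2.3%
(not recovered) | 606 | 2.2%
(not recovered) | 578 | 2.1%
(not recovered) | 578 | 2.1%
(not recovered) | 550 | 2.0%
(not recovered) | 486 | 1.8%
Other values (218) | 20504 | 75.1%

Common
Value | Count | Frequency (%)
( | 1778 | 34.9%
) | 1778 | 34.9%
2 | 533 | 10.5%
3 | 519 | 10.2%
4 | 299 | 5.9%
1 | 169 | 3.3%
5 | 24 | 0.5%

Latin
Value | Count | Frequency (%)
D | 116 | 66.7%
P | 58 | 33.3%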

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 27291 | 83.8%
ASCII | 5274 | 16.2%

Most frequent character per block

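ASCII
Value | Count | Frequency (%)
( | 1778 | 33.7%
) | 1778 | 33.7%
2 | 533 | 10.1%
3 | 519 | 9.8%
4 | 299 | 5.7%
1 | 169 | 3.2%
D | 116 | 2.2%
P | 58 | 1.1%
5 | 24 | 0.5%

Hangul
Value | Count | Frequency (%)
(not recovered) | 1153 | 4.2%
(not recovered) | 807 | 3.0%
(not recovered) | 771 | 2.8%
(not recovered) | 638 | 2.3%
(not recovered) | 620 | 2.3%
(not recovered) | 606 | 2.2%
(not recovered) | 578 | 2.1%
(not recovered) | 578 | 2.1%
(not recovered) | 550 | 2.0%
(not recovered) | 486 | 1.8%
Other values (218) | 20504 | 75.1%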

도착역
Text

MISSING 

Distinct: 294
Distinct (%): 3.0%
Missing: 111
Missing (%): 1.1%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.6492062
Min length: 2

Characters and Unicode

Total characters: 26198
Distinct characters: 211
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 오류동
2nd row: 도봉산
3rd row: 원흥
4th row: 경마공원
5th row: 덕계
Value | Count | Frequency (%)
세마 | 49 | 0.5%
신해운대 | 49 | 0.5%
독산 | 48 | 0.5%
화전 | 46 | 0.5%
신촌 | 46 | 0.5%
석계 | 46 | 0.5%
연수 | 46 | 0.5%
상록수 | 46 | 0.5%
중랑 | 46 | 0.5%
아신 | 44 | 0.4%
Other values (284) | 9423 | 95.3%

Most occurring characters

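Value | Count | Frequency (%)
(not recovered) | 878 | 3.4%
(not recovered) | 850 | 3.2%
(not recovered) | 759 | 2.9%
(not recovered) | 650 | 2.5%
(not recovered) | 583 | 2.2%
(not recovered) | 576 | 2.2%
(not recovered) | 519 | 2.0%
(not recovered) | 434 | 1.7%
(not recovered) | 430 | 1.6%
) | 426 | 1.6%
Other values (201) | 20093 | 76.7%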

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 25346 | 96.7%
Close Punctuation | 426 | 1.6%
Open Punctuation | 426 | 1.6%

Most frequent character per category

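Other Letter
Value | Count | Frequency (%)
(not recovered) | 878 | 3.5%
(not recovered) | 850 | 3.4%
(not recovered) | 759 | 3.0%
(not recovered) | 650 | 2.6%
(not recovered) | 583 | 2.3%
(not recovered) | 576 | 2.3%
(not recovered) | 519 | 2.0%
(not recovered) | 434 | 1.7%
(not recovered) | 430 | 1.7%
(not recovered) | 390 | 1.5%
Other values (199) | 19277 | 76.1%

Close Punctuation
Value | Count | Frequency (%)
) | 426 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 426 | 100.0%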

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 25346 | 96.7%
Common | 852 | 3.3%

Most frequent character per script

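Hangul
Value | Count | Frequency (%)
(not recovered) | 878 | 3.5%
(not recovered) | 850 | 3.4%
(not recovered) | 759 | 3.0%
(not recovered) | 650 | 2.6%
(not recovered) | 583 | 2.3%
(not recovered) | 576 | 2.3%
(not recovered) | 519 | 2.0%
(not recovered) | 434 | 1.7%
(not recovered) | 430 | 1.7%
(not recovered) | 390 | 1.5%
Other values (199) | 19277 | 76.1%

Common
Value | Count | Frequency (%)
) | 426 | 50.0%
( | 426 | 50.0%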

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 25346 | 96.7%
ASCII | 852 | 3.3%

Most frequent character per block

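Hangul
Value | Count | Frequency (%)
(not recovered) | 878 | 3.5%
(not recovered) | 850 | 3.4%
(not recovered) | 759 | 3.0%
(not recovered) | 650 | 2.6%
(not recovered) | 583 | 2.3%
(not recovered) | 576 | 2.3%
(not recovered) | 519 | 2.0%
(not recovered) | 434 | 1.7%
(not recovered) | 430 | 1.7%
(not recovered) | 390 | 1.5%
Other values (199) | 19277 | 76.1%

ASCII
Value | Count | Frequency (%)
) | 426 | 50.0%
( | 426 | 50.0%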

인원
Real number (ℝ)

ZEROS 

Distinct: 3767
Distinct (%): 37.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5059.5196
Minimum: 0
Maximum: 418320
Zeros: 2026
Zeros (%): 20.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 10
Median: 278.5
Q3: 2001.75
95-th percentile: 23093.3
Maximum: 418320
Range: 418320
Interquartile range (IQR): 1991.75
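
The range and IQR in this block follow directly from the other quantiles. The sketch below computes the same summary on a small toy list with `statistics.quantiles` and `method="inclusive"`, which, as far as I know, corresponds to the linear interpolation that pandas uses by default. The helper name `quartile_summary` and the toy values are assumptions for illustration; the last lines only re-check the figures reported for 인원.

```python
import statistics

def quartile_summary(values):
    """Quartiles by linear interpolation between order statistics, plus range and IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo, "Q1": q1, "median": median, "Q3": q3, "max": hi,
        "range": hi - lo, "IQR": q3 - q1,
    }

# Toy values, heavily right-skewed like the real column (not the real data).
toy = [0, 0, 10, 250, 300, 2000, 2010, 5000, 418320]
summary = quartile_summary(toy)
print(summary)

# Consistency check on the reported figures.
reported_iqr = 2001.75 - 10      # Q3 - Q1
reported_range = 418320 - 0      # maximum - minimum
print(reported_iqr, reported_range)
```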

Descriptive statistics

Standard deviation: 19547.631
Coefficient of variation (CV): 3.863535
Kurtosis: 106.72548
Mean: 5059.5196
Median Absolute Deviation (MAD): 278.5
Skewness: 8.8002708
Sum: 50595196
Variance: 3.8210987 × 10^8
Monotonicity: Not monotonic
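
Several of these figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The short check below confirms that the reported values agree with each other up to rounding. The variable names and tolerances are choices made for this sketch.

```python
import math

# Figures copied from the report for 인원.
n = 10000
mean = 5059.5196
std = 19547.631
variance = 3.8210987e8
total = 50595196

cv = std / mean  # coefficient of variation

checks = {
    "cv": math.isclose(cv, 3.863535, rel_tol=1e-6),
    "variance": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "sum": math.isclose(mean * n, total, rel_tol=1e-9),
}
print(round(cv, 4), checks)
```

A CV near 3.9 together with a skewness of about 8.8 indicates a strongly right-skewed distribution, consistent with the median (278.5) lying far below the mean.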
Histogram with fixed size bins (bins=50)
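
With 50 equal-width bins over the observed range 0 to 418320, each bin spans 418320 / 50 = 8366.4. Because Q3 is 2001.75, at least three quarters of the observations land in the first bin. The sketch below shows one way to compute such a histogram. The function name `fixed_width_histogram` and the toy values are assumptions for illustration and do not reproduce the plotting code behind the report.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes into the first bin
        else:
            # The maximum sits on the last edge, so clamp it into the final bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return width, counts

# Toy values spanning the same range as the real column (not the real data).
toy = [0, 0, 1, 93, 5163, 7846, 418320]
width, counts = fixed_width_histogram(toy, bins=50)
print(width, counts[0], counts[-1])
```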
Common values
Value | Count | Frequency (%)
0 | 2026 | 20.3%
1 | 93 | 0.9%
2 | 68 | 0.7%
3 | 53 | 0.5%
4 | 52 | 0.5%
6 | 49 | 0.5%
5 | 45 | 0.4%
7 | 41 | 0.4%
9 | 39 | 0.4%
14 | 33 | 0.3%
Other values (3757) | 7501 | 75.0%
Minimum 10 values
Value | Count | Frequency (%)
0 | 2026 | 20.3%
1 | 93 | 0.9%
2 | 68 | 0.7%
3 | 53 | 0.5%
4 | 52 | 0.5%
5 | 45 | 0.4%
6 | 49 | 0.5%
7 | 41 | 0.4%
8 | 32 | 0.3%
9 | 39 | 0.4%
Maximum 10 values
Value | Count | Frequency (%)
418320 | 1 | < 0.1%
381920 | 1 | < 0.1%
339350 | 1 | < 0.1%
337803 | 1 | < 0.1%
298012 | 1 | < 0.1%
292246 | 1 | < 0.1%
263491 | 1 | < 0.1%
260666 | 1 | < 0.1%
259045 | 1 | < 0.1%
249718 | 1 | < 0.1%

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
(not recovered) | 10000

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
(not recovered) | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
(not recovered) | 10000 | 100.0%

연도
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
2022 | 10000

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022
2nd row: 2022
3rd row: 2022
4th row: 2022
5th row: 2022

Common Values

Value | Count | Frequency (%)
2022 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2022 | 10000 | 100.0%

Interactions

[Figure: interactions plot]

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
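
As a rough text-only stand-in for these two plots, the sketch below counts present values per column and renders one line per row, with `#` for a present value and `.` for a missing one. The function names, the toy rows, and the use of `None` as the missing marker are assumptions made for this example.

```python
def nullity_by_column(rows, columns):
    """Number of non-missing values in each column."""
    return {
        name: sum(1 for row in rows if row[i] is not None)
        for i, name in enumerate(columns)
    }

def nullity_matrix(rows):
    """One string per row: '#' where a value is present, '.' where it is missing."""
    return ["".join("." if value is None else "#" for value in row) for row in rows]

# Toy rows for illustration only; a zero count is a present value, not a missing one.
columns = ["출발", "도착역", "인원"]
rows = [
    ("송내", "오류동", 51630),
    ("강남", None, 0),
    ("안국", "도봉산", 7846),
]
present = nullity_by_column(rows, columns)
matrix = nullity_matrix(rows)
print(present)
print("\n".join(matrix))
```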

Sample

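First rows
(index) | 출발 | 도착역 | 인원 | 단위 | 연도
87486 | 송내 | 오류동 | 51630 |  | 2022
20727 | 안국 | 도봉산 | 7846 |  | 2022
33215 | 충무로(4) | 원흥 | 0 |  | 2022
22677 | 옥수(3) | 경마공원 | 1741 |  | 2022
78352 | 오산대 | 덕계 | 41 |  | 2022
41771 | 석계 | 선정릉 | 6864 |  | 2022
31340 | 길음 | 대야미 | 1388 |  | 2022
44588 | 대모산입구 | 덕소 | 432 |  | 2022
51773 | 서울(경의선) | 남춘천 | 2 |  | 2022
27776 | 오금(3) | 대야미 | 97 |  | 2022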
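Last rows
(index) | 출발 | 도착역 | 인원 | 단위 | 연도
39173 | 서빙고 | 신현 | 13 |  | 2022
88927 | 제물포 | 가산디지털단지 | 24277 |  | 2022
77951 | 세마 | 세류 | 10143 |  | 2022
84866 | 안산 | 소래포구 | 32052 |  | 2022
14316 | 홍대입구 | 화전 | 0 |  | 2022
97740 | 기흥 | 대모산입구 | 7138 |  | 2022
50446 | 오빈 | 대곡 | 8 |  | 2022
13890 | 당산 | 인하대 | 359 |  | 2022
92660 | 서울숲 | 망양 | 0 |  | 2022
45579 | 신길 | 병점 | 9509 |  | 2022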

Duplicate rows

Most frequently occurring

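(index) | 출발 | 도착역 | 인원 | 단위 | 연도 | # duplicates
0 | 강남 | <NA> | 0 |  | 2022 | 2
1 | 경마공원 | <NA> | 0 |  | 2022 | 2
2 | 교대(2) | <NA> | 0 |  | 2022 | 2
3 | 대모산입구 | <NA> | 0 |  | 2022 | 2
4 | 사릉 | <NA> | 0 |  | 2022 | 2
5 | 서울(4) | <NA> | 0 |  | 2022 | 2
6 | 양천구청 | <NA> | 0 |  | 2022 | 2
7 | 역삼 | <NA> | 0 |  | 2022 | 2
8 | 온수 | <NA> | 0 |  | 2022 | 2
9 | 운천 | <NA> | 0 |  | 2022 | 2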
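
Every duplicate listed here has a missing 도착역 and zero 인원, so the rows only coincide because missing values are treated as equal to each other. The sketch below lists the most frequent repeated rows in the same spirit, using `None` for a missing value so that tuple comparison treats two missing values as equal. The function name `most_frequent_duplicates` and the toy rows are assumptions for illustration.

```python
from collections import Counter

def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    repeated = [(row, count) for row, count in counts.most_common() if count > 1]
    return repeated[:top]

# Toy rows for illustration only.
rows = [
    ("강남", None, 0, "2022"),
    ("송내", "오류동", 51630, "2022"),
    ("강남", None, 0, "2022"),
    ("역삼", None, 0, "2022"),
    ("역삼", None, 0, "2022"),
]
result = most_frequent_duplicates(rows)
for row, count in result:
    print(row, count)
```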