Overview

Dataset statistics

Number of variables: 14
Number of observations: 124
Missing cells: 623
Missing cells (%): 35.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.8 KiB
Average record size in memory: 114.1 B
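
As a quick sanity check, the missing-cell percentage can be recomputed from the counts listed above. This is only a sketch of the arithmetic, assuming the percentage is missing cells divided by variables times observations; the variable names are illustrative.

```python
# Counts copied from the dataset statistics above.
n_variables = 14
n_observations = 124
missing_cells = 623

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```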

Variable types

Categorical: 1
Numeric: 1
Text: 1
Unsupported: 11

Dataset

Description: File download
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13294/F/1/datasetView.do

Alerts

역번호 is highly overall correlated with 호선 (High correlation)
호선 is highly overall correlated with 역번호 (High correlation)
역번호 has 5 (4.0%) missing values (Missing)
역 명 has 5 (4.0%) missing values (Missing)
턴스타일게이트 has 9 (7.3%) missing values (Missing)
Unnamed: 4 has 9 (7.3%) missing values (Missing)
Unnamed: 5 has 112 (90.3%) missing values (Missing)
Unnamed: 6 has 9 (7.3%) missing values (Missing)
Unnamed: 7 has 9 (7.3%) missing values (Missing)
슬림게이트 has 87 (70.2%) missing values (Missing)
Unnamed: 9 has 87 (70.2%) missing values (Missing)
Unnamed: 10 has 117 (94.4%) missing values (Missing)
Unnamed: 11 has 87 (70.2%) missing values (Missing)
Unnamed: 12 has 87 (70.2%) missing values (Missing)
턴스타일게이트 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 6 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
슬림게이트 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 10 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 11 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 12 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
스피드게이트 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
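
One plausible reason for the "Unnamed" columns and the unsupported types, judging from the Sample section, is that the source file has a two-row header: the second header row (소계, EN, REV, ..., 설치) was read as the first data row, so each gate column mixes text and numbers. The sketch below shows one way to fold that row into the column names with pandas. The helper `flatten_header` and the `group_label` naming scheme are hypothetical, and the toy frame keeps only a few of the 14 columns.

```python
import pandas as pd

# Toy frame mirroring the first two rows of the Sample section
# (only a few of the 14 columns are kept for brevity).
raw = pd.DataFrame(
    {
        "호선": [pd.NA, "1호선"],
        "역번호": [pd.NA, 150],
        "턴스타일게이트": ["소계", 46],
        "Unnamed: 4": ["EN", 10],
        "Unnamed: 6": ["REV", 30],
        "스피드게이트": ["설치", 5],
    }
)


def flatten_header(frame: pd.DataFrame) -> pd.DataFrame:
    """Merge the sub-header stored in row 0 into the column names (hypothetical helper)."""
    sub_header = frame.iloc[0]
    new_columns = []
    group = None
    for col in frame.columns:
        if not str(col).startswith("Unnamed"):
            group = col  # a named column starts a new gate group
        label = sub_header[col]
        # Columns without a sub-label (호선, 역번호) keep their own name.
        new_columns.append(group if pd.isna(label) else f"{group}_{label}")
    cleaned = frame.iloc[1:].reset_index(drop=True)
    cleaned.columns = new_columns
    return cleaned


clean = flatten_header(raw)
for col in clean.columns:
    if "_" in col:  # gate-count columns produced by the merge
        clean[col] = pd.to_numeric(clean[col], errors="coerce")

print(clean.dtypes)
```

After this step the gate columns hold plain numbers, which a profiler should be able to treat as numeric rather than unsupported.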

Reproduction

Analysis started: 2024-04-29 16:49:44.116742
Analysis finished: 2024-04-29 16:49:45.162956
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 7.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value | Count
2호선 | 50
3호선 | 33
4호선 | 26
1호선 | 10
<NA> | 1
Other values (4) | 4

Length

Max length: 6
Median length: 3
Mean length: 3.1048387
Min length: 3

Unique

Unique: 5
Unique (%): 4.0%

Sample

1st row: <NA>
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 50 | 40.3%
3호선 | 33 | 26.6%
4호선 | 26 | 21.0%
1호선 | 10 | 8.1%
<NA> | 1 | 0.8%
1호선 합계 | 1 | 0.8%
2호선 합계 | 1 | 0.8%
3호선 합계 | 1 | 0.8%
4호선 합계 | 1 | 0.8%
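
The values above show that 호선 mixes station rows with one subtotal row per line (합계, "total") and one <NA> row. A minimal sketch of separating them with pandas follows. The toy frame reuses a few rows from the Sample section, and the filtering rule (drop rows whose 호선 contains 합계 or is missing) is an assumption about what counts as a station row.

```python
import pandas as pd

# A few rows taken from the Sample section, two columns only.
df = pd.DataFrame(
    {
        "호선": [pd.NA, "1호선", "1호선", "4호선", "4호선 합계"],
        "역번호": [pd.NA, 150, 151, 434, pd.NA],
    }
)

# Subtotal rows carry "합계" in the line column; na=False treats <NA> as no match.
is_subtotal = df["호선"].str.contains("합계", na=False)

# Keep station rows only: no subtotals and no missing line label.
stations = df[~is_subtotal].dropna(subset=["호선"])

print(stations)
```

Profiling the station rows alone would keep the four subtotal rows and the <NA> row from inflating the distinct and missing counts.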

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2호선 | 51 | 39.8%
3호선 | 34 | 26.6%
4호선 | 27 | 21.1%
1호선 | 11 | 8.6%
합계 | 4 | 3.1%
na | 1 | 0.8%

역번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 119
Distinct (%): 100.0%
Missing: 5
Missing (%): 4.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 290.12605
Minimum: 150
Maximum: 434
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 150
5th percentile: 155.9
Q1: 220.5
Median: 250
Q3: 338.5
95th percentile: 428.1
Maximum: 434
Range: 284
Interquartile range (IQR): 118

Descriptive statistics

Standard deviation: 87.25227
Coefficient of variation (CV): 0.30073918
Kurtosis: -1.1541503
Mean: 290.12605
Median Absolute Deviation (MAD): 68
Skewness: 0.27063256
Sum: 34525
Variance: 7612.9586
Monotonicity: Strictly increasing
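
Several of the figures above follow directly from the others, which makes them easy to cross-check. The sketch below recomputes the range, IQR, mean, coefficient of variation, and variance from the reported values. It assumes the standard definitions (CV as standard deviation divided by mean, variance as the squared standard deviation), and the variable names are illustrative.

```python
# Values copied from the 역번호 summary above.
minimum, maximum = 150, 434
q1, q3 = 220.5, 338.5
std = 87.25227
total = 34525
count = 124 - 5  # 124 observations minus 5 missing values

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle half
mean = total / count
cv = std / mean                  # coefficient of variation
variance = std ** 2

print(value_range, iqr, round(mean, 5), round(cv, 6), round(variance, 2))
```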
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
151 | 1 | 0.8%
338 | 1 | 0.8%
337 | 1 | 0.8%
336 | 1 | 0.8%
335 | 1 | 0.8%
334 | 1 | 0.8%
333 | 1 | 0.8%
332 | 1 | 0.8%
331 | 1 | 0.8%
330 | 1 | 0.8%
Other values (109) | 109 | 87.9%
(Missing) | 5 | 4.0%

Value | Count | Frequency (%)
150 | 1 | 0.8%
151 | 1 | 0.8%
152 | 1 | 0.8%
153 | 1 | 0.8%
154 | 1 | 0.8%
155 | 1 | 0.8%
156 | 1 | 0.8%
157 | 1 | 0.8%
158 | 1 | 0.8%
159 | 1 | 0.8%

Value | Count | Frequency (%)
434 | 1 | 0.8%
433 | 1 | 0.8%
432 | 1 | 0.8%
431 | 1 | 0.8%
430 | 1 | 0.8%
429 | 1 | 0.8%
428 | 1 | 0.8%
427 | 1 | 0.8%
426 | 1 | 0.8%
425 | 1 | 0.8%

역 명
Text

MISSING 

Distinct: 118
Distinct (%): 99.2%
Missing: 5
Missing (%): 4.0%
Memory size: 1.1 KiB

Length

Max length: 9
Median length: 7
Mean length: 3.4957983
Min length: 2

Characters and Unicode

Total characters: 416
Distinct characters: 147
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 117
Unique (%): 98.3%

Sample

1st row: 서울역(1)
2nd row: 시청(1)
3rd row: 종각
4th row: 종로3가(1)
5th row: 종로5가
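Value | Count | Frequency (%)
(blank) | 3 | 2.2%
(blank) | 3 | 2.2%
동대문역사문화공원 | 2 | 1.5%
(blank) | 2 | 1.5%
(blank) | 2 | 1.5%
(blank) | 1 | 0.7%
(blank) | 1 | 0.7%
남부터미널 | 1 | 0.7%
(blank) | 1 | 0.7%
(blank) | 1 | 0.7%
Other values (119) | 119 | 87.5%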

Most occurring characters

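Value | Count | Frequency (%)
(space) | 32 | 7.7%
(blank) | 21 | 5.0%
(blank) | 14 | 3.4%
(blank) | 13 | 3.1%
(blank) | 13 | 3.1%
( | 13 | 3.1%
) | 13 | 3.1%
(blank) | 9 | 2.2%
(blank) | 8 | 1.9%
(blank) | 7 | 1.7%
Other values (137) | 273 | 65.6%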

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 339 | 81.5%
Space Separator | 32 | 7.7%
Decimal Number | 19 | 4.6%
Open Punctuation | 13 | 3.1%
Close Punctuation | 13 | 3.1%

Most frequent character per category

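Other Letter
Value | Count | Frequency (%)
(blank) | 21 | 6.2%
(blank) | 14 | 4.1%
(blank) | 13 | 3.8%
(blank) | 13 | 3.8%
(blank) | 9 | 2.7%
(blank) | 8 | 2.4%
(blank) | 7 | 2.1%
(blank) | 7 | 2.1%
(blank) | 6 | 1.8%
(blank) | 6 | 1.8%
Other values (129) | 235 | 69.3%

Decimal Number
Value | Count | Frequency (%)
3 | 6 | 31.6%
1 | 5 | 26.3%
2 | 5 | 26.3%
4 | 2 | 10.5%
5 | 1 | 5.3%

Space Separator
Value | Count | Frequency (%)
(space) | 32 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 13 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 13 | 100.0%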

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 339 | 81.5%
Common | 77 | 18.5%

Most frequent character per script

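Hangul
Value | Count | Frequency (%)
(blank) | 21 | 6.2%
(blank) | 14 | 4.1%
(blank) | 13 | 3.8%
(blank) | 13 | 3.8%
(blank) | 9 | 2.7%
(blank) | 8 | 2.4%
(blank) | 7 | 2.1%
(blank) | 7 | 2.1%
(blank) | 6 | 1.8%
(blank) | 6 | 1.8%
Other values (129) | 235 | 69.3%

Common
Value | Count | Frequency (%)
(space) | 32 | 41.6%
( | 13 | 16.9%
) | 13 | 16.9%
3 | 6 | 7.8%
1 | 5 | 6.5%
2 | 5 | 6.5%
4 | 2 | 2.6%
5 | 1 | 1.3%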

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 339 | 81.5%
ASCII | 77 | 18.5%

Most frequent character per block

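ASCII
Value | Count | Frequency (%)
(space) | 32 | 41.6%
( | 13 | 16.9%
) | 13 | 16.9%
3 | 6 | 7.8%
1 | 5 | 6.5%
2 | 5 | 6.5%
4 | 2 | 2.6%
5 | 1 | 1.3%

Hangul
Value | Count | Frequency (%)
(blank) | 21 | 6.2%
(blank) | 14 | 4.1%
(blank) | 13 | 3.8%
(blank) | 13 | 3.8%
(blank) | 9 | 2.7%
(blank) | 8 | 2.4%
(blank) | 7 | 2.1%
(blank) | 7 | 2.1%
(blank) | 6 | 1.8%
(blank) | 6 | 1.8%
Other values (129) | 235 | 69.3%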

턴스타일게이트
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 7.3%
Memory size: 1.1 KiB

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 7.3%
Memory size: 1.1 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 112
Missing (%): 90.3%
Memory size: 1.1 KiB

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 7.3%
Memory size: 1.1 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 7.3%
Memory size: 1.1 KiB

슬림게이트
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 87
Missing (%): 70.2%
Memory size: 1.1 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 87
Missing (%): 70.2%
Memory size: 1.1 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 117
Missing (%): 94.4%
Memory size: 1.1 KiB

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 87
Missing (%): 70.2%
Memory size: 1.1 KiB

Unnamed: 12
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 87
Missing (%): 70.2%
Memory size: 1.1 KiB

스피드게이트
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Interactions


Correlations

 | 호선 | 역번호
호선 | 1.000 | 1.000
역번호 | 1.000 | 1.000

 | 역번호 | 호선
역번호 | 1.000 | 0.987
호선 | 0.987 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
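
The nullity correlation described above can be approximated by correlating the missing-value indicators of each column. This is a minimal sketch using plain pandas, assuming a Pearson correlation over 0/1 missingness flags; the toy frame reuses three columns from the first five station rows of the Sample section.

```python
import pandas as pd

# First five station rows of the Sample section, three columns only.
df = pd.DataFrame(
    {
        "턴스타일게이트": [46, None, 37, 34, 31],
        "슬림게이트": [16, 64, 11, None, None],
        "Unnamed: 9": [1, 4, 2, None, None],
    }
)

# 1 where a value is missing, 0 where it is present.
nullity = df.isna().astype(int)

# Pearson correlation between the missingness indicators.
nullity_corr = nullity.corr()

print(nullity_corr.round(2))
```

Columns that are always missing together, such as 슬림게이트 and Unnamed: 9 in this toy frame, get a nullity correlation of 1.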

Sample

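# | 호선 | 역번호 | 역 명 | 턴스타일게이트 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | 슬림게이트 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | 스피드게이트
0 | <NA> | <NA> | <NA> | 소계 | EN | EX | REV | FL | 소계 | 1형 | 2형 | 3형 | 5형 | 설치
1 | 1호선 | 150 | 서울역(1) | 46 | 10 | NaN | 30 | 6 | 16 | 1 | NaN | 14 | 1 | 5
2 | 1호선 | 151 | 시청(1) | NaN | NaN | NaN | NaN | NaN | 64 | 4 | NaN | 56 | 4 | 4
3 | 1호선 | 152 | 종각 | 37 | 6 | NaN | 27 | 4 | 11 | 2 | NaN | 7 | 2 | 5
4 | 1호선 | 153 | 종로3가(1) | 34 | 5 | NaN | 24 | 5 | NaN | NaN | NaN | NaN | NaN | 4
5 | 1호선 | 154 | 종로5가 | 31 | 4 | NaN | 23 | 4 | NaN | NaN | NaN | NaN | NaN | 3
6 | 1호선 | 155 | 동대문(1) | 32 | 5 | NaN | 22 | 5 | NaN | NaN | NaN | NaN | NaN | 5
7 | 1호선 | 156 | 신설동(1) | 25 | 4 | NaN | 17 | 4 | 11 | 2 | NaN | 7 | 2 | 5
8 | 1호선 | 157 | 제기동 | 20 | 4 | NaN | 13 | 3 | NaN | NaN | NaN | NaN | NaN | 2
9 | 1호선 | 158 | 청량리 | 36 | 8 | NaN | 23 | 5 | NaN | NaN | NaN | NaN | NaN | 3

# | 호선 | 역번호 | 역 명 | 턴스타일게이트 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | 슬림게이트 | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | 스피드게이트
114 | 4호선 | 426 | 서울역 | 9 | 2 | NaN | 5 | 2 | 8 | 1 | NaN | 6 | 1 | 2
115 | 4호선 | 427 | 숙대입구 | 28 | 4 | NaN | 20 | 4 | NaN | NaN | NaN | NaN | NaN | 4
116 | 4호선 | 428 | 삼각지 | 25 | 4 | NaN | 17 | 4 | NaN | NaN | NaN | NaN | NaN | 2
117 | 4호선 | 429 | 신용산 | 23 | 4 | NaN | 15 | 4 | NaN | NaN | NaN | NaN | NaN | 4
118 | 4호선 | 430 | 이 촌 | 19 | 4 | NaN | 13 | 2 | NaN | NaN | NaN | NaN | NaN | 2
119 | 4호선 | 431 | 동 작 | 10 | 2 | NaN | 6 | 2 | NaN | NaN | NaN | NaN | NaN | 2
120 | 4호선 | 432 | 총신대 | 23 | 4 | NaN | 15 | 4 | NaN | NaN | NaN | NaN | NaN | 3
121 | 4호선 | 433 | 사당(4) | 25 | 6 | NaN | 13 | 6 | NaN | NaN | NaN | NaN | NaN | 3
122 | 4호선 | 434 | 남태령 | 7 | 2 | NaN | 4 | 1 | NaN | NaN | NaN | NaN | NaN | 1
123 | 4호선 합계 | <NA> | <NA> | 549 | 92 | 0 | 376 | 81 | 118 | 13 | 0 | 92 | 13 | 71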