Overview

Dataset statistics

Number of variables: 18
Number of observations: 170
Missing cells: 1487
Missing cells (%): 48.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.5 KiB
Average record size in memory: 147.8 B

Variable types

Categorical: 1
Numeric: 3
Text: 1
Unsupported: 13

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13294/F/1/datasetView.do

Alerts

외부 역번호 is highly overall correlated with 호선 (High correlation)
개집표기 is highly overall correlated with 호선 (High correlation)
EV is highly overall correlated with 호선 (High correlation)
호선 is highly overall correlated with 외부 역번호 and 2 other fields (High correlation)
외부 역번호 has 13 (7.6%) missing values (Missing)
역명 has 5 (2.9%) missing values (Missing)
개집표기 has 4 (2.4%) missing values (Missing)
플랩형 has 9 (5.3%) missing values (Missing)
Unnamed: 5 has 53 (31.2%) missing values (Missing)
Unnamed: 6 has 9 (5.3%) missing values (Missing)
Unnamed: 7 has 116 (68.2%) missing values (Missing)
Unnamed: 8 has 78 (45.9%) missing values (Missing)
장애인/비상 has 121 (71.2%) missing values (Missing)
Unnamed: 10 has 160 (94.1%) missing values (Missing)
Unnamed: 11 has 126 (74.1%) missing values (Missing)
Unnamed: 12 has 86 (50.6%) missing values (Missing)
Unnamed: 13 has 86 (50.6%) missing values (Missing)
개방형 has 163 (95.9%) missing values (Missing)
Unnamed: 15 has 163 (95.9%) missing values (Missing)
Unnamed: 16 has 163 (95.9%) missing values (Missing)
EV has 132 (77.6%) missing values (Missing)
플랩형 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
장애인/비상 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
개방형 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 16 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
개집표기 has 6 (3.5%) zeros (Zeros)
EV has 2 (1.2%) zeros (Zeros)
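
The per-column missing counts in the alerts can be cross-checked against the dataset-level figures (1487 missing cells, 48.6%). The sketch below only re-adds the numbers printed in this report; the dictionary and the helper name `missing_summary` are illustrative and not part of ydata-profiling.

```python
# Missing counts copied from the alerts above; 호선 has no missing values.
missing_per_column = {
    "호선": 0, "외부 역번호": 13, "역명": 5, "개집표기": 4, "플랩형": 9,
    "Unnamed: 5": 53, "Unnamed: 6": 9, "Unnamed: 7": 116, "Unnamed: 8": 78,
    "장애인/비상": 121, "Unnamed: 10": 160, "Unnamed: 11": 126,
    "Unnamed: 12": 86, "Unnamed: 13": 86, "개방형": 163,
    "Unnamed: 15": 163, "Unnamed: 16": 163, "EV": 132,
}
N_ROWS = 170   # number of observations
N_COLS = 18    # number of variables


def missing_summary(counts, n_rows, n_cols):
    """Return (total missing cells, percentage of all cells that are missing)."""
    total = sum(counts.values())
    return total, round(100 * total / (n_rows * n_cols), 1)


total_missing, pct_missing = missing_summary(missing_per_column, N_ROWS, N_COLS)
# Per-column percentage, as shown in each "Missing" alert.
per_column_pct = {
    name: round(100 * count / N_ROWS, 1)
    for name, count in missing_per_column.items()
}
print(total_missing, pct_missing)
```

The totals agree with the overview, which suggests the 48.6% figure is driven mostly by the sparsely filled `Unnamed: N`, 개방형, and EV columns.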

Reproduction

Analysis started: 2024-04-29 16:49:37.461766
Analysis finished: 2024-04-29 16:49:40.188053
Duration: 2.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
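
The reported duration follows directly from the two timestamps. A minimal check using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-04-29 16:49:37.461766")
finished = datetime.fromisoformat("2024-04-29 16:49:40.188053")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```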

Variables

호선
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.5 KiB
5호선: 53
7호선: 53
6호선: 39
8호선: 18
<NA>: 3
Other values (4): 4

Length

Max length: 6
Median length: 3
Mean length: 3.0882353
Min length: 3

Unique

Unique: 4
Unique (%): 2.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 5호선
5th row: 5호선

Common Values

Value | Count | Frequency (%)
5호선 | 53 | 31.2%
7호선 | 53 | 31.2%
6호선 | 39 | 22.9%
8호선 | 18 | 10.6%
<NA> | 3 | 1.8%
5호선 합계 | 1 | 0.6%
6호선 합계 | 1 | 0.6%
7호선 합계 | 1 | 0.6%
8호선 합계 | 1 | 0.6%
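
Four of the nine categories are per-line subtotal rows (합계 means "total"), and three rows carry the literal label `<NA>`. If the goal is to analyse stations only, those rows would probably need to be filtered out first. The sketch below is one way to do that; the helper name `is_line_label` is hypothetical, and the counts are the ones listed in the table above.

```python
def is_line_label(label):
    """Return True for plain line labels such as '5호선'.

    Rejects missing labels, the literal '<NA>' marker, and subtotal rows
    whose label ends with '합계' (total).
    """
    if label is None:
        return False
    text = label.strip()
    if text == "" or text == "<NA>":
        return False
    return not text.endswith("합계")


# Category counts copied from the Common Values table.
category_counts = {
    "5호선": 53, "7호선": 53, "6호선": 39, "8호선": 18, "<NA>": 3,
    "5호선 합계": 1, "6호선 합계": 1, "7호선 합계": 1, "8호선 합계": 1,
}

# Rows that remain after dropping subtotals and unlabeled rows.
kept_rows = sum(n for label, n in category_counts.items() if is_line_label(label))
dropped_rows = sum(category_counts.values()) - kept_rows
print(kept_rows, dropped_rows)
```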

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
5호선 | 54 | 31.0%
7호선 | 54 | 31.0%
6호선 | 40 | 23.0%
8호선 | 19 | 10.9%
합계 | 4 | 2.3%
na | 3 | 1.7%

외부 역번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 157
Distinct (%): 100.0%
Missing: 13
Missing (%): 7.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2654.242
Minimum: 2511
Maximum: 2827
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 2511
5-th percentile: 2518.8
Q1: 2550
median: 2638
Q3: 2739
95-th percentile: 2819.2
Maximum: 2827
Range: 316
Interquartile range (IQR): 189

Descriptive statistics

Standard deviation: 100.18415
Coefficient of variation (CV): 0.037744919
Kurtosis: -1.3226005
Mean: 2654.242
Median Absolute Deviation (MAD): 95
Skewness: 0.11408772
Sum: 416716
Variance: 10036.864
Monotonicity: Strictly increasing
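
Several of these figures are derived from others in the same block: the range and IQR from the quantiles, the CV from the standard deviation and mean, the variance from the standard deviation, and the mean from the sum and the non-missing count (170 rows minus 13 missing). The sketch below recomputes them from the printed values; small differences are expected because the inputs are already rounded.

```python
# Values copied from the report for 외부 역번호.
minimum, q1, q3, maximum = 2511, 2550, 2739, 2827
mean, std = 2654.242, 100.18415
total, n_non_missing = 416716, 170 - 13

value_range = maximum - minimum          # reported: 316
iqr = q3 - q1                            # reported: 189
cv = std / mean                          # reported: 0.037744919
variance = std ** 2                      # reported: 10036.864
mean_from_sum = total / n_non_missing    # reported: 2654.242

print(value_range, iqr, round(cv, 9), round(variance, 3), round(mean_from_sum, 3))
```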
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2730 | 1 | 0.6%
2723 | 1 | 0.6%
2724 | 1 | 0.6%
2725 | 1 | 0.6%
2726 | 1 | 0.6%
2727 | 1 | 0.6%
2728 | 1 | 0.6%
2729 | 1 | 0.6%
2731 | 1 | 0.6%
2721 | 1 | 0.6%
Other values (147) | 147 | 86.5%
(Missing) | 13 | 7.6%

Value | Count | Frequency (%)
2511 | 1 | 0.6%
2512 | 1 | 0.6%
2513 | 1 | 0.6%
2514 | 1 | 0.6%
2515 | 1 | 0.6%
2516 | 1 | 0.6%
2517 | 1 | 0.6%
2518 | 1 | 0.6%
2519 | 1 | 0.6%
2520 | 1 | 0.6%

Value | Count | Frequency (%)
2827 | 1 | 0.6%
2826 | 1 | 0.6%
2825 | 1 | 0.6%
2824 | 1 | 0.6%
2823 | 1 | 0.6%
2822 | 1 | 0.6%
2821 | 1 | 0.6%
2820 | 1 | 0.6%
2819 | 1 | 0.6%
2818 | 1 | 0.6%

역명
Text

MISSING 

Distinct: 165
Distinct (%): 100.0%
Missing: 5
Missing (%): 2.9%
Memory size: 1.5 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.2242424
Min length: 2

Characters and Unicode

Total characters: 532
Distinct characters: 191
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 165
Unique (%): 100.0%

Sample

1st row: 전자처(랩실)
2nd row: 도봉중정비
3rd row: 방화기지
4th row: 방화
5th row: 개화산
Value | Count | Frequency (%)
여의도 | 1 | 0.6%
청담 | 1 | 0.6%
사가정 | 1 | 0.6%
용마산 | 1 | 0.6%
중곡 | 1 | 0.6%
군자(7 | 1 | 0.6%
어린이대공원 | 1 | 0.6%
건대입구 | 1 | 0.6%
뚝섬 | 1 | 0.6%
유원지 | 1 | 0.6%
Other values (161) | 161 | 94.2%

Most occurring characters

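Value | Count | Frequency (%)
(character lost) | 16 | 3.0%
(character lost) | 14 | 2.6%
(character lost) | 13 | 2.4%
) | 12 | 2.3%
( | 12 | 2.3%
(character lost) | 12 | 2.3%
(character lost) | 11 | 2.1%
(character lost) | 11 | 2.1%
(character lost) | 10 | 1.9%
(character lost) | 9 | 1.7%
Other values (181) | 412 | 77.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 490 | 92.1%
Close Punctuation | 12 | 2.3%
Open Punctuation | 12 | 2.3%
Decimal Number | 12 | 2.3%
Control | 6 | 1.1%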

Most frequent character per category

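Other Letter
Value | Count | Frequency (%)
(character lost) | 16 | 3.3%
(character lost) | 14 | 2.9%
(character lost) | 13 | 2.7%
(character lost) | 12 | 2.4%
(character lost) | 11 | 2.2%
(character lost) | 11 | 2.2%
(character lost) | 10 | 2.0%
(character lost) | 9 | 1.8%
(character lost) | 9 | 1.8%
(character lost) | 8 | 1.6%
Other values (172) | 377 | 76.9%

Decimal Number
Value | Count | Frequency (%)
5 | 4 | 33.3%
6 | 3 | 25.0%
7 | 2 | 16.7%
8 | 1 | 8.3%
4 | 1 | 8.3%
3 | 1 | 8.3%

Close Punctuation
Value | Count | Frequency (%)
) | 12 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 12 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 6 | 100.0%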

Most occurring scripts

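Value | Count | Frequency (%)
Hangul | 490 | 92.1%
Common | 42 | 7.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(character lost) | 16 | 3.3%
(character lost) | 14 | 2.9%
(character lost) | 13 | 2.7%
(character lost) | 12 | 2.4%
(character lost) | 11 | 2.2%
(character lost) | 11 | 2.2%
(character lost) | 10 | 2.0%
(character lost) | 9 | 1.8%
(character lost) | 9 | 1.8%
(character lost) | 8 | 1.6%
Other values (172) | 377 | 76.9%

Common
Value | Count | Frequency (%)
) | 12 | 28.6%
( | 12 | 28.6%
(control character) | 6 | 14.3%
5 | 4 | 9.5%
6 | 3 | 7.1%
7 | 2 | 4.8%
8 | 1 | 2.4%
4 | 1 | 2.4%
3 | 1 | 2.4%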

Most occurring blocks

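Value | Count | Frequency (%)
Hangul | 490 | 92.1%
ASCII | 42 | 7.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(character lost) | 16 | 3.3%
(character lost) | 14 | 2.9%
(character lost) | 13 | 2.7%
(character lost) | 12 | 2.4%
(character lost) | 11 | 2.2%
(character lost) | 11 | 2.2%
(character lost) | 10 | 2.0%
(character lost) | 9 | 1.8%
(character lost) | 9 | 1.8%
(character lost) | 8 | 1.6%
Other values (172) | 377 | 76.9%

ASCII
Value | Count | Frequency (%)
) | 12 | 28.6%
( | 12 | 28.6%
(control character) | 6 | 14.3%
5 | 4 | 9.5%
6 | 3 | 7.1%
7 | 2 | 4.8%
8 | 1 | 2.4%
4 | 1 | 2.4%
3 | 1 | 2.4%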

개집표기
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 40
Distinct (%): 24.1%
Missing: 4
Missing (%): 2.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.60241
Minimum: 0
Maximum: 1058
Zeros: 6
Zeros (%): 3.5%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 11
median: 15.5
Q3: 20
95-th percentile: 37.5
Maximum: 1058
Range: 1058
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 112.20279
Coefficient of variation (CV): 3.4415489
Kurtosis: 59.756939
Mean: 32.60241
Median Absolute Deviation (MAD): 4.5
Skewness: 7.5748836
Sum: 5412
Variance: 12589.465
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Value | Count | Frequency (%)
16 | 18 | 10.6%
10 | 17 | 10.0%
13 | 14 | 8.2%
12 | 9 | 5.3%
14 | 9 | 5.3%
11 | 8 | 4.7%
8 | 7 | 4.1%
19 | 7 | 4.1%
20 | 7 | 4.1%
17 | 6 | 3.5%
Other values (30) | 64 | 37.6%

Value | Count | Frequency (%)
0 | 6 | 3.5%
6 | 1 | 0.6%
7 | 1 | 0.6%
8 | 7 | 4.1%
9 | 5 | 2.9%
10 | 17 | 10.0%
11 | 8 | 4.7%
12 | 9 | 5.3%
13 | 14 | 8.2%
14 | 9 | 5.3%

Value | Count | Frequency (%)
1058 | 1 | 0.6%
816 | 1 | 0.6%
595 | 1 | 0.6%
237 | 1 | 0.6%
58 | 1 | 0.6%
53 | 1 | 0.6%
45 | 1 | 0.6%
39 | 1 | 0.6%
38 | 1 | 0.6%
36 | 1 | 0.6%
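
The four largest values (1058, 816, 595, 237) add up to 2706, exactly half of the reported sum of 5412. That is consistent with them being the four per-line 합계 (total) rows; the sample at the end of the report does show 237 on the 8호선 합계 row, but the assignment of the other three is an assumption. Under that assumption, the sketch below shows how much the subtotal rows inflate the mean.

```python
# Figures copied from the 개집표기 statistics.
reported_sum = 5412
reported_count = 170 - 4            # observations minus missing values

# Assumption: these four extreme values are the per-line subtotal rows.
suspected_totals = [1058, 816, 595, 237]

remaining_sum = reported_sum - sum(suspected_totals)
remaining_count = reported_count - len(suspected_totals)

mean_all_rows = reported_sum / reported_count           # as reported: 32.60241
mean_without_totals = remaining_sum / remaining_count   # subtotal rows removed

print(round(mean_all_rows, 5), round(mean_without_totals, 2))
```

With the suspected subtotal rows excluded, the mean falls to roughly 16.7, close to the reported median of 15.5, so the high skewness and kurtosis above are probably an artifact of the subtotal rows rather than a property of the stations.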

플랩형
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 5.3%
Memory size: 1.5 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 53
Missing (%): 31.2%
Memory size: 1.5 KiB

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 9
Missing (%): 5.3%
Memory size: 1.5 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 116
Missing (%): 68.2%
Memory size: 1.5 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 78
Missing (%): 45.9%
Memory size: 1.5 KiB

장애인/비상
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 121
Missing (%): 71.2%
Memory size: 1.5 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 160
Missing (%): 94.1%
Memory size: 1.5 KiB

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 126
Missing (%): 74.1%
Memory size: 1.5 KiB

Unnamed: 12
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 86
Missing (%): 50.6%
Memory size: 1.5 KiB

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 86
Missing (%): 50.6%
Memory size: 1.5 KiB

개방형
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 163
Missing (%): 95.9%
Memory size: 1.5 KiB

Unnamed: 15
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 163
Missing (%): 95.9%
Memory size: 1.5 KiB

Unnamed: 16
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 163
Missing (%): 95.9%
Memory size: 1.5 KiB

EV
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 6
Distinct (%): 15.8%
Missing: 132
Missing (%): 77.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5263158
Minimum: 0
Maximum: 23
Zeros: 2
Zeros (%): 1.2%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.85
Q1: 1
median: 2
Q3: 2
95-th percentile: 8.15
Maximum: 23
Range: 23
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 3.9437727
Coefficient of variation (CV): 1.5610767
Kurtosis: 20.31887
Mean: 2.5263158
Median Absolute Deviation (MAD): 1
Skewness: 4.2276488
Sum: 96
Variance: 15.553343
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%)
2 | 16 | 9.4%
1 | 16 | 9.4%
8 | 2 | 1.2%
0 | 2 | 1.2%
23 | 1 | 0.6%
9 | 1 | 0.6%
(Missing) | 132 | 77.6%

Value | Count | Frequency (%)
0 | 2 | 1.2%
1 | 16 | 9.4%
2 | 16 | 9.4%
8 | 2 | 1.2%
9 | 1 | 0.6%
23 | 1 | 0.6%

Value | Count | Frequency (%)
23 | 1 | 0.6%
9 | 1 | 0.6%
8 | 2 | 1.2%
2 | 16 | 9.4%
1 | 16 | 9.4%
0 | 2 | 1.2%
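
Because EV has only six distinct values, its summary statistics can be rebuilt from the frequency table alone. The sketch below recomputes the count, sum, mean, and variance; dividing by n - 1 (the sample variance) is what reproduces the reported 15.553343.

```python
# Value -> count, copied from the EV frequency table (missing rows excluded).
ev_counts = {2: 16, 1: 16, 8: 2, 0: 2, 23: 1, 9: 1}
N_ROWS = 170

n = sum(ev_counts.values())                              # non-missing rows
total = sum(value * count for value, count in ev_counts.items())
mean = total / n
# Sample variance (denominator n - 1).
variance = sum(count * (value - mean) ** 2
               for value, count in ev_counts.items()) / (n - 1)
missing = N_ROWS - n

print(n, total, round(mean, 7), round(variance, 6), missing)
```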

Interactions

[Nine interaction plots; images not preserved]

Correlations

Variable | 호선 | 외부 역번호 | 개집표기 | EV
호선 | 1.000 | 1.000 | 1.000 | 1.000
외부 역번호 | 1.000 | 1.000 | NaN | NaN
개집표기 | 1.000 | NaN | 1.000 | 1.000
EV | 1.000 | NaN | 1.000 | 1.000

Variable | 외부 역번호 | 개집표기 | EV | 호선
외부 역번호 | 1.000 | 0.083 | 0.032 | 0.990
개집표기 | 0.083 | 1.000 | 0.450 | 0.991
EV | 0.032 | 0.450 | 1.000 | 0.926
호선 | 0.990 | 0.991 | 0.926 | 1.000
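
The report does not state which association measure produced these coefficients. Cramér's V is a common choice when one of the variables is categorical, as 호선 is, and it is assumed here purely for illustration. The sketch below is a plain, uncorrected version written with the standard library; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))      # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: the bin is fully determined by the line, so V is 1.
lines = ["5호선"] * 3 + ["8호선"] * 3
bins = ["25xx"] * 3 + ["28xx"] * 3
perfect = cramers_v(lines, bins)

# Toy data with no association between the two labels, so V is 0.
independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
```

A value near 1 between 호선 and 외부 역번호 is unsurprising: in the sample rows, station numbers 2511 onward belong to 5호선 and 2820 onward to 8호선, so the number largely encodes the line.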

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

호선 | 외부 역번호 | 역명 | 개집표기 | 플랩형 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | 장애인/비상 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | 개방형 | Unnamed: 15 | Unnamed: 16 | EV
0<NA><NA><NA><NA>1형2형3형4형5형1형2형5형6형7형1형2형3형<NA>
1<NA><NA>전자처(랩실)<NA>NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN<NA>
2<NA><NA>도봉중정비<NA>1NaN1NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN<NA>
35호선<NA>방화기지<NA>NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN<NA>
45호선2511방화101121111NaN11NaNNaNNaN<NA>
55호선2512개화산112NaN212NaNNaNNaN11NaNNaNNaN2
65호선2513김포공항25537NaN4NaNNaNNaN22NaNNaNNaN2
75호선2514송정11224NaN1NaNNaNNaN11NaNNaNNaN<NA>
85호선2515마곡16242NaN1NaNNaNNaN11131<NA>
95호선2516발산29757241NaN111NaNNaNNaN<NA>
호선 | 외부 역번호 | 역명 | 개집표기 | 플랩형 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | 장애인/비상 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | 개방형 | Unnamed: 15 | Unnamed: 16 | EV
1608호선2820장지19465NaN21NaN1NaNNaNNaNNaNNaN<NA>
1618호선2821복정1022211NaNNaNNaN11NaNNaNNaN<NA>
1628호선2822산성13325NaN1NaNNaNNaN11NaNNaNNaN<NA>
1638호선2823남한산성입구1333412NaNNaNNaNNaNNaNNaNNaNNaN0
1648호선2824단대오거리1532422NaNNaNNaNNaNNaNNaNNaNNaN2
1658호선2825신흥6113NaN1NaNNaNNaNNaNNaNNaNNaNNaN<NA>
1668호선2826수진9224NaN1NaNNaNNaNNaNNaNNaNNaNNaN<NA>
1678호선2827모란811211NaNNaNNaNNaNNaNNaNNaNNaN2
1688호선<NA>모란기지0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN<NA>
1698호선 합계<NA><NA>237464471132640410100009
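
Row 0 of the sample holds sub-header labels (1형, 2형, ...) rather than data, which suggests the source file has a two-row header that was read as a single row. That would explain both the `Unnamed: N` columns and the 13 "Unsupported" types, since each of those columns mixes text with numbers. One possible repair is to merge the two header rows into a single set of names, as sketched below; `flatten_header` is a hypothetical helper, and the `_` separator is an arbitrary choice.

```python
def flatten_header(top, sub):
    """Merge a two-row header into one list of column names.

    Top-level cells that are empty or start with 'Unnamed:' inherit the
    nearest named cell to their left; a non-empty sub-label is appended.
    """
    names = []
    current = None
    for top_cell, sub_cell in zip(top, sub):
        if top_cell and not top_cell.startswith("Unnamed:"):
            current = top_cell
        names.append(f"{current}_{sub_cell}" if sub_cell else current)
    return names


# Header row as it appears in the report.
top_row = [
    "호선", "외부 역번호", "역명", "개집표기",
    "플랩형", "Unnamed: 5", "Unnamed: 6", "Unnamed: 7", "Unnamed: 8",
    "장애인/비상", "Unnamed: 10", "Unnamed: 11", "Unnamed: 12", "Unnamed: 13",
    "개방형", "Unnamed: 15", "Unnamed: 16", "EV",
]
# Sub-header labels taken from row 0 of the sample (None where it shows <NA>).
sub_row = (
    [None] * 4
    + ["1형", "2형", "3형", "4형", "5형"]
    + ["1형", "2형", "5형", "6형", "7형"]
    + ["1형", "2형", "3형"]
    + [None]
)

columns = flatten_header(top_row, sub_row)
print(columns)
```

After renaming, dropping row 0 and the 합계 rows would leave purely numeric columns that a profiler should be able to type correctly.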