Overview

Dataset statistics

Number of variables: 7
Number of observations: 282
Missing cells: 3
Missing cells (%): 0.2%
Duplicate rows: 1
Duplicate rows (%): 0.4%
Total size in memory: 15.6 KiB
Average record size in memory: 56.5 B
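Both percentages follow from the raw counts. Missing cells are divided by the total number of cells (282 observations × 7 variables = 1974), and duplicate rows are divided by the number of observations. Here is a minimal sketch of that arithmetic, assuming the one-decimal rounding the report displays:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

n_vars, n_obs = 7, 282
missing_cells, duplicate_rows = 3, 1

# Missing cells are counted over every cell in the table (282 x 7 = 1974).
missing_pct = pct(missing_cells, n_obs * n_vars)
# Duplicate rows are counted over the number of observations.
duplicate_pct = pct(duplicate_rows, n_obs)

print(f"Missing cells (%): {missing_pct}%")
print(f"Duplicate rows (%): {duplicate_pct}%")
```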

Variable types

Unsupported: 1
Categorical: 5
Text: 1

Dataset

Description: 파일 다운로드 (file download)
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13293/F/1/datasetView.do

Alerts

Dataset has 1 (0.4%) duplicate row [Duplicates]
Unnamed: 1 is highly correlated overall with Unnamed: 2 and 3 other fields [High correlation]
Unnamed: 2 is highly correlated overall with Unnamed: 1 and 3 other fields [High correlation]
Unnamed: 4 is highly correlated overall with Unnamed: 1 and 3 other fields [High correlation]
Unnamed: 5 is highly correlated overall with Unnamed: 1 and 3 other fields [High correlation]
Unnamed: 6 is highly correlated overall with Unnamed: 1 and 3 other fields [High correlation]
Unnamed: 4 is highly imbalanced (74.3%) [Imbalance]
역사별 승강장안전문(PSD) 설치현황 has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-17 04:25:43.062487
Analysis finished: 2024-04-17 04:25:43.881483
Duration: 0.82 seconds
Software version: ydata-profiling v4.5.1

Variables

역사별 승강장안전문(PSD) 설치현황 ("installation status of platform screen doors (PSD) by station")
Unsupported

REJECTED, UNSUPPORTED

Missing: 1
Missing (%): 0.4%
Memory size: 2.3 KiB

Unnamed: 1
Categorical

HIGH CORRELATION 

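Distinct: 10
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
2호선 | 52
5호선 | 51
7호선 | 51
6호선 | 38
3호선 | 34
Other values (5) | 56

(호선 means "line": 2호선 is Line 2. The bare value 호선 is this column's original header cell.)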
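The top-values list shows the five most frequent categories and folds the rest into a single "Other values" bucket. This minimal sketch reproduces that list from the full Unnamed: 1 counts in this column's Common Values table. The function name `top_values` is hypothetical and not part of ydata-profiling:

```python
def top_values(counts: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent values plus an 'Other values (k)' bucket.

    Ties keep their original order because sorted() is stable,
    including with reverse=True.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append((f"Other values ({len(tail)})", sum(c for _, c in tail)))
    return head

# Full value counts of Unnamed: 1 as listed in the report.
unnamed_1 = {
    "2호선": 52, "5호선": 51, "7호선": 51, "6호선": 38, "3호선": 34,
    "4호선": 26, "8호선": 17, "1호선": 10, "<NA>": 2, "호선": 1,
}

for value, count in top_values(unnamed_1):
    print(f"{value}: {count}")
```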

Length

Max length: 4
Median length: 3
Mean length: 3.0035461
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%
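Distinct counts every different value. Unique counts only the values that occur exactly once. Both percentages are taken over the non-missing observations. The report lists Missing: 0 for this column, so the two <NA> entries count as an ordinary value here. The sketch below uses the Unnamed: 1 counts, where only the header cell 호선 occurs once. The helper name is hypothetical:

```python
from collections import Counter

def distinct_and_unique(values: list[str]) -> tuple[int, float, int, float]:
    """Return (distinct, distinct %, unique, unique %) over the given values."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, round(100 * distinct / n, 1), unique, round(100 * unique / n, 1)

# Rebuild the 282 observations of Unnamed: 1 from the report's value counts.
counts = {
    "2호선": 52, "5호선": 51, "7호선": 51, "6호선": 38, "3호선": 34,
    "4호선": 26, "8호선": 17, "1호선": 10, "<NA>": 2, "호선": 1,
}
values = [v for v, c in counts.items() for _ in range(c)]
print(distinct_and_unique(values))
```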

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
2호선 | 52 | 18.4%
5호선 | 51 | 18.1%
7호선 | 51 | 18.1%
6호선 | 38 | 13.5%
3호선 | 34 | 12.1%
4호선 | 26 | 9.2%
8호선 | 17 | 6.0%
1호선 | 10 | 3.5%
<NA> | 2 | 0.7%
호선 | 1 | 0.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Words (lowercased)
Value | Count | Frequency (%)
2호선 | 52 | 18.4%
5호선 | 51 | 18.1%
7호선 | 51 | 18.1%
6호선 | 38 | 13.5%
3호선 | 34 | 12.1%
4호선 | 26 | 9.2%
8호선 | 17 | 6.0%
1호선 | 10 | 3.5%
na | 2 | 0.7%
호선 | 1 | 0.4%

Unnamed: 2
Categorical

HIGH CORRELATION 

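Distinct: 5
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
ATO+RF+센서 방식 | 157
RF+센서 방식 | 62
센서 방식 | 60
<NA> | 2
개폐방식 | 1

(개폐방식 means "door operation method" and is this column's original header cell. 센서 방식 means "sensor method".)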

Length

Max length: 12
Median length: 12
Mean length: 9.5460993
Min length: 4
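Mean length is the count-weighted average of the values' character lengths. This sketch reproduces the maximum, mean, and minimum above from the Unnamed: 2 value counts. Python's `len()` counts code points, so each Hangul syllable, each symbol such as `+`, and the space count as one character each. The helper name is hypothetical:

```python
def length_stats(counts: dict[str, int]) -> dict[str, float]:
    """Max, mean (7 decimals), and min character length over all observations."""
    lengths = [len(v) for v, c in counts.items() for _ in range(c)]
    return {
        "max": max(lengths),
        "mean": round(sum(lengths) / len(lengths), 7),
        "min": min(lengths),
    }

# Value counts of Unnamed: 2 as listed in the report.
unnamed_2 = {"ATO+RF+센서 방식": 157, "RF+센서 방식": 62, "센서 방식": 60, "<NA>": 2, "개폐방식": 1}
print(length_stats(unnamed_2))
```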

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 개폐방식
4th row: RF+센서 방식
5th row: RF+센서 방식

Common Values

Value | Count | Frequency (%)
ATO+RF+센서 방식 | 157 | 55.7%
RF+센서 방식 | 62 | 22.0%
센서 방식 | 60 | 21.3%
<NA> | 2 | 0.7%
개폐방식 | 1 | 0.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Words (lowercased)
Value | Count | Frequency (%)
방식 | 279 | 49.7%
ato+rf+센서 | 157 | 28.0%
rf+센서 | 62 | 11.1%
센서 | 60 | 10.7%
na | 2 | 0.4%
개폐방식 | 1 | 0.2%
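The word table splits each lowercased value on whitespace, which is why 방식 appears 279 times (157 + 62 + 60). The sketch below reproduces the table under one assumption about the tokenizer: leading and trailing ASCII punctuation is stripped from each word. That assumption fits how <NA> becomes na here and how 서울시(신설역) becomes 서울시(신설역 in the Unnamed: 4 word table. It approximates what the report displays and is not ydata-profiling's source code:

```python
import string
from collections import Counter

def word_counts(counts: dict[str, int]) -> Counter:
    """Count lowercased, whitespace-separated words, weighted by value count.

    Assumption: leading/trailing ASCII punctuation is stripped from each word;
    inner characters such as '+' are kept.
    """
    words: Counter = Counter()
    for value, count in counts.items():
        for word in value.lower().split():
            word = word.strip(string.punctuation)
            if word:
                words[word] += count
    return words

unnamed_2 = {"ATO+RF+센서 방식": 157, "RF+센서 방식": 62, "센서 방식": 60, "<NA>": 2, "개폐방식": 1}
words = word_counts(unnamed_2)
total = sum(words.values())
for word, count in words.most_common():
    print(f"{word}: {count} ({100 * count / total:.1f}%)")
```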

Unnamed: 3
Text

Distinct: 256
Distinct (%): 91.4%
Missing: 2
Missing (%): 0.7%
Memory size: 2.3 KiB

Length

Max length: 9
Median length: 7
Mean length: 3.8571429
Min length: 2

Characters and Unicode

Total characters: 1080
Distinct characters: 217
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
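Category counts come from each character's Unicode general category. Python's standard `unicodedata` module exposes that property. Scripts and blocks are not available in the standard library, so this sketch covers categories only. Run over this column's five sample station names, it counts 12 Other Letter, 3 Space Separator, and 1 Decimal Number characters. The mapping from short codes to long names covers only the six categories seen in this column:

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["역사명", "서 울", "시 청", "종 각", "종로3가"]
print(category_counts(sample))
```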

Unique

Unique: 233
Unique (%): 83.2%

Sample

1st row: 역사명
2nd row: 서 울
3rd row: 시 청
4th row: 종 각
5th row: 종로3가

(역사명 means "station name" and is this column's original header cell.)
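Value | Count | Frequency (%)
(not recoverable) | 10 | 2.5%
(not recoverable) | 8 | 2.0%
(not recoverable) | 8 | 2.0%
(not recoverable) | 8 | 2.0%
(not recoverable) | 6 | 1.5%
(not recoverable) | 6 | 1.5%
(not recoverable) | 5 | 1.2%
(not recoverable) | 5 | 1.2%
(not recoverable) | 5 | 1.2%
(not recoverable) | 4 | 1.0%
Other values (246) | 343 | 84.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 260 | 24.1%
(not recoverable) | 32 | 3.0%
(not recoverable) | 28 | 2.6%
(not recoverable) | 25 | 2.3%
(not recoverable) | 25 | 2.3%
(not recoverable) | 19 | 1.8%
(not recoverable) | 15 | 1.4%
(not recoverable) | 15 | 1.4%
(not recoverable) | 15 | 1.4%
(not recoverable) | 14 | 1.3%
Other values (207) | 632 | 58.5%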

Most occurring categories

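Value | Count | Frequency (%)
Other Letter | 805 | 74.5%
Space Separator | 260 | 24.1%
Decimal Number | 8 | 0.7%
Uppercase Letter | 3 | 0.3%
Open Punctuation | 2 | 0.2%
Close Punctuation | 2 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recoverable) | 32 | 4.0%
(not recoverable) | 28 | 3.5%
(not recoverable) | 25 | 3.1%
(not recoverable) | 25 | 3.1%
(not recoverable) | 19 | 2.4%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 14 | 1.7%
(not recoverable) | 14 | 1.7%
Other values (198) | 603 | 74.9%

Decimal Number
Value | Count | Frequency (%)
3 | 5 | 62.5%
4 | 2 | 25.0%
5 | 1 | 12.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 33.3%
M | 1 | 33.3%
D | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 260 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%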

Most occurring scripts

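Value | Count | Frequency (%)
Hangul | 805 | 74.5%
Common | 272 | 25.2%
Latin | 3 | 0.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recoverable) | 32 | 4.0%
(not recoverable) | 28 | 3.5%
(not recoverable) | 25 | 3.1%
(not recoverable) | 25 | 3.1%
(not recoverable) | 19 | 2.4%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 14 | 1.7%
(not recoverable) | 14 | 1.7%
Other values (198) | 603 | 74.9%

Common
Value | Count | Frequency (%)
(space) | 260 | 95.6%
3 | 5 | 1.8%
4 | 2 | 0.7%
( | 2 | 0.7%
) | 2 | 0.7%
5 | 1 | 0.4%

Latin
Value | Count | Frequency (%)
C | 1 | 33.3%
M | 1 | 33.3%
D | 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 805 | 74.5%
ASCII | 275 | 25.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 260 | 94.5%
3 | 5 | 1.8%
4 | 2 | 0.7%
( | 2 | 0.7%
) | 2 | 0.7%
C | 1 | 0.4%
M | 1 | 0.4%
5 | 1 | 0.4%
D | 1 | 0.4%

Hangul
Value | Count | Frequency (%)
(not recoverable) | 32 | 4.0%
(not recoverable) | 28 | 3.5%
(not recoverable) | 25 | 3.1%
(not recoverable) | 25 | 3.1%
(not recoverable) | 19 | 2.4%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 15 | 1.9%
(not recoverable) | 14 | 1.7%
(not recoverable) | 14 | 1.7%
Other values (198) | 603 | 74.9%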

Unnamed: 4
Categorical

HIGH CORRELATION  IMBALANCE 

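Distinct: 5
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
자체 | 252
민자 | 24
서울시(신설역) | 3
<NA> | 2
사업방식 | 1

(사업방식 means "project type" and is this column's original header cell. 자체 means "in-house", 민자 means "private investment", and 서울시(신설역) means "Seoul Metropolitan Government (newly built station)".)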

Length

Max length: 8
Median length: 2
Mean length: 2.0851064
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 사업방식
4th row: 민자
5th row: 민자

Common Values

Value | Count | Frequency (%)
자체 | 252 | 89.4%
민자 | 24 | 8.5%
서울시(신설역) | 3 | 1.1%
<NA> | 2 | 0.7%
사업방식 | 1 | 0.4%
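The 74.3% imbalance alert for this column can be reproduced with a normalized-entropy score. The score is one minus the Shannon entropy of the category distribution, divided by its maximum possible value, log2 of the number of categories. This is a sketch that assumes ydata-profiling scores imbalance this way. With these counts it gives the reported figure:

```python
import math

def imbalance(counts: list[int]) -> float:
    """Return 1 - H / log2(k) for the category counts (k = number of categories).

    0.0 means perfectly balanced; values near 1.0 mean one category dominates.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    # Shannon entropy in bits; zero-count categories contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Unnamed: 4 counts: 자체, 민자, 서울시(신설역), <NA>, 사업방식
score = imbalance([252, 24, 3, 2, 1])
print(f"{score:.1%}")
```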

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Words (lowercased)
Value | Count | Frequency (%)
자체 | 252 | 89.4%
민자 | 24 | 8.5%
서울시(신설역 | 3 | 1.1%
na | 2 | 0.7%
사업방식 | 1 | 0.4%

Unnamed: 5
Categorical

HIGH CORRELATION 

Distinct: 45
Distinct (%): 16.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
2009.12 | 58
’09.12.29 | 39
2009.10 | 38
2008.12 | 27
2008.9 | 20
Other values (40) | 100

Length

Max length: 9
Median length: 8
Mean length: 7.7553191
Min length: 3

Unique

Unique: 17
Unique (%): 6.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 설치일
4th row: ’07.11.01
5th row: ’0712.03

(설치일 means "installation date" and is this column's original header cell.)

Common Values

Value | Count | Frequency (%)
2009.12 | 58 | 20.6%
’09.12.29 | 39 | 13.8%
2009.10 | 38 | 13.5%
2008.12 | 27 | 9.6%
2008.9 | 20 | 7.1%
’09.12.30 | 18 | 6.4%
2013.01 | 9 | 3.2%
’09.11.30 | 8 | 2.8%
2007.10 | 4 | 1.4%
’09.06.24 | 4 | 1.4%
Other values (35) | 57 | 20.2%
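Unnamed: 5 mixes at least two date formats. Some values use a full year and a month, such as 2009.12 and 2008.9. Others use an apostrophe and a two-digit year, such as ’09.12.29, including the malformed ’0712.03 in the sample. Here is a sketch of a normalizer for just those observed patterns. It assumes two-digit years belong to the 2000s. The function name is hypothetical. Values in another format, possibly among the 35 not listed here, come back as `None`:

```python
import re
from typing import Optional, Tuple

# Values such as "2009.12" or "2008.9": year and month only.
FULL_YEAR = re.compile(r"^(\d{4})\.(\d{1,2})$")
# Values such as "’09.12.29" or the malformed "’0712.03";
# accepts a typographic (’) or ASCII (') apostrophe.
SHORT_YEAR = re.compile(r"^['’](\d{2})\.?(\d{2})\.(\d{2})$")

def parse_install_date(raw: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Normalize one installation-date cell to (year, month, day or None)."""
    value = raw.strip()
    match = FULL_YEAR.match(value)
    if match:
        return int(match.group(1)), int(match.group(2)), None
    match = SHORT_YEAR.match(value)
    if match:
        yy, mm, dd = (int(g) for g in match.groups())
        return 2000 + yy, mm, dd  # assumption: two-digit years are 20xx
    return None  # header cell (설치일) or a format not covered here

for raw in ["2009.12", "2008.9", "’09.12.29", "’0712.03", "설치일"]:
    print(raw, "->", parse_install_date(raw))
```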

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
2009.12 | 58 | 20.6%
’09.12.29 | 39 | 13.8%
2009.10 | 38 | 13.5%
2008.12 | 27 | 9.6%
2008.9 | 20 | 7.1%
’09.12.30 | 18 | 6.4%
2013.01 | 9 | 3.2%
’09.11.30 | 8 | 2.8%
2007.10 | 4 | 1.4%
’09.06.24 | 4 | 1.4%
Other values (35) | 57 | 20.2%

Unnamed: 6
Categorical

HIGH CORRELATION 

Distinct: 13
Distinct (%): 4.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Value | Count
현대E/L | 68
현대E/V | 45
GS네오텍 | 40
삼중테크 | 36
삼성SDS | 27
Other values (8) | 66

Length

Max length: 6
Median length: 5
Mean length: 4.7553191
Min length: 3

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 공사업체
4th row: 현대E/L
5th row: 현대E/L

(공사업체 means "contractor" and is this column's original header cell.)

Common Values

Value | Count | Frequency (%)
현대E/L | 68 | 24.1%
현대E/V | 45 | 16.0%
GS네오텍 | 40 | 14.2%
삼중테크 | 36 | 12.8%
삼성SDS | 27 | 9.6%
도철PSD | 20 | 7.1%
포스콘 | 20 | 7.1%
피에쓰에쓰텍 | 14 | 5.0%
서윤산업 | 4 | 1.4%
DUANI | 4 | 1.4%
Other values (3) | 4 | 1.4%

Length

[Figure: Histogram of lengths of the category]

Words (lowercased)
Value | Count | Frequency (%)
현대e/l | 68 | 24.1%
현대e/v | 45 | 16.0%
gs네오텍 | 40 | 14.2%
삼중테크 | 36 | 12.8%
삼성sds | 27 | 9.6%
도철psd | 20 | 7.1%
포스콘 | 20 | 7.1%
피에쓰에쓰텍 | 14 | 5.0%
서윤산업 | 4 | 1.4%
duani | 4 | 1.4%
Other values (3) | 4 | 1.4%

Correlations

[Figure: correlation heatmap]

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
Unnamed: 1 | 1.000 | 1.000 | 0.792 | 0.941 | 0.920
Unnamed: 2 | 1.000 | 1.000 | 0.929 | 0.987 | 0.981
Unnamed: 4 | 0.792 | 0.929 | 1.000 | 0.971 | 0.908
Unnamed: 5 | 0.941 | 0.987 | 0.971 | 1.000 | 0.993
Unnamed: 6 | 0.920 | 0.981 | 0.908 | 0.993 | 1.000

[Figure: correlation heatmap]

Variable | Unnamed: 4 | Unnamed: 1 | Unnamed: 5 | Unnamed: 2 | Unnamed: 6
Unnamed: 4 | 1.000 | 0.644 | 0.756 | 0.647 | 0.625
Unnamed: 1 | 0.644 | 1.000 | 0.658 | 0.991 | 0.715
Unnamed: 5 | 0.756 | 0.658 | 1.000 | 0.808 | 0.859
Unnamed: 2 | 0.647 | 0.991 | 0.808 | 1.000 | 0.805
Unnamed: 6 | 0.625 | 0.715 | 0.859 | 0.805 | 1.000

[Figure: correlation heatmap]

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
Unnamed: 1 | 1.000 | 0.991 | 0.644 | 0.658 | 0.715
Unnamed: 2 | 0.991 | 1.000 | 0.647 | 0.808 | 0.805
Unnamed: 4 | 0.644 | 0.647 | 1.000 | 0.756 | 0.625
Unnamed: 5 | 0.658 | 0.808 | 0.756 | 1.000 | 0.859
Unnamed: 6 | 0.715 | 0.805 | 0.625 | 0.859 | 1.000
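The report text does not name the coefficient behind each matrix. Cramér's V is one common association score between two categorical columns, and this sketch shows a bias-uncorrected version of it. Treat it as a general illustration, not necessarily the measure used above. The toy inputs mirror the sample rows, where every 1호선 station uses RF+센서 방식 and every 8호선 station uses ATO+RF+센서 방식. That is a perfect association, so V is 1.0:

```python
import math
from collections import Counter

def cramers_v(xs: list[str], ys: list[str]) -> float:
    """Bias-uncorrected Cramér's V between two equally long categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    # Pearson chi-square over every cell of the contingency table, empty cells included.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * (k - 1)))

# Toy data shaped like the sample rows (hypothetical, not the full dataset).
lines = ["1호선", "1호선", "8호선", "8호선"]
methods = ["RF+센서 방식", "RF+센서 방식", "ATO+RF+센서 방식", "ATO+RF+센서 방식"]
print(cramers_v(lines, methods))
```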

Missing values

[Figure: nullity count by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
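Nullity counts register only cells that are actually missing. The column summaries report Missing: 0 for Unnamed: 1 while also listing <NA> among its values. Whether such a token counts as missing therefore depends on how it is represented. This sketch contrasts the two counts. It assumes missing cells are `None` and the token is the literal string "<NA>". The rows are hypothetical toy data shaped like rows 0 to 3 of the sample:

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]

def missing_counts(rows: List[Row], columns: Iterable[str], tokens: tuple = ()) -> Dict[str, int]:
    """Count cells per column that are None, or equal to one of `tokens`."""
    return {
        col: sum(1 for row in rows if row.get(col) is None or row.get(col) in tokens)
        for col in columns
    }

rows = [
    {"Unnamed: 1": "<NA>", "Unnamed: 3": None},
    {"Unnamed: 1": "<NA>", "Unnamed: 3": None},
    {"Unnamed: 1": "호선", "Unnamed: 3": "역사명"},
    {"Unnamed: 1": "1호선", "Unnamed: 3": "서 울"},
]
cols = ["Unnamed: 1", "Unnamed: 3"]
print(missing_counts(rows, cols))                     # None only
print(missing_counts(rows, cols, tokens=("<NA>",)))   # None or the "<NA>" token
```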

Sample

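First rows

# | 역사별 승강장안전문(PSD) 설치현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
0 | NaN | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | □ 승강장안전문관리단 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
2 | 연번 | 호선 | 개폐방식 | 역사명 | 사업방식 | 설치일 | 공사업체
3 | 1 | 1호선 | RF+센서 방식 | 서 울 | 민자 | ’07.11.01 | 현대E/L
4 | 2 | 1호선 | RF+센서 방식 | 시 청 | 민자 | ’0712.03 | 현대E/L
5 | 3 | 1호선 | RF+센서 방식 | 종 각 | 자체 | ’09.12.29 | 현대E/L
6 | 4 | 1호선 | RF+센서 방식 | 종로3가 | 민자 | ’08.01.03 | 현대E/L
7 | 5 | 1호선 | RF+센서 방식 | 종로5가 | 자체 | ’09.12.29 | 현대E/L
8 | 6 | 1호선 | RF+센서 방식 | 동대문 | 자체 | ’08.06.18 | 서윤산업
9 | 7 | 1호선 | RF+센서 방식 | 동 묘 | 자체 | ’06.01.10 | 현대E/L

Last rows

# | 역사별 승강장안전문(PSD) 설치현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6
272 | 270 | 8호선 | ATO+RF+센서 방식 | 가락시장 | 자체 | 2009.12 | 도철PSD
273 | 271 | 8호선 | ATO+RF+센서 방식 | 문정 | 자체 | 2009.12 | 도철PSD
274 | 272 | 8호선 | ATO+RF+센서 방식 | 장지 | 자체 | 2009.12 | 도철PSD
275 | 273 | 8호선 | ATO+RF+센서 방식 | 복정 | 자체 | 2009.12 | 도철PSD
276 | 274 | 8호선 | ATO+RF+센서 방식 | 산성 | 자체 | 2009.12 | 도철PSD
277 | 275 | 8호선 | ATO+RF+센서 방식 | 남한산성입구 | 자체 | 2009.12 | 도철PSD
278 | 276 | 8호선 | ATO+RF+센서 방식 | 단대오거리 | 자체 | 2009.12 | 도철PSD
279 | 277 | 8호선 | ATO+RF+센서 방식 | 신흥 | 자체 | 2009.12 | 도철PSD
280 | 278 | 8호선 | ATO+RF+센서 방식 | 수진 | 자체 | 2009.12 | 도철PSD
281 | 279 | 8호선 | ATO+RF+센서 방식 | 모란 | 자체 | 2009.12 | 도철PSD

(Row 1 reads "□ 승강장안전문관리단", "□ Platform Screen Door Management Office". Row 2 holds the file's original Korean column headers: 연번 = serial number, 호선 = line, 개폐방식 = door operation method, 역사명 = station name, 사업방식 = project type, 설치일 = installation date, 공사업체 = contractor.)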
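The sample rows show that the real column headers (연번, 호선, 개폐방식, 역사명, 사업방식, 설치일, 공사업체) sit in row 2, under a blank row and a section label. The title in the first line became the header, and pandas names blank header cells `Unnamed: n`, which explains the column names in this report. It also explains why each Korean header cell shows up as a one-off value in its column. The stdlib sketch below re-reads such a file with the proper header. It assumes the source is available as CSV text. The helper name and the inline data are hypothetical, and the data rows are copied from the sample:

```python
import csv
import io

def read_with_header(text: str, first_header_cell: str = "연번") -> list[dict[str, str]]:
    """Skip preamble rows until the row starting with `first_header_cell`,
    use it as the header, and return the non-blank rows after it as dicts."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        if row and row[0].strip() == first_header_cell:
            header = [cell.strip() for cell in row]
            return [
                dict(zip(header, r))
                for r in rows[i + 1:]
                if any(cell.strip() for cell in r)
            ]
    raise ValueError(f"no header row starting with {first_header_cell!r}")

# Hypothetical CSV text shaped like the start of this file: a title row,
# a blank row, a section label, the real header, then two data rows.
SAMPLE_CSV = (
    "역사별 승강장안전문(PSD) 설치현황,,,,,,\n"
    ",,,,,,\n"
    "□ 승강장안전문관리단,,,,,,\n"
    "연번,호선,개폐방식,역사명,사업방식,설치일,공사업체\n"
    "1,1호선,RF+센서 방식,서 울,민자,’07.11.01,현대E/L\n"
    "2,1호선,RF+센서 방식,시 청,민자,’0712.03,현대E/L\n"
)

records = read_with_header(SAMPLE_CSV)
for rec in records:
    print(rec["호선"], rec["역사명"], rec["설치일"])
```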

Duplicate rows

Most frequently occurring

# | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 2
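The duplicate check appears to leave out the unsupported first column. Its table lists only Unnamed: 1 to Unnamed: 6. Rows 0 and 1 of the sample differ only in that first column (blank vs. □ 승강장안전문관리단) and are otherwise all <NA>. That gives one group of 2 and a single duplicate row beyond the first occurrence. This sketch counts duplicates over a chosen subset of columns. The helper name and toy rows are hypothetical:

```python
from collections import Counter

def duplicate_groups(rows, columns):
    """Return {values tuple: occurrences} for combinations seen more than once."""
    seen = Counter(tuple(row.get(c) for c in columns) for row in rows)
    return {key: n for key, n in seen.items() if n > 1}

FIRST = "역사별 승강장안전문(PSD) 설치현황"
COLS = [f"Unnamed: {i}" for i in range(1, 7)]
HEADER_CELLS = ["호선", "개폐방식", "역사명", "사업방식", "설치일", "공사업체"]

# Toy copies of rows 0 to 2 from the sample.
rows = [
    {FIRST: None, **{c: "<NA>" for c in COLS}},
    {FIRST: "□ 승강장안전문관리단", **{c: "<NA>" for c in COLS}},
    {FIRST: "연번", **dict(zip(COLS, HEADER_CELLS))},
]

groups = duplicate_groups(rows, COLS)
extra_rows = sum(n - 1 for n in groups.values())  # rows beyond each first occurrence
print(groups)
print(extra_rows)
```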