Overview

Dataset statistics

Number of variables: 4
Number of observations: 268
Missing cells: 401
Missing cells (%): 37.4%
Duplicate rows: 16
Duplicate rows (%): 6.0%
Total size in memory: 8.5 KiB
Average record size in memory: 32.5 B
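
The percentages in this table can be checked from the raw counts. The sketch below is illustrative only: the variable names are not part of the report, it assumes the missing-cell percentage is taken over variables × observations, and it treats 8.5 KiB as a rounded figure.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 268
missing_cells = 401
duplicate_rows = 16
total_size_kib = 8.5  # rounded value as reported

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations            # 4 * 268 = 1072
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the computed values round to the figures reported in the table.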

Variable types

Text: 3
Categorical: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13116/S/1/datasetView.do

Alerts

Duplicates: Dataset has 16 (6.0%) duplicate rows
Missing: 휠체어경사로 설치현황 has 244 (91.0%) missing values
Missing: Unnamed: 1 has 132 (49.3%) missing values
Missing: Unnamed: 3 has 25 (9.3%) missing values

Reproduction

Analysis started: 2024-04-29 16:41:48.310961
Analysis finished: 2024-04-29 16:41:49.795814
Duration: 1.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

휠체어경사로 설치현황
Text

MISSING

Distinct: 17
Distinct (%): 70.8%
Missing: 244
Missing (%): 91.0%
Memory size: 2.2 KiB

Length

Max length: 8
Median length: 7.5
Mean length: 4.625
Min length: 2

Characters and Unicode

Total characters: 111
Distinct characters: 24
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
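
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample rows listed for this variable, so its counts cover a subset of the column, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category (e.g. 'Lo', 'Nd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows reported for this variable (not the full column).
sample = ["(1~4호선)", "호선", "총 120역", "1호선(10역)", "2호선(50역)"]
counts = category_counts(sample)

# Lo = Other Letter, Nd = Decimal Number, Ps/Pe = Open/Close Punctuation,
# Zs = Space Separator, Sm = Math Symbol.
for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under Other Letter (`Lo`), which is why that category dominates the tables for this variable.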

Unique

Unique: 12
Unique (%): 50.0%
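
The difference between "Distinct" and "Unique" is easy to miss. The sketch below assumes that distinct counts the different non-missing values, that unique counts the values occurring exactly once, and that both percentages are relative to the non-missing rows. These assumptions are consistent with the figures reported here (268 observations, 244 missing), but the function name and the toy list are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique, distinct %, unique %) over non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)                                 # different values
    unique = sum(1 for c in counts.values() if c == 1)     # values seen once
    n = len(present)
    return distinct, unique, 100 * distinct / n, 100 * unique / n


# Toy column (hypothetical data): "1호선" repeats, two values are missing.
toy = ["1호선", "1호선", "2호선", "3호선", None, None]
result = distinct_and_unique(toy)
print(result)

# Check the reported percentages for this variable.
present_rows = 268 - 244            # non-missing rows
distinct_pct = 100 * 17 / present_rows
unique_pct = 100 * 12 / present_rows
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```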

Sample

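1st row: (1~4호선) ("Lines 1~4")
2nd row: 호선 ("Line")
3rd row: 총 120역 ("120 stations in total")
4th row: 1호선(10역) ("Line 1 (10 stations)")
5th row: 2호선(50역) ("Line 2 (50 stations)")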
Value | Count | Frequency (%)
(not preserved) | 4 | 13.3%
(not preserved) | 4 | 13.3%
4호선 | 3 | 10.0%
1호선 | 3 | 10.0%
3호선 | 3 | 10.0%
2호선 | 3 | 10.0%
4호선(26역 | 1 | 3.3%
(not preserved) | 1 | 3.3%
120역 | 1 | 3.3%
1호선(10역 | 1 | 3.3%
Other values (6) | 6 | 20.0%

Most occurring characters

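Value | Count | Frequency (%)
(not preserved) | 19 | 17.1%
(not preserved) | 19 | 17.1%
1 | 7 | 6.3%
4 | 6 | 5.4%
2 | 6 | 5.4%
(space) | 6 | 5.4%
) | 5 | 4.5%
(not preserved) | 5 | 4.5%
3 | 5 | 4.5%
( | 5 | 4.5%
Other values (14) | 28 | 25.2%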

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 57 | 51.4%
Decimal Number | 29 | 26.1%
Close Punctuation | 9 | 8.1%
Open Punctuation | 9 | 8.1%
Space Separator | 6 | 5.4%
Math Symbol | 1 | 0.9%

Most frequent character per category

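Other Letter
Value | Count | Frequency (%)
(not preserved) | 19 | 33.3%
(not preserved) | 19 | 33.3%
(not preserved) | 5 | 8.8%
(not preserved) | 4 | 7.0%
(not preserved) | 4 | 7.0%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%

Decimal Number
Value | Count | Frequency (%)
1 | 7 | 24.1%
4 | 6 | 20.7%
2 | 6 | 20.7%
3 | 5 | 17.2%
0 | 3 | 10.3%
6 | 1 | 3.4%
5 | 1 | 3.4%

Close Punctuation
Value | Count | Frequency (%)
) | 5 | 55.6%
] | 4 | 44.4%

Open Punctuation
Value | Count | Frequency (%)
( | 5 | 55.6%
[ | 4 | 44.4%

Space Separator
Value | Count | Frequency (%)
(space) | 6 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 1 | 100.0%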

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 57 | 51.4%
Common | 54 | 48.6%

Most frequent character per script

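Common
Value | Count | Frequency (%)
1 | 7 | 13.0%
4 | 6 | 11.1%
2 | 6 | 11.1%
(space) | 6 | 11.1%
) | 5 | 9.3%
3 | 5 | 9.3%
( | 5 | 9.3%
[ | 4 | 7.4%
] | 4 | 7.4%
0 | 3 | 5.6%
Other values (3) | 3 | 5.6%

Hangul
Value | Count | Frequency (%)
(not preserved) | 19 | 33.3%
(not preserved) | 19 | 33.3%
(not preserved) | 5 | 8.8%
(not preserved) | 4 | 7.0%
(not preserved) | 4 | 7.0%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%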

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 57 | 51.4%
ASCII | 54 | 48.6%

Most frequent character per block

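Hangul
Value | Count | Frequency (%)
(not preserved) | 19 | 33.3%
(not preserved) | 19 | 33.3%
(not preserved) | 5 | 8.8%
(not preserved) | 4 | 7.0%
(not preserved) | 4 | 7.0%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%
(not preserved) | 1 | 1.8%

ASCII
Value | Count | Frequency (%)
1 | 7 | 13.0%
4 | 6 | 11.1%
2 | 6 | 11.1%
(space) | 6 | 11.1%
) | 5 | 9.3%
3 | 5 | 9.3%
( | 5 | 9.3%
[ | 4 | 7.4%
] | 4 | 7.4%
0 | 3 | 5.6%
Other values (3) | 3 | 5.6%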

Unnamed: 1
Text

MISSING 

Distinct: 118
Distinct (%): 86.8%
Missing: 132
Missing (%): 49.3%
Memory size: 2.2 KiB

Length

Max length: 7
Median length: 6
Mean length: 2.9044118
Min length: 2

Characters and Unicode

Total characters: 395
Distinct characters: 148
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 106
Unique (%): 77.9%

Sample

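1st row: 설치역수 ("number of stations with installations")
2nd row: 107역 ("107 stations")
3rd row: 10역 ("10 stations")
4th row: 49역 ("49 stations")
5th row: 29역 ("29 stations")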
Value | Count | Frequency (%)
동대문 | 4 | 2.9%
(not preserved) | 4 | 2.9%
역명 | 4 | 2.9%
(not preserved) | 4 | 2.9%
역사문화공원 | 2 | 1.4%
시청 | 2 | 1.4%
충무로 | 2 | 1.4%
신설동 | 2 | 1.4%
을지로3가 | 2 | 1.4%
사당 | 2 | 1.4%
Other values (109) | 112 | 80.0%

Most occurring characters

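Value | Count | Frequency (%)
(not preserved) | 21 | 5.3%
(not preserved) | 15 | 3.8%
(not preserved) | 15 | 3.8%
(not preserved) | 15 | 3.8%
(not preserved) | 13 | 3.3%
(not preserved) | 11 | 2.8%
(not preserved) | 9 | 2.3%
(not preserved) | 9 | 2.3%
(not preserved) | 8 | 2.0%
(not preserved) | 7 | 1.8%
Other values (138) | 272 | 68.9%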

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 374 | 94.7%
Decimal Number | 17 | 4.3%
Space Separator | 4 | 1.0%

Most frequent character per category

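Other Letter
Value | Count | Frequency (%)
(not preserved) | 21 | 5.6%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 13 | 3.5%
(not preserved) | 11 | 2.9%
(not preserved) | 9 | 2.4%
(not preserved) | 9 | 2.4%
(not preserved) | 8 | 2.1%
(not preserved) | 7 | 1.9%
Other values (129) | 251 | 67.1%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 23.5%
9 | 3 | 17.6%
1 | 3 | 17.6%
0 | 2 | 11.8%
4 | 2 | 11.8%
7 | 1 | 5.9%
2 | 1 | 5.9%
5 | 1 | 5.9%

Space Separator
Value | Count | Frequency (%)
(space) | 4 | 100.0%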

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 374 | 94.7%
Common | 21 | 5.3%

Most frequent character per script

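Hangul
Value | Count | Frequency (%)
(not preserved) | 21 | 5.6%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 13 | 3.5%
(not preserved) | 11 | 2.9%
(not preserved) | 9 | 2.4%
(not preserved) | 9 | 2.4%
(not preserved) | 8 | 2.1%
(not preserved) | 7 | 1.9%
Other values (129) | 251 | 67.1%

Common
Value | Count | Frequency (%)
(space) | 4 | 19.0%
3 | 4 | 19.0%
9 | 3 | 14.3%
1 | 3 | 14.3%
0 | 2 | 9.5%
4 | 2 | 9.5%
7 | 1 | 4.8%
2 | 1 | 4.8%
5 | 1 | 4.8%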

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 374 | 94.7%
ASCII | 21 | 5.3%

Most frequent character per block

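Hangul
Value | Count | Frequency (%)
(not preserved) | 21 | 5.6%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 15 | 4.0%
(not preserved) | 13 | 3.5%
(not preserved) | 11 | 2.9%
(not preserved) | 9 | 2.4%
(not preserved) | 9 | 2.4%
(not preserved) | 8 | 2.1%
(not preserved) | 7 | 1.9%
Other values (129) | 251 | 67.1%

ASCII
Value | Count | Frequency (%)
(space) | 4 | 19.0%
3 | 4 | 19.0%
9 | 3 | 14.3%
1 | 3 | 14.3%
0 | 2 | 9.5%
4 | 2 | 9.5%
7 | 1 | 4.8%
2 | 1 | 4.8%
5 | 1 | 4.8%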

Unnamed: 2
Categorical

Distinct: 18
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.2 KiB

Category counts (bar chart): <NA>: 134, 1: 39, 2: 36, 3: 19, -: 11, Other values (13): 29

Length

Max length: 4
Median length: 4
Mean length: 2.6044776
Min length: 1

Unique

Unique: 10
Unique (%): 3.7%

Sample

1st row: <NA>
2nd row: 설치현황 ("installation status")
3rd row: 233대 ("233 units")
4th row: 28대 ("28 units")
5th row: 120대 ("120 units")

Common Values

Value | Count | Frequency (%)
<NA> | 134 | 50.0%
1 | 39 | 14.6%
2 | 36 | 13.4%
3 | 19 | 7.1%
- | 11 | 4.1%
4 | 10 | 3.7%
5 | 5 | 1.9%
개 소 | 4 | 1.5%
233대 | 1 | 0.4%
44 | 1 | 0.4%
Other values (8) | 8 | 3.0%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
na | 134 | 49.3%
1 | 39 | 14.3%
2 | 36 | 13.2%
3 | 19 | 7.0%
(not preserved) | 11 | 4.0%
4 | 10 | 3.7%
5 | 5 | 1.8%
(not preserved) | 4 | 1.5%
(not preserved) | 4 | 1.5%
설치현황 | 1 | 0.4%
Other values (9) | 9 | 3.3%

Unnamed: 3
Text

MISSING 

Distinct: 105
Distinct (%): 43.2%
Missing: 25
Missing (%): 9.3%
Memory size: 2.2 KiB

Length

Max length: 26
Median length: 25
Mean length: 15.444444
Min length: 1

Characters and Unicode

Total characters: 3753
Distinct characters: 92
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 80
Unique (%): 32.9%

Sample

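1st row: 설치위치 ("installation location")
2nd row: - 4번 외부출구 앞 ("in front of exit 4")
3rd row: - 2번 외부출구 측 E/V 앞 ("in front of the elevator by exit 2")
4th row: - 4번 외부출구 측 E/V 앞 ("in front of the elevator by exit 4")
5th row: 대합실 지하1층 화장실 앞 ("in front of the restroom, concourse level B1")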
Value | Count | Frequency (%)
(not preserved) | 189 | 15.6%
(not preserved) | 189 | 15.6%
외부출구 | 149 | 12.3%
(not preserved) | 120 | 9.9%
e/v | 119 | 9.8%
대합실 | 65 | 5.4%
지하1층 | 40 | 3.3%
화장실 | 35 | 2.9%
1번 | 33 | 2.7%
2번 | 21 | 1.7%
Other values (90) | 249 | 20.6%

Most occurring characters

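Value | Count | Frequency (%)
(space) | 966 | 25.7%
(not preserved) | 201 | 5.4%
- | 193 | 5.1%
(not preserved) | 165 | 4.4%
(not preserved) | 163 | 4.3%
(not preserved) | 158 | 4.2%
(not preserved) | 155 | 4.1%
(not preserved) | 152 | 4.1%
(not preserved) | 140 | 3.7%
E | 126 | 3.4%
Other values (82) | 1334 | 35.5%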

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1899 | 50.6%
Space Separator | 966 | 25.7%
Decimal Number | 274 | 7.3%
Uppercase Letter | 253 | 6.7%
Dash Punctuation | 193 | 5.1%
Other Punctuation | 145 | 3.9%
Close Punctuation | 11 | 0.3%
Open Punctuation | 11 | 0.3%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 201 | 10.6%
(not preserved) | 165 | 8.7%
(not preserved) | 163 | 8.6%
(not preserved) | 158 | 8.3%
(not preserved) | 155 | 8.2%
(not preserved) | 152 | 8.0%
(not preserved) | 140 | 7.4%
(not preserved) | 101 | 5.3%
(not preserved) | 67 | 3.5%
(not preserved) | 66 | 3.5%
Other values (60) | 531 | 28.0%

Decimal Number
Value | Count | Frequency (%)
1 | 104 | 38.0%
2 | 45 | 16.4%
3 | 30 | 10.9%
4 | 24 | 8.8%
6 | 20 | 7.3%
5 | 18 | 6.6%
7 | 13 | 4.7%
8 | 11 | 4.0%
0 | 5 | 1.8%
9 | 4 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
/ | 125 | 86.2%
, | 13 | 9.0%
. | 4 | 2.8%
# | 3 | 2.1%

Uppercase Letter
Value | Count | Frequency (%)
E | 126 | 49.8%
V | 125 | 49.4%
F | 2 | 0.8%

Space Separator
Value | Count | Frequency (%)
(space) | 966 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 193 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 11 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 11 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1899 | 50.6%
Common | 1601 | 42.7%
Latin | 253 | 6.7%

Most frequent character per script

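Hangul
Value | Count | Frequency (%)
(not preserved) | 201 | 10.6%
(not preserved) | 165 | 8.7%
(not preserved) | 163 | 8.6%
(not preserved) | 158 | 8.3%
(not preserved) | 155 | 8.2%
(not preserved) | 152 | 8.0%
(not preserved) | 140 | 7.4%
(not preserved) | 101 | 5.3%
(not preserved) | 67 | 3.5%
(not preserved) | 66 | 3.5%
Other values (60) | 531 | 28.0%

Common
Value | Count | Frequency (%)
(space) | 966 | 60.3%
- | 193 | 12.1%
/ | 125 | 7.8%
1 | 104 | 6.5%
2 | 45 | 2.8%
3 | 30 | 1.9%
4 | 24 | 1.5%
6 | 20 | 1.2%
5 | 18 | 1.1%
7 | 13 | 0.8%
Other values (9) | 63 | 3.9%

Latin
Value | Count | Frequency (%)
E | 126 | 49.8%
V | 125 | 49.4%
F | 2 | 0.8%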

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1899 | 50.6%
ASCII | 1854 | 49.4%

Most frequent character per block

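ASCII
Value | Count | Frequency (%)
(space) | 966 | 52.1%
- | 193 | 10.4%
E | 126 | 6.8%
V | 125 | 6.7%
/ | 125 | 6.7%
1 | 104 | 5.6%
2 | 45 | 2.4%
3 | 30 | 1.6%
4 | 24 | 1.3%
6 | 20 | 1.1%
Other values (12) | 96 | 5.2%

Hangul
Value | Count | Frequency (%)
(not preserved) | 201 | 10.6%
(not preserved) | 165 | 8.7%
(not preserved) | 163 | 8.6%
(not preserved) | 158 | 8.3%
(not preserved) | 155 | 8.2%
(not preserved) | 152 | 8.0%
(not preserved) | 140 | 7.4%
(not preserved) | 101 | 5.3%
(not preserved) | 67 | 3.5%
(not preserved) | 66 | 3.5%
Other values (60) | 531 | 28.0%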

Correlations

Variable | 휠체어경사로 설치현황 | Unnamed: 2
휠체어경사로 설치현황 | 1.000 | 1.000
Unnamed: 2 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
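
To make nullity correlation concrete, the sketch below computes a Pearson correlation between presence indicators (1 = value present, 0 = missing) for three columns. It uses only the last ten rows listed in the Sample section, so the results need not match the heatmap for the full dataset, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def presence(column):
    """1 where a value is present, 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in column]


# Rows 258-267 of the sample; None stands for <NA>.
unnamed_1 = ["신용산", "이촌", None, "동작", "총신대입구",
             None, "사당", "남태령", None, "소 계"]
unnamed_2 = ["1", "2", None, "1", "2", None, "1", "2", None, "41"]
unnamed_3 = ["x"] * 9 + [None]  # present in rows 258-266, missing in row 267

r_12 = pearson(presence(unnamed_1), presence(unnamed_2))
r_13 = pearson(presence(unnamed_1), presence(unnamed_3))
print(f"Unnamed: 1 vs Unnamed: 2: {r_12:.3f}")
print(f"Unnamed: 1 vs Unnamed: 3: {r_13:.3f}")
```

In these ten rows `Unnamed: 1` and `Unnamed: 2` are missing in exactly the same places, so their nullity correlation is 1, while `Unnamed: 3` follows a different pattern.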

Sample

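Index | 휠체어경사로 설치현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
0 | (1~4호선) | <NA> | <NA> | <NA>
1 | 호선 | 설치역수 | 설치현황 | <NA>
2 | 총 120역 | 107역 | 233대 | <NA>
3 | 1호선(10역) | 10역 | 28대 | <NA>
4 | 2호선(50역) | 49역 | 120대 | <NA>
5 | 3호선(34역) | 29역 | 44대 | <NA>
6 | 4호선(26역) | 19역 | 41대 | <NA>
7 | <NA> | <NA> | <NA> | <NA>
8 | 호선별 설치현황 | <NA> | <NA> | <NA>
9 | [1호선] | <NA> | <NA> | <NA>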

Index | 휠체어경사로 설치현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3
258 | <NA> | 신용산 | 1 | - 2번 외부출구 측 장애인 리프트 앞
259 | <NA> | 이촌 | 2 | - 2번 외부출구 측 E/V 앞
260 | <NA> | <NA> | <NA> | 대합실 지하1층 화장실 앞
261 | <NA> | 동작 | 1 | - 1번 외부출구 앞
262 | <NA> | 총신대입구 | 2 | - 1번 외부출구 측 E/V 앞
263 | <NA> | <NA> | <NA> | - 14번 외부출구 측 E/V 앞
264 | <NA> | 사당 | 1 | - 10번 외부출구 측 E/V 앞
265 | <NA> | 남태령 | 2 | - 2번 외부출구 측 E/V 앞
266 | <NA> | <NA> | <NA> | 대합실 지하2층 화장실 앞
267 | <NA> | 소 계 | 41 | <NA>

Duplicate rows

Most frequently occurring

Index | 휠체어경사로 설치현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | # duplicates
13 | <NA> | <NA> | <NA> | 대합실 지하1층 화장실 앞 | 14
7 | <NA> | <NA> | <NA> | - 4번 외부출구 측 E/V 앞 | 6
0 | 구 분 | 역명 | 개 소 | 설치위치 | 4
5 | <NA> | <NA> | <NA> | - 3번 외부출구 측 E/V 앞 | 4
8 | <NA> | <NA> | <NA> | - 5번 외부출구 측 E/V 앞 | 4
15 | <NA> | <NA> | <NA> | <NA> | 4
2 | <NA> | <NA> | <NA> | - 1번 외부출구 측 E/V 앞 | 3
4 | <NA> | <NA> | <NA> | - 2번 외부출구 측 E/V 앞 | 3
9 | <NA> | <NA> | <NA> | - 7번 외부출구 측 E/V 앞 | 3
11 | <NA> | <NA> | <NA> | - 대합실 지상 1층 화장실 앞 | 3
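
A table like this one can be produced by grouping identical rows and keeping the groups that occur more than once. The following is a minimal sketch of that idea, not the profiler's own implementation; the six rows are a hypothetical toy input built from values that appear in this report, and `duplicate_groups` is an illustrative name.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: -item[1])


# Hypothetical toy input; None stands for <NA>.
rows = [
    (None, None, None, "대합실 지하1층 화장실 앞"),
    (None, "동작", "1", "- 1번 외부출구 앞"),
    (None, None, None, "대합실 지하1층 화장실 앞"),
    (None, None, None, "- 4번 외부출구 측 E/V 앞"),
    (None, None, None, "- 4번 외부출구 측 E/V 앞"),
    (None, None, None, "대합실 지하1층 화장실 앞"),
]

groups = duplicate_groups(rows)
for row, n in groups:
    print(n, row)
```

Rows are stored as tuples so they are hashable and can be counted directly; a row that appears only once, such as the 동작 entry, is left out of the result.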