Overview

Dataset statistics

Number of variables: 11
Number of observations: 291
Missing cells: 264
Missing cells (%): 8.2%
Duplicate rows: 1
Duplicate rows (%): 0.3%
Total size in memory: 25.1 KiB
Average record size in memory: 88.5 B
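
The percentages in these statistics can be checked from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all observations. The helper name `percent` is illustrative and not part of ydata-profiling.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

n_variables = 11
n_observations = 291
missing_cells = 264
duplicate_rows = 1

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = percent(missing_cells, total_cells)
duplicate_pct = percent(duplicate_rows, n_observations)

print(missing_pct, duplicate_pct)
```

Under these assumptions both values agree with the figures reported above.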

Variable types

Categorical: 1
Unsupported: 9
Text: 1

Dataset

Description: File download
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13208/F/1/datasetView.do

Alerts

Dataset has 1 (0.3%) duplicate rows [Duplicates]
Unnamed: 1 has 10 (3.4%) missing values [Missing]
Unnamed: 2 has 10 (3.4%) missing values [Missing]
Unnamed: 3 has 4 (1.4%) missing values [Missing]
Unnamed: 4 has 4 (1.4%) missing values [Missing]
Unnamed: 5 has 4 (1.4%) missing values [Missing]
Unnamed: 6 has 165 (56.7%) missing values [Missing]
Unnamed: 7 has 4 (1.4%) missing values [Missing]
Unnamed: 8 has 54 (18.6%) missing values [Missing]
Unnamed: 9 has 5 (1.7%) missing values [Missing]
Unnamed: 10 has 4 (1.4%) missing values [Missing]
Unnamed: 1 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 3 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
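
The `Unnamed: N` column names and the unsupported-type alerts are consistent with the spreadsheet's title rows having been read as data, which leaves the real column names (호선, 역번호, 역명, and so on) in row 2 of the Sample section. One possible cleanup is sketched below in plain Python, assuming each row is a list of cells. The function `promote_header` and the three-column demo rows are hypothetical; the demo values are taken from the text columns of the sample. With pandas, the usual equivalent is to pass `header=` to the reader function.

```python
from typing import Optional

Row = list[Optional[str]]

def promote_header(rows: list[Row], header_index: int) -> list[dict]:
    """Use rows[header_index] as column names and keep only the rows after it."""
    header = rows[header_index]
    records = []
    for row in rows[header_index + 1:]:
        # Skip rows that are entirely empty (for example spacer rows).
        if all(cell is None for cell in row):
            continue
        records.append(dict(zip(header, row)))
    return records

# Hypothetical excerpt: None marks a missing cell.
raw_rows: list[Row] = [
    [None, None, None],              # title or blank rows read as data
    [None, None, None],
    ["호선", "역번호", "역명"],       # the real header row (index 2)
    ["1호선", "150", "서울역(1)"],
    ["1호선", "151", "시청(1)"],
]

records = promote_header(raw_rows, header_index=2)
print(records[0])
```

Re-profiling after a step like this would likely let the numeric columns be typed properly, although that has not been verified against this dataset.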

Reproduction

Analysis started: 2024-04-29 16:42:31.895792
Analysis finished: 2024-04-29 16:42:32.638661
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

전자분야 시설물 현황
Categorical

Distinct: 19
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
7호선 | 51
5호선 | 51
2호선 | 50
6호선 | 38
3호선 | 33
Other values (14) | 68

Length

Max length: 6
Median length: 3
Mean length: 3.0962199
Min length: 2

Unique

Unique: 10
Unique (%): 3.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 호선
4th row: 1호선
5th row: 1호선

Common Values

Value | Count | Frequency (%)
7호선 | 51 | 17.5%
5호선 | 51 | 17.5%
2호선 | 50 | 17.2%
6호선 | 38 | 13.1%
3호선 | 33 | 11.3%
4호선 | 26 | 8.9%
8호선 | 17 | 5.8%
1호선 | 10 | 3.4%
<NA> | 5 | 1.7%
호선 | 1 | 0.3%
Other values (9) | 9 | 3.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
7호선 | 52 | 17.4%
5호선 | 52 | 17.4%
2호선 | 51 | 17.1%
6호선 | 39 | 13.0%
3호선 | 34 | 11.4%
4호선 | 27 | 9.0%
8호선 | 18 | 6.0%
1호선 | 11 | 3.7%
합계 | 8 | 2.7%
na | 5 | 1.7%
Other values (2) | 2 | 0.7%

Unnamed: 1
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10
Missing (%): 3.4%
Memory size: 2.4 KiB

Unnamed: 2
Text

MISSING 

Distinct: 265
Distinct (%): 94.3%
Missing: 10
Missing (%): 3.4%
Memory size: 2.4 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.3380783
Min length: 2

Characters and Unicode

Total characters: 938
Distinct characters: 222
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
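
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one of the sample values of this variable; the helper `category_counts` is illustrative and is not how the report itself is implemented. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Ps` is Open Punctuation, `Pe` is Close Punctuation, `Zs` is Space Separator, and `Cc` is Control.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "서울역(1)"  # one of the sample values of this variable
counts = category_counts(sample)

# Three Hangul syllables, one opening bracket, one digit, one closing bracket.
print(counts)
```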

Unique

Unique: 250
Unique (%): 89.0%

Sample

1st row: 역명
2nd row: 서울역(1)
3rd row: 시청(1)
4th row: 종각
5th row: 종로3가(1)
Value | Count | Frequency (%)
소계 | 4 | 1.3%
동대문역사문화공원 | 3 | 1.0%
[illegible] | 3 | 1.0%
[illegible] | 3 | 1.0%
충정로 | 2 | 0.6%
[illegible] | 2 | 0.6%
신대방 | 2 | 0.6%
뚝섬 | 2 | 0.6%
동묘앞 | 2 | 0.6%
불광 | 2 | 0.6%
Other values (270) | 283 | 91.9%

Most occurring characters

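Value | Count | Frequency (%)
(space) | 36 | 3.8%
[illegible] | 32 | 3.4%
[illegible] | 27 | 2.9%
[illegible] | 25 | 2.7%
[illegible] | 24 | 2.6%
( | 24 | 2.6%
) | 24 | 2.6%
[illegible] | 18 | 1.9%
[illegible] | 15 | 1.6%
[illegible] | 15 | 1.6%
Other values (212) | 698 | 74.4%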

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 813 | 86.7%
Space Separator | 36 | 3.8%
Decimal Number | 35 | 3.7%
Open Punctuation | 24 | 2.6%
Close Punctuation | 24 | 2.6%
Control | 6 | 0.6%

Most frequent character per category

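Other Letter
Value | Count | Frequency (%)
[illegible] | 32 | 3.9%
[illegible] | 27 | 3.3%
[illegible] | 25 | 3.1%
[illegible] | 24 | 3.0%
[illegible] | 18 | 2.2%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 14 | 1.7%
[illegible] | 12 | 1.5%
Other values (200) | 616 | 75.8%
Decimal Number
Value | Count | Frequency (%)
3 | 7 | 20.0%
5 | 6 | 17.1%
2 | 5 | 14.3%
1 | 5 | 14.3%
6 | 4 | 11.4%
7 | 3 | 8.6%
4 | 3 | 8.6%
8 | 2 | 5.7%
Space Separator
Value | Count | Frequency (%)
(space) | 36 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 24 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 24 | 100.0%
Control
Value | Count | Frequency (%)
(control character) | 6 | 100.0%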

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 813 | 86.7%
Common | 125 | 13.3%

Most frequent character per script

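Hangul
Value | Count | Frequency (%)
[illegible] | 32 | 3.9%
[illegible] | 27 | 3.3%
[illegible] | 25 | 3.1%
[illegible] | 24 | 3.0%
[illegible] | 18 | 2.2%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 14 | 1.7%
[illegible] | 12 | 1.5%
Other values (200) | 616 | 75.8%
Common
Value | Count | Frequency (%)
(space) | 36 | 28.8%
( | 24 | 19.2%
) | 24 | 19.2%
3 | 7 | 5.6%
(control character) | 6 | 4.8%
5 | 6 | 4.8%
2 | 5 | 4.0%
1 | 5 | 4.0%
6 | 4 | 3.2%
7 | 3 | 2.4%
Other values (2) | 5 | 4.0%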

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 813 | 86.7%
ASCII | 125 | 13.3%

Most frequent character per block

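ASCII
Value | Count | Frequency (%)
(space) | 36 | 28.8%
( | 24 | 19.2%
) | 24 | 19.2%
3 | 7 | 5.6%
(control character) | 6 | 4.8%
5 | 6 | 4.8%
2 | 5 | 4.0%
1 | 5 | 4.0%
6 | 4 | 3.2%
7 | 3 | 2.4%
Other values (2) | 5 | 4.0%
Hangul
Value | Count | Frequency (%)
[illegible] | 32 | 3.9%
[illegible] | 27 | 3.3%
[illegible] | 25 | 3.1%
[illegible] | 24 | 3.0%
[illegible] | 18 | 2.2%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 15 | 1.8%
[illegible] | 14 | 1.7%
[illegible] | 12 | 1.5%
Other values (200) | 616 | 75.8%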

Unnamed: 3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 1.4%
Memory size: 2.4 KiB

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 1.4%
Memory size: 2.4 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 1.4%
Memory size: 2.4 KiB

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 165
Missing (%): 56.7%
Memory size: 2.4 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 1.4%
Memory size: 2.4 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 54
Missing (%): 18.6%
Memory size: 2.4 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 1.7%
Memory size: 2.4 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 1.4%
Memory size: 2.4 KiB

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
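
To make the idea concrete, nullity correlation is commonly computed as the Pearson correlation between the "is missing" indicators of two columns. The sketch below assumes that definition; it is not taken from this report's implementation, and the function name and toy columns are hypothetical.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return float("nan")
    return cov / sqrt(var_a * var_b)

# Hypothetical toy columns; None marks a missing cell.
x = [1, None, 3, None]
y = [5, None, 7, None]   # missing in exactly the same rows as x
z = [None, 2, None, 4]   # missing exactly where x is present

same = nullity_correlation(x, y)
opposite = nullity_correlation(x, z)
print(same, opposite)
```

A value near 1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.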

Sample

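index | 전자분야 시설물 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10
0 | <NA> | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1 | <NA> | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | (2020. 4. 30. 기준)
2 | 호선 | 역번호 | 역명 | 개집표기 | 발매기 | 환급기 | 판매기 | 정산기 | 발권기 | 휴대용정산기 | 유인충전기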
31호선150서울역(1)6714725141
41호선151시청(1)688424NaN41
51호선152종각537525NaN41
61호선153종로3가(1)387624132
71호선154종로5가345423NaN41
81호선155동대문(1)374525NaN41
91호선156신설동(1)418614NaN41
index | 전자분야 시설물 현황 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | Unnamed: 10
2818호선2823남한산성입구1521NaN0011
2828호선2824단대오거리1731NaN0011
2838호선2825신흥821NaN0011
2848호선2826수진1121NaN0011
2858호선2827모란1021NaN0011
2868호선 합계178호선 소계2474021NaN311817
287<NA>NaN<NA>NaNNaNNaNNaNNaNNaNNaNNaN
288<NA>NaN<NA>NaNNaNNaNNaNNaNNaNNaNNaN
289<NA>NaN<NA>1254620901302398846641156648
290총수량NaN<NA>6273104565119942332578324

Duplicate rows

Most frequently occurring

index | 전자분야 시설물 현황 | Unnamed: 2 | # duplicates
0 | <NA> | <NA> | 5
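
A table like this can be approximated by counting how often each row occurs and keeping the rows seen more than once. The sketch below is a plain-Python illustration over the two displayed columns; the function `duplicate_counts` and the toy rows are hypothetical, and how the report derives its own duplicate figures is not shown here.

```python
from collections import Counter

def duplicate_counts(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Hypothetical rows over the two displayed columns; None marks <NA>.
rows = [
    (None, None),
    ("1호선", "종각"),
    (None, None),
    ("8호선", "모란"),
]

dups = duplicate_counts(rows)
print(dups)
```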