Overview

Dataset statistics

Number of variables: 15
Number of observations: 27
Missing cells: 113
Missing cells (%): 27.9%
Duplicate rows: 3
Duplicate rows (%): 11.1%
Total size in memory: 3.3 KiB
Average record size in memory: 124.9 B
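
The two percentages above follow directly from the counts. A minimal sketch of the arithmetic, assuming "missing cells" is measured against every cell (variables × observations) and "duplicate rows" against the number of observations:

```python
# Counts copied from the dataset statistics above.
n_variables = 15
n_observations = 27
missing_cells = 113
duplicate_rows = 3

total_cells = n_variables * n_observations          # 15 * 27 = 405 cells
missing_pct = 100 * missing_cells / total_cells     # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")     # 27.9%
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")  # 11.1%
```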

Variable types

Text: 3
Categorical: 1
Unsupported: 11

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13231/F/1/datasetView.do

Alerts

Dataset has 3 (11.1%) duplicate rows [Duplicates]
시 설 명 has 8 (29.6%) missing values [Missing]
Unnamed: 1 has 21 (77.8%) missing values [Missing]
Unnamed: 2 has 24 (88.9%) missing values [Missing]
(blank column name) has 7 (25.9%) missing values [Missing]
1~4호선 has 6 (22.2%) missing values [Missing]
Unnamed: 6 has 6 (22.2%) missing values [Missing]
Unnamed: 7 has 6 (22.2%) missing values [Missing]
Unnamed: 8 has 5 (18.5%) missing values [Missing]
Unnamed: 9 has 6 (22.2%) missing values [Missing]
5~8호선 has 6 (22.2%) missing values [Missing]
Unnamed: 11 has 6 (22.2%) missing values [Missing]
Unnamed: 12 has 6 (22.2%) missing values [Missing]
Unnamed: 13 has 2 (7.4%) missing values [Missing]
Unnamed: 14 has 4 (14.8%) missing values [Missing]
(blank column name) is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
1~4호선 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
5~8호선 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 12 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 13 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 14 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
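
The "unsupported" columns appear to mix header text (such as 소계), dash placeholders, slash-prefixed figures (such as /225.258) and plain numbers, judging by the sample rows in this report. The sketch below shows one possible cleaning step. It assumes that the mixed cell types are why the columns were flagged, which the report itself does not state, and `to_number` is a hypothetical helper, not part of ydata-profiling.

```python
import math

def to_number(cell):
    """Return a float for numeric-looking cells, otherwise None.

    Assumptions for illustration: "-" and empty strings mean "no value",
    and a leading "/" (as in "/225.258") is only a visual separator.
    """
    if cell is None:
        return None
    if isinstance(cell, (int, float)):
        return None if math.isnan(cell) else float(cell)
    text = str(cell).strip().lstrip("/")
    if text in ("", "-"):
        return None
    try:
        value = float(text)
    except ValueError:
        return None          # header text such as "소계" or "1호선"
    return None if math.isnan(value) else value

# Cell values taken from the sample rows of this report.
cells = ["소계", "618.098", "/225.258", "-", float("nan")]
cleaned = [to_number(c) for c in cells]
```

After such a pass each column would hold only floats and missing values, so a profiler could treat it as numeric.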

Reproduction

Analysis started: 2024-04-29 16:47:17.075716
Analysis finished: 2024-04-29 16:47:19.007162
Duration: 1.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시 설 명
Text

MISSING 

Distinct: 18
Distinct (%): 94.7%
Missing: 8
Missing (%): 29.6%
Memory size: 348.0 B

Length

Max length: 8
Median length: 6
Mean length: 4
Min length: 2

Characters and Unicode

Total characters: 76
Distinct characters: 46
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 89.5%

Sample

1st row: 궤도연장
2nd row: (본선/측선)
3rd row: 곡선연장
4th row: (R<1200)
5th row: 최소
Value | Count | Frequency (%)
본선/측선 | 2 | 10.5%
콘크리트도상 | 1 | 5.3%
b2s | 1 | 5.3%
신축이음매 | 1 | 5.3%
차량기지 | 1 | 5.3%
장비유치선 | 1 | 5.3%
구간 | 1 | 5.3%
장치 | 1 | 5.3%
체결 | 1 | 5.3%
방진 | 1 | 5.3%
Other values (8) | 8 | 42.1%

Most occurring characters

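Value | Count | Frequency (%)
(not shown) | 7 | 9.2%
(not shown) | 6 | 7.9%
(not shown) | 4 | 5.3%
) | 3 | 3.9%
(not shown) | 3 | 3.9%
( | 3 | 3.9%
2 | 2 | 2.6%
(not shown) | 2 | 2.6%
(not shown) | 2 | 2.6%
(not shown) | 2 | 2.6%
Other values (36) | 42 | 55.3%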

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 59 | 77.6%
Decimal Number | 5 | 6.6%
Close Punctuation | 3 | 3.9%
Open Punctuation | 3 | 3.9%
Uppercase Letter | 3 | 3.9%
Other Punctuation | 2 | 2.6%
Math Symbol | 1 | 1.3%

Most frequent character per category

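Other Letter
Value | Count | Frequency (%)
(not shown) | 7 | 11.9%
(not shown) | 6 | 10.2%
(not shown) | 4 | 6.8%
(not shown) | 3 | 5.1%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
Other values (26) | 27 | 45.8%

Decimal Number
Value | Count | Frequency (%)
2 | 2 | 40.0%
0 | 2 | 40.0%
1 | 1 | 20.0%

Uppercase Letter
Value | Count | Frequency (%)
B | 1 | 33.3%
R | 1 | 33.3%
S | 1 | 33.3%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 2 | 100.0%

Math Symbol
Value | Count | Frequency (%)
< | 1 | 100.0%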

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 59 | 77.6%
Common | 14 | 18.4%
Latin | 3 | 3.9%

Most frequent character per script

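Hangul
Value | Count | Frequency (%)
(not shown) | 7 | 11.9%
(not shown) | 6 | 10.2%
(not shown) | 4 | 6.8%
(not shown) | 3 | 5.1%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
Other values (26) | 27 | 45.8%

Common
Value | Count | Frequency (%)
) | 3 | 21.4%
( | 3 | 21.4%
2 | 2 | 14.3%
0 | 2 | 14.3%
/ | 2 | 14.3%
1 | 1 | 7.1%
< | 1 | 7.1%

Latin
Value | Count | Frequency (%)
B | 1 | 33.3%
R | 1 | 33.3%
S | 1 | 33.3%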

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 59 | 77.6%
ASCII | 17 | 22.4%

Most frequent character per block

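Hangul
Value | Count | Frequency (%)
(not shown) | 7 | 11.9%
(not shown) | 6 | 10.2%
(not shown) | 4 | 6.8%
(not shown) | 3 | 5.1%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
(not shown) | 2 | 3.4%
Other values (26) | 27 | 45.8%

ASCII
Value | Count | Frequency (%)
) | 3 | 17.6%
( | 3 | 17.6%
2 | 2 | 11.8%
0 | 2 | 11.8%
/ | 2 | 11.8%
B | 1 | 5.9%
1 | 1 | 5.9%
< | 1 | 5.9%
R | 1 | 5.9%
S | 1 | 5.9%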

Unnamed: 1
Text

MISSING 

Distinct: 4
Distinct (%): 66.7%
Missing: 21
Missing (%): 77.8%
Memory size: 348.0 B

Length

Max length: 8
Median length: 7
Mean length: 5.5
Min length: 3

Characters and Unicode

Total characters: 33
Distinct characters: 16
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 33.3%

Sample

1st row: 곡선반경 (R)
2nd row: 연장 (m)
3rd row: 구 간
4th row: 기울기 (‰)
5th row: 연장 (m)
Value | Count | Frequency (%)
연장 | 2 | 16.7%
m | 2 | 16.7%
구 | 2 | 16.7%
간 | 2 | 16.7%
곡선반경 | 1 | 8.3%
r | 1 | 8.3%
기울기 | 1 | 8.3%
‰ | 1 | 8.3%

Most occurring characters

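Value | Count | Frequency (%)
(space) | 6 | 18.2%
( | 4 | 12.1%
) | 4 | 12.1%
(not shown) | 2 | 6.1%
(not shown) | 2 | 6.1%
m | 2 | 6.1%
(not shown) | 2 | 6.1%
(not shown) | 2 | 6.1%
(not shown) | 2 | 6.1%
(not shown) | 1 | 3.0%
Other values (6) | 6 | 18.2%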

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15 | 45.5%
Space Separator | 6 | 18.2%
Open Punctuation | 4 | 12.1%
Close Punctuation | 4 | 12.1%
Lowercase Letter | 2 | 6.1%
Uppercase Letter | 1 | 3.0%
Other Punctuation | 1 | 3.0%

Most frequent character per category

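Other Letter
Value | Count | Frequency (%)
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%

Space Separator
Value | Count | Frequency (%)
(space) | 6 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 4 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 4 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
m | 2 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
R | 1 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
‰ | 1 | 100.0%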

Most occurring scripts

Value | Count | Frequency (%)
Common | 15 | 45.5%
Hangul | 15 | 45.5%
Latin | 3 | 9.1%

Most frequent character per script

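Hangul
Value | Count | Frequency (%)
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%

Common
Value | Count | Frequency (%)
(space) | 6 | 40.0%
( | 4 | 26.7%
) | 4 | 26.7%
‰ | 1 | 6.7%

Latin
Value | Count | Frequency (%)
m | 2 | 66.7%
R | 1 | 33.3%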

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 17 | 51.5%
Hangul | 15 | 45.5%
Punctuation | 1 | 3.0%

Most frequent character per block

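ASCII
Value | Count | Frequency (%)
(space) | 6 | 35.3%
( | 4 | 23.5%
) | 4 | 23.5%
m | 2 | 11.8%
R | 1 | 5.9%

Hangul
Value | Count | Frequency (%)
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 2 | 13.3%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%
(not shown) | 1 | 6.7%

Punctuation
Value | Count | Frequency (%)
‰ | 1 | 100.0%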

Unnamed: 2
Text

MISSING 

Distinct: 3
Distinct (%): 100.0%
Missing: 24
Missing (%): 88.9%
Memory size: 348.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 15
Distinct characters: 10
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
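
As an illustration of those character properties, the sketch below tallies the Unicode general category of every character in the three values of this column, using Python's standard `unicodedata` module. This is not how the report was produced, only a way to reproduce the category counts. The Roman numerals are written as escapes on the assumption that the source uses U+2160 and U+2161.

```python
import unicodedata
from collections import Counter

# The three non-missing values of "Unnamed: 2" as listed under Sample.
# \u2160 and \u2161 are ROMAN NUMERAL ONE and ROMAN NUMERAL TWO.
values = ["Alt-\u2160", "Alt-\u2161", "DFF14"]

# unicodedata.category returns two-letter codes:
# Lu = Uppercase Letter, Ll = Lowercase Letter, Pd = Dash Punctuation,
# Nl = Letter Number, Nd = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

The tallies (5 uppercase, 4 lowercase, 2 dash, 2 letter-number and 2 decimal characters) line up with the category table for this variable.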

Unique

Unique: 3
Unique (%): 100.0%

Sample

1st row: Alt-Ⅰ
2nd row: Alt-Ⅱ
3rd row: DFF14
Value | Count | Frequency (%)
alt-ⅰ | 1 | 33.3%
alt-ⅱ | 1 | 33.3%
dff14 | 1 | 33.3%

Most occurring characters

Value | Count | Frequency (%)
A | 2 | 13.3%
l | 2 | 13.3%
t | 2 | 13.3%
- | 2 | 13.3%
F | 2 | 13.3%
Ⅰ | 1 | 6.7%
Ⅱ | 1 | 6.7%
D | 1 | 6.7%
1 | 1 | 6.7%
4 | 1 | 6.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5 | 33.3%
Lowercase Letter | 4 | 26.7%
Dash Punctuation | 2 | 13.3%
Letter Number | 2 | 13.3%
Decimal Number | 2 | 13.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 2 | 40.0%
F | 2 | 40.0%
D | 1 | 20.0%

Lowercase Letter
Value | Count | Frequency (%)
l | 2 | 50.0%
t | 2 | 50.0%

Letter Number
Value | Count | Frequency (%)
Ⅰ | 1 | 50.0%
Ⅱ | 1 | 50.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1 | 50.0%
4 | 1 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 11 | 73.3%
Common | 4 | 26.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 2 | 18.2%
l | 2 | 18.2%
t | 2 | 18.2%
F | 2 | 18.2%
Ⅰ | 1 | 9.1%
Ⅱ | 1 | 9.1%
D | 1 | 9.1%

Common
Value | Count | Frequency (%)
- | 2 | 50.0%
1 | 1 | 25.0%
4 | 1 | 25.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 13 | 86.7%
Number Forms | 2 | 13.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 2 | 15.4%
l | 2 | 15.4%
t | 2 | 15.4%
- | 2 | 15.4%
F | 2 | 15.4%
D | 1 | 7.7%
1 | 1 | 7.7%
4 | 1 | 7.7%

Number Forms
Value | Count | Frequency (%)
Ⅰ | 1 | 50.0%
Ⅱ | 1 | 50.0%

단위
Categorical

Distinct: 9
Distinct (%): 33.3%
Missing: 0
Missing (%): 0.0%
Memory size: 348.0 B

Length

Max length: 4
Median length: 2
Mean length: 2.2592593
Min length: 1

Unique

Unique: 3
Unique (%): 11.1%

Sample

1st row: <NA>
2nd row: km
3rd row: <NA>
4th row: km
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9 | 33.3%
km | 4 | 14.8%
m | 3 | 11.1%
(not shown) | 3 | 11.1%
개소 | 3 | 11.1%
- | 2 | 7.4%
(not shown) | 1 | 3.7%
(not shown) | 1 | 3.7%
(not shown) | 1 | 3.7%

Length

Histogram of lengths of the category

Common Values (Plot)

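Value | Count | Frequency (%)
na | 9 | 33.3%
km | 4 | 14.8%
m | 3 | 11.1%
(not shown) | 3 | 11.1%
개소 | 3 | 11.1%
(not shown) | 2 | 7.4%
(not shown) | 1 | 3.7%
(not shown) | 1 | 3.7%
(not shown) | 1 | 3.7%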


(blank column name)
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 7
Missing (%): 25.9%
Memory size: 348.0 B

1~4호선
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 18.5%
Memory size: 348.0 B

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

5~8호선
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 12
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 22.2%
Memory size: 348.0 B

Unnamed: 13
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 2
Missing (%): 7.4%
Memory size: 348.0 B

Unnamed: 14
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 14.8%
Memory size: 348.0 B

Correlations

Variable | 시 설 명 | Unnamed: 1 | Unnamed: 2 | 단위
시 설 명 | 1.000 | 1.000 | 1.000 | 1.000
Unnamed: 1 | 1.000 | 1.000 | NaN | 1.000
Unnamed: 2 | 1.000 | NaN | 1.000 | NaN
단위 | 1.000 | 1.000 | NaN | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
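
Nullity correlation can be read as an ordinary Pearson correlation computed on 0/1 "is missing" flags. The sketch below works through that idea in plain Python for two columns, using only the first ten sample rows of this report. The flags are transcribed by hand from those rows, so they are an assumption, and the result is not the value shown in the heatmap, which uses all 27 rows.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a column that is never (or always) missing
    return cov / sqrt(var_x * var_y)

# 1 = cell is missing, 0 = cell is present, for sample rows 0 to 9.
facility_missing = [1, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # "시 설 명"
unnamed_1_missing = [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]  # "Unnamed: 1"

nullity_corr = pearson(facility_missing, unnamed_1_missing)
print(round(nullity_corr, 3))
```

A value near 1 would mean the two columns tend to be missing in the same rows, a value near -1 that one is filled where the other is empty, and a value near 0 that the two patterns are unrelated.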

Sample

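First rows

Index | 시 설 명 | Unnamed: 1 | Unnamed: 2 | 단위 | (blank) | 1~4호선 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | 5~8호선 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14
0 | <NA> | <NA> | <NA> | <NA> | NaN | 소계 | 1호선 | 2호선 | 3호선 | 4호선 | 소계 | 5호선 | 6호선 | 7호선 | 8호선
1 | 궤도연장 | <NA> | <NA> | km | 618.098 | 284.129 | 18.831 | 122.711 | 77.983 | 64.604 | 333.969 | 109.399 | 67.435 | 118.091 | 39.044
2 | (본선/측선) | <NA> | <NA> | <NA> | /225.258 | /120.567 | /1.043 | /51.806 | /45.398 | /22.320 | /104.691 | /35.097 | /20.489 | /37.980 | /11.125
3 | 곡선연장 | <NA> | <NA> | km | 254.922 | 124.25 | 9.769 | 49.565 | 35.872 | 29.044 | 130.672 | 46.18 | 31.107 | 38.242 | 15.143
4 | (R<1200) | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
5 | 최소 | 곡선반경 (R) | <NA> | m | - | - | 136 | 200 | 199 | 180 | - | 246 | 269 | 246 | 248
6 | 곡선 | 연장 (m) | <NA> | m | - | - | 318 | 311 | 434 | 180 | - | 438 | 311 | 278 | 524
7 | <NA> | 구 간 | <NA> | - | - | - | 시청~종각 | 서초~방배외1개소 | 안국~종로 | 당고개~상계 | - | 방이~오금 | 응암~역촌 | 장승 | 산성~단대
8 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | 3가 | NaN | NaN | NaN | NaN | 배기~ | 오거리
9 | <NA> | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 신대방 | NaN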
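
Last rows

Index | 시 설 명 | Unnamed: 1 | Unnamed: 2 | 단위 | (blank) | 1~4호선 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | 5~8호선 | Unnamed: 11 | Unnamed: 12 | Unnamed: 13 | Unnamed: 14
17 | 콘크리트도상 | <NA> | <NA> | km | 471.6 | 146.7 | 14.5 | 63.3 | 39.4 | 29.4 | 324.9 | 109.4 | 65.8 | 112.9 | 36.9
18 | B2S | <NA> | <NA> | km | 53.5 | 53.5 | 5.8 | 20.7 | 13.6 | 13.4 | - | - | - | - | -
19 | 방진 | <NA> | Alt-Ⅰ | (not shown) | 50510 | 26732 | 4086 | 11344 | 6337 | 4965 | 23778 | - | 14336 | 7582 | 1860
20 | 체결 | <NA> | <NA> | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
21 | 장치 | <NA> | Alt-Ⅱ | (not shown) | 43320 | 43320 | 3437 | 17917 | 14205 | 7761 | - | - | - | - | -
22 | 구간 | <NA> | DFF14 | (not shown) | 11941 | 11941 | - | 11235 | - | 706 | - | - | - | - | -
23 | 장비유치선 | <NA> | <NA> | 개소 | 28 | 8 | - | 4 | 2 | 2 | 20 | 7 | 3 | 6 | 4
24 | 차량기지 | <NA> | <NA> | 개소 | 11 | 5 | - | 2 | 2 | 1 | 6 | 2 | 1 | 2 | 1
25 | 신축이음매 | <NA> | <NA> | 개소 | 409 | 375 | 21 | 152 | 98 | 104 | 34 | 4 | 2 | 24 | 4
26 | 도유기 | <NA> | <NA> | (not shown) | 253 | 131 | 6 | 52 | 33 | 41 | 121 | 68 | 15 | 32 | 6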
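
The sample rows suggest a fixed layout for the eleven value columns: a grand total, a 1~4호선 subtotal followed by lines 1 to 4, then a 5~8호선 subtotal followed by lines 5 to 8. The sketch below checks that reading against three rows. It assumes the run-together digits in the sample split as shown in the lists, and `subtotals_agree` is a hypothetical helper written for this illustration.

```python
def subtotals_agree(values, tol=1e-6):
    """Check line values against their subtotals and the grand total.

    Expected order (an assumption read off the header row):
    total, subtotal 1~4, lines 1-4, subtotal 5~8, lines 5-8.
    """
    total, sub_1_4, l1, l2, l3, l4, sub_5_8, l5, l6, l7, l8 = values
    return (
        abs(l1 + l2 + l3 + l4 - sub_1_4) <= tol
        and abs(l5 + l6 + l7 + l8 - sub_5_8) <= tol
        and abs(sub_1_4 + sub_5_8 - total) <= tol
    )

# Values transcribed from sample rows 1, 3 and 25.
rows = {
    "궤도연장": [618.098, 284.129, 18.831, 122.711, 77.983, 64.604,
                 333.969, 109.399, 67.435, 118.091, 39.044],
    "곡선연장": [254.922, 124.25, 9.769, 49.565, 35.872, 29.044,
                 130.672, 46.18, 31.107, 38.242, 15.143],
    "신축이음매": [409, 375, 21, 152, 98, 104, 34, 4, 2, 24, 4],
}

results = {name: subtotals_agree(values) for name, values in rows.items()}
print(results)
```

A check like this is also a cheap way to confirm where one number ends and the next begins when a table has been flattened.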

Duplicate rows

Most frequently occurring

Index | 시 설 명 | Unnamed: 1 | Unnamed: 2 | 단위 | # duplicates
2 | <NA> | <NA> | <NA> | <NA> | 5
0 | (본선/측선) | <NA> | <NA> | <NA> | 2
1 | <NA> | 구 간 | <NA> | - | 2
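
One way to read this table is that each listed pattern is a combination of the four supported columns that occurs more than once, with "# duplicates" giving how often it occurs. The sketch below reproduces that kind of count with `collections.Counter`. The row list is a partial, hand-built illustration (the three repeated patterns plus one non-repeated row for contrast), not the full 27-row dataset, and the reading itself is an assumption.

```python
from collections import Counter

# Tuples of (시 설 명, Unnamed: 1, Unnamed: 2, 단위); None marks a missing cell.
rows = (
    [(None, None, None, "<NA>")] * 5
    + [("(본선/측선)", None, None, "<NA>")] * 2
    + [(None, "구 간", None, "-")] * 2
    + [("궤도연장", None, None, "km")]   # occurs once, so not a duplicate
)

counts = Counter(rows)
repeated = {row: n for row, n in counts.items() if n > 1}

for row, n in sorted(repeated.items(), key=lambda item: -item[1]):
    print(row, n)
print("distinct repeated patterns:", len(repeated))
```

Under this reading the number of distinct repeated patterns is 3, which matches the "Duplicate rows" figure in the overview.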