Overview

Dataset statistics

Number of variables: 12
Number of observations: 285
Missing cells: 1967
Missing cells (%): 57.5%
Duplicate rows: 6
Duplicate rows (%): 2.1%
Total size in memory: 27.1 KiB
Average record size in memory: 97.5 B
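
The two percentage rows can be rechecked from the counts: missing cells are taken over all cells (variables × observations), and duplicate rows over observations. A minimal Python sketch follows; the helper name `pct` is illustrative and not part of ydata-profiling.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts copied from the dataset statistics above.
n_vars, n_obs = 12, 285
missing_cells = 1967
duplicate_rows = 6

missing_pct = pct(missing_cells, n_vars * n_obs)  # share of all 12 * 285 cells
duplicate_pct = pct(duplicate_rows, n_obs)        # share of all rows

print(missing_pct, duplicate_pct)
```

Both values agree with the figures reported above (57.5% and 2.1%).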

Variable types

Text: 1
Numeric: 1
Unsupported: 10

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13116/S/1/datasetView.do

Alerts

Dataset has 6 (2.1%) duplicate rows [Duplicates]
외부출입구 has 178 (62.5%) missing values [Missing]
Unnamed: 3 has 194 (68.1%) missing values [Missing]
외부 E/V has 56 (19.6%) missing values [Missing]
Unnamed: 5 has 72 (25.3%) missing values [Missing]
대합실 has 193 (67.7%) missing values [Missing]
Unnamed: 7 has 210 (73.7%) missing values [Missing]
내부 E/V has 254 (89.1%) missing values [Missing]
Unnamed: 9 has 270 (94.7%) missing values [Missing]
승강장 has 261 (91.6%) missing values [Missing]
Unnamed: 11 has 277 (97.2%) missing values [Missing]
외부출입구 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
외부 E/V is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
대합실 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
내부 E/V is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
승강장 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
The unnamed numeric variable has 12 (4.2%) zeros [Zeros]

Reproduction

Analysis started: 2024-04-29 16:41:59.850788
Analysis finished: 2024-04-29 16:42:00.537764
Duration: 0.69 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선
Text

Distinct: 253
Distinct (%): 89.1%
Missing: 1
Missing (%): 0.4%
Memory size: 2.4 KiB

Length

Max length: 7
Median length: 6
Mean length: 2.9084507
Min length: 1

Characters and Unicode

Total characters: 826
Distinct characters: 219
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
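
As a rough illustration of how such character properties can be read, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The function name `category_counts` and the sample string are illustrative; this is not necessarily how the profiling tool computes its tables, and `unicodedata` does not expose the script or block properties reported here.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# "Lo" = Other Letter, "Nd" = Decimal Number,
# "Zs" = Space Separator, "Lu" = Uppercase Letter.
counts = category_counts("종로3가")
print(counts)
```

For this string the Hangul syllables fall under Other Letter and the digit under Decimal Number, the same categories that appear in the tables for this variable.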

Unique

Unique: 224
Unique (%): 78.9%

Sample

1st row: (blank)
2nd row: 1호선
3rd row: 2호선
4th row: 3호선
5th row: 4호선
Value | Count | Frequency (%)
 | 8 | 2.7%
동대문 | 3 | 1.0%
종로3가 | 3 | 1.0%
사당 | 2 | 0.7%
신당 | 2 | 0.7%
왕십리 | 2 | 0.7%
청구 | 2 | 0.7%
노원 | 2 | 0.7%
을지로3가 | 2 | 0.7%
영등포구청 | 2 | 0.7%
Other values (236) | 263 | 90.4%

Most occurring characters

Value | Count | Frequency (%)
 | 30 | 3.6%
 | 25 | 3.0%
 | 25 | 3.0%
 | 23 | 2.8%
 | 20 | 2.4%
 | 18 | 2.2%
 | 17 | 2.1%
 | 14 | 1.7%
 | 14 | 1.7%
 | 14 | 1.7%
Other values (209) | 626 | 75.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 792 | 95.9%
Decimal Number | 24 | 2.9%
Space Separator | 7 | 0.8%
Uppercase Letter | 3 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 30 | 3.8%
 | 25 | 3.2%
 | 25 | 3.2%
 | 23 | 2.9%
 | 20 | 2.5%
 | 18 | 2.3%
 | 17 | 2.1%
 | 14 | 1.8%
 | 14 | 1.8%
 | 14 | 1.8%
Other values (197) | 592 | 74.7%

Decimal Number
Value | Count | Frequency (%)
3 | 7 | 29.2%
4 | 4 | 16.7%
5 | 3 | 12.5%
7 | 2 | 8.3%
2 | 2 | 8.3%
8 | 2 | 8.3%
6 | 2 | 8.3%
1 | 2 | 8.3%

Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 33.3%
M | 1 | 33.3%
D | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 792 | 95.9%
Common | 31 | 3.8%
Latin | 3 | 0.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 30 | 3.8%
 | 25 | 3.2%
 | 25 | 3.2%
 | 23 | 2.9%
 | 20 | 2.5%
 | 18 | 2.3%
 | 17 | 2.1%
 | 14 | 1.8%
 | 14 | 1.8%
 | 14 | 1.8%
Other values (197) | 592 | 74.7%

Common
Value | Count | Frequency (%)
3 | 7 | 22.6%
(space) | 7 | 22.6%
4 | 4 | 12.9%
5 | 3 | 9.7%
7 | 2 | 6.5%
2 | 2 | 6.5%
8 | 2 | 6.5%
6 | 2 | 6.5%
1 | 2 | 6.5%

Latin
Value | Count | Frequency (%)
C | 1 | 33.3%
M | 1 | 33.3%
D | 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 792 | 95.9%
ASCII | 34 | 4.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 30 | 3.8%
 | 25 | 3.2%
 | 25 | 3.2%
 | 23 | 2.9%
 | 20 | 2.5%
 | 18 | 2.3%
 | 17 | 2.1%
 | 14 | 1.8%
 | 14 | 1.8%
 | 14 | 1.8%
Other values (197) | 592 | 74.7%

ASCII
Value | Count | Frequency (%)
3 | 7 | 20.6%
(space) | 7 | 20.6%
4 | 4 | 11.8%
5 | 3 | 8.8%
7 | 2 | 5.9%
2 | 2 | 5.9%
8 | 2 | 5.9%
6 | 2 | 5.9%
1 | 2 | 5.9%
C | 1 | 2.9%
Other values (2) | 2 | 5.9%


Real number (ℝ)

ZEROS 

Distinct: 17
Distinct (%): 6.0%
Missing: 1
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.028169
Minimum: 0
Maximum: 499
Zeros: 12
Zeros (%): 4.2%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 27.25
Maximum: 499
Range: 499
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 33.431316
Coefficient of variation (CV): 4.7567604
Kurtosis: 167.94043
Mean: 7.028169
Median Absolute Deviation (MAD): 1
Skewness: 11.90828
Sum: 1996
Variance: 1117.6529
Monotonicity: Not monotonic
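
Several of the figures above are derived from others, so they can be cross-checked by hand. The sketch below recomputes the mean, coefficient of variation, variance, IQR, and range from the reported sum, standard deviation, quartiles, and extremes. It assumes the mean is taken over the 284 non-missing observations (285 rows minus 1 missing); the variable names are illustrative.

```python
# Inputs copied from the report for this variable.
count = 285 - 1        # observations minus the one missing value (assumption)
total = 1996           # "Sum"
std = 33.431316        # "Standard deviation"
q1, q3 = 1, 3          # "Q1" and "Q3"
minimum, maximum = 0, 499

mean = total / count            # arithmetic mean
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

print(mean, cv, variance, iqr, value_range)
```

A CV well above 1 together with a median of 2 and a maximum of 499 points to a few very large values dominating the spread, which is consistent with the high skewness and kurtosis reported.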
Histogram with fixed size bins (bins=17)
Common values
Value | Count | Frequency (%)
1 | 105 | 36.8%
2 | 94 | 33.0%
3 | 30 | 10.5%
4 | 18 | 6.3%
0 | 12 | 4.2%
5 | 5 | 1.8%
122 | 2 | 0.7%
44 | 2 | 0.7%
41 | 2 | 0.7%
70 | 2 | 0.7%
Other values (7) | 12 | 4.2%

Minimum 10 values
Value | Count | Frequency (%)
0 | 12 | 4.2%
1 | 105 | 36.8%
2 | 94 | 33.0%
3 | 30 | 10.5%
4 | 18 | 6.3%
5 | 5 | 1.8%
6 | 2 | 0.7%
7 | 1 | 0.4%
23 | 2 | 0.7%
28 | 2 | 0.7%

Maximum 10 values
Value | Count | Frequency (%)
499 | 1 | 0.4%
122 | 2 | 0.7%
104 | 2 | 0.7%
70 | 2 | 0.7%
67 | 2 | 0.7%
44 | 2 | 0.7%
41 | 2 | 0.7%
28 | 2 | 0.7%
23 | 2 | 0.7%
7 | 1 | 0.4%

외부출입구
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 178
Missing (%): 62.5%
Memory size: 2.4 KiB

Unnamed: 3
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 194
Missing (%): 68.1%
Memory size: 2.4 KiB

외부 E/V
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 56
Missing (%): 19.6%
Memory size: 2.4 KiB

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 72
Missing (%): 25.3%
Memory size: 2.4 KiB

대합실
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 193
Missing (%): 67.7%
Memory size: 2.4 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 210
Missing (%): 73.7%
Memory size: 2.4 KiB

내부 E/V
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 254
Missing (%): 89.1%
Memory size: 2.4 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 270
Missing (%): 94.7%
Memory size: 2.4 KiB

승강장
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 261
Missing (%): 91.6%
Memory size: 2.4 KiB

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 277
Missing (%): 97.2%
Memory size: 2.4 KiB

Interactions


Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
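
To make the idea of nullity correlation concrete, the sketch below computes the Pearson correlation between the missing-value indicators of two columns using only the standard library. The function name `nullity_correlation` and the toy columns are hypothetical, and the profiling tool may compute its heatmap differently.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is fully present or fully missing.
        return float("nan")
    return cov / sqrt(var_a * var_b)

# Hypothetical toy columns: a quantity and a location that are missing together,
# plus a third column that is missing exactly where the first is present.
quantity = [1, None, 2, None, 1]
location = ["3번출구", None, "1,3번출구", None, "4번출구"]
other = [None, 7, None, 3, None]

same = nullity_correlation(quantity, location)
opposite = nullity_correlation(quantity, other)
print(same, opposite)
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one is present where the other is missing, and a value near 0 means the missingness patterns are unrelated.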

Sample

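First rows
 | 호선 |  | 외부출입구 | Unnamed: 3 | 외부 E/V | Unnamed: 5 | 대합실 | Unnamed: 7 | 내부 E/V | Unnamed: 9 | 승강장 | Unnamed: 11
0 | <NA> | <NA> | 설치수량 | 위치 | 설치수량 | 위치 | 설치수량 | 위치 | 설치수량 | 위치 | 설치수량 | 위치
1 |  | 499 | 114 | NaN | 261 | NaN | 98 | NaN | 16 | NaN | 10 | NaN
2 | 1호선 | 28 | 3 | NaN | 13 | NaN | 9 | NaN | 2 | NaN | 1 | NaN
3 | 2호선 | 122 | 20 | NaN | 47 | NaN | 42 | NaN | 10 | NaN | 3 | NaN
4 | 3호선 | 44 | 7 | NaN | 31 | NaN | 6 | NaN | 0 | NaN | 0 | NaN
5 | 4호선 | 41 | 7 | NaN | 22 | NaN | 8 | NaN | 4 | NaN | 0 | NaN
6 | 5호선 | 70 | 13 | NaN | 45 | NaN | 12 | NaN | 0 | NaN | 0 | NaN
7 | 6호선 | 67 | 24 | NaN | 34 | NaN | 3 | NaN | 0 | NaN | 6 | NaN
8 | 7호선 | 104 | 32 | NaN | 58 | NaN | 14 | NaN | 0 | NaN | 0 | NaN
9 | 8호선 | 23 | 8 | NaN | 11 | NaN | 4 | NaN | 0 | NaN | 0 | NaN

Last rows
 | 호선 |  | 외부출입구 | Unnamed: 3 | 외부 E/V | Unnamed: 5 | 대합실 | Unnamed: 7 | 내부 E/V | Unnamed: 9 | 승강장 | Unnamed: 11
275 | 가락시장 | 1 | 1 | 3번출구 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
276 | 문정 | 1 | NaN | NaN | 1 | 2번출구 | NaN | NaN | NaN | NaN | NaN | NaN
277 | 장지 | 2 | NaN | NaN | 2 | 1,3번출구 | NaN | NaN | NaN | NaN | NaN | NaN
278 | 복정 | 1 | 1 | 4번출구 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
279 | 산성 | 1 | NaN | NaN | 1 | 3번출구 | NaN | NaN | NaN | NaN | NaN | NaN
280 | 남한산성입구 | 2 | 1 | 4번출구 | NaN | NaN | 1 | B1(출구통로) | NaN | NaN | NaN | NaN
281 | 단대오거리 | 1 | 1 | 1번출구 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
282 | 신흥 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
283 | 수진 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
284 | 모란 | 2 | 1 | 12번출구 | 1 | 11번출구 | NaN | NaN | NaN | NaN | NaN | NaN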
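
Row 0 of the sample holds the labels 설치수량 (installed quantity) and 위치 (location) rather than data. This suggests, as an inference from the sample and not something the report states, that the source file has a two-row header whose second row was read as data, which may be why these columns mix text and numbers and are flagged as Unsupported. The sketch below shows one way to flatten such a header into single column names. The function `flatten_header` is hypothetical, and the column list is an illustrative subset that omits the column with no visible name.

```python
def flatten_header(top, sub):
    """Combine a two-row header into single column names.

    `top` is the first header row, where "Unnamed: N" placeholders mark cells
    left empty under a merged label; `sub` is the second header row, with
    None where a column has no sub-label.
    """
    names, current = [], None
    for t, s in zip(top, sub):
        if not t.startswith("Unnamed:"):
            current = t  # a real top-level label starts a new group
        names.append(current if s is None else f"{current}_{s}")
    return names

# Illustrative subset of the columns shown in the sample.
top = ["호선", "외부출입구", "Unnamed: 3", "외부 E/V", "Unnamed: 5"]
sub = [None, "설치수량", "위치", "설치수량", "위치"]
columns = flatten_header(top, sub)
print(columns)
```

With names like these, each quantity column can be converted to a numeric type on its own and each location column kept as text, after dropping the label row from the data.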

Duplicate rows

Most frequently occurring

 | 호선 |  | # duplicates
0 | 가락시장 | 1 | 2
1 | 군자 | 1 | 2
2 | 노원 | 3 | 2
3 | 불광 | 2 | 2
4 | 삼각지 | 2 | 2
5 | 약수 | 1 | 2
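
A duplicate table of this kind can be approximated by counting identical rows. The sketch below is a minimal standard-library version; the function name `duplicate_summary` and the toy rows are hypothetical, and it assumes rows are compared only on the station name and the numeric column, which may differ from the columns the profiling tool actually compares.

```python
from collections import Counter

def duplicate_summary(rows):
    """Return ({row: occurrences} for repeated rows, number of surplus copies)."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in repeated.values())
    return repeated, surplus

# Hypothetical rows as (station, count) tuples.
rows = [
    ("가락시장", 1), ("군자", 1), ("가락시장", 1),
    ("노원", 3), ("군자", 1), ("모란", 2),
]
repeated, surplus = duplicate_summary(rows)
print(repeated, surplus)
```

In the table above, six keys each occur twice, which gives six surplus copies; that matches the 6 duplicate rows reported in the overview if the tool counts surplus copies in this way.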