Overview

Dataset statistics

Number of variables: 9
Number of observations: 2285
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 165.3 KiB
Average record size in memory: 74.1 B
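
The derived figures in this table follow from the raw counts. A minimal Python sketch using only numbers reported in this document; the per-column missing counts (1 in 번호, 2 in 높이(m)) come from the variable sections.

```python
# Raw figures reported in the overview.
n_variables = 9
n_observations = 2285
missing_cells = 3          # 1 in 번호 + 2 in 높이(m)
total_size_kib = 165.3

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.4f}%")            # well below 0.1%
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing share is far below 0.1%, which is why the report prints "< 0.1%" rather than a rounded 0.0%.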

Variable types

Numeric: 2
Text: 2
Categorical: 5

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-12926/F/1/datasetView.do

Alerts

기능 is highly overall correlated with 용도 and 2 other fields [High correlation]
필터유무 is highly overall correlated with 기능 [High correlation]
용도 is highly overall correlated with 기능 and 1 other field [High correlation]
위치 is highly overall correlated with 용도 and 1 other field [High correlation]
용도 is highly imbalanced (57.9%) [Imbalance]
구조물형태 is highly imbalanced (56.9%) [Imbalance]
높이(m) has 26 (1.1%) zeros [Zeros]
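
The two imbalance percentages can be reproduced from the category counts listed under 용도 and 구조물형태. The sketch below assumes the score is one minus the normalized Shannon entropy of the class frequencies. That formula matches both reported values, but the function name and structure here are illustrative, not the library's own code.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a perfectly balanced variable,
    approaching 1 when almost all rows fall into a single class."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

usage_counts = [1229, 990, 53, 7, 4, 1, 1]   # 용도, 7 classes
structure_counts = [2083, 202]               # 구조물형태, 2 classes

print(f"용도: {imbalance_score(usage_counts):.1%}")
print(f"구조물형태: {imbalance_score(structure_counts):.1%}")
```

Both variables are dominated by one or two classes, so the entropy is well below its maximum and the score exceeds one half.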

Reproduction

Analysis started: 2024-04-29 16:38:34.534917
Analysis finished: 2024-04-29 16:38:35.634549
Duration: 1.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
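
A report of this kind could be regenerated roughly as follows. This is a sketch under assumptions: the CSV path and the `cp949` encoding are hypothetical (this document does not state the file name or encoding), and the exact options used for this report are in its `config.json`, which is not reproduced here.

```python
# Dataset metadata as shown in the "Dataset" section of this report.
DATASET_META = {
    "description": "파일 다운로드",
    "author": "서울교통공사",
    "url": "https://data.seoul.go.kr/dataList/OA-12926/F/1/datasetView.do",
}

def build_report(csv_path, encoding="cp949"):
    """Profile a local copy of the dataset and return the report object.

    csv_path and encoding are assumptions; adjust them to the downloaded file.
    """
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    return ProfileReport(df, title="Dataset profile", dataset=DATASET_META)
```

For example, `build_report("dataset.csv").to_file("report.html")` would write an HTML report, where `dataset.csv` stands for a local copy of the file downloaded from the URL above.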

Variables

호선 (subway line number)
Real number (ℝ)

Distinct: 8
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.5698031
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 20.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 6
95-th percentile: 7
Maximum: 8
Range: 7
Interquartile range (IQR): 3
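
These quantiles can be checked against the per-value counts for 호선 given in this section. A minimal sketch with the Python standard library; the `inclusive` method interpolates linearly between order statistics, which is assumed here to match the report's convention and does reproduce its quartiles.

```python
import statistics

# Value -> count for 호선, taken from the frequency table in this section.
line_counts = {1: 111, 2: 439, 3: 270, 4: 178, 5: 459, 6: 273, 7: 442, 8: 113}

# Expand the frequency table back into a sorted list of observations.
values = [v for v, c in sorted(line_counts.items()) for _ in range(c)]

q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
value_range = max(values) - min(values)

print(q1, median, q3, iqr, value_range)
```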

Descriptive statistics

Standard deviation: 2.052306
Coefficient of variation (CV): 0.44910163
Kurtosis: -1.2458931
Mean: 4.5698031
Median Absolute Deviation (MAD): 2
Skewness: -0.09314761
Sum: 10442
Variance: 4.2119599
Monotonicity: Increasing
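
The sum, mean, variance, standard deviation, and coefficient of variation can all be recomputed from the per-value counts for 호선. The sketch below uses the sample variance (divisor n - 1), which is the convention that reproduces the reported 4.2119599.

```python
import math

# Value -> count for 호선, taken from the frequency table in this section.
line_counts = {1: 111, 2: 439, 3: 270, 4: 178, 5: 459, 6: 273, 7: 442, 8: 113}

n = sum(line_counts.values())
total = sum(v * c for v, c in line_counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (v - mean) ** 2 for v, c in line_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total, round(mean, 7), round(variance, 7), round(std, 6), round(cv, 8))
```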
Histogram with fixed size bins (bins=8)

Most frequent values

Value  Count  Frequency (%)
5      459    20.1%
7      442    19.3%
2      439    19.2%
6      273    11.9%
3      270    11.8%
4      178    7.8%
8      113    4.9%
1      111    4.9%

Values in ascending order

Value  Count  Frequency (%)
1      111    4.9%
2      439    19.2%
3      270    11.8%
4      178    7.8%
5      459    20.1%
6      273    11.9%
7      442    19.3%
8      113    4.9%

Values in descending order

Value  Count  Frequency (%)
8      113    4.9%
7      442    19.3%
6      273    11.9%
5      459    20.1%
4      178    7.8%
3      270    11.8%
2      439    19.2%
1      111    4.9%

번호 (number)
Text

Distinct: 527
Distinct (%): 23.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 18.0 KiB

Length

Max length: 6
Median length: 3
Mean length: 2.6908932
Min length: 1

Characters and Unicode

Total characters: 6146
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
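
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The three strings are values that appear in this report's sample rows; the function name is illustrative. The report's labels map to the two-letter codes as follows: Decimal Number = `Nd`, Dash Punctuation = `Pd`, Other Punctuation = `Po`, Other Letter = `Lo`, Math Symbol = `Sm`.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character of the given strings."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# "\u223c" is TILDE OPERATOR, a different code point from the ASCII tilde "~".
samples = ["026-1", "서울역~시청", "강동구청\u223c토성"]
counts = category_counts(samples)

print(counts)
print(unicodedata.name("~"), "/", unicodedata.name("\u223c"))
```

The standard library exposes categories and character names but not scripts or blocks, so those two breakdowns would need an additional data source.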

Unique

Unique: 97
Unique (%): 4.2%
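
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch with hypothetical toy data (not taken from the dataset), followed by a check of the reported percentages under the assumption that the denominator is the number of non-missing rows:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique, non_missing) for a column, ignoring None."""
    present = [v for v in values if v is not None]
    freq = Counter(present)
    distinct = len(freq)                                 # different values
    unique = sum(1 for c in freq.values() if c == 1)     # values seen exactly once
    return distinct, unique, len(present)

toy = ["3", "4", "4", "56", "026-1", None]  # hypothetical values
print(distinct_and_unique(toy))

non_missing = 2285 - 1  # 번호 has one missing value
print(f"distinct: {100 * 527 / non_missing:.1f}%  unique: {100 * 97 / non_missing:.1f}%")
```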

Sample

1st row: 3
2nd row: 4
3rd row: 5
4th row: 6
5th row: 7

Most frequent values

Value               Count  Frequency (%)
3                   8      0.4%
56                  8      0.4%
92                  8      0.4%
81                  8      0.4%
79                  8      0.4%
78                  8      0.4%
77                  8      0.4%
76                  8      0.4%
75                  8      0.4%
74                  8      0.4%
Other values (517)  2204   96.5%

Most occurring characters

Character         Count  Frequency (%)
1                 1150   18.7%
2                 916    14.9%
3                 781    12.7%
4                 581    9.5%
5                 465    7.6%
6                 460    7.5%
7                 448    7.3%
0                 436    7.1%
9                 425    6.9%
8                 415    6.8%
Other values (2)  69     1.1%

Most occurring categories

Category           Count  Frequency (%)
Decimal Number     6077   98.9%
Dash Punctuation   67     1.1%
Other Punctuation  2      < 0.1%

Most frequent character per category

Decimal Number

Character  Count  Frequency (%)
1          1150   18.9%
2          916    15.1%
3          781    12.9%
4          581    9.6%
5          465    7.7%
6          460    7.6%
7          448    7.4%
0          436    7.2%
9          425    7.0%
8          415    6.8%

Dash Punctuation

Character  Count  Frequency (%)
-          67     100.0%

Other Punctuation

Character  Count  Frequency (%)
,          2      100.0%

Most occurring scripts

Script  Count  Frequency (%)
Common  6146   100.0%

Most frequent character per script

Common: identical to the "Most occurring characters" table above, since every character belongs to this script.

Most occurring blocks

Block  Count  Frequency (%)
ASCII  6146   100.0%

Most frequent character per block

ASCII: identical to the "Most occurring characters" table above, since every character belongs to this block.

구간 (section)
Text

Distinct: 502
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 15
Median length: 12
Mean length: 4.5089716
Min length: 2

Characters and Unicode

Total characters: 10303
Distinct characters: 216
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 0.8%

Sample

1st row: 서울역
2nd row: 서울역
3rd row: 서울역
4th row: 서울역
5th row: 서울역~시청

Most frequent values

Value               Count  Frequency (%)
종로3가             20     0.9%
영등포구청          18     0.8%
충정로              17     0.7%
사당                14     0.6%
신설동~용두         14     0.6%
시청                13     0.6%
합정                12     0.5%
삼각지              12     0.5%
신설동              12     0.5%
잠원~고속터미널     12     0.5%
Other values (492)  2141   93.7%

In the character tables below, "(Hangul)" marks a row whose Hangul character is missing from this copy of the report.

Most occurring characters

Character           Count  Frequency (%)
~                   857    8.3%
(Hangul)            378    3.7%
(Hangul)            329    3.2%
(Hangul)            317    3.1%
(Hangul)            252    2.4%
(Hangul)            215    2.1%
(Hangul)            177    1.7%
(Hangul)            169    1.6%
(Hangul)            164    1.6%
(Hangul)            148    1.4%
Other values (206)  7297   70.8%

Most occurring categories

Category           Count  Frequency (%)
Other Letter       9196   89.3%
Math Symbol        970    9.4%
Decimal Number     100    1.0%
Dash Punctuation   27     0.3%
Uppercase Letter   6      0.1%
Open Punctuation   2      < 0.1%
Close Punctuation  2      < 0.1%

Most frequent character per category

Other Letter

Character           Count  Frequency (%)
(Hangul)            378    4.1%
(Hangul)            329    3.6%
(Hangul)            317    3.4%
(Hangul)            252    2.7%
(Hangul)            215    2.3%
(Hangul)            177    1.9%
(Hangul)            169    1.8%
(Hangul)            164    1.8%
(Hangul)            148    1.6%
(Hangul)            145    1.6%
Other values (197)  6902   75.1%

Decimal Number

Character  Count  Frequency (%)
3          57     57.0%
4          24     24.0%
5          19     19.0%

Math Symbol

Character  Count  Frequency (%)
~          857    88.4%
∼          113    11.6%

Dash Punctuation

Character  Count  Frequency (%)
-          27     100.0%

Uppercase Letter

Character  Count  Frequency (%)
U          6      100.0%

Open Punctuation

Character  Count  Frequency (%)
(          2      100.0%

Close Punctuation

Character  Count  Frequency (%)
)          2      100.0%

Most occurring scripts

Script  Count  Frequency (%)
Hangul  9196   89.3%
Common  1101   10.7%
Latin   6      0.1%

Most frequent character per script

Hangul: identical to the "Other Letter" table above.

Common

Character  Count  Frequency (%)
~          857    77.8%
∼          113    10.3%
3          57     5.2%
-          27     2.5%
4          24     2.2%
5          19     1.7%
(          2      0.2%
)          2      0.2%

Latin

Character  Count  Frequency (%)
U          6      100.0%

Most occurring blocks

Block           Count  Frequency (%)
Hangul          9196   89.3%
ASCII           994    9.6%
Math Operators  113    1.1%

Most frequent character per block

ASCII

Character  Count  Frequency (%)
~          857    86.2%
3          57     5.7%
-          27     2.7%
4          24     2.4%
5          19     1.9%
U          6      0.6%
(          2      0.2%
)          2      0.2%

Hangul: identical to the "Other Letter" table above.

Math Operators

Character  Count  Frequency (%)
∼          113    100.0%

용도 (use)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 7
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.1417943
Min length: 2

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 역사
2nd row: 역사
3rd row: 역사
4th row: 역사
5th row: 본선

Common Values

Value         Meaning                    Count  Frequency (%)
역사          station                    1229   53.8%
본선          main line                  990    43.3%
역사(변전실)  station (substation room)  53     2.3%
본선(변전실)  main line (substation room)  7    0.3%
본선(유치선)  main line (storage track)  4      0.2%
본선(폐)      main line (closed)         1      < 0.1%
출고선        depot departure track      1      < 0.1%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: bar chart of the common values listed in the table above

기능 (function)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.0013129
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 급기
2nd row: 배기
3rd row: 배기
4th row: 급기
5th row: 자연

Common Values

Value     Meaning           Count  Frequency (%)
배기      exhaust           1058   46.3%
급기      air supply        913    40.0%
자연      natural           313    13.7%
배기(폐)  exhaust (closed)  1      < 0.1%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: bar chart of the common values listed in the table above

위치 (location)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 5
Median length: 2
Mean length: 2.0013129
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 기타
2nd row: 기타
3rd row: 녹지
4th row: 녹지
5th row: 보도

Common Values

Value     Meaning         Count  Frequency (%)
보도      sidewalk        1696   74.2%
녹지      green space     332    14.5%
차도      roadway         171    7.5%
기타      other           85     3.7%
기타(폐)  other (closed)  1      < 0.1%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: bar chart of the common values listed in the table above

높이(m) (height, m)
Real number (ℝ)

ZEROS

Distinct: 282
Distinct (%): 12.4%
Missing: 2
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.2554446
Minimum: 0
Maximum: 13
Zeros: 26
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 20.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.15
Q1: 0.6
Median: 1.3
Q3: 1.66
95-th percentile: 2.279
Maximum: 13
Range: 13
Interquartile range (IQR): 1.06

Descriptive statistics

Standard deviation: 0.8279135
Coefficient of variation (CV): 0.65945842
Kurtosis: 25.670949
Mean: 1.2554446
Median Absolute Deviation (MAD): 0.52
Skewness: 2.8228095
Sum: 2866.18
Variance: 0.68544076
Monotonicity: Not monotonic
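
The reported mean for 높이(m) is consistent with dividing the sum by the number of non-missing rows rather than by all 2285 observations. A short check using only figures from this section:

```python
n_observations = 2285
n_missing = 2
total_height = 2866.18     # reported Sum
std = 0.8279135            # reported standard deviation
n_zeros = 26

n_valid = n_observations - n_missing
mean = total_height / n_valid                 # missing rows are excluded
mean_if_all_rows = total_height / n_observations
cv = std / mean                               # coefficient of variation
zeros_pct = 100 * n_zeros / n_observations

print(f"mean over valid rows: {mean:.7f}")
print(f"mean over all rows:   {mean_if_all_rows:.7f}")
print(f"CV: {cv:.6f}  zeros: {zeros_pct:.1f}%")
```

Only the first mean agrees with the reported 1.2554446, so the two missing heights are dropped rather than counted as zero.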
Histogram with fixed size bins (bins=50)

Most frequent values

Value               Count  Frequency (%)
0.6                 140    6.1%
1.5                 106    4.6%
1.6                 75     3.3%
1.2                 56     2.5%
2.0                 53     2.3%
1.4                 51     2.2%
1.7                 50     2.2%
1.3                 48     2.1%
0.7                 45     2.0%
0.5                 38     1.7%
Other values (272)  1621   70.9%

Smallest values

Value  Count  Frequency (%)
0.0    26     1.1%
0.02   2      0.1%
0.03   1      < 0.1%
0.04   4      0.2%
0.05   2      0.1%
0.06   1      < 0.1%
0.07   3      0.1%
0.08   8      0.4%
0.09   3      0.1%
0.1    30     1.3%

Largest values

Value  Count  Frequency (%)
13.0   1      < 0.1%
7.6    4      0.2%
7.0    1      < 0.1%
5.8    1      < 0.1%
5.5    1      < 0.1%
5.0    5      0.2%
4.89   1      < 0.1%
4.4    1      < 0.1%
4.36   1      < 0.1%
4.3    2      0.1%

구조물형태 (structure type)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 3
Median length: 2
Mean length: 2.0884026
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 탑형
2nd row: 탑형
3rd row: 지면형
4th row: 탑형
5th row: 탑형

Common Values

Value   Meaning            Count  Frequency (%)
탑형    tower type         2083   91.2%
지면형  ground-level type  202    8.8%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: bar chart of the common values listed in the table above

필터유무 (filter presence)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.0 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

The five sample values are single characters that are missing from this copy of the report.

Common Values

The two category labels are likewise missing; only their counts survive.

Value            Count  Frequency (%)
(not preserved)  1372   60.0%
(not preserved)  913    40.0%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Figure: bar chart of the common values listed in the table above

Interactions

Figure: four interaction plots

Correlations

            호선    용도    기능    위치    높이(m)  구조물형태  필터유무
호선        1.000  0.194  0.614  0.319  0.249    0.331       0.250
용도        0.194  1.000  0.753  0.665  0.294    0.092       0.227
기능        0.614  0.753  1.000  0.665  0.289    0.132       1.000
위치        0.319  0.665  0.665  1.000  0.199    0.094       0.080
높이(m)     0.249  0.294  0.289  0.199  1.000    0.293       0.301
구조물형태  0.331  0.092  0.132  0.094  0.293    1.000       0.140
필터유무    0.250  0.227  1.000  0.080  0.301    0.140       1.000

            기능    위치    구조물형태  필터유무  용도
기능        1.000  0.596  0.087       1.000     0.633
위치        0.596  1.000  0.115       0.098     0.508
구조물형태  0.087  0.115  1.000       0.090     0.098
필터유무    1.000  0.098  0.090       1.000     0.242
용도        0.633  0.508  0.098       0.242     1.000

            호선    높이(m)  용도    기능    위치    구조물형태  필터유무
호선        1.000  0.227    0.105  0.312  0.202  0.248       0.187
높이(m)     0.227  1.000    0.106  0.193  0.126  0.300       0.310
용도        0.105  0.106    1.000  0.633  0.508  0.098       0.242
기능        0.312  0.193    0.633  1.000  0.596  0.087       1.000
위치        0.202  0.126    0.508  0.596  1.000  0.115       0.098
구조물형태  0.248  0.300    0.098  0.087  0.115  1.000       0.090
필터유무    0.187  0.310    0.242  1.000  0.098  0.090       1.000
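
The matrix restricted to the five categorical variables looks like a Cramér's V table, although the report text does not name the measure, so that is an assumption. The sketch below implements plain, uncorrected Cramér's V in pure Python; the library may apply a bias correction, so small differences from the reported values would be expected. The toy data is hypothetical, and the `"Y"`/`"N"` filter labels are placeholders because the real 필터유무 values are not preserved in this document.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical rows where the filter flag is fully determined by the function.
function = ["급기", "배기", "자연", "급기", "배기", "자연"]
has_filter = ["Y", "N", "N", "Y", "N", "N"]
print(cramers_v(function, has_filter))

# Two unrelated variables give no association.
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))
```

A value of 1.000 between 기능 and 필터유무 is what this measure returns when one variable determines the other. The matching counts (913 rows with 급기 and 913 rows with one of the two filter values) are consistent with that reading but do not prove it.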

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Index  호선  번호   구간            용도  기능  위치  높이(m)  구조물형태  필터유무
0      1     3      서울역          역사  급기  기타  0.48     탑형
1      1     4      서울역          역사  배기  기타  0.5      탑형
2      1     5      서울역          역사  배기  녹지  0.24     지면형
3      1     6      서울역          역사  급기  녹지  0.55     탑형
4      1     7      서울역~시청     본선  자연  보도  0.64     탑형
5      1     8      서울역~시청     본선  자연  보도  1.74     탑형
6      1     9      서울역~시청     본선  자연  보도  0.67     탑형
7      1     10     서울역~시청     본선  자연  녹지  1.1      탑형
8      1     11     서울역~시청     본선  자연  보도  1.53     탑형
9      1     12     서울역~시청     본선  자연  차도  0.02     지면형

Last rows

Index  호선  번호   구간            용도  기능  위치  높이(m)  구조물형태  필터유무
2275   8     104    강동구청∼토성   본선  배기  녹지  0.6      탑형
2276   8     105    강동구청∼토성   본선  급기  녹지  1.14     탑형
2277   8     106    강동구청∼토성   본선  배기  보도  0.74     탑형
2278   8     107    몽촌토성        역사  배기  녹지  0.59     탑형
2279   8     108    몽촌토성        역사  급기  녹지  1.62     탑형
2280   8     109    몽촌토성        역사  배기  보도  0.6      탑형
2281   8     110    몽촌토성        역사  급기  보도  1.55     탑형
2282   8     111    몽촌토성∼잠실   본선  배기  보도  0.63     탑형
2283   8     026-1  가락시장        역사  자연  보도  1.76     탑형
2284   8     026-2  가락시장        역사  자연  보도  1.76     탑형