Overview

Dataset statistics

Number of variables: 9
Number of observations: 277
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.4 KiB
Average record size in memory: 75.5 B

Variable types

Numeric: 3
Text: 1
Categorical: 4
DateTime: 1

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13293/F/1/datasetView.do

Alerts

연번 is highly overall correlated with 호선 and 2 other fields [High correlation]
호선 is highly overall correlated with 연번 and 3 other fields [High correlation]
안전문 수량 is highly overall correlated with 연번 and 3 other fields [High correlation]
개폐방식 is highly overall correlated with 연번 and 3 other fields [High correlation]
설치업체 is highly overall correlated with 호선 and 1 other field [High correlation]
비고 is highly overall correlated with 안전문 수량 [High correlation]
사업방식 is highly imbalanced (67.1%) [Imbalance]
비고 is highly imbalanced (71.9%) [Imbalance]
연번 has unique values [Unique]
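
The two imbalance percentages in the alerts are consistent with a score of one minus the normalised Shannon entropy of the class counts. The sketch below works under that assumption; it is not a confirmed description of the profiler's internals. `imbalance_score` is a hypothetical helper, and the counts are copied from the Common Values tables for 사업방식 and 비고 in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform distribution, approaching 1 as one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Undefined for a single class; returning 0 here is an assumption.
        return 0.0
    # Shannon entropy in nats; the base cancels out after normalisation.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Class counts taken from this report's Common Values tables.
business_type_counts = [244, 24, 6, 3]            # 사업방식
remarks_counts = [242, 8, 7, 7, 4, 4, 3, 1, 1]    # 비고, with <NA> counted as a class

business_score = imbalance_score(business_type_counts)
remarks_score = imbalance_score(remarks_counts)
print(f"사업방식: {business_score:.1%}")
print(f"비고: {remarks_score:.1%}")
```

Under this assumption the scores round to 67.1% and 71.9%, matching the alert values.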

Reproduction

Analysis started: 2024-04-17 04:25:52.823790
Analysis finished: 2024-04-17 04:25:53.982928
Duration: 1.16 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
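
The reported duration follows from the two timestamps. A minimal check, assuming both use the same clock and time zone (the variable names are illustrative):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2024-04-17 04:25:52.823790", FMT)
finished = datetime.strptime("2024-04-17 04:25:53.982928", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} s")
```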

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 277
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139
Minimum: 1
Maximum: 277
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 14.8
Q1: 70
Median: 139
Q3: 208
95-th percentile: 263.2
Maximum: 277
Range: 276
Interquartile range (IQR): 138

Descriptive statistics

Standard deviation: 80.10722
Coefficient of variation (CV): 0.57631093
Kurtosis: -1.2
Mean: 139
Median Absolute Deviation (MAD): 69
Skewness: 0
Sum: 38503
Variance: 6417.1667
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
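
Common values
Value | Count | Frequency (%)
1 | 1 | 0.4%
184 | 1 | 0.4%
190 | 1 | 0.4%
189 | 1 | 0.4%
188 | 1 | 0.4%
187 | 1 | 0.4%
186 | 1 | 0.4%
185 | 1 | 0.4%
183 | 1 | 0.4%
175 | 1 | 0.4%
Other values (267) | 267 | 96.4%

Extreme values (minimum)
Value | Count | Frequency (%)
1 | 1 | 0.4%
2 | 1 | 0.4%
3 | 1 | 0.4%
4 | 1 | 0.4%
5 | 1 | 0.4%
6 | 1 | 0.4%
7 | 1 | 0.4%
8 | 1 | 0.4%
9 | 1 | 0.4%
10 | 1 | 0.4%

Extreme values (maximum)
Value | Count | Frequency (%)
277 | 1 | 0.4%
276 | 1 | 0.4%
275 | 1 | 0.4%
274 | 1 | 0.4%
273 | 1 | 0.4%
272 | 1 | 0.4%
271 | 1 | 0.4%
270 | 1 | 0.4%
269 | 1 | 0.4%
268 | 1 | 0.4%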
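
The 연번 statistics are what one would expect from a plain running index. The sketch below assumes the column is exactly the integers 1 through 277. That assumption is consistent with the reported minimum, maximum, distinct count, sum, and strict monotonicity, but the raw data is not shown here in full. It recomputes the descriptive statistics with the standard library only.

```python
import math
import statistics

# Assumption: 연번 is exactly 1, 2, ..., 277.
serial = list(range(1, 278))

n = len(serial)
total = sum(serial)
mean = statistics.mean(serial)
median = statistics.median(serial)
variance = statistics.variance(serial)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation
mad = statistics.median([abs(x - median) for x in serial])

print(total, mean, median, mad)
print(round(variance, 4), round(std, 5), round(cv, 8))
```

For the integers 1..n the sample variance has the closed form n(n + 1)/12, which for n = 277 gives 277 × 278 / 12 ≈ 6417.1667, in agreement with the reported variance.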

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.5920578
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 6
95-th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.0080477
Coefficient of variation (CV): 0.43728712
Kurtosis: -1.1589192
Mean: 4.5920578
Median Absolute Deviation (MAD): 2
Skewness: -0.045160859
Sum: 1272
Variance: 4.0322555
Monotonicity: Increasing
Histogram with fixed size bins (bins=8)
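
Common values
Value | Count | Frequency (%)
5 | 56 | 20.2%
2 | 52 | 18.8%
7 | 42 | 15.2%
6 | 39 | 14.1%
3 | 34 | 12.3%
4 | 26 | 9.4%
8 | 18 | 6.5%
1 | 10 | 3.6%

Extreme values (minimum)
Value | Count | Frequency (%)
1 | 10 | 3.6%
2 | 52 | 18.8%
3 | 34 | 12.3%
4 | 26 | 9.4%
5 | 56 | 20.2%
6 | 39 | 14.1%
7 | 42 | 15.2%
8 | 18 | 6.5%

Extreme values (maximum)
Value | Count | Frequency (%)
8 | 18 | 6.5%
7 | 42 | 15.2%
6 | 39 | 14.1%
5 | 56 | 20.2%
4 | 26 | 9.4%
3 | 34 | 12.3%
2 | 52 | 18.8%
1 | 10 | 3.6%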
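
Because 호선 takes only eight distinct values, its summary statistics can be recovered from the frequency table alone. The following is a small sketch of that weighted calculation. The counts are copied from the table, and the variable names are illustrative.

```python
import math

# 호선 value -> count, copied from the frequency table.
line_counts = {1: 10, 2: 52, 3: 34, 4: 26, 5: 56, 6: 39, 7: 42, 8: 18}

n = sum(line_counts.values())
total = sum(value * count for value, count in line_counts.items())
mean = total / n

# Sample variance (n - 1 denominator), weighting each squared deviation by its count.
squared_dev = sum(count * (value - mean) ** 2 for value, count in line_counts.items())
variance = squared_dev / (n - 1)
std = math.sqrt(variance)

# Relative frequency of each line, as shown in the Frequency (%) column.
share = {value: count / n for value, count in line_counts.items()}

print(n, total, round(mean, 7), round(variance, 7), round(std, 7))
print(f"Line 5 share: {share[5]:.1%}")
```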

역사명
Text

Distinct: 243
Distinct (%): 87.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.9458484
Min length: 2

Characters and Unicode

Total characters: 816
Distinct characters: 210
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 211
Unique (%): 76.2%

Sample

1st row: 서울
2nd row: 시청
3rd row: 종각
4th row: 종로3가
5th row: 종로5가

Common values
Value | Count | Frequency (%)
종로3가 | 3 | 1.1%
동대문역사문화공원 | 3 | 1.1%
충무로 | 2 | 0.7%
노원 | 2 | 0.7%
천호 | 2 | 0.7%
사당 | 2 | 0.7%
교대 | 2 | 0.7%
고속터미널 | 2 | 0.7%
삼각지 | 2 | 0.7%
가락시장 | 2 | 0.7%
Other values (233) | 255 | 92.1%

Most occurring characters

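Value | Count | Frequency (%)
(missing) | 32 | 3.9%
(missing) | 27 | 3.3%
(missing) | 25 | 3.1%
(missing) | 22 | 2.7%
(missing) | 20 | 2.5%
(missing) | 16 | 2.0%
(missing) | 15 | 1.8%
(missing) | 15 | 1.8%
(missing) | 14 | 1.7%
(missing) | 14 | 1.7%
Other values (200) | 616 | 75.5%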

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 802 | 98.3%
Decimal Number | 8 | 1.0%
Close Punctuation | 3 | 0.4%
Open Punctuation | 3 | 0.4%

Most frequent character per category

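Other Letter
Value | Count | Frequency (%)
(missing) | 32 | 4.0%
(missing) | 27 | 3.4%
(missing) | 25 | 3.1%
(missing) | 22 | 2.7%
(missing) | 20 | 2.5%
(missing) | 16 | 2.0%
(missing) | 15 | 1.9%
(missing) | 15 | 1.9%
(missing) | 14 | 1.7%
(missing) | 14 | 1.7%
Other values (195) | 602 | 75.1%

Decimal Number
Value | Count | Frequency (%)
3 | 5 | 62.5%
4 | 2 | 25.0%
5 | 1 | 12.5%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%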

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 802 | 98.3%
Common | 14 | 1.7%

Most frequent character per script

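Hangul
Value | Count | Frequency (%)
(missing) | 32 | 4.0%
(missing) | 27 | 3.4%
(missing) | 25 | 3.1%
(missing) | 22 | 2.7%
(missing) | 20 | 2.5%
(missing) | 16 | 2.0%
(missing) | 15 | 1.9%
(missing) | 15 | 1.9%
(missing) | 14 | 1.7%
(missing) | 14 | 1.7%
Other values (195) | 602 | 75.1%

Common
Value | Count | Frequency (%)
3 | 5 | 35.7%
) | 3 | 21.4%
( | 3 | 21.4%
4 | 2 | 14.3%
5 | 1 | 7.1%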

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 802 | 98.3%
ASCII | 14 | 1.7%

Most frequent character per block

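Hangul
Value | Count | Frequency (%)
(missing) | 32 | 4.0%
(missing) | 27 | 3.4%
(missing) | 25 | 3.1%
(missing) | 22 | 2.7%
(missing) | 20 | 2.5%
(missing) | 16 | 2.0%
(missing) | 15 | 1.9%
(missing) | 15 | 1.9%
(missing) | 14 | 1.7%
(missing) | 14 | 1.7%
Other values (195) | 602 | 75.1%

ASCII
Value | Count | Frequency (%)
3 | 5 | 35.7%
) | 3 | 21.4%
( | 3 | 21.4%
4 | 2 | 14.3%
5 | 1 | 7.1%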

안전문 수량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.353791
Minimum: 24
Maximum: 160
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 24
5-th percentile: 48
Q1: 64
Median: 64
Q3: 80
95-th percentile: 80
Maximum: 160
Range: 136
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 17.006534
Coefficient of variation (CV): 0.24172875
Kurtosis: 5.1158302
Mean: 70.353791
Median Absolute Deviation (MAD): 16
Skewness: 1.0011642
Sum: 19488
Variance: 289.2222
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
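
Common values
Value | Count | Frequency (%)
64 | 120 | 43.3%
80 | 112 | 40.4%
48 | 21 | 7.6%
32 | 11 | 4.0%
128 | 7 | 2.5%
96 | 3 | 1.1%
160 | 1 | 0.4%
24 | 1 | 0.4%
120 | 1 | 0.4%

Extreme values (minimum)
Value | Count | Frequency (%)
24 | 1 | 0.4%
32 | 11 | 4.0%
48 | 21 | 7.6%
64 | 120 | 43.3%
80 | 112 | 40.4%
96 | 3 | 1.1%
120 | 1 | 0.4%
128 | 7 | 2.5%
160 | 1 | 0.4%

Extreme values (maximum)
Value | Count | Frequency (%)
160 | 1 | 0.4%
128 | 7 | 2.5%
120 | 1 | 0.4%
96 | 3 | 1.1%
80 | 112 | 40.4%
64 | 120 | 43.3%
48 | 21 | 7.6%
32 | 11 | 4.0%
24 | 1 | 0.4%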

개폐방식
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
ATO+RF 방식: 136
RF+센서 방식: 62
센서 방식: 60
ATO+RF+센서 방식: 19

Length

Max length: 12
Median length: 9
Mean length: 8.1155235
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RF+센서 방식
2nd row: RF+센서 방식
3rd row: RF+센서 방식
4th row: RF+센서 방식
5th row: RF+센서 방식

Common Values

Value | Count | Frequency (%)
ATO+RF 방식 | 136 | 49.1%
RF+센서 방식 | 62 | 22.4%
센서 방식 | 60 | 21.7%
ATO+RF+센서 방식 | 19 | 6.9%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words
Value | Count | Frequency (%)
방식 | 277 | 50.0%
ato+rf | 136 | 24.5%
rf+센서 | 62 | 11.2%
센서 | 60 | 10.8%
ato+rf+센서 | 19 | 3.4%

사업방식
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
자체예산: 244
민자유치: 24
위수탁: 6
서울시(신설역): 3

Length

Max length: 8
Median length: 4
Mean length: 4.0216606
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 민자유치
2nd row: 민자유치
3rd row: 자체예산
4th row: 민자유치
5th row: 자체예산

Common Values

Value | Count | Frequency (%)
자체예산 | 244 | 88.1%
민자유치 | 24 | 8.7%
위수탁 | 6 | 2.2%
서울시(신설역) | 3 | 1.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words
Value | Count | Frequency (%)
자체예산 | 244 | 88.1%
민자유치 | 24 | 8.7%
위수탁 | 6 | 2.2%
서울시(신설역 | 3 | 1.1%

준공일
DateTime

Distinct: 46
Distinct (%): 16.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
Minimum: 2005-11-04 00:00:00
Maximum: 2021-12-28 00:00:00
Histogram with fixed size bins (bins=46)

설치업체
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
현대엘리베이터: 105
GS네오텍: 42
삼중테크: 34
삼성SDS: 27
도철PSD: 20
Other values (7): 49

Length

Max length: 7
Median length: 6
Mean length: 5.5956679
Min length: 3

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: 현대엘리베이터
2nd row: 현대엘리베이터
3rd row: 현대엘리베이터
4th row: 현대엘리베이터
5th row: 현대엘리베이터

Common Values

Value | Count | Frequency (%)
현대엘리베이터 | 105 | 37.9%
GS네오텍 | 42 | 15.2%
삼중테크 | 34 | 12.3%
삼성SDS | 27 | 9.7%
도철PSD | 20 | 7.2%
포스콘 | 20 | 7.2%
피에쓰에쓰텍 | 14 | 5.1%
㈜포스코ICT | 8 | 2.9%
서윤산업 | 3 | 1.1%
현대무벡스 | 2 | 0.7%
Other values (2) | 2 | 0.7%

Length

[Figure: histogram of lengths of the category]

Words
Value | Count | Frequency (%)
현대엘리베이터 | 105 | 37.9%
gs네오텍 | 42 | 15.2%
삼중테크 | 34 | 12.3%
삼성sds | 27 | 9.7%
도철psd | 20 | 7.2%
포스콘 | 20 | 7.2%
피에쓰에쓰텍 | 14 | 5.1%
㈜포스코ict | 8 | 2.9%
서윤산업 | 3 | 1.1%
현대무벡스 | 2 | 0.7%
Other values (2) | 2 | 0.7%

비고
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 9
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
<NA>: 242
재시공역: 8
4트랙: 7
1트랙: 7
성수지선 4량: 4
Other values (4): 9

Length

Max length: 12
Median length: 4
Mean length: 4.0577617
Min length: 3

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 242 | 87.4%
재시공역 | 8 | 2.9%
4트랙 | 7 | 2.5%
1트랙 | 7 | 2.5%
성수지선 4량 | 4 | 1.4%
3트랙 | 4 | 1.4%
신정지선 6량 | 3 | 1.1%
재시공역(4트랙) | 1 | 0.4%
신정지선 6량(1트랙) | 1 | 0.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words
Value | Count | Frequency (%)
na | 242 | 84.9%
재시공역 | 8 | 2.8%
4트랙 | 7 | 2.5%
1트랙 | 7 | 2.5%
성수지선 | 4 | 1.4%
4량 | 4 | 1.4%
3트랙 | 4 | 1.4%
신정지선 | 4 | 1.4%
6량 | 3 | 1.1%
재시공역(4트랙 | 1 | 0.4%

Interactions

[Figure: 9 pairwise interaction plots]

Correlations

Variable | 연번 | 호선 | 안전문 수량 | 개폐방식 | 사업방식 | 준공일 | 설치업체 | 비고
연번 | 1.000 | 0.917 | 0.721 | 0.882 | 0.486 | 0.875 | 0.784 | 0.744
호선 | 0.917 | 1.000 | 0.763 | 0.982 | 0.607 | 0.927 | 0.865 | 0.658
안전문 수량 | 0.721 | 0.763 | 1.000 | 0.734 | 0.262 | 0.762 | 0.738 | 0.940
개폐방식 | 0.882 | 0.982 | 0.734 | 1.000 | 0.636 | 0.862 | 0.829 | 0.826
사업방식 | 0.486 | 0.607 | 0.262 | 0.636 | 1.000 | 0.947 | 0.651 | 0.000
준공일 | 0.875 | 0.927 | 0.762 | 0.862 | 0.947 | 1.000 | 0.995 | 0.899
설치업체 | 0.784 | 0.865 | 0.738 | 0.829 | 0.651 | 0.995 | 1.000 | 0.704
비고 | 0.744 | 0.658 | 0.940 | 0.826 | 0.000 | 0.899 | 0.704 | 1.000

Variable | 개폐방식 | 비고 | 사업방식 | 설치업체
개폐방식 | 1.000 | 0.458 | 0.294 | 0.515
비고 | 0.458 | 1.000 | 0.000 | 0.426
사업방식 | 0.294 | 0.000 | 1.000 | 0.348
설치업체 | 0.515 | 0.426 | 0.348 | 1.000

Variable | 연번 | 호선 | 안전문 수량 | 개폐방식 | 사업방식 | 설치업체 | 비고
연번 | 1.000 | 0.988 | -0.628 | 0.742 | 0.308 | 0.477 | 0.452
호선 | 0.988 | 1.000 | -0.644 | 0.812 | 0.305 | 0.587 | 0.451
안전문 수량 | -0.628 | -0.644 | 1.000 | 0.605 | 0.181 | 0.474 | 0.832
개폐방식 | 0.742 | 0.812 | 0.605 | 1.000 | 0.294 | 0.515 | 0.458
사업방식 | 0.308 | 0.305 | 0.181 | 0.294 | 1.000 | 0.348 | 0.000
설치업체 | 0.477 | 0.587 | 0.474 | 0.515 | 0.348 | 1.000 | 0.426
비고 | 0.452 | 0.451 | 0.832 | 0.458 | 0.000 | 0.426 | 1.000

Missing values

[Figure: count plot] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

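First rows
Index | 연번 | 호선 | 역사명 | 안전문 수량 | 개폐방식 | 사업방식 | 준공일 | 설치업체 | 비고
0 | 1 | 1 | 서울 | 80 | RF+센서 방식 | 민자유치 | 2007-11-01 | 현대엘리베이터 | <NA>
1 | 2 | 1 | 시청 | 80 | RF+센서 방식 | 민자유치 | 2007-12-03 | 현대엘리베이터 | <NA>
2 | 3 | 1 | 종각 | 80 | RF+센서 방식 | 자체예산 | 2009-12-29 | 현대엘리베이터 | <NA>
3 | 4 | 1 | 종로3가 | 80 | RF+센서 방식 | 민자유치 | 2008-01-03 | 현대엘리베이터 | <NA>
4 | 5 | 1 | 종로5가 | 80 | RF+센서 방식 | 자체예산 | 2009-12-29 | 현대엘리베이터 | <NA>
5 | 6 | 1 | 동대문 | 80 | RF+센서 방식 | 자체예산 | 2008-06-18 | 서윤산업 | <NA>
6 | 7 | 1 | 동묘앞 | 80 | RF+센서 방식 | 자체예산 | 2006-01-10 | 현대엘리베이터 | <NA>
7 | 8 | 1 | 신설동 | 80 | RF+센서 방식 | 자체예산 | 2009-12-29 | 현대엘리베이터 | <NA>
8 | 9 | 1 | 제기동 | 80 | RF+센서 방식 | 자체예산 | 2009-12-29 | 현대엘리베이터 | <NA>
9 | 10 | 1 | 청량리 | 80 | RF+센서 방식 | 자체예산 | 2009-12-29 | 현대엘리베이터 | <NA>

Last rows
Index | 연번 | 호선 | 역사명 | 안전문 수량 | 개폐방식 | 사업방식 | 준공일 | 설치업체 | 비고
267 | 268 | 8 | 문정 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
268 | 269 | 8 | 장지 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
269 | 270 | 8 | 복정 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
270 | 271 | 8 | 남위례 | 48 | ATO+RF 방식 | 위수탁 | 2021-12-28 | 현대무벡스 | <NA>
271 | 272 | 8 | 산성 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
272 | 273 | 8 | 남한산성입구 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
273 | 274 | 8 | 단대오거리 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
274 | 275 | 8 | 신흥 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
275 | 276 | 8 | 수진 | 48 | ATO+RF 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>
276 | 277 | 8 | 모란 | 48 | ATO+RF+센서 방식 | 자체예산 | 2009-12-30 | 도철PSD | <NA>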