Overview

Dataset statistics

Number of variables: 8
Number of observations: 285
Missing cells: 2
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.8 KiB
Average record size in memory: 67.5 B
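
The percentages above are ratios over the whole table: missing cells are counted against rows × variables, and the average record size is the total memory divided by the row count. A minimal sketch of that arithmetic in Python, using only the reported figures (the variable names are illustrative):

```python
# Overview figures as reported above.
n_rows = 285
n_vars = 8
missing_cells = 2
total_size_kib = 18.8  # already rounded to one decimal in the report

# Missing cells are counted against every cell in the table.
total_cells = n_rows * n_vars
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory spread over the rows.
# Because 18.8 KiB is rounded, this only approximates the reported 67.5 B.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: ~{avg_record_bytes:.1f} B")
```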

Variable types

Numeric: 3
Text: 1
Categorical: 3
DateTime: 1

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-13293/F/1/datasetView.do

Alerts

연번 (serial number) is highly correlated overall with 호선 and 2 other fields [High correlation]
호선 (subway line) is highly correlated overall with 연번 and 3 other fields [High correlation]
안전문수량 (number of platform screen doors) is highly correlated overall with 연번 and 2 other fields [High correlation]
개폐방식 (door opening/closing method) is highly correlated overall with 연번 and 3 other fields [High correlation]
설치업체 (installation contractor) is highly correlated overall with 호선 and 1 other field [High correlation]
사업방식 (funding scheme) is highly imbalanced (61.2%) [Imbalance]

Reproduction

Analysis started: 2024-04-17 04:25:48.709325
Analysis finished: 2024-04-17 04:25:49.834510
Duration: 1.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 283
Distinct (%): 100.0%
Missing: 2
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 142
Minimum: 1
Maximum: 283
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 15.1
Q1: 71.5
Median: 142
Q3: 212.5
95th percentile: 268.9
Maximum: 283
Range: 282
Interquartile range (IQR): 141

Descriptive statistics

Standard deviation: 81.839273
Coefficient of variation (CV): 0.57633291
Kurtosis: -1.2
Mean: 142
Median absolute deviation (MAD): 71
Skewness: 0
Sum: 40186
Variance: 6697.6667
Monotonicity: Strictly increasing
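
Because 연번 is the run of integers 1 to 283 once its two missing values are dropped (283 distinct values, minimum 1, maximum 283, strictly increasing), the statistics above can be recomputed from that range alone. A sketch with Python's standard `statistics` module; the choice of `method="inclusive"` (linear-interpolation quantiles) is ours, made because it reproduces the reported percentiles, and is not necessarily how ydata-profiling computes them internally:

```python
import statistics

# 연번 with its 2 missing values dropped: the integers 1..283.
values = list(range(1, 284))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean  # coefficient of variation

# Quartiles and 5th/95th percentiles by linear interpolation.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

# Median absolute deviation: median distance from the median.
mad = statistics.median([abs(v - median) for v in values])

print(f"mean={mean} variance={variance:.4f} std={std:.6f} cv={cv:.8f}")
print(f"p5={p5:.1f} Q1={q1} median={median} Q3={q3} p95={p95:.1f}")
print(f"IQR={q3 - q1} MAD={mad}")
```
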
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value               Count  Frequency (%)
188                 1      0.4%
194                 1      0.4%
193                 1      0.4%
192                 1      0.4%
191                 1      0.4%
190                 1      0.4%
189                 1      0.4%
187                 1      0.4%
196                 1      0.4%
186                 1      0.4%
Other values (273)  273    95.8%
(Missing)           2      0.7%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.4%
2      1      0.4%
3      1      0.4%
4      1      0.4%
5      1      0.4%
6      1      0.4%
7      1      0.4%
8      1      0.4%
9      1      0.4%
10     1      0.4%

Maximum 10 values

Value  Count  Frequency (%)
283    1      0.4%
282    1      0.4%
281    1      0.4%
280    1      0.4%
279    1      0.4%
278    1      0.4%
277    1      0.4%
276    1      0.4%
275    1      0.4%
274    1      0.4%

호선 (subway line)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6561404
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 5
Q3: 6
95th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.014299
Coefficient of variation (CV): 0.43261132
Kurtosis: -1.1907495
Mean: 4.6561404
Median absolute deviation (MAD): 2
Skewness: -0.10227487
Sum: 1327
Variance: 4.0574005
Monotonicity: Increasing
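
With only eight distinct lines, the 호선 statistics can be rebuilt from the value counts listed below (for example, 56 rows for line 5 and 10 for line 1). A minimal sketch that expands the counts into individual observations and recomputes a few of the figures above, again assuming linear-interpolation quantiles:

```python
import statistics

# Value counts for 호선 as reported: line number -> number of rows.
counts = {1: 10, 2: 52, 3: 34, 4: 26, 5: 56, 6: 39, 7: 51, 8: 17}

# Expand the frequency table back into one value per row (285 rows).
values = [line for line, c in sorted(counts.items()) for _ in range(c)]

n = len(values)
total = sum(values)
mean = total / n
variance = statistics.variance(values)  # sample variance
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Distinct (%): distinct values as a share of all rows.
distinct_pct = 100 * len(counts) / n

print(f"n={n} sum={total} mean={mean:.7f} variance={variance:.7f}")
print(f"Q1={q1} median={median} Q3={q3} distinct={distinct_pct:.1f}%")
```
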
[Figure: histogram with fixed size bins (bins=8)]

Common Values

Value  Count  Frequency (%)
5      56     19.6%
2      52     18.2%
7      51     17.9%
6      39     13.7%
3      34     11.9%
4      26     9.1%
8      17     6.0%
1      10     3.5%

Minimum 10 values

Value  Count  Frequency (%)
1      10     3.5%
2      52     18.2%
3      34     11.9%
4      26     9.1%
5      56     19.6%
6      39     13.7%
7      51     17.9%
8      17     6.0%

Maximum 10 values

Value  Count  Frequency (%)
8      17     6.0%
7      51     17.9%
6      39     13.7%
5      56     19.6%
4      26     9.1%
3      34     11.9%
2      52     18.2%
1      10     3.5%

역사명 (station name)
Text

Distinct: 253
Distinct (%): 88.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.9473684
Min length: 2

Characters and Unicode

Total characters: 840
Distinct characters: 217
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
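
Python's `unicodedata` module exposes one of these properties, the general category, for each code point. The sketch below tallies categories for the five sample station names listed under Sample for this variable; the code-to-name mapping covers only the categories this column contains, and that the report applies the same tally to all 285 names is our assumption:

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in 역사명.
CATEGORY_NAMES = {
    "Lo": "Other Letter",       # e.g. Hangul syllables
    "Nd": "Decimal Number",     # digits such as the 3 in 종로3가
    "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation",   # (
    "Pe": "Close Punctuation",  # )
}

# The first five station names from the Sample rows of this variable.
names = ["서울", "시청", "종각", "종로3가", "종로5가"]

category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for name in names
    for ch in name
)
total_chars = sum(len(name) for name in names)
mean_length = total_chars / len(names)

print(dict(category_counts), total_chars, mean_length)
```
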

Unique

Unique: 222
Unique (%): 77.9%

Sample

1st row: 서울
2nd row: 시청
3rd row: 종각
4th row: 종로3가
5th row: 종로5가

Common Values

Value               Count  Frequency (%)
종로3가             3      1.1%
왕십리              2      0.7%
노원                2      0.7%
약수                2      0.7%
고속터미널          2      0.7%
건대입구            2      0.7%
태릉입구            2      0.7%
연신내              2      0.7%
교대                2      0.7%
합정                2      0.7%
Other values (243)  264    92.6%

Most occurring characters

Value               Count  Frequency (%)
…                   32     3.8%
…                   28     3.3%
…                   26     3.1%
…                   25     3.0%
…                   21     2.5%
…                   16     1.9%
…                   15     1.8%
…                   15     1.8%
…                   15     1.8%
…                   14     1.7%
Other values (207)  633    75.4%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       823    98.0%
Decimal Number     8      1.0%
Close Punctuation  3      0.4%
Open Punctuation   3      0.4%
Uppercase Letter   3      0.4%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
…                   32     3.9%
…                   28     3.4%
…                   26     3.2%
…                   25     3.0%
…                   21     2.6%
…                   16     1.9%
…                   15     1.8%
…                   15     1.8%
…                   15     1.8%
…                   14     1.7%
Other values (199)  616    74.8%

Decimal Number

Value  Count  Frequency (%)
3      5      62.5%
4      2      25.0%
5      1      12.5%

Uppercase Letter

Value  Count  Frequency (%)
D      1      33.3%
M      1      33.3%
C      1      33.3%

Close Punctuation

Value  Count  Frequency (%)
)      3      100.0%

Open Punctuation

Value  Count  Frequency (%)
(      3      100.0%
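
Percentages in these per-category tables are relative to the category's own total, not to all 840 characters, so the same count can appear with different frequencies: the most common character is 3.8% of all characters but 3.9% of the 823 Other Letter characters, and the digit 3 is 62.5% of the 8 decimal digits but a smaller share of the larger Common and ASCII groups further down. A quick check of that arithmetic:

```python
def share(count: int, total: int) -> str:
    """Format count as a percentage of total with one decimal, like the report."""
    return f"{100 * count / total:.1f}%"

# Most frequent character: 32 occurrences.
overall = share(32, 840)         # against all characters of 역사명
within_letters = share(32, 823)  # against the Other Letter category only

# The digit 3 (5 occurrences) in three groups of different size:
# Decimal Number (8), Common script (14) and ASCII block (17).
digit_three = [share(5, total) for total in (8, 14, 17)]

print(overall, within_letters, digit_three)
```
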

Most occurring scripts

Value   Count  Frequency (%)
Hangul  823    98.0%
Common  14     1.7%
Latin   3      0.4%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
…                   32     3.9%
…                   28     3.4%
…                   26     3.2%
…                   25     3.0%
…                   21     2.6%
…                   16     1.9%
…                   15     1.8%
…                   15     1.8%
…                   15     1.8%
…                   14     1.7%
Other values (199)  616    74.8%

Common

Value  Count  Frequency (%)
3      5      35.7%
)      3      21.4%
(      3      21.4%
4      2      14.3%
5      1      7.1%

Latin

Value  Count  Frequency (%)
D      1      33.3%
M      1      33.3%
C      1      33.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  823    98.0%
ASCII   17     2.0%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
…                   32     3.9%
…                   28     3.4%
…                   26     3.2%
…                   25     3.0%
…                   21     2.6%
…                   16     1.9%
…                   15     1.8%
…                   15     1.8%
…                   15     1.8%
…                   14     1.7%
Other values (199)  616    74.8%

ASCII

Value  Count  Frequency (%)
3      5      29.4%
)      3      17.6%
(      3      17.6%
4      2      11.8%
D      1      5.9%
M      1      5.9%
C      1      5.9%
5      1      5.9%

안전문수량 (number of platform screen doors)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.231579
Minimum: 24
Maximum: 160
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 24
5th percentile: 48
Q1: 64
Median: 64
Q3: 80
95th percentile: 80
Maximum: 160
Range: 136
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 16.750522
Coefficient of variation (CV): 0.23850413
Kurtosis: 5.3973459
Mean: 70.231579
Median absolute deviation (MAD): 16
Skewness: 1.0467806
Sum: 20016
Variance: 280.57999
Monotonicity: Not monotonic
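
The skewness and kurtosis above agree with the standard bias-corrected sample estimators (adjusted Fisher-Pearson skewness G1 and excess kurtosis G2), which is what pandas documents for `Series.skew` and `Series.kurt`; that ydata-profiling obtains them this way is an assumption. A dependency-free sketch that applies those formulas to the value counts of 안전문수량 listed below:

```python
import math

# Value counts for 안전문수량 (platform screen doors per station), as reported.
counts = {24: 1, 32: 11, 48: 20, 64: 129, 80: 112, 96: 3, 120: 1, 128: 7, 160: 1}

n = sum(counts.values())
mean = sum(v * c for v, c in counts.items()) / n

def central_moment(k: int) -> float:
    """Population central moment m_k = sum(c * (v - mean)**k) / n."""
    return sum(c * (v - mean) ** k for v, c in counts.items()) / n

m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

# Sample variance uses the n - 1 denominator.
sample_variance = m2 * n / (n - 1)

# Adjusted Fisher-Pearson skewness G1.
g1 = m3 / m2 ** 1.5
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Bias-corrected excess kurtosis G2 (0 for a normal distribution).
g2 = m4 / m2 ** 2 - 3
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(n, round(mean, 6), round(sample_variance, 5), round(skewness, 7), round(kurtosis, 7))
```
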
[Figure: histogram with fixed size bins (bins=9)]

Common Values

Value  Count  Frequency (%)
64     129    45.3%
80     112    39.3%
48     20     7.0%
32     11     3.9%
128    7      2.5%
96     3      1.1%
160    1      0.4%
24     1      0.4%
120    1      0.4%

Minimum 10 values

Value  Count  Frequency (%)
24     1      0.4%
32     11     3.9%
48     20     7.0%
64     129    45.3%
80     112    39.3%
96     3      1.1%
120    1      0.4%
128    7      2.5%
160    1      0.4%

Maximum 10 values

Value  Count  Frequency (%)
160    1      0.4%
128    7      2.5%
120    1      0.4%
96     3      1.1%
80     112    39.3%
64     129    45.3%
48     20     7.0%
32     11     3.9%
24     1      0.4%

개폐방식 (door opening/closing method)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
ATO+RF방식 (ATO + RF type): 144
RF+센서방식 (RF + sensor type): 62
센서방식 (sensor type): 60
ATO+RF+센서방식 (ATO + RF + sensor type): 19

Length

Max length: 11
Median length: 8
Mean length: 7.1403509
Min length: 4
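
These lengths count Unicode code points, so each Hangul syllable counts as one character, as Python's `len` does. A short sketch that recomputes them from the four categories and their counts (listed under Common Values below); we assume this is how the report measures length, and the results agree with the figures above:

```python
import statistics

# Category counts for 개폐방식 as reported.
counts = {
    "ATO+RF방식": 144,
    "RF+센서방식": 62,
    "센서방식": 60,
    "ATO+RF+센서방식": 19,
}

# One length per row; len() counts code points, so "센서방식" has length 4.
lengths = [len(value) for value, c in counts.items() for _ in range(c)]

print(
    max(lengths),
    statistics.median(lengths),
    round(statistics.mean(lengths), 7),
    min(lengths),
)
```
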

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RF+센서방식
2nd row: RF+센서방식
3rd row: RF+센서방식
4th row: RF+센서방식
5th row: RF+센서방식

Common Values

Value           Count  Frequency (%)
ATO+RF방식      144    50.5%
RF+센서방식     62     21.8%
센서방식        60     21.1%
ATO+RF+센서방식 19     6.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Common values (lowercased)

Value           Count  Frequency (%)
ato+rf방식      144    50.5%
rf+센서방식     62     21.8%
센서방식        60     21.1%
ato+rf+센서방식 19     6.7%

사업방식 (funding scheme)
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
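자체예산 (in-house budget): 244
민자유치 (private investment): 24
위수탁 (consignment): 14
서울시(신설역) (Seoul Metropolitan Government, new stations): 3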

Length

Max length: 8
Median length: 4
Mean length: 3.9929825
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 민자유치
2nd row: 민자유치
3rd row: 자체예산
4th row: 민자유치
5th row: 자체예산

Common Values

Value           Count  Frequency (%)
자체예산        244    85.6%
민자유치        24     8.4%
위수탁          14     4.9%
서울시(신설역)  3      1.1%
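
The Alerts section flags 사업방식 as highly imbalanced (61.2%). That figure is reproduced by one minus the normalized Shannon entropy of the category counts, 1 − H / ln k for k categories. We believe this is how ydata-profiling scores imbalance, but the sketch below should be read as an assumption checked against this single value:

```python
import math

# Category counts for 사업방식 as reported.
counts = {"자체예산": 244, "민자유치": 24, "위수탁": 14, "서울시(신설역)": 3}

def imbalance_score(counts: dict[str, int]) -> float:
    """1 - H / ln(k): 0 when all k categories are equally common, near 1 when one dominates."""
    k = len(counts)
    if k < 2:
        return 1.0  # a single category is treated as fully imbalanced (our convention)
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)  # Shannon entropy in nats
    return 1 - entropy / math.log(k)

score = imbalance_score(counts)
print(f"사업방식 imbalance: {score:.1%}")
```
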

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Common values (lowercased)

Value           Count  Frequency (%)
자체예산        244    85.6%
민자유치        24     8.4%
위수탁          14     4.9%
서울시(신설역   3      1.1%

준공일 (completion date)
DateTime

Distinct: 49
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
Minimum: 2005-11-04 00:00:00
Maximum: 2021-03-12 00:00:00
[Figure: histogram with fixed size bins (bins=49)]
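
Across this report the number of histogram bins equals the number of distinct values, capped at 50: 연번 (283 distinct) gets 50 bins, 호선 8, 안전문수량 9 and 준공일 49. The helper below encodes that observed rule (inferred from these four variables, not taken from ydata-profiling's code) and estimates the width of each 준공일 bin, assuming equal-width bins between the minimum and maximum dates:

```python
from datetime import date

def histogram_bins(n_distinct: int, max_bins: int = 50) -> int:
    """Bin count observed in this report: distinct values, capped at max_bins."""
    return min(n_distinct, max_bins)

distinct = {"연번": 283, "호선": 8, "안전문수량": 9, "준공일": 49}
bins = {name: histogram_bins(n) for name, n in distinct.items()}

# 준공일 spans the reported minimum and maximum completion dates.
span_days = (date(2021, 3, 12) - date(2005, 11, 4)).days
bin_width_days = span_days / bins["준공일"]  # assumes equal-width bins over [min, max]

print(bins, span_days, round(bin_width_days, 1))
```
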

설치업체 (installation contractor)
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.4 KiB
현대엘리베이터: 112
GS네오텍: 44
삼중테크: 34
삼성SDS: 27
도철PSD: 20
Other values (7): 48

Length

Max length: 7
Median length: 6
Mean length: 5.6280702
Min length: 3

Unique

Unique: 3
Unique (%): 1.1%

Sample

1st row: 현대엘리베이터
2nd row: 현대엘리베이터
3rd row: 현대엘리베이터
4th row: 현대엘리베이터
5th row: 현대엘리베이터

Common Values

Value             Count  Frequency (%)
현대엘리베이터    112    39.3%
GS네오텍          44     15.4%
삼중테크          34     11.9%
삼성SDS           27     9.5%
도철PSD           20     7.0%
포스콘            20     7.0%
피에쓰에쓰텍      14     4.9%
㈜포스코ICT       8      2.8%
서윤산업          3      1.1%
㈜에스티이엔      1      0.4%
Other values (2)  2      0.7%
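
The overview of this variable shows the five most common contractors and folds the remaining seven into "Other values (7)", while the full table above folds only the last two. A minimal sketch of that top-n grouping with `collections.Counter`; the report does not name the two contractors hidden in "Other values (2)", so `unnamed_a` and `unnamed_b` are hypothetical placeholders with one row each (two values covering two rows):

```python
from collections import Counter

# Rows per 설치업체 (installation contractor), as reported. "unnamed_a" and
# "unnamed_b" are hypothetical stand-ins for the two contractors the report
# only shows as "Other values (2)".
counts = Counter({
    "현대엘리베이터": 112, "GS네오텍": 44, "삼중테크": 34, "삼성SDS": 27,
    "도철PSD": 20, "포스콘": 20, "피에쓰에쓰텍": 14, "㈜포스코ICT": 8,
    "서윤산업": 3, "㈜에스티이엔": 1, "unnamed_a": 1, "unnamed_b": 1,
})
total_rows = sum(counts.values())

def top_with_other(counts: Counter, n: int) -> list[tuple[str, int]]:
    """Keep the n most common values and fold the rest into one summary row."""
    top = counts.most_common(n)
    n_rest = len(counts) - len(top)
    rest_rows = sum(counts.values()) - sum(c for _, c in top)
    return top + [(f"Other values ({n_rest})", rest_rows)]

overview = top_with_other(counts, 5)  # as in the variable overview
full = top_with_other(counts, 10)     # as in the Common Values table
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

for value, c in overview:
    print(f"{value}  {c}  {100 * c / total_rows:.1f}%")
print("Unique:", unique)
```
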

Length

[Figure: histogram of lengths of the category]

Common values (lowercased)

Value             Count  Frequency (%)
현대엘리베이터    112    39.3%
gs네오텍          44     15.4%
삼중테크          34     11.9%
삼성sds           27     9.5%
도철psd           20     7.0%
포스콘            20     7.0%
피에쓰에쓰텍      14     4.9%
㈜포스코ict       8      2.8%
서윤산업          3      1.1%
㈜에스티이엔      1      0.4%
Other values (2)  2      0.7%

Interactions

[Figures: 9 interaction plots, one for each pair of the numeric variables 연번, 호선 and 안전문수량]

Correlations

[Figure: correlation heatmap]

            연번    호선    안전문수량  개폐방식  사업방식  준공일  설치업체
연번        1.000   0.923   0.719       0.896     0.585     0.886   0.780
호선        0.923   1.000   0.763       0.982     0.645     0.929   0.857
안전문수량  0.719   0.763   1.000       0.735     0.297     0.778   0.736
개폐방식    0.896   0.982   0.735       1.000     0.652     0.861   0.830
사업방식    0.585   0.645   0.297       0.652     1.000     0.946   0.496
준공일      0.886   0.929   0.778       0.861     0.946     1.000   0.995
설치업체    0.780   0.857   0.736       0.830     0.496     0.995   1.000

[Figure: correlation heatmap]

            개폐방식  사업방식  설치업체
개폐방식    1.000     0.306     0.517
사업방식    0.306     1.000     0.245
설치업체    0.517     0.245     1.000

[Figure: correlation heatmap]

            연번     호선     안전문수량  개폐방식  사업방식  설치업체
연번        1.000    0.988    -0.646      0.765     0.386     0.471
호선        0.988    1.000    -0.648      0.811     0.331     0.574
안전문수량  -0.646   -0.648   1.000       0.607     0.207     0.472
개폐방식    0.765    0.811    0.607       1.000     0.306     0.517
사업방식    0.386    0.331    0.207       0.306     1.000     0.245
설치업체    0.471    0.574    0.472       0.517     0.245     1.000
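
The High correlation alerts can be traced to the last matrix above: for each variable, the other variables whose absolute coefficient exceeds 0.5 reproduce the "and N other fields" counts in the Alerts section (for example, 설치업체 crosses 0.5 only with 호선 and 개폐방식). The 0.5 cut-off is our assumption; the report does not state it:

```python
# Last correlation matrix above, rows and columns in the same order.
names = ["연번", "호선", "안전문수량", "개폐방식", "사업방식", "설치업체"]
matrix = [
    [1.000, 0.988, -0.646, 0.765, 0.386, 0.471],
    [0.988, 1.000, -0.648, 0.811, 0.331, 0.574],
    [-0.646, -0.648, 1.000, 0.607, 0.207, 0.472],
    [0.765, 0.811, 0.607, 1.000, 0.306, 0.517],
    [0.386, 0.331, 0.207, 0.306, 1.000, 0.245],
    [0.471, 0.574, 0.472, 0.517, 0.245, 1.000],
]
THRESHOLD = 0.5  # assumed cut-off for a "High correlation" alert

def correlated_with(i: int) -> list[str]:
    """Other variables whose absolute correlation with names[i] exceeds THRESHOLD."""
    return [names[j] for j, r in enumerate(matrix[i]) if j != i and abs(r) > THRESHOLD]

alerts = {name: correlated_with(i) for i, name in enumerate(names)}
for name, partners in alerts.items():
    if partners:
        others = len(partners) - 1
        suffix = "field" if others == 1 else "fields"
        print(f"{name} is highly correlated with {partners[0]} and {others} other {suffix}")
```
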

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

index  연번  호선  역사명  안전문수량  개폐방식  사업방식  준공일  설치업체
0  1  1  서울  80  RF+센서방식  민자유치  2007-11-01  현대엘리베이터
1  2  1  시청  80  RF+센서방식  민자유치  2007-12-03  현대엘리베이터
2  3  1  종각  80  RF+센서방식  자체예산  2009-12-29  현대엘리베이터
3  4  1  종로3가  80  RF+센서방식  민자유치  2008-01-03  현대엘리베이터
4  5  1  종로5가  80  RF+센서방식  자체예산  2009-12-29  현대엘리베이터
5  6  1  동대문  80  RF+센서방식  자체예산  2008-06-18  서윤산업
6  7  1  동묘  80  RF+센서방식  자체예산  2006-01-10  현대엘리베이터
7  8  1  신설동  80  RF+센서방식  자체예산  2009-12-29  현대엘리베이터
8  9  1  제기동  80  RF+센서방식  자체예산  2009-12-29  현대엘리베이터
9  10  1  청량리  80  RF+센서방식  자체예산  2009-12-29  현대엘리베이터

Last rows

index  연번  호선  역사명  안전문수량  개폐방식  사업방식  준공일  설치업체
275  274  8  가락시장  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
276  275  8  문정  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
277  276  8  장지  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
278  277  8  복정  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
279  278  8  산성  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
280  279  8  남한산성입구  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
281  280  8  단대오거리  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
282  281  8  신흥  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
283  282  8  수진  48  ATO+RF방식  자체예산  2009-12-30  도철PSD
284  283  8  모란  48  ATO+RF+센서방식  자체예산  2009-12-30  도철PSD