Overview

Dataset statistics

Number of variables: 7
Number of observations: 277
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 16.1 KiB
Average record size in memory: 59.5 B
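
The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic; the variable names are illustrative, and the input is the already-rounded 16.1 KiB, so only the first decimal can be expected to agree with the report:

```python
# Figures copied from the Dataset statistics block.
total_kib = 16.1        # total size in memory, in KiB (rounded in the report)
n_observations = 277    # number of rows

# 1 KiB = 1024 bytes; divide by the row count to get bytes per record.
bytes_per_record = total_kib * 1024 / n_observations
print(f"{bytes_per_record:.1f} B per record")
```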

Variable types

Categorical: 2
Text: 1
Numeric: 3
Unsupported: 1

Dataset

Description: File download
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-11572/S/1/datasetView.do

Alerts

길이(M) is highly overall correlated with 준공년도 and 1 other field [High correlation]
준공년도 is highly overall correlated with 길이(M) and 1 other field [High correlation]
호선 is highly overall correlated with 길이(M) and 1 other field [High correlation]
층수 is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-29 15:52:13.676800
Analysis finished: 2024-04-29 15:52:14.872876
Duration: 1.2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
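
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the names `started`, `finished` and `duration` are illustrative and not part of the report:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-04-29 15:52:13.676800", FMT)
finished = datetime.strptime("2024-04-29 15:52:14.872876", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.6f} s")
print(f"{duration:.1f} seconds")  # rounded to one decimal, as in the report
```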

Variables

호선 (line)
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

5호선: 51
7호선: 51
2호선: 50
6호선: 38
3호선: 34
Other values (3): 53

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value   Count   Frequency (%)
5호선   51      18.4%
7호선   51      18.4%
2호선   50      18.1%
6호선   38      13.7%
3호선   34      12.3%
4호선   26      9.4%
8호선   17      6.1%
1호선   10      3.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

역명 (station name)
Text

Distinct: 244
Distinct (%): 88.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.9205776
Min length: 2

Characters and Unicode

Total characters: 809
Distinct characters: 210
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 212
Unique (%): 76.5%

Sample

1st row: 서울역
2nd row: 시청
3rd row: 종각
4th row: 종로3가
5th row: 종로5가

Value                Count   Frequency (%)
종로3가              3       1.1%
시청                 2       0.7%
공덕                 2       0.7%
태릉입구             2       0.7%
잠실                 2       0.7%
고속터미널           2       0.7%
노원                 2       0.7%
가락시장             2       0.7%
불광                 2       0.7%
대림                 2       0.7%
Other values (234)   256     92.4%

Most occurring characters

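Value                Count   Frequency (%)
(character lost)     32      4.0%
(character lost)     29      3.6%
(character lost)     25      3.1%
(character lost)     25      3.1%
(character lost)     18      2.2%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     14      1.7%
Other values (200)   606     74.9%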

Most occurring categories

Value            Count   Frequency (%)
Other Letter     801     99.0%
Decimal Number   8       1.0%

Most frequent character per category

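Other Letter

Value                Count   Frequency (%)
(character lost)     32      4.0%
(character lost)     29      3.6%
(character lost)     25      3.1%
(character lost)     25      3.1%
(character lost)     18      2.2%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     14      1.7%
Other values (197)   598     74.7%

Decimal Number

Value   Count   Frequency (%)
3       5       62.5%
4       2       25.0%
5       1       12.5%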

Most occurring scripts

Value    Count   Frequency (%)
Hangul   801     99.0%
Common   8       1.0%

Most frequent character per script

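Hangul

Value                Count   Frequency (%)
(character lost)     32      4.0%
(character lost)     29      3.6%
(character lost)     25      3.1%
(character lost)     25      3.1%
(character lost)     18      2.2%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     14      1.7%
Other values (197)   598     74.7%

Common

Value   Count   Frequency (%)
3       5       62.5%
4       2       25.0%
5       1       12.5%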

Most occurring blocks

Value    Count   Frequency (%)
Hangul   801     99.0%
ASCII    8       1.0%

Most frequent character per block

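Hangul

Value                Count   Frequency (%)
(character lost)     32      4.0%
(character lost)     29      3.6%
(character lost)     25      3.1%
(character lost)     25      3.1%
(character lost)     18      2.2%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     15      1.9%
(character lost)     14      1.7%
Other values (197)   598     74.7%

ASCII

Value   Count   Frequency (%)
3       5       62.5%
4       2       25.0%
5       1       12.5%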

형식 (type)
Categorical

Distinct: 5
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

상대식: 186
섬식: 38
섬 식: 37
복합식: 10
단 선: 6

Length

Max length: 5
Median length: 3
Mean length: 3.1732852
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 섬식
2nd row: 상대식
3rd row: 상대식
4th row: 상대식
5th row: 상대식

Common Values

Value    Count   Frequency (%)
상대식   186     67.1%
섬식     38      13.7%
섬 식    37      13.4%
복합식   10      3.6%
단 선    6       2.2%
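
The table lists both 섬식 and 섬 식, which differ only by an embedded space. The sketch below shows one way such spacing variants could be merged before analysis. It assumes, without confirmation from the source, that 섬 식 is a spelling variant of 섬식 and that 단 선 stands for 단선; `normalize` is a hypothetical helper, not part of the report or of any library.

```python
from collections import Counter

# Value counts copied from the Common Values table for 형식.
raw_counts = {"상대식": 186, "섬식": 38, "섬 식": 37, "복합식": 10, "단 선": 6}

def normalize(label: str) -> str:
    """Remove all whitespace so spacing variants collapse to one label.

    Assumption: labels that differ only in whitespace mean the same thing.
    """
    return "".join(label.split())

# Re-aggregate the counts under the normalized labels.
merged = Counter()
for label, count in raw_counts.items():
    merged[normalize(label)] += count

total = sum(merged.values())
for label, count in merged.most_common():
    print(label, count, f"{count / total:.1%}")
```

Under that assumption the five reported categories reduce to four, and the total stays at 277 observations.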

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Value          Count   Frequency (%)
상대식         186     58.1%
섬식           38      11.9%
(label lost)   37      11.6%
(label lost)   37      11.6%
복합식         10      3.1%
(label lost)   6       1.9%
(label lost)   6       1.9%

길이(M) (length, m)
Real number (ℝ)

HIGH CORRELATION

Distinct: 7
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 178.61011
Minimum: 90
Maximum: 210
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 90
5-th percentile: 125
Q1: 165
Median: 165
Q3: 205
95-th percentile: 205
Maximum: 210
Range: 120
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 24.791879
Coefficient of variation (CV): 0.13880446
Kurtosis: -0.30975084
Mean: 178.61011
Median Absolute Deviation (MAD): 0
Skewness: -0.40752222
Sum: 49475
Variance: 614.63729
Monotonicity: Not monotonic
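
Because 길이(M) takes only seven distinct values, the descriptive statistics can be recomputed from the frequency table alone. The sketch below does this with the standard library. It assumes the sample (n - 1) definition of the standard deviation and a MAD defined as the median of absolute deviations from the median; the report does not state its formulas, so treat these as assumptions.

```python
import statistics

# Value counts copied from the frequency table for 길이(M).
value_counts = {165: 143, 205: 103, 125: 17, 210: 10, 130: 2, 90: 1, 190: 1}

# Expand the counts back into one value per observation.
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
std = statistics.stdev(values)      # sample standard deviation (n - 1)
median = statistics.median(values)
cv = std / mean                     # coefficient of variation
# Median of the absolute deviations from the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values))
print(round(mean, 5), round(std, 6), round(cv, 8), median, mad)
```

The MAD of 0 is a consequence of more than half of the observations (143 of 277) sharing the median value 165.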
[Figure: histogram with fixed size bins (bins=7)]

Common values

Value   Count   Frequency (%)
165     143     51.6%
205     103     37.2%
125     17      6.1%
210     10      3.6%
130     2       0.7%
90      1       0.4%
190     1       0.4%

Minimum values

Value   Count   Frequency (%)
90      1       0.4%
125     17      6.1%
130     2       0.7%
165     143     51.6%
190     1       0.4%
205     103     37.2%
210     10      3.6%

Maximum values

Value   Count   Frequency (%)
210     10      3.6%
205     103     37.2%
190     1       0.4%
165     143     51.6%
130     2       0.7%
125     17      6.1%
90      1       0.4%

층수 (number of floors)
Unsupported

REJECTED  UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB
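
The report does not say why 층수 was rejected, but the sample rows show values such as 2층 and 6층, a digit followed by the suffix 층. The sketch below shows one possible cleaning step that turns such labels into integers. `parse_floors` is a hypothetical helper, and the assumption that every value follows the digits-plus-층 pattern is not verified by the report.

```python
import re

# 층수 values as they appear in the sample rows of the report.
samples = ["2층", "6층", "4층", "3층"]

def parse_floors(text: str) -> int:
    """Convert a label such as '2층' to the integer 2.

    Assumes the format is <digits> followed by the suffix '층';
    anything else raises ValueError so unexpected values are not hidden.
    """
    match = re.fullmatch(r"\s*(\d+)\s*층\s*", text)
    if match is None:
        raise ValueError(f"unrecognised floor label: {text!r}")
    return int(match.group(1))

floors = [parse_floors(s) for s in samples]
print(floors)
```

Raising on unmatched input is a deliberate choice here: if the column contains other formats, they surface immediately instead of being silently coerced.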

면적(㎡) (area, m²)
Real number (ℝ)

Distinct: 275
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8671.7366
Minimum: 1387
Maximum: 23052.81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1387
5-th percentile: 5075.062
Q1: 6526.63
Median: 8133.22
Q3: 9963.89
95-th percentile: 14028.312
Maximum: 23052.81
Range: 21665.81
Interquartile range (IQR): 3437.26

Descriptive statistics

Standard deviation: 3090.0847
Coefficient of variation (CV): 0.35633978
Kurtosis: 2.7702551
Mean: 8671.7366
Median Absolute Deviation (MAD): 1694.22
Skewness: 1.1863956
Sum: 2402071
Variance: 9548623.5
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count   Frequency (%)
6439.0               2       0.7%
5875.0               2       0.7%
10335.0              1       0.4%
6193.76              1       0.4%
10677.55             1       0.4%
14457.44             1       0.4%
6793.79              1       0.4%
11098.9              1       0.4%
9147.54              1       0.4%
7675.72              1       0.4%
Other values (265)   265     95.7%

Minimum values

Value     Count   Frequency (%)
1387.0    1       0.4%
1423.0    1       0.4%
1503.05   1       0.4%
2203.0    1       0.4%
3860.0    1       0.4%
3936.0    1       0.4%
4497.0    1       0.4%
4691.0    1       0.4%
4838.6    1       0.4%
4844.77   1       0.4%

Maximum values

Value      Count   Frequency (%)
23052.81   1       0.4%
20302.8    1       0.4%
18984.55   1       0.4%
18812.65   1       0.4%
18506.0    1       0.4%
18195.21   1       0.4%
16345.05   1       0.4%
16106.43   1       0.4%
15695.34   1       0.4%
15490.0    1       0.4%

준공년도 (completion year)
Real number (ℝ)

HIGH CORRELATION

Distinct: 20
Distinct (%): 7.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1992.9097
Minimum: 1974
Maximum: 2012
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.6 KiB

Quantile statistics

Minimum: 1974
5-th percentile: 1980
Q1: 1985
Median: 1996
Q3: 2000
95-th percentile: 2002.6
Maximum: 2012
Range: 38
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.6075345
Coefficient of variation (CV): 0.004319079
Kurtosis: -0.54995837
Mean: 1992.9097
Median Absolute Deviation (MAD): 5
Skewness: -0.13814696
Sum: 552036
Variance: 74.089651
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=20)]

Common values

Value               Count   Frequency (%)
1985                47      17.0%
1996                46      16.6%
2001                41      14.8%
1997                26      9.4%
2000                19      6.9%
1984                17      6.1%
1983                14      5.1%
1995                12      4.3%
1980                11      4.0%
1974                9       3.2%
Other values (10)   35      12.6%

Minimum values

Value   Count   Frequency (%)
1974    9       3.2%
1980    11      4.0%
1982    4       1.4%
1983    14      5.1%
1984    17      6.1%
1985    47      17.0%
1990    1       0.4%
1992    2       0.7%
1993    8       2.9%
1994    1       0.4%

Maximum values

Value   Count   Frequency (%)
2012    9       3.2%
2010    3       1.1%
2005    2       0.7%
2002    1       0.4%
2001    41      14.8%
2000    19      6.9%
1999    4       1.4%
1997    26      9.4%
1996    46      16.6%
1995    12      4.3%

Interactions

[Figures: nine interaction plots]

Correlations

[Figure: correlation heatmap]

            호선     형식     길이(M)   면적(㎡)   준공년도
호선        1.000    0.486    0.808     0.291      0.938
형식        0.486    1.000    0.392     0.422      0.454
길이(M)     0.808    0.392    1.000     0.614      0.789
면적(㎡)    0.291    0.422    0.614     1.000      0.342
준공년도    0.938    0.454    0.789     0.342      1.000
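
The three High correlation alerts can be related to the first correlation matrix above by applying a cutoff to the off-diagonal values. The sketch below uses a cutoff of 0.7, which is an assumption chosen for illustration: the report does not state the threshold the tool used. `high_correlations` is a hypothetical helper, not a ydata-profiling function.

```python
# Correlation values copied from the first matrix in the report.
names = ["호선", "형식", "길이(M)", "면적(㎡)", "준공년도"]
matrix = [
    [1.000, 0.486, 0.808, 0.291, 0.938],
    [0.486, 1.000, 0.392, 0.422, 0.454],
    [0.808, 0.392, 1.000, 0.614, 0.789],
    [0.291, 0.422, 0.614, 1.000, 0.342],
    [0.938, 0.454, 0.789, 0.342, 1.000],
]

THRESHOLD = 0.7  # assumption for illustration, not taken from the report

def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables whose |value| >= threshold."""
    result = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, value in enumerate(matrix[i])
            if j != i and abs(value) >= threshold
        ]
        if partners:
            result[name] = partners
    return result

flagged = high_correlations(names, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```

With this cutoff, 호선, 길이(M) and 준공년도 are each flagged with two partners, which is consistent with the alerts ("and 1 other field"), while 형식 and 면적(㎡) are not flagged. Any cutoff above 0.614 and at or below 0.789 gives the same grouping for this matrix.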

[Figure: correlation heatmap]

        형식     호선
형식    1.000    0.323
호선    0.323    1.000

[Figure: correlation heatmap]

            길이(M)   면적(㎡)   준공년도   호선     형식
길이(M)     1.000     0.060      -0.734     0.619    0.278
면적(㎡)    0.060     1.000      0.148      0.140    0.178
준공년도    -0.734    0.148      1.000      0.661    0.292
호선        0.619     0.140      0.661      1.000    0.323
형식        0.278     0.178      0.292      0.323    1.000

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

      호선    역명            형식     길이(M)   층수   면적(㎡)   준공년도
0     1호선   서울역          섬식     210       2층    10335.0    1974
1     1호선   시청            상대식   210       2층    10421.0    1974
2     1호선   종각            상대식   210       2층    9072.0     1974
3     1호선   종로3가         상대식   210       2층    9311.0     1974
4     1호선   종로5가         상대식   210       2층    10465.0    1974
5     1호선   동대문          상대식   210       2층    5490.0     1974
6     1호선   동묘앞          상대식   210       6층    9473.0     2005
7     1호선   신설동          상대식   210       2층    7240.0     1974
8     1호선   제기동          상대식   210       2층    8662.0     1974
9     1호선   청량리          섬식     210       2층    7125.0     1974

Last rows

      호선    역명            형식     길이(M)   층수   면적(㎡)   준공년도
267   8호선   가락시장        상대식   125       2층    7728.13    1996
268   8호선   문정            상대식   125       2층    5193.95    1997
269   8호선   장지            상대식   125       2층    5727.89    1997
270   8호선   복정            섬 식    125       2층    6585.85    1997
271   8호선   산성            상대식   125       4층    6526.63    1997
272   8호선   남한산성입구    상대식   125       3층    5412.29    1997
273   8호선   단대오거리      상대식   125       3층    8133.22    1997
274   8호선   신흥            상대식   125       2층    4861.61    1997
275   8호선   수진            상대식   125       2층    5067.31    1997
276   8호선   모란            상대식   125       3층    9918.77    1997