Overview

Dataset statistics

Number of variables: 6
Number of observations: 140
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.1 KiB
Average record size in memory: 51.9 B

Variable types

Numeric: 3
Text: 1
Categorical: 1
Unsupported: 1

Dataset

Description: File download
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-13290/F/1/datasetView.do

Alerts

연번 (serial number) is highly overall correlated with 호선 (subway line number): High correlation
호선 is highly overall correlated with 연번: High correlation
연번 has unique values: Unique
환승 소요시간(분,초) (transfer time, minutes and seconds) is an unsupported type; check whether it needs cleaning or further analysis: Unsupported

Reproduction

Analysis started: 2024-04-29 16:48:53.804544
Analysis finished: 2024-04-29 16:48:56.320141
Duration: 2.52 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 140
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.5
Minimum: 1
Maximum: 140
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 7.95
Q1: 35.75
Median: 70.5
Q3: 105.25
95-th percentile: 133.05
Maximum: 140
Range: 139
Interquartile range (IQR): 69.5
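
The fractional quantiles above (7.95, 35.75, 133.05) are what linear interpolation between neighbouring ranks produces. The sketch below is a minimal illustration, assuming 연번 takes every integer from 1 to 140 exactly once (consistent with the reported minimum, maximum, distinct count and sum). The helper `percentile` is a hypothetical name, not part of ydata-profiling.

```python
def percentile(sorted_values, p):
    """Return the p-th quantile (0 <= p <= 1) using linear interpolation."""
    position = (len(sorted_values) - 1) * p      # fractional rank, 0-based
    lower = int(position)                        # index just below the rank
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: 연번 takes every integer from 1 to 140 exactly once.
serial_numbers = list(range(1, 141))

quantiles = {p: percentile(serial_numbers, p) for p in (0.05, 0.25, 0.50, 0.75, 0.95)}
iqr = quantiles[0.75] - quantiles[0.25]

for p, value in quantiles.items():
    print(f"{p:.0%}: {value:.2f}")
print(f"IQR: {iqr:.2f}")
```

Under that assumption the printed values should line up with the quantile statistics listed for this variable.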

Descriptive statistics

Standard deviation: 40.5586
Coefficient of variation (CV): 0.57529928
Kurtosis: -1.2
Mean: 70.5
Median Absolute Deviation (MAD): 35
Skewness: 0
Sum: 9870
Variance: 1645
Monotonicity: Strictly increasing
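
These figures are what one would expect from a plain row counter. As a rough cross-check, the sketch below recomputes several of them with Python's standard `statistics` module, assuming 연번 is exactly the integers 1 to 140 (an assumption, though the reported sum of 9870 = 140 × 141 / 2 supports it). The variance is the sample variance (divisor n − 1).

```python
import statistics

# Assumption: 연번 is the sequence 1, 2, ..., 140.
serial_numbers = list(range(1, 141))

total = sum(serial_numbers)
mean = statistics.mean(serial_numbers)
variance = statistics.variance(serial_numbers)   # sample variance, divisor n - 1
std_dev = statistics.stdev(serial_numbers)
cv = std_dev / mean                              # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
median = statistics.median(serial_numbers)
mad = statistics.median(abs(x - median) for x in serial_numbers)

print(total, mean, variance, round(std_dev, 4), round(cv, 8), mad)
```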
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                   1      0.7%
98                  1      0.7%
92                  1      0.7%
93                  1      0.7%
94                  1      0.7%
95                  1      0.7%
96                  1      0.7%
97                  1      0.7%
99                  1      0.7%
90                  1      0.7%
Other values (130)  130    92.9%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.7%
2      1      0.7%
3      1      0.7%
4      1      0.7%
5      1      0.7%
6      1      0.7%
7      1      0.7%
8      1      0.7%
9      1      0.7%
10     1      0.7%

Maximum 10 values

Value  Count  Frequency (%)
140    1      0.7%
139    1      0.7%
138    1      0.7%
137    1      0.7%
136    1      0.7%
135    1      0.7%
134    1      0.7%
133    1      0.7%
132    1      0.7%
131    1      0.7%

호선 (subway line number)
Real number (ℝ)

HIGH CORRELATION

Distinct: 9
Distinct (%): 6.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.2785714
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1825596
Coefficient of variation (CV): 0.51011409
Kurtosis: -0.93873424
Mean: 4.2785714
Median Absolute Deviation (MAD): 2
Skewness: 0.21805266
Sum: 599
Variance: 4.7635663
Monotonicity: Increasing
Histogram with fixed size bins (bins=9)

Common values

Value  Count  Frequency (%)
2      28     20.0%
5      25     17.9%
6      18     12.9%
3      16     11.4%
4      15     10.7%
7      15     10.7%
1      13     9.3%
8      6      4.3%
9      4      2.9%

Minimum 10 values

Value  Count  Frequency (%)
1      13     9.3%
2      28     20.0%
3      16     11.4%
4      15     10.7%
5      25     17.9%
6      18     12.9%
7      15     10.7%
8      6      4.3%
9      4      2.9%

Maximum 10 values

Value  Count  Frequency (%)
9      4      2.9%
8      6      4.3%
7      15     10.7%
6      18     12.9%
5      25     17.9%
4      15     10.7%
3      16     11.4%
2      28     20.0%
1      13     9.3%
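
Because 호선 has only nine distinct values, its value counts are enough to recompute the sum, mean and sample variance reported for it. The sketch below is a rough consistency check built only from the counts listed in this report. The names `line_counts` and `line_numbers` are illustrative.

```python
import statistics

# Value counts for 호선 as listed in the report (line number -> row count).
line_counts = {1: 13, 2: 28, 3: 16, 4: 15, 5: 25, 6: 18, 7: 15, 8: 6, 9: 4}

# Expand the counts back into one value per row.
line_numbers = [line for line, count in line_counts.items() for _ in range(count)]

n_rows = len(line_numbers)
total = sum(line_numbers)
mean = total / n_rows
variance = statistics.variance(line_numbers)     # sample variance, divisor n - 1

# Share of rows per line, as a percentage rounded to one decimal.
shares = {line: round(100 * count / n_rows, 1) for line, count in line_counts.items()}

print(n_rows, total, round(mean, 7), round(variance, 7))
print(shares)
```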

환승역명 (transfer station name)
Text

Distinct: 72
Distinct (%): 51.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 9
Median length: 8
Mean length: 3.2857143
Min length: 2

Characters and Unicode

Total characters: 460
Distinct characters: 110
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
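
As a small illustration of those character properties, the sketch below tallies the Unicode general category of each character in the five sample station names shown for this variable, using Python's standard `unicodedata` module. It is not how ydata-profiling computes its tables. The standard library does not expose the script property directly, so the check on the code point name is only a rough stand-in for it.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable.
sample_names = ["서울", "서울", "서울", "시청", "종로3가"]

# General category per character: "Lo" = Other Letter, "Nd" = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for name in sample_names for ch in name
)

# Rough proxy for the script: Hangul syllables have names starting with "HANGUL".
hangul_chars = sum(
    unicodedata.name(ch).startswith("HANGUL")
    for name in sample_names
    for ch in name
)

print(dict(category_counts), hangul_chars)
```

Hangul syllables fall under "Other Letter" and the digit in 종로3가 under "Decimal Number", which matches the two categories the report lists.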

Unique

Unique: 30
Unique (%): 21.4%

Sample

1st row: 서울
2nd row: 서울
3rd row: 서울
4th row: 시청
5th row: 종로3가

Common values

Value              Count  Frequency (%)
서울               6      4.3%
왕십리             6      4.3%
공덕               6      4.3%
종로3가            6      4.3%
동대문역사문화공원 6      4.3%
신설동             4      2.9%
고속터미널         4      2.9%
김포공항           3      2.1%
청량리             3      2.1%
노원               2      1.4%
Other values (62)  94     67.1%

Most occurring characters

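Value               Count  Frequency (%)
[missing]           19     4.1%
[missing]           18     3.9%
[missing]           18     3.9%
[missing]           15     3.3%
[missing]           14     3.0%
[missing]           14     3.0%
[missing]           14     3.0%
[missing]           13     2.8%
[missing]           11     2.4%
[missing]           11     2.4%
Other values (100)  313    68.0%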

Most occurring categories

Value           Count  Frequency (%)
Other Letter    450    97.8%
Decimal Number  10     2.2%

Most frequent character per category

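Other Letter

Value              Count  Frequency (%)
[missing]          19     4.2%
[missing]          18     4.0%
[missing]          18     4.0%
[missing]          15     3.3%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          13     2.9%
[missing]          11     2.4%
[missing]          11     2.4%
Other values (98)  303    67.3%

Decimal Number

Value  Count  Frequency (%)
3      8      80.0%
4      2      20.0%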

Most occurring scripts

Value   Count  Frequency (%)
Hangul  450    97.8%
Common  10     2.2%

Most frequent character per script

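Hangul

Value              Count  Frequency (%)
[missing]          19     4.2%
[missing]          18     4.0%
[missing]          18     4.0%
[missing]          15     3.3%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          13     2.9%
[missing]          11     2.4%
[missing]          11     2.4%
Other values (98)  303    67.3%

Common

Value  Count  Frequency (%)
3      8      80.0%
4      2      20.0%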

Most occurring blocks

Value   Count  Frequency (%)
Hangul  450    97.8%
ASCII   10     2.2%

Most frequent character per block

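Hangul

Value              Count  Frequency (%)
[missing]          19     4.2%
[missing]          18     4.0%
[missing]          18     4.0%
[missing]          15     3.3%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          14     3.1%
[missing]          13     2.9%
[missing]          11     2.4%
[missing]          11     2.4%
Other values (98)  303    67.3%

ASCII

Value  Count  Frequency (%)
3      8      80.0%
4      2      20.0%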

환승노선 (transfer line)
Categorical

Distinct: 19
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Value              Count
2호선              20
5호선              15
1호선              13
경의중앙선         13
3호선              11
Other values (14)  68

Length

Max length: 6
Median length: 3
Mean length: 3.3357143
Min length: 2

Unique

Unique: 4
Unique (%): 2.9%

Sample

1st row: 4호선
2nd row: 공항철도
3rd row: 경의중앙선
4th row: 2호선
5th row: 3호선

Common Values

Value             Count  Frequency (%)
2호선             20     14.3%
5호선             15     10.7%
1호선             13     9.3%
경의중앙선        13     9.3%
3호선             11     7.9%
6호선             10     7.1%
4호선             9      6.4%
9호선             9      6.4%
분당선            9      6.4%
7호선             7      5.0%
Other values (9)  24     17.1%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
2호선             20     14.3%
5호선             15     10.7%
1호선             13     9.3%
경의중앙선        13     9.3%
3호선             11     7.9%
6호선             10     7.1%
4호선             9      6.4%
9호선             9      6.4%
분당선            9      6.4%
공항철도          7      5.0%
Other values (9)  24     17.1%

환승거리(m) (transfer distance, in metres)
Real number (ℝ)

Distinct: 77
Distinct (%): 55.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 133.12143
Minimum: 7
Maximum: 355
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.4 KiB

Quantile statistics

Minimum: 7
5-th percentile: 34.4
Q1: 77.75
Median: 107
Q3: 181
95-th percentile: 279.35
Maximum: 355
Range: 348
Interquartile range (IQR): 103.25

Descriptive statistics

Standard deviation: 78.918177
Coefficient of variation (CV): 0.5928285
Kurtosis: -0.10541884
Mean: 133.12143
Median Absolute Deviation (MAD): 48
Skewness: 0.78442607
Sum: 18637
Variance: 6228.0787
Monotonicity: Not monotonic
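
Several of the statistics for 환승거리(m) are derived from others, so they can be cross-checked without the raw data. The sketch below uses only the summary values printed in this report; the variable names are illustrative.

```python
import math

# Summary values copied from the report for 환승거리(m).
n_rows = 140
minimum, maximum = 7, 355
q1, q3 = 77.75, 181
total = 18637
mean = 133.12143
variance = 6228.0787

value_range = maximum - minimum      # Range
iqr = q3 - q1                        # Interquartile range
mean_from_sum = total / n_rows       # Mean recomputed from Sum and row count
std_dev = math.sqrt(variance)        # Standard deviation
cv = std_dev / mean                  # Coefficient of variation

print(value_range, iqr, round(mean_from_sum, 5), round(std_dev, 6), round(cv, 7))
```

The mean (133.1) sits well above the median (107), which agrees with the positive skewness of about 0.78: a few long transfers pull the average up.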
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
110                5      3.6%
159                4      2.9%
77                 4      2.9%
35                 4      2.9%
155                4      2.9%
100                4      2.9%
82                 4      2.9%
75                 4      2.9%
93                 3      2.1%
81                 3      2.1%
Other values (67)  101    72.1%

Minimum 10 values

Value  Count  Frequency (%)
7      1      0.7%
16     1      0.7%
17     3      2.1%
19     1      0.7%
23     1      0.7%
35     4      2.9%
41     1      0.7%
44     1      0.7%
45     2      1.4%
46     2      1.4%

Maximum 10 values

Value  Count  Frequency (%)
355    1      0.7%
323    1      0.7%
314    1      0.7%
312    2      1.4%
309    1      0.7%
305    1      0.7%
278    2      1.4%
276    2      1.4%
275    1      0.7%
273    1      0.7%

환승 소요시간(분,초) (transfer time, minutes and seconds)
Unsupported

REJECTED, UNSUPPORTED

Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB
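
One way to make this column analysable is to convert each value to a number of seconds before profiling. The sketch below is a minimal example, assuming the values look like `00:02:13` (hours:minutes:seconds), as in the sample rows of this report. Whether the raw file stores them as strings or as `datetime.time` objects is not stated, so the hypothetical helper `transfer_seconds` accepts both.

```python
from datetime import time


def transfer_seconds(value):
    """Convert an HH:MM:SS string or a datetime.time into whole seconds."""
    if isinstance(value, time):
        return value.hour * 3600 + value.minute * 60 + value.second
    # Assumption: anything else is text in the form "HH:MM:SS".
    hours, minutes, seconds = (int(part) for part in str(value).split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values taken from the sample rows of the report.
examples = ["00:02:13", "00:04:18", time(0, 0, 29)]
converted = [transfer_seconds(value) for value in examples]
print(converted)
```

Once converted, the column would be an ordinary numeric variable and could be summarised like 환승거리(m).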

Interactions

[Figures: 9 interaction plots]

Correlations

             연번    호선    환승역명  환승노선  환승거리(m)
연번         1.000   0.951   0.827     0.000     0.195
호선         0.951   1.000   0.820     0.000     0.230
환승역명     0.827   0.820   1.000     0.904     0.887
환승노선     0.000   0.000   0.904     1.000     0.037
환승거리(m)  0.195   0.230   0.887     0.037     1.000

             연번    호선    환승거리(m)  환승노선
연번         1.000   0.990   -0.160       0.000
호선         0.990   1.000   -0.167       0.000
환승거리(m)  -0.160  -0.167  1.000        0.000
환승노선     0.000   0.000   0.000        1.000
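
The strong 연번/호선 association is plausibly an artefact of row order: 연번 is strictly increasing and 호선 is non-decreasing, so the rows appear to be numbered line by line. The sketch below rebuilds both columns from the value counts in this report under that assumption and computes a plain Pearson coefficient. The report does not state which correlation measure each matrix uses, so this is not expected to reproduce either figure exactly, only to show that the association is very strong.

```python
import math

# Value counts for 호선 from the report (line number -> row count).
line_counts = {1: 13, 2: 28, 3: 16, 4: 15, 5: 25, 6: 18, 7: 15, 8: 6, 9: 4}

# Assumption: rows are sorted by line, so 호선 forms consecutive blocks
# while 연번 simply counts 1, 2, ..., 140.
line_numbers = [line for line, count in line_counts.items() for _ in range(count)]
serial_numbers = list(range(1, len(line_numbers) + 1))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(serial_numbers, line_numbers)
print(f"Pearson r between 연번 and 호선: {r:.3f}")
```

Since 연번 carries no information beyond row position, dropping it or using it as the index before profiling would probably remove this alert.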

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  연번  호선  환승역명  환승노선    환승거리(m)  환승 소요시간(분,초)
0      1     1     서울      4호선       159          00:02:13
1      2     1     서울      공항철도    309          00:04:18
2      3     1     서울      경의중앙선  164          00:02:17
3      4     1     시청      2호선       101          00:01:24
4      5     1     종로3가   3호선       118          00:01:38
5      6     1     종로3가   5호선       312          00:04:20
6      7     1     동대문    4호선       194          00:02:42
7      8     1     동묘앞    6호선       96           00:01:20
8      9     1     신설동    2호선       159          00:02:13
9      10    1     신설동    우이신설선  129          00:01:48

Last rows

Index  연번  호선  환승역명    환승노선  환승거리(m)  환승 소요시간(분,초)
130    131   8     천호        5호선     35           00:00:29
131    132   8     석촌        9호선     82           00:01:08
132    133   8     잠실        2호선     190          00:02:38
133    134   8     가락시장    3호선     35           00:00:29
134    135   8     복정        분당선    16           00:00:13
135    136   8     모란        분당선    99           00:01:23
136    137   9     선정릉      분당선    110          00:01:32
137    138   9     종합운동장  2호선     94           00:01:18
138    139   9     석촌        8호선     82           00:01:08
139    140   9     올림픽공원  5호선     93           00:01:18
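
In these sample rows the transfer time looks close to proportional to the transfer distance. The sketch below divides distance by time for a few of the rows to show the implied walking speed. This is only an observation from the sample; the report does not say how the times were derived, and the rest of the dataset may differ.

```python
def to_seconds(text):
    """Convert an "HH:MM:SS" string into whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (환승거리(m), 환승 소요시간) pairs copied from the sample rows.
sample_rows = [
    (159, "00:02:13"),
    (309, "00:04:18"),
    (164, "00:02:17"),
    (35, "00:00:29"),
    (16, "00:00:13"),
]

# Implied speed in metres per second for each row.
speeds = [distance / to_seconds(duration) for distance, duration in sample_rows]

for (distance, duration), speed in zip(sample_rows, speeds):
    print(f"{distance:>3} m in {duration}: {speed:.2f} m/s")
```

If the same ratio holds across all 140 rows, the time column adds little beyond the distance column, which would be worth checking once the time values are converted to seconds.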