Overview

Dataset statistics

Number of variables: 4
Number of observations: 669
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 22.3 KiB
Average record size in memory: 34.2 B

Variable types

Numeric: 2
Text: 1
Categorical: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-12764/A/1/datasetView.do

Alerts

SUBWAY_ID is highly overall correlated with STATN_ID and 1 other field (High correlation)
STATN_ID is highly overall correlated with SUBWAY_ID and 1 other field (High correlation)
호선이름 is highly overall correlated with SUBWAY_ID and 1 other field (High correlation)
STATN_ID has unique values (Unique)
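
The high-correlation alerts are unsurprising given how the identifiers look in the report's sample rows: each STATN_ID shown there begins with the four digits of its SUBWAY_ID (for example, 1001000100 on line 1001). The following is a minimal Python sketch of that check. It assumes the 10-digit layout seen in the sample holds in general; the helper name `line_prefix` and the hard-coded rows are illustrative and copied from the sample table, not taken from the dataset's documentation.

```python
# Rows copied from the report's sample table: (SUBWAY_ID, STATN_ID).
SAMPLE_ROWS = [
    (1001, 1001000100),
    (1001, 1001000109),
    (1093, 1093004012),
    (1093, 1093004022),
]


def line_prefix(statn_id: int) -> int:
    """Return the leading four digits of a 10-digit station ID (assumed layout)."""
    return statn_id // 1_000_000


# Collect any row whose station ID does not start with its line ID.
mismatches = [(s, t) for s, t in SAMPLE_ROWS if line_prefix(t) != s]
print(mismatches)
```

If the pattern holds for the whole file, STATN_ID determines SUBWAY_ID exactly, which would account for the near-perfect correlation between the two columns.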

Reproduction

Analysis started: 2024-05-11 07:15:21.416309
Analysis finished: 2024-05-11 07:15:24.214630
Duration: 2.8 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

SUBWAY_ID
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1027.0135
Minimum: 1001
Maximum: 1093
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 1001
5-th percentile: 1001
Q1: 1003
Median: 1006
Q3: 1063
95-th percentile: 1087.6
Maximum: 1093
Range: 92
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 33.138994
Coefficient of variation (CV): 0.032267342
Kurtosis: -1.1643279
Mean: 1027.0135
Median Absolute Deviation (MAD): 4
Skewness: 0.81675923
Sum: 687072
Variance: 1098.1929
Monotonicity: Increasing
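
Several of the SUBWAY_ID figures follow directly from one another, so they can be cross-checked by hand. The sketch below uses only values reported in this section; the variable names are illustrative.

```python
# Values reported for SUBWAY_ID in this section.
minimum, maximum = 1001, 1093
q1, q3 = 1003, 1063
mean, std = 1027.0135, 33.138994

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance is the squared standard deviation

print(value_range, iqr, cv, variance)
```

The results should agree with the reported Range, IQR, CV, and Variance up to rounding of the published inputs.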
Histogram with fixed size bins (bins=17)
Common values

Value             Count  Frequency (%)
1001              102    15.2%
1075              63     9.4%
1063              57     8.5%
1005              56     8.4%
1007              53     7.9%
1002              51     7.6%
1004              48     7.2%
1003              44     6.6%
1006              39     5.8%
1009              38     5.7%
Other values (7)  118    17.6%

Minimum 10 values

Value  Count  Frequency (%)
1001   102    15.2%
1002   51     7.6%
1003   44     6.6%
1004   48     7.2%
1005   56     8.4%
1006   39     5.8%
1007   53     7.9%
1008   18     2.7%
1009   38     5.7%
1063   57     8.5%

Maximum 10 values

Value  Count  Frequency (%)
1093   21     3.1%
1092   13     1.9%
1081   11     1.6%
1077   16     2.4%
1075   63     9.4%
1067   25     3.7%
1065   14     2.1%
1063   57     8.5%
1009   38     5.7%
1008   18     2.7%

STATN_ID
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 669
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.027038 × 10^9
Minimum: 1.0010001 × 10^9
Maximum: 1.093004 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 1.0010001 × 10^9
5-th percentile: 1.0010001 × 10^9
Q1: 1.0030003 × 10^9
Median: 1.0060006 × 10^9
Q3: 1.0630753 × 10^9
95-th percentile: 1.0876178 × 10^9
Maximum: 1.093004 × 10^9
Range: 92003922
Interquartile range (IQR): 60075013

Descriptive statistics

Standard deviation: 33156862
Coefficient of variation (CV): 0.032283969
Kurtosis: -1.166184
Mean: 1.027038 × 10^9
Median Absolute Deviation (MAD): 4000430
Skewness: 0.81629874
Sum: 6.870884 × 10^11
Variance: 1.0993775 × 10^15
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1001000100          1      0.1%
1009000930          1      0.1%
1009000932          1      0.1%
1009000933          1      0.1%
1009000934          1      0.1%
1009000935          1      0.1%
1009000936          1      0.1%
1009000937          1      0.1%
1009000938          1      0.1%
1063075110          1      0.1%
Other values (659)  659    98.5%

Minimum 10 values

Value       Count  Frequency (%)
1001000100  1      0.1%
1001000101  1      0.1%
1001000102  1      0.1%
1001000103  1      0.1%
1001000104  1      0.1%
1001000105  1      0.1%
1001000106  1      0.1%
1001000107  1      0.1%
1001000108  1      0.1%
1001000109  1      0.1%

Maximum 10 values

Value       Count  Frequency (%)
1093004022  1      0.1%
1093004021  1      0.1%
1093004020  1      0.1%
1093004019  1      0.1%
1093004018  1      0.1%
1093004017  1      0.1%
1093004016  1      0.1%
1093004014  1      0.1%
1093004013  1      0.1%
1093004012  1      0.1%

STATN_NM
Text

Distinct: 548
Distinct (%): 81.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.0508221
Min length: 2

Characters and Unicode

Total characters: 2041
Distinct characters: 294
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
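
The mean length reported for this variable is the total character count divided by the number of observations. A quick Python check using the report's own figures (the variable names are illustrative):

```python
total_characters = 2041  # "Total characters" for this variable
observations = 669       # "Number of observations" from the overview

mean_length = total_characters / observations
print(mean_length)
```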

Unique

Unique: 447
Unique (%): 66.8%

Sample

1st row: 소요산
2nd row: 동두천
3rd row: 보산
4th row: 동두천중앙
5th row: 지행

Common values

Value               Count  Frequency (%)
청량리               4      0.6%
김포공항             4      0.6%
공덕                 4      0.6%
서울                 4      0.6%
왕십리               4      0.6%
홍대입구             3      0.4%
디지털미디어시티     3      0.4%
신설동               3      0.4%
대곡                 3      0.4%
고속터미널           3      0.4%
Other values (539)  635    94.8%

Most occurring characters

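Value               Count  Frequency (%)
(not shown)         69     3.4%
(not shown)         56     2.7%
(not shown)         52     2.5%
(not shown)         51     2.5%
(not shown)         48     2.4%
(not shown)         41     2.0%
(not shown)         38     1.9%
(not shown)         36     1.8%
(not shown)         32     1.6%
)                   29     1.4%
Other values (284)  1589   77.9%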

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1967   96.4%
Close Punctuation  29     1.4%
Open Punctuation   29     1.4%
Decimal Number     13     0.6%
Other Punctuation  2      0.1%
Space Separator    1      < 0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(not shown)         69     3.5%
(not shown)         56     2.8%
(not shown)         52     2.6%
(not shown)         51     2.6%
(not shown)         48     2.4%
(not shown)         41     2.1%
(not shown)         38     1.9%
(not shown)         36     1.8%
(not shown)         32     1.6%
(not shown)         29     1.5%
Other values (273)  1515   77.0%

Decimal Number

Value  Count  Frequency (%)
3      5      38.5%
4      3      23.1%
1      2      15.4%
2      1      7.7%
9      1      7.7%
5      1      7.7%

Other Punctuation

Value  Count  Frequency (%)
.      1      50.0%
,      1      50.0%

Close Punctuation

Value  Count  Frequency (%)
)      29     100.0%

Open Punctuation

Value  Count  Frequency (%)
(      29     100.0%

Space Separator

Value    Count  Frequency (%)
(space)  1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1967   96.4%
Common  74     3.6%

Most frequent character per script

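Hangul

Value               Count  Frequency (%)
(not shown)         69     3.5%
(not shown)         56     2.8%
(not shown)         52     2.6%
(not shown)         51     2.6%
(not shown)         48     2.4%
(not shown)         41     2.1%
(not shown)         38     1.9%
(not shown)         36     1.8%
(not shown)         32     1.6%
(not shown)         29     1.5%
Other values (273)  1515   77.0%

Common

Value    Count  Frequency (%)
)        29     39.2%
(        29     39.2%
3        5      6.8%
4        3      4.1%
1        2      2.7%
2        1      1.4%
(space)  1      1.4%
.        1      1.4%
9        1      1.4%
,        1      1.4%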

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1967   96.4%
ASCII   74     3.6%

Most frequent character per block

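Hangul

Value               Count  Frequency (%)
(not shown)         69     3.5%
(not shown)         56     2.8%
(not shown)         52     2.6%
(not shown)         51     2.6%
(not shown)         48     2.4%
(not shown)         41     2.1%
(not shown)         38     1.9%
(not shown)         36     1.8%
(not shown)         32     1.6%
(not shown)         29     1.5%
Other values (273)  1515   77.0%

ASCII

Value    Count  Frequency (%)
)        29     39.2%
(        29     39.2%
3        5      6.8%
4        3      4.1%
1        2      2.7%
2        1      1.4%
(space)  1      1.4%
.        1      1.4%
9        1      1.4%
,        1      1.4%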

호선이름 (line name)
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Value              Count
1호선               102
수인분당선          63
경의중앙선          57
5호선               56
7호선               53
Other values (12)  338

Length

Max length: 5
Median length: 3
Mean length: 3.4424514
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value             Count  Frequency (%)
1호선              102    15.2%
수인분당선         63     9.4%
경의중앙선         57     8.5%
5호선              56     8.4%
7호선              53     7.9%
2호선              51     7.6%
4호선              48     7.2%
3호선              44     6.6%
6호선              39     5.8%
9호선              38     5.7%
Other values (7)  118    17.6%
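
Each frequency in the table is the count divided by the 669 observations, rounded to one decimal place. A short Python sketch that recomputes the first few rows (the dictionary holds values copied from the table; the names `counts` and `frequencies` are illustrative):

```python
# Counts copied from the table above; 669 is the number of observations.
counts = {"1호선": 102, "수인분당선": 63, "경의중앙선": 57, "5호선": 56, "7호선": 53}
total = 669

# Percentage share of each line, rounded to one decimal as in the report.
frequencies = {name: round(100 * n / total, 1) for name, n in counts.items()}
for name, pct in frequencies.items():
    print(f"{name}: {pct}%")
```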

Length

Histogram of lengths of the category

Interactions

[Interaction plots (4 figures)]

Correlations

Variable   SUBWAY_ID  STATN_ID  호선이름
SUBWAY_ID  1.000      1.000     1.000
STATN_ID   1.000      1.000     1.000
호선이름    1.000      1.000     1.000

Variable   SUBWAY_ID  STATN_ID  호선이름
SUBWAY_ID  1.000      0.996     0.991
STATN_ID   0.996      1.000     0.991
호선이름    0.991      0.991     1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  SUBWAY_ID  STATN_ID    STATN_NM    호선이름
0      1001       1001000100  소요산       1호선
1      1001       1001000101  동두천       1호선
2      1001       1001000102  보산         1호선
3      1001       1001000103  동두천중앙   1호선
4      1001       1001000104  지행         1호선
5      1001       1001000105  덕정         1호선
6      1001       1001000106  덕계         1호선
7      1001       1001000107  양주         1호선
8      1001       1001000108  녹양         1호선
9      1001       1001000109  가능         1호선

Index  SUBWAY_ID  STATN_ID    STATN_NM    호선이름
659    1093       1093004012  시흥대야     서해선
660    1093       1093004013  신천         서해선
661    1093       1093004014  신현         서해선
662    1093       1093004016  시흥시청     서해선
663    1093       1093004017  시흥능곡     서해선
664    1093       1093004018  달미         서해선
665    1093       1093004019  선부         서해선
666    1093       1093004020  초지         서해선
667    1093       1093004021  시우         서해선
668    1093       1093004022  원시         서해선