Overview

Dataset statistics

Number of variables: 4
Number of observations: 667
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 22.3 KiB
Average record size in memory: 34.2 B
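
The derived figures in this block follow from the raw counts by simple arithmetic. The sketch below recomputes them from the reported values; the variable names are illustrative, and because 22.3 KiB is itself a rounded figure, the result can only be expected to match after rounding.

```python
# Recompute the derived dataset statistics from the reported figures.
KIB = 1024  # bytes per KiB

n_variables = 4
n_observations = 667
missing_cells = 0
total_size_bytes = 22.3 * KIB  # reported total size (already rounded)

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_size_bytes / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```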

Variable types

Numeric: 2
Text: 1
Categorical: 1

Dataset

Description: File download (파일 다운로드)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-12764/A/1/datasetView.do

Alerts

SUBWAY_ID is highly overall correlated with STATN_ID and 1 other field (High correlation)
STATN_ID is highly overall correlated with SUBWAY_ID and 1 other field (High correlation)
호선이름 is highly overall correlated with SUBWAY_ID and 1 other field (High correlation)
STATN_ID has unique values (Unique)

Reproduction

Analysis started: 2024-05-04 00:29:44.178076
Analysis finished: 2024-05-04 00:29:46.483311
Duration: 2.31 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
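
The reported duration is the difference between the two timestamps. A minimal check, assuming both timestamps are in the same time zone (the report does not state one):

```python
from datetime import datetime

# Timestamp layout used in the report's Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-04 00:29:44.178076", FMT)
finished = datetime.strptime("2024-05-04 00:29:46.483311", FMT)

duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```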

Variables

SUBWAY_ID
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1026.9055
Minimum: 1001
Maximum: 1093
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 1001
5-th percentile: 1001
Q1: 1003
Median: 1006
Q3: 1063
95-th percentile: 1088.7
Maximum: 1093
Range: 92
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 33.129898
Coefficient of variation (CV): 0.032261875
Kurtosis: -1.1501864
Mean: 1026.9055
Median Absolute Deviation (MAD): 4
Skewness: 0.82583454
Sum: 684946
Variance: 1097.5902
Monotonicity: Increasing
Histogram with fixed size bins (bins=17)

Common values
Value             Count  Frequency (%)
1001              102    15.3%
1075              63     9.4%
1005              56     8.4%
1063              55     8.2%
1007              53     7.9%
1002              51     7.6%
1004              48     7.2%
1003              44     6.6%
1006              39     5.8%
1009              38     5.7%
Other values (7)  118    17.7%

Minimum 10 values
Value  Count  Frequency (%)
1001   102    15.3%
1002   51     7.6%
1003   44     6.6%
1004   48     7.2%
1005   56     8.4%
1006   39     5.8%
1007   53     7.9%
1008   18     2.7%
1009   38     5.7%
1063   55     8.2%

Maximum 10 values
Value  Count  Frequency (%)
1093   21     3.1%
1092   13     1.9%
1081   11     1.6%
1077   16     2.4%
1075   63     9.4%
1067   25     3.7%
1065   14     2.1%
1063   55     8.2%
1009   38     5.7%
1008   18     2.7%
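
Taken together, the smallest-value and largest-value tables above list all 17 distinct SUBWAY_ID values, so the full frequency table can be rebuilt and the summary statistics rechecked. The sketch below does this with the standard library only. It assumes the reported standard deviation is the sample (n - 1) estimate, which is what reproduces the reported figure; the variable names are illustrative.

```python
import statistics

# SUBWAY_ID -> count, read off the value tables above (all 17 distinct values).
value_counts = {
    1001: 102, 1002: 51, 1003: 44, 1004: 48, 1005: 56, 1006: 39,
    1007: 53, 1008: 18, 1009: 38, 1063: 55, 1065: 14, 1067: 25,
    1075: 63, 1077: 16, 1081: 11, 1092: 13, 1093: 21,
}

# Expand the table back into one value per observation, in ascending order.
values = [v for v, count in sorted(value_counts.items()) for _ in range(count)]

n = len(values)
total = sum(values)
mean = total / n
median = statistics.median(values)
stdev = statistics.stdev(values)       # sample standard deviation (n - 1)
cv = stdev / mean                      # coefficient of variation
value_range = max(values) - min(values)
share_1001 = 100 * value_counts[1001] / n

print(n, total, round(mean, 4), median, round(stdev, 6), round(cv, 9))
```

The same expansion can be used to recheck any other row of the tables, for example the 15.3% share of value 1001 (`share_1001`).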

STATN_ID
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 667
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0269299 × 10^9
Minimum: 1.0010001 × 10^9
Maximum: 1.093004 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 1.0010001 × 10^9
5-th percentile: 1.0010001 × 10^9
Q1: 1.0030003 × 10^9
Median: 1.0060006 × 10^9
Q3: 1.0630753 × 10^9
95-th percentile: 1.0887145 × 10^9
Maximum: 1.093004 × 10^9
Range: 92003922
Interquartile range (IQR): 60075012

Descriptive statistics

Standard deviation: 33147658
Coefficient of variation (CV): 0.032278404
Kurtosis: -1.1520386
Mean: 1.0269299 × 10^9
Median Absolute Deviation (MAD): 4000428
Skewness: 0.82537834
Sum: 6.8496225 × 10^11
Variance: 1.0987673 × 10^15
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
1001000100          1      0.1%
1009000929          1      0.1%
1009000931          1      0.1%
1009000932          1      0.1%
1009000933          1      0.1%
1009000934          1      0.1%
1009000935          1      0.1%
1009000936          1      0.1%
1009000937          1      0.1%
1009000938          1      0.1%
Other values (657)  657    98.5%

Minimum 10 values
Value       Count  Frequency (%)
1001000100  1      0.1%
1001000101  1      0.1%
1001000102  1      0.1%
1001000103  1      0.1%
1001000104  1      0.1%
1001000105  1      0.1%
1001000106  1      0.1%
1001000107  1      0.1%
1001000108  1      0.1%
1001000109  1      0.1%

Maximum 10 values
Value       Count  Frequency (%)
1093004022  1      0.1%
1093004021  1      0.1%
1093004020  1      0.1%
1093004019  1      0.1%
1093004018  1      0.1%
1093004017  1      0.1%
1093004016  1      0.1%
1093004014  1      0.1%
1093004013  1      0.1%
1093004012  1      0.1%

STATN_NM
Text

Distinct: 546
Distinct (%): 81.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Length

Max length: 16
Median length: 2
Mean length: 3.0524738
Min length: 2

Characters and Unicode

Total characters: 2036
Distinct characters: 293
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
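
As a rough illustration of the category counts reported for this variable, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The first five names come from the report's sample rows; the last one, `예시(3)`, is a hypothetical value added only to exercise the punctuation and digit categories. The standard library exposes general categories but not scripts or blocks, so those two breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
}

# Five names from the report's sample rows plus one hypothetical name.
names = ["소요산", "동두천", "보산", "동두천중앙", "지행", "예시(3)"]

# Count the general category of every character across all names.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

for code, count in category_counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {count}")
```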

Unique

Unique: 445
Unique (%): 66.7%
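
"Distinct" and "Unique" measure different things: distinct counts the different values that occur, while unique counts the values that occur exactly once. The sketch below shows the difference on a small hypothetical list of station names, then rechecks the reported percentages and mean length from the reported counts.

```python
from collections import Counter

# Hypothetical mini-sample: two names repeat, three occur once.
names = ["서울", "서울", "공덕", "공덕", "소요산", "동두천", "보산"]
counts = Counter(names)

n_distinct = len(counts)                                   # different values
n_unique = sum(1 for c in counts.values() if c == 1)       # values seen exactly once

# Recheck the reported figures for STATN_NM (667 observations).
n_observations = 667
distinct_pct = 100 * 546 / n_observations
unique_pct = 100 * 445 / n_observations
mean_length = 2036 / n_observations  # total characters / observations

print(n_distinct, n_unique)
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique, mean length {mean_length:.7f}")
```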

Sample

1st row: 소요산
2nd row: 동두천
3rd row: 보산
4th row: 동두천중앙
5th row: 지행

Common values
Value               Count  Frequency (%)
청량리              4      0.6%
김포공항            4      0.6%
공덕                4      0.6%
서울                4      0.6%
왕십리              4      0.6%
홍대입구            3      0.4%
대곡                3      0.4%
고속터미널          3      0.4%
초지                3      0.4%
신설동              3      0.4%
Other values (537)  633    94.8%

Most occurring characters

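Value               Count  Frequency (%)
(not recovered)     69     3.4%
(not recovered)     56     2.8%
(not recovered)     52     2.6%
(not recovered)     51     2.5%
(not recovered)     48     2.4%
(not recovered)     40     2.0%
(not recovered)     38     1.9%
(not recovered)     36     1.8%
(not recovered)     32     1.6%
)                   29     1.4%
Other values (283)  1585   77.8%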

Most occurring categories

Value              Count  Frequency (%)
Other Letter       1962   96.4%
Close Punctuation  29     1.4%
Open Punctuation   29     1.4%
Decimal Number     13     0.6%
Other Punctuation  2      0.1%
Space Separator    1      < 0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(not recovered)     69     3.5%
(not recovered)     56     2.9%
(not recovered)     52     2.7%
(not recovered)     51     2.6%
(not recovered)     48     2.4%
(not recovered)     40     2.0%
(not recovered)     38     1.9%
(not recovered)     36     1.8%
(not recovered)     32     1.6%
(not recovered)     29     1.5%
Other values (272)  1511   77.0%

Decimal Number
Value  Count  Frequency (%)
3      5      38.5%
4      3      23.1%
1      2      15.4%
9      1      7.7%
2      1      7.7%
5      1      7.7%

Other Punctuation
Value  Count  Frequency (%)
.      1      50.0%
,      1      50.0%

Close Punctuation
Value  Count  Frequency (%)
)      29     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      29     100.0%

Space Separator
Value    Count  Frequency (%)
(space)  1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  1962   96.4%
Common  74     3.6%

Most frequent character per script

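Hangul
Value               Count  Frequency (%)
(not recovered)     69     3.5%
(not recovered)     56     2.9%
(not recovered)     52     2.7%
(not recovered)     51     2.6%
(not recovered)     48     2.4%
(not recovered)     40     2.0%
(not recovered)     38     1.9%
(not recovered)     36     1.8%
(not recovered)     32     1.6%
(not recovered)     29     1.5%
Other values (272)  1511   77.0%

Common
Value    Count  Frequency (%)
)        29     39.2%
(        29     39.2%
3        5      6.8%
4        3      4.1%
1        2      2.7%
9        1      1.4%
(space)  1      1.4%
2        1      1.4%
.        1      1.4%
,        1      1.4%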

Most occurring blocks

Value   Count  Frequency (%)
Hangul  1962   96.4%
ASCII   74     3.6%

Most frequent character per block

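Hangul
Value               Count  Frequency (%)
(not recovered)     69     3.5%
(not recovered)     56     2.9%
(not recovered)     52     2.7%
(not recovered)     51     2.6%
(not recovered)     48     2.4%
(not recovered)     40     2.0%
(not recovered)     38     1.9%
(not recovered)     36     1.8%
(not recovered)     32     1.6%
(not recovered)     29     1.5%
Other values (272)  1511   77.0%

ASCII
Value    Count  Frequency (%)
)        29     39.2%
(        29     39.2%
3        5      6.8%
4        3      4.1%
1        2      2.7%
9        1      1.4%
(space)  1      1.4%
2        1      1.4%
.        1      1.4%
,        1      1.4%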

호선이름
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.3 KiB

Value              Count
1호선              102
수인분당선         63
5호선              56
경의중앙선         55
7호선              53
Other values (12)  338

Length

Max length: 5
Median length: 3
Mean length: 3.4377811
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1호선
2nd row: 1호선
3rd row: 1호선
4th row: 1호선
5th row: 1호선

Common Values

Value             Count  Frequency (%)
1호선             102    15.3%
수인분당선        63     9.4%
5호선             56     8.4%
경의중앙선        55     8.2%
7호선             53     7.9%
2호선             51     7.6%
4호선             48     7.2%
3호선             44     6.6%
6호선             39     5.8%
9호선             38     5.7%
Other values (7)  118    17.7%

Length

Histogram of lengths of the category

Interactions

[Figures: four interaction plots, rendered with Matplotlib v3.7.2]

Correlations

           SUBWAY_ID  STATN_ID  호선이름
SUBWAY_ID  1.000      1.000     1.000
STATN_ID   1.000      1.000     1.000
호선이름   1.000      1.000     1.000

           SUBWAY_ID  STATN_ID  호선이름
SUBWAY_ID  1.000      0.996     0.991
STATN_ID   0.996      1.000     0.991
호선이름   0.991      0.991     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
     SUBWAY_ID  STATN_ID    STATN_NM    호선이름
0    1001       1001000100  소요산      1호선
1    1001       1001000101  동두천      1호선
2    1001       1001000102  보산        1호선
3    1001       1001000103  동두천중앙  1호선
4    1001       1001000104  지행        1호선
5    1001       1001000105  덕정        1호선
6    1001       1001000106  덕계        1호선
7    1001       1001000107  양주        1호선
8    1001       1001000108  녹양        1호선
9    1001       1001000109  가능        1호선

Last rows
     SUBWAY_ID  STATN_ID    STATN_NM    호선이름
657  1093       1093004012  시흥대야    서해선
658  1093       1093004013  신천        서해선
659  1093       1093004014  신현        서해선
660  1093       1093004016  시흥시청    서해선
661  1093       1093004017  시흥능곡    서해선
662  1093       1093004018  달미        서해선
663  1093       1093004019  선부        서해선
664  1093       1093004020  초지        서해선
665  1093       1093004021  시우        서해선
666  1093       1093004022  원시        서해선
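
In the sample rows, the first four digits of STATN_ID equal SUBWAY_ID, which would go a long way toward explaining the near-perfect correlation flagged in the alerts. The sketch below checks that pattern on a few of the sample rows. Whether it holds for all 667 rows is an assumption that the report does not confirm, and the helper name `split_statn_id` is illustrative.

```python
# (SUBWAY_ID, STATN_ID) pairs copied from the first and last sample rows.
sample_rows = [
    (1001, 1001000100),
    (1001, 1001000109),
    (1093, 1093004012),
    (1093, 1093004022),
]


def split_statn_id(statn_id: int) -> tuple[int, int]:
    """Split a 10-digit STATN_ID into (leading 4 digits, trailing 6 digits)."""
    return divmod(statn_id, 1_000_000)


# True when every sampled STATN_ID starts with its row's SUBWAY_ID.
prefix_matches = all(
    split_statn_id(statn_id)[0] == subway_id
    for subway_id, statn_id in sample_rows
)

print(split_statn_id(1001000100), prefix_matches)
```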