Overview

Dataset statistics

Number of variables: 6
Number of observations: 9764
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 321
Duplicate rows (%): 3.3%
Total size in memory: 467.4 KiB
Average record size in memory: 49.0 B
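
The two derived figures in this block can be recomputed from the raw counts. The following is a minimal sketch in Python, not part of the report itself; the variable names are illustrative, and it assumes the usual definition of 1 KiB = 1024 bytes.

```python
# Figures copied from the "Dataset statistics" block.
n_rows = 9764
duplicate_rows = 321
total_size_kib = 467.4

# Share of duplicate rows, rounded to one decimal as in the report.
duplicate_pct = round(100 * duplicate_rows / n_rows, 1)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)

print(f"Duplicate rows (%): {duplicate_pct}%")
print(f"Average record size in memory: {avg_record_bytes} B")
```

Both values agree with the reported 3.3% and 49.0 B after rounding.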

Variable types

Categorical: 2
Text: 3
Numeric: 1

Dataset

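Description: Station-area (역세권) status information for each station operated by Seoul Metro (서울교통공사). The data includes the line (호선), external code (외부코드), subway station code (전철역코드), exit number (출구번호), and station-area name (역세권명).
Author: 서울교통공사 (Seoul Metro)
URL: https://www.data.go.kr/data/15044230/fileData.do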

Alerts

Dataset has 321 (3.3%) duplicate rows (Duplicates)
전철역코드 is highly overall correlated with 호선 (High correlation)
호선 is highly overall correlated with 전철역코드 (High correlation)

Reproduction

Analysis started: 2023-12-12 17:09:30.270381
Analysis finished: 2023-12-12 17:09:31.302973
Duration: 1.03 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선
Categorical

HIGH CORRELATION 

Distinct: 18
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 76.4 KiB

Value | Count
2 | 1877
1 | 1394
3 | 1248
5 | 936
4 | 761
Other values (13) | 3548

Length

Max length: 2
Median length: 1
Mean length: 1.0101393
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 1877 | 19.2%
1 | 1394 | 14.3%
3 | 1248 | 12.8%
5 | 936 | 9.6%
4 | 761 | 7.8%
7 | 759 | 7.8%
I | 576 | 5.9%
6 | 509 | 5.2%
B | 403 | 4.1%
K | 321 | 3.3%
Other values (8) | 980 | 10.0%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
2 | 1877 | 19.2%
1 | 1394 | 14.3%
3 | 1248 | 12.8%
5 | 936 | 9.6%
4 | 761 | 7.8%
7 | 759 | 7.8%
i | 576 | 5.9%
6 | 509 | 5.2%
b | 403 | 4.1%
k | 321 | 3.3%
Other values (8) | 980 | 10.0%

외부코드
Text

Distinct: 532
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Memory size: 76.4 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.2473372
Min length: 2

Characters and Unicode

Total characters: 31707
Distinct characters: 19
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
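
As an illustration of how such per-character properties can be tallied, here is a minimal sketch using Python's standard `unicodedata` module. It is not the profiler's own implementation. The function name `category_counts` is hypothetical, and the input list mixes two codes from the Sample rows ("202", "133") with one made-up hyphenated code ("K-1") so that several categories appear.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lu') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# "202" and "133" appear in the report's sample; "K-1" is a hypothetical code.
sample_codes = ["202", "133", "K-1"]
counts = category_counts(sample_codes)

for category, n in counts.most_common():
    print(category, n)
```

The two-letter codes map to the names used in the report: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, and `Pd` is Dash Punctuation.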

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 202
2nd row: 133
3rd row: 133
4th row: 221
5th row: 225

Value | Count | Frequency (%)
132 | 158 | 1.6%
328 | 103 | 1.1%
327 | 103 | 1.1%
226 | 79 | 0.8%
126 | 74 | 0.8%
203 | 70 | 0.7%
133 | 69 | 0.7%
i114 | 68 | 0.7%
233 | 63 | 0.6%
218 | 62 | 0.6%
Other values (522) | 8915 | 91.3%

Most occurring characters

Value | Count | Frequency (%)
2 | 6372 | 20.1%
1 | 5860 | 18.5%
3 | 4435 | 14.0%
4 | 3008 | 9.5%
5 | 2444 | 7.7%
7 | 1770 | 5.6%
6 | 1560 | 4.9%
0 | 1510 | 4.8%
8 | 1266 | 4.0%
9 | 1177 | 3.7%
Other values (9) | 2305 | 7.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 29402 | 92.7%
Uppercase Letter | 2089 | 6.6%
Dash Punctuation | 154 | 0.5%
Lowercase Letter | 62 | 0.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 6372 | 21.7%
1 | 5860 | 19.9%
3 | 4435 | 15.1%
4 | 3008 | 10.2%
5 | 2444 | 8.3%
7 | 1770 | 6.0%
6 | 1560 | 5.3%
0 | 1510 | 5.1%
8 | 1266 | 4.3%
9 | 1177 | 4.0%

Uppercase Letter
Value | Count | Frequency (%)
K | 757 | 36.2%
I | 576 | 27.6%
P | 537 | 25.7%
Y | 115 | 5.5%
U | 67 | 3.2%
D | 21 | 1.0%
A | 16 | 0.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 154 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
k | 62 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 29556 | 93.2%
Latin | 2151 | 6.8%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 6372 | 21.6%
1 | 5860 | 19.8%
3 | 4435 | 15.0%
4 | 3008 | 10.2%
5 | 2444 | 8.3%
7 | 1770 | 6.0%
6 | 1560 | 5.3%
0 | 1510 | 5.1%
8 | 1266 | 4.3%
9 | 1177 | 4.0%

Latin
Value | Count | Frequency (%)
K | 757 | 35.2%
I | 576 | 26.8%
P | 537 | 25.0%
Y | 115 | 5.3%
U | 67 | 3.1%
k | 62 | 2.9%
D | 21 | 1.0%
A | 16 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 31707 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 6372 | 20.1%
1 | 5860 | 18.5%
3 | 4435 | 14.0%
4 | 3008 | 9.5%
5 | 2444 | 7.7%
7 | 1770 | 5.6%
6 | 1560 | 4.9%
0 | 1510 | 4.8%
8 | 1266 | 4.0%
9 | 1177 | 3.7%
Other values (9) | 2305 | 7.3%

전철역코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 532
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1511.7837
Minimum: 150
Maximum: 4615
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 85.9 KiB

Quantile statistics

Minimum: 150
5-th percentile: 157
Q1: 249
Median: 1328
Q3: 2622
95-th percentile: 4103
Maximum: 4615
Range: 4465
Interquartile range (IQR): 2373

Descriptive statistics

Standard deviation: 1254.6623
Coefficient of variation (CV): 0.82992183
Kurtosis: -0.96777667
Mean: 1511.7837
Median Absolute Deviation (MAD): 1110
Skewness: 0.4486409
Sum: 14761056
Variance: 1574177.5
Monotonicity: Not monotonic
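
Several of these figures are related by simple identities, which makes the block easy to sanity-check. The sketch below is illustrative Python with made-up variable names, not the profiler's code; it recomputes the mean, coefficient of variation, variance, IQR, and range from the other reported values. Small differences are expected because the report prints rounded numbers.

```python
# Values copied from the quantile and descriptive statistics for 전철역코드.
n = 9764               # number of observations
total = 14761056       # Sum
std = 1254.6623        # Standard deviation
q1, q3 = 249, 2622     # Q1 and Q3
minimum, maximum = 150, 4615

mean = total / n                  # Sum divided by the number of observations
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"mean={mean:.4f} cv={cv:.6f} variance={variance:.1f}")
print(f"iqr={iqr} range={value_range}")
```

The recomputed values match the reported Mean (1511.7837), CV (0.82992183), Variance (1574177.5), IQR (2373), and Range (4465) up to rounding.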
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
151 | 158 | 1.6%
318 | 103 | 1.1%
317 | 103 | 1.1%
226 | 79 | 0.8%
156 | 74 | 0.8%
203 | 70 | 0.7%
150 | 69 | 0.7%
3114 | 68 | 0.7%
233 | 63 | 0.6%
218 | 62 | 0.6%
Other values (522) | 8915 | 91.3%

Value | Count | Frequency (%)
150 | 69 | 0.7%
151 | 158 | 1.6%
152 | 33 | 0.3%
153 | 37 | 0.4%
154 | 54 | 0.6%
155 | 61 | 0.6%
156 | 74 | 0.8%
157 | 46 | 0.5%
158 | 27 | 0.3%
159 | 59 | 0.6%

Value | Count | Frequency (%)
4615 | 1 | < 0.1%
4614 | 11 | 0.1%
4613 | 12 | 0.1%
4612 | 9 | 0.1%
4611 | 2 | < 0.1%
4610 | 4 | < 0.1%
4609 | 2 | < 0.1%
4608 | 4 | < 0.1%
4606 | 3 | < 0.1%
4605 | 5 | 0.1%

전철역명
Text

Distinct: 530
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Memory size: 76.4 KiB

Length

Max length: 9
Median length: 2
Mean length: 2.8467841
Min length: 2

Characters and Unicode

Total characters: 27796
Distinct characters: 288
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 을지로입구
2nd row: 서울
3rd row: 서울
4th row: 역삼
5th row: 방배

Value | Count | Frequency (%)
시청 | 158 | 1.6%
안국 | 103 | 1.1%
경복궁 | 103 | 1.1%
사당 | 79 | 0.8%
신설동 | 74 | 0.8%
을지로3가 | 70 | 0.7%
서울 | 69 | 0.7%
계산 | 68 | 0.7%
대림 | 63 | 0.6%
종합운동장 | 62 | 0.6%
Other values (520) | 8915 | 91.3%

Most occurring characters

ValueCountFrequency (%)
1025
 
3.7%
858
 
3.1%
855
 
3.1%
733
 
2.6%
594
 
2.1%
564
 
2.0%
466
 
1.7%
465
 
1.7%
459
 
1.7%
455
 
1.6%
Other values (278) 21322
76.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 27441 | 98.7%
Decimal Number | 193 | 0.7%
Open Punctuation | 68 | 0.2%
Close Punctuation | 68 | 0.2%
Other Punctuation | 26 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1025
 
3.7%
858
 
3.1%
855
 
3.1%
733
 
2.7%
594
 
2.2%
564
 
2.1%
466
 
1.7%
465
 
1.7%
459
 
1.7%
455
 
1.7%
Other values (272) 20967
76.4%

Decimal Number
Value | Count | Frequency (%)
3 | 107 | 55.4%
5 | 54 | 28.0%
4 | 32 | 16.6%

Open Punctuation
Value | Count | Frequency (%)
( | 68 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 68 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
· | 26 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 27441 | 98.7%
Common | 355 | 1.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1025
 
3.7%
858
 
3.1%
855
 
3.1%
733
 
2.7%
594
 
2.2%
564
 
2.1%
466
 
1.7%
465
 
1.7%
459
 
1.7%
455
 
1.7%
Other values (272) 20967
76.4%

Common
Value | Count | Frequency (%)
3 | 107 | 30.1%
( | 68 | 19.2%
) | 68 | 19.2%
5 | 54 | 15.2%
4 | 32 | 9.0%
· | 26 | 7.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 27441 | 98.7%
ASCII | 329 | 1.2%
None | 26 | 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1025
 
3.7%
858
 
3.1%
855
 
3.1%
733
 
2.7%
594
 
2.2%
564
 
2.1%
466
 
1.7%
465
 
1.7%
459
 
1.7%
455
 
1.7%
Other values (272) 20967
76.4%

ASCII
Value | Count | Frequency (%)
3 | 107 | 32.5%
( | 68 | 20.7%
) | 68 | 20.7%
5 | 54 | 16.4%
4 | 32 | 9.7%

None
Value | Count | Frequency (%)
· | 26 | 100.0%

출구번호
Categorical

Distinct: 20
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 76.4 KiB

Value | Count
1 | 2477
2 | 1918
3 | 1505
4 | 1257
5 | 720
Other values (15) | 1887

Length

Max length: 7
Median length: 1
Mean length: 1.0409668
Min length: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 3
2nd row: 13
3rd row: 13
4th row: 7
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 2477 | 25.4%
2 | 1918 | 19.6%
3 | 1505 | 15.4%
4 | 1257 | 12.9%
5 | 720 | 7.4%
6 | 658 | 6.7%
7 | 423 | 4.3%
8 | 340 | 3.5%
9 | 136 | 1.4%
10 | 121 | 1.2%
Other values (10) | 209 | 2.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
1 | 2477 | 25.3%
2 | 1918 | 19.6%
3 | 1505 | 15.4%
4 | 1257 | 12.9%
5 | 720 | 7.4%
6 | 658 | 6.7%
7 | 423 | 4.3%
8 | 340 | 3.5%
9 | 136 | 1.4%
10 | 121 | 1.2%
Other values (11) | 223 | 2.3%

역세권명
Text

Distinct: 7788
Distinct (%): 79.8%
Missing: 0
Missing (%): 0.0%
Memory size: 76.4 KiB

Length

Max length: 37
Median length: 26
Mean length: 6.7970094
Min length: 2

Characters and Unicode

Total characters: 66366
Distinct characters: 644
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6520
Unique (%): 66.8%

Sample

1st row: 광교
2nd row: 대일학원
3rd row: 삼광초등학교
4th row: GS타워
5th row: KT 서초지사

Value | Count | Frequency (%)
방면 | 80 | 0.7%
아파트 | 35 | 0.3%
기업은행 | 34 | 0.3%
주민센터 | 34 | 0.3%
국민은행 | 33 | 0.3%
현대아파트 | 30 | 0.3%
우리은행 | 23 | 0.2%
우체국 | 23 | 0.2%
국민건강보험공단 | 20 | 0.2%
우성아파트 | 19 | 0.2%
Other values (7987) | 10429 | 96.9%

Most occurring characters

ValueCountFrequency (%)
2185
 
3.3%
2180
 
3.3%
1910
 
2.9%
0 1249
 
1.9%
1223
 
1.8%
1196
 
1.8%
1191
 
1.8%
1083
 
1.6%
1042
 
1.6%
1022
 
1.5%
Other values (634) 52085
78.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 58608
88.3%
Decimal Number 3701
 
5.6%
Space Separator 996
 
1.5%
Open Punctuation 920
 
1.4%
Close Punctuation 915
 
1.4%
Other Punctuation 685
 
1.0%
Uppercase Letter 494
 
0.7%
Dash Punctuation 17
 
< 0.1%
Other Symbol 15
 
< 0.1%
Math Symbol 10
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2185
 
3.7%
2180
 
3.7%
1910
 
3.3%
1223
 
2.1%
1196
 
2.0%
1191
 
2.0%
1083
 
1.8%
1042
 
1.8%
1022
 
1.7%
1009
 
1.7%
Other values (585) 44567
76.0%

Uppercase Letter
Value | Count | Frequency (%)
A | 70 | 14.2%
K | 60 | 12.1%
S | 58 | 11.7%
T | 54 | 10.9%
C | 38 | 7.7%
G | 37 | 7.5%
L | 23 | 4.7%
B | 23 | 4.7%
P | 22 | 4.5%
M | 17 | 3.4%
Other values (13) | 92 | 18.6%

Decimal Number
Value | Count | Frequency (%)
0 | 1249 | 33.7%
1 | 623 | 16.8%
2 | 564 | 15.2%
3 | 340 | 9.2%
5 | 337 | 9.1%
4 | 222 | 6.0%
6 | 150 | 4.1%
7 | 94 | 2.5%
8 | 65 | 1.8%
9 | 57 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
/ | 471 | 68.8%
. | 138 | 20.1%
· | 73 | 10.7%
@ | 3 | 0.4%

Lowercase Letter
Value | Count | Frequency (%)
o | 2 | 40.0%
l | 2 | 40.0%
d | 1 | 20.0%

Open Punctuation
Value | Count | Frequency (%)
( | 919 | 99.9%
[ | 1 | 0.1%

Close Punctuation
Value | Count | Frequency (%)
) | 914 | 99.9%
] | 1 | 0.1%

Math Symbol
ValueCountFrequency (%)
~ 6
60.0%
4
40.0%
Space Separator
ValueCountFrequency (%)
996
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 17
100.0%
Other Symbol
ValueCountFrequency (%)
15
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 58623
88.3%
Common 7244
 
10.9%
Latin 499
 
0.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2185
 
3.7%
2180
 
3.7%
1910
 
3.3%
1223
 
2.1%
1196
 
2.0%
1191
 
2.0%
1083
 
1.8%
1042
 
1.8%
1022
 
1.7%
1009
 
1.7%
Other values (586) 44582
76.0%

Latin
Value | Count | Frequency (%)
A | 70 | 14.0%
K | 60 | 12.0%
S | 58 | 11.6%
T | 54 | 10.8%
C | 38 | 7.6%
G | 37 | 7.4%
L | 23 | 4.6%
B | 23 | 4.6%
P | 22 | 4.4%
M | 17 | 3.4%
Other values (16) | 97 | 19.4%

Common
ValueCountFrequency (%)
0 1249
17.2%
996
13.7%
( 919
12.7%
) 914
12.6%
1 623
8.6%
2 564
7.8%
/ 471
 
6.5%
3 340
 
4.7%
5 337
 
4.7%
4 222
 
3.1%
Other values (12) 609
8.4%

Most occurring blocks

ValueCountFrequency (%)
Hangul 58603
88.3%
ASCII 7666
 
11.6%
None 88
 
0.1%
Compat Jamo 5
 
< 0.1%
Math Operators 4
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2185
 
3.7%
2180
 
3.7%
1910
 
3.3%
1223
 
2.1%
1196
 
2.0%
1191
 
2.0%
1083
 
1.8%
1042
 
1.8%
1022
 
1.7%
1009
 
1.7%
Other values (584) 44562
76.0%
ASCII
ValueCountFrequency (%)
0 1249
16.3%
996
13.0%
( 919
12.0%
) 914
11.9%
1 623
8.1%
2 564
7.4%
/ 471
 
6.1%
3 340
 
4.4%
5 337
 
4.4%
4 222
 
2.9%
Other values (36) 1031
13.4%
None
ValueCountFrequency (%)
· 73
83.0%
15
 
17.0%
Compat Jamo
ValueCountFrequency (%)
5
100.0%
Math Operators
ValueCountFrequency (%)
4
100.0%

Interactions


Correlations

Variable | 호선 | 전철역코드 | 출구번호
호선 | 1.000 | 0.989 | 0.303
전철역코드 | 0.989 | 1.000 | 0.315
출구번호 | 0.303 | 0.315 | 1.000

Variable | 출구번호 | 호선
출구번호 | 1.000 | 0.095
호선 | 0.095 | 1.000

Variable | 전철역코드 | 호선 | 출구번호
전철역코드 | 1.000 | 0.828 | 0.131
호선 | 0.828 | 1.000 | 0.095
출구번호 | 0.131 | 0.095 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

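Index | 호선 | 외부코드 | 전철역코드 | 전철역명 | 출구번호 | 역세권명
0 | 2 | 202 | 202 | 을지로입구 | 3 | 광교
1 | 1 | 133 | 150 | 서울 | 13 | 대일학원
2 | 1 | 133 | 150 | 서울 | 13 | 삼광초등학교
3 | 2 | 221 | 221 | 역삼 | 7 | GS타워
4 | 2 | 225 | 225 | 방배 | 1 | KT 서초지사
5 | 2 | 225 | 225 | 방배 | 1 | 남부순환로
6 | 2 | 225 | 225 | 방배 | 1 | 삼익
7 | 2 | 225 | 225 | 방배 | 1 | 서울강남지방노동사무소
8 | 2 | 225 | 225 | 방배 | 1 | 서초여성회관
9 | 2 | 225 | 225 | 방배 | 1 | 서울남부보훈지청

Index | 호선 | 외부코드 | 전철역코드 | 전철역명 | 출구번호 | 역세권명
9754 | U | U123 | 4613 | 어룡 | 2 | 신한은행
9755 | U | U123 | 4613 | 어룡 | 2 | 충의중학교
9756 | U | U123 | 4613 | 어룡 | 2 | 오동초등학교
9757 | U | U123 | 4613 | 어룡 | 2 | 송현고등학교
9758 | U | U123 | 4613 | 어룡 | 2 | 송산2동주민센터
9759 | U | U123 | 4613 | 어룡 | 2 | 의정부용현초등학교
9760 | U | U123 | 4613 | 어룡 | 2 | 송산푸르지오아파트
9761 | U | U123 | 4613 | 어룡 | 2 | 송산주공2.5.7단지아파트
9762 | U | U123 | 4613 | 어룡 | 2 | 근제근린공원
9763 | U | U124 | 4614 | 송산 | 1 | 클나무지역아동센터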

Duplicate rows

Most frequently occurring

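Index | 호선 | 외부코드 | 전철역코드 | 전철역명 | 출구번호 | 역세권명 | # duplicates
27 | 1 | 132 | 151 | 시청 | 3 | 대한성공회 | 3
216 | 3 | 327 | 317 | 경복궁 | 5 | 경복궁 | 3
0 | 1 | 124 | 158 | 청량리 | 3 | 미주아파트 | 2
1 | 1 | 124 | 158 | 청량리 | 6 | 동부청과시장 | 2
2 | 1 | 124 | 158 | 청량리 | 6 | 성바오로병원 | 2
3 | 1 | 126 | 156 | 신설동 | 1 | 대광초등학교 | 2
4 | 1 | 126 | 156 | 신설동 | 10 | 동대문우체국 | 2
5 | 1 | 126 | 156 | 신설동 | 4 | 동대문등기소 | 2
6 | 1 | 126 | 156 | 신설동 | 5 | 국민연금동대문중랑지사 | 2
7 | 1 | 126 | 156 | 신설동 | 7 | 동대문등기소 | 2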
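
For readers who want to see how a "# duplicates" count of this kind can arise, the following is a minimal Python sketch that groups identical rows with `collections.Counter`. It is an illustration under stated assumptions, not the profiler's implementation: the tiny `rows` list is hypothetical input built by repeating one row from the table (시청, exit 3, 대한성공회) three times and adding one row from the Sample section. The report does not state whether its total of 321 counts duplicate groups or surplus rows, so the sketch only shows the per-group tally.

```python
from collections import Counter

# Each row is (호선, 외부코드, 전철역코드, 전철역명, 출구번호, 역세권명).
# Hypothetical input: the first row is repeated three times to mirror its
# reported "# duplicates" value of 3; the last row occurs once.
rows = [
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("2", "202", 202, "을지로입구", "3", "광교"),
]

# Count how often each complete row occurs, then keep only repeated rows.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Rows are compared on all six columns, so two entries that differ only in 역세권명 are not treated as duplicates.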