Overview

Dataset statistics

Number of variables: 4
Number of observations: 8334
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 9
Duplicate rows (%): 0.1%
Total size in memory: 268.7 KiB
Average record size in memory: 33.0 B
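
The derived figures above can be checked from the raw counts. The following is a minimal sketch, assuming the duplicate percentage is duplicate rows over observations and the average record size is total memory over observations; it mirrors the numbers in this report and is not taken from the profiler's source.

```python
# Values copied from the dataset statistics above.
n_rows = 8334
n_duplicates = 9
total_kib = 268.7

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicates / n_rows

# Average bytes per record: total memory (KiB converted to bytes) over row count.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"{duplicate_pct:.1f}%")      # 0.1%
print(f"{avg_record_bytes:.1f} B")  # 33.0 B
```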

Variable types

Numeric: 1
Text: 2
Categorical: 1

Dataset

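Description: 역코드(내부) (station code, internal), 역코드(외부) (station code, external), 출구번호 (exit number), 건물명 (building name)
Author: 서울교통공사 (Seoul Metro)
URL: https://data.seoul.go.kr/dataList/OA-15993/S/1/datasetView.do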

Alerts

Dataset has 9 (0.1%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2024-04-06 12:52:22.070858
Analysis finished: 2024-04-06 12:52:23.586035
Duration: 1.52 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

역코드(내부)
Real number (ℝ)

Distinct: 439
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1703.6866
Minimum: 150
Maximum: 4138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 73.4 KiB

Quantile statistics

Minimum: 150
5-th percentile: 206
Q1: 332
Median: 1956
Q3: 2634
95-th percentile: 3110
Maximum: 4138
Range: 3988
Interquartile range (IQR): 2302

Descriptive statistics

Standard deviation: 1151.3368
Coefficient of variation (CV): 0.67579146
Kurtosis: -1.363701
Mean: 1703.6866
Median Absolute Deviation (MAD): 791
Skewness: -0.10557137
Sum: 14198524
Variance: 1325576.5
Monotonicity: Increasing
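
Several of the statistics above are simple functions of the others. The sketch below recomputes them from the values printed in this report, assuming the usual definitions (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). Small differences in the last digits are expected because the inputs are already rounded.

```python
# Inputs copied from the statistics reported for 역코드(내부).
n = 8334
total = 14198524
std = 1151.3368
q1, q3 = 332, 2634
minimum, maximum = 150, 4138

mean = total / n                 # reported as 1703.6866
cv = std / mean                  # reported as 0.67579146
variance = std ** 2              # reported as 1325576.5
iqr = q3 - q1                    # reported as 2302
value_range = maximum - minimum  # reported as 3988

print(round(mean, 4), round(cv, 4), iqr, value_range)
```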
Histogram with fixed size bins (bins=50)
Common Values
Value  Count  Frequency (%)
433    70     0.8%
2533   53     0.6%
2621   48     0.6%
2534   48     0.6%
318    48     0.6%
203    46     0.6%
151    44     0.5%
2511   44     0.5%
317    43     0.5%
2731   43     0.5%
Other values (429)  7847  94.2%
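
The "Other values" row is what remains after the ten most frequent values are listed. A minimal sketch of that bookkeeping, using only the counts printed in this report (the variable names are illustrative):

```python
# Ten most frequent 역코드(내부) values and their counts, from the table above.
top10 = {433: 70, 2533: 53, 2621: 48, 2534: 48, 318: 48,
         203: 46, 151: 44, 2511: 44, 317: 43, 2731: 43}
n_rows = 8334      # number of observations
n_distinct = 439   # distinct values of this variable

# Everything not covered by the top ten is folded into one row.
other_count = n_rows - sum(top10.values())
other_distinct = n_distinct - len(top10)
other_pct = 100 * other_count / n_rows

print(f"Other values ({other_distinct})  {other_count}  {other_pct:.1f}%")
# Other values (429)  7847  94.2%
```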
Minimum 10 values
Value  Count  Frequency (%)
150    27     0.3%
151    44     0.5%
152    25     0.3%
153    25     0.3%
154    27     0.3%
155    30     0.4%
156    22     0.3%
157    19     0.2%
158    15     0.2%
159    9      0.1%
Maximum 10 values
Value  Count  Frequency (%)
4138   15     0.2%
4137   10     0.1%
4136   20     0.2%
4135   14     0.2%
4134   16     0.2%
4133   18     0.2%
4132   25     0.3%
4131   29     0.3%
4130   4      < 0.1%
4129   23     0.3%

역코드(외부)
Text

Distinct: 439
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 65.2 KiB

Length

Max length: 5
Median length: 3
Mean length: 3.1270698
Min length: 3

Characters and Unicode

Total characters: 26061
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
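
As an illustration of how such per-character properties can be read, the sketch below uses Python's standard `unicodedata` module to tally general categories. The three sample strings are station codes that appear in the sample and duplicate-row listings of this report; the choice of `unicodedata` is an assumption made for illustration, not a statement about how the profiler computes its tables.

```python
import unicodedata
from collections import Counter

# External station codes seen elsewhere in this report.
samples = ["133", "P555", "I117"]

# unicodedata.category returns a two-letter general category:
# "Nd" is Decimal Number, "Lu" is Uppercase Letter.
categories = Counter(
    unicodedata.category(ch) for code in samples for ch in code
)

print(dict(categories))  # {'Nd': 9, 'Lu': 2}
```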

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: 133
2nd row: 133
3rd row: 133
4th row: 133
5th row: 133
Value  Count  Frequency (%)
433    70     0.8%
532    53     0.6%
328    48     0.6%
620    48     0.6%
533    48     0.6%
203    46     0.6%
510    44     0.5%
132    44     0.5%
729    43     0.5%
327    43     0.5%
Other values (429)  7847  94.2%

Most occurring characters

Value  Count  Frequency (%)
2      4587   17.6%
1      4048   15.5%
3      3770   14.5%
4      2990   11.5%
5      2807   10.8%
6      1805   6.9%
7      1795   6.9%
0      1226   4.7%
8      1151   4.4%
9      933    3.6%
Other values (4)  949  3.6%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    25112  96.4%
Uppercase Letter  839    3.2%
Dash Punctuation  110    0.4%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      4587   18.3%
1      4048   16.1%
3      3770   15.0%
4      2990   11.9%
5      2807   11.2%
6      1805   7.2%
7      1795   7.1%
0      1226   4.9%
8      1151   4.6%
9      933    3.7%
Uppercase Letter
Value  Count  Frequency (%)
K      348    41.5%
P      309    36.8%
I      182    21.7%
Dash Punctuation
Value  Count  Frequency (%)
-      110    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  25222  96.8%
Latin   839    3.2%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      4587   18.2%
1      4048   16.0%
3      3770   14.9%
4      2990   11.9%
5      2807   11.1%
6      1805   7.2%
7      1795   7.1%
0      1226   4.9%
8      1151   4.6%
9      933    3.7%
Latin
Value  Count  Frequency (%)
K      348    41.5%
P      309    36.8%
I      182    21.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  26061  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      4587   17.6%
1      4048   15.5%
3      3770   14.5%
4      2990   11.5%
5      2807   10.8%
6      1805   6.9%
7      1795   6.9%
0      1226   4.7%
8      1151   4.4%
9      933    3.6%
Other values (4)  949  3.6%

출구번호
Categorical

Distinct: 26
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 65.2 KiB
1: 1839
2: 1543
3: 1304
4: 1136
5: 649
Other values (21): 1863

Length

Max length: 8
Median length: 1
Mean length: 1.0517159
Min length: 1

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
1      1839   22.1%
2      1543   18.5%
3      1304   15.6%
4      1136   13.6%
5      649    7.8%
6      592    7.1%
7      415    5.0%
8      315    3.8%
9      152    1.8%
10     129    1.5%
Other values (16)  260  3.1%

Length

Histogram of lengths of the category

건물명
Text

Distinct: 6503
Distinct (%): 78.0%
Missing: 0
Missing (%): 0.0%
Memory size: 65.2 KiB

Length

Max length: 47
Median length: 36
Mean length: 6.812935
Min length: 2

Characters and Unicode

Total characters: 56779
Distinct characters: 669
Distinct categories: 14
Distinct scripts: 4
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5295
Unique (%): 63.5%

Sample

1st row: 동자동
2nd row: 서울시티투어버스 타는 곳
3rd row: 서울역
4th row: 역전우체국
5th row: 경의선 서울역
Value             Count  Frequency (%)
방면              152    1.6%
주민센터          111    1.1%
아파트            47     0.5%
우체국            35     0.4%
기업은행          30     0.3%
청계천            30     0.3%
국민건강보험공단  21     0.2%
현대아파트        21     0.2%
서울              18     0.2%
우리은행          16     0.2%
Other values (6707)  9215  95.0%

Most occurring characters

ValueCountFrequency (%)
2016
 
3.6%
1896
 
3.3%
1655
 
2.9%
1369
 
2.4%
1101
 
1.9%
1027
 
1.8%
980
 
1.7%
936
 
1.6%
875
 
1.5%
855
 
1.5%
Other values (659) 44069
77.6%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       52328  92.2%
Decimal Number     1582   2.8%
Space Separator    1369   2.4%
Other Punctuation  419    0.7%
Uppercase Letter   384    0.7%
Close Punctuation  291    0.5%
Open Punctuation   290    0.5%
Lowercase Letter   90     0.2%
Dash Punctuation   10     < 0.1%
Math Symbol        9      < 0.1%
Other values (4)   7      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2016
 
3.9%
1896
 
3.6%
1655
 
3.2%
1101
 
2.1%
1027
 
2.0%
980
 
1.9%
936
 
1.8%
875
 
1.7%
855
 
1.6%
852
 
1.6%
Other values (586) 40135
76.7%
Uppercase Letter
Value  Count  Frequency (%)
K      64     16.7%
T      47     12.2%
C      37     9.6%
S      25     6.5%
B      25     6.5%
G      24     6.2%
I      23     6.0%
A      19     4.9%
E      17     4.4%
D      16     4.2%
Other values (13)  87  22.7%
Lowercase Letter
Value  Count  Frequency (%)
m      42     46.7%
e      13     14.4%
k      5      5.6%
c      4      4.4%
n      3      3.3%
i      3      3.3%
t      3      3.3%
b      2      2.2%
u      2      2.2%
h      2      2.2%
Other values (8)  11  12.2%
Decimal Number
Value  Count  Frequency (%)
1      528    33.4%
2      339    21.4%
3      199    12.6%
4      135    8.5%
5      95     6.0%
0      94     5.9%
9      67     4.2%
6      53     3.4%
7      49     3.1%
8      23     1.5%
Other Punctuation
Value  Count  Frequency (%)
,      249    59.4%
.      97     23.2%
?      45     10.7%
/      19     4.5%
&      6      1.4%
:      2      0.5%
@      1      0.2%
Other Number
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Close Punctuation
Value  Count  Frequency (%)
)      278    95.5%
]      13     4.5%
Open Punctuation
Value  Count  Frequency (%)
(      277    95.5%
[      13     4.5%
Math Symbol
Value  Count  Frequency (%)
~      8      88.9%
+      1      11.1%
Letter Number
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
1369
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 10
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  52328  92.2%
Common  3974   7.0%
Latin   476    0.8%
Han     1      < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2016
 
3.9%
1896
 
3.6%
1655
 
3.2%
1101
 
2.1%
1027
 
2.0%
980
 
1.9%
936
 
1.8%
875
 
1.7%
855
 
1.6%
852
 
1.6%
Other values (586) 40135
76.7%
Latin
Value  Count  Frequency (%)
K      64     13.4%
T      47     9.9%
m      42     8.8%
C      37     7.8%
S      25     5.3%
B      25     5.3%
G      24     5.0%
I      23     4.8%
A      19     4.0%
E      17     3.6%
Other values (33)  153  32.1%
Common
Value    Count  Frequency (%)
(space)  1369   34.4%
1        528    13.3%
2        339    8.5%
)        278    7.0%
(        277    7.0%
,        249    6.3%
3        199    5.0%
4        135    3.4%
.        97     2.4%
5        95     2.4%
Other values (19)  408  10.3%
Han
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

Value              Count  Frequency (%)
Hangul             52321  92.1%
ASCII              4445   7.8%
Compat Jamo        6      < 0.1%
Enclosed Alphanum  3      < 0.1%
Number Forms       2      < 0.1%
None               1      < 0.1%
CJK                1      < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2016
 
3.9%
1896
 
3.6%
1655
 
3.2%
1101
 
2.1%
1027
 
2.0%
980
 
1.9%
936
 
1.8%
875
 
1.7%
855
 
1.6%
852
 
1.6%
Other values (584) 40128
76.7%
ASCII
Value    Count  Frequency (%)
(space)  1369   30.8%
1        528    11.9%
2        339    7.6%
)        278    6.3%
(        277    6.2%
,        249    5.6%
3        199    4.5%
4        135    3.0%
.        97     2.2%
5        95     2.1%
Other values (57)  879  19.8%
Compat Jamo
ValueCountFrequency (%)
6
100.0%
Number Forms
ValueCountFrequency (%)
1
50.0%
1
50.0%
None
ValueCountFrequency (%)
1
100.0%
Enclosed Alphanum
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK
ValueCountFrequency (%)
1
100.0%

Interactions


Correlations

              역코드(내부)  출구번호
역코드(내부)  1.000         0.227
출구번호      0.227         1.000

              역코드(내부)  출구번호
역코드(내부)  1.000         0.092
출구번호      0.092         1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      역코드(내부)  역코드(외부)  출구번호  건물명
0     150           133           1         동자동
1     150           133           1         서울시티투어버스 타는 곳
2     150           133           1         서울역
3     150           133           2         역전우체국
4     150           133           2         경의선 서울역
5     150           133           2         역전파출소
6     150           133           2         문화역서울284
7     150           133           2         서울로
8     150           133           2         국민권익위원회
9     150           133           3         한국주택금융공사

Last rows

      역코드(내부)  역코드(외부)  출구번호  건물명
8324  4138          938           1         서울한산초등학교
8325  4138          938           1         한산중학교
8326  4138          938           1         중앙보훈병원후문
8327  4138          938           2         중앙보훈병원
8328  4138          938           2         생태공원앞교차로 방면
8329  4138          938           3         일자산제1체육관
8330  4138          938           3         일자산
8331  4138          938           3         일자산도시자연공원(잔디광장)
8332  4138          938           3         강동구도시농업공원
8333  4138          938           3         일자산허브천문공원 방면

Duplicate rows

Most frequently occurring

   역코드(내부)  역코드(외부)  출구번호  건물명            # duplicates
0  224           224           1         대한법률구조공단  2
1  228           228           3         영락고등학교      2
2  419           419           3         한성대학교        2
3  1452          437           2         서울랜드          2
4  1452          437           3         국립현대미술관    2
5  2561          P555          1         하늘산성교회      2
6  2716          714           6         상계주공1?2단지   2
7  2814          813           2         방이복지관        2
8  3117          I117          1         북부소방서        2
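
A listing like the one above can be produced by counting identical rows. The sketch below uses `collections.Counter` on a handful of illustrative rows (two copies of one duplicate row from this report plus two rows from the sample listing); it is an assumption-laden illustration of the idea, not the profiler's own implementation.

```python
from collections import Counter

# Illustrative rows: (역코드(내부), 역코드(외부), 출구번호, 건물명).
rows = [
    ("224", "224", "1", "대한법률구조공단"),
    ("224", "224", "1", "대한법률구조공단"),
    ("150", "133", "1", "동자동"),
    ("150", "133", "1", "서울역"),
]

# Count how often each complete row occurs, then keep rows seen more than once.
counts = Counter(rows)
duplicates = {row: count for row, count in counts.items() if count > 1}

for row, count in duplicates.items():
    print(*row, count)
```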