Overview

Dataset statistics

Number of variables: 2
Number of observations: 2720
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 682
Duplicate rows (%): 25.1%
Total size in memory: 45.3 KiB
Average record size in memory: 17.0 B
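
The duplicate percentage above can be rechecked from the raw counts. The following is a minimal sketch using only figures printed in this report; the variable names are illustrative. The last check is an observation about the numbers (690 distinct 노선명 values, 8 of them occurring once), not a statement about how ydata-profiling counts duplicates.

```python
# Figures copied from this report; all names are illustrative.
n_rows = 2720
n_duplicate_rows = 682
n_distinct = 690  # distinct 노선명 values, from the variable section
n_unique = 8      # 노선명 values that occur exactly once

def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

duplicate_pct = pct(n_duplicate_rows, n_rows)

# Observation only: distinct values that repeat = distinct - unique.
repeated_values = n_distinct - n_unique

print(duplicate_pct, repeated_values)
```

The reported count of 682 is therefore consistent with counting each repeated row pattern once, although the report itself does not say so.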

Variable types

Text: 1
Numeric: 1

Dataset

Description: 노선명 (route name), 노선 (route)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15262/S/1/datasetView.do

Alerts

Dataset has 682 (25.1%) duplicate rows (Duplicates)

Reproduction

Analysis started: 2024-05-11 06:16:06.410877
Analysis finished: 2024-05-11 06:16:06.957274
Duration: 0.55 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
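
The reported duration can be derived from the two timestamps. This is a small sketch using only the standard library, with the timestamps copied from this section:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:16:06.410877")
finished = datetime.fromisoformat("2024-05-11 06:16:06.957274")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```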

Variables

노선명 (route name)
Text

Distinct: 690
Distinct (%): 25.4%
Missing: 0
Missing (%): 0.0%
Memory size: 21.4 KiB

Length

Max length: 8
Median length: 4
Mean length: 3.9466912
Min length: 3
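
The mean length is the total character count divided by the number of observations. A minimal sketch, assuming the totals printed elsewhere in this report (10735 characters, 2720 rows) and a few sample values from it; the names are illustrative:

```python
# Totals copied from this report.
total_characters = 10735
n_rows = 2720

# Mean string length across all rows of the text variable.
mean_length = total_characters / n_rows

# Lengths of a few values that appear in the report's samples.
sample_values = ["0017", "01A", "강서05-1"]
sample_lengths = {value: len(value) for value in sample_values}

print(round(mean_length, 7), sample_lengths)
```

Note that `len` counts code points, so each Hangul syllable contributes one character.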

Characters and Unicode

Total characters: 10735
Distinct characters: 70
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
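
As an illustration of those character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the function name and the three sample values (taken from this report) are illustrative. Script and block lookups are not available in `unicodedata`, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters per Unicode general category (e.g. Nd, Lo, Lu, Pd)."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Sample route names that appear in this report.
sample = ["0017", "01A", "강서05-1"]
counts = category_counts(sample)
print(counts)
```

Here `Nd` corresponds to Decimal Number, `Lo` to Other Letter (the Hangul syllables), `Lu` to Uppercase Letter, and `Pd` to Dash Punctuation, the four categories listed in this report.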

Unique

Unique: 8
Unique (%): 0.3%

Sample

1st row: 0017
2nd row: 01A
3rd row: 01B
4th row: 0411
5th row: 100

Value                Count  Frequency (%)
0017                 4      0.1%
강서05-1              4      0.1%
관악08                4      0.1%
강북11                4      0.1%
강북12                4      0.1%
강서01                4      0.1%
강서02                4      0.1%
강서03                4      0.1%
강서04                4      0.1%
강서05                4      0.1%
Other values (680)   2680   98.5%

Most occurring characters

Value               Count  Frequency (%)
1                   1796   16.7%
0                   1515   14.1%
2                   986    9.2%
6                   868    8.1%
3                   735    6.8%
7                   674    6.3%
5                   649    6.0%
4                   612    5.7%
8                   241    2.2%
[not recovered]     208    1.9%
Other values (60)   2451   22.8%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      8234   76.7%
Other Letter        2240   20.9%
Uppercase Letter    207    1.9%
Dash Punctuation    54     0.5%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
[not recovered]     208    9.3%
[not recovered]     160    7.1%
[not recovered]     128    5.7%
[not recovered]     120    5.4%
[not recovered]     120    5.4%
[not recovered]     116    5.2%
[not recovered]     100    4.5%
[not recovered]     100    4.5%
[not recovered]     84     3.8%
[not recovered]     84     3.8%
Other values (42)   1020   45.5%

Decimal Number

Value   Count  Frequency (%)
1       1796   21.8%
0       1515   18.4%
2       986    12.0%
6       868    10.5%
3       735    8.9%
7       674    8.2%
5       649    7.9%
4       612    7.4%
8       241    2.9%
9       158    1.9%

Uppercase Letter

Value   Count  Frequency (%)
N       76     36.7%
A       36     17.4%
B       31     15.0%
U       16     7.7%
O       16     7.7%
R       16     7.7%
T       16     7.7%

Dash Punctuation

Value   Count  Frequency (%)
-       54     100.0%

Most occurring scripts

Value    Count  Frequency (%)
Common   8288   77.2%
Hangul   2240   20.9%
Latin    207    1.9%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
[not recovered]     208    9.3%
[not recovered]     160    7.1%
[not recovered]     128    5.7%
[not recovered]     120    5.4%
[not recovered]     120    5.4%
[not recovered]     116    5.2%
[not recovered]     100    4.5%
[not recovered]     100    4.5%
[not recovered]     84     3.8%
[not recovered]     84     3.8%
Other values (42)   1020   45.5%

Common

Value   Count  Frequency (%)
1       1796   21.7%
0       1515   18.3%
2       986    11.9%
6       868    10.5%
3       735    8.9%
7       674    8.1%
5       649    7.8%
4       612    7.4%
8       241    2.9%
9       158    1.9%

Latin

Value   Count  Frequency (%)
N       76     36.7%
A       36     17.4%
B       31     15.0%
U       16     7.7%
O       16     7.7%
R       16     7.7%
T       16     7.7%

Most occurring blocks

Value    Count  Frequency (%)
ASCII    8495   79.1%
Hangul   2240   20.9%

Most frequent character per block

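ASCII

Value              Count  Frequency (%)
1                  1796   21.1%
0                  1515   17.8%
2                  986    11.6%
6                  868    10.2%
3                  735    8.7%
7                  674    7.9%
5                  649    7.6%
4                  612    7.2%
8                  241    2.8%
9                  158    1.9%
Other values (8)   261    3.1%

Hangul

Value               Count  Frequency (%)
[not recovered]     208    9.3%
[not recovered]     160    7.1%
[not recovered]     128    5.7%
[not recovered]     120    5.4%
[not recovered]     120    5.4%
[not recovered]     116    5.2%
[not recovered]     100    4.5%
[not recovered]     100    4.5%
[not recovered]     84     3.8%
[not recovered]     84     3.8%
Other values (42)   1020   45.5%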

노선 (route)
Real number (ℝ)

Distinct: 690
Distinct (%): 25.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0656566 × 10^8
Minimum: 1.0000002 × 10^8
Maximum: 1.249 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 24.0 KiB

Quantile statistics

Minimum: 1.0000002 × 10^8
5-th percentile: 1.0010004 × 10^8
Q1: 1.0010025 × 10^8
median: 1.0010059 × 10^8
Q3: 1.13 × 10^8
95-th percentile: 1.22 × 10^8
Maximum: 1.249 × 10^8
Range: 24899986
Interquartile range (IQR): 12899753
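
The range and IQR are simple differences of the quantiles above. The sketch below rechecks them, assuming the exact minimum and maximum listed in this report's extreme-value tables (100000017 and 124900003) and reading the printed Q1 as 100100250; the Q3 it derives is implied by those figures rather than printed in full.

```python
# Exact extremes as listed in the report's value tables.
minimum = 100000017
maximum = 124900003
value_range = maximum - minimum

# Q1 as printed (1.0010025 × 10^8, 8 significant digits); assumed exact here.
q1_displayed = 100100250
iqr = 12899753

# IQR = Q3 - Q1, so the implied Q3 is Q1 + IQR.
q3_implied = q1_displayed + iqr

print(value_range, q3_implied)
```

The implied Q3 of 113000003 agrees with the printed 1.13 × 10^8 at the report's display precision.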

Descriptive statistics

Standard deviation: 8154293.5
Coefficient of variation (CV): 0.07651896
Kurtosis: -0.85545463
Mean: 1.0656566 × 10^8
Median Absolute Deviation (MAD): 573
Skewness: 0.81842412
Sum: 2.898586 × 10^11
Variance: 6.6492503 × 10^13
Monotonicity: Not monotonic
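
Several of these statistics are related by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. A minimal sketch that rechecks the printed values, using the rounded figures from this report (so agreement is only up to display precision):

```python
import math

# Rounded figures as printed in this report.
mean = 1.0656566e8
std = 8154293.5
n_rows = 2720

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n_rows    # sum of all values

print(f"{cv:.8f}", f"{variance:.7e}", f"{total:.6e}")

# Compare against the printed values with a loose tolerance.
cv_ok = math.isclose(cv, 0.07651896, abs_tol=5e-8)
variance_ok = math.isclose(variance, 6.6492503e13, rel_tol=1e-6)
total_ok = math.isclose(total, 2.898586e11, rel_tol=1e-6)
```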
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
100100124            4      0.1%
115900002            4      0.1%
108900001            4      0.1%
108900012            4      0.1%
115900006            4      0.1%
115900003            4      0.1%
115900004            4      0.1%
115900001            4      0.1%
115900005            4      0.1%
115900008            4      0.1%
Other values (680)   2680   98.5%

Minimum 10 values

Value       Count  Frequency (%)
100000017   4      0.1%
100000018   4      0.1%
100100001   4      0.1%
100100006   4      0.1%
100100007   4      0.1%
100100008   4      0.1%
100100009   4      0.1%
100100010   4      0.1%
100100011   4      0.1%
100100012   4      0.1%

Maximum 10 values

Value       Count  Frequency (%)
124900003   4      0.1%
124900002   4      0.1%
124900001   4      0.1%
124000040   1      < 0.1%
124000039   4      0.1%
124000038   4      0.1%
124000036   4      0.1%
124000016   4      0.1%
124000015   4      0.1%
124000014   4      0.1%

Interactions


Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  노선명   노선
0      0017    100100124
1      01A     100100001
2      01B     106000004
3      0411    104000012
4      100     100100549
5      101     100100006
6      1014    100100129
7      1017    100100130
8      102     100100007
9      1020    100100131

Last rows

Index  노선명   노선
2710   종로03   100900010
2711   종로05   100900011
2712   종로07   100900004
2713   종로08   100900005
2714   종로09   100900003
2715   종로11   100900007
2716   종로12   100900009
2717   종로13   100900002
2718   중랑01   106900001
2719   중랑02   106900002

Duplicate rows

Most frequently occurring

Index  노선명   노선         # duplicates
0      0017    100100124   4
1      01A     100100001   4
2      01B     106000004   4
3      0411    104000012   4
4      100     100100549   4
5      101     100100006   4
6      1014    100100129   4
7      1017    100100130   4
8      102     100100007   4
9      1020    100100131   4