Overview

Dataset statistics

Number of variables: 4
Number of observations: 1348
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 44.9 KiB
Average record size in memory: 34.1 B
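
The two memory figures above are related by a simple division. The following is a minimal sketch of that arithmetic, assuming the report uses binary units (1 KiB = 1024 bytes); the variable names are illustrative and do not come from the report.

```python
# Values copied from the "Dataset statistics" block.
total_size_kib = 44.9
n_observations = 1348

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B per record")
```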

Variable types

Numeric: 2
Text: 1
Categorical: 1

Dataset

Description: 노선_ID, 노선_명칭, 노선_유형, 거리 (route ID, route name, route type, distance)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-21230/S/1/datasetView.do

Alerts

노선_ID is highly overall correlated with 거리 and 1 other field [High correlation]
거리 is highly overall correlated with 노선_ID [High correlation]
노선_유형 is highly overall correlated with 노선_ID [High correlation]
노선_ID has unique values [Unique]
거리 has 635 (47.1%) zeros [Zeros]
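
The Zeros and Unique alerts restate simple ratios from the per-variable statistics. A minimal sketch of those two checks, using only counts printed in this report; the thresholds that make the profiler raise an alert are not shown in the report and are not modelled here.

```python
n_rows = 1348
zero_count = 635      # zeros reported for 거리
distinct_ids = 1348   # distinct values reported for 노선_ID

zero_share = zero_count / n_rows     # fraction of rows where 거리 is 0
is_unique = distinct_ids == n_rows   # every 노선_ID occurs exactly once

print(f"zeros: {zero_share:.1%}, unique: {is_unique}")
```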

Reproduction

Analysis started: 2024-05-04 02:53:18.970453
Analysis finished: 2024-05-04 02:53:21.903849
Duration: 2.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
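
The reported duration follows directly from the two timestamps. A short sketch with the standard library, using the timestamps exactly as printed above:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-05-04 02:53:18.970453")
finished = datetime.fromisoformat("2024-05-04 02:53:21.903849")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```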

Variables

노선_ID
Real number (ℝ)

Alerts: High correlation, Unique

Distinct: 1348
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6112944 × 10^8
Minimum: 1.0000002 × 10^8
Maximum: 2.4146102 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 1.0000002 × 10^8
5-th percentile: 1.0010009 × 10^8
Q1: 1.0010058 × 10^8
Median: 1.2190001 × 10^8
Q3: 2.2725008 × 10^8
95-th percentile: 2.4100588 × 10^8
Maximum: 2.4146102 × 10^8
Range: 1.41461 × 10^8
Interquartile range (IQR): 1.271495 × 10^8

Descriptive statistics

Standard deviation: 59364451
Coefficient of variation (CV): 0.3684271
Kurtosis: -1.836749
Mean: 1.6112944 × 10^8
Median Absolute Deviation (MAD): 21799970
Skewness: 0.19164509
Sum: 2.1720248 × 10^11
Variance: 3.5241381 × 10^15
Monotonicity: Strictly decreasing
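
Several of the figures for 노선_ID are derived from one another, which gives a quick consistency check. The sketch below recomputes them from the rounded values printed in this report, so the results agree only to about six or seven significant digits; the variable names are illustrative.

```python
# Rounded values as printed for 노선_ID.
n = 1348
mean = 1.6112944e8
std = 59364451
q1, q3 = 1.0010058e8, 2.2725008e8
minimum, maximum = 1.0000002e8, 2.4146102e8

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values

print(f"CV={cv:.7f} variance={variance:.7e} IQR={iqr:.6e}")
print(f"range={value_range:.5e} sum={total:.7e}")
```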
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
241461015 | 1 | 0.1%
107900017 | 1 | 0.1%
107900009 | 1 | 0.1%
107900010 | 1 | 0.1%
107900011 | 1 | 0.1%
107900012 | 1 | 0.1%
107900013 | 1 | 0.1%
107900014 | 1 | 0.1%
107900015 | 1 | 0.1%
107900016 | 1 | 0.1%
Other values (1338) | 1338 | 99.3%

Minimum 10 values
Value | Count | Frequency (%)
100000016 | 1 | 0.1%
100000017 | 1 | 0.1%
100000018 | 1 | 0.1%
100000020 | 1 | 0.1%
100100001 | 1 | 0.1%
100100006 | 1 | 0.1%
100100007 | 1 | 0.1%
100100008 | 1 | 0.1%
100100009 | 1 | 0.1%
100100010 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
241461015 | 1 | 0.1%
241461005 | 1 | 0.1%
241461002 | 1 | 0.1%
241457013 | 1 | 0.1%
241449011 | 1 | 0.1%
241449007 | 1 | 0.1%
241411001 | 1 | 0.1%
241409010 | 1 | 0.1%
241409009 | 1 | 0.1%
241409006 | 1 | 0.1%

노선_명칭
Text

Distinct: 1344
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

Length

Max length: 12
Median length: 11
Mean length: 5.0719585
Min length: 2

Characters and Unicode

Total characters: 6837
Distinct characters: 120
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
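
As an illustration of these character properties, the sketch below counts Unicode general categories for one route name from this dataset using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Pd` is Dash Punctuation, `Ps` is Open Punctuation and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the route names that appears in this dataset.
sample = "양주15-1(구파발)"
counts = category_counts(sample)
print(dict(counts))
```

Scripts and blocks are not exposed by `unicodedata`, so reproducing those tables would need an additional data source.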

Unique

Unique: 1340
Unique (%): 99.4%
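
Distinct counts every different value once, while Unique counts only the values that occur exactly once. The sketch below shows the difference on a small hypothetical list (not the dataset itself).

```python
from collections import Counter

# Hypothetical toy data: two names repeat, two occur once.
names = ["01A", "01A", "8112", "8112", "성북06", "성북07"]
freq = Counter(names)

distinct = len(freq)                                # different values
unique = sum(1 for c in freq.values() if c == 1)    # values seen exactly once
print(distinct, unique)
```

Applied to the figures in this report, 1344 distinct minus 1340 unique leaves 4 repeated values; if each occurs twice, 1340 + 4 × 2 = 1348 rows, which matches the number of observations.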

Sample

1st row: 김포16A
2nd row: 김포16-1
3rd row: 김포16
4th row: 양주15-1(구파발)
5th row: 양주15-1구파발

Value | Count | Frequency (%)
01b | 2 | 0.1%
01a | 2 | 0.1%
m5443수원 | 2 | 0.1%
8112 | 2 | 0.1%
성북14-1 | 1 | 0.1%
성북06 | 1 | 0.1%
성북07 | 1 | 0.1%
성북10-1 | 1 | 0.1%
성북15 | 1 | 0.1%
성북14-2 | 1 | 0.1%
Other values (1334) | 1334 | 99.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 1024 | 15.0%
1 | 883 | 12.9%
2 | 423 | 6.2%
3 | 411 | 6.0%
5 | 348 | 5.1%
6 | 346 | 5.1%
7 | 325 | 4.8%
4 | 265 | 3.9%
8 | 179 | 2.6%
(missing) | 172 | 2.5%
Other values (110) | 2461 | 36.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 4357 | 63.7%
Other Letter | 2065 | 30.2%
Uppercase Letter | 207 | 3.0%
Dash Punctuation | 120 | 1.8%
Open Punctuation | 44 | 0.6%
Close Punctuation | 44 | 0.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(missing) | 172 | 8.3%
(missing) | 171 | 8.3%
(missing) | 125 | 6.1%
(missing) | 107 | 5.2%
(missing) | 91 | 4.4%
(missing) | 87 | 4.2%
(missing) | 82 | 4.0%
(missing) | 79 | 3.8%
(missing) | 60 | 2.9%
(missing) | 52 | 2.5%
Other values (85) | 1039 | 50.3%

Uppercase Letter
Value | Count | Frequency (%)
M | 46 | 22.2%
N | 32 | 15.5%
G | 29 | 14.0%
A | 26 | 12.6%
P | 26 | 12.6%
B | 21 | 10.1%
O | 6 | 2.9%
R | 6 | 2.9%
T | 6 | 2.9%
U | 6 | 2.9%
Other values (2) | 3 | 1.4%

Decimal Number
Value | Count | Frequency (%)
0 | 1024 | 23.5%
1 | 883 | 20.3%
2 | 423 | 9.7%
3 | 411 | 9.4%
5 | 348 | 8.0%
6 | 346 | 7.9%
7 | 325 | 7.5%
4 | 265 | 6.1%
8 | 179 | 4.1%
9 | 153 | 3.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 120 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 44 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 44 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4565 | 66.8%
Hangul | 2065 | 30.2%
Latin | 207 | 3.0%

Most frequent character per script

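Hangul
Value | Count | Frequency (%)
(missing) | 172 | 8.3%
(missing) | 171 | 8.3%
(missing) | 125 | 6.1%
(missing) | 107 | 5.2%
(missing) | 91 | 4.4%
(missing) | 87 | 4.2%
(missing) | 82 | 4.0%
(missing) | 79 | 3.8%
(missing) | 60 | 2.9%
(missing) | 52 | 2.5%
Other values (85) | 1039 | 50.3%

Common
Value | Count | Frequency (%)
0 | 1024 | 22.4%
1 | 883 | 19.3%
2 | 423 | 9.3%
3 | 411 | 9.0%
5 | 348 | 7.6%
6 | 346 | 7.6%
7 | 325 | 7.1%
4 | 265 | 5.8%
8 | 179 | 3.9%
9 | 153 | 3.4%
Other values (3) | 208 | 4.6%

Latin
Value | Count | Frequency (%)
M | 46 | 22.2%
N | 32 | 15.5%
G | 29 | 14.0%
A | 26 | 12.6%
P | 26 | 12.6%
B | 21 | 10.1%
O | 6 | 2.9%
R | 6 | 2.9%
T | 6 | 2.9%
U | 6 | 2.9%
Other values (2) | 3 | 1.4%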

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4772 | 69.8%
Hangul | 2065 | 30.2%

Most frequent character per block

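ASCII
Value | Count | Frequency (%)
0 | 1024 | 21.5%
1 | 883 | 18.5%
2 | 423 | 8.9%
3 | 411 | 8.6%
5 | 348 | 7.3%
6 | 346 | 7.3%
7 | 325 | 6.8%
4 | 265 | 5.6%
8 | 179 | 3.8%
9 | 153 | 3.2%
Other values (15) | 415 | 8.7%

Hangul
Value | Count | Frequency (%)
(missing) | 172 | 8.3%
(missing) | 171 | 8.3%
(missing) | 125 | 6.1%
(missing) | 107 | 5.2%
(missing) | 91 | 4.4%
(missing) | 87 | 4.2%
(missing) | 82 | 4.0%
(missing) | 79 | 3.8%
(missing) | 60 | 2.9%
(missing) | 52 | 2.5%
Other values (85) | 1039 | 50.3%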

노선_유형
Categorical

Alerts: High correlation

Distinct: 10
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 10.7 KiB

Value | Count
경기 | 605
마을 | 254
지선 | 241
간선 | 151
공항 | 38
Other values (5) | 59

Length

Max length: 6
Median length: 2
Mean length: 2.0178042
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기
2nd row: 경기
3rd row: 경기
4th row: 경기
5th row: 경기

Common Values

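Value | Count | Frequency (%)
경기 (Gyeonggi) | 605 | 44.9%
마을 (village) | 254 | 18.8%
지선 (branch) | 241 | 17.9%
간선 (trunk) | 151 | 11.2%
공항 (airport) | 38 | 2.8%
인천 (Incheon) | 30 | 2.2%
광역 (wide-area) | 11 | 0.8%
광역(서울) (wide-area, Seoul) | 6 | 0.4%
순환 (circular) | 6 | 0.4%
관광 (tour) | 6 | 0.4%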
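
The frequency column is each count divided by the number of observations, and the "Other values (5)" row in the summary is the sum of the five smallest categories. A minimal sketch that recomputes both from the counts listed above:

```python
# Counts per 노선_유형 category, as listed under Common Values.
counts = {
    "경기": 605, "마을": 254, "지선": 241, "간선": 151, "공항": 38,
    "인천": 30, "광역": 11, "광역(서울)": 6, "순환": 6, "관광": 6,
}

total = sum(counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}

# Everything outside the five most frequent categories.
other = sum(sorted(counts.values(), reverse=True)[5:])

print(total, percentages["경기"], other)
```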

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value | Count | Frequency (%)
경기 | 605 | 44.9%
마을 | 254 | 18.8%
지선 | 241 | 17.9%
간선 | 151 | 11.2%
공항 | 38 | 2.8%
인천 | 30 | 2.2%
광역 | 11 | 0.8%
광역(서울) | 6 | 0.4%
순환 | 6 | 0.4%
관광 | 6 | 0.4%

거리
Real number (ℝ)

Alerts: High correlation, Zeros

Distinct: 456
Distinct (%): 33.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.522661
Minimum: 0
Maximum: 220
Zeros: 635
Zeros (%): 47.1%
Negative: 0
Negative (%): 0.0%
Memory size: 12.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4.15
Q3: 25.8
95-th percentile: 63.68
Maximum: 220
Range: 220
Interquartile range (IQR): 25.8

Descriptive statistics

Standard deviation: 30.912761
Coefficient of variation (CV): 1.764159
Kurtosis: 12.775783
Mean: 17.522661
Median Absolute Deviation (MAD): 4.15
Skewness: 3.180299
Sum: 23620.547
Variance: 955.59877
Monotonicity: Not monotonic
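
Because 635 of the 1348 values of 거리 are zero, the overall mean is pulled well below the typical non-zero value. The sketch below separates the two using only the sum and counts printed in this report. Whether a zero means "distance not recorded" is an assumption; the report does not say, and it does not state the unit of 거리 either.

```python
n_rows = 1348
n_zeros = 635
total_distance = 23620.547  # "Sum" reported for 거리

mean_all = total_distance / n_rows                  # matches the reported mean
mean_nonzero = total_distance / (n_rows - n_zeros)  # mean over non-zero rows only

print(f"all rows: {mean_all:.2f}, non-zero rows: {mean_nonzero:.2f}")
```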
[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
0.0 | 635 | 47.1%
7.0 | 11 | 0.8%
13.0 | 10 | 0.7%
12.0 | 8 | 0.6%
5.5 | 6 | 0.4%
39.0 | 6 | 0.4%
7.8 | 5 | 0.4%
5.9 | 5 | 0.4%
7.2 | 5 | 0.4%
20.0 | 5 | 0.4%
Other values (446) | 652 | 48.4%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 635 | 47.1%
1.2 | 1 | 0.1%
1.6 | 1 | 0.1%
1.8 | 1 | 0.1%
2.0 | 1 | 0.1%
2.1 | 2 | 0.1%
2.6 | 4 | 0.3%
2.8 | 2 | 0.1%
2.9 | 2 | 0.1%
3.0 | 2 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
220.0 | 1 | 0.1%
207.0 | 1 | 0.1%
204.4 | 1 | 0.1%
196.0 | 1 | 0.1%
193.0 | 1 | 0.1%
192.0 | 1 | 0.1%
190.0 | 1 | 0.1%
188.0 | 1 | 0.1%
187.6 | 1 | 0.1%
184.0 | 2 | 0.1%

Interactions

[Figures: four pairwise interaction plots for the numeric variables 노선_ID and 거리]

Correlations

[Figure: correlation heatmap]

Variable | 노선_ID | 노선_유형 | 거리
노선_ID | 1.000 | 0.849 | 0.497
노선_유형 | 0.849 | 1.000 | 0.877
거리 | 0.497 | 0.877 | 1.000

[Figure: correlation heatmap]

Variable | 노선_ID | 거리 | 노선_유형
노선_ID | 1.000 | -0.861 | 0.662
거리 | -0.861 | 1.000 | 0.457
노선_유형 | 0.662 | 0.457 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

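First rows
Index | 노선_ID | 노선_명칭 | 노선_유형 | 거리
0 | 241461015 | 김포16A | 경기 | 0.0
1 | 241461005 | 김포16-1 | 경기 | 0.0
2 | 241461002 | 김포16 | 경기 | 0.0
3 | 241457013 | 양주15-1(구파발) | 경기 | 0.0
4 | 241449011 | 양주15-1구파발 | 경기 | 0.0
5 | 241449007 | 양주15-1막차 | 경기 | 0.0
6 | 241411001 | 하남01 | 경기 | 0.0
7 | 241409010 | 하남감북-01 | 경기 | 0.0
8 | 241409009 | 하남위례-01 | 경기 | 0.0
9 | 241409006 | 하남08 | 경기 | 0.0

Last rows
Index | 노선_ID | 노선_명칭 | 노선_유형 | 거리
1338 | 100100010 | 105 | 간선 | 38.77
1339 | 100100009 | 104 | 간선 | 30.5
1340 | 100100008 | 103 | 간선 | 30.42
1341 | 100100007 | 102 | 간선 | 30.2
1342 | 100100006 | 101 | 간선 | 37.81
1343 | 100100001 | 01A | 순환 | 16.0
1344 | 100000020 | 청와대A01(자율주행) | 순환 | 2.6
1345 | 100000018 | TOUR12 | 관광 | 29.5
1346 | 100000017 | TOUR11 | 관광 | 25.0
1347 | 100000016 | N876 | 간선 | 36.0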
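
The sample rows are consistent with the "Strictly decreasing" monotonicity reported for 노선_ID. The sketch below checks this on the twenty IDs shown above; it covers only these rows, not the full dataset, and the helper name `strictly_decreasing` is illustrative.

```python
# 노선_ID values from the first ten and last ten sample rows.
first_ids = [241461015, 241461005, 241461002, 241457013, 241449011,
             241449007, 241411001, 241409010, 241409009, 241409006]
last_ids = [100100010, 100100009, 100100008, 100100007, 100100006,
            100100001, 100000020, 100000018, 100000017, 100000016]


def strictly_decreasing(values):
    """Return True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


sample_is_decreasing = strictly_decreasing(first_ids + last_ids)
print(sample_is_decreasing)
```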