Overview

Dataset statistics

Number of variables: 7
Number of observations: 819
Missing cells: 27
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 48.1 KiB
Average record size in memory: 60.2 B
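
The missing-cell percentage follows from the counts above: 27 missing cells out of 819 × 7 cells in total. A minimal check, with variable names chosen here for illustration only:

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 819
missing_cells = 27

total_cells = n_variables * n_observations        # every row/column combination
missing_pct = 100 * missing_cells / total_cells   # share of cells that are missing

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

All 27 missing cells belong to a single column, 노드(ID), which is why that column's own missing rate (3.3%) is much higher than the dataset-wide figure.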

Variable types

Numeric: 3
Text: 2
Categorical: 2

Dataset

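Description: Provides node milepost information for each expressway route. Columns: 노드(ID) (node ID), 노드명 (node name), 노선번호 (route number), 도로명 (road name), 도로등급구분코드 (road grade classification code), 상세코드명 (detail code name), 도로이정 (road milepost).
URL: https://www.data.go.kr/data/15064247/fileData.do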

Alerts

상세코드명 is highly overall correlated with 도로등급구분코드 (High correlation)
도로등급구분코드 is highly overall correlated with 상세코드명 (High correlation)
도로등급구분코드 is highly imbalanced (76.0%) (Imbalance)
상세코드명 is highly imbalanced (76.0%) (Imbalance)
노드(ID) has 27 (3.3%) missing values (Missing)
도로이정 has 46 (5.6%) zeros (Zeros)
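
The report does not state how the 76.0% imbalance figure is computed. The sketch below assumes the score is one minus the normalized Shannon entropy of the class shares, and applies it to the category counts listed for 도로등급구분코드 in its Common Values table. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares.

    Assumed formula; k is the number of classes. 0 means perfectly balanced,
    values near 1 mean one class dominates.
    """
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    return 1 - entropy / math.log2(len(counts))

# Category counts of 도로등급구분코드 (101, 109, 108, 10) from the report.
counts = [751, 54, 13, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

Under this assumption the counts give a score that rounds to the 76.0% shown in the alert. 상세코드명 has the same counts, so it gets the same score.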

Reproduction

Analysis started: 2023-12-12 06:47:00.315747
Analysis finished: 2023-12-12 06:47:02.104313
Duration: 1.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
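
The duration is the difference between the two timestamps above. A small check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2023-12-12 06:47:00.315747")
finished = datetime.fromisoformat("2023-12-12 06:47:02.104313")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```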

Variables

노드(ID)
Real number (ℝ)

MISSING 

Distinct: 687
Distinct (%): 86.7%
Missing: 27
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 351.04545
Minimum: 1
Maximum: 1012
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 33.55
Q1: 170
Median: 344.5
Q3: 520.25
95-th percentile: 663.45
Maximum: 1012
Range: 1011
Interquartile range (IQR): 350.25

Descriptive statistics

Standard deviation: 214.23703
Coefficient of variation (CV): 0.61028288
Kurtosis: -0.14091405
Mean: 351.04545
Median Absolute Deviation (MAD): 174.5
Skewness: 0.39350367
Sum: 278028
Variance: 45897.505
Monotonicity: Not monotonic
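
Several of the figures above are derived from one another. The sketch below re-derives the coefficient of variation, the interquartile range, the range, and the standard deviation from the reported values for 노드(ID). The variable names are illustrative.

```python
import math

# Values copied from the 노드(ID) statistics above.
mean, std, variance = 351.04545, 214.23703, 45897.505
q1, q3 = 170, 520.25
minimum, maximum = 1, 1012

cv = std / mean                      # coefficient of variation
iqr = q3 - q1                        # interquartile range
value_range = maximum - minimum      # range
std_from_var = math.sqrt(variance)   # standard deviation recovered from the variance

print(round(cv, 4), iqr, value_range, round(std_from_var, 2))
```

Each derived value agrees with the corresponding reported figure up to the rounding used in the report.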
Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
431 | 3 | 0.4%
303 | 3 | 0.4%
147 | 3 | 0.4%
405 | 3 | 0.4%
13 | 3 | 0.4%
596 | 3 | 0.4%
170 | 3 | 0.4%
369 | 3 | 0.4%
1008 | 2 | 0.2%
370 | 2 | 0.2%
Other values (677) | 764 | 93.3%
(Missing) | 27 | 3.3%

Smallest values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 2 | 0.2%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 2 | 0.2%

Largest values

Value | Count | Frequency (%)
1012 | 2 | 0.2%
1011 | 1 | 0.1%
1010 | 1 | 0.1%
1009 | 1 | 0.1%
1008 | 2 | 0.2%
1007 | 1 | 0.1%
1006 | 1 | 0.1%
1005 | 2 | 0.2%
1004 | 1 | 0.1%
1003 | 1 | 0.1%

노드명
Text

Distinct: 702
Distinct (%): 85.7%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Length

Max length: 15
Median length: 4
Mean length: 4.5982906
Min length: 4

Characters and Unicode

Total characters: 3766
Distinct characters: 247
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
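
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for the five sample node names shown in this report. It is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# Sample node names taken from the sample rows of this variable.
names = ["경부선시점", "구서IC", "영락IC", "부산TG", "노포IC"]

# Count Unicode general categories, e.g. "Lo" (Other Letter) for Hangul
# syllables and "Lu" (Uppercase Letter) for the Latin capitals.
categories = Counter(unicodedata.category(ch) for name in names for ch in name)
print(categories)
```

Hangul syllables fall under Other Letter and the IC/TG suffixes under Uppercase Letter. These are the two dominant categories in the full column.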

Unique

Unique: 594
Unique (%): 72.5%

Sample

1st row: 경부선시점
2nd row: 구서IC
3rd row: 영락IC
4th row: 부산TG
5th row: 노포IC

Value | Count | Frequency (%)
하이패스ic | 12 | 1.4%
서평택jc | 3 | 0.4%
서오산jc | 3 | 0.4%
하이패스 | 3 | 0.4%
금호jc | 3 | 0.4%
대동jc | 3 | 0.4%
ic | 3 | 0.4%
논산jc | 3 | 0.4%
울산jct | 3 | 0.4%
낙동jc | 3 | 0.4%
Other values (695) | 798 | 95.3%

Most occurring characters

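Value | Count | Frequency (%)
C | 731 | 19.4%
I | 518 | 13.8%
J | 213 | 5.7%
[not recovered] | 92 | 2.4%
[not recovered] | 91 | 2.4%
[not recovered] | 78 | 2.1%
[not recovered] | 75 | 2.0%
[not recovered] | 74 | 2.0%
[not recovered] | 74 | 2.0%
T | 54 | 1.4%
Other values (237) | 1766 | 46.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2164 | 57.5%
Uppercase Letter | 1562 | 41.5%
Space Separator | 18 | 0.5%
Open Punctuation | 6 | 0.2%
Close Punctuation | 6 | 0.2%
Decimal Number | 5 | 0.1%
Lowercase Letter | 5 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not recovered] | 92 | 4.3%
[not recovered] | 91 | 4.2%
[not recovered] | 78 | 3.6%
[not recovered] | 75 | 3.5%
[not recovered] | 74 | 3.4%
[not recovered] | 74 | 3.4%
[not recovered] | 53 | 2.4%
[not recovered] | 45 | 2.1%
[not recovered] | 39 | 1.8%
[not recovered] | 38 | 1.8%
Other values (222) | 1505 | 69.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 731 | 46.8%
I | 518 | 33.2%
J | 213 | 13.6%
T | 54 | 3.5%
G | 45 | 2.9%
H | 1 | 0.1%

Lowercase Letter
Value | Count | Frequency (%)
l | 2 | 40.0%
u | 1 | 20.0%
n | 1 | 20.0%
i | 1 | 20.0%

Decimal Number
Value | Count | Frequency (%)
1 | 3 | 60.0%
8 | 2 | 40.0%

Space Separator
Value | Count | Frequency (%)
(space) | 18 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2164 | 57.5%
Latin | 1567 | 41.6%
Common | 35 | 0.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not recovered] | 92 | 4.3%
[not recovered] | 91 | 4.2%
[not recovered] | 78 | 3.6%
[not recovered] | 75 | 3.5%
[not recovered] | 74 | 3.4%
[not recovered] | 74 | 3.4%
[not recovered] | 53 | 2.4%
[not recovered] | 45 | 2.1%
[not recovered] | 39 | 1.8%
[not recovered] | 38 | 1.8%
Other values (222) | 1505 | 69.5%

Latin
Value | Count | Frequency (%)
C | 731 | 46.6%
I | 518 | 33.1%
J | 213 | 13.6%
T | 54 | 3.4%
G | 45 | 2.9%
l | 2 | 0.1%
H | 1 | 0.1%
u | 1 | 0.1%
n | 1 | 0.1%
i | 1 | 0.1%

Common
Value | Count | Frequency (%)
(space) | 18 | 51.4%
( | 6 | 17.1%
) | 6 | 17.1%
1 | 3 | 8.6%
8 | 2 | 5.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2164 | 57.5%
ASCII | 1602 | 42.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
C | 731 | 45.6%
I | 518 | 32.3%
J | 213 | 13.3%
T | 54 | 3.4%
G | 45 | 2.8%
(space) | 18 | 1.1%
( | 6 | 0.4%
) | 6 | 0.4%
1 | 3 | 0.2%
8 | 2 | 0.1%
Other values (5) | 6 | 0.4%

Hangul
Value | Count | Frequency (%)
[not recovered] | 92 | 4.3%
[not recovered] | 91 | 4.2%
[not recovered] | 78 | 3.6%
[not recovered] | 75 | 3.5%
[not recovered] | 74 | 3.4%
[not recovered] | 74 | 3.4%
[not recovered] | 53 | 2.4%
[not recovered] | 45 | 2.1%
[not recovered] | 39 | 1.8%
[not recovered] | 38 | 1.8%
Other values (222) | 1505 | 69.5%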

노선번호
Real number (ℝ)

Distinct: 49
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207.1917
Minimum: 1
Maximum: 4002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 17
Median: 50
Q3: 130
95-th percentile: 655.9
Maximum: 4002
Range: 4001
Interquartile range (IQR): 113

Descriptive statistics

Standard deviation: 593.11905
Coefficient of variation (CV): 2.8626584
Kurtosis: 30.22653
Mean: 207.1917
Median Absolute Deviation (MAD): 40
Skewness: 5.4200683
Sum: 169690
Variance: 351790.21
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=49)

Most frequent values

Value | Count | Frequency (%)
1 | 68 | 8.3%
10 | 52 | 6.3%
35 | 45 | 5.5%
15 | 44 | 5.4%
25 | 44 | 5.4%
100 | 39 | 4.8%
50 | 37 | 4.5%
171 | 35 | 4.3%
55 | 35 | 4.3%
45 | 32 | 3.9%
Other values (39) | 388 | 47.4%

Smallest values

Value | Count | Frequency (%)
1 | 68 | 8.3%
10 | 52 | 6.3%
12 | 27 | 3.3%
15 | 44 | 5.4%
16 | 5 | 0.6%
17 | 16 | 2.0%
25 | 44 | 5.4%
27 | 14 | 1.7%
29 | 10 | 1.2%
30 | 19 | 2.3%

Largest values

Value | Count | Frequency (%)
4002 | 3 | 0.4%
4001 | 6 | 0.7%
4000 | 4 | 0.5%
3300 | 7 | 0.9%
1102 | 7 | 0.9%
700 | 14 | 1.7%
651 | 10 | 1.2%
600 | 11 | 1.3%
552 | 9 | 1.1%
551 | 5 | 0.6%

도로명
Text

Distinct: 55
Distinct (%): 6.7%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Length

Max length: 10
Median length: 8
Mean length: 5.1880342
Min length: 3

Characters and Unicode

Total characters: 4249
Distinct characters: 78
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
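
The mean length reported for this variable is consistent with its total character count divided by the number of observations. A short check, assuming the dataset-wide 819 observations apply because the column has no missing values:

```python
# Figures copied from this variable's statistics and the dataset overview.
total_characters = 4249
n_observations = 819

mean_length = total_characters / n_observations
print(f"{mean_length:.7f}")
```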

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경부선
2nd row: 경부선
3rd row: 경부선
4th row: 경부선
5th row: 경부선

Value | Count | Frequency (%)
경부선 | 68 | 8.3%
서해안선 | 44 | 5.4%
호남선 | 44 | 5.4%
수도권제1순환선 | 39 | 4.8%
남해선(순천-부산 | 38 | 4.6%
영동선 | 37 | 4.5%
중앙선 | 35 | 4.3%
용인서울선 | 33 | 4.0%
중부내륙선 | 32 | 3.9%
중부선 | 23 | 2.8%
Other values (45) | 426 | 52.0%

Most occurring characters

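Value | Count | Frequency (%)
[not recovered] | 845 | 19.9%
[not recovered] | 221 | 5.2%
[not recovered] | 140 | 3.3%
[not recovered] | 138 | 3.2%
[not recovered] | 134 | 3.2%
[not recovered] | 126 | 3.0%
[not recovered] | 115 | 2.7%
[not recovered] | 115 | 2.7%
[not recovered] | 105 | 2.5%
[not recovered] | 103 | 2.4%
Other values (68) | 2207 | 51.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3941 | 92.8%
Decimal Number | 89 | 2.1%
Open Punctuation | 73 | 1.7%
Dash Punctuation | 73 | 1.7%
Close Punctuation | 73 | 1.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not recovered] | 845 | 21.4%
[not recovered] | 221 | 5.6%
[not recovered] | 140 | 3.6%
[not recovered] | 138 | 3.5%
[not recovered] | 134 | 3.4%
[not recovered] | 126 | 3.2%
[not recovered] | 115 | 2.9%
[not recovered] | 115 | 2.9%
[not recovered] | 105 | 2.7%
[not recovered] | 103 | 2.6%
Other values (62) | 1899 | 48.2%

Decimal Number
Value | Count | Frequency (%)
1 | 45 | 50.6%
2 | 32 | 36.0%
3 | 12 | 13.5%

Open Punctuation
Value | Count | Frequency (%)
( | 73 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 73 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 73 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3941 | 92.8%
Common | 308 | 7.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not recovered] | 845 | 21.4%
[not recovered] | 221 | 5.6%
[not recovered] | 140 | 3.6%
[not recovered] | 138 | 3.5%
[not recovered] | 134 | 3.4%
[not recovered] | 126 | 3.2%
[not recovered] | 115 | 2.9%
[not recovered] | 115 | 2.9%
[not recovered] | 105 | 2.7%
[not recovered] | 103 | 2.6%
Other values (62) | 1899 | 48.2%

Common
Value | Count | Frequency (%)
( | 73 | 23.7%
- | 73 | 23.7%
) | 73 | 23.7%
1 | 45 | 14.6%
2 | 32 | 10.4%
3 | 12 | 3.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3941 | 92.8%
ASCII | 308 | 7.2%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[not recovered] | 845 | 21.4%
[not recovered] | 221 | 5.6%
[not recovered] | 140 | 3.6%
[not recovered] | 138 | 3.5%
[not recovered] | 134 | 3.4%
[not recovered] | 126 | 3.2%
[not recovered] | 115 | 2.9%
[not recovered] | 115 | 2.9%
[not recovered] | 105 | 2.7%
[not recovered] | 103 | 2.6%
Other values (62) | 1899 | 48.2%

ASCII
Value | Count | Frequency (%)
( | 73 | 23.7%
- | 73 | 23.7%
) | 73 | 23.7%
1 | 45 | 14.6%
2 | 32 | 10.4%
3 | 12 | 3.9%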

도로등급구분코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Length

Max length: 3
Median length: 3
Mean length: 2.998779
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 101
2nd row: 101
3rd row: 101
4th row: 101
5th row: 101

Common Values

Value | Count | Frequency (%)
101 | 751 | 91.7%
109 | 54 | 6.6%
108 | 13 | 1.6%
10 | 1 | 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Value | Count | Frequency (%)
101 | 751 | 91.7%
109 | 54 | 6.6%
108 | 13 | 1.6%
10 | 1 | 0.1%

상세코드명
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB

Length

Max length: 10
Median length: 4
Mean length: 4.4932845
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 고속국도
2nd row: 고속국도
3rd row: 고속국도
4th row: 고속국도
5th row: 고속국도

Common Values

Value | Count | Frequency (%)
고속국도 | 751 | 91.7%
민자고속국도(일반) | 54 | 6.6%
민자고속국도(위탁) | 13 | 1.6%
(null) | 1 | 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Value | Count | Frequency (%)
고속국도 | 751 | 91.7%
민자고속국도(일반 | 54 | 6.6%
민자고속국도(위탁 | 13 | 1.6%
null | 1 | 0.1%

도로이정
Real number (ℝ)

ZEROS 

Distinct: 736
Distinct (%): 89.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.46105
Minimum: 0
Maximum: 423
Zeros: 46
Zeros (%): 5.6%
Negative: 0
Negative (%): 0.0%
Memory size: 7.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 14.71
Median: 50.86
Q3: 141.355
95-th percentile: 335.989
Maximum: 423
Range: 423
Interquartile range (IQR): 126.645

Descriptive statistics

Standard deviation: 108.79418
Coefficient of variation (CV): 1.1278561
Kurtosis: 0.63744724
Mean: 96.46105
Median Absolute Deviation (MAD): 44.17
Skewness: 1.3049488
Sum: 79001.6
Variance: 11836.174
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Most frequent values

Value | Count | Frequency (%)
0.0 | 46 | 5.6%
16.5 | 3 | 0.4%
15.3 | 2 | 0.2%
9.36 | 2 | 0.2%
21.82 | 2 | 0.2%
13.0 | 2 | 0.2%
19.3 | 2 | 0.2%
6.26 | 2 | 0.2%
3.0 | 2 | 0.2%
3.7 | 2 | 0.2%
Other values (726) | 754 | 92.1%

Smallest values

Value | Count | Frequency (%)
0.0 | 46 | 5.6%
0.02 | 1 | 0.1%
0.05 | 1 | 0.1%
0.2 | 2 | 0.2%
0.26 | 1 | 0.1%
0.36 | 1 | 0.1%
0.38 | 1 | 0.1%
0.42 | 1 | 0.1%
0.51 | 1 | 0.1%
0.56 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
423.0 | 1 | 0.1%
421.7 | 1 | 0.1%
420.32 | 1 | 0.1%
418.54 | 1 | 0.1%
416.05 | 1 | 0.1%
409.0 | 1 | 0.1%
408.43 | 1 | 0.1%
408.04 | 1 | 0.1%
406.94 | 1 | 0.1%
403.3 | 1 | 0.1%

Interactions

[Figures: 9 interaction plots]

Correlations

[Figure: correlation heatmap]

(variable) | 노드(ID) | 노선번호 | 도로명 | 도로등급구분코드 | 상세코드명 | 도로이정
노드(ID) | 1.000 | 0.299 | 0.903 | 0.663 | 0.663 | 0.445
노선번호 | 0.299 | 1.000 | 1.000 | 0.168 | 0.168 | 0.278
도로명 | 0.903 | 1.000 | 1.000 | 0.955 | 0.955 | 0.714
도로등급구분코드 | 0.663 | 0.168 | 0.955 | 1.000 | 1.000 | 0.160
상세코드명 | 0.663 | 0.168 | 0.955 | 1.000 | 1.000 | 0.160
도로이정 | 0.445 | 0.278 | 0.714 | 0.160 | 0.160 | 1.000

[Figure: correlation heatmap]

(variable) | 상세코드명 | 도로등급구분코드
상세코드명 | 1.000 | 1.000
도로등급구분코드 | 1.000 | 1.000
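
The report does not name the measure behind this matrix. One association measure that yields exactly 1.000 for two categorical columns in a one-to-one relationship is Cramér's V. The sketch below computes the plain, uncorrected version on a contingency table. The pairing of codes to names is an assumption inferred from the matching counts (751, 54, 13, 1), and `cramers_v` is a hypothetical helper, not the profiler's function.

```python
import math

def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

# Assumed one-to-one pairing, inferred from the matching counts:
# 101 <-> 고속국도, 109 <-> 민자고속국도(일반), 108 <-> 민자고속국도(위탁), 10 <-> (null)
table = [
    [751, 0, 0, 0],
    [0, 54, 0, 0],
    [0, 0, 13, 0],
    [0, 0, 0, 1],
]
v = cramers_v(table)
print(round(v, 3))
```

When every code maps to exactly one name, the table is diagonal and the statistic reaches its maximum. This matches the two High correlation alerts at the top of the report.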

[Figure: correlation heatmap]

(variable) | 노드(ID) | 노선번호 | 도로이정 | 도로등급구분코드 | 상세코드명
노드(ID) | 1.000 | 0.217 | -0.244 | 0.346 | 0.346
노선번호 | 0.217 | 1.000 | -0.474 | 0.138 | 0.138
도로이정 | -0.244 | -0.474 | 1.000 | 0.095 | 0.095
도로등급구분코드 | 0.346 | 0.138 | 0.095 | 1.000 | 1.000
상세코드명 | 0.346 | 0.138 | 0.095 | 1.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

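First rows

(index) | 노드(ID) | 노드명 | 노선번호 | 도로명 | 도로등급구분코드 | 상세코드명 | 도로이정
0 | 491 | 경부선시점 | 1 | 경부선 | 101 | 고속국도 | 0.0
1 | 4 | 구서IC | 1 | 경부선 | 101 | 고속국도 | 0.2
2 | 446 | 영락IC | 1 | 경부선 | 101 | 고속국도 | 2.02
3 | 486 | 부산TG | 1 | 경부선 | 101 | 고속국도 | 4.01
4 | 447 | 노포IC | 1 | 경부선 | 101 | 고속국도 | 5.08
5 | 669 | 노포JC | 1 | 경부선 | 101 | 고속국도 | 5.58
6 | 155 | 양산JC | 1 | 경부선 | 101 | 고속국도 | 12.86
7 | 453 | 양산IC | 1 | 경부선 | 101 | 고속국도 | 17.94
8 | 652 | 통도사 하이패스IC | 1 | 경부선 | 101 | 고속국도 | 31.0
9 | 160 | 통도사IC | 1 | 경부선 | 101 | 고속국도 | 32.05

Last rows

(index) | 노드(ID) | 노드명 | 노선번호 | 도로명 | 도로등급구분코드 | 상세코드명 | 도로이정
809 | <NA> | 서변IC | 700 | 대구외곽순환선 | 101 | 고속국도 | 26.52
810 | <NA> | 연경TG | 700 | 대구외곽순환선 | 101 | 고속국도 | 27.72
811 | <NA> | 파군재IC | 700 | 대구외곽순환선 | 101 | 고속국도 | 30.52
812 | <NA> | 둔산IC | 700 | 대구외곽순환선 | 101 | 고속국도 | 36.75
813 | <NA> | 상매JCT | 700 | 대구외곽순환선 | 101 | 고속국도 | 38.46
814 | <NA> | 율암TG | 700 | 대구외곽순환선 | 101 | 고속국도 | 38.71
815 | <NA> | 남광산IC | 500 | 광주외곽순환선 | 101 | 고속국도 | 0.0
816 | <NA> | 남광산TG | 500 | 광주외곽순환선 | 101 | 고속국도 | 1.0
817 | <NA> | 남장성IC | 500 | 광주외곽순환선 | 101 | 고속국도 | 6.1
818 | <NA> | 남장성JCT | 500 | 광주외곽순환선 | 101 | 고속국도 | 9.7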