Overview

Dataset statistics

Number of variables: 8
Number of observations: 5348
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 50
Duplicate rows (%): 0.9%
Total size in memory: 344.8 KiB
Average record size in memory: 66.0 B
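The percentage and per-record figures follow from the raw counts in this table. A minimal sketch of that arithmetic, using only numbers from the table; the variable names are illustrative, and 1 KiB is assumed to be 1024 bytes:

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 5348
n_duplicate_rows = 50
total_size_kib = 344.8  # assumption: 1 KiB = 1024 bytes

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average in-memory size of one record, in bytes.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values reported above (0.9% and 66.0 B).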

Variable types

Categorical: 3
Text: 3
Numeric: 2

Dataset

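Description: Information on the road-management CCTV cameras installed in expressway tunnels operated by the Korea Expressway Corporation (한국도로공사). Columns: 본부 (regional headquarters), 지사 (branch office), 노선번호 (route number), 노선명 (route name), 터널명 (tunnel name), 이정 (milepost), 방향 (direction), 위치 (location).
Author: Korea Expressway Corporation (한국도로공사)
URL: https://www.data.go.kr/data/15102136/fileData.do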

Alerts

Dataset has 50 (0.9%) duplicate rows [Duplicates]
노선번호 is highly overall correlated with 노선명 [High correlation]
이정 is highly overall correlated with 노선명 [High correlation]
본부 is highly overall correlated with 노선명 [High correlation]
노선명 is highly overall correlated with 노선번호 and 2 other fields [High correlation]
위치 is highly imbalanced (56.4%) [Imbalance]
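One formula that reproduces the 56.4% imbalance figure for 위치 is one minus the Shannon entropy of the class distribution divided by its maximum possible value. The sketch below is offered under that assumption and is not a statement of how the profiling tool is implemented; `imbalance_score` is a hypothetical helper name, and the counts come from the 위치 section of this report.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    # Maximum entropy is reached when all classes are equally frequent.
    return 1 - entropy / log2(len(counts))

# Counts for 위치 taken from the report: 내부 = 4867, 외부 = 481.
score = imbalance_score([4867, 481])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by one value approaches 1.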

Reproduction

Analysis started: 2023-12-12 04:07:47.436613
Analysis finished: 2023-12-12 04:07:49.093504
Duration: 1.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

본부
Categorical

Alert: High correlation

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Value  Count
부산경남  1460
강원  1116
광주전남  832
대구경북  709
충북  480
Other values (3)  751

Length

Max length: 4
Median length: 4
Mean length: 3.2378459
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 수도권
2nd row: 수도권
3rd row: 수도권
4th row: 수도권
5th row: 수도권

Common Values

Value  Count  Frequency (%)
부산경남  1460  27.3%
강원  1116  20.9%
광주전남  832  15.6%
대구경북  709  13.3%
충북  480  9.0%
전북  298  5.6%
수도권  288  5.4%
대전충남  165  3.1%
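The mean length reported for 본부 can be recovered from the Common Values table, because all eight distinct values are listed there. A minimal sketch, assuming length is counted in Unicode code points (which is what Python's `len` does for these precomposed Hangul strings):

```python
# (value, count) pairs copied from the Common Values table for 본부.
common_values = [
    ("부산경남", 1460), ("강원", 1116), ("광주전남", 832), ("대구경북", 709),
    ("충북", 480), ("전북", 298), ("수도권", 288), ("대전충남", 165),
]

# Total number of rows and total number of characters across all rows.
n_rows = sum(count for _, count in common_values)
total_chars = sum(len(value) * count for value, count in common_values)

mean_length = total_chars / n_rows
print(n_rows, round(mean_length, 7))
```

The row total matches the 5348 observations of the dataset, and the mean agrees with the reported 3.2378459.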

Length

Figure: Histogram of lengths of the category

지사
Text

Distinct: 53
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.1286462
Min length: 2

Characters and Unicode

Total characters: 11384
Distinct characters: 56
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
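As an illustration of the kind of character property involved, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the profiling tool's own code; `category_counts` is a hypothetical helper, and the two input strings are values that appear elsewhere in this report. Python returns two-letter abbreviations (`Lo` for Other Letter, `Ps` for Open Punctuation, `Pe` for Close Punctuation), and the standard library exposes no script or block lookup, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Illustrative input only: two strings taken from this report.
counts = category_counts(["경기광주", "영암(중분대)"])
print(counts)
```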

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경기광주
2nd row: 경기광주
3rd row: 경기광주
4th row: 경기광주
5th row: 경기광주

Value  Count  Frequency (%)
양양  533  10.0%
서울산  431  8.1%
청송  348  6.5%
양산  330  6.2%
보성  255  4.8%
경주  243  4.5%
울산  234  4.4%
구례  225  4.2%
춘천  212  4.0%
진안  188  3.5%
Other values (43)  2349  43.9%

Most occurring characters

Value  Count  Frequency (%)
(Hangul character)  1514  13.3%
(Hangul character)  1027  9.0%
(Hangul character)  764  6.7%
(Hangul character)  737  6.5%
(Hangul character)  530  4.7%
(Hangul character)  522  4.6%
(Hangul character)  420  3.7%
(Hangul character)  380  3.3%
(Hangul character)  352  3.1%
(Hangul character)  348  3.1%
Other values (46)  4790  42.1%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  11384  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  11384  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  11384  100.0%

All 11384 characters fall in the Other Letter category, the Hangul script and the Hangul block, so the most frequent characters per category, per script and per block are the same as the most occurring characters listed above.

노선번호
Real number (ℝ)

Alert: High correlation

Distinct: 32
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 123.47307
Minimum: 1
Maximum: 651
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.1 KiB

Quantile statistics

Minimum: 1
5th percentile: 12
Q1: 30
Median: 60
Q3: 101
95th percentile: 600
Maximum: 651
Range: 650
Interquartile range (IQR): 71

Descriptive statistics

Standard deviation: 174.00892
Coefficient of variation (CV): 1.4092864
Kurtosis: 3.0777088
Mean: 123.47307
Median Absolute Deviation (MAD): 31
Skewness: 2.1399776
Sum: 660334
Variance: 30279.104
Monotonicity: Not monotonic
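Several of these statistics are simple functions of the others, which gives a quick consistency check on the report. A minimal sketch using only figures from the 노선번호 statistics above (variable names are illustrative):

```python
# Figures copied from the 노선번호 statistics.
n_rows = 5348
total = 660334
std_dev = 174.00892
q1, q3 = 30, 101
minimum, maximum = 1, 651

mean = total / n_rows          # mean = sum / number of observations
cv = std_dev / mean            # coefficient of variation = std / mean
variance = std_dev ** 2        # variance = std squared
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

print(round(mean, 5), round(cv, 4), round(variance, 1), iqr, value_range)
```

Each derived value agrees with the corresponding entry in the report to the precision shown there.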
Figure: Histogram with fixed size bins (bins=32)

Common Values

Value  Count  Frequency (%)
60  710  13.3%
600  486  9.1%
30  458  8.6%
141  431  8.1%
65  379  7.1%
45  321  6.0%
27  312  5.8%
101  255  4.8%
50  212  4.0%
55  202  3.8%
Other values (22)  1582  29.6%

Minimum 10 values

Value  Count  Frequency (%)
1  18  0.3%
10  165  3.1%
12  175  3.3%
15  98  1.8%
20  194  3.6%
25  14  0.3%
27  312  5.8%
29  60  1.1%
30  458  8.6%
35  196  3.7%

Maximum 10 values

Value  Count  Frequency (%)
651  32  0.6%
600  486  9.1%
551  46  0.9%
451  25  0.5%
400  20  0.4%
300  28  0.5%
253  114  2.1%
251  30  0.6%
151  18  0.3%
150  2  < 0.1%

노선명
Categorical

Alert: High correlation

Distinct: 31
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Value  Count
서울양양선  710
부산외곽선  486
함양울산선  431
당진영덕선  381
동해선  379
Other values (26)  2961

Length

Max length: 8
Median length: 5
Mean length: 4.6136874
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 중부선
2nd row: 중부선
3rd row: 중부선
4th row: 중부선
5th row: 중부선

Common Values

Value  Count  Frequency (%)
서울양양선  710  13.3%
부산외곽선  486  9.1%
함양울산선  431  8.1%
당진영덕선  381  7.1%
동해선  379  7.1%
중부내륙선  321  6.0%
순천완주선  312  5.8%
영암순천선  255  4.8%
영동선  212  4.0%
중앙선  202  3.8%
Other values (21)  1659  31.0%

Length

Figure: Histogram of lengths of the category

터널명
Text

Distinct: 543
Distinct (%): 10.2%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Length

Max length: 7
Median length: 3
Mean length: 2.9454001
Min length: 2

Characters and Unicode

Total characters: 15752
Distinct characters: 206
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): 0.2%

Sample

1st row: 중부3
2nd row: 중부3
3rd row: 중부2
4th row: 중부1
5th row: 중부3

Value  Count  Frequency (%)
인제양양  235  4.4%
금정  156  2.9%
신불산  133  2.5%
신어산  88  1.6%
재약산  71  1.3%
양북1  69  1.3%
단장4  50  0.9%
단장2  50  0.9%
문수산  49  0.9%
죽령  48  0.9%
Other values (527)  4399  82.3%

Most occurring characters

Value  Count  Frequency (%)
1  887  5.6%
2  723  4.6%
(Hangul character)  687  4.4%
(Hangul character)  676  4.3%
3  533  3.4%
4  365  2.3%
(Hangul character)  291  1.8%
(Hangul character)  291  1.8%
(Hangul character)  283  1.8%
(Hangul character)  283  1.8%
Other values (196)  10733  68.1%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  12638  80.2%
Decimal Number  2882  18.3%
Space Separator  224  1.4%
Other Punctuation  6  < 0.1%
Close Punctuation  2  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(Hangul character)  687  5.4%
(Hangul character)  676  5.3%
(Hangul character)  291  2.3%
(Hangul character)  291  2.3%
(Hangul character)  283  2.2%
(Hangul character)  283  2.2%
(Hangul character)  264  2.1%
(Hangul character)  261  2.1%
(Hangul character)  252  2.0%
(Hangul character)  252  2.0%
Other values (183)  9098  72.0%

Decimal Number
Value  Count  Frequency (%)
1  887  30.8%
2  723  25.1%
3  533  18.5%
4  365  12.7%
5  171  5.9%
6  98  3.4%
7  48  1.7%
9  37  1.3%
8  14  0.5%
0  6  0.2%

Space Separator
Value  Count  Frequency (%)
(space)  224  100.0%

Other Punctuation
Value  Count  Frequency (%)
,  6  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  2  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  12638  80.2%
Common  3114  19.8%

Most frequent character per script

Hangul
Same rows as the Other Letter table above.

Common
Value  Count  Frequency (%)
1  887  28.5%
2  723  23.2%
3  533  17.1%
4  365  11.7%
(space)  224  7.2%
5  171  5.5%
6  98  3.1%
7  48  1.5%
9  37  1.2%
8  14  0.4%
Other values (3)  14  0.4%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  12638  80.2%
ASCII  3114  19.8%

Most frequent character per block

ASCII
Same rows as the Common table above.

Hangul
Same rows as the Other Letter table above.

이정
Real number (ℝ)

Alert: High correlation

Distinct: 4227
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.536332
Minimum: 0.4
Maximum: 407.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 47.1 KiB

Quantile statistics

Minimum: 0.4
5th percentile: 6.847
Q1: 34.0075
Median: 92.46
Q3: 132.3675
95th percentile: 245.1
Maximum: 407.6
Range: 407.2
Interquartile range (IQR): 98.36

Descriptive statistics

Standard deviation: 76.479746
Coefficient of variation (CV): 0.78411546
Kurtosis: 1.4096874
Mean: 97.536332
Median Absolute Deviation (MAD): 49.635
Skewness: 1.1217387
Sum: 521624.31
Variance: 5849.1515
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)

Common Values

Value  Count  Frequency (%)
127.3  6  0.1%
32.0  6  0.1%
44.5  6  0.1%
406.5  6  0.1%
121.8  5  0.1%
126.4  5  0.1%
2.6  5  0.1%
10.1  5  0.1%
50.7  5  0.1%
114.0  5  0.1%
Other values (4217)  5294  99.0%

Minimum 10 values

Value  Count  Frequency (%)
0.4  1  < 0.1%
0.58  1  < 0.1%
0.7  1  < 0.1%
0.75  1  < 0.1%
0.85  1  < 0.1%
0.94  1  < 0.1%
0.97  1  < 0.1%
1.0  1  < 0.1%
1.01  1  < 0.1%
1.02  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
407.6  1  < 0.1%
406.5  6  0.1%
378.87  1  < 0.1%
378.68  1  < 0.1%
377.3  1  < 0.1%
376.92  1  < 0.1%
376.8  1  < 0.1%
376.75  1  < 0.1%
376.63  1  < 0.1%
376.6  1  < 0.1%

방향
Text

Distinct: 57
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.0093493
Min length: 2

Characters and Unicode

Total characters: 10746
Distinct characters: 70
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 남이
2nd row: 남이
3rd row: 남이
4th row: 남이
5th row: 하남

Value  Count  Frequency (%)
서울  448  8.4%
창원  401  7.5%
순천  347  6.5%
울산  347  6.5%
양양  309  5.8%
기장  244  4.6%
함양  219  4.1%
청주  218  4.1%
부산  211  3.9%
영덕  207  3.9%
Other values (47)  2397  44.8%

Most occurring characters

Value  Count  Frequency (%)
(Hangul character)  1095  10.2%
(Hangul character)  796  7.4%
(Hangul character)  757  7.0%
(Hangul character)  674  6.3%
(Hangul character)  518  4.8%
(Hangul character)  471  4.4%
(Hangul character)  457  4.3%
(Hangul character)  430  4.0%
(Hangul character)  407  3.8%
(Hangul character)  348  3.2%
Other values (60)  4793  44.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  10731  99.9%
Open Punctuation  5  < 0.1%
Close Punctuation  5  < 0.1%
Decimal Number  5  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(Hangul character)  1095  10.2%
(Hangul character)  796  7.4%
(Hangul character)  757  7.1%
(Hangul character)  674  6.3%
(Hangul character)  518  4.8%
(Hangul character)  471  4.4%
(Hangul character)  457  4.3%
(Hangul character)  430  4.0%
(Hangul character)  407  3.8%
(Hangul character)  348  3.2%
Other values (56)  4778  44.5%

Decimal Number
Value  Count  Frequency (%)
1  3  60.0%
2  2  40.0%

Open Punctuation
Value  Count  Frequency (%)
(  5  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  5  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  10731  99.9%
Common  15  0.1%

Most frequent character per script

Hangul
Same rows as the Other Letter table above.

Common
Value  Count  Frequency (%)
(  5  33.3%
)  5  33.3%
1  3  20.0%
2  2  13.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  10731  99.9%
ASCII  15  0.1%

Most frequent character per block

Hangul
Same rows as the Other Letter table above.

ASCII
Same rows as the Common table above.

위치
Categorical

Alert: Imbalance

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 41.9 KiB

Value  Count
내부  4867
외부  481

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 내부
2nd row: 내부
3rd row: 내부
4th row: 내부
5th row: 내부

Common Values

Value  Count  Frequency (%)
내부  4867  91.0%
외부  481  9.0%

Length

Figure: Histogram of lengths of the category

Correlations

Variable  본부  지사  노선번호  노선명  이정  방향  위치
본부  1.000  1.000  0.835  0.971  0.571  0.959  0.063
지사  1.000  1.000  0.973  0.996  0.953  0.983  0.135
노선번호  0.835  0.973  1.000  0.995  0.494  0.990  0.102
노선명  0.971  0.996  0.995  1.000  0.853  0.993  0.108
이정  0.571  0.953  0.494  0.853  1.000  0.810  0.049
방향  0.959  0.983  0.990  0.993  0.810  1.000  0.150
위치  0.063  0.135  0.102  0.108  0.049  0.150  1.000

Variable  본부  위치  노선명
본부  1.000  0.047  0.836
위치  0.047  1.000  0.091
노선명  0.836  0.091  1.000

Variable  노선번호  이정  본부  노선명  위치
노선번호  1.000  -0.298  0.425  0.959  0.076
이정  -0.298  1.000  0.316  0.507  0.037
본부  0.425  0.316  1.000  0.836  0.047
노선명  0.959  0.507  0.836  1.000  0.091
위치  0.076  0.037  0.047  0.091  1.000
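The matrices above do not say which coefficient each one reports. For pairs of categorical columns such as 본부 and 위치, one widely used association measure is Cramér's V. The sketch below is a plain (not bias-corrected) version run on toy data; it is an illustration under that assumption, not the report's own computation, and `cramers_v` is a hypothetical helper name.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Plain Cramér's V between two equal-length sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # marginal counts of the first variable
    col = Counter(ys)              # marginal counts of the second variable

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))

# Toy data (hypothetical): the first pair is perfectly associated, the second independent.
v_dependent = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(v_dependent, v_independent)
```

The value ranges from 0 (no association) to 1 (one column fully determines the other), which is the same scale as the entries in the matrices.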

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index  본부  지사  노선번호  노선명  터널명  이정  방향  위치
0  수도권  경기광주  35  중부선  중부3  349.46  남이  내부
1  수도권  경기광주  35  중부선  중부3  349.58  남이  내부
2  수도권  경기광주  35  중부선  중부2  350.9  남이  내부
3  수도권  경기광주  35  중부선  중부1  355.14  남이  내부
4  수도권  경기광주  35  중부선  중부3  349.2  하남  내부
5  수도권  경기광주  35  중부선  중부3  349.3  하남  내부
6  수도권  경기광주  35  중부선  중부2  350.7  하남  내부
7  수도권  경기광주  35  중부선  중부1  354.88  하남  내부
8  수도권  경기광주  37  제2중부선  하번천  349.3  이천  내부
9  수도권  경기광주  37  제2중부선  하번천  349.5  이천  내부

Last rows

Index  본부  지사  노선번호  노선명  터널명  이정  방향  위치
5338  부산경남  울산  600  부산외곽선  철마4  46.46  창원  내부
5339  부산경남  울산  600  부산외곽선  철마4  46.55  창원  내부
5340  부산경남  울산  600  부산외곽선  철마4  46.63  창원  내부
5341  부산경남  울산  600  부산외곽선  철마4  46.71  창원  내부
5342  부산경남  울산  600  부산외곽선  철마4  46.8  창원  내부
5343  부산경남  울산  600  부산외곽선  철마4  46.88  창원  내부
5344  부산경남  울산  600  부산외곽선  철마4  46.96  창원  내부
5345  부산경남  울산  600  부산외곽선  철마4  47.01  창원  내부
5346  부산경남  울산  600  부산외곽선  철마4  47.08  창원  내부
5347  부산경남  울산  600  부산외곽선  철마4  47.48  창원  외부

Duplicate rows

Most frequently occurring

Index  본부  지사  노선번호  노선명  터널명  이정  방향  위치  # duplicates
45  수도권  수원  1  경부선  판교  406.5  부산  내부  6
0  강원  양양  60  서울양양선  인제양양  133.73  사갱  내부  4
1  광주전남  구례  27  순천완주선  태방  2.5  순천  외부  2
2  광주전남  보성  101  영암순천선  해룡  104.88  영암(중분대)  외부  2
3  부산경남  경주  65  동해선  범서3  56.48  포항  외부  2
4  부산경남  경주  65  동해선  양북1  76.197  포항  내부  2
5  부산경남  경주  65  동해선  양북3  87.98  포항  외부  2
6  부산경남  경주  65  동해선  양북5  88.46  포항  외부  2
7  부산경남  경주  65  동해선  오천3  92.16  포항  외부  2
8  부산경남  경주  65  동해선  오천5  93.66  울산  외부  2
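A table like the one above can be produced by grouping identical rows and keeping the groups that occur more than once. A minimal standard-library sketch on toy rows; `duplicate_groups` is a hypothetical helper, the rows are shortened to three columns for readability, and this is not the profiling tool's own implementation:

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

# Toy rows (hypothetical): each tuple stands for one record.
rows = [
    ("수도권", "판교", 406.5),
    ("수도권", "판교", 406.5),
    ("강원", "인제양양", 133.73),
    ("수도권", "판교", 406.5),
]

groups = duplicate_groups(rows)
print(groups)
```

Rows must be hashable for this to work, which is why each record is represented as a tuple rather than a list.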