Overview

Dataset statistics

Number of variables: 7
Number of observations: 2659
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 3
Duplicate rows (%): 0.1%
Total size in memory: 150.7 KiB
Average record size in memory: 58.0 B
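The missing-cell percentage is taken over all cells, not over rows: 3 / (7 × 2659) = 3 / 18613 ≈ 0.016%, which is shown as "< 0.1%". The average record size is likewise total memory divided by the row count (150.7 KiB × 1024 / 2659 ≈ 58.0 B). A minimal sketch of these overview counts, assuming rows are tuples and a missing cell is encoded as `None` (a hypothetical encoding; the original file format is not given here):

```python
def overview(rows, columns):
    """Overview counts for a table given as a list of row tuples.

    Assumption: a missing cell is represented as None.
    """
    n_obs = len(rows)
    n_vars = len(columns)
    # Missing cells per column, then summed over the whole table.
    missing_per_column = {
        name: sum(1 for row in rows if row[i] is None)
        for i, name in enumerate(columns)
    }
    missing_cells = sum(missing_per_column.values())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing_cells,
        # Percentage of all cells (rows x columns), not of rows.
        "missing_cells_pct": 100 * missing_cells / (n_obs * n_vars),
        "missing_per_column": missing_per_column,
    }


# The report's own figures: 3 missing cells in a 2659 x 7 table.
print(f"{100 * 3 / (2659 * 7):.4f}%")  # 0.0161%
```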

Variable types

Numeric: 2
Categorical: 3
Text: 2

Dataset

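Description: Information on the CCTV cameras that Korea Expressway Corporation (한국도로공사) operates for road management on its mainline expressways: route number (노선번호), route name (노선명), CCTV name (CCTV명), milepost (이정), direction (방향), installation purpose (설치목적), and operating status (운영상태).
Author: Korea Expressway Corporation (한국도로공사)
URL: https://www.data.go.kr/data/15102134/fileData.do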

Alerts

Dataset has 3 (0.1%) duplicate rows [Duplicates]
노선번호 is highly overall correlated with 노선명 [High correlation]
노선명 is highly overall correlated with 노선번호 [High correlation]
설치목적 is highly imbalanced (95.0%) [Imbalance]
운영상태 is highly imbalanced (98.7%) [Imbalance]
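Both imbalance percentages can be reproduced from the category counts reported further down as 1 − H / ln k, where H is the Shannon entropy of the class proportions and k is the number of distinct classes. This is a reconstruction that matches the numbers, not a quote of ydata-profiling's source; a minimal sketch:

```python
import math


def imbalance_score(counts):
    """1 - H / ln(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    counts = [c for c in counts if c > 0]
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two non-empty classes")
    total = sum(counts)
    # Shannon entropy (natural log) of the class proportions.
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1 - entropy / math.log(k)


print(f"설치목적: {imbalance_score([2644, 15]):.1%}")     # 설치목적: 95.0%
print(f"운영상태: {imbalance_score([2654, 4, 1]):.1%}")  # 운영상태: 98.7%
```

Normalising by ln k makes the score comparable across columns with different numbers of categories.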

Reproduction

Analysis started: 2023-12-11 23:24:12.468889
Analysis finished: 2023-12-11 23:24:13.760319
Duration: 1.29 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

노선번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 37
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.49041
Minimum: 1
Maximum: 651
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 15
Median: 45
Q3: 65
95th percentile: 251
Maximum: 651
Range: 650
Interquartile range (IQR): 50

Descriptive statistics

Standard deviation: 109.86602
Coefficient of variation (CV): 1.4749015
Kurtosis: 12.684366
Mean: 74.49041
Median absolute deviation (MAD): 25
Skewness: 3.3669063
Sum: 198070
Variance: 12070.542
Monotonicity: Not monotonic
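Several of these entries are derived from others: CV = 109.86602 / 74.49041 ≈ 1.4749, variance = 109.86602², and IQR = 65 − 15 = 50. A minimal sketch of the whole block for one numeric column, assuming sample (n − 1) variance, linear-interpolation quantiles, MAD as the median of |x − median|, and pandas-style bias-corrected skewness and excess kurtosis (these conventions are assumptions, not confirmed by the report):

```python
import math
import statistics


def describe(values):
    """Quantile and descriptive statistics for one numeric column (needs n >= 4)."""
    x = sorted(values)
    n = len(x)
    mean = statistics.fmean(x)
    std = statistics.stdev(x)  # sample standard deviation, n - 1 denominator
    # 19 cut points at 5 %, 10 %, ..., 95 % with linear interpolation.
    cuts = statistics.quantiles(x, n=20, method="inclusive")
    p5, q1, median, q3, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]
    # Central moments used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "min": x[0], "p5": p5, "q1": q1, "median": median,
        "q3": q3, "p95": p95, "max": x[-1],
        "range": x[-1] - x[0],
        "iqr": q3 - q1,
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(v - median) for v in x),
        # Adjusted Fisher-Pearson skewness (G1) and bias-corrected excess kurtosis (G2).
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
        "sum": sum(x),
    }
```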
[Figure: histogram with fixed-size bins (bins=37)]
Common values
Value | Count | Frequency (%)
1 | 280 | 10.5%
50 | 200 | 7.5%
35 | 199 | 7.5%
15 | 199 | 7.5%
45 | 193 | 7.3%
55 | 179 | 6.7%
30 | 133 | 5.0%
10 | 124 | 4.7%
251 | 120 | 4.5%
100 | 114 | 4.3%
Other values (27) | 918 | 34.5%

Minimum 10 values
Value | Count | Frequency (%)
1 | 280 | 10.5%
10 | 124 | 4.7%
12 | 74 | 2.8%
15 | 199 | 7.5%
16 | 12 | 0.5%
20 | 60 | 2.3%
25 | 46 | 1.7%
27 | 44 | 1.7%
29 | 22 | 0.8%
30 | 133 | 5.0%

Maximum 10 values
Value | Count | Frequency (%)
651 | 29 | 1.1%
600 | 23 | 0.9%
551 | 7 | 0.3%
451 | 25 | 0.9%
400 | 2 | 0.1%
351 | 4 | 0.2%
301 | 9 | 0.3%
300 | 8 | 0.3%
253 | 14 | 0.5%
251 | 120 | 4.5%
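The ten common values plus "Other values (27)" cover every row (1741 + 918 = 2659) and every distinct value (10 + 27 = 37), while the extreme-value tables list the smallest and largest distinct values regardless of frequency. A minimal sketch of how such tables can be built; ties in count keep first-seen order here, which may differ from the report's tie order:

```python
from collections import Counter


def frequency_tables(values, top=10):
    """Common values, an 'Other values (k)' remainder row, and minimum/maximum values.

    Each row is (value, count, percent of all rows rounded to 0.1).
    """
    n = len(values)
    counts = Counter(values)

    def row(value, count):
        return (value, count, round(100 * count / n, 1))

    ranked = counts.most_common()
    common = [row(v, c) for v, c in ranked[:top]]
    rest = ranked[top:]
    other = row(f"Other values ({len(rest)})", sum(c for _, c in rest))
    # Extreme values: smallest / largest distinct values, each with its count.
    minimum = [row(v, counts[v]) for v in sorted(counts)[:top]]
    maximum = [row(v, counts[v]) for v in sorted(counts, reverse=True)[:top]]
    return common, other, minimum, maximum
```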

노선명
Categorical

HIGH CORRELATION 

Distinct: 39
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 20.9 KiB
경부선: 310
서해안선: 226
영동선: 200
중부내륙선: 193
중앙선: 179
Other values (34): 1551

Length

Max length: 8
Median length: 7
Mean length: 4.1259872
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%
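"Distinct" counts how many different route names occur (39), while "Unique" counts names that occur exactly once (1). Lengths are counted in Unicode code points, so a precomposed Hangul syllable counts as one character and 경부선 has length 3. A minimal sketch, assuming the text is stored in precomposed (NFC) form:

```python
import statistics
from collections import Counter


def length_and_unique(values):
    """Length summary (in code points) and uniqueness counts for a categorical column."""
    lengths = [len(v) for v in values]
    counts = Counter(values)
    # "Unique" = number of categories that occur exactly once.
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
        "distinct": len(counts),
        "unique": n_unique,
        "unique_pct": 100 * n_unique / len(values),
    }


print(length_and_unique(["경부선", "경부선", "서해안선"]))
```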

Sample

1st row: 중부선
2nd row: 중부선
3rd row: 중부선
4th row: 중부선
5th row: 중부선

Common Values

Value | Count | Frequency (%)
경부선 | 310 | 11.7%
서해안선 | 226 | 8.5%
영동선 | 200 | 7.5%
중부내륙선 | 193 | 7.3%
중앙선 | 179 | 6.7%
호남선 | 130 | 4.9%
남해선 | 124 | 4.7%
광주대구선 | 121 | 4.6%
중부선 | 116 | 4.4%
동해선 | 91 | 3.4%
Other values (29) | 969 | 36.4%

Length

[Figure: histogram of lengths of the category]

CCTV명
Text

Distinct: 2564
Distinct (%): 96.4%
Missing: 0
Missing (%): 0.0%
Memory size: 20.9 KiB

Length

Max length: 18
Median length: 14
Mean length: 3.8386612
Min length: 2

Characters and Unicode

Total characters: 10207
Distinct characters: 365
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
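The category, script and block counts below are internally consistent: Latin (162) equals the 160 uppercase plus 2 lowercase letters, and Common (955) equals 698 digits + 87 + 87 parentheses + 48 spaces + 16 + 11 + 8 other punctuation marks. ydata-profiling's own lookup tables are not reproduced here; this sketch uses Python's `unicodedata` for general categories and a rough name-prefix / code-point heuristic for scripts and blocks, an assumption that only covers the kinds of characters seen in this column:

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation", "Pd": "Dash Punctuation",
    "Po": "Other Punctuation", "Zs": "Space Separator",
}


def rough_script(ch):
    """Approximate the script from the character name (not a full Scripts.txt lookup)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"  # everything else, e.g. digits, punctuation, spaces


def rough_block(ch):
    """Approximate the block from the code point (only the two blocks seen here)."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables block
        return "Hangul"
    return "Other"


def character_profile(texts):
    """Total characters plus counts per category, script and block."""
    chars = [ch for text in texts for ch in text]
    categories = Counter(
        CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in chars
    )
    scripts = Counter(rough_script(ch) for ch in chars)
    blocks = Counter(rough_block(ch) for ch in chars)
    return len(chars), categories, scripts, blocks
```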

Unique

Unique: 2472
Unique (%): 93.0%

Sample

1st row: 호법분기점
2nd row: 안평
3rd row: 마장분기점
4th row: 서이천
5th row: 이천휴게소

Value | Count | Frequency (%)
죽령졸음쉼터 | 4 | 0.1%
졸음쉼터 | 3 | 0.1%
검단 | 3 | 0.1%
여주분기점 | 3 | 0.1%
용산 | 2 | 0.1%
광천 | 2 | 0.1%
문학 | 2 | 0.1%
강릉분기점 | 2 | 0.1%
광명 | 2 | 0.1%
옥포분기점 | 2 | 0.1%
Other values (2564) | 2663 | 99.1%

Most occurring characters

Value | Count | Frequency (%)
[?] | 1044 | 10.2%
[?] | 314 | 3.1%
2 | 292 | 2.9%
1 | 282 | 2.8%
[?] | 235 | 2.3%
[?] | 221 | 2.2%
[?] | 192 | 1.9%
[?] | 180 | 1.8%
[?] | 175 | 1.7%
[?] | 174 | 1.7%
Other values (355) | 7098 | 69.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 9090 | 89.1%
Decimal Number | 698 | 6.8%
Uppercase Letter | 160 | 1.6%
Open Punctuation | 87 | 0.9%
Close Punctuation | 87 | 0.9%
Space Separator | 48 | 0.5%
Connector Punctuation | 16 | 0.2%
Other Punctuation | 11 | 0.1%
Dash Punctuation | 8 | 0.1%
Lowercase Letter | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 1044 | 11.5%
[?] | 314 | 3.5%
[?] | 235 | 2.6%
[?] | 221 | 2.4%
[?] | 192 | 2.1%
[?] | 180 | 2.0%
[?] | 175 | 1.9%
[?] | 174 | 1.9%
[?] | 161 | 1.8%
[?] | 153 | 1.7%
Other values (325) | 6241 | 68.7%

Uppercase Letter
Value | Count | Frequency (%)
C | 63 | 39.4%
I | 49 | 30.6%
J | 15 | 9.4%
S | 9 | 5.6%
T | 5 | 3.1%
P | 5 | 3.1%
A | 4 | 2.5%
B | 4 | 2.5%
F | 2 | 1.2%
G | 2 | 1.2%
Other values (2) | 2 | 1.2%

Decimal Number
Value | Count | Frequency (%)
2 | 292 | 41.8%
1 | 282 | 40.4%
3 | 68 | 9.7%
4 | 32 | 4.6%
5 | 8 | 1.1%
7 | 7 | 1.0%
6 | 6 | 0.9%
8 | 2 | 0.3%
0 | 1 | 0.1%

Other Punctuation
Value | Count | Frequency (%)
# | 6 | 54.5%
, | 5 | 45.5%

Lowercase Letter
Value | Count | Frequency (%)
c | 1 | 50.0%
t | 1 | 50.0%

Open Punctuation
Value | Count | Frequency (%)
( | 87 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 87 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 48 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 16 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 8 | 100.0%
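Percentages in these per-category tables are shares of that category's own total, not of all 10207 characters: C accounts for 63 of 160 uppercase letters (39.4%), but for 63 of 1117 ASCII characters (5.6%) in the per-block table. A minimal sketch of the grouping, keyed by two-letter general-category codes (mapping them to long names such as "Uppercase Letter" is left out here):

```python
import unicodedata
from collections import Counter, defaultdict


def top_characters_per_category(texts, top=10):
    """Most frequent characters within each Unicode general category.

    Percentages are relative to the category's own total, not to all characters.
    """
    groups = defaultdict(Counter)
    for text in texts:
        for ch in text:
            groups[unicodedata.category(ch)][ch] += 1
    table = {}
    for category, counts in groups.items():
        total = sum(counts.values())
        table[category] = [
            (ch, c, round(100 * c / total, 1)) for ch, c in counts.most_common(top)
        ]
    return table
```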

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 9090 | 89.1%
Common | 955 | 9.4%
Latin | 162 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 1044 | 11.5%
[?] | 314 | 3.5%
[?] | 235 | 2.6%
[?] | 221 | 2.4%
[?] | 192 | 2.1%
[?] | 180 | 2.0%
[?] | 175 | 1.9%
[?] | 174 | 1.9%
[?] | 161 | 1.8%
[?] | 153 | 1.7%
Other values (325) | 6241 | 68.7%

Common
Value | Count | Frequency (%)
2 | 292 | 30.6%
1 | 282 | 29.5%
( | 87 | 9.1%
) | 87 | 9.1%
3 | 68 | 7.1%
(space) | 48 | 5.0%
4 | 32 | 3.4%
_ | 16 | 1.7%
- | 8 | 0.8%
5 | 8 | 0.8%
Other values (6) | 27 | 2.8%

Latin
Value | Count | Frequency (%)
C | 63 | 38.9%
I | 49 | 30.2%
J | 15 | 9.3%
S | 9 | 5.6%
T | 5 | 3.1%
P | 5 | 3.1%
A | 4 | 2.5%
B | 4 | 2.5%
F | 2 | 1.2%
G | 2 | 1.2%
Other values (4) | 4 | 2.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 9090 | 89.1%
ASCII | 1117 | 10.9%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 1044 | 11.5%
[?] | 314 | 3.5%
[?] | 235 | 2.6%
[?] | 221 | 2.4%
[?] | 192 | 2.1%
[?] | 180 | 2.0%
[?] | 175 | 1.9%
[?] | 174 | 1.9%
[?] | 161 | 1.8%
[?] | 153 | 1.7%
Other values (325) | 6241 | 68.7%

ASCII
Value | Count | Frequency (%)
2 | 292 | 26.1%
1 | 282 | 25.2%
( | 87 | 7.8%
) | 87 | 7.8%
3 | 68 | 6.1%
C | 63 | 5.6%
I | 49 | 4.4%
(space) | 48 | 4.3%
4 | 32 | 2.9%
_ | 16 | 1.4%
Other values (20) | 93 | 8.3%

이정
Real number (ℝ)

Distinct: 2280
Distinct (%): 85.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.24659
Minimum: 0
Maximum: 1906
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 23.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 5.2
Q1: 33.251
Median: 88.3
Q3: 185.2
95th percentile: 335.84
Maximum: 1906
Range: 1906
Interquartile range (IQR): 151.949

Descriptive statistics

Standard deviation: 112.26601
Coefficient of variation (CV): 0.91835698
Kurtosis: 23.21761
Mean: 122.24659
Median absolute deviation (MAD): 64.9
Skewness: 2.2575722
Sum: 325053.69
Variance: 12603.657
Monotonicity: Not monotonic
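The derived entries for 이정 follow directly from the others; as a worked check using only the values reported above:

```latex
\begin{aligned}
\mathrm{CV} &= \frac{s}{\bar{x}} = \frac{112.26601}{122.24659} \approx 0.91836 \\
s^2 &= 112.26601^2 \approx 12603.66 \\
\mathrm{IQR} &= Q_3 - Q_1 = 185.2 - 33.251 = 151.949 \\
\text{Range} &= \max - \min = 1906 - 0 = 1906
\end{aligned}
```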
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0.1 | 6 | 0.2%
14.1 | 6 | 0.2%
5.2 | 5 | 0.2%
24.5 | 5 | 0.2%
15.6 | 5 | 0.2%
13.3 | 5 | 0.2%
0.6 | 4 | 0.2%
17.7 | 4 | 0.2%
13.5 | 4 | 0.2%
40.5 | 4 | 0.2%
Other values (2270) | 2611 | 98.2%

Minimum 10 values
Value | Count | Frequency (%)
0.0 | 1 | < 0.1%
0.018 | 1 | < 0.1%
0.02 | 1 | < 0.1%
0.05 | 1 | < 0.1%
0.1 | 6 | 0.2%
0.11 | 1 | < 0.1%
0.184 | 1 | < 0.1%
0.2 | 1 | < 0.1%
0.27 | 2 | 0.1%
0.3 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1906.0 | 1 | < 0.1%
418.6 | 1 | < 0.1%
415.5 | 1 | < 0.1%
413.2 | 1 | < 0.1%
412.5 | 1 | < 0.1%
411.4 | 1 | < 0.1%
410.3 | 1 | < 0.1%
409.4 | 1 | < 0.1%
409.3 | 1 | < 0.1%
409.1 | 1 | < 0.1%
< 0.1%

방향
Text

Distinct: 65
Distinct (%): 2.4%
Missing: 3
Missing (%): 0.1%
Memory size: 20.9 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.0154367
Min length: 2

Characters and Unicode

Total characters: 5353
Distinct characters: 80
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 0.2%

Sample

1st row: 하남
2nd row: 하남
3rd row: 남이
4th row: 남이
5th row: 하남

Value | Count | Frequency (%)
서울 | 367 | 13.8%
부산 | 265 | 10.0%
순천 | 170 | 6.4%
인천 | 132 | 5.0%
춘천 | 119 | 4.5%
창원 | 109 | 4.1%
양평 | 105 | 4.0%
목포 | 96 | 3.6%
강릉 | 87 | 3.3%
대구 | 82 | 3.1%
Other values (54) | 1124 | 42.3%

Most occurring characters

Value | Count | Frequency (%)
[?] | 553 | 10.3%
[?] | 409 | 7.6%
[?] | 402 | 7.5%
[?] | 382 | 7.1%
[?] | 265 | 5.0%
[?] | 171 | 3.2%
[?] | 170 | 3.2%
[?] | 170 | 3.2%
[?] | 152 | 2.8%
[?] | 147 | 2.7%
Other values (70) | 2532 | 47.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 5352 | > 99.9%
Space Separator | 1 | < 0.1%

Most frequent character per category

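Other Letter
Value | Count | Frequency (%)
[?] | 553 | 10.3%
[?] | 409 | 7.6%
[?] | 402 | 7.5%
[?] | 382 | 7.1%
[?] | 265 | 5.0%
[?] | 171 | 3.2%
[?] | 170 | 3.2%
[?] | 170 | 3.2%
[?] | 152 | 2.8%
[?] | 147 | 2.7%
Other values (69) | 2531 | 47.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%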

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 5352 | > 99.9%
Common | 1 | < 0.1%

Most frequent character per script

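Hangul
Value | Count | Frequency (%)
[?] | 553 | 10.3%
[?] | 409 | 7.6%
[?] | 402 | 7.5%
[?] | 382 | 7.1%
[?] | 265 | 5.0%
[?] | 171 | 3.2%
[?] | 170 | 3.2%
[?] | 170 | 3.2%
[?] | 152 | 2.8%
[?] | 147 | 2.7%
Other values (69) | 2531 | 47.3%

Common
Value | Count | Frequency (%)
(space) | 1 | 100.0%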

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 5352 | > 99.9%
ASCII | 1 | < 0.1%

Most frequent character per block

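Hangul
Value | Count | Frequency (%)
[?] | 553 | 10.3%
[?] | 409 | 7.6%
[?] | 402 | 7.5%
[?] | 382 | 7.1%
[?] | 265 | 5.0%
[?] | 171 | 3.2%
[?] | 170 | 3.2%
[?] | 170 | 3.2%
[?] | 152 | 2.8%
[?] | 147 | 2.7%
Other values (69) | 2531 | 47.3%

ASCII
Value | Count | Frequency (%)
(space) | 1 | 100.0%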

설치목적
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.9 KiB
교통관리 (traffic management): 2644
교통관리/졸음쉼터 (traffic management / drowsy-driver rest area): 15

Length

Max length: 9
Median length: 4
Mean length: 4.0282061
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 교통관리
2nd row: 교통관리
3rd row: 교통관리
4th row: 교통관리
5th row: 교통관리

Common Values

Value | Count | Frequency (%)
교통관리 | 2644 | 99.4%
교통관리/졸음쉼터 | 15 | 0.6%

Length

[Figure: histogram of lengths of the category]

운영상태
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.9 KiB
정상 (normal): 2654
<NA>: 4
가동중지 (out of service): 1

Length

Max length: 4
Median length: 2
Mean length: 2.0037608
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 정상
2nd row: 정상
3rd row: 정상
4th row: 정상
5th row: 정상

Common Values

Value | Count | Frequency (%)
정상 | 2654 | 99.8%
<NA> | 4 | 0.2%
가동중지 | 1 | < 0.1%
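In this report the four `<NA>` entries of 운영상태 are counted as an ordinary category: the column shows Missing: 0, and all 3 missing cells in the dataset belong to 방향. Whether `<NA>` is meant as "status unknown" is not stated. If it should be treated as missing, a minimal normalization sketch, assuming a hypothetical sentinel set:

```python
# Hypothetical sentinel set: which strings should count as missing is an assumption.
SENTINELS = {"<NA>", ""}


def normalize_missing(values, sentinels=SENTINELS):
    """Replace sentinel strings (after trimming whitespace) with None."""
    return [None if v is None or v.strip() in sentinels else v for v in values]


status = ["정상"] * 2654 + ["<NA>"] * 4 + ["가동중지"]
print(sum(v is None for v in status))                     # 0
print(sum(v is None for v in normalize_missing(status)))  # 4
```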

Length

[Figure: histogram of lengths of the category]

Interactions

[Figure: interaction plots (4 panels)]

Correlations

Variable | 노선번호 | 노선명 | 이정 | 방향 | 설치목적 | 운영상태
노선번호 | 1.000 | 0.984 | 0.243 | 0.948 | 0.000 | 0.000
노선명 | 0.984 | 1.000 | 0.655 | 0.994 | 0.063 | 0.000
이정 | 0.243 | 0.655 | 1.000 | 0.582 | 0.000 | 0.000
방향 | 0.948 | 0.994 | 0.582 | 1.000 | 0.041 | 0.000
설치목적 | 0.000 | 0.063 | 0.000 | 0.041 | 1.000 | 0.000
운영상태 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 설치목적 | 운영상태 | 노선명
설치목적 | 1.000 | 0.000 | 0.052
운영상태 | 0.000 | 1.000 | 0.000
노선명 | 0.052 | 0.000 | 1.000

Variable | 노선번호 | 이정 | 노선명 | 설치목적 | 운영상태
노선번호 | 1.000 | -0.388 | 0.877 | 0.000 | 0.000
이정 | -0.388 | 1.000 | 0.393 | 0.000 | 0.000
노선명 | 0.877 | 0.393 | 1.000 | 0.052 | 0.000
설치목적 | 0.000 | 0.000 | 0.052 | 1.000 | 0.000
운영상태 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000
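The report does not say which coefficient each matrix uses. As an illustration of one standard association measure for a pair of categorical columns (for example 노선명 against 노선번호 treated as a code), here is a sketch of plain Cramér's V, V = sqrt(χ² / (n · (min(r, c) − 1))). It is not claimed to be the method behind the numbers above, and any bias correction the library may apply is omitted:

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain (not bias-corrected) Cramér's V between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    # Pearson chi-square over every cell of the contingency table.
    chi2 = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # a constant column carries no association information
    return math.sqrt(chi2 / (n * (k - 1)))


numbers = [35, 35, 16, 16, 55]
names = ["중부선", "중부선", "울산선", "울산선", "중앙선"]
print(cramers_v(numbers, names))
```

A one-to-one mapping between codes and names, as in this toy example, gives V = 1 (up to floating-point rounding); independent columns give V = 0.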

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

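First rows
Index | 노선번호 | 노선명 | CCTV명 | 이정 | 방향 | 설치목적 | 운영상태
0 | 35 | 중부선 | 호법분기점 | 323.745 | 하남 | 교통관리 | 정상
1 | 35 | 중부선 | 안평 | 325.3 | 하남 | 교통관리 | 정상
2 | 35 | 중부선 | 마장분기점 | 325.981 | 남이 | 교통관리 | 정상
3 | 35 | 중부선 | 서이천 | 329.825 | 남이 | 교통관리 | 정상
4 | 35 | 중부선 | 이천휴게소 | 331.57 | 하남 | 교통관리 | 정상
5 | 35 | 중부선 | 용면교 | 332.359 | 남이 | 교통관리 | 정상
6 | 35 | 중부선 | 용면 | 333.734 | 하남 | 교통관리 | 정상
7 | 35 | 중부선 | 진우 | 335.004 | 남이 | 교통관리 | 정상
8 | 35 | 중부선 | 곤지암 | 339.916 | 남이 | 교통관리 | 정상
9 | 35 | 중부선 | 늑현육교 | 341.834 | 하남 | 교통관리 | 정상

Last rows
Index | 노선번호 | 노선명 | CCTV명 | 이정 | 방향 | 설치목적 | 운영상태
2649 | 16 | 울산선 | 울산시점 | 1.3 | 울산 | 교통관리 | 정상
2650 | 16 | 울산선 | 장촌 | 2.4 | 울산 | 교통관리 | 정상
2651 | 55 | 중앙선 | 원창4교 | 378.14 | 부산 | 교통관리 | 정상
2652 | 16 | 울산선 | 울산졸음쉼터 | 4.0 | 울산 | 교통관리 | 정상
2653 | 16 | 울산선 | 반연육교 | 5.0 | 울산 | 교통관리 | 정상
2654 | 15 | 서해안선 | 홍원육교 | 287.51 | 목포 | 교통관리 | 정상
2655 | 15 | 서해안선 | 구포2교 | 312.03 | 서울 | 교통관리 | 정상
2656 | 16 | 울산선 | 입암 | 9.5 | 언양 | 교통관리 | 정상
2657 | 40 | 평택제천선 | 청룡교 | 22.0 | 제천 | 교통관리 | 정상
2658 | 16 | 울산선 | 울산종점 | 13.53 | 언양 | 교통관리 | 정상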

Duplicate rows

Most frequently occurring

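Index | 노선번호 | 노선명 | CCTV명 | 이정 | 방향 | 설치목적 | 운영상태 | # duplicates
0 | 55 | 중앙선 | 죽령졸음쉼터 | 243.8 | 부산 | 교통관리/졸음쉼터 | 정상 | 2
1 | 55 | 중앙선 | 죽령졸음쉼터 | 244.1 | 춘천 | 교통관리/졸음쉼터 | 정상 | 2
2 | 60 | 서울양양선 | 남춘천영업소 | 56.27 | 서울 | 교통관리 | 정상 | 2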
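The overview's "Duplicate rows: 3" is consistent with the three distinct rows listed here, each occurring twice (six records in all), and 3 / 2659 ≈ 0.11% matches the reported 0.1%. A minimal sketch of that grouping, assuming each row is a sequence of hashable cell values:

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: c for row, c in counts.items() if c > 1}


rows = [
    (55, "중앙선", "죽령졸음쉼터", 243.8, "부산", "교통관리/졸음쉼터", "정상"),
    (55, "중앙선", "죽령졸음쉼터", 243.8, "부산", "교통관리/졸음쉼터", "정상"),
    (60, "서울양양선", "남춘천영업소", 56.27, "서울", "교통관리", "정상"),
]
print(len(duplicate_groups(rows)))  # 1
```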