Overview

Dataset statistics

Number of variables: 8
Number of observations: 396
Missing cells: 16
Missing cells (%): 0.5%
Duplicate rows: 99
Duplicate rows (%): 25.0%
Total size in memory: 25.3 KiB
Average record size in memory: 65.3 B
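The missing-cell percentage is taken over every cell in the table (observations × variables). Here that is 16 missing cells out of 396 × 8 = 3,168, or about 0.5%. Below is a minimal stdlib sketch of that arithmetic. It assumes rows are dicts and `None` marks a missing cell, and `missing_cell_stats` is a hypothetical helper rather than part of ydata-profiling.

```python
from typing import Any, Mapping, Sequence


def missing_cell_stats(rows: Sequence[Mapping[str, Any]], columns: Sequence[str]) -> tuple[int, float]:
    """Hypothetical helper: count None cells and express them as a % of rows x columns."""
    total_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    pct = 100.0 * missing / total_cells if total_cells else 0.0
    return missing, pct


# The report's own figures: 16 missing cells in a 396 x 8 table.
print(round(100 * 16 / (396 * 8), 1))
```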

Variable types

Categorical: 6
Numeric: 1
Text: 1

Alerts

Dataset has 99 (25.0%) duplicate rows [Duplicates]
지사 is highly correlated overall with 이정(km) and 5 other fields [High correlation]
노선 is highly correlated overall with 본부 and 3 other fields [High correlation]
후방IC is highly correlated overall with 이정(km) and 5 other fields [High correlation]
본부 is highly correlated overall with 이정(km) and 4 other fields [High correlation]
전방IC is highly correlated overall with 이정(km) and 5 other fields [High correlation]
이정(km) is highly correlated overall with 본부 and 3 other fields [High correlation]
방향 is highly correlated overall with 지사 and 2 other fields [High correlation]
우회도로 has 16 (4.0%) missing values [Missing]

Reproduction

Analysis started: 2024-01-09 20:11:44.388788
Analysis finished: 2024-01-09 20:11:45.545784
Duration: 1.16 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

본부 (regional headquarters)
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

광주전남본부: 100
대구경북본부: 84
수도권본부: 84
대전충남본부: 72
부산경남본부: 56

Length

Max length: 6
Median length: 6
Mean length: 5.7878788
Min length: 5
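Every value of a categorical column is repeated, so the length statistics can be recomputed from the frequency table. 수도권본부 has five characters and the other four headquarters names have six, so the mean is (312 × 6 + 84 × 5) / 396 ≈ 5.788. The sketch below reweights each distinct value's length by its count. `length_summary` and `hq_counts` are hypothetical names for illustration. The median uses Python's `statistics.median`, which happens to agree with the report for this column but is an assumption in general.

```python
from statistics import median


def length_summary(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: character-length stats, weighting each distinct value by its count."""
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }


# Counts copied from the 본부 frequency table in this report.
hq_counts = {"광주전남본부": 100, "대구경북본부": 84, "수도권본부": 84, "대전충남본부": 72, "부산경남본부": 56}
print(length_summary(hq_counts))
```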

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산경남본부
2nd row: 부산경남본부
3rd row: 부산경남본부
4th row: 부산경남본부
5th row: 부산경남본부

Common Values

Value | Count | Frequency (%)
광주전남본부 | 100 | 25.3%
대구경북본부 | 84 | 21.2%
수도권본부 | 84 | 21.2%
대전충남본부 | 72 | 18.2%
부산경남본부 | 56 | 14.1%
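Each Frequency (%) entry is the value's count divided by the 396 observations, rounded to one decimal place. The sketch below rebuilds the table with `collections.Counter`. The list of values is reconstructed from the published counts for illustration, so row order within the original column is not preserved. `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values: list[str]) -> list[tuple[str, int, float]]:
    """Hypothetical helper: (value, count, percent) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, round(100 * c / n, 1)) for v, c in counts.most_common()]


# Rebuild the 본부 column from the counts shown above (original row order is unknown).
hq = (["광주전남본부"] * 100 + ["대구경북본부"] * 84 + ["수도권본부"] * 84
      + ["대전충남본부"] * 72 + ["부산경남본부"] * 56)
for value, count, pct in frequency_table(hq):
    print(f"{value} | {count} | {pct}%")
```

`Counter.most_common` keeps tied values in first-seen order, which is why 대구경북본부 precedes 수도권본부 here.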

Length

[Figure: Histogram of lengths of the category]


지사 (branch office)
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

수원지사: 84
보성지사: 64
대구지사: 44
구미지사: 40
울산지사: 36
Other values (9): 128

Length

Max length: 5
Median length: 4
Mean length: 4.020202
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울산지사
2nd row: 서울산지사
3rd row: 양산지사
4th row: 양산지사
5th row: 울산지사

Common Values

Value | Count | Frequency (%)
수원지사 | 84 | 21.2%
보성지사 | 64 | 16.2%
대구지사 | 44 | 11.1%
구미지사 | 40 | 10.1%
울산지사 | 36 | 9.1%
대전지사 | 28 | 7.1%
남원지사 | 28 | 7.1%
천안지사 | 24 | 6.1%
영동지사 | 20 | 5.1%
서울산지사 | 8 | 2.0%
Other values (4) | 20 | 5.1%
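Only the ten most frequent branch offices are listed. The remaining four are folded into a single Other values (4) row whose count is the sum of their counts: 20, bringing the total to 376 + 20 = 396. Below is a sketch of that truncation, assuming a cut-off of ten rows as in this table. `top_n_with_other` is a hypothetical helper, not ydata-profiling's own code.

```python
from collections import Counter


def top_n_with_other(counts: Counter, n: int = 10) -> list[tuple[str, int]]:
    """Hypothetical helper: keep the n most common values, fold the rest into one row."""
    top = counts.most_common(n)
    rest = counts.most_common()[n:]
    if rest:
        top.append((f"Other values ({len(rest)})", sum(c for _, c in rest)))
    return top


print(top_n_with_other(Counter({"a": 5, "b": 3, "c": 1, "d": 1}), n=2))
```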

Length

[Figure: Histogram of lengths of the category]

노선 (expressway route)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

경부선: 292
남해선(영암순천): 64
광주대구선: 32
남해선(순천부산): 8

Length

Max length: 9
Median length: 3
Mean length: 4.2525253
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경부선
2nd row: 경부선
3rd row: 경부선
4th row: 경부선
5th row: 경부선

Common Values

Value | Count | Frequency (%)
경부선 | 292 | 73.7%
남해선(영암순천) | 64 | 16.2%
광주대구선 | 32 | 8.1%
남해선(순천부산) | 8 | 2.0%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
경부선 | 292 | 73.7%
남해선(영암순천 | 64 | 16.2%
광주대구선 | 32 | 8.1%
남해선(순천부산 | 8 | 2.0%

이정(km) (chainage along the route, in km)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 84
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 186.85424
Minimum: 1
Maximum: 415.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 19.4
Q1: 66.1
Median: 140.1
Q3: 321.4
95th percentile: 410.925
Maximum: 415.5
Range: 414.5
Interquartile range (IQR): 255.3
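Range and IQR follow directly from the quantiles: 415.5 - 1 = 414.5 and 321.4 - 66.1 = 255.3. The sketch below computes the same summary with linear interpolation between order statistics, via `statistics.quantiles(..., method="inclusive")`. This is the scheme pandas uses by default for `Series.quantile`. That the report uses this exact interpolation is an assumption. `quantile_summary` is a hypothetical helper.

```python
from statistics import quantiles


def quantile_summary(data: list[float]) -> dict[str, float]:
    """Hypothetical helper: percentiles with linear interpolation (needs at least 2 values)."""
    pct = quantiles(data, n=100, method="inclusive")  # pct[k - 1] is the k-th percentile
    q1, q3 = pct[24], pct[74]
    return {
        "min": min(data),
        "p5": pct[4],
        "q1": q1,
        "median": pct[49],
        "q3": q3,
        "p95": pct[94],
        "max": max(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


print(quantile_summary([1, 2, 3, 4, 5]))
```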

Descriptive statistics

Standard deviation: 138.74626
Coefficient of variation (CV): 0.74253739
Kurtosis: -1.4222656
Mean: 186.85424
Median absolute deviation (MAD): 92.5
Skewness: 0.40255235
Sum: 73994.28
Variance: 19250.525
Monotonicity: Not monotonic
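Several of these figures are derived from one another. CV is the standard deviation divided by the mean (138.74626 / 186.85424 ≈ 0.7425). The variance is the square of the standard deviation (≈ 19250.5). The sketch below computes the same statistics with the sample (n - 1) standard deviation and the bias-corrected skewness and excess-kurtosis estimators used by pandas' `Series.skew()` and `Series.kurt()`. That the report derives its values this way is an assumption. `describe` is a hypothetical helper.

```python
import math
from statistics import mean, median, variance


def describe(data: list[float]) -> dict[str, float]:
    """Hypothetical helper; needs at least 4 values that are not all equal."""
    n = len(data)
    mu = mean(data)
    var = variance(data)  # sample variance, n - 1 denominator
    sd = math.sqrt(var)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5  # adjusted Fisher-Pearson
    g2 = m4 / m2 ** 2 - 3
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # bias-corrected excess kurtosis
    med = median(data)
    return {
        "std": sd,
        "cv": sd / mu,
        "kurtosis": kurt,
        "mean": mu,
        "mad": median(abs(x - med) for x in data),  # median absolute deviation from the median
        "skewness": skew,
        "sum": sum(data),
        "variance": var,
    }


print(describe([1, 2, 3, 4, 5]))
```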
[Figure: Histogram with fixed size bins (bins=50)]
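The histogram uses 50 equal-width bins spanning the observed range, so each bin covers (415.5 - 1) / 50 = 8.29 km. Below is a stdlib sketch of equal-width binning that follows the `numpy.histogram` convention of including the maximum in the last bin. That the report's plot uses exactly this convention is an assumption. `fixed_width_histogram` is a hypothetical helper, and floating-point rounding at bin edges may differ slightly from NumPy.

```python
def fixed_width_histogram(data: list[float], bins: int = 50) -> tuple[list[float], list[int]]:
    """Hypothetical helper: equal-width bins over [min, max]; the maximum falls in the last bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        idx = int((x - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    return edges, counts


print(fixed_width_histogram([0, 1, 2, 3, 4], bins=2))
```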
ValueCountFrequency (%)
140.1 16
 
4.0%
104.9 8
 
2.0%
76.0 8
 
2.0%
12.0 8
 
2.0%
47.6 8
 
2.0%
70.1 8
 
2.0%
399.3 8
 
2.0%
92.8 8
 
2.0%
97.1 8
 
2.0%
103.0 8
 
2.0%
Other values (74) 308
77.8%
ValueCountFrequency (%)
1.0 4
1.0%
12.0 8
2.0%
19.2 4
1.0%
19.4 8
2.0%
23.0 4
1.0%
31.18 4
1.0%
35.07 4
1.0%
41.54 4
1.0%
43.5 4
1.0%
44.2 4
1.0%
ValueCountFrequency (%)
415.5 4
1.0%
413.8 4
1.0%
411.9 4
1.0%
411.3 8
2.0%
410.8 4
1.0%
403.1 4
1.0%
399.3 8
2.0%
398.0 4
1.0%
397.1 4
1.0%
396.5 4
1.0%
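The extreme-value tables list the smallest and largest distinct values together with how often each occurs. Below is a sketch of building both lists from a `Counter`. `extreme_values` is a hypothetical helper.

```python
from collections import Counter


def extreme_values(data: list[float], k: int = 10) -> tuple[list[tuple[float, int]], list[tuple[float, int]]]:
    """Hypothetical helper: the k smallest and k largest distinct values with their counts."""
    counts = Counter(data)
    distinct = sorted(counts)
    smallest = [(v, counts[v]) for v in distinct[:k]]
    largest = [(v, counts[v]) for v in reversed(distinct[-k:])]
    return smallest, largest


print(extreme_values([1.0, 1.0, 2.0, 3.0, 3.0, 3.0], k=2))
```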

우회도로 (detour route)
Text

MISSING 

Distinct: 76
Distinct (%): 20.0%
Missing: 16
Missing (%): 4.0%
Memory size: 3.2 KiB
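For 우회도로 the distinct percentage matches the 380 non-missing values rather than all 396 rows. 76 / 380 = 20.0%, whereas 76 / 396 would be about 19.2%. Below is a sketch of that ratio, assuming `None` marks a missing value. `distinct_stats` is a hypothetical helper.

```python
def distinct_stats(values: list) -> tuple[int, float]:
    """Hypothetical helper: distinct count and percentage, both ignoring None entries."""
    present = [v for v in values if v is not None]
    n_distinct = len(set(present))
    pct = 100.0 * n_distinct / len(present) if present else 0.0
    return n_distinct, pct


print(round(100 * 76 / 380, 1), round(100 * 76 / 396, 1))
```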

Length

Max length: 48
Median length: 35
Mean length: 14.263158
Min length: 4

Characters and Unicode

Total characters: 5420
Distinct characters: 111
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
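Python's standard `unicodedata` module exposes each character's general category but not its script or block, so the sketch below covers only the category breakdown. The mapping from two-letter codes to the long names used in this report is written out by hand. `category_counts` and `CATEGORY_NAMES` are hypothetical names.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping: long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Nd": "Decimal Number", "Zs": "Space Separator",
    "Sm": "Math Symbol", "Lu": "Uppercase Letter", "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Po": "Other Punctuation",
    "No": "Other Number",
}


def category_counts(texts: list[str]) -> Counter:
    """Hypothetical helper: count characters by Unicode general category across all strings."""
    counter = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counter[CATEGORY_NAMES.get(code, code)] += 1
    return counter


print(category_counts(["국도35호선-농로-작업로"]))
```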

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 지방도 1028호선
2nd row: 국도35호선
3rd row: 국도35호선
4th row: 국도35호선-농로-작업로
5th row: 현장길-국도35호선

Words
Value | Count | Frequency (%)
 | 128 | 17.9%
국도2호선(보성 | 24 | 3.4%
17 | 20 | 2.8%
벌교ic | 16 | 2.2%
순천만ic | 16 | 2.2%
국도2호선(순천 | 16 | 2.2%
국지도23호선 | 12 | 1.7%
지방도514호 | 12 | 1.7%
지방도 | 12 | 1.7%
부체도로-국도 | 12 | 1.7%
Other values (90) | 448 | 62.6%
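The word table lowercases values and splits them on whitespace. That is why tokens appear as 벌교ic rather than 벌교IC, and why parenthesised names lose their closing bracket (국도2호선(보성). The sketch below approximates this by lowercasing, splitting, and stripping ASCII punctuation from both ends of each token. Treat the exact rule, including dropping tokens that are pure punctuation, as an assumption rather than ydata-profiling's documented behavior. `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter

STRIP_CHARS = string.punctuation + string.whitespace


def word_counts(texts: list[str]) -> Counter:
    """Hypothetical helper: lowercase, split on whitespace, trim punctuation at both ends."""
    counter = Counter()
    for text in texts:
        for token in text.lower().split():
            word = token.strip(STRIP_CHARS)
            if word:  # skip tokens that were only punctuation, e.g. "->"
                counter[word] += 1
    return counter


print(word_counts(["<NA>", "벌교IC", "국도2호선(보성)"]))
```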

Most occurring characters

Value | Count | Frequency (%)
 | 584 | 10.8%
 | 360 | 6.6%
(space) | 340 | 6.3%
 | 296 | 5.5%
 | 272 | 5.0%
 | 244 | 4.5%
 | 216 | 4.0%
3 | 200 | 3.7%
2 | 168 | 3.1%
4 | 164 | 3.0%
Other values (101) | 2576 | 47.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3096 | 57.1%
Decimal Number | 1056 | 19.5%
Space Separator | 340 | 6.3%
Math Symbol | 300 | 5.5%
Uppercase Letter | 208 | 3.8%
Dash Punctuation | 164 | 3.0%
Open Punctuation | 100 | 1.8%
Close Punctuation | 100 | 1.8%
Other Punctuation | 48 | 0.9%
Other Number | 8 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 584 | 18.9%
 | 360 | 11.6%
 | 296 | 9.6%
 | 272 | 8.8%
 | 216 | 7.0%
 | 160 | 5.2%
 | 116 | 3.7%
 | 68 | 2.2%
 | 60 | 1.9%
 | 60 | 1.9%
Other values (76) | 904 | 29.2%

Decimal Number
Value | Count | Frequency (%)
3 | 200 | 18.9%
2 | 168 | 15.9%
4 | 164 | 15.5%
1 | 124 | 11.7%
5 | 104 | 9.8%
9 | 76 | 7.2%
0 | 72 | 6.8%
7 | 68 | 6.4%
8 | 56 | 5.3%
6 | 24 | 2.3%

Uppercase Letter
Value | Count | Frequency (%)
C | 96 | 46.2%
I | 96 | 46.2%
B | 8 | 3.8%
S | 8 | 3.8%

Other Punctuation
Value | Count | Frequency (%)
# | 16 | 33.3%
; | 16 | 33.3%
& | 16 | 33.3%

Math Symbol
Value | Count | Frequency (%)
 | 244 | 81.3%
> | 56 | 18.7%

Other Number
Value | Count | Frequency (%)
 | 4 | 50.0%
 | 4 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 340 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 164 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 100 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 100 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3096 | 57.1%
Common | 2116 | 39.0%
Latin | 208 | 3.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 584 | 18.9%
 | 360 | 11.6%
 | 296 | 9.6%
 | 272 | 8.8%
 | 216 | 7.0%
 | 160 | 5.2%
 | 116 | 3.7%
 | 68 | 2.2%
 | 60 | 1.9%
 | 60 | 1.9%
Other values (76) | 904 | 29.2%

Common
Value | Count | Frequency (%)
(space) | 340 | 16.1%
 | 244 | 11.5%
3 | 200 | 9.5%
2 | 168 | 7.9%
4 | 164 | 7.8%
- | 164 | 7.8%
1 | 124 | 5.9%
5 | 104 | 4.9%
( | 100 | 4.7%
) | 100 | 4.7%
Other values (11) | 408 | 19.3%

Latin
Value | Count | Frequency (%)
C | 96 | 46.2%
I | 96 | 46.2%
B | 8 | 3.8%
S | 8 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3096 | 57.1%
ASCII | 2072 | 38.2%
Arrows | 244 | 4.5%
Enclosed Alphanum | 8 | 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 584 | 18.9%
 | 360 | 11.6%
 | 296 | 9.6%
 | 272 | 8.8%
 | 216 | 7.0%
 | 160 | 5.2%
 | 116 | 3.7%
 | 68 | 2.2%
 | 60 | 1.9%
 | 60 | 1.9%
Other values (76) | 904 | 29.2%

ASCII
Value | Count | Frequency (%)
(space) | 340 | 16.4%
3 | 200 | 9.7%
2 | 168 | 8.1%
4 | 164 | 7.9%
- | 164 | 7.9%
1 | 124 | 6.0%
5 | 104 | 5.0%
( | 100 | 4.8%
) | 100 | 4.8%
C | 96 | 4.6%
Other values (12) | 512 | 24.7%

Arrows
Value | Count | Frequency (%)
 | 244 | 100.0%

Enclosed Alphanum
Value | Count | Frequency (%)
 | 4 | 50.0%
 | 4 | 50.0%

전방IC (forward interchange)
Categorical

HIGH CORRELATION 

Distinct: 43
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

경주IC: 32
판교IC: 24
순천만IC: 20
<NA>: 20
남청주IC: 20
Other values (38): 280

Length

Max length: 7
Median length: 4
Mean length: 4.3838384
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 구서IC
2nd row: 통도사IC
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
경주IC | 32 | 8.1%
판교IC | 24 | 6.1%
순천만IC | 20 | 5.1%
<NA> | 20 | 5.1%
남청주IC | 20 | 5.1%
오산IC | 16 | 4.0%
순창IC | 16 | 4.0%
벌교IC | 16 | 4.0%
경산IC | 16 | 4.0%
신갈JCT | 16 | 4.0%
Other values (33) | 200 | 50.5%
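전방IC and 후방IC both report Missing 0 while listing <NA> among their most common values. This suggests the placeholder is stored as the literal text "<NA>" rather than as a true missing value. 우회도로, by contrast, counts its 16 gaps as missing. Below is a sketch of normalizing such placeholders before profiling. The set of placeholder strings is an assumption, and `normalize_missing` is a hypothetical helper. In pandas, the equivalent step would be `df.replace("<NA>", pd.NA)` before building the report.

```python
from typing import Optional

PLACEHOLDERS = {"<NA>", ""}  # assumed spellings of "no value"; extend if the source uses others


def normalize_missing(values: list[str]) -> list[Optional[str]]:
    """Hypothetical helper: replace placeholder strings with None so they count as missing."""
    return [None if v.strip() in PLACEHOLDERS else v for v in values]


cleaned = normalize_missing(["경주IC", "<NA>", "판교IC", "<NA>"])
print(sum(v is None for v in cleaned))
```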

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
경주ic | 32 | 8.1%
판교ic | 24 | 6.1%
순천만ic | 20 | 5.1%
na | 20 | 5.1%
남청주ic | 20 | 5.1%
벌교ic | 16 | 4.0%
경산ic | 16 | 4.0%
신갈jct | 16 | 4.0%
순창ic | 16 | 4.0%
오산ic | 16 | 4.0%
Other values (33) | 200 | 50.5%

후방IC (rear interchange)
Categorical

HIGH CORRELATION 

Distinct: 44
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

<NA>: 32
경주IC: 28
판교IC: 24
순천만IC: 20
안성JCT: 20
Other values (39): 272

Length

Max length: 7
Median length: 4
Mean length: 4.4242424
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 구서IC
2nd row: 양산IC
3rd row: <NA>
4th row: <NA>
5th row: 경주IC

Common Values

Value | Count | Frequency (%)
<NA> | 32 | 8.1%
경주IC | 28 | 7.1%
판교IC | 24 | 6.1%
순천만IC | 20 | 5.1%
안성JCT | 20 | 5.1%
벌교IC | 16 | 4.0%
양재IC | 16 | 4.0%
영천IC | 16 | 4.0%
남이JCT | 16 | 4.0%
지리산IC | 12 | 3.0%
Other values (34) | 196 | 49.5%

Length

[Figure: Histogram of lengths of the category]

Words
Value | Count | Frequency (%)
na | 32 | 8.1%
경주ic | 28 | 7.1%
판교ic | 24 | 6.1%
순천만ic | 20 | 5.1%
안성jct | 20 | 5.1%
영천ic | 16 | 4.0%
남이jct | 16 | 4.0%
양재ic | 16 | 4.0%
벌교ic | 16 | 4.0%
지리산ic | 12 | 3.0%
Other values (34) | 196 | 49.5%

방향 (direction: 종점 = toward the end point, 기점 = toward the starting point, 양방향 = both directions)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

종점: 184
기점: 164
양방향: 48

Length

Max length: 3
Median length: 2
Mean length: 2.1212121
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 종점
2nd row: 양방향
3rd row: 기점
4th row: 종점
5th row: 기점

Common Values

Value | Count | Frequency (%)
종점 | 184 | 46.5%
기점 | 164 | 41.4%
양방향 | 48 | 12.1%

Length

[Figure: Histogram of lengths of the category]


Interactions


Correlations

Variable | 본부 | 지사 | 노선 | 이정(km) | 우회도로 | 전방IC | 후방IC | 방향
본부 | 1.000 | 1.000 | 0.643 | 0.974 | 1.000 | 0.994 | 0.991 | 0.387
지사 | 1.000 | 1.000 | 1.000 | 0.913 | 1.000 | 0.995 | 0.997 | 0.756
노선 | 0.643 | 1.000 | 1.000 | 0.610 | 1.000 | 1.000 | 1.000 | 0.486
이정(km) | 0.974 | 0.913 | 0.610 | 1.000 | 0.998 | 0.996 | 0.998 | 0.498
우회도로 | 1.000 | 1.000 | 1.000 | 0.998 | 1.000 | 0.998 | 0.998 | 0.979
전방IC | 0.994 | 0.995 | 1.000 | 0.996 | 0.998 | 1.000 | 0.998 | 0.950
후방IC | 0.991 | 0.997 | 1.000 | 0.998 | 0.998 | 0.998 | 1.000 | 0.938
방향 | 0.387 | 0.756 | 0.486 | 0.498 | 0.979 | 0.950 | 0.938 | 1.000

Variable | 지사 | 노선 | 후방IC | 본부 | 전방IC | 방향
지사 | 1.000 | 0.987 | 0.923 | 0.988 | 0.902 | 0.581
노선 | 0.987 | 1.000 | 0.944 | 0.572 | 0.948 | 0.483
후방IC | 0.923 | 0.944 | 1.000 | 0.895 | 0.908 | 0.757
본부 | 0.988 | 0.572 | 0.895 | 1.000 | 0.917 | 0.317
전방IC | 0.902 | 0.948 | 0.908 | 0.917 | 1.000 | 0.739
방향 | 0.581 | 0.483 | 0.757 | 0.317 | 0.739 | 1.000

Variable | 이정(km) | 본부 | 지사 | 노선 | 전방IC | 후방IC | 방향
이정(km) | 1.000 | 0.767 | 0.686 | 0.410 | 0.916 | 0.932 | 0.342
본부 | 0.767 | 1.000 | 0.988 | 0.572 | 0.917 | 0.895 | 0.317
지사 | 0.686 | 0.988 | 1.000 | 0.987 | 0.902 | 0.923 | 0.581
노선 | 0.410 | 0.572 | 0.987 | 1.000 | 0.948 | 0.944 | 0.483
전방IC | 0.916 | 0.917 | 0.902 | 0.948 | 1.000 | 0.908 | 0.739
후방IC | 0.932 | 0.895 | 0.923 | 0.944 | 0.908 | 1.000 | 0.757
방향 | 0.342 | 0.317 | 0.581 | 0.483 | 0.739 | 0.757 | 1.000
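The report does not label which coefficient each matrix uses. For pairs of categorical columns, one common choice is Cramér's V, which rescales the chi-squared statistic of the contingency table into [0, 1]. The sketch below computes the uncorrected version. ydata-profiling may apply a bias correction, so its values need not match the tables above. When each 지사 belongs to a single 본부, V reaches 1.0, which is consistent with the 1.000 between them in the first matrix. `cramers_v` is a hypothetical helper, and the example pairs are taken from the sample rows of this report.

```python
import math
from collections import Counter


def cramers_v(x: list[str], y: list[str]) -> float:
    """Hypothetical helper: uncorrected Cramér's V for two equal-length categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for a, r in row_totals.items():
        for b, c in col_totals.items():
            expected = r * c / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# 본부/지사 pairs as they appear in the sample rows.
hq = ["부산경남본부", "부산경남본부", "광주전남본부", "광주전남본부"]
branch = ["울산지사", "양산지사", "남원지사", "보성지사"]
print(cramers_v(hq, branch))
```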

Missing values

[Figure: missing-value count per column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.

Sample

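First rows
Row | 본부 | 지사 | 노선 | 이정(km) | 우회도로 | 전방IC | 후방IC | 방향
0 | 부산경남본부 | 서울산지사 | 경부선 | 1.0 | <NA> | 구서IC | 구서IC | 종점
1 | 부산경남본부 | 서울산지사 | 경부선 | 19.2 | 지방도 1028호선 | 통도사IC | 양산IC | 양방향
2 | 부산경남본부 | 양산지사 | 경부선 | 19.4 | 국도35호선 | <NA> | <NA> | 기점
3 | 부산경남본부 | 양산지사 | 경부선 | 19.4 | 국도35호선 | <NA> | <NA> | 종점
4 | 부산경남본부 | 울산지사 | 경부선 | 43.5 | 국도35호선-농로-작업로 | <NA> | 경주IC | 기점
5 | 부산경남본부 | 울산지사 | 경부선 | 44.2 | 현장길-국도35호선 | 경주IC | <NA> | 종점
6 | 부산경남본부 | 울산지사 | 경부선 | 50.4 | 국도35호선-군도28호선 | 경주IC | <NA> | 종점
7 | 부산경남본부 | 울산지사 | 경부선 | 52.1 | 리도203호선-국도35호선 | 경주IC | <NA> | 종점
8 | 부산경남본부 | 울산지사 | 경부선 | 53.2 | 리도203호선-국도35호선 | <NA> | 경주IC | 기점
9 | 부산경남본부 | 울산지사 | 경부선 | 57.2 | 국도35호선-군도31호선-북울산IC | 경주IC | <NA> | 종점

Last rows
Row | 본부 | 지사 | 노선 | 이정(km) | 우회도로 | 전방IC | 후방IC | 방향
386 | 광주전남본부 | 순천지사 | 남해선(순천부산) | 31.18 | <NA> | 진월IC | 진월IC | 종점
387 | 부산경남본부 | 진주지사 | 남해선(순천부산) | 53.6 | 지방도1002호선 접속 | 곤양IC | 축동IC | 기점
388 | 광주전남본부 | 광주지사 | 광주대구선 | 23.0 | 부체도로>국도24호 | 순창IC | 담양IC | 종점
389 | 광주전남본부 | 남원지사 | 광주대구선 | 35.07 | 지방도730 | 순창IC | 남원JCT | 양방향
390 | 광주전남본부 | 남원지사 | 광주대구선 | 41.54 | <NA> | 순창IC | 남원JCT | 양방향
391 | 광주전남본부 | 남원지사 | 광주대구선 | 44.5 | <NA> | 순창IC | 남원JCT | 양방향
392 | 광주전남본부 | 남원지사 | 광주대구선 | 60.24 | 이백내척길&#44;국도24 | 남원IC | 동남원IC | 양방향
393 | 광주전남본부 | 남원지사 | 광주대구선 | 69.14 | 농로&#44;구88선&#44;국도19 | 동남원IC | 지리산IC | 양방향
394 | 광주전남본부 | 남원지사 | 광주대구선 | 74.12 | 지방도743 | 동남원IC | 지리산IC | 양방향
395 | 광주전남본부 | 남원지사 | 광주대구선 | 80.57 | 지방도37 | 지리산IC | 지리산IC | 양방향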
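Rows 392 and 393 store commas as the HTML character reference &#44;. The equal counts of #, ; and & (16 each) in the character statistics for 우회도로 are consistent with this. If those entities are unintended, Python's standard `html.unescape` decodes them before profiling. Below is a minimal sketch in which `decode_entities` is a hypothetical helper.

```python
import html


def decode_entities(values: list) -> list:
    """Hypothetical helper: decode HTML character references such as '&#44;'; None passes through."""
    return [html.unescape(v) if isinstance(v, str) else v for v in values]


print(decode_entities(["이백내척길&#44;국도24", "농로&#44;구88선&#44;국도19", None]))
```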

Duplicate rows

Most frequently occurring

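Row | 본부 | 지사 | 노선 | 이정(km) | 우회도로 | 전방IC | 후방IC | 방향 | # duplicates
0 | 광주전남본부 | 광주지사 | 광주대구선 | 23.0 | 부체도로>국도24호 | 순창IC | 담양IC | 종점 | 4
1 | 광주전남본부 | 남원지사 | 광주대구선 | 35.07 | 지방도730 | 순창IC | 남원JCT | 양방향 | 4
2 | 광주전남본부 | 남원지사 | 광주대구선 | 41.54 | <NA> | 순창IC | 남원JCT | 양방향 | 4
3 | 광주전남본부 | 남원지사 | 광주대구선 | 44.5 | <NA> | 순창IC | 남원JCT | 양방향 | 4
4 | 광주전남본부 | 남원지사 | 광주대구선 | 60.24 | 이백내척길&#44;국도24 | 남원IC | 동남원IC | 양방향 | 4
5 | 광주전남본부 | 남원지사 | 광주대구선 | 69.14 | 농로&#44;구88선&#44;국도19 | 동남원IC | 지리산IC | 양방향 | 4
6 | 광주전남본부 | 남원지사 | 광주대구선 | 74.12 | 지방도743 | 동남원IC | 지리산IC | 양방향 | 4
7 | 광주전남본부 | 남원지사 | 광주대구선 | 80.57 | 지방도37 | 지리산IC | 지리산IC | 양방향 | 4
8 | 광주전남본부 | 보성지사 | 남해선(영암순천) | 12.0 | 학용로->국도2호선(강진)->남성전삼거리->지방도830호선->강진무위사IC | 강진무위사IC | 학산IC | 종점 | 4
9 | 광주전남본부 | 보성지사 | 남해선(영암순천) | 12.0 | 학용로->국도2호선(영암)->서호교차로->에프1경주장로->서영암IC | 서영암IC | 강진무위사IC | 기점 | 4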
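The figures here fit the reading that Duplicate rows counts distinct rows occurring more than once (99, or 25.0% of 396), and that # duplicates is how often each such row occurs. Since 99 × 4 = 396 and every row listed occurs 4 times, the file may consist of each record repeated four times, although only the ten rows shown confirm that directly. Below is a stdlib sketch of both counts under that inferred definition, with rows as tuples. `duplicate_summary` is a hypothetical helper.

```python
from collections import Counter


def duplicate_summary(rows: list[tuple]) -> tuple[int, float, list[tuple[tuple, int]]]:
    """Hypothetical helper: distinct duplicated rows, their share of all rows, per-row counts."""
    counts = Counter(rows)
    dups = [(row, c) for row, c in counts.most_common() if c > 1]
    pct = 100.0 * len(dups) / len(rows) if rows else 0.0
    return len(dups), pct, dups


n_dup, pct, table = duplicate_summary([("a", 1), ("a", 1), ("b", 2), ("b", 2), ("c", 3)])
print(n_dup, pct, table)
```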