Overview

Dataset statistics

Number of variables: 5
Number of observations: 520
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.9 KiB
Average record size in memory: 41.3 B

Variable types

Categorical: 2
Text: 2
Numeric: 1

Dataset

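Description: This dataset lists the installation locations of sewage treatment facilities for each regional headquarters and branch office of the Korea Expressway Corporation (한국도로공사). It includes the treatment capacity of each facility.
URL: https://www.data.go.kr/data/15085960/fileData.do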

Alerts

용량 is highly overall correlated with 구분 (High correlation)
구분 is highly overall correlated with 용량 (High correlation)

Reproduction

Analysis started: 2023-12-12 14:56:28.623391
Analysis finished: 2023-12-12 14:56:29.259755
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

본부 (regional headquarters)
Categorical

Distinct: 8
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

부산경남본부: 94
대구경북본부: 73
강원본부: 72
광주전남본부: 72
충북본부: 65
Other values (3): 144

Length

Max length: 6
Median length: 6
Mean length: 5.1788462
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 수도권본부
2nd row: 수도권본부
3rd row: 강원본부
4th row: 강원본부
5th row: 강원본부

Common Values

Value  Count  Frequency (%)
부산경남본부  94  18.1%
대구경북본부  73  14.0%
강원본부  72  13.8%
광주전남본부  72  13.8%
충북본부  65  12.5%
전북본부  55  10.6%
대전충남본부  46  8.8%
수도권본부  43  8.3%
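
The frequency percentages and the length statistics reported for 본부 can be recomputed from the counts alone. The following is a minimal sketch using only the Python standard library; the counts are copied from the Common Values table, and the variable names are illustrative rather than anything ydata-profiling exposes.

```python
from statistics import mean, median

# Counts copied from the Common Values table for 본부.
counts = {
    "부산경남본부": 94,
    "대구경북본부": 73,
    "강원본부": 72,
    "광주전남본부": 72,
    "충북본부": 65,
    "전북본부": 55,
    "대전충남본부": 46,
    "수도권본부": 43,
}

total = sum(counts.values())  # should match the number of observations

# Relative frequency of each category, rounded to one decimal place.
frequency_pct = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Expand the counts into one string length per row, then summarise.
lengths = [len(name) for name, n in counts.items() for _ in range(n)]
length_stats = {
    "min": min(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 7),
    "max": max(lengths),
}

print(total)
print(frequency_pct)
print(length_stats)
```

The sketch assumes each name is stored as precomposed Hangul syllables, so that `len` counts one code point per syllable; decomposed text would give longer lengths.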

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: bar chart of the 본부 common values]

지사 (branch office)
Text

Distinct: 96
Distinct (%): 18.5%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Length

Max length: 9
Median length: 4
Mean length: 4.4961538
Min length: 4

Characters and Unicode

Total characters: 2338
Distinct characters: 66
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
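
As an illustration of those character properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. The two input strings are taken from the sample rows in this report; the helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables. The two-letter codes map to the names used in the report: `Lo` is Other Letter, `Zs` is Space Separator, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Two values taken from the sample rows shown in this report.
for value in ["경기광주지사", "안성(서울)휴게소"]:
    print(value, dict(category_counts(value)))
```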

Unique

Unique: 14
Unique (%): 2.7%
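
The report distinguishes "Distinct" from "Unique". The sketch below assumes the usual reading: distinct is the number of different values, and unique is the number of values that occur exactly once. It applies that reading to the five 지사 values listed in this variable's sample rows, not to the full column.

```python
from collections import Counter

# The five 지사 values from this report's sample rows.
values = ["수원지사", "경기광주지사", "원주지사", "원주지사", "원주지사"]

counts = Counter(values)
distinct = len(counts)                              # number of different values
unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(distinct, unique)
```

Under this reading a column such as 본부, where every value repeats, has a positive distinct count but a unique count of zero, which matches the figures reported for it.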

Sample

1st row: 수원지사
2nd row: 경기광주지사
3rd row: 원주지사
4th row: 원주지사
5th row: 원주지사

Value  Count  Frequency (%)
보성지사  17  3.3%
구례지사  17  3.3%
엄정지사  16  3.1%
춘천지사  15  2.9%
양산지사  15  2.9%
서울산지사  15  2.9%
영천지사  14  2.7%
진주지사  14  2.7%
진안지사  13  2.5%
울산지사  13  2.5%
Other values (50)  371  71.3%

Most occurring characters

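Value  Count  Frequency (%)
(not preserved)  518  22.2%
(not preserved)  518  22.2%
(space)  196  8.4%
(not preserved)  105  4.5%
(not preserved)  77  3.3%
(not preserved)  55  2.4%
(not preserved)  45  1.9%
(not preserved)  44  1.9%
(not preserved)  43  1.8%
(not preserved)  41  1.8%
Other values (56)  696  29.8%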

Most occurring categories

Value  Count  Frequency (%)
Other Letter  2136  91.4%
Space Separator  196  8.4%
Open Punctuation  3  0.1%
Close Punctuation  3  0.1%

Most frequent character per category

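Other Letter
Value  Count  Frequency (%)
(not preserved)  518  24.3%
(not preserved)  518  24.3%
(not preserved)  105  4.9%
(not preserved)  77  3.6%
(not preserved)  55  2.6%
(not preserved)  45  2.1%
(not preserved)  44  2.1%
(not preserved)  43  2.0%
(not preserved)  41  1.9%
(not preserved)  39  1.8%
Other values (53)  651  30.5%

Space Separator
Value  Count  Frequency (%)
(space)  196  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  3  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  3  100.0%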

Most occurring scripts

Value  Count  Frequency (%)
Hangul  2136  91.4%
Common  202  8.6%

Most frequent character per script

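Hangul
Value  Count  Frequency (%)
(not preserved)  518  24.3%
(not preserved)  518  24.3%
(not preserved)  105  4.9%
(not preserved)  77  3.6%
(not preserved)  55  2.6%
(not preserved)  45  2.1%
(not preserved)  44  2.1%
(not preserved)  43  2.0%
(not preserved)  41  1.9%
(not preserved)  39  1.8%
Other values (53)  651  30.5%

Common
Value  Count  Frequency (%)
(space)  196  97.0%
(  3  1.5%
)  3  1.5%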

Most occurring blocks

Value  Count  Frequency (%)
Hangul  2136  91.4%
ASCII  202  8.6%

Most frequent character per block

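Hangul
Value  Count  Frequency (%)
(not preserved)  518  24.3%
(not preserved)  518  24.3%
(not preserved)  105  4.9%
(not preserved)  77  3.6%
(not preserved)  55  2.6%
(not preserved)  45  2.1%
(not preserved)  44  2.1%
(not preserved)  43  2.0%
(not preserved)  41  1.9%
(not preserved)  39  1.8%
Other values (53)  651  30.5%

ASCII
Value  Count  Frequency (%)
(space)  196  97.0%
(  3  1.5%
)  3  1.5%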

구분 (facility type)
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

영업소: 282
휴게소: 98
터널관리동: 98
지사: 42

Length

Max length: 7
Median length: 3
Mean length: 3.6730769
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 휴게소
2nd row: 휴게소
3rd row: 휴게소
4th row: 휴게소
5th row: 휴게소

Common Values

Value  Count  Frequency (%)
영업소  282  54.2%
휴게소  98  18.8%
터널관리동  98  18.8%
지사  42  8.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: bar chart of the 구분 common values]

설치위치 (installation location)
Text

Distinct: 518
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Length

Max length: 13
Median length: 12
Mean length: 6.2461538
Min length: 4

Characters and Unicode

Total characters: 3248
Distinct characters: 219
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 516
Unique (%): 99.2%

Sample

1st row: 안성(서울)휴게소
2nd row: 이천(하남)휴게소
3rd row: 문막(강릉)휴게소
4th row: 횡성(인천)휴게소
5th row: 횡성(강릉)휴게소

Value  Count  Frequency (%)
진주지사(본관동  2  0.4%
(not preserved)  2  0.4%
(not preserved)  2  0.4%
(not preserved)  2  0.4%
(not preserved)  2  0.4%
서서울영업소  2  0.4%
단장4  1  0.2%
북의성영업소  1  0.2%
문수영업소  1  0.2%
선산영업소  1  0.2%
Other values (521)  521  97.0%

Most occurring characters

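Value  Count  Frequency (%)
(not preserved)  388  11.9%
(not preserved)  306  9.4%
(not preserved)  282  8.7%
(space)  229  7.1%
(  106  3.3%
)  106  3.3%
(not preserved)  98  3.0%
(not preserved)  98  3.0%
(not preserved)  86  2.6%
(not preserved)  72  2.2%
Other values (209)  1477  45.5%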

Most occurring categories

Value  Count  Frequency (%)
Other Letter  2741  84.4%
Space Separator  229  7.1%
Open Punctuation  106  3.3%
Close Punctuation  106  3.3%
Decimal Number  55  1.7%
Other Symbol  6  0.2%
Other Punctuation  5  0.2%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not preserved)  388  14.2%
(not preserved)  306  11.2%
(not preserved)  282  10.3%
(not preserved)  98  3.6%
(not preserved)  98  3.6%
(not preserved)  86  3.1%
(not preserved)  72  2.6%
(not preserved)  62  2.3%
(not preserved)  61  2.2%
(not preserved)  56  2.0%
Other values (199)  1232  44.9%

Decimal Number
Value  Count  Frequency (%)
1  17  30.9%
2  13  23.6%
3  12  21.8%
4  7  12.7%
5  6  10.9%

Space Separator
Value  Count  Frequency (%)
(space)  229  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  106  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  106  100.0%

Other Symbol
Value  Count  Frequency (%)
(not preserved)  6  100.0%

Other Punctuation
Value  Count  Frequency (%)
,  5  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  2747  84.6%
Common  501  15.4%

Most frequent character per script

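Hangul
Value  Count  Frequency (%)
(not preserved)  388  14.1%
(not preserved)  306  11.1%
(not preserved)  282  10.3%
(not preserved)  98  3.6%
(not preserved)  98  3.6%
(not preserved)  86  3.1%
(not preserved)  72  2.6%
(not preserved)  62  2.3%
(not preserved)  61  2.2%
(not preserved)  56  2.0%
Other values (200)  1238  45.1%

Common
Value  Count  Frequency (%)
(space)  229  45.7%
(  106  21.2%
)  106  21.2%
1  17  3.4%
2  13  2.6%
3  12  2.4%
4  7  1.4%
5  6  1.2%
,  5  1.0%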

Most occurring blocks

Value  Count  Frequency (%)
Hangul  2741  84.4%
ASCII  501  15.4%
None  6  0.2%

Most frequent character per block

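Hangul
Value  Count  Frequency (%)
(not preserved)  388  14.2%
(not preserved)  306  11.2%
(not preserved)  282  10.3%
(not preserved)  98  3.6%
(not preserved)  98  3.6%
(not preserved)  86  3.1%
(not preserved)  72  2.6%
(not preserved)  62  2.3%
(not preserved)  61  2.2%
(not preserved)  56  2.0%
Other values (199)  1232  44.9%

ASCII
Value  Count  Frequency (%)
(space)  229  45.7%
(  106  21.2%
)  106  21.2%
1  17  3.4%
2  13  2.6%
3  12  2.4%
4  7  1.4%
5  6  1.2%
,  5  1.0%

None
Value  Count  Frequency (%)
(not preserved)  6  100.0%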

용량 (capacity)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 52
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.857692
Minimum: 1
Maximum: 900
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 8
Median: 10
Q3: 30
95th percentile: 301
Maximum: 900
Range: 899
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 121.39478
Coefficient of variation (CV): 1.9624848
Kurtosis: 10.425751
Mean: 61.857692
Median Absolute Deviation (MAD): 5
Skewness: 2.9131971
Sum: 32166
Variance: 14736.693
Monotonicity: Not monotonic
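
Several of the 용량 figures are related by simple arithmetic, which gives a quick consistency check on the report. The sketch below recomputes the mean, coefficient of variation, variance, interquartile range, and range from the other reported values. The inputs are copied from this report; because the standard deviation is itself rounded, the derived values agree with the reported ones only to within rounding.

```python
# Values copied from the 용량 statistics in this report.
n = 520            # number of observations
total = 32166      # Sum
std = 121.39478    # Standard deviation (already rounded in the report)
q1, q3 = 8, 30     # Q1 and Q3
minimum, maximum = 1, 900

mean = total / n                 # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.6f} cv={cv:.7f} variance={variance:.3f}")
print(f"iqr={iqr} range={value_range}")
```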
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
10  179  34.4%
5  68  13.1%
3  33  6.3%
8  27  5.2%
300  20  3.8%
12  14  2.7%
14  13  2.5%
250  11  2.1%
35  11  2.1%
20  11  2.1%
Other values (42)  133  25.6%

Minimum 10 values

Value  Count  Frequency (%)
1  2  0.4%
2  5  1.0%
3  33  6.3%
4  4  0.8%
5  68  13.1%
6  9  1.7%
7  3  0.6%
8  27  5.2%
9  2  0.4%
10  179  34.4%

Maximum 10 values

Value  Count  Frequency (%)
900  1  0.2%
840  1  0.2%
650  1  0.2%
600  1  0.2%
550  1  0.2%
500  2  0.4%
450  3  0.6%
400  6  1.2%
350  9  1.7%
320  1  0.2%

Interactions


Correlations

      본부   지사   구분   용량
본부  1.000  1.000  0.227  0.118
지사  1.000  1.000  0.831  0.000
구분  0.227  0.831  1.000  0.711
용량  0.118  0.000  0.711  1.000

      본부   구분
본부  1.000  0.103
구분  0.103  1.000

      용량   본부   구분
용량  1.000  0.057  0.544
본부  0.057  1.000  0.103
구분  0.544  0.103  1.000
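
To relate these matrices to the high-correlation alerts, the sketch below scans the first matrix for off-diagonal pairs at or above a cutoff. The cutoff of 0.5 is an assumption chosen for illustration, and `high_pairs` is a hypothetical helper, not part of ydata-profiling. The sketch does not reproduce the report's own alert rules, so it can list more pairs than the Alerts section does.

```python
from itertools import combinations

# First correlation matrix from this report.
names = ["본부", "지사", "구분", "용량"]
matrix = [
    [1.000, 1.000, 0.227, 0.118],
    [1.000, 1.000, 0.831, 0.000],
    [0.227, 0.831, 1.000, 0.711],
    [0.118, 0.000, 0.711, 1.000],
]


def high_pairs(names, matrix, threshold):
    """Return (a, b, value) for each off-diagonal pair at or above threshold."""
    return [
        (names[i], names[j], matrix[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(matrix[i][j]) >= threshold
    ]


THRESHOLD = 0.5  # hypothetical cutoff, not taken from the report
for a, b, value in high_pairs(names, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {value:.3f}")
```

Iterating over `combinations` visits each unordered pair once, so the symmetric entries of the matrix are not reported twice.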

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

   본부  지사  구분  설치위치  용량
0  수도권본부  수원지사  휴게소  안성(서울)휴게소  550
1  수도권본부  경기광주지사  휴게소  이천(하남)휴게소  400
2  강원본부  원주지사  휴게소  문막(강릉)휴게소  300
3  강원본부  원주지사  휴게소  횡성(인천)휴게소  300
4  강원본부  원주지사  휴게소  횡성(강릉)휴게소  500
5  강원본부  대관령지사  휴게소  평창(강릉)휴게소  300
6  강원본부  홍천지사  휴게소  원주(부산)휴게소  300
7  강원본부  홍천지사  휴게소  홍천강(춘천)휴게소  110
8  강원본부  홍천지사  휴게소  춘천(부산)휴게소  150
9  강원본부  춘천지사  휴게소  홍천(양양)휴게소  350

     본부  지사  구분  설치위치  용량
510  부산경남본부  경주지사  터널관리동  양북1  5
511  부산경남본부  경주지사  터널관리동  양북5  3
512  부산경남본부  경주지사  터널관리동  오천5  3
513  부산경남본부  고성지사  터널관리동  고성1,2  3
514  부산경남본부  고성지사  터널관리동  통영2  3
515  부산경남본부  서울산지사  터널관리동  산외3  3
516  부산경남본부  서울산지사  터널관리동  단장1,2  5
517  부산경남본부  서울산지사  터널관리동  단장4  6
518  부산경남본부  서울산지사  터널관리동  재약산  6
519  부산경남본부  서울산지사  터널관리동  신불산  6