Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 8162
Missing cells (%): 16.3%
Duplicate rows: 665
Duplicate rows (%): 6.7%
Total size in memory: 478.5 KiB
Average record size in memory: 49.0 B
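
The percentages and the average record size above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes. The variable names are illustrative and do not come from the report.

```python
# Values copied from the dataset statistics above.
n_variables = 5
n_observations = 10_000
missing_cells = 8_162
duplicate_rows = 665
total_size_kib = 478.5

# Assumption: missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.3f}%")
print(f"duplicate rows: {duplicate_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.4f} B")
```

Rounded to one decimal place, these agree with the reported 16.3%, 6.7% and 49.0 B.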

Variable types

Numeric: 2
Text: 2
Categorical: 1

Dataset

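Description: This dataset is released under the mid- to long-term data release plan for the Gyeongsangnam-do road register computerization system (경상남도 도로대장전산화 시스템). It holds the system's basic construction drawing information (construction-related and drawing information) and includes the construction drawing data of the road register.
Author: Gyeongsangnam-do (경상남도)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15091953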

Alerts

Dataset has 665 (6.7%) duplicate rows [Duplicates]
공사코드 is highly overall correlated with 이미지파일코드 [High correlation]
이미지파일코드 is highly overall correlated with 공사코드 [High correlation]
비고 has 8160 (81.6%) missing values [Missing]

Reproduction

Analysis started: 2023-12-11 00:54:51.544294
Analysis finished: 2023-12-11 00:54:54.600793
Duration: 3.06 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

공사코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 305
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9743074 × 10^10
Minimum: 30001
Maximum: 2.0161042 × 10^10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 30001
5-th percentile: 1.9901034 × 10^10
Q1: 1.9981008 × 10^10
Median: 2.0051021 × 10^10
Q3: 2.0100058 × 10^10
95-th percentile: 2.0121084 × 10^10
Maximum: 2.0161042 × 10^10
Range: 2.0161012 × 10^10
Interquartile range (IQR): 1.190499 × 10^8

Descriptive statistics

Standard deviation: 2.4149395 × 10^9
Coefficient of variation (CV): 0.12231831
Kurtosis: 62.198554
Mean: 1.9743074 × 10^10
Median Absolute Deviation (MAD): 49036997
Skewness: -8.0062368
Sum: 1.9743074 × 10^14
Variance: 5.8319328 × 10^18
Monotonicity: Not monotonic
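
Several of these statistics are derived from others, so they can be cross-checked. A rough sketch, assuming the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared). Small discrepancies are expected because the report displays rounded values, so the comparison uses a loose relative tolerance chosen for this illustration.

```python
import math

# Rounded values as displayed in the report for 공사코드.
minimum, maximum = 30001, 2.0161042e10
q1, q3 = 1.9981008e10, 2.0100058e10
mean, std = 1.9743074e10, 2.4149395e9

# Recompute the derived statistics from the displayed inputs.
derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
}

# Figures printed in the report for the same statistics.
reported = {
    "range": 2.0161012e10,
    "iqr": 1.190499e8,
    "cv": 0.12231831,
    "variance": 5.8319328e18,
}

# The tolerance is an assumption that allows for rounding in the inputs.
checks = {
    name: math.isclose(derived[name], reported[name], rel_tol=1e-5)
    for name in derived
}
for name, ok in checks.items():
    print(name, derived[name], "matches report" if ok else "differs")
```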
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
20100058000 | 2967 | 29.7%
20051001002 | 406 | 4.1%
20130060000 | 306 | 3.1%
20051084001 | 208 | 2.1%
20051001003 | 165 | 1.7%
20121084000 | 146 | 1.5%
20051021003 | 137 | 1.4%
20070058048 | 131 | 1.3%
19961084025 | 130 | 1.3%
20070069001 | 118 | 1.2%
Other values (295) | 5286 | 52.9%

Minimum 10 values

Value | Count | Frequency (%)
30001 | 23 | 0.2%
1001012 | 40 | 0.4%
1005001 | 23 | 0.2%
1006142 | 11 | 0.1%
1021001 | 33 | 0.3%
1080001 | 10 | 0.1%
1998100104 | 9 | 0.1%
19771089001 | 25 | 0.2%
19841018053 | 19 | 0.2%
19860060058 | 5 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20161042000 | 114 | 1.1%
20130060000 | 306 | 3.1%
20121084000 | 146 | 1.5%
20121077000 | 3 | < 0.1%
20121040000 | 1 | < 0.1%
20121029000 | 26 | 0.3%
20121022000 | 22 | 0.2%
20101005000 | 21 | 0.2%
20100058000 | 2967 | 29.7%
20091080000 | 41 | 0.4%

이미지파일명
Text

Distinct: 6840
Distinct (%): 68.4%
Missing: 2
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 85
Median length: 67
Mean length: 13.791058
Min length: 2

Characters and Unicode

Total characters: 137883
Distinct characters: 566
Distinct categories: 15
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
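
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories for one of this variable's sample values using Python's standard `unicodedata` module. It is not necessarily how the report computes its category tables, and it uses the two-letter category codes (for example `Lo` for Other Letter, `Nd` for Decimal Number) rather than the long names shown in the report.

```python
import unicodedata
from collections import Counter

# One of the sample values listed for this variable.
text = "호안 횡단면도(79)"

# unicodedata.category returns the two-letter general category code,
# e.g. "Lo" (Other Letter), "Nd" (Decimal Number), "Zs" (Space Separator),
# "Ps" (Open Punctuation), "Pe" (Close Punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in text)

for code, count in category_counts.most_common():
    print(code, count)
```

Summing such counters over every value in a column would give a per-category total comparable in kind to the "Most occurring categories" table.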

Unique

Unique: 5938
Unique (%): 59.4%

Sample

1st row: 횡단면도(88)
2nd row: 호안 횡단면도(79)
3rd row: 14 BLOCK 평면도
4th row: 터널 전등성비 평면도(3)
5th row: 호안 횡단면도(45)
ValueCountFrequency (%)
715
 
3.5%
설계도 596
 
2.9%
509
 
2.5%
횡단면도 507
 
2.4%
도로 497
 
2.4%
of 261
 
1.3%
상세도 221
 
1.1%
포장공사 218
 
1.1%
확,포장공사 212
 
1.0%
도로확장 191
 
0.9%
Other values (6499) 16788
81.0%

Most occurring characters

ValueCountFrequency (%)
10813
 
7.8%
10273
 
7.5%
( 6775
 
4.9%
) 6761
 
4.9%
1 3496
 
2.5%
3426
 
2.5%
2 2752
 
2.0%
2566
 
1.9%
2201
 
1.6%
2106
 
1.5%
Other values (556) 86714
62.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 81478 | 59.1%
Decimal Number | 13732 | 10.0%
Uppercase Letter | 12486 | 9.1%
Space Separator | 10813 | 7.8%
Open Punctuation | 6779 | 4.9%
Close Punctuation | 6765 | 4.9%
Other Punctuation | 2712 | 2.0%
Dash Punctuation | 1849 | 1.3%
Math Symbol | 923 | 0.7%
Lowercase Letter | 268 | 0.2%
Other values (5) | 78 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
10273
 
12.6%
3426
 
4.2%
2566
 
3.1%
2201
 
2.7%
2106
 
2.6%
2020
 
2.5%
1818
 
2.2%
1732
 
2.1%
1641
 
2.0%
1619
 
2.0%
Other values (466) 52076
63.9%
Uppercase Letter
ValueCountFrequency (%)
E 1541
 
12.3%
T 1084
 
8.7%
O 1024
 
8.2%
S 868
 
7.0%
A 792
 
6.3%
P 650
 
5.2%
C 630
 
5.0%
L 613
 
4.9%
N 591
 
4.7%
R 590
 
4.7%
Other values (17) 4103
32.9%
Lowercase Letter
ValueCountFrequency (%)
x 86
32.1%
m 38
14.2%
w 21
 
7.8%
d 19
 
7.1%
g 19
 
7.1%
e 13
 
4.9%
p 9
 
3.4%
a 8
 
3.0%
y 8
 
3.0%
c 7
 
2.6%
Other values (12) 40
14.9%
Other Punctuation
ValueCountFrequency (%)
. 1298
47.9%
, 952
35.1%
· 137
 
5.1%
* 125
 
4.6%
" 92
 
3.4%
@ 52
 
1.9%
: 25
 
0.9%
/ 11
 
0.4%
' 9
 
0.3%
& 5
 
0.2%
Other values (2) 6
 
0.2%
Decimal Number
ValueCountFrequency (%)
1 3496
25.5%
2 2752
20.0%
0 1519
11.1%
3 1393
 
10.1%
4 1244
 
9.1%
5 1093
 
8.0%
6 655
 
4.8%
7 581
 
4.2%
8 571
 
4.2%
9 428
 
3.1%
Open Punctuation
ValueCountFrequency (%)
( 6775
99.9%
[ 3
 
< 0.1%
{ 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 6761
99.9%
] 3
 
< 0.1%
} 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
~ 607
65.8%
+ 171
 
18.5%
= 145
 
15.7%
Modifier Symbol
ValueCountFrequency (%)
˚ 12
92.3%
` 1
 
7.7%
Letter Number
ValueCountFrequency (%)
4
66.7%
2
33.3%
Other Number
ValueCountFrequency (%)
2
66.7%
1
33.3%
Space Separator
ValueCountFrequency (%)
10813
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1849
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 34
100.0%
Other Symbol
ValueCountFrequency (%)
° 22
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 81478 | 59.1%
Common | 43645 | 31.7%
Latin | 12760 | 9.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
10273
 
12.6%
3426
 
4.2%
2566
 
3.1%
2201
 
2.7%
2106
 
2.6%
2020
 
2.5%
1818
 
2.2%
1732
 
2.1%
1641
 
2.0%
1619
 
2.0%
Other values (466) 52076
63.9%
Latin
ValueCountFrequency (%)
E 1541
 
12.1%
T 1084
 
8.5%
O 1024
 
8.0%
S 868
 
6.8%
A 792
 
6.2%
P 650
 
5.1%
C 630
 
4.9%
L 613
 
4.8%
N 591
 
4.6%
R 590
 
4.6%
Other values (41) 4377
34.3%
Common
ValueCountFrequency (%)
10813
24.8%
( 6775
15.5%
) 6761
15.5%
1 3496
 
8.0%
2 2752
 
6.3%
- 1849
 
4.2%
0 1519
 
3.5%
3 1393
 
3.2%
. 1298
 
3.0%
4 1244
 
2.9%
Other values (29) 5745
13.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 81478 | 59.1%
ASCII | 56221 | 40.8%
None | 166 | 0.1%
Modifier Letters | 12 | < 0.1%
Number Forms | 6 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
10813
19.2%
( 6775
 
12.1%
) 6761
 
12.0%
1 3496
 
6.2%
2 2752
 
4.9%
- 1849
 
3.3%
E 1541
 
2.7%
0 1519
 
2.7%
3 1393
 
2.5%
. 1298
 
2.3%
Other values (71) 18024
32.1%
Hangul
ValueCountFrequency (%)
10273
 
12.6%
3426
 
4.2%
2566
 
3.1%
2201
 
2.7%
2106
 
2.6%
2020
 
2.5%
1818
 
2.2%
1732
 
2.1%
1641
 
2.0%
1619
 
2.0%
Other values (466) 52076
63.9%
None
ValueCountFrequency (%)
· 137
82.5%
° 22
 
13.3%
Ø 3
 
1.8%
2
 
1.2%
1
 
0.6%
1
 
0.6%
Modifier Letters
ValueCountFrequency (%)
˚ 12
100.0%
Number Forms
ValueCountFrequency (%)
4
66.7%
2
33.3%

이미지파일코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 140
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.9768692 × 10^13
Minimum: 30001004
Maximum: 2.02 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 30001004
5-th percentile: 1.99 × 10^13
Q1: 2 × 10^13
Median: 2.01 × 10^13
Q3: 2.01 × 10^13
95-th percentile: 2.01 × 10^13
Maximum: 2.02 × 10^13
Range: 2.019997 × 10^13
Interquartile range (IQR): 1 × 10^11

Descriptive statistics

Standard deviation: 2.3567016 × 10^12
Coefficient of variation (CV): 0.11921384
Kurtosis: 66.350007
Mean: 1.9768692 × 10^13
Median Absolute Deviation (MAD): 0
Skewness: -8.2626396
Sum: 1.9768692 × 10^17
Variance: 5.5540424 × 10^24
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
20100000000000 | 5782 | 57.8%
20000000000000 | 2866 | 28.7%
19900000000000 | 1054 | 10.5%
20200000000000 | 114 | 1.1%
19800000000000 | 44 | 0.4%
1001012011 | 2 | < 0.1%
1001012059 | 2 | < 0.1%
1080001022 | 2 | < 0.1%
1021001033 | 2 | < 0.1%
1080001017 | 2 | < 0.1%
Other values (130) | 130 | 1.3%

Minimum 10 values

Value | Count | Frequency (%)
30001004 | 1 | < 0.1%
30001009 | 1 | < 0.1%
30001010 | 1 | < 0.1%
30001013 | 1 | < 0.1%
30001018 | 1 | < 0.1%
30001027 | 1 | < 0.1%
30001035 | 1 | < 0.1%
30001038 | 1 | < 0.1%
30001048 | 1 | < 0.1%
30001064 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20200000000000 | 114 | 1.1%
20100000000000 | 5782 | 57.8%
20000000000000 | 2866 | 28.7%
19900000000000 | 1054 | 10.5%
19800000000000 | 44 | 0.4%
1080001022 | 2 | < 0.1%
1080001020 | 1 | < 0.1%
1080001017 | 2 | < 0.1%
1080001013 | 1 | < 0.1%
1080001012 | 1 | < 0.1%

파일종류
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
R: 5360
<NA>: 2957
V: 1683

Length

Max length: 4
Median length: 1
Mean length: 1.8871
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: V
3rd row: V
4th row: V
5th row: V

Common Values

Value | Count | Frequency (%)
R | 5360 | 53.6%
<NA> | 2957 | 29.6%
V | 1683 | 16.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
r | 5360 | 53.6%
na | 2957 | 29.6%
v | 1683 | 16.8%

비고
Text

MISSING 

Distinct: 704
Distinct (%): 38.3%
Missing: 8160
Missing (%): 81.6%
Memory size: 156.2 KiB

Length

Max length: 48
Median length: 40
Mean length: 12.67337
Min length: 2

Characters and Unicode

Total characters: 23319
Distinct characters: 297
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 472
Unique (%): 25.7%

Sample

1st row: 서포-용현간 교량(사천대교)가설공사설계도
2nd row: 횡단면도(3)
3rd row: 횡단면도(1)
4th row: 상봉-집현(2)간 도로확장 및 포장공사
5th row: 용지도(3)(2공구)
ValueCountFrequency (%)
서포-용현간 131
 
3.8%
교량(사천대교)가설공사설계도 131
 
3.8%
원리-영포간 118
 
3.5%
2차로 118
 
3.5%
축조공사 118
 
3.5%
116
 
3.4%
포장공사 77
 
2.3%
도로확장 62
 
1.8%
설계도 48
 
1.4%
마천-수동간 45
 
1.3%
Other values (809) 2446
71.7%

Most occurring characters

ValueCountFrequency (%)
1573
 
6.7%
1520
 
6.5%
( 924
 
4.0%
) 922
 
4.0%
918
 
3.9%
821
 
3.5%
577
 
2.5%
551
 
2.4%
529
 
2.3%
507
 
2.2%
Other values (287) 14477
62.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 15319 | 65.7%
Decimal Number | 2119 | 9.1%
Space Separator | 1573 | 6.7%
Uppercase Letter | 1242 | 5.3%
Open Punctuation | 924 | 4.0%
Close Punctuation | 922 | 4.0%
Dash Punctuation | 497 | 2.1%
Other Punctuation | 459 | 2.0%
Math Symbol | 217 | 0.9%
Lowercase Letter | 30 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1520
 
9.9%
918
 
6.0%
821
 
5.4%
577
 
3.8%
551
 
3.6%
529
 
3.5%
507
 
3.3%
504
 
3.3%
476
 
3.1%
416
 
2.7%
Other values (236) 8500
55.5%
Uppercase Letter
ValueCountFrequency (%)
S 170
13.7%
T 126
10.1%
A 125
10.1%
E 87
 
7.0%
I 82
 
6.6%
L 82
 
6.6%
O 72
 
5.8%
M 72
 
5.8%
C 67
 
5.4%
N 59
 
4.8%
Other values (14) 300
24.2%
Decimal Number
ValueCountFrequency (%)
1 491
23.2%
2 399
18.8%
0 389
18.4%
5 203
9.6%
3 180
 
8.5%
4 131
 
6.2%
6 111
 
5.2%
7 74
 
3.5%
9 72
 
3.4%
8 69
 
3.3%
Other Punctuation
ValueCountFrequency (%)
. 325
70.8%
, 93
 
20.3%
@ 38
 
8.3%
' 1
 
0.2%
/ 1
 
0.2%
& 1
 
0.2%
Math Symbol
ValueCountFrequency (%)
~ 96
44.2%
+ 63
29.0%
= 58
26.7%
Lowercase Letter
ValueCountFrequency (%)
x 26
86.7%
m 4
 
13.3%
Modifier Symbol
ValueCountFrequency (%)
˚ 11
64.7%
` 6
35.3%
Space Separator
ValueCountFrequency (%)
1573
100.0%
Open Punctuation
ValueCountFrequency (%)
( 924
100.0%
Close Punctuation
ValueCountFrequency (%)
) 922
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 497
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 15319 | 65.7%
Common | 6728 | 28.9%
Latin | 1272 | 5.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1520
 
9.9%
918
 
6.0%
821
 
5.4%
577
 
3.8%
551
 
3.6%
529
 
3.5%
507
 
3.3%
504
 
3.3%
476
 
3.1%
416
 
2.7%
Other values (236) 8500
55.5%
Latin
ValueCountFrequency (%)
S 170
13.4%
T 126
 
9.9%
A 125
 
9.8%
E 87
 
6.8%
I 82
 
6.4%
L 82
 
6.4%
O 72
 
5.7%
M 72
 
5.7%
C 67
 
5.3%
N 59
 
4.6%
Other values (16) 330
25.9%
Common
ValueCountFrequency (%)
1573
23.4%
( 924
13.7%
) 922
13.7%
- 497
 
7.4%
1 491
 
7.3%
2 399
 
5.9%
0 389
 
5.8%
. 325
 
4.8%
5 203
 
3.0%
3 180
 
2.7%
Other values (15) 825
12.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 15319 | 65.7%
ASCII | 7989 | 34.3%
Modifier Letters | 11 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1573
19.7%
( 924
11.6%
) 922
11.5%
- 497
 
6.2%
1 491
 
6.1%
2 399
 
5.0%
0 389
 
4.9%
. 325
 
4.1%
5 203
 
2.5%
3 180
 
2.3%
Other values (40) 2086
26.1%
Hangul
ValueCountFrequency (%)
1520
 
9.9%
918
 
6.0%
821
 
5.4%
577
 
3.8%
551
 
3.6%
529
 
3.5%
507
 
3.3%
504
 
3.3%
476
 
3.1%
416
 
2.7%
Other values (236) 8500
55.5%
Modifier Letters
ValueCountFrequency (%)
˚ 11
100.0%

Interactions

[Figure: interaction plots]

Correlations

 | 공사코드 | 이미지파일코드 | 파일종류
공사코드 | 1.000 | 0.998 | 0.000
이미지파일코드 | 0.998 | 1.000 | 0.000
파일종류 | 0.000 | 0.000 | 1.000

 | 공사코드 | 이미지파일코드 | 파일종류
공사코드 | 1.000 | 0.880 | 0.000
이미지파일코드 | 0.880 | 1.000 | 0.414
파일종류 | 0.000 | 0.414 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
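
One way to read "nullity correlation" is as the Pearson correlation between presence indicators (1 if a value is present, 0 if missing) of two columns. The sketch below follows that reading on a small hypothetical pair of columns. The data, the `pearson` helper and the variable names are illustrative assumptions, not taken from the report or from the library that draws the heatmap.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns None when either sequence is constant (zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return None
    return cov / (norm_x * norm_y)

# Hypothetical columns; None marks a missing value.
col_a = ["R", None, "V", None, "R", "R"]
col_b = ["x", None, "y", None, None, "z"]

# Presence indicators: 1 where a value exists, 0 where it is missing.
present_a = [0 if v is None else 1 for v in col_a]
present_b = [0 if v is None else 1 for v in col_b]

nullity_corr = pearson(present_a, present_b)
print(nullity_corr)
```

A value near 1 means the two columns tend to be present or missing together, and a value near -1 means one tends to be present when the other is missing. A column with no missing values has a constant indicator, for which the correlation is undefined (the helper returns None).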

Sample

(index) | 공사코드 | 이미지파일명 | 이미지파일코드 | 파일종류 | 비고
682 | 20100058000 | 횡단면도(88) | 20100000000000 | <NA> | <NA>
12245 | 20051001002 | 호안 횡단면도(79) | 20100000000000 | V | <NA>
4148 | 20051001004 | 14 BLOCK 평면도 | 20100000000000 | V | <NA>
19694 | 20070058048 | 터널 전등성비 평면도(3) | 20100000000000 | V | 서포-용현간 교량(사천대교)가설공사설계도
12211 | 20051001002 | 호안 횡단면도(45) | 20100000000000 | V | <NA>
46652 | 20100058000 | 서측RAMP일반도-내부배치횡단면도(1) | 20100000000000 | <NA> | <NA>
1228 | 20100058000 | 옹벽일반도(1)(RAMP-A) | 20100000000000 | <NA> | <NA>
31787 | 19951004001 | 횡 단 면 도 (156) | 20000000000000 | R | <NA>
1601 | 20100058000 | 외측가로보상세도(5) | 20100000000000 | <NA> | <NA>
48343 | 20100058000 | D.S형가드레일-노견용(토공용) | 20100000000000 | <NA> | <NA>

(index) | 공사코드 | 이미지파일명 | 이미지파일코드 | 파일종류 | 비고
42340 | 20051084001 | 실시 횡단면도(74) | 20100000000000 | R | <NA>
4372 | 20100058000 | 배수관날개벽(2) | 20100000000000 | <NA> | <NA>
49027 | 20100058000 | 초기우수장치상부층평면도,단면도(DSSF-2장치) | 20100000000000 | <NA> | <NA>
33402 | 19910067001 | 옹벽일반도 | 19900000000000 | R | <NA>
27418 | 20041047002 | 암거 연결상세및 방수상세도 | 20000000000000 | R | <NA>
17436 | 20061089002 | 완대도로굴곡개량공사 설계도(2006) | 20100000000000 | R | 횡단면도(6)
30026 | 19871010000 | 석축전개도 | 19900000000000 | R | <NA>
30663 | 1005001 | 유토곡선-10 | 1005001130 | R | <NA>
39360 | 20041084001 | 역T형옹벽전개도 | 20000000000000 | R | <NA>
46336 | 20100058000 | 함체배근도(E18)-평면도(2),PRIMARY END | 20100000000000 | <NA> | <NA>

Duplicate rows

Most frequently occurring

(index) | 공사코드 | 이미지파일명 | 이미지파일코드 | 파일종류 | 비고 | # duplicates
47 | 19921023001 | 횡단면도 | 19900000000000 | R | <NA> | 16
377 | 20061024001 | 횡단면도 | 20100000000000 | R | <NA> | 16
163 | 20001006168 | 이설도로 횡단면도 | 20000000000000 | R | <NA> | 14
75 | 19950060069 | 횡단면도 | 20000000000000 | R | 마천-수동간 도로확장 및 포장공사 | 12
481 | 20100058000 | 도면목차 | 20100000000000 | <NA> | <NA> | 12
83 | 19951026002 | 횡단면도 | 20000000000000 | R | <NA> | 9
107 | 19961041015 | 칠원 ~ 대산간 4차선 도로 확포장공사 설계도 | 20000000000000 | R | 유토곡선 | 8
401 | 20071004000 | 배수구조물횡단면도 | 20100000000000 | R | <NA> | 8
66 | 19941034009 | 횡단면도 | 19900000000000 | R | <NA> | 5
482 | 20100058000 | 도면목차(1) | 20100000000000 | <NA> | <NA> | 5
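
A table like the one above can be approximated by counting identical rows. The sketch below does this with the standard library on a hypothetical miniature dataset whose values are borrowed from the report. It illustrates the idea only and is not necessarily how ydata-profiling derives its duplicate counts.

```python
from collections import Counter

# Hypothetical miniature dataset; each row is a tuple of
# (공사코드, 이미지파일명, 이미지파일코드, 파일종류, 비고), with None for missing.
rows = [
    (19921023001, "횡단면도", 19900000000000, "R", None),
    (19921023001, "횡단면도", 19900000000000, "R", None),
    (20100058000, "도면목차", 20100000000000, None, None),
    (20100058000, "도면목차", 20100000000000, None, None),
    (20100058000, "도면목차", 20100000000000, None, None),
    (20100058000, "도면목차(1)", 20100000000000, None, None),
]

# Count how often each distinct row occurs.
row_counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in row_counts.most_common() if n > 1]

for row, n in duplicates:
    print(n, row)
```

Rows must match in every column to be counted together, which is why "도면목차" and "도면목차(1)" appear as separate entries in the report.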