Overview

Dataset statistics

Number of variables: 5
Number of observations: 104
Missing cells: 20
Missing cells (%): 3.8%
Duplicate rows: 1
Duplicate rows (%): 1.0%
Total size in memory: 4.4 KiB
Average record size in memory: 43.3 B
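
The percentages and the average record size above follow from the raw counts. The sketch below recomputes them from the figures in this block; the variable names are illustrative only and are not part of the report.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 5
n_observations = 104
missing_cells = 20
duplicate_rows = 1
total_size_kib = 4.4

total_cells = n_variables * n_observations              # 5 * 104 = 520 cells
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 20 missing cells also match the alerts further down: four columns with 5 missing values each.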

Variable types

Text: 2
Numeric: 2
Categorical: 1

Dataset

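Description: Provides the current status of facilities subject to specific soil contamination control (특정토양오염관리대상시설) within Buk-gu, Daegu Metropolitan City, including facility name, road-name address, and location coordinates.
URL: https://www.data.go.kr/data/15006308/fileData.do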

Alerts

Dataset has 1 (1.0%) duplicate row [Duplicates]
위도 (latitude) is highly overall correlated with 데이터기준일자 (data reference date) [High correlation]
경도 (longitude) is highly overall correlated with 데이터기준일자 [High correlation]
데이터기준일자 is highly overall correlated with 위도 and 1 other field [High correlation]
데이터기준일자 is highly imbalanced (72.2%) [Imbalance]
시설명 (facility name) has 5 (4.8%) missing values [Missing]
소재지 도로명주소 (road-name address) has 5 (4.8%) missing values [Missing]
위도 has 5 (4.8%) missing values [Missing]
경도 has 5 (4.8%) missing values [Missing]
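
The 72.2% imbalance figure is consistent with one minus the normalized Shannon entropy of the category counts reported for 데이터기준일자 (99 and 5). The sketch below assumes that definition; the report itself does not state the formula, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Category counts reported for 데이터기준일자: 99 x "2023-05-15", 5 x "<NA>".
score = imbalance_score([99, 5])
print(f"{score:.1%}")  # prints 72.2%
```

A perfectly balanced two-class column would score 0 under this definition, and a column dominated by one value approaches 1.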

Reproduction

Analysis started: 2023-12-12 02:38:42.821232
Analysis finished: 2023-12-12 02:38:43.958484
Duration: 1.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시설명
Text

MISSING 

Distinct: 99
Distinct (%): 100.0%
Missing: 5
Missing (%): 4.8%
Memory size: 964.0 B

Length

Max length: 23
Median length: 18
Mean length: 9.1515152
Min length: 4

Characters and Unicode

Total characters: 906
Distinct characters: 180
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
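
As a minimal illustration of such character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two names come from the sample rows of this report; `category_counts` is a hypothetical helper, not the profiler's own code. The short codes correspond to the category names used in this report: `Lo` is Other Letter, `Ll` is Lowercase Letter, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category (e.g. 'Lo', 'Ps')."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two facility names taken from the sample rows of this report.
for name in ["(주)보광산업", "해바라기self주유소"]:
    print(name, dict(category_counts(name)))
```

Python's standard library does not expose the script or block properties, so those breakdowns would need a separate data source.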

Unique

Unique: 99
Unique (%): 100.0%

Sample

1st row: 선진주유소
2nd row: 태왕주유소
3rd row: 대훈남주유소
4th row: (주)보광산업
5th row: 미희주유소
Value | Count | Frequency (%)
주식회사 | 4 | 3.3%
에이치디현대오일뱅크(주)직영 | 4 | 3.3%
주)에이치씨대하에너지 | 2 | 1.7%
케이케이(주)침산주유소 | 1 | 0.8%
산동주유소 | 1 | 0.8%
지에스칼텍스(주)구암주유소 | 1 | 0.8%
칠곡매천주유소 | 1 | 0.8%
한국광유(주 | 1 | 0.8%
오일월드주유소 | 1 | 0.8%
sk에너지(주)칠곡ic주유소 | 1 | 0.8%
Other values (103) | 103 | 85.8%

Most occurring characters

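Value | Count | Frequency (%)
(glyph lost) | 103 | 11.4%
(glyph lost) | 77 | 8.5%
(glyph lost) | 69 | 7.6%
(glyph lost) | 39 | 4.3%
( | 35 | 3.9%
) | 35 | 3.9%
(glyph lost) | 26 | 2.9%
(glyph lost) | 23 | 2.5%
(space) | 21 | 2.3%
(glyph lost) | 15 | 1.7%
Other values (170) | 463 | 51.1%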

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 781 | 86.2%
Open Punctuation | 35 | 3.9%
Close Punctuation | 35 | 3.9%
Space Separator | 21 | 2.3%
Uppercase Letter | 18 | 2.0%
Decimal Number | 9 | 1.0%
Lowercase Letter | 5 | 0.6%
Other Punctuation | 2 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph lost) | 103 | 13.2%
(glyph lost) | 77 | 9.9%
(glyph lost) | 69 | 8.8%
(glyph lost) | 39 | 5.0%
(glyph lost) | 26 | 3.3%
(glyph lost) | 23 | 2.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 14 | 1.8%
(glyph lost) | 11 | 1.4%
Other values (149) | 389 | 49.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 7 | 38.9%
K | 6 | 33.3%
C | 1 | 5.6%
I | 1 | 5.6%
E | 1 | 5.6%
L | 1 | 5.6%
F | 1 | 5.6%

Decimal Number
Value | Count | Frequency (%)
5 | 3 | 33.3%
2 | 2 | 22.2%
1 | 1 | 11.1%
8 | 1 | 11.1%
7 | 1 | 11.1%
6 | 1 | 11.1%

Lowercase Letter
Value | Count | Frequency (%)
e | 2 | 40.0%
s | 1 | 20.0%
l | 1 | 20.0%
f | 1 | 20.0%

Open Punctuation
Value | Count | Frequency (%)
( | 35 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 35 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 21 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 781 | 86.2%
Common | 102 | 11.3%
Latin | 23 | 2.5%

Most frequent character per script

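Hangul
Value | Count | Frequency (%)
(glyph lost) | 103 | 13.2%
(glyph lost) | 77 | 9.9%
(glyph lost) | 69 | 8.8%
(glyph lost) | 39 | 5.0%
(glyph lost) | 26 | 3.3%
(glyph lost) | 23 | 2.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 14 | 1.8%
(glyph lost) | 11 | 1.4%
Other values (149) | 389 | 49.8%

Latin
Value | Count | Frequency (%)
S | 7 | 30.4%
K | 6 | 26.1%
e | 2 | 8.7%
C | 1 | 4.3%
I | 1 | 4.3%
s | 1 | 4.3%
E | 1 | 4.3%
L | 1 | 4.3%
F | 1 | 4.3%
l | 1 | 4.3%

Common
Value | Count | Frequency (%)
( | 35 | 34.3%
) | 35 | 34.3%
(space) | 21 | 20.6%
5 | 3 | 2.9%
, | 2 | 2.0%
2 | 2 | 2.0%
1 | 1 | 1.0%
8 | 1 | 1.0%
7 | 1 | 1.0%
6 | 1 | 1.0%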

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 781 | 86.2%
ASCII | 125 | 13.8%

Most frequent character per block

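Hangul
Value | Count | Frequency (%)
(glyph lost) | 103 | 13.2%
(glyph lost) | 77 | 9.9%
(glyph lost) | 69 | 8.8%
(glyph lost) | 39 | 5.0%
(glyph lost) | 26 | 3.3%
(glyph lost) | 23 | 2.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 15 | 1.9%
(glyph lost) | 14 | 1.8%
(glyph lost) | 11 | 1.4%
Other values (149) | 389 | 49.8%

ASCII
Value | Count | Frequency (%)
( | 35 | 28.0%
) | 35 | 28.0%
(space) | 21 | 16.8%
S | 7 | 5.6%
K | 6 | 4.8%
5 | 3 | 2.4%
, | 2 | 1.6%
2 | 2 | 1.6%
e | 2 | 1.6%
C | 1 | 0.8%
Other values (11) | 11 | 8.8%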

소재지 도로명주소
Text

MISSING 

Distinct: 96
Distinct (%): 97.0%
Missing: 5
Missing (%): 4.8%
Memory size: 964.0 B

Length

Max length: 33
Median length: 28
Mean length: 23.707071
Min length: 20

Characters and Unicode

Total characters: 2347
Distinct characters: 81
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 94
Unique (%): 94.9%

Sample

1st row: 대구광역시 북구 연암로 119 (산격동)
2nd row: 대구광역시 북구 동북로 9 (산격동)
3rd row: 대구광역시 북구 칠곡중앙대로 174 (태전동)
4th row: 대구광역시 북구 유통단지로3길 40 (산격동)
5th row: 대구광역시 북구 칠곡중앙대로 624 (읍내동)
Value | Count | Frequency (%)
대구광역시 | 99 | 19.9%
북구 | 98 | 19.7%
산격동 | 17 | 3.4%
노원동3가 | 15 | 3.0%
침산동 | 11 | 2.2%
칠곡중앙대로 | 11 | 2.2%
노원로 | 10 | 2.0%
태전동 | 8 | 1.6%
호국로 | 7 | 1.4%
동북로 | 7 | 1.4%
Other values (159) | 215 | 43.2%

Most occurring characters

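Value | Count | Frequency (%)
(space) | 400 | 17.0%
(glyph lost) | 205 | 8.7%
(glyph lost) | 120 | 5.1%
(glyph lost) | 113 | 4.8%
(glyph lost) | 107 | 4.6%
( | 99 | 4.2%
(glyph lost) | 99 | 4.2%
(glyph lost) | 99 | 4.2%
(glyph lost) | 99 | 4.2%
) | 99 | 4.2%
Other values (71) | 907 | 38.6%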

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1395 | 59.4%
Space Separator | 400 | 17.0%
Decimal Number | 333 | 14.2%
Open Punctuation | 99 | 4.2%
Close Punctuation | 99 | 4.2%
Dash Punctuation | 17 | 0.7%
Other Punctuation | 4 | 0.2%

Most frequent character per category

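Other Letter
Value | Count | Frequency (%)
(glyph lost) | 205 | 14.7%
(glyph lost) | 120 | 8.6%
(glyph lost) | 113 | 8.1%
(glyph lost) | 107 | 7.7%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 32 | 2.3%
(glyph lost) | 29 | 2.1%
Other values (56) | 393 | 28.2%

Decimal Number
Value | Count | Frequency (%)
1 | 58 | 17.4%
2 | 57 | 17.1%
3 | 52 | 15.6%
4 | 38 | 11.4%
5 | 26 | 7.8%
6 | 23 | 6.9%
8 | 21 | 6.3%
9 | 20 | 6.0%
0 | 19 | 5.7%
7 | 19 | 5.7%

Space Separator
Value | Count | Frequency (%)
(space) | 400 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 99 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 99 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 17 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 4 | 100.0%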

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1395 | 59.4%
Common | 952 | 40.6%

Most frequent character per script

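Hangul
Value | Count | Frequency (%)
(glyph lost) | 205 | 14.7%
(glyph lost) | 120 | 8.6%
(glyph lost) | 113 | 8.1%
(glyph lost) | 107 | 7.7%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 32 | 2.3%
(glyph lost) | 29 | 2.1%
Other values (56) | 393 | 28.2%

Common
Value | Count | Frequency (%)
(space) | 400 | 42.0%
( | 99 | 10.4%
) | 99 | 10.4%
1 | 58 | 6.1%
2 | 57 | 6.0%
3 | 52 | 5.5%
4 | 38 | 4.0%
5 | 26 | 2.7%
6 | 23 | 2.4%
8 | 21 | 2.2%
Other values (5) | 79 | 8.3%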

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1395 | 59.4%
ASCII | 952 | 40.6%

Most frequent character per block

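ASCII
Value | Count | Frequency (%)
(space) | 400 | 42.0%
( | 99 | 10.4%
) | 99 | 10.4%
1 | 58 | 6.1%
2 | 57 | 6.0%
3 | 52 | 5.5%
4 | 38 | 4.0%
5 | 26 | 2.7%
6 | 23 | 2.4%
8 | 21 | 2.2%
Other values (5) | 79 | 8.3%

Hangul
Value | Count | Frequency (%)
(glyph lost) | 205 | 14.7%
(glyph lost) | 120 | 8.6%
(glyph lost) | 113 | 8.1%
(glyph lost) | 107 | 7.7%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 99 | 7.1%
(glyph lost) | 32 | 2.3%
(glyph lost) | 29 | 2.1%
Other values (56) | 393 | 28.2%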

위도
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 96
Distinct (%): 97.0%
Missing: 5
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.908296
Minimum: 35.877979
Maximum: 35.958235
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 35.877979
5-th percentile: 35.884085
Q1: 35.894484
Median: 35.901872
Q3: 35.914187
95-th percentile: 35.950846
Maximum: 35.958235
Range: 0.08025617
Interquartile range (IQR): 0.019702661

Descriptive statistics

Standard deviation: 0.02095534
Coefficient of variation (CV): 0.00058357936
Kurtosis: -0.0050652344
Mean: 35.908296
Median Absolute Deviation (MAD): 0.010032052
Skewness: 1.0181505
Sum: 3554.9213
Variance: 0.00043912628
Monotonicity: Not monotonic
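
Several of these figures are derived from one another. The sketch below recomputes the range, IQR, coefficient of variation, variance, and sum from the rounded values reported for 위도, assuming the usual textbook definitions; small differences in the last digits come from that rounding.

```python
# Summary figures reported for 위도 (latitude) in this report.
q1, q3 = 35.894484, 35.914187
minimum, maximum = 35.877979, 35.958235
mean, std = 35.908296, 0.02095534
n_non_missing = 104 - 5          # observations minus missing values

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum
cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n_non_missing     # sum of the non-missing values

print(f"IQR      {iqr:.6f}")
print(f"Range    {value_range:.6f}")
print(f"CV       {cv:.8f}")
print(f"Variance {variance:.8f}")
print(f"Sum      {total:.4f}")
```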
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
35.9012692821 | 3 | 2.9%
35.9494992868 | 2 | 1.9%
35.9406258489 | 1 | 1.0%
35.9017300941 | 1 | 1.0%
35.9430719567 | 1 | 1.0%
35.9512318365 | 1 | 1.0%
35.9062959781 | 1 | 1.0%
35.9142007231 | 1 | 1.0%
35.8908803058 | 1 | 1.0%
35.9473620601 | 1 | 1.0%
Other values (86) | 86 | 82.7%
(Missing) | 5 | 4.8%

Minimum 10 values
Value | Count | Frequency (%)
35.8779787946 | 1 | 1.0%
35.8803484609 | 1 | 1.0%
35.882037633 | 1 | 1.0%
35.8831140319 | 1 | 1.0%
35.8834715843 | 1 | 1.0%
35.8841533661 | 1 | 1.0%
35.8846158888 | 1 | 1.0%
35.8861778348 | 1 | 1.0%
35.886255507 | 1 | 1.0%
35.8865775176 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
35.958234965 | 1 | 1.0%
35.9581151841 | 1 | 1.0%
35.9561460236 | 1 | 1.0%
35.9527552483 | 1 | 1.0%
35.9512318365 | 1 | 1.0%
35.9508028327 | 1 | 1.0%
35.9505034096 | 1 | 1.0%
35.9494992868 | 2 | 1.9%
35.9494032125 | 1 | 1.0%
35.9473620601 | 1 | 1.0%

경도
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 96
Distinct (%): 97.0%
Missing: 5
Missing (%): 4.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 128.58029
Minimum: 128.52157
Maximum: 128.75746
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 128.52157
5-th percentile: 128.54166
Q1: 128.55495
Median: 128.57838
Q3: 128.59975
95-th percentile: 128.62143
Maximum: 128.75746
Range: 0.23588871
Interquartile range (IQR): 0.044807543

Descriptive statistics

Standard deviation: 0.031787371
Coefficient of variation (CV): 0.00024721807
Kurtosis: 8.3872001
Mean: 128.58029
Median Absolute Deviation (MAD): 0.022283118
Skewness: 1.663965
Sum: 12729.448
Variance: 0.0010104369
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
128.5877135216 | 3 | 2.9%
128.5717287264 | 2 | 1.9%
128.5703140804 | 1 | 1.0%
128.5697193142 | 1 | 1.0%
128.5420186051 | 1 | 1.0%
128.5698335336 | 1 | 1.0%
128.5477044603 | 1 | 1.0%
128.5995646477 | 1 | 1.0%
128.605705834 | 1 | 1.0%
128.5727332193 | 1 | 1.0%
Other values (86) | 86 | 82.7%
(Missing) | 5 | 4.8%

Minimum 10 values
Value | Count | Frequency (%)
128.5215702633 | 1 | 1.0%
128.5256339777 | 1 | 1.0%
128.5302647748 | 1 | 1.0%
128.5339747579 | 1 | 1.0%
128.5396443381 | 1 | 1.0%
128.5418826955 | 1 | 1.0%
128.5420186051 | 1 | 1.0%
128.5420643565 | 1 | 1.0%
128.5460638757 | 1 | 1.0%
128.5465315407 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
128.7574589687 | 1 | 1.0%
128.628997975 | 1 | 1.0%
128.6280030532 | 1 | 1.0%
128.6245031608 | 1 | 1.0%
128.6231196524 | 1 | 1.0%
128.6212433362 | 1 | 1.0%
128.6184251751 | 1 | 1.0%
128.6180468192 | 1 | 1.0%
128.6159290608 | 1 | 1.0%
128.615565403 | 1 | 1.0%

데이터기준일자
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B
2023-05-15: 99
<NA>: 5

Length

Max length: 10
Median length: 10
Mean length: 9.7115385
Min length: 4
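
These length figures are consistent with the five missing entries being held as the literal 4-character string `<NA>`, which would also explain why this column reports 0 missing values. That reading is an inference from the numbers, not something the report states. A minimal check:

```python
# Value counts reported for 데이터기준일자 in this report.
value_counts = {"2023-05-15": 99, "<NA>": 5}

# One length per observation: 99 dates of length 10, 5 placeholders of length 4.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]
mean_length = sum(lengths) / len(lengths)

print(max(lengths), min(lengths), round(mean_length, 7))  # prints 10 4 9.7115385
```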

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-05-15
2nd row: 2023-05-15
3rd row: 2023-05-15
4th row: 2023-05-15
5th row: 2023-05-15

Common Values

Value | Count | Frequency (%)
2023-05-15 | 99 | 95.2%
<NA> | 5 | 4.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2023-05-15 | 99 | 95.2%
na | 5 | 4.8%

Interactions

[Figure: pairwise interaction plots for the numeric variables 위도 and 경도]

Correlations

Variable | 시설명 | 소재지 도로명주소 | 위도 | 경도
시설명 | 1.000 | 1.000 | 1.000 | 1.000
소재지 도로명주소 | 1.000 | 1.000 | 1.000 | 1.000
위도 | 1.000 | 1.000 | 1.000 | 0.405
경도 | 1.000 | 1.000 | 0.405 | 1.000

Variable | 위도 | 경도 | 데이터기준일자
위도 | 1.000 | -0.208 | 1.000
경도 | -0.208 | 1.000 | 1.000
데이터기준일자 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
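
Nullity correlation can be sketched as the Pearson correlation between per-column missing-value indicators. The example below is a minimal illustration using only the standard library; `pearson` and the indicator lists are hypothetical constructions that mirror the counts in this report (99 populated rows followed by 5 rows missing in four columns) and are not the profiler's own code.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variation in nullity, so the correlation is undefined
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Nullity indicators (1 = missing), built from the counts in this report.
n_present, n_missing = 99, 5
is_null = {
    "시설명": [0] * n_present + [1] * n_missing,
    "위도": [0] * n_present + [1] * n_missing,
    # 데이터기준일자 reports 0 missing values, so its indicator never varies.
    "데이터기준일자": [0] * (n_present + n_missing),
}

same_rows = pearson(is_null["시설명"], is_null["위도"])
no_nulls = pearson(is_null["시설명"], is_null["데이터기준일자"])
print(same_rows, no_nulls)
```

Columns that are missing in exactly the same rows correlate at (approximately) 1, while a column with no missing values has no defined nullity correlation and is typically left out of such a heatmap.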

Sample

First rows
Index | 시설명 | 소재지 도로명주소 | 위도 | 경도 | 데이터기준일자
0 | 선진주유소 | 대구광역시 북구 연암로 119 (산격동) | 35.895222 | 128.594285 | 2023-05-15
1 | 태왕주유소 | 대구광역시 북구 동북로 9 (산격동) | 35.901872 | 128.594606 | 2023-05-15
2 | 대훈남주유소 | 대구광역시 북구 칠곡중앙대로 174 (태전동) | 35.91137 | 128.54892 | 2023-05-15
3 | (주)보광산업 | 대구광역시 북구 유통단지로3길 40 (산격동) | 35.907459 | 128.60397 | 2023-05-15
4 | 미희주유소 | 대구광역시 북구 칠곡중앙대로 624 (읍내동) | 35.950803 | 128.553372 | 2023-05-15
5 | 중앙주유소 | 대구광역시 북구 동북로 315 (복현동) | 35.89184 | 128.624503 | 2023-05-15
6 | 흥구석유(주)원대주유소 | 대구광역시 북구 팔달로 67 (노원동3가) | 35.890868 | 128.55809 | 2023-05-15
7 | 흥구석유(주)산격주유소 | 대구광역시 북구 동북로 39 (산격동) | 35.902312 | 128.597977 | 2023-05-15
8 | 해바라기self주유소 | 대구광역시 북구 대현로 81 (대현동) | 35.883114 | 128.608069 | 2023-05-15
9 | 공명주유소 | 대구광역시 북구 침산남로 43 (노원동1가) | 35.889303 | 128.580699 | 2023-05-15

Last rows
Index | 시설명 | 소재지 도로명주소 | 위도 | 경도 | 데이터기준일자
94 | 주식회사 북대구아이씨주유소 | 대구광역시 북구 동북로 113 (산격동) | 35.902289 | 128.606356 | 2023-05-15
95 | 대현주유소 | 대구광역시 북구 대학로 16 (산격동) | 35.888845 | 128.603436 | 2023-05-15
96 | 대한송유관공사 영남지사 | 대구광역시 동구 대경로 31-27 (내곡동) | 35.880348 | 128.757459 | 2023-05-15
97 | 현대윤활유 | 대구광역시 북구 3공단로 144-1 (노원동3가) | 35.898312 | 128.567611 | 2023-05-15
98 | (주)에이치씨 대하에너지 연경주유소 | 대구광역시 북구 동화천로 207 (연경동) | 35.9416 | 128.613737 | 2023-05-15
99 | <NA> | <NA> | <NA> | <NA> | <NA>
100 | <NA> | <NA> | <NA> | <NA> | <NA>
101 | <NA> | <NA> | <NA> | <NA> | <NA>
102 | <NA> | <NA> | <NA> | <NA> | <NA>
103 | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Index | 시설명 | 소재지 도로명주소 | 위도 | 경도 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 5
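
The overview reports 1 duplicate row, while this table shows a single row pattern occurring 5 times. One reading that reconciles the two, inferred from the numbers rather than stated in the report, is that the headline counts distinct repeated row patterns. The sketch below illustrates that reading on a hypothetical miniature of the dataset: two populated rows taken from the sample, plus the five all-missing rows.

```python
from collections import Counter

NA = None  # stand-in for a missing value

# Hypothetical miniature: two populated sample rows plus five all-missing rows.
rows = [
    ("선진주유소", "대구광역시 북구 연암로 119 (산격동)", 35.895222, 128.594285, "2023-05-15"),
    ("태왕주유소", "대구광역시 북구 동북로 9 (산격동)", 35.901872, 128.594606, "2023-05-15"),
] + [(NA, NA, NA, NA, NA)] * 5

counts = Counter(rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

print(len(duplicated))            # distinct row patterns that repeat: 1
print(list(duplicated.values()))  # occurrences of each repeated pattern: [5]
```

Under this reading, dropping the five empty trailing rows would remove both the duplicate alert and all 20 missing cells.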