Overview

Dataset statistics

Number of variables: 6
Number of observations: 378
Missing cells: 780
Missing cells (%): 34.4%
Duplicate rows: 1
Duplicate rows (%): 0.3%
Total size in memory: 17.8 KiB
Average record size in memory: 48.3 B
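
The missing-cell percentage can be checked by hand from the per-column figures reported in this document. The sketch below is only a consistency check under the assumption that the four text columns (195 missing values each) account for all missing cells; the two categorical columns report 0 missing, apparently because <NA> is counted there as a category value.

```python
# Figures copied from this report; the variable names are illustrative only.
n_variables = 6
n_observations = 378
missing_per_column = {"신고번호": 195, "상호명": 195, "대표자": 195, "소재지": 195}

total_cells = n_variables * n_observations        # 6 * 378
missing_cells = sum(missing_per_column.values())  # 4 * 195
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

The result agrees with the 780 missing cells (34.4%) listed above.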

Variable types

Text: 4
Categorical: 2

Dataset

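Description: Status of facilities subject to specific soil contamination control located in Ansan-si, Gyeonggi-do. The dataset provides the report number (신고번호), business name (상호명), representative (대표자), category (구분), location (소재지), and data reference date (데이터기준일자).
URL: https://www.data.go.kr/data/15068604/fileData.do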

Alerts

Dataset has 1 (0.3%) duplicate rows [Duplicates]
구분 is highly overall correlated with 데이터기준일자 [High correlation]
데이터기준일자 is highly overall correlated with 구분 [High correlation]
신고번호 has 195 (51.6%) missing values [Missing]
상호명 has 195 (51.6%) missing values [Missing]
대표자 has 195 (51.6%) missing values [Missing]
소재지 has 195 (51.6%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 19:43:30.632723
Analysis finished: 2023-12-12 19:43:31.361933
Duration: 0.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

신고번호
Text

MISSING 

Distinct: 183
Distinct (%): 100.0%
Missing: 195
Missing (%): 51.6%
Memory size: 3.1 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.1912568
Min length: 8

Characters and Unicode

Total characters: 1682
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
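
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to tally general categories for the first sample value of 신고번호, and recomputes the mean length from the totals reported here. It is an assumption about how such figures could be derived, not a description of the profiler's internals; the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count Unicode general categories (e.g. 'Nd', 'Pd') in text."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "1996-001"  # first sample value of 신고번호
counts = category_counts(sample)
# Digits fall under 'Nd' (Decimal Number), the hyphen under 'Pd' (Dash Punctuation).
print(counts)

# Mean length = total characters / number of non-missing values (figures from this report).
total_characters, non_missing = 1682, 183
mean_length = total_characters / non_missing
print(round(mean_length, 7))
```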

Unique

Unique: 183
Unique (%): 100.0%

Sample

1st row: 1996-001
2nd row: 1996-003
3rd row: 1996-004
4th row: 1996-005
5th row: 1996-007

Value | Count | Frequency (%)
1996-018 | 1 | 0.5%
2010-00008 | 1 | 0.5%
1997-10461 | 1 | 0.5%
1998-10466 | 1 | 0.5%
2001-00004 | 1 | 0.5%
2002-00002 | 1 | 0.5%
2005-00002 | 1 | 0.5%
2006-00001 | 1 | 0.5%
2006-00003 | 1 | 0.5%
2008-00002 | 1 | 0.5%
Other values (173) | 173 | 94.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 552 | 32.8%
1 | 272 | 16.2%
9 | 211 | 12.5%
- | 183 | 10.9%
2 | 158 | 9.4%
6 | 110 | 6.5%
7 | 52 | 3.1%
3 | 51 | 3.0%
8 | 31 | 1.8%
5 | 31 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1499 | 89.1%
Dash Punctuation | 183 | 10.9%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 552 | 36.8%
1 | 272 | 18.1%
9 | 211 | 14.1%
2 | 158 | 10.5%
6 | 110 | 7.3%
7 | 52 | 3.5%
3 | 51 | 3.4%
8 | 31 | 2.1%
5 | 31 | 2.1%
4 | 31 | 2.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 183 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1682 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 552 | 32.8%
1 | 272 | 16.2%
9 | 211 | 12.5%
- | 183 | 10.9%
2 | 158 | 9.4%
6 | 110 | 6.5%
7 | 52 | 3.1%
3 | 51 | 3.0%
8 | 31 | 1.8%
5 | 31 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1682 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 552 | 32.8%
1 | 272 | 16.2%
9 | 211 | 12.5%
- | 183 | 10.9%
2 | 158 | 9.4%
6 | 110 | 6.5%
7 | 52 | 3.1%
3 | 51 | 3.0%
8 | 31 | 1.8%
5 | 31 | 1.8%

상호명
Text

MISSING 

Distinct: 183
Distinct (%): 100.0%
Missing: 195
Missing (%): 51.6%
Memory size: 3.1 KiB

Length

Max length: 23
Median length: 19
Mean length: 8.3333333
Min length: 3

Characters and Unicode

Total characters: 1525
Distinct characters: 254
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 183
Unique (%): 100.0%

Sample

1st row: 현대오일뱅크㈜직영 동양주유소
2nd row: 현대오일뱅크㈜직영 팔곡셀프주유소
3rd row: 대상주유소
4th row: ㈜성홍 안산터미널주유소
5th row: 세광에너지㈜소망주유소2

Value | Count | Frequency (%)
현대오일뱅크㈜직영 | 6 | 2.7%
안산공장 | 3 | 1.4%
주식회사 | 2 | 0.9%
서울석유㈜본오주유소 | 1 | 0.5%
서울금속공업㈜ | 1 | 0.5%
풍림유화공업㈜ | 1 | 0.5%
㈜파인켐텍 | 1 | 0.5%
㈜삼창유화 | 1 | 0.5%
주)동서 | 1 | 0.5%
주)아팩 | 1 | 0.5%
Other values (203) | 203 | 91.9%

Most occurring characters

ValueCountFrequency (%)
104
 
6.8%
94
 
6.2%
92
 
6.0%
78
 
5.1%
41
 
2.7%
36
 
2.4%
34
 
2.2%
) 32
 
2.1%
( 32
 
2.1%
29
 
1.9%
Other values (244) 953
62.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1290 | 84.6%
Other Symbol | 104 | 6.8%
Space Separator | 41 | 2.7%
Close Punctuation | 32 | 2.1%
Open Punctuation | 32 | 2.1%
Uppercase Letter | 16 | 1.0%
Decimal Number | 10 | 0.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
94
 
7.3%
92
 
7.1%
78
 
6.0%
36
 
2.8%
34
 
2.6%
29
 
2.2%
29
 
2.2%
25
 
1.9%
24
 
1.9%
23
 
1.8%
Other values (225) 826
64.0%
Uppercase Letter
ValueCountFrequency (%)
S 3
18.8%
K 2
12.5%
T 2
12.5%
R 2
12.5%
W 1
 
6.2%
N 1
 
6.2%
M 1
 
6.2%
V 1
 
6.2%
I 1
 
6.2%
G 1
 
6.2%
Decimal Number
ValueCountFrequency (%)
2 5
50.0%
1 3
30.0%
7 1
 
10.0%
4 1
 
10.0%
Other Symbol
ValueCountFrequency (%)
104
100.0%
Space Separator
ValueCountFrequency (%)
41
100.0%
Close Punctuation
ValueCountFrequency (%)
) 32
100.0%
Open Punctuation
ValueCountFrequency (%)
( 32
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1394 | 91.4%
Common | 115 | 7.5%
Latin | 16 | 1.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
104
 
7.5%
94
 
6.7%
92
 
6.6%
78
 
5.6%
36
 
2.6%
34
 
2.4%
29
 
2.1%
29
 
2.1%
25
 
1.8%
24
 
1.7%
Other values (226) 849
60.9%
Latin
ValueCountFrequency (%)
S 3
18.8%
K 2
12.5%
T 2
12.5%
R 2
12.5%
W 1
 
6.2%
N 1
 
6.2%
M 1
 
6.2%
V 1
 
6.2%
I 1
 
6.2%
G 1
 
6.2%
Common
ValueCountFrequency (%)
41
35.7%
) 32
27.8%
( 32
27.8%
2 5
 
4.3%
1 3
 
2.6%
7 1
 
0.9%
4 1
 
0.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1290 | 84.6%
ASCII | 131 | 8.6%
None | 104 | 6.8%

Most frequent character per block

None
ValueCountFrequency (%)
104
100.0%
Hangul
ValueCountFrequency (%)
94
 
7.3%
92
 
7.1%
78
 
6.0%
36
 
2.8%
34
 
2.6%
29
 
2.2%
29
 
2.2%
25
 
1.9%
24
 
1.9%
23
 
1.8%
Other values (225) 826
64.0%
ASCII
ValueCountFrequency (%)
41
31.3%
) 32
24.4%
( 32
24.4%
2 5
 
3.8%
S 3
 
2.3%
1 3
 
2.3%
K 2
 
1.5%
T 2
 
1.5%
R 2
 
1.5%
W 1
 
0.8%
Other values (8) 8
 
6.1%

대표자
Text

MISSING 

Distinct: 92
Distinct (%): 50.3%
Missing: 195
Missing (%): 51.6%
Memory size: 3.1 KiB

Length

Max length: 12
Median length: 11
Mean length: 3.9945355
Min length: 2

Characters and Unicode

Total characters: 731
Distinct characters: 128
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 86
Unique (%): 47.0%

Sample

1st row: 대표이사
2nd row: 대표이사
3rd row: 박진승
4th row: 대표이사
5th row: 서영철

Value | Count | Frequency (%)
대표이사 | 87 | 44.4%
 | 3 | 1.5%
이명자 | 2 | 1.0%
1인 | 2 | 1.0%
조합장 | 2 | 1.0%
서영철 | 2 | 1.0%
한국전력공사장 | 2 | 1.0%
김형국 | 2 | 1.0%
허세홍 | 2 | 1.0%
김종민 | 1 | 0.5%
Other values (91) | 91 | 46.4%

Most occurring characters

ValueCountFrequency (%)
106
 
14.5%
95
 
13.0%
91
 
12.4%
89
 
12.2%
26
 
3.6%
14
 
1.9%
13
 
1.8%
10
 
1.4%
8
 
1.1%
8
 
1.1%
Other values (118) 271
37.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 703 | 96.2%
Space Separator | 14 | 1.9%
Other Punctuation | 5 | 0.7%
Decimal Number | 5 | 0.7%
Close Punctuation | 2 | 0.3%
Open Punctuation | 2 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
106
15.1%
95
 
13.5%
91
 
12.9%
89
 
12.7%
26
 
3.7%
13
 
1.8%
10
 
1.4%
8
 
1.1%
8
 
1.1%
7
 
1.0%
Other values (113) 250
35.6%
Space Separator
ValueCountFrequency (%)
14
100.0%
Other Punctuation
ValueCountFrequency (%)
, 5
100.0%
Decimal Number
ValueCountFrequency (%)
1 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 703 | 96.2%
Common | 28 | 3.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
106
15.1%
95
 
13.5%
91
 
12.9%
89
 
12.7%
26
 
3.7%
13
 
1.8%
10
 
1.4%
8
 
1.1%
8
 
1.1%
7
 
1.0%
Other values (113) 250
35.6%
Common
ValueCountFrequency (%)
14
50.0%
, 5
 
17.9%
1 5
 
17.9%
) 2
 
7.1%
( 2
 
7.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 703 | 96.2%
ASCII | 28 | 3.8%

Most frequent character per block

Hangul
ValueCountFrequency (%)
106
15.1%
95
 
13.5%
91
 
12.9%
89
 
12.7%
26
 
3.7%
13
 
1.8%
10
 
1.4%
8
 
1.1%
8
 
1.1%
7
 
1.0%
Other values (113) 250
35.6%
ASCII
ValueCountFrequency (%)
14
50.0%
, 5
 
17.9%
1 5
 
17.9%
) 2
 
7.1%
( 2
 
7.1%

구분
Categorical

HIGH CORRELATION 

Distinct: 7
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

<NA>: 195
산업시설: 97
주유소: 70
기타: 10
유독물(산업시설): 4
Other values (2): 2

Length

Max length: 9
Median length: 4
Mean length: 3.8121693
Min length: 2

Unique

Unique: 2
Unique (%): 0.5%

Sample

1st row: 주유소
2nd row: 주유소
3rd row: 주유소
4th row: 주유소
5th row: 주유소

Common Values

Value | Count | Frequency (%)
<NA> | 195 | 51.6%
산업시설 | 97 | 25.7%
주유소 | 70 | 18.5%
기타 | 10 | 2.6%
유독물(산업시설) | 4 | 1.1%
운수업 | 1 | 0.3%
행정기관 | 1 | 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values of 구분]

소재지
Text

MISSING 

Distinct: 183
Distinct (%): 100.0%
Missing: 195
Missing (%): 51.6%
Memory size: 3.1 KiB

Length

Max length: 39
Median length: 32
Mean length: 24.153005
Min length: 17

Characters and Unicode

Total characters: 4420
Distinct characters: 96
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 183
Unique (%): 100.0%

Sample

1st row: 경기도 안산시 단원구 신길동 786-1
2nd row: 경기도 안산시 상록구 팔곡2동 445-4
3rd row: 경기도 안산시 단원구 선부동 459-1
4th row: 경기도 안산시 상록구 성포동 590
5th row: 경기도 안산시 상록구 사동 1318-3

Value | Count | Frequency (%)
경기도 | 183 | 18.0%
안산시 | 183 | 18.0%
단원구 | 147 | 14.5%
성곡동 | 39 | 3.8%
상록구 | 36 | 3.5%
목내동 | 22 | 2.2%
원시동 | 19 | 1.9%
신길동 | 14 | 1.4%
별망로 | 12 | 1.2%
원시로 | 8 | 0.8%
Other values (252) | 353 | 34.7%

Most occurring characters

ValueCountFrequency (%)
869
19.7%
217
 
4.9%
200
 
4.5%
192
 
4.3%
190
 
4.3%
186
 
4.2%
186
 
4.2%
184
 
4.2%
183
 
4.1%
174
 
3.9%
Other values (86) 1839
41.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2598 | 58.8%
Space Separator | 869 | 19.7%
Decimal Number | 686 | 15.5%
Open Punctuation | 97 | 2.2%
Close Punctuation | 97 | 2.2%
Dash Punctuation | 64 | 1.4%
Other Punctuation | 8 | 0.2%
Uppercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
217
 
8.4%
200
 
7.7%
192
 
7.4%
190
 
7.3%
186
 
7.2%
186
 
7.2%
184
 
7.1%
183
 
7.0%
174
 
6.7%
165
 
6.4%
Other values (70) 721
27.8%
Decimal Number
Value | Count | Frequency (%)
1 | 161 | 23.5%
2 | 88 | 12.8%
3 | 68 | 9.9%
4 | 68 | 9.9%
5 | 63 | 9.2%
6 | 57 | 8.3%
7 | 56 | 8.2%
8 | 48 | 7.0%
9 | 41 | 6.0%
0 | 36 | 5.2%
Space Separator
ValueCountFrequency (%)
869
100.0%
Open Punctuation
ValueCountFrequency (%)
( 97
100.0%
Close Punctuation
ValueCountFrequency (%)
) 97
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 64
100.0%
Other Punctuation
ValueCountFrequency (%)
, 8
100.0%
Uppercase Letter
ValueCountFrequency (%)
B 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 2598 | 58.8%
Common | 1821 | 41.2%
Latin | 1 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
217
 
8.4%
200
 
7.7%
192
 
7.4%
190
 
7.3%
186
 
7.2%
186
 
7.2%
184
 
7.1%
183
 
7.0%
174
 
6.7%
165
 
6.4%
Other values (70) 721
27.8%
Common
ValueCountFrequency (%)
869
47.7%
1 161
 
8.8%
( 97
 
5.3%
) 97
 
5.3%
2 88
 
4.8%
3 68
 
3.7%
4 68
 
3.7%
- 64
 
3.5%
5 63
 
3.5%
6 57
 
3.1%
Other values (5) 189
 
10.4%
Latin
ValueCountFrequency (%)
B 1
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 2598 | 58.8%
ASCII | 1822 | 41.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
869
47.7%
1 161
 
8.8%
( 97
 
5.3%
) 97
 
5.3%
2 88
 
4.8%
3 68
 
3.7%
4 68
 
3.7%
- 64
 
3.5%
5 63
 
3.5%
6 57
 
3.1%
Other values (6) 190
 
10.4%
Hangul
ValueCountFrequency (%)
217
 
8.4%
200
 
7.7%
192
 
7.4%
190
 
7.3%
186
 
7.2%
186
 
7.2%
184
 
7.1%
183
 
7.0%
174
 
6.7%
165
 
6.4%
Other values (70) 721
27.8%

데이터기준일자
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 KiB

<NA>: 195
2023-08-07: 183

Length

Max length: 10
Median length: 4
Mean length: 6.9047619
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-08-07
2nd row: 2023-08-07
3rd row: 2023-08-07
4th row: 2023-08-07
5th row: 2023-08-07

Common Values

Value | Count | Frequency (%)
<NA> | 195 | 51.6%
2023-08-07 | 183 | 48.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values of 데이터기준일자]

Correlations

 | 대표자 | 구분
대표자 | 1.000 | 0.860
구분 | 0.860 | 1.000

 | 구분 | 데이터기준일자
구분 | 1.000 | 1.000
데이터기준일자 | 1.000 | 1.000
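
A correlation of 1.000 between 구분 and 데이터기준일자 is what one would expect if every row is either fully populated or fully empty, since the <NA> category in one column then coincides exactly with <NA> in the other. The sketch below illustrates this with an uncorrected Cramér's V computed from the counts in the Common Values tables. Both the choice of Cramér's V and the pairing of rows are assumptions made for illustration; the profiler's own measure may differ.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    # chi2 = n * (sum(O^2 / (row_total * col_total)) - 1)
    chi2 = n * (sum(o * o / (row[x] * col[y]) for (x, y), o in joint.items()) - 1)
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else float("nan")

# Counts taken from the Common Values tables. Pairing every populated row with
# 2023-08-07 and every empty row with <NA> is an assumption based on the
# 195 all-<NA> duplicate rows reported in this document.
category_counts = {"산업시설": 97, "주유소": 70, "기타": 10,
                   "유독물(산업시설)": 4, "운수업": 1, "행정기관": 1}
pairs = [("<NA>", "<NA>")] * 195
for name, count in category_counts.items():
    pairs += [(name, "2023-08-07")] * count

v = cramers_v(pairs)
print(round(v, 3))
```

Under these assumptions the association is perfect, which suggests the High correlation alert reflects the empty rows rather than a meaningful relationship between category and date.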

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

 | 신고번호 | 상호명 | 대표자 | 구분 | 소재지 | 데이터기준일자
0 | 1996-001 | 현대오일뱅크㈜직영 동양주유소 | 대표이사 | 주유소 | 경기도 안산시 단원구 신길동 786-1 | 2023-08-07
1 | 1996-003 | 현대오일뱅크㈜직영 팔곡셀프주유소 | 대표이사 | 주유소 | 경기도 안산시 상록구 팔곡2동 445-4 | 2023-08-07
2 | 1996-004 | 대상주유소 | 박진승 | 주유소 | 경기도 안산시 단원구 선부동 459-1 | 2023-08-07
3 | 1996-005 | ㈜성홍 안산터미널주유소 | 대표이사 | 주유소 | 경기도 안산시 상록구 성포동 590 | 2023-08-07
4 | 1996-007 | 세광에너지㈜소망주유소2 | 서영철 | 주유소 | 경기도 안산시 상록구 사동 1318-3 | 2023-08-07
5 | 1996-008 | 세방연료㈜ | 대표이사 | 주유소 | 경기도 안산시 상록구 일동119-4 | 2023-08-07
6 | 1996-013 | 준오토조이㈜ 성우주유소 | 이장호 | 주유소 | 경기도 안산시 상록구 월피동 481 | 2023-08-07
7 | 1996-014 | 옹진수산업협동조합 | 조합장 | 기타 | 경기도 안산시 단원구 선감동 717-2 | 2023-08-07
8 | 1996-016 | 역전주유소 | 이천석 | 주유소 | 경기도 안산시 상록구 이동 593-11 | 2023-08-07
9 | 1996-017 | 작전주유소 | 김형관 | 주유소 | 경기도 안산시 단원구 대부동동 221-1 | 2023-08-07

 | 신고번호 | 상호명 | 대표자 | 구분 | 소재지 | 데이터기준일자
368 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
369 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
370 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
371 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
372 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
373 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
374 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
375 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
376 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
377 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

 | 신고번호 | 상호명 | 대표자 | 구분 | 소재지 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 195
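
The duplicate summary suggests that the 195 repeated rows are entirely empty, which would also account for the 51.6% missing rate in the text columns. A minimal sketch of how such rows might be detected and dropped with the Python standard library follows. The small `rows` list is a hypothetical stand-in for the real file, built from two rows of the sample table plus three empty rows.

```python
from collections import Counter

COLUMNS = ("신고번호", "상호명", "대표자", "구분", "소재지", "데이터기준일자")

# Hypothetical stand-in: two populated rows from the sample plus three empty rows.
rows = [
    ("1996-004", "대상주유소", "박진승", "주유소", "경기도 안산시 단원구 선부동 459-1", "2023-08-07"),
    ("1996-016", "역전주유소", "이천석", "주유소", "경기도 안산시 상록구 이동 593-11", "2023-08-07"),
    (None,) * len(COLUMNS),
    (None,) * len(COLUMNS),
    (None,) * len(COLUMNS),
]

# Rows that occur more than once, with their occurrence counts.
duplicates = {row: n for row, n in Counter(rows).items() if n > 1}

# Keep only rows that have at least one non-missing cell.
cleaned = [row for row in rows if any(cell is not None for cell in row)]

print(len(duplicates), "duplicated pattern;", len(cleaned), "rows kept")
```

Applied to the full dataset, the same filter would be expected to leave 183 of the 378 rows, assuming all missing cells belong to the empty rows.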