Overview

Dataset statistics

Number of variables: 6
Number of observations: 96
Missing cells: 20
Missing cells (%): 3.5%
Duplicate rows: 1
Duplicate rows (%): 1.0%
Total size in memory: 4.7 KiB
Average record size in memory: 50.4 B

Variable types

Numeric: 1
Text: 3
Categorical: 1
DateTime: 1

Dataset

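Description: Status of business sites that discharge environmental pollutants in Jeungpyeong-gun, Chungcheongbuk-do. (Columns: 연번, 사업장명, 사업장 업종, 종별구분(종), 소재지도로명주소, 데이터기준일자)
Author: Jeungpyeong-gun, Chungcheongbuk-do
URL: https://www.data.go.kr/data/15123409/fileData.do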

Alerts

데이터기준일자 has constant value "" (Constant)
Dataset has 1 (1.0%) duplicate rows (Duplicates)
종별구분(종) is highly imbalanced (57.4%) (Imbalance)
연번 has 4 (4.2%) missing values (Missing)
사업장명 has 4 (4.2%) missing values (Missing)
사업장 업종 has 4 (4.2%) missing values (Missing)
소재지도로명주소 has 4 (4.2%) missing values (Missing)
데이터기준일자 has 4 (4.2%) missing values (Missing)
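
The missing-value alerts can be cross-checked against the overview figures with simple arithmetic. The sketch below only reuses numbers printed in this report; the variable names are illustrative. Note that 종별구분(종) raises no missing alert because its four absent entries appear as a literal `<NA>` category rather than as missing cells.

```python
# Figures copied from the Overview and Alerts sections of this report.
n_rows, n_cols = 96, 6
missing_per_column = {
    "연번": 4,
    "사업장명": 4,
    "사업장 업종": 4,
    "소재지도로명주소": 4,
    "데이터기준일자": 4,
}

# Total missing cells and the share of all cells they represent.
missing_cells = sum(missing_per_column.values())
total_cells = n_rows * n_cols
missing_cells_pct = round(100 * missing_cells / total_cells, 1)

# Share of rows with a missing value in any single affected column.
missing_per_column_pct = round(100 * missing_per_column["연번"] / n_rows, 1)

print(missing_cells, total_cells, missing_cells_pct, missing_per_column_pct)
```

Five columns with four missing values each account for the 20 missing cells, which is 3.5% of the 576 cells in the table.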

Reproduction

Analysis started: 2023-12-12 07:45:57.946637
Analysis finished: 2023-12-12 07:45:59.112699
Duration: 1.17 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

MISSING 

Distinct: 92
Distinct (%): 100.0%
Missing: 4
Missing (%): 4.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.5
Minimum: 1
Maximum: 92
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 996.0 B

Quantile statistics

Minimum: 1
5-th percentile: 5.55
Q1: 23.75
Median: 46.5
Q3: 69.25
95-th percentile: 87.45
Maximum: 92
Range: 91
Interquartile range (IQR): 45.5
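
The quantile figures are consistent with linear interpolation between order statistics. The sketch below assumes the 92 non-missing 연번 values are exactly the integers 1 through 92, which the report suggests (minimum 1, maximum 92, 92 distinct values, strictly increasing) but does not state outright. The `percentile` helper is a hypothetical illustration, not the profiler's own code.

```python
# Assumption: the non-missing 연번 values are the integers 1..92.
values = list(range(1, 93))


def percentile(sorted_values, p):
    """Linear-interpolation percentile for 0 <= p <= 1 on pre-sorted data."""
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
median = percentile(values, 0.50)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)

iqr = q3 - q1                          # interquartile range
value_range = values[-1] - values[0]   # maximum minus minimum

print(p05, q1, median, q3, p95, iqr, value_range)
```

Under that assumption the helper reproduces the table above up to floating-point rounding, for example Q1 = 1 + 0.25 × 91 = 23.75.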

Descriptive statistics

Standard deviation: 26.70206
Coefficient of variation (CV): 0.57423785
Kurtosis: -1.2
Mean: 46.5
Median Absolute Deviation (MAD): 23
Skewness: 0
Sum: 4278
Variance: 713
Monotonicity: Strictly increasing
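
Several of the descriptive statistics follow from one another: the standard deviation is the square root of the variance, and the coefficient of variation is the standard deviation divided by the mean. The sketch below recomputes them with the Python standard library, assuming (as the report suggests but does not state) that the non-missing 연번 values are the integers 1 through 92 and that the variance uses the sample (n - 1) denominator.

```python
import math
import statistics

# Assumption: the non-missing 연번 values are the integers 1..92.
values = list(range(1, 93))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                          # coefficient of variation

# Median absolute deviation: median of the distances to the median.
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

print(total, mean, variance, round(std, 5), round(cv, 8), mad)
```

With these inputs the sum is 4278, the variance 713, and the MAD 23, matching the figures listed above.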
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
60 | 1 | 1.0%
69 | 1 | 1.0%
68 | 1 | 1.0%
67 | 1 | 1.0%
66 | 1 | 1.0%
65 | 1 | 1.0%
64 | 1 | 1.0%
63 | 1 | 1.0%
62 | 1 | 1.0%
61 | 1 | 1.0%
Other values (82) | 82 | 85.4%
(Missing) | 4 | 4.2%

Value | Count | Frequency (%)
1 | 1 | 1.0%
2 | 1 | 1.0%
3 | 1 | 1.0%
4 | 1 | 1.0%
5 | 1 | 1.0%
6 | 1 | 1.0%
7 | 1 | 1.0%
8 | 1 | 1.0%
9 | 1 | 1.0%
10 | 1 | 1.0%

Value | Count | Frequency (%)
92 | 1 | 1.0%
91 | 1 | 1.0%
90 | 1 | 1.0%
89 | 1 | 1.0%
88 | 1 | 1.0%
87 | 1 | 1.0%
86 | 1 | 1.0%
85 | 1 | 1.0%
84 | 1 | 1.0%
83 | 1 | 1.0%

사업장명
Text

MISSING 

Distinct: 92
Distinct (%): 100.0%
Missing: 4
Missing (%): 4.2%
Memory size: 900.0 B

Length

Max length: 19
Median length: 13
Mean length: 8.4021739
Min length: 3

Characters and Unicode

Total characters: 773
Distinct characters: 221
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
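
As a small illustration of these character properties, the sketch below looks up the Unicode general category of each character in the first sample value of 사업장명, using Python's standard `unicodedata` module. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation, and `So` is Other Symbol. This is a minimal sketch, not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# First sample value of 사업장명 as shown in this report.
sample = "(주)농협홍삼"

# Count characters per Unicode general category.
category_counts = Counter(unicodedata.category(ch) for ch in sample)
print(category_counts)

# The single-character form ㈜, which also occurs in this dataset,
# falls in a different category from the three-character "(주)".
enclosed = unicodedata.category("㈜")
print(enclosed)
```

The Hangul syllables are classified as letters, while the surrounding parentheses are counted as punctuation, which is why a column of company names spans several categories.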

Unique

Unique: 92
Unique (%): 100.0%

Sample

1st row: (주)농협홍삼
2nd row: (주)덕산식품 증평공장
3rd row: (주)두산전자사업증평공장
4th row: (주)디엔피코퍼레이션
5th row: (주)엘골인바이오

Value | Count | Frequency (%)
주식회사 | 4 | 3.9%
5019부대 | 1 | 1.0%
장풍세차장 | 1 | 1.0%
증평농업기술센터 | 1 | 1.0%
증평lpg충전소 | 1 | 1.0%
주안에프엔씨 | 1 | 1.0%
주식회사해마루 | 1 | 1.0%
송정현대주유소 | 1 | 1.0%
광덕 | 1 | 1.0%
제5019부대(보수대 | 1 | 1.0%
Other values (89) | 89 | 87.3%

Most occurring characters

ValueCountFrequency (%)
40
 
5.2%
( 28
 
3.6%
) 28
 
3.6%
19
 
2.5%
19
 
2.5%
19
 
2.5%
19
 
2.5%
16
 
2.1%
16
 
2.1%
14
 
1.8%
Other values (211) 555
71.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 661 | 85.5%
Open Punctuation | 28 | 3.6%
Close Punctuation | 28 | 3.6%
Decimal Number | 24 | 3.1%
Uppercase Letter | 16 | 2.1%
Space Separator | 10 | 1.3%
Other Symbol | 3 | 0.4%
Lowercase Letter | 2 | 0.3%
Dash Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
40
 
6.1%
19
 
2.9%
19
 
2.9%
19
 
2.9%
19
 
2.9%
16
 
2.4%
16
 
2.4%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (183) 473
71.6%
Uppercase Letter
ValueCountFrequency (%)
C 3
18.8%
H 2
12.5%
S 2
12.5%
P 1
 
6.2%
L 1
 
6.2%
G 1
 
6.2%
K 1
 
6.2%
A 1
 
6.2%
W 1
 
6.2%
T 1
 
6.2%
Other values (2) 2
12.5%
Decimal Number
ValueCountFrequency (%)
1 6
25.0%
9 4
16.7%
0 3
12.5%
5 3
12.5%
3 2
 
8.3%
2 2
 
8.3%
4 2
 
8.3%
8 1
 
4.2%
7 1
 
4.2%
Lowercase Letter
ValueCountFrequency (%)
s 1
50.0%
x 1
50.0%
Open Punctuation
ValueCountFrequency (%)
( 28
100.0%
Close Punctuation
ValueCountFrequency (%)
) 28
100.0%
Space Separator
ValueCountFrequency (%)
10
100.0%
Other Symbol
ValueCountFrequency (%)
3
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 664 | 85.9%
Common | 91 | 11.8%
Latin | 18 | 2.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
40
 
6.0%
19
 
2.9%
19
 
2.9%
19
 
2.9%
19
 
2.9%
16
 
2.4%
16
 
2.4%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (184) 476
71.7%
Latin
ValueCountFrequency (%)
C 3
16.7%
H 2
11.1%
S 2
11.1%
P 1
 
5.6%
L 1
 
5.6%
G 1
 
5.6%
s 1
 
5.6%
K 1
 
5.6%
A 1
 
5.6%
W 1
 
5.6%
Other values (4) 4
22.2%
Common
ValueCountFrequency (%)
( 28
30.8%
) 28
30.8%
10
 
11.0%
1 6
 
6.6%
9 4
 
4.4%
0 3
 
3.3%
5 3
 
3.3%
3 2
 
2.2%
2 2
 
2.2%
4 2
 
2.2%
Other values (3) 3
 
3.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 661 | 85.5%
ASCII | 109 | 14.1%
None | 3 | 0.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
40
 
6.1%
19
 
2.9%
19
 
2.9%
19
 
2.9%
19
 
2.9%
16
 
2.4%
16
 
2.4%
14
 
2.1%
13
 
2.0%
13
 
2.0%
Other values (183) 473
71.6%
ASCII
ValueCountFrequency (%)
( 28
25.7%
) 28
25.7%
10
 
9.2%
1 6
 
5.5%
9 4
 
3.7%
0 3
 
2.8%
C 3
 
2.8%
5 3
 
2.8%
H 2
 
1.8%
S 2
 
1.8%
Other values (17) 20
18.3%
None
ValueCountFrequency (%)
3
100.0%

사업장 업종
Text

MISSING 

Distinct: 58
Distinct (%): 63.0%
Missing: 4
Missing (%): 4.2%
Memory size: 900.0 B

Length

Max length: 33
Median length: 23
Mean length: 12.597826
Min length: 5

Characters and Unicode

Total characters: 1159
Distinct characters: 160
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 47.8%

Sample

1st row: 인삼식품 제조업
2nd row: 떡류 제조업
3rd row: 기타 전자부품 제조업
4th row: 인쇄회로기판용 적층판 제조업 외 2 종
5th row: 기타 비알콜음료 제조업
ValueCountFrequency (%)
제조업 43
 
12.2%
28
 
7.9%
자동차 22
 
6.2%
세차업 15
 
4.2%
13
 
3.7%
12
 
3.4%
기타 11
 
3.1%
수리업 7
 
2.0%
행정 6
 
1.7%
운영업 6
 
1.7%
Other values (120) 190
53.8%

Most occurring characters

ValueCountFrequency (%)
269
23.2%
86
 
7.4%
55
 
4.7%
48
 
4.1%
44
 
3.8%
28
 
2.4%
28
 
2.4%
24
 
2.1%
24
 
2.1%
18
 
1.6%
Other values (150) 535
46.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 875 | 75.5%
Space Separator | 269 | 23.2%
Decimal Number | 12 | 1.0%
Other Punctuation | 3 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
86
 
9.8%
55
 
6.3%
48
 
5.5%
44
 
5.0%
28
 
3.2%
28
 
3.2%
24
 
2.7%
24
 
2.7%
18
 
2.1%
18
 
2.1%
Other values (144) 502
57.4%
Decimal Number
ValueCountFrequency (%)
1 5
41.7%
2 3
25.0%
3 2
 
16.7%
4 2
 
16.7%
Space Separator
ValueCountFrequency (%)
269
100.0%
Other Punctuation
ValueCountFrequency (%)
, 3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 875 | 75.5%
Common | 284 | 24.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
86
 
9.8%
55
 
6.3%
48
 
5.5%
44
 
5.0%
28
 
3.2%
28
 
3.2%
24
 
2.7%
24
 
2.7%
18
 
2.1%
18
 
2.1%
Other values (144) 502
57.4%
Common
ValueCountFrequency (%)
269
94.7%
1 5
 
1.8%
, 3
 
1.1%
2 3
 
1.1%
3 2
 
0.7%
4 2
 
0.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 874 | 75.4%
ASCII | 284 | 24.5%
Compat Jamo | 1 | 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
269
94.7%
1 5
 
1.8%
, 3
 
1.1%
2 3
 
1.1%
3 2
 
0.7%
4 2
 
0.7%
Hangul
ValueCountFrequency (%)
86
 
9.8%
55
 
6.3%
48
 
5.5%
44
 
5.0%
28
 
3.2%
28
 
3.2%
24
 
2.7%
24
 
2.7%
18
 
2.1%
18
 
2.1%
Other values (143) 501
57.3%
Compat Jamo
ValueCountFrequency (%)
1
100.0%

종별구분(종)
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 900.0 B
5종: 79
4종: 7
3종: 5
<NA>: 4
2종: 1

Length

Max length: 4
Median length: 2
Mean length: 2.0833333
Min length: 2

Unique

Unique: 1
Unique (%): 1.0%

Sample

1st row: 3종
2nd row: 5종
3rd row: 3종
4th row: 4종
5th row: 5종

Common Values

Value | Count | Frequency (%)
5종 | 79 | 82.3%
4종 | 7 | 7.3%
3종 | 5 | 5.2%
<NA> | 4 | 4.2%
2종 | 1 | 1.0%
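
The 57.4% imbalance figure reported for this variable is numerically consistent with a normalized-entropy score: one minus the Shannon entropy of the class frequencies divided by the maximum entropy for five classes. The sketch below recomputes it from the counts in the table above. It is an independent check under that assumption, not a copy of the profiler's code.

```python
import math

# Class counts copied from the Common Values table of 종별구분(종).
counts = {"5종": 79, "4종": 7, "3종": 5, "<NA>": 4, "2종": 1}
total = sum(counts.values())

# Shannon entropy of the observed class distribution.
entropy = -sum((c / total) * math.log(c / total) for c in counts.values())

# Entropy of a perfectly balanced distribution over the same classes.
max_entropy = math.log(len(counts))

# 0 means perfectly balanced, 1 means a single class holds every row.
imbalance = 1 - entropy / max_entropy
print(f"{imbalance:.1%}")
```

Because 5종 accounts for 79 of the 96 rows, the entropy is well below its maximum and the score lands above 0.5, which is what triggers the Imbalance alert.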

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
5종 | 79 | 82.3%
4종 | 7 | 7.3%
3종 | 5 | 5.2%
na | 4 | 4.2%
2종 | 1 | 1.0%

소재지도로명주소
Text

MISSING 

Distinct: 88
Distinct (%): 95.7%
Missing: 4
Missing (%): 4.2%
Memory size: 900.0 B

Length

Max length: 32
Median length: 27
Mean length: 21.01087
Min length: 6

Characters and Unicode

Total characters: 1933
Distinct characters: 93
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 86
Unique (%): 93.5%

Sample

1st row: 충청북도 증평군 증평읍 중앙로 88
2nd row: 충청북도 증평군 도안면 노암로 129
3rd row: 충청북도 증평군 증평읍 두산로 40
4th row: 충청북도 증평군 증평읍 초정약수로 1549
5th row: 충청북도 증평군 증평읍 장내길 101

Value | Count | Frequency (%)
충청북도 | 88 | 18.9%
증평군 | 88 | 18.9%
증평읍 | 59 | 12.7%
도안면 | 29 | 6.2%
초중리 | 9 | 1.9%
원명로 | 6 | 1.3%
증천리 | 6 | 1.3%
군부대 | 4 | 0.9%
주소 | 4 | 0.9%
연탄리 | 4 | 0.9%
Other values (131) | 168 | 36.1%

Most occurring characters

ValueCountFrequency (%)
411
21.3%
154
 
8.0%
151
 
7.8%
119
 
6.2%
92
 
4.8%
89
 
4.6%
89
 
4.6%
88
 
4.6%
59
 
3.1%
1 52
 
2.7%
Other values (83) 629
32.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1204 | 62.3%
Space Separator | 411 | 21.3%
Decimal Number | 283 | 14.6%
Dash Punctuation | 35 | 1.8%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
154
12.8%
151
12.5%
119
9.9%
92
 
7.6%
89
 
7.4%
89
 
7.4%
88
 
7.3%
59
 
4.9%
48
 
4.0%
29
 
2.4%
Other values (71) 286
23.8%
Decimal Number
ValueCountFrequency (%)
1 52
18.4%
3 32
11.3%
8 32
11.3%
2 31
11.0%
4 31
11.0%
5 30
10.6%
7 20
 
7.1%
6 19
 
6.7%
0 18
 
6.4%
9 18
 
6.4%
Space Separator
ValueCountFrequency (%)
411
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 35
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1204 | 62.3%
Common | 729 | 37.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
154
12.8%
151
12.5%
119
9.9%
92
 
7.6%
89
 
7.4%
89
 
7.4%
88
 
7.3%
59
 
4.9%
48
 
4.0%
29
 
2.4%
Other values (71) 286
23.8%
Common
ValueCountFrequency (%)
411
56.4%
1 52
 
7.1%
- 35
 
4.8%
3 32
 
4.4%
8 32
 
4.4%
2 31
 
4.3%
4 31
 
4.3%
5 30
 
4.1%
7 20
 
2.7%
6 19
 
2.6%
Other values (2) 36
 
4.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1204 | 62.3%
ASCII | 729 | 37.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
411
56.4%
1 52
 
7.1%
- 35
 
4.8%
3 32
 
4.4%
8 32
 
4.4%
2 31
 
4.3%
4 31
 
4.3%
5 30
 
4.1%
7 20
 
2.7%
6 19
 
2.6%
Other values (2) 36
 
4.9%
Hangul
ValueCountFrequency (%)
154
12.8%
151
12.5%
119
9.9%
92
 
7.6%
89
 
7.4%
89
 
7.4%
88
 
7.3%
59
 
4.9%
48
 
4.0%
29
 
2.4%
Other values (71) 286
23.8%

데이터기준일자
Date

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 1.1%
Missing: 4
Missing (%): 4.2%
Memory size: 900.0 B
Minimum: 2023-09-19 00:00:00
Maximum: 2023-09-19 00:00:00
Histogram with fixed size bins (bins=1)

Interactions


Correlations

Variable | 연번 | 사업장명 | 사업장 업종 | 종별구분(종) | 소재지도로명주소
연번 | 1.000 | 1.000 | 0.790 | 0.409 | 0.939
사업장명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
사업장 업종 | 0.790 | 1.000 | 1.000 | 0.856 | 0.998
종별구분(종) | 0.409 | 1.000 | 0.856 | 1.000 | 1.000
소재지도로명주소 | 0.939 | 1.000 | 0.998 | 1.000 | 1.000
Variable | 연번 | 종별구분(종)
연번 | 1.000 | 0.245
종별구분(종) | 0.245 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

index | 연번 | 사업장명 | 사업장 업종 | 종별구분(종) | 소재지도로명주소 | 데이터기준일자
0 | 1 | (주)농협홍삼 | 인삼식품 제조업 | 3종 | 충청북도 증평군 증평읍 중앙로 88 | 2023-09-19
1 | 2 | (주)덕산식품 증평공장 | 떡류 제조업 | 5종 | 충청북도 증평군 도안면 노암로 129 | 2023-09-19
2 | 3 | (주)두산전자사업증평공장 | 기타 전자부품 제조업 | 3종 | 충청북도 증평군 증평읍 두산로 40 | 2023-09-19
3 | 4 | (주)디엔피코퍼레이션 | 인쇄회로기판용 적층판 제조업 외 2 종 | 4종 | 충청북도 증평군 증평읍 초정약수로 1549 | 2023-09-19
4 | 5 | (주)엘골인바이오 | 기타 비알콜음료 제조업 | 5종 | 충청북도 증평군 증평읍 장내길 101 | 2023-09-19
5 | 6 | 에이치티에스(HTS) | 그외 기타 분류안된 비금속 광물제품 제조업 | 4종 | 충청북도 증평군 도안면 화성리 388-4 | 2023-09-19
6 | 7 | (주)케이에스피 | 플라스틱 필름 시트 및 판 제조업 | 5종 | 충청북도 증평군 도안면 행갈길 53 | 2023-09-19
7 | 8 | (주)코스텍 | 혼성 및 재생 플라스틱 소재 물질 제조업 | 5종 | 충청북도 증평군 도안면 노암로 85 | 2023-09-19
8 | 9 | (주)풀무원녹즙 | 기타 과실ㆍ채소 가공 및 저장 처리업 | 3종 | 충청북도 증평군 도안면 원명로 35 | 2023-09-19
9 | 10 | (주)피엔티코리아 | 포장용 플라스틱 성형용기 제조업 | 5종 | 충청북도 증평군 도안면 구봉정길 87 | 2023-09-19

index | 연번 | 사업장명 | 사업장 업종 | 종별구분(종) | 소재지도로명주소 | 데이터기준일자
86 | 87 | 증평1급자동차공업사 | 자동차 종합 수리업 | 5종 | 충청북도 증평군 증평읍 미암리 771-1외 5필지 | 2023-09-19
87 | 88 | 진천증평농협조합공동사업법인 | 곡물 도정업 | 4종 | 충청북도 증평군 증평읍 죽리 65 | 2023-09-19
88 | 89 | 초원농산조사료 | 동물용 사료 및 조제식품 제조업 | 5종 | 충청북도 증평군 도안면 화성리 산 1 외 2필지 | 2023-09-19
89 | 90 | 현대자동차 증평서비스 | 자동차 전문 수리업 | 5종 | 충청북도 증평군 증평읍 증천리 931 외 3필지 | 2023-09-19
90 | 91 | ㈜블랙스톤에듀팜리조트 | 휴양콘도 운영업 | 5종 | 충청북도 증평군 도안면 연촌리 산 58 | 2023-09-19
91 | 92 | 주식회사 유니버셜리프트앤히타치코리아 | 기타 물품취급장비 제조업 | 5종 | 충청북도 증평군 도안면 도당리 284-3 | 2023-09-19
92 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
93 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
94 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
95 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

index | 연번 | 사업장명 | 사업장 업종 | 종별구분(종) | 소재지도로명주소 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 4
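
The overview lists 1 duplicate row (1.0%), while the table above shows a count of 4. The two figures agree if the overview counts distinct duplicated row patterns and the `# duplicates` column counts how often each pattern occurs; that reading is an assumption drawn from the numbers in this report. The sketch below illustrates it with the standard library on hypothetical stand-in rows (two columns instead of six, placeholder names such as `site-1`).

```python
from collections import Counter

NA = None  # stand-in for a missing value

# Hypothetical data: 92 distinct rows plus 4 rows that are entirely missing.
rows = [(i, f"site-{i}") for i in range(1, 93)] + [(NA, NA)] * 4

# Count how often each full row occurs and keep those seen more than once.
row_counts = Counter(rows)
duplicate_groups = {row: n for row, n in row_counts.items() if n > 1}

n_duplicate_patterns = len(duplicate_groups)
duplicate_pct = round(100 * n_duplicate_patterns / len(rows), 1)

print(n_duplicate_patterns, duplicate_pct, duplicate_groups)
```

Under this reading, the four empty trailing rows form a single duplicated pattern, so dropping rows in which every column is missing would clear both the duplicate alert and the per-column missing-value alerts.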