Overview

Dataset statistics

Number of variables: 8
Number of observations: 133
Missing cells: 54
Missing cells (%): 5.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.7 KiB
Average record size in memory: 67.0 B
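
The percentage and per-record figures above follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only:

```python
# Counts copied from the dataset statistics above.
n_variables = 8
n_observations = 133
missing_cells = 54
total_size_kib = 8.7  # reported value, already rounded

# Missing cells as a share of every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total bytes divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values shown in the report.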

Variable types

Numeric: 1
Text: 4
Categorical: 3

Dataset

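Description: Provides permit information for air-pollutant-emitting workplaces registered in Gochang-gun, Jeollabuk-do: permit date (인허가일자), workplace name (사업장명), road-name address (주소 (도로명)), telephone number (전화번호), industry (업종), air emission class (대기종별), and permit type (인허가).
URL: https://www.data.go.kr/data/15080455/fileData.do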

Alerts

업종 is highly overall correlated with 대기종별 and 1 other field [High correlation]
대기종별 is highly overall correlated with 업종 and 1 other field [High correlation]
인허가 is highly overall correlated with 업종 and 1 other field [High correlation]
인허가 is highly imbalanced (76.9%) [Imbalance]
전화번호 has 54 (40.6%) missing values [Missing]
연번 has unique values [Unique]
사업장명 has unique values [Unique]
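
The 76.9% imbalance figure for 인허가 is consistent with a score of the form 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below assumes that definition; the report itself does not state the formula, and `imbalance_score` is a hypothetical helper, not a library function.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, 1 when one class holds every row."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts of 인허가 as listed in the report: 신고 = 128, 허가 = 5.
score = imbalance_score([128, 5])
print(f"{score:.1%}")
```

With 128 of 133 rows in one class the entropy is about 0.23 bits out of a possible 1 bit, which is why the score is high.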

Reproduction

Analysis started: 2023-12-12 23:18:46.520071
Analysis finished: 2023-12-12 23:18:47.201372
Duration: 0.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 133
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 67
Minimum: 1
Maximum: 133
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 7.6
Q1: 34
Median: 67
Q3: 100
95th percentile: 126.4
Maximum: 133
Range: 132
Interquartile range (IQR): 66
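
These quantiles can be reproduced by hand. The sketch below assumes that 연번 takes every integer from 1 to 133 (consistent with the reported count, minimum, maximum, and sum) and that quantiles use linear interpolation between neighbouring order statistics; `quantile_linear` is an illustrative helper, not part of the profiling tool.

```python
import math

def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_values) - 1)  # fractional 0-based position
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

# Assumed data: the serial number runs 1, 2, ..., 133 with no gaps.
values = list(range(1, 134))

for label, q in [("5%", 0.05), ("Q1", 0.25), ("median", 0.50), ("Q3", 0.75), ("95%", 0.95)]:
    print(label, round(quantile_linear(values, q), 1))

iqr = quantile_linear(values, 0.75) - quantile_linear(values, 0.25)
print("IQR", iqr)
```

For example, the 5th percentile sits at position 0.05 × 132 = 6.6, that is 60% of the way from the value 7 to the value 8.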

Descriptive statistics

Standard deviation: 38.53786
Coefficient of variation (CV): 0.57519194
Kurtosis: -1.2
Mean: 67
Median Absolute Deviation (MAD): 33
Skewness: 0
Sum: 8911
Variance: 1485.1667
Monotonicity: Strictly increasing
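
Most of these figures can be checked with the Python standard library. This is a sketch under the assumption that 연번 is exactly the sequence 1 to 133 and that the variance uses the sample (n − 1) denominator; skewness and kurtosis are left out because their exact estimator is not stated in the report.

```python
import statistics

values = list(range(1, 134))  # assumed: every integer from 1 to 133

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: the median of |x - median|.
mad = statistics.median([abs(v - median) for v in values])

print(total, mean, round(variance, 4), round(std, 5), round(cv, 6), mad)
```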
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | 0.8%
85 | 1 | 0.8%
99 | 1 | 0.8%
98 | 1 | 0.8%
97 | 1 | 0.8%
96 | 1 | 0.8%
95 | 1 | 0.8%
94 | 1 | 0.8%
93 | 1 | 0.8%
92 | 1 | 0.8%
Other values (123) | 123 | 92.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.8%
2 | 1 | 0.8%
3 | 1 | 0.8%
4 | 1 | 0.8%
5 | 1 | 0.8%
6 | 1 | 0.8%
7 | 1 | 0.8%
8 | 1 | 0.8%
9 | 1 | 0.8%
10 | 1 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
133 | 1 | 0.8%
132 | 1 | 0.8%
131 | 1 | 0.8%
130 | 1 | 0.8%
129 | 1 | 0.8%
128 | 1 | 0.8%
127 | 1 | 0.8%
126 | 1 | 0.8%
125 | 1 | 0.8%
124 | 1 | 0.8%

인허가일자
Text

Distinct: 119
Distinct (%): 89.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1330
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
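
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's `unicodedata` module, using the five sample values listed for this variable in the report. The mapping from two-letter category codes to long names is filled in by hand for the two categories that occur here.

```python
import unicodedata
from collections import Counter

# Sample values listed for this variable in the report.
samples = ["1972-08-17", "1988-11-23", "2007-05-07", "2001-11-08", "1990-08-11"]

# Two-letter general-category codes mapped to the long names the report uses.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation"}

category_counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in samples
    for ch in value
)

total = sum(category_counts.values())
for name, count in category_counts.most_common():
    print(f"{name}: {count} ({count / total:.1%})")
```

Each date has eight digits and two hyphens, so the 80% / 20% split seen on the samples is the same one the report shows for the full column.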

Unique

Unique: 108
Unique (%): 81.2%

Sample

1st row: 1972-08-17
2nd row: 1988-11-23
3rd row: 2007-05-07
4th row: 2001-11-08
5th row: 1990-08-11

Value | Count | Frequency (%)
1991-07-13 | 5 | 3.8%
1995-12-20 | 2 | 1.5%
2007-05-28 | 2 | 1.5%
2001-01-26 | 2 | 1.5%
1996-11-26 | 2 | 1.5%
1996-02-08 | 2 | 1.5%
1996-01-16 | 2 | 1.5%
1991-07-15 | 2 | 1.5%
2005-11-15 | 2 | 1.5%
2002-03-04 | 2 | 1.5%
Other values (109) | 110 | 82.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 303 | 22.8%
- | 266 | 20.0%
1 | 212 | 15.9%
2 | 207 | 15.6%
9 | 107 | 8.0%
6 | 48 | 3.6%
7 | 47 | 3.5%
3 | 40 | 3.0%
8 | 36 | 2.7%
5 | 34 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1064 | 80.0%
Dash Punctuation | 266 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 303 | 28.5%
1 | 212 | 19.9%
2 | 207 | 19.5%
9 | 107 | 10.1%
6 | 48 | 4.5%
7 | 47 | 4.4%
3 | 40 | 3.8%
8 | 36 | 3.4%
5 | 34 | 3.2%
4 | 30 | 2.8%
Dash Punctuation
Value | Count | Frequency (%)
- | 266 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1330 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 303 | 22.8%
- | 266 | 20.0%
1 | 212 | 15.9%
2 | 207 | 15.6%
9 | 107 | 8.0%
6 | 48 | 3.6%
7 | 47 | 3.5%
3 | 40 | 3.0%
8 | 36 | 2.7%
5 | 34 | 2.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1330 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 303 | 22.8%
- | 266 | 20.0%
1 | 212 | 15.9%
2 | 207 | 15.6%
9 | 107 | 8.0%
6 | 48 | 3.6%
7 | 47 | 3.5%
3 | 40 | 3.0%
8 | 36 | 2.7%
5 | 34 | 2.6%

사업장명
Text

UNIQUE 

Distinct: 133
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 19
Median length: 16
Mean length: 7.7293233
Min length: 2

Characters and Unicode

Total characters: 1028
Distinct characters: 184
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 133
Unique (%): 100.0%

Sample

1st row: 흥덕도정공장
2nd row: 고창자동차공업사
3rd row: 동백호텔
4th row: 영농오성미곡종합처리장
5th row: 신성레미콘

Value | Count | Frequency (%)
양만장 | 6 | 3.5%
수산 | 4 | 2.3%
고창군 | 4 | 2.3%
주식회사 | 3 | 1.8%
공공 | 3 | 1.8%
양어장 | 3 | 1.8%
농업회사법인 | 2 | 1.2%
유한회사 | 2 | 1.2%
쓰레기 | 2 | 1.2%
유)고창레미콘 | 2 | 1.2%
Other values (140) | 140 | 81.9%

Most occurring characters

ValueCountFrequency (%)
) 45
 
4.4%
( 42
 
4.1%
38
 
3.7%
36
 
3.5%
31
 
3.0%
31
 
3.0%
30
 
2.9%
24
 
2.3%
23
 
2.2%
23
 
2.2%
Other values (174) 705
68.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 901
87.6%
Close Punctuation 45
 
4.4%
Open Punctuation 42
 
4.1%
Space Separator 38
 
3.7%
Decimal Number 2
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
36
 
4.0%
31
 
3.4%
31
 
3.4%
30
 
3.3%
24
 
2.7%
23
 
2.6%
23
 
2.6%
23
 
2.6%
22
 
2.4%
21
 
2.3%
Other values (169) 637
70.7%
Decimal Number
ValueCountFrequency (%)
2 1
50.0%
1 1
50.0%
Close Punctuation
ValueCountFrequency (%)
) 45
100.0%
Open Punctuation
ValueCountFrequency (%)
( 42
100.0%
Space Separator
ValueCountFrequency (%)
38
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 901
87.6%
Common 127
 
12.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
36
 
4.0%
31
 
3.4%
31
 
3.4%
30
 
3.3%
24
 
2.7%
23
 
2.6%
23
 
2.6%
23
 
2.6%
22
 
2.4%
21
 
2.3%
Other values (169) 637
70.7%
Common
ValueCountFrequency (%)
) 45
35.4%
( 42
33.1%
38
29.9%
2 1
 
0.8%
1 1
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 901
87.6%
ASCII 127
 
12.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
) 45
35.4%
( 42
33.1%
38
29.9%
2 1
 
0.8%
1 1
 
0.8%
Hangul
ValueCountFrequency (%)
36
 
4.0%
31
 
3.4%
31
 
3.4%
30
 
3.3%
24
 
2.7%
23
 
2.6%
23
 
2.6%
23
 
2.6%
22
 
2.4%
21
 
2.3%
Other values (169) 637
70.7%

주소 (도로명)
Text

Distinct: 125
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 40
Median length: 26
Mean length: 22.533835
Min length: 18

Characters and Unicode

Total characters: 2997
Distinct characters: 142
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 117
Unique (%): 88.0%

Sample

1st row: 전라북도 고창군 흥덕면 잿말길 40
2nd row: 전라북도 고창군 신림면 고인돌대로 2051-7
3rd row: 전라북도 고창군 아산면 중촌길 26
4th row: 전라북도 고창군 해리면 동서대로 435
5th row: 전라북도 고창군 흥덕면 동헌길 146-167

Value | Count | Frequency (%)
전라북도 | 133 | 19.9%
고창군 | 133 | 19.9%
심원면 | 24 | 3.6%
아산면 | 18 | 2.7%
흥덕면 | 18 | 2.7%
선운대로 | 18 | 2.7%
해리면 | 14 | 2.1%
고수면 | 11 | 1.6%
아산농공단지길 | 10 | 1.5%
고창읍 | 9 | 1.3%
Other values (194) | 281 | 42.0%

Most occurring characters

ValueCountFrequency (%)
538
18.0%
168
 
5.6%
143
 
4.8%
139
 
4.6%
134
 
4.5%
133
 
4.4%
133
 
4.4%
133
 
4.4%
124
 
4.1%
1 80
 
2.7%
Other values (132) 1272
42.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 1917
64.0%
Space Separator 538
 
18.0%
Decimal Number 476
 
15.9%
Dash Punctuation 58
 
1.9%
Open Punctuation 4
 
0.1%
Close Punctuation 4
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
168
 
8.8%
143
 
7.5%
139
 
7.3%
134
 
7.0%
133
 
6.9%
133
 
6.9%
133
 
6.9%
124
 
6.5%
76
 
4.0%
53
 
2.8%
Other values (118) 681
35.5%
Decimal Number
ValueCountFrequency (%)
1 80
16.8%
2 65
13.7%
3 63
13.2%
4 62
13.0%
6 45
9.5%
5 36
7.6%
9 33
6.9%
0 32
 
6.7%
8 31
 
6.5%
7 29
 
6.1%
Space Separator
ValueCountFrequency (%)
538
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 58
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 1917
64.0%
Common 1080
36.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
168
 
8.8%
143
 
7.5%
139
 
7.3%
134
 
7.0%
133
 
6.9%
133
 
6.9%
133
 
6.9%
124
 
6.5%
76
 
4.0%
53
 
2.8%
Other values (118) 681
35.5%
Common
ValueCountFrequency (%)
538
49.8%
1 80
 
7.4%
2 65
 
6.0%
3 63
 
5.8%
4 62
 
5.7%
- 58
 
5.4%
6 45
 
4.2%
5 36
 
3.3%
9 33
 
3.1%
0 32
 
3.0%
Other values (4) 68
 
6.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 1917
64.0%
ASCII 1080
36.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
538
49.8%
1 80
 
7.4%
2 65
 
6.0%
3 63
 
5.8%
4 62
 
5.7%
- 58
 
5.4%
6 45
 
4.2%
5 36
 
3.3%
9 33
 
3.1%
0 32
 
3.0%
Other values (4) 68
 
6.3%
Hangul
ValueCountFrequency (%)
168
 
8.8%
143
 
7.5%
139
 
7.3%
134
 
7.0%
133
 
6.9%
133
 
6.9%
133
 
6.9%
124
 
6.5%
76
 
4.0%
53
 
2.8%
Other values (118) 681
35.5%

전화번호
Text

MISSING 

Distinct: 75
Distinct (%): 94.9%
Missing: 54
Missing (%): 40.6%
Memory size: 1.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 948
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 71
Unique (%): 89.9%
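
The distinct and unique percentages for 전화번호 only add up if they are computed over the non-missing values rather than over all 133 rows. A short sketch of that arithmetic, using the counts reported above (the denominator choice is inferred from the numbers, not stated in the report):

```python
n_observations = 133
n_missing = 54
n_distinct = 75
n_unique = 71        # values that occur exactly once
chars_per_value = 12 # every phone number has length 12

n_present = n_observations - n_missing           # rows that have a phone number
missing_pct = 100 * n_missing / n_observations   # share of all rows
distinct_pct = 100 * n_distinct / n_present      # share of non-missing rows
unique_pct = 100 * n_unique / n_present
total_characters = n_present * chars_per_value

print(n_present, round(missing_pct, 1), round(distinct_pct, 1),
      round(unique_pct, 1), total_characters)
```

Dividing by 133 instead would give 56.4% distinct, which does not match the report.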

Sample

1st row: 063-562-6005
2nd row: 063-563-9000
3rd row: 063-563-6770
4th row: 063-561-0488
5th row: 063-563-8322

Value | Count | Frequency (%)
063-563-6770 | 2 | 2.5%
063-564-4747 | 2 | 2.5%
063-561-3106 | 2 | 2.5%
063-561-0008 | 2 | 2.5%
063-561-0014 | 1 | 1.3%
063-560-7582 | 1 | 1.3%
063-561-5771 | 1 | 1.3%
063-903-9906 | 1 | 1.3%
063-563-2730 | 1 | 1.3%
063-561-5156 | 1 | 1.3%
Other values (65) | 65 | 82.3%

Most occurring characters

Value | Count | Frequency (%)
6 | 185 | 19.5%
- | 158 | 16.7%
0 | 153 | 16.1%
3 | 128 | 13.5%
5 | 112 | 11.8%
1 | 49 | 5.2%
2 | 41 | 4.3%
4 | 37 | 3.9%
7 | 35 | 3.7%
8 | 28 | 3.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 790 | 83.3%
Dash Punctuation | 158 | 16.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
6 | 185 | 23.4%
0 | 153 | 19.4%
3 | 128 | 16.2%
5 | 112 | 14.2%
1 | 49 | 6.2%
2 | 41 | 5.2%
4 | 37 | 4.7%
7 | 35 | 4.4%
8 | 28 | 3.5%
9 | 22 | 2.8%
Dash Punctuation
Value | Count | Frequency (%)
- | 158 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 948 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
6 | 185 | 19.5%
- | 158 | 16.7%
0 | 153 | 16.1%
3 | 128 | 13.5%
5 | 112 | 11.8%
1 | 49 | 5.2%
2 | 41 | 4.3%
4 | 37 | 3.9%
7 | 35 | 3.7%
8 | 28 | 3.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 948 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
6 | 185 | 19.5%
- | 158 | 16.7%
0 | 153 | 16.1%
3 | 128 | 13.5%
5 | 112 | 11.8%
1 | 49 | 5.2%
2 | 41 | 4.3%
4 | 37 | 3.9%
7 | 35 | 3.7%
8 | 28 | 3.0%

업종
Categorical

HIGH CORRELATION 

Distinct: 32
Distinct (%): 24.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 18
Median length: 6
Mean length: 6.7744361
Min length: 3

Unique

Unique: 15
Unique (%): 11.3%

Sample

1st row: 곡물 도정업
2nd row: 종합 자동차 수리업
3rd row: 숙박업
4th row: 곡물 저장업
5th row: 레미콘 제조업

Common Values

Value | Count | Frequency (%)
내수면 어업 | 48 | 36.1%
곡물 도정업 | 13 | 9.8%
<NA> | 12 | 9.0%
레미콘 제조업 | 10 | 7.5%
자동차 종합 수리업 | 5 | 3.8%
식품 제조업 | 4 | 3.0%
곡물 제조업 | 3 | 2.3%
폐기물 종합 처리업 | 3 | 2.3%
플라스틱 제품 제조업 | 3 | 2.3%
숙박업 | 3 | 2.3%
Other values (22) | 29 | 21.8%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
내수면 | 48 | 17.4%
어업 | 48 | 17.4%
제조업 | 38 | 13.8%
곡물 | 17 | 6.2%
도정업 | 13 | 4.7%
na | 12 | 4.3%
종합 | 11 | 4.0%
레미콘 | 10 | 3.6%
제품 | 8 | 2.9%
자동차 | 7 | 2.5%
Other values (28) | 64 | 23.2%

대기종별
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.8%

Sample

1st row: 5
2nd row: 5
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value | Count | Frequency (%)
5 | 68 | 51.1%
4 | 58 | 43.6%
3 | 3 | 2.3%
2 | 3 | 2.3%
1 | 1 | 0.8%

Length

Histogram of lengths of the category

인허가
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 신고
2nd row: 신고
3rd row: 신고
4th row: 신고
5th row: 신고

Common Values

Value | Count | Frequency (%)
신고 | 128 | 96.2%
허가 | 5 | 3.8%

Length

Histogram of lengths of the category

Interactions


Correlations

Variable | 연번 | 전화번호 | 업종 | 대기종별 | 인허가
연번 | 1.000 | 0.000 | 0.665 | 0.359 | 0.073
전화번호 | 0.000 | 1.000 | 0.990 | 0.998 | 1.000
업종 | 0.665 | 0.990 | 1.000 | 0.950 | 1.000
대기종별 | 0.359 | 0.998 | 0.950 | 1.000 | 0.754
인허가 | 0.073 | 1.000 | 1.000 | 0.754 | 1.000

Variable | 업종 | 대기종별 | 인허가
업종 | 1.000 | 0.720 | 0.870
대기종별 | 0.720 | 1.000 | 0.878
인허가 | 0.870 | 0.878 | 1.000

Variable | 연번 | 업종 | 대기종별 | 인허가
연번 | 1.000 | 0.268 | 0.150 | 0.049
업종 | 0.268 | 1.000 | 0.720 | 0.870
대기종별 | 0.150 | 0.720 | 1.000 | 0.878
인허가 | 0.049 | 0.870 | 0.878 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 연번 | 인허가일자 | 사업장명 | 주소 (도로명) | 전화번호 | 업종 | 대기종별 | 인허가
0 | 1 | 1972-08-17 | 흥덕도정공장 | 전라북도 고창군 흥덕면 잿말길 40 | 063-562-6005 | 곡물 도정업 | 5 | 신고
1 | 2 | 1988-11-23 | 고창자동차공업사 | 전라북도 고창군 신림면 고인돌대로 2051-7 | 063-563-9000 | 종합 자동차 수리업 | 5 | 신고
2 | 3 | 2007-05-07 | 동백호텔 | 전라북도 고창군 아산면 중촌길 26 | <NA> | 숙박업 | 4 | 신고
3 | 4 | 2001-11-08 | 영농오성미곡종합처리장 | 전라북도 고창군 해리면 동서대로 435 | 063-563-6770 | 곡물 저장업 | 4 | 신고
4 | 5 | 1990-08-11 | 신성레미콘 | 전라북도 고창군 흥덕면 동헌길 146-167 | 063-561-0488 | 레미콘 제조업 | 4 | 신고
5 | 6 | 1991-06-10 | 주식회사 동산유지 | 전라북도 고창군 고수면 고수농공단지길 42 | 063-563-8322 | 식용류 제품 제조업 | 3 | 신고
6 | 7 | 1991-07-13 | 화신양만장 | 전라북도 고창군 아산면 인천강변로 499-8 | 063-564-7855 | 내수면 어업 | 4 | 신고
7 | 8 | 1991-07-13 | 태흥양만장 | 전라북도 고창군 아산면 인천강변로 469 | 063-564-0577 | 내수면 어업 | 4 | 신고
8 | 9 | 1991-07-13 | 풍천양만장 | 전라북도 고창군 심원면 월등길 21-32 | 063-564-5033 | 내수면 어업 | 4 | 신고
9 | 10 | 1991-07-13 | 마산양만장 | 전라북도 고창군 심원면 연곡길 26 | 063-563-1881 | 내수면 어업 | 5 | 신고

Last rows

Index | 연번 | 인허가일자 | 사업장명 | 주소 (도로명) | 전화번호 | 업종 | 대기종별 | 인허가
123 | 124 | 2021-01-15 | (유)은광산업개발 | 전라북도 고창군 부안면 질마재로 271-78 | 063-564-6528 | 비금속광물 채취업 | 3 | 신고
124 | 125 | 2021-02-09 | 고창부안축산업협동조합(경제사업장) | 전라북도 고창군 흥덕면 부안로 423 | 063-560-3030 | 곡물 제조업 | 4 | 신고
125 | 126 | 2021-06-29 | 주식회사 이지탑 | 전라북도 고창군 흥덕면 선운대로 3619-68 | 063-563-0060 | 조립식 패널생산품 제조업 | 5 | 신고
126 | 127 | 2021-09-17 | 고창중앙현대서비스 | 전라북도 고창군 고창읍 보릿골로 106 | 063-561-1911 | 자동차 종합 수리업 | 5 | 신고
127 | 128 | 2021-09-17 | (주)엄지식품 | 전라북도 고창군 부안면 복분자로 434-63 | 063-547-0606 | 식품 제조업 | 5 | 신고
128 | 129 | 2022-01-24 | (주)축복건설 | 전라북도 고창군 성송면 대성로 741 | 063-562-2007 | 비금속광물 채취 가공업 | 5 | 신고
129 | 130 | 2022-02-15 | (유)대성산업 | 전라북도 고창군 부안면 질마재로 181-22 | 063-564-6900 | 비금속광물 채취 가공업 | 4 | 신고
130 | 131 | 2019-01-08 | 고창군 공공 쓰레기 소각시설 | 전라북도 고창군 아산면 계산리 684 | 063-560-2914 | <NA> | 1 | 허가
131 | 132 | 2021-12-33 | 고창군 공공 음식물 쓰레기 처리시설 | 전라북도 고창군 아산면 계산리 684 | 063-560-2879 | <NA> | 4 | 신고
132 | 133 | 2022-03-07 | 고창군 공공 분뇨처리시설 | 전라북도 고창군 공음면 덕암리 산111 | 063-560-2905 | <NA> | 3 | 신고
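
One 인허가일자 value in the last rows, 2021-12-33, is not a valid calendar date. A minimal sketch of how such entries could be flagged before converting the column to a date type, using only the ten values shown above; `is_valid_iso_date` is a hypothetical helper and the check is an illustration, not part of the profiling run:

```python
from datetime import datetime

def is_valid_iso_date(text):
    """Return True if text parses as a real YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

# 인허가일자 values copied from the last rows of the sample.
dates = [
    "2021-01-15", "2021-02-09", "2021-06-29", "2021-09-17", "2021-09-17",
    "2022-01-24", "2022-02-15", "2019-01-08", "2021-12-33", "2022-03-07",
]

invalid = [d for d in dates if not is_valid_iso_date(d)]
print(invalid)
```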