Overview

Dataset statistics

Number of variables: 5
Number of observations: 687
Missing cells: 4
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.6 KiB
Average record size in memory: 41.2 B
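
The percentage and per-record figures in this table follow from simple arithmetic on the other entries. A minimal sketch of that arithmetic, assuming the memory total is the already-rounded 27.6 KiB shown here (so the per-record result is only approximate and can differ from the reported 41.2 B in the last digit):

```python
# Figures copied from the dataset statistics table.
n_variables = 5
n_observations = 687
missing_cells = 4
total_memory_kib = 27.6  # rounded value as printed in the report

# Every variable has one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Approximate: the input total is already rounded to 0.1 KiB.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: about {avg_record_bytes:.1f} B")
```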

Variable types

Numeric: 1
Text: 3
Categorical: 1

Dataset

Description: Data on air-pollutant-emitting business sites (Class 1 to Class 5) within Cheongju City, covering each site's business name, location, industry type, and class.
URL: https://www.data.go.kr/data/15080689/fileData.do

Alerts

종별 is highly imbalanced (59.2%) [Imbalance]
연 번 has unique values [Unique]
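
The 59.2% imbalance figure is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution divided by the maximum entropy possible for that number of classes. The sketch below assumes that definition; it is inferred from the numbers rather than stated anywhere in this report. The function name `imbalance_score` is illustrative, and the counts are the eight 종별 category counts listed in the Common Values table.

```python
import math

# Category counts for 종별, copied from the report's Common Values table.
counts = [453, 202, 11, 10, 5, 3, 2, 1]


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    (Assumed definition, chosen because it matches the reported value.)
    """
    if len(counts) < 2:
        return 0.0  # a single class has no meaningful imbalance
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    return 1 - entropy / max_entropy


score = imbalance_score(counts)
print(f"{score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, so a value near 0.6 reflects how strongly the two largest classes dominate the other six.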

Reproduction

Analysis started: 2023-12-12 12:49:14.796416
Analysis finished: 2023-12-12 12:49:15.720170
Duration: 0.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
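
The reported duration is just the difference between the two timestamps. A short check using only the standard library, with the timestamp strings copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2023-12-12 12:49:14.796416")
finished = datetime.fromisoformat("2023-12-12 12:49:15.720170")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```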

Variables

연 번 (serial number)
Real number (ℝ)

UNIQUE 

Distinct: 687
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 344
Minimum: 1
Maximum: 687
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 35.3
Q1: 172.5
Median: 344
Q3: 515.5
95th percentile: 652.7
Maximum: 687
Range: 686
Interquartile range (IQR): 343

Descriptive statistics

Standard deviation: 198.4641
Coefficient of variation (CV): 0.57693053
Kurtosis: -1.2
Mean: 344
Median Absolute Deviation (MAD): 172
Skewness: 0
Sum: 236328
Variance: 39388
Monotonicity: Strictly increasing
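
Because this variable is the strictly increasing sequence 1 to 687, the statistics above can be checked directly. The sketch below uses only the standard library and assumes two conventions that are consistent with the reported values but not stated in the report: sample variance (denominator n - 1) and quantiles by linear interpolation. The helper `quantile` is illustrative.

```python
import statistics

# The variable takes each integer from 1 to 687 exactly once.
values = [float(i) for i in range(1, 688)]


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])
iqr = quantile(values, 0.75) - quantile(values, 0.25)

print(f"sum={total:.0f} mean={mean:.0f} variance={variance:.0f}")
print(f"std={std:.4f} cv={cv:.8f} mad={mad:.0f} iqr={iqr:.0f}")
print(f"p5={quantile(values, 0.05):.1f} p95={quantile(values, 0.95):.1f}")
```

For an integer sequence 1..n the sample variance has the closed form n(n + 1)/12, which gives 687 × 688 / 12 = 39388, matching the report.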
Histogram with fixed size bins (bins=50)
Most frequent values
Value | Count | Frequency (%)
1 | 1 | 0.1%
463 | 1 | 0.1%
455 | 1 | 0.1%
456 | 1 | 0.1%
457 | 1 | 0.1%
458 | 1 | 0.1%
459 | 1 | 0.1%
460 | 1 | 0.1%
461 | 1 | 0.1%
462 | 1 | 0.1%
Other values (677) | 677 | 98.5%

Smallest 10 values
Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Largest 10 values
Value | Count | Frequency (%)
687 | 1 | 0.1%
686 | 1 | 0.1%
685 | 1 | 0.1%
684 | 1 | 0.1%
683 | 1 | 0.1%
682 | 1 | 0.1%
681 | 1 | 0.1%
680 | 1 | 0.1%
679 | 1 | 0.1%
678 | 1 | 0.1%

업체명 (business name)
Text

Distinct: 677
Distinct (%): 98.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Length

Max length: 28
Median length: 21
Mean length: 8.3391557
Min length: 2

Characters and Unicode

Total characters: 5729
Distinct characters: 408
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
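
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using only the five sample values shown for this variable. It is not the profiler's own implementation. The short codes map to the report's names as follows: `Lo` is Other Letter, `So` is Other Symbol, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
names = ["㈜유니온", "삼화제지㈜", "깨끗한나라㈜", "대한제지㈜", "나투라페이퍼(주)"]

# Tally the Unicode general category of every character in every name.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

for category, count in category_counts.most_common():
    print(category, count)
```

The single-code-point symbol ㈜ falls under Other Symbol, whereas the three-character spelling (주) contributes one Open Punctuation, one Other Letter, and one Close Punctuation, which is why both spellings of the same company suffix show up in different rows of the tables that follow.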

Unique

Unique: 668
Unique (%): 97.2%

Sample

1st row: ㈜유니온
2nd row: 삼화제지㈜
3rd row: 깨끗한나라㈜
4th row: 대한제지㈜
5th row: 나투라페이퍼(주)
Value | Count | Frequency (%)
주식회사 | 43 | 5.4%
농업회사법인 | 7 | 0.9%
청주공장 | 6 | 0.8%
청주농산(주 | 3 | 0.4%
청주지점 | 3 | 0.4%
우리도시산업(주 | 2 | 0.3%
광복영농조합법인 | 2 | 0.3%
청주시시설관리공단 | 2 | 0.3%
주)창우rs | 2 | 0.3%
주)동원레미콘 | 2 | 0.3%
Other values (708) | 721 | 90.9%

Most occurring characters

ValueCountFrequency (%)
501
 
8.7%
) 379
 
6.6%
( 379
 
6.6%
147
 
2.6%
145
 
2.5%
133
 
2.3%
113
 
2.0%
107
 
1.9%
106
 
1.9%
104
 
1.8%
Other values (398) 3615
63.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4748 | 82.9%
Close Punctuation | 379 | 6.6%
Open Punctuation | 379 | 6.6%
Space Separator | 107 | 1.9%
Decimal Number | 50 | 0.9%
Uppercase Letter | 33 | 0.6%
Other Symbol | 29 | 0.5%
Other Punctuation | 2 | < 0.1%
Dash Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
501
 
10.6%
147
 
3.1%
145
 
3.1%
133
 
2.8%
113
 
2.4%
106
 
2.2%
104
 
2.2%
80
 
1.7%
79
 
1.7%
75
 
1.6%
Other values (369) 3265
68.8%
Uppercase Letter
ValueCountFrequency (%)
S 6
18.2%
C 5
15.2%
R 4
12.1%
T 2
 
6.1%
G 2
 
6.1%
A 2
 
6.1%
I 2
 
6.1%
O 1
 
3.0%
B 1
 
3.0%
N 1
 
3.0%
Other values (7) 7
21.2%
Decimal Number
ValueCountFrequency (%)
1 26
52.0%
2 17
34.0%
0 2
 
4.0%
8 2
 
4.0%
7 2
 
4.0%
3 1
 
2.0%
Close Punctuation
ValueCountFrequency (%)
) 379
100.0%
Open Punctuation
ValueCountFrequency (%)
( 379
100.0%
Space Separator
ValueCountFrequency (%)
107
100.0%
Other Symbol
ValueCountFrequency (%)
29
100.0%
Other Punctuation
ValueCountFrequency (%)
. 2
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4777 | 83.4%
Common | 919 | 16.0%
Latin | 33 | 0.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
501
 
10.5%
147
 
3.1%
145
 
3.0%
133
 
2.8%
113
 
2.4%
106
 
2.2%
104
 
2.2%
80
 
1.7%
79
 
1.7%
75
 
1.6%
Other values (370) 3294
69.0%
Latin
ValueCountFrequency (%)
S 6
18.2%
C 5
15.2%
R 4
12.1%
T 2
 
6.1%
G 2
 
6.1%
A 2
 
6.1%
I 2
 
6.1%
O 1
 
3.0%
B 1
 
3.0%
N 1
 
3.0%
Other values (7) 7
21.2%
Common
ValueCountFrequency (%)
) 379
41.2%
( 379
41.2%
107
 
11.6%
1 26
 
2.8%
2 17
 
1.8%
. 2
 
0.2%
0 2
 
0.2%
8 2
 
0.2%
7 2
 
0.2%
- 2
 
0.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4748 | 82.9%
ASCII | 952 | 16.6%
None | 29 | 0.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
501
 
10.6%
147
 
3.1%
145
 
3.1%
133
 
2.8%
113
 
2.4%
106
 
2.2%
104
 
2.2%
80
 
1.7%
79
 
1.7%
75
 
1.6%
Other values (369) 3265
68.8%
ASCII
ValueCountFrequency (%)
) 379
39.8%
( 379
39.8%
107
 
11.2%
1 26
 
2.7%
2 17
 
1.8%
S 6
 
0.6%
C 5
 
0.5%
R 4
 
0.4%
. 2
 
0.2%
0 2
 
0.2%
Other values (18) 25
 
2.6%
None
ValueCountFrequency (%)
29
100.0%

소재지 (location)
Text

Distinct: 672
Distinct (%): 98.0%
Missing: 1
Missing (%): 0.1%
Memory size: 5.5 KiB

Length

Max length: 56
Median length: 45
Mean length: 25.91691
Min length: 16

Characters and Unicode

Total characters: 17779
Distinct characters: 240
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 659
Unique (%): 96.1%

Sample

1st row: 충청북도 청주시 상당구 가덕면 상장인차로 27
2nd row: 충청북도 청주시 흥덕구 옥산면 오산가좌로 415-14
3rd row: 충청북도 청주시 흥덕구 강내면 태성1길 64
4th row: 충청북도 청주시 흥덕구 오송읍 상정쌍청로 256
5th row: 충청북도 청주시 흥덕구 오송읍 상정쌍청로 171
Value | Count | Frequency (%)
청주시 | 686 | 17.2%
충청북도 | 544 | 13.6%
청원구 | 262 | 6.6%
흥덕구 | 217 | 5.4%
서원구 | 141 | 3.5%
북이면 | 106 | 2.7%
오창읍 | 88 | 2.2%
남이면 | 72 | 1.8%
강내면 | 70 | 1.8%
상당구 | 65 | 1.6%
Other values (953) | 1737 | 43.6%

Most occurring characters

ValueCountFrequency (%)
3923
22.1%
1533
 
8.6%
710
 
4.0%
693
 
3.9%
692
 
3.9%
658
 
3.7%
585
 
3.3%
560
 
3.1%
1 502
 
2.8%
- 449
 
2.5%
Other values (230) 7474
42.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 10722 | 60.3%
Space Separator | 3923 | 22.1%
Decimal Number | 2617 | 14.7%
Dash Punctuation | 449 | 2.5%
Close Punctuation | 23 | 0.1%
Open Punctuation | 23 | 0.1%
Other Punctuation | 15 | 0.1%
Uppercase Letter | 7 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1533
 
14.3%
710
 
6.6%
693
 
6.5%
692
 
6.5%
658
 
6.1%
585
 
5.5%
560
 
5.2%
432
 
4.0%
390
 
3.6%
327
 
3.0%
Other values (206) 4142
38.6%
Decimal Number
ValueCountFrequency (%)
1 502
19.2%
2 424
16.2%
3 331
12.6%
4 235
9.0%
6 218
8.3%
8 191
 
7.3%
5 188
 
7.2%
0 185
 
7.1%
7 183
 
7.0%
9 160
 
6.1%
Uppercase Letter
ValueCountFrequency (%)
D 2
28.6%
N 1
14.3%
B 1
14.3%
C 1
14.3%
K 1
14.3%
S 1
14.3%
Close Punctuation
ValueCountFrequency (%)
) 21
91.3%
] 2
 
8.7%
Open Punctuation
ValueCountFrequency (%)
( 21
91.3%
[ 2
 
8.7%
Other Punctuation
ValueCountFrequency (%)
, 12
80.0%
. 3
 
20.0%
Space Separator
ValueCountFrequency (%)
3923
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 449
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 10722 | 60.3%
Common | 7050 | 39.7%
Latin | 7 | < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1533
 
14.3%
710
 
6.6%
693
 
6.5%
692
 
6.5%
658
 
6.1%
585
 
5.5%
560
 
5.2%
432
 
4.0%
390
 
3.6%
327
 
3.0%
Other values (206) 4142
38.6%
Common
ValueCountFrequency (%)
3923
55.6%
1 502
 
7.1%
- 449
 
6.4%
2 424
 
6.0%
3 331
 
4.7%
4 235
 
3.3%
6 218
 
3.1%
8 191
 
2.7%
5 188
 
2.7%
0 185
 
2.6%
Other values (8) 404
 
5.7%
Latin
ValueCountFrequency (%)
D 2
28.6%
N 1
14.3%
B 1
14.3%
C 1
14.3%
K 1
14.3%
S 1
14.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 10722 | 60.3%
ASCII | 7057 | 39.7%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3923
55.6%
1 502
 
7.1%
- 449
 
6.4%
2 424
 
6.0%
3 331
 
4.7%
4 235
 
3.3%
6 218
 
3.1%
8 191
 
2.7%
5 188
 
2.7%
0 185
 
2.6%
Other values (14) 411
 
5.8%
Hangul
ValueCountFrequency (%)
1533
 
14.3%
710
 
6.6%
693
 
6.5%
692
 
6.5%
658
 
6.1%
585
 
5.5%
560
 
5.2%
432
 
4.0%
390
 
3.6%
327
 
3.0%
Other values (206) 4142
38.6%

업 종 (industry type)
Text

Distinct: 319
Distinct (%): 46.6%
Missing: 3
Missing (%): 0.4%
Memory size: 5.5 KiB

Length

Max length: 45
Median length: 38
Mean length: 11.983918
Min length: 1

Characters and Unicode

Total characters: 8197
Distinct characters: 248
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 233
Unique (%): 34.1%

Sample

1st row: 석회 및 플라스터 제조업
2nd row: 종이제품제조
3rd row: 종이제품제조
4th row: 종이제품제조
5th row: 종이제품제조
Value | Count | Frequency (%)
제조업 | 223 | 10.6%
(value not preserved) | 200 | 9.5%
자동차 | 103 | 4.9%
수리업 | 95 | 4.5%
기타 | 93 | 4.4%
종합 | 58 | 2.8%
플라스틱 | 46 | 2.2%
처리업 | 42 | 2.0%
폐기물 | 33 | 1.6%
가스상물질 | 29 | 1.4%
Other values (448) | 1184 | 56.2%

Most occurring characters

ValueCountFrequency (%)
1499
 
18.3%
603
 
7.4%
449
 
5.5%
346
 
4.2%
227
 
2.8%
226
 
2.8%
213
 
2.6%
204
 
2.5%
167
 
2.0%
152
 
1.9%
Other values (238) 4111
50.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6652 | 81.2%
Space Separator | 1499 | 18.3%
Other Punctuation | 39 | 0.5%
Decimal Number | 3 | < 0.1%
Open Punctuation | 2 | < 0.1%
Close Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
603
 
9.1%
449
 
6.7%
346
 
5.2%
227
 
3.4%
226
 
3.4%
213
 
3.2%
204
 
3.1%
167
 
2.5%
152
 
2.3%
146
 
2.2%
Other values (231) 3919
58.9%
Other Punctuation
ValueCountFrequency (%)
, 35
89.7%
· 3
 
7.7%
? 1
 
2.6%
Space Separator
ValueCountFrequency (%)
1499
100.0%
Decimal Number
ValueCountFrequency (%)
1 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6652 | 81.2%
Common | 1545 | 18.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
603
 
9.1%
449
 
6.7%
346
 
5.2%
227
 
3.4%
226
 
3.4%
213
 
3.2%
204
 
3.1%
167
 
2.5%
152
 
2.3%
146
 
2.2%
Other values (231) 3919
58.9%
Common
ValueCountFrequency (%)
1499
97.0%
, 35
 
2.3%
1 3
 
0.2%
· 3
 
0.2%
( 2
 
0.1%
) 2
 
0.1%
? 1
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6637 | 81.0%
ASCII | 1542 | 18.8%
Compat Jamo | 15 | 0.2%
None | 3 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1499
97.2%
, 35
 
2.3%
1 3
 
0.2%
( 2
 
0.1%
) 2
 
0.1%
? 1
 
0.1%
Hangul
ValueCountFrequency (%)
603
 
9.1%
449
 
6.8%
346
 
5.2%
227
 
3.4%
226
 
3.4%
213
 
3.2%
204
 
3.1%
167
 
2.5%
152
 
2.3%
146
 
2.2%
Other values (230) 3904
58.8%
Compat Jamo
ValueCountFrequency (%)
15
100.0%
None
ValueCountFrequency (%)
· 3
100.0%

종별 (class)
Categorical

IMBALANCE 

Distinct: 8
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB
5종: 453
4종: 202
2종: 11
3종: 10
1종: 5
Other values (3): 6

Length

Max length: 5
Median length: 2
Mean length: 2.0262009
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 1종
2nd row: 3종
3rd row: 1종
4th row: 1종
5th row: 1종

Common Values

Value | Count | Frequency (%)
5종 | 453 | 65.9%
4종 | 202 | 29.4%
2종 | 11 | 1.6%
3종 | 10 | 1.5%
1종 | 5 | 0.7%
5종(허) | 3 | 0.4%
4종(허) | 2 | 0.3%
3종(허) | 1 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Bar chart of category frequencies; the plotted counts are the same as in the Common Values table.]

Interactions


Correlations

Variable | 연 번 | 종별
연 번 | 1.000 | 0.209
종별 | 0.209 | 1.000

Variable | 연 번 | 종별
연 번 | 1.000 | 0.101
종별 | 0.101 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 연 번 | 업체명 | 소재지 | 업 종 | 종별
0 | 1 | ㈜유니온 | 충청북도 청주시 상당구 가덕면 상장인차로 27 | 석회 및 플라스터 제조업 | 1종
1 | 2 | 삼화제지㈜ | 충청북도 청주시 흥덕구 옥산면 오산가좌로 415-14 | 종이제품제조 | 3종
2 | 3 | 깨끗한나라㈜ | 충청북도 청주시 흥덕구 강내면 태성1길 64 | 종이제품제조 | 1종
3 | 4 | 대한제지㈜ | 충청북도 청주시 흥덕구 오송읍 상정쌍청로 256 | 종이제품제조 | 1종
4 | 5 | 나투라페이퍼(주) | 충청북도 청주시 흥덕구 오송읍 상정쌍청로 171 | 종이제품제조 | 1종
5 | 6 | ㈜팜스토리한냉 | 충청북도 청주시 청원구 오창읍 성재2길 21 | 육지동물가공처리 | 4종
6 | 7 | 대진산업㈜ | 충청북도 청주시 흥덕구 강내면 황탄리길 169 | 도장및피막처리업 | 2종
7 | 8 | ㈜금진 | 충청북도 청주시 흥덕구 옥산면 환희길 337 | 벽지 및 장판제조업 | 2종
8 | 9 | 영보화학㈜ | 충청북도 청주시 흥덕구 강내면 서부로 230-23 | 플라스틱 발포 성형제품제조업 | 2종
9 | 10 | (주)청주석회 | 충청북도 청주시 상당구 가덕면 금거리 281-7 | 비금속광물 분쇄물 생산업 | 4종

Index | 연 번 | 업체명 | 소재지 | 업 종 | 종별
677 | 678 | (주)유라코퍼레이션 | 충청북도 청주시 흥덕구 오송읍 연제리 388-10 | 자동차부품 제조업 | 5종
678 | 679 | 롯데쇼핑(주) 아울렛청주점 | 충청북도 청주시 흥덕구 비하동 811 롯데아울렛 청주점 | 기타 대형 종합 소매업 | 4종
679 | 680 | (주)유한 | 충청북도 청주시 흥덕구 옥산면 수락리 356-6 | 포장용 플라스틱 성형용기 제조업 | 5종
680 | 681 | (주)똥광미곡처리장 | 충청북도 청주시 흥덕구 강내면 월탄리 222 | 곡물 도정업 | 4종
681 | 682 | (주)유니켐텍 | 충청북도 청주시 흥덕구 옥산면 환희리 27-25 27-26 | 플라스틱제품 제조업 | 5종
682 | 683 | 동아식품(주) | 충청북도 청주시 흥덕구 송절동 89 동아식품(주) | 육류 가공식품 도매업 | 5종
683 | 684 | 농협은행 주식회사(지웰시티몰2) | 충청북도 청주시 흥덕구 복대동 3381 | 금융업 | 5종
684 | 685 | 주식회사 아이케이엠앤에스 | 충청북도 청주시 흥덕구 옥산면 환희리 산 87 산90-2 | 비금속광물 분쇄물 생산업 | 4종
685 | 686 | (주)서룡개발 | 충청북도 청주시 흥덕구 옥산면 동림리 247-7 산86 산84-4 | 비금속광물 분쇄물 생산업 | 4종
686 | 687 | (주)창우RS | 충청북도 청주시 흥덕구 옥산면 국사리 182-2 | 폐기물 처리업 | 5종