Overview

Dataset statistics

Number of variables: 8
Number of observations: 3930
Missing cells: 630
Missing cells (%): 2.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 253.4 KiB
Average record size in memory: 66.0 B
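
The two derived figures in this block can be rechecked from the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over every cell (variables times observations) and that 1 KiB equals 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Values copied from the Dataset statistics block.
n_variables = 8
n_observations = 3930
missing_cells = 630
total_size_kib = 253.4

# Assumption: the percentage is computed over all cells in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both results agree with the reported values once rounded to one decimal place.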

Variable types

Numeric: 1
Categorical: 5
Text: 2

Dataset

Description: Provides regional greenhouse gas status information by metropolitan and provincial government.
Author: Chungcheongnam-do (충청남도)
URL: https://alldam.chungnam.go.kr/bigdata/collect/view.chungnam?menuCd=DOM_000000201001001000&apiIdx=117

Alerts

행정동코드 has constant value "" [Constant]
소분류 is highly overall correlated with 구분 and 2 other fields [High correlation]
구분 is highly overall correlated with 대분류 and 2 other fields [High correlation]
세분류 is highly overall correlated with 구분 and 2 other fields [High correlation]
대분류 is highly overall correlated with 구분 and 2 other fields [High correlation]
소분류 is highly imbalanced (66.7%) [Imbalance]
세분류 is highly imbalanced (86.2%) [Imbalance]
중분류 has 630 (16.0%) missing values [Missing]
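
The imbalance percentages in these alerts are consistent with a normalized-entropy score, 1 - H / log(k), where H is the Shannon entropy of the category counts and k is the number of distinct categories. The sketch below is an assumption about how such a score can be computed, not a quotation of the profiler's code, and the helper name `imbalance_score` is hypothetical. The counts come from the 소분류 and 세분류 sections of this report; for 소분류 it is further assumed that the ten categories grouped under "Other values (10)" have 30 rows each.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k) for a list of category counts (hypothetical helper)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no balance to measure
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# 세분류: "<NA>" 3750 times plus six categories with 30 rows each.
sebunryu_counts = [3750] + [30] * 6
# 소분류: "<NA>" 3180, "f. 기타" 210, and (assumed) 18 categories with 30 rows each.
sobunryu_counts = [3180, 210] + [30] * 18

print(f"세분류: {imbalance_score(sebunryu_counts):.1%}")
print(f"소분류: {imbalance_score(sobunryu_counts):.1%}")
```

A score near 100% means one category dominates; a score of 0% means all categories are equally frequent.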

Reproduction

Analysis started: 2024-01-09 23:03:08.872650
Analysis finished: 2024-01-09 23:03:09.613281
Duration: 0.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도
Real number (ℝ)

Distinct: 30
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2004.5
Minimum: 1990
Maximum: 2019
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 34.7 KiB

Quantile statistics

Minimum: 1990
5-th percentile: 1991
Q1: 1997
Median: 2004.5
Q3: 2012
95-th percentile: 2018
Maximum: 2019
Range: 29
Interquartile range (IQR): 15
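
These quantiles can be reproduced from the value counts of 연도 alone. The sketch below rebuilds the column under the assumption, supported by the frequency tables in this section, that each of the 30 years appears 131 times, and uses linear interpolation between order statistics (`method="inclusive"` in Python's `statistics` module). Whether the profiler uses the same interpolation rule is an assumption.

```python
import statistics

# Rebuild the 연도 column: 30 years (1990-2019), 131 rows each (3930 rows).
years = [year for year in range(1990, 2020) for _ in range(131)]

# n=20 yields cut points at 5%, 10%, ..., 95%.
# method="inclusive" interpolates linearly between order statistics.
cuts = statistics.quantiles(years, n=20, method="inclusive")
p05, q1, median, q3, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]

print("5-th percentile:", p05)
print("Q1:", q1)
print("Median:", median)
print("Q3:", q3)
print("95-th percentile:", p95)
print("Range:", max(years) - min(years))
print("IQR:", q3 - q1)
```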

Descriptive statistics

Standard deviation: 8.6565429
Coefficient of variation (CV): 0.0043185547
Kurtosis: -1.2026729
Mean: 2004.5
Median Absolute Deviation (MAD): 7.5
Skewness: 0
Sum: 7877685
Variance: 74.935734
Monotonicity: Increasing
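
The reported variance matches the sample variance (denominator n - 1) rather than the population variance. A minimal sketch that recomputes several of these statistics, again assuming each year from 1990 to 2019 appears 131 times:

```python
import statistics

# Rebuild the 연도 column: 30 years (1990-2019), 131 rows each.
years = [year for year in range(1990, 2020) for _ in range(131)]

mean = statistics.fmean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = statistics.stdev(years)          # square root of the sample variance
cv = std / mean                        # coefficient of variation
median = statistics.median(years)
# Median absolute deviation: median of |x - median(x)|.
mad = statistics.median(abs(y - median) for y in years)

print("Sum:", sum(years))
print("Mean:", mean)
print("Variance:", round(variance, 6))
print("Standard deviation:", round(std, 7))
print("CV:", round(cv, 10))
print("MAD:", mad)
```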
[Figure: histogram with fixed size bins (bins=30)]

Common values

Value | Count | Frequency (%)
1990 | 131 | 3.3%
2006 | 131 | 3.3%
2019 | 131 | 3.3%
2018 | 131 | 3.3%
2017 | 131 | 3.3%
2016 | 131 | 3.3%
2015 | 131 | 3.3%
2014 | 131 | 3.3%
2013 | 131 | 3.3%
2012 | 131 | 3.3%
Other values (20) | 2620 | 66.7%

Minimum 10 values

Value | Count | Frequency (%)
1990 | 131 | 3.3%
1991 | 131 | 3.3%
1992 | 131 | 3.3%
1993 | 131 | 3.3%
1994 | 131 | 3.3%
1995 | 131 | 3.3%
1996 | 131 | 3.3%
1997 | 131 | 3.3%
1998 | 131 | 3.3%
1999 | 131 | 3.3%

Maximum 10 values

Value | Count | Frequency (%)
2019 | 131 | 3.3%
2018 | 131 | 3.3%
2017 | 131 | 3.3%
2016 | 131 | 3.3%
2015 | 131 | 3.3%
2014 | 131 | 3.3%
2013 | 131 | 3.3%
2012 | 131 | 3.3%
2011 | 131 | 3.3%
2010 | 131 | 3.3%

행정동코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Value | Count
4400000000 | 3930

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 4400000000
2nd row: 4400000000
3rd row: 4400000000
4th row: 4400000000
5th row: 4400000000

Common Values

Value | Count | Frequency (%)
4400000000 | 3930 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value | Count | Frequency (%)
4400000000 | 3930 | 100.0%

구분
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Value | Count
농업 | 1080
에너지 | 1020
산업공정 | 810
LULUCF | 780
폐기물 | 240

Length

Max length: 6
Median length: 4
Mean length: 3.5267176
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 에너지
2nd row: 에너지
3rd row: 에너지
4th row: 에너지
5th row: 에너지

Common Values

Value | Count | Frequency (%)
농업 | 1080 | 27.5%
에너지 | 1020 | 26.0%
산업공정 | 810 | 20.6%
LULUCF | 780 | 19.8%
폐기물 | 240 | 6.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value | Count | Frequency (%)
농업 | 1080 | 27.5%
에너지 | 1020 | 26.0%
산업공정 | 810 | 20.6%
lulucf | 780 | 19.8%
폐기물 | 240 | 6.1%

대분류
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Value | Count
A. 연료연소 | 870
A. 장내발효 | 330
B. 가축분뇨처리 | 330
F. 할로카본 및 육불화황 소비 | 300
A. 광물산업 | 210
Other values (20) | 1890

Length

Max length: 17
Median length: 10
Mean length: 8.1526718
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A. 연료연소
2nd row: A. 연료연소
3rd row: A. 연료연소
4th row: A. 연료연소
5th row: A. 연료연소

Common Values

Value | Count | Frequency (%)
A. 연료연소 | 870 | 22.1%
A. 장내발효 | 330 | 8.4%
B. 가축분뇨처리 | 330 | 8.4%
F. 할로카본 및 육불화황 소비 | 300 | 7.6%
A. 광물산업 | 210 | 5.3%
B. 농경지 | 210 | 5.3%
F. 작물잔사소각 | 180 | 4.6%
A. 산림지 | 180 | 4.6%
C. 금속산업 | 150 | 3.8%
B. 탈루 | 150 | 3.8%
Other values (15) | 1020 | 26.0%

Length

[Figure: histogram of lengths of the category]
Value | Count | Frequency (%)
a | 1680 | 18.5%
연료연소 | 870 | 9.6%
b | 810 | 8.9%
f | 510 | 5.6%
c | 420 | 4.6%
할로카본 | 390 | 4.3%
(blank) | 390 | 4.3%
육불화황 | 390 | 4.3%
장내발효 | 330 | 3.6%
가축분뇨처리 | 330 | 3.6%
Other values (25) | 2940 | 32.5%

중분류
Text

MISSING 

Distinct: 71
Distinct (%): 2.2%
Missing: 630
Missing (%): 16.0%
Memory size: 30.8 KiB

Length

Max length: 34
Median length: 25
Mean length: 10.827273
Min length: 4

Characters and Unicode

Total characters: 35730
Distinct characters: 152
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
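
As an illustration of these character properties, the sketch below counts characters per Unicode general category using Python's standard `unicodedata` module. This is a stand-in chosen for the example; the library the profiler itself uses for this lookup is not stated in the report. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(text):
    """Count the characters of `text` per Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

sample = "2. 제조업 및 건설업"  # 5th sample row of 중분류
for name, count in category_counts(sample).most_common():
    print(f"{name}: {count}")
```

Hangul syllables fall under "Other Letter", which is why that category dominates the counts for this variable.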

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1. 에너지산업
2nd row: 1. 에너지산업
3rd row: 1. 에너지산업
4th row: 1. 에너지산업
5th row: 2. 제조업 및 건설업

Value | Count | Frequency (%)
2 | 990 | 9.2%
(blank) | 750 | 7.0%
1 | 660 | 6.2%
3 | 510 | 4.8%
4 | 420 | 3.9%
건설업 | 390 | 3.6%
배출 | 390 | 3.6%
제조업 | 390 | 3.6%
기타 | 240 | 2.2%
5 | 240 | 2.2%
Other values (104) | 5730 | 53.5%

Most occurring characters

ValueCountFrequency (%)
7770
21.7%
. 3300
 
9.2%
2 1170
 
3.3%
1110
 
3.1%
960
 
2.7%
750
 
2.1%
1 660
 
1.8%
600
 
1.7%
570
 
1.6%
540
 
1.5%
Other values (142) 18300
51.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 19860 | 55.6%
Space Separator | 7770 | 21.7%
Decimal Number | 3510 | 9.8%
Other Punctuation | 3300 | 9.2%
Uppercase Letter | 540 | 1.5%
Close Punctuation | 270 | 0.8%
Open Punctuation | 270 | 0.8%
Lowercase Letter | 150 | 0.4%
Dash Punctuation | 60 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1110
 
5.6%
960
 
4.8%
750
 
3.8%
600
 
3.0%
570
 
2.9%
540
 
2.7%
510
 
2.6%
480
 
2.4%
450
 
2.3%
450
 
2.3%
Other values (120) 13440
67.7%
Decimal Number
Value | Count | Frequency (%)
2 | 1170 | 33.3%
1 | 660 | 18.8%
3 | 510 | 14.5%
4 | 420 | 12.0%
5 | 240 | 6.8%
6 | 240 | 6.8%
8 | 90 | 2.6%
9 | 90 | 2.6%
7 | 90 | 2.6%
Uppercase Letter
Value | Count | Frequency (%)
O | 210 | 38.9%
C | 120 | 22.2%
N | 90 | 16.7%
S | 60 | 11.1%
D | 30 | 5.6%
F | 30 | 5.6%
Lowercase Letter
Value | Count | Frequency (%)
n | 90 | 60.0%
o | 60 | 40.0%
Space Separator
Value | Count | Frequency (%)
(space) | 7770 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 3300 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 270 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 270 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 60 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 19860 | 55.6%
Common | 15180 | 42.5%
Latin | 690 | 1.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1110
 
5.6%
960
 
4.8%
750
 
3.8%
600
 
3.0%
570
 
2.9%
540
 
2.7%
510
 
2.6%
480
 
2.4%
450
 
2.3%
450
 
2.3%
Other values (120) 13440
67.7%
Common
Value | Count | Frequency (%)
(space) | 7770 | 51.2%
. | 3300 | 21.7%
2 | 1170 | 7.7%
1 | 660 | 4.3%
3 | 510 | 3.4%
4 | 420 | 2.8%
) | 270 | 1.8%
( | 270 | 1.8%
5 | 240 | 1.6%
6 | 240 | 1.6%
Other values (4) | 330 | 2.2%
Latin
Value | Count | Frequency (%)
O | 210 | 30.4%
C | 120 | 17.4%
N | 90 | 13.0%
n | 90 | 13.0%
o | 60 | 8.7%
S | 60 | 8.7%
D | 30 | 4.3%
F | 30 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 19860 | 55.6%
ASCII | 15870 | 44.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 7770 | 49.0%
. | 3300 | 20.8%
2 | 1170 | 7.4%
1 | 660 | 4.2%
3 | 510 | 3.2%
4 | 420 | 2.6%
) | 270 | 1.7%
( | 270 | 1.7%
5 | 240 | 1.5%
6 | 240 | 1.5%
Other values (12) | 1020 | 6.4%
Hangul
ValueCountFrequency (%)
1110
 
5.6%
960
 
4.8%
750
 
3.8%
600
 
3.0%
570
 
2.9%
540
 
2.7%
510
 
2.6%
480
 
2.4%
450
 
2.3%
450
 
2.3%
Other values (120) 13440
67.7%

소분류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 20
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Value | Count
<NA> | 3180
f. 기타 | 210
b. 석유정제 | 30
c. 고체연료 제조 및 기타 에너지 산업 | 30
a. 철강 | 30
Other values (15) | 450

Length

Max length: 22
Median length: 4
Mean length: 4.7175573
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: a. 공공전기 및 열 생산
4th row: b. 석유정제
5th row: c. 고체연료 제조 및 기타 에너지 산업

Common Values

Value | Count | Frequency (%)
<NA> | 3180 | 80.9%
f. 기타 | 210 | 5.3%
b. 석유정제 | 30 | 0.8%
c. 고체연료 제조 및 기타 에너지 산업 | 30 | 0.8%
a. 철강 | 30 | 0.8%
b. 비철금속 | 30 | 0.8%
c. 화학 | 30 | 0.8%
d. 펄프 제지 및 인쇄 | 30 | 0.8%
e. 식음료품 가공 및 담배 제조 | 30 | 0.8%
a. 공공전기 및 열 생산 | 30 | 0.8%
Other values (10) | 300 | 7.6%

Length

[Figure: histogram of lengths of the category]
Value | Count | Frequency (%)
na | 3180 | 62.0%
기타 | 240 | 4.7%
f | 210 | 4.1%
a | 150 | 2.9%
b | 150 | 2.9%
(blank) | 120 | 2.3%
c | 120 | 2.3%
d | 60 | 1.2%
제조 | 60 | 1.2%
e | 60 | 1.2%
Other values (26) | 780 | 15.2%

세분류
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Value | Count
<NA> | 3750
1. 비금속 | 30
2. 조립금속 | 30
3. 나무 및 목재 | 30
4. 건설 | 30
Other values (2) | 60

Length

Max length: 10
Median length: 4
Mean length: 4.1603053
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 3750 | 95.4%
1. 비금속 | 30 | 0.8%
2. 조립금속 | 30 | 0.8%
3. 나무 및 목재 | 30 | 0.8%
4. 건설 | 30 | 0.8%
5. 섬유 및 가죽 | 30 | 0.8%
6. 기타제조 | 30 | 0.8%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]
Value | Count | Frequency (%)
na | 3750 | 88.7%
(blank) | 60 | 1.4%
1 | 30 | 0.7%
비금속 | 30 | 0.7%
2 | 30 | 0.7%
조립금속 | 30 | 0.7%
3 | 30 | 0.7%
나무 | 30 | 0.7%
목재 | 30 | 0.7%
4 | 30 | 0.7%
Other values (6) | 180 | 4.3%


Text

Distinct: 2511
Distinct (%): 63.9%
Missing: 0
Missing (%): 0.0%
Memory size: 30.8 KiB

Length

Max length: 21
Median length: 18
Mean length: 12.71883
Min length: 1

Characters and Unicode

Total characters: 49985
Distinct characters: 16
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2380
Unique (%): 60.6%

Sample

1st row: 17438.106309598257
2nd row: 9995.0418337425635
3rd row: 8054.4654397190452
4th row: 1930.2230866828536
5th row: 10.353307340665255

Value | Count | Frequency (%)
0 | 638 | 16.2%
no | 360 | 9.2%
ne | 150 | 3.8%
neno | 120 | 3.1%
ie | 30 | 0.8%
19.692193991991548 | 2 | 0.1%
29.454208612804049 | 2 | 0.1%
16.542835417560514 | 2 | 0.1%
479.38973489055246 | 2 | 0.1%
95.629489998545097 | 2 | 0.1%
Other values (2501) | 2622 | 66.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 5081 | 10.2%
2 | 4813 | 9.6%
1 | 4708 | 9.4%
9 | 4646 | 9.3%
5 | 4491 | 9.0%
3 | 4478 | 9.0%
4 | 4454 | 8.9%
6 | 4260 | 8.5%
7 | 4260 | 8.5%
8 | 4134 | 8.3%
Other values (6) | 4660 | 9.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 45325 | 90.7%
Other Punctuation | 2632 | 5.3%
Uppercase Letter | 1710 | 3.4%
Dash Punctuation | 318 | 0.6%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5081 | 11.2%
2 | 4813 | 10.6%
1 | 4708 | 10.4%
9 | 4646 | 10.3%
5 | 4491 | 9.9%
3 | 4478 | 9.9%
4 | 4454 | 9.8%
6 | 4260 | 9.4%
7 | 4260 | 9.4%
8 | 4134 | 9.1%
Uppercase Letter
Value | Count | Frequency (%)
N | 750 | 43.9%
O | 480 | 28.1%
E | 450 | 26.3%
I | 30 | 1.8%
Other Punctuation
Value | Count | Frequency (%)
. | 2632 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 318 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 48275 | 96.6%
Latin | 1710 | 3.4%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 5081 | 10.5%
2 | 4813 | 10.0%
1 | 4708 | 9.8%
9 | 4646 | 9.6%
5 | 4491 | 9.3%
3 | 4478 | 9.3%
4 | 4454 | 9.2%
6 | 4260 | 8.8%
7 | 4260 | 8.8%
8 | 4134 | 8.6%
Other values (2) | 2950 | 6.1%
Latin
Value | Count | Frequency (%)
N | 750 | 43.9%
O | 480 | 28.1%
E | 450 | 26.3%
I | 30 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 49985 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 5081 | 10.2%
2 | 4813 | 9.6%
1 | 4708 | 9.4%
9 | 4646 | 9.3%
5 | 4491 | 9.0%
3 | 4478 | 9.0%
4 | 4454 | 8.9%
6 | 4260 | 8.5%
7 | 4260 | 8.5%
8 | 4134 | 8.3%
Other values (6) | 4660 | 9.3%

Interactions

[Figure: interactions plot]

Correlations

 | 연도 | 구분 | 대분류 | 중분류 | 소분류 | 세분류
연도 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
구분 | 0.000 | 1.000 | 1.000 | 1.000 | NaN | NaN
대분류 | 0.000 | 1.000 | 1.000 | 0.996 | 1.000 | NaN
중분류 | 0.000 | 1.000 | 0.996 | 1.000 | 1.000 | NaN
소분류 | 0.000 | NaN | 1.000 | 1.000 | 1.000 | NaN
세분류 | 0.000 | NaN | NaN | NaN | NaN | 1.000

 | 소분류 | 구분 | 세분류 | 대분류
소분류 | 1.000 | 1.000 | 1.000 | 0.989
구분 | 1.000 | 1.000 | 1.000 | 0.997
세분류 | 1.000 | 1.000 | 1.000 | 1.000
대분류 | 0.989 | 0.997 | 1.000 | 1.000

 | 연도 | 구분 | 대분류 | 소분류 | 세분류
연도 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000
구분 | 0.000 | 1.000 | 0.997 | 1.000 | 1.000
대분류 | 0.000 | 0.997 | 1.000 | 0.989 | 1.000
소분류 | 0.000 | 1.000 | 0.989 | 1.000 | 1.000
세분류 | 0.000 | 1.000 | 1.000 | 1.000 | 1.000
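
Values at or near 1.000 between the category columns are what one would expect from a nested classification, where each finer level determines its parent. The sketch below computes a plain (uncorrected) Cramér's V for pairs of categorical values to illustrate this. It is an assumption that a measure of this family underlies the tables above; the report does not state which measure or bias correction is used for each pair, and the helper name `cramers_v` is hypothetical.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    # chi2 = n * (sum over non-empty cells of O^2 / (R * C) - 1)
    chi2 = n * (sum(o * o / (row_totals[x] * col_totals[y])
                    for (x, y), o in joint.items()) - 1)
    dof = min(len(row_totals), len(col_totals)) - 1
    if dof == 0:
        return float("nan")  # undefined when one variable is constant
    return math.sqrt(max(chi2, 0.0) / (n * dof))

# Toy (구분, 대분류) pairs: every 대분류 belongs to exactly one 구분.
nested = ([("에너지", "A. 연료연소")] * 3
          + [("폐기물", "A. 폐기물매립")] * 2
          + [("폐기물", "B. 하폐수처리")] * 2)
# Toy pairs with no association at all.
independent = [("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x2", "y2")]

print("nested:", cramers_v(nested))
print("independent:", cramers_v(independent))
```

The nested pairs give a value of 1 and the independent pairs give 0, the two extremes of the scale.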

Missing values

[Figure: nullity count by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
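
In this report 중분류 has 630 missing cells, while 소분류 and 세분류 report no missing cells even though "<NA>" is their most common value. That pattern suggests the placeholder is stored as a literal string in those two columns, although the report does not say so. The sketch below counts nullity per column on toy rows modeled on the Sample section; `nullity_by_column` is a hypothetical helper, and `None` stands for a truly missing cell.

```python
# Toy rows modeled on the Sample section. None marks a truly missing cell;
# the string "<NA>" is an ordinary value and is not counted as missing.
rows = [
    {"중분류": None, "소분류": "<NA>"},
    {"중분류": "1. 에너지산업", "소분류": "<NA>"},
    {"중분류": "1. 에너지산업", "소분류": "a. 공공전기 및 열 생산"},
    {"중분류": "1. 에너지산업", "소분류": "b. 석유정제"},
]

def nullity_by_column(rows):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows."""
    result = {}
    for col in rows[0].keys():
        missing = sum(1 for row in rows if row[col] is None)
        result[col] = (missing, 100 * missing / len(rows))
    return result

for col, (count, pct) in nullity_by_column(rows).items():
    print(f"{col}: {count} missing ({pct:.1f}%)")
```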

Sample

(index) | 연도 | 행정동코드 | 구분 | 대분류 | 중분류 | 소분류 | 세분류 | (unnamed)
0 | 1990 | 4400000000 | 에너지 | A. 연료연소 | <NA> | <NA> | <NA> | 17438.106309598257
1 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 1. 에너지산업 | <NA> | <NA> | 9995.0418337425635
2 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 1. 에너지산업 | a. 공공전기 및 열 생산 | <NA> | 8054.4654397190452
3 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 1. 에너지산업 | b. 석유정제 | <NA> | 1930.2230866828536
4 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 1. 에너지산업 | c. 고체연료 제조 및 기타 에너지 산업 | <NA> | 10.353307340665255
5 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 2. 제조업 및 건설업 | <NA> | <NA> | 2875.1783477345034
6 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 2. 제조업 및 건설업 | a. 철강 | <NA> | 56.207574331033513
7 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 2. 제조업 및 건설업 | b. 비철금속 | <NA> | 4.2330096166202313
8 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 2. 제조업 및 건설업 | c. 화학 | <NA> | 1918.094578008511
9 | 1990 | 4400000000 | 에너지 | A. 연료연소 | 2. 제조업 및 건설업 | d. 펄프 제지 및 인쇄 | <NA> | 253.20576085781644

(index) | 연도 | 행정동코드 | 구분 | 대분류 | 중분류 | 소분류 | 세분류 | (unnamed)
3920 | 2019 | 4400000000 | LULUCF | F. 기타토지 | <NA> | <NA> | <NA> | NO
3921 | 2019 | 4400000000 | LULUCF | G. 기타 | <NA> | <NA> | <NA> | 21.555957557560678
3922 | 2019 | 4400000000 | 폐기물 | A. 폐기물매립 | <NA> | <NA> | <NA> | 1060.5539005022886
3923 | 2019 | 4400000000 | 폐기물 | A. 폐기물매립 | 1. 관리형 매립 | <NA> | <NA> | 896.60023502198555
3924 | 2019 | 4400000000 | 폐기물 | A. 폐기물매립 | 2. 비관리형 매립 | <NA> | <NA> | 163.95366548030296
3925 | 2019 | 4400000000 | 폐기물 | B. 하폐수처리 | 2. 비관리형 매립 | <NA> | <NA> | 85.491041465426747
3926 | 2019 | 4400000000 | 폐기물 | B. 하폐수처리 | 1. 폐수처리 | <NA> | <NA> | 8.8873915737001088
3927 | 2019 | 4400000000 | 폐기물 | B. 하폐수처리 | 2. 하수처리 | <NA> | <NA> | 76.603649891726633
3928 | 2019 | 4400000000 | 폐기물 | C. 폐기물소각 | <NA> | <NA> | <NA> | 623.20945362000009
3929 | 2019 | 4400000000 | 폐기물 | D. 기타 | <NA> | <NA> | <NA> | 77.933173848883484