Overview

Dataset statistics

Number of variables: 5
Number of observations: 692
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 28.5 KiB
Average record size in memory: 42.2 B

Variable types

Numeric: 2
Categorical: 2
Text: 1

Dataset

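Description: Pre-use inspection data for electrical energy storage systems covering 2017 through 2022, provided by the Korea Electrical Safety Corporation (한국전기안전공사). The data lists year, work category, facility category, prime mover type, capacity, and quantity.
URL: https://www.data.go.kr/data/15103231/fileData.do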

Alerts

발전기종류 has constant value "전기저장장치" (Constant)
Dataset has 1 (0.1%) duplicate rows (Duplicates)
용도 is highly imbalanced (97.1%) (Imbalance)
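
The 97.1% imbalance figure for 용도 can be reproduced from the two class counts the report gives for that variable (690 and 2). The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by its maximum possible value, log2 of the number of classes. That definition is an assumption, not something the report states, and the function name `imbalance_score` is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (assumed definition)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # a single class leaves nothing to compare against
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)

# Class counts for 용도 from the report: 자가용(발전) = 690, 사업용(발전) = 2
score = imbalance_score([690, 2])
print(f"{score:.1%}")  # prints 97.1%
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as a single class dominates.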

Reproduction

Analysis started: 2023-12-12 08:40:41.131120
Analysis finished: 2023-12-12 08:40:41.989469
Duration: 0.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연도 (year)
Real number (ℝ)

Distinct: 6
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2018.9509
Minimum: 2017
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 2017
5-th percentile: 2017
Q1: 2018
Median: 2019
Q3: 2020
95-th percentile: 2021
Maximum: 2022
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.504614
Coefficient of variation (CV): 0.00074524546
Kurtosis: -1.1575613
Mean: 2018.9509
Median Absolute Deviation (MAD): 1
Skewness: 0.18861505
Sum: 1397114
Variance: 2.2638632
Monotonicity: Increasing
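
Because 연도 takes only six distinct values, the descriptive statistics above can be recomputed from the report's own value counts. The sketch below does this with the standard library. Dividing by n - 1 (sample variance) is an assumption, chosen because it reproduces the reported variance, and the variable names are illustrative.

```python
import math

# Value counts for 연도 as listed in the report's frequency table
year_counts = {2017: 161, 2018: 137, 2019: 127, 2020: 131, 2021: 114, 2022: 22}

n = sum(year_counts.values())                        # number of observations
total = sum(y * c for y, c in year_counts.items())   # the "Sum" statistic
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1
variance = sum(c * (y - mean) ** 2 for y, c in year_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

print(n, total)  # prints 692 1397114
print(round(mean, 4), round(variance, 4), round(std, 4))
```

The coefficient of variation is tiny here only because calendar years sit far from zero, so it says little about the spread of the data.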
Histogram with fixed size bins (bins=6)

Common values

Value  Count  Frequency (%)
2017   161    23.3%
2018   137    19.8%
2020   131    18.9%
2019   127    18.4%
2021   114    16.5%
2022   22     3.2%

Extreme values (minimum)

Value  Count  Frequency (%)
2017   161    23.3%
2018   137    19.8%
2019   127    18.4%
2020   131    18.9%
2021   114    16.5%
2022   22     3.2%

Extreme values (maximum)

Value  Count  Frequency (%)
2022   22     3.2%
2021   114    16.5%
2020   131    18.9%
2019   127    18.4%
2018   137    19.8%
2017   161    23.3%

용도 (use)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB
자가용(발전): 690
사업용(발전): 2

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 자가용(발전)
2nd row: 자가용(발전)
3rd row: 자가용(발전)
4th row: 자가용(발전)
5th row: 자가용(발전)

Common Values

Value         Count  Frequency (%)
자가용(발전)   690    99.7%
사업용(발전)   2      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
자가용(발전)   690    99.7%
사업용(발전)   2      0.3%

발전기종류 (generator type)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB
전기저장장치: 692

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전기저장장치
2nd row: 전기저장장치
3rd row: 전기저장장치
4th row: 전기저장장치
5th row: 전기저장장치

Common Values

Value         Count  Frequency (%)
전기저장장치   692    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count  Frequency (%)
전기저장장치   692    100.0%

발전기용량 (generator capacity)
Real number (ℝ)

Distinct: 351
Distinct (%): 50.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 83962.358
Minimum: 4
Maximum: 1080000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 4
5-th percentile: 40
Q1: 400
Median: 3000
Q3: 50250
95-th percentile: 481690
Maximum: 1080000
Range: 1079996
Interquartile range (IQR): 49850

Descriptive statistics

Standard deviation: 202039.35
Coefficient of variation (CV): 2.4063086
Kurtosis: 12.338051
Mean: 83962.358
Median Absolute Deviation (MAD): 2950
Skewness: 3.469624
Sum: 58101952
Variance: 4.0819898 × 10^10
Monotonicity: Not monotonic
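
Several of the 발전기용량 statistics are derived from others, so they can be cross-checked with simple arithmetic. The sketch below takes its inputs from the report's own figures and recomputes the mean, coefficient of variation, variance, range, and IQR. The variable names are illustrative.

```python
# Inputs copied from the report for 발전기용량
n = 692                          # number of observations
total = 58101952                 # Sum
std = 202039.35                  # Standard deviation
q1, q3 = 400, 50250              # Q1 and Q3
minimum, maximum = 4, 1080000    # Minimum and Maximum

mean = total / n                 # should agree with the reported Mean
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # Range
iqr = q3 - q1                    # interquartile range

print(value_range, iqr)  # prints 1079996 49850
print(round(mean, 3), round(cv, 4), f"{variance:.4e}")
```

A coefficient of variation above 2, together with a median (3000) far below the mean, is consistent with the strong right skew the report shows for this variable.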
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
250.0               7      1.0%
50.0                7      1.0%
2000.0              6      0.9%
500.0               6      0.9%
99.0                6      0.9%
80.0                6      0.9%
75.0                6      0.9%
150.0               6      0.9%
100.0               6      0.9%
1000.0              6      0.9%
Other values (341)  630    91.0%

Extreme values (minimum)

Value  Count  Frequency (%)
4.0    2      0.3%
5.0    1      0.1%
6.0    2      0.3%
8.0    1      0.1%
10.0   5      0.7%
12.0   1      0.1%
15.0   2      0.3%
16.0   1      0.1%
19.0   1      0.1%
20.0   4      0.6%

Extreme values (maximum)

Value      Count  Frequency (%)
1080000.0  2      0.3%
1050000.0  2      0.3%
1040000.0  3      0.4%
1022020.0  1      0.1%
1022000.0  1      0.1%
1020000.0  1      0.1%
1019000.0  3      0.4%
1018000.0  3      0.4%
1000000.0  2      0.3%
926000.0   1      0.1%

건수 (count)
Text

Distinct: 75
Distinct (%): 10.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Length

Max length: 5
Median length: 1
Mean length: 1.3424855
Min length: 1

Characters and Unicode

Total characters: 929
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 5.1%

Sample

1st row: 4
2nd row: 6
3rd row: 2
4th row: 2
5th row: 1

Value              Count  Frequency (%)
2                  130    18.8%
1                  125    18.1%
4                  69     10.0%
6                  44     6.4%
3                  31     4.5%
8                  25     3.6%
7                  22     3.2%
5                  22     3.2%
10                 20     2.9%
12                 18     2.6%
Other values (65)  186    26.9%

Most occurring characters

Value  Count  Frequency (%)
1      269    29.0%
2      201    21.6%
4      109    11.7%
3      73     7.9%
6      72     7.8%
5      47     5.1%
0      45     4.8%
8      43     4.6%
7      41     4.4%
9      27     2.9%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     927    99.8%
Other Punctuation  2      0.2%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
1      269    29.0%
2      201    21.7%
4      109    11.8%
3      73     7.9%
6      72     7.8%
5      47     5.1%
0      45     4.9%
8      43     4.6%
7      41     4.4%
9      27     2.9%

Other Punctuation

Value  Count  Frequency (%)
,      2      100.0%
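
The two commas counted under Other Punctuation are the likely reason 건수 was profiled as Text rather than as a number: with ten digits plus a comma making up the 11 distinct characters, a value such as "1,234" would be enough to block numeric inference. The sketch below shows one way to convert such strings to integers. It assumes the commas are thousands separators, which the report does not confirm. The helper name `parse_count` and the value "1,234" are hypothetical.

```python
def parse_count(text: str) -> int:
    """Convert a count string such as "1,234" to an int.

    Assumes any comma is a thousands separator (not confirmed by the report).
    """
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdigit():
        raise ValueError(f"not a plain count: {text!r}")
    return int(cleaned)

# Illustrative inputs only; "1,234" is made up to show the comma case
samples = ["4", "6", "2", "12", "1,234"]
parsed = [parse_count(s) for s in samples]
print(parsed)  # prints [4, 6, 2, 12, 1234]
```

Raising on anything other than digits and commas keeps unexpected values visible instead of silently coercing them.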

Most occurring scripts

Value   Count  Frequency (%)
Common  929    100.0%

Most frequent character per script

Common

Value  Count  Frequency (%)
1      269    29.0%
2      201    21.6%
4      109    11.7%
3      73     7.9%
6      72     7.8%
5      47     5.1%
0      45     4.8%
8      43     4.6%
7      41     4.4%
9      27     2.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  929    100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
1      269    29.0%
2      201    21.6%
4      109    11.7%
3      73     7.9%
6      72     7.8%
5      47     5.1%
0      45     4.8%
8      43     4.6%
7      41     4.4%
9      27     2.9%

Interactions


Correlations

            연도     용도     발전기용량  건수
연도         1.000   0.156   0.000      0.000
용도         0.156   1.000   0.000      0.000
발전기용량    0.000   0.000   1.000      0.000
건수         0.000   0.000   0.000      1.000

            연도     발전기용량  용도
연도         1.000   -0.032     0.126
발전기용량    -0.032  1.000      0.000
용도         0.126   0.000      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

     연도   용도          발전기종류     발전기용량  건수
0    2017  자가용(발전)   전기저장장치   5.0        4
1    2017  자가용(발전)   전기저장장치   8.0        6
2    2017  자가용(발전)   전기저장장치   10.0       2
3    2017  자가용(발전)   전기저장장치   19.0       2
4    2017  자가용(발전)   전기저장장치   20.0       1
5    2017  자가용(발전)   전기저장장치   22.0       2
6    2017  자가용(발전)   전기저장장치   25.0       4
7    2017  자가용(발전)   전기저장장치   30.0       6
8    2017  자가용(발전)   전기저장장치   32.0       1
9    2017  자가용(발전)   전기저장장치   40.0       1

Last rows

     연도   용도          발전기종류     발전기용량  건수
682  2022  자가용(발전)   전기저장장치   336.0      1
683  2022  자가용(발전)   전기저장장치   95.0       4
684  2022  자가용(발전)   전기저장장치   99.0       2
685  2022  자가용(발전)   전기저장장치   80.0       1
686  2022  자가용(발전)   전기저장장치   480.0      1
687  2022  자가용(발전)   전기저장장치   750.0      1
688  2022  자가용(발전)   전기저장장치   160.0      1
689  2022  자가용(발전)   전기저장장치   400.0      1
690  2022  자가용(발전)   전기저장장치   2600.0     2
691  2022  사업용(발전)   전기저장장치   250.0      2

Duplicate rows

Most frequently occurring

     연도   용도          발전기종류     발전기용량  건수  # duplicates
0    2017  자가용(발전)   전기저장장치   42.0       1     2
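
The duplicate listed above can be found by counting identical rows. The sketch below uses `collections.Counter` on row tuples. The first two rows repeat the duplicated record from the report, the third is one of the report's sample rows added for contrast, and the variable names are illustrative.

```python
from collections import Counter

# Rows in the report's column order: 연도, 용도, 발전기종류, 발전기용량, 건수.
# 건수 is kept as a string because the report profiles it as Text.
rows = [
    (2017, "자가용(발전)", "전기저장장치", 42.0, "1"),
    (2017, "자가용(발전)", "전기저장장치", 42.0, "1"),
    (2017, "자가용(발전)", "전기저장장치", 5.0, "4"),
]

counts = Counter(rows)
# Keep only rows that occur more than once, with their occurrence counts
duplicates = {row: c for row, c in counts.items() if c > 1}
# Drop repeats while preserving the original row order
deduplicated = list(dict.fromkeys(rows))

for row, c in duplicates.items():
    print(row, c)
```

Whether to drop the repeat is a judgment call: two inspections in the same year with the same capacity and count could be legitimate separate records rather than an entry error.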