Overview

Dataset statistics

Number of variables: 7
Number of observations: 463
Missing cells: 221
Missing cells (%): 6.8%
Duplicate rows: 3
Duplicate rows (%): 0.6%
Total size in memory: 25.9 KiB
Average record size in memory: 57.3 B
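
The percentages and the average record size follow from the raw counts in this table. The sketch below recomputes them as a sanity check. It assumes that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 KiB = 1024 B; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 7
n_observations = 463
missing_cells = 221
duplicate_rows = 3
total_size_kib = 25.9


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: the missing-cell percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = pct(missing_cells, total_cells)
duplicate_pct = pct(duplicate_rows, n_observations)

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = round(total_size_kib * 1024 / n_observations, 1)

print(total_cells, missing_pct, duplicate_pct, avg_record_bytes)
```

Under these assumptions the recomputed values agree with the reported 6.8%, 0.6% and 57.3 B.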

Variable types

DateTime: 2
Text: 2
Numeric: 1
Categorical: 1
Boolean: 1

Dataset

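Description: Registration status of electricity businesses in Incheon Metropolitan City (permit date, business name, installed capacity, installation site, type of motive power, whether operation has started, operation start date, etc.).
Author: Incheon Metropolitan City (인천광역시)
URL: https://www.incheon.go.kr/data/DATA010201/view?docId=15030592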

Alerts

Duplicates: Dataset has 3 (0.6%) duplicate rows
Imbalance: 원동력 종류 (type of motive power) is highly imbalanced (95.0%)
Missing: 사업개시일 (operation start date) has 221 (47.7%) missing values

Reproduction

Analysis started: 2023-12-11 02:56:38.982473
Analysis finished: 2023-12-11 02:56:39.694923
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

허가일자 (permit date)
Date

Distinct: 255
Distinct (%): 55.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
Minimum: 2005-09-30 00:00:00
Maximum: 2018-07-31 00:00:00
Histogram with fixed size bins (bins=50)

상 호 (business name)
Text

Distinct: 449
Distinct (%): 97.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 28
Median length: 23
Mean length: 10.784017
Min length: 4

Characters and Unicode

Total characters: 4993
Distinct characters: 331
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
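
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the helper name `category_counts` is hypothetical, and the two strings are taken from the sample rows of this variable. The two-letter codes map to the report's labels as `Lo` = Other Letter, `Zs` = Space Separator, `So` = Other Symbol and `Nd` = Decimal Number.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Two business names from the sample rows of this variable.
samples = ["동검 발전소", "한국남동발전㈜ 영흥도 태양광 발전소"]
for name in samples:
    print(name, dict(category_counts(name)))
```

Hangul syllables fall under `Lo`, while the enclosed symbol ㈜ is counted as `So`, which is consistent with the small "Other Symbol" group reported for this variable.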

Unique

Unique: 435
Unique (%): 94.0%

Sample

1st row: 동검 발전소
2nd row: 기린에코 발전소
3rd row: 한국남동발전㈜ 영흥도 태양광 발전소
4th row: 해성 태양광발전소
5th row: 윤성근 태양광발전소

Value | Count | Frequency (%)
태양광발전소 | 268 | 31.2%
발전소 | 22 | 2.6%
주식회사 | 10 | 1.2%
다주 | 8 | 0.9%
태양광발전호 | 5 | 0.6%
태양광 | 4 | 0.5%
제1태양광발전소 | 4 | 0.5%
2호 | 4 | 0.5%
3호 | 3 | 0.3%
인천항 | 3 | 0.3%
Other values (492) | 527 | 61.4%

Most occurring characters

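Value | Count | Frequency (%)
(unknown) | 451 | 9.0%
(unknown) | 450 | 9.0%
(unknown) | 437 | 8.8%
(unknown) | 419 | 8.4%
(unknown) | 419 | 8.4%
(unknown) | 418 | 8.4%
(space) | 396 | 7.9%
(unknown) | 111 | 2.2%
(unknown) | 52 | 1.0%
2 | 52 | 1.0%
Other values (321) | 1788 | 35.8%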

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4311 | 86.3%
Space Separator | 396 | 7.9%
Decimal Number | 165 | 3.3%
Uppercase Letter | 42 | 0.8%
Close Punctuation | 30 | 0.6%
Open Punctuation | 30 | 0.6%
Lowercase Letter | 9 | 0.2%
Other Symbol | 8 | 0.2%
Other Punctuation | 1 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unknown) | 451 | 10.5%
(unknown) | 450 | 10.4%
(unknown) | 437 | 10.1%
(unknown) | 419 | 9.7%
(unknown) | 419 | 9.7%
(unknown) | 418 | 9.7%
(unknown) | 111 | 2.6%
(unknown) | 52 | 1.2%
(unknown) | 47 | 1.1%
(unknown) | 38 | 0.9%
Other values (280) | 1469 | 34.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 8 | 19.0%
G | 6 | 14.3%
C | 4 | 9.5%
P | 3 | 7.1%
E | 3 | 7.1%
N | 3 | 7.1%
K | 3 | 7.1%
T | 2 | 4.8%
R | 2 | 4.8%
D | 1 | 2.4%
Other values (7) | 7 | 16.7%

Decimal Number
Value | Count | Frequency (%)
2 | 52 | 31.5%
1 | 49 | 29.7%
3 | 30 | 18.2%
4 | 15 | 9.1%
5 | 6 | 3.6%
0 | 4 | 2.4%
6 | 4 | 2.4%
8 | 2 | 1.2%
7 | 2 | 1.2%
9 | 1 | 0.6%

Lowercase Letter
Value | Count | Frequency (%)
o | 2 | 22.2%
y | 1 | 11.1%
h | 1 | 11.1%
g | 1 | 11.1%
n | 1 | 11.1%
l | 1 | 11.1%
e | 1 | 11.1%
c | 1 | 11.1%

Space Separator
Value | Count | Frequency (%)
(space) | 396 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 30 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 30 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(unknown) | 8 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
& | 1 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4319 | 86.5%
Common | 623 | 12.5%
Latin | 51 | 1.0%

Most frequent character per script

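Hangul
Value | Count | Frequency (%)
(unknown) | 451 | 10.4%
(unknown) | 450 | 10.4%
(unknown) | 437 | 10.1%
(unknown) | 419 | 9.7%
(unknown) | 419 | 9.7%
(unknown) | 418 | 9.7%
(unknown) | 111 | 2.6%
(unknown) | 52 | 1.2%
(unknown) | 47 | 1.1%
(unknown) | 38 | 0.9%
Other values (281) | 1477 | 34.2%

Latin
Value | Count | Frequency (%)
S | 8 | 15.7%
G | 6 | 11.8%
C | 4 | 7.8%
P | 3 | 5.9%
E | 3 | 5.9%
N | 3 | 5.9%
K | 3 | 5.9%
T | 2 | 3.9%
R | 2 | 3.9%
o | 2 | 3.9%
Other values (15) | 15 | 29.4%

Common
Value | Count | Frequency (%)
(space) | 396 | 63.6%
2 | 52 | 8.3%
1 | 49 | 7.9%
3 | 30 | 4.8%
) | 30 | 4.8%
( | 30 | 4.8%
4 | 15 | 2.4%
5 | 6 | 1.0%
0 | 4 | 0.6%
6 | 4 | 0.6%
Other values (5) | 7 | 1.1%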

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4311 | 86.3%
ASCII | 674 | 13.5%
None | 8 | 0.2%

Most frequent character per block

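Hangul
Value | Count | Frequency (%)
(unknown) | 451 | 10.5%
(unknown) | 450 | 10.4%
(unknown) | 437 | 10.1%
(unknown) | 419 | 9.7%
(unknown) | 419 | 9.7%
(unknown) | 418 | 9.7%
(unknown) | 111 | 2.6%
(unknown) | 52 | 1.2%
(unknown) | 47 | 1.1%
(unknown) | 38 | 0.9%
Other values (280) | 1469 | 34.1%

ASCII
Value | Count | Frequency (%)
(space) | 396 | 58.8%
2 | 52 | 7.7%
1 | 49 | 7.3%
3 | 30 | 4.5%
) | 30 | 4.5%
( | 30 | 4.5%
4 | 15 | 2.2%
S | 8 | 1.2%
G | 6 | 0.9%
5 | 6 | 0.9%
Other values (30) | 52 | 7.7%

None
Value | Count | Frequency (%)
(unknown) | 8 | 100.0%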

설비용량(kW) (installed capacity, kW)
Real number (ℝ)

Distinct: 265
Distinct (%): 57.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206.9968
Minimum: 3
Maximum: 3000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 15
Q1: 33
Median: 97.2
Q3: 99.845
95-th percentile: 997.92
Maximum: 3000
Range: 2997
Interquartile range (IQR): 66.845

Descriptive statistics

Standard deviation: 407.97249
Coefficient of variation (CV): 1.970912
Kurtosis: 20.729164
Mean: 206.9968
Median Absolute Deviation (MAD): 55.2
Skewness: 4.1649144
Sum: 95839.52
Variance: 166441.55
Monotonicity: Not monotonic
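
Several of the figures above are derived from others, which makes them easy to cross-check. The sketch below recomputes the mean, range, IQR, variance and coefficient of variation from the reported sum, extremes, quartiles and standard deviation. It assumes the usual definitions (mean = sum / n, range = max − min, IQR = Q3 − Q1, variance = std², CV = std / mean) and that n equals the 463 observations with no missing values; it is a consistency check, not the profiler's code.

```python
# Values copied from the statistics reported for this variable.
n = 463
total = 95839.52
minimum, maximum = 3, 3000
q1, q3 = 33, 99.845
std = 407.97249

mean = total / n                 # reported: 206.9968
value_range = maximum - minimum  # reported: 2997
iqr = q3 - q1                    # reported: 66.845
variance = std ** 2              # reported: 166441.55
cv = std / mean                  # reported: 1.970912

print(round(mean, 4), value_range, round(iqr, 3), round(variance, 2), round(cv, 6))
```

The large gap between the median (97.2) and the mean (about 207), together with a CV close to 2, matches the strong right skew reported above.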
Histogram with fixed size bins (bins=50)

Most frequent values
Value | Count | Frequency (%)
99.0 | 27 | 5.8%
15.0 | 24 | 5.2%
30.0 | 18 | 3.9%
97.2 | 10 | 2.2%
97.92 | 9 | 1.9%
96.0 | 9 | 1.9%
99.2 | 8 | 1.7%
98.82 | 8 | 1.7%
99.75 | 6 | 1.3%
97.9 | 6 | 1.3%
Other values (255) | 338 | 73.0%

Smallest values
Value | Count | Frequency (%)
3.0 | 2 | 0.4%
6.0 | 1 | 0.2%
9.0 | 6 | 1.3%
10.26 | 1 | 0.2%
10.8 | 1 | 0.2%
12.0 | 3 | 0.6%
12.24 | 1 | 0.2%
13.5 | 1 | 0.2%
14.28 | 1 | 0.2%
14.84 | 1 | 0.2%

Largest values
Value | Count | Frequency (%)
3000.0 | 1 | 0.2%
2993.76 | 2 | 0.4%
2929.24 | 1 | 0.2%
2081.75 | 1 | 0.2%
2003.22 | 1 | 0.2%
1900.8 | 1 | 0.2%
1620.0 | 1 | 0.2%
1584.0 | 1 | 0.2%
1566.0 | 1 | 0.2%
1506.6 | 1 | 0.2%

설치장소 (installation site)
Text

Distinct: 419
Distinct (%): 90.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB

Length

Max length: 63
Median length: 45
Mean length: 23.658747
Min length: 11

Characters and Unicode

Total characters: 10954
Distinct characters: 225
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 382
Unique (%): 82.5%

Sample

1st row: 강화군 길상면 동검리 555
2nd row: 옹진군 영흥면 내리 511-8
3rd row: 옹진군 영흥면 내리 1703-2
4th row: 남구 용현4동 183-6
5th row: 강화군 길상면 온수리 산 129-2

Value | Count | Frequency (%)
인천광역시 | 326 | 14.2%
강화군 | 225 | 9.8%
서구 | 100 | 4.4%
길상면 | 44 | 1.9%
중구 | 41 | 1.8%
하점면 | 35 | 1.5%
옹진군 | 29 | 1.3%
1호 | 27 | 1.2%
남동구 | 26 | 1.1%
영흥면 | 22 | 1.0%
Other values (719) | 1421 | 61.9%

Most occurring characters

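Value | Count | Frequency (%)
(space) | 1834 | 16.7%
1 | 460 | 4.2%
(unknown) | 358 | 3.3%
(unknown) | 350 | 3.2%
(unknown) | 333 | 3.0%
(unknown) | 327 | 3.0%
(unknown) | 327 | 3.0%
(unknown) | 275 | 2.5%
3 | 267 | 2.4%
(unknown) | 265 | 2.4%
Other values (215) | 6158 | 56.2%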

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6405 | 58.5%
Decimal Number | 2098 | 19.2%
Space Separator | 1834 | 16.7%
Dash Punctuation | 195 | 1.8%
Close Punctuation | 172 | 1.6%
Open Punctuation | 172 | 1.6%
Other Punctuation | 71 | 0.6%
Uppercase Letter | 7 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unknown) | 358 | 5.6%
(unknown) | 350 | 5.5%
(unknown) | 333 | 5.2%
(unknown) | 327 | 5.1%
(unknown) | 327 | 5.1%
(unknown) | 275 | 4.3%
(unknown) | 265 | 4.1%
(unknown) | 254 | 4.0%
(unknown) | 250 | 3.9%
(unknown) | 237 | 3.7%
Other values (194) | 3429 | 53.5%

Decimal Number
Value | Count | Frequency (%)
1 | 460 | 21.9%
3 | 267 | 12.7%
2 | 244 | 11.6%
5 | 201 | 9.6%
4 | 186 | 8.9%
7 | 182 | 8.7%
6 | 172 | 8.2%
8 | 135 | 6.4%
9 | 127 | 6.1%
0 | 124 | 5.9%

Uppercase Letter
Value | Count | Frequency (%)
L | 3 | 42.9%
B | 2 | 28.6%
O | 1 | 14.3%
T | 1 | 14.3%

Other Punctuation
Value | Count | Frequency (%)
, | 69 | 97.2%
. | 1 | 1.4%
/ | 1 | 1.4%

Space Separator
Value | Count | Frequency (%)
(space) | 1834 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 195 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 172 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 172 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6405 | 58.5%
Common | 4542 | 41.5%
Latin | 7 | 0.1%

Most frequent character per script

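Hangul
Value | Count | Frequency (%)
(unknown) | 358 | 5.6%
(unknown) | 350 | 5.5%
(unknown) | 333 | 5.2%
(unknown) | 327 | 5.1%
(unknown) | 327 | 5.1%
(unknown) | 275 | 4.3%
(unknown) | 265 | 4.1%
(unknown) | 254 | 4.0%
(unknown) | 250 | 3.9%
(unknown) | 237 | 3.7%
Other values (194) | 3429 | 53.5%

Common
Value | Count | Frequency (%)
(space) | 1834 | 40.4%
1 | 460 | 10.1%
3 | 267 | 5.9%
2 | 244 | 5.4%
5 | 201 | 4.4%
- | 195 | 4.3%
4 | 186 | 4.1%
7 | 182 | 4.0%
) | 172 | 3.8%
( | 172 | 3.8%
Other values (7) | 629 | 13.8%

Latin
Value | Count | Frequency (%)
L | 3 | 42.9%
B | 2 | 28.6%
O | 1 | 14.3%
T | 1 | 14.3%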

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6405 | 58.5%
ASCII | 4549 | 41.5%

Most frequent character per block

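ASCII
Value | Count | Frequency (%)
(space) | 1834 | 40.3%
1 | 460 | 10.1%
3 | 267 | 5.9%
2 | 244 | 5.4%
5 | 201 | 4.4%
- | 195 | 4.3%
4 | 186 | 4.1%
7 | 182 | 4.0%
) | 172 | 3.8%
( | 172 | 3.8%
Other values (11) | 636 | 14.0%

Hangul
Value | Count | Frequency (%)
(unknown) | 358 | 5.6%
(unknown) | 350 | 5.5%
(unknown) | 333 | 5.2%
(unknown) | 327 | 5.1%
(unknown) | 327 | 5.1%
(unknown) | 275 | 4.3%
(unknown) | 265 | 4.1%
(unknown) | 254 | 4.0%
(unknown) | 250 | 3.9%
(unknown) | 237 | 3.7%
Other values (194) | 3429 | 53.5%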

원동력 종류 (type of motive power)
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.7 KiB
Value counts: 태양광 459, 풍력 3, 소수력 1

Length

Max length: 3
Median length: 3
Mean length: 2.9935205
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 태양광
2nd row: 태양광
3rd row: 태양광
4th row: 태양광
5th row: 태양광

Common Values

Value | Count | Frequency (%)
태양광 (solar) | 459 | 99.1%
풍력 (wind) | 3 | 0.6%
소수력 (small hydro) | 1 | 0.2%
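
The "highly imbalanced (95.0%)" alert for this variable can be reproduced from the three counts above. The sketch below assumes the score is defined as 1 minus the Shannon entropy of the class distribution divided by the maximum possible entropy (log2 of the number of classes). That definition is an assumption chosen because it reproduces the reported figure, not a statement of the profiler's exact code, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    0 means perfectly balanced classes; values near 1 mean a single class dominates.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)


# Counts for 태양광, 풍력 and 소수력 as listed under Common Values.
score = imbalance_score([459, 3, 1])
print(f"{score:.1%}")
```

With 459 of 463 rows in a single class, the entropy is only about 5% of its maximum, which gives a score of roughly 95%.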

Length

Histogram of lengths of the category

Common Values (Plot)

Bar chart of the value counts for 태양광, 풍력 and 소수력 (459, 3 and 1 rows).

사업개시유무 (whether operation has started)
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 595.0 B

Value | Count | Frequency (%)
True | 241 | 52.1%
False | 222 | 47.9%

사업개시일 (operation start date)
Date

MISSING 

Distinct: 183
Distinct (%): 75.6%
Missing: 221
Missing (%): 47.7%
Memory size: 3.7 KiB
Minimum: 2005-11-23 00:00:00
Maximum: 2018-08-01 00:00:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations

Variable | 설비용량(kW) | 원동력 종류 | 사업개시유무
설비용량(kW) | 1.000 | 0.246 | 0.051
원동력 종류 | 0.246 | 1.000 | 0.000
사업개시유무 | 0.051 | 0.000 | 1.000

Variable | 원동력 종류 | 사업개시유무
원동력 종류 | 1.000 | 0.000
사업개시유무 | 0.000 | 1.000

Variable | 설비용량(kW) | 원동력 종류 | 사업개시유무
설비용량(kW) | 1.000 | 0.160 | 0.038
원동력 종류 | 0.160 | 1.000 | 0.000
사업개시유무 | 0.038 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | 허가일자 | 상 호 | 설비용량(kW) | 설치장소 | 원동력 종류 | 사업개시유무 | 사업개시일
0 | 2005-09-30 | 동검 발전소 | 3.0 | 강화군 길상면 동검리 555 | 태양광 | Y | 2005-11-23
1 | 2005-12-12 | 기린에코 발전소 | 3.0 | 옹진군 영흥면 내리 511-8 | 태양광 | Y | 2006-03-27
2 | 2006-02-27 | 한국남동발전㈜ 영흥도 태양광 발전소 | 1000.0 | 옹진군 영흥면 내리 1703-2 | 태양광 | Y | 2006-10-04
3 | 2007-11-29 | 해성 태양광발전소 | 6.0 | 남구 용현4동 183-6 | 태양광 | Y | 2008-01-02
4 | 2008-04-04 | 윤성근 태양광발전소 | 30.0 | 강화군 길상면 온수리 산 129-2 | 태양광 | Y | 2008-05-02
5 | 2008-04-04 | 이동근 태양광발전소 | 30.0 | 강화군 길상면 온수리 산 129-3 | 태양광 | Y | 2008-05-02
6 | 2008-04-17 | 이건 태양광발전소 | 27.44 | 인천광역시 남구 도화동 967-3 | 태양광 | Y | 2008-05-15
7 | 2008-07-10 | 고용민 태양광발전소 | 9.0 | 강화군 길상면 온수리 380-4 | 태양광 | Y | 2008-07-10
8 | 2008-08-25 | 신은주 태양광발전소 | 9.0 | 강화군 길상면 길직리 808 | 태양광 | Y | 2008-09-25
9 | 2009-04-17 | 국화리 태양광발전소 | 30.0 | 강화군 강화읍 국화리 67 | 태양광 | Y | 2009-07-29

(index) | 허가일자 | 상 호 | 설비용량(kW) | 설치장소 | 원동력 종류 | 사업개시유무 | 사업개시일
453 | 2018-07-17 | 천우교양 태양광발전소 | 99.015 | 강화군 양도면 도장리 1647 | 태양광 | N | <NA>
454 | 2018-07-17 | 햇살가득 태양광발전소 | 99.015 | 강화군 양도면 도장리 1647 | 태양광 | N | <NA>
455 | 2018-07-23 | 에너지로드 제1태양광발전소 | 226.44 | 서구 검단천로356번길 11(건물위) | 태양광 | N | <NA>
456 | 2018-07-23 | 원팜 태양광발전소 | 197.1 | 강화군 교동면 봉소리 654 | 태양광 | N | <NA>
457 | 2018-07-23 | 하음2 태양광발전소 | 86.4 | 강화군 하점면 신봉리 1054-8(건물위) | 태양광 | N | <NA>
458 | 2018-07-23 | 에너지로드 제1태양광발전소 | 226.44 | 서구 검단천로356번길 11(건물위) | 태양광 | N | <NA>
459 | 2018-07-24 | 국화 태양광발전소 | 36.0 | 강화군 강화읍 강화대로440번길 5(옥상위) | 태양광 | N | <NA>
460 | 2018-07-24 | 코퍼스 태양광발전소 | 99.28 | 서구 도담5로 77(건물위) | 태양광 | N | <NA>
461 | 2018-07-31 | 부흥2 태양광발전소 | 99.96 | 강화군 교동면 고구리 111-5 | 태양광 | N | <NA>
462 | 2018-07-31 | 아이엠써키트 태양광발전소 | 55.845 | 남동구 남동동로154번길 42(지붕위) | 태양광 | N | <NA>

Duplicate rows

Most frequently occurring

(index) | 허가일자 | 상 호 | 설비용량(kW) | 설치장소 | 원동력 종류 | 사업개시유무 | 사업개시일 | # duplicates
0 | 2018-07-02 | 서경 발전소 | 64.8 | 서구 원창동 381-130(건물위) | 태양광 | N | <NA> | 2
1 | 2018-07-09 | 에이제이토탈㈜ 태양광발전소 | 99.0 | 서구 건지로 45-21(지붕위) | 태양광 | N | <NA> | 2
2 | 2018-07-23 | 에너지로드 제1태양광발전소 | 226.44 | 서구 검단천로356번길 11(건물위) | 태양광 | N | <NA> | 2
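
Each of the three rows above occurs twice, so there are three redundant copies, which agrees with the "Duplicate rows: 3" figure in the overview. The sketch below shows one simple way to find such groups using only the standard library. The rows are abbreviated to three fields taken from the sample (indices 455, 456 and 458), and `duplicate_groups` is a hypothetical helper, not part of the profiler.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Abbreviated rows (permit date, business name, capacity in kW) from the sample.
rows = [
    ("2018-07-23", "에너지로드 제1태양광발전소", 226.44),
    ("2018-07-23", "원팜 태양광발전소", 197.1),
    ("2018-07-23", "에너지로드 제1태양광발전소", 226.44),
]

dups = duplicate_groups(rows)
# Redundant copies: every occurrence beyond the first in each group.
extra_rows = sum(n - 1 for n in dups.values())
print(dups, extra_rows)
```

Rows must be hashable (tuples here) for `Counter` to group them; comparing full records rather than a subset of fields avoids flagging distinct plants that merely share a date or a name.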