Overview

Dataset statistics

Number of variables: 6
Number of observations: 256
Missing cells: 946
Missing cells (%): 61.6%
Duplicate rows: 1
Duplicate rows (%): 0.4%
Total size in memory: 12.1 KiB
Average record size in memory: 48.5 B
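
The headline missing-cell figure can be cross-checked against the per-variable missing counts reported under Alerts and Variables. The short Python sketch below sums those counts and divides by the total number of cells (variables times observations). The dictionary and variable names are illustrative and not part of the report.

```python
# Per-variable missing counts as reported in this profile.
missing_by_column = {
    "상호명": 234,
    "주소": 234,
    "사용연료": 0,
    "생산품": 241,
    "연간 사용량": 237,
    "데이터기준일자": 0,
}

n_variables = 6
n_observations = 256

total_cells = n_variables * n_observations        # 6 * 256 = 1536
missing_cells = sum(missing_by_column.values())   # 946
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")  # 946 61.6%
```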

Variable types

Text: 4
Categorical: 2

Dataset

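Description: Data on businesses within Sacheon-si, Gyeongsangnam-do that use light oil (경질유) or heavy oil (중질유): business name (상호명), address (주소), fuel used (사용연료), product (생산품), and annual usage (연간사용량).
Author: Sacheon-si, Gyeongsangnam-do (경상남도 사천시)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15107850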

Alerts

Dataset has 1 (0.4%) duplicate row [Duplicates]
데이터기준일자 is highly overall correlated with 사용연료 [High correlation]
사용연료 is highly overall correlated with 데이터기준일자 [High correlation]
사용연료 is highly imbalanced (78.1%) [Imbalance]
데이터기준일자 is highly imbalanced (71.9%) [Imbalance]
상호명 has 234 (91.4%) missing values [Missing]
주소 has 234 (91.4%) missing values [Missing]
생산품 has 241 (94.1%) missing values [Missing]
연간 사용량 has 237 (92.6%) missing values [Missing]
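
Most of these alerts appear to share one cause: the Sample and Duplicate rows tables in this report show 234 rows in which every field is `<NA>`. Below is a minimal sketch of filtering such rows before re-profiling. It assumes the data has been loaded as a list of dictionaries (for example with `csv.DictReader`) and that `None`, empty strings, and the literal text `<NA>` all count as missing. The helper names `is_missing` and `drop_empty_rows` are hypothetical.

```python
MISSING_MARKERS = {"", "<NA>"}  # assumption: how blanks appear after loading


def is_missing(value):
    """Return True for None or a string that only encodes a blank."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() in MISSING_MARKERS


def drop_empty_rows(rows):
    """Keep only rows that have at least one non-missing field."""
    return [row for row in rows if not all(is_missing(v) for v in row.values())]


rows = [
    {"상호명": "평화종합정비", "사용연료": "경유", "연간 사용량": "2,400L"},
    {"상호명": None, "사용연료": None, "연간 사용량": None},
    {"상호명": "", "사용연료": "<NA>", "연간 사용량": ""},
]

cleaned = drop_empty_rows(rows)
print(len(cleaned))  # 1
```

Rows that are only partly filled are kept, so the per-variable missing counts for 생산품 and 연간 사용량 would still be non-zero after this step.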

Reproduction

Analysis started: 2023-12-11 00:20:36.751717
Analysis finished: 2023-12-11 00:20:37.334506
Duration: 0.58 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상호명 (business name)
Text

MISSING

Distinct: 21
Distinct (%): 95.5%
Missing: 234
Missing (%): 91.4%
Memory size: 2.1 KiB

Length

Max length: 14
Median length: 11
Mean length: 7.9545455
Min length: 4

Characters and Unicode

Total characters: 175
Distinct characters: 91
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 20
Unique (%): 90.9%

Sample

1st row: 평화종합정비
2nd row: 하이종합정비
3rd row: 삼천포종합정비(주)
4th row: 베스트종합정비
5th row: 사천시 자원회수센터
Value | Count | Frequency (%)
현대종합정비 | 2 | 8.0%
삼천포종합정비(주 | 1 | 4.0%
굿프로모터스 | 1 | 4.0%
인터내셔널돔하우스(주 | 1 | 4.0%
삼육비철 | 1 | 4.0%
㈜굿웰바이오 | 1 | 4.0%
주)세명공업 | 1 | 4.0%
주식회사 | 1 | 4.0%
제일 | 1 | 4.0%
농업회사법인 | 1 | 4.0%
Other values (14) | 14 | 56.0%

Most occurring characters

ValueCountFrequency (%)
10
 
5.7%
9
 
5.1%
9
 
5.1%
7
 
4.0%
6
 
3.4%
6
 
3.4%
5
 
2.9%
4
 
2.3%
( 4
 
2.3%
) 4
 
2.3%
Other values (81) 111
63.4%

Most occurring categories

ValueCountFrequency (%)
Other Letter 159
90.9%
Open Punctuation 4
 
2.3%
Close Punctuation 4
 
2.3%
Space Separator 3
 
1.7%
Other Symbol 3
 
1.7%
Decimal Number 2
 
1.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
10
 
6.3%
9
 
5.7%
9
 
5.7%
7
 
4.4%
6
 
3.8%
6
 
3.8%
5
 
3.1%
4
 
2.5%
3
 
1.9%
3
 
1.9%
Other values (75) 97
61.0%
Decimal Number
ValueCountFrequency (%)
3 1
50.0%
1 1
50.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%
Other Symbol
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 162
92.6%
Common 13
 
7.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
10
 
6.2%
9
 
5.6%
9
 
5.6%
7
 
4.3%
6
 
3.7%
6
 
3.7%
5
 
3.1%
4
 
2.5%
3
 
1.9%
3
 
1.9%
Other values (76) 100
61.7%
Common
ValueCountFrequency (%)
( 4
30.8%
) 4
30.8%
3
23.1%
3 1
 
7.7%
1 1
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 159
90.9%
ASCII 13
 
7.4%
None 3
 
1.7%

Most frequent character per block

Hangul
ValueCountFrequency (%)
10
 
6.3%
9
 
5.7%
9
 
5.7%
7
 
4.4%
6
 
3.8%
6
 
3.8%
5
 
3.1%
4
 
2.5%
3
 
1.9%
3
 
1.9%
Other values (75) 97
61.0%
ASCII
ValueCountFrequency (%)
( 4
30.8%
) 4
30.8%
3
23.1%
3 1
 
7.7%
1 1
 
7.7%
None
ValueCountFrequency (%)
3
100.0%

주소 (address)
Text

MISSING

Distinct: 22
Distinct (%): 100.0%
Missing: 234
Missing (%): 91.4%
Memory size: 2.1 KiB

Length

Max length: 33
Median length: 28
Mean length: 22.772727
Min length: 19

Characters and Unicode

Total characters: 501
Distinct characters: 73
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 22
Unique (%): 100.0%

Sample

1st row: 경상남도 사천시 곤양면 구고속도로 1636
2nd row: 경상남도 사천시 남일로 304 (향촌동)
3rd row: 경상남도 사천시 삼천포대교로 577 (좌룡동)
4th row: 경상남도 사천시 정동면 진삼로 1206
5th row: 경상남도 사천시 환경길 71 (사등동)
Value | Count | Frequency (%)
경상남도 | 22 | 19.5%
사천시 | 22 | 19.5%
축동면 | 4 | 3.5%
사천읍 | 3 | 2.7%
곤명면 | 3 | 2.7%
사남면 | 3 | 2.7%
가산리 | 2 | 1.8%
경서대로 | 2 | 1.8%
구암두문로 | 2 | 1.8%
곤양면 | 2 | 1.8%
Other values (46) | 48 | 42.5%

Most occurring characters

ValueCountFrequency (%)
93
18.6%
30
 
6.0%
28
 
5.6%
26
 
5.2%
25
 
5.0%
23
 
4.6%
22
 
4.4%
22
 
4.4%
1 15
 
3.0%
2 15
 
3.0%
Other values (63) 202
40.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 315
62.9%
Space Separator 93
 
18.6%
Decimal Number 77
 
15.4%
Dash Punctuation 6
 
1.2%
Close Punctuation 5
 
1.0%
Open Punctuation 5
 
1.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
30
 
9.5%
28
 
8.9%
26
 
8.3%
25
 
7.9%
23
 
7.3%
22
 
7.0%
22
 
7.0%
15
 
4.8%
14
 
4.4%
11
 
3.5%
Other values (49) 99
31.4%
Decimal Number
ValueCountFrequency (%)
1 15
19.5%
2 15
19.5%
4 11
14.3%
6 7
9.1%
5 7
9.1%
8 6
 
7.8%
7 5
 
6.5%
3 5
 
6.5%
0 4
 
5.2%
9 2
 
2.6%
Space Separator
ValueCountFrequency (%)
93
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 315
62.9%
Common 186
37.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
30
 
9.5%
28
 
8.9%
26
 
8.3%
25
 
7.9%
23
 
7.3%
22
 
7.0%
22
 
7.0%
15
 
4.8%
14
 
4.4%
11
 
3.5%
Other values (49) 99
31.4%
Common
ValueCountFrequency (%)
93
50.0%
1 15
 
8.1%
2 15
 
8.1%
4 11
 
5.9%
6 7
 
3.8%
5 7
 
3.8%
- 6
 
3.2%
8 6
 
3.2%
7 5
 
2.7%
3 5
 
2.7%
Other values (4) 16
 
8.6%

Most occurring blocks

ValueCountFrequency (%)
Hangul 315
62.9%
ASCII 186
37.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
93
50.0%
1 15
 
8.1%
2 15
 
8.1%
4 11
 
5.9%
6 7
 
3.8%
5 7
 
3.8%
- 6
 
3.2%
8 6
 
3.2%
7 5
 
2.7%
3 5
 
2.7%
Other values (4) 16
 
8.6%
Hangul
ValueCountFrequency (%)
30
 
9.5%
28
 
8.9%
26
 
8.3%
25
 
7.9%
23
 
7.3%
22
 
7.0%
22
 
7.0%
15
 
4.8%
14
 
4.4%
11
 
3.5%
Other values (49) 99
31.4%

사용연료 (fuel used)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 6
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
<NA>: 234
경유: 13
등유: 5
중유C: 2
경유, 중유C: 1

Length

Max length: 7
Median length: 4
Mean length: 3.875
Min length: 2

Unique

Unique: 2
Unique (%): 0.8%

Sample

1st row: 경유
2nd row: 경유
3rd row: 경유
4th row: 경유
5th row: 경유

Common Values

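Value | Count | Frequency (%)
<NA> | 234 | 91.4%
경유 | 13 | 5.1%
등유 | 5 | 2.0%
중유C | 2 | 0.8%
경유, 중유C | 1 | 0.4%
부생연료유1호 | 1 | 0.4%

Fuel names: 경유 is diesel (light oil), 등유 is kerosene, 중유C is heavy fuel oil grade C, and 부생연료유1호 is by-product fuel oil No. 1.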

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 234 | 91.1%
경유 | 14 | 5.4%
등유 | 5 | 1.9%
중유c | 3 | 1.2%
부생연료유1호 | 1 | 0.4%

생산품 (product)
Text

MISSING

Distinct: 14
Distinct (%): 93.3%
Missing: 241
Missing (%): 94.1%
Memory size: 2.1 KiB

Length

Max length: 16
Median length: 8
Mean length: 6.4666667
Min length: 1

Characters and Unicode

Total characters: 97
Distinct characters: 51
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 86.7%

Sample

1st row: 정비자동차
2nd row: 항공기, 항공기부품, 건설기계
3rd row: 도장완료된 자동차
4th row: 수리된 자동차
5th row: 도장완료된 자동차
Value | Count | Frequency (%)
자동차 | 4 | 16.7%
도장완료된 | 2 | 8.3%
항공기부품 | 2 | 8.3%
분말활성탄 | 1 | 4.2%
목재펠릿 | 1 | 4.2%
정제유 | 1 | 4.2%
재생수지칩 | 1 | 4.2%
온수 | 1 | 4.2%
(not shown) | 1 | 4.2%
스팀 | 1 | 4.2%
Other values (9) | 9 | 37.5%

Most occurring characters

ValueCountFrequency (%)
9
 
9.3%
5
 
5.2%
5
 
5.2%
5
 
5.2%
5
 
5.2%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
3
 
3.1%
Other values (41) 53
54.6%

Most occurring categories

ValueCountFrequency (%)
Other Letter 85
87.6%
Space Separator 9
 
9.3%
Other Punctuation 3
 
3.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5
 
5.9%
5
 
5.9%
5
 
5.9%
5
 
5.9%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
Other values (39) 47
55.3%
Space Separator
ValueCountFrequency (%)
9
100.0%
Other Punctuation
ValueCountFrequency (%)
, 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 85
87.6%
Common 12
 
12.4%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5
 
5.9%
5
 
5.9%
5
 
5.9%
5
 
5.9%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
Other values (39) 47
55.3%
Common
ValueCountFrequency (%)
9
75.0%
, 3
 
25.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 85
87.6%
ASCII 12
 
12.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9
75.0%
, 3
 
25.0%
Hangul
ValueCountFrequency (%)
5
 
5.9%
5
 
5.9%
5
 
5.9%
5
 
5.9%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
3
 
3.5%
Other values (39) 47
55.3%

연간 사용량 (annual usage)
Text

MISSING

Distinct: 16
Distinct (%): 84.2%
Missing: 237
Missing (%): 92.6%
Memory size: 2.1 KiB

Length

Max length: 9
Median length: 8
Mean length: 6.4736842
Min length: 2

Characters and Unicode

Total characters: 123
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15
Unique (%): 78.9%

Sample

1st row: 2,400L
2nd row: 39,120L
3rd row: 56,680L
4th row: 6,000L
5th row: 9,648L
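
연간 사용량 is profiled as Text because each value carries a thousands separator and a unit suffix. The character tables for this variable list `L` 18 times plus one `k` and one `g`, which suggests that most values are in litres and at least one is in kilograms. The sketch below splits such a string into a number and a unit so the column could be analysed numerically. The pattern and the function name `parse_usage` are assumptions made for illustration, and values that do not fit the pattern are returned as `None` rather than guessed.

```python
import re

# Digits with optional thousands separators, then an alphabetic unit such as L or kg.
_USAGE_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_usage(text):
    """Split a string like '39,120L' into (39120.0, 'L').

    Returns None when the text does not match the assumed pattern.
    """
    if text is None:
        return None
    match = _USAGE_RE.match(text)
    if match is None:
        return None
    number_text = match.group(1).replace(",", "")
    if not number_text:  # the numeric part was only separators
        return None
    return float(number_text), match.group(2)


for raw in ["2,400L", "39,120L", "791,138L"]:
    print(raw, "->", parse_usage(raw))
```

Keeping the unit alongside the number matters here, since litres and kilograms should not be summed or compared directly.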
Value | Count | Frequency (%)
6,000l | 4 | 21.1%
2,400l | 1 | 5.3%
56,680l | 1 | 5.3%
9,648l | 1 | 5.3%
26,640l | 1 | 5.3%
17,760l | 1 | 5.3%
791,138l | 1 | 5.3%
39,120l | 1 | 5.3%
1,200l | 1 | 5.3%
540,000l | 1 | 5.3%
Other values (6) | 6 | 31.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 35 | 28.5%
L | 18 | 14.6%
, | 17 | 13.8%
6 | 12 | 9.8%
4 | 7 | 5.7%
2 | 6 | 4.9%
9 | 6 | 4.9%
1 | 5 | 4.1%
5 | 4 | 3.3%
8 | 4 | 3.3%
Other values (4) | 9 | 7.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 86 | 69.9%
Uppercase Letter | 18 | 14.6%
Other Punctuation | 17 | 13.8%
Lowercase Letter | 2 | 1.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 35
40.7%
6 12
 
14.0%
4 7
 
8.1%
2 6
 
7.0%
9 6
 
7.0%
1 5
 
5.8%
5 4
 
4.7%
8 4
 
4.7%
7 4
 
4.7%
3 3
 
3.5%
Lowercase Letter
ValueCountFrequency (%)
k 1
50.0%
g 1
50.0%
Uppercase Letter
ValueCountFrequency (%)
L 18
100.0%
Other Punctuation
ValueCountFrequency (%)
, 17
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 103
83.7%
Latin 20
 
16.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 35
34.0%
, 17
16.5%
6 12
 
11.7%
4 7
 
6.8%
2 6
 
5.8%
9 6
 
5.8%
1 5
 
4.9%
5 4
 
3.9%
8 4
 
3.9%
7 4
 
3.9%
Latin
ValueCountFrequency (%)
L 18
90.0%
k 1
 
5.0%
g 1
 
5.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 123
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 35
28.5%
L 18
14.6%
, 17
13.8%
6 12
 
9.8%
4 7
 
5.7%
2 6
 
4.9%
9 6
 
4.9%
1 5
 
4.1%
5 4
 
3.3%
8 4
 
3.3%
Other values (4) 9
 
7.3%

데이터기준일자 (data reference date)
Categorical

HIGH CORRELATION  IMBALANCE

Distinct: 3
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.1 KiB
<NA>: 234
2022-11-02: 21
2022-11-01: 1

Length

Max length: 10
Median length: 4
Mean length: 4.515625
Min length: 4

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: 2022-11-02
2nd row: 2022-11-02
3rd row: 2022-11-02
4th row: 2022-11-02
5th row: 2022-11-02

Common Values

Value | Count | Frequency (%)
<NA> | 234 | 91.4%
2022-11-02 | 21 | 8.2%
2022-11-01 | 1 | 0.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 234 | 91.4%
2022-11-02 | 21 | 8.2%
2022-11-01 | 1 | 0.4%

Correlations

Variable | 상호명 | 주소 | 사용연료 | 생산품 | 연간 사용량 | 데이터기준일자
상호명 | 1.000 | 1.000 | 0.935 | 0.986 | 0.966 | 1.000
주소 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
사용연료 | 0.935 | 1.000 | 1.000 | 1.000 | 0.977 | 1.000
생산품 | 0.986 | 1.000 | 1.000 | 1.000 | 0.819 | NaN
연간 사용량 | 0.966 | 1.000 | 0.977 | 0.819 | 1.000 | NaN
데이터기준일자 | 1.000 | 1.000 | 1.000 | NaN | NaN | 1.000

Variable | 데이터기준일자 | 사용연료
데이터기준일자 | 1.000 | 0.922
사용연료 | 0.922 | 1.000

Variable | 사용연료 | 데이터기준일자
사용연료 | 1.000 | 0.922
데이터기준일자 | 0.922 | 1.000
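
Both 사용연료 and 데이터기준일자 are categorical. ydata-profiling commonly scores categorical pairs with Cramér's V, but this report does not state which measure produced the 0.922 above, so treat that as an assumption. The sketch below computes the plain (not bias-corrected) Cramér's V from two equal-length lists using only the standard library, and it will not necessarily reproduce the value shown here. The function name `cramers_v` and the sample lists are illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain Cramér's V for two equal-length sequences of category labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    pair_counts = Counter(zip(xs, ys))

    # V is undefined when either variable has fewer than two categories.
    k = min(len(x_counts), len(y_counts))
    if n == 0 or k < 2:
        return float("nan")

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_n in x_counts.items():
        for y, y_n in y_counts.items():
            expected = x_n * y_n / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (k - 1)))


fuel = ["경유", "경유", "등유", "등유"]
date = ["2022-11-02", "2022-11-02", "2022-11-01", "2022-11-01"]
print(round(cramers_v(fuel, date), 3))  # 1.0 for a perfect association
```

With 234 of 256 rows holding `<NA>` in both columns, a high score mostly reflects that the two variables are empty on the same rows rather than a relationship between fuel type and date.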

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Row | 상호명 | 주소 | 사용연료 | 생산품 | 연간 사용량 | 데이터기준일자
0 | 평화종합정비 | 경상남도 사천시 곤양면 구고속도로 1636 | 경유 | <NA> | 2,400L | 2022-11-02
1 | 하이종합정비 | 경상남도 사천시 남일로 304 (향촌동) | 경유 | <NA> | 39,120L | 2022-11-02
2 | 삼천포종합정비(주) | 경상남도 사천시 삼천포대교로 577 (좌룡동) | 경유 | <NA> | 56,680L | 2022-11-02
3 | 베스트종합정비 | 경상남도 사천시 정동면 진삼로 1206 | 경유 | 정비자동차 | 6,000L | 2022-11-02
4 | 사천시 자원회수센터 | 경상남도 사천시 환경길 71 (사등동) | 경유 | <NA> | 9,648L | 2022-11-02
5 | 공군제3훈련비행단 | 경상남도 사천시 사천읍 사천대로 1891-46 | 경유 | 항공기, 항공기부품, 건설기계 | 6,000L | 2022-11-02
6 | 현대종합정비 | 경상남도 사천시 사천읍 구암두문로 154-32 | 경유 | 도장완료된 자동차 | 26,640L | 2022-11-02
7 | 신세계종합1급정비공장 | 경상남도 사천시 하궁지길 73 (궁지동) | 경유 | 수리된 자동차 | 6,000L | 2022-11-02
8 | 사천자동차종합검사소 | 경상남도 사천시 사천읍 구암두문로 154-42 | 경유 | 도장완료된 자동차 | 17,760L | 2022-11-02
9 | 송암농축산 | 경상남도 사천시 사남면 송암길 75 | 경유 | 버섯 | 791,138L | 2022-11-02

Row | 상호명 | 주소 | 사용연료 | 생산품 | 연간 사용량 | 데이터기준일자
246 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
247 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
248 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
249 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
250 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
251 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
252 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
253 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
254 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
255 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Row | 상호명 | 주소 | 사용연료 | 생산품 | 연간 사용량 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 234