Overview

Dataset statistics

Number of variables: 6
Number of observations: 42
Missing cells: 45
Missing cells (%): 17.9%
Duplicate rows: 1
Duplicate rows (%): 2.4%
Total size in memory: 2.2 KiB
Average record size in memory: 53.1 B
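
The two percentages above follow from the counts. A minimal sketch of the arithmetic, assuming the total number of cells is variables × observations (the variable names below are illustrative and not part of the report):

```python
# Counts copied from the dataset statistics above.
n_variables = 6
n_observations = 42
missing_cells = 45
duplicate_rows = 1

# Assumption: every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Rounded to one decimal place, both values agree with the figures reported above.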

Variable types

Numeric: 2
Categorical: 1
Text: 3

Dataset

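Description: Data on gas stations located in Osan-si, Gyeonggi-do. It provides the station name, business location, representative, and total floor area of the business premises.
URL: https://www.data.go.kr/data/3072090/fileData.do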

Alerts

Dataset has 1 (2.4%) duplicate rows [Duplicates]
연번 is highly overall correlated with 업종 [High correlation]
사업장연면적 is highly overall correlated with 업종 [High correlation]
업종 is highly overall correlated with 연번 and 1 other field [High correlation]
연번 has 9 (21.4%) missing values [Missing]
상호 has 9 (21.4%) missing values [Missing]
성명(법인명) has 9 (21.4%) missing values [Missing]
사업장소재지 has 9 (21.4%) missing values [Missing]
사업장연면적 has 9 (21.4%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 11:20:34.350457
Analysis finished: 2023-12-12 11:20:35.974153
Duration: 1.62 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 33
Distinct (%): 100.0%
Missing: 9
Missing (%): 21.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 17
Minimum: 1
Maximum: 33
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 510.0 B

Quantile statistics

Minimum: 1
5-th percentile: 2.6
Q1: 9
Median: 17
Q3: 25
95-th percentile: 31.4
Maximum: 33
Range: 32
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 9.6695398
Coefficient of variation (CV): 0.56879646
Kurtosis: -1.2
Mean: 17
Median Absolute Deviation (MAD): 8
Skewness: 0
Sum: 561
Variance: 93.5
Monotonicity: Strictly increasing
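
These figures can be reproduced by hand. The sketch below assumes that the 33 non-missing 연번 values are exactly 1 through 33 (consistent with minimum 1, maximum 33, 33 distinct values and strict monotonicity) and that the variance uses the sample (n - 1) denominator. The `percentile` helper is a hypothetical linear-interpolation implementation written for this illustration, not a function of the profiling tool.

```python
import math
import statistics

# Assumption: the non-missing values are exactly 1, 2, ..., 33.
values = list(range(1, 34))


def percentile(sorted_values, q):
    """Percentile by linear interpolation between neighbours, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

print("Sum:", sum(values))
print("Mean:", mean, "Variance:", variance)
print("Std: %.7f  CV: %.8f  MAD: %s" % (std, cv, mad))
print("5%%: %.1f  Q1: %.0f  Q3: %.0f  95%%: %.1f" % (
    percentile(values, 0.05), percentile(values, 0.25),
    percentile(values, 0.75), percentile(values, 0.95)))
```

Under these assumptions the results agree with the quantile and descriptive statistics listed for this variable.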
Histogram with fixed size bins (bins=33)
Value | Count | Frequency (%)
26 | 1 | 2.4%
20 | 1 | 2.4%
21 | 1 | 2.4%
22 | 1 | 2.4%
23 | 1 | 2.4%
24 | 1 | 2.4%
25 | 1 | 2.4%
27 | 1 | 2.4%
2 | 1 | 2.4%
28 | 1 | 2.4%
Other values (23) | 23 | 54.8%
(Missing) | 9 | 21.4%
Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 2.4%
2 | 1 | 2.4%
3 | 1 | 2.4%
4 | 1 | 2.4%
5 | 1 | 2.4%
6 | 1 | 2.4%
7 | 1 | 2.4%
8 | 1 | 2.4%
9 | 1 | 2.4%
10 | 1 | 2.4%
Maximum 10 values
Value | Count | Frequency (%)
33 | 1 | 2.4%
32 | 1 | 2.4%
31 | 1 | 2.4%
30 | 1 | 2.4%
29 | 1 | 2.4%
28 | 1 | 2.4%
27 | 1 | 2.4%
26 | 1 | 2.4%
25 | 1 | 2.4%
24 | 1 | 2.4%

업종 (business type)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Memory size: 468.0 B
주유소
33 
<NA>

Length

Max length: 4
Median length: 3
Mean length: 3.2142857
Min length: 3
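
The length statistics suggest why this variable reports 0 missing values even though 9 rows show <NA>. A minimal sketch, assuming those 9 entries are stored as the literal four-character string "<NA>" rather than as true missing values (an inference from the numbers, not something the report states):

```python
import statistics

# Assumption: 33 rows hold "주유소" and 9 rows hold the literal text "<NA>".
labels = ["주유소"] * 33 + ["<NA>"] * 9

lengths = [len(label) for label in labels]
mean_length = sum(lengths) / len(lengths)

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length: %.7f" % mean_length)
print("Min length:", min(lengths))
```

Under that assumption the maximum, median, mean and minimum lengths all agree with the values reported above.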

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 주유소
2nd row: 주유소
3rd row: 주유소
4th row: 주유소
5th row: 주유소

Common Values

Value | Count | Frequency (%)
주유소 | 33 | 78.6%
<NA> | 9 | 21.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
주유소 | 33 | 78.6%
na | 9 | 21.4%

상호 (business name)
Text

MISSING 

Distinct: 33
Distinct (%): 100.0%
Missing: 9
Missing (%): 21.4%
Memory size: 468.0 B

Length

Max length: 25
Median length: 24
Mean length: 9.5151515
Min length: 5

Characters and Unicode

Total characters: 314
Distinct characters: 93
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: (주)양지 남부대로주유소
2nd row: 동탄세마주유소
3rd row: 대명에너지(해솔)
4th row: ㈜다원에너지직영 영일주유소
5th row: SK에너지(주)운암뜰 주유소
Value | Count | Frequency (%)
에이치디현대오일뱅크(주)직영 | 2 | 4.9%
kh에너지(주)직영 | 2 | 4.9%
주유소 | 2 | 4.9%
온새미주유소 | 1 | 2.4%
동탄세마주유소 | 1 | 2.4%
서동탄셀프주유소 | 1 | 2.4%
우영주유소 | 1 | 2.4%
오산세교셀프주유소 | 1 | 2.4%
삼화주유소 | 1 | 2.4%
까막셀프주유소 | 1 | 2.4%
Other values (28) | 28 | 68.3%

Most occurring characters

ValueCountFrequency (%)
38
 
12.1%
32
 
10.2%
31
 
9.9%
) 11
 
3.5%
11
 
3.5%
( 11
 
3.5%
10
 
3.2%
9
 
2.9%
8
 
2.5%
8
 
2.5%
Other values (83) 145
46.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 267 | 85.0%
Close Punctuation | 11 | 3.5%
Open Punctuation | 11 | 3.5%
Space Separator | 8 | 2.5%
Uppercase Letter | 8 | 2.5%
Lowercase Letter | 6 | 1.9%
Other Symbol | 2 | 0.6%
Decimal Number | 1 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
38
 
14.2%
32
 
12.0%
31
 
11.6%
11
 
4.1%
10
 
3.7%
9
 
3.4%
8
 
3.0%
7
 
2.6%
7
 
2.6%
6
 
2.2%
Other values (69) 108
40.4%
Lowercase Letter
Value | Count | Frequency (%)
s | 2 | 33.3%
k | 1 | 16.7%
f | 1 | 16.7%
l | 1 | 16.7%
e | 1 | 16.7%
Uppercase Letter
Value | Count | Frequency (%)
K | 3 | 37.5%
S | 2 | 25.0%
H | 2 | 25.0%
G | 1 | 12.5%
Close Punctuation
Value | Count | Frequency (%)
) | 11 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 11 | 100.0%
Space Separator
ValueCountFrequency (%)
8
100.0%
Other Symbol
ValueCountFrequency (%)
2
100.0%
Decimal Number
ValueCountFrequency (%)
2 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 269 | 85.7%
Common | 31 | 9.9%
Latin | 14 | 4.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
38
 
14.1%
32
 
11.9%
31
 
11.5%
11
 
4.1%
10
 
3.7%
9
 
3.3%
8
 
3.0%
7
 
2.6%
7
 
2.6%
6
 
2.2%
Other values (70) 110
40.9%
Latin
Value | Count | Frequency (%)
K | 3 | 21.4%
s | 2 | 14.3%
S | 2 | 14.3%
H | 2 | 14.3%
k | 1 | 7.1%
f | 1 | 7.1%
l | 1 | 7.1%
e | 1 | 7.1%
G | 1 | 7.1%
Common
ValueCountFrequency (%)
) 11
35.5%
( 11
35.5%
8
25.8%
2 1
 
3.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 267 | 85.0%
ASCII | 45 | 14.3%
None | 2 | 0.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
38
 
14.2%
32
 
12.0%
31
 
11.6%
11
 
4.1%
10
 
3.7%
9
 
3.4%
8
 
3.0%
7
 
2.6%
7
 
2.6%
6
 
2.2%
Other values (69) 108
40.4%
ASCII
ValueCountFrequency (%)
) 11
24.4%
( 11
24.4%
8
17.8%
K 3
 
6.7%
s 2
 
4.4%
S 2
 
4.4%
H 2
 
4.4%
k 1
 
2.2%
f 1
 
2.2%
l 1
 
2.2%
Other values (3) 3
 
6.7%
None
ValueCountFrequency (%)
2
100.0%

성명(법인명) (name or corporate name)
Text

MISSING 

Distinct: 28
Distinct (%): 84.8%
Missing: 9
Missing (%): 21.4%
Memory size: 468.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 99
Distinct characters: 47
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 23
Unique (%): 69.7%

Sample

1st row: 이길준
2nd row: 엄광식
3rd row: 이시영
4th row: 조정득
5th row: 조경목
Value | Count | Frequency (%)
조정득 | 2 | 6.1%
주영민 | 2 | 6.1%
송준원 | 2 | 6.1%
이시영 | 2 | 6.1%
조경목 | 2 | 6.1%
현오승 | 1 | 3.0%
김회관 | 1 | 3.0%
김태형 | 1 | 3.0%
장지수 | 1 | 3.0%
김복선 | 1 | 3.0%
Other values (18) | 18 | 54.5%

Most occurring characters

ValueCountFrequency (%)
8
 
8.1%
5
 
5.1%
5
 
5.1%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
3
 
3.0%
3
 
3.0%
Other values (37) 55
55.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 99 | 100.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8
 
8.1%
5
 
5.1%
5
 
5.1%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
3
 
3.0%
3
 
3.0%
Other values (37) 55
55.6%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 99 | 100.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8
 
8.1%
5
 
5.1%
5
 
5.1%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
3
 
3.0%
3
 
3.0%
Other values (37) 55
55.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 99 | 100.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
8
 
8.1%
5
 
5.1%
5
 
5.1%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
3
 
3.0%
3
 
3.0%
Other values (37) 55
55.6%

사업장소재지 (business location)
Text

MISSING 

Distinct: 33
Distinct (%): 100.0%
Missing: 9
Missing (%): 21.4%
Memory size: 468.0 B

Length

Max length: 16
Median length: 8
Mean length: 8.2424242
Min length: 6

Characters and Unicode

Total characters: 272
Distinct characters: 45
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 33
Unique (%): 100.0%

Sample

1st row: 남부대로 10
2nd row: 외삼미로 162-12
3rd row: 경기대로 587
4th row: 남부대로 354
5th row: 경기대로 317
Value | Count | Frequency (%)
경기대로 | 12 | 18.2%
남부대로 | 5 | 7.6%
문시로 | 2 | 3.0%
동부대로 | 2 | 3.0%
원동로 | 2 | 3.0%
수도권제2순환고속도로 | 2 | 3.0%
129 | 1 | 1.5%
13-1 | 1 | 1.5%
127 | 1 | 1.5%
가장로 | 1 | 1.5%
Other values (37) | 37 | 56.1%

Most occurring characters

ValueCountFrequency (%)
37
 
13.6%
33
 
12.1%
1 20
 
7.4%
19
 
7.0%
4 13
 
4.8%
2 13
 
4.8%
12
 
4.4%
12
 
4.4%
5 9
 
3.3%
7 9
 
3.3%
Other values (35) 95
34.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 135 | 49.6%
Decimal Number | 96 | 35.3%
Space Separator | 37 | 13.6%
Dash Punctuation | 4 | 1.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
33
24.4%
19
14.1%
12
 
8.9%
12
 
8.9%
7
 
5.2%
5
 
3.7%
5
 
3.7%
4
 
3.0%
3
 
2.2%
2
 
1.5%
Other values (23) 33
24.4%
Decimal Number
Value | Count | Frequency (%)
1 | 20 | 20.8%
4 | 13 | 13.5%
2 | 13 | 13.5%
5 | 9 | 9.4%
7 | 9 | 9.4%
6 | 8 | 8.3%
3 | 8 | 8.3%
0 | 7 | 7.3%
9 | 5 | 5.2%
8 | 4 | 4.2%
Space Separator
Value | Count | Frequency (%)
(space) | 37 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 137 | 50.4%
Hangul | 135 | 49.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
33
24.4%
19
14.1%
12
 
8.9%
12
 
8.9%
7
 
5.2%
5
 
3.7%
5
 
3.7%
4
 
3.0%
3
 
2.2%
2
 
1.5%
Other values (23) 33
24.4%
Common
ValueCountFrequency (%)
37
27.0%
1 20
14.6%
4 13
 
9.5%
2 13
 
9.5%
5 9
 
6.6%
7 9
 
6.6%
6 8
 
5.8%
3 8
 
5.8%
0 7
 
5.1%
9 5
 
3.6%
Other values (2) 8
 
5.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 137 | 50.4%
Hangul | 135 | 49.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
37
27.0%
1 20
14.6%
4 13
 
9.5%
2 13
 
9.5%
5 9
 
6.6%
7 9
 
6.6%
6 8
 
5.8%
3 8
 
5.8%
0 7
 
5.1%
9 5
 
3.6%
Other values (2) 8
 
5.8%
Hangul
ValueCountFrequency (%)
33
24.4%
19
14.1%
12
 
8.9%
12
 
8.9%
7
 
5.2%
5
 
3.7%
5
 
3.7%
4
 
3.0%
3
 
2.2%
2
 
1.5%
Other values (23) 33
24.4%

사업장연면적 (total floor area of the business premises)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 32
Distinct (%): 97.0%
Missing: 9
Missing (%): 21.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 577.62121
Minimum: 121
Maximum: 1991
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 510.0 B

Quantile statistics

Minimum: 121
5-th percentile: 134.84
Q1: 253
Median: 392.81
Q3: 666.85
95-th percentile: 1673.4
Maximum: 1991
Range: 1870
Interquartile range (IQR): 413.85

Descriptive statistics

Standard deviation: 515.20237
Coefficient of variation (CV): 0.8919381
Kurtosis: 1.5479665
Mean: 577.62121
Median Absolute Deviation (MAD): 184.19
Skewness: 1.5917176
Sum: 19061.5
Variance: 265433.48
Monotonicity: Not monotonic
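
Several of these statistics are derived from one another, which gives a quick consistency check. The sketch below only uses figures printed in this section and assumes 33 non-missing observations (42 rows minus 9 missing); the variable names are illustrative.

```python
# Figures copied from the statistics for this variable.
minimum, maximum = 121.0, 1991.0
q1, q3 = 253.0, 666.85
total = 19061.5        # Sum
std = 515.20237        # Standard deviation

# Assumption: 42 observations minus 9 missing values.
n = 42 - 9

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
mean = total / n                  # Mean
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print("Range:", value_range)
print("IQR: %.2f" % iqr)
print("Mean: %.5f" % mean)
print("CV: %.7f" % cv)
print("Variance: %.2f" % variance)
```

Each derived value agrees with the reported one up to rounding of the inputs.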
Histogram with fixed size bins (bins=32)
Value | Count | Frequency (%)
432.77 | 2 | 4.8%
269.64 | 1 | 2.4%
577.0 | 1 | 2.4%
1991.0 | 1 | 2.4%
993.0 | 1 | 2.4%
1385.0 | 1 | 2.4%
1848.0 | 1 | 2.4%
1524.0 | 1 | 2.4%
666.85 | 1 | 2.4%
1557.0 | 1 | 2.4%
Other values (22) | 22 | 52.4%
(Missing) | 9 | 21.4%
Minimum 10 values
Value | Count | Frequency (%)
121.0 | 1 | 2.4%
127.1 | 1 | 2.4%
140.0 | 1 | 2.4%
140.8 | 1 | 2.4%
185.0 | 1 | 2.4%
186.14 | 1 | 2.4%
199.59 | 1 | 2.4%
235.5 | 1 | 2.4%
253.0 | 1 | 2.4%
269.64 | 1 | 2.4%
Maximum 10 values
Value | Count | Frequency (%)
1991.0 | 1 | 2.4%
1848.0 | 1 | 2.4%
1557.0 | 1 | 2.4%
1524.0 | 1 | 2.4%
1385.0 | 1 | 2.4%
993.0 | 1 | 2.4%
855.72 | 1 | 2.4%
781.2 | 1 | 2.4%
666.85 | 1 | 2.4%
577.0 | 1 | 2.4%

Interactions


Correlations

Variable | 연번 | 상호 | 성명(법인명) | 사업장소재지 | 사업장연면적
연번 | 1.000 | 1.000 | 0.850 | 1.000 | 0.000
상호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
성명(법인명) | 0.850 | 1.000 | 1.000 | 1.000 | 0.000
사업장소재지 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
사업장연면적 | 0.000 | 1.000 | 0.000 | 1.000 | 1.000
Variable | 연번 | 사업장연면적 | 업종
연번 | 1.000 | 0.356 | 1.000
사업장연면적 | 0.356 | 1.000 | 1.000
업종 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

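Index | 연번 | 업종 | 상호 | 성명(법인명) | 사업장소재지 | 사업장연면적
0 | 1 | 주유소 | (주)양지 남부대로주유소 | 이길준 | 남부대로 10 | 283.0
1 | 2 | 주유소 | 동탄세마주유소 | 엄광식 | 외삼미로 162-12 | 253.0
2 | 3 | 주유소 | 대명에너지(해솔) | 이시영 | 경기대로 587 | 781.2
3 | 4 | 주유소 | ㈜다원에너지직영 영일주유소 | 조정득 | 남부대로 354 | 408.54
4 | 5 | 주유소 | SK에너지(주)운암뜰 주유소 | 조경목 | 경기대로 317 | 379.05
5 | 6 | 주유소 | 차사랑주유소 | 이희광 | 경기대로 296 | 480.6
6 | 7 | 주유소 | 오산태양주유소 | 채선일 | 동부대로 574 | 392.81
7 | 8 | 주유소 | 한일주유소 | 김연분 | 원동로 47 | 235.5
8 | 9 | 주유소 | 오산제일주유소 | 김경수 | 원동로 74 | 855.72
9 | 10 | 주유소 | 온새미주유소 | 백주현 | 남부대로 482 | 354.37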
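Index | 연번 | 업종 | 상호 | 성명(법인명) | 사업장소재지 | 사업장연면적
32 | 33 | 주유소 | 대복제2주유소 | 김태형 | 남부대로 511 | 577.0
33 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
34 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
35 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
36 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
37 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
38 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
39 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
40 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
41 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>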

Duplicate rows

Most frequently occurring

Index | 연번 | 업종 | 상호 | 성명(법인명) | 사업장소재지 | 사업장연면적 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9
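
The single duplicated row is the all-missing row, which occurs 9 times. A minimal sketch of how such repeats can be counted, using a hypothetical stand-in for the table (two rows taken from the sample above plus nine empty rows, with None marking a missing cell); it illustrates the idea and is not the profiling tool's own implementation:

```python
from collections import Counter

# Hypothetical stand-in data: two real rows from the sample plus nine
# all-missing rows. None marks a missing cell.
rows = [
    (1, "주유소", "(주)양지 남부대로주유소", "이길준", "남부대로 10", 283.0),
    (2, "주유소", "동탄세마주유소", "엄광식", "외삼미로 162-12", 253.0),
] + [(None,) * 6] * 9

# Count how often each complete row occurs and keep the repeated ones.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, "occurs", n, "times")
```

Removing the empty trailing rows before profiling would likely clear both the duplicate alert and the 21.4% missing-value alerts, since all of them trace back to the same 9 rows.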