Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 19966
Missing cells (%): 66.6%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
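The two derived figures in these statistics can be checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 3
n_observations = 10_000
missing_cells = 19_966
total_size_kib = 312.5

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both values agree with the reported 66.6% and 32.0 B.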

Variable types

Categorical: 1
Text: 2

Dataset

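Description: Data on the designation status of tobacco retailers in Gangjin-gun, Jeollanam-do. It provides the complaint category (on-premises/general), business name, business address, and business contact information.
URL: https://www.data.go.kr/data/15035628/fileData.do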

Alerts

Dataset has 1 (< 0.1%) duplicate rows [Duplicates]
민원구분(구내_일반 등) is highly imbalanced (98.2%) [Imbalance]
업소명 has 9983 (99.8%) missing values [Missing]
업소 주소 has 9983 (99.8%) missing values [Missing]
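The 98.2% imbalance figure is consistent with a score defined as one minus the normalized Shannon entropy of the class frequencies. The report does not state the formula, so the sketch below is an assumption that happens to reproduce the number from the two class counts (9983 and 17); the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts.

    Assumed definition: 0 means perfectly balanced classes, values near 1
    mean one class dominates. A single class is scored 0 in this sketch.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# Class counts taken from the common-values table of the categorical variable.
score = imbalance_score([9983, 17])
print(f"{score:.1%}")
```

With these counts the score rounds to 98.2%, matching the alert.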

Reproduction

Analysis started: 2023-12-12 23:27:03.769557
Analysis finished: 2023-12-12 23:27:04.156648
Duration: 0.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

민원구분(구내_일반 등)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
<NA>: 9983
일반소매인: 17

Length

Max length: 5
Median length: 4
Mean length: 4.0017
Min length: 4
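The length statistics follow directly from the two category values and their counts. The sketch below assumes that `<NA>` is measured as a literal four-character string, which is the reading that matches the reported minimum length of 4.

```python
# Value counts copied from this variable's summary.
# Assumption: "<NA>" is treated as a 4-character string when lengths are taken.
value_counts = {"<NA>": 9983, "일반소매인": 17}

total = sum(value_counts.values())
mean_length = sum(len(value) * n for value, n in value_counts.items()) / total
min_length = min(len(value) for value in value_counts)
max_length = max(len(value) for value in value_counts)

print(min_length, max_length, mean_length)
```

The weighted mean works out to (4 × 9983 + 5 × 17) / 10000 = 4.0017, in line with the reported value.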

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 9983 | 99.8%
일반소매인 | 17 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]
Value | Count | Frequency (%)
na | 9983 | 99.8%
일반소매인 | 17 | 0.2%

업소명
Text

MISSING 

Distinct: 17
Distinct (%): 100.0%
Missing: 9983
Missing (%): 99.8%
Memory size: 156.2 KiB

Length

Max length: 19
Median length: 4
Mean length: 6.7058824
Min length: 4

Characters and Unicode

Total characters: 114
Distinct characters: 64
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
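As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses the five sample values shown for this variable rather than the full column, and the helper name `category_counts` is hypothetical; it is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category across a list of strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            # unicodedata.category returns a two-letter code, e.g. "Lo" or "Zs".
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample rows listed for this variable (not the full column).
sample = ["신흥마트", "병영슈퍼", "씨유 강진마량점", "목로상점", "강진슈퍼"]
counts = category_counts(sample)
print(counts)
```

The two-letter codes map onto the category names used in this report: `Lo` is Other Letter, `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Zs` is Space Separator, and `Pd` is Dash Punctuation.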

Unique

Unique: 17
Unique (%): 100.0%

Sample

1st row: 신흥마트
2nd row: 병영슈퍼
3rd row: 씨유 강진마량점
4th row: 목로상점
5th row: 강진슈퍼
Value | Count | Frequency (%)
이마트24 | 2 | 8.3%
신흥마트 | 1 | 4.2%
강진아뜨리움점 | 1 | 4.2%
장안식당 | 1 | 4.2%
정민마트 | 1 | 4.2%
도암연쇄점 | 1 | 4.2%
주작마트 | 1 | 4.2%
강진lh점 | 1 | 4.2%
cu | 1 | 4.2%
월궁정류소 | 1 | 4.2%
Other values (13) | 13 | 54.2%

Most occurring characters

ValueCountFrequency (%)
8
 
7.0%
7
 
6.1%
7
 
6.1%
7
 
6.1%
6
 
5.3%
5
 
4.4%
2 4
 
3.5%
3
 
2.6%
3
 
2.6%
2
 
1.8%
Other values (54) 62
54.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 88 | 77.2%
Uppercase Letter | 11 | 9.6%
Decimal Number | 8 | 7.0%
Space Separator | 7 | 6.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8
 
9.1%
7
 
8.0%
7
 
8.0%
6
 
6.8%
5
 
5.7%
3
 
3.4%
3
 
3.4%
2
 
2.3%
2
 
2.3%
2
 
2.3%
Other values (40) 43
48.9%
Uppercase Letter
Value | Count | Frequency (%)
S | 2 | 18.2%
C | 1 | 9.1%
U | 1 | 9.1%
L | 1 | 9.1%
H | 1 | 9.1%
G | 1 | 9.1%
M | 1 | 9.1%
A | 1 | 9.1%
R | 1 | 9.1%
T | 1 | 9.1%
Decimal Number
Value | Count | Frequency (%)
2 | 4 | 50.0%
5 | 2 | 25.0%
4 | 2 | 25.0%
Space Separator
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 88 | 77.2%
Common | 15 | 13.2%
Latin | 11 | 9.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8
 
9.1%
7
 
8.0%
7
 
8.0%
6
 
6.8%
5
 
5.7%
3
 
3.4%
3
 
3.4%
2
 
2.3%
2
 
2.3%
2
 
2.3%
Other values (40) 43
48.9%
Latin
Value | Count | Frequency (%)
S | 2 | 18.2%
C | 1 | 9.1%
U | 1 | 9.1%
L | 1 | 9.1%
H | 1 | 9.1%
G | 1 | 9.1%
M | 1 | 9.1%
A | 1 | 9.1%
R | 1 | 9.1%
T | 1 | 9.1%
Common
Value | Count | Frequency (%)
(space) | 7 | 46.7%
2 | 4 | 26.7%
5 | 2 | 13.3%
4 | 2 | 13.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 88 | 77.2%
ASCII | 26 | 22.8%

Most frequent character per block

Hangul
ValueCountFrequency (%)
8
 
9.1%
7
 
8.0%
7
 
8.0%
6
 
6.8%
5
 
5.7%
3
 
3.4%
3
 
3.4%
2
 
2.3%
2
 
2.3%
2
 
2.3%
Other values (40) 43
48.9%
ASCII
Value | Count | Frequency (%)
(space) | 7 | 26.9%
2 | 4 | 15.4%
5 | 2 | 7.7%
S | 2 | 7.7%
4 | 2 | 7.7%
C | 1 | 3.8%
U | 1 | 3.8%
L | 1 | 3.8%
H | 1 | 3.8%
G | 1 | 3.8%
Other values (4) | 4 | 15.4%

업소 주소
Text

MISSING 

Distinct: 17
Distinct (%): 100.0%
Missing: 9983
Missing (%): 99.8%
Memory size: 156.2 KiB

Length

Max length: 23
Median length: 22
Mean length: 20.117647
Min length: 18

Characters and Unicode

Total characters: 342
Distinct characters: 56
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 100.0%

Sample

1st row: 전라남도 강진군 강진읍 보은로4길 7
2nd row: 전라남도 강진군 병영면 병영성로 107-6
3rd row: 전라남도 강진군 마량면 미항로 137
4th row: 전라남도 강진군 성전면 무위사로 2
5th row: 전라남도 강진군 강진읍 중앙로 152-2
Value | Count | Frequency (%)
전라남도 | 17 | 20.0%
강진군 | 17 | 20.0%
강진읍 | 7 | 8.2%
칠량면 | 3 | 3.5%
성전면 | 3 | 3.5%
칠량로 | 2 | 2.4%
영랑로 | 2 | 2.4%
18 | 1 | 1.2%
77 | 1 | 1.2%
월하안운길 | 1 | 1.2%
Other values (31) | 31 | 36.5%

Most occurring characters

ValueCountFrequency (%)
68
19.9%
25
 
7.3%
24
 
7.0%
21
 
6.1%
20
 
5.8%
17
 
5.0%
17
 
5.0%
17
 
5.0%
14
 
4.1%
10
 
2.9%
Other values (46) 109
31.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 229 | 67.0%
Space Separator | 68 | 19.9%
Decimal Number | 42 | 12.3%
Dash Punctuation | 3 | 0.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
25
10.9%
24
10.5%
21
 
9.2%
20
 
8.7%
17
 
7.4%
17
 
7.4%
17
 
7.4%
14
 
6.1%
10
 
4.4%
7
 
3.1%
Other values (34) 57
24.9%
Decimal Number
Value | Count | Frequency (%)
1 | 10 | 23.8%
2 | 6 | 14.3%
7 | 6 | 14.3%
5 | 5 | 11.9%
3 | 4 | 9.5%
8 | 3 | 7.1%
0 | 3 | 7.1%
6 | 2 | 4.8%
9 | 2 | 4.8%
4 | 1 | 2.4%
Space Separator
Value | Count | Frequency (%)
(space) | 68 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 229 | 67.0%
Common | 113 | 33.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
25
10.9%
24
10.5%
21
 
9.2%
20
 
8.7%
17
 
7.4%
17
 
7.4%
17
 
7.4%
14
 
6.1%
10
 
4.4%
7
 
3.1%
Other values (34) 57
24.9%
Common
Value | Count | Frequency (%)
(space) | 68 | 60.2%
1 | 10 | 8.8%
2 | 6 | 5.3%
7 | 6 | 5.3%
5 | 5 | 4.4%
3 | 4 | 3.5%
8 | 3 | 2.7%
- | 3 | 2.7%
0 | 3 | 2.7%
6 | 2 | 1.8%
Other values (2) | 3 | 2.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 229 | 67.0%
ASCII | 113 | 33.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 68 | 60.2%
1 | 10 | 8.8%
2 | 6 | 5.3%
7 | 6 | 5.3%
5 | 5 | 4.4%
3 | 4 | 3.5%
8 | 3 | 2.7%
- | 3 | 2.7%
0 | 3 | 2.7%
6 | 2 | 1.8%
Other values (2) | 3 | 2.7%
Hangul
ValueCountFrequency (%)
25
10.9%
24
10.5%
21
 
9.2%
20
 
8.7%
17
 
7.4%
17
 
7.4%
17
 
7.4%
14
 
6.1%
10
 
4.4%
7
 
3.1%
Other values (34) 57
24.9%

Correlations

[Figure: correlation heatmap]
 | 업소명 | 업소 주소
업소명 | 1.000 | 1.000
업소 주소 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
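Nullity correlation can be sketched as the Pearson correlation between the "is missing" indicators of two columns. The example below is a minimal pure-Python version under that assumption; the function name `nullity_correlation` and the toy rows are hypothetical and are not taken from the full dataset.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the is-missing indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is never missing or always missing.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy rows in which both columns are missing together.
names = ["신흥마트", None, None, "병영슈퍼", None]
addresses = [
    "전라남도 강진군 강진읍 보은로4길 7",
    None,
    None,
    "전라남도 강진군 병영면 병영성로 107-6",
    None,
]
print(nullity_correlation(names, addresses))
```

When two columns are always missing in the same rows, as 업소명 and 업소 주소 appear to be here (9983 missing values each), the result is approximately 1.0. A column with no missing values has no variance in its indicator, so its nullity correlation is undefined.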

Sample

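(index) | 민원구분(구내_일반 등) | 업소명 | 업소 주소
11154 | <NA> | <NA> | <NA>
43745 | <NA> | <NA> | <NA>
19815 | <NA> | <NA> | <NA>
50978 | <NA> | <NA> | <NA>
32344 | <NA> | <NA> | <NA>
35204 | <NA> | <NA> | <NA>
20613 | <NA> | <NA> | <NA>
23003 | <NA> | <NA> | <NA>
29106 | <NA> | <NA> | <NA>
14359 | <NA> | <NA> | <NA>

(index) | 민원구분(구내_일반 등) | 업소명 | 업소 주소
56114 | <NA> | <NA> | <NA>
37412 | <NA> | <NA> | <NA>
9845 | <NA> | <NA> | <NA>
47523 | <NA> | <NA> | <NA>
61999 | <NA> | <NA> | <NA>
60590 | <NA> | <NA> | <NA>
30719 | <NA> | <NA> | <NA>
14197 | <NA> | <NA> | <NA>
4044 | <NA> | <NA> | <NA>
55389 | <NA> | <NA> | <NA>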

Duplicate rows

Most frequently occurring

(index) | 민원구분(구내_일반 등) | 업소명 | 업소 주소 | # duplicates
0 | <NA> | <NA> | <NA> | 9983