Overview

Dataset statistics

Number of variables: 5
Number of observations: 644
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 25.9 KiB
Average record size in memory: 41.2 B
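
The average record size follows from the total in-memory size divided by the number of observations. A minimal arithmetic check, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 25.9
n_observations = 644

# Convert KiB to bytes (1 KiB = 1024 B) and divide by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints "41.2 B"
```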

Variable types

Numeric: 1
Categorical: 1
Text: 3

Dataset

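Description: Tobacco retailer data for Seosan-si, Chungcheongnam-do. The fields include 연번 (serial number), 소매인구분 (retailer type), 업소명 (business name), 업소지번주소 (business lot-number address), and 업소도로명주소 (business road-name address).
URL: https://www.data.go.kr/data/15113318/fileData.do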

Alerts

소매인구분 is highly imbalanced (65.4%) [Imbalance]
연번 has unique values [Unique]
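
The 65.4% imbalance figure is consistent with an entropy-based score, 1 - H / log(k), applied to the three category counts reported for 소매인구분 (565, 78, 1). The sketch below is an assumption about how the alert value is derived, checked only against the counts in this report; the function name is illustrative.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes (0 = balanced, 1 = one class)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumed convention: a single class has no defined imbalance
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Category counts of 소매인구분 as listed in this report.
counts = [565, 78, 1]
score = imbalance_score(counts)
print(f"{score:.1%}")  # prints "65.4%"
```

A perfectly balanced column gives a score of 0, so values near 1 indicate that a single category dominates.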

Reproduction

Analysis started: 2023-12-12 10:48:17.813440
Analysis finished: 2023-12-12 10:48:18.881310
Duration: 1.07 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
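
The reported duration can be recomputed from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2023-12-12 10:48:17.813440")
finished = datetime.fromisoformat("2023-12-12 10:48:18.881310")

# Subtracting two datetimes yields a timedelta.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # prints "1.07 seconds"
```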

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 644
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 322.5
Minimum: 1
Maximum: 644
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 33.15
Q1: 161.75
Median: 322.5
Q3: 483.25
95-th percentile: 611.85
Maximum: 644
Range: 643
Interquartile range (IQR): 321.5
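
These quantiles match what linear interpolation gives for the integers 1 to 644. The sketch below assumes that 연번 is exactly that sequence (consistent with the reported minimum, maximum, distinct count, and sum) and that quantiles are interpolated linearly between neighbouring order statistics; the helper name is illustrative.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

values = list(range(1, 645))  # assumed: 연번 runs 1..644 with no gaps

p5 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
p95 = quantile(values, 0.95)
iqr = q3 - q1  # interquartile range
```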

Descriptive statistics

Standard deviation: 186.05107
Coefficient of variation (CV): 0.57690254
Kurtosis: -1.2
Mean: 322.5
Median Absolute Deviation (MAD): 161
Skewness: 0
Sum: 207690
Variance: 34615
Monotonicity: Strictly increasing
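
Most of these figures can be reproduced with the standard library, again assuming that 연번 is the sequence 1 to 644. The variance here uses the sample (n - 1) denominator, which is what matches the reported value of 34615.

```python
import statistics

values = [float(v) for v in range(1, 645)]  # assumed: 연번 runs 1..644

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)          # square root of the sample variance
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(v - median) for v in values)
total = sum(values)
```

Skewness is 0 because the sequence is symmetric around its mean, and a kurtosis near -1.2 is the value expected for a uniform distribution.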
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | 0.2%
405 | 1 | 0.2%
427 | 1 | 0.2%
428 | 1 | 0.2%
429 | 1 | 0.2%
430 | 1 | 0.2%
431 | 1 | 0.2%
432 | 1 | 0.2%
433 | 1 | 0.2%
434 | 1 | 0.2%
Other values (634) | 634 | 98.4%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
3 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
644 | 1 | 0.2%
643 | 1 | 0.2%
642 | 1 | 0.2%
641 | 1 | 0.2%
640 | 1 | 0.2%
639 | 1 | 0.2%
638 | 1 | 0.2%
637 | 1 | 0.2%
636 | 1 | 0.2%
635 | 1 | 0.2%

소매인구분
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Value | Count
일반소매인 | 565
구내소매인 | 78
자동판매기 | 1

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 일반소매인
2nd row: 일반소매인
3rd row: 일반소매인
4th row: 일반소매인
5th row: 구내소매인

Common Values

Value | Count | Frequency (%)
일반소매인 | 565 | 87.7%
구내소매인 | 78 | 12.1%
자동판매기 | 1 | 0.2%
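
Each frequency is the category count divided by the 644 observations. A minimal sketch that rebuilds the column from the counts above and recomputes the shares (the column is synthetic, reconstructed from the counts only):

```python
from collections import Counter

# Rebuild the column from the counts in the table above (values kept in Korean).
column = ["일반소매인"] * 565 + ["구내소매인"] * 78 + ["자동판매기"] * 1

counts = Counter(column)
n = len(column)

# Print the table ordered from most to least common value.
for value, count in counts.most_common():
    print(f"{value}\t{count}\t{count / n:.1%}")

# Percentage share per category, rounded to one decimal place.
shares = {value: round(100 * count / n, 1) for value, count in counts.items()}
```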

Length

Histogram of lengths of the category

업소명
Text

Distinct: 619
Distinct (%): 96.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 22
Median length: 18
Mean length: 8.1459627
Min length: 1

Characters and Unicode

Total characters: 5246
Distinct characters: 416
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
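
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's `unicodedata` module, using the first sample row of this variable. It covers categories only; the standard library does not expose script or block properties, so those parts of the report are not reproduced here.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo' = Other Letter)."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "(주)코리아세븐 서산석림타운점"  # first sample row of this variable
counts = category_counts(sample)

# Lo = Other Letter, Zs = Space Separator, Ps/Pe = Open/Close Punctuation
for category, count in counts.most_common():
    print(category, count)
```

Hangul syllables fall under Other Letter (`Lo`), which is why that category dominates the tables for the text columns.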

Unique

Unique: 606
Unique (%): 94.1%
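
In this report "Distinct" appears to count the different values in a column, while "Unique" counts the values that occur exactly once (for 소매인구분, 3 distinct values but only 1 unique, the category with a single row). A minimal sketch of that distinction on toy data; the helper name and the sample list are hypothetical:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

names = ["씨유", "씨유", "gs25", "하나로마트"]  # toy data, not the real column
distinct, unique = distinct_and_unique(names)
print(distinct, unique)  # prints "3 2"
```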

Sample

1st row: (주)코리아세븐 서산석림타운점
2nd row: (주)코리아세븐 서산수석점
3rd row: 이마트24 서산일람점
4th row: (주)현대그린푸드 서산주행시험장
5th row: (주)현대그린푸드오일뱅크

Value | Count | Frequency (%)
씨유 | 53 | 5.8%
세븐일레븐 | 43 | 4.7%
이마트24 | 29 | 3.2%
지에스25 | 16 | 1.7%
주)코리아세븐 | 13 | 1.4%
gs25 | 12 | 1.3%
지에스(gs)25 | 12 | 1.3%
하나로마트 | 6 | 0.7%
미니스톱 | 5 | 0.5%
서산예천점 | 4 | 0.4%
Other values (650) | 723 | 78.9%

Most occurring characters

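Value | Count | Frequency (%)
(not preserved) | 296 | 5.6%
(space) | 294 | 5.6%
(not preserved) | 287 | 5.5%
(not preserved) | 274 | 5.2%
(not preserved) | 131 | 2.5%
(not preserved) | 131 | 2.5%
(not preserved) | 108 | 2.1%
2 | 98 | 1.9%
(not preserved) | 97 | 1.8%
(not preserved) | 96 | 1.8%
Other values (406) | 3434 | 65.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4468 | 85.2%
Space Separator | 294 | 5.6%
Decimal Number | 201 | 3.8%
Uppercase Letter | 113 | 2.2%
Close Punctuation | 68 | 1.3%
Open Punctuation | 67 | 1.3%
Lowercase Letter | 30 | 0.6%
Other Punctuation | 4 | 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 296 | 6.6%
(not preserved) | 287 | 6.4%
(not preserved) | 274 | 6.1%
(not preserved) | 131 | 2.9%
(not preserved) | 131 | 2.9%
(not preserved) | 108 | 2.4%
(not preserved) | 97 | 2.2%
(not preserved) | 96 | 2.1%
(not preserved) | 85 | 1.9%
(not preserved) | 80 | 1.8%
Other values (358) | 2883 | 64.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 7 | 23.3%
l | 3 | 10.0%
a | 3 | 10.0%
i | 2 | 6.7%
t | 2 | 6.7%
c | 1 | 3.3%
u | 1 | 3.3%
h | 1 | 3.3%
o | 1 | 3.3%
x | 1 | 3.3%
Other values (8) | 8 | 26.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 37 | 32.7%
G | 35 | 31.0%
C | 14 | 12.4%
U | 9 | 8.0%
D | 4 | 3.5%
V | 2 | 1.8%
K | 2 | 1.8%
B | 2 | 1.8%
I | 2 | 1.8%
X | 1 | 0.9%
Other values (5) | 5 | 4.4%

Decimal Number
Value | Count | Frequency (%)
2 | 98 | 48.8%
5 | 53 | 26.4%
4 | 39 | 19.4%
8 | 3 | 1.5%
1 | 3 | 1.5%
9 | 2 | 1.0%
3 | 1 | 0.5%
0 | 1 | 0.5%
7 | 1 | 0.5%

Other Punctuation
Value | Count | Frequency (%)
, | 3 | 75.0%
: | 1 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 294 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 68 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 67 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4468 | 85.2%
Common | 635 | 12.1%
Latin | 143 | 2.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 296 | 6.6%
(not preserved) | 287 | 6.4%
(not preserved) | 274 | 6.1%
(not preserved) | 131 | 2.9%
(not preserved) | 131 | 2.9%
(not preserved) | 108 | 2.4%
(not preserved) | 97 | 2.2%
(not preserved) | 96 | 2.1%
(not preserved) | 85 | 1.9%
(not preserved) | 80 | 1.8%
Other values (358) | 2883 | 64.5%

Latin
Value | Count | Frequency (%)
S | 37 | 25.9%
G | 35 | 24.5%
C | 14 | 9.8%
U | 9 | 6.3%
e | 7 | 4.9%
D | 4 | 2.8%
l | 3 | 2.1%
a | 3 | 2.1%
V | 2 | 1.4%
K | 2 | 1.4%
Other values (23) | 27 | 18.9%

Common
Value | Count | Frequency (%)
(space) | 294 | 46.3%
2 | 98 | 15.4%
) | 68 | 10.7%
( | 67 | 10.6%
5 | 53 | 8.3%
4 | 39 | 6.1%
8 | 3 | 0.5%
, | 3 | 0.5%
1 | 3 | 0.5%
9 | 2 | 0.3%
Other values (5) | 5 | 0.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4468 | 85.2%
ASCII | 778 | 14.8%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 296 | 6.6%
(not preserved) | 287 | 6.4%
(not preserved) | 274 | 6.1%
(not preserved) | 131 | 2.9%
(not preserved) | 131 | 2.9%
(not preserved) | 108 | 2.4%
(not preserved) | 97 | 2.2%
(not preserved) | 96 | 2.1%
(not preserved) | 85 | 1.9%
(not preserved) | 80 | 1.8%
Other values (358) | 2883 | 64.5%

ASCII
Value | Count | Frequency (%)
(space) | 294 | 37.8%
2 | 98 | 12.6%
) | 68 | 8.7%
( | 67 | 8.6%
5 | 53 | 6.8%
4 | 39 | 5.0%
S | 37 | 4.8%
G | 35 | 4.5%
C | 14 | 1.8%
U | 9 | 1.2%
Other values (38) | 64 | 8.2%
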
업소지번주소
Text

Distinct: 544
Distinct (%): 84.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 46
Median length: 40
Mean length: 20.218944
Min length: 1

Characters and Unicode

Total characters: 13021
Distinct characters: 242
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 538
Unique (%): 83.5%

Sample

1st row: 충청남도 서산시 석림동 695-4
2nd row: 충청남도 서산시 석림동 499-4
3rd row: 충청남도 서산시 성연면 일람리 788
4th row: 충청남도 서산시 부석면 갈마리 727
5th row: 충청남도 서산시 대산읍 화곡리 987-1 현대오일뱅크사원연립주택

Value | Count | Frequency (%)
충청남도 | 548 | 18.7%
서산시 | 548 | 18.7%
대산읍 | 71 | 2.4%
동문동 | 70 | 2.4%
1호 | 57 | 1.9%
읍내동 | 52 | 1.8%
해미면 | 48 | 1.6%
예천동 | 41 | 1.4%
석림동 | 37 | 1.3%
2호 | 31 | 1.1%
Other values (770) | 1431 | 48.8%

Most occurring characters

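Value | Count | Frequency (%)
(space) | 2482 | 19.1%
(not preserved) | 707 | 5.4%
(not preserved) | 572 | 4.4%
(not preserved) | 565 | 4.3%
(not preserved) | 562 | 4.3%
(not preserved) | 551 | 4.2%
(not preserved) | 551 | 4.2%
(not preserved) | 549 | 4.2%
1 | 526 | 4.0%
(not preserved) | 398 | 3.1%
Other values (232) | 5558 | 42.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8049 | 61.8%
Space Separator | 2482 | 19.1%
Decimal Number | 2314 | 17.8%
Dash Punctuation | 137 | 1.1%
Other Punctuation | 16 | 0.1%
Uppercase Letter | 15 | 0.1%
Close Punctuation | 3 | < 0.1%
Open Punctuation | 3 | < 0.1%
Math Symbol | 1 | < 0.1%
Lowercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 707 | 8.8%
(not preserved) | 572 | 7.1%
(not preserved) | 565 | 7.0%
(not preserved) | 562 | 7.0%
(not preserved) | 551 | 6.8%
(not preserved) | 551 | 6.8%
(not preserved) | 549 | 6.8%
(not preserved) | 398 | 4.9%
(not preserved) | 396 | 4.9%
(not preserved) | 351 | 4.4%
Other values (209) | 2847 | 35.4%

Decimal Number
Value | Count | Frequency (%)
1 | 526 | 22.7%
2 | 284 | 12.3%
3 | 227 | 9.8%
6 | 202 | 8.7%
0 | 196 | 8.5%
5 | 195 | 8.4%
4 | 194 | 8.4%
9 | 172 | 7.4%
7 | 172 | 7.4%
8 | 146 | 6.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 5 | 33.3%
P | 3 | 20.0%
T | 3 | 20.0%
S | 2 | 13.3%
C | 1 | 6.7%
B | 1 | 6.7%

Space Separator
Value | Count | Frequency (%)
(space) | 2482 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 137 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 16 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 1 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8049 | 61.8%
Common | 4956 | 38.1%
Latin | 16 | 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 707 | 8.8%
(not preserved) | 572 | 7.1%
(not preserved) | 565 | 7.0%
(not preserved) | 562 | 7.0%
(not preserved) | 551 | 6.8%
(not preserved) | 551 | 6.8%
(not preserved) | 549 | 6.8%
(not preserved) | 398 | 4.9%
(not preserved) | 396 | 4.9%
(not preserved) | 351 | 4.4%
Other values (209) | 2847 | 35.4%

Common
Value | Count | Frequency (%)
(space) | 2482 | 50.1%
1 | 526 | 10.6%
2 | 284 | 5.7%
3 | 227 | 4.6%
6 | 202 | 4.1%
0 | 196 | 4.0%
5 | 195 | 3.9%
4 | 194 | 3.9%
9 | 172 | 3.5%
7 | 172 | 3.5%
Other values (6) | 306 | 6.2%

Latin
Value | Count | Frequency (%)
A | 5 | 31.2%
P | 3 | 18.8%
T | 3 | 18.8%
S | 2 | 12.5%
C | 1 | 6.2%
B | 1 | 6.2%
e | 1 | 6.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8049 | 61.8%
ASCII | 4972 | 38.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2482 | 49.9%
1 | 526 | 10.6%
2 | 284 | 5.7%
3 | 227 | 4.6%
6 | 202 | 4.1%
0 | 196 | 3.9%
5 | 195 | 3.9%
4 | 194 | 3.9%
9 | 172 | 3.5%
7 | 172 | 3.5%
Other values (13) | 322 | 6.5%

Hangul
Value | Count | Frequency (%)
(not preserved) | 707 | 8.8%
(not preserved) | 572 | 7.1%
(not preserved) | 565 | 7.0%
(not preserved) | 562 | 7.0%
(not preserved) | 551 | 6.8%
(not preserved) | 551 | 6.8%
(not preserved) | 549 | 6.8%
(not preserved) | 398 | 4.9%
(not preserved) | 396 | 4.9%
(not preserved) | 351 | 4.4%
Other values (209) | 2847 | 35.4%
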
업소도로명주소
Text

Distinct: 572
Distinct (%): 88.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 60
Median length: 52
Mean length: 23.454969
Min length: 1

Characters and Unicode

Total characters: 15105
Distinct characters: 271
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 566
Unique (%): 87.9%

Sample

1st row: 충청남도 서산시 한마음16로 36 (석림동)
2nd row: 충청남도 서산시 동서1로 17, 1층 (석림동)
3rd row: 충청남도 서산시 성연면 충의로 400
4th row: 충청남도 서산시 부석면 부남1로 67
5th row: 충청남도 서산시 대산읍 평신2로 26, 2층 (현대오일뱅크사원연립주택)

Value | Count | Frequency (%)
충청남도 | 576 | 17.8%
서산시 | 576 | 17.8%
1층 | 95 | 2.9%
동문동 | 79 | 2.4%
대산읍 | 69 | 2.1%
읍내동 | 52 | 1.6%
해미면 | 46 | 1.4%
예천동 | 42 | 1.3%
성연면 | 35 | 1.1%
충의로 | 34 | 1.0%
Other values (770) | 1637 | 50.5%

Most occurring characters

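Value | Count | Frequency (%)
(space) | 2733 | 18.1%
(not preserved) | 709 | 4.7%
1 | 702 | 4.6%
(not preserved) | 658 | 4.4%
(not preserved) | 624 | 4.1%
(not preserved) | 611 | 4.0%
(not preserved) | 593 | 3.9%
(not preserved) | 591 | 3.9%
(not preserved) | 579 | 3.8%
(not preserved) | 520 | 3.4%
Other values (261) | 6785 | 44.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8836 | 58.5%
Space Separator | 2733 | 18.1%
Decimal Number | 2442 | 16.2%
Open Punctuation | 345 | 2.3%
Close Punctuation | 345 | 2.3%
Other Punctuation | 274 | 1.8%
Dash Punctuation | 112 | 0.7%
Uppercase Letter | 13 | 0.1%
Math Symbol | 4 | < 0.1%
Lowercase Letter | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 709 | 8.0%
(not preserved) | 658 | 7.4%
(not preserved) | 624 | 7.1%
(not preserved) | 611 | 6.9%
(not preserved) | 593 | 6.7%
(not preserved) | 591 | 6.7%
(not preserved) | 579 | 6.6%
(not preserved) | 520 | 5.9%
(not preserved) | 482 | 5.5%
(not preserved) | 208 | 2.4%
Other values (240) | 3261 | 36.9%

Decimal Number
Value | Count | Frequency (%)
1 | 702 | 28.7%
2 | 323 | 13.2%
3 | 278 | 11.4%
0 | 224 | 9.2%
4 | 198 | 8.1%
6 | 171 | 7.0%
7 | 167 | 6.8%
5 | 157 | 6.4%
8 | 118 | 4.8%
9 | 104 | 4.3%

Uppercase Letter
Value | Count | Frequency (%)
B | 5 | 38.5%
A | 4 | 30.8%
C | 2 | 15.4%
S | 2 | 15.4%

Space Separator
Value | Count | Frequency (%)
(space) | 2733 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 345 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 345 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 274 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 112 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 4 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8836 | 58.5%
Common | 6255 | 41.4%
Latin | 14 | 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 709 | 8.0%
(not preserved) | 658 | 7.4%
(not preserved) | 624 | 7.1%
(not preserved) | 611 | 6.9%
(not preserved) | 593 | 6.7%
(not preserved) | 591 | 6.7%
(not preserved) | 579 | 6.6%
(not preserved) | 520 | 5.9%
(not preserved) | 482 | 5.5%
(not preserved) | 208 | 2.4%
Other values (240) | 3261 | 36.9%

Common
Value | Count | Frequency (%)
(space) | 2733 | 43.7%
1 | 702 | 11.2%
( | 345 | 5.5%
) | 345 | 5.5%
2 | 323 | 5.2%
3 | 278 | 4.4%
, | 274 | 4.4%
0 | 224 | 3.6%
4 | 198 | 3.2%
6 | 171 | 2.7%
Other values (6) | 662 | 10.6%

Latin
Value | Count | Frequency (%)
B | 5 | 35.7%
A | 4 | 28.6%
C | 2 | 14.3%
S | 2 | 14.3%
e | 1 | 7.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8836 | 58.5%
ASCII | 6269 | 41.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2733 | 43.6%
1 | 702 | 11.2%
( | 345 | 5.5%
) | 345 | 5.5%
2 | 323 | 5.2%
3 | 278 | 4.4%
, | 274 | 4.4%
0 | 224 | 3.6%
4 | 198 | 3.2%
6 | 171 | 2.7%
Other values (11) | 676 | 10.8%

Hangul
Value | Count | Frequency (%)
(not preserved) | 709 | 8.0%
(not preserved) | 658 | 7.4%
(not preserved) | 624 | 7.1%
(not preserved) | 611 | 6.9%
(not preserved) | 593 | 6.7%
(not preserved) | 591 | 6.7%
(not preserved) | 579 | 6.6%
(not preserved) | 520 | 5.9%
(not preserved) | 482 | 5.5%
(not preserved) | 208 | 2.4%
Other values (240) | 3261 | 36.9%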

Interactions


Correlations

 | 연번 | 소매인구분
연번 | 1.000 | 0.122
소매인구분 | 0.122 | 1.000

 | 연번 | 소매인구분
연번 | 1.000 | 0.072
소매인구분 | 0.072 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
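
The report counts 0 missing cells, yet each text column has a minimum length of 1 and some rows in the sample at the end of the report show no visible business name or road-name address. One possible explanation, stated here as an assumption and not confirmed by the report, is that blanks are stored as placeholder strings (for example a single space), which a null check does not count. A minimal sketch that counts such blank-looking cells; the helper name and the rows are hypothetical:

```python
def count_blank_cells(rows, columns):
    """Count cells per column that are None or contain only whitespace."""
    blanks = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                blanks[col] += 1
    return blanks

# Hypothetical rows; the " " placeholder is an assumption, not taken from the data.
rows = [
    {"업소명": "상홍상회", "업소도로명주소": " "},
    {"업소명": " ", "업소도로명주소": "충청남도 서산시 음암면 도당로 244-3"},
    {"업소명": "해태연쇄점", "업소도로명주소": "충청남도 서산시 운산면 해운로 761"},
]
blanks = count_blank_cells(rows, ["업소명", "업소도로명주소"])
print(blanks)
```

Running a check of this kind on the source file before profiling would show whether the zero missing-cell count reflects complete data or placeholder values.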

Sample

First rows

(index) | 연번 | 소매인구분 | 업소명 | 업소지번주소 | 업소도로명주소
0 | 1 | 일반소매인 | (주)코리아세븐 서산석림타운점 | 충청남도 서산시 석림동 695-4 | 충청남도 서산시 한마음16로 36 (석림동)
1 | 2 | 일반소매인 | (주)코리아세븐 서산수석점 | 충청남도 서산시 석림동 499-4 | 충청남도 서산시 동서1로 17, 1층 (석림동)
2 | 3 | 일반소매인 | 이마트24 서산일람점 | 충청남도 서산시 성연면 일람리 788 | 충청남도 서산시 성연면 충의로 400
3 | 4 | 일반소매인 | (주)현대그린푸드 서산주행시험장 | 충청남도 서산시 부석면 갈마리 727 | 충청남도 서산시 부석면 부남1로 67
4 | 5 | 구내소매인 | (주)현대그린푸드오일뱅크 | 충청남도 서산시 대산읍 화곡리 987-1 현대오일뱅크사원연립주택 | 충청남도 서산시 대산읍 평신2로 26, 2층 (현대오일뱅크사원연립주택)
5 | 6 | 일반소매인 | 프렌저 스크린 골프 | 충청남도 서산시 대산읍 기은리 596 | 충청남도 서산시 대산읍 명지1로 270-5, 1층
6 | 7 | 일반소매인 | 지에스25 대산삼양점 | 충청남도 서산시 대산읍 영탑리 575-81 | 충청남도 서산시 대산읍 충의로 1843
7 | 8 | 구내소매인 | 씨유 서산오스카빌점 | 충청남도 서산시 지곡면 무장리 920 늘푸른오스카빌 | 충청남도 서산시 지곡면 충의로 762-78, 1층 109,110,111호 (늘푸른오스카빌)
8 | 9 | 일반소매인 | 씨유 서산동문빌리지점 | 충청남도 서산시 동문동 131-4 | 충청남도 서산시 율지8로 41-13, 1층 (동문동)
9 | 10 | 일반소매인 | 이마트24 대산한화토탈점 | 충청남도 서산시 대산읍 독곶리 411-59 한화토탈 | 충청남도 서산시 대산읍 독곶2로 103, 한화토탈

Last rows

(index) | 연번 | 소매인구분 | 업소명 | 업소지번주소 | 업소도로명주소
634 | 635 | 일반소매인 | 상홍상회 | 충청남도 서산시 음암면 상홍리 산183번지 3호 |
635 | 636 | 일반소매인 | | 충청남도 서산시 음암면 도당리 90번지 1호 | 충청남도 서산시 음암면 도당로 244-3
636 | 637 | 일반소매인 | 원평슈퍼 | 충청남도 서산시 운산면 원평리 144호 |
637 | 638 | 일반소매인 | 해태연쇄점 | 충청남도 서산시 운산면 원벌리 49번지 1호 | 충청남도 서산시 운산면 해운로 761
638 | 639 | 일반소매인 | 삼화약국 | 충청남도 서산시 동문1동 968번지 69호 | 충청남도 서산시 번화2로 32(동문동)
639 | 640 | 일반소매인 | 광명사 | 충청남도 서산시 동문1동 900호 장옥400,401호 | 충청남도 서산시 시장3길 3-12(동문동,장옥400,401호)
640 | 641 | 일반소매인 | 벌말상회 | 충청남도 서산시 대산읍 오지리 산241번지 4호 |
641 | 642 | 일반소매인 | 유명슈퍼 | 충청남도 서산시 고북면 가구리 607번지 13호 |
642 | 643 | 일반소매인 | 상호없음 | 충청남도 서산시 갈산동 162번지 2호 |
643 | 644 | 일반소매인 | 고산식품 | 충청남도 서산시 운산면 고산리 66호 | 충청남도 서산시 운산면 장생동로 274