Overview

Dataset statistics

Number of variables: 6
Number of observations: 706
Missing cells: 36
Missing cells (%): 0.8%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 33.9 KiB
Average record size in memory: 49.2 B
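The derived figures in the statistics above can be checked by hand. The short sketch below recomputes the missing-cell percentage and the average record size from the reported counts; it assumes that "cells" means variables times observations and that 1 KiB is 1024 bytes.

```python
# Figures copied from the dataset statistics reported above.
n_variables = 6
n_observations = 706
missing_cells = 36
total_size_kib = 33.9

# Assumption: a "cell" is one variable value in one observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.2f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The 36 missing cells correspond to the four variables that each report 9 missing values.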

Variable types

Numeric: 1
Text: 2
Categorical: 2
DateTime: 1

Dataset

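Description: Data on designated tobacco retailers in Dong-gu, Daejeon Metropolitan City, including the business name, business address, and designation date.
Author: Dong-gu, Daejeon Metropolitan City (대전광역시 동구)
URL: https://www.data.go.kr/data/15030121/fileData.do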

Alerts

Dataset has 1 (0.1%) duplicate rows [Duplicates]
소매인구분 is highly overall correlated with 데이터기준일자 [High correlation]
데이터기준일자 is highly overall correlated with 연번 and 1 other fields [High correlation]
연번 is highly overall correlated with 데이터기준일자 [High correlation]
소매인구분 is highly imbalanced (52.5%) [Imbalance]
데이터기준일자 is highly imbalanced (90.1%) [Imbalance]
연번 has 9 (1.3%) missing values [Missing]
업소명 has 9 (1.3%) missing values [Missing]
업소주소 has 9 (1.3%) missing values [Missing]
지정일자 has 9 (1.3%) missing values [Missing]
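
The two imbalance percentages in the alerts are numerically consistent with one minus the normalized Shannon entropy of the category counts. The sketch below assumes this is how the score is derived; it is not taken from the profiler's source. The counts are the ones listed under Common Values for each variable, with the 9 `<NA>` entries counted as their own class.

```python
import math

def imbalance(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Assumption: a single class is reported as not imbalanced.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# 소매인구분: 일반소매인 577, 구내소매인 120, <NA> 9
retailer_type = imbalance([577, 120, 9])
# 데이터기준일자: 2023-02-20 697, <NA> 9
reference_date = imbalance([697, 9])

print(f"소매인구분: {retailer_type:.1%}")
print(f"데이터기준일자: {reference_date:.1%}")
```

A score of 0 means all classes are equally frequent, and a score near 1 means one class dominates.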

Reproduction

Analysis started: 2023-12-12 22:52:15.955538
Analysis finished: 2023-12-12 22:52:16.884634
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
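
A report like this one is typically produced with a few lines of Python. The sketch below is a minimal example, not the exact script used here: the function name `build_report`, the file paths, the `encoding` default, and the report title are all hypothetical, and the source CSV may need a different encoding.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV file and write the HTML report (hypothetical helper)."""
    # Third-party imports stay inside the function so the module can be
    # imported even where pandas or ydata-profiling is not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is a plain CSV readable with the given encoding.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Profiling report")  # placeholder title
    report.to_file(html_path)
    return report
```

Calling `build_report("retailers.csv", "report.html")` with a real file path writes the HTML report to disk.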

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 697
Distinct (%): 100.0%
Missing: 9
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 349
Minimum: 1
Maximum: 697
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 1
5-th percentile: 35.8
Q1: 175
Median: 349
Q3: 523
95-th percentile: 662.2
Maximum: 697
Range: 696
Interquartile range (IQR): 348

Descriptive statistics

Standard deviation: 201.35085
Coefficient of variation (CV): 0.57693655
Kurtosis: -1.2
Mean: 349
Median Absolute Deviation (MAD): 174
Skewness: 0
Sum: 243253
Variance: 40542.167
Monotonicity: Strictly increasing
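
The quantile and descriptive statistics above are what one would expect if 연번 is simply the integers 1 through 697. The sketch below recomputes them with the standard library under that assumption. The `percentile` helper is a hypothetical linear-interpolation implementation, and the variance uses the sample (n - 1) denominator.

```python
import statistics

# Assumption: 연번 takes every integer from 1 to 697 exactly once.
values = list(range(1, 698))

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] (hypothetical helper)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)

print(total, mean, round(variance, 3), round(std, 5), round(cv, 8), mad)
print(round(p05, 1), q1, q3, round(p95, 1), q3 - q1)
```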
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
469                 1      0.1%
461                 1      0.1%
462                 1      0.1%
463                 1      0.1%
464                 1      0.1%
465                 1      0.1%
466                 1      0.1%
467                 1      0.1%
468                 1      0.1%
470                 1      0.1%
Other values (687)  687    97.3%
(Missing)           9      1.3%

Minimum 10 values
Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%
10     1      0.1%

Maximum 10 values
Value  Count  Frequency (%)
697    1      0.1%
696    1      0.1%
695    1      0.1%
694    1      0.1%
693    1      0.1%
692    1      0.1%
691    1      0.1%
690    1      0.1%
689    1      0.1%
688    1      0.1%

업소명 (business name)
Text

MISSING 

Distinct: 683
Distinct (%): 98.0%
Missing: 9
Missing (%): 1.3%
Memory size: 5.6 KiB

Length

Max length: 25
Median length: 19
Mean length: 7.8593974
Min length: 1

Characters and Unicode

Total characters: 5478
Distinct characters: 415
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
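
As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The mapping from two-letter category codes to the long names used in this report covers only the categories that appear here, and the sample string is one of the business names from the Sample section.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}

def category_counts(text):
    """Count characters per Unicode general category, using long names."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

sample = "지에스25 복합터미널 2호점"
result = category_counts(sample)
print(result)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the counts for both text variables.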

Unique

Unique: 670
Unique (%): 96.1%

Sample

1st row: 지에스25 가오중앙점
2nd row: 세븐일레븐 대전용전원룸점
3rd row: 이마트24 R 대전용전한빛점
4th row: 지에스25 복합터미널 2호점
5th row: 지에스25 대전한남대점

Value               Count  Frequency (%)
씨유                49     5.0%
세븐일레븐          37     3.8%
지에스25            32     3.3%
이마트24            30     3.0%
gs25                12     1.2%
주)코리아세븐       10     1.0%
주식회사            8      0.8%
지에스25(gs25       8      0.8%
미니스톱            6      0.6%
대전가오점          5      0.5%
Other values (729)  787    80.0%

Most occurring characters

ValueCountFrequency (%)
293
 
5.3%
270
 
4.9%
261
 
4.8%
239
 
4.4%
190
 
3.5%
177
 
3.2%
2 120
 
2.2%
111
 
2.0%
87
 
1.6%
83
 
1.5%
Other values (405) 3647
66.6%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       4683   85.5%
Space Separator    293    5.3%
Decimal Number     258    4.7%
Uppercase Letter   107    2.0%
Close Punctuation  53     1.0%
Open Punctuation   53     1.0%
Lowercase Letter   29     0.5%
Other Punctuation  1      < 0.1%
Dash Punctuation   1      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
270
 
5.8%
261
 
5.6%
239
 
5.1%
190
 
4.1%
177
 
3.8%
111
 
2.4%
87
 
1.9%
83
 
1.8%
82
 
1.8%
79
 
1.7%
Other values (364) 3104
66.3%
Uppercase Letter
Value             Count  Frequency (%)
S                 32     29.9%
G                 29     27.1%
I                 7      6.5%
C                 6      5.6%
R                 5      4.7%
Y                 4      3.7%
K                 4      3.7%
M                 3      2.8%
L                 3      2.8%
B                 2      1.9%
Other values (8)  12     11.2%

Lowercase Letter
Value  Count  Frequency (%)
s      5      17.2%
o      5      17.2%
e      5      17.2%
l      2      6.9%
r      2      6.9%
k      2      6.9%
w      2      6.9%
t      2      6.9%
c      2      6.9%
m      1      3.4%

Decimal Number
Value  Count  Frequency (%)
2      120    46.5%
5      75     29.1%
4      46     17.8%
1      7      2.7%
0      6      2.3%
3      3      1.2%
6      1      0.4%

Space Separator
Value    Count  Frequency (%)
(space)  293    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      53     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      53     100.0%

Other Punctuation
Value  Count  Frequency (%)
.      1      100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  4683   85.5%
Common  659    12.0%
Latin   136    2.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
270
 
5.8%
261
 
5.6%
239
 
5.1%
190
 
4.1%
177
 
3.8%
111
 
2.4%
87
 
1.9%
83
 
1.8%
82
 
1.8%
79
 
1.7%
Other values (364) 3104
66.3%
Latin
ValueCountFrequency (%)
S 32
23.5%
G 29
21.3%
I 7
 
5.1%
C 6
 
4.4%
s 5
 
3.7%
R 5
 
3.7%
o 5
 
3.7%
e 5
 
3.7%
Y 4
 
2.9%
K 4
 
2.9%
Other values (19) 34
25.0%
Common
ValueCountFrequency (%)
293
44.5%
2 120
18.2%
5 75
 
11.4%
) 53
 
8.0%
( 53
 
8.0%
4 46
 
7.0%
1 7
 
1.1%
0 6
 
0.9%
3 3
 
0.5%
. 1
 
0.2%
Other values (2) 2
 
0.3%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  4683   85.5%
ASCII   795    14.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
293
36.9%
2 120
15.1%
5 75
 
9.4%
) 53
 
6.7%
( 53
 
6.7%
4 46
 
5.8%
S 32
 
4.0%
G 29
 
3.6%
I 7
 
0.9%
1 7
 
0.9%
Other values (31) 80
 
10.1%
Hangul
ValueCountFrequency (%)
270
 
5.8%
261
 
5.6%
239
 
5.1%
190
 
4.1%
177
 
3.8%
111
 
2.4%
87
 
1.9%
83
 
1.8%
82
 
1.8%
79
 
1.7%
Other values (364) 3104
66.3%

업소주소 (business address)
Text

MISSING 

Distinct: 684
Distinct (%): 98.1%
Missing: 9
Missing (%): 1.3%
Memory size: 5.6 KiB

Length

Max length: 47
Median length: 38
Mean length: 20.466284
Min length: 15

Characters and Unicode

Total characters: 14265
Distinct characters: 222
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 674
Unique (%): 96.7%

Sample

1st row: 대전광역시 동구 가오동 654 굿모닝타운
2nd row: 대전광역시 동구 용전동 125-23
3rd row: 대전광역시 동구 용전동 26-1
4th row: 대전광역시 동구 용전동 63-3 대전복합터미널(서관)
5th row: 대전광역시 동구 홍도동 78-11

Value               Count  Frequency (%)
대전광역시          697    22.6%
동구                697    22.6%
가양동              95     3.1%
용전동              79     2.6%
용운동              57     1.8%
1층                 55     1.8%
자양동              46     1.5%
삼성동              42     1.4%
성남동              41     1.3%
판암동              35     1.1%
Other values (832)  1242   40.2%

Most occurring characters

ValueCountFrequency (%)
2777
19.5%
1435
 
10.1%
799
 
5.6%
783
 
5.5%
1 750
 
5.3%
711
 
5.0%
706
 
4.9%
703
 
4.9%
697
 
4.9%
- 590
 
4.1%
Other values (212) 4314
30.2%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       7783   54.6%
Decimal Number     3059   21.4%
Space Separator    2777   19.5%
Dash Punctuation   590    4.1%
Other Punctuation  15     0.1%
Uppercase Letter   15     0.1%
Close Punctuation  12     0.1%
Open Punctuation   12     0.1%
Math Symbol        1      < 0.1%
Lowercase Letter   1      < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1435
18.4%
799
10.3%
783
10.1%
711
9.1%
706
9.1%
703
9.0%
697
9.0%
170
 
2.2%
152
 
2.0%
140
 
1.8%
Other values (184) 1487
19.1%
Decimal Number
Value  Count  Frequency (%)
1      750    24.5%
2      394    12.9%
3      328    10.7%
4      296    9.7%
0      253    8.3%
5      248    8.1%
6      244    8.0%
7      202    6.6%
9      178    5.8%
8      166    5.4%

Uppercase Letter
Value  Count  Frequency (%)
E      3      20.0%
H      2      13.3%
L      2      13.3%
W      2      13.3%
A      1      6.7%
K      1      6.7%
S      1      6.7%
V      1      6.7%
I      1      6.7%
C      1      6.7%

Other Punctuation
Value  Count  Frequency (%)
.      14     93.3%
/      1      6.7%

Space Separator
Value    Count  Frequency (%)
(space)  2777   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      590    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      12     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      12     100.0%

Math Symbol
Value  Count  Frequency (%)
~      1      100.0%

Lowercase Letter
Value  Count  Frequency (%)
e      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  7783   54.6%
Common  6466   45.3%
Latin   16     0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1435
18.4%
799
10.3%
783
10.1%
711
9.1%
706
9.1%
703
9.0%
697
9.0%
170
 
2.2%
152
 
2.0%
140
 
1.8%
Other values (184) 1487
19.1%
Common
ValueCountFrequency (%)
2777
42.9%
1 750
 
11.6%
- 590
 
9.1%
2 394
 
6.1%
3 328
 
5.1%
4 296
 
4.6%
0 253
 
3.9%
5 248
 
3.8%
6 244
 
3.8%
7 202
 
3.1%
Other values (7) 384
 
5.9%
Latin
ValueCountFrequency (%)
E 3
18.8%
H 2
12.5%
L 2
12.5%
W 2
12.5%
A 1
 
6.2%
K 1
 
6.2%
e 1
 
6.2%
S 1
 
6.2%
V 1
 
6.2%
I 1
 
6.2%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  7783   54.6%
ASCII   6482   45.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2777
42.8%
1 750
 
11.6%
- 590
 
9.1%
2 394
 
6.1%
3 328
 
5.1%
4 296
 
4.6%
0 253
 
3.9%
5 248
 
3.8%
6 244
 
3.8%
7 202
 
3.1%
Other values (18) 400
 
6.2%
Hangul
ValueCountFrequency (%)
1435
18.4%
799
10.3%
783
10.1%
711
9.1%
706
9.1%
703
9.0%
697
9.0%
170
 
2.2%
152
 
2.0%
140
 
1.8%
Other values (184) 1487
19.1%

소매인구분 (retailer type: 일반소매인 = general retailer, 구내소매인 = on-premises retailer)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

일반소매인: 577
구내소매인: 120
<NA>: 9

Length

Max length: 5
Median length: 5
Mean length: 4.9872521
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반소매인
2nd row: 일반소매인
3rd row: 일반소매인
4th row: 일반소매인
5th row: 일반소매인

Common Values

Value       Count  Frequency (%)
일반소매인  577    81.7%
구내소매인  120    17.0%
<NA>        9      1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
일반소매인  577    81.7%
구내소매인  120    17.0%
na          9      1.3%

지정일자 (designation date)
Date

MISSING 

Distinct: 602
Distinct (%): 86.4%
Missing: 9
Missing (%): 1.3%
Memory size: 5.6 KiB
Minimum: 1970-07-01 00:00:00
Maximum: 2023-02-17 00:00:00
Histogram with fixed size bins (bins=50)

데이터기준일자 (data reference date)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

2023-02-20: 697
<NA>: 9

Length

Max length: 10
Median length: 10
Mean length: 9.9235127
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2023-02-20
2nd row: 2023-02-20
3rd row: 2023-02-20
4th row: 2023-02-20
5th row: 2023-02-20

Common Values

Value       Count  Frequency (%)
2023-02-20  697    98.7%
<NA>        9      1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
2023-02-20  697    98.7%
na          9      1.3%

Interactions


Correlations

            | 연번  | 소매인구분
연번        | 1.000 | 0.101
소매인구분  | 0.101 | 1.000

                | 소매인구분 | 데이터기준일자
소매인구분      | 1.000      | 1.000
데이터기준일자  | 1.000      | 1.000

                | 연번  | 소매인구분 | 데이터기준일자
연번            | 1.000 | 0.077      | 1.000
소매인구분      | 0.077 | 1.000      | 1.000
데이터기준일자  | 1.000 | 1.000      | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
index | 연번 | 업소명 | 업소주소 | 소매인구분 | 지정일자 | 데이터기준일자
0 | 1 | 지에스25 가오중앙점 | 대전광역시 동구 가오동 654 굿모닝타운 | 일반소매인 | 2023-02-17 | 2023-02-20
1 | 2 | 세븐일레븐 대전용전원룸점 | 대전광역시 동구 용전동 125-23 | 일반소매인 | 2023-02-07 | 2023-02-20
2 | 3 | 이마트24 R 대전용전한빛점 | 대전광역시 동구 용전동 26-1 | 일반소매인 | 2023-02-06 | 2023-02-20
3 | 4 | 지에스25 복합터미널 2호점 | 대전광역시 동구 용전동 63-3 대전복합터미널(서관) | 일반소매인 | 2023-02-02 | 2023-02-20
4 | 5 | 지에스25 대전한남대점 | 대전광역시 동구 홍도동 78-11 | 일반소매인 | 2023-01-27 | 2023-02-20
5 | 6 | 씨스페이스 대전용운점 | 대전광역시 동구 용운동 317-11 | 일반소매인 | 2023-01-18 | 2023-02-20
6 | 7 | 전자담배 애드 | 대전광역시 동구 용전동 117-1 | 일반소매인 | 2023-01-12 | 2023-02-20
7 | 8 | 대전동구청직장상조회 | 대전광역시 동구 가오동 425 동구청 | 구내소매인 | 2023-01-10 | 2023-02-20
8 | 9 | 터미널전자담배 | 대전광역시 동구 용전동 63-3 대전복합터미널(서관) | 구내소매인 | 2023-01-06 | 2023-02-20
9 | 10 | 세븐일레븐 대전용전해피점 | 대전광역시 동구 용전동 177-7 용전빌라 | 일반소매인 | 2023-01-05 | 2023-02-20

Last rows
index | 연번 | 업소명 | 업소주소 | 소매인구분 | 지정일자 | 데이터기준일자
696 | 697 | 북권판매소 | 대전광역시 동구 대1동 152-5 | 일반소매인 | 1970-07-01 | 2023-02-20
697 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
698 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
699 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
700 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
701 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
702 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
703 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
704 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
705 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

index | 연번 | 업소명 | 업소주소 | 소매인구분 | 지정일자 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 9
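
The only duplicated row is the fully empty one, which accounts for the 9 trailing records and for every missing value in the report. The sketch below shows one way to drop such rows before profiling, using only the standard library. The inline CSV is a hypothetical stand-in built from two rows of the Sample section, and it assumes the empty records appear in the file as comma-only lines.

```python
import csv
import io

# Hypothetical stand-in for the source file: two real rows from the Sample
# section followed by two fully empty records like rows 697-705.
RAW = """연번,업소명,업소주소,소매인구분,지정일자,데이터기준일자
1,지에스25 가오중앙점,대전광역시 동구 가오동 654 굿모닝타운,일반소매인,2023-02-17,2023-02-20
697,북권판매소,대전광역시 동구 대1동 152-5,일반소매인,1970-07-01,2023-02-20
,,,,,
,,,,,
"""

def drop_empty_rows(rows):
    """Keep only rows in which at least one field is non-blank."""
    return [
        row for row in rows
        if any((value or "").strip() for value in row.values())
    ]

rows = list(csv.DictReader(io.StringIO(RAW)))
cleaned = drop_empty_rows(rows)
print(len(rows), "rows read,", len(cleaned), "rows kept")
```

With the empty records removed, the remaining 697 rows would have no missing cells, and the `<NA>` class would disappear from the two categorical variables.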