Overview

Dataset statistics

Number of variables: 5
Number of observations: 651
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 26.2 KiB
Average record size in memory: 41.2 B
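
The average record size is consistent with dividing the total in-memory size by the number of observations. A minimal check, assuming 1 KiB = 1024 bytes and that the report rounds to one decimal place (the variable names are illustrative and not taken from the profiler):

```python
# Values as printed in the dataset statistics above.
total_size_kib = 26.2
n_observations = 651

# Convert KiB to bytes (assumption: 1 KiB = 1024 B), then divide by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Because the total size is itself rounded, the result only needs to agree with the reported figure to one decimal place.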

Variable types

Numeric: 1
Categorical: 1
Text: 2
DateTime: 1

Dataset

Description: Data on the designation status of tobacco retailers located in Cheongwon-gu, Cheongju-si. (Includes the business name, road-name address, and designation date.)
URL: https://www.data.go.kr/data/15048887/fileData.do

Alerts

Imbalance: 민원구분 is highly imbalanced (54.6%)
Unique: 연번 has unique values

Reproduction

Analysis started: 2023-12-12 21:41:54.361852
Analysis finished: 2023-12-12 21:41:55.112340
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
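
The reported duration follows directly from the two timestamps. A small sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2023-12-12 21:41:54.361852")
finished = datetime.fromisoformat("2023-12-12 21:41:55.112340")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")
```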

Variables

연번
Real number (ℝ)

UNIQUE 

Distinct: 651
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 326
Minimum: 1
Maximum: 651
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 33.5
Q1: 163.5
Median: 326
Q3: 488.5
95-th percentile: 618.5
Maximum: 651
Range: 650
Interquartile range (IQR): 325

Descriptive statistics

Standard deviation: 188.07179
Coefficient of variation (CV): 0.57690735
Kurtosis: -1.2
Mean: 326
Median Absolute Deviation (MAD): 163
Skewness: 0
Sum: 212226
Variance: 35371
Monotonicity: Strictly increasing
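
These figures are what one would expect if 연번 is simply the consecutive integers 1 through 651. That is an assumption: the report itself only shows the minimum, maximum, distinct count, sum, and strict monotonicity. Under that assumption the statistics can be reproduced with Python's standard library. The sketch below uses the sample variance (n - 1 denominator) and linearly interpolated quantiles; the variable names are illustrative.

```python
import statistics

# Assumption: 연번 is the consecutive sequence 1..651.
values = list(range(1, 652))

total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)

# Sample variance and standard deviation (n - 1 in the denominator).
variance = statistics.variance(values)
std = statistics.stdev(values)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(v - median) for v in values)

# Quartiles and 5th/95th percentiles with linear interpolation
# between data points (method="inclusive").
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]
iqr = q3 - q1

print(total, mean, median, variance, round(std, 5), round(cv, 8))
print(p5, q1, q3, p95, iqr, mad)
```

Agreement with the reported values supports the reading that 연번 is a plain row counter, not a measured quantity.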
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 1 | 0.2%
449 | 1 | 0.2%
431 | 1 | 0.2%
432 | 1 | 0.2%
433 | 1 | 0.2%
434 | 1 | 0.2%
435 | 1 | 0.2%
436 | 1 | 0.2%
437 | 1 | 0.2%
438 | 1 | 0.2%
Other values (641) | 641 | 98.5%

Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
3 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
7 | 1 | 0.2%
8 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%

Value | Count | Frequency (%)
651 | 1 | 0.2%
650 | 1 | 0.2%
649 | 1 | 0.2%
648 | 1 | 0.2%
647 | 1 | 0.2%
646 | 1 | 0.2%
645 | 1 | 0.2%
644 | 1 | 0.2%
643 | 1 | 0.2%
642 | 1 | 0.2%

민원구분
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB
일반소매인: 589
구내소매인: 62
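
The 54.6% imbalance figure reported for 민원구분 can be reproduced from these two counts if the score is taken to be one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. That definition is an assumption here; the report does not state its formula, and `imbalance_score` is a hypothetical helper name. A minimal sketch:

```python
import math

# Class counts as reported for 민원구분.
counts = {"일반소매인": 589, "구내소매인": 62}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, approaching 1 for a single dominant class.

    Assumed definition, not taken from the report itself.
    """
    total = sum(class_counts)
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0
    # Shannon entropy in bits, skipping empty classes.
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(n_classes)


score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")
```

Note that the score is not the majority share: the dominant class covers 90.5% of rows, while the entropy-based score is lower.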

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반소매인
2nd row: 일반소매인
3rd row: 일반소매인
4th row: 일반소매인
5th row: 일반소매인

Common Values

Value | Count | Frequency (%)
일반소매인 | 589 | 90.5%
구내소매인 | 62 | 9.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
일반소매인 | 589 | 90.5%
구내소매인 | 62 | 9.5%

업소명
Text

Distinct: 645
Distinct (%): 99.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 27
Median length: 18
Mean length: 8.3794163
Min length: 2

Characters and Unicode

Total characters: 5455
Distinct characters: 427
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
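
As an illustration of that idea, Python's `unicodedata` module exposes the general category of each code point. The sketch below counts categories over a handful of business names copied from the sample rows of this report; it is not the profiler's own implementation, and the label mapping covers only the categories that occur in these strings.

```python
import unicodedata
from collections import Counter

# A few 업소명 values taken from the sample rows in this report.
names = [
    "대원마트",
    "한양전자담배 오창점",
    "씨유 청주대기숙사점",
    "세븐일레븐 오창샛별점",
    "등대지기 컨설팅",
    "GS25율량푸른점",
]

# Two-letter Unicode general categories mapped to the labels used in the report.
labels = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
}

# Count the general category of every character in every name.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

for code, count in category_counts.most_common():
    print(labels.get(code, code), count)
```

Hangul syllables fall under "Other Letter" (Lo) because they have no case distinction, which is why that category dominates the tables for this variable.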

Unique

Unique: 639
Unique (%): 98.2%
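
"Distinct" and "Unique" are easy to confuse. One reading, assumed here and not spelled out in the report, is that "Distinct" counts the different values present while "Unique" counts the values that occur exactly once. The reported numbers are self-consistent under that reading: 645 - 639 = 6 values repeat, and they account for the remaining 651 - 639 = 12 rows. A sketch with hypothetical toy data (`distinct_and_unique` is an illustrative helper, not a profiler function):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


# Hypothetical toy column: one name repeats, two names occur once.
toy_names = ["씨유", "씨유", "홈마트", "남촌"]
n_distinct, n_unique = distinct_and_unique(toy_names)
print(n_distinct, n_unique)
```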

Sample

1st row: 대원마트
2nd row: 한양전자담배 오창점
3rd row: 씨유 청주대기숙사점
4th row: 세븐일레븐 오창샛별점
5th row: 등대지기 컨설팅
Value | Count | Frequency (%)
씨유 | 60 | 6.5%
세븐일레븐 | 49 | 5.3%
gs25 | 22 | 2.4%
이마트24 | 22 | 2.4%
주)코리아세븐 | 10 | 1.1%
주식회사 | 8 | 0.9%
지에스25 | 8 | 0.9%
미니스톱 | 7 | 0.8%
cu | 6 | 0.6%
오창점 | 4 | 0.4%
Other values (694) | 733 | 78.9%

Most occurring characters

ValueCountFrequency (%)
315
 
5.8%
279
 
5.1%
215
 
3.9%
183
 
3.4%
132
 
2.4%
130
 
2.4%
119
 
2.2%
118
 
2.2%
115
 
2.1%
2 107
 
2.0%
Other values (417) 3742
68.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4552 | 83.4%
Space Separator | 279 | 5.1%
Decimal Number | 247 | 4.5%
Uppercase Letter | 222 | 4.1%
Open Punctuation | 71 | 1.3%
Close Punctuation | 71 | 1.3%
Other Punctuation | 12 | 0.2%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
315
 
6.9%
215
 
4.7%
183
 
4.0%
132
 
2.9%
130
 
2.9%
119
 
2.6%
118
 
2.6%
115
 
2.5%
105
 
2.3%
93
 
2.0%
Other values (381) 3027
66.5%
Uppercase Letter
ValueCountFrequency (%)
S 62
27.9%
G 58
26.1%
C 28
12.6%
U 18
 
8.1%
I 7
 
3.2%
A 6
 
2.7%
D 5
 
2.3%
L 5
 
2.3%
R 5
 
2.3%
K 4
 
1.8%
Other values (9) 24
 
10.8%
Decimal Number
ValueCountFrequency (%)
2 107
43.3%
5 71
28.7%
4 36
 
14.6%
1 12
 
4.9%
0 7
 
2.8%
9 5
 
2.0%
3 4
 
1.6%
7 3
 
1.2%
8 1
 
0.4%
6 1
 
0.4%
Other Punctuation
ValueCountFrequency (%)
. 6
50.0%
& 5
41.7%
/ 1
 
8.3%
Space Separator
ValueCountFrequency (%)
279
100.0%
Open Punctuation
ValueCountFrequency (%)
( 71
100.0%
Close Punctuation
ValueCountFrequency (%)
) 71
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4552 | 83.4%
Common | 681 | 12.5%
Latin | 222 | 4.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
315
 
6.9%
215
 
4.7%
183
 
4.0%
132
 
2.9%
130
 
2.9%
119
 
2.6%
118
 
2.6%
115
 
2.5%
105
 
2.3%
93
 
2.0%
Other values (381) 3027
66.5%
Latin
ValueCountFrequency (%)
S 62
27.9%
G 58
26.1%
C 28
12.6%
U 18
 
8.1%
I 7
 
3.2%
A 6
 
2.7%
D 5
 
2.3%
L 5
 
2.3%
R 5
 
2.3%
K 4
 
1.8%
Other values (9) 24
 
10.8%
Common
ValueCountFrequency (%)
279
41.0%
2 107
 
15.7%
( 71
 
10.4%
5 71
 
10.4%
) 71
 
10.4%
4 36
 
5.3%
1 12
 
1.8%
0 7
 
1.0%
. 6
 
0.9%
& 5
 
0.7%
Other values (7) 16
 
2.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4552 | 83.4%
ASCII | 903 | 16.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
315
 
6.9%
215
 
4.7%
183
 
4.0%
132
 
2.9%
130
 
2.9%
119
 
2.6%
118
 
2.6%
115
 
2.5%
105
 
2.3%
93
 
2.0%
Other values (381) 3027
66.5%
ASCII
ValueCountFrequency (%)
279
30.9%
2 107
 
11.8%
( 71
 
7.9%
5 71
 
7.9%
) 71
 
7.9%
S 62
 
6.9%
G 58
 
6.4%
4 36
 
4.0%
C 28
 
3.1%
U 18
 
2.0%
Other values (26) 102
 
11.3%

업소도로명주소
Text

Distinct: 643
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB

Length

Max length: 57
Median length: 51
Mean length: 30.961598
Min length: 22

Characters and Unicode

Total characters: 20156
Distinct characters: 284
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 639
Unique (%): 98.2%

Sample

1st row: 충청북도 청주시 청원구 율중로 10. 상가동 1층 106호 (율량동. 대원칸타빌4차 아파트)
2nd row: 충청북도 청주시 청원구 오창읍 중심상업1로 20. 1층 125호
3rd row: 충청북도 청주시 청원구 안덕벌로19번길 116. 우암마을 (내덕동)
4th row: 충청북도 청주시 청원구 오창읍 양청택지로 76. 풀하우스
5th row: 충청북도 청주시 청원구 직지대로872번길 5. 1층 (우암동)
Value | Count | Frequency (%)
충청북도 | 652 | 15.0%
청주시 | 651 | 15.0%
청원구 | 651 | 15.0%
오창읍 | 214 | 4.9%
1층 | 105 | 2.4%
내덕동 | 103 | 2.4%
율량동 | 99 | 2.3%
내수읍 | 68 | 1.6%
우암동 | 66 | 1.5%
사천동 | 30 | 0.7%
Other values (824) | 1714 | 39.4%

Most occurring characters

ValueCountFrequency (%)
3777
18.7%
2052
 
10.2%
1 807
 
4.0%
741
 
3.7%
692
 
3.4%
685
 
3.4%
684
 
3.4%
677
 
3.4%
661
 
3.3%
657
 
3.3%
Other values (274) 8723
43.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 12308 | 61.1%
Space Separator | 3777 | 18.7%
Decimal Number | 2821 | 14.0%
Open Punctuation | 394 | 2.0%
Close Punctuation | 394 | 2.0%
Other Punctuation | 288 | 1.4%
Dash Punctuation | 145 | 0.7%
Uppercase Letter | 17 | 0.1%
Math Symbol | 10 | < 0.1%
Lowercase Letter | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2052
16.7%
741
 
6.0%
692
 
5.6%
685
 
5.6%
684
 
5.6%
677
 
5.5%
661
 
5.4%
657
 
5.3%
541
 
4.4%
398
 
3.2%
Other values (248) 4520
36.7%
Decimal Number
ValueCountFrequency (%)
1 807
28.6%
2 415
14.7%
0 269
 
9.5%
3 268
 
9.5%
4 214
 
7.6%
6 186
 
6.6%
7 182
 
6.5%
5 169
 
6.0%
8 162
 
5.7%
9 149
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
B 5
29.4%
L 4
23.5%
G 3
17.6%
A 2
 
11.8%
M 1
 
5.9%
S 1
 
5.9%
H 1
 
5.9%
Other Punctuation
ValueCountFrequency (%)
. 287
99.7%
· 1
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
k 1
50.0%
s 1
50.0%
Space Separator
ValueCountFrequency (%)
3777
100.0%
Open Punctuation
ValueCountFrequency (%)
( 394
100.0%
Close Punctuation
ValueCountFrequency (%)
) 394
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 145
100.0%
Math Symbol
ValueCountFrequency (%)
~ 10
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 12308 | 61.1%
Common | 7829 | 38.8%
Latin | 19 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2052
16.7%
741
 
6.0%
692
 
5.6%
685
 
5.6%
684
 
5.6%
677
 
5.5%
661
 
5.4%
657
 
5.3%
541
 
4.4%
398
 
3.2%
Other values (248) 4520
36.7%
Common
ValueCountFrequency (%)
3777
48.2%
1 807
 
10.3%
2 415
 
5.3%
( 394
 
5.0%
) 394
 
5.0%
. 287
 
3.7%
0 269
 
3.4%
3 268
 
3.4%
4 214
 
2.7%
6 186
 
2.4%
Other values (7) 818
 
10.4%
Latin
ValueCountFrequency (%)
B 5
26.3%
L 4
21.1%
G 3
15.8%
A 2
 
10.5%
k 1
 
5.3%
s 1
 
5.3%
M 1
 
5.3%
S 1
 
5.3%
H 1
 
5.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 12308 | 61.1%
ASCII | 7847 | 38.9%
None | 1 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3777
48.1%
1 807
 
10.3%
2 415
 
5.3%
( 394
 
5.0%
) 394
 
5.0%
. 287
 
3.7%
0 269
 
3.4%
3 268
 
3.4%
4 214
 
2.7%
6 186
 
2.4%
Other values (15) 836
 
10.7%
Hangul
ValueCountFrequency (%)
2052
16.7%
741
 
6.0%
692
 
5.6%
685
 
5.6%
684
 
5.6%
677
 
5.5%
661
 
5.4%
657
 
5.3%
541
 
4.4%
398
 
3.2%
Other values (248) 4520
36.7%
None
ValueCountFrequency (%)
· 1
100.0%

지정일자
DateTime

Distinct: 544
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 KiB
Minimum: 1981-05-15 00:00:00
Maximum: 2023-08-17 00:00:00
Histogram with fixed size bins (bins=50)

Interactions


Correlations

 | 연번 | 민원구분
연번 | 1.000 | 0.293
민원구분 | 0.293 | 1.000

 | 연번 | 민원구분
연번 | 1.000 | 0.223
민원구분 | 0.223 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

index | 연번 | 민원구분 | 업소명 | 업소도로명주소 | 지정일자
0 | 1 | 일반소매인 | 대원마트 | 충청북도 청주시 청원구 율중로 10. 상가동 1층 106호 (율량동. 대원칸타빌4차 아파트) | 2023-08-17
1 | 2 | 일반소매인 | 한양전자담배 오창점 | 충청북도 청주시 청원구 오창읍 중심상업1로 20. 1층 125호 | 2023-08-14
2 | 3 | 일반소매인 | 씨유 청주대기숙사점 | 충청북도 청주시 청원구 안덕벌로19번길 116. 우암마을 (내덕동) | 2023-08-09
3 | 4 | 일반소매인 | 세븐일레븐 오창샛별점 | 충청북도 청주시 청원구 오창읍 양청택지로 76. 풀하우스 | 2023-08-08
4 | 5 | 일반소매인 | 등대지기 컨설팅 | 충청북도 청주시 청원구 직지대로872번길 5. 1층 (우암동) | 2023-08-03
5 | 6 | 일반소매인 | 씨유 내수다움점 | 충청북도 청주시 청원구 내수읍 내수로 707-1. 1층 | 2023-08-03
6 | 7 | 일반소매인 | 남촌 | 충청북도 청주시 청원구 오창읍 과학산업4로 8-10. 남촌식당 1층 | 2023-07-18
7 | 8 | 일반소매인 | 세븐일레븐 청주뉴율량점 | 충청북도 청주시 청원구 율량로 16. 1층 (주중동) | 2023-07-13
8 | 9 | 일반소매인 | 지에스25 율량파크 | 충청북도 청주시 청원구 공항로138번길 27. 1층 (율량동) | 2023-07-05
9 | 10 | 일반소매인 | 홈마트 | 충청북도 청주시 청원구 율봉로 131 (율량동) | 2023-07-03
index | 연번 | 민원구분 | 업소명 | 업소도로명주소 | 지정일자
641 | 642 | 구내소매인 | 복천탕 | 충청북도 청주시 청원구 율천북로 7 (사천동) | 1995-02-08
642 | 643 | 일반소매인 | GS25율량푸른점 | 충청북도 청주시 청원구 사뜸로76번길 46 (율량동) | 1994-12-22
643 | 644 | 일반소매인 | 효성슈퍼 | 충청북도 청주시 청원구 율봉로 214 (율량동) | 1993-02-13
644 | 645 | 일반소매인 | 우암하이퍼 | 충청북도 청주시 청원구 상당로244번길 7 (우암동) | 1990-06-15
645 | 646 | 일반소매인 | 은성슈퍼 | 충청북도 청주시 청원구 향군로160번길 2 (내덕동) | 1989-08-08
646 | 647 | 일반소매인 | 충북연쇄점 | 충청북도 청주시 청원구 새터로 39 (내덕동) | 1985-12-20
647 | 648 | 일반소매인 | 미니슈퍼 | 충청북도 청주시 청원구 직지대로 846 (우암동) | 1985-06-30
648 | 649 | 일반소매인 | 서울세탁 | 충청북도 청주시 청원구 새터로79번길 4-2 (내덕동) | 1984-12-31
649 | 650 | 일반소매인 | 세븐일레븐 청주내덕우리점 | 충청북도 청주시 청원구 우암로 65 (내덕동) | 1984-12-31
650 | 651 | 일반소매인 | 건영종합상사 | 충청북도 청주시 청원구 직지대로848번길 29 (우암동) | 1981-05-15