Overview

Dataset statistics

Number of variables: 4
Number of observations: 633
Missing cells: 8
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.5 KiB
Average record size in memory: 33.2 B
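
The derived figures in this block follow from the raw counts. The sketch below is a minimal check, assuming the missing-cell share is taken over all cells (variables × observations) and the average record size is total memory divided by the number of observations. The variable names are illustrative and are not part of the report.

```python
# Raw figures copied from the dataset statistics.
n_variables = 4
n_observations = 633
missing_cells = 8
total_size_kib = 20.5

# Share of missing cells over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")  # 2532 cells, 0.3% missing
print(f"{avg_record_bytes:.1f} B per record")              # 33.2 B per record
```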

Variable types

Numeric: 1
Categorical: 1
Text: 2

Dataset

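Description: Tobacco retailer information for Seodaemun-gu, broken down by business name, location, and other fields. (Reference date: 9 May 2022.)
Author: 서울특별시 서대문구 (Seodaemun-gu, Seoul)
URL: https://www.data.go.kr/data/15100207/fileData.do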

Alerts

연번 is highly overall correlated with 민원구분 (High correlation)
민원구분 is highly overall correlated with 연번 (High correlation)
업소명 has 8 (1.3%) missing values (Missing)
연번 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 23:38:02.699567
Analysis finished: 2023-12-12 23:38:03.516559
Duration: 0.82 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 633
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317
Minimum: 1
Maximum: 633
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 32.6
Q1: 159
Median: 317
Q3: 475
95-th percentile: 601.4
Maximum: 633
Range: 632
Interquartile range (IQR): 316
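
Because 연번 is strictly increasing with minimum 1, maximum 633, and 633 distinct values, it is the run 1 to 633, so the quantiles can be reproduced by hand. The sketch below assumes linear interpolation between the two nearest ranks; that choice is an assumption that happens to match the reported values, and the `quantile` helper is illustrative rather than the report's own code.

```python
import math


def quantile(sorted_values, q):
    """Quantile of pre-sorted data, interpolating linearly between ranks."""
    pos = (len(sorted_values) - 1) * q          # fractional zero-based rank
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)    # stay inside the list at q = 1
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


values = list(range(1, 634))  # 연번 runs from 1 to 633 without gaps

q05 = quantile(values, 0.05)
q1 = quantile(values, 0.25)
median = quantile(values, 0.50)
q3 = quantile(values, 0.75)
q95 = quantile(values, 0.95)
iqr = q3 - q1

print(round(q05, 1), q1, median, q3, round(q95, 1), iqr)
```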

Descriptive statistics

Standard deviation: 182.87564
Coefficient of variation (CV): 0.57689477
Kurtosis: -1.2
Mean: 317
Median Absolute Deviation (MAD): 158
Skewness: 0
Sum: 200661
Variance: 33443.5
Monotonicity: Strictly increasing
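
Most of these statistics can be checked with the Python standard library, again treating 연번 as the run 1 to 633. This is a sketch under the assumption that the report uses the sample variance (divisor n - 1) and defines MAD as the median of absolute deviations from the median; both assumptions reproduce the numbers listed here.

```python
import statistics

values = list(range(1, 634))  # 연번 runs from 1 to 633 without gaps

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, divisor n - 1
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation

# Median absolute deviation around the median.
med = statistics.median(values)
mad = statistics.median(abs(v - med) for v in values)

print(total, mean, variance, round(std, 5), round(cv, 8), mad)
```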
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                   1      0.2%
426                 1      0.2%
419                 1      0.2%
420                 1      0.2%
421                 1      0.2%
422                 1      0.2%
423                 1      0.2%
424                 1      0.2%
425                 1      0.2%
427                 1      0.2%
Other values (623)  623    98.4%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
633    1      0.2%
632    1      0.2%
631    1      0.2%
630    1      0.2%
629    1      0.2%
628    1      0.2%
627    1      0.2%
626    1      0.2%
625    1      0.2%
624    1      0.2%

민원구분 (civil petition category)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Value  Count
제7조의3제2항에따른경우  357
(blank)  237
제7조의3제3항에따른경우  39

Length

Max length: 13
Median length: 13
Mean length: 8.507109
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 제7조의3제2항에따른경우
2nd row: 제7조의3제2항에따른경우
3rd row: 제7조의3제2항에따른경우
4th row: 제7조의3제2항에따른경우
5th row: 제7조의3제2항에따른경우

Common Values

Value  Count  Frequency (%)
제7조의3제2항에따른경우  357  56.4%
(blank)  237  37.4%
제7조의3제3항에따른경우  39  6.2%
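
The frequency shares and the mean length of 8.507109 can both be recovered from the three counts. The sketch below assumes the unlabeled category is a one-character string such as a single space, which is what the reported minimum length of 1 suggests; that reading is an assumption, not something the report states.

```python
counts = {
    "제7조의3제2항에따른경우": 357,
    " ": 237,   # assumption: the unlabeled category is a single space
    "제7조의3제3항에따른경우": 39,
}

total = sum(counts.values())

# Share of each category over all 633 rows.
share = {value: round(100 * n / total, 1) for value, n in counts.items()}

# Share over the labeled rows only, as in the Common Values plot.
labeled = {value: n for value, n in counts.items() if value.strip()}
labeled_total = sum(labeled.values())
labeled_share = {value: round(100 * n / labeled_total, 1) for value, n in labeled.items()}

# Mean string length, weighted by how often each value occurs.
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(share)
print(labeled_share)
print(round(mean_length, 6))  # 8.507109
```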

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
제7조의3제2항에따른경우  357  90.2%
제7조의3제3항에따른경우  39  9.8%

업소명 (business name)
Text

MISSING 

Distinct: 607
Distinct (%): 97.1%
Missing: 8
Missing (%): 1.3%
Memory size: 5.1 KiB

Length

Max length: 24
Median length: 19
Mean length: 8.0592
Min length: 1

Characters and Unicode

Total characters: 5037
Distinct characters: 411
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
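
To illustrate what the category counts measure, the sketch below classifies the characters of one sample value with Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative names, not part of the report or of ydata-profiling; the mapping covers only the eight general categories reported for this variable.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this variable.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        # Fall back to the raw two-letter code for unlisted categories.
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = "지에스(GS)25 연희성원점"  # 4th sample row of 업소명
counts = category_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

Hangul syllables fall under Other Letter because the script has no upper or lower case, which is why that category dominates the tables that follow.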

Unique

Unique: 596
Unique (%): 95.4%

Sample

1st row: 씨유 이대정문점
2nd row: 유정상회
3rd row: 씨유 신촌힐스테이트점
4th row: 지에스(GS)25 연희성원점
5th row: 이대퀸즈부동산중개

Value  Count  Frequency (%)
씨유  58  6.7%
gs25  29  3.3%
세븐일레븐  24  2.8%
이마트24  21  2.4%
주)코리아세븐  16  1.8%
지에스25  11  1.3%
미니스톱  7  0.8%
주식회사  5  0.6%
북가좌점  4  0.5%
지에스(gs)25  4  0.5%
Other values (642)  690  79.4%

Most occurring characters

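Value               Count  Frequency (%)
(glyph lost)        309    6.1%
(space)             258    5.1%
2                   121    2.4%
(glyph lost)        121    2.4%
(glyph lost)        99     2.0%
(glyph lost)        99     2.0%
5                   92     1.8%
(glyph lost)        87     1.7%
(glyph lost)        86     1.7%
(glyph lost)        85     1.7%
Other values (401)  3680   73.1%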

Most occurring categories

Value              Count  Frequency (%)
Other Letter       4055   80.5%
Uppercase Letter   285    5.7%
Decimal Number     263    5.2%
Space Separator    258    5.1%
Open Punctuation   69     1.4%
Close Punctuation  69     1.4%
Lowercase Letter   37     0.7%
Other Punctuation  1      < 0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(glyph lost)        309    7.6%
(glyph lost)        121    3.0%
(glyph lost)        99     2.4%
(glyph lost)        99     2.4%
(glyph lost)        87     2.1%
(glyph lost)        86     2.1%
(glyph lost)        85     2.1%
(glyph lost)        76     1.9%
(glyph lost)        71     1.8%
(glyph lost)        70     1.7%
Other values (350)  2952   72.8%

Uppercase Letter

Value              Count  Frequency (%)
S                  77     27.0%
G                  76     26.7%
C                  42     14.7%
U                  39     13.7%
M                  6      2.1%
E                  6      2.1%
R                  6      2.1%
K                  5      1.8%
N                  4      1.4%
L                  4      1.4%
Other values (10)  20     7.0%

Lowercase Letter

Value             Count  Frequency (%)
a                 5      13.5%
e                 4      10.8%
r                 4      10.8%
u                 3      8.1%
o                 3      8.1%
t                 3      8.1%
c                 2      5.4%
k                 2      5.4%
s                 2      5.4%
y                 2      5.4%
Other values (7)  7      18.9%

Decimal Number

Value  Count  Frequency (%)
2      121    46.0%
5      92     35.0%
4      25     9.5%
1      11     4.2%
3      4      1.5%
8      3      1.1%
9      3      1.1%
6      2      0.8%
7      1      0.4%
0      1      0.4%

Space Separator

Value    Count  Frequency (%)
(space)  258    100.0%

Open Punctuation

Value  Count  Frequency (%)
(      69     100.0%

Close Punctuation

Value  Count  Frequency (%)
)      69     100.0%

Other Punctuation

Value  Count  Frequency (%)
'      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  4055   80.5%
Common  660    13.1%
Latin   322    6.4%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
(glyph lost)        309    7.6%
(glyph lost)        121    3.0%
(glyph lost)        99     2.4%
(glyph lost)        99     2.4%
(glyph lost)        87     2.1%
(glyph lost)        86     2.1%
(glyph lost)        85     2.1%
(glyph lost)        76     1.9%
(glyph lost)        71     1.8%
(glyph lost)        70     1.7%
Other values (350)  2952   72.8%

Latin

Value              Count  Frequency (%)
S                  77     23.9%
G                  76     23.6%
C                  42     13.0%
U                  39     12.1%
M                  6      1.9%
E                  6      1.9%
R                  6      1.9%
K                  5      1.6%
a                  5      1.6%
e                  4      1.2%
Other values (27)  56     17.4%

Common

Value             Count  Frequency (%)
(space)           258    39.1%
2                 121    18.3%
5                 92     13.9%
(                 69     10.5%
)                 69     10.5%
4                 25     3.8%
1                 11     1.7%
3                 4      0.6%
8                 3      0.5%
9                 3      0.5%
Other values (4)  5      0.8%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  4055   80.5%
ASCII   982    19.5%

Most frequent character per block

Hangul

Value               Count  Frequency (%)
(glyph lost)        309    7.6%
(glyph lost)        121    3.0%
(glyph lost)        99     2.4%
(glyph lost)        99     2.4%
(glyph lost)        87     2.1%
(glyph lost)        86     2.1%
(glyph lost)        85     2.1%
(glyph lost)        76     1.9%
(glyph lost)        71     1.8%
(glyph lost)        70     1.7%
Other values (350)  2952   72.8%

ASCII

Value              Count  Frequency (%)
(space)            258    26.3%
2                  121    12.3%
5                  92     9.4%
S                  77     7.8%
G                  76     7.7%
(                  69     7.0%
)                  69     7.0%
C                  42     4.3%
U                  39     4.0%
4                  25     2.5%
Other values (41)  114    11.6%

소재지 (location)
Text

Distinct: 581
Distinct (%): 91.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.1 KiB

Length

Max length: 74
Median length: 55
Mean length: 30.116904
Min length: 1

Characters and Unicode

Total characters: 19064
Distinct characters: 288
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
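
Python's standard `unicodedata` module exposes general categories but not blocks, so a block count has to be derived from code point ranges. The sketch below covers only the two blocks reported for this variable and assumes that the report's "Hangul" label corresponds to the Hangul Syllables block (U+AC00 to U+D7AF) and "ASCII" to Basic Latin (U+0000 to U+007F). The `block_of` helper is an illustrative name.

```python
from collections import Counter


def block_of(ch):
    """Return a coarse block label for one character (two blocks only)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"    # Basic Latin block, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"   # assumption: the Hangul Syllables block
    return "Other"        # anything outside the two blocks above


sample = "서울특별시 서대문구 이화여대길 48 (대현동)"  # 1st sample row of this variable
block_counts = Counter(block_of(ch) for ch in sample)

print(block_counts["Hangul"], block_counts["ASCII"])  # 17 8
```

Spaces, digits, and parentheses all land in the ASCII block, which is why addresses show a much larger ASCII share than business names do.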

Unique

Unique: 576
Unique (%): 91.0%

Sample

1st row: 서울특별시 서대문구 이화여대길 48 (대현동)
2nd row: 서울특별시 서대문구 북아현로4라길 24. 지1층 (북아현동)
3rd row: 서울특별시 서대문구 이화여대8길 123. 107.108호 (북아현동. 힐스테이트 신촌)
4th row: 서울특별시 서대문구 연희로32길 48. 2층 203호 (연희동. 연희동성원아파트)
5th row: 서울특별시 서대문구 이화여대길 29. B층 1칸 (대현동)

Value  Count  Frequency (%)
서울특별시  584  16.2%
서대문구  584  16.2%
1층  194  5.4%
홍은동  75  2.1%
홍제동  71  2.0%
북가좌동  63  1.8%
남가좌동  62  1.7%
연희동  59  1.6%
통일로  49  1.4%
창천동  46  1.3%
Other values (823)  1807  50.3%

Most occurring characters

Value               Count  Frequency (%)
(space)             3181   16.7%
(glyph lost)        1202   6.3%
1                   847    4.4%
(glyph lost)        715    3.8%
(glyph lost)        634    3.3%
(glyph lost)        624    3.3%
(glyph lost)        601    3.2%
(glyph lost)        600    3.1%
(                   597    3.1%
)                   597    3.1%
Other values (278)  9466   49.7%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       11365  59.6%
Space Separator    3181   16.7%
Decimal Number     2617   13.7%
Open Punctuation   597    3.1%
Close Punctuation  597    3.1%
Other Punctuation  525    2.8%
Dash Punctuation   94     0.5%
Uppercase Letter   71     0.4%
Lowercase Letter   12     0.1%
Math Symbol        5      < 0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(glyph lost)        1202   10.6%
(glyph lost)        715    6.3%
(glyph lost)        634    5.6%
(glyph lost)        624    5.5%
(glyph lost)        601    5.3%
(glyph lost)        600    5.3%
(glyph lost)        593    5.2%
(glyph lost)        584    5.1%
(glyph lost)        584    5.1%
(glyph lost)        537    4.7%
Other values (246)  4691   41.3%

Decimal Number

Value  Count  Frequency (%)
1      847    32.4%
2      357    13.6%
3      289    11.0%
0      271    10.4%
4      217    8.3%
5      164    6.3%
7      135    5.2%
8      124    4.7%
9      108    4.1%
6      105    4.0%

Uppercase Letter

Value  Count  Frequency (%)
B      20     28.2%
M      14     19.7%
C      13     18.3%
D      13     18.3%
A      5      7.0%
K      2      2.8%
S      1      1.4%
G      1      1.4%
T      1      1.4%
J      1      1.4%

Lowercase Letter

Value  Count  Frequency (%)
e      8      66.7%
m      1      8.3%
a      1      8.3%
t      1      8.3%
r      1      8.3%

Other Punctuation

Value  Count  Frequency (%)
.      523    99.6%
&      2      0.4%

Space Separator

Value    Count  Frequency (%)
(space)  3181   100.0%

Open Punctuation

Value  Count  Frequency (%)
(      597    100.0%

Close Punctuation

Value  Count  Frequency (%)
)      597    100.0%

Dash Punctuation

Value  Count  Frequency (%)
-      94     100.0%

Math Symbol

Value  Count  Frequency (%)
~      5      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  11365  59.6%
Common  7616   39.9%
Latin   83     0.4%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
(glyph lost)        1202   10.6%
(glyph lost)        715    6.3%
(glyph lost)        634    5.6%
(glyph lost)        624    5.5%
(glyph lost)        601    5.3%
(glyph lost)        600    5.3%
(glyph lost)        593    5.2%
(glyph lost)        584    5.1%
(glyph lost)        584    5.1%
(glyph lost)        537    4.7%
Other values (246)  4691   41.3%

Common

Value             Count  Frequency (%)
(space)           3181   41.8%
1                 847    11.1%
(                 597    7.8%
)                 597    7.8%
.                 523    6.9%
2                 357    4.7%
3                 289    3.8%
0                 271    3.6%
4                 217    2.8%
5                 164    2.2%
Other values (7)  573    7.5%

Latin

Value             Count  Frequency (%)
B                 20     24.1%
M                 14     16.9%
C                 13     15.7%
D                 13     15.7%
e                 8      9.6%
A                 5      6.0%
K                 2      2.4%
S                 1      1.2%
G                 1      1.2%
T                 1      1.2%
Other values (5)  5      6.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  11365  59.6%
ASCII   7699   40.4%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
(space)            3181   41.3%
1                  847    11.0%
(                  597    7.8%
)                  597    7.8%
.                  523    6.8%
2                  357    4.6%
3                  289    3.8%
0                  271    3.5%
4                  217    2.8%
5                  164    2.1%
Other values (22)  656    8.5%

Hangul

Value               Count  Frequency (%)
(glyph lost)        1202   10.6%
(glyph lost)        715    6.3%
(glyph lost)        634    5.6%
(glyph lost)        624    5.5%
(glyph lost)        601    5.3%
(glyph lost)        600    5.3%
(glyph lost)        593    5.2%
(glyph lost)        584    5.1%
(glyph lost)        584    5.1%
(glyph lost)        537    4.7%
Other values (246)  4691   41.3%

Interactions


Correlations

          연번     민원구분
연번       1.000   0.778
민원구분    0.778   1.000

          연번     민원구분
연번       1.000   0.655
민원구분    0.655   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 연번 | 민원구분 | 업소명 | 소재지
0 | 1 | 제7조의3제2항에따른경우 | 씨유 이대정문점 | 서울특별시 서대문구 이화여대길 48 (대현동)
1 | 2 | 제7조의3제2항에따른경우 | 유정상회 | 서울특별시 서대문구 북아현로4라길 24. 지1층 (북아현동)
2 | 3 | 제7조의3제2항에따른경우 | 씨유 신촌힐스테이트점 | 서울특별시 서대문구 이화여대8길 123. 107.108호 (북아현동. 힐스테이트 신촌)
3 | 4 | 제7조의3제2항에따른경우 | 지에스(GS)25 연희성원점 | 서울특별시 서대문구 연희로32길 48. 2층 203호 (연희동. 연희동성원아파트)
4 | 5 | 제7조의3제2항에따른경우 | 이대퀸즈부동산중개 | 서울특별시 서대문구 이화여대길 29. B층 1칸 (대현동)
5 | 6 | 제7조의3제2항에따른경우 | 세븐일레븐 홍제해링턴점 | 서울특별시 서대문구 세무서8길 30. 1층 11호 (홍제동. 홍제역해링턴플레이스)
6 | 7 | 제7조의3제2항에따른경우 | 지에스25 홍은포레스트점 | 서울특별시 서대문구 가좌로4길 9. 1층 101호 (홍은동. 동천빌라)
7 | 8 | 제7조의3제2항에따른경우 | 씨유 서대문하나점 | 서울특별시 서대문구 증가로 197. 1층 (북가좌동)
8 | 9 | 제7조의3제2항에따른경우 | 씨유 연희본점 | 서울특별시 서대문구 연희로25길 45. B1층 (연희동)
9 | 10 | 제7조의3제2항에따른경우 | 주식회사 페르소나 | 서울특별시 서대문구 충정로 35. 1층 (충정로3가)

Last rows

Index | 연번 | 민원구분 | 업소명 | 소재지
623 | 624 | (blank) | 옥천상회 | 서울특별시 서대문구 홍지문2길 31 (홍은동)
624 | 625 | (blank) | 한창슈퍼 | 서울특별시 서대문구 증가로32안길 35 (북가좌동)
625 | 626 | (blank) | CU연대봉원점 | 서울특별시 서대문구 봉원사길 12 (대신동)
626 | 627 | (blank) | 크린토피아 | 서울특별시 서대문구 경기대로9길 87 (충정로3가)
627 | 628 | (blank) | (주)지에스리테일 북가좌점 | 서울특별시 서대문구 응암로 54. 지하 1층 (북가좌동. 서부프라자빌딩)
628 | 629 | (blank) | 제일슈퍼 | 서울특별시 서대문구 응암로 79. 문화빌딩 1층 5호 (북가좌동)
629 | 630 | (blank) | 총각상회 | 서울특별시 서대문구 모래내로13길 25-6 (남가좌동)
630 | 631 | 제7조의3제2항에따른경우 | 씨유 창천점 | 서울특별시 서대문구 연세로5나길 30-4 (창천동)
631 | 632 | (blank) | 서강슈퍼 | 서울특별시 서대문구 연희로41길 152 (홍은동. 서강아파트2차)
632 | 633 | (blank) | 서대문구청 | 서울특별시 서대문구 연희로 248. 서대문구청 지하1층 (연희동)