Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 1682
Missing cells (%): 3.4%
Duplicate rows: 4
Duplicate rows (%): < 0.1%
Total size in memory: 468.8 KiB
Average record size in memory: 48.0 B
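The derived percentages above can be checked with simple arithmetic. The sketch below recomputes two of them from the raw counts. It assumes, as the numbers suggest, that "Missing cells (%)" is taken over all 5 × 10,000 = 50,000 cells and that the average record size is total memory divided by the row count. The missing-cell total also decomposes exactly: 1682 = 2 × 841, the missing-value counts of 업종중분류 and 업종소분류 listed under Alerts.

```python
# Figures copied from the "Dataset statistics" block above.
n_vars = 5
n_obs = 10_000
missing_cells = 1682
total_bytes = 468.8 * 1024  # 468.8 KiB (already rounded in the report)

# Share of missing cells over all variables x observations.
missing_pct = 100 * missing_cells / (n_vars * n_obs)
# Average record size: total memory divided by the number of rows.
avg_record_bytes = total_bytes / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")          # -> Missing cells (%): 3.4%
print(f"Average record size: {avg_record_bytes:.1f} B")  # -> Average record size: 48.0 B
```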

Variable types

Text: 4
Categorical: 1

Dataset

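Description: Business names, addresses, and industry classifications of small businesses that opened between June 1, 2017 and October 16, 2017, provided by the Small Enterprise and Market Service.
Author: 소상공인시장진흥공단 (Small Enterprise and Market Service)
URL: https://www.data.go.kr/data/15069600/fileData.do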

Alerts

Dataset has 4 (< 0.1%) duplicate rows [Duplicates]
업종중분류 has 841 (8.4%) missing values [Missing]
업종소분류 has 841 (8.4%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 20:03:15.621292
Analysis finished: 2023-12-12 20:03:16.976991
Duration: 1.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

상호명 (business name)
Text

Distinct: 9196
Distinct (%): 92.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 25
Median length: 23
Mean length: 5.9465
Min length: 1
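Mean length is total characters divided by the number of values (59465 / 10000 = 5.9465). The reported median length of 23, however, cannot be the plain median of the per-value lengths. If at least half of the 10,000 values had 23 or more characters, the total would be at least 5000 × 23 = 115,000, far above 59,465. The same check fails for 업종중분류 (median 11, mean ≈ 5.02) and 업종소분류 (median 14, mean ≈ 6.27). The profiler therefore appears to compute that figure some other way.

For comparison, here is a minimal sketch of the four statistics computed directly with Python's `statistics.median`, applied to the five sample rows of this variable:

```python
import statistics

def length_summary(values):
    """Max, median, mean and min character length of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
    }

# The five sample rows of 상호명 listed in this report.
sample = ["담소소사골순대문정점", "카페350", "코냐뷰티", "대성정육", "연암사"]
print(length_summary(sample))
# -> {'max': 10, 'median': 4, 'mean': 5.2, 'min': 3}
```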

Characters and Unicode

Total characters: 59465
Distinct characters: 1061
Distinct categories: 14
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
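Python's standard `unicodedata.category` returns a character's two-letter general-category code, for example `Lo` for Hangul syllables and `Nd` for ASCII digits. The sketch below tallies those codes character by character and maps them to the long names used in this report. The name table is our own and covers only the categories that appear here; the profiler's implementation may differ. The inputs are values taken from this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Po": "Other Punctuation",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation", "Zs": "Space Separator", "So": "Other Symbol",
    "Sm": "Math Symbol", "Sk": "Modifier Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts

print(category_counts(["카페350", "gs25", "음식점-일식"]).most_common())
```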

Unique

Unique: 8819
Unique (%): 88.2%

Sample

1st row: 담소소사골순대문정점
2nd row: 카페350
3rd row: 코냐뷰티
4th row: 대성정육
5th row: 연암사

Value | Count | Frequency (%)
gs25 | 91 | 0.9%
cu | 78 | 0.8%
세븐일레븐 | 34 | 0.3%
유플러스스퀘어 | 26 | 0.3%
위드미 | 15 | 0.1%
이마트 | 14 | 0.1%
마켓인 | 10 | 0.1%
storyway | 10 | 0.1%
미니스톱 | 8 | 0.1%
참다한홍삼 | 8 | 0.1%
Other values (9189) | 9713 | 97.1%

Most occurring characters

Value | Count | Frequency (%)
 | 1446 | 2.4%
 | 1444 | 2.4%
 | 1328 | 2.2%
 | 1019 | 1.7%
 | 704 | 1.2%
 | 586 | 1.0%
 | 565 | 1.0%
 | 549 | 0.9%
 | 541 | 0.9%
 | 534 | 0.9%
Other values (1051) | 50749 | 85.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 54640 | 91.9%
Uppercase Letter | 2631 | 4.4%
Decimal Number | 1190 | 2.0%
Lowercase Letter | 812 | 1.4%
Other Punctuation | 134 | 0.2%
Dash Punctuation | 18 | < 0.1%
Space Separator | 16 | < 0.1%
Other Symbol | 10 | < 0.1%
Math Symbol | 7 | < 0.1%
Close Punctuation | 2 | < 0.1%
Other values (4) | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1446 | 2.6%
 | 1444 | 2.6%
 | 1328 | 2.4%
 | 1019 | 1.9%
 | 704 | 1.3%
 | 586 | 1.1%
 | 565 | 1.0%
 | 549 | 1.0%
 | 541 | 1.0%
 | 534 | 1.0%
Other values (972) | 45924 | 84.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 346 | 13.2%
C | 317 | 12.0%
G | 263 | 10.0%
U | 191 | 7.3%
B | 144 | 5.5%
A | 140 | 5.3%
O | 138 | 5.2%
E | 121 | 4.6%
T | 111 | 4.2%
P | 85 | 3.2%
Other values (16) | 775 | 29.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 117 | 14.4%
a | 80 | 9.9%
o | 65 | 8.0%
i | 60 | 7.4%
n | 56 | 6.9%
r | 55 | 6.8%
l | 47 | 5.8%
h | 39 | 4.8%
s | 36 | 4.4%
t | 35 | 4.3%
Other values (15) | 222 | 27.3%

Decimal Number
Value | Count | Frequency (%)
2 | 316 | 26.6%
5 | 260 | 21.8%
1 | 161 | 13.5%
0 | 99 | 8.3%
3 | 97 | 8.2%
4 | 81 | 6.8%
9 | 56 | 4.7%
7 | 45 | 3.8%
6 | 41 | 3.4%
8 | 34 | 2.9%

Other Punctuation
Value | Count | Frequency (%)
& | 82 | 61.2%
. | 45 | 33.6%
/ | 3 | 2.2%
: | 3 | 2.2%
· | 1 | 0.7%

Math Symbol
Value | Count | Frequency (%)
+ | 4 | 57.1%
> | 1 | 14.3%
< | 1 | 14.3%
= | 1 | 14.3%

Other Symbol
Value | Count | Frequency (%)
 | 8 | 80.0%
 | 2 | 20.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 18 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 16 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
] | 2 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
[ | 2 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 1 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 1 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 54642 | 91.9%
Latin | 3444 | 5.8%
Common | 1373 | 2.3%
Han | 6 | < 0.1%
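Python's standard library exposes no Unicode Script property. The sketch below therefore approximates the script grouping from each character's Unicode name. Names starting with HANGUL, LATIN or CJK … IDEOGRAPH map to Hangul, Latin and Han, and everything else falls back to Common. `approx_script` is a hypothetical helper, not how the profiler classifies scripts, and it misclassifies some characters. For example, Roman numerals such as Ⅱ (U+2161) belong to the Latin script but land in Common here.

```python
import unicodedata
from collections import Counter

def approx_script(ch):
    """Guess a character's script from its Unicode name (approximation only)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")):
        return "Han"
    # Digits, spaces, punctuation and most symbols end up here, but so do
    # some Latin-script characters (e.g. Roman numerals, fullwidth letters).
    return "Common"

def script_counts(values):
    """Count characters per (approximate) script across all strings."""
    return Counter(approx_script(ch) for text in values for ch in text)

print(script_counts(["서울특별시 송파구 문정동 641-4", "gs25"]))
```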

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1446 | 2.6%
 | 1444 | 2.6%
 | 1328 | 2.4%
 | 1019 | 1.9%
 | 704 | 1.3%
 | 586 | 1.1%
 | 565 | 1.0%
 | 549 | 1.0%
 | 541 | 1.0%
 | 534 | 1.0%
Other values (969) | 45926 | 84.0%

Latin
Value | Count | Frequency (%)
S | 346 | 10.0%
C | 317 | 9.2%
G | 263 | 7.6%
U | 191 | 5.5%
B | 144 | 4.2%
A | 140 | 4.1%
O | 138 | 4.0%
E | 121 | 3.5%
e | 117 | 3.4%
T | 111 | 3.2%
Other values (42) | 1556 | 45.2%

Common
Value | Count | Frequency (%)
2 | 316 | 23.0%
5 | 260 | 18.9%
1 | 161 | 11.7%
0 | 99 | 7.2%
3 | 97 | 7.1%
& | 82 | 6.0%
4 | 81 | 5.9%
9 | 56 | 4.1%
7 | 45 | 3.3%
. | 45 | 3.3%
Other values (16) | 131 | 9.5%

Han
Value | Count | Frequency (%)
 | 3 | 50.0%
 | 1 | 16.7%
 | 1 | 16.7%
 | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 54634 | 91.9%
ASCII | 4813 | 8.1%
None | 9 | < 0.1%
CJK | 6 | < 0.1%
Letterlike Symbols | 2 | < 0.1%
Number Forms | 1 | < 0.1%
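Unlike scripts, Unicode blocks are plain code-point ranges, so they can be looked up directly. The sketch below hard-codes five block ranges from the Unicode standard: Basic Latin U+0000–U+007F, Letterlike Symbols U+2100–U+214F, Number Forms U+2150–U+218F, CJK Unified Ideographs U+4E00–U+9FFF and Hangul Syllables U+AC00–U+D7AF. The report's labels (ASCII, CJK, Hangul) appear to be coarser groupings, so each label here covers only the main block of its group. Anything outside the listed ranges is reported as "None", mirroring the report's None row. The input string is illustrative.

```python
from collections import Counter

# Inclusive code-point ranges of a few Unicode blocks, labelled as in this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),               # Basic Latin
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2150, 0x218F, "Number Forms"),
    (0x4E00, 0x9FFF, "CJK"),                 # CJK Unified Ideographs only
    (0xAC00, 0xD7AF, "Hangul"),              # Hangul Syllables only
]

def block_of(ch):
    """Return the block label for a character, or "None" if no range covers it."""
    cp = ord(ch)
    for lo, hi, label in BLOCKS:
        if lo <= cp <= hi:
            return label
    return "None"

print(Counter(block_of(ch) for ch in "카페350·Ⅱ"))
```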

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 1446 | 2.6%
 | 1444 | 2.6%
 | 1328 | 2.4%
 | 1019 | 1.9%
 | 704 | 1.3%
 | 586 | 1.1%
 | 565 | 1.0%
 | 549 | 1.0%
 | 541 | 1.0%
 | 534 | 1.0%
Other values (968) | 45918 | 84.0%

ASCII
Value | Count | Frequency (%)
S | 346 | 7.2%
C | 317 | 6.6%
2 | 316 | 6.6%
G | 263 | 5.5%
5 | 260 | 5.4%
U | 191 | 4.0%
1 | 161 | 3.3%
B | 144 | 3.0%
A | 140 | 2.9%
O | 138 | 2.9%
Other values (65) | 2537 | 52.7%

None
Value | Count | Frequency (%)
 | 8 | 88.9%
· | 1 | 11.1%

CJK
Value | Count | Frequency (%)
 | 3 | 50.0%
 | 1 | 16.7%
 | 1 | 16.7%
 | 1 | 16.7%

Letterlike Symbols
Value | Count | Frequency (%)
 | 2 | 100.0%

Number Forms
Value | Count | Frequency (%)
 | 1 | 100.0%

주소 (address)
Text

Distinct: 7902
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 23
Mean length: 19.2025
Min length: 3

Characters and Unicode

Total characters: 192025
Distinct characters: 212
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 7002
Unique (%): 70.0%

Sample

1st row: 서울특별시 송파구 문정동 641-4
2nd row: 서울특별시 성북구 성북동 350
3rd row: 서울특별시 은평구 구산동 363-2
4th row: 서울특별시 강서구 화곡동 837-1
5th row: 서울특별시 중랑구 망우동 450-3

Value | Count | Frequency (%)
서울특별시 | 9999 | 25.0%
강남구 | 1071 | 2.7%
송파구 | 692 | 1.7%
마포구 | 616 | 1.5%
영등포구 | 555 | 1.4%
동대문구 | 552 | 1.4%
강서구 | 543 | 1.4%
서초구 | 542 | 1.4%
강동구 | 522 | 1.3%
성동구 | 484 | 1.2%
Other values (7104) | 24421 | 61.1%
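The value table above counts whitespace-separated words rather than whole addresses, and the numbers bear this out. Its rows sum to 39,997 words, of which 서울특별시 accounts for 9999 / 39997 ≈ 25.0%. The 29,997 space characters reported in the character table for this variable are exactly 10,000 fewer than the word count, which fits each address having one fewer space than words. The lowercase entries in the 상호명 table (gs25, cu, storyway) suggest that words are also lowercased before counting. A minimal sketch of that tokenize-and-count step, assuming whitespace splitting plus lowercasing (not necessarily the profiler's exact rules):

```python
from collections import Counter

def word_frequencies(values):
    """Lowercase, split on whitespace, and count words across all values."""
    counts = Counter()
    for text in values:
        counts.update(text.lower().split())
    total = sum(counts.values())
    # (word, count, percentage of all words), most frequent first
    return [(w, n, 100 * n / total) for w, n in counts.most_common()]

# The five sample addresses shown above.
sample = [
    "서울특별시 송파구 문정동 641-4",
    "서울특별시 성북구 성북동 350",
    "서울특별시 은평구 구산동 363-2",
    "서울특별시 강서구 화곡동 837-1",
    "서울특별시 중랑구 망우동 450-3",
]
word, count, pct = word_frequencies(sample)[0]
print(word, count, f"{pct:.1f}%")  # -> 서울특별시 5 25.0%
```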

Most occurring characters

Value | Count | Frequency (%)
(space) | 29997 | 15.6%
 | 11793 | 6.1%
 | 11693 | 6.1%
 | 10628 | 5.5%
 | 10098 | 5.3%
 | 9999 | 5.2%
 | 9999 | 5.2%
 | 9999 | 5.2%
1 | 8201 | 4.3%
- | 7977 | 4.2%
Other values (202) | 71641 | 37.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 112398 | 58.5%
Decimal Number | 41653 | 21.7%
Space Separator | 29997 | 15.6%
Dash Punctuation | 7977 | 4.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 11793 | 10.5%
 | 11693 | 10.4%
 | 10628 | 9.5%
 | 10098 | 9.0%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 2412 | 2.1%
 | 1454 | 1.3%
 | 1243 | 1.1%
Other values (190) | 33080 | 29.4%

Decimal Number
Value | Count | Frequency (%)
1 | 8201 | 19.7%
2 | 5144 | 12.3%
3 | 4609 | 11.1%
4 | 4274 | 10.3%
5 | 3790 | 9.1%
6 | 3721 | 8.9%
7 | 3115 | 7.5%
9 | 3007 | 7.2%
8 | 2941 | 7.1%
0 | 2851 | 6.8%

Space Separator
Value | Count | Frequency (%)
(space) | 29997 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 7977 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 112398 | 58.5%
Common | 79627 | 41.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 11793 | 10.5%
 | 11693 | 10.4%
 | 10628 | 9.5%
 | 10098 | 9.0%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 2412 | 2.1%
 | 1454 | 1.3%
 | 1243 | 1.1%
Other values (190) | 33080 | 29.4%

Common
Value | Count | Frequency (%)
(space) | 29997 | 37.7%
1 | 8201 | 10.3%
- | 7977 | 10.0%
2 | 5144 | 6.5%
3 | 4609 | 5.8%
4 | 4274 | 5.4%
5 | 3790 | 4.8%
6 | 3721 | 4.7%
7 | 3115 | 3.9%
9 | 3007 | 3.8%
Other values (2) | 5792 | 7.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 112398 | 58.5%
ASCII | 79627 | 41.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 29997 | 37.7%
1 | 8201 | 10.3%
- | 7977 | 10.0%
2 | 5144 | 6.5%
3 | 4609 | 5.8%
4 | 4274 | 5.4%
5 | 3790 | 4.8%
6 | 3721 | 4.7%
7 | 3115 | 3.9%
9 | 3007 | 3.8%
Other values (2) | 5792 | 7.3%

Hangul
Value | Count | Frequency (%)
 | 11793 | 10.5%
 | 11693 | 10.4%
 | 10628 | 9.5%
 | 10098 | 9.0%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 9999 | 8.9%
 | 2412 | 2.1%
 | 1454 | 1.3%
 | 1243 | 1.1%
Other values (190) | 33080 | 29.4%

업종대분류 (major industry category)
Categorical

Distinct: 21
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

음식: 4241
소매: 2311
생활서비스: 921
<NA>: 840
제조: 400
Other values (16): 1287

Length

Max length: 20
Median length: 2
Mean length: 2.8084
Min length: 2

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 음식
2nd row: 음식
3rd row: <NA>
4th row: 소매
5th row: 문화/예술/종교

Common Values

Value | Count | Frequency (%)
음식 | 4241 | 42.4%
소매 | 2311 | 23.1%
생활서비스 | 921 | 9.2%
<NA> | 840 | 8.4%
제조 | 400 | 4.0%
의료 | 378 | 3.8%
문화/예술/종교 | 245 | 2.5%
학문/교육 | 219 | 2.2%
부동산 | 160 | 1.6%
도매/유통/무역 | 108 | 1.1%
Other values (11) | 177 | 1.8%
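업종대분류 reports 0 missing values, yet 840 rows carry the value <NA>, which is listed as an ordinary category. By contrast, 업종중분류 and 업종소분류 do register 841 missing values each. This suggests that in 업종대분류 the placeholder is stored as the literal text `<NA>` rather than as a true missing value, though the report alone cannot confirm it. If so, it should be converted before analysis. A minimal sketch, assuming `<NA>` and the empty string are the only placeholders (the placeholder set is an assumption; check the raw file):

```python
from collections import Counter

# Assumption: these placeholder strings stand for "no value" in the raw data.
NA_PLACEHOLDERS = {"<NA>", ""}

def summarize_categorical(values):
    """Return (missing count, Counter of real categories) after normalising placeholders."""
    cleaned = [None if v is None or v in NA_PLACEHOLDERS else v for v in values]
    missing = sum(v is None for v in cleaned)
    counts = Counter(v for v in cleaned if v is not None)
    return missing, counts

# The five sample rows of 업종대분류 shown above.
sample = ["음식", "음식", "<NA>", "소매", "문화/예술/종교"]
missing, counts = summarize_categorical(sample)
print(missing, counts.most_common(1))  # -> 1 [('음식', 2)]
```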

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
음식 | 4241 | 42.4%
소매 | 2311 | 23.1%
생활서비스 | 921 | 9.2%
na | 840 | 8.4%
제조 | 400 | 4.0%
의료 | 378 | 3.8%
문화/예술/종교 | 245 | 2.4%
학문/교육 | 219 | 2.2%
부동산 | 160 | 1.6%
도매/유통/무역 | 108 | 1.1%
Other values (14) | 180 | 1.8%

업종중분류 (middle industry category)
Text

MISSING

Distinct: 140
Distinct (%): 1.5%
Missing: 841
Missing (%): 8.4%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 11
Mean length: 5.0174692
Min length: 2

Characters and Unicode

Total characters: 45955
Distinct characters: 207
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 21
Unique (%): 0.2%

Sample

1st row: 별식/퓨전요리
2nd row: 커피점/카페
3rd row: 음/식료품소매
4th row: 종교
5th row: 음/식료품소매

Value | Count | Frequency (%)
한식 | 1434 | 15.7%
종합소매점 | 992 | 10.8%
커피점/카페 | 651 | 7.1%
이/미용/건강 | 568 | 6.2%
유흥주점 | 444 | 4.8%
음/식료품소매 | 397 | 4.3%
식품가공/제조 | 391 | 4.3%
분식 | 343 | 3.7%
일식/수산물 | 315 | 3.4%
병원 | 299 | 3.3%
Other values (130) | 3325 | 36.3%

Most occurring characters

Value | Count | Frequency (%)
/ | 4750 | 10.3%
 | 3459 | 7.5%
 | 2087 | 4.5%
 | 1891 | 4.1%
 | 1793 | 3.9%
 | 1501 | 3.3%
 | 1243 | 2.7%
 | 1130 | 2.5%
 | 994 | 2.2%
 | 840 | 1.8%
Other values (197) | 26267 | 57.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 40964 | 89.1%
Other Punctuation | 4750 | 10.3%
Dash Punctuation | 139 | 0.3%
Uppercase Letter | 102 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3459 | 8.4%
 | 2087 | 5.1%
 | 1891 | 4.6%
 | 1793 | 4.4%
 | 1501 | 3.7%
 | 1243 | 3.0%
 | 1130 | 2.8%
 | 994 | 2.4%
 | 840 | 2.1%
 | 787 | 1.9%
Other values (193) | 25239 | 61.6%

Uppercase Letter
Value | Count | Frequency (%)
P | 51 | 50.0%
C | 51 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 4750 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 139 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 40964 | 89.1%
Common | 4889 | 10.6%
Latin | 102 | 0.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3459 | 8.4%
 | 2087 | 5.1%
 | 1891 | 4.6%
 | 1793 | 4.4%
 | 1501 | 3.7%
 | 1243 | 3.0%
 | 1130 | 2.8%
 | 994 | 2.4%
 | 840 | 2.1%
 | 787 | 1.9%
Other values (193) | 25239 | 61.6%

Common
Value | Count | Frequency (%)
/ | 4750 | 97.2%
- | 139 | 2.8%

Latin
Value | Count | Frequency (%)
P | 51 | 50.0%
C | 51 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 40964 | 89.1%
ASCII | 4991 | 10.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
/ | 4750 | 95.2%
- | 139 | 2.8%
P | 51 | 1.0%
C | 51 | 1.0%

Hangul
Value | Count | Frequency (%)
 | 3459 | 8.4%
 | 2087 | 5.1%
 | 1891 | 4.6%
 | 1793 | 4.4%
 | 1501 | 3.7%
 | 1243 | 3.0%
 | 1130 | 2.8%
 | 994 | 2.4%
 | 840 | 2.1%
 | 787 | 1.9%
Other values (193) | 25239 | 61.6%

업종소분류 (minor industry category)
Text

MISSING

Distinct: 444
Distinct (%): 4.8%
Missing: 841
Missing (%): 8.4%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 14
Mean length: 6.2681515
Min length: 2

Characters and Unicode

Total characters: 57410
Distinct characters: 379
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 151
Unique (%): 1.6%

Sample

1st row: 순대전문점
2nd row: 커피전문점/카페/다방
3rd row: 정육점
4th row: 불교
5th row: 식료품점

Value | Count | Frequency (%)
한식/백반/한정식 | 949 | 10.4%
커피전문점/카페/다방 | 644 | 7.0%
편의점 | 511 | 5.6%
종합식품제조 | 386 | 4.2%
여성미용실 | 306 | 3.3%
호프/맥주 | 271 | 3.0%
종합소매 | 232 | 2.5%
기독교 | 206 | 2.2%
갈비/삼겹살 | 188 | 2.1%
라면김밥분식 | 185 | 2.0%
Other values (434) | 5281 | 57.7%

Most occurring characters

Value | Count | Frequency (%)
/ | 5752 | 10.0%
 | 3477 | 6.1%
 | 2143 | 3.7%
 | 1997 | 3.5%
 | 1569 | 2.7%
 | 1534 | 2.7%
 | 1209 | 2.1%
 | 1147 | 2.0%
 | 1087 | 1.9%
 | 952 | 1.7%
Other values (369) | 36543 | 63.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 51138 | 89.1%
Other Punctuation | 5754 | 10.0%
Dash Punctuation | 380 | 0.7%
Uppercase Letter | 92 | 0.2%
Close Punctuation | 23 | < 0.1%
Open Punctuation | 23 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 3477 | 6.8%
 | 2143 | 4.2%
 | 1997 | 3.9%
 | 1569 | 3.1%
 | 1534 | 3.0%
 | 1209 | 2.4%
 | 1147 | 2.2%
 | 1087 | 2.1%
 | 952 | 1.9%
 | 892 | 1.7%
Other values (361) | 35131 | 68.7%

Uppercase Letter
Value | Count | Frequency (%)
C | 46 | 50.0%
P | 45 | 48.9%
D | 1 | 1.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 5752 | > 99.9%
. | 2 | < 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 380 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 23 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 23 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 51138 | 89.1%
Common | 6180 | 10.8%
Latin | 92 | 0.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 3477 | 6.8%
 | 2143 | 4.2%
 | 1997 | 3.9%
 | 1569 | 3.1%
 | 1534 | 3.0%
 | 1209 | 2.4%
 | 1147 | 2.2%
 | 1087 | 2.1%
 | 952 | 1.9%
 | 892 | 1.7%
Other values (361) | 35131 | 68.7%

Common
Value | Count | Frequency (%)
/ | 5752 | 93.1%
- | 380 | 6.1%
) | 23 | 0.4%
( | 23 | 0.4%
. | 2 | < 0.1%

Latin
Value | Count | Frequency (%)
C | 46 | 50.0%
P | 45 | 48.9%
D | 1 | 1.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 51138 | 89.1%
ASCII | 6272 | 10.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
/ | 5752 | 91.7%
- | 380 | 6.1%
C | 46 | 0.7%
P | 45 | 0.7%
) | 23 | 0.4%
( | 23 | 0.4%
. | 2 | < 0.1%
D | 1 | < 0.1%

Hangul
Value | Count | Frequency (%)
 | 3477 | 6.8%
 | 2143 | 4.2%
 | 1997 | 3.9%
 | 1569 | 3.1%
 | 1534 | 3.0%
 | 1209 | 2.4%
 | 1147 | 2.2%
 | 1087 | 2.1%
 | 952 | 1.9%
 | 892 | 1.7%
Other values (361) | 35131 | 68.7%

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that makes patterns in data completion quick to spot visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
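Nullity correlation can be read as the Pearson correlation between two columns' is-missing indicators (1 where the value is missing, 0 otherwise). 업종중분류 and 업종소분류 each have 841 missing values, so a correlation of 1 between them would mean they are missing on exactly the same rows. Below is a stdlib-only sketch of that computation, assuming missing values are represented as `None`. The four-row example is hypothetical: it reuses category values from the sample rows and blanks out two rows for illustration.

```python
import math

def nullity_correlation(a, b):
    """Pearson correlation between the is-missing indicators of two columns.

    Missing values are assumed to be represented as None.
    """
    x = [1.0 if v is None else 0.0 for v in a]
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no defined correlation.
        return math.nan
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 4-row columns: values from the sample rows, two rows blanked out.
mid = ["한식", None, "분식", None]
small = ["냉면집", None, "국수/만두/칼국수", None]
print(nullity_correlation(mid, small))  # -> 1.0
```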

Sample

 | 상호명 | 주소 | 업종대분류 | 업종중분류 | 업종소분류
7579 | 담소소사골순대문정점 | 서울특별시 송파구 문정동 641-4 | 음식 | 별식/퓨전요리 | 순대전문점
1947 | 카페350 | 서울특별시 성북구 성북동 350 | 음식 | 커피점/카페 | 커피전문점/카페/다방
21043 | 코냐뷰티 | 서울특별시 은평구 구산동 363-2 | <NA> | <NA> | <NA>
3659 | 대성정육 | 서울특별시 강서구 화곡동 837-1 | 소매 | 음/식료품소매 | 정육점
1065 | 연암사 | 서울특별시 중랑구 망우동 450-3 | 문화/예술/종교 | 종교 | 불교
7864 | 대성상회 | 서울특별시 동대문구 제기동 990 | 소매 | 음/식료품소매 | 식료품점
14853 | 라이언스위스키 | 서울특별시 강남구 신사동 656-2 | 음식 | 양식 | 정통양식/경양식
14649 | 서진디지털 | 서울특별시 서초구 서초동 1445-3 | 소매 | 종합소매점 | 시장/종합상가
3071 | 홍이채함흥냉면전문점 | 서울특별시 마포구 아현동 330-16 | 음식 | 한식 | 냉면집
8682 | 곤솔시아식품 | 서울특별시 성동구 마장동 480-13 | 소매 | 음/식료품소매 | 육류소매

 | 상호명 | 주소 | 업종대분류 | 업종중분류 | 업종소분류
11129 | 메밀촌 | 서울특별시 은평구 응암동 89-18 | 음식 | 분식 | 국수/만두/칼국수
20943 | 분짜라붐 | 서울특별시 강남구 청담동 125-16 | <NA> | <NA> | <NA>
19662 | 토끼정 | 서울특별시 강서구 마곡동 774-12 | 음식 | 일식/수산물 | 음식점-일식
8520 | 웰빙식품 | 서울특별시 금천구 시흥동 873-26 | 소매 | 종합소매점 | 종합소매
1407 | 얼담육개장이야기 | 서울특별시 강남구 대치동 889-72 | 음식 | 한식 | 한식/백반/한정식
17393 | 황금룡 | 서울특별시 강남구 역삼동 669-14 | 음식 | 중식 | 중국음식/중국집
4952 | 인천젓갈 | 서울특별시 강서구 화곡동 370-44 | 소매 | 음/식료품소매 | 식료품점
18783 | 짜로가든 | 서울특별시 송파구 가락동 600 | 음식 | 한식 | 한식/백반/한정식
3241 | 쥬뜨 | 서울특별시 송파구 오금동 40-9 | 음식 | 커피점/카페 | 커피전문점/카페/다방
18453 | 리치 | 서울특별시 중구 충무로3가 56-5 | 음식 | 유흥주점 | 룸살롱/단란주점

Duplicate rows

Most frequently occurring

 | 상호명 | 주소 | 업종대분류 | 업종중분류 | 업종소분류 | # duplicates
0 | STORYWAY | 서울특별시 용산구 한강로3가 40-999 | 소매 | 종합소매점 | 편의점 | 2
1 | THEKIM | 서울특별시 은평구 응암동 585-45 | 음식 | 한식 | 한식/백반/한정식 | 2
2 | 올리브영양천향교역점 | 서울특별시 강서구 마곡동 776-3 | 소매 | 화장품소매 | 화장품판매점 | 2
3 | 토라식당 | 서울특별시 성동구 성수동1가 14-47 | 음식 | 일식/수산물 | 음식점-일식 | 2
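The table lists four distinct rows, each occurring twice. Finding such rows only requires counting identical row tuples. The minimal sketch below uses a toy input built from two of the rows above; the repetition pattern in `rows` is chosen for illustration and does not reproduce the full data.

```python
from collections import Counter

def duplicate_rows(rows):
    """Map each row (as a tuple) that occurs more than once to its occurrence count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

storyway = ("STORYWAY", "서울특별시 용산구 한강로3가 40-999", "소매", "종합소매점", "편의점")
thekim = ("THEKIM", "서울특별시 은평구 응암동 585-45", "음식", "한식", "한식/백반/한정식")
rows = [storyway, thekim, storyway]
print(duplicate_rows(rows))
```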