Overview

Dataset statistics

Number of variables: 3
Number of observations: 206
Missing cells: 1
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.0 KiB
Average record size in memory: 24.6 B
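
The missing-cell percentage follows from the counts listed here: one missing cell out of 3 × 206 cells. A minimal arithmetic sketch (the variable names are illustrative and not part of the report):

```python
# Counts taken from the dataset statistics in this report.
n_variables = 3
n_observations = 206
missing_cells = 1

total_cells = n_variables * n_observations        # 618 cells in total
missing_pct = 100 * missing_cells / total_cells   # roughly 0.16 percent

print(f"{missing_pct:.1f}%")  # prints 0.2%
```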

Variable types

Text: 2
Categorical: 1

Dataset

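Description: 부산광역시금정구_출판업등록현황_20230307 (Busan Metropolitan City, Geumjeong-gu: publishing business registration status, 2023-03-07)
Author: 부산광역시 금정구 (Geumjeong-gu, Busan Metropolitan City)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3055406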

Alerts

업종 is highly imbalanced (69.9%) [Imbalance]
사업체명칭 has unique values [Unique]

Reproduction

Analysis started: 2023-12-10 17:21:52.758702
Analysis finished: 2023-12-10 17:21:53.906309
Duration: 1.15 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사업체명칭 (business name)
Text

UNIQUE 

Distinct: 206
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Length

Max length: 20
Median length: 15
Mean length: 6.5436893
Min length: 2

Characters and Unicode

Total characters: 1348
Distinct characters: 356
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
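
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is an approximation written for this note, not ydata-profiling's own implementation; the helper name `category_counts` is hypothetical. The module returns two-letter codes (`Lo` = Other Letter, `Zs` = Space Separator, `Ps`/`Pe` = Open/Close Punctuation, `Nd` = Decimal Number), whereas the report prints the long names.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character of the given strings."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)

# Two business names taken from the sample rows of this variable.
names = ["사단법인 부산대학교 출판문화원", "도서출판 늘함께"]
counts = category_counts(names)

# Hangul syllables fall under "Lo" (Other Letter), spaces under "Zs".
print(counts["Lo"], counts["Zs"])  # prints 21 3
```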

Unique

Unique: 206
Unique (%): 100.0%

Sample

1st row: 사단법인 부산대학교 출판문화원
2nd row: 제일출판인쇄
3rd row: 만수출판사
4th row: 도서출판 늘함께
5th row: 월간불교세계출판부

Value | Count | Frequency (%)
도서출판 | 18 | 6.6%
주식회사 | 10 | 3.7%
사단법인 | 2 | 0.7%
가을 | 2 | 0.7%
디자인 | 2 | 0.7%
출판부 | 2 | 0.7%
공책 | 1 | 0.4%
작은출판사 | 1 | 0.4%
한국사회적가치회복연구원 | 1 | 0.4%
헤로도토스 | 1 | 0.4%
Other values (232) | 232 | 85.3%
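
The word table counts whitespace-separated tokens, and each percentage is the token's count divided by the total number of tokens: the Count column sums to 272, so 18 / 272 ≈ 6.6% for 도서출판. The sketch below shows one way to produce such counts; the tokenization rule (split on whitespace, trim ASCII punctuation) is an assumption and may differ from what ydata-profiling does, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter

def word_counts(values):
    """Split each string on whitespace, trim ASCII punctuation, and count the tokens."""
    counter = Counter()
    for text in values:
        for token in text.split():
            token = token.strip(string.punctuation)
            if token:
                counter[token] += 1
    return counter

# The five sample rows shown for this variable.
names = [
    "사단법인 부산대학교 출판문화원",
    "제일출판인쇄",
    "만수출판사",
    "도서출판 늘함께",
    "월간불교세계출판부",
]
counts = word_counts(names)
total = sum(counts.values())

# Frequency of one token as a share of all tokens.
share = counts["도서출판"] / total
print(total, f"{share:.1%}")  # prints 8 12.5%
```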

Most occurring characters

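Value | Count | Frequency (%)
(space) | 66 | 4.9%
(glyph missing) | 48 | 3.6%
(glyph missing) | 47 | 3.5%
(glyph missing) | 47 | 3.5%
(glyph missing) | 29 | 2.2%
(glyph missing) | 28 | 2.1%
(glyph missing) | 27 | 2.0%
(glyph missing) | 23 | 1.7%
(glyph missing) | 19 | 1.4%
(glyph missing) | 19 | 1.4%
Other values (346) | 995 | 73.8%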

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1112 | 82.5%
Lowercase Letter | 81 | 6.0%
Space Separator | 66 | 4.9%
Uppercase Letter | 38 | 2.8%
Open Punctuation | 15 | 1.1%
Close Punctuation | 15 | 1.1%
Decimal Number | 13 | 1.0%
Other Punctuation | 6 | 0.4%
Dash Punctuation | 2 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph missing) | 48 | 4.3%
(glyph missing) | 47 | 4.2%
(glyph missing) | 47 | 4.2%
(glyph missing) | 29 | 2.6%
(glyph missing) | 28 | 2.5%
(glyph missing) | 27 | 2.4%
(glyph missing) | 23 | 2.1%
(glyph missing) | 19 | 1.7%
(glyph missing) | 19 | 1.7%
(glyph missing) | 18 | 1.6%
Other values (294) | 807 | 72.6%

Lowercase Letter
Value | Count | Frequency (%)
r | 9 | 11.1%
a | 8 | 9.9%
n | 7 | 8.6%
t | 7 | 8.6%
e | 7 | 8.6%
o | 6 | 7.4%
s | 6 | 7.4%
i | 5 | 6.2%
k | 3 | 3.7%
p | 3 | 3.7%
Other values (12) | 20 | 24.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 5 | 13.2%
O | 4 | 10.5%
C | 3 | 7.9%
H | 3 | 7.9%
T | 3 | 7.9%
M | 3 | 7.9%
E | 2 | 5.3%
R | 2 | 5.3%
D | 2 | 5.3%
U | 2 | 5.3%
Other values (7) | 9 | 23.7%

Decimal Number
Value | Count | Frequency (%)
0 | 3 | 23.1%
2 | 3 | 23.1%
5 | 2 | 15.4%
1 | 2 | 15.4%
3 | 2 | 15.4%
4 | 1 | 7.7%

Other Punctuation
Value | Count | Frequency (%)
& | 3 | 50.0%
. | 2 | 33.3%
% | 1 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 66 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 15 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 15 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1103 | 81.8%
Latin | 119 | 8.8%
Common | 117 | 8.7%
Han | 9 | 0.7%

Most frequent character per script

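Hangul
Value | Count | Frequency (%)
(glyph missing) | 48 | 4.4%
(glyph missing) | 47 | 4.3%
(glyph missing) | 47 | 4.3%
(glyph missing) | 29 | 2.6%
(glyph missing) | 28 | 2.5%
(glyph missing) | 27 | 2.4%
(glyph missing) | 23 | 2.1%
(glyph missing) | 19 | 1.7%
(glyph missing) | 19 | 1.7%
(glyph missing) | 18 | 1.6%
Other values (285) | 798 | 72.3%

Latin
Value | Count | Frequency (%)
r | 9 | 7.6%
a | 8 | 6.7%
n | 7 | 5.9%
t | 7 | 5.9%
e | 7 | 5.9%
o | 6 | 5.0%
s | 6 | 5.0%
S | 5 | 4.2%
i | 5 | 4.2%
O | 4 | 3.4%
Other values (29) | 55 | 46.2%

Common
Value | Count | Frequency (%)
(space) | 66 | 56.4%
( | 15 | 12.8%
) | 15 | 12.8%
0 | 3 | 2.6%
& | 3 | 2.6%
2 | 3 | 2.6%
- | 2 | 1.7%
. | 2 | 1.7%
5 | 2 | 1.7%
1 | 2 | 1.7%
Other values (3) | 4 | 3.4%

Han
Value | Count | Frequency (%)
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%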

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1103 | 81.8%
ASCII | 236 | 17.5%
CJK | 9 | 0.7%

Most frequent character per block

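ASCII
Value | Count | Frequency (%)
(space) | 66 | 28.0%
( | 15 | 6.4%
) | 15 | 6.4%
r | 9 | 3.8%
a | 8 | 3.4%
n | 7 | 3.0%
t | 7 | 3.0%
e | 7 | 3.0%
o | 6 | 2.5%
s | 6 | 2.5%
Other values (42) | 90 | 38.1%

Hangul
Value | Count | Frequency (%)
(glyph missing) | 48 | 4.4%
(glyph missing) | 47 | 4.3%
(glyph missing) | 47 | 4.3%
(glyph missing) | 29 | 2.6%
(glyph missing) | 28 | 2.5%
(glyph missing) | 27 | 2.4%
(glyph missing) | 23 | 2.1%
(glyph missing) | 19 | 1.7%
(glyph missing) | 19 | 1.7%
(glyph missing) | 18 | 1.6%
Other values (285) | 798 | 72.3%

CJK
Value | Count | Frequency (%)
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%
(glyph missing) | 1 | 11.1%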

사업체소재지(도로명) (business address, road name)
Text

Distinct: 201
Distinct (%): 98.0%
Missing: 1
Missing (%): 0.5%
Memory size: 1.7 KiB

Length

Max length: 54
Median length: 44
Mean length: 34.078049
Min length: 21
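
The reported mean lengths are consistent with dividing the total character count by the number of non-missing rows. This relationship is inferred from the figures in this report rather than taken from the ydata-profiling source, so treat the check below as a sketch; the variable names are illustrative.

```python
# Figures reported for the two Text variables in this report.
name_total_chars, name_rows = 1348, 206      # 사업체명칭: no missing values
addr_total_chars, addr_rows = 6986, 206 - 1  # address column: one missing value excluded

name_mean = name_total_chars / name_rows
addr_mean = addr_total_chars / addr_rows

# Compare against the reported mean lengths 6.5436893 and 34.078049.
print(abs(name_mean - 6.5436893) < 1e-7, abs(addr_mean - 34.078049) < 1e-6)  # prints True True
```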

Characters and Unicode

Total characters: 6986
Distinct characters: 191
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 197
Unique (%): 96.1%
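
Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. With 205 non-missing addresses, 201 distinct and 197 unique, four addresses are each shared by more than one business. A small sketch of the distinction (the helper name and the toy column are hypothetical):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): different values seen, and values seen exactly once."""
    counts = Counter(v for v in values if v is not None)  # missing values are skipped
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy column: "a" repeats, None marks a missing value.
result = distinct_and_unique(["a", "a", "b", "c", None])
print(result)  # prints (3, 2)
```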

Sample

1st row: 부산광역시 금정구 부산대학로63번길 2 (장전동)
2nd row: 부산광역시 금정구 부산대학로 10, 103동 23층 2301호 (부곡동, 대우아파트)
3rd row: 부산광역시 금정구 부산대학로64번길 14-7 (장전동)
4th row: 부산광역시 금정구 두실로 16 (남산동)
5th row: 부산광역시 금정구 수림로 132 (장전동)

Value | Count | Frequency (%)
부산광역시 | 205 | 15.9%
금정구 | 205 | 15.9%
장전동 | 72 | 5.6%
구서동 | 40 | 3.1%
부곡동 | 30 | 2.3%
남산동 | 28 | 2.2%
금강로 | 21 | 1.6%
2층 | 11 | 0.9%
부산대학로63번길 | 10 | 0.8%
금정로 | 10 | 0.8%
Other values (426) | 659 | 51.0%

Most occurring characters

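Value | Count | Frequency (%)
(space) | 1144 | 16.4%
(glyph missing) | 287 | 4.1%
(glyph missing) | 280 | 4.0%
(glyph missing) | 280 | 4.0%
(glyph missing) | 272 | 3.9%
(glyph missing) | 260 | 3.7%
1 | 251 | 3.6%
(glyph missing) | 226 | 3.2%
(glyph missing) | 208 | 3.0%
( | 207 | 3.0%
Other values (181) | 3571 | 51.1%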

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3956 | 56.6%
Decimal Number | 1217 | 17.4%
Space Separator | 1144 | 16.4%
Open Punctuation | 207 | 3.0%
Close Punctuation | 207 | 3.0%
Other Punctuation | 202 | 2.9%
Dash Punctuation | 41 | 0.6%
Uppercase Letter | 11 | 0.2%
Letter Number | 1 | < 0.1%

Most frequent character per category

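Other Letter
Value | Count | Frequency (%)
(glyph missing) | 287 | 7.3%
(glyph missing) | 280 | 7.1%
(glyph missing) | 280 | 7.1%
(glyph missing) | 272 | 6.9%
(glyph missing) | 260 | 6.6%
(glyph missing) | 226 | 5.7%
(glyph missing) | 208 | 5.3%
(glyph missing) | 206 | 5.2%
(glyph missing) | 206 | 5.2%
(glyph missing) | 205 | 5.2%
Other values (158) | 1526 | 38.6%

Decimal Number
Value | Count | Frequency (%)
1 | 251 | 20.6%
2 | 172 | 14.1%
0 | 160 | 13.1%
3 | 133 | 10.9%
5 | 109 | 9.0%
4 | 101 | 8.3%
7 | 87 | 7.1%
6 | 78 | 6.4%
9 | 70 | 5.8%
8 | 56 | 4.6%

Uppercase Letter
Value | Count | Frequency (%)
B | 4 | 36.4%
A | 3 | 27.3%
T | 1 | 9.1%
P | 1 | 9.1%
F | 1 | 9.1%
D | 1 | 9.1%

Other Punctuation
Value | Count | Frequency (%)
, | 201 | 99.5%
/ | 1 | 0.5%

Space Separator
Value | Count | Frequency (%)
(space) | 1144 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 207 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 207 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 41 | 100.0%

Letter Number
Value | Count | Frequency (%)
(glyph missing) | 1 | 100.0%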

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3956 | 56.6%
Common | 3018 | 43.2%
Latin | 12 | 0.2%

Most frequent character per script

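Hangul
Value | Count | Frequency (%)
(glyph missing) | 287 | 7.3%
(glyph missing) | 280 | 7.1%
(glyph missing) | 280 | 7.1%
(glyph missing) | 272 | 6.9%
(glyph missing) | 260 | 6.6%
(glyph missing) | 226 | 5.7%
(glyph missing) | 208 | 5.3%
(glyph missing) | 206 | 5.2%
(glyph missing) | 206 | 5.2%
(glyph missing) | 205 | 5.2%
Other values (158) | 1526 | 38.6%

Common
Value | Count | Frequency (%)
(space) | 1144 | 37.9%
1 | 251 | 8.3%
( | 207 | 6.9%
) | 207 | 6.9%
, | 201 | 6.7%
2 | 172 | 5.7%
0 | 160 | 5.3%
3 | 133 | 4.4%
5 | 109 | 3.6%
4 | 101 | 3.3%
Other values (6) | 333 | 11.0%

Latin
Value | Count | Frequency (%)
B | 4 | 33.3%
A | 3 | 25.0%
(glyph missing) | 1 | 8.3%
T | 1 | 8.3%
P | 1 | 8.3%
F | 1 | 8.3%
D | 1 | 8.3%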

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3956 | 56.6%
ASCII | 3029 | 43.4%
Number Forms | 1 | < 0.1%
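
Unicode blocks are contiguous code point ranges, so a character's block can be found by a range lookup. Python's standard library has no block lookup, so the sketch below hard-codes the ranges for the blocks named in this report; the short labels mirror the report, and `block_of` is a hypothetical helper. The Roman numeral used in the example is only an illustration of a Number Forms character; the report does not show which character actually occurs.

```python
# Code point ranges of the Unicode blocks named in this report.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),         # Basic Latin
    ("Number Forms", 0x2150, 0x218F),
    ("CJK", 0x4E00, 0x9FFF),           # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),        # Hangul Syllables
]

def block_of(ch):
    """Return the report-style block label for a single character."""
    cp = ord(ch)
    for name, start, end in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

# "\u2161" is ROMAN NUMERAL TWO, an illustrative Number Forms character.
labels = [block_of(ch) for ch in ["부", "1", "(", "\u2161"]]
print(labels)  # prints ['Hangul', 'ASCII', 'ASCII', 'Number Forms']
```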

Most frequent character per block

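ASCII
Value | Count | Frequency (%)
(space) | 1144 | 37.8%
1 | 251 | 8.3%
( | 207 | 6.8%
) | 207 | 6.8%
, | 201 | 6.6%
2 | 172 | 5.7%
0 | 160 | 5.3%
3 | 133 | 4.4%
5 | 109 | 3.6%
4 | 101 | 3.3%
Other values (12) | 344 | 11.4%

Hangul
Value | Count | Frequency (%)
(glyph missing) | 287 | 7.3%
(glyph missing) | 280 | 7.1%
(glyph missing) | 280 | 7.1%
(glyph missing) | 272 | 6.9%
(glyph missing) | 260 | 6.6%
(glyph missing) | 226 | 5.7%
(glyph missing) | 208 | 5.3%
(glyph missing) | 206 | 5.2%
(glyph missing) | 206 | 5.2%
(glyph missing) | 205 | 5.2%
Other values (158) | 1526 | 38.6%

Number Forms
Value | Count | Frequency (%)
(glyph missing) | 1 | 100.0%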

업종 (business type)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB
Value counts (bar chart): 출판사 195, <NA> 11

Length

Max length: 4
Median length: 3
Mean length: 3.0533981
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 출판사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value | Count | Frequency (%)
출판사 | 195 | 94.7%
<NA> | 11 | 5.3%
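
The 69.9% imbalance figure in the Alerts section can be reproduced from these two counts as one minus the normalized Shannon entropy of the class distribution. The formula is an assumption that happens to match the reported value; it is not quoted from the ydata-profiling source, and `imbalance_score` is a hypothetical helper.

```python
import math

def imbalance_score(counts):
    """1 minus normalized Shannon entropy: 0 means balanced, values near 1 mean one class dominates."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Counts from the Common Values table: 출판사 = 195, "<NA>" = 11.
score = imbalance_score([195, 11])
print(f"{score:.1%}")  # prints 69.9%
```

A perfectly balanced column, for example `imbalance_score([103, 103])`, scores 0.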

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
출판사 | 195 | 94.7%
na | 11 | 5.3%

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.]

Sample

Index | 사업체명칭 | 사업체소재지(도로명) | 업종
0 | 사단법인 부산대학교 출판문화원 | 부산광역시 금정구 부산대학로63번길 2 (장전동) | 출판사
1 | 제일출판인쇄 | 부산광역시 금정구 부산대학로 10, 103동 23층 2301호 (부곡동, 대우아파트) | 출판사
2 | 만수출판사 | 부산광역시 금정구 부산대학로64번길 14-7 (장전동) | 출판사
3 | 도서출판 늘함께 | 부산광역시 금정구 두실로 16 (남산동) | 출판사
4 | 월간불교세계출판부 | 부산광역시 금정구 수림로 132 (장전동) | 출판사
5 | 도서출판미래원 | 부산광역시 금정구 중앙대로1841번길 65, 1층 103호 (구서동, 구서골드1상가) | 출판사
6 | 동성출판사 | 부산광역시 금정구 서부로 74-6 (서동) | 출판사
7 | 광진출판사 | 부산광역시 금정구 부산대학로 60-1 (장전동) | 출판사
8 | 한둘학력개발연구소 | 부산광역시 금정구 중앙대로1959번길 11 (구서동) | 출판사
9 | 시공연출 | 부산광역시 금정구 부곡로 1 (부곡동) | 출판사

Index | 사업체명칭 | 사업체소재지(도로명) | 업종
196 | 도서출판 둔갑 | 부산광역시 금정구 남산로51번길 27, 퍼스트빌 404호 (남산동) | <NA>
197 | 노페이퍼북스(NoPaperbooks) | 부산광역시 금정구 동부곡로5번길 8, 차타운원롬 3층 304호 (부곡동) | <NA>
198 | 법기선원 | 부산광역시 금정구 금강로 503, 804동 2004호 (구서동, 롯데캐슬골드2단지) | <NA>
199 | 벤처메이트 | 부산광역시 금정구 부산대학로50번길 68, 2층 B3호 (장전동) | <NA>
200 | 세컨리폼 | 부산광역시 금정구 금강로279번길 61, 1706호 (장전동, 현대2차아파트) | <NA>
201 | 디자인410 | 부산광역시 금정구 금정로60번길 27-8 (장전동) | <NA>
202 | 투더문 | 부산광역시 금정구 두실로45번길 102, 3동 108호 (구서동, 일신아파트) | <NA>
203 | 솜니움북스 | 부산광역시 금정구 금샘로420번길 13, 310동 204호 (구서동, 선경3차아파트) | <NA>
204 | 캣시더 | 부산광역시 금정구 식물원로 64, 104동 1304호 (장전동, 금정산에스케이뷰아파트) | <NA>
205 | 다올책방 | 부산광역시 금정구 팔송로7번길 24 (남산동) | <NA>