Overview

Dataset statistics

Number of variables: 3
Number of observations: 196
Missing cells: 3
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.7 KiB
Average record size in memory: 24.7 B
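
The percentage and per-record figures above can be checked from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is missing cells divided by variables times observations, and the average record size is total memory divided by observations. Because 4.7 KiB is itself a rounded figure, the computed record size only approximates the reported 24.7 B.

```python
# Values copied from the dataset statistics in this report
n_variables = 3
n_observations = 196
missing_cells = 3
total_size_kib = 4.7  # rounded value as displayed, so results are approximate

total_cells = n_variables * n_observations           # every cell in the table
missing_pct = 100 * missing_cells / total_cells      # share of cells that are missing
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.1f} B")
```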

Variable types

Text: 2
Categorical: 1

Dataset

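Description: 부산광역시금정구_출판업등록현황_20210326 (Busan Metropolitan City, Geumjeong-gu: publishing business registration status, 2021-03-26)
Author: 부산광역시 금정구 (Geumjeong-gu, Busan Metropolitan City)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3055406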

Alerts

업종 (business type) is highly imbalanced (95.4%) [Imbalance]
사업체소재지(도로명) (business location, road-name address) has 2 (1.0%) missing values [Missing]

Reproduction

Analysis started: 2023-12-10 17:21:59.849725
Analysis finished: 2023-12-10 17:22:01.237871
Duration: 1.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
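
The reported duration is simply the difference between the two timestamps. A short sketch of that calculation using only the standard library (the format string is an assumption based on how the timestamps are printed here):

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction section (assumed format)
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-10 17:21:59.849725", FMT)
finished = datetime.strptime("2023-12-10 17:22:01.237871", FMT)

# Elapsed wall-clock time in seconds
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")  # 1.39 seconds
```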

Variables

사업체명칭 (business name)
Text

Distinct: 195
Distinct (%): 100.0%
Missing: 1
Missing (%): 0.5%
Memory size: 1.7 KiB

Length

Max length: 20
Median length: 15
Mean length: 6.3692308
Min length: 2
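
The mean length is consistent with the total character count reported under "Characters and Unicode" divided by the number of non-missing values. A minimal sketch, assuming that is how the figure is derived (`mean_length` is a hypothetical helper, not part of the profiling library):

```python
def mean_length(total_characters: int, n_non_missing: int) -> float:
    """Average string length over the values that are actually present."""
    return total_characters / n_non_missing

# 196 observations minus 1 missing value leaves 195 strings totalling 1242 characters
result = mean_length(1242, 196 - 1)
print(round(result, 7))  # 6.3692308
```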

Characters and Unicode

Total characters: 1242
Distinct characters: 344
Distinct categories: 9
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
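
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for two of the sample values from this report; the `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative names, and the profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["사단법인 부산대학교 출판문화원", "도서출판 늘함께"]
counts = category_counts(sample)
print(counts)
```

Hangul syllables fall under "Other Letter" because they have no case distinction, which is why that category dominates both text columns.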

Unique

Unique: 195
Unique (%): 100.0%

Sample

1st row: 사단법인 부산대학교 출판문화원
2nd row: 제일출판인쇄
3rd row: 만수출판사
4th row: 도서출판 늘함께
5th row: 월간불교세계출판부

Value | Count | Frequency (%)
도서출판 | 19 | 7.5%
주식회사 | 8 | 3.2%
사단법인 | 2 | 0.8%
가을 | 2 | 0.8%
출판부 | 2 | 0.8%
아슬란 | 1 | 0.4%
부산산악포럼 | 1 | 0.4%
사운드퍼즐 | 1 | 0.4%
쉼표 | 1 | 0.4%
ciznet | 1 | 0.4%
Other values (214) | 214 | 84.9%
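
The word table above counts whitespace-separated tokens across all values. A rough sketch of that kind of tally, applied to the five sample rows; it assumes plain whitespace splitting, and the profiler may apply extra normalization (such as lowercasing) that is not reproduced here:

```python
from collections import Counter

def word_frequencies(values):
    """Return (word, count, percent) tuples, most frequent first."""
    words = Counter(word for value in values for word in value.split())
    total = sum(words.values())
    return [(w, c, 100 * c / total) for w, c in words.most_common()]

names = [
    "사단법인 부산대학교 출판문화원",
    "제일출판인쇄",
    "만수출판사",
    "도서출판 늘함께",
    "월간불교세계출판부",
]

freqs = word_frequencies(names)
for word, count, pct in freqs:
    print(f"{word} | {count} | {pct:.1f}%")
```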

Most occurring characters

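Value | Count | Frequency (%)
(space) | 57 | 4.6%
(unrecovered) | 47 | 3.8%
(unrecovered) | 46 | 3.7%
(unrecovered) | 39 | 3.1%
(unrecovered) | 29 | 2.3%
(unrecovered) | 29 | 2.3%
(unrecovered) | 28 | 2.3%
(unrecovered) | 23 | 1.9%
(unrecovered) | 20 | 1.6%
(unrecovered) | 19 | 1.5%
Other values (334) | 905 | 72.9%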

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1033 | 83.2%
Lowercase Letter | 61 | 4.9%
Space Separator | 57 | 4.6%
Uppercase Letter | 46 | 3.7%
Close Punctuation | 15 | 1.2%
Open Punctuation | 15 | 1.2%
Decimal Number | 8 | 0.6%
Other Punctuation | 6 | 0.5%
Dash Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unrecovered) | 47 | 4.5%
(unrecovered) | 46 | 4.5%
(unrecovered) | 39 | 3.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 28 | 2.7%
(unrecovered) | 23 | 2.2%
(unrecovered) | 20 | 1.9%
(unrecovered) | 19 | 1.8%
(unrecovered) | 17 | 1.6%
Other values (284) | 736 | 71.2%

Lowercase Letter
Value | Count | Frequency (%)
t | 7 | 11.5%
a | 6 | 9.8%
r | 6 | 9.8%
e | 5 | 8.2%
s | 5 | 8.2%
n | 5 | 8.2%
i | 4 | 6.6%
o | 3 | 4.9%
l | 3 | 4.9%
f | 3 | 4.9%
Other values (10) | 14 | 23.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 5 | 10.9%
O | 5 | 10.9%
A | 4 | 8.7%
M | 4 | 8.7%
R | 3 | 6.5%
H | 3 | 6.5%
C | 3 | 6.5%
E | 3 | 6.5%
T | 3 | 6.5%
B | 2 | 4.3%
Other values (8) | 11 | 23.9%

Decimal Number
Value | Count | Frequency (%)
5 | 2 | 25.0%
2 | 2 | 25.0%
3 | 2 | 25.0%
0 | 1 | 12.5%
1 | 1 | 12.5%

Other Punctuation
Value | Count | Frequency (%)
& | 3 | 50.0%
. | 2 | 33.3%
% | 1 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 57 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 15 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 15 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1024 | 82.4%
Latin | 107 | 8.6%
Common | 102 | 8.2%
Han | 9 | 0.7%

Most frequent character per script

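Hangul
Value | Count | Frequency (%)
(unrecovered) | 47 | 4.6%
(unrecovered) | 46 | 4.5%
(unrecovered) | 39 | 3.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 28 | 2.7%
(unrecovered) | 23 | 2.2%
(unrecovered) | 20 | 2.0%
(unrecovered) | 19 | 1.9%
(unrecovered) | 17 | 1.7%
Other values (275) | 727 | 71.0%

Latin
Value | Count | Frequency (%)
t | 7 | 6.5%
a | 6 | 5.6%
r | 6 | 5.6%
S | 5 | 4.7%
e | 5 | 4.7%
s | 5 | 4.7%
n | 5 | 4.7%
O | 5 | 4.7%
i | 4 | 3.7%
A | 4 | 3.7%
Other values (28) | 55 | 51.4%

Common
Value | Count | Frequency (%)
(space) | 57 | 55.9%
) | 15 | 14.7%
( | 15 | 14.7%
& | 3 | 2.9%
5 | 2 | 2.0%
2 | 2 | 2.0%
3 | 2 | 2.0%
. | 2 | 2.0%
0 | 1 | 1.0%
1 | 1 | 1.0%
Other values (2) | 2 | 2.0%

Han
Value | Count | Frequency (%)
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%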

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1024 | 82.4%
ASCII | 209 | 16.8%
CJK | 9 | 0.7%
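
Unicode blocks are contiguous code point ranges, so a character's block can be found by range lookup. The sketch below covers only the three blocks named in this table. It assumes the short labels map to Basic Latin (U+0000 to U+007F), CJK Unified Ideographs (U+4E00 to U+9FFF) and Hangul Syllables (U+AC00 to U+D7AF); `block_of` is a hypothetical helper, not the profiler's function.

```python
from collections import Counter

# (label, first code point, last code point); ranges per the Unicode block chart
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),    # Basic Latin
    ("CJK", 0x4E00, 0x9FFF),      # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),   # Hangul Syllables
]

def block_of(ch: str) -> str:
    """Return the block label for a single character, or 'Other'."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"

def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for value in values for ch in value)

blocks = block_counts(["도서출판 늘함께", "ciznet"])
print(blocks)
```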

Most frequent character per block

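ASCII
Value | Count | Frequency (%)
(space) | 57 | 27.3%
) | 15 | 7.2%
( | 15 | 7.2%
t | 7 | 3.3%
a | 6 | 2.9%
r | 6 | 2.9%
S | 5 | 2.4%
e | 5 | 2.4%
s | 5 | 2.4%
n | 5 | 2.4%
Other values (40) | 83 | 39.7%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 47 | 4.6%
(unrecovered) | 46 | 4.5%
(unrecovered) | 39 | 3.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 29 | 2.8%
(unrecovered) | 28 | 2.7%
(unrecovered) | 23 | 2.2%
(unrecovered) | 20 | 2.0%
(unrecovered) | 19 | 1.9%
(unrecovered) | 17 | 1.7%
Other values (275) | 727 | 71.0%

CJK
Value | Count | Frequency (%)
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%
(unrecovered) | 1 | 11.1%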

사업체소재지(도로명) (business location, road-name address)
Text

Distinct: 190
Distinct (%): 97.9%
Missing: 2
Missing (%): 1.0%
Memory size: 1.7 KiB

Length

Max length: 54
Median length: 45
Mean length: 33.458763
Min length: 21

Characters and Unicode

Total characters: 6491
Distinct characters: 187
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 186
Unique (%): 95.9%
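
For this column the Distinct and Unique figures differ (190 versus 186). Distinct counts how many different values occur, while Unique counts values that occur exactly once, so four addresses appear on more than one record. A small sketch of the distinction, using a hypothetical helper and synthetic stand-in values rather than the real addresses:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)  # ignore missing entries
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Synthetic stand-in: 186 one-off values, 4 values used twice, 2 missing
values = [f"addr-{i}" for i in range(186)]
values += [f"shared-{i}" for i in range(4)] * 2
values += [None, None]

distinct, unique = distinct_and_unique(values)
print(distinct, unique)  # 190 186
```

With 194 non-missing values, these counts give the reported 97.9% distinct and 95.9% unique.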

Sample

1st row: 부산광역시 금정구 부산대학로63번길 2 (장전동)
2nd row: 부산광역시 금정구 부산대학로 10, 103동 23층 2301호 (부곡동, 대우아파트)
3rd row: 부산광역시 금정구 부산대학로64번길 14-7 (장전동)
4th row: 부산광역시 금정구 두실로 16 (남산동)
5th row: 부산광역시 금정구 수림로 132 (장전동)

Value | Count | Frequency (%)
부산광역시 | 194 | 16.2%
금정구 | 194 | 16.2%
장전동 | 63 | 5.3%
구서동 | 39 | 3.3%
부곡동 | 32 | 2.7%
남산동 | 26 | 2.2%
금강로 | 19 | 1.6%
부산대학로63번길 | 11 | 0.9%
금정로 | 11 | 0.9%
중앙대로 | 11 | 0.9%
Other values (390) | 598 | 49.9%

Most occurring characters

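Value | Count | Frequency (%)
(space) | 1071 | 16.5%
(unrecovered) | 278 | 4.3%
(unrecovered) | 266 | 4.1%
(unrecovered) | 258 | 4.0%
(unrecovered) | 257 | 4.0%
(unrecovered) | 248 | 3.8%
1 | 243 | 3.7%
(unrecovered) | 212 | 3.3%
(unrecovered) | 197 | 3.0%
( | 196 | 3.0%
Other values (177) | 3265 | 50.3%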

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 3688 | 56.8%
Decimal Number | 1113 | 17.1%
Space Separator | 1071 | 16.5%
Open Punctuation | 196 | 3.0%
Close Punctuation | 196 | 3.0%
Other Punctuation | 174 | 2.7%
Dash Punctuation | 41 | 0.6%
Uppercase Letter | 12 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(unrecovered) | 278 | 7.5%
(unrecovered) | 266 | 7.2%
(unrecovered) | 258 | 7.0%
(unrecovered) | 257 | 7.0%
(unrecovered) | 248 | 6.7%
(unrecovered) | 212 | 5.7%
(unrecovered) | 197 | 5.3%
(unrecovered) | 196 | 5.3%
(unrecovered) | 195 | 5.3%
(unrecovered) | 194 | 5.3%
Other values (154) | 1387 | 37.6%

Decimal Number
Value | Count | Frequency (%)
1 | 243 | 21.8%
2 | 163 | 14.6%
0 | 135 | 12.1%
3 | 114 | 10.2%
5 | 103 | 9.3%
4 | 89 | 8.0%
7 | 80 | 7.2%
6 | 70 | 6.3%
9 | 68 | 6.1%
8 | 48 | 4.3%

Uppercase Letter
Value | Count | Frequency (%)
A | 4 | 33.3%
B | 3 | 25.0%
P | 1 | 8.3%
D | 1 | 8.3%
T | 1 | 8.3%
F | 1 | 8.3%
L | 1 | 8.3%

Other Punctuation
Value | Count | Frequency (%)
, | 173 | 99.4%
/ | 1 | 0.6%

Space Separator
Value | Count | Frequency (%)
(space) | 1071 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 196 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 196 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 41 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 3688 | 56.8%
Common | 2791 | 43.0%
Latin | 12 | 0.2%

Most frequent character per script

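Hangul
Value | Count | Frequency (%)
(unrecovered) | 278 | 7.5%
(unrecovered) | 266 | 7.2%
(unrecovered) | 258 | 7.0%
(unrecovered) | 257 | 7.0%
(unrecovered) | 248 | 6.7%
(unrecovered) | 212 | 5.7%
(unrecovered) | 197 | 5.3%
(unrecovered) | 196 | 5.3%
(unrecovered) | 195 | 5.3%
(unrecovered) | 194 | 5.3%
Other values (154) | 1387 | 37.6%

Common
Value | Count | Frequency (%)
(space) | 1071 | 38.4%
1 | 243 | 8.7%
( | 196 | 7.0%
) | 196 | 7.0%
, | 173 | 6.2%
2 | 163 | 5.8%
0 | 135 | 4.8%
3 | 114 | 4.1%
5 | 103 | 3.7%
4 | 89 | 3.2%
Other values (6) | 308 | 11.0%

Latin
Value | Count | Frequency (%)
A | 4 | 33.3%
B | 3 | 25.0%
P | 1 | 8.3%
D | 1 | 8.3%
T | 1 | 8.3%
F | 1 | 8.3%
L | 1 | 8.3%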

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 3688 | 56.8%
ASCII | 2803 | 43.2%

Most frequent character per block

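ASCII
Value | Count | Frequency (%)
(space) | 1071 | 38.2%
1 | 243 | 8.7%
( | 196 | 7.0%
) | 196 | 7.0%
, | 173 | 6.2%
2 | 163 | 5.8%
0 | 135 | 4.8%
3 | 114 | 4.1%
5 | 103 | 3.7%
4 | 89 | 3.2%
Other values (13) | 320 | 11.4%

Hangul
Value | Count | Frequency (%)
(unrecovered) | 278 | 7.5%
(unrecovered) | 266 | 7.2%
(unrecovered) | 258 | 7.0%
(unrecovered) | 257 | 7.0%
(unrecovered) | 248 | 6.7%
(unrecovered) | 212 | 5.7%
(unrecovered) | 197 | 5.3%
(unrecovered) | 196 | 5.3%
(unrecovered) | 195 | 5.3%
(unrecovered) | 194 | 5.3%
Other values (154) | 1387 | 37.6%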

업종 (business type)
Categorical

IMBALANCE

Distinct: 2
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

출판사: 195
<NA>: 1

Length

Max length: 4
Median length: 3
Mean length: 3.005102
Min length: 3

Unique

Unique: 1
Unique (%): 0.5%

Sample

1st row: 출판사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value | Count | Frequency (%)
출판사 | 195 | 99.5%
<NA> | 1 | 0.5%
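
The 95.4% imbalance figure in the alert for this column is consistent with one minus the normalized Shannon entropy of the value counts above. The sketch below reproduces that number under this assumption; `imbalance_score` is an illustrative function name, not a confirmed copy of the profiler's code.

```python
import math

def imbalance_score(counts):
    """1 - (Shannon entropy / maximum possible entropy) for a list of class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0  # a single class has no meaningful imbalance
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

score = imbalance_score([195, 1])  # counts taken from the Common Values table
print(f"{score:.1%}")  # 95.4%
```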

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
출판사 | 195 | 99.5%
na | 1 | 0.5%

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
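
Nullity correlation can be illustrated as the Pearson correlation between present/missing indicator vectors for two columns. The sketch below uses toy data rather than this dataset, and `nullity_correlation` is a hypothetical helper written for illustration only.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present (1) / missing (0) flags of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no defined correlation
        return float("nan")
    return cov / math.sqrt(var_a * var_b)

# Toy columns: the second is missing wherever the first is, plus one extra gap
name = ["a", "b", None, "d"]
address = ["x", None, None, "w"]

r = nullity_correlation(name, address)
print(round(r, 3))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means no linear relationship between their gaps.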

Sample

Index | 사업체명칭 | 사업체소재지(도로명) | 업종
0 | 사단법인 부산대학교 출판문화원 | 부산광역시 금정구 부산대학로63번길 2 (장전동) | 출판사
1 | 제일출판인쇄 | 부산광역시 금정구 부산대학로 10, 103동 23층 2301호 (부곡동, 대우아파트) | 출판사
2 | 만수출판사 | 부산광역시 금정구 부산대학로64번길 14-7 (장전동) | 출판사
3 | 도서출판 늘함께 | 부산광역시 금정구 두실로 16 (남산동) | 출판사
4 | 월간불교세계출판부 | 부산광역시 금정구 수림로 132 (장전동) | 출판사
5 | 도서출판미래원 | 부산광역시 금정구 중앙대로1841번길 65, 1층 103호 (구서동, 구서골드1상가) | 출판사
6 | 동성출판사 | 부산광역시 금정구 서부로 74-6 (서동) | 출판사
7 | 광진출판사 | 부산광역시 금정구 부산대학로 60-1 (장전동) | 출판사
8 | 한둘학력개발연구소 | 부산광역시 금정구 중앙대로1959번길 11 (구서동) | 출판사
9 | 시공연출 | 부산광역시 금정구 부곡로 1 (부곡동) | 출판사

Index | 사업체명칭 | 사업체소재지(도로명) | 업종
186 | 주식회사 글로벌탑넷 | 부산광역시 금정구 시실로 11-3, 순흥빌딩 3층 (부곡동) | 출판사
187 | 도서출판3 | 부산광역시 금정구 중앙대로1929번길 48, 101동 808호 (구서동, 부영벽산아파트) | 출판사
188 | 데이북 | 부산광역시 금정구 금강로279번길 61, 110호 (장전동, 현대2차아파트) | 출판사
189 | 도서출판 꿈 키움 | 부산광역시 금정구 금샘로485번길 65, 부산외국어대학교 A동 130호 (남산동) | 출판사
190 | 디자인 달라 | 부산광역시 금정구 금강로 690-3, 304호 (남산동, 스튜디오690) | 출판사
191 | 소하북스 | 부산광역시 금정구 장전온천천로89번길 10-1 (장전동) | 출판사
192 | 가인의 집밥 | 부산광역시 금정구 서동로149번길 19 (서동) | 출판사
193 | 빅터디자인스튜디오 | 부산광역시 금정구 금강로578번길 32, 2층 (구서동) | 출판사
194 | 더알미디어 | 부산광역시 금정구 금정로 63-1, 4층 LAB 5호 (장전동) | 출판사
195 | <NA> | <NA> | <NA>