Overview

Dataset statistics

Number of variables: 5
Number of observations: 186
Missing cells: 75
Missing cells (%): 8.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.6 KiB
Average record size in memory: 41.7 B
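
The percentage of missing cells is taken over all cells (observations times variables), not over rows. A minimal sketch of that arithmetic, using only the figures reported in this overview; the function name `share_percent` is illustrative and not part of any library:

```python
# Figures copied from the dataset statistics and the alerts of this report.
n_variables = 5
n_observations = 186
missing_cells = 75


def share_percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# All 75 missing cells belong to one column (전화번호), so the same count
# gives two different percentages depending on the denominator.
missing_cells_pct = share_percent(missing_cells, n_observations * n_variables)
missing_in_column_pct = share_percent(missing_cells, n_observations)

print(f"Missing cells (%): {missing_cells_pct:.1f}%")
print(f"Missing in column (%): {missing_in_column_pct:.1f}%")
```

The first figure matches the 8.1% shown for the whole dataset, the second the 40.3% shown in the alert for the single affected column.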

Variable types

Numeric: 1
Categorical: 2
Text: 2

Dataset

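Description: 부산광역시남구출판사및인쇄소현황_20210927 (status of publishers and printing shops in Nam-gu, Busan Metropolitan City, 20210927)
Author: 부산광역시 남구 (Nam-gu, Busan Metropolitan City)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=3034659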

Alerts

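사업체소재지(도로명) (business location, road name) has constant value "부산광역시 남구" [Constant]
연번 (serial number) is highly overall correlated with 업종 (business type) [High correlation]
업종 (business type) is highly overall correlated with 연번 (serial number) [High correlation]
전화번호 (phone number) has 75 (40.3%) missing values [Missing]
연번 (serial number) has unique values [Unique]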

Reproduction

Analysis started: 2023-12-10 16:25:39.409462
Analysis finished: 2023-12-10 16:25:40.458603
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
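
A report like this one could be regenerated roughly as follows. This is a sketch under assumptions, not the configuration actually used: the helper `build_report`, the CSV path, and the CP949 encoding (common for Korean public data files) are all hypothetical, and the `config.json` referenced above is not reproduced here.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html") -> Path:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so the module loads even without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the source file is CP949-encoded; adjust if it is UTF-8.
    df = pd.read_csv(csv_path, encoding="cp949")

    # Default settings; the original run may have used a custom configuration.
    report = ProfileReport(df, title="부산광역시남구출판사및인쇄소현황_20210927")
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("data.csv")` with a local copy of the dataset would write `report.html` next to the script; the file name is a placeholder.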

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 186
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 93.5
Minimum: 1
Maximum: 186
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 10.25
Q1: 47.25
Median: 93.5
Q3: 139.75
95-th percentile: 176.75
Maximum: 186
Range: 185
Interquartile range (IQR): 92.5

Descriptive statistics

Standard deviation: 53.837719
Coefficient of variation (CV): 0.57580448
Kurtosis: -1.2
Mean: 93.5
Median Absolute Deviation (MAD): 46.5
Skewness: 0
Sum: 17391
Variance: 2898.5
Monotonicity: Strictly increasing
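
The quantile and descriptive statistics above are what a plain running index produces. The sketch below recomputes them with the standard library, assuming the column holds exactly the integers 1 through 186 (consistent with the minimum, maximum, distinct count, and sum reported, but still an assumption). The `quantile` helper is illustrative and uses linear interpolation between neighbouring ranks.

```python
import math
import statistics

# Assumption: the column is the running index 1, 2, ..., 186.
values = list(range(1, 187))


def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

summary = {
    "sum": sum(values),
    "mean": mean,
    "variance": variance,
    "std": std,
    "cv": std / mean,
    "mad": mad,
    "p5": quantile(values, 0.05),
    "q1": quantile(values, 0.25),
    "q3": quantile(values, 0.75),
    "p95": quantile(values, 0.95),
}
summary["iqr"] = summary["q3"] - summary["q1"]

for name, value in summary.items():
    print(f"{name}: {value}")
```

Because the values are evenly spaced, the skewness of 0 and the kurtosis near -1.2 follow from the flat, symmetric distribution rather than from anything specific to the businesses listed.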
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 1 | 0.5%
129 | 1 | 0.5%
120 | 1 | 0.5%
121 | 1 | 0.5%
122 | 1 | 0.5%
123 | 1 | 0.5%
124 | 1 | 0.5%
125 | 1 | 0.5%
126 | 1 | 0.5%
127 | 1 | 0.5%
Other values (176) | 176 | 94.6%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.5%
2 | 1 | 0.5%
3 | 1 | 0.5%
4 | 1 | 0.5%
5 | 1 | 0.5%
6 | 1 | 0.5%
7 | 1 | 0.5%
8 | 1 | 0.5%
9 | 1 | 0.5%
10 | 1 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
186 | 1 | 0.5%
185 | 1 | 0.5%
184 | 1 | 0.5%
183 | 1 | 0.5%
182 | 1 | 0.5%
181 | 1 | 0.5%
180 | 1 | 0.5%
179 | 1 | 0.5%
178 | 1 | 0.5%
177 | 1 | 0.5%

업종 (business type)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
출판사 (publisher): 158
인쇄사 (printer): 28

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 출판사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value | Count | Frequency (%)
출판사 | 158 | 84.9%
인쇄사 | 28 | 15.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Words

Value | Count | Frequency (%)
출판사 | 158 | 84.9%
인쇄사 | 28 | 15.1%

사업체명칭 (business name)
Text

Distinct: 175
Distinct (%): 94.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB

Length

Max length: 24
Median length: 14
Mean length: 6.8870968
Min length: 2

Characters and Unicode

Total characters: 1281
Distinct characters: 321
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
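
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It runs only over the five sample values listed for this variable, so its counts are far smaller than the totals in this report; the `CATEGORY_NAMES` mapping is a hand-written subset covering the categories that appear here.

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable.
names = [
    "경성대학교출판부",
    "부경대학교 출판부",
    "도서출판 에이맨",
    "아베마리아출판사",
    "폰테고 출판사",
]

# Readable labels for the two-letter general category codes (subset only).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}

# One category code per character, tallied over all sample values.
category_counts = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

for code, count in category_counts.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}: {count}")
```

Hangul syllables fall under "Other Letter" because the script has no upper or lower case, which is why that category dominates the tables for this variable.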

Unique

Unique: 164
Unique (%): 88.2%

Sample

1st row: 경성대학교출판부
2nd row: 부경대학교 출판부
3rd row: 도서출판 에이맨
4th row: 아베마리아출판사
5th row: 폰테고 출판사

Words

Value | Count | Frequency (%)
도서출판 | 13 | 5.3%
주식회사 | 8 | 3.2%
출판사 | 6 | 2.4%
디자인 | 4 | 1.6%
에그위즈 | 2 | 0.8%
출판부 | 2 | 0.8%
디자인g2 | 2 | 0.8%
golden | 2 | 0.8%
음악 | 2 | 0.8%
담앤북스(부산 | 2 | 0.8%
Other values (196) | 204 | 82.6%

Most occurring characters

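Value | Count | Frequency (%)
(space) | 61 | 4.8%
[character not preserved] | 39 | 3.0%
[character not preserved] | 33 | 2.6%
[character not preserved] | 32 | 2.5%
[character not preserved] | 31 | 2.4%
[character not preserved] | 27 | 2.1%
) | 27 | 2.1%
[character not preserved] | 26 | 2.0%
( | 26 | 2.0%
[character not preserved] | 20 | 1.6%
Other values (311) | 959 | 74.9%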

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1026 | 80.1%
Lowercase Letter | 82 | 6.4%
Space Separator | 61 | 4.8%
Uppercase Letter | 47 | 3.7%
Close Punctuation | 27 | 2.1%
Open Punctuation | 26 | 2.0%
Decimal Number | 7 | 0.5%
Other Punctuation | 5 | 0.4%

Most frequent character per category

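Other Letter

Value | Count | Frequency (%)
[character not preserved] | 39 | 3.8%
[character not preserved] | 33 | 3.2%
[character not preserved] | 32 | 3.1%
[character not preserved] | 31 | 3.0%
[character not preserved] | 27 | 2.6%
[character not preserved] | 26 | 2.5%
[character not preserved] | 20 | 1.9%
[character not preserved] | 20 | 1.9%
[character not preserved] | 19 | 1.9%
[character not preserved] | 18 | 1.8%
Other values (263) | 761 | 74.2%

Lowercase Letter

Value | Count | Frequency (%)
n | 10 | 12.2%
i | 8 | 9.8%
o | 7 | 8.5%
a | 7 | 8.5%
g | 6 | 7.3%
e | 6 | 7.3%
r | 5 | 6.1%
u | 4 | 4.9%
c | 4 | 4.9%
d | 4 | 4.9%
Other values (10) | 21 | 25.6%

Uppercase Letter

Value | Count | Frequency (%)
G | 6 | 12.8%
C | 5 | 10.6%
N | 5 | 10.6%
E | 4 | 8.5%
P | 3 | 6.4%
D | 3 | 6.4%
S | 3 | 6.4%
J | 3 | 6.4%
R | 2 | 4.3%
L | 2 | 4.3%
Other values (10) | 11 | 23.4%

Other Punctuation

Value | Count | Frequency (%)
& | 2 | 40.0%
. | 2 | 40.0%
/ | 1 | 20.0%

Decimal Number

Value | Count | Frequency (%)
2 | 4 | 57.1%
1 | 3 | 42.9%

Space Separator

Value | Count | Frequency (%)
(space) | 61 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 27 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 26 | 100.0%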

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1018 | 79.5%
Latin | 129 | 10.1%
Common | 126 | 9.8%
Han | 8 | 0.6%

Most frequent character per script

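Hangul

Value | Count | Frequency (%)
[character not preserved] | 39 | 3.8%
[character not preserved] | 33 | 3.2%
[character not preserved] | 32 | 3.1%
[character not preserved] | 31 | 3.0%
[character not preserved] | 27 | 2.7%
[character not preserved] | 26 | 2.6%
[character not preserved] | 20 | 2.0%
[character not preserved] | 20 | 2.0%
[character not preserved] | 19 | 1.9%
[character not preserved] | 18 | 1.8%
Other values (255) | 753 | 74.0%

Latin

Value | Count | Frequency (%)
n | 10 | 7.8%
i | 8 | 6.2%
o | 7 | 5.4%
a | 7 | 5.4%
G | 6 | 4.7%
g | 6 | 4.7%
e | 6 | 4.7%
C | 5 | 3.9%
N | 5 | 3.9%
r | 5 | 3.9%
Other values (30) | 64 | 49.6%

Common

Value | Count | Frequency (%)
(space) | 61 | 48.4%
) | 27 | 21.4%
( | 26 | 20.6%
2 | 4 | 3.2%
1 | 3 | 2.4%
& | 2 | 1.6%
. | 2 | 1.6%
/ | 1 | 0.8%

Han

Value | Count | Frequency (%)
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%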

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1018 | 79.5%
ASCII | 255 | 19.9%
CJK | 8 | 0.6%

Most frequent character per block

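ASCII

Value | Count | Frequency (%)
(space) | 61 | 23.9%
) | 27 | 10.6%
( | 26 | 10.2%
n | 10 | 3.9%
i | 8 | 3.1%
o | 7 | 2.7%
a | 7 | 2.7%
G | 6 | 2.4%
g | 6 | 2.4%
e | 6 | 2.4%
Other values (38) | 91 | 35.7%

Hangul

Value | Count | Frequency (%)
[character not preserved] | 39 | 3.8%
[character not preserved] | 33 | 3.2%
[character not preserved] | 32 | 3.1%
[character not preserved] | 31 | 3.0%
[character not preserved] | 27 | 2.7%
[character not preserved] | 26 | 2.6%
[character not preserved] | 20 | 2.0%
[character not preserved] | 20 | 2.0%
[character not preserved] | 19 | 1.9%
[character not preserved] | 18 | 1.8%
Other values (255) | 753 | 74.0%

CJK

Value | Count | Frequency (%)
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%
[character not preserved] | 1 | 12.5%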

사업체소재지(도로명) (business location, road name)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 KiB
부산광역시 남구 (Nam-gu, Busan Metropolitan City): 186

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부산광역시 남구
2nd row: 부산광역시 남구
3rd row: 부산광역시 남구
4th row: 부산광역시 남구
5th row: 부산광역시 남구

Common Values

Value | Count | Frequency (%)
부산광역시 남구 | 186 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

Words

Value | Count | Frequency (%)
부산광역시 | 186 | 50.0%
남구 | 186 | 50.0%

전화번호 (phone number)
Text

MISSING 

Distinct: 101
Distinct (%): 91.0%
Missing: 75
Missing (%): 40.3%
Memory size: 1.6 KiB

Length

Max length: 13
Median length: 12
Mean length: 12.054054
Min length: 12

Characters and Unicode

Total characters: 1338
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 91
Unique (%): 82.0%
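
"Distinct" and "Unique" count different things, and the figures for this variable are consistent with percentages taken over the 111 non-missing values rather than all 186 rows. The sketch below reproduces the counts on a synthetic stand-in column with the same shape (10 values occurring twice, 91 occurring once, 75 missing). The placeholder strings and the helper `profile_counts` are illustrative only.

```python
from collections import Counter

# Synthetic stand-in for the column: placeholder strings, not real phone numbers.
column = (
    [f"dup-{i}" for i in range(10)] * 2   # 10 values that each occur twice
    + [f"once-{i}" for i in range(91)]    # 91 values that occur exactly once
    + [None] * 75                         # 75 missing cells
)


def profile_counts(values):
    """Count distinct and unique values, with percentages over non-missing cells."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    distinct = len(counts)                              # different values seen
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "non_missing": len(present),
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / len(present),
        "unique": unique,
        "unique_pct": 100.0 * unique / len(present),
        "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
    }


stats = profile_counts(column)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Only the missing percentage uses the full row count as its denominator, which is why it is the one figure here that relates to 186.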

Sample

1st row: 051-620-4355
2nd row: 051-620-1325
3rd row: 051-645-9801
4th row: 051-625-7373
5th row: 051-631-8101

Words

Value | Count | Frequency (%)
051-626-0777 | 2 | 1.8%
051-631-9907 | 2 | 1.8%
051-623-8003 | 2 | 1.8%
051-623-7733 | 2 | 1.8%
051-624-8898 | 2 | 1.8%
051-464-1230 | 2 | 1.8%
051-632-9005 | 2 | 1.8%
051-624-4620 | 2 | 1.8%
051-248-3699 | 2 | 1.8%
051-611-3951 | 2 | 1.8%
Other values (91) | 91 | 82.0%

Most occurring characters

Value | Count | Frequency (%)
- | 222 | 16.6%
1 | 200 | 14.9%
0 | 196 | 14.6%
5 | 152 | 11.4%
6 | 142 | 10.6%
2 | 101 | 7.5%
3 | 77 | 5.8%
8 | 67 | 5.0%
4 | 67 | 5.0%
7 | 59 | 4.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1116 | 83.4%
Dash Punctuation | 222 | 16.6%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
1 | 200 | 17.9%
0 | 196 | 17.6%
5 | 152 | 13.6%
6 | 142 | 12.7%
2 | 101 | 9.1%
3 | 77 | 6.9%
8 | 67 | 6.0%
4 | 67 | 6.0%
7 | 59 | 5.3%
9 | 55 | 4.9%

Dash Punctuation

Value | Count | Frequency (%)
- | 222 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1338 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
- | 222 | 16.6%
1 | 200 | 14.9%
0 | 196 | 14.6%
5 | 152 | 11.4%
6 | 142 | 10.6%
2 | 101 | 7.5%
3 | 77 | 5.8%
8 | 67 | 5.0%
4 | 67 | 5.0%
7 | 59 | 4.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1338 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
- | 222 | 16.6%
1 | 200 | 14.9%
0 | 196 | 14.6%
5 | 152 | 11.4%
6 | 142 | 10.6%
2 | 101 | 7.5%
3 | 77 | 5.8%
8 | 67 | 5.0%
4 | 67 | 5.0%
7 | 59 | 4.4%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

 | 연번 | 업종
연번 | 1.000 | 0.986
업종 | 0.986 | 1.000

[Figure: correlation heatmap]

 | 연번 | 업종
연번 | 1.000 | 0.875
업종 | 0.875 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
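
The two views can be approximated in a few lines. The sketch below builds a nullity matrix (one flag per cell, true when a value is present) and a per-column missing count from the first ten rows shown in the Sample section, with `None` standing in for `<NA>`. The helper names `nullity_matrix` and `missing_per_column` are illustrative and not part of ydata-profiling.

```python
# First ten rows from the Sample section; None stands in for <NA>.
columns = ["연번", "업종", "사업체명칭", "사업체소재지(도로명)", "전화번호"]
rows = [
    (1, "출판사", "경성대학교출판부", "부산광역시 남구", "051-620-4355"),
    (2, "출판사", "부경대학교 출판부", "부산광역시 남구", "051-620-1325"),
    (3, "출판사", "도서출판 에이맨", "부산광역시 남구", "051-645-9801"),
    (4, "출판사", "아베마리아출판사", "부산광역시 남구", None),
    (5, "출판사", "폰테고 출판사", "부산광역시 남구", "051-625-7373"),
    (6, "출판사", "기러기문화원", "부산광역시 남구", None),
    (7, "출판사", "(재)한국경제정책연구원", "부산광역시 남구", "051-631-8101"),
    (8, "출판사", "도서출판 논문의집", "부산광역시 남구", None),
    (9, "출판사", "도서출판글초롱", "부산광역시 남구", "051-623-7733"),
    (10, "출판사", "만나출판사", "부산광역시 남구", "051-622-8536"),
]


def nullity_matrix(table):
    """Return one list of booleans per row: True where a value is present."""
    return [[value is not None for value in row] for row in table]


def missing_per_column(table, names):
    """Count missing cells for each column name."""
    matrix = nullity_matrix(table)
    return {
        name: sum(1 for row in matrix if not row[i])
        for i, name in enumerate(names)
    }


missing = missing_per_column(rows, columns)

# Text rendering of the matrix: '#' for present, '.' for missing.
for flags in nullity_matrix(rows):
    print("".join("#" if present else "." for present in flags))
print(missing)
```

On the full dataset the same counting would attribute all 75 missing cells to 전화번호, in line with the alert for that column.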

Sample

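Columns: 연번 (serial number), 업종 (business type), 사업체명칭 (business name), 사업체소재지(도로명) (business location, road name), 전화번호 (phone number)

First rows

Index | 연번 | 업종 | 사업체명칭 | 사업체소재지(도로명) | 전화번호
0 | 1 | 출판사 | 경성대학교출판부 | 부산광역시 남구 | 051-620-4355
1 | 2 | 출판사 | 부경대학교 출판부 | 부산광역시 남구 | 051-620-1325
2 | 3 | 출판사 | 도서출판 에이맨 | 부산광역시 남구 | 051-645-9801
3 | 4 | 출판사 | 아베마리아출판사 | 부산광역시 남구 | <NA>
4 | 5 | 출판사 | 폰테고 출판사 | 부산광역시 남구 | 051-625-7373
5 | 6 | 출판사 | 기러기문화원 | 부산광역시 남구 | <NA>
6 | 7 | 출판사 | (재)한국경제정책연구원 | 부산광역시 남구 | 051-631-8101
7 | 8 | 출판사 | 도서출판 논문의집 | 부산광역시 남구 | <NA>
8 | 9 | 출판사 | 도서출판글초롱 | 부산광역시 남구 | 051-623-7733
9 | 10 | 출판사 | 만나출판사 | 부산광역시 남구 | 051-622-8536

Last rows

Index | 연번 | 업종 | 사업체명칭 | 사업체소재지(도로명) | 전화번호
176 | 177 | 인쇄사 | 한길기획 | 부산광역시 남구 | 051-624-8898
177 | 178 | 인쇄사 | 플랑 | 부산광역시 남구 | 051-631-9907
178 | 179 | 인쇄사 | 디자인통두손컴부설연구소 | 부산광역시 남구 | 051-623-8003
179 | 180 | 인쇄사 | 주식회사 유니온키드 | 부산광역시 남구 | <NA>
180 | 181 | 인쇄사 | 주식회사 재원 | 부산광역시 남구 | 051-248-3699
181 | 182 | 인쇄사 | 수애드 | 부산광역시 남구 | <NA>
182 | 183 | 인쇄사 | 이너스 | 부산광역시 남구 | 051-632-9005
183 | 184 | 인쇄사 | 주식회사 에그위즈 | 부산광역시 남구 | <NA>
184 | 185 | 인쇄사 | 뜬금없는 디자인 | 부산광역시 남구 | <NA>
185 | 186 | 인쇄사 | 디자인G2 | 부산광역시 남구 | <NA>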