Overview

Dataset statistics

Number of variables: 4
Number of observations: 54
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 35.4 B

Variable types

Numeric: 1
Text: 1
Categorical: 2

Dataset

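Description: 부산광역시강서구_출판인쇄업등록현황_20230518 (Gangseo-gu, Busan Metropolitan City: publishing and printing business registration status, 2023-05-18)
Author: 부산광역시 강서구 (Gangseo-gu, Busan Metropolitan City)
URL: http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15023174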

Alerts

High correlation: 연번 is highly overall correlated with 업종
High correlation: 사업체소재지(동) is highly overall correlated with 업종
High correlation: 업종 is highly overall correlated with 연번 and 1 other field
Unique: 연번 has unique values

Reproduction

Analysis started: 2023-12-10 16:41:00.623951
Analysis finished: 2023-12-10 16:41:01.369396
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
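
The reported duration can be checked from the two timestamps. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the "Analysis started" / "Analysis finished" rows.
started = datetime.fromisoformat("2023-12-10 16:41:00.623951")
finished = datetime.fromisoformat("2023-12-10 16:41:01.369396")

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```

Rounded to two decimals, the difference agrees with the Duration row.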

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 54
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.5
Minimum: 1
Maximum: 54
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 618.0 B

Quantile statistics

Minimum: 1
5-th percentile: 3.65
Q1: 14.25
Median: 27.5
Q3: 40.75
95-th percentile: 51.35
Maximum: 54
Range: 53
Interquartile range (IQR): 26.5
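
The fractional quantiles above are consistent with linear interpolation between neighbouring order statistics. The sketch below reproduces them under the assumption that 연번 holds the integers 1 to 54 (suggested by the minimum, maximum, distinct count, and sum, but not stated outright). The `percentile` helper is hypothetical and written here only for illustration.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] over pre-sorted data."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Assumption: the serial numbers are exactly 1..54.
serial_numbers = list(range(1, 55))

p05 = percentile(serial_numbers, 0.05)
q1 = percentile(serial_numbers, 0.25)
median = percentile(serial_numbers, 0.50)
q3 = percentile(serial_numbers, 0.75)
p95 = percentile(serial_numbers, 0.95)
iqr = q3 - q1
```

For example, Q1 sits at position (54 - 1) × 0.25 = 13.25 in zero-based order, which is a quarter of the way from 14 to 15.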

Descriptive statistics

Standard deviation: 15.732133
Coefficient of variation (CV): 0.57207755
Kurtosis: -1.2
Mean: 27.5
Median Absolute Deviation (MAD): 13.5
Skewness: 0
Sum: 1485
Variance: 247.5
Monotonicity: Strictly increasing
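
Several of the descriptive statistics can be reproduced with the Python standard library. This is a minimal sketch, again assuming that 연번 is the sequence 1 to 54; the figures match when the sample (n - 1) variance is used.

```python
import statistics

# Assumption: the serial numbers are exactly 1..54.
serial_numbers = list(range(1, 55))

total = sum(serial_numbers)
mean = statistics.mean(serial_numbers)
variance = statistics.variance(serial_numbers)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(serial_numbers)      # square root of the sample variance
cv = std_dev / mean                             # coefficient of variation
median = statistics.median(serial_numbers)
# Median absolute deviation: median of distances from the median.
mad = statistics.median(abs(x - median) for x in serial_numbers)
# Strictly increasing: every value is smaller than its successor.
strictly_increasing = all(a < b for a, b in zip(serial_numbers, serial_numbers[1:]))
```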
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1 | 1 | 1.9%
42 | 1 | 1.9%
31 | 1 | 1.9%
32 | 1 | 1.9%
33 | 1 | 1.9%
34 | 1 | 1.9%
35 | 1 | 1.9%
36 | 1 | 1.9%
37 | 1 | 1.9%
38 | 1 | 1.9%
Other values (44) | 44 | 81.5%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 1.9%
2 | 1 | 1.9%
3 | 1 | 1.9%
4 | 1 | 1.9%
5 | 1 | 1.9%
6 | 1 | 1.9%
7 | 1 | 1.9%
8 | 1 | 1.9%
9 | 1 | 1.9%
10 | 1 | 1.9%

Maximum 10 values

Value | Count | Frequency (%)
54 | 1 | 1.9%
53 | 1 | 1.9%
52 | 1 | 1.9%
51 | 1 | 1.9%
50 | 1 | 1.9%
49 | 1 | 1.9%
48 | 1 | 1.9%
47 | 1 | 1.9%
46 | 1 | 1.9%
45 | 1 | 1.9%

사업체명칭 (business name)
Text

Distinct: 51
Distinct (%): 94.4%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Length

Max length: 22
Median length: 12
Mean length: 6.6481481
Min length: 2

Characters and Unicode

Total characters: 359
Distinct characters: 145
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
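
As an illustration of these character properties, the sketch below counts Unicode general categories for a single business name taken from the sample rows, using the standard-library `unicodedata` module. It covers one string only, not the whole column, and the mapping of two-letter codes to the report's category names (`Lo` = Other Letter, `Ll` = Lowercase Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator, `Ps` = Open Punctuation, `Pe` = Close Punctuation) follows the Unicode Standard.

```python
import unicodedata
from collections import Counter

# One business name from the sample rows of this report.
name = "데스티니 북스(Destiny Books)"

# unicodedata.category returns the two-letter general category of a character.
category_counts = Counter(unicodedata.category(ch) for ch in name)

for code, count in category_counts.most_common():
    print(code, count)
```

This name is 22 characters long, which equals the Max length reported for the variable.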

Unique

Unique: 48
Unique (%): 88.9%

Sample

1st row: 천지문화사
2nd row: 프리죤
3rd row: 리얼라이프북스
4th row: 정림
5th row: 도서출판 한국선급

Value | Count | Frequency (%)
도서출판 | 7 | 9.1%
주식회사 | 3 | 3.9%
북앤스페이스 | 2 | 2.6%
미서북스 | 2 | 2.6%
영신사 | 2 | 2.6%
대진 | 1 | 1.3%
마이플레이트 | 1 | 1.3%
꿈꾸는 | 1 | 1.3%
별들 | 1 | 1.3%
세종출판디자인 | 1 | 1.3%
Other values (56) | 56 | 72.7%

Most occurring characters

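Value | Count | Frequency (%)
(space) | 23 | 6.4%
(not preserved) | 18 | 5.0%
(not preserved) | 13 | 3.6%
(not preserved) | 10 | 2.8%
(not preserved) | 10 | 2.8%
(not preserved) | 9 | 2.5%
(not preserved) | 9 | 2.5%
(not preserved) | 8 | 2.2%
(not preserved) | 8 | 2.2%
(not preserved) | 8 | 2.2%
Other values (135) | 243 | 67.7%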

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 298 | 83.0%
Space Separator | 23 | 6.4%
Lowercase Letter | 21 | 5.8%
Uppercase Letter | 7 | 1.9%
Close Punctuation | 5 | 1.4%
Open Punctuation | 5 | 1.4%

Most frequent character per category

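Other Letter

Value | Count | Frequency (%)
(not preserved) | 18 | 6.0%
(not preserved) | 13 | 4.4%
(not preserved) | 10 | 3.4%
(not preserved) | 10 | 3.4%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
Other values (115) | 199 | 66.8%

Lowercase Letter

Value | Count | Frequency (%)
e | 4 | 19.0%
o | 4 | 19.0%
s | 3 | 14.3%
t | 2 | 9.5%
i | 1 | 4.8%
n | 1 | 4.8%
b | 1 | 4.8%
h | 1 | 4.8%
r | 1 | 4.8%
f | 1 | 4.8%
Other values (2) | 2 | 9.5%

Uppercase Letter

Value | Count | Frequency (%)
M | 2 | 28.6%
O | 2 | 28.6%
D | 1 | 14.3%
P | 1 | 14.3%
B | 1 | 14.3%

Space Separator

Value | Count | Frequency (%)
(space) | 23 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 5 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 5 | 100.0%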

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 298 | 83.0%
Common | 33 | 9.2%
Latin | 28 | 7.8%

Most frequent character per script

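Hangul

Value | Count | Frequency (%)
(not preserved) | 18 | 6.0%
(not preserved) | 13 | 4.4%
(not preserved) | 10 | 3.4%
(not preserved) | 10 | 3.4%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
Other values (115) | 199 | 66.8%

Latin

Value | Count | Frequency (%)
e | 4 | 14.3%
o | 4 | 14.3%
s | 3 | 10.7%
M | 2 | 7.1%
O | 2 | 7.1%
t | 2 | 7.1%
D | 1 | 3.6%
i | 1 | 3.6%
n | 1 | 3.6%
b | 1 | 3.6%
Other values (7) | 7 | 25.0%

Common

Value | Count | Frequency (%)
(space) | 23 | 69.7%
) | 5 | 15.2%
( | 5 | 15.2%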

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 298 | 83.0%
ASCII | 61 | 17.0%

Most frequent character per block

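ASCII

Value | Count | Frequency (%)
(space) | 23 | 37.7%
) | 5 | 8.2%
( | 5 | 8.2%
e | 4 | 6.6%
o | 4 | 6.6%
s | 3 | 4.9%
M | 2 | 3.3%
O | 2 | 3.3%
t | 2 | 3.3%
D | 1 | 1.6%
Other values (10) | 10 | 16.4%

Hangul

Value | Count | Frequency (%)
(not preserved) | 18 | 6.0%
(not preserved) | 13 | 4.4%
(not preserved) | 10 | 3.4%
(not preserved) | 10 | 3.4%
(not preserved) | 9 | 3.0%
(not preserved) | 9 | 3.0%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 8 | 2.7%
(not preserved) | 6 | 2.0%
Other values (115) | 199 | 66.8%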

사업체소재지(동) (business location, dong)
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 14.8%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Value | Count
부산광역시 강서구 명지동 | 30
부산광역시 강서구 대저1동 | 10
부산광역시 강서구 대저2동 | 6
부산광역시 강서구 송정동 | 4
부산광역시 강서구 동선동 | 1
Other values (3) | 3

Length

Max length: 14
Median length: 13
Mean length: 13.314815
Min length: 13

Unique

Unique: 4
Unique (%): 7.4%

Sample

1st row: 부산광역시 강서구 대저1동
2nd row: 부산광역시 강서구 송정동
3rd row: 부산광역시 강서구 명지동
4th row: 부산광역시 강서구 명지동
5th row: 부산광역시 강서구 명지동

Common Values

Value | Count | Frequency (%)
부산광역시 강서구 명지동 | 30 | 55.6%
부산광역시 강서구 대저1동 | 10 | 18.5%
부산광역시 강서구 대저2동 | 6 | 11.1%
부산광역시 강서구 송정동 | 4 | 7.4%
부산광역시 강서구 동선동 | 1 | 1.9%
부산광역시 강서구 지사동 | 1 | 1.9%
부산광역시 강서구 신호동 | 1 | 1.9%
부산광역시 강서구 송정동 | 1 | 1.9%
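
The table lists 송정동 twice (4 rows and 1 row), while the word-level counts further down give 송정동 a total of 5. The reported mean length also implies 719 characters over 54 rows, one more than the 13- and 14-character forms listed would produce (38 × 13 + 16 × 14 = 718). One possible explanation, which the report does not confirm, is a stray whitespace character in a single row. The sketch below uses hypothetical raw values and a hypothetical `normalize` helper to show how whitespace normalization would merge such near-duplicates before counting.

```python
from collections import Counter

# Hypothetical raw values: the last one carries a trailing space (an assumption,
# not something the report states).
raw_values = ["부산광역시 강서구 송정동"] * 4 + ["부산광역시 강서구 송정동 "]

def normalize(value):
    """Collapse runs of whitespace to single spaces and trim both ends."""
    return " ".join(value.split())

before = Counter(raw_values)                        # two distinct keys
after = Counter(normalize(v) for v in raw_values)   # one key after cleanup

print(len(before), len(after))
```

If this is the cause, the variable would have 7 distinct values after cleanup rather than 8.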

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
부산광역시 | 54 | 33.3%
강서구 | 54 | 33.3%
명지동 | 30 | 18.5%
대저1동 | 10 | 6.2%
대저2동 | 6 | 3.7%
송정동 | 5 | 3.1%
동선동 | 1 | 0.6%
지사동 | 1 | 0.6%
신호동 | 1 | 0.6%

업종 (business type)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Value | Count
출판사 (publisher) | 42
인쇄사 (printing company) | 12

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 출판사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value | Count | Frequency (%)
출판사 | 42 | 77.8%
인쇄사 | 12 | 22.2%
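
The Frequency (%) column is each count divided by the number of observations. A minimal sketch, using the two counts from the table and illustrative variable names:

```python
# Counts copied from the Common Values table for 업종.
counts = {"출판사": 42, "인쇄사": 12}

total = sum(counts.values())  # number of observations
shares = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, share in shares.items():
    print(f"{value}: {share}%")
```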

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
출판사 | 42 | 77.8%
인쇄사 | 12 | 22.2%

Interactions


Correlations

 | 연번 | 사업체명칭 | 사업체소재지(동) | 업종
연번 | 1.000 | 0.739 | 0.507 | 0.996
사업체명칭 | 0.739 | 1.000 | 1.000 | 0.626
사업체소재지(동) | 0.507 | 1.000 | 1.000 | 0.909
업종 | 0.996 | 0.626 | 0.909 | 1.000

 | 사업체소재지(동) | 업종
사업체소재지(동) | 1.000 | 0.695
업종 | 0.695 | 1.000

 | 연번 | 사업체소재지(동) | 업종
연번 | 1.000 | 0.258 | 0.871
사업체소재지(동) | 0.258 | 1.000 | 0.695
업종 | 0.871 | 0.695 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

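First rows

Index | 연번 | 사업체명칭 | 사업체소재지(동) | 업종
0 | 1 | 천지문화사 | 부산광역시 강서구 대저1동 | 출판사
1 | 2 | 프리죤 | 부산광역시 강서구 송정동 | 출판사
2 | 3 | 리얼라이프북스 | 부산광역시 강서구 명지동 | 출판사
3 | 4 | 정림 | 부산광역시 강서구 명지동 | 출판사
4 | 5 | 도서출판 한국선급 | 부산광역시 강서구 명지동 | 출판사
5 | 6 | 작은통일 | 부산광역시 강서구 명지동 | 출판사
6 | 7 | 귀를 닫은 토끼 | 부산광역시 강서구 대저2동 | 출판사
7 | 8 | 데스티니 북스(Destiny Books) | 부산광역시 강서구 명지동 | 출판사
8 | 9 | 영신사 | 부산광역시 강서구 대저2동 | 출판사
9 | 10 | 넉넉 | 부산광역시 강서구 명지동 | 출판사

Last rows

Index | 연번 | 사업체명칭 | 사업체소재지(동) | 업종
44 | 45 | 영신사 | 부산광역시 강서구 대저2동 | 인쇄사
45 | 46 | 성광정판인쇄 | 부산광역시 강서구 송정동 | 인쇄사
46 | 47 | 명보카렌다 | 부산광역시 강서구 송정동 | 인쇄사
47 | 48 | (주)통구 | 부산광역시 강서구 송정동 | 인쇄사
48 | 49 | 모루상사 | 부산광역시 강서구 송정동 | 인쇄사
49 | 50 | 천지종합 인쇄사 | 부산광역시 강서구 대저1동 | 인쇄사
50 | 51 | 동진인쇄 | 부산광역시 강서구 대저1동 | 인쇄사
51 | 52 | 경도상사 | 부산광역시 강서구 대저1동 | 인쇄사
52 | 53 | 명성사 | 부산광역시 강서구 대저1동 | 인쇄사
53 | 54 | 명성프린테크 | 부산광역시 강서구 대저1동 | 인쇄사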