Overview

Dataset statistics

Number of variables: 5
Number of observations: 196
Missing cells: 310
Missing cells (%): 31.6%
Duplicate rows: 1
Duplicate rows (%): 0.5%
Total size in memory: 7.8 KiB
Average record size in memory: 40.7 B
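
The missing-cell figures above can be reproduced from the per-variable missing counts listed in the variable sections of this report. The sketch below assumes missing cells are summed over variables and divided by variables × observations; the dictionary is an illustrative container filled with the counts reported here, not part of the original report.

```python
# Per-variable missing counts as listed in this report's variable sections.
missing_per_variable = {
    "업종": 0,
    "사업체명칭": 55,
    "도로명주소": 55,
    "전화번호": 145,
    "데이터기준일자": 55,
}
n_observations = 196

# Total missing cells and their share of all cells (variables x observations).
missing_cells = sum(missing_per_variable.values())
total_cells = len(missing_per_variable) * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, f"{missing_pct:.1f}%")
```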

Variable types

Categorical: 1
Text: 3
DateTime: 1

Dataset

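Description: Business information (business type, business name, address, telephone number, etc.) for printing and publishing companies in Saha-gu, Busan Metropolitan City (부산광역시 사하구), provided as an attachment.
Author: Saha-gu, Busan Metropolitan City (부산광역시 사하구)
URL: https://www.data.go.kr/data/3045772/fileData.do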

Alerts

데이터기준일자 has constant value "2023-11-20" [Constant]
Dataset has 1 (0.5%) duplicate rows [Duplicates]
사업체명칭 has 55 (28.1%) missing values [Missing]
도로명주소 has 55 (28.1%) missing values [Missing]
전화번호 has 145 (74.0%) missing values [Missing]
데이터기준일자 has 55 (28.1%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 04:18:06.344282
Analysis finished: 2023-12-12 04:18:06.989272
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

업종 (business type)
Categorical

Distinct: 3
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 KiB

Value | Count
출판사 (publisher) | 118
<NA> | 55
인쇄사 (printer) | 23

Length

Max length: 4
Median length: 3
Mean length: 3.2806122
Min length: 3
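
The maximum length of 4 and the fractional mean are consistent with the 55 `<NA>` entries being counted as the four-character string `<NA>` rather than as missing values. This is an inference from the counts in this section, not something the report states. A short check of the arithmetic:

```python
# Category counts from this section; "<NA>" is assumed to be a literal string here.
counts = {"출판사": 118, "<NA>": 55, "인쇄사": 23}

total = sum(counts.values())
# Weighted mean of string lengths: (3*118 + 4*55 + 3*23) / 196
mean_length = sum(len(value) * n for value, n in counts.items()) / total

print(total, round(mean_length, 7))
```

If the assumption holds, it would also explain why this variable reports 0 missing values while listing `<NA>` as a category.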

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 출판사
2nd row: 출판사
3rd row: 출판사
4th row: 출판사
5th row: 출판사

Common Values

Value | Count | Frequency (%)
출판사 | 118 | 60.2%
<NA> | 55 | 28.1%
인쇄사 | 23 | 11.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values of 업종: 출판사 118 (60.2%), <NA> 55 (28.1%), 인쇄사 23 (11.7%)]

사업체명칭 (business name)
Text

Alerts: MISSING

Distinct: 136
Distinct (%): 96.5%
Missing: 55
Missing (%): 28.1%
Memory size: 1.7 KiB

Length

Max length: 25
Median length: 18
Mean length: 6.7730496
Min length: 2

Characters and Unicode

Total characters: 955
Distinct characters: 290
Distinct categories: 8
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
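
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical and this is not necessarily how the profiling tool computes its tables. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Zs` is Space Separator, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two business names taken from the Sample rows of this variable.
sample = ["태극도 출판부", "(주)도시인쇄문화사"]
counts = category_counts(sample)
print(counts)
```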

Unique

Unique: 131
Unique (%): 92.9%

Sample

1st row: 태극도 출판부
2nd row: 동문출판기획
3rd row: 도서출판 동아기획
4th row: (주)도시인쇄문화사
5th row: 동아대학교출판사

Words

Value | Count | Frequency (%)
도서출판 | 10 | 4.8%
주식회사 | 6 | 2.9%
출판사 | 4 | 1.9%
대원애드콤 | 2 | 1.0%
인쇄사 | 2 | 1.0%
동아기획 | 2 | 1.0%
연화경 | 2 | 1.0%
예스패키지 | 2 | 1.0%
글꽃 | 2 | 1.0%
디자인 | 2 | 1.0%
Other values (171) | 173 | 83.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 66 | 6.9%
(not shown) | 27 | 2.8%
(not shown) | 26 | 2.7%
(not shown) | 25 | 2.6%
(not shown) | 22 | 2.3%
(not shown) | 20 | 2.1%
(not shown) | 17 | 1.8%
(not shown) | 15 | 1.6%
) | 14 | 1.5%
( | 14 | 1.5%
Other values (280) | 709 | 74.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 699 | 73.2%
Uppercase Letter | 81 | 8.5%
Lowercase Letter | 73 | 7.6%
Space Separator | 66 | 6.9%
Close Punctuation | 14 | 1.5%
Open Punctuation | 14 | 1.5%
Decimal Number | 5 | 0.5%
Other Punctuation | 3 | 0.3%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 27 | 3.9%
(not shown) | 26 | 3.7%
(not shown) | 25 | 3.6%
(not shown) | 22 | 3.1%
(not shown) | 20 | 2.9%
(not shown) | 17 | 2.4%
(not shown) | 15 | 2.1%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
Other values (231) | 505 | 72.2%

Uppercase Letter
Value | Count | Frequency (%)
S | 10 | 12.3%
D | 7 | 8.6%
O | 6 | 7.4%
N | 6 | 7.4%
A | 5 | 6.2%
E | 5 | 6.2%
T | 5 | 6.2%
I | 4 | 4.9%
C | 4 | 4.9%
U | 3 | 3.7%
Other values (12) | 26 | 32.1%

Lowercase Letter
Value | Count | Frequency (%)
a | 11 | 15.1%
e | 9 | 12.3%
n | 6 | 8.2%
i | 6 | 8.2%
s | 6 | 8.2%
r | 6 | 8.2%
l | 6 | 8.2%
o | 6 | 8.2%
g | 4 | 5.5%
h | 3 | 4.1%
Other values (8) | 10 | 13.7%

Decimal Number
Value | Count | Frequency (%)
1 | 2 | 40.0%
5 | 1 | 20.0%
4 | 1 | 20.0%
3 | 1 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
. | 2 | 66.7%
' | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 66 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 14 | 100.0%

Most occurring scripts

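Value | Count | Frequency (%)
Hangul | 697 | 73.0%
Latin | 154 | 16.1%
Common | 102 | 10.7%
Han | 2 | 0.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not shown) | 27 | 3.9%
(not shown) | 26 | 3.7%
(not shown) | 25 | 3.6%
(not shown) | 22 | 3.2%
(not shown) | 20 | 2.9%
(not shown) | 17 | 2.4%
(not shown) | 15 | 2.2%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
Other values (229) | 503 | 72.2%

Latin
Value | Count | Frequency (%)
a | 11 | 7.1%
S | 10 | 6.5%
e | 9 | 5.8%
D | 7 | 4.5%
n | 6 | 3.9%
i | 6 | 3.9%
s | 6 | 3.9%
r | 6 | 3.9%
l | 6 | 3.9%
O | 6 | 3.9%
Other values (30) | 81 | 52.6%

Common
Value | Count | Frequency (%)
(space) | 66 | 64.7%
) | 14 | 13.7%
( | 14 | 13.7%
1 | 2 | 2.0%
. | 2 | 2.0%
5 | 1 | 1.0%
4 | 1 | 1.0%
3 | 1 | 1.0%
' | 1 | 1.0%

Han
Value | Count | Frequency (%)
(not shown) | 1 | 50.0%
(not shown) | 1 | 50.0%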

Most occurring blocks

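Value | Count | Frequency (%)
Hangul | 697 | 73.0%
ASCII | 256 | 26.8%
CJK | 2 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 66 | 25.8%
) | 14 | 5.5%
( | 14 | 5.5%
a | 11 | 4.3%
S | 10 | 3.9%
e | 9 | 3.5%
D | 7 | 2.7%
n | 6 | 2.3%
i | 6 | 2.3%
s | 6 | 2.3%
Other values (39) | 107 | 41.8%

Hangul
Value | Count | Frequency (%)
(not shown) | 27 | 3.9%
(not shown) | 26 | 3.7%
(not shown) | 25 | 3.6%
(not shown) | 22 | 3.2%
(not shown) | 20 | 2.9%
(not shown) | 17 | 2.4%
(not shown) | 15 | 2.2%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
(not shown) | 14 | 2.0%
Other values (229) | 503 | 72.2%

CJK
Value | Count | Frequency (%)
(not shown) | 1 | 50.0%
(not shown) | 1 | 50.0%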

도로명주소 (road-name address)
Text

Alerts: MISSING

Distinct: 133
Distinct (%): 94.3%
Missing: 55
Missing (%): 28.1%
Memory size: 1.7 KiB

Length

Max length: 58
Median length: 48
Mean length: 36.51773
Min length: 21

Characters and Unicode

Total characters: 5149
Distinct characters: 167
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 127
Unique (%): 90.1%

Sample

1st row: 부산광역시 사하구 감천로142번길 25-4 (감천동)
2nd row: 부산광역시 사하구 낙동대로520번길 1 (하단동)
3rd row: 부산광역시 사하구 낙동대로 542, 213호 (하단동, 대우에덴프라자)
4th row: 부산광역시 사하구 다대로170번길 13 (신평동)
5th row: 부산광역시 사하구 낙동대로550번길 37 (하단동)

Words

Value | Count | Frequency (%)
부산광역시 | 141 | 14.8%
사하구 | 141 | 14.8%
다대동 | 29 | 3.0%
하단동 | 25 | 2.6%
괴정동 | 22 | 2.3%
당리동 | 21 | 2.2%
장림동 | 20 | 2.1%
신평동 | 16 | 1.7%
낙동대로 | 13 | 1.4%
2층 | 10 | 1.1%
Other values (322) | 513 | 53.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 810 | 15.7%
1 | 245 | 4.8%
(not shown) | 235 | 4.6%
(not shown) | 193 | 3.7%
0 | 170 | 3.3%
, | 163 | 3.2%
(not shown) | 155 | 3.0%
(not shown) | 148 | 2.9%
(not shown) | 147 | 2.9%
(not shown) | 144 | 2.8%
Other values (157) | 2739 | 53.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 2909 | 56.5%
Decimal Number | 965 | 18.7%
Space Separator | 810 | 15.7%
Other Punctuation | 163 | 3.2%
Open Punctuation | 141 | 2.7%
Close Punctuation | 141 | 2.7%
Dash Punctuation | 13 | 0.3%
Uppercase Letter | 5 | 0.1%
Lowercase Letter | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 235 | 8.1%
(not shown) | 193 | 6.6%
(not shown) | 155 | 5.3%
(not shown) | 148 | 5.1%
(not shown) | 147 | 5.1%
(not shown) | 144 | 5.0%
(not shown) | 143 | 4.9%
(not shown) | 142 | 4.9%
(not shown) | 141 | 4.8%
(not shown) | 127 | 4.4%
Other values (139) | 1334 | 45.9%

Decimal Number
Value | Count | Frequency (%)
1 | 245 | 25.4%
0 | 170 | 17.6%
2 | 132 | 13.7%
3 | 114 | 11.8%
4 | 78 | 8.1%
5 | 67 | 6.9%
7 | 52 | 5.4%
6 | 50 | 5.2%
8 | 29 | 3.0%
9 | 28 | 2.9%

Uppercase Letter
Value | Count | Frequency (%)
A | 3 | 60.0%
W | 2 | 40.0%

Space Separator
Value | Count | Frequency (%)
(space) | 810 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 163 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 141 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 141 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 13 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 2 | 100.0%

Most occurring scripts

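Value | Count | Frequency (%)
Hangul | 2909 | 56.5%
Common | 2233 | 43.4%
Latin | 7 | 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not shown) | 235 | 8.1%
(not shown) | 193 | 6.6%
(not shown) | 155 | 5.3%
(not shown) | 148 | 5.1%
(not shown) | 147 | 5.1%
(not shown) | 144 | 5.0%
(not shown) | 143 | 4.9%
(not shown) | 142 | 4.9%
(not shown) | 141 | 4.8%
(not shown) | 127 | 4.4%
Other values (139) | 1334 | 45.9%

Common
Value | Count | Frequency (%)
(space) | 810 | 36.3%
1 | 245 | 11.0%
0 | 170 | 7.6%
, | 163 | 7.3%
( | 141 | 6.3%
) | 141 | 6.3%
2 | 132 | 5.9%
3 | 114 | 5.1%
4 | 78 | 3.5%
5 | 67 | 3.0%
Other values (5) | 172 | 7.7%

Latin
Value | Count | Frequency (%)
A | 3 | 42.9%
W | 2 | 28.6%
e | 2 | 28.6%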

Most occurring blocks

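Value | Count | Frequency (%)
Hangul | 2909 | 56.5%
ASCII | 2240 | 43.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 810 | 36.2%
1 | 245 | 10.9%
0 | 170 | 7.6%
, | 163 | 7.3%
( | 141 | 6.3%
) | 141 | 6.3%
2 | 132 | 5.9%
3 | 114 | 5.1%
4 | 78 | 3.5%
5 | 67 | 3.0%
Other values (8) | 179 | 8.0%

Hangul
Value | Count | Frequency (%)
(not shown) | 235 | 8.1%
(not shown) | 193 | 6.6%
(not shown) | 155 | 5.3%
(not shown) | 148 | 5.1%
(not shown) | 147 | 5.1%
(not shown) | 144 | 5.0%
(not shown) | 143 | 4.9%
(not shown) | 142 | 4.9%
(not shown) | 141 | 4.8%
(not shown) | 127 | 4.4%
Other values (139) | 1334 | 45.9%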

전화번호 (telephone number)
Text

Alerts: MISSING

Distinct: 49
Distinct (%): 96.1%
Missing: 145
Missing (%): 74.0%
Memory size: 1.7 KiB

Length

Max length: 14
Median length: 12
Mean length: 12.137255
Min length: 12

Characters and Unicode

Total characters: 619
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 47
Unique (%): 92.2%

Sample

1st row: 051-292-0571
2nd row: 051-206-9785
3rd row: 051-292-1177
4th row: 051-200-6391-2
5th row: 051-207-2530

Words

Value | Count | Frequency (%)
051-292-1177 | 2 | 3.9%
051-206-9785 | 2 | 3.9%
051-291-0911 | 1 | 2.0%
051-715-1079 | 1 | 2.0%
051-631-3032 | 1 | 2.0%
051-204-3036 | 1 | 2.0%
051-261-4114 | 1 | 2.0%
070-4197-6693 | 1 | 2.0%
051-714-3935 | 1 | 2.0%
051-206-1891 | 1 | 2.0%
Other values (39) | 39 | 76.5%

Most occurring characters

Value | Count | Frequency (%)
0 | 106 | 17.1%
- | 104 | 16.8%
1 | 94 | 15.2%
5 | 75 | 12.1%
2 | 64 | 10.3%
9 | 36 | 5.8%
7 | 35 | 5.7%
3 | 30 | 4.8%
4 | 28 | 4.5%
6 | 24 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 515 | 83.2%
Dash Punctuation | 104 | 16.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 106 | 20.6%
1 | 94 | 18.3%
5 | 75 | 14.6%
2 | 64 | 12.4%
9 | 36 | 7.0%
7 | 35 | 6.8%
3 | 30 | 5.8%
4 | 28 | 5.4%
6 | 24 | 4.7%
8 | 23 | 4.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 104 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 619 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 106 | 17.1%
- | 104 | 16.8%
1 | 94 | 15.2%
5 | 75 | 12.1%
2 | 64 | 10.3%
9 | 36 | 5.8%
7 | 35 | 5.7%
3 | 30 | 4.8%
4 | 28 | 4.5%
6 | 24 | 3.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 619 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 106 | 17.1%
- | 104 | 16.8%
1 | 94 | 15.2%
5 | 75 | 12.1%
2 | 64 | 10.3%
9 | 36 | 5.8%
7 | 35 | 5.7%
3 | 30 | 4.8%
4 | 28 | 4.5%
6 | 24 | 3.9%

데이터기준일자 (data reference date)
Date

Alerts: CONSTANT, MISSING

Distinct: 1
Distinct (%): 0.7%
Missing: 55
Missing (%): 28.1%
Memory size: 1.7 KiB
Minimum: 2023-11-20 00:00:00
Maximum: 2023-11-20 00:00:00

[Figure: histogram with fixed size bins (bins=1)]

Correlations

[Figure: correlation heatmap]

 | 업종 | 전화번호
업종 | 1.000 | 0.000
전화번호 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
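
Nullity correlation can be illustrated as the Pearson correlation between 0/1 "is missing" indicators of two columns. The sketch below uses only the standard library and a small set of hypothetical rows; the `pearson` helper and the placeholder values are assumptions for illustration, not the report's own computation or data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows of (사업체명칭, 전화번호); None marks a missing cell.
rows = [
    ("name-1", "phone-1"),
    ("name-2", None),
    ("name-3", "phone-3"),
    (None, None),
    (None, None),
]

# Turn each column into a 0/1 nullity indicator, then correlate the indicators.
name_missing = [int(name is None) for name, _ in rows]
phone_missing = [int(phone is None) for _, phone in rows]

nullity_corr = pearson(name_missing, phone_missing)
print(round(nullity_corr, 3))
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.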

Sample

First rows

 | 업종 | 사업체명칭 | 도로명주소 | 전화번호 | 데이터기준일자
0 | 출판사 | 태극도 출판부 | 부산광역시 사하구 감천로142번길 25-4 (감천동) | 051-292-0571 | 2023-11-20
1 | 출판사 | 동문출판기획 | 부산광역시 사하구 낙동대로520번길 1 (하단동) | 051-206-9785 | 2023-11-20
2 | 출판사 | 도서출판 동아기획 | 부산광역시 사하구 낙동대로 542, 213호 (하단동, 대우에덴프라자) | <NA> | 2023-11-20
3 | 출판사 | (주)도시인쇄문화사 | 부산광역시 사하구 다대로170번길 13 (신평동) | 051-292-1177 | 2023-11-20
4 | 출판사 | 동아대학교출판사 | 부산광역시 사하구 낙동대로550번길 37 (하단동) | 051-200-6391-2 | 2023-11-20
5 | 출판사 | 힌트출판사 | 부산광역시 사하구 사리로 47 (괴정동) | 051-207-2530 | 2023-11-20
6 | 출판사 | (주)켑스 | 부산광역시 사하구 낙동대로550번길 37 (하단동) | 051-203-5490-1 | 2023-11-20
7 | 출판사 | 한국인간재활공학연구소 | 부산광역시 사하구 승학로3번길 87 (하단동) | 051-204-5085 | 2023-11-20
8 | 출판사 | 국민전화번호부 출판 | 부산광역시 사하구 회화나무길 67 (괴정동) | 051-204-7114 | 2023-11-20
9 | 출판사 | 선원문 | 부산광역시 사하구 사하로141번길 29 (괴정동) | 051-207-7953 | 2023-11-20

Last rows

 | 업종 | 사업체명칭 | 도로명주소 | 전화번호 | 데이터기준일자
186 | <NA> | <NA> | <NA> | <NA> | <NA>
187 | <NA> | <NA> | <NA> | <NA> | <NA>
188 | <NA> | <NA> | <NA> | <NA> | <NA>
189 | <NA> | <NA> | <NA> | <NA> | <NA>
190 | <NA> | <NA> | <NA> | <NA> | <NA>
191 | <NA> | <NA> | <NA> | <NA> | <NA>
192 | <NA> | <NA> | <NA> | <NA> | <NA>
193 | <NA> | <NA> | <NA> | <NA> | <NA>
194 | <NA> | <NA> | <NA> | <NA> | <NA>
195 | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

 | 업종 | 사업체명칭 | 도로명주소 | 전화번호 | 데이터기준일자 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 55
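
The only duplicated row pattern is the fully empty row, so a cleanup step would typically drop rows in which every cell is missing. The following is a minimal sketch of that idea using only the standard library. The miniature `rows` list is hypothetical (its first row is copied from the Sample section, and `None` stands in for `<NA>`); it is not the full dataset.

```python
from collections import Counter

# Hypothetical miniature of the dataset: each row is a tuple of five cells,
# with None standing in for <NA>.
rows = [
    ("출판사", "태극도 출판부", "부산광역시 사하구 감천로142번길 25-4 (감천동)", "051-292-0571", "2023-11-20"),
    (None, None, None, None, None),
    (None, None, None, None, None),
    (None, None, None, None, None),
]

# Count identical rows and keep only the patterns that occur more than once.
row_counts = Counter(rows)
duplicated = {row: n for row, n in row_counts.items() if n > 1}

# Drop rows in which every cell is missing.
non_empty_rows = [row for row in rows if any(cell is not None for cell in row)]

print(len(duplicated), len(non_empty_rows))
```

In this miniature there is one duplicated pattern (the empty row), which mirrors how the report counts 1 duplicate row pattern even though that pattern occurs 55 times.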