Overview

Dataset statistics

Number of variables: 6
Number of observations: 1481
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 69.6 KiB
Average record size in memory: 48.1 B
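
The average record size follows from the two figures above it: total memory divided by the number of observations. A minimal sketch of that arithmetic (the function name is illustrative, not part of the report):

```python
KIB = 1024  # bytes per kibibyte


def average_record_size(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, given the total memory footprint in KiB."""
    return total_kib * KIB / n_rows


size = average_record_size(69.6, 1481)
print(f"{size:.1f} B")  # 48.1 B
```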

Variable types

Text: 3
Categorical: 3

Dataset

Description: Venture companies located in the Daejeon area, as of July 2020
Author: 기술보증기금 (Korea Technology Finance Corporation)
URL: https://www.data.go.kr/data/15062068/fileData.do

Alerts

지역 has constant value "대전" (Constant)
주소 is highly imbalanced (50.9%) (Imbalance)
업체명 has unique values (Unique)
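
The Constant and Unique alerts can be understood as simple distinct-count rules: one distinct value means constant, and as many distinct values as rows means unique. The sketch below illustrates that idea with sample values taken from this report; it is not the profiler's own implementation, and the function name is hypothetical.

```python
def column_alerts(values):
    """Return alert labels for one column, using simple distinct-count rules."""
    n = len(values)
    distinct = len(set(values))
    alerts = []
    if n > 0 and distinct == 1:
        alerts.append("Constant")  # every row holds the same value
    if n > 1 and distinct == n:
        alerts.append("Unique")    # no value repeats
    return alerts


region = ["대전"] * 5
names = ["(주)비앤비컴퍼니", "(주)에스알팜", "(주)알앤디웨어", "(주)아울네스트", "(주)아이리스"]
print(column_alerts(region))  # ['Constant']
print(column_alerts(names))   # ['Unique']
```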

Reproduction

Analysis started: 2023-12-12 10:03:46.308378
Analysis finished: 2023-12-12 10:03:47.245342
Duration: 0.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
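
A report like this one can be regenerated with ydata-profiling's `ProfileReport`. The sketch below is a minimal example under stated assumptions: the CSV path, the output path, the report title and the `cp949` encoding are all hypothetical and need to be matched to the file actually downloaded from the URL above.

```python
def build_report(csv_path: str, out_path: str = "report.html", encoding: str = "cp949"):
    """Profile a CSV file and write an HTML report.

    csv_path, out_path and the cp949 encoding are assumptions; adjust them
    to the actual file.
    """
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Daejeon venture companies (2020-07)")
    report.to_file(out_path)
    return report
```

Calling `build_report("ventures.csv")` (a hypothetical file name) would write `report.html` to the working directory.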

Variables

업체명 (company name)
Text

UNIQUE 

Distinct: 1481
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Length

Max length: 36
Median length: 31
Mean length: 8.2572586
Min length: 2
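
The mean length is the total character count divided by the number of rows (12229 / 1481). The same totals also allow a sanity check on the median: at least half of the values must be at least as long as the median, which puts a lower bound on the total character count. The sketch below applies that bound to the figures reported for 업체명. Under it, a median of 31 does not appear compatible with a total of 12229 characters, so the median is worth recomputing from the raw column. The helper name is illustrative.

```python
def min_total_given_median(n: int, min_len: int, median: int) -> int:
    """Smallest possible sum of n lengths with the given minimum and median.

    At least n - n // 2 values are >= the median; the rest are >= min_len.
    """
    upper = n - n // 2
    return upper * median + (n - upper) * min_len


n, total, min_len, median = 1481, 12229, 2, 31
print(f"{total / n:.4f}")                                   # 8.2573
print(min_total_given_median(n, min_len, median))           # 24451
print(min_total_given_median(n, min_len, median) <= total)  # False
```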

Characters and Unicode

Total characters: 12229
Distinct characters: 548
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
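
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for the first sample value of this column. Scripts and blocks are not available in the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lo', 'Ps')."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("(주)비앤비컴퍼니")
# Lo = Other Letter, Ps = Open Punctuation, Pe = Close Punctuation
print(dict(counts))
```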

Unique

Unique: 1481
Unique (%): 100.0%

Sample

1st row: (주)비앤비컴퍼니
2nd row: (주)에스알팜
3rd row: (주)알앤디웨어
4th row: (주)아울네스트
5th row: (주)아이리스

Value | Count | Frequency (%)
co.,ltd | 14 | 0.9%
inc | 12 | 0.7%
co | 9 | 0.6%
ltd | 8 | 0.5%
농업회사법인 | 5 | 0.3%
 | 4 | 0.2%
corp | 3 | 0.2%
엔지니어링 | 2 | 0.1%
inc.)(주 | 2 | 0.1%
주)주식회사 | 2 | 0.1%
Other values (1540) | 1541 | 96.2%

Most occurring characters

Value | Count | Frequency (%)
) | 1443 | 11.8%
( | 1442 | 11.8%
 | 1383 | 11.3%
 | 562 | 4.6%
 | 470 | 3.8%
 | 228 | 1.9%
 | 187 | 1.5%
 | 142 | 1.2%
(space) | 133 | 1.1%
 | 125 | 1.0%
Other values (538) | 6114 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8425 | 68.9%
Close Punctuation | 1443 | 11.8%
Open Punctuation | 1442 | 11.8%
Uppercase Letter | 362 | 3.0%
Lowercase Letter | 326 | 2.7%
Space Separator | 133 | 1.1%
Other Punctuation | 92 | 0.8%
Dash Punctuation | 5 | < 0.1%
Decimal Number | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1383 | 16.4%
 | 562 | 6.7%
 | 470 | 5.6%
 | 228 | 2.7%
 | 187 | 2.2%
 | 142 | 1.7%
 | 125 | 1.5%
 | 123 | 1.5%
 | 123 | 1.5%
 | 121 | 1.4%
Other values (484) | 4961 | 58.9%

Uppercase Letter
Value | Count | Frequency (%)
I | 41 | 11.3%
C | 39 | 10.8%
L | 34 | 9.4%
E | 33 | 9.1%
A | 24 | 6.6%
S | 22 | 6.1%
O | 21 | 5.8%
N | 20 | 5.5%
T | 19 | 5.2%
M | 17 | 4.7%
Other values (14) | 92 | 25.4%

Lowercase Letter
Value | Count | Frequency (%)
o | 40 | 12.3%
e | 37 | 11.3%
t | 35 | 10.7%
n | 31 | 9.5%
c | 29 | 8.9%
d | 28 | 8.6%
i | 23 | 7.1%
l | 15 | 4.6%
a | 15 | 4.6%
r | 13 | 4.0%
Other values (12) | 60 | 18.4%

Other Punctuation
Value | Count | Frequency (%)
. | 63 | 68.5%
, | 22 | 23.9%
& | 7 | 7.6%

Close Punctuation
Value | Count | Frequency (%)
) | 1443 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1442 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 133 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Decimal Number
Value | Count | Frequency (%)
3 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8425 | 68.9%
Common | 3116 | 25.5%
Latin | 688 | 5.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1383 | 16.4%
 | 562 | 6.7%
 | 470 | 5.6%
 | 228 | 2.7%
 | 187 | 2.2%
 | 142 | 1.7%
 | 125 | 1.5%
 | 123 | 1.5%
 | 123 | 1.5%
 | 121 | 1.4%
Other values (484) | 4961 | 58.9%

Latin
Value | Count | Frequency (%)
I | 41 | 6.0%
o | 40 | 5.8%
C | 39 | 5.7%
e | 37 | 5.4%
t | 35 | 5.1%
L | 34 | 4.9%
E | 33 | 4.8%
n | 31 | 4.5%
c | 29 | 4.2%
d | 28 | 4.1%
Other values (36) | 341 | 49.6%

Common
Value | Count | Frequency (%)
) | 1443 | 46.3%
( | 1442 | 46.3%
(space) | 133 | 4.3%
. | 63 | 2.0%
, | 22 | 0.7%
& | 7 | 0.2%
- | 5 | 0.2%
3 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8425 | 68.9%
ASCII | 3804 | 31.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
) | 1443 | 37.9%
( | 1442 | 37.9%
(space) | 133 | 3.5%
. | 63 | 1.7%
I | 41 | 1.1%
o | 40 | 1.1%
C | 39 | 1.0%
e | 37 | 1.0%
t | 35 | 0.9%
L | 34 | 0.9%
Other values (44) | 497 | 13.1%

Hangul
Value | Count | Frequency (%)
 | 1383 | 16.4%
 | 562 | 6.7%
 | 470 | 5.6%
 | 228 | 2.7%
 | 187 | 2.2%
 | 142 | 1.7%
 | 125 | 1.5%
 | 123 | 1.5%
 | 123 | 1.5%
 | 121 | 1.4%
Other values (484) | 4961 | 58.9%

지역 (region)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Value | Count
대전 | 1481

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대전
2nd row: 대전
3rd row: 대전
4th row: 대전
5th row: 대전

Common Values

Value | Count | Frequency (%)
대전 | 1481 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
대전 | 1481 | 100.0%

주소 (address)
Categorical

IMBALANCE 

Distinct: 9
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Value | Count
대전광역시 유성구 | 1016
대전광역시 대덕구 | 180
대전광역시 서구 | 141
대전광역시 동구 | 68
대전광역시 중구 | 46
Other values (4) | 30

Length

Max length: 9
Median length: 9
Mean length: 8.7643484
Min length: 5

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row: 대전광역시 유성구
2nd row: 대전광역시 서구
3rd row: 대전광역시 유성구
4th row: 대전광역시 유성구
5th row: 대전광역시 유성구

Common Values

Value | Count | Frequency (%)
대전광역시 유성구 | 1016 | 68.6%
대전광역시 대덕구 | 180 | 12.2%
대전광역시 서구 | 141 | 9.5%
대전광역시 동구 | 68 | 4.6%
대전광역시 중구 | 46 | 3.1%
대전 유성구 | 25 | 1.7%
대전 서구 | 3 | 0.2%
대전 동구 | 1 | 0.1%
대전 대덕구 | 1 | 0.1%
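
The 50.9% imbalance figure reported for this variable can be reproduced from the counts above as one minus the normalized Shannon entropy of the class distribution. The sketch below uses that formula. That the profiler computes the score exactly this way is an assumption, although the result matches the reported value.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for uniform classes, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen here to avoid dividing by log2(1) = 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


address_counts = [1016, 180, 141, 68, 46, 25, 3, 1, 1]
print(f"{imbalance_score(address_counts):.1%}")  # 50.9%
```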

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
대전광역시 | 1451 | 49.0%
유성구 | 1041 | 35.1%
대덕구 | 181 | 6.1%
서구 | 144 | 4.9%
동구 | 69 | 2.3%
중구 | 46 | 1.6%
대전 | 30 | 1.0%
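
The word counts above show that 30 rows spell the city as 대전 rather than 대전광역시, which splits each district into two categories. A minimal sketch of one way to merge them, using the counts from the Common Values table (the function name is illustrative):

```python
from collections import Counter


def normalize_address(addr: str) -> str:
    """Rewrite the short city prefix '대전' as '대전광역시' so both spellings match."""
    city, _, district = addr.strip().partition(" ")
    if city == "대전":
        city = "대전광역시"
    return f"{city} {district}"


raw_counts = {
    "대전광역시 유성구": 1016, "대전광역시 대덕구": 180, "대전광역시 서구": 141,
    "대전광역시 동구": 68, "대전광역시 중구": 46, "대전 유성구": 25,
    "대전 서구": 3, "대전 동구": 1, "대전 대덕구": 1,
}
merged = Counter()
for addr, count in raw_counts.items():
    merged[normalize_address(addr)] += count

print(len(merged))                # 5
print(merged["대전광역시 유성구"])  # 1041
```

After merging, the nine raw categories reduce to the five districts, with totals that agree with the per-word counts in the table above.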

업종분류 (industry category)
Categorical

Distinct: 8
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Value | Count
제조업 | 950
정보처리, S/W | 201
기타 | 107
정보처리, S/W | 102
연구개발서비스 | 70
Other values (3) | 51

Length

Max length: 18
Median length: 17
Mean length: 15.69345
Min length: 9

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 제조업
2nd row: 도소매업
3rd row: 제조업
4th row: 정보처리, S/W
5th row: 제조업

Common Values

Value | Count | Frequency (%)
제조업 | 950 | 64.1%
정보처리, S/W | 201 | 13.6%
기타 | 107 | 7.2%
정보처리, S/W | 102 | 6.9%
연구개발서비스 | 70 | 4.7%
건설, 운수 | 34 | 2.3%
도소매업 | 16 | 1.1%
농,어,임,광업 | 1 | 0.1%
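
The label 정보처리, S/W appears twice in the table above (201 and 102 rows), which suggests two raw spellings that render identically. The report does not reveal how they differ. One plausible cause is stray or repeated whitespace, and the sketch below merges labels on that assumption. The raw variants in `raw_counts` are hypothetical.

```python
import re
from collections import Counter


def normalize_label(label: str) -> str:
    """Collapse runs of whitespace (including full-width spaces) and trim the ends."""
    return re.sub(r"\s+", " ", label).strip()


# Hypothetical raw variants: the actual difference between the two labels is unknown.
raw_counts = {"정보처리, S/W": 201, "정보처리,  S/W ": 102, "제조업": 950}
merged = Counter()
for label, count in raw_counts.items():
    merged[normalize_label(label)] += count

print(merged["정보처리, S/W"])  # 303
```

If the real difference is something other than whitespace (for example a visually similar character), the normalization rule would need to change accordingly.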

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
제조업 | 950 | 52.3%
정보처리 | 303 | 16.7%
s/w | 303 | 16.7%
기타 | 107 | 5.9%
연구개발서비스 | 70 | 3.9%
건설 | 34 | 1.9%
운수 | 34 | 1.9%
도소매업 | 16 | 0.9%
농,어,임,광업 | 1 | 0.1%

업종명 (industry name)
Text

Distinct: 284
Distinct (%): 19.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Length

Max length: 29
Median length: 23
Mean length: 15.95341
Min length: 6

Characters and Unicode

Total characters: 23627
Distinct characters: 301
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 8.4%

Sample

1st row: 비 및 솔 제조업
2nd row: 의약품 및 의료용품 소매업
3rd row: 전자기 측정, 시험 및 분석기구 제조업
4th row: 응용 소프트웨어 개발 및 공급업
5th row: 방사선 장치 제조업

Value | Count | Frequency (%)
제조업 | 920 | 13.4%
 | 730 | 10.7%
기타 | 489 | 7.1%
 | 219 | 3.2%
 | 218 | 3.2%
소프트웨어 | 202 | 2.9%
개발 | 201 | 2.9%
공급업 | 201 | 2.9%
서비스업 | 159 | 2.3%
응용 | 126 | 1.8%
Other values (524) | 3383 | 49.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 5367 | 22.7%
 | 1544 | 6.5%
 | 1200 | 5.1%
 | 1157 | 4.9%
 | 1005 | 4.3%
 | 730 | 3.1%
 | 492 | 2.1%
 | 414 | 1.8%
 | 348 | 1.5%
 | 316 | 1.3%
Other values (291) | 11054 | 46.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 18031 | 76.3%
Space Separator | 5367 | 22.7%
Other Punctuation | 221 | 0.9%
Close Punctuation | 3 | < 0.1%
Open Punctuation | 3 | < 0.1%
Decimal Number | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1544 | 8.6%
 | 1200 | 6.7%
 | 1157 | 6.4%
 | 1005 | 5.6%
 | 730 | 4.0%
 | 492 | 2.7%
 | 414 | 2.3%
 | 348 | 1.9%
 | 316 | 1.8%
 | 314 | 1.7%
Other values (285) | 10511 | 58.3%

Other Punctuation
Value | Count | Frequency (%)
, | 219 | 99.1%
. | 2 | 0.9%

Space Separator
Value | Count | Frequency (%)
(space) | 5367 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 18031 | 76.3%
Common | 5596 | 23.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1544 | 8.6%
 | 1200 | 6.7%
 | 1157 | 6.4%
 | 1005 | 5.6%
 | 730 | 4.0%
 | 492 | 2.7%
 | 414 | 2.3%
 | 348 | 1.9%
 | 316 | 1.8%
 | 314 | 1.7%
Other values (285) | 10511 | 58.3%

Common
Value | Count | Frequency (%)
(space) | 5367 | 95.9%
, | 219 | 3.9%
) | 3 | 0.1%
( | 3 | 0.1%
. | 2 | < 0.1%
1 | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 18022 | 76.3%
ASCII | 5596 | 23.7%
Compat Jamo | 9 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 5367 | 95.9%
, | 219 | 3.9%
) | 3 | 0.1%
( | 3 | 0.1%
. | 2 | < 0.1%
1 | 2 | < 0.1%

Hangul
Value | Count | Frequency (%)
 | 1544 | 8.6%
 | 1200 | 6.7%
 | 1157 | 6.4%
 | 1005 | 5.6%
 | 730 | 4.1%
 | 492 | 2.7%
 | 414 | 2.3%
 | 348 | 1.9%
 | 316 | 1.8%
 | 314 | 1.7%
Other values (284) | 10502 | 58.3%

Compat Jamo
Value | Count | Frequency (%)
 | 9 | 100.0%

주생산품 (main product)
Text

Distinct: 1436
Distinct (%): 97.0%
Missing: 0
Missing (%): 0.0%
Memory size: 11.7 KiB

Length

Max length: 69
Median length: 52
Mean length: 17.554355
Min length: 3

Characters and Unicode

Total characters: 25998
Distinct characters: 661
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1417
Unique (%): 95.7%

Sample

1st row: 마스카라,치간, 산업용, 특수목적용기계
2nd row: 의약품
3rd row: 물성분석 시스템
4th row: 소프트웨어
5th row: 방사선 측정장치제조

Value | Count | Frequency (%)
 | 156 | 3.2%
시스템 | 73 | 1.5%
소프트웨어 | 62 | 1.3%
개발 | 52 | 1.1%
 | 49 | 1.0%
솔루션 | 42 | 0.9%
서비스 | 39 | 0.8%
부품 | 36 | 0.7%
s/w | 34 | 0.7%
화장품 | 33 | 0.7%
Other values (2909) | 4274 | 88.1%

Most occurring characters

Value | Count | Frequency (%)
(space) | 5705 | 21.9%
 | 607 | 2.3%
 | 505 | 1.9%
, | 399 | 1.5%
 | 372 | 1.4%
 | 366 | 1.4%
 | 343 | 1.3%
 | 306 | 1.2%
 | 274 | 1.1%
 | 270 | 1.0%
Other values (651) | 16851 | 64.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 17086 | 65.7%
Space Separator | 5705 | 21.9%
Uppercase Letter | 1344 | 5.2%
Lowercase Letter | 967 | 3.7%
Other Punctuation | 531 | 2.0%
Open Punctuation | 129 | 0.5%
Close Punctuation | 129 | 0.5%
Decimal Number | 68 | 0.3%
Dash Punctuation | 37 | 0.1%
Math Symbol | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 607 | 3.6%
 | 505 | 3.0%
 | 372 | 2.2%
 | 366 | 2.1%
 | 343 | 2.0%
 | 306 | 1.8%
 | 274 | 1.6%
 | 270 | 1.6%
 | 239 | 1.4%
 | 233 | 1.4%
Other values (577) | 13571 | 79.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 195 | 14.5%
C | 124 | 9.2%
D | 124 | 9.2%
P | 86 | 6.4%
R | 77 | 5.7%
I | 74 | 5.5%
E | 73 | 5.4%
W | 73 | 5.4%
T | 69 | 5.1%
L | 67 | 5.0%
Other values (16) | 382 | 28.4%

Lowercase Letter
Value | Count | Frequency (%)
e | 128 | 13.2%
o | 89 | 9.2%
a | 77 | 8.0%
t | 76 | 7.9%
n | 72 | 7.4%
i | 71 | 7.3%
r | 67 | 6.9%
l | 65 | 6.7%
s | 55 | 5.7%
c | 36 | 3.7%
Other values (15) | 231 | 23.9%

Decimal Number
Value | Count | Frequency (%)
3 | 23 | 33.8%
0 | 15 | 22.1%
2 | 14 | 20.6%
7 | 3 | 4.4%
8 | 3 | 4.4%
1 | 3 | 4.4%
6 | 2 | 2.9%
5 | 2 | 2.9%
4 | 2 | 2.9%
9 | 1 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
, | 399 | 75.1%
/ | 114 | 21.5%
. | 9 | 1.7%
& | 4 | 0.8%
· | 3 | 0.6%
' | 1 | 0.2%
? | 1 | 0.2%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 50.0%
~ | 1 | 50.0%

Space Separator
Value | Count | Frequency (%)
(space) | 5705 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 129 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 129 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 37 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 17072 | 65.7%
Common | 6601 | 25.4%
Latin | 2311 | 8.9%
Han | 14 | 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 607 | 3.6%
 | 505 | 3.0%
 | 372 | 2.2%
 | 366 | 2.1%
 | 343 | 2.0%
 | 306 | 1.8%
 | 274 | 1.6%
 | 270 | 1.6%
 | 239 | 1.4%
 | 233 | 1.4%
Other values (576) | 13557 | 79.4%

Latin
Value | Count | Frequency (%)
S | 195 | 8.4%
e | 128 | 5.5%
C | 124 | 5.4%
D | 124 | 5.4%
o | 89 | 3.9%
P | 86 | 3.7%
R | 77 | 3.3%
a | 77 | 3.3%
t | 76 | 3.3%
I | 74 | 3.2%
Other values (41) | 1261 | 54.6%

Common
Value | Count | Frequency (%)
(space) | 5705 | 86.4%
, | 399 | 6.0%
( | 129 | 2.0%
) | 129 | 2.0%
/ | 114 | 1.7%
- | 37 | 0.6%
3 | 23 | 0.3%
0 | 15 | 0.2%
2 | 14 | 0.2%
. | 9 | 0.1%
Other values (13) | 27 | 0.4%

Han
Value | Count | Frequency (%)
 | 14 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 17071 | 65.7%
ASCII | 8909 | 34.3%
CJK | 14 | 0.1%
None | 3 | < 0.1%
Compat Jamo | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 5705 | 64.0%
, | 399 | 4.5%
S | 195 | 2.2%
( | 129 | 1.4%
) | 129 | 1.4%
e | 128 | 1.4%
C | 124 | 1.4%
D | 124 | 1.4%
/ | 114 | 1.3%
o | 89 | 1.0%
Other values (63) | 1773 | 19.9%

Hangul
Value | Count | Frequency (%)
 | 607 | 3.6%
 | 505 | 3.0%
 | 372 | 2.2%
 | 366 | 2.1%
 | 343 | 2.0%
 | 306 | 1.8%
 | 274 | 1.6%
 | 270 | 1.6%
 | 239 | 1.4%
 | 233 | 1.4%
Other values (575) | 13556 | 79.4%

CJK
Value | Count | Frequency (%)
 | 14 | 100.0%

None
Value | Count | Frequency (%)
· | 3 | 100.0%

Compat Jamo
Value | Count | Frequency (%)
 | 1 | 100.0%

Correlations

 | 주소 | 업종분류
주소 | 1.000 | 0.283
업종분류 | 0.283 | 1.000

 | 업종분류 | 주소
업종분류 | 1.000 | 0.143
주소 | 0.143 | 1.000

 | 주소 | 업종분류
주소 | 1.000 | 0.143
업종분류 | 0.143 | 1.000
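
For two categorical variables such as 주소 and 업종분류, association is commonly measured with Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. Whether any of the matrices above is this exact statistic is an assumption, since the extracted text does not name the coefficients. The sketch below shows the uncorrected formula on toy data, not on the dataset itself.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of category labels (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(chi2 / (n * k))


# Toy labels (hypothetical): perfectly associated, then independent.
print(cramers_v(["a", "b", "a", "b"], ["x", "y", "x", "y"]))  # 1.0
print(cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"]))  # 0.0
```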

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

 | 업체명 | 지역 | 주소 | 업종분류 | 업종명 | 주생산품
0 | (주)비앤비컴퍼니 | 대전 | 대전광역시 유성구 | 제조업 | 비 및 솔 제조업 | 마스카라,치간, 산업용, 특수목적용기계
1 | (주)에스알팜 | 대전 | 대전광역시 서구 | 도소매업 | 의약품 및 의료용품 소매업 | 의약품
2 | (주)알앤디웨어 | 대전 | 대전광역시 유성구 | 제조업 | 전자기 측정, 시험 및 분석기구 제조업 | 물성분석 시스템
3 | (주)아울네스트 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 응용 소프트웨어 개발 및 공급업 | 소프트웨어
4 | (주)아이리스 | 대전 | 대전광역시 유성구 | 제조업 | 방사선 장치 제조업 | 방사선 측정장치제조
5 | (주)리걸텍 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 응용 소프트웨어 개발 및 공급업 | 법률데이터 가공서비스
6 | (주)대경이앤씨 | 대전 | 대전광역시 유성구 | 건설, 운수 | 일반전기 공사업 | 전기, 정보통신공사
7 | (주)선광패브릭 | 대전 | 대전광역시 동구 | 제조업 | 침구 및 관련제품 제조업 | 침구류 원단류
8 | (주)칠성건설 | 대전 | 대전광역시 유성구 | 건설, 운수 | 교량, 터널 및 철도 건설업 | 교량, 터널
9 | (주)쓰리디아이컨즈 | 대전 | 대전광역시 서구 | 제조업 | 인형 및 장난감 제조업 | 3D 피규어

Last rows

 | 업체명 | 지역 | 주소 | 업종분류 | 업종명 | 주생산품
1471 | 연우금속 | 대전 | 대전광역시 동구 | 제조업 | 절삭가공 및 유사처리업 | 금속가공
1472 | 휴먼켐(주) | 대전 | 대전광역시 유성구 | 제조업 | 기타 기초 무기 화학물질 제조업 | 화장품 원료
1473 | (주)젠틸리언 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 시스템 소프트웨어 개발 및 공급업 | 네트워크 보안 솔루션 외
1474 | (주)젠토 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 시스템 소프트웨어 개발 및 공급업 | 통신용 소프트웨어 개발 CPS미들웨어 임베디드 시스템
1475 | (주)첸트랄 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 기타 게임 소프트웨어 개발 및 공급업 | VRight(VR헤드셋)
1476 | (주)해밀라이트 | 대전 | 대전광역시 중구 | 제조업 | 전시 및 광고용 조명장치 제조업 | LED 바 LED 조명기구 LED 모기등, LED 광고판
1477 | (주)지론텍(ZIRON TECH CORP) | 대전 | 대전광역시 유성구 | 제조업 | 기타 비철금속 제련, 정련 및 합금 제조업 | 희귀금속(Zr, Hf) 및 모합금
1478 | (주)지토 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 시스템 소프트웨어 개발 및 공급업 | 영상관련 S/W
1479 | (주)쩍컴퍼니 | 대전 | 대전광역시 유성구 | 정보처리, S/W | 응용 소프트웨어 개발 및 공급업 | 아이엠그라운드
1480 | (주)금영이엔지 | 대전 | 대전광역시 유성구 | 건설, 운수 | 건물용 기계장비 설치 공사업 | 크린룸 공사 GMP 설치 플랜트, 유틸리티 설치