Overview

Dataset statistics

Number of variables: 5
Number of observations: 1075
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 43.2 KiB
Average record size in memory: 41.1 B

Variable types

Numeric: 1
Categorical: 3
Text: 1

Dataset

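Description: Regional industry-classification and status data for companies that obtained venture certification through a venture certification body (벤처확인기관) and meet the requirements of the research and development type (연구개발유형). The regional industry classification is broken down in detail.
Author: Korea SMEs and Startups Agency (중소벤처기업진흥공단)
URL: https://www.data.go.kr/data/15107082/fileData.do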

Alerts

벤처확인유형 has constant value "연구개발유형" (Constant)
구분 is highly overall correlated with 지역 (High correlation)
지역 is highly overall correlated with 구분 (High correlation)
업종분류 is highly imbalanced (54.4%) (Imbalance)
구분 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 02:10:19.948809
Analysis finished: 2023-12-12 02:10:20.517040
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
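
The reported duration can be cross-checked against the two timestamps. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 02:10:19.948809", FMT)
finished = datetime.strptime("2023-12-12 02:10:20.517040", FMT)

# Elapsed wall-clock time in seconds, formatted to two decimals like the report.
duration_seconds = (finished - started).total_seconds()
duration_label = f"{duration_seconds:.2f} seconds"
print(duration_label)
```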

Variables

구분
Real number (ℝ)

HIGH CORRELATION  UNIQUE

Distinct: 1075
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 538
Minimum: 1
Maximum: 1075
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 54.7
Q1: 269.5
Median: 538
Q3: 806.5
95-th percentile: 1021.3
Maximum: 1075
Range: 1074
Interquartile range (IQR): 537
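
These quantiles are what linear interpolation yields for a column holding the integers 1 to 1075. The sketch below assumes that is what 구분 contains (the report only states the minimum, the maximum, and that all 1075 values are distinct and strictly increasing) and uses `statistics.quantiles` with `method="inclusive"`, which reproduces the reported figures here.

```python
import statistics

# Assumption: 구분 is a running row number 1..1075.
values = list(range(1, 1076))

# Cut points for quartiles (n=4) and for 5% steps (n=20).
quartiles = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")

q1, median, q3 = quartiles
p5, p95 = ventiles[0], ventiles[-1]
iqr = q3 - q1
value_range = max(values) - min(values)

print(p5, q1, median, q3, p95, iqr, value_range)
```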

Descriptive statistics

Standard deviation: 310.47007
Coefficient of variation (CV): 0.57708192
Kurtosis: -1.2
Mean: 538
Median Absolute Deviation (MAD): 269
Skewness: 0
Sum: 578350
Variance: 96391.667
Monotonicity: Strictly increasing
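
Most of these figures can be recomputed by hand. The sketch below again assumes that 구분 holds the integers 1 to 1075 and uses the sample variance (n - 1 denominator), which is the convention that matches the reported variance; all names are illustrative.

```python
import statistics

# Assumption: 구분 is a running row number 1..1075.
values = [float(v) for v in range(1, 1076)]

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std_dev = statistics.stdev(values)
cv = std_dev / mean                     # coefficient of variation

# Median absolute deviation: median of |x - median(x)|.
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

# Strictly increasing: every value is larger than its predecessor.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, round(variance, 3), round(std_dev, 5), round(cv, 8), mad)
```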
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 1 | 0.1%
724 | 1 | 0.1%
710 | 1 | 0.1%
711 | 1 | 0.1%
712 | 1 | 0.1%
713 | 1 | 0.1%
714 | 1 | 0.1%
715 | 1 | 0.1%
716 | 1 | 0.1%
717 | 1 | 0.1%
Other values (1065) | 1065 | 99.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1075 | 1 | 0.1%
1074 | 1 | 0.1%
1073 | 1 | 0.1%
1072 | 1 | 0.1%
1071 | 1 | 0.1%
1070 | 1 | 0.1%
1069 | 1 | 0.1%
1068 | 1 | 0.1%
1067 | 1 | 0.1%
1066 | 1 | 0.1%

벤처확인유형
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
연구개발유형: 1075

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 연구개발유형
2nd row: 연구개발유형
3rd row: 연구개발유형
4th row: 연구개발유형
5th row: 연구개발유형

Common Values

Value | Count | Frequency (%)
연구개발유형 | 1075 | 100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

지역
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
경기: 214
서울: 144
인천: 75
대전: 73
부산: 64
Other values (12): 505

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원
2nd row: 강원
3rd row: 강원
4th row: 강원
5th row: 강원

Common Values

Value | Count | Frequency (%)
경기 | 214 | 19.9%
서울 | 144 | 13.4%
인천 | 75 | 7.0%
대전 | 73 | 6.8%
부산 | 64 | 6.0%
경남 | 61 | 5.7%
경북 | 58 | 5.4%
충남 | 54 | 5.0%
대구 | 53 | 4.9%
전북 | 51 | 4.7%
Other values (7) | 228 | 21.2%
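
The frequency column is each count divided by the 1075 observations, and the "Other values" row is whatever remains after the ten listed regions. A minimal sketch of that arithmetic, with counts copied from the Common Values table for 지역 (the helper name `percent` is illustrative):

```python
# Counts copied from the Common Values table for 지역.
TOTAL_ROWS = 1075
top_counts = {
    "경기": 214, "서울": 144, "인천": 75, "대전": 73, "부산": 64,
    "경남": 61, "경북": 58, "충남": 54, "대구": 53, "전북": 51,
}

def percent(count, total=TOTAL_ROWS):
    """Share of all rows, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)

# Rows not covered by the ten most common regions.
other_count = TOTAL_ROWS - sum(top_counts.values())

frequencies = {region: percent(n) for region, n in top_counts.items()}
other_percent = percent(other_count)

print(frequencies, other_count, other_percent)
```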

Length

[Figure: histogram of lengths of the category]

업종분류
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
제조업: 772
기타: 99
정보처리SW: 88
연구개발서비스: 52
건설운수: 33
Other values (5): 31

Length

Max length: 7
Median length: 3
Mean length: 3.4139535
Min length: 2

Unique

Unique: 2
Unique (%): 0.2%
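
"Distinct" counts the different values that occur, while "Unique" counts the values that occur exactly once. A minimal sketch of the difference, using the counts listed in the Common Values table for this variable (variable names are illustrative):

```python
from collections import Counter

# Counts taken from the Common Values table for 업종분류.
category_counts = Counter({
    "제조업": 772, "기타": 99, "정보처리SW": 88, "연구개발서비스": 52,
    "건설운수": 33, "도소매업": 19, "농어임광업": 7, "서비스업": 3,
    "운수및창고업": 1, "도매업": 1,
})

n_rows = sum(category_counts.values())

# Distinct: how many different categories appear at all.
distinct = len(category_counts)
# Unique: how many categories appear in exactly one row.
unique = sum(1 for count in category_counts.values() if count == 1)

distinct_pct = round(100 * distinct / n_rows, 1)
unique_pct = round(100 * unique / n_rows, 1)

print(n_rows, distinct, distinct_pct, unique, unique_pct)
```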

Sample

1st row: 제조업
2nd row: 기타
3rd row: 제조업
4th row: 기타
5th row: 제조업

Common Values

Value | Count | Frequency (%)
제조업 | 772 | 71.8%
기타 | 99 | 9.2%
정보처리SW | 88 | 8.2%
연구개발서비스 | 52 | 4.8%
건설운수 | 33 | 3.1%
도소매업 | 19 | 1.8%
농어임광업 | 7 | 0.7%
서비스업 | 3 | 0.3%
운수및창고업 | 1 | 0.1%
도매업 | 1 | 0.1%
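
The 54.4% imbalance figure reported for 업종분류 is consistent with a normalized-entropy score: one minus the Shannon entropy of the class distribution divided by the maximum entropy for that many classes. The report itself does not state the formula, so the sketch below is an assumption that happens to reproduce the number; `imbalance_score` is an illustrative name, not a library function.

```python
import math

# Class counts from the Common Values table for 업종분류.
counts = [772, 99, 88, 52, 33, 19, 7, 3, 1, 1]

def imbalance_score(class_counts):
    """Return 1 - H / H_max: 0 for a perfectly balanced column, near 1 when one class dominates."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    if n_classes < 2:
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

score = imbalance_score(counts)
score_label = f"{score:.1%}"
print(score_label)
```

A score this high mostly reflects that a single class, 제조업, covers 71.8% of the rows.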

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]


업종명
Text

Distinct: 384
Distinct (%): 35.7%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Length

Max length: 22
Median length: 18
Mean length: 11.541395
Min length: 3

Characters and Unicode

Total characters: 12407
Distinct characters: 321
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
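
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories. The first two strings are sample rows from this report; the third is a made-up string, included only to show a digit and a period, and is not from the dataset. The category codes `Lo`, `Nd` and `Po` correspond to Other Letter, Decimal Number and Other Punctuation.

```python
import unicodedata
from collections import Counter

# Two sample rows of 업종명 copied from this report.
samples = ["건강기능식품제조업", "건물및토목엔지니어링서비스업"]
# Hypothetical string, for illustration only (not from the dataset).
hypothetical = "제1차.금속"

def category_counts(texts):
    """Count Unicode general categories over all characters in the given strings."""
    return Counter(unicodedata.category(ch) for text in texts for ch in text)

sample_categories = category_counts(samples)
mixed_categories = category_counts([hypothetical])

# Mean length is total characters divided by the number of rows.
mean_length = 12407 / 1075

print(sample_categories, mixed_categories, round(mean_length, 6))
```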

Unique

Unique: 177
Unique (%): 16.5%

Sample

1st row: 건강기능식품제조업
2nd row: 건물및토목엔지니어링서비스업
3rd row: 그외기타달리분류되지않은제품제조업
4th row: 그외기타분류안된전문과학및기술서비스업
5th row: 그외기타분류안된화학제품제조업

Value | Count | Frequency (%)
그외기타전자부품제조업 | 15 | 1.4%
컴퓨터프로그래밍서비스업 | 15 | 1.4%
그외기타특수목적용기계제조업 | 15 | 1.4%
응용소프트웨어개발및공급업 | 15 | 1.4%
시스템소프트웨어개발및공급업 | 14 | 1.3%
컴퓨터시스템통합자문및구축서비스업 | 13 | 1.2%
기타무선통신장비제조업 | 13 | 1.2%
그외기타분류안된화학제품제조업 | 12 | 1.1%
배전반및전기자동제어반제조업 | 12 | 1.1%
화장품제조업 | 12 | 1.1%
Other values (374) | 939 | 87.3%

Most occurring characters

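Value | Count | Frequency (%)
[unrecovered] | 1126 | 9.1%
[unrecovered] | 942 | 7.6%
[unrecovered] | 831 | 6.7%
[unrecovered] | 814 | 6.6%
[unrecovered] | 449 | 3.6%
[unrecovered] | 363 | 2.9%
[unrecovered] | 271 | 2.2%
[unrecovered] | 255 | 2.1%
[unrecovered] | 209 | 1.7%
[unrecovered] | 204 | 1.6%
Other values (311) | 6943 | 56.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 12401 | > 99.9%
Decimal Number | 4 | < 0.1%
Other Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[unrecovered] | 1126 | 9.1%
[unrecovered] | 942 | 7.6%
[unrecovered] | 831 | 6.7%
[unrecovered] | 814 | 6.6%
[unrecovered] | 449 | 3.6%
[unrecovered] | 363 | 2.9%
[unrecovered] | 271 | 2.2%
[unrecovered] | 255 | 2.1%
[unrecovered] | 209 | 1.7%
[unrecovered] | 204 | 1.6%
Other values (309) | 6937 | 55.9%

Decimal Number
Value | Count | Frequency (%)
1 | 4 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 2 | 100.0%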

Most occurring scripts

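Value | Count | Frequency (%)
Hangul | 12401 | > 99.9%
Common | 6 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[unrecovered] | 1126 | 9.1%
[unrecovered] | 942 | 7.6%
[unrecovered] | 831 | 6.7%
[unrecovered] | 814 | 6.6%
[unrecovered] | 449 | 3.6%
[unrecovered] | 363 | 2.9%
[unrecovered] | 271 | 2.2%
[unrecovered] | 255 | 2.1%
[unrecovered] | 209 | 1.7%
[unrecovered] | 204 | 1.6%
Other values (309) | 6937 | 55.9%

Common
Value | Count | Frequency (%)
1 | 4 | 66.7%
. | 2 | 33.3%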

Most occurring blocks

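Value | Count | Frequency (%)
Hangul | 12391 | 99.9%
Compat Jamo | 10 | 0.1%
ASCII | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[unrecovered] | 1126 | 9.1%
[unrecovered] | 942 | 7.6%
[unrecovered] | 831 | 6.7%
[unrecovered] | 814 | 6.6%
[unrecovered] | 449 | 3.6%
[unrecovered] | 363 | 2.9%
[unrecovered] | 271 | 2.2%
[unrecovered] | 255 | 2.1%
[unrecovered] | 209 | 1.7%
[unrecovered] | 204 | 1.6%
Other values (308) | 6927 | 55.9%

Compat Jamo
Value | Count | Frequency (%)
[unrecovered] | 10 | 100.0%

ASCII
Value | Count | Frequency (%)
1 | 4 | 66.7%
. | 2 | 33.3%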

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 구분 | 지역 | 업종분류
구분 | 1.000 | 0.967 | 0.211
지역 | 0.967 | 1.000 | 0.107
업종분류 | 0.211 | 0.107 | 1.000

[Figure: correlation heatmap]

Variable | 지역 | 업종분류
지역 | 1.000 | 0.041
업종분류 | 0.041 | 1.000

[Figure: correlation heatmap]

Variable | 구분 | 지역 | 업종분류
구분 | 1.000 | 0.847 | 0.066
지역 | 0.847 | 1.000 | 0.041
업종분류 | 0.066 | 0.041 | 1.000

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

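First rows

Index | 구분 | 벤처확인유형 | 지역 | 업종분류 | 업종명
0 | 1 | 연구개발유형 | 강원 | 제조업 | 건강기능식품제조업
1 | 2 | 연구개발유형 | 강원 | 기타 | 건물및토목엔지니어링서비스업
2 | 3 | 연구개발유형 | 강원 | 제조업 | 그외기타달리분류되지않은제품제조업
3 | 4 | 연구개발유형 | 강원 | 기타 | 그외기타분류안된전문과학및기술서비스업
4 | 5 | 연구개발유형 | 강원 | 제조업 | 그외기타분류안된화학제품제조업
5 | 6 | 연구개발유형 | 강원 | 제조업 | 그외기타의료용기기제조업
6 | 7 | 연구개발유형 | 강원 | 제조업 | 그외기타전자부품제조업
7 | 8 | 연구개발유형 | 강원 | 제조업 | 그외기타특수목적용기계제조업
8 | 9 | 연구개발유형 | 강원 | 제조업 | 기타무선통신장비제조업
9 | 10 | 연구개발유형 | 강원 | 제조업 | 기타산업용유리제품제조업

Last rows

Index | 구분 | 벤처확인유형 | 지역 | 업종분류 | 업종명
1065 | 1066 | 연구개발유형 | 충북 | 연구개발서비스 | 의학및약학연구개발업
1066 | 1067 | 연구개발유형 | 충북 | 연구개발서비스 | 자연과학및공학융합연구개발업
1067 | 1068 | 연구개발유형 | 충북 | 제조업 | 전기경보및신호장치제조업
1068 | 1069 | 연구개발유형 | 충북 | 제조업 | 종이포대및가방제조업
1069 | 1070 | 연구개발유형 | 충북 | 제조업 | 질소화합물질소인산및칼리질화학비료제조업
1070 | 1071 | 연구개발유형 | 충북 | 제조업 | 축전지제조업
1071 | 1072 | 연구개발유형 | 충북 | 제조업 | 컴퓨터제조업
1072 | 1073 | 연구개발유형 | 충북 | 정보처리SW | 컴퓨터시스템통합자문및구축서비스업
1073 | 1074 | 연구개발유형 | 충북 | 제조업 | 합성수지및기타플라스틱물질제조업
1074 | 1075 | 연구개발유형 | 충북 | 제조업 | 화장품제조업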