Overview

Dataset statistics

Number of variables: 5
Number of observations: 701
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.9 KiB
Average record size in memory: 42.2 B
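
As a rough cross-check, the average record size is the total in-memory size divided by the number of observations. This sketch assumes 1 KiB = 1024 bytes and uses the rounded 28.9 KiB figure, so it can only agree to one decimal place.

```python
# Figures copied from the dataset statistics above (the total size is rounded).
total_kib = 28.9
n_observations = 701

# Average record size = total bytes / number of records.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"Average record size: {avg_record_bytes:.1f} B")
```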

Variable types

Numeric: 2
Categorical: 1
Text: 2

Dataset

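Description: Overview of the industry classification, industry classification code, and main products of companies that received a structural innovation diagnosis under the 구조혁신지원사업 (Structural Innovation Support Program) run by 중소벤처기업진흥공단 (Korea SMEs and Startups Agency)
Author: 중소벤처기업진흥공단 (Korea SMEs and Startups Agency)
URL: https://www.data.go.kr/data/15124145/fileData.do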

Alerts

순번 has unique values [Unique]

Reproduction

Analysis started: 2023-12-11 23:44:06.344706
Analysis finished: 2023-12-11 23:44:07.533945
Duration: 1.19 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

순번 (serial number)
Real number (ℝ)

UNIQUE

Distinct: 701
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 351
Minimum: 1
Maximum: 701
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 1
5th percentile: 36
Q1: 176
Median: 351
Q3: 526
95th percentile: 666
Maximum: 701
Range: 700
Interquartile range (IQR): 350
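
These quantiles are consistent with linear interpolation between order statistics, the default rule in pandas and NumPy. Because 순번 holds every integer from 1 to 701 exactly once, the sketch below rebuilds the column as `range(1, 702)`. `percentile_linear` is a hypothetical helper written for illustration, not a ydata-profiling function.

```python
import math

def percentile_linear(sorted_values, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * q      # fractional 0-based position
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac

values = list(range(1, 702))   # 순번 takes each integer 1..701 exactly once
quantiles = {q: percentile_linear(values, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
iqr = quantiles[0.75] - quantiles[0.25]  # interquartile range = Q3 - Q1
print(quantiles, "IQR:", iqr)
```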

Descriptive statistics

Standard deviation: 202.50556
Coefficient of variation (CV): 0.5769389
Kurtosis: -1.2
Mean: 351
Median absolute deviation (MAD): 175
Skewness: 0
Sum: 246051
Variance: 41008.5
Monotonicity: Strictly increasing
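
The descriptive statistics can be reproduced with Python's `statistics` module. A variance of 41008.5 matches the sample variance (denominator n - 1); the population variance of 1..701 would be 40950. This sketch again assumes the column is exactly 1..701 and leaves out the skewness and kurtosis estimators.

```python
import statistics

values = list(range(1, 702))   # 순번 is exactly 1, 2, ..., 701

total = sum(values)
mean = statistics.fmean(values)
variance = statistics.variance(values)   # sample variance, denominator n - 1
sd = statistics.stdev(values)            # sample standard deviation
cv = sd / mean                           # coefficient of variation
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])  # median absolute deviation
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, variance, round(sd, 5), round(cv, 7), mad, strictly_increasing)
```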
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1 | 1 | 0.1%
472 | 1 | 0.1%
464 | 1 | 0.1%
465 | 1 | 0.1%
466 | 1 | 0.1%
467 | 1 | 0.1%
468 | 1 | 0.1%
469 | 1 | 0.1%
470 | 1 | 0.1%
471 | 1 | 0.1%
Other values (691) | 691 | 98.6%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
701 | 1 | 0.1%
700 | 1 | 0.1%
699 | 1 | 0.1%
698 | 1 | 0.1%
697 | 1 | 0.1%
696 | 1 | 0.1%
695 | 1 | 0.1%
694 | 1 | 0.1%
693 | 1 | 0.1%
692 | 1 | 0.1%

지역 (region)
Categorical

Distinct: 16
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB
경기: 185
서울: 103
경남: 63
경북: 56
전남: 50
Other values (11): 244

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원
2nd row: 인천
3rd row: 경기
4th row: 경기
5th row: 전북

Common Values

Value | Count | Frequency (%)
경기 | 185 | 26.4%
서울 | 103 | 14.7%
경남 | 63 | 9.0%
경북 | 56 | 8.0%
전남 | 50 | 7.1%
인천 | 44 | 6.3%
부산 | 43 | 6.1%
충북 | 27 | 3.9%
충남 | 25 | 3.6%
강원 | 23 | 3.3%
Other values (6) | 82 | 11.7%
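
Each frequency is the count divided by the 701 rows, and the last row pools every region outside the top ten. The sketch below recomputes the table from the counts shown; it does not know the names of the six pooled regions.

```python
# Counts of the ten most common regions, copied from the table above.
top_counts = {"경기": 185, "서울": 103, "경남": 63, "경북": 56, "전남": 50,
              "인천": 44, "부산": 43, "충북": 27, "충남": 25, "강원": 23}
n_rows = 701
n_distinct = 16

for region, count in top_counts.items():
    print(f"{region} {count} {count / n_rows:.1%}")

# The remaining regions are pooled into a single "Other values" row.
other_count = n_rows - sum(top_counts.values())
other_distinct = n_distinct - len(top_counts)
print(f"Other values ({other_distinct}) {other_count} {other_count / n_rows:.1%}")
```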

Length

[Figure: histogram of lengths of the category]

산업분류 (industry classification)
Text

Distinct: 322
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Length

Max length: 29
Median length: 22
Mean length: 15.49786
Min length: 3
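
Length is counted per value in characters (Unicode code points). The sketch below applies ordinary max, mean, median, and min to the five sample rows of 산업분류 listed under "Sample" below. `length_stats` is a hypothetical helper, and the report's own median length may be derived differently.

```python
import statistics

def length_stats(texts):
    """Max/mean/median/min string length, counted in Unicode code points."""
    lengths = [len(t) for t in texts]
    return {
        "max": max(lengths),
        "mean": statistics.fmean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
    }

# The five sample rows of 산업분류 from the report.
rows = [
    "일반 통신 공사업",
    "1차 금속제품 도매업",
    "기계 및 장비 중개업",
    "주형 및 금형 제조업",
    "합성수지 및 기타 플라스틱 물질 제조업",
]
print(length_stats(rows))
```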

Characters and Unicode

Total characters: 10864
Distinct characters: 301
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
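
For example, Python's `unicodedata.category` returns the two-letter General Category code of a character ("Lo" for Other Letter, "Zs" for Space Separator, and so on). A minimal sketch of the category counts, using two sample rows; the name mapping covers only the categories that appear in this report, and ydata-profiling may rely on a different Unicode library internally.

```python
import unicodedata
from collections import Counter

# Two-letter General Category codes mapped to the names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Zs": "Space Separator", "Po": "Other Punctuation",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation", "Sm": "Math Symbol",
}

def category_counts(texts):
    """Count characters per Unicode General Category across all texts."""
    counts = Counter(unicodedata.category(ch) for text in texts for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

sample = ["일반 통신 공사업", "1차 금속제품 도매업"]
print(category_counts(sample))
```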

Unique

Unique: 200
Unique (%): 28.5%

Sample

1st row: 일반 통신 공사업
2nd row: 1차 금속제품 도매업
3rd row: 기계 및 장비 중개업
4th row: 주형 및 금형 제조업
5th row: 합성수지 및 기타 플라스틱 물질 제조업
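
Words
Value | Count | Frequency (%)
제조업 | 426 | 12.6%
기타 | 272 | 8.0%
(not recovered) | 265 | 7.8%
(not recovered) | 167 | 4.9%
(not recovered) | 165 | 4.9%
서비스업 | 72 | 2.1%
부품 | 52 | 1.5%
도매업 | 51 | 1.5%
금속 | 43 | 1.3%
안된 | 40 | 1.2%
Other values (542) | 1829 | 54.1%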
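
Word frequencies are token counts pooled across all values. A minimal sketch, assuming plain whitespace tokenization (ydata-profiling's own tokenizer may apply further rules), run on the five sample rows of 산업분류:

```python
from collections import Counter

def word_counts(texts):
    """Count whitespace-separated tokens across all values."""
    return Counter(word for text in texts for word in text.split())

# The five sample rows of 산업분류 from the report.
rows = [
    "일반 통신 공사업",
    "1차 금속제품 도매업",
    "기계 및 장비 중개업",
    "주형 및 금형 제조업",
    "합성수지 및 기타 플라스틱 물질 제조업",
]
print(word_counts(rows).most_common(2))
```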

Most occurring characters

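Value | Count | Frequency (%)
(space) | 2681 | 24.7%
(not recovered) | 721 | 6.6%
(not recovered) | 587 | 5.4%
(not recovered) | 478 | 4.4%
(not recovered) | 471 | 4.3%
(not recovered) | 283 | 2.6%
(not recovered) | 273 | 2.5%
(not recovered) | 265 | 2.4%
(not recovered) | 180 | 1.7%
(not recovered) | 178 | 1.6%
Other values (291) | 4747 | 43.7%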

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8104 | 74.6%
Space Separator | 2681 | 24.7%
Other Punctuation | 59 | 0.5%
Decimal Number | 8 | 0.1%
Open Punctuation | 6 | 0.1%
Close Punctuation | 6 | 0.1%

Most frequent character per category

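Other Letter
Value | Count | Frequency (%)
(not recovered) | 721 | 8.9%
(not recovered) | 587 | 7.2%
(not recovered) | 478 | 5.9%
(not recovered) | 471 | 5.8%
(not recovered) | 283 | 3.5%
(not recovered) | 273 | 3.4%
(not recovered) | 265 | 3.3%
(not recovered) | 180 | 2.2%
(not recovered) | 178 | 2.2%
(not recovered) | 159 | 2.0%
Other values (286) | 4509 | 55.6%

Space Separator
Value | Count | Frequency (%)
(space) | 2681 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 59 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 8 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 6 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 6 | 100.0%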

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8104 | 74.6%
Common | 2760 | 25.4%

Most frequent character per script

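Hangul
Value | Count | Frequency (%)
(not recovered) | 721 | 8.9%
(not recovered) | 587 | 7.2%
(not recovered) | 478 | 5.9%
(not recovered) | 471 | 5.8%
(not recovered) | 283 | 3.5%
(not recovered) | 273 | 3.4%
(not recovered) | 265 | 3.3%
(not recovered) | 180 | 2.2%
(not recovered) | 178 | 2.2%
(not recovered) | 159 | 2.0%
Other values (286) | 4509 | 55.6%

Common
Value | Count | Frequency (%)
(space) | 2681 | 97.1%
, | 59 | 2.1%
1 | 8 | 0.3%
( | 6 | 0.2%
) | 6 | 0.2%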

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8084 | 74.4%
ASCII | 2760 | 25.4%
Compat Jamo | 20 | 0.2%
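
Blocks are contiguous code-point ranges defined by the Unicode Standard. A minimal sketch that classifies characters by range, covering only the blocks named in this report: Basic Latin ("ASCII"), Hangul Compatibility Jamo ("Compat Jamo"), Hangul Syllables ("Hangul"), and CJK Unified Ideographs ("CJK"). The second input string is hypothetical, chosen to contain one compatibility jamo.

```python
from collections import Counter

# (first code point, last code point, label used in the report)
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),        # Basic Latin
    (0x3130, 0x318F, "Compat Jamo"),  # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),          # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),       # Hangul Syllables
]

def block_of(ch):
    """Return the block label for a single character, or "Other"."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

def block_counts(texts):
    """Count characters per block across all texts."""
    return Counter(block_of(ch) for text in texts for ch in text)

# "냉연, 도금강판" is a real 주요제품 value; "ㅁ자형" is hypothetical.
print(block_counts(["냉연, 도금강판", "ㅁ자형"]))
```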

Most frequent character per block

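ASCII
Value | Count | Frequency (%)
(space) | 2681 | 97.1%
, | 59 | 2.1%
1 | 8 | 0.3%
( | 6 | 0.2%
) | 6 | 0.2%

Hangul
Value | Count | Frequency (%)
(not recovered) | 721 | 8.9%
(not recovered) | 587 | 7.3%
(not recovered) | 478 | 5.9%
(not recovered) | 471 | 5.8%
(not recovered) | 283 | 3.5%
(not recovered) | 273 | 3.4%
(not recovered) | 265 | 3.3%
(not recovered) | 180 | 2.2%
(not recovered) | 178 | 2.2%
(not recovered) | 159 | 2.0%
Other values (285) | 4489 | 55.5%

Compat Jamo
Value | Count | Frequency (%)
(not recovered) | 20 | 100.0%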

산업분류코드 (industry classification code)
Real number (ℝ)

Distinct: 322
Distinct (%): 45.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33842.745
Minimum: 1122
Maximum: 96911
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.3 KiB

Quantile statistics

Minimum: 1122
5th percentile: 10713
Q1: 24311
Median: 29133
Q3: 46106
95th percentile: 71531
Maximum: 96911
Range: 95789
Interquartile range (IQR): 21795

Descriptive statistics

Standard deviation: 17201.092
Coefficient of variation (CV): 0.50826528
Kurtosis: 0.80283068
Mean: 33842.745
Median absolute deviation (MAD): 6842
Skewness: 1.0551182
Sum: 23723764
Variance: 2.9587757 × 10⁸
Monotonicity: Not monotonic
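
Several of these statistics are derived from others, which gives a quick consistency check. The sketch plugs in the reported (rounded) values and assumes the reported variance is the square of the reported standard deviation, as it is for the sample statistics pandas computes.

```python
# Reported summary values for 산업분류코드 (rounded as displayed).
mean = 33842.745
std = 17201.092
q1, q3 = 24311, 46106
minimum, maximum = 1122, 96911

iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum
cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance = (standard deviation) squared

print(iqr, value_range, round(cv, 5), f"{variance:.7e}")
```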
[Figure: histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
26299 | 24 | 3.4%
25999 | 17 | 2.4%
62021 | 14 | 2.0%
62010 | 14 | 2.0%
33999 | 14 | 2.0%
30399 | 13 | 1.9%
22299 | 13 | 1.9%
28123 | 11 | 1.6%
29199 | 10 | 1.4%
46800 | 10 | 1.4%
Other values (312) | 561 | 80.0%

Minimum 10 values
Value | Count | Frequency (%)
1122 | 1 | 0.1%
1231 | 1 | 0.1%
3220 | 1 | 0.1%
10121 | 6 | 0.9%
10122 | 4 | 0.6%
10129 | 2 | 0.3%
10219 | 4 | 0.6%
10220 | 2 | 0.3%
10301 | 1 | 0.1%
10309 | 4 | 0.6%

Maximum 10 values
Value | Count | Frequency (%)
96911 | 1 | 0.1%
95120 | 1 | 0.1%
90290 | 1 | 0.1%
90199 | 1 | 0.1%
85709 | 4 | 0.6%
85632 | 1 | 0.1%
85503 | 1 | 0.1%
76310 | 1 | 0.1%
75999 | 2 | 0.3%
74220 | 1 | 0.1%

주요제품 (main products)
Text

Distinct: 691
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Length

Max length: 64
Median length: 43
Mean length: 13.67903
Min length: 1

Characters and Unicode

Total characters: 9589
Distinct characters: 631
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 3

Unique

Unique: 684
Unique (%): 97.6%

Sample

1st row: 정보통신공사업
2nd row: 냉연, 도금강판
3rd row: 에스프레소머신
4th row: 사출,다이캐스팅 금형제조
5th row: 우레탄 바닥재
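
Words
Value | Count | Frequency (%)
(not recovered) | 106 | 5.7%
부품 | 28 | 1.5%
(not recovered) | 25 | 1.3%
(not recovered) | 18 | 1.0%
제조 | 17 | 0.9%
시스템 | 12 | 0.6%
자동차 | 11 | 0.6%
개발 | 10 | 0.5%
장비 | 9 | 0.5%
소프트웨어 | 9 | 0.5%
Other values (1368) | 1619 | 86.9%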

Most occurring characters

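Value | Count | Frequency (%)
(space) | 1176 | 12.3%
, | 390 | 4.1%
(not recovered) | 243 | 2.5%
(not recovered) | 179 | 1.9%
(not recovered) | 177 | 1.8%
(not recovered) | 148 | 1.5%
(not recovered) | 128 | 1.3%
(not recovered) | 126 | 1.3%
(not recovered) | 125 | 1.3%
(not recovered) | 118 | 1.2%
Other values (621) | 6779 | 70.7%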

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 7192 | 75.0%
Space Separator | 1176 | 12.3%
Other Punctuation | 450 | 4.7%
Uppercase Letter | 376 | 3.9%
Lowercase Letter | 281 | 2.9%
Open Punctuation | 38 | 0.4%
Close Punctuation | 38 | 0.4%
Decimal Number | 31 | 0.3%
Dash Punctuation | 5 | 0.1%
Math Symbol | 2 | < 0.1%

Most frequent character per category

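Other Letter
Value | Count | Frequency (%)
(not recovered) | 243 | 3.4%
(not recovered) | 179 | 2.5%
(not recovered) | 177 | 2.5%
(not recovered) | 148 | 2.1%
(not recovered) | 128 | 1.8%
(not recovered) | 126 | 1.8%
(not recovered) | 125 | 1.7%
(not recovered) | 118 | 1.6%
(not recovered) | 110 | 1.5%
(not recovered) | 102 | 1.4%
Other values (548) | 5736 | 79.8%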

Lowercase Letter
Value | Count | Frequency (%)
e | 41 | 14.6%
r | 31 | 11.0%
o | 23 | 8.2%
t | 18 | 6.4%
i | 18 | 6.4%
p | 16 | 5.7%
a | 16 | 5.7%
l | 16 | 5.7%
c | 16 | 5.7%
n | 14 | 5.0%
Other values (15) | 72 | 25.6%

Uppercase Letter
Value | Count | Frequency (%)
E | 36 | 9.6%
C | 30 | 8.0%
A | 27 | 7.2%
S | 26 | 6.9%
I | 26 | 6.9%
P | 26 | 6.9%
R | 25 | 6.6%
D | 25 | 6.6%
T | 21 | 5.6%
F | 21 | 5.6%
Other values (15) | 113 | 30.1%

Decimal Number
Value | Count | Frequency (%)
2 | 8 | 25.8%
3 | 5 | 16.1%
5 | 4 | 12.9%
0 | 4 | 12.9%
8 | 3 | 9.7%
4 | 2 | 6.5%
1 | 2 | 6.5%
7 | 1 | 3.2%
6 | 1 | 3.2%
9 | 1 | 3.2%

Other Punctuation
Value | Count | Frequency (%)
, | 390 | 86.7%
/ | 28 | 6.2%
. | 21 | 4.7%
& | 5 | 1.1%
" | 2 | 0.4%
' | 2 | 0.4%
; | 1 | 0.2%
: | 1 | 0.2%

Space Separator
Value | Count | Frequency (%)
(space) | 1176 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 38 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 38 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 7191 | 75.0%
Common | 1740 | 18.1%
Latin | 657 | 6.9%
Han | 1 | < 0.1%

Most frequent character per script

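Hangul
Value | Count | Frequency (%)
(not recovered) | 243 | 3.4%
(not recovered) | 179 | 2.5%
(not recovered) | 177 | 2.5%
(not recovered) | 148 | 2.1%
(not recovered) | 128 | 1.8%
(not recovered) | 126 | 1.8%
(not recovered) | 125 | 1.7%
(not recovered) | 118 | 1.6%
(not recovered) | 110 | 1.5%
(not recovered) | 102 | 1.4%
Other values (547) | 5735 | 79.8%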

Latin
Value | Count | Frequency (%)
e | 41 | 6.2%
E | 36 | 5.5%
r | 31 | 4.7%
C | 30 | 4.6%
A | 27 | 4.1%
S | 26 | 4.0%
I | 26 | 4.0%
P | 26 | 4.0%
R | 25 | 3.8%
D | 25 | 3.8%
Other values (40) | 364 | 55.4%

Common
Value | Count | Frequency (%)
(space) | 1176 | 67.6%
, | 390 | 22.4%
( | 38 | 2.2%
) | 38 | 2.2%
/ | 28 | 1.6%
. | 21 | 1.2%
2 | 8 | 0.5%
- | 5 | 0.3%
3 | 5 | 0.3%
& | 5 | 0.3%
Other values (13) | 26 | 1.5%

Han
Value | Count | Frequency (%)
(not recovered) | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 7191 | 75.0%
ASCII | 2397 | 25.0%
CJK | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 1176 | 49.1%
, | 390 | 16.3%
e | 41 | 1.7%
( | 38 | 1.6%
) | 38 | 1.6%
E | 36 | 1.5%
r | 31 | 1.3%
C | 30 | 1.3%
/ | 28 | 1.2%
A | 27 | 1.1%
Other values (63) | 562 | 23.4%
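
Hangul
Value | Count | Frequency (%)
(not recovered) | 243 | 3.4%
(not recovered) | 179 | 2.5%
(not recovered) | 177 | 2.5%
(not recovered) | 148 | 2.1%
(not recovered) | 128 | 1.8%
(not recovered) | 126 | 1.8%
(not recovered) | 125 | 1.7%
(not recovered) | 118 | 1.6%
(not recovered) | 110 | 1.5%
(not recovered) | 102 | 1.4%
Other values (547) | 5735 | 79.8%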

CJK
Value | Count | Frequency (%)
(not recovered) | 1 | 100.0%

Interactions


Correlations

Variable | 순번 | 지역 | 산업분류코드
순번 | 1.000 | 0.313 | 0.100
지역 | 0.313 | 1.000 | 0.357
산업분류코드 | 0.100 | 0.357 | 1.000

Variable | 순번 | 산업분류코드 | 지역
순번 | 1.000 | -0.058 | 0.128
산업분류코드 | -0.058 | 1.000 | 0.146
지역 | 0.128 | 0.146 | 1.000
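
This export does not preserve which coefficient each matrix reports; ydata-profiling can compute several (for example Pearson, Spearman, Kendall, Cramér's V, and phik). As one illustration, Spearman's rank correlation is Pearson's correlation applied to ranks, with tied values given their average rank. The helper names are hypothetical, and the ten-row result says nothing about the coefficient over all 701 rows.

```python
import statistics

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# First ten rows of 순번 and 산업분류코드 from the sample.
serial = list(range(1, 11))
codes = [42321, 46721, 46106, 29294, 20202, 71600, 62021, 42321, 62010, 25999]
print(round(spearman(serial, codes), 3))
```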

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion]
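
A nullity matrix marks every cell as present (1) or missing (0), and the per-column missing counts are the column sums of the missing marks. This dataset has no missing cells, so the sketch uses hypothetical rows with one deliberately missing value. It treats only `None` as missing, whereas pandas also counts `NaN`.

```python
def nullity_matrix(rows, columns):
    """1 where a cell is present, 0 where it is missing (None); one list per row."""
    return [[0 if row.get(col) is None else 1 for col in columns] for row in rows]

def missing_counts(rows, columns):
    """Number of missing cells per column."""
    matrix = nullity_matrix(rows, columns)
    return {col: sum(1 - r[j] for r in matrix) for j, col in enumerate(columns)}

columns = ["순번", "지역", "산업분류", "산업분류코드", "주요제품"]
# Hypothetical rows: the second one has 주요제품 missing (the real data has none).
rows = [
    {"순번": 1, "지역": "강원", "산업분류": "일반 통신 공사업",
     "산업분류코드": 42321, "주요제품": "정보통신공사업"},
    {"순번": 2, "지역": "인천", "산업분류": "1차 금속제품 도매업",
     "산업분류코드": 46721, "주요제품": None},
]
print(nullity_matrix(rows, columns))
print(missing_counts(rows, columns))
```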

Sample

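First rows
Index | 순번 | 지역 | 산업분류 | 산업분류코드 | 주요제품
0 | 1 | 강원 | 일반 통신 공사업 | 42321 | 정보통신공사업
1 | 2 | 인천 | 1차 금속제품 도매업 | 46721 | 냉연, 도금강판
2 | 3 | 경기 | 기계 및 장비 중개업 | 46106 | 에스프레소머신
3 | 4 | 경기 | 주형 및 금형 제조업 | 29294 | 사출,다이캐스팅 금형제조
4 | 5 | 전북 | 합성수지 및 기타 플라스틱 물질 제조업 | 20202 | 우레탄 바닥재
5 | 6 | 경기 | 기타 전문 서비스업 | 71600 | 전기설계용역서비스
6 | 7 | 부산 | 컴퓨터 시스템 통합 자문 및 구축 서비스업 | 62021 | SW개발, SI컨설팅, SI구축
7 | 8 | 경기 | 일반 통신 공사업 | 42321 | 구내방송장비
8 | 9 | 서울 | 컴퓨터 프로그래밍 서비스업 | 62010 | 소프트웨어
9 | 10 | 강원 | 그 외 기타 분류 안된 금속 가공제품 제조업 | 25999 | 낙석방지책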

Last rows
Index | 순번 | 지역 | 산업분류 | 산업분류코드 | 주요제품
691 | 692 | 인천 | 단미사료 및 기타 사료 제조업 | 10802 | 반려동물 단미사료 및 용품
692 | 693 | 경기 | 그 외 기타 의복 액세서리 제조업 | 14499 | 여성의류
693 | 694 | 경남 | 그 외 기타 분류 안된 화학제품 제조업 | 20499 | 피톤치드오일 살균탈취제, 배식이섬유(페어파우더) 제품
694 | 695 | 경남 | 코크스 및 관련제품 제조업 | 19101 | 씨콜 외 다수
695 | 696 | 경기 | 그 외 기타 플라스틱 제품 제조업 | 22299 | 자동차에 들아가는 여러가지 부품
696 | 697 | 전남 | 선박 구성 부분품 제조업 | 31114 | 안전발판(족장)
697 | 698 | 인천 | 영화, 비디오물 및 방송 프로그램 제작 관련 서비스업 | 59120 | 미디어콘텐츠영상
698 | 699 | 경기 | 반도체 제조용 기계 제조업 | 29271 | H/J, HotN2, 3중배관
699 | 700 | 경기 | 기타 일반 기계 및 장비 수리업 | 34019 | 산업용 로봇 제조업
700 | 701 | 울산 | 기타 건축용 플라스틱 조립제품 제조업 | 22229 | pp보드,epp블럭