Overview

Dataset statistics

Number of variables: 5
Number of observations: 5203
Missing cells: 4480
Missing cells (%): 17.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 208.5 KiB
Average record size in memory: 41.0 B
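
The two derived figures in this table follow from the raw counts. The sketch below recomputes them from the numbers printed above, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes. The variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 5203
missing_cells = 4480
total_size_kib = 208.5

# Share of missing cells over the whole table (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values the report shows (17.2% and 41.0 B).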

Variable types

Numeric: 1
Text: 3
Categorical: 1

Dataset

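Description: Data as of September 15, 2023, covering the products of tenant companies registered with 연구개발특구진흥재단. It holds region, tenant company name, and product data. Columns: 번호 (number), 입주기관명 (tenant organization name), 생산품 (product), 지역 (region), 지구 (district). Note: for special-zone tenant companies that did not enter product information, the product field is blank.
Author: (재)연구개발특구진흥재단
URL: https://www.data.go.kr/data/15089829/fileData.do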

Alerts

번호 is highly overall correlated with 지역 (High correlation)
지역 is highly overall correlated with 번호 (High correlation)
생산품 has 4480 (86.1%) missing values (Missing)
번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 21:45:02.212361
Analysis finished: 2023-12-12 21:45:03.454265
Duration: 1.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
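
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, not the exact command used for this run: the function name, the file paths, the report title, and the `cp949` encoding are all assumptions (Korean public datasets are often distributed in that encoding, but the source file should be checked).

```python
def build_profile(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write the HTML report (hypothetical helper)."""
    # Imported here so this module still loads when the third-party
    # packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Tenant company products")
    profile.to_file(html_path)
    return profile
```

Usage would look like `build_profile("products.csv", "report.html")`, where both file names are placeholders.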

Variables

번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 5203
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2602
Minimum: 1
Maximum: 5203
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 261.1
Q1: 1301.5
Median: 2602
Q3: 3902.5
95-th percentile: 4942.9
Maximum: 5203
Range: 5202
Interquartile range (IQR): 2601

Descriptive statistics

Standard deviation: 1502.1211
Coefficient of variation (CV): 0.57729479
Kurtosis: -1.2
Mean: 2602
Median Absolute Deviation (MAD): 1301
Skewness: 0
Sum: 13538206
Variance: 2256367.7
Monotonicity: Strictly increasing
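
These statistics are what a plain running index 1, 2, ..., 5203 would produce. The sketch below checks this with the standard library only, assuming that 번호 is exactly that sequence, that the variance is the sample variance (ddof = 1), and that quantiles use linear interpolation. The `quantile` helper is written for this illustration.

```python
import math
import statistics

n = 5203
values = list(range(1, n + 1))  # assumption: 번호 is the sequence 1..n


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (ddof = 1)
std = math.sqrt(variance)
cv = std / mean
median = statistics.median(values)
# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)
q1 = quantile(values, 0.25)
q3 = quantile(values, 0.75)
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(total, mean, round(variance, 1), round(cv, 4))
print(q1, q3, q3 - q1, mad, strictly_increasing)
```

Because the column carries no information beyond row order, its high correlation with 지역 most likely reflects that rows are grouped by region in the file.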
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
3458 | 1 | < 0.1%
3476 | 1 | < 0.1%
3475 | 1 | < 0.1%
3474 | 1 | < 0.1%
3473 | 1 | < 0.1%
3472 | 1 | < 0.1%
3471 | 1 | < 0.1%
3470 | 1 | < 0.1%
3469 | 1 | < 0.1%
Other values (5193) | 5193 | 99.8%

Minimum 10 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
10 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5203 | 1 | < 0.1%
5202 | 1 | < 0.1%
5201 | 1 | < 0.1%
5200 | 1 | < 0.1%
5199 | 1 | < 0.1%
5198 | 1 | < 0.1%
5197 | 1 | < 0.1%
5196 | 1 | < 0.1%
5195 | 1 | < 0.1%
5194 | 1 | < 0.1%

입주기관명
Text

Distinct: 5112
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB

Length

Max length: 26
Median length: 23
Mean length: 7.68941
Min length: 2

Characters and Unicode

Total characters: 40008
Distinct characters: 728
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
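
As an illustration of those character properties, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. It is not the report's own implementation. The mapping from two-letter codes to the category names used in this report covers only the categories needed for the example string, and the helper name is made up.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes and the names this report uses for them.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}


def category_counts(text):
    """Count the characters of text by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "SK에너지(주)"  # one company name from this column
counts = category_counts(sample)
for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for the text columns.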

Unique

Unique: 5023
Unique (%): 96.5%

Sample

1st row: SK이노베이션(주)환경과학기술원
2nd row: SK에너지(주)
3rd row: SK바이오팜(주)
4th row: (주)삼양사
5th row: 국립문화재연구소
ValueCountFrequency (%)
주식회사 704
 
11.5%
유한회사 28
 
0.5%
21
 
0.3%
농업회사법인 15
 
0.2%
재단법인 5
 
0.1%
태양광발전소 5
 
0.1%
미음공장 4
 
0.1%
tech 4
 
0.1%
기술연구소 3
 
< 0.1%
주)삼양사 3
 
< 0.1%
Other values (5199) 5345
87.1%

Most occurring characters

ValueCountFrequency (%)
3670
 
9.2%
) 2814
 
7.0%
( 2806
 
7.0%
1728
 
4.3%
1373
 
3.4%
1123
 
2.8%
1007
 
2.5%
961
 
2.4%
938
 
2.3%
868
 
2.2%
Other values (718) 22720
56.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 32517
81.3%
Close Punctuation 2819
 
7.0%
Open Punctuation 2811
 
7.0%
Space Separator 961
 
2.4%
Uppercase Letter 505
 
1.3%
Lowercase Letter 221
 
0.6%
Decimal Number 72
 
0.2%
Other Punctuation 54
 
0.1%
Other Symbol 29
 
0.1%
Dash Punctuation 17
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3670
 
11.3%
1728
 
5.3%
1373
 
4.2%
1123
 
3.5%
1007
 
3.1%
938
 
2.9%
868
 
2.7%
556
 
1.7%
515
 
1.6%
495
 
1.5%
Other values (647) 20244
62.3%
Uppercase Letter
ValueCountFrequency (%)
E 59
 
11.7%
S 55
 
10.9%
T 42
 
8.3%
G 31
 
6.1%
N 30
 
5.9%
K 30
 
5.9%
C 30
 
5.9%
I 29
 
5.7%
M 26
 
5.1%
O 25
 
5.0%
Other values (15) 148
29.3%
Lowercase Letter
ValueCountFrequency (%)
e 33
14.9%
o 27
12.2%
n 21
9.5%
t 17
 
7.7%
s 15
 
6.8%
r 14
 
6.3%
i 14
 
6.3%
c 14
 
6.3%
a 11
 
5.0%
l 8
 
3.6%
Other values (13) 47
21.3%
Decimal Number
ValueCountFrequency (%)
1 19
26.4%
3 12
16.7%
2 10
13.9%
5 9
12.5%
0 7
 
9.7%
4 6
 
8.3%
9 4
 
5.6%
6 3
 
4.2%
7 1
 
1.4%
8 1
 
1.4%
Other Punctuation
ValueCountFrequency (%)
. 39
72.2%
& 8
 
14.8%
, 5
 
9.3%
: 1
 
1.9%
/ 1
 
1.9%
Close Punctuation
ValueCountFrequency (%)
) 2814
99.8%
] 5
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 2806
99.8%
[ 5
 
0.2%
Space Separator
ValueCountFrequency (%)
961
100.0%
Other Symbol
ValueCountFrequency (%)
29
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 17
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 32544
81.3%
Common 6736
 
16.8%
Latin 726
 
1.8%
Han 2
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3670
 
11.3%
1728
 
5.3%
1373
 
4.2%
1123
 
3.5%
1007
 
3.1%
938
 
2.9%
868
 
2.7%
556
 
1.7%
515
 
1.6%
495
 
1.5%
Other values (646) 20271
62.3%
Latin
ValueCountFrequency (%)
E 59
 
8.1%
S 55
 
7.6%
T 42
 
5.8%
e 33
 
4.5%
G 31
 
4.3%
N 30
 
4.1%
K 30
 
4.1%
C 30
 
4.1%
I 29
 
4.0%
o 27
 
3.7%
Other values (38) 360
49.6%
Common
ValueCountFrequency (%)
) 2814
41.8%
( 2806
41.7%
961
 
14.3%
. 39
 
0.6%
1 19
 
0.3%
- 17
 
0.3%
3 12
 
0.2%
2 10
 
0.1%
5 9
 
0.1%
& 8
 
0.1%
Other values (12) 41
 
0.6%
Han
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 32514
81.3%
ASCII 7462
 
18.7%
None 29
 
0.1%
CJK 2
 
< 0.1%
Compat Jamo 1
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
3670
 
11.3%
1728
 
5.3%
1373
 
4.2%
1123
 
3.5%
1007
 
3.1%
938
 
2.9%
868
 
2.7%
556
 
1.7%
515
 
1.6%
495
 
1.5%
Other values (644) 20241
62.3%
ASCII
ValueCountFrequency (%)
) 2814
37.7%
( 2806
37.6%
961
 
12.9%
E 59
 
0.8%
S 55
 
0.7%
T 42
 
0.6%
. 39
 
0.5%
e 33
 
0.4%
G 31
 
0.4%
N 30
 
0.4%
Other values (60) 592
 
7.9%
None
ValueCountFrequency (%)
29
100.0%
Compat Jamo
ValueCountFrequency (%)
1
100.0%
CJK
ValueCountFrequency (%)
1
50.0%
1
50.0%

생산품
Text

MISSING 

Distinct: 701
Distinct (%): 97.0%
Missing: 4480
Missing (%): 86.1%
Memory size: 40.8 KiB

Length

Max length: 359
Median length: 79
Mean length: 17.904564
Min length: 1

Characters and Unicode

Total characters: 12945
Distinct characters: 548
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 4

Unique

Unique: 691
Unique (%): 95.6%

Sample

1st row: 니들패치
2nd row: 상수관로 진단 및 정비용 로봇개발
3rd row: 피뢰기 시험기
4th row: 의료용Display
5th row: 수질 분석기
ValueCountFrequency (%)
89
 
4.2%
부품 25
 
1.2%
21
 
1.0%
16
 
0.8%
14
 
0.7%
소프트웨어 13
 
0.6%
자동차 13
 
0.6%
시스템 12
 
0.6%
장비 12
 
0.6%
개발 11
 
0.5%
Other values (1424) 1870
89.2%

Most occurring characters

ValueCountFrequency (%)
1404
 
10.8%
, 765
 
5.9%
444
 
3.4%
215
 
1.7%
210
 
1.6%
194
 
1.5%
187
 
1.4%
180
 
1.4%
166
 
1.3%
149
 
1.2%
Other values (538) 9031
69.8%

Most occurring categories

ValueCountFrequency (%)
Other Letter 9134
70.6%
Space Separator 1404
 
10.8%
Other Punctuation 836
 
6.5%
Uppercase Letter 745
 
5.8%
Lowercase Letter 595
 
4.6%
Decimal Number 104
 
0.8%
Open Punctuation 48
 
0.4%
Close Punctuation 48
 
0.4%
Dash Punctuation 29
 
0.2%
Math Symbol 2
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
444
 
4.9%
215
 
2.4%
210
 
2.3%
194
 
2.1%
187
 
2.0%
180
 
2.0%
166
 
1.8%
149
 
1.6%
144
 
1.6%
129
 
1.4%
Other values (470) 7116
77.9%
Lowercase Letter
ValueCountFrequency (%)
o 67
11.3%
e 63
10.6%
a 59
9.9%
r 55
9.2%
t 54
9.1%
i 50
 
8.4%
l 33
 
5.5%
s 31
 
5.2%
c 28
 
4.7%
n 28
 
4.7%
Other values (14) 127
21.3%
Uppercase Letter
ValueCountFrequency (%)
C 75
 
10.1%
D 66
 
8.9%
E 61
 
8.2%
A 60
 
8.1%
L 53
 
7.1%
P 51
 
6.8%
R 45
 
6.0%
T 43
 
5.8%
I 40
 
5.4%
S 39
 
5.2%
Other values (13) 212
28.5%
Decimal Number
ValueCountFrequency (%)
0 37
35.6%
3 17
16.3%
1 15
14.4%
2 12
 
11.5%
5 7
 
6.7%
4 6
 
5.8%
9 4
 
3.8%
6 3
 
2.9%
7 3
 
2.9%
Other Punctuation
ValueCountFrequency (%)
, 765
91.5%
/ 36
 
4.3%
. 12
 
1.4%
& 10
 
1.2%
# 5
 
0.6%
; 5
 
0.6%
· 3
 
0.4%
Space Separator
ValueCountFrequency (%)
1404
100.0%
Open Punctuation
ValueCountFrequency (%)
( 48
100.0%
Close Punctuation
ValueCountFrequency (%)
) 48
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 29
100.0%
Math Symbol
ValueCountFrequency (%)
~ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 9133
70.6%
Common 2471
 
19.1%
Latin 1340
 
10.4%
Han 1
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
444
 
4.9%
215
 
2.4%
210
 
2.3%
194
 
2.1%
187
 
2.0%
180
 
2.0%
166
 
1.8%
149
 
1.6%
144
 
1.6%
129
 
1.4%
Other values (469) 7115
77.9%
Latin
ValueCountFrequency (%)
C 75
 
5.6%
o 67
 
5.0%
D 66
 
4.9%
e 63
 
4.7%
E 61
 
4.6%
A 60
 
4.5%
a 59
 
4.4%
r 55
 
4.1%
t 54
 
4.0%
L 53
 
4.0%
Other values (37) 727
54.3%
Common
ValueCountFrequency (%)
1404
56.8%
, 765
31.0%
( 48
 
1.9%
) 48
 
1.9%
0 37
 
1.5%
/ 36
 
1.5%
- 29
 
1.2%
3 17
 
0.7%
1 15
 
0.6%
2 12
 
0.5%
Other values (11) 60
 
2.4%
Han
ValueCountFrequency (%)
1
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 9133
70.6%
ASCII 3808
29.4%
None 3
 
< 0.1%
CJK 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1404
36.9%
, 765
20.1%
C 75
 
2.0%
o 67
 
1.8%
D 66
 
1.7%
e 63
 
1.7%
E 61
 
1.6%
A 60
 
1.6%
a 59
 
1.5%
r 55
 
1.4%
Other values (57) 1133
29.8%
Hangul
ValueCountFrequency (%)
444
 
4.9%
215
 
2.4%
210
 
2.3%
194
 
2.1%
187
 
2.0%
180
 
2.0%
166
 
1.8%
149
 
1.6%
144
 
1.6%
129
 
1.4%
Other values (469) 7115
77.9%
None
ValueCountFrequency (%)
· 3
100.0%
CJK
ValueCountFrequency (%)
1
100.0%

지역
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB
대덕: 2501
부산: 880
광주: 676
전북: 479
대구: 288
Other values (11): 379

Length

Max length: 10
Median length: 2
Mean length: 2.4378243
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 대덕
2nd row: 대덕
3rd row: 대덕
4th row: 대덕
5th row: 대덕

Common Values

Value | Count | Frequency (%)
대덕 | 2501 | 48.1%
부산 | 880 | 16.9%
광주 | 676 | 13.0%
전북 | 479 | 9.2%
대구 | 288 | 5.5%
강소(경남김해) | 89 | 1.7%
강소(경북구미) | 74 | 1.4%
강소(경북포항) | 71 | 1.4%
강소(울산울주) | 51 | 1.0%
강소(충북청주) | 37 | 0.7%
Other values (6) | 57 | 1.1%
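
Each frequency in the common-values table is the category count divided by the 5203 observations, and the "Other values" row is whatever remains after the ten listed categories. A minimal sketch of that arithmetic, with the counts copied from the table and a helper name chosen for illustration:

```python
# Counts copied from the common-values table for 지역.
top_counts = {
    "대덕": 2501, "부산": 880, "광주": 676, "전북": 479, "대구": 288,
    "강소(경남김해)": 89, "강소(경북구미)": 74, "강소(경북포항)": 71,
    "강소(울산울주)": 51, "강소(충북청주)": 37,
}
n_rows = 5203


def frequency_table(counts, n_rows):
    """Return (label, count, percent) rows plus a trailing 'Other values' row."""
    rows = [(label, c, round(100 * c / n_rows, 1)) for label, c in counts.items()]
    other = n_rows - sum(counts.values())
    rows.append(("Other values", other, round(100 * other / n_rows, 1)))
    return rows


table = frequency_table(top_counts, n_rows)
for label, count, pct in table:
    print(f"{label} | {count} | {pct}%")
```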

Length

Histogram of lengths of the category
ValueCountFrequency (%)
대덕 2501
48.1%
부산 880
 
16.9%
광주 676
 
13.0%
전북 479
 
9.2%
대구 288
 
5.5%
강소(경남김해 89
 
1.7%
강소(경북구미 74
 
1.4%
강소(경북포항 71
 
1.4%
강소(울산울주 51
 
1.0%
강소(충북청주 37
 
0.7%
Other values (6) 57
 
1.1%

지구
Text

Distinct: 53
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB

Length

Max length: 15
Median length: 3
Mean length: 5.8656544
Min length: 3

Characters and Unicode

Total characters: 30519
Distinct characters: 103
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3

Unique

Unique: 7
Unique (%): 0.1%

Sample

1st row: 1지구
2nd row: 1지구
3rd row: 1지구
4th row: 1지구
5th row: 1지구
ValueCountFrequency (%)
2지구 1657
24.9%
1지구 689
10.4%
1단계 672
10.1%
국제산업 559
 
8.4%
물류도시 559
 
8.4%
진곡지구 387
 
5.8%
나노지구 222
 
3.3%
테크노폴리스지구 179
 
2.7%
첨단과학연구단지 154
 
2.3%
미음일반산업단지 144
 
2.2%
Other values (51) 1429
21.5%

Most occurring characters

ValueCountFrequency (%)
4360
 
14.3%
4019
 
13.2%
2 1660
 
5.4%
1448
 
4.7%
1422
 
4.7%
1 1391
 
4.6%
1047
 
3.4%
979
 
3.2%
( 672
 
2.2%
) 672
 
2.2%
Other values (93) 12849
42.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 23597
77.3%
Decimal Number 3206
 
10.5%
Space Separator 1448
 
4.7%
Open Punctuation 672
 
2.2%
Close Punctuation 672
 
2.2%
Uppercase Letter 580
 
1.9%
Other Punctuation 344
 
1.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4360
18.5%
4019
17.0%
1422
 
6.0%
1047
 
4.4%
979
 
4.1%
672
 
2.8%
616
 
2.6%
588
 
2.5%
585
 
2.5%
559
 
2.4%
Other values (81) 8750
37.1%
Decimal Number
ValueCountFrequency (%)
2 1660
51.8%
1 1391
43.4%
4 111
 
3.5%
3 39
 
1.2%
5 5
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
R 290
50.0%
D 290
50.0%
Other Punctuation
ValueCountFrequency (%)
& 290
84.3%
· 54
 
15.7%
Space Separator
ValueCountFrequency (%)
1448
100.0%
Open Punctuation
ValueCountFrequency (%)
( 672
100.0%
Close Punctuation
ValueCountFrequency (%)
) 672
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 23597
77.3%
Common 6342
 
20.8%
Latin 580
 
1.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4360
18.5%
4019
17.0%
1422
 
6.0%
1047
 
4.4%
979
 
4.1%
672
 
2.8%
616
 
2.6%
588
 
2.5%
585
 
2.5%
559
 
2.4%
Other values (81) 8750
37.1%
Common
ValueCountFrequency (%)
2 1660
26.2%
1448
22.8%
1 1391
21.9%
( 672
10.6%
) 672
10.6%
& 290
 
4.6%
4 111
 
1.8%
· 54
 
0.9%
3 39
 
0.6%
5 5
 
0.1%
Latin
ValueCountFrequency (%)
R 290
50.0%
D 290
50.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 23597
77.3%
ASCII 6868
 
22.5%
None 54
 
0.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4360
18.5%
4019
17.0%
1422
 
6.0%
1047
 
4.4%
979
 
4.1%
672
 
2.8%
616
 
2.6%
588
 
2.5%
585
 
2.5%
559
 
2.4%
Other values (81) 8750
37.1%
ASCII
ValueCountFrequency (%)
2 1660
24.2%
1448
21.1%
1 1391
20.3%
( 672
9.8%
) 672
9.8%
R 290
 
4.2%
D 290
 
4.2%
& 290
 
4.2%
4 111
 
1.6%
3 39
 
0.6%
None
ValueCountFrequency (%)
· 54
100.0%

Interactions


Correlations

     | 번호 | 지역 | 지구
번호 | 1.000 | 0.881 | 0.960
지역 | 0.881 | 1.000 | 0.993
지구 | 0.960 | 0.993 | 1.000
     | 번호 | 지역
번호 | 1.000 | 0.606
지역 | 0.606 | 1.000
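
The report does not state which measure produced these coefficients. For pairs that involve a categorical column such as 지역, one commonly used association measure is Cramér's V, computed from a contingency table. The sketch below implements the uncorrected form on made-up tables. It is an illustration under that assumption, not a reconstruction of the numbers above.

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0
    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Hypothetical tables: perfect association and no association.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```

Values near 1 indicate that one variable almost determines the other. That would be consistent with the 0.993 reported for 지역 and 지구, if each district belongs to a single region.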

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
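
The per-column missing counts behind these plots can be computed directly. The sketch below does so with plain Python over three rows taken from this dataset, using `None` in place of `<NA>`. The helper name is hypothetical, and on the full table the same calculation would give 4480 missing values for 생산품.

```python
# Three rows from the dataset; None stands in for <NA>.
rows = [
    {"번호": 5199, "입주기관명": "(재)충북과학기술혁신원", "생산품": None,
     "지역": "강소(충북청주)", "지구": "사업화지구"},
    {"번호": 5200, "입주기관명": "(주)유트론", "생산품": None,
     "지역": "강소(충북청주)", "지구": "사업화지구"},
    {"번호": 5201, "입주기관명": "주식회사 큐에스랩", "생산품": "전기화학적 수소 압축기",
     "지역": "강소(충북청주)", "지구": "사업화지구"},
]


def missing_summary(rows):
    """Map each column to (missing count, missing percent) over the given rows."""
    summary = {}
    for col in rows[0]:
        missing = sum(1 for row in rows if row[col] is None)
        summary[col] = (missing, round(100 * missing / len(rows), 1))
    return summary


summary = missing_summary(rows)
for col, (count, pct) in summary.items():
    print(f"{col}: {count} missing ({pct}%)")
```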

Sample

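First rows
index | 번호 | 입주기관명 | 생산품 | 지역 | 지구
0 | 1 | SK이노베이션(주)환경과학기술원 | <NA> | 대덕 | 1지구
1 | 2 | SK에너지(주) | <NA> | 대덕 | 1지구
2 | 3 | SK바이오팜(주) | <NA> | 대덕 | 1지구
3 | 4 | (주)삼양사 | <NA> | 대덕 | 1지구
4 | 5 | 국립문화재연구소 | <NA> | 대덕 | 1지구
5 | 6 | 금호폴리켐 | <NA> | 대덕 | 1지구
6 | 7 | (주)윕스 | <NA> | 대덕 | 1지구
7 | 8 | (주)씨앤엘 | <NA> | 대덕 | 1지구
8 | 9 | (주)제우기술 | <NA> | 대덕 | 1지구
9 | 10 | (주)큐니온 | <NA> | 대덕 | 1지구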

Last rows
index | 번호 | 입주기관명 | 생산품 | 지역 | 지구
5193 | 5194 | (주)서경산업 | <NA> | 강소(충북청주) | 사업화지구
5194 | 5195 | 주식회사 해치텍 | <NA> | 강소(충북청주) | 사업화지구
5195 | 5196 | 주식회사이상기술 | <NA> | 강소(충북청주) | 사업화지구
5196 | 5197 | (주)지오비앤에이치 | <NA> | 강소(충북청주) | 사업화지구
5197 | 5198 | (주)시아이솔루션 | 인공지능 반도체 불량분석 광학 현미경 | 강소(충북청주) | 사업화지구
5198 | 5199 | (재)충북과학기술혁신원 | <NA> | 강소(충북청주) | 사업화지구
5199 | 5200 | (주)유트론 | <NA> | 강소(충북청주) | 사업화지구
5200 | 5201 | 주식회사 큐에스랩 | 전기화학적 수소 압축기 | 강소(충북청주) | 사업화지구
5201 | 5202 | (주)네오세미텍 | 냉각순환수용 스케일제거장비,냉각순환수용 스케일제거장비,세정기, 장비 실시간 모니터링 장치, 펠리클디마운터 등의 장비 및 기타부품, 소모품,세정기, 장비 실시간 모니터링 장치, 펠리클디마운터 등의 장비 및 기타부품, 소모품 | 강소(충북청주) | 사업화지구
5202 | 5203 | 주식회사 딜리셔스마켓 | 히말라야허브솔트 | 강소(충북청주) | 사업화지구