Overview

Dataset statistics

Number of variables: 4
Number of observations: 5203
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 167.8 KiB
Average record size in memory: 33.0 B
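The dataset-level counts above can be recomputed without a profiling library. Below is a minimal sketch. It assumes the table is loaded as a list of equal-length row tuples and that `None` marks a missing cell. The function `dataset_stats` is hypothetical and is not part of ydata-profiling.

```python
from collections import Counter


def dataset_stats(rows):
    """Summarize missing cells and duplicate rows for a list of equal-length tuples."""
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_rows * n_vars
    # In this sketch a cell counts as missing when it is None.
    missing = sum(1 for row in rows for cell in row if cell is None)
    # Every repeat of a row beyond its first occurrence counts as a duplicate.
    duplicates = sum(c - 1 for c in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


# The first three rows of the dataset, taken from the Sample section.
rows = [
    (1, "대덕", "SK이노베이션(주)환경과학기술원", "1지구"),
    (2, "대덕", "SK에너지(주)", "1지구"),
    (3, "대덕", "SK바이오팜(주)", "1지구"),
]
print(dataset_stats(rows))
```

The average record size is total memory divided by the number of observations: 167.8 KiB × 1024 / 5203 ≈ 33.0 B.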

Variable types

Numeric: 1
Categorical: 1
Text: 2

Dataset

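Description: Data as of September 15, 2023, on the tenant companies and institutions in the Research and Development Special Zones (연구개발특구), as recorded in the tenant management system. It includes data such as tenant institution names and addresses. The dataset has the following columns: 번호 (number), 지역 (region), 기관명 (institution name), 지구 (district).
Author: (재)연구개발특구진흥재단 (INNOPOLIS Foundation)
URL: https://www.data.go.kr/data/15083254/fileData.do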

Alerts

번호 is highly overall correlated with 지역 (High correlation)
지역 is highly overall correlated with 번호 (High correlation)
번호 has unique values (Unique)
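The "Unique" alert and the Distinct (%) figures below compare the number of distinct values with the number of rows. A minimal sketch of that comparison follows. The helper `distinct_summary` is hypothetical, and the report's exact alert rule is not stated here. In this sketch, a column is flagged when no value repeats.

```python
def distinct_summary(values):
    """Distinct count, distinct percentage, and an all-values-unique flag."""
    n = len(values)
    distinct = len(set(values))
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n if n else 0.0,
        # Flag the column when no value repeats (distinct count equals row count).
        "all_unique": n > 0 and distinct == n,
    }


numbers = list(range(1, 5204))  # 번호 runs from 1 to 5203 according to the report
print(distinct_summary(numbers))
```

The same ratio explains 지역's Distinct (%) of 0.3%: 16 distinct regions out of 5203 rows is about 0.31%.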

Reproduction

Analysis started: 2023-12-12 16:50:12.224799
Analysis finished: 2023-12-12 16:50:13.078983
Duration: 0.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 5203
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2602
Minimum: 1
Maximum: 5203
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 45.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 261.1
Q1: 1301.5
Median: 2602
Q3: 3902.5
95th percentile: 4942.9
Maximum: 5203
Range: 5202
Interquartile range (IQR): 2601
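The quantiles above match linear interpolation between order statistics. The sketch below reproduces them with Python's `statistics.quantiles`, using `method="inclusive"`. That method places the p-th quantile at position p × (n − 1), the same rule as NumPy's default. It assumes, as the report indicates, that 번호 is exactly the integers 1 through 5203. The report does not say which interpolation rule it uses, only that these values agree.

```python
import statistics

# According to the report, 번호 is the integer sequence 1..5203.
values = list(range(1, 5204))

# method="inclusive" interpolates linearly between order statistics at
# position p * (n - 1), the same rule as NumPy's default "linear" method.
quartiles = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")

q1, median, q3 = quartiles
p5, p95 = twentieths[0], twentieths[-1]
iqr = q3 - q1

print(p5, q1, median, q3, p95, iqr)
```

This prints 261.1 1301.5 2602.0 3902.5 4942.9 2601.0, matching the table.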

Descriptive statistics

Standard deviation: 1502.1211
Coefficient of variation (CV): 0.57729479
Kurtosis: -1.2
Mean: 2602
Median Absolute Deviation (MAD): 1301
Skewness: 0
Sum: 13538206
Variance: 2256367.7
Monotonicity: Strictly increasing
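The descriptive statistics can be checked with the standard library. This is a sketch, not the report's code. It uses the sample variance and standard deviation (n − 1 denominator), which match the figures shown. It uses population moments for skewness and excess kurtosis. The report does not state its estimators, so a bias-corrected kurtosis could differ slightly, although at n = 5203 the difference does not show at the displayed precision.

```python
import statistics

values = list(range(1, 5204))  # 번호: 1..5203
n = len(values)

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean  # coefficient of variation

# Median absolute deviation: the median of |x - median(x)|.
med = statistics.median(values)
mad = statistics.median(abs(x - med) for x in values)

# Population central moments for skewness and excess kurtosis.
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n
m4 = sum((x - mean) ** 4 for x in values) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

# Strictly increasing: each value is larger than the one before it.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(round(variance, 1), round(std, 4), round(cv, 8), mad,
      round(skewness, 6), round(excess_kurtosis, 2), strictly_increasing)
```

Rounded as in the report, these give 2256367.7, 1502.1211, 0.57729479, 1301, 0 and -1.2. A kurtosis near -1.2 and zero skewness are what a flat, symmetric (uniform) distribution of 1..n produces.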
[Figure: Histogram with fixed size bins (bins=50)]
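A fixed-size-bin histogram splits the range from minimum to maximum into equal-width bins. The sketch below assumes half-open bins, with the maximum clamped into the last bin, which is NumPy's `histogram` convention. ydata-profiling's own edge handling may differ. The function `fixed_width_histogram` is hypothetical.

```python
def fixed_width_histogram(values, bins=50):
    """Equal-width histogram; the last bin also includes the maximum value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bins are half-open [edge_i, edge_i+1); the maximum is clamped into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram(list(range(1, 5204)), bins=50)
print(round(edges[1] - edges[0], 2), min(counts), max(counts), sum(counts))
```

For 번호, each bin is (5203 − 1) / 50 = 104.04 wide, so every bin holds 104 or 105 of the 5203 values. The result is a flat histogram, as expected for a column that runs through consecutive integers.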
Common values
Value                Count  Frequency (%)
1                    1      < 0.1%
3458                 1      < 0.1%
3476                 1      < 0.1%
3475                 1      < 0.1%
3474                 1      < 0.1%
3473                 1      < 0.1%
3472                 1      < 0.1%
3471                 1      < 0.1%
3470                 1      < 0.1%
3469                 1      < 0.1%
Other values (5193)  5193   99.8%
Minimum 10 values
Value  Count  Frequency (%)
1      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      1      < 0.1%
8      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
Maximum 10 values
Value  Count  Frequency (%)
5203   1      < 0.1%
5202   1      < 0.1%
5201   1      < 0.1%
5200   1      < 0.1%
5199   1      < 0.1%
5198   1      < 0.1%
5197   1      < 0.1%
5196   1      < 0.1%
5195   1      < 0.1%
5194   1      < 0.1%
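The three tables above show the most common values followed by an "Other values" row, plus the smallest and largest distinct values. The sketch below builds the same three views with `collections.Counter`. The function `value_tables` is hypothetical. `Counter.most_common` orders equal counts by first occurrence, so for an all-unique column like 번호 its top ten will not match the report's, which lists a different set of single-count values.

```python
from collections import Counter


def value_tables(values, k=10):
    """Top-k common values with an "Other values" row, plus the k smallest and largest distinct values."""
    n = len(values)
    counts = Counter(values)
    top = counts.most_common(k)  # ties keep first-seen order
    shown = sum(c for _, c in top)
    other = (len(counts) - len(top), n - shown, 100.0 * (n - shown) / n)
    distinct_sorted = sorted(counts)
    minimum = distinct_sorted[:k]
    maximum = distinct_sorted[::-1][:k]
    return top, other, minimum, maximum


top, other, minimum, maximum = value_tables(list(range(1, 5204)))
print(other)    # (distinct values not shown, their total count, percentage)
print(minimum)
print(maximum)
```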

지역
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB
대덕               2501
부산               880
광주               676
전북               479
대구               288
Other values (11)  379

Length

Max length: 10
Median length: 2
Mean length: 2.4378243
Min length: 2
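The length statistics summarize `len()` of each value. A minimal sketch follows, with `length_stats` as a hypothetical helper. Python's `len` counts Unicode code points, and a precomposed Hangul syllable such as 대 is a single code point. The example uses only three region labels from this report, not the full column, so its numbers are not the report's.

```python
import statistics


def length_stats(strings):
    """Max, median, mean and min length (in code points) of a list of strings."""
    lengths = [len(s) for s in strings]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


print(length_stats(["대덕", "부산", "강소(경남김해)"]))
```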

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 대덕
2nd row: 대덕
3rd row: 대덕
4th row: 대덕
5th row: 대덕

Common Values

Value             Count  Frequency (%)
대덕              2501   48.1%
부산              880    16.9%
광주              676    13.0%
전북              479    9.2%
대구              288    5.5%
강소(경남김해)    89     1.7%
강소(경북구미)    74     1.4%
강소(경북포항)    71     1.4%
강소(울산울주)    51     1.0%
강소(충북청주)    37     0.7%
Other values (6)  57     1.1%
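Each Frequency (%) value is the count divided by the 5203 observations. The sketch below recomputes the column from the counts in the table and checks that the listed rows plus "Other values" account for every observation. The `< 0.1%` label appears to be used when a non-zero share would round to 0.0%. For example, 29 Other Symbol characters out of 40008 are shown as 0.1%, while 17 Dash Punctuation characters are shown as < 0.1%. That rule is inferred from the tables, not documented behavior. The function `frequency_rows` is hypothetical.

```python
def frequency_rows(counts, n):
    """(value, count, label) rows; non-zero shares that round to 0.0% are labeled "< 0.1%"."""
    rows = []
    for value, count in counts.items():
        text = f"{100.0 * count / n:.1f}"
        label = "< 0.1%" if text == "0.0" and count > 0 else f"{text}%"
        rows.append((value, count, label))
    return rows


# Counts copied from the 지역 Common Values table above.
counts = {
    "대덕": 2501, "부산": 880, "광주": 676, "전북": 479, "대구": 288,
    "강소(경남김해)": 89, "강소(경북구미)": 74, "강소(경북포항)": 71,
    "강소(울산울주)": 51, "강소(충북청주)": 37,
}
other = 57  # "Other values (6)"
n = 5203    # number of observations

# The listed rows plus "Other values" must account for every observation.
assert sum(counts.values()) + other == n

for value, count, label in frequency_rows(counts, n):
    print(value, count, label)
print("Other values (6)", other, f"{100.0 * other / n:.1f}%")
```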

Length

[Figure: Histogram of lengths of the category]
기관명
Text

Distinct: 5112
Distinct (%): 98.3%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB

Length

Max length: 26
Median length: 23
Mean length: 7.68941
Min length: 2

Characters and Unicode

Total characters: 40008
Distinct characters: 728
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
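The category names used below (Other Letter, Open Punctuation, and so on) are the long forms of Unicode general categories, which Python exposes through `unicodedata.category`. The sketch tallies categories for one sample value from this column. `CATEGORY_NAMES` and `category_counts` are hypothetical names, and the mapping covers only the categories that occur in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Pc": "Connector Punctuation",
    "Po": "Other Punctuation", "So": "Other Symbol",
}


def category_counts(text):
    """Count characters of `text` by Unicode general category (long names where known)."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}


print(category_counts("SK이노베이션(주)환경과학기술원"))
```

Hangul syllables fall under Other Letter (Lo), which is why that category dominates the tables for this column.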

Unique

Unique: 5023
Unique (%): 96.5%

Sample

1st row: SK이노베이션(주)환경과학기술원
2nd row: SK에너지(주)
3rd row: SK바이오팜(주)
4th row: (주)삼양사
5th row: 국립문화재연구소
Words
Value                Count  Frequency (%)
주식회사             704    11.5%
유한회사             28     0.5%
                     21     0.3%
농업회사법인         15     0.2%
재단법인             5      0.1%
태양광발전소         5      0.1%
미음공장             4      0.1%
tech                 4      0.1%
기술연구소           3      < 0.1%
주)삼양사            3      < 0.1%
Other values (5199)  5345   87.1%
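A word table like the one above can be approximated by splitting each value into tokens and counting them. The sketch below assumes whitespace tokens, lower-cased. The report's own tokenizer appears to treat punctuation differently (it lists a token 주)삼양사), so counts from this sketch will not match exactly. `word_counts` is a hypothetical helper. The names used are real values from the Sample section.

```python
from collections import Counter


def word_counts(texts, k=10):
    """Most common whitespace-separated, lower-cased tokens across `texts`."""
    words = Counter(w.lower() for t in texts for w in t.split())
    return words.most_common(k)


names = ["주식회사 해치텍", "주식회사이상기술", "주식회사 큐에스랩", "(주)유트론"]
print(word_counts(names, k=3))
```

Note that 주식회사이상기술 stays one token because it has no space, which is one reason the "주식회사" count understates how many names contain that word.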

Most occurring characters

Value               Count  Frequency (%)
                    3670   9.2%
)                   2814   7.0%
(                   2806   7.0%
                    1728   4.3%
                    1373   3.4%
                    1123   2.8%
                    1007   2.5%
(space)             961    2.4%
                    938    2.3%
                    868    2.2%
Other values (718)  22720  56.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       32517  81.3%
Close Punctuation  2819   7.0%
Open Punctuation   2811   7.0%
Space Separator    961    2.4%
Uppercase Letter   505    1.3%
Lowercase Letter   221    0.6%
Decimal Number     72     0.2%
Other Punctuation  54     0.1%
Other Symbol       29     0.1%
Dash Punctuation   17     < 0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    3670   11.3%
                    1728   5.3%
                    1373   4.2%
                    1123   3.5%
                    1007   3.1%
                    938    2.9%
                    868    2.7%
                    556    1.7%
                    515    1.6%
                    495    1.5%
Other values (647)  20244  62.3%
Uppercase Letter
Value              Count  Frequency (%)
E                  59     11.7%
S                  55     10.9%
T                  42     8.3%
G                  31     6.1%
N                  30     5.9%
K                  30     5.9%
C                  30     5.9%
I                  29     5.7%
M                  26     5.1%
O                  25     5.0%
Other values (15)  148    29.3%
Lowercase Letter
Value              Count  Frequency (%)
e                  33     14.9%
o                  27     12.2%
n                  21     9.5%
t                  17     7.7%
s                  15     6.8%
r                  14     6.3%
i                  14     6.3%
c                  14     6.3%
a                  11     5.0%
l                  8      3.6%
Other values (13)  47     21.3%
Decimal Number
Value  Count  Frequency (%)
1      19     26.4%
3      12     16.7%
2      10     13.9%
5      9      12.5%
0      7      9.7%
4      6      8.3%
9      4      5.6%
6      3      4.2%
7      1      1.4%
8      1      1.4%
Other Punctuation
Value  Count  Frequency (%)
.      39     72.2%
&      8      14.8%
,      5      9.3%
:      1      1.9%
/      1      1.9%
Close Punctuation
Value  Count  Frequency (%)
)      2814   99.8%
]      5      0.2%
Open Punctuation
Value  Count  Frequency (%)
(      2806   99.8%
[      5      0.2%
Space Separator
Value    Count  Frequency (%)
(space)  961    100.0%
Other Symbol
Value  Count  Frequency (%)
       29     100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      17     100.0%
Connector Punctuation
Value  Count  Frequency (%)
_      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  32544  81.3%
Common  6736   16.8%
Latin   726    1.8%
Han     2      < 0.1%
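Python's standard library exposes general categories but not the Unicode Script property. The sketch below therefore guesses a character's script from its Unicode name via `unicodedata.name`. This is a rough heuristic assumed for illustration, not how the report computes scripts. It maps names starting with HANGUL, LATIN and CJK UNIFIED IDEOGRAPH to Hangul, Latin and Han, and everything else to Common. That fits the four scripts in this table but misclassifies, for example, fullwidth Latin letters. `script_of` is a hypothetical helper.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Rough script guess from the Unicode character name (a heuristic, not the Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    return "Common"  # punctuation, digits, spaces and anything unrecognized


print(Counter(script_of(ch) for ch in "SK에너지(주)"))
```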

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    3670   11.3%
                    1728   5.3%
                    1373   4.2%
                    1123   3.5%
                    1007   3.1%
                    938    2.9%
                    868    2.7%
                    556    1.7%
                    515    1.6%
                    495    1.5%
Other values (646)  20271  62.3%
Latin
Value              Count  Frequency (%)
E                  59     8.1%
S                  55     7.6%
T                  42     5.8%
e                  33     4.5%
G                  31     4.3%
N                  30     4.1%
K                  30     4.1%
C                  30     4.1%
I                  29     4.0%
o                  27     3.7%
Other values (38)  360    49.6%
Common
Value              Count  Frequency (%)
)                  2814   41.8%
(                  2806   41.7%
(space)            961    14.3%
.                  39     0.6%
1                  19     0.3%
-                  17     0.3%
3                  12     0.2%
2                  10     0.1%
5                  9      0.1%
&                  8      0.1%
Other values (12)  41     0.6%
Han
Value  Count  Frequency (%)
       1      50.0%
       1      50.0%

Most occurring blocks

Value        Count  Frequency (%)
Hangul       32514  81.3%
ASCII        7462   18.7%
None         29     0.1%
CJK          2      < 0.1%
Compat Jamo  1      < 0.1%
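The block names in this table look like short labels for standard Unicode blocks. The sketch below assigns a block by code-point range. It assumes the following correspondences: Hangul is Hangul Syllables (U+AC00..U+D7AF), ASCII is Basic Latin (U+0000..U+007F), CJK is CJK Unified Ideographs (U+4E00..U+9FFF), and Compat Jamo is Hangul Compatibility Jamo (U+3130..U+318F). Characters outside these ranges return `None`, similar to the report's None row. For instance, the middle dot · (U+00B7) is counted under None for the 지구 column. `BLOCKS` and `block_of` are hypothetical names.

```python
from collections import Counter

# A few Unicode block ranges (inclusive), labeled with the report's short names.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),        # Basic Latin
    (0x3130, 0x318F, "Compat Jamo"),  # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF, "CJK"),          # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "Hangul"),       # Hangul Syllables
]


def block_of(ch):
    """Return the block label for `ch`, or None if it is outside the listed ranges."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return None  # not covered by this short table


print(Counter(block_of(ch) for ch in "SK에너지(주)"))
```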

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    3670   11.3%
                    1728   5.3%
                    1373   4.2%
                    1123   3.5%
                    1007   3.1%
                    938    2.9%
                    868    2.7%
                    556    1.7%
                    515    1.6%
                    495    1.5%
Other values (644)  20241  62.3%
ASCII
Value              Count  Frequency (%)
)                  2814   37.7%
(                  2806   37.6%
(space)            961    12.9%
E                  59     0.8%
S                  55     0.7%
T                  42     0.6%
.                  39     0.5%
e                  33     0.4%
G                  31     0.4%
N                  30     0.4%
Other values (60)  592    7.9%
None
Value  Count  Frequency (%)
       29     100.0%
Compat Jamo
Value  Count  Frequency (%)
       1      100.0%
CJK
Value  Count  Frequency (%)
       1      50.0%
       1      50.0%

지구
Text

Distinct: 53
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 KiB

Length

Max length: 15
Median length: 3
Mean length: 5.8656544
Min length: 3

Characters and Unicode

Total characters: 30519
Distinct characters: 103
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 0.1%

Sample

1st row: 1지구
2nd row: 1지구
3rd row: 1지구
4th row: 1지구
5th row: 1지구
Words
Value              Count  Frequency (%)
2지구              1657   24.9%
1지구              689    10.4%
1단계              672    10.1%
국제산업           559    8.4%
물류도시           559    8.4%
진곡지구           387    5.8%
나노지구           222    3.3%
테크노폴리스지구   179    2.7%
첨단과학연구단지   154    2.3%
미음일반산업단지   144    2.2%
Other values (51)  1429   21.5%

Most occurring characters

Value              Count  Frequency (%)
                   4360   14.3%
                   4019   13.2%
2                  1660   5.4%
(space)            1448   4.7%
                   1422   4.7%
1                  1391   4.6%
                   1047   3.4%
                   979    3.2%
(                  672    2.2%
)                  672    2.2%
Other values (93)  12849  42.1%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       23597  77.3%
Decimal Number     3206   10.5%
Space Separator    1448   4.7%
Open Punctuation   672    2.2%
Close Punctuation  672    2.2%
Uppercase Letter   580    1.9%
Other Punctuation  344    1.1%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   4360   18.5%
                   4019   17.0%
                   1422   6.0%
                   1047   4.4%
                   979    4.1%
                   672    2.8%
                   616    2.6%
                   588    2.5%
                   585    2.5%
                   559    2.4%
Other values (81)  8750   37.1%
Decimal Number
Value  Count  Frequency (%)
2      1660   51.8%
1      1391   43.4%
4      111    3.5%
3      39     1.2%
5      5      0.2%
Uppercase Letter
Value  Count  Frequency (%)
R      290    50.0%
D      290    50.0%
Other Punctuation
Value  Count  Frequency (%)
&      290    84.3%
·      54     15.7%
Space Separator
Value    Count  Frequency (%)
(space)  1448   100.0%
Open Punctuation
Value  Count  Frequency (%)
(      672    100.0%
Close Punctuation
Value  Count  Frequency (%)
)      672    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  23597  77.3%
Common  6342   20.8%
Latin   580    1.9%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   4360   18.5%
                   4019   17.0%
                   1422   6.0%
                   1047   4.4%
                   979    4.1%
                   672    2.8%
                   616    2.6%
                   588    2.5%
                   585    2.5%
                   559    2.4%
Other values (81)  8750   37.1%
Common
Value    Count  Frequency (%)
2        1660   26.2%
(space)  1448   22.8%
1        1391   21.9%
(        672    10.6%
)        672    10.6%
&        290    4.6%
4        111    1.8%
·        54     0.9%
3        39     0.6%
5        5      0.1%
Latin
Value  Count  Frequency (%)
R      290    50.0%
D      290    50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  23597  77.3%
ASCII   6868   22.5%
None    54     0.2%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   4360   18.5%
                   4019   17.0%
                   1422   6.0%
                   1047   4.4%
                   979    4.1%
                   672    2.8%
                   616    2.6%
                   588    2.5%
                   585    2.5%
                   559    2.4%
Other values (81)  8750   37.1%
ASCII
Value    Count  Frequency (%)
2        1660   24.2%
(space)  1448   21.1%
1        1391   20.3%
(        672    9.8%
)        672    9.8%
R        290    4.2%
D        290    4.2%
&        290    4.2%
4        111    1.6%
3        39     0.6%
None
Value  Count  Frequency (%)
·      54     100.0%

Interactions


Correlations

        번호    지역    지구
번호    1.000   0.881   0.960
지역    0.881   1.000   0.993
지구    0.960   0.993   1.000

        번호    지역
번호    1.000   0.606
지역    0.606   1.000
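The report does not name the coefficient behind these matrices. For a pair of categorical columns such as 지역 and 지구, one common measure of association is Cramér's V, computed from the chi-squared statistic of their contingency table. The sketch below implements plain Cramér's V without bias correction. It illustrates that technique and does not reproduce the report's numbers. `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two categorical sequences of equal length (no bias correction)."""
    n = len(x)
    joint = Counter(zip(x, y))
    row = Counter(x)
    col = Counter(y)
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Region/district pairs as they appear in the Sample section.
regions = ["대덕", "대덕", "강소(충북청주)", "강소(충북청주)"]
districts = ["1지구", "1지구", "사업화지구", "사업화지구"]
print(cramers_v(regions, districts))
```

V is 1 when one column fully determines the other, as in this tiny example, and 0 when the two are independent.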

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows
      번호  지역            기관명                          지구
0     1     대덕            SK이노베이션(주)환경과학기술원  1지구
1     2     대덕            SK에너지(주)                    1지구
2     3     대덕            SK바이오팜(주)                  1지구
3     4     대덕            (주)삼양사                      1지구
4     5     대덕            국립문화재연구소                1지구
5     6     대덕            금호폴리켐                      1지구
6     7     대덕            (주)윕스                        1지구
7     8     대덕            (주)씨앤엘                      1지구
8     9     대덕            (주)제우기술                    1지구
9     10    대덕            (주)큐니온                      1지구

Last rows
      번호  지역            기관명                          지구
5193  5194  강소(충북청주)  (주)서경산업                    사업화지구
5194  5195  강소(충북청주)  주식회사 해치텍                 사업화지구
5195  5196  강소(충북청주)  주식회사이상기술                사업화지구
5196  5197  강소(충북청주)  (주)지오비앤에이치              사업화지구
5197  5198  강소(충북청주)  (주)시아이솔루션                사업화지구
5198  5199  강소(충북청주)  (재)충북과학기술혁신원          사업화지구
5199  5200  강소(충북청주)  (주)유트론                      사업화지구
5200  5201  강소(충북청주)  주식회사 큐에스랩               사업화지구
5201  5202  강소(충북청주)  (주)네오세미텍                  사업화지구
5202  5203  강소(충북청주)  주식회사 딜리셔스마켓           사업화지구