Overview

Dataset statistics

Number of variables: 4
Number of observations: 1241
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 41.3 KiB
Average record size in memory: 34.1 B
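
As a quick consistency check on the memory figures, the average record size should be roughly the total size divided by the number of observations. A minimal sketch, assuming the report uses binary kibibytes (1 KiB = 1024 bytes); the variable names are illustrative and not part of the report:

```python
# Sanity-check the reported average record size from the dataset statistics.
KIB = 1024  # assumption: "KiB" means 1024 bytes

total_size_kib = 41.3   # "Total size in memory"
n_observations = 1241   # "Number of observations"

avg_record_bytes = total_size_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints "34.1 B"
```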

Variable types

Numeric: 2
Text: 2

Dataset

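Description: Data as of 2022 on the operating status of research institute companies (연구소기업) under 연구개발특구진흥재단 (literally, the R&D Special Zone Promotion Foundation). It includes fields such as the company name and the registration year. Columns: 구분 (category; here a running row number), 기업명 (company name), 사업자등록번호 (business registration number), 등록연도 (registration year).
Author: (재)연구개발특구진흥재단
URL: https://www.data.go.kr/data/15089826/fileData.do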

Alerts

구분 is highly overall correlated with 등록연도 (High correlation)
등록연도 is highly overall correlated with 구분 (High correlation)
구분 has unique values (Unique)
사업자등록번호 has unique values (Unique)

Reproduction

Analysis started: 2023-12-12 04:10:39.604379
Analysis finished: 2023-12-12 04:10:40.524483
Duration: 0.92 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

구분
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 1241
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 621
Minimum: 1
Maximum: 1241
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 63
Q1: 311
Median: 621
Q3: 931
95-th percentile: 1179
Maximum: 1241
Range: 1240
Interquartile range (IQR): 620

Descriptive statistics

Standard deviation: 358.39015
Coefficient of variation (CV): 0.57711779
Kurtosis: -1.2
Mean: 621
Median Absolute Deviation (MAD): 310
Skewness: 0
Sum: 770661
Variance: 128443.5
Monotonicity: Strictly increasing
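
The statistics for 구분 are what one would expect from a plain running number. The sketch below recomputes them with Python's standard library, assuming the column is exactly the sequence 1..1241 (consistent with the minimum, maximum, distinct count, and the "strictly increasing" flag) and that quantiles use linear interpolation over the inclusive range:

```python
import math
import statistics

# Assumption: 구분 is exactly 1, 2, ..., 1241.
values = list(range(1, 1242))

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                         # coefficient of variation
median = statistics.median(values)
mad = statistics.median(abs(v - median) for v in values)

# Quartiles and the 5th/95th percentiles with inclusive linear interpolation.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

print(sum(values), mean, variance)
print(round(std, 5), round(cv, 8))
print(p5, q1, q2, q3, p95, q3 - q1, mad)
```

Under these assumptions the output agrees with the sum, mean, variance, standard deviation, CV, quantiles, IQR, and MAD listed for this variable.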
Histogram with fixed size bins (bins=50)

Common values
Value  Count  Frequency (%)
1  1  0.1%
826  1  0.1%
833  1  0.1%
832  1  0.1%
831  1  0.1%
830  1  0.1%
829  1  0.1%
828  1  0.1%
827  1  0.1%
825  1  0.1%
Other values (1231)  1231  99.2%

Minimum 10 values
Value  Count  Frequency (%)
1  1  0.1%
2  1  0.1%
3  1  0.1%
4  1  0.1%
5  1  0.1%
6  1  0.1%
7  1  0.1%
8  1  0.1%
9  1  0.1%
10  1  0.1%

Maximum 10 values
Value  Count  Frequency (%)
1241  1  0.1%
1240  1  0.1%
1239  1  0.1%
1238  1  0.1%
1237  1  0.1%
1236  1  0.1%
1235  1  0.1%
1234  1  0.1%
1233  1  0.1%
1232  1  0.1%

기업명
Text

Distinct: 1238
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Length

Max length: 20
Median length: 18
Mean length: 6.0362611
Min length: 2

Characters and Unicode

Total characters: 7491
Distinct characters: 473
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1235
Unique (%): 99.5%

Sample

1st row: ㈜비티웍스
2nd row: ㈜라스테크
3rd row: 서울프로폴리스㈜
4th row: ㈜케이에너지
5th row: 호전에이블
Value  Count  Frequency (%)
농업회사법인  7  0.6%
유한회사  3  0.2%
㈜그린코어  2  0.2%
㈜케이에스씨  2  0.2%
㈜헬스텍  2  0.2%
㈜제이에스컴퍼니  1  0.1%
㈜비놀로지  1  0.1%
㈜휴엔씨네이쳐  1  0.1%
㈜테라프릭스  1  0.1%
㈜에쓰큐씨  1  0.1%
Other values (1238)  1238  98.3%

Most occurring characters

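Value  Count  Frequency (%)
(not preserved)  1218  16.3%
(not preserved)  555  7.4%
(not preserved)  378  5.0%
(not preserved)  274  3.7%
(not preserved)  137  1.8%
(not preserved)  132  1.8%
(not preserved)  120  1.6%
(not preserved)  111  1.5%
(not preserved)  110  1.5%
(not preserved)  96  1.3%
Other values (463)  4360  58.2%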

Most occurring categories

Value  Count  Frequency (%)
Other Letter  6217  83.0%
Other Symbol  1218  16.3%
Space Separator  30  0.4%
Close Punctuation  10  0.1%
Open Punctuation  9  0.1%
Lowercase Letter  5  0.1%
Decimal Number  1  < 0.1%
Uppercase Letter  1  < 0.1%
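
The category names above are Unicode general categories (for example, "Other Letter" is `Lo`, "Other Symbol" is `So`, "Space Separator" is `Zs`). A minimal sketch of how such counts can be derived with Python's `unicodedata` module, applied only to the five sample company names shown in this report; the helper name `category_counts` is illustrative, not the profiler's own function:

```python
import unicodedata
from collections import Counter

def category_counts(strings):
    """Count Unicode general categories over every character in `strings`."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)

# The five sample values listed for 기업명.
sample = ["㈜비티웍스", "㈜라스테크", "서울프로폴리스㈜", "㈜케이에너지", "호전에이블"]

counts = category_counts(sample)
print(counts["Lo"], counts["So"])  # prints "25 4"
```

In this sample the Hangul syllables fall under `Lo` and the enclosed symbol ㈜ under `So`. Note that `unicodedata` exposes general categories only; script and block lookups need other data sources.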

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not preserved)  555  8.9%
(not preserved)  378  6.1%
(not preserved)  274  4.4%
(not preserved)  137  2.2%
(not preserved)  132  2.1%
(not preserved)  120  1.9%
(not preserved)  111  1.8%
(not preserved)  110  1.8%
(not preserved)  96  1.5%
(not preserved)  93  1.5%
Other values (452)  4211  67.7%

Lowercase Letter
Value  Count  Frequency (%)
e  1  20.0%
l  1  20.0%
u  1  20.0%
a  1  20.0%
d  1  20.0%

Other Symbol
Value  Count  Frequency (%)
(not preserved)  1218  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  30  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  10  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  9  100.0%

Decimal Number
Value  Count  Frequency (%)
2  1  100.0%

Uppercase Letter
Value  Count  Frequency (%)
N  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  7435  99.3%
Common  50  0.7%
Latin  6  0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not preserved)  1218  16.4%
(not preserved)  555  7.5%
(not preserved)  378  5.1%
(not preserved)  274  3.7%
(not preserved)  137  1.8%
(not preserved)  132  1.8%
(not preserved)  120  1.6%
(not preserved)  111  1.5%
(not preserved)  110  1.5%
(not preserved)  96  1.3%
Other values (453)  4304  57.9%

Latin
Value  Count  Frequency (%)
e  1  16.7%
l  1  16.7%
u  1  16.7%
a  1  16.7%
d  1  16.7%
N  1  16.7%

Common
Value  Count  Frequency (%)
(space)  30  60.0%
)  10  20.0%
(  9  18.0%
2  1  2.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  6217  83.0%
None  1218  16.3%
ASCII  56  0.7%

Most frequent character per block

None
Value  Count  Frequency (%)
(not preserved)  1218  100.0%

Hangul
Value  Count  Frequency (%)
(not preserved)  555  8.9%
(not preserved)  378  6.1%
(not preserved)  274  4.4%
(not preserved)  137  2.2%
(not preserved)  132  2.1%
(not preserved)  120  1.9%
(not preserved)  111  1.8%
(not preserved)  110  1.8%
(not preserved)  96  1.5%
(not preserved)  93  1.5%
Other values (452)  4211  67.7%

ASCII
Value  Count  Frequency (%)
(space)  30  53.6%
)  10  17.9%
(  9  16.1%
e  1  1.8%
l  1  1.8%
u  1  1.8%
a  1  1.8%
d  1  1.8%
2  1  1.8%
N  1  1.8%

사업자등록번호
Text

UNIQUE

Distinct: 1241
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 9.8 KiB

Length

Max length: 13
Median length: 12
Mean length: 12.002417
Min length: 12

Characters and Unicode

Total characters: 14895
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
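
The length figures for 사업자등록번호 fit together if almost every value is a 12-character string of the form shown in the sample (three digits, two digits, five digits, separated by hyphens) and three values carry one extra space. The sketch below checks that arithmetic and shows one way to normalise such values. It assumes the stray spaces are leading or trailing, which the report does not state; `normalize_brn` is a hypothetical helper:

```python
import re

# Assumed canonical form: NNN-NN-NNNNN, as in the sample rows.
BRN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{5}")

def normalize_brn(raw):
    """Strip surrounding whitespace; return the value if it matches, else None."""
    value = raw.strip()
    return value if BRN_PATTERN.fullmatch(value) else None

n_rows, canonical_len, stray_spaces = 1241, 12, 3
total_chars = n_rows * canonical_len + stray_spaces
mean_len = total_chars / n_rows

print(total_chars, round(mean_len, 6))  # prints "14895 12.002417"
print(normalize_brn("120-86-42098 "))   # prints "120-86-42098"
```

If the extra spaces were embedded inside a value rather than around it, `strip()` would not remove them and the function would return `None`, which is a useful signal for manual review.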

Unique

Unique: 1241
Unique (%): 100.0%

Sample

1st row: 120-86-42098
2nd row: 314-81-38438
3rd row: 314-81-57605
4th row: 314-86-01557
5th row: 314-86-31949
Value  Count  Frequency (%)
120-86-42098  1  0.1%
572-87-02052  1  0.1%
838-81-02080  1  0.1%
515-81-56306  1  0.1%
572-87-02125  1  0.1%
489-87-01770  1  0.1%
695-87-01910  1  0.1%
516-87-01806  1  0.1%
579-86-01800  1  0.1%
569-81-01278  1  0.1%
Other values (1231)  1231  99.2%

Most occurring characters

Value  Count  Frequency (%)
-  2482  16.7%
8  2284  15.3%
0  2051  13.8%
1  1595  10.7%
6  1155  7.8%
7  1079  7.2%
2  1033  6.9%
4  903  6.1%
3  840  5.6%
5  830  5.6%
Other values (2)  643  4.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  12410  83.3%
Dash Punctuation  2482  16.7%
Space Separator  3  < 0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
8  2284  18.4%
0  2051  16.5%
1  1595  12.9%
6  1155  9.3%
7  1079  8.7%
2  1033  8.3%
4  903  7.3%
3  840  6.8%
5  830  6.7%
9  640  5.2%

Dash Punctuation
Value  Count  Frequency (%)
-  2482  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  3  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  14895  100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
-  2482  16.7%
8  2284  15.3%
0  2051  13.8%
1  1595  10.7%
6  1155  7.8%
7  1079  7.2%
2  1033  6.9%
4  903  6.1%
3  840  5.6%
5  830  5.6%
Other values (2)  643  4.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  14895  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
-  2482  16.7%
8  2284  15.3%
0  2051  13.8%
1  1595  10.7%
6  1155  7.8%
7  1079  7.2%
2  1033  6.9%
4  903  6.1%
3  840  5.6%
5  830  5.6%
Other values (2)  643  4.3%

등록연도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 13
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2019.3102
Minimum: 2008
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.0 KiB

Quantile statistics

Minimum: 2008
5-th percentile: 2016
Q1: 2018
Median: 2020
Q3: 2021
95-th percentile: 2022
Maximum: 2022
Range: 14
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.2485766
Coefficient of variation (CV): 0.001113537
Kurtosis: 0.76779048
Mean: 2019.3102
Median Absolute Deviation (MAD): 2
Skewness: -0.8268029
Sum: 2505964
Variance: 5.0560968
Monotonicity: Increasing
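
Because 등록연도 takes only 13 distinct values, its summary statistics can be recomputed from the per-year counts alone. A minimal sketch using frequency-weighted sums, with the counts read off the value tables for this variable and the variance taken with an n - 1 denominator (an assumption that happens to match the reported figure):

```python
import math

# Year -> number of companies, read off the 등록연도 frequency tables.
year_counts = {
    2008: 1, 2009: 3, 2012: 3, 2013: 2, 2014: 21, 2015: 30, 2016: 99,
    2017: 124, 2018: 132, 2019: 161, 2020: 205, 2021: 224, 2022: 236,
}

n = sum(year_counts.values())
total = sum(year * count for year, count in year_counts.items())
mean = total / n
# Sample variance: weighted squared deviations over (n - 1).
variance = sum(c * (y - mean) ** 2 for y, c in year_counts.items()) / (n - 1)
std = math.sqrt(variance)

print(n, total)
print(round(mean, 4), round(variance, 7), round(std, 7))
```

The count (1241) and sum (2505964) come out exactly, and the mean, variance, and standard deviation agree with the reported values to the printed precision.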
Histogram with fixed size bins (bins=13)

Common values
Value  Count  Frequency (%)
2022  236  19.0%
2021  224  18.0%
2020  205  16.5%
2019  161  13.0%
2018  132  10.6%
2017  124  10.0%
2016  99  8.0%
2015  30  2.4%
2014  21  1.7%
2009  3  0.2%
Other values (3)  6  0.5%

Minimum 10 values
Value  Count  Frequency (%)
2008  1  0.1%
2009  3  0.2%
2012  3  0.2%
2013  2  0.2%
2014  21  1.7%
2015  30  2.4%
2016  99  8.0%
2017  124  10.0%
2018  132  10.6%
2019  161  13.0%

Maximum 10 values
Value  Count  Frequency (%)
2022  236  19.0%
2021  224  18.0%
2020  205  16.5%
2019  161  13.0%
2018  132  10.6%
2017  124  10.0%
2016  99  8.0%
2015  30  2.4%
2014  21  1.7%
2013  2  0.2%

Interactions

[Figures: interaction plots for 구분 and 등록연도]

Correlations

Variable  구분  등록연도
구분  1.000  0.853
등록연도  0.853  1.000

Variable  구분  등록연도
구분  1.000  0.989
등록연도  0.989  1.000
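
The report does not say which coefficient each matrix uses. As a hedged cross-check, the sketch below rebuilds both numeric columns from information in this report and computes a Spearman rank correlation by hand. It assumes the rows are ordered so that 구분 runs 1..1241 while 등록연도 never decreases (as the monotonicity flags suggest), and it takes the per-year counts from the 등록연도 frequency tables. The helper functions are illustrative, not the profiler's implementation:

```python
import math

# Assumption: rows sorted by 구분 = 1..n with 등록연도 non-decreasing.
year_counts = {
    2008: 1, 2009: 3, 2012: 3, 2013: 2, 2014: 21, 2015: 30, 2016: 99,
    2017: 124, 2018: 132, 2019: 161, 2020: 205, 2021: 224, 2022: 236,
}
years = [y for y, c in sorted(year_counts.items()) for _ in range(c)]
index = list(range(1, len(years) + 1))

def average_ranks(values):
    """Rank values from 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Spearman = Pearson correlation of the rank-transformed columns.
spearman = pearson(average_ranks(index), average_ranks(years))
print(round(spearman, 3))  # 0.989 under the stated assumptions
```

Under these assumptions the result matches the 0.989 entry, which suggests (but does not prove) that the second matrix is rank-based. The sketch makes no claim about how the 0.853 value was obtained.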

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
index  구분  기업명  사업자등록번호  등록연도
0  1  ㈜비티웍스  120-86-42098  2008
1  2  ㈜라스테크  314-81-38438  2009
2  3  서울프로폴리스㈜  314-81-57605  2009
3  4  ㈜케이에너지  314-86-01557  2009
4  5  호전에이블  314-86-31949  2012
5  6  ㈜세이프텍리서치  314-86-39131  2012
6  7  ㈜뉴런  504-86-00879  2012
7  8  ㈜그린모빌리티  514-81-84847  2013
8  9  ㈜에스엠나노바이오  314-86-51514  2013
9  10  ㈜디지엠텍  514-81-91464  2014

Last rows
index  구분  기업명  사업자등록번호  등록연도
1231  1232  ㈜스마트세이프티랩  784-86-02755  2022
1232  1233  ㈜케이컨스  682-88-02623  2022
1233  1234  ㈜광명이엔지  138-81-50783  2022
1234  1235  ㈜에스에스월드  535-88-01857  2022
1235  1236  ㈜케이에스씨  217-81-50086  2022
1236  1237  ㈜골다공인공지능  571-88-02546  2022
1237  1238  유한회사 케이에듀  545-87-01328  2022
1238  1239  ㈜유엔에스바이오  793-88-02734  2022
1239  1240  ㈜에어트러스트  160-87-02131  2022
1240  1241  케이유융합소프트웨어연구센터㈜  326-87-02032  2022