Overview

Dataset statistics

Number of variables: 18
Number of observations: 10000
Missing cells: 10193
Missing cells (%): 5.7%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 1.5 MiB
Average record size in memory: 154.0 B
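
The missing-cell percentage can be cross-checked from the other figures in this panel. The sketch below assumes the percentage is simply missing cells divided by variables times observations; the variable names are illustrative and not taken from the report.

```python
# Figures copied from the Dataset statistics panel.
n_variables = 18
n_observations = 10_000
missing_cells = 10_193

# Assumption: the percentage is computed over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_pct:.1f}%")
```

Under that assumption the result rounds to the 5.7% shown above.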

Variable types

Numeric: 2
Text: 8
Categorical: 5
DateTime: 3

Dataset

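Description: Provides patent application and registration information recorded for each project (intellectual property type, application/registration status, application number, registration number, applying and registering organizations, etc.). Of the intellectual property rights, only patents subject to public disclosure are included.
Author: Korea Environmental Industry & Technology Institute (한국환경산업기술원)
URL: https://www.data.go.kr/data/15087586/fileData.do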

Alerts

Dataset has 1 (< 0.1%) duplicate row (Duplicates)
국내외구분 is highly overall correlated with 지식재산구분 and 1 other field (High correlation)
출원등록구분 is highly overall correlated with 지식재산구분 (High correlation)
출원등록국가 is highly overall correlated with 지식재산구분 and 1 other field (High correlation)
지식재산구분 is highly overall correlated with 순번 and 5 other fields (High correlation)
연구기관유형 is highly overall correlated with 지식재산구분 (High correlation)
순번 is highly overall correlated with 지식재산구분 (High correlation)
성과년도 is highly overall correlated with 지식재산구분 (High correlation)
지식재산구분 is highly imbalanced (97.8%) (Imbalance)
출원등록국가 is highly imbalanced (86.4%) (Imbalance)
국내외구분 is highly imbalanced (72.9%) (Imbalance)
출원등록년월 has 1918 (19.2%) missing values (Missing)
출원번호 has 1910 (19.1%) missing values (Missing)
등록번호 has 6207 (62.1%) missing values (Missing)
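
One way to read the imbalance percentages is as one minus the normalised Shannon entropy of the class counts. This is an assumption about how the score is derived, not a statement of the profiler's implementation; the function name `imbalance_score` is hypothetical. With the counts 9979 and 21 that the report lists for 지식재산구분, the sketch gives the same 97.8%.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits
    of the class distribution and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no distribution to balance
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts for 지식재산구분 as listed in its Common Values table.
score = imbalance_score([9979, 21])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by one value approaches 1.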

Reproduction

Analysis started: 2023-12-12 21:54:25.725698
Analysis finished: 2023-12-12 21:54:30.284044
Duration: 4.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
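
The reported duration follows directly from the two timestamps. A minimal check using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction panel.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2023-12-12 21:54:25.725698", fmt)
finished = datetime.strptime("2023-12-12 21:54:30.284044", fmt)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```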

Variables

순번
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9988
Distinct (%): 100.0%
Missing: 12
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 7909.0343
Minimum: 1
Maximum: 15807
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 784.35
Q1: 3988.75
Median: 7907
Q3: 11847.25
95-th percentile: 15003.65
Maximum: 15807
Range: 15806
Interquartile range (IQR): 7858.5

Descriptive statistics

Standard deviation: 4558.0327
Coefficient of variation (CV): 0.5763071
Kurtosis: -1.1937647
Mean: 7909.0343
Median Absolute Deviation (MAD): 3928.5
Skewness: -0.0034086735
Sum: 78995435
Variance: 20775662
Monotonicity: Not monotonic
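
Several of these statistics are derived from one another, so they can be checked against each other. The sketch below recomputes them from the rounded figures in this section, so small differences in the last digit are expected; the variable names are illustrative.

```python
# Figures copied from the 순번 panels above.
mean, std = 7909.0343, 4558.0327
q1, q3 = 3988.75, 11847.25
minimum, maximum = 1, 15807
n_non_missing = 10_000 - 12  # observations minus missing values

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation
total = mean * n_non_missing     # sum recovered from mean and count

print(f"CV={cv:.7f} IQR={iqr} range={value_range}")
print(f"variance={variance:.0f} sum={total:.0f}")
```

Each recomputed value agrees with the reported one to within rounding.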
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
417 | 1 | < 0.1%
14379 | 1 | < 0.1%
9229 | 1 | < 0.1%
145 | 1 | < 0.1%
11310 | 1 | < 0.1%
10597 | 1 | < 0.1%
9674 | 1 | < 0.1%
7831 | 1 | < 0.1%
12438 | 1 | < 0.1%
14708 | 1 | < 0.1%
Other values (9978) | 9978 | 99.8%
(Missing) | 12 | 0.1%
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%
12 | 1 | < 0.1%
Value | Count | Frequency (%)
15807 | 1 | < 0.1%
15806 | 1 | < 0.1%
15805 | 1 | < 0.1%
15803 | 1 | < 0.1%
15802 | 1 | < 0.1%
15801 | 1 | < 0.1%
15800 | 1 | < 0.1%
15799 | 1 | < 0.1%
15798 | 1 | < 0.1%
15797 | 1 | < 0.1%
Distinct: 60
Distinct (%): 0.6%
Missing: 5
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 39
Median length: 28
Mean length: 13.967684
Min length: 2

Characters and Unicode

Total characters: 139607
Distinct characters: 180
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 글로벌탑환경기술개발사업
2nd row: 환경산업선진화기술개발사업
3rd row: 토양·지하수 오염방지 기술개발사업
4th row: 글로벌탑환경기술개발사업
5th row: 야생생물 유래 친환경 신소재 및 공정 기술개발사업
Value | Count | Frequency (%)
기술개발사업 | 3458 | 16.7%
차세대 | 2083 | 10.1%
핵심환경 | 2083 | 10.1%
글로벌탑환경기술개발사업 | 1841 | 8.9%
환경산업선진화기술개발사업 | 1042 | 5.0%
환경정책기반공공기술개발사업 | 804 | 3.9%
사업 | 696 | 3.4%
eco-star | 683 | 3.3%
연구사업 | 665 | 3.2%
오염방지 | 581 | 2.8%
Other values (142) | 6742 | 32.6%

Most occurring characters

ValueCountFrequency (%)
11040
 
7.9%
10697
 
7.7%
9929
 
7.1%
9740
 
7.0%
8464
 
6.1%
8257
 
5.9%
8239
 
5.9%
7198
 
5.2%
7114
 
5.1%
2437
 
1.7%
Other values (170) 56492
40.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 122732
87.9%
Space Separator 10697
 
7.7%
Lowercase Letter 3415
 
2.4%
Uppercase Letter 1445
 
1.0%
Dash Punctuation 683
 
0.5%
Other Punctuation 593
 
0.4%
Decimal Number 32
 
< 0.1%
Open Punctuation 5
 
< 0.1%
Close Punctuation 5
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
11040
 
9.0%
9929
 
8.1%
9740
 
7.9%
8464
 
6.9%
8257
 
6.7%
8239
 
6.7%
7198
 
5.9%
7114
 
5.8%
2437
 
2.0%
2161
 
1.8%
Other values (153) 48153
39.2%
Uppercase Letter
ValueCountFrequency (%)
E 683
47.3%
S 683
47.3%
C 37
 
2.6%
O 32
 
2.2%
T 5
 
0.3%
I 5
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
r 683
20.0%
a 683
20.0%
t 683
20.0%
o 683
20.0%
c 683
20.0%
Space Separator
ValueCountFrequency (%)
10697
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 683
100.0%
Other Punctuation
ValueCountFrequency (%)
· 593
100.0%
Decimal Number
ValueCountFrequency (%)
2 32
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 122732
87.9%
Common 12015
 
8.6%
Latin 4860
 
3.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
11040
 
9.0%
9929
 
8.1%
9740
 
7.9%
8464
 
6.9%
8257
 
6.7%
8239
 
6.7%
7198
 
5.9%
7114
 
5.8%
2437
 
2.0%
2161
 
1.8%
Other values (153) 48153
39.2%
Latin
ValueCountFrequency (%)
r 683
14.1%
a 683
14.1%
t 683
14.1%
E 683
14.1%
S 683
14.1%
o 683
14.1%
c 683
14.1%
C 37
 
0.8%
O 32
 
0.7%
T 5
 
0.1%
Common
ValueCountFrequency (%)
10697
89.0%
- 683
 
5.7%
· 593
 
4.9%
2 32
 
0.3%
( 5
 
< 0.1%
) 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 122732
87.9%
ASCII 16282
 
11.7%
None 593
 
0.4%

Most frequent character per block

Hangul
ValueCountFrequency (%)
11040
 
9.0%
9929
 
8.1%
9740
 
7.9%
8464
 
6.9%
8257
 
6.7%
8239
 
6.7%
7198
 
5.9%
7114
 
5.8%
2437
 
2.0%
2161
 
1.8%
Other values (153) 48153
39.2%
ASCII
ValueCountFrequency (%)
10697
65.7%
r 683
 
4.2%
a 683
 
4.2%
t 683
 
4.2%
E 683
 
4.2%
S 683
 
4.2%
- 683
 
4.2%
o 683
 
4.2%
c 683
 
4.2%
C 37
 
0.2%
Other values (6) 84
 
0.5%
None
ValueCountFrequency (%)
· 593
100.0%
Distinct: 2464
Distinct (%): 24.7%
Missing: 5
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length144
Median length74
Mean length35.391596
Min length2

Characters and Unicode

Total characters353739
Distinct characters698
Distinct categories16 ?
Distinct scripts4 ?
Distinct blocks9 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique617 ?
Unique (%)6.2%

Sample

1st row대기 배출원 복합유해물질 측정분석장치 개발
2nd row지하환경 환경진단 모니터링 및 평가 시스템 개발
3rd row폐굴껍질과 가축뼈 등 천연폐자원을 이용한 비소 및 중금속으로 오염된 토양의 안정화
4th row대기 배출원 복합유해물질 측정분석장치 개발
5th row목재 부산물과 이온성 액체/열 분해 복합 처리를 이용한 풀빅산 유사체 대량 생산 기술 개발 및 화장품 응용 소재 발굴
ValueCountFrequency (%)
개발 7209
 
8.6%
4668
 
5.6%
기술 1908
 
2.3%
위한 1805
 
2.2%
이용한 1585
 
1.9%
시스템 1394
 
1.7%
기반 775
 
0.9%
기술개발 714
 
0.9%
고효율 483
 
0.6%
실증화 422
 
0.5%
Other values (7225) 62770
75.0%

Most occurring characters

ValueCountFrequency (%)
74228
 
21.0%
9808
 
2.8%
9500
 
2.7%
9148
 
2.6%
5555
 
1.6%
5475
 
1.5%
4981
 
1.4%
4803
 
1.4%
4682
 
1.3%
4573
 
1.3%
Other values (688) 220986
62.5%

Most occurring categories

ValueCountFrequency (%)
Other Letter 252365
71.3%
Space Separator 74228
 
21.0%
Lowercase Letter 10335
 
2.9%
Uppercase Letter 9773
 
2.8%
Other Punctuation 2697
 
0.8%
Decimal Number 1392
 
0.4%
Open Punctuation 1051
 
0.3%
Close Punctuation 1048
 
0.3%
Dash Punctuation 782
 
0.2%
Other Symbol 28
 
< 0.1%
Other values (6) 40
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9808
 
3.9%
9500
 
3.8%
9148
 
3.6%
5555
 
2.2%
5475
 
2.2%
4981
 
2.0%
4803
 
1.9%
4682
 
1.9%
4573
 
1.8%
4011
 
1.6%
Other values (598) 189829
75.2%
Lowercase Letter
ValueCountFrequency (%)
i 1107
10.7%
e 1013
 
9.8%
o 927
 
9.0%
t 787
 
7.6%
r 782
 
7.6%
a 774
 
7.5%
s 712
 
6.9%
n 610
 
5.9%
l 570
 
5.5%
m 414
 
4.0%
Other values (17) 2639
25.5%
Uppercase Letter
ValueCountFrequency (%)
C 1114
 
11.4%
O 1025
 
10.5%
P 893
 
9.1%
S 705
 
7.2%
M 565
 
5.8%
N 513
 
5.2%
F 503
 
5.1%
D 458
 
4.7%
R 458
 
4.7%
V 450
 
4.6%
Other values (15) 3089
31.6%
Other Punctuation
ValueCountFrequency (%)
/ 1211
44.9%
, 777
28.8%
· 387
 
14.3%
. 132
 
4.9%
: 98
 
3.6%
; 36
 
1.3%
& 23
 
0.9%
% 21
 
0.8%
" 6
 
0.2%
# 4
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 372
26.7%
0 326
23.4%
3 216
15.5%
1 191
13.7%
5 97
 
7.0%
6 83
 
6.0%
4 73
 
5.2%
8 17
 
1.2%
9 12
 
0.9%
7 5
 
0.4%
Other Symbol
ValueCountFrequency (%)
21
75.0%
3
 
10.7%
3
 
10.7%
1
 
3.6%
Open Punctuation
ValueCountFrequency (%)
( 1046
99.5%
5
 
0.5%
Close Punctuation
ValueCountFrequency (%)
) 1043
99.5%
5
 
0.5%
Other Number
ValueCountFrequency (%)
8
80.0%
2
 
20.0%
Space Separator
ValueCountFrequency (%)
74228
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 782
100.0%
Math Symbol
ValueCountFrequency (%)
+ 18
100.0%
Final Punctuation
ValueCountFrequency (%)
4
100.0%
Initial Punctuation
ValueCountFrequency (%)
4
100.0%
Modifier Symbol
ValueCountFrequency (%)
˙ 3
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 252361
71.3%
Common 81269
 
23.0%
Latin 20105
 
5.7%
Han 4
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9808
 
3.9%
9500
 
3.8%
9148
 
3.6%
5555
 
2.2%
5475
 
2.2%
4981
 
2.0%
4803
 
1.9%
4682
 
1.9%
4573
 
1.8%
4011
 
1.6%
Other values (594) 189825
75.2%
Latin
ValueCountFrequency (%)
C 1114
 
5.5%
i 1107
 
5.5%
O 1025
 
5.1%
e 1013
 
5.0%
o 927
 
4.6%
P 893
 
4.4%
t 787
 
3.9%
r 782
 
3.9%
a 774
 
3.8%
s 712
 
3.5%
Other values (41) 10971
54.6%
Common
ValueCountFrequency (%)
74228
91.3%
/ 1211
 
1.5%
( 1046
 
1.3%
) 1043
 
1.3%
- 782
 
1.0%
, 777
 
1.0%
· 387
 
0.5%
2 372
 
0.5%
0 326
 
0.4%
3 216
 
0.3%
Other values (29) 881
 
1.1%
Han
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 252305
71.3%
ASCII 100925
28.5%
None 407
 
0.1%
Compat Jamo 56
 
< 0.1%
CJK Compat 27
 
< 0.1%
Punctuation 8
 
< 0.1%
Letterlike Symbols 4
 
< 0.1%
CJK 4
 
< 0.1%
Modifier Letters 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
74228
73.5%
/ 1211
 
1.2%
C 1114
 
1.1%
i 1107
 
1.1%
( 1046
 
1.0%
) 1043
 
1.0%
O 1025
 
1.0%
e 1013
 
1.0%
o 927
 
0.9%
P 893
 
0.9%
Other values (67) 17318
 
17.2%
Hangul
ValueCountFrequency (%)
9808
 
3.9%
9500
 
3.8%
9148
 
3.6%
5555
 
2.2%
5475
 
2.2%
4981
 
2.0%
4803
 
1.9%
4682
 
1.9%
4573
 
1.8%
4011
 
1.6%
Other values (593) 189769
75.2%
None
ValueCountFrequency (%)
· 387
95.1%
8
 
2.0%
5
 
1.2%
5
 
1.2%
2
 
0.5%
Compat Jamo
ValueCountFrequency (%)
56
100.0%
CJK Compat
ValueCountFrequency (%)
21
77.8%
3
 
11.1%
3
 
11.1%
Punctuation
ValueCountFrequency (%)
4
50.0%
4
50.0%
Modifier Letters
ValueCountFrequency (%)
˙ 3
100.0%
Letterlike Symbols
ValueCountFrequency (%)
3
75.0%
1
 
25.0%
CJK
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Distinct: 1019
Distinct (%): 10.2%
Missing: 5
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length18
Median length16
Mean length9.1386693
Min length2

Characters and Unicode

Total characters91341
Distinct characters443
Distinct categories10 ?
Distinct scripts3 ?
Distinct blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique195 ?
Unique (%)2.0%

Sample

1st row건국대학교 산학협력단
2nd row(주)그린솔루스
3rd row(주)해천이티에스
4th row건국대학교 산학협력단
5th row경상국립대학교 산학협력단
ValueCountFrequency (%)
산학협력단 2044
 
15.3%
주식회사 700
 
5.2%
한국과학기술연구원 432
 
3.2%
서울대학교 280
 
2.1%
한국건설기술연구원 249
 
1.9%
234
 
1.7%
고려대학교 206
 
1.5%
한국지질자원연구원 187
 
1.4%
한국에너지기술연구원 187
 
1.4%
한국기계연구원 161
 
1.2%
Other values (1021) 8716
65.1%

Most occurring characters

ValueCountFrequency (%)
5668
 
6.2%
5173
 
5.7%
) 4277
 
4.7%
( 4276
 
4.7%
3403
 
3.7%
2793
 
3.1%
2713
 
3.0%
2603
 
2.8%
2402
 
2.6%
2377
 
2.6%
Other values (433) 55656
60.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 78897
86.4%
Close Punctuation 4277
 
4.7%
Open Punctuation 4276
 
4.7%
Space Separator 3403
 
3.7%
Uppercase Letter 287
 
0.3%
Decimal Number 146
 
0.2%
Dash Punctuation 21
 
< 0.1%
Lowercase Letter 20
 
< 0.1%
Other Punctuation 10
 
< 0.1%
Other Symbol 4
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5668
 
7.2%
5173
 
6.6%
2793
 
3.5%
2713
 
3.4%
2603
 
3.3%
2402
 
3.0%
2377
 
3.0%
2352
 
3.0%
2317
 
2.9%
2226
 
2.8%
Other values (390) 48273
61.2%
Uppercase Letter
ValueCountFrequency (%)
S 41
14.3%
E 38
13.2%
K 36
12.5%
M 22
7.7%
I 20
7.0%
G 20
7.0%
C 16
 
5.6%
N 16
 
5.6%
V 16
 
5.6%
A 14
 
4.9%
Other values (9) 48
16.7%
Decimal Number
ValueCountFrequency (%)
1 41
28.1%
2 38
26.0%
0 28
19.2%
3 10
 
6.8%
4 7
 
4.8%
8 5
 
3.4%
6 5
 
3.4%
7 5
 
3.4%
9 4
 
2.7%
5 3
 
2.1%
Lowercase Letter
ValueCountFrequency (%)
n 6
30.0%
a 4
20.0%
o 2
 
10.0%
c 2
 
10.0%
r 2
 
10.0%
e 2
 
10.0%
g 2
 
10.0%
Other Punctuation
ValueCountFrequency (%)
. 8
80.0%
& 2
 
20.0%
Close Punctuation
ValueCountFrequency (%)
) 4277
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4276
100.0%
Space Separator
ValueCountFrequency (%)
3403
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21
100.0%
Other Symbol
ValueCountFrequency (%)
4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 78901
86.4%
Common 12133
 
13.3%
Latin 307
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5668
 
7.2%
5173
 
6.6%
2793
 
3.5%
2713
 
3.4%
2603
 
3.3%
2402
 
3.0%
2377
 
3.0%
2352
 
3.0%
2317
 
2.9%
2226
 
2.8%
Other values (391) 48277
61.2%
Latin
ValueCountFrequency (%)
S 41
13.4%
E 38
12.4%
K 36
11.7%
M 22
 
7.2%
I 20
 
6.5%
G 20
 
6.5%
C 16
 
5.2%
N 16
 
5.2%
V 16
 
5.2%
A 14
 
4.6%
Other values (16) 68
22.1%
Common
ValueCountFrequency (%)
) 4277
35.3%
( 4276
35.2%
3403
28.0%
1 41
 
0.3%
2 38
 
0.3%
0 28
 
0.2%
- 21
 
0.2%
3 10
 
0.1%
. 8
 
0.1%
4 7
 
0.1%
Other values (6) 24
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
Hangul 78897
86.4%
ASCII 12440
 
13.6%
None 4
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
5668
 
7.2%
5173
 
6.6%
2793
 
3.5%
2713
 
3.4%
2603
 
3.3%
2402
 
3.0%
2377
 
3.0%
2352
 
3.0%
2317
 
2.9%
2226
 
2.8%
Other values (390) 48273
61.2%
ASCII
ValueCountFrequency (%)
) 4277
34.4%
( 4276
34.4%
3403
27.4%
1 41
 
0.3%
S 41
 
0.3%
E 38
 
0.3%
2 38
 
0.3%
K 36
 
0.3%
0 28
 
0.2%
M 22
 
0.2%
Other values (32) 240
 
1.9%
None
ValueCountFrequency (%)
4
100.0%

연구기관유형
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
중소기업: 2858
대학: 2369
정부출연연구기관: 1213
벤처기업: 803
대기업: 779
Other values (12): 1978

Length

Max length9
Median length8
Mean length4.3793
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row대학
2nd row벤처기업
3rd row중소기업
4th row대학
5th row대학

Common Values

Value | Count | Frequency (%)
중소기업 | 2858 | 28.6%
대학 | 2369 | 23.7%
정부출연연구기관 | 1213 | 12.1%
벤처기업 | 803 | 8.0%
대기업 | 779 | 7.8%
특정연구기관 | 684 | 6.8%
중소기업부설연구소 | 526 | 5.3%
중견기업 | 220 | 2.2%
기타 | 141 | 1.4%
기타비영리 | 131 | 1.3%
Other values (7) | 276 | 2.8%

Length

Histogram of lengths of the category
Distinct: 1699
Distinct (%): 17.0%
Missing: 6
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length13
Median length3
Mean length3.126676
Min length2

Characters and Unicode

Total characters31248
Distinct characters237
Distinct categories4 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique344 ?
Unique (%)3.4%

Sample

1st row김조천
2nd row봉춘근
3rd row김태성
4th row김조천
5th row전종록
ValueCountFrequency (%)
미존재인력의 109
 
1.1%
대표인력 109
 
1.1%
류재천 98
 
1.0%
61269 54
 
0.5%
한장희 50
 
0.5%
김조천 48
 
0.5%
김치경 48
 
0.5%
한무영 46
 
0.5%
이상협 42
 
0.4%
정준교 41
 
0.4%
Other values (1692) 9463
93.6%

Most occurring characters

ValueCountFrequency (%)
1964
 
6.3%
1854
 
5.9%
781
 
2.5%
751
 
2.4%
741
 
2.4%
701
 
2.2%
674
 
2.2%
613
 
2.0%
554
 
1.8%
530
 
1.7%
Other values (227) 22085
70.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 30298
97.0%
Uppercase Letter 566
 
1.8%
Decimal Number 270
 
0.9%
Space Separator 114
 
0.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
1964
 
6.5%
1854
 
6.1%
781
 
2.6%
751
 
2.5%
741
 
2.4%
701
 
2.3%
674
 
2.2%
613
 
2.0%
554
 
1.8%
530
 
1.7%
Other values (204) 21135
69.8%
Uppercase Letter
ValueCountFrequency (%)
N 123
21.7%
G 82
14.5%
E 46
 
8.1%
O 46
 
8.1%
A 44
 
7.8%
K 43
 
7.6%
H 42
 
7.4%
U 41
 
7.2%
W 40
 
7.1%
S 39
 
6.9%
Other values (8) 20
 
3.5%
Decimal Number
ValueCountFrequency (%)
6 108
40.0%
9 54
20.0%
2 54
20.0%
1 54
20.0%
Space Separator
ValueCountFrequency (%)
114
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 30298
97.0%
Latin 566
 
1.8%
Common 384
 
1.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
1964
 
6.5%
1854
 
6.1%
781
 
2.6%
751
 
2.5%
741
 
2.4%
701
 
2.3%
674
 
2.2%
613
 
2.0%
554
 
1.8%
530
 
1.7%
Other values (204) 21135
69.8%
Latin
ValueCountFrequency (%)
N 123
21.7%
G 82
14.5%
E 46
 
8.1%
O 46
 
8.1%
A 44
 
7.8%
K 43
 
7.6%
H 42
 
7.4%
U 41
 
7.2%
W 40
 
7.1%
S 39
 
6.9%
Other values (8) 20
 
3.5%
Common
ValueCountFrequency (%)
114
29.7%
6 108
28.1%
9 54
14.1%
2 54
14.1%
1 54
14.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 30298
97.0%
ASCII 950
 
3.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
1964
 
6.5%
1854
 
6.1%
781
 
2.6%
751
 
2.5%
741
 
2.4%
701
 
2.3%
674
 
2.2%
613
 
2.0%
554
 
1.8%
530
 
1.7%
Other values (204) 21135
69.8%
ASCII
ValueCountFrequency (%)
N 123
12.9%
114
12.0%
6 108
11.4%
G 82
 
8.6%
9 54
 
5.7%
2 54
 
5.7%
1 54
 
5.7%
E 46
 
4.8%
O 46
 
4.8%
A 44
 
4.6%
Other values (13) 225
23.7%
Distinct: 185
Distinct (%): 1.9%
Missing: 12
Missing (%): 0.1%
Memory size: 156.2 KiB
Minimum: 2001-06-01 00:00:00
Maximum: 2022-08-01 00:00:00
Histogram with fixed size bins (bins=50)
Distinct: 181
Distinct (%): 1.8%
Missing: 12
Missing (%): 0.1%
Memory size: 156.2 KiB
Minimum: 2003-05-31 00:00:00
Maximum: 2028-12-31 00:00:00
Histogram with fixed size bins (bins=50)

성과년도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 23
Distinct (%): 0.2%
Missing: 12
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 2014.2475
Minimum: 2001
Maximum: 2023
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2001
5-th percentile: 2005
Q1: 2010
Median: 2015
Q3: 2019
95-th percentile: 2022
Maximum: 2023
Range: 22
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 5.2996916
Coefficient of variation (CV): 0.0026311025
Kurtosis: -0.72187609
Mean: 2014.2475
Median Absolute Deviation (MAD): 4
Skewness: -0.40202169
Sum: 20118304
Variance: 28.086731
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Value | Count | Frequency (%)
2015 | 731 | 7.3%
2016 | 727 | 7.3%
2013 | 700 | 7.0%
2018 | 681 | 6.8%
2019 | 665 | 6.7%
2020 | 640 | 6.4%
2014 | 615 | 6.2%
2017 | 575 | 5.8%
2021 | 564 | 5.6%
2022 | 542 | 5.4%
Other values (13) | 3548 | 35.5%
Value | Count | Frequency (%)
2001 | 38 | 0.4%
2002 | 111 | 1.1%
2003 | 119 | 1.2%
2004 | 148 | 1.5%
2005 | 266 | 2.7%
2006 | 337 | 3.4%
2007 | 378 | 3.8%
2008 | 391 | 3.9%
2009 | 373 | 3.7%
2010 | 349 | 3.5%
Value | Count | Frequency (%)
2023 | 136 | 1.4%
2022 | 542 | 5.4%
2021 | 564 | 5.6%
2020 | 640 | 6.4%
2019 | 665 | 6.7%
2018 | 681 | 6.8%
2017 | 575 | 5.8%
2016 | 727 | 7.3%
2015 | 731 | 7.3%
2014 | 615 | 6.2%

출원등록년월
Date

MISSING 

Distinct: 3304
Distinct (%): 40.9%
Missing: 1918
Missing (%): 19.2%
Memory size: 156.2 KiB
Minimum: 2001-02-14 00:00:00
Maximum: 2023-09-23 00:00:00
Histogram with fixed size bins (bins=50)

명칭
Text

Distinct: 7959
Distinct (%): 79.7%
Missing: 12
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length258
Median length155
Mean length30.806768
Min length2

Characters and Unicode

Total characters307698
Distinct characters1086
Distinct categories13 ?
Distinct scripts7 ?
Distinct blocks10 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6204 ?
Unique (%)62.1%

Sample

1st row배출가스 분석장치 및 이의 배출가스 분석방법
2nd row실내쾌적도 모니터링 장치 및 방법
3rd row납 오염 토양 안정화 또는 수처리용 조성물
4th row히트란 밴드패스필터와 GFC가 결합된 가스분석기를 이용한 가스 농도 보정방법
5th row염모제 조성물
ValueCountFrequency (%)
4890
 
6.7%
이용한 2524
 
3.5%
방법 2383
 
3.3%
장치 1404
 
1.9%
이를 1346
 
1.8%
시스템 1158
 
1.6%
제조방법 1094
 
1.5%
위한 768
 
1.1%
728
 
1.0%
포함하는 494
 
0.7%
Other values (14924) 56152
77.0%

Most occurring characters

ValueCountFrequency (%)
63180
 
20.5%
7800
 
2.5%
5556
 
1.8%
5247
 
1.7%
5119
 
1.7%
4926
 
1.6%
4616
 
1.5%
4111
 
1.3%
4101
 
1.3%
3932
 
1.3%
Other values (1076) 199110
64.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 210298
68.3%
Space Separator 63182
 
20.5%
Uppercase Letter 17477
 
5.7%
Lowercase Letter 14139
 
4.6%
Other Punctuation 890
 
0.3%
Decimal Number 641
 
0.2%
Dash Punctuation 583
 
0.2%
Open Punctuation 234
 
0.1%
Close Punctuation 233
 
0.1%
Math Symbol 10
 
< 0.1%
Other values (3) 11
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
7800
 
3.7%
5556
 
2.6%
5247
 
2.5%
5119
 
2.4%
4926
 
2.3%
4616
 
2.2%
4111
 
2.0%
4101
 
2.0%
3932
 
1.9%
3850
 
1.8%
Other values (927) 161040
76.6%
Uppercase Letter
ValueCountFrequency (%)
E 1603
 
9.2%
A 1444
 
8.3%
T 1337
 
7.7%
I 1306
 
7.5%
O 1295
 
7.4%
N 1267
 
7.2%
R 1205
 
6.9%
S 991
 
5.7%
C 837
 
4.8%
M 790
 
4.5%
Other values (40) 5402
30.9%
Lowercase Letter
ValueCountFrequency (%)
e 1521
 
10.8%
o 1239
 
8.8%
a 1195
 
8.5%
t 1178
 
8.3%
i 1103
 
7.8%
r 1067
 
7.5%
n 1060
 
7.5%
s 741
 
5.2%
d 581
 
4.1%
l 549
 
3.9%
Other values (40) 3905
27.6%
Decimal Number
ValueCountFrequency (%)
2 192
30.0%
3 122
19.0%
1 102
15.9%
0 61
 
9.5%
5 48
 
7.5%
4 34
 
5.3%
6 19
 
3.0%
7 12
 
1.9%
9
 
1.4%
9 9
 
1.4%
Other values (9) 33
 
5.1%
Other Punctuation
ValueCountFrequency (%)
, 660
74.2%
/ 120
 
13.5%
· 47
 
5.3%
. 44
 
4.9%
4
 
0.4%
: 3
 
0.3%
& 3
 
0.3%
# 3
 
0.3%
; 3
 
0.3%
' 2
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 216
92.3%
{ 15
 
6.4%
2
 
0.9%
[ 1
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 215
92.3%
} 15
 
6.4%
2
 
0.9%
] 1
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
- 571
97.9%
6
 
1.0%
6
 
1.0%
Space Separator
ValueCountFrequency (%)
63180
> 99.9%
  2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 7
70.0%
3
30.0%
Letter Number
ValueCountFrequency (%)
2
50.0%
2
50.0%
Format
ValueCountFrequency (%)
­ 5
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 210052
68.3%
Common 65780
 
21.4%
Latin 31607
 
10.3%
Han 198
 
0.1%
Hiragana 26
 
< 0.1%
Katakana 22
 
< 0.1%
Greek 13
 
< 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
7800
 
3.7%
5556
 
2.6%
5247
 
2.5%
5119
 
2.4%
4926
 
2.3%
4616
 
2.2%
4111
 
2.0%
4101
 
2.0%
3932
 
1.9%
3850
 
1.8%
Other values (792) 160794
76.5%
Han
ValueCountFrequency (%)
11
 
5.6%
10
 
5.1%
9
 
4.5%
9
 
4.5%
6
 
3.0%
6
 
3.0%
5
 
2.5%
5
 
2.5%
4
 
2.0%
4
 
2.0%
Other values (93) 129
65.2%
Latin
ValueCountFrequency (%)
E 1603
 
5.1%
e 1521
 
4.8%
A 1444
 
4.6%
T 1337
 
4.2%
I 1306
 
4.1%
O 1295
 
4.1%
N 1267
 
4.0%
o 1239
 
3.9%
R 1205
 
3.8%
a 1195
 
3.8%
Other values (88) 18195
57.6%
Common
ValueCountFrequency (%)
63180
96.0%
, 660
 
1.0%
- 571
 
0.9%
( 216
 
0.3%
) 215
 
0.3%
2 192
 
0.3%
3 122
 
0.2%
/ 120
 
0.2%
1 102
 
0.2%
0 61
 
0.1%
Other values (37) 341
 
0.5%
Katakana
ValueCountFrequency (%)
2
 
9.1%
2
 
9.1%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
Other values (10) 10
45.5%
Hiragana
ValueCountFrequency (%)
5
19.2%
5
19.2%
4
15.4%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (2) 2
 
7.7%
Greek
ValueCountFrequency (%)
ε 6
46.2%
α 3
23.1%
β 3
23.1%
γ 1
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 210045
68.3%
ASCII 96930
31.5%
None 460
 
0.1%
CJK 196
 
0.1%
Hiragana 26
 
< 0.1%
Katakana 22
 
< 0.1%
Compat Jamo 7
 
< 0.1%
Punctuation 6
 
< 0.1%
Number Forms 4
 
< 0.1%
CJK Compat Ideographs 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
63180
65.2%
E 1603
 
1.7%
e 1521
 
1.6%
A 1444
 
1.5%
T 1337
 
1.4%
I 1306
 
1.3%
O 1295
 
1.3%
N 1267
 
1.3%
o 1239
 
1.3%
R 1205
 
1.2%
Other values (71) 21533
 
22.2%
Hangul
ValueCountFrequency (%)
7800
 
3.7%
5556
 
2.6%
5247
 
2.5%
5119
 
2.4%
4926
 
2.3%
4616
 
2.2%
4111
 
2.0%
4101
 
2.0%
3932
 
1.9%
3850
 
1.8%
Other values (789) 160787
76.5%
None
ValueCountFrequency (%)
· 47
 
10.2%
26
 
5.7%
26
 
5.7%
22
 
4.8%
22
 
4.8%
22
 
4.8%
16
 
3.5%
14
 
3.0%
13
 
2.8%
13
 
2.8%
Other values (55) 239
52.0%
CJK
ValueCountFrequency (%)
11
 
5.6%
10
 
5.1%
9
 
4.6%
9
 
4.6%
6
 
3.1%
6
 
3.1%
5
 
2.6%
5
 
2.6%
4
 
2.0%
4
 
2.0%
Other values (91) 127
64.8%
Punctuation
ValueCountFrequency (%)
6
100.0%
Hiragana
ValueCountFrequency (%)
5
19.2%
5
19.2%
4
15.4%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (2) 2
 
7.7%
Compat Jamo
ValueCountFrequency (%)
5
71.4%
1
 
14.3%
1
 
14.3%
Number Forms
ValueCountFrequency (%)
2
50.0%
2
50.0%
Katakana
ValueCountFrequency (%)
2
 
9.1%
2
 
9.1%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
1
 
4.5%
Other values (10) 10
45.5%
CJK Compat Ideographs
ValueCountFrequency (%)
1
50.0%
1
50.0%

지식재산구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
특허: 9979
<NA>: 21

Length

Max length4
Median length2
Mean length2.0042
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row특허
2nd row특허
3rd row특허
4th row특허
5th row특허

Common Values

Value | Count | Frequency (%)
특허 | 9979 | 99.8%
<NA> | 21 | 0.2%
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
특허 9979
99.8%
na 21
 
0.2%

출원등록구분
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
출원: 6178
등록: 3801
<NA>: 21

Length

Max length4
Median length2
Mean length2.0042
Min length2

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row출원
2nd row등록
3rd row출원
4th row출원
5th row출원

Common Values

ValueCountFrequency (%)
출원 6178
61.8%
등록 3801
38.0%
<NA> 21
 
0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
출원 6178
61.8%
등록 3801
38.0%
na 21
 
0.2%

출원번호
Text

MISSING 

Distinct: 6928
Distinct (%): 85.6%
Missing: 1910
Missing (%): 19.1%
Memory size: 156.2 KiB

Length

Max length19
Median length15
Mean length14.672682
Min length4

Characters and Unicode

Total characters118702
Distinct characters53
Distinct categories8 ?
Distinct scripts3 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique5882 ?
Unique (%)72.7%

Sample

1st row10-2017-0072473
2nd row10-2009-0127882
3rd row10-2018-0126957
4th row10-2023-0004327
5th row10-2008-0012911
ValueCountFrequency (%)
출원 24
 
0.3%
1.02014e+12 17
 
0.2%
13
 
0.2%
1.02016e+12 9
 
0.1%
1.02003e+12 9
 
0.1%
2.0121e+11 8
 
0.1%
10-2004-0057612 7
 
0.1%
2.0131e+11 7
 
0.1%
2.01711e+11 7
 
0.1%
2.0128e+11 7
 
0.1%
Other values (6940) 8047
98.7%
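
The token counts above mix hyphenated identifiers such as 10-2004-0057612 with values such as 1.02014e+12, which look like numbers that were stored or exported in scientific notation. The sketch below shows one way to flag such entries before further analysis. It is only an illustration: the `NN-YYYY-NNNNNNN` pattern is inferred from the sample rows in this section rather than from any official specification, and the names `KR_APPLICATION`, `SCI_NOTATION`, and `classify` are hypothetical.

```python
import re

# Assumed pattern, inferred from sample rows such as "10-2017-0072473".
KR_APPLICATION = re.compile(r"^\d{2}-\d{4}-\d{7}$")
# Values such as "1.02014e+12" that suggest a numeric export.
SCI_NOTATION = re.compile(r"^\d(\.\d+)?e[+-]?\d+$", re.IGNORECASE)


def classify(value):
    """Label a raw 출원번호 string as 'formatted', 'scientific', or 'other'."""
    text = value.strip()
    if KR_APPLICATION.match(text):
        return "formatted"
    if SCI_NOTATION.match(text):
        return "scientific"
    return "other"


for sample in ["10-2017-0072473", "1.02014e+12", "2.0121e+11", "출원"]:
    print(sample, "->", classify(sample))
```

Entries labelled scientific have probably lost digits and cannot be restored from this column alone, so they are better treated as needing a re-export from the source than as valid identifiers.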

Most occurring characters

ValueCountFrequency (%)
0 32992
27.8%
1 20525
17.3%
2 14561
12.3%
- 14365
12.1%
3 4880
 
4.1%
8 4835
 
4.1%
6 4717
 
4.0%
4 4713
 
4.0%
7 4683
 
3.9%
5 4635
 
3.9%
Other values (43) 7796
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 100984
85.1%
Dash Punctuation 14365
 
12.1%
Uppercase Letter 1831
 
1.5%
Other Punctuation 1032
 
0.9%
Other Letter 199
 
0.2%
Math Symbol 190
 
0.2%
Space Separator 94
 
0.1%
Lowercase Letter 7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
P 329
18.0%
K 321
17.5%
R 316
17.3%
T 315
17.2%
C 314
17.1%
E 194
10.6%
X 9
 
0.5%
O 6
 
0.3%
L 5
 
0.3%
S 4
 
0.2%
Other values (11) 18
 
1.0%
Decimal Number
ValueCountFrequency (%)
0 32992
32.7%
1 20525
20.3%
2 14561
14.4%
3 4880
 
4.8%
8 4835
 
4.8%
6 4717
 
4.7%
4 4713
 
4.7%
7 4683
 
4.6%
5 4635
 
4.6%
9 4443
 
4.4%
Other Letter
ValueCountFrequency (%)
63
31.7%
56
28.1%
30
15.1%
30
15.1%
8
 
4.0%
8
 
4.0%
2
 
1.0%
2
 
1.0%
Lowercase Letter
ValueCountFrequency (%)
a 2
28.6%
s 1
14.3%
p 1
14.3%
n 1
14.3%
b 1
14.3%
i 1
14.3%
Other Punctuation
ValueCountFrequency (%)
/ 806
78.1%
. 205
 
19.9%
, 19
 
1.8%
; 1
 
0.1%
& 1
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 14365
100.0%
Math Symbol
ValueCountFrequency (%)
+ 190
100.0%
Space Separator
ValueCountFrequency (%)
94
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 116665
98.3%
Latin 1838
 
1.5%
Hangul 199
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
P 329
17.9%
K 321
17.5%
R 316
17.2%
T 315
17.1%
C 314
17.1%
E 194
10.6%
X 9
 
0.5%
O 6
 
0.3%
L 5
 
0.3%
S 4
 
0.2%
Other values (17) 25
 
1.4%
Common
ValueCountFrequency (%)
0 32992
28.3%
1 20525
17.6%
2 14561
12.5%
- 14365
12.3%
3 4880
 
4.2%
8 4835
 
4.1%
6 4717
 
4.0%
4 4713
 
4.0%
7 4683
 
4.0%
5 4635
 
4.0%
Other values (8) 5759
 
4.9%
Hangul
ValueCountFrequency (%)
63
31.7%
56
28.1%
30
15.1%
30
15.1%
8
 
4.0%
8
 
4.0%
2
 
1.0%
2
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 118503
99.8%
Hangul 199
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 32992
27.8%
1 20525
17.3%
2 14561
12.3%
- 14365
12.1%
3 4880
 
4.1%
8 4835
 
4.1%
6 4717
 
4.0%
4 4713
 
4.0%
7 4683
 
4.0%
5 4635
 
3.9%
Other values (35) 7597
 
6.4%
Hangul
ValueCountFrequency (%)
63
31.7%
56
28.1%
30
15.1%
30
15.1%
8
 
4.0%
8
 
4.0%
2
 
1.0%
2
 
1.0%

등록번호
Text

MISSING 

Distinct: 3731
Distinct (%): 98.4%
Missing: 6207
Missing (%): 62.1%
Memory size: 156.2 KiB

Length

Max length19
Median length15
Mean length14.414448
Min length4

Characters and Unicode

Total characters54674
Distinct characters38
Distinct categories9
Distinct scripts3
Distinct blocks2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
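
As an illustration of the character properties mentioned above, the following is a minimal sketch (not the profiling tool's own code) that tallies Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are assumptions made for this example; the three sample strings are 등록번호 values that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes seen in this report.
# The mapping is written by hand for illustration and is not exhaustive.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Nd" for an ASCII digit
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


samples = ["10-1307189-0000", "ZL202210464845.0", "등록 제 0434858호"]
counts = category_counts(samples)
for name, n in counts.most_common():
    print(name, n)
```

Hangul syllables fall under "Other Letter", which is why that category appears in the tables for an otherwise numeric identifier column.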

Unique

Unique3686
Unique (%)97.2%

Sample

1st row10-1307189-0000
2nd row10-0938596-0000
3rd row10-15386090000
4th rowZL202210464845.0
5th row10-2175195-0000
ValueCountFrequency (%)
등록 62
 
1.6%
48
 
1.2%
특허 24
 
0.6%
1.02e+12 9
 
0.2%
9
 
0.2%
10-0600704 6
 
0.2%
us 5
 
0.1%
1.01e+12 5
 
0.1%
1.00e+12 4
 
0.1%
zl 3
 
0.1%
Other values (3732) 3778
95.6%

Most occurring characters

ValueCountFrequency (%)
0 19869
36.3%
1 7883
 
14.4%
- 6825
 
12.5%
2 3275
 
6.0%
5 2411
 
4.4%
9 2336
 
4.3%
4 2333
 
4.3%
6 2286
 
4.2%
8 2228
 
4.1%
3 2216
 
4.1%
Other values (28) 3012
 
5.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 47029 | 86.0%
Dash Punctuation | 6825 | 12.5%
Other Letter | 422 | 0.8%
Space Separator | 172 | 0.3%
Uppercase Letter | 137 | 0.3%
Other Punctuation | 68 | 0.1%
Math Symbol | 19 | < 0.1%
Open Punctuation | 1 | < 0.1%
Close Punctuation | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
Z 38
27.7%
L 36
26.3%
E 19
13.9%
B 7
 
5.1%
U 7
 
5.1%
S 7
 
5.1%
P 6
 
4.4%
I 5
 
3.6%
X 5
 
3.6%
T 2
 
1.5%
Other values (4) 5
 
3.6%
Decimal Number
ValueCountFrequency (%)
0 19869
42.2%
1 7883
 
16.8%
2 3275
 
7.0%
5 2411
 
5.1%
9 2336
 
5.0%
4 2333
 
5.0%
6 2286
 
4.9%
8 2228
 
4.7%
3 2216
 
4.7%
7 2192
 
4.7%
Other Letter
ValueCountFrequency (%)
126
29.9%
107
25.4%
68
16.1%
68
16.1%
27
 
6.4%
26
 
6.2%
Other Punctuation
ValueCountFrequency (%)
. 54
79.4%
, 8
 
11.8%
/ 6
 
8.8%
Dash Punctuation
ValueCountFrequency (%)
- 6825
100.0%
Space Separator
ValueCountFrequency (%)
172
100.0%
Math Symbol
ValueCountFrequency (%)
+ 19
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 54115 | 99.0%
Hangul | 422 | 0.8%
Latin | 137 | 0.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0 19869
36.7%
1 7883
 
14.6%
- 6825
 
12.6%
2 3275
 
6.1%
5 2411
 
4.5%
9 2336
 
4.3%
4 2333
 
4.3%
6 2286
 
4.2%
8 2228
 
4.1%
3 2216
 
4.1%
Other values (8) 2453
 
4.5%
Latin
ValueCountFrequency (%)
Z 38
27.7%
L 36
26.3%
E 19
13.9%
B 7
 
5.1%
U 7
 
5.1%
S 7
 
5.1%
P 6
 
4.4%
I 5
 
3.6%
X 5
 
3.6%
T 2
 
1.5%
Other values (4) 5
 
3.6%
Hangul
ValueCountFrequency (%)
126
29.9%
107
25.4%
68
16.1%
68
16.1%
27
 
6.4%
26
 
6.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 54252 | 99.2%
Hangul | 422 | 0.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 19869
36.6%
1 7883
 
14.5%
- 6825
 
12.6%
2 3275
 
6.0%
5 2411
 
4.4%
9 2336
 
4.3%
4 2333
 
4.3%
6 2286
 
4.2%
8 2228
 
4.1%
3 2216
 
4.1%
Other values (22) 2590
 
4.8%
Hangul
ValueCountFrequency (%)
126
29.9%
107
25.4%
68
16.1%
68
16.1%
27
 
6.4%
26
 
6.2%

출원등록기관
Text

Distinct1449
Distinct (%)14.6%
Missing77
Missing (%)0.8%
Memory size156.2 KiB

Length

Max length58
Median length34
Mean length9.192079
Min length2

Characters and Unicode

Total characters91213
Distinct characters485
Distinct categories10
Distinct scripts3
Distinct blocks3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique501
Unique (%)5.0%

Sample

1st row건국대학교 산학협력단
2nd row(주)그린솔루스
3rd row(주)해천이티에스
4th row건국대학교 산학협력단
5th row경상국립대학교 산학협력단
ValueCountFrequency (%)
산학협력단 1984
 
14.7%
주식회사 749
 
5.5%
한국과학기술연구원 430
 
3.2%
270
 
2.0%
한국건설기술연구원 236
 
1.7%
서울대학교 217
 
1.6%
한국화학연구원 189
 
1.4%
고려대학교 183
 
1.4%
한국지질자원연구원 178
 
1.3%
한국에너지기술연구원 178
 
1.3%
Other values (1453) 8899
65.9%

Most occurring characters

ValueCountFrequency (%)
5767
 
6.3%
4960
 
5.4%
) 3929
 
4.3%
( 3927
 
4.3%
3607
 
4.0%
2798
 
3.1%
2753
 
3.0%
2682
 
2.9%
2430
 
2.7%
2415
 
2.6%
Other values (475) 55945
61.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 78916 | 86.5%
Close Punctuation | 3929 | 4.3%
Open Punctuation | 3927 | 4.3%
Space Separator | 3607 | 4.0%
Uppercase Letter | 418 | 0.5%
Lowercase Letter | 172 | 0.2%
Decimal Number | 137 | 0.2%
Other Punctuation | 80 | 0.1%
Other Symbol | 26 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
5767
 
7.3%
4960
 
6.3%
2798
 
3.5%
2753
 
3.5%
2682
 
3.4%
2430
 
3.1%
2415
 
3.1%
2380
 
3.0%
2330
 
3.0%
2295
 
2.9%
Other values (416) 48106
61.0%
Uppercase Letter
ValueCountFrequency (%)
S 56
13.4%
E 55
13.2%
K 37
8.9%
G 36
 
8.6%
L 29
 
6.9%
N 28
 
6.7%
M 25
 
6.0%
I 24
 
5.7%
C 23
 
5.5%
T 13
 
3.1%
Other values (13) 92
22.0%
Lowercase Letter
ValueCountFrequency (%)
e 24
14.0%
n 22
12.8%
o 17
9.9%
a 15
8.7%
g 13
 
7.6%
t 12
 
7.0%
i 11
 
6.4%
r 9
 
5.2%
d 8
 
4.7%
s 7
 
4.1%
Other values (11) 34
19.8%
Decimal Number
ValueCountFrequency (%)
1 72
52.6%
2 48
35.0%
4 9
 
6.6%
3 5
 
3.6%
5 3
 
2.2%
Other Punctuation
ValueCountFrequency (%)
, 47
58.8%
. 16
 
20.0%
/ 8
 
10.0%
& 6
 
7.5%
: 3
 
3.8%
Close Punctuation
ValueCountFrequency (%)
) 3929
100.0%
Open Punctuation
ValueCountFrequency (%)
( 3927
100.0%
Space Separator
ValueCountFrequency (%)
3607
100.0%
Other Symbol
ValueCountFrequency (%)
26
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 78942 | 86.5%
Common | 11681 | 12.8%
Latin | 590 | 0.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
5767
 
7.3%
4960
 
6.3%
2798
 
3.5%
2753
 
3.5%
2682
 
3.4%
2430
 
3.1%
2415
 
3.1%
2380
 
3.0%
2330
 
3.0%
2295
 
2.9%
Other values (417) 48132
61.0%
Latin
ValueCountFrequency (%)
S 56
 
9.5%
E 55
 
9.3%
K 37
 
6.3%
G 36
 
6.1%
L 29
 
4.9%
N 28
 
4.7%
M 25
 
4.2%
e 24
 
4.1%
I 24
 
4.1%
C 23
 
3.9%
Other values (34) 253
42.9%
Common
ValueCountFrequency (%)
) 3929
33.6%
( 3927
33.6%
3607
30.9%
1 72
 
0.6%
2 48
 
0.4%
, 47
 
0.4%
. 16
 
0.1%
4 9
 
0.1%
/ 8
 
0.1%
& 6
 
0.1%
Other values (4) 12
 
0.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 78916 | 86.5%
ASCII | 12271 | 13.5%
None | 26 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
5767
 
7.3%
4960
 
6.3%
2798
 
3.5%
2753
 
3.5%
2682
 
3.4%
2430
 
3.1%
2415
 
3.1%
2380
 
3.0%
2330
 
3.0%
2295
 
2.9%
Other values (416) 48106
61.0%
ASCII
ValueCountFrequency (%)
) 3929
32.0%
( 3927
32.0%
3607
29.4%
1 72
 
0.6%
S 56
 
0.5%
E 55
 
0.4%
2 48
 
0.4%
, 47
 
0.4%
K 37
 
0.3%
G 36
 
0.3%
Other values (48) 457
 
3.7%
None
ValueCountFrequency (%)
26
100.0%

출원등록국가
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct28
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
Value | Count
대한민국 | 9164
국제(PCT) | 249
미국 | 209
중국 | 154
일본 | 88
Other values (23) | 136

Length

Max length9
Median length4
Mean length3.9783
Min length2

Unique

Unique5
Unique (%)< 0.1%

Sample

1st row대한민국
2nd row대한민국
3rd row대한민국
4th row대한민국
5th row대한민국

Common Values

Value | Count | Frequency (%)
대한민국 | 9164 | 91.6%
국제(PCT) | 249 | 2.5%
미국 | 209 | 2.1%
중국 | 154 | 1.5%
일본 | 88 | 0.9%
유럽연합 | 34 | 0.3%
<NA> | 21 | 0.2%
베트남 | 12 | 0.1%
인도네시아 | 8 | 0.1%
대만 | 8 | 0.1%
Other values (18) | 53 | 0.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
대한민국 | 9164 | 91.6%
국제(pct | 249 | 2.5%
미국 | 209 | 2.1%
중국 | 154 | 1.5%
일본 | 88 | 0.9%
유럽연합 | 34 | 0.3%
na | 21 | 0.2%
베트남 | 12 | 0.1%
인도네시아 | 8 | 0.1%
대만 | 8 | 0.1%
Other values (19) | 55 | 0.5%

국내외구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size156.2 KiB
Value | Count
국내 | 9164
국외 | 815
<NA> | 21

Length

Max length4
Median length2
Mean length2.0042
Min length2

Unique

Unique0
Unique (%)0.0%

Sample

1st row국내
2nd row국내
3rd row국내
4th row국내
5th row국내

Common Values

Value | Count | Frequency (%)
국내 | 9164 | 91.6%
국외 | 815 | 8.2%
<NA> | 21 | 0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
국내 | 9164 | 91.6%
국외 | 815 | 8.2%
na | 21 | 0.2%

Interactions

[Figure: four interaction plots; images not recoverable from the extracted text]

Correlations

Variable | 순번 | 사업명 | 연구기관유형 | 성과년도 | 출원등록구분 | 출원등록국가 | 국내외구분
순번 | 1.000 | 0.971 | 0.362 | 0.795 | 0.119 | 0.140 | 0.143
사업명 | 0.971 | 1.000 | 0.693 | 0.800 | 0.142 | 0.000 | 0.166
연구기관유형 | 0.362 | 0.693 | 1.000 | 0.252 | 0.039 | 0.247 | 0.086
성과년도 | 0.795 | 0.800 | 0.252 | 1.000 | 0.133 | 0.101 | 0.083
출원등록구분 | 0.119 | 0.142 | 0.039 | 0.133 | 1.000 | 0.169 | 0.205
출원등록국가 | 0.140 | 0.000 | 0.247 | 0.101 | 0.169 | 1.000 | 1.000
국내외구분 | 0.143 | 0.166 | 0.086 | 0.083 | 0.205 | 1.000 | 1.000

Variable | 국내외구분 | 출원등록구분 | 출원등록국가 | 지식재산구분 | 연구기관유형
국내외구분 | 1.000 | 0.131 | 0.999 | 1.000 | 0.067
출원등록구분 | 0.131 | 1.000 | 0.145 | 1.000 | 0.030
출원등록국가 | 0.999 | 0.145 | 1.000 | 1.000 | 0.075
지식재산구분 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
연구기관유형 | 0.067 | 0.030 | 0.075 | 1.000 | 1.000

Variable | 순번 | 성과년도 | 연구기관유형 | 지식재산구분 | 출원등록구분 | 출원등록국가 | 국내외구분
순번 | 1.000 | -0.032 | 0.150 | 1.000 | 0.091 | 0.051 | 0.110
성과년도 | -0.032 | 1.000 | 0.102 | 1.000 | 0.105 | 0.036 | 0.064
연구기관유형 | 0.150 | 0.102 | 1.000 | 1.000 | 0.030 | 0.075 | 0.067
지식재산구분 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
출원등록구분 | 0.091 | 0.105 | 0.030 | 1.000 | 1.000 | 0.145 | 0.131
출원등록국가 | 0.051 | 0.036 | 0.075 | 1.000 | 0.145 | 1.000 | 0.999
국내외구분 | 0.110 | 0.064 | 0.067 | 1.000 | 0.131 | 0.999 | 1.000
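
The report does not state which association measure produced these matrices. For pairs of categorical columns, Cramér's V is one common choice, and the sketch below shows how a value near 1 arises when one column determines the other, as 출원등록국가 does for 국내외구분. This is an uncorrected Cramér's V written for illustration only; the function name `cramers_v` and the toy label lists are assumptions, not data taken from the report.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long lists of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals))
    if k < 2:
        return float("nan")        # undefined when a column is constant
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint[(x, y)]
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Toy data: every country maps to exactly one domestic/foreign label.
country = ["대한민국"] * 4 + ["미국", "미국", "중국", "일본"]
region = ["국내"] * 4 + ["국외"] * 4
v_dependent = cramers_v(country, region)

# Toy data: the two columns carry no information about each other.
v_independent = cramers_v(["a", "a", "b", "b"], ["p", "q", "p", "q"])
print(v_dependent, v_independent)
```

A profiling tool may apply a bias correction or a different statistic altogether, so values computed this way need not match the matrices exactly.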

Missing values

Figure: A simple visualization of nullity by column.
Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
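
Nullity correlation can be read as the Pearson correlation between per-column missing-value indicators. The sketch below illustrates that reading under this assumption and is not the report's implementation; the function name `nullity_correlation` is hypothetical, and the ten (출원번호, 등록번호) pairs are the values shown in the first sample table of this report, with `None` standing in for `<NA>`.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined if a column is never or always missing
    return cov / sqrt(var_a * var_b)


# (출원번호, 등록번호) pairs from the sample rows; None marks a missing value.
pairs = [
    ("10-2017-0072473", None),
    (None, "10-1307189-0000"),
    ("10-2009-0127882", None),
    ("10-2018-0126957", None),
    ("10-2023-0004327", None),
    ("10-2008-0012911", None),
    ("10-2012-0054537", None),
    (None, "10-0938596-0000"),
    ("10-2013-0042625", None),
    ("10-2015-0046142", "10-15386090000"),
]
application_no = [p[0] for p in pairs]
registration_no = [p[1] for p in pairs]
r = nullity_correlation(application_no, registration_no)
print(round(r, 3))
```

A strongly negative value means that when one column is filled in, the other tends to be missing, which is the pattern the sample rows show for application and registration numbers.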

Sample

(index) | 순번 | 사업명 | 연구과제명 | 연구기관 | 연구기관유형 | 연구책임자 | 과제시작일 | 과제종료일 | 성과년도 | 출원등록년월 | 명칭 | 지식재산구분 | 출원등록구분 | 출원번호 | 등록번호 | 출원등록기관 | 출원등록국가 | 국내외구분
2507 | 2508 | 글로벌탑환경기술개발사업 | 대기 배출원 복합유해물질 측정분석장치 개발 | 건국대학교 산학협력단 | 대학 | 김조천 | 2014-12-31 | 2020-12-31 | 2017 | 2017-06-09 | 배출가스 분석장치 및 이의 배출가스 분석방법 | 특허 | 출원 | 10-2017-0072473 | <NA> | 건국대학교 산학협력단 | 대한민국 | 국내
13457 | 13445 | 환경산업선진화기술개발사업 | 지하환경 환경진단 모니터링 및 평가 시스템 개발 | (주)그린솔루스 | 벤처기업 | 봉춘근 | 2011-05-01 | 2014-03-31 | 2013 | <NA> | 실내쾌적도 모니터링 장치 및 방법 | 특허 | 등록 | <NA> | 10-1307189-0000 | (주)그린솔루스 | 대한민국 | 국내
11059 | 11054 | 토양·지하수 오염방지 기술개발사업 | 폐굴껍질과 가축뼈 등 천연폐자원을 이용한 비소 및 중금속으로 오염된 토양의 안정화 | (주)해천이티에스 | 중소기업 | 김태성 | 2009-03-01 | 2011-02-28 | 2009 | 2009-12-21 | 납 오염 토양 안정화 또는 수처리용 조성물 | 특허 | 출원 | 10-2009-0127882 | <NA> | (주)해천이티에스 | 대한민국 | 국내
1584 | 1585 | 글로벌탑환경기술개발사업 | 대기 배출원 복합유해물질 측정분석장치 개발 | 건국대학교 산학협력단 | 대학 | 김조천 | 2014-12-31 | 2020-12-31 | 2018 | 2018-10-23 | 히트란 밴드패스필터와 GFC가 결합된 가스분석기를 이용한 가스 농도 보정방법 | 특허 | 출원 | 10-2018-0126957 | <NA> | 건국대학교 산학협력단 | 대한민국 | 국내
6374 | 6369 | 야생생물 유래 친환경 신소재 및 공정 기술개발사업 | 목재 부산물과 이온성 액체/열 분해 복합 처리를 이용한 풀빅산 유사체 대량 생산 기술 개발 및 화장품 응용 소재 발굴 | 경상국립대학교 산학협력단 | 대학 | 전종록 | 2021-04-01 | 2023-12-31 | 2023 | 2023-01-11 | 염모제 조성물 | 특허 | 출원 | 10-2023-0004327 | <NA> | 경상국립대학교 산학협력단 | 대한민국 | 국내
10159 | 10154 | 차세대 핵심환경 기술개발사업 | 나노 금속/금속산화물을 활용한 공기정화용 고효율 에어 필터 개발 | (주)엔지텍 | 중소기업 | 김종순 | 2007-04-01 | 2008-03-31 | 2008 | 2008-02-13 | 나노 금속 입자의 제조방법 및 그 응용제품 | 특허 | 출원 | 10-2008-0012911 | <NA> | (주)엔지텍 | 대한민국 | 국내
14135 | 14123 | 환경융합신기술개발사업 | 덴드리틱 구조에 기반한 유해물질 분리용 나노 담체 합성 및 재생 기술 개발 | 한국과학기술원 | 특정연구기관 | 김상율 | 2009-06-01 | 2014-05-31 | 2012 | 2012-05-23 | 역상 현탁중합과 전구체를 이용한 가교된 하이퍼브랜치 폴리아미도아민 입자의 제조 방법 | 특허 | 출원 | 10-2012-0054537 | <NA> | 한국과학기술원 | 대한민국 | 국내
782 | 783 | Eco-Star 사업 | 수생태계 생물서식처 복원기술 개발 | 강원대학교 삼척산학협력단 | 대학 | 허우명 | 2007-12-01 | 2014-05-31 | 2010 | <NA> | 수변어소블럭 | 특허 | 등록 | <NA> | 10-0938596-0000 | 강원대학교 삼척산학협력단 | 대한민국 | 국내
3821 | 3822 | 글로벌탑환경기술개발사업 | 고무 폐자원의 고부가가치 재활용 기술 | 경상국립대학교 산학협력단 | 대학 | 김진국 | 2011-08-01 | 2014-04-30 | 2013 | 2013-04-18 | 경량화 방근시트의 제조방법 및 이로부터 제조된 경량화 방근시트 | 특허 | 출원 | 10-2013-0042625 | <NA> | 경상국립대학교 산학협력단 | 대한민국 | 국내
5017 | 5013 | 물관리 연구사업 | 수변 충적층 지하수열 활용 저장 시스템 지상설비 최적화 기술 개발 | KAIA-임시기관 | 기타 | 61269 | 2011-10-31 | 2016-06-30 | 2015 | 2015-04-01 | 바이패스 밸브를 이용하는 히트펌프 시스템 및 그 작동방법 | 특허 | 등록 | 10-2015-0046142 | 10-15386090000 | <NA> | 대한민국 | 국내

(index) | 순번 | 사업명 | 연구과제명 | 연구기관 | 연구기관유형 | 연구책임자 | 과제시작일 | 과제종료일 | 성과년도 | 출원등록년월 | 명칭 | 지식재산구분 | 출원등록구분 | 출원번호 | 등록번호 | 출원등록기관 | 출원등록국가 | 국내외구분
7188 | 7183 | 차세대 핵심환경 기술개발사업 | 온실가스 N2O 분해용 촉매 시스템 및 적용기술 개발 | 상명대학교 | 대학 | 장길상 | 2004-06-01 | 2006-05-31 | 2005 | 2005-06-01 | 일산화탄소를 이용한 질소산화물의 분해방법 | 특허 | 출원 | 10-2005-0046695 | <NA> | 장길상,우제완,박용성 | 대한민국 | 국내
5350 | 5345 | 미세먼지 사각지대 해소 및 저감 실증화 기술개발사업 | 다단 희석 샘플링 기술을 이용한 고정오염원 배출시설의 미세먼지 연속측정 기술 실증화 | 한국기계연구원 | 정부출연연구기관 | 한방우 | 2020-05-12 | 2021-12-31 | 2021 | 2021-07-20 | 배기가스 희석장치 | 특허 | 출원 | 10-2021-0094806 | <NA> | 한국기계연구원 | 대한민국 | 국내
15623 | 15611 | 환경정책기반공공기술개발사업 | 도심 하수도 악취 저감을 위한 최적 시스템 개발 | 한국건설기술연구원 | 정부출연연구기관 | 유성수 | 2017-04-11 | 2021-06-30 | 2019 | 2019-11-13 | 하수관용 인버트 구조체 및 그 시공방법 | 특허 | 출원 | 10-2019-0145127 | <NA> | 한국건설기술연구원 | 대한민국 | 국내
366 | 367 | Eco-Star 사업 | 자연하안 창출공법 및 인공하안 대체공법 개발 | (주)한화건설 | 대기업 | 허형우 | 2007-12-01 | 2014-05-31 | 2012 | 2012-03-19 | 생태안착형 입체 호안구조 및 이의 시공방법 | 특허 | 출원 | 10-2012-0027796 | <NA> | (주)한화건설 | 대한민국 | 국내
13796 | 13784 | 환경서비스기술개발사업 | 폐형광등 수거 장치 및 수거 모니터링 시스템 구축 | (주)에코아이이앤씨 | 벤처기업 | 정재수 | 2013-05-01 | 2015-03-31 | 2014 | 2014-03-31 | 폐형광등 수거 시스템 및 폐형광등 수거방법 | 특허 | 출원 | 10-2014-0037930 | <NA> | (주)에코아이이앤씨 | 대한민국 | 국내
9583 | 9578 | 차세대 핵심환경 기술개발사업 | 이온성 액체-나노 융합소재를 이용한 촉매의 회수 및 재사용 | 이화여자대학교 산학협력단 | 대학 | 이상기 | 2006-04-01 | 2008-03-31 | 2007 | 2007-09-18 | 금속나노입자가 고정화된 이온성 액체-탄소나노튜브지지체 복합체 및 이의 제조방법 | 특허 | 출원 | 10-2007-0094611 | <NA> | 이화여자대학교 산학협력단 | 대한민국 | 국내
6788 | 6783 | 지중환경오염위해관리기술개발사업 | 해안매립지역 자유상 유류 오염 지중 정화 기술 개발 | 대일이앤씨 | 중소기업 | 이철효 | 2018-06-01 | 2020-12-31 | 2021 | 2021-03-12 | 축열연소 산화설비를 이용한 토양정화 시스템 및 그 방법 | 특허 | 출원 | 10-2021-0033018 | <NA> | 대일이앤씨 | 대한민국 | 국내
2656 | 2657 | 글로벌탑환경기술개발사업 | 막 손상/노후막 진단 기술 및 장치 개발 | 한양대학교 산학협력단 | 대학 | 이용수 | 2016-08-10 | 2021-10-31 | 2017 | 2017-03-30 | 여과막 손상 및 수명 진단 분리 이동형 장치 | 특허 | 출원 | 10-2017-0040510 | <NA> | 한양대학교 산학협력단 | 대한민국 | 국내
9824 | 9819 | 차세대 핵심환경 기술개발사업 | 질산화 효율증진을 위한 하ㆍ폐수 고도처리 실용화 기술개발 | (주)디엠퓨어텍 | 벤처기업 | 송준상 | 2004-06-01 | 2006-05-31 | 2004 | <NA> | 슬러지의 혐기성 또는 호기성 소화액으로 배양한 질산화미생물을 이용한 하수고도처리 방법 | 특허 | 등록 | <NA> | 등록 제 0434858호 | (주)디엠퓨어텍 | 대한민국 | 국내
8359 | 8354 | 차세대 핵심환경 기술개발사업 | 지구온난화추세 및 아열대기후 환경에서의 한반도 연안환경 기상영향분석 및 정량적 예측기술개발 | 부산대학교 산학협력단 | 대학 | 하경자 | 2008-04-01 | 2011-02-28 | 2009 | <NA> | 밝기온도 표준편차 판정법에 의한 기상관측위성을 이용한 안개 탐지시스템 및 그를 사용한 안개 탐지방법 | 특허 | 등록 | <NA> | 10-0934700 | 부산대학교 산학협력단 | 대한민국 | 국내

Duplicate rows

Most frequently occurring

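(index) | 순번 | 사업명 | 연구과제명 | 연구기관 | 연구기관유형 | 연구책임자 | 과제시작일 | 과제종료일 | 성과년도 | 출원등록년월 | 명칭 | 지식재산구분 | 출원등록구분 | 출원번호 | 등록번호 | 출원등록기관 | 출원등록국가 | 국내외구분 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 5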
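
The table above reports a single duplicated row, made up entirely of `<NA>` values, that occurs five times. A minimal sketch of how such a count can be obtained is shown below; it is an illustration rather than the profiling tool's code, the helper name `duplicate_rows` is hypothetical, and the rows are shortened to three columns (순번, 지식재산구분, 출원등록구분) with `None` standing in for `<NA>`.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return each row that occurs more than once, with its occurrence count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Two distinct records plus five fully empty rows.
rows = [
    (2508, "특허", "출원"),
    (13445, "특허", "등록"),
] + [(None, None, None)] * 5

dups = duplicate_rows(rows)
for row, n in dups.items():
    print(row, n)
```

Treating missing values as equal to each other is what makes the empty rows count as duplicates of one another; a comparison that treats missing values as unequal would report none.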