Overview

Dataset statistics

Number of variables: 6
Number of observations: 48
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.4 KiB
Average record size in memory: 50.8 B

Variable types

Text: 3
Categorical: 3

Dataset

Description: List of registered patents of the National Forensic Service
Author: Ministry of the Interior and Safety, National Forensic Service
URL: https://www.data.go.kr/data/15061577/fileData.do

Alerts

RIGHT_DIV_NM is highly overall correlated with PATENT_DT and 1 other field (High correlation)
PATENT_DT is highly overall correlated with RIGHT_DIV_NM (High correlation)
INVENTOR_NM_LIST is highly overall correlated with RIGHT_DIV_NM (High correlation)
RIGHT_DIV_NM is highly imbalanced (58.7%) (Imbalance)
TITLE has unique values (Unique)
PATENT_APPLICATION_NO has unique values (Unique)
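
The imbalance alert does not state how the 58.7% figure is derived. One definition that reproduces it from the RIGHT_DIV_NM value counts listed later in this report (40, 6, 1, 1) is one minus the Shannon entropy normalized by the log of the number of classes. The sketch below assumes that definition; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log2(k), where H is the Shannon entropy of the
    class distribution and k is the number of classes (assumed formula)."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)


# Value counts of RIGHT_DIV_NM as reported under "Common Values"
score = imbalance_score([40, 6, 1, 1])
balanced = imbalance_score([12, 12, 12, 12])
print(f"{score:.1%}")
```

Under this definition a perfectly balanced column scores 0 and a column dominated by a single value approaches 1.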

Reproduction

Analysis started: 2023-12-12 15:49:46.234201
Analysis finished: 2023-12-12 15:49:47.168354
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
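
As a quick consistency check, the reported duration can be recomputed from the two timestamps above. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are written.

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 15:49:46.234201", FMT)
finished = datetime.strptime("2023-12-12 15:49:47.168354", FMT)

# Elapsed wall-clock time in seconds
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```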

Variables

TITLE
Text

UNIQUE 

Distinct: 48
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 65
Median length: 31
Mean length: 24.416667
Min length: 8

Characters and Unicode

Total characters: 1172
Distinct characters: 219
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
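
A minimal sketch of how such per-character properties can be tallied with Python's standard `unicodedata` module, applied to the first TITLE sample value. The general-category codes map to the names used in this report (`Lo` = Other Letter, `Zs` = Space Separator, `Lu` = Uppercase Letter). The block assignment by code point range and the helper name `profile_characters` are illustrative assumptions, not the profiler's own implementation.

```python
import unicodedata
from collections import Counter


def profile_characters(text):
    """Count Unicode general categories and a coarse block label per character."""
    text = unicodedata.normalize("NFC", text)
    categories = Counter(unicodedata.category(ch) for ch in text)
    blocks = Counter()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            blocks["ASCII"] += 1
        elif 0xAC00 <= cp <= 0xD7A3:  # Hangul Syllables block
            blocks["Hangul Syllables"] += 1
        else:
            blocks["Other"] += 1
    return categories, blocks


title = "PNA 프로브 및 형광융해곡선분석을 이용한 혈액형 판별방법"
categories, blocks = profile_characters(title)
print(dict(categories))
print(dict(blocks))
```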

Unique

Unique: 48
Unique (%): 100.0%

Sample

1st row: PNA 프로브 및 형광융해곡선분석을 이용한 혈액형 판별방법
2nd row: 희생자 신원확인 시스템
3rd row: 식별코드를 이용한 증거물 위치추적 시스템
4th row: 스마트폰을 사용한 디지털파일 위변조 입증시스템 및 방법
5th row: 프린터 스테가노그래피 기법을 이용한 위조방지수단이 구비된 문서
Value | Count | Frequency (%)
방법 | 22 | 7.1%
(not recovered) | 17 | 5.5%
장치 | 11 | 3.6%
교육용 | 10 | 3.2%
이용한 | 9 | 2.9%
시스템 | 9 | 2.9%
시각화장치 | 6 | 1.9%
시각화 | 6 | 1.9%
디지털 | 4 | 1.3%
제공 | 3 | 1.0%
Other values (167) | 212 | 68.6%
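
The word table above can be approximated by splitting each value on whitespace and counting tokens. The sketch below does this for the five TITLE sample rows only, so its counts are far smaller than the full-column figures. Lowercasing is an assumption, suggested by lowercase tokens such as "us" appearing in the report's other word tables.

```python
from collections import Counter

# The five sample TITLE values shown in this report
titles = [
    "PNA 프로브 및 형광융해곡선분석을 이용한 혈액형 판별방법",
    "희생자 신원확인 시스템",
    "식별코드를 이용한 증거물 위치추적 시스템",
    "스마트폰을 사용한 디지털파일 위변조 입증시스템 및 방법",
    "프린터 스테가노그래피 기법을 이용한 위조방지수단이 구비된 문서",
]

# Whitespace tokenization with lowercasing (assumed preprocessing)
tokens = [tok.lower() for title in titles for tok in title.split()]
counts = Counter(tokens)
total = len(tokens)

for word, n in counts.most_common(3):
    print(f"{word} | {n} | {n / total:.1%}")
```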

Most occurring characters

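Value | Count | Frequency (%)
(space) | 261 | 22.3%
(not recovered) | 30 | 2.6%
(not recovered) | 26 | 2.2%
(not recovered) | 25 | 2.1%
(not recovered) | 23 | 2.0%
(not recovered) | 22 | 1.9%
(not recovered) | 21 | 1.8%
(not recovered) | 19 | 1.6%
(not recovered) | 19 | 1.6%
(not recovered) | 17 | 1.5%
Other values (209) | 709 | 60.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 795 | 67.8%
Space Separator | 261 | 22.3%
Lowercase Letter | 95 | 8.1%
Uppercase Letter | 20 | 1.7%
Dash Punctuation | 1 | 0.1%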

Most frequent character per category

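Other Letter
Value | Count | Frequency (%)
(not recovered) | 30 | 3.8%
(not recovered) | 26 | 3.3%
(not recovered) | 25 | 3.1%
(not recovered) | 23 | 2.9%
(not recovered) | 22 | 2.8%
(not recovered) | 21 | 2.6%
(not recovered) | 19 | 2.4%
(not recovered) | 19 | 2.4%
(not recovered) | 17 | 2.1%
(not recovered) | 15 | 1.9%
Other values (179) | 578 | 72.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 12 | 12.6%
o | 10 | 10.5%
r | 7 | 7.4%
a | 7 | 7.4%
i | 7 | 7.4%
c | 7 | 7.4%
d | 6 | 6.3%
t | 5 | 5.3%
n | 5 | 5.3%
s | 5 | 5.3%
Other values (11) | 24 | 25.3%

Uppercase Letter
Value | Count | Frequency (%)
N | 6 | 30.0%
S | 3 | 15.0%
P | 3 | 15.0%
A | 3 | 15.0%
D | 2 | 10.0%
G | 2 | 10.0%
M | 1 | 5.0%

Space Separator
Value | Count | Frequency (%)
(space) | 261 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%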

Most occurring scripts

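Value | Count | Frequency (%)
Hangul | 795 | 67.8%
Common | 262 | 22.4%
Latin | 115 | 9.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recovered) | 30 | 3.8%
(not recovered) | 26 | 3.3%
(not recovered) | 25 | 3.1%
(not recovered) | 23 | 2.9%
(not recovered) | 22 | 2.8%
(not recovered) | 21 | 2.6%
(not recovered) | 19 | 2.4%
(not recovered) | 19 | 2.4%
(not recovered) | 17 | 2.1%
(not recovered) | 15 | 1.9%
Other values (179) | 578 | 72.7%

Latin
Value | Count | Frequency (%)
e | 12 | 10.4%
o | 10 | 8.7%
r | 7 | 6.1%
a | 7 | 6.1%
i | 7 | 6.1%
c | 7 | 6.1%
N | 6 | 5.2%
d | 6 | 5.2%
t | 5 | 4.3%
n | 5 | 4.3%
Other values (18) | 43 | 37.4%

Common
Value | Count | Frequency (%)
(space) | 261 | 99.6%
- | 1 | 0.4%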

Most occurring blocks

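Value | Count | Frequency (%)
Hangul | 795 | 67.8%
ASCII | 377 | 32.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 261 | 69.2%
e | 12 | 3.2%
o | 10 | 2.7%
r | 7 | 1.9%
a | 7 | 1.9%
i | 7 | 1.9%
c | 7 | 1.9%
N | 6 | 1.6%
d | 6 | 1.6%
t | 5 | 1.3%
Other values (20) | 49 | 13.0%

Hangul
Value | Count | Frequency (%)
(not recovered) | 30 | 3.8%
(not recovered) | 26 | 3.3%
(not recovered) | 25 | 3.1%
(not recovered) | 23 | 2.9%
(not recovered) | 22 | 2.8%
(not recovered) | 21 | 2.6%
(not recovered) | 19 | 2.4%
(not recovered) | 19 | 2.4%
(not recovered) | 17 | 2.1%
(not recovered) | 15 | 1.9%
Other values (179) | 578 | 72.7%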

PATENT_NO
Text

Distinct: 44
Distinct (%): 91.7%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 28
Median length: 14
Mean length: 10.75
Min length: 4

Characters and Unicode

Total characters: 516
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 42
Unique (%): 87.5%
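
The report distinguishes "distinct" (number of different values) from "unique" (values that occur exactly once). The sketch below illustrates the difference on a hypothetical 48-value column built so that its counts match the figures reported for this variable; the values themselves and the helper name `distinct_and_unique` are invented for illustration.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique, distinct ratio, unique ratio) for a column."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)  # number of different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique, distinct / n, unique / n


# Hypothetical column: one value repeated 4 times, one repeated twice,
# and 42 values that occur once each (48 values in total)
values = ["a"] * 4 + ["b"] * 2 + [f"v{i}" for i in range(42)]
distinct, unique, distinct_pct, unique_pct = distinct_and_unique(values)
print(distinct, unique, f"{distinct_pct:.1%}", f"{unique_pct:.1%}")
```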

Sample

1st row: 1695477
2nd row: 1707025
3rd row: 1713134
4th row: 1727582
5th row: 1727585

Value | Count | Frequency (%)
컴퓨터 | 5 | 7.1%
프로그램 | 5 | 7.1%
(not recovered) | 5 | 7.1%
장치 | 4 | 5.7%
us | 2 | 2.9%
9 | 2 | 2.9%
10-1934445 | 1 | 1.4%
제10-2019047호 | 1 | 1.4%
제10-1938712호 | 1 | 1.4%
제10-1886043호 | 1 | 1.4%
Other values (43) | 43 | 61.4%

Most occurring characters

Value | Count | Frequency (%)
1 | 98 | 19.0%
7 | 55 | 10.7%
0 | 48 | 9.3%
- | 32 | 6.2%
9 | 31 | 6.0%
3 | 30 | 5.8%
(space) | 27 | 5.2%
4 | 24 | 4.7%
8 | 23 | 4.5%
(not recovered) | 20 | 3.9%
Other values (27) | 128 | 24.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 353 | 68.4%
Other Letter | 100 | 19.4%
Dash Punctuation | 32 | 6.2%
Space Separator | 27 | 5.2%
Uppercase Letter | 4 | 0.8%

Most frequent character per category

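Other Letter
Value | Count | Frequency (%)
(not recovered) | 20 | 20.0%
(not recovered) | 20 | 20.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
Other values (13) | 20 | 20.0%

Decimal Number
Value | Count | Frequency (%)
1 | 98 | 27.8%
7 | 55 | 15.6%
0 | 48 | 13.6%
9 | 31 | 8.8%
3 | 30 | 8.5%
4 | 24 | 6.8%
8 | 23 | 6.5%
5 | 17 | 4.8%
2 | 15 | 4.2%
6 | 12 | 3.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 2 | 50.0%
U | 2 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 32 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 27 | 100.0%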

Most occurring scripts

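Value | Count | Frequency (%)
Common | 412 | 79.8%
Hangul | 100 | 19.4%
Latin | 4 | 0.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not recovered) | 20 | 20.0%
(not recovered) | 20 | 20.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
Other values (13) | 20 | 20.0%

Common
Value | Count | Frequency (%)
1 | 98 | 23.8%
7 | 55 | 13.3%
0 | 48 | 11.7%
- | 32 | 7.8%
9 | 31 | 7.5%
3 | 30 | 7.3%
(space) | 27 | 6.6%
4 | 24 | 5.8%
8 | 23 | 5.6%
5 | 17 | 4.1%
Other values (2) | 27 | 6.6%

Latin
Value | Count | Frequency (%)
S | 2 | 50.0%
U | 2 | 50.0%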

Most occurring blocks

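Value | Count | Frequency (%)
ASCII | 416 | 80.6%
Hangul | 100 | 19.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 98 | 23.6%
7 | 55 | 13.2%
0 | 48 | 11.5%
- | 32 | 7.7%
9 | 31 | 7.5%
3 | 30 | 7.2%
(space) | 27 | 6.5%
4 | 24 | 5.8%
8 | 23 | 5.5%
5 | 17 | 4.1%
Other values (4) | 31 | 7.5%

Hangul
Value | Count | Frequency (%)
(not recovered) | 20 | 20.0%
(not recovered) | 20 | 20.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
(not recovered) | 5 | 5.0%
Other values (13) | 20 | 20.0%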

PATENT_APPLICATION_NO
Text

UNIQUE

Distinct: 48
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 17
Median length: 15
Mean length: 14.229167
Min length: 3

Characters and Unicode

Total characters: 683
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 48
Unique (%): 100.0%

Sample

1st row: 2014-0063974
2nd row: 2014-0178281
3rd row: 2015-0096368
4th row: 2015-0016211
5th row: 2015-0181261

Value | Count | Frequency (%)
2014-0063974 | 1 | 2.1%
2014-0178281 | 1 | 2.1%
제10-2017-0034948호 | 1 | 2.1%
436 | 1 | 2.1%
제10-2018-0067647호 | 1 | 2.1%
제10-2018-0106038호 | 1 | 2.1%
제10-2017-0056911호 | 1 | 2.1%
제10-2018-0075222호 | 1 | 2.1%
제10-2018-0075220호 | 1 | 2.1%
제10-2018-0123768호 | 1 | 2.1%
Other values (38) | 38 | 79.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 166 | 24.3%
1 | 115 | 16.8%
- | 76 | 11.1%
2 | 59 | 8.6%
6 | 41 | 6.0%
5 | 38 | 5.6%
7 | 36 | 5.3%
8 | 34 | 5.0%
4 | 25 | 3.7%
(not recovered) | 24 | 3.5%
Other values (4) | 69 | 10.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 558 | 81.7%
Dash Punctuation | 76 | 11.1%
Other Letter | 48 | 7.0%
Other Punctuation | 1 | 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 166 | 29.7%
1 | 115 | 20.6%
2 | 59 | 10.6%
6 | 41 | 7.3%
5 | 38 | 6.8%
7 | 36 | 6.5%
8 | 34 | 6.1%
4 | 25 | 4.5%
3 | 22 | 3.9%
9 | 22 | 3.9%

Other Letter
Value | Count | Frequency (%)
(not recovered) | 24 | 50.0%
(not recovered) | 24 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 76 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 635 | 93.0%
Hangul | 48 | 7.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 166 | 26.1%
1 | 115 | 18.1%
- | 76 | 12.0%
2 | 59 | 9.3%
6 | 41 | 6.5%
5 | 38 | 6.0%
7 | 36 | 5.7%
8 | 34 | 5.4%
4 | 25 | 3.9%
3 | 22 | 3.5%
Other values (2) | 23 | 3.6%

Hangul
Value | Count | Frequency (%)
(not recovered) | 24 | 50.0%
(not recovered) | 24 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 635 | 93.0%
Hangul | 48 | 7.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 166 | 26.1%
1 | 115 | 18.1%
- | 76 | 12.0%
2 | 59 | 9.3%
6 | 41 | 6.5%
5 | 38 | 6.0%
7 | 36 | 5.7%
8 | 34 | 5.4%
4 | 25 | 3.9%
3 | 22 | 3.5%
Other values (2) | 23 | 3.6%

Hangul
Value | Count | Frequency (%)
(not recovered) | 24 | 50.0%
(not recovered) | 24 | 50.0%

RIGHT_DIV_NM
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 8
Median length: 4
Mean length: 4.5833333
Min length: 4

Unique

Unique: 2
Unique (%): 4.2%

Sample

1st row: 국내특허
2nd row: 국내특허
3rd row: 국내특허
4th row: 국내특허
5th row: 국내특허

Common Values

Value | Count | Frequency (%)
국내특허 | 40 | 83.3%
RIGHT001 | 6 | 12.5%
13/980 | 1 | 2.1%
14/234 | 1 | 2.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
국내특허 | 40 | 83.3%
right001 | 6 | 12.5%
13/980 | 1 | 2.1%
14/234 | 1 | 2.1%

PATENT_DT
Categorical

HIGH CORRELATION 

Distinct: 23
Distinct (%): 47.9%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 10
Median length: 10
Mean length: 8.9583333
Min length: 3

Unique

Unique: 13
Unique (%): 27.1%

Sample

1st row: 2017.01.10
2nd row: 2017.02.09
3rd row: 2017.02.28
4th row: 2017.04.11
5th row: 2017.04.11

Common Values

Value | Count | Frequency (%)
국내특허 | 6 | 12.5%
2017.09.19 | 5 | 10.4%
2017.04.11 | 5 | 10.4%
2017.02.09 | 4 | 8.3%
2018.12.26 | 3 | 6.2%
2017.09.11 | 3 | 6.2%
2018.02.27 | 3 | 6.2%
2017.10.31 | 2 | 4.2%
2017.07.28 | 2 | 4.2%
2019.02.22 | 2 | 4.2%
Other values (13) | 13 | 27.1%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
국내특허 | 6 | 12.5%
2017.04.11 | 5 | 10.4%
2017.09.19 | 5 | 10.4%
2017.02.09 | 4 | 8.3%
2018.12.26 | 3 | 6.2%
2017.09.11 | 3 | 6.2%
2018.02.27 | 3 | 6.2%
2017.10.31 | 2 | 4.2%
2017.07.28 | 2 | 4.2%
2019.02.22 | 2 | 4.2%
Other values (13) | 13 | 27.1%

INVENTOR_NM_LIST
Categorical

HIGH CORRELATION 

Distinct: 17
Distinct (%): 35.4%
Missing: 0
Missing (%): 0.0%
Memory size: 516.0 B

Length

Max length: 10
Median length: 3
Mean length: 3.7708333
Min length: 2

Unique

Unique: 12
Unique (%): 25.0%

Sample

1st row: 임시근
2nd row: 정낙은
3rd row: 박현철
4th row: 서중석
5th row: 이중

Common Values

Value | Count | Frequency (%)
이중 | 15 | 31.2%
박남규 | 15 | 31.2%
임시근 | 2 | 4.2%
김종혁 | 2 | 4.2%
RIGHT002 | 2 | 4.2%
이재형 | 1 | 2.1%
박현철 | 1 | 2.1%
서중석 | 1 | 2.1%
양경무 | 1 | 2.1%
2018.07.30 | 1 | 2.1%
Other values (7) | 7 | 14.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
이중 | 15 | 31.2%
박남규 | 15 | 31.2%
임시근 | 2 | 4.2%
김종혁 | 2 | 4.2%
right002 | 2 | 4.2%
정낙은 | 1 | 2.1%
2016.08.23 | 1 | 2.1%
임동아 | 1 | 2.1%
2017.06.29 | 1 | 2.1%
2018.03.29 | 1 | 2.1%
Other values (7) | 7 | 14.6%

Correlations

Variable | TITLE | PATENT_NO | PATENT_APPLICATION_NO | RIGHT_DIV_NM | PATENT_DT | INVENTOR_NM_LIST
TITLE | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
PATENT_NO | 1.000 | 1.000 | 1.000 | 0.000 | 0.982 | 0.000
PATENT_APPLICATION_NO | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
RIGHT_DIV_NM | 1.000 | 0.000 | 1.000 | 1.000 | 1.000 | 0.876
PATENT_DT | 1.000 | 0.982 | 1.000 | 1.000 | 1.000 | 0.766
INVENTOR_NM_LIST | 1.000 | 0.000 | 1.000 | 0.876 | 0.766 | 1.000

Variable | INVENTOR_NM_LIST | PATENT_DT | RIGHT_DIV_NM
INVENTOR_NM_LIST | 1.000 | 0.283 | 0.590
PATENT_DT | 0.283 | 1.000 | 0.754
RIGHT_DIV_NM | 0.590 | 0.754 | 1.000

Variable | RIGHT_DIV_NM | PATENT_DT | INVENTOR_NM_LIST
RIGHT_DIV_NM | 1.000 | 0.754 | 0.590
PATENT_DT | 0.754 | 1.000 | 0.283
INVENTOR_NM_LIST | 0.590 | 0.283 | 1.000
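
The report does not name the statistic behind these matrices. For pairs of categorical variables, one commonly used symmetric association measure is Cramér's V, sketched below in its plain (uncorrected) form. Whether the report uses this statistic, a bias-corrected variant, or something else is an assumption; `cramers_v` and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for x, row_n in row_totals.items():
        for y, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Toy data: one pair fully determined by the other, one pair independent
v_perfect = cramers_v(list("aabb"), list("xxyy"))
v_independent = cramers_v(list("aabb"), list("xyxy"))
print(v_perfect, v_independent)
```

With columns that have many distinct values relative to 48 rows, such a measure is pushed toward 1 almost by construction, which is one reason to read the 1.000 entries for the unique-valued columns with caution.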

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | TITLE | PATENT_NO | PATENT_APPLICATION_NO | RIGHT_DIV_NM | PATENT_DT | INVENTOR_NM_LIST
0 | PNA 프로브 및 형광융해곡선분석을 이용한 혈액형 판별방법 | 1695477 | 2014-0063974 | 국내특허 | 2017.01.10 | 임시근
1 | 희생자 신원확인 시스템 | 1707025 | 2014-0178281 | 국내특허 | 2017.02.09 | 정낙은
2 | 식별코드를 이용한 증거물 위치추적 시스템 | 1713134 | 2015-0096368 | 국내특허 | 2017.02.28 | 박현철
3 | 스마트폰을 사용한 디지털파일 위변조 입증시스템 및 방법 | 1727582 | 2015-0016211 | 국내특허 | 2017.04.11 | 서중석
4 | 프린터 스테가노그래피 기법을 이용한 위조방지수단이 구비된 문서 | 1727585 | 2015-0181261 | 국내특허 | 2017.04.11 | 이중
5 | 디지털 녹취 파일 녹취록 생성 방법 | 1727587 | 2016-0066947 | 국내특허 | 2017.04.11 | 이중
6 | PNA 프로브 및 융해곡선분석을 이용한 미토콘드리아 DNA의 SNP 분석방법 | 1727598 | 2016-0117459 | 국내특허 | 2017.04.11 | 임시근
7 | 디지털 증거물에 대한 원격 접수장치 | 1740299 | 2015-0126739 | 국내특허 | 2017.05.22 | 이중
8 | 사진이 구비된 보고서 작성 시스템 및 프로그램을 기록한 컴퓨터로 판독 가능한 기록매체 | 1764774 | 2014-0178280 | 국내특허 | 2017.07.28 | 양경무
9 | 음향 분석을 활용한 횡방향 그루빙이 형성된 도로를 지나는 차량의 속도 추정 방법 | 장치 및 컴퓨터 프로그램 | 제10-1885065호 | RIGHT001 | 국내특허 | 2018.07.30

Last rows

Index | TITLE | PATENT_NO | PATENT_APPLICATION_NO | RIGHT_DIV_NM | PATENT_DT | INVENTOR_NM_LIST
38 | 교육용 트래킹현상 시각화장치 | 제10-1707027호 | 제10-2016-0058184호 | 국내특허 | 2017.02.09 | 박남규
39 | 교육용 퓨즈용단 시각화장치 | 제10-1707029호 | 제10-2016-0058185호 | 국내특허 | 2017.02.09 | 박남규
40 | 교육용 화재 시각화장치 | 제10-1727591호 | 제10-2016-0077551호 | 국내특허 | 2017.04.11 | 박남규
41 | 부탄가스 폭발 시각화장치 | 제10-1779184호 | 제10-2017-0034947호 | 국내특허 | 2017.09.11 | 박남규
42 | 정전기 발생 시각화 장치 | 제10-1793919호 | 제10-2017-0056910호 | 국내특허 | 2017.10.31 | 박남규
43 | 족적 조회 방법 및 시스템 | 제10-1767380호 | 제10-2016-0054639호 | 국내특허 | 2016.05.03 | 박남규
44 | 공구흔 분석 방법 | 장치 및 컴퓨터 프로그램 | 제10-1885066호 | RIGHT001 | 국내특허 | 2019.07.19
45 | 혈흔 구별 방법 | 장치 및 컴퓨터 프로그램 | 제10-1812089호 | RIGHT001 | 국내특허 | 2017.06.29
46 | 잠재 충격흔 현출용 가열장치 | 제10-1913752호 | 제10-2017-0172506호 | 국내특허 | 2017.12.14 | 임동아
47 | 비산혈흔의 충돌 각도를 산출하는 전자 장치 | 비산혈흔 충돌 각도 산출 방법 및 컴퓨터 프로그램 | 제10-1970300호 | RIGHT001 | 국내특허 | 2019.04.12
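
In several sample rows (9, 44, 45 and 47) the value 국내특허 appears under PATENT_DT and a date appears under INVENTOR_NM_LIST, which suggests those records are shifted by one column. A plausible cause is an unquoted delimiter inside the title in the source file, but that is an inference and is not stated in the report. The sketch below flags such rows by checking whether PATENT_DT looks like a `YYYY.MM.DD` date; the date pattern and the helper name `find_shifted` are assumptions.

```python
import re

# Assumed date layout for PATENT_DT, e.g. 2017.04.11
DATE_RE = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")

# (index, PATENT_DT, INVENTOR_NM_LIST) copied from the sample rows of this report
rows = [
    (8, "2017.07.28", "양경무"),
    (9, "국내특허", "2018.07.30"),
    (43, "2016.05.03", "박남규"),
    (44, "국내특허", "2019.07.19"),
    (45, "국내특허", "2017.06.29"),
    (46, "2017.12.14", "임동아"),
    (47, "국내특허", "2019.04.12"),
]


def find_shifted(records):
    """Return indices where PATENT_DT is not a date but the next column is."""
    shifted = []
    for idx, patent_dt, inventor in records:
        if not DATE_RE.match(patent_dt) and DATE_RE.match(inventor):
            shifted.append(idx)
    return shifted


shifted_rows = find_shifted(rows)
print(shifted_rows)
```

Rows flagged this way would need to be realigned before the per-column statistics and correlations above can be interpreted.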