Overview

Dataset statistics

Number of variables: 11
Number of observations: 25
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.4 KiB
Average record size in memory: 97.3 B

Variable types

Categorical: 9
Numeric: 1
Text: 1

Dataset

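Description: Korean Intellectual Property Office (특허청) KIPRISPlus thesaurus. Provides thesaurus information, that is, a dictionary that groups words with similar pronunciation or meaning. (KIPRISPlus service)
Author: Korean Intellectual Property Office (특허청)
URL: https://www.data.go.kr/data/15044347/fileData.do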

Alerts

시소러스IPC섹션구분코드 has constant value "" [Constant]
시소러스기준단어명 has constant value "" [Constant]
시소러스기준단어언어코드 has constant value "" [Constant]
부서코드 has constant value "" [Constant]
시소러스단어일련번호 is highly overall correlated with 시소러스단어분류코드 [High correlation]
시소러스단어분류코드 is highly overall correlated with 시소러스단어일련번호 and 1 other fields [High correlation]
시소러스관련단어가중치 is highly overall correlated with 시소러스단어분류코드 [High correlation]
시소러스단어입력일자 is highly overall correlated with 시소러스단어변경일자 [High correlation]
시소러스단어변경일자 is highly overall correlated with 시소러스단어입력일자 [High correlation]
시소러스단어입력일자 is highly imbalanced (59.8%) [Imbalance]
시소러스단어변경일자 is highly imbalanced (59.8%) [Imbalance]
시소러스관련단어중요도구분 is highly imbalanced (75.8%) [Imbalance]
시소러스단어일련번호 has unique values [Unique]
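
The imbalance percentages in the alerts can be reproduced from the value counts reported further down (23 and 2 for the two date columns, 24 and 1 for 시소러스관련단어중요도구분). The sketch below assumes the score is one minus the Shannon entropy of the class distribution normalized by its maximum; that this is the exact formula the profiler uses is an assumption, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts.

    The score is 0 for a perfectly uniform split and approaches 1 as a
    single class dominates. With fewer than two classes the ratio is
    undefined, so 0.0 is returned by convention in this sketch.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Value counts taken from the report's "Common Values" tables.
date_score = imbalance_score([23, 2])        # 시소러스단어입력일자 / 시소러스단어변경일자
importance_score = imbalance_score([24, 1])  # 시소러스관련단어중요도구분

print(f"{date_score:.1%}")        # 59.8%
print(f"{importance_score:.1%}")  # 75.8%
```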

Reproduction

Analysis started: 2023-12-13 00:44:25.587466
Analysis finished: 2023-12-13 00:44:26.153689
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시소러스IPC섹션구분코드
Categorical

CONSTANT

Distinct: 1
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: G
2nd row: G
3rd row: G
4th row: G
5th row: G

Common Values

Value | Count | Frequency (%)
G | 25 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스기준단어명
Categorical

CONSTANT

Distinct: 1
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 유리제품
2nd row: 유리제품
3rd row: 유리제품
4th row: 유리제품
5th row: 유리제품

Common Values

Value | Count | Frequency (%)
유리제품 | 25 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스단어일련번호
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 25
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.8
Minimum: 1
Maximum: 34
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 357.0 B

Quantile statistics

Minimum: 1
5th percentile: 3.2
Q1: 9
Median: 18
Q3: 25
95th percentile: 32.8
Maximum: 34
Range: 33
Interquartile range (IQR): 16
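
The non-integer percentiles (3.2 and 32.8) are what linear interpolation between neighbouring sorted values would give. The following is a minimal sketch under that assumption; `interpolated_quantile` is a hypothetical helper, and the `known` ranks and values are read from the first and last sample rows of this report (the column is strictly increasing, so row order equals sorted order).

```python
import math

def interpolated_quantile(n, q, value_at):
    """Linear-interpolation quantile over n sorted values.

    value_at maps a 0-based rank to the value at that rank; only the
    ranks adjacent to the requested quantile need to be present.
    """
    pos = q * (n - 1)      # fractional rank of the quantile
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return float(value_at[lo])
    return value_at[lo] + frac * (value_at[lo + 1] - value_at[lo])

n = 25
# rank -> value, taken from sample rows 1, 2, 6, 18, 22 and 23
known = {1: 3, 2: 4, 6: 9, 18: 25, 22: 32, 23: 33}

p05 = interpolated_quantile(n, 0.05, known)
q1 = interpolated_quantile(n, 0.25, known)
q3 = interpolated_quantile(n, 0.75, known)
p95 = interpolated_quantile(n, 0.95, known)
iqr = q3 - q1

print(round(p05, 1), q1, q3, round(p95, 1), iqr)
```

Only the ranks next to each quantile are needed, which is why the five rows missing from the sample tables do not affect these particular figures.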

Descriptive statistics

Standard deviation: 10.099505
Coefficient of variation (CV): 0.56738792
Kurtosis: -1.1880774
Mean: 17.8
Median Absolute Deviation (MAD): 9
Skewness: 0.00026378611
Sum: 445
Variance: 102
Monotonicity: Strictly increasing
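
Several of these figures are tied together by simple identities (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum). The short check below uses only numbers printed in this report and is meant as a consistency sketch, not as the profiler's own computation.

```python
# Figures copied from the report for 시소러스단어일련번호.
n = 25
total = 445
std = 10.099505
minimum, maximum = 1, 34

mean = total / n                 # reported mean: 17.8
cv = std / mean                  # reported CV: 0.56738792
variance = std ** 2              # reported variance: 102
value_range = maximum - minimum  # reported range: 33

print(mean, round(cv, 6), round(variance, 3), value_range)
```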
[Figure: Histogram with fixed size bins (bins=25)]

Common values

Value | Count | Frequency (%)
1 | 1 | 4.0%
3 | 1 | 4.0%
34 | 1 | 4.0%
33 | 1 | 4.0%
32 | 1 | 4.0%
30 | 1 | 4.0%
29 | 1 | 4.0%
28 | 1 | 4.0%
25 | 1 | 4.0%
24 | 1 | 4.0%
Other values (15) | 15 | 60.0%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 4.0%
3 | 1 | 4.0%
4 | 1 | 4.0%
6 | 1 | 4.0%
7 | 1 | 4.0%
8 | 1 | 4.0%
9 | 1 | 4.0%
11 | 1 | 4.0%
12 | 1 | 4.0%
13 | 1 | 4.0%

Maximum 10 values

Value | Count | Frequency (%)
34 | 1 | 4.0%
33 | 1 | 4.0%
32 | 1 | 4.0%
30 | 1 | 4.0%
29 | 1 | 4.0%
28 | 1 | 4.0%
25 | 1 | 4.0%
24 | 1 | 4.0%
23 | 1 | 4.0%
22 | 1 | 4.0%

시소러스기준단어언어코드
Categorical

CONSTANT

Distinct: 1
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S10801
2nd row: S10801
3rd row: S10801
4th row: S10801
5th row: S10801

Common Values

Value | Count | Frequency (%)
S10801 | 25 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스단어분류코드
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S10704
2nd row: S10704
3rd row: S10704
4th row: S10704
5th row: S10704

Common Values

Value | Count | Frequency (%)
S10702 | 15 | 60.0%
S10704 | 7 | 28.0%
S10701 | 3 | 12.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스관련단어명
Text

Distinct: 22
Distinct (%): 88.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 16
Median length: 13
Mean length: 5.44
Min length: 2

Characters and Unicode

Total characters: 136
Distinct characters: 57
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
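
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module for the five sample rows of this column. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiler computes its tables. The two-letter codes map to the report's labels as follows: `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Lo` is Other Letter, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in values."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts

# The five sample rows shown for 시소러스관련단어명.
sample = ["CERAMIC", "GLASS", "Glass article", "article of glass", "glass product"]
counts = category_counts(sample)
print(counts)

# Hangul syllables fall under "Other Letter" (Lo).
hangul_category = unicodedata.category("세")
print(hangul_category)
```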

Unique

Unique: 19
Unique (%): 76.0%

Sample

1st row: CERAMIC
2nd row: GLASS
3rd row: Glass article
4th row: article of glass
5th row: glass product

Words

Value | Count | Frequency (%)
glass | 5 | 15.6%
세라믹 | 2 | 6.2%
글래스 | 2 | 6.2%
프로덕트 | 2 | 6.2%
article | 2 | 6.2%
product | 2 | 6.2%
시레믹 | 1 | 3.1%
ceramic | 1 | 3.1%
새라믹 | 1 | 3.1%
질그릇 | 1 | 3.1%
Other values (13) | 13 | 40.6%

Most occurring characters

Value | Count | Frequency (%)
s | 10 | 7.4%
a | 8 | 5.9%
(space) | 7 | 5.1%
l | 7 | 5.1%
 | 5 | 3.7%
r | 5 | 3.7%
g | 4 | 2.9%
 | 4 | 2.9%
 | 4 | 2.9%
c | 4 | 2.9%
Other values (47) | 78 | 57.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 58 | 42.6%
Other Letter | 58 | 42.6%
Uppercase Letter | 13 | 9.6%
Space Separator | 7 | 5.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5 | 8.6%
 | 4 | 6.9%
 | 4 | 6.9%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 2 | 3.4%
 | 2 | 3.4%
Other values (22) | 26 | 44.8%

Lowercase Letter
Value | Count | Frequency (%)
s | 10 | 17.2%
a | 8 | 13.8%
l | 7 | 12.1%
r | 5 | 8.6%
g | 4 | 6.9%
c | 4 | 6.9%
t | 4 | 6.9%
e | 3 | 5.2%
o | 3 | 5.2%
i | 2 | 3.4%
Other values (5) | 8 | 13.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 2 | 15.4%
C | 2 | 15.4%
G | 2 | 15.4%
S | 2 | 15.4%
I | 1 | 7.7%
E | 1 | 7.7%
R | 1 | 7.7%
M | 1 | 7.7%
L | 1 | 7.7%

Space Separator
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 71 | 52.2%
Hangul | 58 | 42.6%
Common | 7 | 5.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5 | 8.6%
 | 4 | 6.9%
 | 4 | 6.9%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 2 | 3.4%
 | 2 | 3.4%
Other values (22) | 26 | 44.8%

Latin
Value | Count | Frequency (%)
s | 10 | 14.1%
a | 8 | 11.3%
l | 7 | 9.9%
r | 5 | 7.0%
g | 4 | 5.6%
c | 4 | 5.6%
t | 4 | 5.6%
e | 3 | 4.2%
o | 3 | 4.2%
A | 2 | 2.8%
Other values (14) | 21 | 29.6%

Common
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 78 | 57.4%
Hangul | 58 | 42.6%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
s | 10 | 12.8%
a | 8 | 10.3%
(space) | 7 | 9.0%
l | 7 | 9.0%
r | 5 | 6.4%
g | 4 | 5.1%
c | 4 | 5.1%
t | 4 | 5.1%
e | 3 | 3.8%
o | 3 | 3.8%
Other values (15) | 23 | 29.5%

Hangul
Value | Count | Frequency (%)
 | 5 | 8.6%
 | 4 | 6.9%
 | 4 | 6.9%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 3 | 5.2%
 | 2 | 3.4%
 | 2 | 3.4%
Other values (22) | 26 | 44.8%

시소러스관련단어가중치
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 12.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 2
Median length: 1
Mean length: 1.04
Min length: 1

Unique

Unique: 1
Unique (%): 4.0%

Sample

1st row: 7
2nd row: 7
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value | Count | Frequency (%)
7 | 16 | 64.0%
4 | 8 | 32.0%
14 | 1 | 4.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

부서코드
Categorical

CONSTANT

Distinct: 1
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: (blank)
2nd row: (blank)
3rd row: (blank)
4th row: (blank)
5th row: (blank)

Common Values

Value | Count | Frequency (%)
(blank) | 25 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

No values found.

시소러스단어입력일자
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20090701
2nd row: 20090701
3rd row: 20090701
4th row: 20090701
5th row: 20090701

Common Values

Value | Count | Frequency (%)
20090701 | 23 | 92.0%
20090602 | 2 | 8.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스단어변경일자
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20090701
2nd row: 20090701
3rd row: 20090701
4th row: 20090701
5th row: 20090701

Common Values

Value | Count | Frequency (%)
20090701 | 23 | 92.0%
20090707 | 2 | 8.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

시소러스관련단어중요도구분
Categorical

IMBALANCE

Distinct: 2
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 332.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 1
Unique (%): 4.0%

Sample

1st row: 낮음
2nd row: 낮음
3rd row: 낮음
4th row: 낮음
5th row: 낮음

Common Values

Value | Count | Frequency (%)
낮음 | 24 | 96.0%
높음 | 1 | 4.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of common values]

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

 | 시소러스단어일련번호 | 시소러스단어분류코드 | 시소러스관련단어명 | 시소러스관련단어가중치 | 시소러스단어입력일자 | 시소러스단어변경일자 | 시소러스관련단어중요도구분
시소러스단어일련번호 | 1.000 | 0.801 | 0.866 | 0.474 | 0.504 | 0.504 | 0.559
시소러스단어분류코드 | 0.801 | 1.000 | 1.000 | 0.869 | 0.192 | 0.192 | 0.104
시소러스관련단어명 | 0.866 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 0.000
시소러스관련단어가중치 | 0.474 | 0.869 | 1.000 | 1.000 | 0.206 | 0.206 | 0.058
시소러스단어입력일자 | 0.504 | 0.192 | 0.000 | 0.206 | 1.000 | 0.901 | 0.382
시소러스단어변경일자 | 0.504 | 0.192 | 0.000 | 0.206 | 0.901 | 1.000 | 0.382
시소러스관련단어중요도구분 | 0.559 | 0.104 | 0.000 | 0.058 | 0.382 | 0.382 | 1.000

[Figure: correlation heatmap]

 | 시소러스단어변경일자 | 시소러스관련단어중요도구분 | 시소러스단어분류코드 | 시소러스관련단어가중치 | 시소러스단어입력일자
시소러스단어변경일자 | 1.000 | 0.246 | 0.304 | 0.325 | 0.714
시소러스관련단어중요도구분 | 0.246 | 1.000 | 0.158 | 0.074 | 0.246
시소러스단어분류코드 | 0.304 | 0.158 | 1.000 | 0.559 | 0.304
시소러스관련단어가중치 | 0.325 | 0.074 | 0.559 | 1.000 | 0.325
시소러스단어입력일자 | 0.714 | 0.246 | 0.304 | 0.325 | 1.000

[Figure: correlation heatmap]

 | 시소러스단어일련번호 | 시소러스단어분류코드 | 시소러스관련단어가중치 | 시소러스단어입력일자 | 시소러스단어변경일자 | 시소러스관련단어중요도구분
시소러스단어일련번호 | 1.000 | 0.564 | 0.238 | 0.292 | 0.292 | 0.330
시소러스단어분류코드 | 0.564 | 1.000 | 0.559 | 0.304 | 0.304 | 0.158
시소러스관련단어가중치 | 0.238 | 0.559 | 1.000 | 0.325 | 0.325 | 0.074
시소러스단어입력일자 | 0.292 | 0.304 | 0.325 | 1.000 | 0.714 | 0.246
시소러스단어변경일자 | 0.292 | 0.304 | 0.325 | 0.714 | 1.000 | 0.246
시소러스관련단어중요도구분 | 0.330 | 0.158 | 0.074 | 0.246 | 0.246 | 1.000
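
The report does not label its correlation matrices, but the five-column, categorical-only matrix appears consistent with a Cramér's V that applies Yates' continuity correction to 2x2 tables and the Bergsma bias correction. The sketch below rests on that assumption; `cramers_v_corrected` is a hypothetical helper. The first contingency table additionally assumes that both rows entered on 20090602 were changed on 20090707 (only one such row is visible in the sample). The second table follows from the reported counts, since the single 높음 row is the visible one entered on 20090602.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table.

    Yates' continuity correction is applied only when the table has one
    degree of freedom (2x2). Assumes no empty rows or columns.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    r, k = len(table), len(table[0])
    yates = (r - 1) * (k - 1) == 1

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(observed - expected)
            if yates:
                diff -= min(0.5, diff)
            chi2 += diff * diff / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Rows: 입력일자 20090701 / 20090602; columns: 변경일자 20090701 / 20090707.
# Assumption: both 20090602 rows carry the change date 20090707.
v_dates = cramers_v_corrected([[23, 0], [0, 2]])

# Rows: 입력일자 20090701 / 20090602; columns: 중요도구분 낮음 / 높음.
v_importance = cramers_v_corrected([[23, 0], [1, 1]])

print(round(v_dates, 3), round(v_importance, 3))  # 0.714 0.246
```

Under these assumptions the two results match the 0.714 and 0.246 entries of that matrix, which is why even a perfectly aligned pair of date columns scores below 1 at this sample size.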

Missing values

[Figure: nullity bar chart]
A simple visualization of nullity by column.
[Figure: nullity matrix]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

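First rows

 | 시소러스IPC섹션구분코드 | 시소러스기준단어명 | 시소러스단어일련번호 | 시소러스기준단어언어코드 | 시소러스단어분류코드 | 시소러스관련단어명 | 시소러스관련단어가중치 | 부서코드 | 시소러스단어입력일자 | 시소러스단어변경일자 | 시소러스관련단어중요도구분
0 | G | 유리제품 | 1 | S10801 | S10704 | CERAMIC | 7 | | 20090701 | 20090701 | 낮음
1 | G | 유리제품 | 3 | S10801 | S10704 | GLASS | 7 | | 20090701 | 20090701 | 낮음
2 | G | 유리제품 | 4 | S10801 | S10704 | Glass article | 4 | | 20090701 | 20090701 | 낮음
3 | G | 유리제품 | 6 | S10801 | S10704 | article of glass | 4 | | 20090701 | 20090701 | 낮음
4 | G | 유리제품 | 7 | S10801 | S10704 | glass product | 4 | | 20090701 | 20090701 | 낮음
5 | G | 유리제품 | 8 | S10801 | S10704 | glassware | 4 | | 20090701 | 20090701 | 낮음
6 | G | 유리제품 | 9 | S10801 | S10704 | glass product | 4 | | 20090602 | 20090707 | 높음
7 | G | 유리제품 | 11 | S10801 | S10702 | 광물 | 7 | | 20090701 | 20090701 | 낮음
8 | G | 유리제품 | 12 | S10801 | S10702 | 그라스 | 7 | | 20090701 | 20090701 | 낮음
9 | G | 유리제품 | 13 | S10801 | S10702 | 그래스 | 7 | | 20090701 | 20090701 | 낮음

Last rows

 | 시소러스IPC섹션구분코드 | 시소러스기준단어명 | 시소러스단어일련번호 | 시소러스기준단어언어코드 | 시소러스단어분류코드 | 시소러스관련단어명 | 시소러스관련단어가중치 | 부서코드 | 시소러스단어입력일자 | 시소러스단어변경일자 | 시소러스관련단어중요도구분
15 | G | 유리제품 | 22 | S10801 | S10702 | 세라믹 | 7 | | 20090701 | 20090701 | 낮음
16 | G | 유리제품 | 23 | S10801 | S10702 | 소다유리 | 7 | | 20090701 | 20090701 | 낮음
17 | G | 유리제품 | 24 | S10801 | S10702 | 세라믹 | 7 | | 20090701 | 20090701 | 낮음
18 | G | 유리제품 | 25 | S10801 | S10702 | 시레믹 | 7 | | 20090701 | 20090701 | 낮음
19 | G | 유리제품 | 28 | S10801 | S10702 | 요업 | 7 | | 20090701 | 20090701 | 낮음
20 | G | 유리제품 | 29 | S10801 | S10702 | 유리 | 7 | | 20090701 | 20090701 | 낮음
21 | G | 유리제품 | 30 | S10801 | S10701 | 유리재품 | 4 | | 20090701 | 20090701 | 낮음
22 | G | 유리제품 | 32 | S10801 | S10702 | 점토 | 7 | | 20090701 | 20090701 | 낮음
23 | G | 유리제품 | 33 | S10801 | S10702 | 질그릇 | 7 | | 20090701 | 20090701 | 낮음
24 | G | 유리제품 | 34 | S10801 | S10702 | 초자 | 14 | | 20090701 | 20090701 | 낮음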