Overview

Dataset statistics

Number of variables: 4
Number of observations: 104
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2
Duplicate rows (%): 1.9%
Total size in memory: 3.4 KiB
Average record size in memory: 33.3 B

Variable types

Text: 2
Categorical: 2

Dataset

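Description: Information on sampled commercial products (company name, product, certification number, result) registered in the hygiene and safety standard certification registry of the Korea Water Technology Certification Institute (한국물기술인증원).
Author: Ministry of Environment (환경부)
URL: https://www.data.go.kr/data/15071374/fileData.do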

Alerts

결과 has constant value "부적합" (Constant)
Dataset has 2 (1.9%) duplicate rows (Duplicates)
제품분류 is highly imbalanced (61.2%) (Imbalance)

Reproduction

Analysis started: 2024-05-04 08:14:40.377466
Analysis finished: 2024-05-04 08:14:43.198172
Duration: 2.82 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기업명
Text

Distinct: 95
Distinct (%): 91.3%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B

Length

Max length: 40
Median length: 13
Mean length: 7.6057692
Min length: 2

Characters and Unicode

Total characters: 791
Distinct characters: 174
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
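
As a rough illustration of how per-category counts like the ones in this section arise, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module for two company names taken from the sample rows. The helper name `category_counts` is hypothetical, and the profiling tool's own implementation may differ.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Lu', 'So')."""
    # NFC keeps precomposed Hangul syllables as single code points.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in normalized)


# Two values taken from the sample rows of this report.
counts_kf = category_counts("케이에프(KF)")
counts_samhyo = category_counts("삼효금속공업㈜")

print(dict(counts_kf))
print(dict(counts_samhyo))
```

In the two-letter codes, `Lo` is Other Letter, `Lu` is Uppercase Letter, `Ps` and `Pe` are Open and Close Punctuation, and `So` is Other Symbol, which is where the enclosed character ㈜ falls.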

Unique

Unique: 87
Unique (%): 83.7%

Sample

1st row: 삼효금속공업㈜
2nd row: 대림통상㈜금구공장
3rd row: 케이에프(KF)
4th row: 대원
5th row: 제일금속공업사

Value | Count | Frequency (%)
주식회사 | 15 | 10.9%
아이제이코리아 | 3 | 2.2%
sj코리아 | 3 | 2.2%
임창 | 3 | 2.2%
co | 2 | 1.5%
제이시엘인더스트리 | 2 | 1.5%
(not shown) | 2 | 1.5%
에코수전 | 2 | 1.5%
조은ws | 2 | 1.5%
인포메탈 | 2 | 1.5%
Other values (98) | 101 | 73.7%

Most occurring characters

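Value | Count | Frequency (%)
(not shown) | 43 | 5.4%
(space) | 35 | 4.4%
( | 31 | 3.9%
) | 31 | 3.9%
(not shown) | 29 | 3.7%
(not shown) | 26 | 3.3%
(not shown) | 21 | 2.7%
(not shown) | 21 | 2.7%
(not shown) | 19 | 2.4%
(not shown) | 18 | 2.3%
Other values (164) | 517 | 65.4%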

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 575 | 72.7%
Uppercase Letter | 72 | 9.1%
Space Separator | 35 | 4.4%
Open Punctuation | 31 | 3.9%
Close Punctuation | 31 | 3.9%
Lowercase Letter | 21 | 2.7%
Other Symbol | 18 | 2.3%
Other Punctuation | 6 | 0.8%
Decimal Number | 1 | 0.1%
Dash Punctuation | 1 | 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not shown) | 43 | 7.5%
(not shown) | 29 | 5.0%
(not shown) | 26 | 4.5%
(not shown) | 21 | 3.7%
(not shown) | 21 | 3.7%
(not shown) | 19 | 3.3%
(not shown) | 18 | 3.1%
(not shown) | 16 | 2.8%
(not shown) | 15 | 2.6%
(not shown) | 11 | 1.9%
Other values (121) | 356 | 61.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 8 | 11.1%
A | 8 | 11.1%
G | 5 | 6.9%
W | 5 | 6.9%
N | 4 | 5.6%
O | 4 | 5.6%
T | 4 | 5.6%
D | 3 | 4.2%
L | 3 | 4.2%
U | 3 | 4.2%
Other values (13) | 25 | 34.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 4 | 19.0%
t | 3 | 14.3%
s | 3 | 14.3%
a | 2 | 9.5%
r | 2 | 9.5%
m | 1 | 4.8%
y | 1 | 4.8%
k | 1 | 4.8%
n | 1 | 4.8%
h | 1 | 4.8%
Other values (2) | 2 | 9.5%

Other Punctuation
Value | Count | Frequency (%)
. | 4 | 66.7%
, | 2 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 35 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 31 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 31 | 100.0%

Other Symbol
Value | Count | Frequency (%)
(not shown) | 18 | 100.0%

Decimal Number
Value | Count | Frequency (%)
2 | 1 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 593 | 75.0%
Common | 105 | 13.3%
Latin | 93 | 11.8%

Most frequent character per script

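Hangul
Value | Count | Frequency (%)
(not shown) | 43 | 7.3%
(not shown) | 29 | 4.9%
(not shown) | 26 | 4.4%
(not shown) | 21 | 3.5%
(not shown) | 21 | 3.5%
(not shown) | 19 | 3.2%
(not shown) | 18 | 3.0%
(not shown) | 18 | 3.0%
(not shown) | 16 | 2.7%
(not shown) | 15 | 2.5%
Other values (122) | 367 | 61.9%

Latin
Value | Count | Frequency (%)
S | 8 | 8.6%
A | 8 | 8.6%
G | 5 | 5.4%
W | 5 | 5.4%
N | 4 | 4.3%
O | 4 | 4.3%
e | 4 | 4.3%
T | 4 | 4.3%
D | 3 | 3.2%
t | 3 | 3.2%
Other values (25) | 45 | 48.4%

Common
Value | Count | Frequency (%)
(space) | 35 | 33.3%
( | 31 | 29.5%
) | 31 | 29.5%
. | 4 | 3.8%
, | 2 | 1.9%
2 | 1 | 1.0%
- | 1 | 1.0%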

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 575 | 72.7%
ASCII | 198 | 25.0%
None | 18 | 2.3%

Most frequent character per block

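Hangul
Value | Count | Frequency (%)
(not shown) | 43 | 7.5%
(not shown) | 29 | 5.0%
(not shown) | 26 | 4.5%
(not shown) | 21 | 3.7%
(not shown) | 21 | 3.7%
(not shown) | 19 | 3.3%
(not shown) | 18 | 3.1%
(not shown) | 16 | 2.8%
(not shown) | 15 | 2.6%
(not shown) | 11 | 1.9%
Other values (121) | 356 | 61.9%

ASCII
Value | Count | Frequency (%)
(space) | 35 | 17.7%
( | 31 | 15.7%
) | 31 | 15.7%
S | 8 | 4.0%
A | 8 | 4.0%
G | 5 | 2.5%
W | 5 | 2.5%
N | 4 | 2.0%
O | 4 | 2.0%
. | 4 | 2.0%
Other values (32) | 63 | 31.8%

None
Value | Count | Frequency (%)
(not shown) | 18 | 100.0%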

제품분류
Categorical

IMBALANCE 

Distinct: 21
Distinct (%): 20.2%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B

Value | Count
수도꼭지 | 81
샤워기 헤드 | 2
제어밸브 | 2
체크밸브 | 2
원심펌프 | 1
Other values (16) | 16

Length

Max length: 17
Median length: 4
Mean length: 4.6346154
Min length: 3

Unique

Unique: 17
Unique (%): 16.3%

Sample

1st row: 청동 밸브
2nd row: 수도꼭지
3rd row: 수도꼭지
4th row: 수도꼭지
5th row: 수도꼭지

Common Values

Value | Count | Frequency (%)
수도꼭지 | 81 | 77.9%
샤워기 헤드 | 2 | 1.9%
제어밸브 | 2 | 1.9%
체크밸브 | 2 | 1.9%
원심펌프 | 1 | 1.0%
감압밸브 | 1 | 1.0%
수도용 스테인리스 강관 이음쇠 | 1 | 1.0%
게이트밸브 | 1 | 1.0%
체크밸브 | 1 | 1.0%
일반 배관용 스테인리스 강관 | 1 | 1.0%
Other values (11) | 11 | 10.6%
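
The 61.2% imbalance figure reported for 제품분류 can be reproduced from these counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the logarithm of the number of classes; the report does not state the formula, but this definition matches the reported value. The function name `imbalance_score` and the handling of a single-class column are hypothetical.

```python
import math
from typing import Sequence

# Class counts for 제품분류 as reported: one dominant class (81 rows), three
# classes with 2 rows each, and 17 classes with a single row
# (21 distinct values, 104 rows in total).
counts = [81, 2, 2, 2] + [1] * 17


def imbalance_score(class_counts: Sequence[int]) -> float:
    """Return 1 - H / log(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0  # assumed convention: a single class has no defined ratio
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)
    return 1.0 - entropy / math.log(k)


score = imbalance_score(counts)
print(f"{score:.1%}")
```

The score is high even though 21 classes exist, because 81 of 104 rows (77.9%) fall into 수도꼭지.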

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
수도꼭지 | 81 | 67.5%
체크밸브 | 3 | 2.5%
스테인리스 | 3 | 2.5%
수도용 | 2 | 1.7%
샤워기 | 2 | 1.7%
게이트밸브 | 2 | 1.7%
강관 | 2 | 1.7%
제어밸브 | 2 | 1.7%
헤드 | 2 | 1.7%
밸브 | 1 | 0.8%
Other values (20) | 20 | 16.7%

인증번호
Text

Distinct: 93
Distinct (%): 89.4%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B

Length

Max length: 14
Median length: 13
Mean length: 13.278846
Min length: 13

Characters and Unicode

Total characters: 1381
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 83
Unique (%): 79.8%

Sample

1st row: KCW-2012-0456
2nd row: KCW-2012-0230
3rd row: KCW-2012-0444
4th row: KCW-2013-0246
5th row: KCW-2012-0227

Value | Count | Frequency (%)
kcw-2014-0005 | 3 | 2.9%
kcw-2013-0246 | 2 | 1.9%
kcw-2012-0440 | 2 | 1.9%
kcw-2012-0318 | 2 | 1.9%
kcw-2012-0369 | 2 | 1.9%
kcw-2012-0241 | 2 | 1.9%
kcw-2014-0054 | 2 | 1.9%
kcw-2014-0176 | 2 | 1.9%
kcw-2012-0328 | 2 | 1.9%
kcw-2012-0317 | 2 | 1.9%
Other values (76) | 83 | 79.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 264 | 19.1%
- | 208 | 15.1%
2 | 200 | 14.5%
1 | 139 | 10.1%
K | 104 | 7.5%
C | 104 | 7.5%
W | 104 | 7.5%
4 | 45 | 3.3%
3 | 40 | 2.9%
8 | 39 | 2.8%
Other values (5) | 134 | 9.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 832 | 60.2%
Uppercase Letter | 312 | 22.6%
Dash Punctuation | 208 | 15.1%
Space Separator | 29 | 2.1%
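
The 29 space characters, together with a maximum length of 14 against 13 for a well-formed value such as KCW-2012-0456 (1381 = 104 × 13 + 29), suggest that some certification numbers carry one stray space. This would also be consistent with the word table listing 86 distinct tokens against 93 distinct raw values. The sketch below strips surrounding whitespace before parsing. The pattern is inferred from the sample values only, reading the two numeric fields as a year and a serial number is an assumption, and the padded input is a hypothetical example because the report does not show where the spaces occur.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from sample values such as "KCW-2012-0456"; the meaning of
# the two numeric fields (year, serial) is an assumption, not documented here.
CERT_PATTERN = re.compile(r"KCW-(\d{4})-(\d{4})")


def parse_cert_number(raw: str) -> Optional[Tuple[int, int]]:
    """Strip surrounding whitespace and return (year, serial), or None if malformed."""
    match = CERT_PATTERN.fullmatch(raw.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


clean = parse_cert_number("KCW-2012-0456")
padded = parse_cert_number("KCW-2012-0456 ")  # hypothetical padded variant
malformed = parse_cert_number("KCW-12-0456")

print(clean, padded, malformed)
```

Normalizing the values this way before counting would likely lower the distinct count for 인증번호 and could reveal further duplicate rows.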

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 264 | 31.7%
2 | 200 | 24.0%
1 | 139 | 16.7%
4 | 45 | 5.4%
3 | 40 | 4.8%
8 | 39 | 4.7%
6 | 29 | 3.5%
7 | 29 | 3.5%
5 | 24 | 2.9%
9 | 23 | 2.8%

Uppercase Letter
Value | Count | Frequency (%)
K | 104 | 33.3%
C | 104 | 33.3%
W | 104 | 33.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 208 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 29 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1069 | 77.4%
Latin | 312 | 22.6%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 264 | 24.7%
- | 208 | 19.5%
2 | 200 | 18.7%
1 | 139 | 13.0%
4 | 45 | 4.2%
3 | 40 | 3.7%
8 | 39 | 3.6%
(space) | 29 | 2.7%
6 | 29 | 2.7%
7 | 29 | 2.7%
Other values (2) | 47 | 4.4%

Latin
Value | Count | Frequency (%)
K | 104 | 33.3%
C | 104 | 33.3%
W | 104 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1381 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 264 | 19.1%
- | 208 | 15.1%
2 | 200 | 14.5%
1 | 139 | 10.1%
K | 104 | 7.5%
C | 104 | 7.5%
W | 104 | 7.5%
4 | 45 | 3.3%
3 | 40 | 2.9%
8 | 39 | 2.8%
Other values (5) | 134 | 9.7%

결과
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 964.0 B

Value | Count
부적합 | 104

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부적합
2nd row: 부적합
3rd row: 부적합
4th row: 부적합
5th row: 부적합

Common Values

Value | Count | Frequency (%)
부적합 | 104 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
부적합 | 104 | 100.0%

Correlations

Variable | 기업명 | 제품분류 | 인증번호
기업명 | 1.000 | 1.000 | 0.989
제품분류 | 1.000 | 1.000 | 1.000
인증번호 | 0.989 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 기업명 | 제품분류 | 인증번호 | 결과
0 | 삼효금속공업㈜ | 청동 밸브 | KCW-2012-0456 | 부적합
1 | 대림통상㈜금구공장 | 수도꼭지 | KCW-2012-0230 | 부적합
2 | 케이에프(KF) | 수도꼭지 | KCW-2012-0444 | 부적합
3 | 대원 | 수도꼭지 | KCW-2013-0246 | 부적합
4 | 제일금속공업사 | 수도꼭지 | KCW-2012-0227 | 부적합
5 | ㈜제이시엘인더스트리 | 수도꼭지 | KCW-2014-0005 | 부적합
6 | KWC Franke Water Systems AG | 수도꼭지 | KCW-2015-0028 | 부적합
7 | ㈜디아 | 수도꼭지 | KCW-2014-0176 | 부적합
8 | 세븐워터 | 수도꼭지 | KCW-2014-0082 | 부적합
9 | 에코수전 | 수도꼭지 | KCW-2014-0054 | 부적합

Index | 기업명 | 제품분류 | 인증번호 | 결과
94 | 주식회사 신원아너스 | 수도꼭지 | KCW-2017-0187 | 부적합
95 | 조은WS | 수도꼭지 | KCW-2018-0179 | 부적합
96 | 트랜드 주식회사 | 수도꼭지 | KCW-2020-0050 | 부적합
97 | (주)글로벌에스티 | 수도꼭지 | KCW-2018-0203 | 부적합
98 | WENZHOU HAIBA SANITARY CO., LTD. | 수도꼭지 | KCW-2018-0076 | 부적합
99 | 아이제이코리아 | 수도꼭지 | KCW-2016-0052 | 부적합
100 | 주식회사 혜성코리아 | 수도꼭지 | KCW-2015-0108 | 부적합
101 | 주식회사 임창 | 수도꼭지 | KCW-2012-0349 | 부적합
102 | SJ코리아 | 수도꼭지 | KCW-2019-0052 | 부적합
103 | SJ코리아 | 수도꼭지 | KCW-2012-0226 | 부적합

Duplicate rows

Most frequently occurring

Index | 기업명 | 제품분류 | 인증번호 | 결과 | # duplicates
0 | 아이제이코리아 | 수도꼭지 | KCW-2016-0052 | 부적합 | 2
1 | 조은WS | 수도꼭지 | KCW-2018-0179 | 부적합 | 2
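
A duplicate table of this kind can be derived by counting identical rows and keeping those that occur more than once. The sketch below does this with the standard library on a small illustrative subset: the two repeated rows are the ones listed in the table, the other rows are taken from the sample, and the variable names are hypothetical. The report appears to count each duplicated row once, which gives 2 of 104 rows (1.9%).

```python
from collections import Counter

# Illustrative subset of (기업명, 제품분류, 인증번호, 결과) rows. The repeated
# rows match the duplicate table; the full dataset has 104 rows.
rows = [
    ("아이제이코리아", "수도꼭지", "KCW-2016-0052", "부적합"),
    ("조은WS", "수도꼭지", "KCW-2018-0179", "부적합"),
    ("대원", "수도꼭지", "KCW-2013-0246", "부적합"),
    ("아이제이코리아", "수도꼭지", "KCW-2016-0052", "부적합"),
    ("조은WS", "수도꼭지", "KCW-2018-0179", "부적합"),
    ("SJ코리아", "수도꼭지", "KCW-2019-0052", "부적합"),
]

# Keep only rows that occur more than once, together with their counts.
duplicates = {row: n for row, n in Counter(rows).items() if n > 1}

for row, n in duplicates.items():
    print(row, n)
```

Rows that differ only by stray whitespace would not be caught by this exact comparison, so values should be normalized first if such variants are meant to count as duplicates.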