Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
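
The average record size follows from the total size and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; the function name is illustrative and not part of ydata-profiling:

```python
# Figures copied from the "Dataset statistics" block of the report.
total_size_kib = 312.5
n_observations = 10_000

BYTES_PER_KIB = 1024  # assumption: KiB is the binary unit


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean in-memory size of one row, in bytes."""
    return total_kib * BYTES_PER_KIB / n_rows


avg_bytes = average_record_size_bytes(total_size_kib, n_observations)
print(f"{avg_bytes:.1f} B")
```

The result agrees with the 32.0 B reported above.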

Variable types

Categorical: 2
Text: 1

Dataset

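Description: Shares the attachment-file information for safety-certified products provided by the Product Safety Information Portal (Center), which is operated by the Korean Agency for Technology and Standards (국가기술표준원).
URL: https://www.data.go.kr/data/15040702/fileData.do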

Alerts

Imbalance: 확장자 is highly imbalanced (99.9%)
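
The report does not state how the imbalance percentage is computed. A minimal sketch, assuming the score is one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy (`imbalance_score` is an illustrative name, not a confirmed ydata-profiling function):

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - (Shannon entropy / maximum entropy).

    0.0 means perfectly balanced classes; values near 1.0 mean that
    almost every row falls into a single class.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Counts for 확장자 taken from the report: 9999 "jpg" and 1 "peg".
score = imbalance_score([9999, 1])
print(f"{score:.1%}")
```

Under this assumption the score for the 9999/1 split rounds to the 99.9% shown in the alert.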

Reproduction

Analysis started: 2023-12-12 20:44:29.236413
Analysis finished: 2023-12-12 20:44:29.933983
Duration: 0.7 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

저장경로
Categorical

Distinct: 28
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
/certInfo/2020/07    786
/certInfo/2020/06    710
/certInfo/2020/09    588
/certInfo/2020/08    561
/certInfo/2020/04    494
Other values (23)   6861

Length

Max length: 17
Median length: 17
Mean length: 17
Min length: 17

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: /certInfo/2020/09
2nd row: /certInfo/2021/12
3rd row: /certInfo/2021/06
4th row: /certInfo/2021/02
5th row: /certInfo/2021/06

Common Values

Value              Count  Frequency (%)
/certInfo/2020/07    786  7.9%
/certInfo/2020/06    710  7.1%
/certInfo/2020/09    588  5.9%
/certInfo/2020/08    561  5.6%
/certInfo/2020/04    494  4.9%
/certInfo/2021/07    492  4.9%
/certInfo/2021/06    472  4.7%
/certInfo/2021/12    422  4.2%
/certInfo/2020/02    412  4.1%
/certInfo/2022/02    372  3.7%
Other values (18)   4691  46.9%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
certinfo/2020/07     786  7.9%
certinfo/2020/06     710  7.1%
certinfo/2020/09     588  5.9%
certinfo/2020/08     561  5.6%
certinfo/2020/04     494  4.9%
certinfo/2021/07     492  4.9%
certinfo/2021/06     472  4.7%
certinfo/2021/12     422  4.2%
certinfo/2020/02     412  4.1%
certinfo/2022/02     372  3.7%
Other values (18)   4691  46.9%

이미지파일이름
Text

Distinct: 9992
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 23
Mean length: 20.4805
Min length: 19

Characters and Unicode

Total characters: 204805
Distinct characters: 44
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
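
As an illustration of those character properties, the sketch below counts the Unicode general categories in one file name using Python's standard `unicodedata` module. The mapping from two-letter codes to readable names is written out by hand and covers only the six categories this report lists; it is not taken from ydata-profiling.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that occur in this column.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)


# First sample value of the file-name column, as shown in the report.
counts = category_counts("B531R549-20001_1.jpg")
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

For this one value the digits dominate, which matches the column-wide picture in which Decimal Number is the most frequent category.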

Unique

Unique: 9984
Unique (%): 99.8%
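
"Distinct" and "Unique" measure different things here: distinct is the number of different values, while unique is the number of values that occur exactly once. A small sketch on a made-up toy column (the values are hypothetical, not rows from the dataset):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical toy column: "b_1.jpg" appears twice.
toy = ["a_1.jpg", "b_1.jpg", "b_1.jpg", "c_1.jpg"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

Applied to this variable, 9992 distinct and 9984 unique values imply 8 values that repeat, and 9984 + 8 × 2 = 10000 matches the number of observations.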

Sample

1st row: B531R549-20001_1.jpg
2nd row: CB015S0022-1012_1.jpg
3rd row: XU101439-21025A_1.jpg
4th row: ZU10322-21005_1.jpg
5th row: CA021A001-7004_2.jpg

Value                  Count  Frequency (%)
ca021h114-1022_1.jpg       2  < 0.1%
b731b001-21005_1.jpg       2  < 0.1%
ca021d003-9007a_1.jpg      2  < 0.1%
ca021o001-5008_1.jpg       2  < 0.1%
b731b001-21007_1.jpg       2  < 0.1%
ca031h001-9003_1.jpg       2  < 0.1%
ca011r041-8001a_5.jpg      2  < 0.1%
ca021h023-9058_1.jpg       2  < 0.1%
cb014a1088-0002_1.jpg      1  < 0.1%
ju04041-20001_1.jpg        1  < 0.1%
Other values (9982)     9982  99.8%

Most occurring characters

Value              Count  Frequency (%)
0                  37176  18.2%
1                  30237  14.8%
2                  12584  6.1%
-                  10046  4.9%
g                  10000  4.9%
_                  10000  4.9%
.                  10000  4.9%
j                  10000  4.9%
p                  10000  4.9%
3                   6568  3.2%
Other values (34)  58194  28.4%

Most occurring categories

Value                   Count  Frequency (%)
Decimal Number         115389  56.3%
Lowercase Letter        30007  14.7%
Uppercase Letter        29363  14.3%
Dash Punctuation        10046  4.9%
Connector Punctuation   10000  4.9%
Other Punctuation       10000  4.9%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
C                   6513  22.2%
B                   5779  19.7%
A                   4428  15.1%
R                   3929  13.4%
H                   2610  8.9%
U                   2030  6.9%
X                    816  2.8%
S                    484  1.6%
J                    464  1.6%
Y                    407  1.4%
Other values (15)   1903  6.5%

Decimal Number
Value  Count  Frequency (%)
0      37176  32.2%
1      30237  26.2%
2      12584  10.9%
3       6568  5.7%
4       5901  5.1%
6       5794  5.0%
5       5110  4.4%
7       4693  4.1%
9       4217  3.7%
8       3109  2.7%

Lowercase Letter
Value  Count  Frequency (%)
g      10000  33.3%
j      10000  33.3%
p      10000  33.3%
c          3  < 0.1%
h          3  < 0.1%
e          1  < 0.1%

Dash Punctuation
Value  Count  Frequency (%)
-      10046  100.0%

Connector Punctuation
Value  Count  Frequency (%)
_      10000  100.0%

Other Punctuation
Value  Count  Frequency (%)
.      10000  100.0%

Most occurring scripts

Value    Count  Frequency (%)
Common  145435  71.0%
Latin    59370  29.0%

Most frequent character per script

Latin
Value              Count  Frequency (%)
g                  10000  16.8%
j                  10000  16.8%
p                  10000  16.8%
C                   6513  11.0%
B                   5779  9.7%
A                   4428  7.5%
R                   3929  6.6%
H                   2610  4.4%
U                   2030  3.4%
X                    816  1.4%
Other values (21)   3265  5.5%

Common
Value             Count  Frequency (%)
0                 37176  25.6%
1                 30237  20.8%
2                 12584  8.7%
-                 10046  6.9%
_                 10000  6.9%
.                 10000  6.9%
3                  6568  4.5%
4                  5901  4.1%
6                  5794  4.0%
5                  5110  3.5%
Other values (3)  12019  8.3%

Most occurring blocks

Value   Count  Frequency (%)
ASCII  204805  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
0                  37176  18.2%
1                  30237  14.8%
2                  12584  6.1%
-                  10046  4.9%
g                  10000  4.9%
_                  10000  4.9%
.                  10000  4.9%
j                  10000  4.9%
p                  10000  4.9%
3                   6568  3.2%
Other values (34)  58194  28.4%

확장자
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
jpg     9999
peg        1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: jpg
2nd row: jpg
3rd row: jpg
4th row: jpg
5th row: jpg

Common Values

Value  Count  Frequency (%)
jpg     9999  > 99.9%
peg        1  < 0.1%
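
The report does not say how 확장자 was derived from the file name. One possible explanation for the single `peg` value, offered only as an assumption, is that the extension was taken as the last three characters of a name ending in `.jpeg`. The sketch below contrasts that hypothetical rule with `os.path.splitext`; the name `example_1.jpeg` is made up, since the report does not show the affected row.

```python
import os


def last_three_chars(filename: str) -> str:
    """Hypothetical extraction rule: keep the final three characters."""
    return filename[-3:]


def splitext_extension(filename: str) -> str:
    """Extension via os.path.splitext, without the leading dot."""
    return os.path.splitext(filename)[1].lstrip(".")


# The first name is a sample value from the report; the second is invented.
for name in ("B531R549-20001_1.jpg", "example_1.jpeg"):
    print(name, last_three_chars(name), splitext_extension(name))
```

If that assumption holds, normalising `peg` to `jpeg` (or to `jpg`) before analysis would collapse the column to a single value.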

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
jpg     9999  > 99.9%
peg        1  < 0.1%

Correlations

            저장경로    확장자
저장경로    1.000       0.000
확장자      0.000       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

       저장경로           이미지파일이름          확장자
45791  /certInfo/2020/09  B531R549-20001_1.jpg    jpg
90314  /certInfo/2021/12  CB015S0022-1012_1.jpg   jpg
70548  /certInfo/2021/06  XU101439-21025A_1.jpg   jpg
56297  /certInfo/2021/02  ZU10322-21005_1.jpg     jpg
68730  /certInfo/2021/06  CA021A001-7004_2.jpg    jpg
10059  /certInfo/2020/02  JH11983-20001_1.jpg     jpg
74869  /certInfo/2021/07  CA021H004-1067_1.jpg    jpg
96423  /certInfo/2022/02  CB015R0919-1016_2.jpg   jpg
88503  /certInfo/2021/12  CB014R1624-1004_1.jpg   jpg
29024  /certInfo/2020/06  CB065R0561-9002A_1.jpg  jpg

       저장경로           이미지파일이름          확장자
49838  /certInfo/2020/09  HU071858-20018A_1.jpg   jpg
12962  /certInfo/2020/02  CB061R5321-0002_1.jpg   jpg
96649  /certInfo/2022/02  B675R0072-21001_1.jpg   jpg
65325  /certInfo/2021/05  CB064R2790-1001_2.jpg   jpg
37187  /certInfo/2020/07  CB061R169-5004B_4.jpg   jpg
77728  /certInfo/2021/08  CB061T082-1002_3.jpg    jpg
69648  /certInfo/2021/06  CA023H115-0048A_2.jpg   jpg
28253  /certInfo/2020/06  ZU101167-20001_1.jpg    jpg
14264  /certInfo/2020/03  B193R017-9002B_1.jpg    jpg
62136  /certInfo/2021/04  CB061R6134-1001_1.jpg   jpg