Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 10000 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 312.5 KiB |
Average record size in memory | 32.0 B |
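The two memory figures above agree with each other. As a quick check, a sketch that assumes 1 KiB means 1024 bytes, dividing the total size by the number of observations recovers the average record size:

```python
# Figures copied from the dataset statistics table.
total_kib = 312.5
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # 32.0 B
```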
Variable types
Categorical | 2 |
---|---|
Text | 1 |
Dataset
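Description | Attachment-file information for safety-certified products, shared by the Product Safety Information Portal (Center) operated by the Korean Agency for Technology and Standards (국가기술표준원). |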
---|---|
URL | https://www.data.go.kr/data/15040702/fileData.do |
확장자 (extension) is highly imbalanced (99.9%) | Imbalance |
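The 99.9% in the imbalance alert can be reproduced from the class counts reported further down (jpg = 9999, peg = 1). The sketch below assumes the score is one minus the Shannon entropy of the class proportions, normalized by the maximum entropy for that number of classes. That definition is an assumption here, since the report only states the resulting percentage, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the base-2 Shannon entropy of the
    class proportions and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts for 확장자 taken from this report: jpg = 9999, peg = 1.
score = imbalance_score([9999, 1])
print(f"{score:.1%}")  # 99.9%
```

Under this definition a 50/50 split scores 0, so a value this close to 1 means the column carries almost no information.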
Reproduction
Analysis started | 2023-12-12 20:44:29.236413 |
---|---|
Analysis finished | 2023-12-12 20:44:29.933983 |
Duration | 0.7 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
저장경로 (storage path)
Categorical
Distinct | 28 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
/certInfo/2020/07 | 786 |
---|---|
/certInfo/2020/06 | 710 |
/certInfo/2020/09 | 588 |
/certInfo/2020/08 | 561 |
/certInfo/2020/04 | 494 |
Other values (23) | 6861 |
Length
Max length | 17 |
---|---|
Median length | 17 |
Mean length | 17 |
Min length | 17 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | /certInfo/2020/09 |
---|---|
2nd row | /certInfo/2021/12 |
3rd row | /certInfo/2021/06 |
4th row | /certInfo/2021/02 |
5th row | /certInfo/2021/06 |
Common Values
Value | Count | Frequency (%) |
/certInfo/2020/07 | 786 | 7.9% |
/certInfo/2020/06 | 710 | 7.1% |
/certInfo/2020/09 | 588 | 5.9% |
/certInfo/2020/08 | 561 | 5.6% |
/certInfo/2020/04 | 494 | 4.9% |
/certInfo/2021/07 | 492 | 4.9% |
/certInfo/2021/06 | 472 | 4.7% |
/certInfo/2021/12 | 422 | 4.2% |
/certInfo/2020/02 | 412 | 4.1% |
/certInfo/2022/02 | 372 | 3.7% |
Other values (18) | 4691 | 46.9% |
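A Common Values table of this kind is a tally of each distinct value together with its share of all rows. As a minimal sketch, the code below tallies only the five sample rows listed for this variable, so its counts describe that tiny sample and not the 10,000-row dataset. `common_values` is a hypothetical helper name.

```python
from collections import Counter

def common_values(values, top=10):
    """Return (value, count, percent) rows for the most frequent values."""
    n = len(values)
    return [(v, c, 100 * c / n) for v, c in Counter(values).most_common(top)]

# The five sample rows listed for 저장경로 in this report.
sample = [
    "/certInfo/2020/09",
    "/certInfo/2021/12",
    "/certInfo/2021/06",
    "/certInfo/2021/02",
    "/certInfo/2021/06",
]

for value, count, pct in common_values(sample):
    print(f"{value} | {count} | {pct:.1f}%")
```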
이미지파일이름 (image file name)
Text
Distinct | 9992 |
---|---|
Distinct (%) | 99.9% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 24 |
---|---|
Median length | 23 |
Mean length | 20.4805 |
Min length | 19 |
Characters and Unicode
Total characters | 204805 |
---|---|
Distinct characters | 44 |
Distinct categories | 6 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique
Unique | 9984 |
---|---|
Unique (%) | 99.8% |
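Distinct and Unique measure different things. Distinct counts how many different values occur, while Unique counts the values that occur exactly once. The figures above fit 8 file names that each appear twice: 9984 + 8 = 9992 distinct values, and 9984 + 2 × 8 = 10000 rows. A small sketch on toy data (the file names and the helper `distinct_and_unique` are hypothetical):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy data: "a.jpg" occurs twice, so it is distinct but not unique.
names = ["a.jpg", "a.jpg", "b.jpg", "c.jpg"]
print(distinct_and_unique(names))  # (3, 2)
```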
Sample
1st row | B531R549-20001_1.jpg |
---|---|
2nd row | CB015S0022-1012_1.jpg |
3rd row | XU101439-21025A_1.jpg |
4th row | ZU10322-21005_1.jpg |
5th row | CA021A001-7004_2.jpg |
Value | Count | Frequency (%) |
ca021h114-1022_1.jpg | 2 | < 0.1% |
b731b001-21005_1.jpg | 2 | < 0.1% |
ca021d003-9007a_1.jpg | 2 | < 0.1% |
ca021o001-5008_1.jpg | 2 | < 0.1% |
b731b001-21007_1.jpg | 2 | < 0.1% |
ca031h001-9003_1.jpg | 2 | < 0.1% |
ca011r041-8001a_5.jpg | 2 | < 0.1% |
ca021h023-9058_1.jpg | 2 | < 0.1% |
cb014a1088-0002_1.jpg | 1 | < 0.1% |
ju04041-20001_1.jpg | 1 | < 0.1% |
Other values (9982) | 9982 | 99.8% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 37176 | 18.2% |
1 | 30237 | 14.8% |
2 | 12584 | 6.1% |
- | 10046 | 4.9% |
g | 10000 | 4.9% |
_ | 10000 | 4.9% |
. | 10000 | 4.9% |
j | 10000 | 4.9% |
p | 10000 | 4.9% |
3 | 6568 | 3.2% |
Other values (34) | 58194 | 28.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 115389 | 56.3% |
Lowercase Letter | 30007 | 14.7% |
Uppercase Letter | 29363 | 14.3% |
Dash Punctuation | 10046 | 4.9% |
Connector Punctuation | 10000 | 4.9% |
Other Punctuation | 10000 | 4.9% |
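The category names in this table correspond to Unicode general categories. As a sketch of how such a tally can be produced, the code below classifies each character of the first sample file name with the standard library's `unicodedata.category`. The mapping from two-letter codes to long names covers only the six categories seen in this report, and `category_counts` is a hypothetical helper name.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# First sample row of 이미지파일이름.
counts = category_counts(["B531R549-20001_1.jpg"])
for name, count in counts.most_common():
    print(f"{name} | {count}")
```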
Most frequent character per category
Uppercase Letter
Value | Count | Frequency (%) |
C | 6513 | 22.2% |
B | 5779 | 19.7% |
A | 4428 | 15.1% |
R | 3929 | 13.4% |
H | 2610 | 8.9% |
U | 2030 | 6.9% |
X | 816 | 2.8% |
S | 484 | 1.6% |
J | 464 | 1.6% |
Y | 407 | 1.4% |
Other values (15) | 1903 | 6.5% |
Decimal Number
Value | Count | Frequency (%) |
0 | 37176 | 32.2% |
1 | 30237 | 26.2% |
2 | 12584 | 10.9% |
3 | 6568 | 5.7% |
4 | 5901 | 5.1% |
6 | 5794 | 5.0% |
5 | 5110 | 4.4% |
7 | 4693 | 4.1% |
9 | 4217 | 3.7% |
8 | 3109 | 2.7% |
Lowercase Letter
Value | Count | Frequency (%) |
g | 10000 | 33.3% |
j | 10000 | 33.3% |
p | 10000 | 33.3% |
c | 3 | < 0.1% |
h | 3 | < 0.1% |
e | 1 | < 0.1% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 10046 | 100.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 10000 | 100.0% |
Other Punctuation
Value | Count | Frequency (%) |
. | 10000 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 145435 | 71.0% |
Latin | 59370 | 29.0% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
g | 10000 | 16.8% |
j | 10000 | 16.8% |
p | 10000 | 16.8% |
C | 6513 | 11.0% |
B | 5779 | 9.7% |
A | 4428 | 7.5% |
R | 3929 | 6.6% |
H | 2610 | 4.4% |
U | 2030 | 3.4% |
X | 816 | 1.4% |
Other values (21) | 3265 | 5.5% |
Common
Value | Count | Frequency (%) |
0 | 37176 | 25.6% |
1 | 30237 | 20.8% |
2 | 12584 | 8.7% |
- | 10046 | 6.9% |
_ | 10000 | 6.9% |
. | 10000 | 6.9% |
3 | 6568 | 4.5% |
4 | 5901 | 4.1% |
6 | 5794 | 4.0% |
5 | 5110 | 3.5% |
Other values (3) | 12019 | 8.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 204805 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 37176 | 18.2% |
1 | 30237 | 14.8% |
2 | 12584 | 6.1% |
- | 10046 | 4.9% |
g | 10000 | 4.9% |
_ | 10000 | 4.9% |
. | 10000 | 4.9% |
j | 10000 | 4.9% |
p | 10000 | 4.9% |
3 | 6568 | 3.2% |
Other values (34) | 58194 | 28.4% |
확장자 (extension)
Categorical
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
jpg | 9999 |
---|---|
peg | 1 |
Length
Max length | 3 |
---|---|
Median length | 3 |
Mean length | 3 |
Min length | 3 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | jpg |
---|---|
2nd row | jpg |
3rd row | jpg |
4th row | jpg |
5th row | jpg |
Common Values
Value | Count | Frequency (%) |
jpg | 9999 | > 99.9% |
peg | 1 | < 0.1% |
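Every 확장자 value is exactly three characters long, and the single `peg` entry looks as if it could be the tail of `jpeg` cut to three characters. That is a guess, and the report does not confirm it. If so, taking the extension from the file name's suffix would avoid the truncation. A minimal sketch, where `extension` is a hypothetical helper and `example_1.jpeg` is a made-up file name:

```python
from pathlib import PurePosixPath

def extension(file_name):
    """Return the lowercased extension without the leading dot."""
    return PurePosixPath(file_name).suffix.lstrip(".").lower()

print(extension("B531R549-20001_1.jpg"))  # jpg
print(extension("example_1.jpeg"))        # jpeg
print(extension("example_1.jpeg")[-3:])   # peg, what a fixed 3-character cut would give
```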
Correlations
Variable | 저장경로 | 확장자 |
---|---|---|
저장경로 | 1.000 | 0.000 |
확장자 | 0.000 | 1.000 |
Index | 저장경로 | 이미지파일이름 | 확장자 |
---|---|---|---|
45791 | /certInfo/2020/09 | B531R549-20001_1.jpg | jpg |
90314 | /certInfo/2021/12 | CB015S0022-1012_1.jpg | jpg |
70548 | /certInfo/2021/06 | XU101439-21025A_1.jpg | jpg |
56297 | /certInfo/2021/02 | ZU10322-21005_1.jpg | jpg |
68730 | /certInfo/2021/06 | CA021A001-7004_2.jpg | jpg |
10059 | /certInfo/2020/02 | JH11983-20001_1.jpg | jpg |
74869 | /certInfo/2021/07 | CA021H004-1067_1.jpg | jpg |
96423 | /certInfo/2022/02 | CB015R0919-1016_2.jpg | jpg |
88503 | /certInfo/2021/12 | CB014R1624-1004_1.jpg | jpg |
29024 | /certInfo/2020/06 | CB065R0561-9002A_1.jpg | jpg |
Index | 저장경로 | 이미지파일이름 | 확장자 |
---|---|---|---|
49838 | /certInfo/2020/09 | HU071858-20018A_1.jpg | jpg |
12962 | /certInfo/2020/02 | CB061R5321-0002_1.jpg | jpg |
96649 | /certInfo/2022/02 | B675R0072-21001_1.jpg | jpg |
65325 | /certInfo/2021/05 | CB064R2790-1001_2.jpg | jpg |
37187 | /certInfo/2020/07 | CB061R169-5004B_4.jpg | jpg |
77728 | /certInfo/2021/08 | CB061T082-1002_3.jpg | jpg |
69648 | /certInfo/2021/06 | CA023H115-0048A_2.jpg | jpg |
28253 | /certInfo/2020/06 | ZU101167-20001_1.jpg | jpg |
14264 | /certInfo/2020/03 | B193R017-9002B_1.jpg | jpg |
62136 | /certInfo/2021/04 | CB061R6134-1001_1.jpg | jpg |