Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B

Variable types

Text: 3
Categorical: 1

Dataset

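Description: Training data for Gimhae City's AI-based bulky household waste system, published to lay the groundwork for big-data-driven policy decisions and process improvement
Author: Gimhae City, Gyeongsangnam-do
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15076741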

Alerts

파일명 has unique values (Unique)

Reproduction

Analysis started: 2024-03-11 03:33:14.232121
Analysis finished: 2024-03-11 03:33:15.712207
Duration: 1.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
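
The reported duration can be checked against the two timestamps. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2024-03-11 03:33:14.232121")
finished = datetime.fromisoformat("2024-03-11 03:33:15.712207")

# Elapsed wall-clock time in seconds between start and finish.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```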

Variables

파일명
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 25
Mean length: 25.0054
Min length: 25

Characters and Unicode

Total characters: 250054
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
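
As an illustration of that kind of analysis, the sketch below counts characters by Unicode general category for the five sample file names listed in this report, using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables. The two-letter codes map to the labels used in this report: `Nd` is Decimal Number, `Ll` is Lowercase Letter, `Pc` is Connector Punctuation, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter

# The five sample values shown for the 파일명 column in this report.
samples = [
    "33980545_b4b664be76_3.jpg",
    "33981417_8d3ceec2d9_1.jpg",
    "33987212_e3e8135854_2.jpg",
    "33983923_62836e2906_1.jpg",
    "33983829_b0f05d4615_3.jpg",
]


def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Nd', 'Ll')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(samples)
for code, n in counts.most_common():
    print(code, n)
```

On these five names the sketch finds the same four categories that the report lists for the full column; the counts themselves cover only the sample, not all 10000 rows.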

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 33980545_b4b664be76_3.jpg
2nd row: 33981417_8d3ceec2d9_1.jpg
3rd row: 33987212_e3e8135854_2.jpg
4th row: 33983923_62836e2906_1.jpg
5th row: 33983829_b0f05d4615_3.jpg

Value | Count | Frequency (%)
33980545_b4b664be76_3.jpg | 1 | < 0.1%
35564736_75b63e21d8_2.jpg | 1 | < 0.1%
33983241_114041329a_3.jpg | 1 | < 0.1%
34002711_dd0d7d3b66_3.jpg | 1 | < 0.1%
33990188_bb378f05cb_3.jpg | 1 | < 0.1%
34013509_5e81b69731_1.jpg | 1 | < 0.1%
33988095_d5eee9aaab_1.jpg | 1 | < 0.1%
34014785_0b0f07c4a2_4.jpg | 1 | < 0.1%
33976117_bd3c8e2bc9_2.jpg | 1 | < 0.1%
33983002_e6e876ffa3_2.jpg | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

ValueCountFrequency (%)
3 28933
 
11.6%
_ 20000
 
8.0%
5 18078
 
7.2%
9 16901
 
6.8%
1 13684
 
5.5%
4 13564
 
5.4%
2 12933
 
5.2%
7 12770
 
5.1%
0 12329
 
4.9%
8 12072
 
4.8%
Other values (11) 88790
35.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 152617
61.0%
Lowercase Letter 67437
27.0%
Connector Punctuation 20000
 
8.0%
Other Punctuation 10000
 
4.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 28933
19.0%
5 18078
11.8%
9 16901
11.1%
1 13684
9.0%
4 13564
8.9%
2 12933
8.5%
7 12770
8.4%
0 12329
8.1%
8 12072
7.9%
6 11353
 
7.4%
Lowercase Letter
ValueCountFrequency (%)
j 10000
14.8%
p 10000
14.8%
g 10000
14.8%
a 6401
9.5%
c 6280
9.3%
f 6256
9.3%
b 6238
9.3%
e 6207
9.2%
d 6055
9.0%
Connector Punctuation
ValueCountFrequency (%)
_ 20000
100.0%
Other Punctuation
ValueCountFrequency (%)
. 10000
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 182617
73.0%
Latin 67437
 
27.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 28933
15.8%
_ 20000
11.0%
5 18078
9.9%
9 16901
9.3%
1 13684
7.5%
4 13564
7.4%
2 12933
7.1%
7 12770
7.0%
0 12329
6.8%
8 12072
6.6%
Other values (2) 21353
11.7%
Latin
ValueCountFrequency (%)
j 10000
14.8%
p 10000
14.8%
g 10000
14.8%
a 6401
9.5%
c 6280
9.3%
f 6256
9.3%
b 6238
9.3%
e 6207
9.2%
d 6055
9.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 250054
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 28933
 
11.6%
_ 20000
 
8.0%
5 18078
 
7.2%
9 16901
 
6.8%
1 13684
 
5.5%
4 13564
 
5.4%
2 12933
 
5.2%
7 12770
 
5.1%
0 12329
 
4.9%
8 12072
 
4.8%
Other values (11) 88790
35.5%

대분류
Text

Distinct: 90
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 13
Mean length: 4.0069
Min length: 1

Characters and Unicode

Total characters: 40069
Distinct characters: 156
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 청소기
2nd row: 냉장고
3rd row:
4th row: 청소기
5th row: 공기청정기및가습기

Value | Count | Frequency (%)
의자 | 2128 | 21.3%
텔레비전 | 902 | 9.0%
공기청정기및가습기 | 687 | 6.9%
에어컨및온풍기 | 627 | 6.3%
소파 | 523 | 5.2%
청소기 | 484 | 4.8%
 | 392 | 3.9%
실내조명등기구 | 365 | 3.6%
시계 | 339 | 3.4%
가방 | 285 | 2.9%
Other values (80) | 3268 | 32.7%

Most occurring characters

ValueCountFrequency (%)
4549
 
11.4%
2412
 
6.0%
2209
 
5.5%
1876
 
4.7%
1343
 
3.4%
1316
 
3.3%
1171
 
2.9%
1127
 
2.8%
1010
 
2.5%
995
 
2.5%
Other values (146) 22061
55.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 38085
95.0%
Close Punctuation 597
 
1.5%
Open Punctuation 597
 
1.5%
Other Punctuation 510
 
1.3%
Uppercase Letter 280
 
0.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4549
 
11.9%
2412
 
6.3%
2209
 
5.8%
1876
 
4.9%
1343
 
3.5%
1316
 
3.5%
1171
 
3.1%
1127
 
3.0%
1010
 
2.7%
995
 
2.6%
Other values (139) 20077
52.7%
Uppercase Letter
ValueCountFrequency (%)
V 135
48.2%
T 125
44.6%
P 10
 
3.6%
C 10
 
3.6%
Close Punctuation
ValueCountFrequency (%)
) 597
100.0%
Open Punctuation
ValueCountFrequency (%)
( 597
100.0%
Other Punctuation
ValueCountFrequency (%)
, 510
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 38085
95.0%
Common 1704
 
4.3%
Latin 280
 
0.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4549
 
11.9%
2412
 
6.3%
2209
 
5.8%
1876
 
4.9%
1343
 
3.5%
1316
 
3.5%
1171
 
3.1%
1127
 
3.0%
1010
 
2.7%
995
 
2.6%
Other values (139) 20077
52.7%
Latin
ValueCountFrequency (%)
V 135
48.2%
T 125
44.6%
P 10
 
3.6%
C 10
 
3.6%
Common
ValueCountFrequency (%)
) 597
35.0%
( 597
35.0%
, 510
29.9%

Most occurring blocks

ValueCountFrequency (%)
Hangul 38085
95.0%
ASCII 1984
 
5.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4549
 
11.9%
2412
 
6.3%
2209
 
5.8%
1876
 
4.9%
1343
 
3.5%
1316
 
3.5%
1171
 
3.1%
1127
 
3.0%
1010
 
2.7%
995
 
2.6%
Other values (139) 20077
52.7%
ASCII
ValueCountFrequency (%)
) 597
30.1%
( 597
30.1%
, 510
25.7%
V 135
 
6.8%
T 125
 
6.3%
P 10
 
0.5%
C 10
 
0.5%

소분류
Text

Distinct: 159
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 31
Median length: 21
Mean length: 11.0185
Min length: 6

Characters and Unicode

Total characters: 110185
Distinct characters: 235
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 0.1%

Sample

1st row: 청소기_가정용(모든규격)
2nd row: 냉장고_500ℓ이상
3rd row: 상_4인용미만
4th row: 청소기_가정용(모든규격)
5th row: 공기청정기및가습기_높이1m미만

Value | Count | Frequency (%)
의자_편의용(안락,흔들,식탁 | 843 | 8.4%
의자_사무용 | 785 | 7.8%
텔레비전_30인치이상 | 779 | 7.8%
공기청정기및가습기_높이1m미만 | 548 | 5.5%
의자_보조,간이 | 500 | 5.0%
청소기_가정용(모든규격 | 407 | 4.1%
시계_벽걸이용 | 337 | 3.4%
에어컨및온풍기_1.0㎡이상 | 300 | 3.0%
상_4인용미만 | 282 | 2.8%
소파_3인용이상 | 272 | 2.7%
Other values (149) | 4947 | 49.5%

Most occurring characters

ValueCountFrequency (%)
_ 10000
 
9.1%
4751
 
4.3%
4475
 
4.1%
4062
 
3.7%
3052
 
2.8%
2989
 
2.7%
, 2785
 
2.5%
0 2598
 
2.4%
2466
 
2.2%
) 2457
 
2.2%
Other values (225) 70550
64.0%

Most occurring categories

ValueCountFrequency (%)
Other Letter 80829
73.4%
Connector Punctuation 10000
 
9.1%
Decimal Number 7805
 
7.1%
Other Punctuation 3557
 
3.2%
Close Punctuation 2457
 
2.2%
Open Punctuation 2457
 
2.2%
Other Symbol 1506
 
1.4%
Lowercase Letter 1263
 
1.1%
Uppercase Letter 280
 
0.3%
Math Symbol 31
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4751
 
5.9%
4475
 
5.5%
4062
 
5.0%
3052
 
3.8%
2989
 
3.7%
2466
 
3.1%
2230
 
2.8%
2212
 
2.7%
2030
 
2.5%
1919
 
2.4%
Other values (196) 50643
62.7%
Decimal Number
ValueCountFrequency (%)
0 2598
33.3%
1 1930
24.7%
3 1478
18.9%
4 593
 
7.6%
9 403
 
5.2%
5 337
 
4.3%
2 275
 
3.5%
8 115
 
1.5%
7 76
 
1.0%
Other Symbol
ValueCountFrequency (%)
761
50.5%
498
33.1%
235
 
15.6%
10
 
0.7%
2
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
V 135
48.2%
T 125
44.6%
C 10
 
3.6%
P 10
 
3.6%
Other Punctuation
ValueCountFrequency (%)
, 2785
78.3%
. 762
 
21.4%
· 10
 
0.3%
Lowercase Letter
ValueCountFrequency (%)
m 957
75.8%
252
 
20.0%
c 54
 
4.3%
Math Symbol
ValueCountFrequency (%)
+ 21
67.7%
× 10
32.3%
Connector Punctuation
ValueCountFrequency (%)
_ 10000
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2457
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2457
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 80829
73.4%
Common 28065
 
25.5%
Latin 1291
 
1.2%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4751
 
5.9%
4475
 
5.5%
4062
 
5.0%
3052
 
3.8%
2989
 
3.7%
2466
 
3.1%
2230
 
2.8%
2212
 
2.7%
2030
 
2.5%
1919
 
2.4%
Other values (196) 50643
62.7%
Common
ValueCountFrequency (%)
_ 10000
35.6%
, 2785
 
9.9%
0 2598
 
9.3%
) 2457
 
8.8%
( 2457
 
8.8%
1 1930
 
6.9%
3 1478
 
5.3%
. 762
 
2.7%
761
 
2.7%
4 593
 
2.1%
Other values (13) 2244
 
8.0%
Latin
ValueCountFrequency (%)
m 957
74.1%
V 135
 
10.5%
T 125
 
9.7%
c 54
 
4.2%
C 10
 
0.8%
P 10
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 80829
73.4%
ASCII 27578
 
25.0%
CJK Compat 1506
 
1.4%
Letterlike Symbols 252
 
0.2%
None 20
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
_ 10000
36.3%
, 2785
 
10.1%
0 2598
 
9.4%
) 2457
 
8.9%
( 2457
 
8.9%
1 1930
 
7.0%
3 1478
 
5.4%
m 957
 
3.5%
. 762
 
2.8%
4 593
 
2.2%
Other values (11) 1561
 
5.7%
Hangul
ValueCountFrequency (%)
4751
 
5.9%
4475
 
5.5%
4062
 
5.0%
3052
 
3.8%
2989
 
3.7%
2466
 
3.1%
2230
 
2.8%
2212
 
2.7%
2030
 
2.5%
1919
 
2.4%
Other values (196) 50643
62.7%
CJK Compat
ValueCountFrequency (%)
761
50.5%
498
33.1%
235
 
15.6%
10
 
0.7%
2
 
0.1%
Letterlike Symbols
ValueCountFrequency (%)
252
100.0%
None
ValueCountFrequency (%)
· 10
50.0%
× 10
50.0%

등급구분
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

A: 4026
B: 3008
C: 1971
D: 995

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: A
3rd row: B
4th row: C
5th row: B

Common Values

Value | Count | Frequency (%)
A | 4026 | 40.3%
B | 3008 | 30.1%
C | 1971 | 19.7%
D | 995 | 10.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
a | 4026 | 40.3%
b | 3008 | 30.1%
c | 1971 | 19.7%
d | 995 | 10.0%

Correlations

Variable | 대분류 | 등급구분
대분류 | 1.000 | 0.981
등급구분 | 0.981 | 1.000
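
The report does not state which association measure produced the 0.981 value. One measure commonly used for pairs of categorical variables is Cramér's V, and the sketch below shows an uncorrected version of it under the assumption that a measure of this kind was used here. The function name `cramers_v` is illustrative, and the input rows are the nine sample rows of this report that have a visible 대분류 value, so the result is not expected to reproduce 0.981.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    row = Counter(xs)  # marginal counts of the first variable
    col = Counter(ys)  # marginal counts of the second variable
    k = min(len(row), len(col)) - 1
    if n == 0 or k <= 0:
        # Undefined when either variable has fewer than two levels.
        return 0.0
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# (대분류, 등급구분) pairs taken from the sample rows of this report.
rows = [
    ("청소기", "C"),
    ("냉장고", "A"),
    ("청소기", "C"),
    ("공기청정기및가습기", "B"),
    ("의자", "A"),
    ("청소기", "C"),
    ("자전거", "C"),
    ("전기밥솥", "D"),
    ("수족관", "C"),
]
categories = [r[0] for r in rows]
grades = [r[1] for r in rows]
print(round(cramers_v(categories, grades), 3))
```

A value close to 1 would indicate that the grade is almost fully determined by the top-level category, which is how the 0.981 in the matrix can be read if it is a measure of this kind.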

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 파일명 | 대분류 | 소분류 | 등급구분
21502 | 33980545_b4b664be76_3.jpg | 청소기 | 청소기_가정용(모든규격) | C
3961 | 33981417_8d3ceec2d9_1.jpg | 냉장고 | 냉장고_500ℓ이상 | A
12111 | 33987212_e3e8135854_2.jpg |  | 상_4인용미만 | B
24900 | 33983923_62836e2906_1.jpg | 청소기 | 청소기_가정용(모든규격) | C
14372 | 33983829_b0f05d4615_3.jpg | 공기청정기및가습기 | 공기청정기및가습기_높이1m미만 | B
2325 | 34009577_57b153a361_2.jpg | 의자 | 의자_사무용 | A
21627 | 33980211_043aee3dd9_3.jpg | 청소기 | 청소기_가정용(모든규격) | C
26202 | 33979785_b304ead8c7_1.jpg | 자전거 | 자전거_성인용 | C
28018 | 33971403_d3e55c1525_1.jpg | 전기밥솥 | 전기밥솥_모든규격 | D
21408 | 33989366_c8a4015079_3.jpg | 수족관 | 수족관_가로90㎝이상 | C

Index | 파일명 | 대분류 | 소분류 | 등급구분
11548 | 35567368_33497162c8_1.jpg | 냉장고 | 냉장고_500ℓ이상 | A
21731 | 33989179_62ab54fa42_2.jpg | 청소기 | 청소기_가정용(모든규격) | C
25414 | 34003792_50d99af3ea_3.jpg | 온장고 | 온장고_높이50cm미만 | C
7305 | 35538970_6538b18daf_1.jpg | 의자 | 의자_편의용(안락,흔들,식탁) | A
20503 | 33981682_64bb7b2ea4_3.jpg |  | 상_4인용미만 | B
28437 | 33977102_9caa3283f1_5.jpg | 쌀통 | 쌀통_모든규격 | D
15120 | 34002698_7875a8212c_1.jpg | 에어컨및온풍기 | 에어컨및온풍기_0.5㎡미만 | B
8443 | 35546946_feb337d611_2.jpg | 텔레비전 | 텔레비전_30인치이상 | A
21919 | 33990508_8a79efb1ed_3.jpg | 청소기 | 청소기_가정용(모든규격) | C
7958 | 35553648_3f5aae1918_3.jpg | 진열장(장식장,책장,찬장) | 진열장(장식장,책장,찬장)_가로90㎝미만 | A
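
The sample rows suggest two regularities that a consumer of this dataset might want to verify: file names appear to follow a `<digits>_<10 hex characters>_<digits>.jpg` layout, and each 소분류 value appears to begin with its 대분류 value followed by an underscore. Both are inferences from the rows shown here, not documented rules of the dataset. The sketch below checks them; the names `FILENAME_RE`, `parse_filename`, and `subcategory_matches`, as well as the field names `id`, `token`, and `seq`, are hypothetical.

```python
import re

# Assumed layout, inferred from the sample rows only:
# <numeric id>_<10 lowercase hex characters>_<sequence number>.jpg
FILENAME_RE = re.compile(r"^(\d+)_([0-9a-f]{10})_(\d+)\.jpg$")


def parse_filename(name):
    """Split a file name into its assumed parts, or return None if it does not match."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return {"id": m.group(1), "token": m.group(2), "seq": int(m.group(3))}


def subcategory_matches(category, subcategory):
    """Check that a 소분류 value starts with its 대분류 value plus an underscore."""
    return subcategory.startswith(category + "_")


print(parse_filename("33980545_b4b664be76_3.jpg"))
print(subcategory_matches("청소기", "청소기_가정용(모든규격)"))
```

Rows that fail either check would be worth inspecting by hand before the labels are used for training.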