Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 390.6 KiB
Average record size in memory: 40.0 B
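
The memory figures above can be cross-checked with simple arithmetic. The sketch below multiplies the observation count by the average record size and converts the result to KiB. The names are illustrative, and the per-record breakdown assumes 8 bytes per value plus an 8-byte index entry per row, which the report itself does not state.

```python
# Cross-check of the reported memory figures (names are illustrative).
N_ROWS = 10_000            # "Number of observations"
AVG_RECORD_BYTES = 40.0    # "Average record size in memory"


def to_kib(n_bytes: float) -> float:
    """Convert a byte count to kibibytes (1 KiB = 1024 bytes)."""
    return n_bytes / 1024


total_kib = to_kib(N_ROWS * AVG_RECORD_BYTES)
print(f"Total size in memory: {total_kib:.1f} KiB")

# Assumption (not stated in the report): each of the 4 columns and the row
# index store 8 bytes per row, so one record takes (4 + 1) * 8 = 40 bytes.
ASSUMED_BYTES_PER_SLOT = 8
assumed_record_bytes = (4 + 1) * ASSUMED_BYTES_PER_SLOT
print(f"Assumed record size: {assumed_record_bytes} B")
```

Under that assumption the product of rows and record size reproduces the reported total.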

Variable types

Text: 3
Categorical: 1

Dataset

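Description: Gimhae City's AI-based bulky household waste training data, provided to lay a foundation for policy decisions and work-process improvement using big data.
Author: Gimhae City, Gyeongsangnam-do (경상남도 김해시)
URL: https://www.data.go.kr/data/15076741/fileData.do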

Alerts

파일명 has unique values (Unique)

Reproduction

Analysis started: 2024-03-11 03:35:06.954007
Analysis finished: 2024-03-11 03:35:08.541655
Duration: 1.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

파일명 (file name)
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 26
Median length: 25
Mean length: 25.0035
Min length: 25

Characters and Unicode

Total characters: 250035
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
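
As a minimal illustration of that idea, the sketch below counts the characters of one sample file name by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiler computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample file names from the report.
counts = category_counts("35554487_7592616ad1_1.jpg")

# Nd = Decimal Number, Ll = Lowercase Letter,
# Pc = Connector Punctuation, Po = Other Punctuation.
for code, n in sorted(counts.items()):
    print(code, n)
```

The four category codes that appear correspond to the four categories reported for this column.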

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 35554487_7592616ad1_1.jpg
2nd row: 35571127_75d0e94a97_1.jpg
3rd row: 33984936_92d1335a26_6.jpg
4th row: 33977908_79062b7092_3.jpg
5th row: 34004978_1773280082_1.jpg

Value                       Count   Frequency (%)
35554487_7592616ad1_1.jpg   1       < 0.1%
35539984_0d2514035c_3.jpg   1       < 0.1%
33978840_27c1c3df01_3.jpg   1       < 0.1%
35539213_43dd4ea640_2.jpg   1       < 0.1%
33999510_526fbcf087_5.jpg   1       < 0.1%
35566839_e75edc18a4_3.jpg   1       < 0.1%
35541448_aaa1bb097d_1.jpg   1       < 0.1%
33999417_a198cfea2a_1.jpg   1       < 0.1%
33977384_9c2737bf7b_2.jpg   1       < 0.1%
34014339_547948cc06_3.jpg   1       < 0.1%
Other values (9990)         9990    99.9%
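
The sample values suggest that each file name has three underscore-separated parts followed by `.jpg`: a run of digits, a run of hexadecimal characters, and a short numeric suffix. The following sketch parses that layout. The pattern and the group names (`batch`, `token`, `seq`) are assumptions inferred from the samples, not a documented naming scheme.

```python
import re
from typing import Optional

# Hypothetical layout inferred from the sample rows: <digits>_<hex>_<digits>.jpg
FILENAME_RE = re.compile(
    r"^(?P<batch>\d+)_(?P<token>[0-9a-f]+)_(?P<seq>\d+)\.jpg$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the three parts of a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


parsed = parse_filename("35554487_7592616ad1_1.jpg")
print(parsed)
```

File names that do not follow the assumed layout return `None`, which makes outliers easy to list.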

Most occurring characters

Value               Count   Frequency (%)
3                   28791   11.5%
_                   20000   8.0%
5                   18151   7.3%
9                   16899   6.8%
4                   13811   5.5%
1                   13562   5.4%
2                   12869   5.1%
7                   12598   5.0%
8                   12159   4.9%
0                   12089   4.8%
Other values (11)   89106   35.6%

Most occurring categories

Value                   Count    Frequency (%)
Decimal Number          152259   60.9%
Lowercase Letter        67776    27.1%
Connector Punctuation   20000    8.0%
Other Punctuation       10000    4.0%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
3       28791   18.9%
5       18151   11.9%
9       16899   11.1%
4       13811   9.1%
1       13562   8.9%
2       12869   8.5%
7       12598   8.3%
8       12159   8.0%
0       12089   7.9%
6       11330   7.4%

Lowercase Letter
Value   Count   Frequency (%)
j       10000   14.8%
p       10000   14.8%
g       10000   14.8%
c       6448    9.5%
f       6348    9.4%
a       6314    9.3%
b       6304    9.3%
e       6261    9.2%
d       6101    9.0%

Connector Punctuation
Value   Count   Frequency (%)
_       20000   100.0%

Other Punctuation
Value   Count   Frequency (%)
.       10000   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   182259   72.9%
Latin    67776    27.1%

Most frequent character per script

Common
Value              Count   Frequency (%)
3                  28791   15.8%
_                  20000   11.0%
5                  18151   10.0%
9                  16899   9.3%
4                  13811   7.6%
1                  13562   7.4%
2                  12869   7.1%
7                  12598   6.9%
8                  12159   6.7%
0                  12089   6.6%
Other values (2)   21330   11.7%

Latin
Value   Count   Frequency (%)
j       10000   14.8%
p       10000   14.8%
g       10000   14.8%
c       6448    9.5%
f       6348    9.4%
a       6314    9.3%
b       6304    9.3%
e       6261    9.2%
d       6101    9.0%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   250035   100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
3                   28791   11.5%
_                   20000   8.0%
5                   18151   7.3%
9                   16899   6.8%
4                   13811   5.5%
1                   13562   5.4%
2                   12869   5.1%
7                   12598   5.0%
8                   12159   4.9%
0                   12089   4.8%
Other values (11)   89106   35.6%

대분류 (major category)
Text

Distinct: 90
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 13
Mean length: 3.9681
Min length: 1

Characters and Unicode

Total characters: 39681
Distinct characters: 155
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: 텔레비전
2nd row: 의자
3rd row: 항아리
4th row: 항아리
5th row: 자전거

Value               Count   Frequency (%)
의자                2161    21.6%
텔레비전            937     9.4%
공기청정기및가습기  703     7.0%
에어컨및온풍기      612     6.1%
청소기              516     5.2%
소파                500     5.0%
(not preserved)     426     4.3%
실내조명등기구      385     3.9%
시계                322     3.2%
가방                298     3.0%
Other values (80)   3140    31.4%

Most occurring characters

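Value                Count   Frequency (%)
(not preserved)      4613    11.6%
(not preserved)      2415    6.1%
(not preserved)      2233    5.6%
(not preserved)      1753    4.4%
(not preserved)      1363    3.4%
(not preserved)      1318    3.3%
(not preserved)      1219    3.1%
(not preserved)      1144    2.9%
(not preserved)      1036    2.6%
(not preserved)      1024    2.6%
Other values (145)   21563   54.3%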

Most occurring categories

Value               Count   Frequency (%)
Other Letter        37870   95.4%
Open Punctuation    536     1.4%
Close Punctuation   536     1.4%
Other Punctuation   468     1.2%
Uppercase Letter    271     0.7%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
(not preserved)      4613    12.2%
(not preserved)      2415    6.4%
(not preserved)      2233    5.9%
(not preserved)      1753    4.6%
(not preserved)      1363    3.6%
(not preserved)      1318    3.5%
(not preserved)      1219    3.2%
(not preserved)      1144    3.0%
(not preserved)      1036    2.7%
(not preserved)      1024    2.7%
Other values (138)   19752   52.2%

Uppercase Letter
Value   Count   Frequency (%)
V       132     48.7%
T       125     46.1%
C       7       2.6%
P       7       2.6%

Open Punctuation
Value   Count   Frequency (%)
(       536     100.0%

Close Punctuation
Value   Count   Frequency (%)
)       536     100.0%

Other Punctuation
Value   Count   Frequency (%)
,       468     100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   37870   95.4%
Common   1540    3.9%
Latin    271     0.7%

Most frequent character per script

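Hangul
Value                Count   Frequency (%)
(not preserved)      4613    12.2%
(not preserved)      2415    6.4%
(not preserved)      2233    5.9%
(not preserved)      1753    4.6%
(not preserved)      1363    3.6%
(not preserved)      1318    3.5%
(not preserved)      1219    3.2%
(not preserved)      1144    3.0%
(not preserved)      1036    2.7%
(not preserved)      1024    2.7%
Other values (138)   19752   52.2%

Latin
Value   Count   Frequency (%)
V       132     48.7%
T       125     46.1%
C       7       2.6%
P       7       2.6%

Common
Value   Count   Frequency (%)
(       536     34.8%
)       536     34.8%
,       468     30.4%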

Most occurring blocks

Value    Count   Frequency (%)
Hangul   37870   95.4%
ASCII    1811    4.6%

Most frequent character per block

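Hangul
Value                Count   Frequency (%)
(not preserved)      4613    12.2%
(not preserved)      2415    6.4%
(not preserved)      2233    5.9%
(not preserved)      1753    4.6%
(not preserved)      1363    3.6%
(not preserved)      1318    3.5%
(not preserved)      1219    3.2%
(not preserved)      1144    3.0%
(not preserved)      1036    2.7%
(not preserved)      1024    2.7%
Other values (138)   19752   52.2%

ASCII
Value   Count   Frequency (%)
(       536     29.6%
)       536     29.6%
,       468     25.8%
V       132     7.3%
T       125     6.9%
C       7       0.4%
P       7       0.4%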

소분류 (subcategory)
Text

Distinct: 161
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 31
Median length: 21
Mean length: 11.0357
Min length: 6

Characters and Unicode

Total characters: 110357
Distinct characters: 235
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 10
Unique (%): 0.1%

Sample

1st row: 텔레비전_30인치이상
2nd row: 의자_사무용
3rd row: 항아리_7리터이상
4th row: 항아리_7리터이상
5th row: 자전거_성인용

Value                           Count   Frequency (%)
의자_편의용(안락,흔들,식탁      900     9.0%
텔레비전_30인치이상             822     8.2%
의자_사무용                     779     7.8%
공기청정기및가습기_높이1m미만   567     5.7%
의자_보조,간이                  482     4.8%
청소기_가정용(모든규격          431     4.3%
시계_벽걸이용                   320     3.2%
상_4인용미만                    300     3.0%
에어컨및온풍기_1.0㎡이상        278     2.8%
소파_3인용이상                  261     2.6%
Other values (151)              4860    48.6%

Most occurring characters

Value                Count   Frequency (%)
_                    10000   9.1%
(not preserved)      4820    4.4%
(not preserved)      4401    4.0%
(not preserved)      4140    3.8%
(not preserved)      3133    2.8%
(not preserved)      3023    2.7%
,                    2839    2.6%
0                    2524    2.3%
)                    2488    2.3%
(                    2488    2.3%
Other values (225)   70501   63.9%

Most occurring categories

Value                   Count   Frequency (%)
Other Letter            81029   73.4%
Connector Punctuation   10000   9.1%
Decimal Number          7743    7.0%
Other Punctuation       3610    3.3%
Close Punctuation       2488    2.3%
Open Punctuation        2488    2.3%
Other Symbol            1469    1.3%
Lowercase Letter        1224    1.1%
Uppercase Letter        271     0.2%
Math Symbol             35      < 0.1%

Most frequent character per category

Other Letter
Value                Count   Frequency (%)
(not preserved)      4820    5.9%
(not preserved)      4401    5.4%
(not preserved)      4140    5.1%
(not preserved)      3133    3.9%
(not preserved)      3023    3.7%
(not preserved)      2467    3.0%
(not preserved)      2266    2.8%
(not preserved)      2096    2.6%
(not preserved)      2075    2.6%
(not preserved)      1908    2.4%
Other values (196)   50700   62.6%

Decimal Number
Value   Count   Frequency (%)
0       2524    32.6%
1       1908    24.6%
3       1504    19.4%
4       622     8.0%
9       383     4.9%
5       333     4.3%
2       292     3.8%
8       112     1.4%
7       65      0.8%

Other Symbol
Value             Count   Frequency (%)
(not preserved)   728     49.6%
(not preserved)   491     33.4%
(not preserved)   240     16.3%
(not preserved)   7       0.5%
(not preserved)   3       0.2%

Uppercase Letter
Value   Count   Frequency (%)
V       132     48.7%
T       125     46.1%
P       7       2.6%
C       7       2.6%

Other Punctuation
Value   Count   Frequency (%)
,       2839    78.6%
.       762     21.1%
·       9       0.2%

Lowercase Letter
Value             Count   Frequency (%)
m                 960     78.4%
(not preserved)   212     17.3%
c                 52      4.2%

Math Symbol
Value   Count   Frequency (%)
+       28      80.0%
×       7       20.0%

Connector Punctuation
Value   Count   Frequency (%)
_       10000   100.0%

Close Punctuation
Value   Count   Frequency (%)
)       2488    100.0%

Open Punctuation
Value   Count   Frequency (%)
(       2488    100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   81029   73.4%
Common   28045   25.4%
Latin    1283    1.2%

Most frequent character per script

Hangul
Value                Count   Frequency (%)
(not preserved)      4820    5.9%
(not preserved)      4401    5.4%
(not preserved)      4140    5.1%
(not preserved)      3133    3.9%
(not preserved)      3023    3.7%
(not preserved)      2467    3.0%
(not preserved)      2266    2.8%
(not preserved)      2096    2.6%
(not preserved)      2075    2.6%
(not preserved)      1908    2.4%
Other values (196)   50700   62.6%

Common
Value               Count   Frequency (%)
_                   10000   35.7%
,                   2839    10.1%
0                   2524    9.0%
)                   2488    8.9%
(                   2488    8.9%
1                   1908    6.8%
3                   1504    5.4%
.                   762     2.7%
(not preserved)     728     2.6%
4                   622     2.2%
Other values (13)   2182    7.8%

Latin
Value   Count   Frequency (%)
m       960     74.8%
V       132     10.3%
T       125     9.7%
c       52      4.1%
P       7       0.5%
C       7       0.5%

Most occurring blocks

Value                Count   Frequency (%)
Hangul               81029   73.4%
ASCII                27631   25.0%
CJK Compat           1469    1.3%
Letterlike Symbols   212     0.2%
None                 16      < 0.1%

Most frequent character per block

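ASCII
Value               Count   Frequency (%)
_                   10000   36.2%
,                   2839    10.3%
0                   2524    9.1%
)                   2488    9.0%
(                   2488    9.0%
1                   1908    6.9%
3                   1504    5.4%
m                   960     3.5%
.                   762     2.8%
4                   622     2.3%
Other values (11)   1536    5.6%

Hangul
Value                Count   Frequency (%)
(not preserved)      4820    5.9%
(not preserved)      4401    5.4%
(not preserved)      4140    5.1%
(not preserved)      3133    3.9%
(not preserved)      3023    3.7%
(not preserved)      2467    3.0%
(not preserved)      2266    2.8%
(not preserved)      2096    2.6%
(not preserved)      2075    2.6%
(not preserved)      1908    2.4%
Other values (196)   50700   62.6%

CJK Compat
Value             Count   Frequency (%)
(not preserved)   728     49.6%
(not preserved)   491     33.4%
(not preserved)   240     16.3%
(not preserved)   7       0.5%
(not preserved)   3       0.2%

Letterlike Symbols
Value             Count   Frequency (%)
(not preserved)   212     100.0%

None
Value   Count   Frequency (%)
·       9       56.2%
×       7       43.8%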

등급구분 (grade)
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value   Count
A       4046
B       3040
C       1960
D       954

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: C
4th row: C
5th row: C

Common Values

Value   Count   Frequency (%)
A       4046    40.5%
B       3040    30.4%
C       1960    19.6%
D       954     9.5%
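
The frequency column is simply each count divided by the total number of rows. A minimal sketch of that calculation, using the four counts from the table above, is shown here; the function name `frequency_table` is illustrative.

```python
def frequency_table(counts: dict[str, int]) -> list[tuple[str, int, str]]:
    """Return (value, count, percentage) rows sorted by descending count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, n, f"{n / total:.1%}") for value, n in ordered]


# Counts copied from the Common Values table for 등급구분.
grade_counts = {"A": 4046, "B": 3040, "C": 1960, "D": 954}
table = frequency_table(grade_counts)

for value, n, pct in table:
    print(f"{value}\t{n}\t{pct}")
```

The counts sum to 10000, which matches the number of observations in the overview.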

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
a       4046    40.5%
b       3040    30.4%
c       1960    19.6%
d       954     9.5%

Correlations

           대분류   등급구분
대분류     1.000    0.981
등급구분   0.981    1.000
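
The report does not state which association measure produced the 0.981 coefficient. For pairs of categorical columns a common choice is Cramér's V, and the sketch below shows a plain, uncorrected version in standard-library Python. Treat the choice of measure, the function name `cramers_v`, and the toy data as assumptions; the result on the real data may differ from the reported value, for example if a bias correction is applied.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs: list[str], ys: list[str]) -> float:
    """Uncorrected Cramér's V between two equal-length label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) pair
    row = Counter(xs)              # marginal counts of x
    col = Counter(ys)              # marginal counts of y
    chi2 = 0.0
    for x, row_n in row.items():
        for y, col_n in col.items():
            expected = row_n * col_n / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: the grade is fully determined by the category.
dependent = cramers_v(["의자", "의자", "항아리", "항아리"], ["A", "A", "C", "C"])
# Toy data: the grade is unrelated to the category.
independent = cramers_v(["의자", "의자", "항아리", "항아리"], ["A", "C", "A", "C"])
print(dependent, independent)
```

A value near 1 means that knowing one column almost determines the other, which is consistent with the sample rows, where each 대분류 value tends to appear with a single grade.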

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index   파일명                      대분류           소분류                            등급구분
11739   35554487_7592616ad1_1.jpg   텔레비전         텔레비전_30인치이상               A
7812    35571127_75d0e94a97_1.jpg   의자             의자_사무용                       A
26165   33984936_92d1335a26_6.jpg   항아리           항아리_7리터이상                  C
26417   33977908_79062b7092_3.jpg   항아리           항아리_7리터이상                  C
25820   34004978_1773280082_1.jpg   자전거           자전거_성인용                     C
26173   33976461_6ac59238cb_2.jpg   청소기           청소기_가정용(모든규격)           C
22722   33993808_e6e45f5f3a_6.jpg   의료기           의료기_일반                       C
2066    35563876_9bf6d4d7e5_5.jpg   에어컨및온풍기   에어컨및온풍기_1.0㎡이상          A
10367   33975548_1c8ecc405e_3.jpg   의자             의자_사무용                       A
27910   33969017_cbbbb83434_2.jpg   소화기           소화기_3.5㎏이하(약제기준)        D

Index   파일명                      대분류           소분류                            등급구분
659     35549193_27060a28e0_3.jpg   침대             침대_2인용(일반)                  A
328     34005481_427f0650f3_1.jpg   에어컨및온풍기   에어컨및온풍기_1.0㎡이상          A
1621    33994617_b5c3920d7d_1.jpg   의자             의자_편의용(안락,흔들,식탁)      A
11761   33970323_b97faac8eb_1.jpg   의자             의자_사무용                       A
3570    33986902_c2f03d733b_2.jpg   의자             의자_편의용(안락,흔들,식탁)      A
26972   33992801_ea05e4b3fb_6.jpg   청소기           청소기_가정용(모든규격)           C
25130   33970819_c45416e2fd_3.jpg   시계             시계_벽걸이용                     C
5066    33968831_431edcd41d_3.jpg   의자             의자_편의용(안락,흔들,식탁)      A
7100    35554734_f77853c7c2_5.jpg   의자             의자_사무용                       A
13422   33982911_88df1a021a_2.jpg   피아노           피아노_어프라이트                 B
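
In the sample rows, every 소분류 value appears to consist of its 대분류 value, an underscore, and a detail string. The sketch below splits a 소분류 value on its first underscore and checks that relationship on a few rows copied from the sample. The helper name `split_subcategory` is hypothetical, and the rule is an observation from these rows, not a documented guarantee for the full dataset.

```python
def split_subcategory(sub: str) -> tuple[str, str]:
    """Split a 소분류 value into (major category, detail) at the first underscore."""
    major, _, detail = sub.partition("_")
    return major, detail


# (대분류, 소분류) pairs copied from the sample rows.
rows = [
    ("텔레비전", "텔레비전_30인치이상"),
    ("의자", "의자_사무용"),
    ("청소기", "청소기_가정용(모든규격)"),
    ("에어컨및온풍기", "에어컨및온풍기_1.0㎡이상"),
    ("피아노", "피아노_어프라이트"),
]

# Collect rows where the prefix of 소분류 does not equal 대분류.
mismatches = [(major, sub) for major, sub in rows
              if split_subcategory(sub)[0] != major]
print(mismatches)
```

An empty list means the prefix rule holds for the checked rows; on the full data, any entries in the list would point to labelling inconsistencies worth reviewing.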