Overview

Dataset statistics

Number of variables: 3
Number of observations: 3999
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 104
Duplicate rows (%): 2.6%
Total size in memory: 93.9 KiB
Average record size in memory: 24.0 B
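
The two derived figures in this block follow from the raw ones by simple arithmetic. The sketch below recomputes them from the numbers printed above; it assumes that 1 KiB is 1024 bytes and that the average record size is the total memory size divided by the number of observations.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_rows = 3999
n_duplicate_rows = 104
total_size_kib = 93.9

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average bytes per record, assuming 1 KiB = 1024 B.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```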

Variable types

Text: 1
Categorical: 2

Dataset

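Description: Food list data used when entering a meal diary in the public health center mobile healthcare app. It provides the food name (음식명), category (카테고리), and intake unit (섭취단위).
Author: Korea Health Promotion Institute (한국건강증진개발원)
URL: https://www.data.go.kr/data/15068785/fileData.do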

Alerts

Duplicates: Dataset has 104 (2.6%) duplicate rows
Imbalance: 섭취단위 is highly imbalanced (86.2%)
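
The report does not say how the imbalance percentage is derived. A minimal sketch, assuming the score is one minus the Shannon entropy of the class distribution divided by the logarithm of the number of classes, comes out close to the reported 86.2% when fed the 섭취단위 value counts listed later in this report. The function name `imbalance_score` is hypothetical, and the three smallest counts (2, 1, 1) are inferred from "Other values (3) 4" together with "Unique 2", not read directly from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / log(number of classes)).

    0 means perfectly balanced; values near 1 mean one class dominates.
    This formula is an assumption, not taken from the report itself.
    """
    n_classes = len(counts)
    if n_classes < 2:
        # Undefined for fewer than two classes; 0.0 is an arbitrary convention.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(n_classes)

# Value counts of 섭취단위 (13 distinct values, 3999 rows in total).
# The last three counts are inferred, see the paragraph above.
counts = [3708, 139, 109, 14, 7, 6, 5, 3, 2, 2, 2, 1, 1]
score = imbalance_score(counts)
print(f"Imbalance: {score:.1%}")
```

A perfectly uniform distribution gives a score of 0 under this definition, so a value this high reflects the single unit that accounts for 92.7% of the rows.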

Reproduction

Analysis started: 2023-12-12 14:38:41.242815
Analysis finished: 2023-12-12 14:38:41.899926
Duration: 0.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

음식명
Text

Distinct: 3862
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

Length

Max length: 68
Median length: 45
Mean length: 11.691173
Min length: 2
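
As an illustration of how such length statistics can be computed, the sketch below measures string lengths in Unicode code points for the five sample values of this variable shown later in the report. The helper `length_summary` is hypothetical, and counting code points with `len` is an assumption about how the report measures length. The last line cross-checks the reported mean against total characters (46753) divided by observations (3999).

```python
from statistics import mean, median

def length_summary(values):
    """Summarize string lengths, counted in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The five sample rows listed for this variable in the report.
sample = [
    "팥죽",
    "해산물을 곁들인 매콤한 토마토 스튜",
    "호박죽",
    "흑임자 두부 죽",
    "Oh!징어버거(삼강)",
]
summary = length_summary(sample)
print(summary)

# Cross-check of the reported mean length for the full column.
reported_mean = round(46753 / 3999, 6)
print(reported_mean)
```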

Characters and Unicode

Total characters: 46753
Distinct characters: 920
Distinct categories: 14
Distinct scripts: 4
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
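
A minimal sketch of this kind of analysis, using Python's standard `unicodedata` module to tally the general category of each character in one sample value from this report. The module returns two-letter codes rather than the long names used in the tables below (for example `Lo` is Other Letter, `Lu` is Uppercase Letter, `Ps` is Open Punctuation). Script and block lookups are not part of the standard library and are left out here; the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample values shown for this variable.
sample = "Oh!징어버거(삼강)"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```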

Unique

Unique: 3754
Unique (%): 93.9%
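
The "Distinct" and "Unique" counts differ, which suggests they measure different things. The sketch below assumes that "distinct" counts the different values present and "unique" counts only the values that occur exactly once; the helper name `distinct_and_unique` is hypothetical, and the short list reuses food names that appear in the sample rows of this report.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Small illustrative list; 미니약과 appears three times in the report's last rows.
names = ["팥죽", "호박죽", "미니약과", "미니약과", "미니약과", "미니양갱"]
distinct, unique = distinct_and_unique(names)
print(distinct, unique)
```

Under this reading, the gap between 3862 distinct and 3754 unique values means 108 food names occur more than once.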

Sample

1st row: 팥죽
2nd row: 해산물을 곁들인 매콤한 토마토 스튜
3rd row: 호박죽
4th row: 흑임자 두부 죽
5th row: Oh!징어버거(삼강)

Value | Count | Frequency (%)
생것 | 170 | 1.9%
유기농 | 68 | 0.8%
풀무원 | 65 | 0.7%
초콜릿 | 62 | 0.7%
데친것 | 58 | 0.6%
미니 | 58 | 0.6%
마른것 | 50 | 0.6%
청정원 | 44 | 0.5%
오렌지 | 34 | 0.4%
우유 | 33 | 0.4%
Other values (4573) | 8300 | 92.8%

Most occurring characters

Value | Count | Frequency (%)
(space) | 4947 | 10.6%
 | 1162 | 2.5%
, | 1038 | 2.2%
) | 1035 | 2.2%
( | 1026 | 2.2%
 | 841 | 1.8%
 | 789 | 1.7%
 | 627 | 1.3%
 | 551 | 1.2%
 | 550 | 1.2%
Other values (910) | 34187 | 73.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 36097 | 77.2%
Space Separator | 4947 | 10.6%
Other Punctuation | 1279 | 2.7%
Close Punctuation | 1046 | 2.2%
Open Punctuation | 1037 | 2.2%
Decimal Number | 834 | 1.8%
Uppercase Letter | 673 | 1.4%
Lowercase Letter | 463 | 1.0%
Connector Punctuation | 154 | 0.3%
Other Symbol | 142 | 0.3%
Other values (4) | 81 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 1162 | 3.2%
 | 841 | 2.3%
 | 789 | 2.2%
 | 627 | 1.7%
 | 551 | 1.5%
 | 550 | 1.5%
 | 549 | 1.5%
 | 503 | 1.4%
 | 492 | 1.4%
 | 487 | 1.3%
Other values (827) | 29546 | 81.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 78 | 11.6%
C | 50 | 7.4%
L | 44 | 6.5%
M | 44 | 6.5%
O | 41 | 6.1%
A | 40 | 5.9%
B | 37 | 5.5%
R | 37 | 5.5%
I | 37 | 5.5%
E | 35 | 5.2%
Other values (15) | 230 | 34.2%

Lowercase Letter
Value | Count | Frequency (%)
g | 89 | 19.2%
a | 43 | 9.3%
l | 36 | 7.8%
e | 28 | 6.0%
o | 27 | 5.8%
s | 26 | 5.6%
t | 20 | 4.3%
i | 17 | 3.7%
c | 17 | 3.7%
n | 17 | 3.7%
Other values (14) | 143 | 30.9%

Other Punctuation
Value | Count | Frequency (%)
, | 1038 | 81.2%
& | 116 | 9.1%
. | 42 | 3.3%
% | 27 | 2.1%
* | 27 | 2.1%
/ | 13 | 1.0%
; | 5 | 0.4%
' | 3 | 0.2%
? | 2 | 0.2%
· | 2 | 0.2%
Other values (2) | 4 | 0.3%

Decimal Number
Value | Count | Frequency (%)
0 | 263 | 31.5%
1 | 159 | 19.1%
2 | 107 | 12.8%
3 | 88 | 10.6%
5 | 56 | 6.7%
4 | 42 | 5.0%
9 | 37 | 4.4%
6 | 30 | 3.6%
8 | 29 | 3.5%
7 | 23 | 2.8%

Close Punctuation
Value | Count | Frequency (%)
) | 1035 | 98.9%
] | 11 | 1.1%

Open Punctuation
Value | Count | Frequency (%)
( | 1026 | 98.9%
[ | 11 | 1.1%

Other Symbol
Value | Count | Frequency (%)
 | 141 | 99.3%
 | 1 | 0.7%

Space Separator
Value | Count | Frequency (%)
(space) | 4947 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 154 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 65 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 10 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 5 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 36227 | 77.5%
Common | 9378 | 20.1%
Latin | 1137 | 2.4%
Han | 11 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 1162 | 3.2%
 | 841 | 2.3%
 | 789 | 2.2%
 | 627 | 1.7%
 | 551 | 1.5%
 | 550 | 1.5%
 | 549 | 1.5%
 | 503 | 1.4%
 | 492 | 1.4%
 | 487 | 1.3%
Other values (823) | 29676 | 81.9%

Latin
Value | Count | Frequency (%)
g | 89 | 7.8%
S | 78 | 6.9%
C | 50 | 4.4%
L | 44 | 3.9%
M | 44 | 3.9%
a | 43 | 3.8%
O | 41 | 3.6%
A | 40 | 3.5%
B | 37 | 3.3%
R | 37 | 3.3%
Other values (40) | 634 | 55.8%

Common
Value | Count | Frequency (%)
(space) | 4947 | 52.8%
, | 1038 | 11.1%
) | 1035 | 11.0%
( | 1026 | 10.9%
0 | 263 | 2.8%
1 | 159 | 1.7%
_ | 154 | 1.6%
& | 116 | 1.2%
2 | 107 | 1.1%
3 | 88 | 0.9%
Other values (22) | 445 | 4.7%

Han
Value | Count | Frequency (%)
 | 5 | 45.5%
 | 2 | 18.2%
 | 2 | 18.2%
 | 1 | 9.1%
 | 1 | 9.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 36086 | 77.2%
ASCII | 10509 | 22.5%
None | 145 | 0.3%
CJK | 11 | < 0.1%
Letterlike Symbols | 1 | < 0.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 4947 | 47.1%
, | 1038 | 9.9%
) | 1035 | 9.8%
( | 1026 | 9.8%
0 | 263 | 2.5%
1 | 159 | 1.5%
_ | 154 | 1.5%
& | 116 | 1.1%
2 | 107 | 1.0%
g | 89 | 0.8%
Other values (68) | 1575 | 15.0%

Hangul
Value | Count | Frequency (%)
 | 1162 | 3.2%
 | 841 | 2.3%
 | 789 | 2.2%
 | 627 | 1.7%
 | 551 | 1.5%
 | 550 | 1.5%
 | 549 | 1.5%
 | 503 | 1.4%
 | 492 | 1.4%
 | 487 | 1.3%
Other values (822) | 29535 | 81.8%

None
Value | Count | Frequency (%)
 | 141 | 97.2%
· | 2 | 1.4%
 | 2 | 1.4%

CJK
Value | Count | Frequency (%)
 | 5 | 45.5%
 | 2 | 18.2%
 | 2 | 18.2%
 | 1 | 9.1%
 | 1 | 9.1%

Letterlike Symbols
Value | Count | Frequency (%)
 | 1 | 100.0%

Number Forms
Value | Count | Frequency (%)
 | 1 | 100.0%

카테고리
Categorical

Distinct: 36
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

Value | Count
과자류 | 1050
즉석섭취·편의식품류 | 469
코코아가공품류 또는 초콜릿류 | 449
과일·채소류음료 | 394
채소류 | 297
Other values (31) | 1340

Length

Max length: 15
Median length: 9
Mean length: 5.9217304
Min length: 2

Unique

Unique: 6
Unique (%): 0.2%

Sample

1st row: 죽 및 스프류
2nd row: 죽 및 스프류
3rd row: 죽 및 스프류
4th row: 죽 및 스프류
5th row: 즉석섭취·편의식품류

Common Values

Value | Count | Frequency (%)
과자류 | 1050 | 26.3%
즉석섭취·편의식품류 | 469 | 11.7%
코코아가공품류 또는 초콜릿류 | 449 | 11.2%
과일·채소류음료 | 394 | 9.9%
채소류 | 297 | 7.4%
가공유류 | 288 | 7.2%
기타 | 279 | 7.0%
피자 | 118 | 3.0%
탄산음료류 | 109 | 2.7%
커피 | 84 | 2.1%
Other values (26) | 462 | 11.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
과자류 | 1050 | 20.6%
즉석섭취·편의식품류 | 469 | 9.2%
코코아가공품류 | 449 | 8.8%
또는 | 449 | 8.8%
초콜릿류 | 449 | 8.8%
과일·채소류음료 | 394 | 7.7%
채소류 | 297 | 5.8%
가공유류 | 288 | 5.7%
기타 | 279 | 5.5%
피자 | 118 | 2.3%
Other values (37) | 851 | 16.7%

섭취단위
Categorical

IMBALANCE

Distinct: 13
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 31.4 KiB

Value | Count
 | 3708
 | 139
인분 | 109
 | 14
큰술 | 7
Other values (8) | 22

Length

Max length: 4
Median length: 1
Mean length: 1.0310078
Min length: 1

Unique

Unique: 2
Unique (%): 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value | Count | Frequency (%)
 | 3708 | 92.7%
 | 139 | 3.5%
인분 | 109 | 2.7%
 | 14 | 0.4%
큰술 | 7 | 0.2%
 | 6 | 0.2%
 | 5 | 0.1%
 | 3 | 0.1%
 | 2 | 0.1%
작은접시 | 2 | 0.1%
Other values (3) | 4 | 0.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
 | 3708 | 92.7%
 | 139 | 3.5%
인분 | 109 | 2.7%
 | 14 | 0.4%
큰술 | 7 | 0.2%
 | 6 | 0.1%
 | 5 | 0.1%
 | 3 | 0.1%
 | 3 | 0.1%
 | 2 | < 0.1%
Other values (3) | 4 | 0.1%

Correlations

 | 카테고리 | 섭취단위
카테고리 | 1.000 | 0.806
섭취단위 | 0.806 | 1.000

 | 카테고리 | 섭취단위
카테고리 | 1.000 | 0.397
섭취단위 | 0.397 | 1.000

 | 카테고리 | 섭취단위
카테고리 | 1.000 | 0.397
섭취단위 | 0.397 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

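First rows

 | 음식명 | 카테고리 | 섭취단위
0 | 팥죽 | 죽 및 스프류 |
1 | 해산물을 곁들인 매콤한 토마토 스튜 | 죽 및 스프류 |
2 | 호박죽 | 죽 및 스프류 |
3 | 흑임자 두부 죽 | 죽 및 스프류 |
4 | Oh!징어버거(삼강) | 즉석섭취·편의식품류 |
5 | 1,000냥참치햄샐러드김밥 | 즉석섭취·편의식품류 |
6 | 1000냥왕김밥 | 즉석섭취·편의식품류 |
7 | 1000천냥불고기김밥 | 즉석섭취·편의식품류 |
8 | 1000천냥원조김밥 | 즉석섭취·편의식품류 |
9 | 1000천냥참치김치김밥 | 즉석섭취·편의식품류 |

Last rows

 | 음식명 | 카테고리 | 섭취단위
3989 | 미니볼리에 패스트리크림 | 과자류 |
3990 | 미니스낵 베리필링 | 과자류 |
3991 | 미니스낵 코코아크림 | 과자류 |
3992 | 미니약과 | 과자류 |
3993 | 미니약과 | 과자류 |
3994 | 미니약과 | 과자류 |
3995 | 미니양갱 | 과자류 |
3996 | 미니와퍼딸기향 | 과자류 |
3997 | 미니와퍼토란향 | 과자류 |
3998 | 미니요우칸 네리 | 과자류 |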

Duplicate rows

Most frequently occurring

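 | 음식명 | 카테고리 | 섭취단위 | # duplicates
45 | 리츠 크래커 | 과자류 | | 8
65 | 불고기버거 | 즉석섭취·편의식품류 | | 4
89 | 참치김치삼각김밥 | 즉석섭취·편의식품류 | | 4
90 | 참치마요네즈 | 즉석섭취·편의식품류 | | 4
27 | 데리야끼버거 | 즉석섭취·편의식품류 | | 3
32 | 두부과자 | 과자류 | | 3
36 | 두부스낵 | 과자류 | | 3
37 | 듬뿍넣은햄샌드 | 즉석섭취·편의식품류 | | 3
47 | 마늘바게트 | 과자류 | | 3
51 | 매직 치즈 샌드 크래커 | 과자류 | | 3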
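
A listing like the one above can be produced by counting identical rows and keeping those that occur more than once. The sketch below is one way to do that with the standard library; the helper `most_duplicated` is hypothetical, the rows are reduced to (음식명, 카테고리) pairs, and the repeat counts in the toy data are illustrative rather than the report's figures.

```python
from collections import Counter

def most_duplicated(rows, top=10):
    """Return up to `top` (row, count) pairs for rows seen more than once,
    ordered from most to least frequent."""
    counts = Counter(rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(duplicated, key=lambda item: item[1], reverse=True)[:top]

# Toy data: names and categories come from the report, counts are made up.
rows = (
    [("리츠 크래커", "과자류")] * 3
    + [("불고기버거", "즉석섭취·편의식품류")] * 2
    + [("팥죽", "죽 및 스프류")]
)
result = most_duplicated(rows)

for (name, category), n in result:
    print(name, category, n)
```

Rows are compared as whole tuples, so two entries count as duplicates only when every column matches.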