Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 9406
Missing cells (%): 13.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 634.8 KiB
Average record size in memory: 65.0 B
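
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables times observations) and that the average record size is the total size divided by the number of observations; the variable names are illustrative only, and the per-column missing counts are copied from the Alerts section of this report.

```python
# Figures copied from this report; the names are illustrative, not the tool's.
n_variables = 7
n_observations = 10_000
missing_per_column = {"세일여부": 3301, "원플러스원": 6105}
total_size_kib = 634.8

# Assumption: percentages are taken over every cell in the table.
total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total bytes / number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)
print(f"{missing_pct:.1f}%")
print(f"{avg_record_bytes:.1f} B")
```

Under these assumptions the two columns with missing values account for all 9406 missing cells.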

Variable types

Text: 3
Categorical: 1
Numeric: 1
Boolean: 2

Dataset

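Description: Price information for daily necessities provided by the Price Information Service (참가격정보서비스) of the Korea Consumer Agency (한국소비자원). The dataset contains the columns 상품명 (product name), 조사일 (survey date), 판매가격 (selling price), 판매업소 (retailer), 제조사 (manufacturer), 세일여부 (on sale), and 원플러스원 (one-plus-one offer).
Author: 한국소비자원 (Korea Consumer Agency)
URL: https://www.data.go.kr/data/15083256/fileData.do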

Alerts

원플러스원 is highly imbalanced (99.7%) [Imbalance]
세일여부 has 3301 (33.0%) missing values [Missing]
원플러스원 has 6105 (61.1%) missing values [Missing]
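
The 99.7% imbalance figure is consistent with a score of one minus the normalized Shannon entropy of the non-missing class counts. That definition is an assumption made for this sketch, not something the report states; `imbalance_score` is a hypothetical helper, and the counts (3894 False, 1 True) are taken from the 원플러스원 variable section.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for the class counts.

    Assumed definition: 0.0 means perfectly balanced, values near 1.0 mean
    one class dominates. Missing values are expected to be excluded already.
    """
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)


# Non-missing counts of 원플러스원: 3894 False, 1 True.
score = imbalance_score([3894, 1])
print(f"{score:.1%}")
```

With only a single True among 3895 non-missing rows, the column carries almost no information, which is why it is flagged.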

Reproduction

Analysis started: 2024-04-06 08:09:48.904228
Analysis finished: 2024-04-06 08:09:51.330135
Duration: 2.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
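
As a small check, the reported duration follows from the two timestamps. The sketch assumes both timestamps are in the same time zone and uses only the standard library.

```python
from datetime import datetime

# Timestamp layout as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-04-06 08:09:48.904228", FMT)
finished = datetime.strptime("2024-04-06 08:09:51.330135", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```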

Variables

상품명
Text

Distinct: 426
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 41
Median length: 27
Mean length: 14.8693
Min length: 3

Characters and Unicode

Total characters: 148693
Distinct characters: 562
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
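
To illustrate what a per-category breakdown means, the following sketch counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code; `category_counts` is a hypothetical helper, and the input is one product name from this dataset. The two-letter codes correspond to the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Lu` is Uppercase Letter, and `Ps`/`Pe` are Open/Close Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters in text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One product name from this dataset.
sample = "서울우유 흰우유(1L)"
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category tables for the Korean text columns.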

Unique

Unique: 41
Unique (%): 0.4%

Sample

1st row: 서울우유 흰우유(1L)
2nd row: 오가니스트 히말라야 핑크솔트 바디워시 리프레싱 민트향(865ml)
3rd row: 큐원 중력 밀가루(1kg)
4th row: 샤프란케어 은은한향(900ml)
5th row: 도브 센스티브 스킨 바디워시(1L)

Value | Count | Frequency (%)
오뚜기 | 738 | 3.1%
백설 | 656 | 2.8%
청정원 | 313 | 1.3%
3분 | 270 | 1.1%
밀가루(1kg | 209 | 0.9%
소면(900g | 204 | 0.9%
보노 | 196 | 0.8%
옛날 | 180 | 0.8%
x | 172 | 0.7%
진라면 | 163 | 0.7%
Other values (810) | 20712 | 87.0%

Most occurring characters

ValueCountFrequency (%)
13847
 
9.3%
) 9922
 
6.7%
( 9922
 
6.7%
0 8855
 
6.0%
g 4856
 
3.3%
1 3310
 
2.2%
5 2964
 
2.0%
2483
 
1.7%
m 2280
 
1.5%
1986
 
1.3%
Other values (552) 88268
59.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 78343 | 52.7%
Decimal Number | 23821 | 16.0%
Space Separator | 13847 | 9.3%
Lowercase Letter | 10594 | 7.1%
Close Punctuation | 9922 | 6.7%
Open Punctuation | 9922 | 6.7%
Uppercase Letter | 1233 | 0.8%
Other Punctuation | 946 | 0.6%
Math Symbol | 63 | < 0.1%
Dash Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2483
 
3.2%
1986
 
2.5%
1819
 
2.3%
1661
 
2.1%
1626
 
2.1%
1506
 
1.9%
1431
 
1.8%
1184
 
1.5%
1028
 
1.3%
992
 
1.3%
Other values (503) 62627
79.9%
Uppercase Letter
ValueCountFrequency (%)
L 503
40.8%
A 224
18.2%
C 135
 
10.9%
T 71
 
5.8%
G 62
 
5.0%
J 58
 
4.7%
B 56
 
4.5%
N 45
 
3.6%
F 30
 
2.4%
K 24
 
1.9%
Other values (7) 25
 
2.0%
Decimal Number
ValueCountFrequency (%)
0 8855
37.2%
1 3310
 
13.9%
5 2964
 
12.4%
3 1976
 
8.3%
2 1891
 
7.9%
6 1242
 
5.2%
8 998
 
4.2%
9 937
 
3.9%
4 881
 
3.7%
7 767
 
3.2%
Lowercase Letter
ValueCountFrequency (%)
g 4856
45.8%
m 2280
21.5%
l 1917
 
18.1%
k 1069
 
10.1%
c 262
 
2.5%
x 172
 
1.6%
a 18
 
0.2%
p 18
 
0.2%
e 1
 
< 0.1%
o 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 564
59.6%
, 165
 
17.4%
* 98
 
10.4%
& 52
 
5.5%
! 49
 
5.2%
; 18
 
1.9%
Math Symbol
ValueCountFrequency (%)
+ 50
79.4%
~ 13
 
20.6%
Space Separator
ValueCountFrequency (%)
13847
100.0%
Close Punctuation
ValueCountFrequency (%)
) 9922
100.0%
Open Punctuation
ValueCountFrequency (%)
( 9922
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 78343 | 52.7%
Common | 58523 | 39.4%
Latin | 11827 | 8.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2483
 
3.2%
1986
 
2.5%
1819
 
2.3%
1661
 
2.1%
1626
 
2.1%
1506
 
1.9%
1431
 
1.8%
1184
 
1.5%
1028
 
1.3%
992
 
1.3%
Other values (503) 62627
79.9%
Latin
ValueCountFrequency (%)
g 4856
41.1%
m 2280
19.3%
l 1917
 
16.2%
k 1069
 
9.0%
L 503
 
4.3%
c 262
 
2.2%
A 224
 
1.9%
x 172
 
1.5%
C 135
 
1.1%
T 71
 
0.6%
Other values (17) 338
 
2.9%
Common
ValueCountFrequency (%)
13847
23.7%
) 9922
17.0%
( 9922
17.0%
0 8855
15.1%
1 3310
 
5.7%
5 2964
 
5.1%
3 1976
 
3.4%
2 1891
 
3.2%
6 1242
 
2.1%
8 998
 
1.7%
Other values (12) 3596
 
6.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 78343 | 52.7%
ASCII | 70350 | 47.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13847
19.7%
) 9922
14.1%
( 9922
14.1%
0 8855
12.6%
g 4856
 
6.9%
1 3310
 
4.7%
5 2964
 
4.2%
m 2280
 
3.2%
3 1976
 
2.8%
l 1917
 
2.7%
Other values (39) 10501
14.9%
Hangul
ValueCountFrequency (%)
2483
 
3.2%
1986
 
2.5%
1819
 
2.3%
1661
 
2.1%
1626
 
2.1%
1506
 
1.9%
1431
 
1.8%
1184
 
1.5%
1028
 
1.3%
992
 
1.3%
Other values (503) 62627
79.9%

조사일
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024-03-08
2nd row: 2024-03-08
3rd row: 2024-03-22
4th row: 2024-03-08
5th row: 2024-03-08

Common Values

Value | Count | Frequency (%)
2024-03-08 | 8517 | 85.2%
2024-03-22 | 1483 | 14.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2024-03-08 | 8517 | 85.2%
2024-03-22 | 1483 | 14.8%

판매가격
Real number (ℝ)

Distinct: 599
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5672.1398
Minimum: 360
Maximum: 54900
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 360
5-th percentile: 1200
Q1: 2600
Median: 3980
Q3: 6882.5
95-th percentile: 15200
Maximum: 54900
Range: 54540
Interquartile range (IQR): 4282.5

Descriptive statistics

Standard deviation: 5559.8319
Coefficient of variation (CV): 0.98020008
Kurtosis: 20.955146
Mean: 5672.1398
Median Absolute Deviation (MAD): 1940
Skewness: 3.7229936
Sum: 56721398
Variance: 30911731
Monotonicity: Not monotonic
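
Several of these statistics are simple functions of the others, which gives a quick consistency check. The sketch below uses the standard definitions (CV as standard deviation over mean, variance as the squared standard deviation, IQR as Q3 minus Q1) with the rounded figures from this report; it is a check under those assumptions, not the profiler's implementation.

```python
# Rounded figures for 판매가격 as printed in this report.
mean = 5672.1398
std = 5559.8319
n = 10_000
q1, q3 = 2600, 6882.5
minimum, maximum = 360, 54900

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance from the standard deviation
total = mean * n                 # sum of all values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.8f}")
print(f"Variance = {variance:.0f}")
print(f"Sum      = {total:.0f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```

A CV close to 1 together with a skewness of about 3.7 indicates a strongly right-skewed price distribution, so the median (3980) is a more representative typical price than the mean.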
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
3580 | 256 | 2.6%
1980 | 163 | 1.6%
2980 | 154 | 1.5%
3980 | 151 | 1.5%
5990 | 146 | 1.5%
6980 | 133 | 1.3%
6900 | 127 | 1.3%
3180 | 125 | 1.2%
5480 | 124 | 1.2%
3990 | 122 | 1.2%
Other values (589) | 8499 | 85.0%

Value | Count | Frequency (%)
360 | 1 | < 0.1%
390 | 22 | 0.2%
440 | 13 | 0.1%
450 | 1 | < 0.1%
470 | 12 | 0.1%
480 | 7 | 0.1%
495 | 1 | < 0.1%
500 | 9 | 0.1%
530 | 16 | 0.2%
532 | 1 | < 0.1%

Value | Count | Frequency (%)
54900 | 18 | 0.2%
49800 | 5 | 0.1%
45900 | 1 | < 0.1%
44800 | 11 | 0.1%
42900 | 1 | < 0.1%
40590 | 1 | < 0.1%
36900 | 25 | 0.2%
35100 | 1 | < 0.1%
34810 | 1 | < 0.1%
33800 | 1 | < 0.1%

판매업소
Text

Distinct: 488
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 12
Mean length: 8.3957
Min length: 6

Characters and Unicode

Total characters: 83957
Distinct characters: 253
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 롯데슈퍼연수함박점
2nd row: 이마트경기광주점
3rd row: GS더프레시창원감계점
4th row: GS더프레시목감레이크점
5th row: 이마트둔산점

Value | Count | Frequency (%)
주)농협하나로유통 | 223 | 2.1%
주)농협유통 | 169 | 1.6%
현대백화점천호점 | 44 | 0.4%
현대백화점중동점 | 43 | 0.4%
현대백화점목동점 | 41 | 0.4%
현대백화점킨텍스점 | 41 | 0.4%
대전점 | 39 | 0.4%
고양점 | 38 | 0.4%
현대백화점울산동구점 | 36 | 0.3%
부산점 | 36 | 0.3%
Other values (480) | 9682 | 93.2%

Most occurring characters

ValueCountFrequency (%)
10562
 
12.6%
3718
 
4.4%
3718
 
4.4%
3718
 
4.4%
3718
 
4.4%
G 2982
 
3.6%
2975
 
3.5%
2896
 
3.4%
2828
 
3.4%
2660
 
3.2%
Other values (243) 44182
52.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 76723 | 91.4%
Uppercase Letter | 5493 | 6.5%
Close Punctuation | 455 | 0.5%
Open Punctuation | 455 | 0.5%
Decimal Number | 439 | 0.5%
Space Separator | 392 | 0.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
10562
 
13.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
2975
 
3.9%
2896
 
3.8%
2828
 
3.7%
2660
 
3.5%
2614
 
3.4%
Other values (230) 37316
48.6%
Decimal Number
ValueCountFrequency (%)
2 189
43.1%
3 82
18.7%
1 73
 
16.6%
7 39
 
8.9%
5 38
 
8.7%
8 18
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
G 2982
54.3%
S 2471
45.0%
C 20
 
0.4%
U 20
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 455
100.0%
Open Punctuation
ValueCountFrequency (%)
( 455
100.0%
Space Separator
ValueCountFrequency (%)
392
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 76723 | 91.4%
Latin | 5493 | 6.5%
Common | 1741 | 2.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
10562
 
13.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
2975
 
3.9%
2896
 
3.8%
2828
 
3.7%
2660
 
3.5%
2614
 
3.4%
Other values (230) 37316
48.6%
Common
ValueCountFrequency (%)
) 455
26.1%
( 455
26.1%
392
22.5%
2 189
10.9%
3 82
 
4.7%
1 73
 
4.2%
7 39
 
2.2%
5 38
 
2.2%
8 18
 
1.0%
Latin
ValueCountFrequency (%)
G 2982
54.3%
S 2471
45.0%
C 20
 
0.4%
U 20
 
0.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 76723 | 91.4%
ASCII | 7234 | 8.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
10562
 
13.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
3718
 
4.8%
2975
 
3.9%
2896
 
3.8%
2828
 
3.7%
2660
 
3.5%
2614
 
3.4%
Other values (230) 37316
48.6%
ASCII
ValueCountFrequency (%)
G 2982
41.2%
S 2471
34.2%
) 455
 
6.3%
( 455
 
6.3%
392
 
5.4%
2 189
 
2.6%
3 82
 
1.1%
1 73
 
1.0%
7 39
 
0.5%
5 38
 
0.5%
Other values (3) 58
 
0.8%

제조사
Text

Distinct: 100
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 8
Mean length: 4.0146
Min length: 2

Characters and Unicode

Total characters: 40146
Distinct characters: 187
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: 서울우유
2nd row: LG생활건강
3rd row: 삼양사
4th row: LG생활건강
5th row: 유니레버코리아

Value | Count | Frequency (%)
오뚜기 | 1587 | 15.9%
cj제일제당 | 1298 | 13.0%
농심 | 646 | 6.5%
대상 | 361 | 3.6%
롯데칠성음료 | 345 | 3.5%
동서식품 | 302 | 3.0%
lg생활건강 | 257 | 2.6%
일반 | 239 | 2.4%
광동제약 | 212 | 2.1%
풀무원 | 185 | 1.8%
Other values (90) | 4568 | 45.7%

Most occurring characters

ValueCountFrequency (%)
3285
 
8.2%
1869
 
4.7%
1698
 
4.2%
1613
 
4.0%
1587
 
4.0%
C 1356
 
3.4%
J 1356
 
3.4%
1331
 
3.3%
865
 
2.2%
709
 
1.8%
Other values (177) 24477
61.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 36876 | 91.9%
Uppercase Letter | 3228 | 8.0%
Open Punctuation | 20 | < 0.1%
Close Punctuation | 20 | < 0.1%
Decimal Number | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3285
 
8.9%
1869
 
5.1%
1698
 
4.6%
1613
 
4.4%
1587
 
4.3%
1331
 
3.6%
865
 
2.3%
709
 
1.9%
669
 
1.8%
669
 
1.8%
Other values (169) 22581
61.2%
Uppercase Letter
ValueCountFrequency (%)
C 1356
42.0%
J 1356
42.0%
G 257
 
8.0%
L 257
 
8.0%
M 2
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 20
100.0%
Close Punctuation
ValueCountFrequency (%)
) 20
100.0%
Decimal Number
ValueCountFrequency (%)
3 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 36876 | 91.9%
Latin | 3228 | 8.0%
Common | 42 | 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3285
 
8.9%
1869
 
5.1%
1698
 
4.6%
1613
 
4.4%
1587
 
4.3%
1331
 
3.6%
865
 
2.3%
709
 
1.9%
669
 
1.8%
669
 
1.8%
Other values (169) 22581
61.2%
Latin
ValueCountFrequency (%)
C 1356
42.0%
J 1356
42.0%
G 257
 
8.0%
L 257
 
8.0%
M 2
 
0.1%
Common
ValueCountFrequency (%)
( 20
47.6%
) 20
47.6%
3 2
 
4.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 36876 | 91.9%
ASCII | 3270 | 8.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
3285
 
8.9%
1869
 
5.1%
1698
 
4.6%
1613
 
4.4%
1587
 
4.3%
1331
 
3.6%
865
 
2.3%
709
 
1.9%
669
 
1.8%
669
 
1.8%
Other values (169) 22581
61.2%
ASCII
ValueCountFrequency (%)
C 1356
41.5%
J 1356
41.5%
G 257
 
7.9%
L 257
 
7.9%
( 20
 
0.6%
) 20
 
0.6%
M 2
 
0.1%
3 2
 
0.1%

세일여부
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 3301
Missing (%): 33.0%
Memory size: 97.7 KiB

Value | Count | Frequency (%)
False | 3882 | 38.8%
True | 2817 | 28.2%
(Missing) | 3301 | 33.0%

원플러스원
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.1%
Missing: 6105
Missing (%): 61.1%
Memory size: 97.7 KiB

Value | Count | Frequency (%)
False | 3894 | 38.9%
True | 1 | < 0.1%
(Missing) | 6105 | 61.1%

Interactions


Correlations

Variable | 조사일 | 판매가격 | 제조사 | 세일여부 | 원플러스원
조사일 | 1.000 | 0.274 | 0.596 | 0.034 | 0.000
판매가격 | 0.274 | 1.000 | 0.936 | 0.142 | 0.000
제조사 | 0.596 | 0.936 | 1.000 | 0.488 | 0.000
세일여부 | 0.034 | 0.142 | 0.488 | 1.000 | 0.000
원플러스원 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 조사일 | 원플러스원 | 세일여부
조사일 | 1.000 | 0.000 | 0.021
원플러스원 | 0.000 | 1.000 | 0.000
세일여부 | 0.021 | 0.000 | 1.000

Variable | 판매가격 | 조사일 | 세일여부 | 원플러스원
판매가격 | 1.000 | 0.210 | 0.109 | 0.000
조사일 | 0.210 | 1.000 | 0.021 | 0.000
세일여부 | 0.109 | 0.021 | 1.000 | 0.000
원플러스원 | 0.000 | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 상품명 | 조사일 | 판매가격 | 판매업소 | 제조사 | 세일여부 | 원플러스원
61873 | 서울우유 흰우유(1L) | 2024-03-08 | 3290 | 롯데슈퍼연수함박점 | 서울우유 | N | N
74417 | 오가니스트 히말라야 핑크솔트 바디워시 리프레싱 민트향(865ml) | 2024-03-08 | 12900 | 이마트경기광주점 | LG생활건강 | <NA> | <NA>
93600 | 큐원 중력 밀가루(1kg) | 2024-03-22 | 1480 | GS더프레시창원감계점 | 삼양사 | Y | <NA>
68678 | 샤프란케어 은은한향(900ml) | 2024-03-08 | 5400 | GS더프레시목감레이크점 | LG생활건강 | Y | <NA>
74007 | 도브 센스티브 스킨 바디워시(1L) | 2024-03-08 | 13900 | 이마트둔산점 | 유니레버코리아 | <NA> | <NA>
9903 | 켈로그 콘푸로스트(600g) | 2024-03-08 | 6980 | GS더프레시울산구영점 | 농심 | Y | <NA>
54614 | 맥심 티오피 마스터 라떼(275ml) | 2024-03-08 | 2150 | GS더프레시마산점 | 동서식품 | Y | <NA>
57166 | 마주앙 레드(750ml) | 2024-03-08 | 10100 | 롯데슈퍼하당점 | 롯데주류 | N | N
48469 | 삼다수(500ml) | 2024-03-08 | 530 | 롯데슈퍼화성봉담점 | 광동제약 | N | N
8962 | 보노 포르치니버섯스프(3개입) | 2024-03-08 | 3350 | GS더프레시진해여좌점 | 농심 | Y | <NA>

Index | 상품명 | 조사일 | 판매가격 | 판매업소 | 제조사 | 세일여부 | 원플러스원
81781 | 적상추(100g) | 2024-03-08 | 2290 | 롯데슈퍼신부점 | 일반 | N | N
82372 | CJ 행복한콩 안심아삭 콩나물(380g) | 2024-03-08 | 1880 | 이마트진접점 | CJ제일제당 | <NA> | <NA>
29586 | 사조참치 살코기 안심따개(4캔) | 2024-03-08 | 8980 | 이마트천호점 | 사조대림 | <NA> | <NA>
73555 | 플로쉴드 옐로우 면도날(4개) | 2024-03-08 | 33200 | 이마트용산점 | 질레트 | <NA> | <NA>
3076 | 맛있는라면(5개입) | 2024-03-08 | 5350 | 롯데슈퍼시흥조남점 | 삼양식품 | N | N
32405 | 청정원 발사믹 드레싱 | 2024-03-08 | 3580 | 이마트김포한강점 | 대상 | <NA> | <NA>
79553 | 크리넥스 데코엔소프트 3겹(24롤) | 2024-03-08 | 28200 | CU(본사) | 유한킴벌리 | <NA> | <NA>
10020 | 켈로그 콘푸로스트(600g) | 2024-03-08 | 7190 | 롯데슈퍼용인남사점 | 농심 | N | N
57341 | 진로 와인(500ml) | 2024-03-08 | 3490 | 롯데슈퍼용인남사점 | 진로 | N | N
31335 | 샘표 토장(900g) | 2024-03-08 | 9800 | GS더프레시전주호성점 | 샘표 | Y | <NA>