Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 976
Duplicate rows (%): 9.8%
Total size in memory: 644.5 KiB
Average record size in memory: 66.0 B
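
The two derived figures above (the duplicate percentage and the average record size) can be cross-checked from the raw counts. The following is a minimal sketch, assuming that 1 KiB is 1024 bytes and that the average is simply total memory divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_rows = 10_000
n_duplicate_rows = 976
total_kib = 644.5

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")            # Duplicate rows: 9.8%
print(f"Average record size: {avg_record_bytes:.1f} B")   # Average record size: 66.0 B
```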

Variable types

Text: 3
Categorical: 3
Numeric: 1

Dataset

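Description: 품목명 (item name), 단위 (unit), 등급 (grade), 가격 (price), 산지 (place of origin), 친환경구분 (eco-friendly classification), 입력일 (entry date)
Author: 서울시농수산식품공사 (Seoul Agro-Fisheries & Food Corporation)
URL: https://data.seoul.go.kr/dataList/OA-2662/S/1/datasetView.do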

Alerts

Dataset has 976 (9.8%) duplicate rows [Duplicates]
등급 is highly imbalanced (68.5%) [Imbalance]
친환경구분 is highly imbalanced (56.7%) [Imbalance]
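
The imbalance percentages in these alerts appear consistent with one minus the normalized Shannon entropy of the category counts. The sketch below works under that assumption and is not taken from the profiling tool's source; the function name `imbalance_score` is hypothetical. The counts come from the Common Values tables for 등급 and 친환경구분 further down in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform split, 1 for a single class.

    Assumed definition (hypothetical helper, not the tool's own API).
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Category counts copied from the "Common Values" tables of this report.
grade_counts = [8199, 963, 450, 265, 70, 34, 13, 5, 1]  # 등급
eco_counts = [7490, 1707, 632, 115, 44, 12]             # 친환경구분

print(f"등급: {imbalance_score(grade_counts):.1%}")        # 등급: 68.5%
print(f"친환경구분: {imbalance_score(eco_counts):.1%}")     # 친환경구분: 56.7%
```

Under this definition a single dominant category (such as 특(1등) at 82.0%) pushes the score toward 1 even when several rare categories exist.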

Reproduction

Analysis started: 2024-04-21 00:40:05.374416
Analysis finished: 2024-04-21 00:40:07.798451
Duration: 2.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
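
The reported duration follows directly from the two timestamps. A minimal sketch, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block of this report.
started = datetime.fromisoformat("2024-04-21 00:40:05.374416")
finished = datetime.fromisoformat("2024-04-21 00:40:07.798451")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 2.42 seconds
```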

Variables

품목명
Text

Distinct: 651
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 20
Median length: 18
Mean length: 9.231
Min length: 5

Characters and Unicode

Total characters: 92310
Distinct characters: 301
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
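
As an illustration of how such character properties can be read per code point, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the input is one of the sample values from this report. The short codes map onto the category names used in the tables here: `Lo` is Other Letter, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in `text` by Unicode general category (e.g. 'Lo', 'Ps')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "[오이]백다다기"  # 2nd sample row of 품목명
counts = category_counts(sample)
print(dict(counts))  # {'Ps': 1, 'Lo': 6, 'Pe': 1}
```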

Unique

Unique: 128
Unique (%): 1.3%

Sample

1st row: [기타양채]통로메인
2nd row: [오이]백다다기
3rd row: [딸기]설향(딸기)
4th row: [토마토]대추빨강(방울토마토)
5th row: [버섯]버섯(꽃느타리)

Value | Count | Frequency (%)
오이]백다다기 | 402 | 4.0%
딸기]설향(딸기 | 280 | 2.8%
딸기]설향 | 261 | 2.6%
참외]참외 | 247 | 2.5%
토마토]토마토대저 | 207 | 2.1%
깻잎]깻잎 | 188 | 1.9%
참외]금싸라기(참외 | 172 | 1.7%
호박]애호박 | 171 | 1.7%
버섯]표고버섯 | 141 | 1.4%
시금치]시금치 | 136 | 1.4%
Other values (644) | 7798 | 78.0%

Most occurring characters

ValueCountFrequency (%)
[ 10000
 
10.8%
] 10000
 
10.8%
( 3880
 
4.2%
) 3853
 
4.2%
3582
 
3.9%
2728
 
3.0%
2298
 
2.5%
2196
 
2.4%
2196
 
2.4%
1849
 
2.0%
Other values (291) 49728
53.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64177 | 69.5%
Open Punctuation | 13880 | 15.0%
Close Punctuation | 13853 | 15.0%
Uppercase Letter | 266 | 0.3%
Dash Punctuation | 99 | 0.1%
Other Punctuation | 25 | < 0.1%
Decimal Number | 7 | < 0.1%
Space Separator | 3 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3582
 
5.6%
2728
 
4.3%
2298
 
3.6%
2196
 
3.4%
2196
 
3.4%
1849
 
2.9%
1831
 
2.9%
1800
 
2.8%
1784
 
2.8%
1293
 
2.0%
Other values (276) 42620
66.4%
Uppercase Letter
ValueCountFrequency (%)
B 74
27.8%
O 74
27.8%
X 74
27.8%
R 22
 
8.3%
T 22
 
8.3%
Open Punctuation
ValueCountFrequency (%)
[ 10000
72.0%
( 3880
 
28.0%
Close Punctuation
ValueCountFrequency (%)
] 10000
72.2%
) 3853
 
27.8%
Other Punctuation
ValueCountFrequency (%)
. 22
88.0%
, 3
 
12.0%
Decimal Number
ValueCountFrequency (%)
2 5
71.4%
1 2
 
28.6%
Dash Punctuation
ValueCountFrequency (%)
- 99
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64177 | 69.5%
Common | 27867 | 30.2%
Latin | 266 | 0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3582
 
5.6%
2728
 
4.3%
2298
 
3.6%
2196
 
3.4%
2196
 
3.4%
1849
 
2.9%
1831
 
2.9%
1800
 
2.8%
1784
 
2.8%
1293
 
2.0%
Other values (276) 42620
66.4%
Common
ValueCountFrequency (%)
[ 10000
35.9%
] 10000
35.9%
( 3880
 
13.9%
) 3853
 
13.8%
- 99
 
0.4%
. 22
 
0.1%
2 5
 
< 0.1%
3
 
< 0.1%
, 3
 
< 0.1%
1 2
 
< 0.1%
Latin
ValueCountFrequency (%)
B 74
27.8%
O 74
27.8%
X 74
27.8%
R 22
 
8.3%
T 22
 
8.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64177 | 69.5%
ASCII | 28133 | 30.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
[ 10000
35.5%
] 10000
35.5%
( 3880
 
13.8%
) 3853
 
13.7%
- 99
 
0.4%
B 74
 
0.3%
O 74
 
0.3%
X 74
 
0.3%
R 22
 
0.1%
T 22
 
0.1%
Other values (5) 35
 
0.1%
Hangul
ValueCountFrequency (%)
3582
 
5.6%
2728
 
4.3%
2298
 
3.6%
2196
 
3.4%
2196
 
3.4%
1849
 
2.9%
1831
 
2.9%
1800
 
2.8%
1784
 
2.8%
1293
 
2.0%
Other values (276) 42620
66.4%

단위
Text

Distinct: 75
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.4943
Min length: 3

Characters and Unicode

Total characters: 34943
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18
Unique (%): 0.2%

Sample

1st row: 2kg
2nd row: 21kg
3rd row: 2kg
4th row: 3kg
5th row: 2kg

Value | Count | Frequency (%)
10kg | 2027 | 20.3%
2kg | 1711 | 17.1%
4kg | 1311 | 13.1%
8kg | 942 | 9.4%
1kg | 808 | 8.1%
5kg | 766 | 7.7%
20kg | 404 | 4.0%
2.5kg | 346 | 3.5%
3kg | 326 | 3.3%
15kg | 246 | 2.5%
Other values (55) | 1113 | 11.1%

Most occurring characters

Value | Count | Frequency (%)
k | 10000 | 28.6%
g | 10000 | 28.6%
1 | 4011 | 11.5%
2 | 2631 | 7.5%
0 | 2465 | 7.1%
5 | 1557 | 4.5%
4 | 1348 | 3.9%
8 | 1169 | 3.3%
. | 919 | 2.6%
3 | 494 | 1.4%
Other values (3) | 349 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 20000 | 57.2%
Decimal Number | 14024 | 40.1%
Other Punctuation | 919 | 2.6%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 4011
28.6%
2 2631
18.8%
0 2465
17.6%
5 1557
 
11.1%
4 1348
 
9.6%
8 1169
 
8.3%
3 494
 
3.5%
6 143
 
1.0%
7 125
 
0.9%
9 81
 
0.6%
Lowercase Letter
ValueCountFrequency (%)
k 10000
50.0%
g 10000
50.0%
Other Punctuation
ValueCountFrequency (%)
. 919
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 20000 | 57.2%
Common | 14943 | 42.8%

Most frequent character per script

Common
ValueCountFrequency (%)
1 4011
26.8%
2 2631
17.6%
0 2465
16.5%
5 1557
 
10.4%
4 1348
 
9.0%
8 1169
 
7.8%
. 919
 
6.2%
3 494
 
3.3%
6 143
 
1.0%
7 125
 
0.8%
Latin
ValueCountFrequency (%)
k 10000
50.0%
g 10000
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 34943 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
k 10000
28.6%
g 10000
28.6%
1 4011
11.5%
2 2631
 
7.5%
0 2465
 
7.1%
5 1557
 
4.5%
4 1348
 
3.9%
8 1169
 
3.3%
. 919
 
2.6%
3 494
 
1.4%
Other values (3) 349
 
1.0%

등급
Categorical

IMBALANCE 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

특(1등): 8199
상(2등): 963
보통(3등): 450
9등(등외): 265
없음: 70
Other values (4): 53

Length

Max length: 6
Median length: 5
Mean length: 5.0346
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 특(1등)
2nd row: 특(1등)
3rd row: 특(1등)
4th row: 특(1등)
5th row: 특(1등)

Common Values

Value | Count | Frequency (%)
특(1등) | 8199 | 82.0%
상(2등) | 963 | 9.6%
보통(3등) | 450 | 4.5%
9등(등외) | 265 | 2.6%
없음 | 70 | 0.7%
4등 | 34 | 0.3%
5등 | 13 | 0.1%
6등 | 5 | 0.1%
8등 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
특(1등 | 8199 | 82.0%
상(2등 | 963 | 9.6%
보통(3등 | 450 | 4.5%
9등(등외 | 265 | 2.6%
없음 | 70 | 0.7%
4등 | 34 | 0.3%
5등 | 13 | 0.1%
6등 | 5 | < 0.1%
8등 | 1 | < 0.1%

가격
Real number (ℝ)

Distinct: 608
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25043.345
Minimum: 300
Maximum: 268800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 300
5-th percentile: 2400
Q1: 8000
median: 15500
Q3: 31075
95-th percentile: 81000
Maximum: 268800
Range: 268500
Interquartile range (IQR): 23075

Descriptive statistics

Standard deviation: 27229.429
Coefficient of variation (CV): 1.087292
Kurtosis: 8.9594909
Mean: 25043.345
Median Absolute Deviation (MAD): 9300
Skewness: 2.4800599
Sum: 2.5043346 × 10^8
Variance: 7.4144178 × 10^8
Monotonicity: Not monotonic
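
Several of the 가격 statistics are simple functions of others, which gives a quick consistency check on the report. The sketch below recomputes them from the rounded values printed here, so the last digits of the sum and variance may differ slightly from the report's figures. It assumes the usual definitions: range is max minus min, IQR is Q3 minus Q1, CV is the standard deviation divided by the mean, and variance is the squared standard deviation.

```python
# Figures copied from the 가격 (price) statistics in this report.
n = 10_000
mean, std = 25043.345, 27229.429
q1, q3 = 8000, 31075
minimum, maximum = 300, 268800

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
total = mean * n                  # Sum, recovered from the mean
variance = std ** 2               # Variance, recovered from the std

print("Range:", value_range)           # Range: 268500
print("IQR:", iqr)                     # IQR: 23075
print(f"CV: {cv:.6f}")                 # CV: 1.087292
print(f"Sum: {total:.4e}")             # Sum: 2.5043e+08
print(f"Variance: {variance:.4e}")     # Variance: 7.4144e+08
```

A CV above 1 together with a skewness of about 2.48 indicates a long right tail: the mean (25043.345) sits well above the median (15500).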
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
10000 | 227 | 2.3%
12000 | 226 | 2.3%
15000 | 224 | 2.2%
11000 | 218 | 2.2%
16000 | 210 | 2.1%
14000 | 197 | 2.0%
18000 | 192 | 1.9%
13000 | 192 | 1.9%
9000 | 190 | 1.9%
17000 | 177 | 1.8%
Other values (598) | 7947 | 79.5%

Value | Count | Frequency (%)
300 | 2 | < 0.1%
350 | 7 | 0.1%
400 | 2 | < 0.1%
450 | 2 | < 0.1%
500 | 10 | 0.1%
550 | 4 | < 0.1%
600 | 8 | 0.1%
650 | 3 | < 0.1%
700 | 12 | 0.1%
730 | 1 | < 0.1%

Value | Count | Frequency (%)
268800 | 1 | < 0.1%
259200 | 1 | < 0.1%
250000 | 3 | < 0.1%
228000 | 1 | < 0.1%
222000 | 1 | < 0.1%
209000 | 1 | < 0.1%
207000 | 1 | < 0.1%
206000 | 1 | < 0.1%
205000 | 2 | < 0.1%
203500 | 1 | < 0.1%

산지
Text

Distinct: 179
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 11
Median length: 8
Mean length: 7.6197
Min length: 2

Characters and Unicode

Total characters: 76197
Distinct characters: 150
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): 0.2%

Sample

1st row: 경기도 성남시
2nd row: 충청남도 천안시
3rd row: 경상남도 산청군
4th row: 충청남도 예산군
5th row: 경기도 양평군

Value | Count | Frequency (%)
경기도 | 1967 | 10.1%
경상남도 | 1927 | 9.9%
충청남도 | 1710 | 8.8%
전라남도 | 1089 | 5.6%
경상북도 | 965 | 4.9%
진주시 | 461 | 2.4%
전라북도 | 449 | 2.3%
성주군 | 437 | 2.2%
밀양시 | 374 | 1.9%
부산광역시 | 362 | 1.9%
Other values (178) | 9795 | 50.1%

Most occurring characters

ValueCountFrequency (%)
9536
 
12.5%
9056
 
11.9%
5762
 
7.6%
5015
 
6.6%
4934
 
6.5%
3821
 
5.0%
3055
 
4.0%
2609
 
3.4%
2579
 
3.4%
2071
 
2.7%
Other values (140) 27759
36.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66661 | 87.5%
Space Separator | 9536 | 12.5%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9056
 
13.6%
5762
 
8.6%
5015
 
7.5%
4934
 
7.4%
3821
 
5.7%
3055
 
4.6%
2609
 
3.9%
2579
 
3.9%
2071
 
3.1%
1972
 
3.0%
Other values (139) 25787
38.7%
Space Separator
ValueCountFrequency (%)
9536
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66661 | 87.5%
Common | 9536 | 12.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9056
 
13.6%
5762
 
8.6%
5015
 
7.5%
4934
 
7.4%
3821
 
5.7%
3055
 
4.6%
2609
 
3.9%
2579
 
3.9%
2071
 
3.1%
1972
 
3.0%
Other values (139) 25787
38.7%
Common
ValueCountFrequency (%)
9536
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66661 | 87.5%
ASCII | 9536 | 12.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9536
100.0%
Hangul
ValueCountFrequency (%)
9056
 
13.6%
5762
 
8.6%
5015
 
7.5%
4934
 
7.4%
3821
 
5.7%
3055
 
4.6%
2609
 
3.9%
2579
 
3.9%
2071
 
3.1%
1972
 
3.0%
Other values (139) 25787
38.7%

친환경구분
Categorical

IMBALANCE 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

일반: 7490
우수농산물: 1707
무농약: 632
품질인증: 115
저농약: 44

Length

Max length: 5
Median length: 2
Mean length: 2.6039
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 우수농산물
4th row: 우수농산물
5th row: 무농약

Common Values

Value | Count | Frequency (%)
일반 | 7490 | 74.9%
우수농산물 | 1707 | 17.1%
무농약 | 632 | 6.3%
품질인증 | 115 | 1.1%
저농약 | 44 | 0.4%
유기농 | 12 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
일반 | 7490 | 74.9%
우수농산물 | 1707 | 17.1%
무농약 | 632 | 6.3%
품질인증 | 115 | 1.1%
저농약 | 44 | 0.4%
유기농 | 12 | 0.1%

입력일
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

20240419: 3717
20240417: 3071
20240418: 2953
20240416: 259

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20240417
2nd row: 20240419
3rd row: 20240417
4th row: 20240418
5th row: 20240417

Common Values

Value | Count | Frequency (%)
20240419 | 3717 | 37.2%
20240417 | 3071 | 30.7%
20240418 | 2953 | 29.5%
20240416 | 259 | 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
20240419 | 3717 | 37.2%
20240417 | 3071 | 30.7%
20240418 | 2953 | 29.5%
20240416 | 259 | 2.6%

Interactions


Correlations

 | 단위 | 등급 | 가격 | 친환경구분 | 입력일
단위 | 1.000 | 0.425 | 0.666 | 0.487 | 0.161
등급 | 0.425 | 1.000 | 0.110 | 0.115 | 0.111
가격 | 0.666 | 0.110 | 1.000 | 0.133 | 0.055
친환경구분 | 0.487 | 0.115 | 0.133 | 1.000 | 0.176
입력일 | 0.161 | 0.111 | 0.055 | 0.176 | 1.000

 | 입력일 | 등급 | 친환경구분
입력일 | 1.000 | 0.071 | 0.114
등급 | 0.071 | 1.000 | 0.057
친환경구분 | 0.114 | 0.057 | 1.000

 | 가격 | 등급 | 친환경구분 | 입력일
가격 | 1.000 | 0.050 | 0.070 | 0.033
등급 | 0.050 | 1.000 | 0.057 | 0.071
친환경구분 | 0.070 | 0.057 | 1.000 | 0.114
입력일 | 0.033 | 0.071 | 0.114 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 품목명 | 단위 | 등급 | 가격 | 산지 | 친환경구분 | 입력일
72257 | [기타양채]통로메인 | 2kg | 특(1등) | 2400 | 경기도 성남시 | 일반 | 20240417
29790 | [오이]백다다기 | 21kg | 특(1등) | 36000 | 충청남도 천안시 | 일반 | 20240419
70702 | [딸기]설향(딸기) | 2kg | 특(1등) | 8000 | 경상남도 산청군 | 우수농산물 | 20240417
63098 | [토마토]대추빨강(방울토마토) | 3kg | 특(1등) | 15000 | 충청남도 예산군 | 우수농산물 | 20240418
79638 | [버섯]버섯(꽃느타리) | 2kg | 특(1등) | 4800 | 경기도 양평군 | 무농약 | 20240417
53518 | [버섯]표고버섯 | 8kg | 특(1등) | 36000 | 충청남도 홍성군 | 일반 | 20240418
76401 | [고구마]호박밤고구마 | 10kg | 특(1등) | 33500 | 전라남도 영암군 | 무농약 | 20240417
46444 | [딸기]설향(딸기) | 2kg | 특(1등) | 14000 | 전라남도 담양군 | 우수농산물 | 20240418
87557 | [양파]양파 | 15kg | 특(1등) | 16500 | 제주자치도 제주시 | 일반 | 20240417
26333 | [쑥갓]쑥갓 | 4kg | 특(1등) | 8500 | 경기도 광주시 | 일반 | 20240419

 | 품목명 | 단위 | 등급 | 가격 | 산지 | 친환경구분 | 입력일
85387 | [생고추]오이맛고추 | 10kg | 상(2등) | 28000 | 경상남도 밀양시 | 일반 | 20240417
99798 | [버섯]표고버섯(중국산) | 13kg | 특(1등) | 26000 | 전라남도 장흥군 | 일반 | 20240416
43511 | [깻잎]깻잎관 | 4kg | 특(1등) | 9000 | 충청남도 금산군 | 일반 | 20240418
29672 | [오이]백다다기 | 8kg | 특(1등) | 14500 | 경기도 양주시 | 일반 | 20240419
76331 | [고구마]호박밤고구마 | 10kg | 특(1등) | 20000 | 전라북도 익산시 | 일반 | 20240417
42756 | [깻잎]깻잎 | 3kg | 특(1등) | 16500 | 충청남도 금산군 | 우수농산물 | 20240418
21482 | [버섯]표고버섯(국내산) | 15kg | 특(1등) | 82500 | 충청남도 천안시 | 일반 | 20240419
53557 | [버섯]표고버섯 | 16kg | 특(1등) | 100000 | 충청북도 진천군 | 일반 | 20240418
38427 | [파]대파 | 10kg | 특(1등) | 18300 | 전라남도 영광군 | 일반 | 20240418
80187 | [버섯]새송이 | 6kg | 보통(3등) | 7500 | 충청남도 천안시 | 무농약 | 20240417

Duplicate rows

Most frequently occurring

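 | 품목명 | 단위 | 등급 | 가격 | 산지 | 친환경구분 | 입력일 | # duplicates
783 | [토마토]대저토마토 | 2.5kg | 없음 | 8000 | 부산광역시 강서구 | 일반 | 20240419 | 9
653 | [오이]백다다기 | 20kg | 상(2등) | 32000 | 충청남도 천안시 | 우수농산물 | 20240419 | 8
217 | [딸기]설향(딸기) | 2kg | 특(1등) | 7000 | 경상남도 산청군 | 우수농산물 | 20240417 | 7
223 | [딸기]설향(딸기) | 2kg | 특(1등) | 9000 | 경상남도 산청군 | 우수농산물 | 20240419 | 7
858 | [토마토]토마토대저 | 2.5kg | 특(1등) | 15000 | 부산광역시 강서구 | 일반 | 20240419 | 7
416 | [버섯]생표고(국내산) | 8kg | 특(1등) | 40000 | 전라북도 고창군 | 일반 | 20240417 | 6
596 | [아스파라거스]아스파라거스 | 1kg | 특(1등) | 7000 | 강원도 양구군 | 일반 | 20240417 | 6
655 | [오이]백다다기 | 20kg | 상(2등) | 34000 | 충청남도 천안시 | 우수농산물 | 20240418 | 6
753 | [참외]참외 | 10kg | 특(1등) | 75000 | 경상북도 성주군 | 일반 | 20240417 | 6
856 | [토마토]토마토대저 | 2.5kg | 특(1등) | 15000 | 부산광역시 강서구 | 일반 | 20240417 | 6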