Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1109
Duplicate rows (%): 11.1%
Total size in memory: 712.9 KiB
Average record size in memory: 73.0 B
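
The two derived figures above can be checked from the raw counts. The sketch below assumes the percentage is duplicate rows over observations and that the average record size is total memory over observations; the variable names are illustrative only.

```python
# Values copied from the dataset statistics above.
n_observations = 10000
n_duplicate_rows = 1109
total_memory_kib = 712.9

# Share of duplicate rows, as a percentage rounded to one decimal place.
duplicate_pct = round(n_duplicate_rows / n_observations * 100, 1)

# Average bytes per record: KiB -> bytes, divided by the number of rows.
avg_record_bytes = round(total_memory_kib * 1024 / n_observations, 1)

print(duplicate_pct, avg_record_bytes)
```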

Variable types

DateTime: 1
Text: 4
Categorical: 2
Numeric: 1

Dataset

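Description: 거래일,품목,품종,단위,등급,가격,출하지,친환경구분(일반) (transaction date, item, variety, unit, grade, price, shipping origin, eco-friendly classification (general))
Author: 서울시농수산식품공사
URL: https://data.seoul.go.kr/dataList/OA-20950/S/1/datasetView.do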

Alerts

친환경구분(일반) has constant value "일반" (Constant)
Dataset has 1109 (11.1%) duplicate rows (Duplicates)
등급 is highly imbalanced (87.5%) (Imbalance)

Reproduction

Analysis started: 2024-05-11 06:23:16.566457
Analysis finished: 2024-05-11 06:23:19.000109
Duration: 2.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

거래일
DateTime

Distinct: 48
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Minimum: 2024-03-05 00:00:00
Maximum: 2024-05-10 00:00:00
Histogram with fixed size bins (bins=48)
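
To put the 48 distinct dates in context, the calendar span between the minimum and maximum can be computed directly. This is only a sketch using the two dates reported above; the coverage ratio is an illustrative figure, not part of the report.

```python
from datetime import date

first_day = date(2024, 3, 5)   # Minimum reported above
last_day = date(2024, 5, 10)   # Maximum reported above
distinct_dates = 48            # Distinct reported above

span_days = (last_day - first_day).days   # difference between the endpoints
calendar_days = span_days + 1             # inclusive count of calendar days
coverage = distinct_dates / calendar_days # share of calendar days with data

print(span_days, calendar_days, round(coverage, 3))
```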

품목
Text

Distinct: 131
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 7
Mean length: 3.0547
Min length: 1

Characters and Unicode

Total characters: 30547
Distinct characters: 182
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
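
As a rough illustration of those character properties, the sketch below counts Unicode general categories for one value taken from the samples in this report. The category codes `Lo`, `Zs`, `Ps` and `Pe` correspond to Other Letter, Space Separator, Open Punctuation and Close Punctuation. Python's standard library does not expose the script property, so the Hangul check via the character name is an assumption made for this example and not how the report itself derives scripts.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


def is_hangul(ch: str) -> bool:
    """Crude script proxy: the Unicode character name starts with 'HANGUL'."""
    return unicodedata.name(ch, "").startswith("HANGUL")


sample = "채소류 기타(상장예외)"  # value taken from the 품종 samples
counts = category_counts(sample)
hangul_chars = sum(is_hangul(ch) for ch in sample)

print(dict(counts), hangul_chars)
```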

Unique

Unique: 17
Unique (%): 0.2%

Sample

1st row: 고사리
2nd row: 채소류 기타
3rd row: 고구마
4th row: 채소류 기타
5th row: 콩나물
Value,Count,Frequency (%)
기타,1011,9.1%
채소류,938,8.4%
콩나물,898,8.1%
마늘,750,6.7%
베이비,684,6.1%
숙주나물,666,6.0%
두부,659,5.9%
고사리,522,4.7%
새싹,413,3.7%
무순,394,3.5%
Other values (126),4211,37.8%

Most occurring characters

ValueCountFrequency (%)
2150
 
7.0%
2142
 
7.0%
1443
 
4.7%
1146
 
3.8%
1113
 
3.6%
1025
 
3.4%
1017
 
3.3%
953
 
3.1%
943
 
3.1%
938
 
3.1%
Other values (172) 17677
57.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 29283
95.9%
Space Separator 1146
 
3.8%
Open Punctuation 59
 
0.2%
Close Punctuation 59
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2150
 
7.3%
2142
 
7.3%
1443
 
4.9%
1113
 
3.8%
1025
 
3.5%
1017
 
3.5%
953
 
3.3%
943
 
3.2%
938
 
3.2%
907
 
3.1%
Other values (169) 16652
56.9%
Space Separator
ValueCountFrequency (%)
1146
100.0%
Open Punctuation
ValueCountFrequency (%)
( 59
100.0%
Close Punctuation
ValueCountFrequency (%)
) 59
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 29283
95.9%
Common 1264
 
4.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2150
 
7.3%
2142
 
7.3%
1443
 
4.9%
1113
 
3.8%
1025
 
3.5%
1017
 
3.5%
953
 
3.3%
943
 
3.2%
938
 
3.2%
907
 
3.1%
Other values (169) 16652
56.9%
Common
ValueCountFrequency (%)
1146
90.7%
( 59
 
4.7%
) 59
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 29283
95.9%
ASCII 1264
 
4.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2150
 
7.3%
2142
 
7.3%
1443
 
4.9%
1113
 
3.8%
1025
 
3.5%
1017
 
3.5%
953
 
3.3%
943
 
3.2%
938
 
3.2%
907
 
3.1%
Other values (169) 16652
56.9%
ASCII
ValueCountFrequency (%)
1146
90.7%
( 59
 
4.7%
) 59
 
4.7%

품종
Text

Distinct: 225
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 10
Mean length: 5.6049
Min length: 2

Characters and Unicode

Total characters: 56049
Distinct characters: 220
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 46
Unique (%): 0.5%

Sample

1st row: 고사리 수입
2nd row: 채소류 기타(상장예외)
3rd row: 밤 고구마
4th row: 채소류 기타(상장예외)
5th row: 콩나물 수입
Value,Count,Frequency (%)
수입,3514,21.9%
기타(상장예외,948,5.9%
채소류,938,5.8%
콩나물,898,5.6%
베이비,684,4.3%
숙주나물,666,4.1%
고사리,506,3.1%
깐마늘,447,2.8%
새싹,413,2.6%
무순,394,2.5%
Other values (206),6659,41.4%

Most occurring characters

ValueCountFrequency (%)
6067
 
10.8%
3551
 
6.3%
3519
 
6.3%
2323
 
4.1%
2142
 
3.8%
) 1323
 
2.4%
( 1323
 
2.4%
1284
 
2.3%
1268
 
2.3%
1177
 
2.1%
Other values (210) 32072
57.2%

Most occurring categories

ValueCountFrequency (%)
Other Letter 47336
84.5%
Space Separator 6067
 
10.8%
Close Punctuation 1323
 
2.4%
Open Punctuation 1323
 
2.4%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3551
 
7.5%
3519
 
7.4%
2323
 
4.9%
2142
 
4.5%
1284
 
2.7%
1268
 
2.7%
1177
 
2.5%
1113
 
2.4%
1039
 
2.2%
1031
 
2.2%
Other values (207) 28889
61.0%
Space Separator
ValueCountFrequency (%)
6067
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1323
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1323
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 47336
84.5%
Common 8713
 
15.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3551
 
7.5%
3519
 
7.4%
2323
 
4.9%
2142
 
4.5%
1284
 
2.7%
1268
 
2.7%
1177
 
2.5%
1113
 
2.4%
1039
 
2.2%
1031
 
2.2%
Other values (207) 28889
61.0%
Common
ValueCountFrequency (%)
6067
69.6%
) 1323
 
15.2%
( 1323
 
15.2%

Most occurring blocks

ValueCountFrequency (%)
Hangul 47336
84.5%
ASCII 8713
 
15.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6067
69.6%
) 1323
 
15.2%
( 1323
 
15.2%
Hangul
ValueCountFrequency (%)
3551
 
7.5%
3519
 
7.4%
2323
 
4.9%
2142
 
4.5%
1284
 
2.7%
1268
 
2.7%
1177
 
2.5%
1113
 
2.4%
1039
 
2.2%
1031
 
2.2%
Other values (207) 28889
61.0%

단위
Text

Distinct: 121
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 5
Mean length: 3.8916
Min length: 3

Characters and Unicode

Total characters: 38916
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 0.4%

Sample

1st row: 10키로
2nd row: 10키로
3rd row: 10키로
4th row: 10키로
5th row: 3.5키로
Value,Count,Frequency (%)
10키로,1647,16.5%
3.5키로,1197,12.0%
4키로,1150,11.5%
1키로,1097,11.0%
500그람,731,7.3%
20키로,651,6.5%
50그람,485,4.9%
2키로,475,4.8%
5키로,434,4.3%
12키로,328,3.3%
Other values (106),1805,18.1%
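
Because 단위 mixes a number with a unit suffix, it is profiled as text and not as a numeric column. A minimal sketch of one way to normalise it is shown below. It assumes that 키로 means kilogram and 그람 means gram, and that these are the only two suffixes, which is consistent with the character counts in this section but not guaranteed for the full source data. The helper name `to_grams` is hypothetical.

```python
import re

# Assumed conversion factors: 키로 = kilogram, 그람 = gram.
_UNIT_TO_GRAMS = {"키로": 1000.0, "그람": 1.0}
_UNIT_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(키로|그람)$")


def to_grams(unit: str) -> float:
    """Convert a value such as '3.5키로' or '500그람' to grams."""
    match = _UNIT_PATTERN.match(unit.strip())
    if match is None:
        raise ValueError(f"unrecognised unit string: {unit!r}")
    amount, suffix = match.groups()
    return float(amount) * _UNIT_TO_GRAMS[suffix]


print(to_grams("3.5키로"), to_grams("500그람"), to_grams("10키로"))
```

Raising on unrecognised strings, instead of returning a default, keeps unexpected suffixes visible when the full dataset is processed.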

Most occurring characters

Value,Count,Frequency (%)
키,8248,21.2%
로,8248,21.2%
0,4889,12.6%
1,3816,9.8%
5,3364,8.6%
그,1752,4.5%
람,1752,4.5%
2,1637,4.2%
.,1497,3.8%
3,1398,3.6%
Other values (5),2315,5.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 20000
51.4%
Decimal Number 17419
44.8%
Other Punctuation 1497
 
3.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4889
28.1%
1 3816
21.9%
5 3364
19.3%
2 1637
 
9.4%
3 1398
 
8.0%
4 1323
 
7.6%
6 504
 
2.9%
8 164
 
0.9%
7 163
 
0.9%
9 161
 
0.9%
Other Letter
ValueCountFrequency (%)
8248
41.2%
8248
41.2%
1752
 
8.8%
1752
 
8.8%
Other Punctuation
ValueCountFrequency (%)
. 1497
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 20000
51.4%
Common 18916
48.6%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4889
25.8%
1 3816
20.2%
5 3364
17.8%
2 1637
 
8.7%
. 1497
 
7.9%
3 1398
 
7.4%
4 1323
 
7.0%
6 504
 
2.7%
8 164
 
0.9%
7 163
 
0.9%
Hangul
ValueCountFrequency (%)
8248
41.2%
8248
41.2%
1752
 
8.8%
1752
 
8.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 20000
51.4%
ASCII 18916
48.6%

Most frequent character per block

Hangul
ValueCountFrequency (%)
8248
41.2%
8248
41.2%
1752
 
8.8%
1752
 
8.8%
ASCII
ValueCountFrequency (%)
0 4889
25.8%
1 3816
20.2%
5 3364
17.8%
2 1637
 
8.7%
. 1497
 
7.9%
3 1398
 
7.4%
4 1323
 
7.0%
6 504
 
2.7%
8 164
 
0.9%
7 163
 
0.9%

등급
Categorical

IMBALANCE

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value,Count
기타,9538
,126
,125
등외,94
,84
Other values (3),33

Length

Max length: 2
Median length: 2
Mean length: 1.9632
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 기타
2nd row: 기타
3rd row: 기타
4th row: 기타
5th row: 기타

Common Values

Value,Count,Frequency (%)
기타,9538,95.4%
,126,1.3%
,125,1.2%
등외,94,0.9%
,84,0.8%
,27,0.3%
,4,< 0.1%
,2,< 0.1%
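
The 87.5% imbalance reported for 등급 is consistent with a score of one minus the normalised Shannon entropy of the class counts. The sketch below assumes that definition; it reproduces the reported figure from the counts above, but it is not taken from the profiling library's source.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes. 0 means perfectly balanced,
    values near 1 mean a single class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Class counts from the Common Values table above.
grade_counts = [9538, 126, 125, 94, 84, 27, 4, 2]
score = imbalance_score(grade_counts)

print(f"{score:.1%}")
```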

Length

Histogram of lengths of the category

Common Values (Plot)


가격
Real number (ℝ)

Distinct: 541
Distinct (%): 5.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29063.392
Minimum: 300
Maximum: 3000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 300
5-th percentile: 750
Q1: 4300
Median: 12000
Q3: 30000
95-th percentile: 123000
Maximum: 3000000
Range: 2999700
Interquartile range (IQR): 25700
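
The range and IQR follow directly from the quantiles listed above. The sketch below recomputes them and, as an added illustration that is not part of the report, derives the conventional 1.5 × IQR fences often used to flag outliers.

```python
# Quantiles copied from the table above.
minimum, q1, q3, maximum = 300, 4300, 30000, 3000000
p95 = 123000

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range

# Tukey-style fences (illustrative addition, not reported by the profiler).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr, lower_fence, upper_fence, p95 > upper_fence)
```

Under this convention the 95th percentile already lies above the upper fence, which matches the heavy right tail suggested by the skewness and kurtosis values.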

Descriptive statistics

Standard deviation: 56598.048
Coefficient of variation (CV): 1.9473999
Kurtosis: 778.7472
Mean: 29063.392
Median Absolute Deviation (MAD): 9300
Skewness: 17.29018
Sum: 2.9063392 × 10^8
Variance: 3.203339 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
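
Several of the descriptive statistics are related by simple identities, so they can be cross-checked from the mean, the standard deviation and the number of observations. This is a sketch using the reported values; small differences come from rounding in the report.

```python
# Values copied from the report.
mean = 29063.392
std_dev = 56598.048
n_observations = 10000

cv = std_dev / mean              # coefficient of variation
variance = std_dev ** 2          # variance is the squared standard deviation
total = mean * n_observations    # sum = mean * number of observations

print(f"CV={cv:.7f} variance={variance:.6e} sum={total:.7e}")
```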

Common Values

Value,Count,Frequency (%)
2700,482,4.8%
4600,269,2.7%
18000,256,2.6%
10700,242,2.4%
4300,223,2.2%
17000,205,2.1%
7500,177,1.8%
850,177,1.8%
4800,167,1.7%
21000,157,1.6%
Other values (531),7645,76.4%

Minimum 10 values

Value,Count,Frequency (%)
300,1,< 0.1%
320,8,0.1%
350,2,< 0.1%
360,19,0.2%
370,141,1.4%
380,1,< 0.1%
390,9,0.1%
400,12,0.1%
480,17,0.2%
490,2,< 0.1%

Maximum 10 values

Value,Count,Frequency (%)
3000000,1,< 0.1%
910000,1,< 0.1%
780000,1,< 0.1%
750000,1,< 0.1%
650000,1,< 0.1%
520000,1,< 0.1%
494172,2,< 0.1%
465000,1,< 0.1%
440000,1,< 0.1%
435000,1,< 0.1%

출하지
Text

Distinct: 161
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7468
Min length: 2

Characters and Unicode

Total characters: 47468
Distinct characters: 162
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 27
Unique (%): 0.3%

Sample

1st row: 중국
2nd row: 경기도 이천시
3rd row: 경기도 이천시
4th row: 제주자치도 서귀포시
5th row: 중국
Value,Count,Frequency (%)
중국,3618,24.6%
경기도,1852,12.6%
전라남도,1036,7.0%
광주시,701,4.8%
페루,406,2.8%
경상남도,405,2.8%
제주자치도,361,2.5%
경상북도,357,2.4%
태국,307,2.1%
충청남도,296,2.0%
Other values (163),5370,36.5%

Most occurring characters

ValueCountFrequency (%)
4898
 
10.3%
4709
 
9.9%
4015
 
8.5%
3620
 
7.6%
3540
 
7.5%
2629
 
5.5%
2164
 
4.6%
1879
 
4.0%
1599
 
3.4%
1393
 
2.9%
Other values (152) 17022
35.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 42757
90.1%
Space Separator 4709
 
9.9%
Close Punctuation 1
 
< 0.1%
Open Punctuation 1
 
< 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4898
 
11.5%
4015
 
9.4%
3620
 
8.5%
3540
 
8.3%
2629
 
6.1%
2164
 
5.1%
1879
 
4.4%
1599
 
3.7%
1393
 
3.3%
1098
 
2.6%
Other values (149) 15922
37.2%
Space Separator
ValueCountFrequency (%)
4709
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 42757
90.1%
Common 4711
 
9.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4898
 
11.5%
4015
 
9.4%
3620
 
8.5%
3540
 
8.3%
2629
 
6.1%
2164
 
5.1%
1879
 
4.4%
1599
 
3.7%
1393
 
3.3%
1098
 
2.6%
Other values (149) 15922
37.2%
Common
ValueCountFrequency (%)
4709
> 99.9%
) 1
 
< 0.1%
( 1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 42757
90.1%
ASCII 4711
 
9.9%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4898
 
11.5%
4015
 
9.4%
3620
 
8.5%
3540
 
8.3%
2629
 
6.1%
2164
 
5.1%
1879
 
4.4%
1599
 
3.7%
1393
 
3.3%
1098
 
2.6%
Other values (149) 15922
37.2%
ASCII
ValueCountFrequency (%)
4709
> 99.9%
) 1
 
< 0.1%
( 1
 
< 0.1%

친환경구분(일반)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value,Count
일반,10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 일반
2nd row: 일반
3rd row: 일반
4th row: 일반
5th row: 일반

Common Values

Value,Count,Frequency (%)
일반,10000,100.0%

Length

Histogram of lengths of the category

Common Values (Plot)


Interactions


Correlations

,거래일,등급,가격
거래일,1.000,0.026,0.000
등급,0.026,1.000,0.234
가격,0.000,0.234,1.000

,가격,등급
가격,1.000,0.145
등급,0.145,1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

,거래일,품목,품종,단위,등급,가격,출하지,친환경구분(일반)
78646,2024-03-19,고사리,고사리 수입,10키로,기타,17000,중국,일반
4567,2024-05-08,채소류 기타,채소류 기타(상장예외),10키로,기타,20000,경기도 이천시,일반
12937,2024-05-02,고구마,밤 고구마,10키로,기타,23000,경기도 이천시,일반
68755,2024-03-25,채소류 기타,채소류 기타(상장예외),10키로,기타,21000,제주자치도 서귀포시,일반
33816,2024-04-17,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반
28166,2024-04-22,마늘,마늘 쫑 수입,7키로,기타,23000,중국,일반
7590,2024-05-07,숙주나물,숙주나물 수입,3.5키로,기타,4600,중국,일반
73400,2024-03-21,두부,포장두부,6.6키로,기타,16000,인도,일반
20848,2024-04-25,고구마,밤 고구마,10키로,기타,25000,경기도 여주시,일반
11380,2024-05-02,두부,순두부,16키로,기타,19000,중국,일반

,거래일,품목,품종,단위,등급,가격,출하지,친환경구분(일반)
45448,2024-04-09,숙주나물,숙주나물 수입,3.5키로,기타,4500,중국,일반
90794,2024-03-11,콩나물,콩나물 수입,6키로,기타,7700,중국,일반
41775,2024-04-11,조미오징어류,조미오징어류,10키로,기타,265000,페루,일반
97641,2024-03-06,두부,연두부,12키로,기타,10700,중국,일반
76848,2024-03-20,도라지,깐도라지 수입,10키로,기타,43000,중국,일반
49704,2024-04-05,두부,포장두부,12키로,기타,17000,인도,일반
31970,2024-04-18,베이비,베이비,500그람,기타,4200,경기도 광주시,일반
66550,2024-03-26,망고,망고 수입,5키로,기타,37000,태국,일반
47170,2024-04-08,방풍나물,방풍나물,2키로,기타,4500,전라남도 여수시,일반
68086,2024-03-26,마늘,깐마늘 대서,20키로,,115000,경상남도 창녕군,일반

Duplicate rows

Most frequently occurring

,거래일,품목,품종,단위,등급,가격,출하지,친환경구분(일반),# duplicates
270,2024-03-20,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,13
83,2024-03-08,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,12
199,2024-03-15,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,12
485,2024-04-02,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,12
153,2024-03-13,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,11
413,2024-03-28,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,11
434,2024-03-29,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,11
953,2024-04-30,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,11
40,2024-03-06,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,10
65,2024-03-07,콩나물,콩나물 수입,3.5키로,기타,2700,중국,일반,10
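
A table like the one above can be produced by grouping identical rows and keeping the groups that occur more than once. The sketch below uses a handful of made-up, shortened rows to show the idea; it is an illustration with the standard library, not the profiler's own implementation, and the helper name `duplicate_groups` is hypothetical.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Shortened, illustrative rows: (거래일, 품목, 단위, 가격).
rows = [
    ("2024-03-20", "콩나물", "3.5키로", 2700),
    ("2024-03-20", "콩나물", "3.5키로", 2700),
    ("2024-03-20", "콩나물", "3.5키로", 2700),
    ("2024-03-19", "고사리", "10키로", 17000),
    ("2024-05-02", "고구마", "10키로", 23000),
    ("2024-05-02", "고구마", "10키로", 23000),
]

groups = duplicate_groups(rows)
n_groups = len(groups)                  # distinct rows that are repeated
n_rows_in_groups = sum(groups.values()) # total rows belonging to those groups

# Most frequent duplicates first, as in the table above.
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

Rows are compared as whole tuples, so two records count as duplicates only when every column matches.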