Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 735
Duplicate rows (%): 7.3%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

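Description: Monthly auction (경락) price data for the Namchon Agricultural Products Wholesale Market (인천광역시 남촌농산물도매시장) in Incheon Metropolitan City. It covers item (품목), grade (등급), unit quantity (단량), unit (단위), average price (평균가), and more.
Author: Incheon Metropolitan City (인천광역시)
URL: https://www.data.go.kr/data/15051664/fileData.do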

Alerts

Dataset has 735 (7.3%) duplicate rows [Duplicates]
단량 is highly overall correlated with 평균가 [High correlation]
평균가 is highly overall correlated with 단량 [High correlation]
단위 is highly imbalanced (99.9%) [Imbalance]

Reproduction

Analysis started: 2024-04-21 01:18:50.272605
Analysis finished: 2024-04-21 01:18:52.401096
Duration: 2.13 seconds
Software version: ydata-profiling v4.5.1

Variables

품목 (item)
Text

Distinct: 332
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 18
Mean length: 9.2846
Min length: 5

Characters and Unicode

Total characters: 92846
Distinct characters: 281
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
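The category and block counts below can be reproduced with Python's standard `unicodedata` module. The following minimal sketch tallies general categories by their long names. `block_of` is a hypothetical helper that only separates ASCII from the Hangul Syllables block. That split is an assumption, but it happens to cover every character class reported for this column. The standard library has no script lookup, so scripts are not shown.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}


def block_of(ch: str) -> str:
    """Hypothetical helper: coarse Unicode block for the two blocks seen here."""
    cp = ord(ch)
    if cp < 0x80:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:  # Hangul Syllables block (U+AC00..U+D7AF)
        return "Hangul"
    return "Other"


def profile_characters(values):
    """Count characters per general category and per (coarse) block."""
    categories = Counter()
    blocks = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            categories[CATEGORY_NAMES.get(code, code)] += 1
            blocks[block_of(ch)] += 1
    return categories, blocks


cats, blocks = profile_characters(["전분 및 사료제조(청포묵)", "달래(달래(일반))"])
print(cats)
print(blocks)
```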

Unique

Unique: 44
Unique (%): 0.4%

Sample

1st row: 전분 및 사료제조(청포묵)
2nd row: 달래(달래(일반))
3rd row: 양파(양파(일반))
4th row: 파프리카(파프리카(일반))
5th row: 마늘(풋마늘)
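Most frequent words. These counts are per whitespace-separated word rather than per row, so they sum to 10334. Trailing punctuation appears to be stripped from each word, which is why values such as 딸기(설향) show as 딸기(설향.
Value | Count | Frequency (%)
딸기(설향 | 507 | 4.9%
딸기(기타 | 307 | 3.0%
표고버섯(생표고 | 236 | 2.3%
시금치(시금치(일반 | 211 | 2.0%
새송이(새송이(일반 | 165 | 1.6%
딸기(금실 | 146 | 1.4%
냉이(일반냉이 | 143 | 1.4%
곡물제조(두부 | 137 | 1.3%
수박(수박(일반)(꼭지절단 | 133 | 1.3%
오이(백다다기 | 126 | 1.2%
Other values (326) | 8223 | 79.6%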
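The truncated values in this table are consistent with word-level tokenization. The minimal sketch below assumes the profiler lowercases each value, splits it on whitespace, and strips punctuation from both ends of every word. The report does not state its exact rule, and any stop-word filtering is omitted here.

```python
import string
from collections import Counter

# Characters removed from both ends of each word (an assumption about the rule).
STRIP_CHARS = string.punctuation + string.whitespace


def word_counts(values):
    """Split each value on whitespace and strip punctuation from both ends of each word."""
    counts = Counter()
    for text in values:
        for word in text.lower().split():
            word = word.strip(STRIP_CHARS)
            if word:
                counts[word] += 1
    return counts


items = ["딸기(설향)", "딸기(설향)", "수박(수박(일반)(꼭지절단)", "전분 및 사료제조(청포묵)"]
counts = word_counts(items)
print(counts.most_common(3))
```

The multi-word item 전분 및 사료제조(청포묵) contributes three words. This is how the word total (10334) can exceed the row count (10000).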

Most occurring characters

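Value | Count | Frequency (%)
( | 14099 | 15.2%
) | 14099 | 15.2%
[glyph lost] | 3472 | 3.7%
[glyph lost] | 3466 | 3.7%
[glyph lost] | 3250 | 3.5%
[glyph lost] | 2148 | 2.3%
[glyph lost] | 2043 | 2.2%
[glyph lost] | 1976 | 2.1%
[glyph lost] | 1555 | 1.7%
[glyph lost] | 1409 | 1.5%
Other values (271) | 45329 | 48.8%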

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64151 | 69.1%
Open Punctuation | 14099 | 15.2%
Close Punctuation | 14099 | 15.2%
Space Separator | 334 | 0.4%
Other Punctuation | 142 | 0.2%
Decimal Number | 21 | < 0.1%

Most frequent character per category

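Other Letter
Value | Count | Frequency (%)
[glyph lost] | 3472 | 5.4%
[glyph lost] | 3466 | 5.4%
[glyph lost] | 3250 | 5.1%
[glyph lost] | 2148 | 3.3%
[glyph lost] | 2043 | 3.2%
[glyph lost] | 1976 | 3.1%
[glyph lost] | 1555 | 2.4%
[glyph lost] | 1409 | 2.2%
[glyph lost] | 1294 | 2.0%
[glyph lost] | 1203 | 1.9%
Other values (266) | 42335 | 66.0%

Open Punctuation
Value | Count | Frequency (%)
( | 14099 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14099 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 334 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 142 | 100.0%

Decimal Number
Value | Count | Frequency (%)
1 | 21 | 100.0%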

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64151 | 69.1%
Common | 28695 | 30.9%

Most frequent character per script

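Hangul
Value | Count | Frequency (%)
[glyph lost] | 3472 | 5.4%
[glyph lost] | 3466 | 5.4%
[glyph lost] | 3250 | 5.1%
[glyph lost] | 2148 | 3.3%
[glyph lost] | 2043 | 3.2%
[glyph lost] | 1976 | 3.1%
[glyph lost] | 1555 | 2.4%
[glyph lost] | 1409 | 2.2%
[glyph lost] | 1294 | 2.0%
[glyph lost] | 1203 | 1.9%
Other values (266) | 42335 | 66.0%

Common
Value | Count | Frequency (%)
( | 14099 | 49.1%
) | 14099 | 49.1%
(space) | 334 | 1.2%
, | 142 | 0.5%
1 | 21 | 0.1%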

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64151 | 69.1%
ASCII | 28695 | 30.9%

Most frequent character per block

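ASCII
Value | Count | Frequency (%)
( | 14099 | 49.1%
) | 14099 | 49.1%
(space) | 334 | 1.2%
, | 142 | 0.5%
1 | 21 | 0.1%

Hangul
Value | Count | Frequency (%)
[glyph lost] | 3472 | 5.4%
[glyph lost] | 3466 | 5.4%
[glyph lost] | 3250 | 5.1%
[glyph lost] | 2148 | 3.3%
[glyph lost] | 2043 | 3.2%
[glyph lost] | 1976 | 3.1%
[glyph lost] | 1555 | 2.4%
[glyph lost] | 1409 | 2.2%
[glyph lost] | 1294 | 2.0%
[glyph lost] | 1203 | 1.9%
Other values (266) | 42335 | 66.0%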

등급 (grade)
Categorical

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
특(1등 | 6100
상(2등 | 2405
보통(3 | 671
4등 | 300
9등(등 | 197
Other values (5) | 327

Length

Max length: 17
Median length: 16
Mean length: 16.046
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 상(2등
3rd row: 상(2등
4th row: 상(2등
5th row: 특(1등

Common Values

Value | Count | Frequency (%)
특(1등 | 6100 | 61.0%
상(2등 | 2405 | 24.1%
보통(3 | 671 | 6.7%
4등 | 300 | 3.0%
9등(등 | 197 | 2.0%
없음 | 167 | 1.7%
5등 | 73 | 0.7%
6등 | 46 | 0.5%
7등 | 21 | 0.2%
8등 | 20 | 0.2%

Length

[Figure: Histogram of lengths of the category]


단량 (unit quantity)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 78
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.598883
Minimum: 0.01
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 0.5
Q1: 2
Median: 4
Q3: 10
95th percentile: 17
Maximum: 102
Range: 101.99
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.199428
Coefficient of variation (CV): 0.92865453
Kurtosis: 15.017707
Mean: 5.598883
Median Absolute Deviation (MAD): 3
Skewness: 2.0717996
Sum: 55988.83
Variance: 27.034052
Monotonicity: Not monotonic
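Several of these statistics are derived from others. The sketch below is a quick consistency check that recomputes them from the reported 단량 values. The inputs are already rounded, so the results agree only to the printed precision.

```python
# Values copied from the 단량 quantile and descriptive statistics above.
minimum, maximum = 0.01, 102.0
q1, q3 = 2.0, 10.0
mean, std = 5.598883, 5.199428

value_range = maximum - minimum  # Range = maximum - minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance is the squared standard deviation

print(f"range={value_range:.2f} iqr={iqr:g} cv={cv:.8f} variance={variance:.6f}")
```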
[Figure: Histogram with fixed size bins (bins=50)]
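With 50 fixed-size bins, the range from the minimum (0.01) to the maximum (102) is split into equal widths of about 2.04. The minimal sketch below shows that bin assignment. It assumes numpy.histogram-style edges, where each bin is half-open except the last, which also includes the maximum. The report does not state which edge convention was used.

```python
def fixed_bin_index(x: float, lo: float, hi: float, bins: int = 50) -> int:
    """Return the 0-based index of the equal-width bin containing x.

    Bins span [lo, hi]; every bin is half-open except the last,
    which also includes hi (numpy.histogram's convention).
    """
    if not lo <= x <= hi:
        raise ValueError("x lies outside [lo, hi]")
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)


lo, hi = 0.01, 102.0  # minimum and maximum of 단량 from the report
width = (hi - lo) / 50
print(f"bin width = {width:.4f}")
for value in (0.01, 4.0, 10.0, 102.0):
    print(value, fixed_bin_index(value, lo, hi))
```

The 95th percentile (17) falls in bin index 8, the ninth of 50 bins. The single value of 102 stretches the axis, so almost all of the histogram's mass sits in its first few bins.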
Common values
Value | Count | Frequency (%)
4.0 | 1836 | 18.4%
10.0 | 1597 | 16.0%
2.0 | 1231 | 12.3%
1.0 | 969 | 9.7%
0.5 | 831 | 8.3%
8.0 | 631 | 6.3%
5.0 | 502 | 5.0%
20.0 | 323 | 3.2%
3.0 | 260 | 2.6%
9.0 | 167 | 1.7%
Other values (68) | 1653 | 16.5%
Minimum 10 values
Value | Count | Frequency (%)
0.01 | 22 | 0.2%
0.05 | 40 | 0.4%
0.1 | 14 | 0.1%
0.12 | 1 | < 0.1%
0.15 | 22 | 0.2%
0.16 | 15 | 0.1%
0.2 | 87 | 0.9%
0.25 | 32 | 0.3%
0.3 | 54 | 0.5%
0.35 | 8 | 0.1%
Maximum 10 values
Value | Count | Frequency (%)
102.0 | 1 | < 0.1%
51.0 | 1 | < 0.1%
50.0 | 1 | < 0.1%
40.0 | 6 | 0.1%
30.0 | 17 | 0.2%
25.0 | 4 | < 0.1%
22.0 | 1 | < 0.1%
20.0 | 323 | 3.2%
19.0 | 6 | 0.1%
18.0 | 86 | 0.9%

단위 (unit)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
kg | 9999
g | 1

Length

Max length: 2
Median length: 2
Mean length: 1.9999
Min length: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value | Count | Frequency (%)
kg | 9999 | > 99.9%
g | 1 | < 0.1%
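The 99.9% imbalance figure matches an entropy-based score. The sketch below assumes the score is one minus the Shannon entropy of the category frequencies, divided by its maximum possible value, log2 of the number of categories. That this is exactly how ydata-profiling defines it is an assumption, so check the library source before relying on it.

```python
from math import log2


def imbalance_score(counts):
    """1 minus the normalised Shannon entropy of the category counts.

    0 means perfectly balanced; values near 1 mean one category dominates.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(n_classes)


score = imbalance_score([9999, 1])  # kg vs g counts from the table above
print(f"{score:.1%}")
```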

Length

[Figure: Histogram of lengths of the category]


평균가 (average price)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 4802
Distinct (%): 48.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23101.864
Minimum: 200
Maximum: 1062500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 200
5th percentile: 2000
Q1: 5946.75
Median: 14245
Q3: 29200
95th percentile: 70277.15
Maximum: 1062500
Range: 1062300
Interquartile range (IQR): 23253.25
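Fractional quantiles such as Q1 = 5946.75 and a 95th percentile of 70277.15 arise when a quantile falls between two sorted observations and is interpolated. The sketch below shows linear interpolation, the default method in numpy and pandas. It uses made-up toy prices because the raw data is not part of this report, and it is an assumption that the profiler used this exact method.

```python
import statistics


def linear_quantile(values, p):
    """Quantile by linear interpolation between neighbouring order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lower = int(pos)
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] + (xs[upper] - xs[lower]) * frac


prices = [2000, 4000, 5000, 7000, 8000, 10000, 13000, 15000]  # toy data, not from the dataset
q1 = linear_quantile(prices, 0.25)
q3 = linear_quantile(prices, 0.75)
print(q1, q3, q3 - q1)
# statistics.quantiles(..., method="inclusive") applies the same interpolation rule.
print(statistics.quantiles(prices, n=4, method="inclusive"))
```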

Descriptive statistics

Standard deviation: 31298.011
Coefficient of variation (CV): 1.3547829
Kurtosis: 183.29076
Mean: 23101.864
Median Absolute Deviation (MAD): 9755
Skewness: 8.2386325
Sum: 2.3101864 × 10^8
Variance: 9.7956548 × 10^8
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
4000 | 150 | 1.5%
10000 | 148 | 1.5%
2000 | 133 | 1.3%
8000 | 130 | 1.3%
15000 | 129 | 1.3%
3000 | 126 | 1.3%
13000 | 119 | 1.2%
20000 | 105 | 1.1%
7000 | 104 | 1.0%
5000 | 103 | 1.0%
Other values (4792) | 8753 | 87.5%
Minimum 10 values
Value | Count | Frequency (%)
200 | 1 | < 0.1%
300 | 8 | 0.1%
334 | 1 | < 0.1%
335 | 1 | < 0.1%
337 | 1 | < 0.1%
338 | 1 | < 0.1%
342 | 2 | < 0.1%
343 | 1 | < 0.1%
345 | 1 | < 0.1%
347 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
1062500 | 1 | < 0.1%
833625 | 1 | < 0.1%
531200 | 1 | < 0.1%
274500 | 1 | < 0.1%
272000 | 1 | < 0.1%
260000 | 1 | < 0.1%
256000 | 1 | < 0.1%
255000 | 2 | < 0.1%
246500 | 1 | < 0.1%
242000 | 1 | < 0.1%

Interactions


Correlations

Variable | 등급 | 단량 | 단위 | 평균가
등급 | 1.000 | 0.051 | 0.092 | 0.091
단량 | 0.051 | 1.000 | 0.000 | 0.902
단위 | 0.092 | 0.000 | 1.000 | 0.000
평균가 | 0.091 | 0.902 | 0.000 | 1.000

Variable | 등급 | 단위
등급 | 1.000 | 0.071
단위 | 0.071 | 1.000

Variable | 단량 | 평균가 | 등급 | 단위
단량 | 1.000 | 0.749 | 0.027 | 0.000
평균가 | 0.749 | 1.000 | 0.048 | 0.000
등급 | 0.027 | 0.048 | 1.000 | 0.071
단위 | 0.000 | 0.000 | 0.071 | 1.000
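The report does not say which coefficient each matrix uses. For two numeric columns such as 단량 and 평균가, a rank-based coefficient is a common choice. The minimal sketch below computes Spearman's rank correlation, which is the Pearson correlation of the average ranks, on the first five rows of the sample section. Treating Spearman as the relevant measure is an assumption. Five rows also say nothing about the full-data values of 0.902 or 0.749.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First five sample rows: 단량 (kg) and 평균가
qty = [4.0, 4.0, 15.0, 5.0, 20.0]
price = [7778, 28000, 23500, 35909, 52500]
print(round(spearman(qty, price), 3))
```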

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly pick out patterns in data completion.]

Sample

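First rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
10291 | 전분 및 사료제조(청포묵) | 특(1등 | 4.0 | kg | 7778
2441 | 달래(달래(일반)) | 상(2등 | 4.0 | kg | 28000
8696 | 양파(양파(일반)) | 상(2등 | 15.0 | kg | 23500
11914 | 파프리카(파프리카(일반)) | 상(2등 | 5.0 | kg | 35909
5002 | 마늘(풋마늘) | 특(1등 | 20.0 | kg | 52500
6882 | 사과(후지) | 상(2등 | 10.0 | kg | 22500
4761 | 마늘(깐마늘 남도) | 보통(3 | 10.0 | kg | 62000
5947 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000
11327 | 콩(기타) | 특(1등 | 4.0 | kg | 12500
10682 | 참나물(참나물(일반)) | 특(1등 | 4.0 | kg | 16338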
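
Last rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
694 | 고구마(밤고구마) | 보통(3 | 10.0 | kg | 26317
13427 | 호박(쥬키니호박) | 특(1등 | 10.0 | kg | 22948
4069 | 딸기(설향) | 4등 | 1.0 | kg | 3168
6080 | 방울양배추(스프로스)(방울양배추(일반)) | 특(1등 | 0.5 | kg | 1700
11749 | 파인애플(파인애플(수입)) | 특(1등 | 11.5 | kg | 27000
2563 | 당근(기타) | 특(1등 | 10.0 | kg | 7167
1797 | 깻잎(깻잎(일반)) | 상(2등 | 2.0 | kg | 18511
5311 | 머위대(머위잎) | 특(1등 | 3.0 | kg | 26000
12701 | 풋고추(롱그린) | 상(2등 | 12.0 | kg | 72000
3039 | 딸기(금실) | 상(2등 | 0.5 | kg | 5226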

Duplicate rows

Most frequently occurring

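Index | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates
51 | 곡물제조(순두부) | 특(1등 | 16.0 | kg | 17800 | 22
324 | 미역(줄기미역) | 특(1등 | 5.5 | kg | 8000 | 22
39 | 곡물제조(두부) | 특(1등 | 0.5 | kg | 1230 | 21
43 | 곡물제조(두부) | 특(1등 | 3.0 | kg | 5300 | 20
640 | 콩나물(콩나물(일반)) | 특(1등 | 5.0 | kg | 7500 | 20
83 | 꼬시래기(꼬시래기(일반)) | 특(1등 | 8.0 | kg | 13500 | 19
53 | 곡물제조(연두부) | 특(1등 | 12.0 | kg | 17800 | 17
47 | 곡물제조(두부) | 특(1등 | 7.0 | kg | 7500 | 16
302 | 무순(무순(일반)) | 특(1등 | 0.15 | kg | 800 | 16
328 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 16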
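The "# duplicates" column counts how many times an identical row occurs. The minimal sketch below reproduces that grouping with `collections.Counter`, using made-up rows shaped like the table above (the repeat counts are invented). The report does not say how the headline figure of 735 is counted. One possibility is the number of extra copies, as `extra_copies` computes here, which equals the sum of pandas `DataFrame.duplicated()` with its default `keep="first"`. Treat that mapping as an assumption.

```python
from collections import Counter

# Each row is a (품목, 등급, 단량, 단위, 평균가) tuple; toy rows, not the real data.
rows = [
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("곡물제조(두부)", "특(1등", 0.5, "kg", 1230),
    ("무순(무순(일반))", "특(1등", 0.15, "kg", 800),
    ("무순(무순(일반))", "특(1등", 0.15, "kg", 800),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
]

occurrences = Counter(rows)
# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in occurrences.most_common() if n > 1]
# Copies beyond the first occurrence of each duplicated row.
extra_copies = sum(n - 1 for _, n in duplicates)

for row, n in duplicates:
    print(n, row)
print("extra copies:", extra_copies)
```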