Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 658
Duplicate rows (%): 6.6%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

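Description: Monthly auction (경락) prices at the Incheon Metropolitan City Namchon Agricultural Wholesale Market (인천광역시 남촌농산물도매시장). The data shows the item (품목), grade (등급), unit quantity (단량), unit (단위), average price (평균가), and more.
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190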

Alerts

단위 has constant value "kg" [Constant]
Dataset has 658 (6.6%) duplicate rows [Duplicates]
단량 is highly overall correlated with 평균가 [High correlation]
평균가 is highly overall correlated with 단량 [High correlation]
등급 is highly imbalanced (54.5%) [Imbalance]
평균가 is highly skewed (γ1 = 99.41452503) [Skewed]

Reproduction

Analysis started: 2024-01-28 15:45:36.896066
Analysis finished: 2024-01-28 15:45:37.666573
Duration: 0.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

품목 (item)
Text

Distinct: 406
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.306
Min length: 5

Characters and Unicode

Total characters: 93060
Distinct characters: 297
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
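The length and category figures in this section can be recomputed with Python's standard library. The sketch below assumes the 품목 column is available as a list of strings. `category_counts` and `length_stats` are illustrative helpers, not ydata-profiling functions. `unicodedata.category` returns two-letter General Category codes (`Lo`, `Ps`, and so on), which are mapped here to the long names used in this report. The standard library does not expose the Unicode Script property, so the script and block breakdowns are not reproduced. As a consistency check, 93060 total characters over 10000 rows gives the reported mean length of 9.306.

```python
import unicodedata
from collections import Counter

# Long names for the General Category codes that occur in this column.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode General Category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def length_stats(values):
    """Return (total characters, mean length per value)."""
    lengths = [len(text) for text in values]
    total = sum(lengths)
    return total, total / len(lengths)


# The five sample rows shown in this section.
sample = ["표고버섯(기타)", "오이(백다다기)", "로메인(로메인(일반))", "도라지(기타)", "호박(늙은호박)"]
print(dict(category_counts(sample)))  # {'Other Letter': 31, 'Open Punctuation': 6, 'Close Punctuation': 6}
print(length_stats(sample))           # (43, 8.6)
```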

Unique

Unique: 54
Unique (%): 0.5%

Sample

1st row: 표고버섯(기타)
2nd row: 오이(백다다기)
3rd row: 로메인(로메인(일반))
4th row: 도라지(기타)
5th row: 호박(늙은호박)

Most occurring words

| Value | Count | Frequency (%) |
| 표고버섯(생표고 | 281 | 2.7% |
| 오이(백다다기 | 217 | 2.1% |
| 수박(수박(일반)(꼭지절단 | 187 | 1.8% |
| 기타(엽경채류(기타 | 183 | 1.8% |
| 표고버섯(표고버섯(일반 | 159 | 1.5% |
| 가지(가지(일반 | 153 | 1.5% |
| 시금치(시금치(일반 | 140 | 1.4% |
| 풋고추(청양 | 131 | 1.3% |
| 호박(애호박 | 121 | 1.2% |
| 새송이(새송이(일반 | 117 | 1.1% |
| Other values (400) | 8617 | 83.6% |
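The counts in this table sum to 10,306 (1,689 + 8,617), not 10,000, and the percentages use that total (281 / 10,306 ≈ 2.7%). This fits a count of whitespace-separated tokens: the column contains 306 space characters, and every token has lost its trailing ")". The sketch below shows one tokenization consistent with those figures. It splits on whitespace, then strips ASCII punctuation from both ends of each token. This is an assumption, not confirmed to be ydata-profiling's exact code, and `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Count whitespace-separated tokens, stripping ASCII punctuation from both ends."""
    counts = Counter()
    for text in values:
        for token in text.split():
            token = token.strip(string.punctuation)
            if token:  # skip tokens that consisted only of punctuation
                counts[token] += 1
    return counts


# Toy input built from values that appear in this report.
items = ["수박(수박(일반)(꼭지절단))", "오이(백다다기)", "마늘(깐마늘 남도)", "오이(백다다기)"]
counts = word_counts(items)
print(counts["오이(백다다기"])  # 2: the trailing ")" is stripped
print(sum(counts.values()))    # 5 tokens from 4 values, because one value contains a space
```

Only the ends of a token are stripped, so inner parentheses survive. This is why entries such as "수박(수박(일반)(꼭지절단" keep their opening brackets.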

Most occurring characters

| Value | Count | Frequency (%) |
| ( | 14098 | 15.1% |
| ) | 14098 | 15.1% |
|  | 3535 | 3.8% |
|  | 3466 | 3.7% |
|  | 2978 | 3.2% |
|  | 2884 | 3.1% |
|  | 2421 | 2.6% |
|  | 2117 | 2.3% |
|  | 1338 | 1.4% |
|  | 1296 | 1.4% |
| Other values (287) | 44829 | 48.2% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 64413 | 69.2% |
| Open Punctuation | 14098 | 15.1% |
| Close Punctuation | 14098 | 15.1% |
| Space Separator | 306 | 0.3% |
| Other Punctuation | 121 | 0.1% |
| Decimal Number | 24 | < 0.1% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 3535 | 5.5% |
|  | 3466 | 5.4% |
|  | 2978 | 4.6% |
|  | 2884 | 4.5% |
|  | 2421 | 3.8% |
|  | 2117 | 3.3% |
|  | 1338 | 2.1% |
|  | 1296 | 2.0% |
|  | 1220 | 1.9% |
|  | 1114 | 1.7% |
| Other values (281) | 42044 | 65.3% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 18 | 75.0% |
| 8 | 6 | 25.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 14098 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 14098 | 100.0% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 306 | 100.0% |

Other Punctuation
| Value | Count | Frequency (%) |
| , | 121 | 100.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 64413 | 69.2% |
| Common | 28647 | 30.8% |

Most frequent character per script

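Hangul
| Value | Count | Frequency (%) |
|  | 3535 | 5.5% |
|  | 3466 | 5.4% |
|  | 2978 | 4.6% |
|  | 2884 | 4.5% |
|  | 2421 | 3.8% |
|  | 2117 | 3.3% |
|  | 1338 | 2.1% |
|  | 1296 | 2.0% |
|  | 1220 | 1.9% |
|  | 1114 | 1.7% |
| Other values (281) | 42044 | 65.3% |

Common
| Value | Count | Frequency (%) |
| ( | 14098 | 49.2% |
| ) | 14098 | 49.2% |
| (space) | 306 | 1.1% |
| , | 121 | 0.4% |
| 1 | 18 | 0.1% |
| 8 | 6 | < 0.1% |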

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 64413 | 69.2% |
| ASCII | 28647 | 30.8% |

Most frequent character per block

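ASCII
| Value | Count | Frequency (%) |
| ( | 14098 | 49.2% |
| ) | 14098 | 49.2% |
| (space) | 306 | 1.1% |
| , | 121 | 0.4% |
| 1 | 18 | 0.1% |
| 8 | 6 | < 0.1% |

Hangul
| Value | Count | Frequency (%) |
|  | 3535 | 5.5% |
|  | 3466 | 5.4% |
|  | 2978 | 4.6% |
|  | 2884 | 4.5% |
|  | 2421 | 3.8% |
|  | 2117 | 3.3% |
|  | 1338 | 2.1% |
|  | 1296 | 2.0% |
|  | 1220 | 1.9% |
|  | 1114 | 1.7% |
| Other values (281) | 42044 | 65.3% |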

등급 (grade)
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
| Value | Count |
| 특(1등 | 6374 |
| 상(2등 | 2562 |
| 보통(3 | 464 |
| 4등 | 202 |
| 9등(등 | 162 |
| Other values (5) | 236 |

Length

Max length: 17
Median length: 16
Mean length: 16.0352
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 4등
3rd row: 특(1등
4th row: 특(1등
5th row: 4등

Common Values

| Value | Count | Frequency (%) |
| 특(1등 | 6374 | 63.7% |
| 상(2등 | 2562 | 25.6% |
| 보통(3 | 464 | 4.6% |
| 4등 | 202 | 2.0% |
| 9등(등 | 162 | 1.6% |
| 없음 | 86 | 0.9% |
| 5등 | 49 | 0.5% |
| 8등 | 48 | 0.5% |
| 6등 | 34 | 0.3% |
| 7등 | 19 | 0.2% |
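The imbalance percentage in the alerts can be reproduced from these ten counts. The sketch assumes an entropy-based score, 1 - H / ln(k), where H is the Shannon entropy of the grade distribution in nats and k = 10 is the number of distinct grades. With these counts it evaluates to about 54.5%, which matches the alert. The name `imbalance` is illustrative.

```python
import math


def imbalance(counts):
    """1 - H / ln(k): 0 for a uniform distribution, approaching 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance score
    total = sum(counts)
    # Shannon entropy in nats; zero counts contribute nothing.
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# Counts for the ten grades in the table above.
grade_counts = [6374, 2562, 464, 202, 162, 86, 49, 48, 34, 19]
print(f"{imbalance(grade_counts):.1%}")  # 54.5%
```

Normalizing by ln(k) keeps the score between 0 and 1 regardless of how many grades exist. This lets columns with different numbers of categories be compared on the same scale.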

Length

[Figure: Histogram of lengths of the category]


단량 (unit quantity)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 89
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.481754
Minimum: 0.01
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95th percentile: 16
Maximum: 102
Range: 101.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.0246465
Coefficient of variation (CV): 0.77519857
Kurtosis: 32.384156
Mean: 6.481754
Median Absolute Deviation (MAD): 3
Skewness: 2.705508
Sum: 64817.54
Variance: 25.247072
Monotonicity: Not monotonic
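The quantile and descriptive statistics above follow standard definitions. The sketch below shows them using Python's `statistics` module, under these assumptions:

- the standard deviation is the sample version (n - 1 denominator);
- quartiles are linearly interpolated (`method="inclusive"`);
- MAD is the median of |x - median|;
- skewness is the bias-adjusted Fisher-Pearson coefficient, which is what pandas' `Series.skew` computes.

The report does not state which conventions it uses. `describe` is an illustrative name.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the style of the Quantile and Descriptive statistics tables."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("need at least three values for the skewness estimate")
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    # Linearly interpolated quartiles (the "inclusive" method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Median absolute deviation: median of |x - median|.
    mad = statistics.median(abs(x - median) for x in data)
    # Bias-adjusted Fisher-Pearson skewness from the central moments m2 and m3.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5 if m2 > 0 else float("nan")
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        "mad": mad,
        "skewness": skewness,
    }


print(describe([1, 2, 3, 4]))
```

The sketch can be cross-checked against this report's own figures. Dividing 5.0246465 by 6.481754 gives about 0.7752, the CV shown above. Squaring 5.0246465 gives about 25.247, the variance. Q3 - Q1 = 10 - 3 = 7, the IQR.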
[Figure: Histogram with fixed size bins (bins=50)]
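
Common values

| Value | Count | Frequency (%) |
| 10.0 | 2104 | 21.0% |
| 4.0 | 1988 | 19.9% |
| 2.0 | 1138 | 11.4% |
| 5.0 | 804 | 8.0% |
| 8.0 | 773 | 7.7% |
| 1.0 | 514 | 5.1% |
| 3.0 | 325 | 3.2% |
| 15.0 | 314 | 3.1% |
| 20.0 | 287 | 2.9% |
| 0.5 | 245 | 2.5% |
| Other values (79) | 1508 | 15.1% |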
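
Minimum 10 values

| Value | Count | Frequency (%) |
| 0.01 | 17 | 0.2% |
| 0.05 | 37 | 0.4% |
| 0.06 | 11 | 0.1% |
| 0.1 | 18 | 0.2% |
| 0.12 | 7 | 0.1% |
| 0.15 | 3 | < 0.1% |
| 0.16 | 14 | 0.1% |
| 0.2 | 63 | 0.6% |
| 0.25 | 6 | 0.1% |
| 0.3 | 40 | 0.4% |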
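
Maximum 10 values

| Value | Count | Frequency (%) |
| 102.0 | 1 | < 0.1% |
| 89.0 | 1 | < 0.1% |
| 85.0 | 2 | < 0.1% |
| 40.0 | 2 | < 0.1% |
| 25.0 | 4 | < 0.1% |
| 21.0 | 2 | < 0.1% |
| 20.0 | 287 | 2.9% |
| 18.0 | 89 | 0.9% |
| 17.5 | 1 | < 0.1% |
| 17.0 | 45 | 0.4% |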

단위 (unit)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
| Value | Count |
| kg | 10000 |

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

| Value | Count | Frequency (%) |
| kg | 10000 | 100.0% |

Length

[Figure: Histogram of lengths of the category]


평균가 (average price)
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 4498
Distinct (%): 45.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22170.739
Minimum: 100
Maximum: 40008000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5th percentile: 1500
Q1: 6000
Median: 12000
Q3: 22000
95th percentile: 55000
Maximum: 40008000
Range: 40007900
Interquartile range (IQR): 16000

Descriptive statistics

Standard deviation: 400685.42
Coefficient of variation (CV): 18.072714
Kurtosis: 9921.634
Mean: 22170.739
Median Absolute Deviation (MAD): 7178.5
Skewness: 99.414525
Sum: 2.2170739 × 10⁸
Variance: 1.6054881 × 10¹¹
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
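
Common values

| Value | Count | Frequency (%) |
| 10000 | 208 | 2.1% |
| 8000 | 189 | 1.9% |
| 4000 | 178 | 1.8% |
| 15000 | 169 | 1.7% |
| 13000 | 159 | 1.6% |
| 12000 | 157 | 1.6% |
| 3000 | 148 | 1.5% |
| 5000 | 143 | 1.4% |
| 7000 | 142 | 1.4% |
| 6000 | 141 | 1.4% |
| Other values (4488) | 8366 | 83.7% |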
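
Minimum 10 values

| Value | Count | Frequency (%) |
| 100 | 2 | < 0.1% |
| 200 | 3 | < 0.1% |
| 250 | 1 | < 0.1% |
| 276 | 1 | < 0.1% |
| 300 | 26 | 0.3% |
| 336 | 1 | < 0.1% |
| 350 | 12 | 0.1% |
| 367 | 1 | < 0.1% |
| 400 | 17 | 0.2% |
| 429 | 1 | < 0.1% |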
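
Maximum 10 values

| Value | Count | Frequency (%) |
| 40008000 | 1 | < 0.1% |
| 1062500 | 1 | < 0.1% |
| 532057 | 1 | < 0.1% |
| 448177 | 1 | < 0.1% |
| 406200 | 2 | < 0.1% |
| 240000 | 3 | < 0.1% |
| 215000 | 1 | < 0.1% |
| 210000 | 2 | < 0.1% |
| 200000 | 7 | 0.1% |
| 195000 | 7 | 0.1% |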
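Most of the skew and spread in 평균가 comes from a single row. The arithmetic below uses the report's own figures: n = 10000, mean 22170.739, variance 1.6054881 × 10¹¹, and the 40008000 value from the maximum-values table. It shows that dropping that one value would lower the mean of the remaining 9,999 rows to about 18,172. It also shows that this row's term accounts for roughly 99.6% of the sample variance. Whether the value is a genuine price or an entry error cannot be determined from the profile alone. The helper names are illustrative.

```python
def mean_without(total, n, value):
    """Mean of the remaining n - 1 observations after removing one value."""
    return (total - value) / (n - 1)


def variance_share(value, mean, variance, n):
    """Fraction of the sample variance (n - 1 denominator) contributed by one value's term."""
    return (value - mean) ** 2 / (n - 1) / variance


n = 10_000
mean = 22170.739         # reported Mean
variance = 1.6054881e11  # reported Variance
top = 40_008_000         # largest value in the maximum-values table

print(round(mean_without(mean * n, n, top)))             # 18172
print(f"{variance_share(top, mean, variance, n):.1%}")   # 99.6%
```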

Interactions

[Figures: 4 interaction plots]

Correlations

|  | 등급 | 단량 | 평균가 |
| 등급 | 1.000 | 0.067 | 0.000 |
| 단량 | 0.067 | 1.000 | 0.000 |
| 평균가 | 0.000 | 0.000 | 1.000 |

|  | 단량 | 평균가 | 등급 |
| 단량 | 1.000 | 0.580 | 0.035 |
| 평균가 | 0.580 | 1.000 | 0.000 |
| 등급 | 0.035 | 0.000 | 1.000 |
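The report does not state which coefficient each matrix uses. The sketch below illustrates one rank-based coefficient, Spearman's ρ, computed as the Pearson correlation of average ranks (tied values share the mean of their positions). Rank-based coefficients are far less sensitive than Pearson's r to an extreme value such as the 40008000 price. The helper names are illustrative, and this is not presented as ydata-profiling's implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: one huge price does not dominate the rank correlation.
quantity = [1, 2, 2, 4, 10]
price = [1000, 2500, 2400, 5000, 40008000]
print(round(spearman(quantity, price), 4))  # 0.9747
```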

Missing values

[Figure: A simple visualization of nullity by column]
[Figure: Nullity matrix, a data-dense display that lets you visually pick out patterns in data completion]

Sample

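|  | 품목 | 등급 | 단량 | 단위 | 평균가 |
| 13668 | 표고버섯(기타) | 특(1등 | 4.0 | kg | 23286 |
| 10606 | 오이(백다다기) | 4등 | 15.0 | kg | 14333 |
| 4929 | 로메인(로메인(일반)) | 특(1등 | 2.0 | kg | 10000 |
| 4263 | 도라지(기타) | 특(1등 | 1.0 | kg | 30755 |
| 15569 | 호박(늙은호박) | 4등 | 5.0 | kg | 1000 |
| 7792 | 상추(적포기) | 특(1등 | 4.0 | kg | 24043 |
| 3756 | 당근(기타) | 특(1등 | 10.0 | kg | 6021 |
| 14602 | 풋고추(맛광) | 특(1등 | 10.0 | kg | 46000 |
| 13546 | 포도(샤인마스캇) | 상(2등 | 2.0 | kg | 5690 |
| 2424 | 근대(근대(일반)) | 특(1등 | 4.0 | kg | 6000 |

|  | 품목 | 등급 | 단량 | 단위 | 평균가 |
| 13533 | 포도(샤인마스캇) | 특(1등 | 4.0 | kg | 17449 |
| 13128 | 파프리카(파프리카(일반)) | 상(2등 | 5.0 | kg | 19556 |
| 8281 | 생강(기타) | 특(1등 | 10.0 | kg | 95000 |
| 12510 | 콩나물(콩나물(일반)) | 특(1등 | 5.0 | kg | 7500 |
| 4609 | 떫은감(대봉시) | 특(1등 | 20.0 | kg | 23000 |
| 5207 | 마늘(마늘쫑(수입)) | 특(1등 | 1.0 | kg | 2500 |
| 13712 | 표고버섯(생표고) | 6등 | 16.0 | kg | 52000 |
| 3486 | 단감(기타) | 상(2등 | 10.0 | kg | 15800 |
| 2402 | 근대(근대(일반)) | 상(2등 | 4.0 | kg | 4500 |
| 12750 | 토마토(토마토(일반)) | 특(1등 | 10.0 | kg | 48719 |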

Duplicate rows

Most frequently occurring

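|  | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates |
| 73 | 곡물제조(두부) | 특(1등 | 3.0 | kg | 5300 | 18 |
| 81 | 곡물제조(연두부) | 특(1등 | 12.0 | kg | 17800 | 17 |
| 112 | 꼬시래기(꼬시래기(일반)) | 특(1등 | 8.0 | kg | 10500 | 17 |
| 332 | 숙주나물(숙주나물(일반)) | 특(1등 | 3.5 | kg | 4500 | 17 |
| 208 | 무순(무순(일반)) | 특(1등 | 0.05 | kg | 300 | 16 |
| 69 | 곡물제조(두부) | 특(1등 | 0.5 | kg | 1230 | 15 |
| 242 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 15 |
| 372 | 어묵,어분,어비(기타) | 특(1등 | 3.0 | kg | 8250 | 14 |
| 47 | 고구마순(생고구마순) | 특(1등 | 2.0 | kg | 4000 | 13 |
| 188 | 마늘(깐마늘 남도) | 특(1등 | 0.2 | kg | 1000 | 13 |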
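The report does not say how it counts its 658 duplicate rows. The figure could count every surplus copy of a repeated row, or count each distinct repeated row once. The sketch below computes both, along with per-row occurrence counts comparable to the "# duplicates" column. It assumes rows are given as tuples of the five column values. `duplicate_summary` is a hypothetical helper. The surplus count is what pandas' `DataFrame.duplicated()` flags with its default `keep="first"`.

```python
from collections import Counter


def duplicate_summary(rows):
    """Summarize exact duplicate rows.

    Returns (surplus, groups, repeated):
      surplus  - rows identical to an earlier row
      groups   - distinct rows that occur more than once
      repeated - (row, occurrences) pairs, most frequent first
    """
    counts = Counter(rows)
    repeated = [(row, c) for row, c in counts.most_common() if c > 1]
    surplus = sum(c - 1 for _, c in repeated)
    return surplus, len(repeated), repeated


# Toy rows (품목, 등급, 단량, 단위, 평균가); the repeat counts are made up for illustration.
rows = [
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("무순(무순(일반))", "특(1등", 0.05, "kg", 300),
    ("무순(무순(일반))", "특(1등", 0.05, "kg", 300),
    ("오이(백다다기)", "4등", 15.0, "kg", 14333),
]
surplus, groups, repeated = duplicate_summary(rows)
print(surplus, groups)  # 3 2
print(repeated[0][1])   # 3
```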