Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 638
Duplicate rows (%): 6.4%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
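
The derived figures above follow from the raw totals by simple arithmetic. The sketch below recomputes the duplicate-row percentage and the average record size from the reported values; the variable names are illustrative and are not taken from the profiling tool.

```python
# Values copied from the Dataset statistics above.
n_observations = 10_000
n_duplicate_rows = 638
total_size_kib = 488.3

# Share of rows reported as duplicates, in percent.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average bytes per record (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{duplicate_pct:.1f}%")      # 6.4%
print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```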

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

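Description: Monthly auction price data for the Namchon Agricultural Products Wholesale Market in Incheon Metropolitan City. The fields include item (품목), grade (등급), unit quantity (단량), unit (단위), and average price (평균가).
Author: Incheon Metropolitan City
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190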

Alerts

단위 (unit) has constant value "kg" [Constant]
Dataset has 638 (6.4%) duplicate rows [Duplicates]
단량 (unit quantity) is highly overall correlated with 평균가 (average price) [High correlation]
평균가 (average price) is highly overall correlated with 단량 (unit quantity) [High correlation]
등급 (grade) is highly imbalanced (55.0%) [Imbalance]

Reproduction

Analysis started: 2024-01-28 15:45:23.489528
Analysis finished: 2024-01-28 15:45:24.493061
Duration: 1 second
Software version: ydata-profiling v4.5.1
Download configuration: config.json
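
A report of this kind can be regenerated with ydata-profiling. The sketch below assumes the dataset has been downloaded as a CSV file. The function name `build_report`, the `cp949` default encoding, and the report title are assumptions for illustration; none of them is stated in the report.

```python
def build_report(csv_path: str, html_path: str, encoding: str = "cp949") -> None:
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module still loads where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="Namchon wholesale market monthly prices")
    profile.to_file(html_path)
```

A call such as `build_report("prices.csv", "report.html")` (both file names hypothetical) would write the report next to the input file.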

Variables

품목 (item)
Text

Distinct: 409
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.3046
Min length: 5

Characters and Unicode

Total characters: 93046
Distinct characters: 299
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 64
Unique (%): 0.6%

Sample

1st row: 얼갈이배추(얼갈이배추)
2nd row: 고구마(호박고구마)
3rd row: 홍고추(홍청양)
4th row: 토마토(토마토(일반))
5th row: 멜론(네트계)
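
The sample values suggest that 품목 follows a "category(variety)" pattern in which the variety may itself contain parentheses. That pattern is inferred from the rows shown here, not documented by the data provider. Under that assumption, a minimal sketch for splitting the two parts could look like this (`split_item` is a hypothetical helper name):

```python
def split_item(name: str) -> tuple[str, str]:
    """Split 'category(variety)' at the outermost pair of parentheses."""
    head, sep, tail = name.partition("(")
    if not sep or not tail.endswith(")"):
        # No outer parentheses: treat the whole string as the category.
        return name, ""
    # Drop only the final ')' so nested parentheses in the variety survive.
    return head, tail[:-1]


for raw in ["고구마(호박고구마)", "토마토(토마토(일반))"]:
    print(split_item(raw))
# ('고구마', '호박고구마')
# ('토마토', '토마토(일반)')
```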
Value | Count | Frequency (%)
표고버섯(생표고 | 254 | 2.5%
수박(수박(일반)(꼭지절단 | 200 | 1.9%
오이(백다다기 | 200 | 1.9%
기타(엽경채류(기타 | 177 | 1.7%
표고버섯(표고버섯(일반 | 158 | 1.5%
풋고추(청양 | 143 | 1.4%
가지(가지(일반 | 141 | 1.4%
시금치(시금치(일반 | 135 | 1.3%
새송이(새송이(일반 | 130 | 1.3%
호박(애호박 | 107 | 1.0%
Other values (403) | 8644 | 84.0%

Most occurring characters

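Value | Count | Frequency (%)
( | 14092 | 15.1%
) | 14092 | 15.1%
(character not preserved) | 3523 | 3.8%
(character not preserved) | 3467 | 3.7%
(character not preserved) | 2929 | 3.1%
(character not preserved) | 2922 | 3.1%
(character not preserved) | 2415 | 2.6%
(character not preserved) | 2131 | 2.3%
(character not preserved) | 1342 | 1.4%
(character not preserved) | 1293 | 1.4%
Other values (289) | 44840 | 48.2%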

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64418 | 69.2%
Open Punctuation | 14092 | 15.1%
Close Punctuation | 14092 | 15.1%
Space Separator | 289 | 0.3%
Other Punctuation | 129 | 0.1%
Decimal Number | 26 | < 0.1%

Most frequent character per category

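Other Letter
Value | Count | Frequency (%)
(character not preserved) | 3523 | 5.5%
(character not preserved) | 3467 | 5.4%
(character not preserved) | 2929 | 4.5%
(character not preserved) | 2922 | 4.5%
(character not preserved) | 2415 | 3.7%
(character not preserved) | 2131 | 3.3%
(character not preserved) | 1342 | 2.1%
(character not preserved) | 1293 | 2.0%
(character not preserved) | 1275 | 2.0%
(character not preserved) | 1132 | 1.8%
Other values (283) | 41989 | 65.2%

Decimal Number
Value | Count | Frequency (%)
1 | 18 | 69.2%
8 | 8 | 30.8%

Open Punctuation
Value | Count | Frequency (%)
( | 14092 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14092 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 289 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 129 | 100.0%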

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64418 | 69.2%
Common | 28628 | 30.8%

Most frequent character per script

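Hangul
Value | Count | Frequency (%)
(character not preserved) | 3523 | 5.5%
(character not preserved) | 3467 | 5.4%
(character not preserved) | 2929 | 4.5%
(character not preserved) | 2922 | 4.5%
(character not preserved) | 2415 | 3.7%
(character not preserved) | 2131 | 3.3%
(character not preserved) | 1342 | 2.1%
(character not preserved) | 1293 | 2.0%
(character not preserved) | 1275 | 2.0%
(character not preserved) | 1132 | 1.8%
Other values (283) | 41989 | 65.2%

Common
Value | Count | Frequency (%)
( | 14092 | 49.2%
) | 14092 | 49.2%
(space) | 289 | 1.0%
, | 129 | 0.5%
1 | 18 | 0.1%
8 | 8 | < 0.1%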

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64418 | 69.2%
ASCII | 28628 | 30.8%

Most frequent character per block

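ASCII
Value | Count | Frequency (%)
( | 14092 | 49.2%
) | 14092 | 49.2%
(space) | 289 | 1.0%
, | 129 | 0.5%
1 | 18 | 0.1%
8 | 8 | < 0.1%

Hangul
Value | Count | Frequency (%)
(character not preserved) | 3523 | 5.5%
(character not preserved) | 3467 | 5.4%
(character not preserved) | 2929 | 4.5%
(character not preserved) | 2922 | 4.5%
(character not preserved) | 2415 | 3.7%
(character not preserved) | 2131 | 3.3%
(character not preserved) | 1342 | 2.1%
(character not preserved) | 1293 | 2.0%
(character not preserved) | 1275 | 2.0%
(character not preserved) | 1132 | 1.8%
Other values (283) | 41989 | 65.2%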

등급 (grade)
Categorical

IMBALANCE

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 17
Median length: 16
Mean length: 16.0334
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 상(2등
2nd row: 특(1등
3rd row: 특(1등
4th row: 4등
5th row: 상(2등

Common Values

Value | Count | Frequency (%)
특(1등 | 6417 | 64.2%
상(2등 | 2555 | 25.6%
보통(3 | 426 | 4.3%
4등 | 194 | 1.9%
9등(등 | 179 | 1.8%
없음 | 89 | 0.9%
5등 | 44 | 0.4%
6등 | 41 | 0.4%
8등 | 38 | 0.4%
7등 | 17 | 0.2%
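
One formula that reproduces the 55.0% imbalance figure from the Alerts section is one minus the normalized Shannon entropy of the category counts. Whether the profiling tool computes its score in exactly this way is an assumption; the sketch only shows that the reported number is consistent with the counts listed for 등급. The function name `imbalance_score` is illustrative.

```python
import math

# Counts for the ten 등급 categories, copied from the Common Values table.
counts = [6417, 2555, 426, 194, 179, 89, 44, 41, 38, 17]


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    0 means perfectly balanced; values near 1 mean one class dominates.
    Assumes at least two categories.
    """
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(len(counts))


print(f"{imbalance_score(counts):.1%}")  # 55.0%
```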

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common value frequencies]

단량 (unit quantity)
Real number (ℝ)

HIGH CORRELATION

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.477985
Minimum: 0.01
Maximum: 136
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95-th percentile: 15
Maximum: 136
Range: 135.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.0976224
Coefficient of variation (CV): 0.78691481
Kurtosis: 66.661551
Mean: 6.477985
Median Absolute Deviation (MAD): 3
Skewness: 3.8820682
Sum: 64779.85
Variance: 25.985754
Monotonicity: Not monotonic
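
Several of the 단량 statistics are derived from others in the same section. As a consistency check, the sketch below recomputes the coefficient of variation, variance, range, and interquartile range from the reported mean, standard deviation, extremes, and quartiles. The variable names are illustrative.

```python
# Reported values for 단량, copied from the statistics above.
mean = 6.477985
std = 5.0976224
minimum, maximum = 0.01, 136
q1, q3 = 3, 10

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.4f} variance={variance:.4f} range={value_range:.2f} IQR={iqr}")
```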
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
10.0 | 2153 | 21.5%
4.0 | 2027 | 20.3%
2.0 | 1122 | 11.2%
5.0 | 841 | 8.4%
8.0 | 734 | 7.3%
1.0 | 481 | 4.8%
3.0 | 319 | 3.2%
15.0 | 292 | 2.9%
20.0 | 278 | 2.8%
0.5 | 241 | 2.4%
Other values (76) | 1512 | 15.1%

Minimum 10 values
Value | Count | Frequency (%)
0.01 | 18 | 0.2%
0.05 | 35 | 0.4%
0.06 | 10 | 0.1%
0.1 | 12 | 0.1%
0.12 | 6 | 0.1%
0.15 | 6 | 0.1%
0.16 | 13 | 0.1%
0.2 | 64 | 0.6%
0.25 | 5 | 0.1%
0.3 | 38 | 0.4%

Maximum 10 values
Value | Count | Frequency (%)
136.0 | 1 | < 0.1%
102.0 | 1 | < 0.1%
89.0 | 1 | < 0.1%
85.0 | 1 | < 0.1%
40.0 | 3 | < 0.1%
34.0 | 1 | < 0.1%
25.0 | 4 | < 0.1%
21.0 | 2 | < 0.1%
20.0 | 278 | 2.8%
18.0 | 76 | 0.8%

단위 (unit)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value | Count | Frequency (%)
kg | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common value frequencies]

평균가 (average price)
Real number (ℝ)

HIGH CORRELATION

Distinct: 4459
Distinct (%): 44.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18530.967
Minimum: 100
Maximum: 1417000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 1500
Q1: 5777.25
Median: 12000
Q3: 22000
95-th percentile: 55129
Maximum: 1417000
Range: 1416900
Interquartile range (IQR): 16222.75

Descriptive statistics

Standard deviation: 30155.313
Coefficient of variation (CV): 1.627293
Kurtosis: 647.26104
Mean: 18530.967
Median Absolute Deviation (MAD): 7415
Skewness: 17.85934
Sum: 1.8530967 × 10^8
Variance: 9.0934291 × 10^8
Monotonicity: Not monotonic
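
The large skewness and kurtosis point to a long right tail in 평균가. One common heuristic for putting a number on that tail is Tukey's 1.5 × IQR rule. The report itself does not use this rule, so the sketch below is an added illustration built only from the reported quartiles and percentiles.

```python
# Reported quantiles for 평균가, copied from the Quantile statistics.
q1, q3 = 5777.25, 22000
p95 = 55129

iqr = q3 - q1                 # 16222.75, matches the reported IQR
upper_fence = q3 + 1.5 * iqr  # 46334.125
lower_fence = q1 - 1.5 * iqr  # -18556.875, below the reported minimum of 100

print(upper_fence)        # 46334.125
print(p95 > upper_fence)  # True
```

Because the 95-th percentile already lies above the upper fence, more than 5% of the observations would be flagged as high outliers under this rule, while none would be flagged on the low side.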
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
10000 | 219 | 2.2%
4000 | 200 | 2.0%
8000 | 187 | 1.9%
15000 | 164 | 1.6%
13000 | 160 | 1.6%
5000 | 158 | 1.6%
12000 | 153 | 1.5%
3000 | 152 | 1.5%
2000 | 143 | 1.4%
6000 | 138 | 1.4%
Other values (4449) | 8326 | 83.3%

Minimum 10 values
Value | Count | Frequency (%)
100 | 1 | < 0.1%
150 | 1 | < 0.1%
200 | 3 | < 0.1%
300 | 18 | 0.2%
336 | 1 | < 0.1%
350 | 12 | 0.1%
400 | 16 | 0.2%
429 | 1 | < 0.1%
480 | 1 | < 0.1%
500 | 32 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
1417000 | 1 | < 0.1%
1062500 | 1 | < 0.1%
604200 | 1 | < 0.1%
532057 | 1 | < 0.1%
448177 | 1 | < 0.1%
406200 | 3 | < 0.1%
354200 | 1 | < 0.1%
240000 | 6 | 0.1%
215000 | 2 | < 0.1%
210000 | 1 | < 0.1%

Interactions

[Figures: four pairwise interaction plots of the numeric variables 단량 and 평균가]

Correlations

Variable | 등급 | 단량 | 평균가
등급 | 1.000 | 0.073 | 0.000
단량 | 0.073 | 1.000 | 0.889
평균가 | 0.000 | 0.889 | 1.000

Variable | 단량 | 평균가 | 등급
단량 | 1.000 | 0.586 | 0.038
평균가 | 0.586 | 1.000 | 0.000
등급 | 0.038 | 0.000 | 1.000
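
The strong association between 단량 and 평균가 would be expected if 평균가 is quoted per lot of 단량 kilograms rather than per kilogram. That reading is an assumption; the dataset description does not state it. Under that assumption, dividing price by quantity gives a per-kilogram figure that is easier to compare across lot sizes. The helper name `price_per_kg` is illustrative, and the rows are copied from the Sample section.

```python
def price_per_kg(avg_price: float, quantity_kg: float) -> float:
    """Normalize a lot price by the lot size in kilograms."""
    return avg_price / quantity_kg


# (index, 품목, 단량 in kg, 평균가) copied from the Sample section.
rows = [
    (10214, "얼갈이배추(얼갈이배추)", 4.0, 3202),
    (1467, "고구마(호박고구마)", 10.0, 14958),
    (6969, "부추(일반부추)", 0.5, 1534),
]

for index, item, quantity_kg, avg_price in rows:
    print(index, item, f"{price_per_kg(avg_price, quantity_kg):.1f}")
# 10214 얼갈이배추(얼갈이배추) 800.5
# 1467 고구마(호박고구마) 1495.8
# 6969 부추(일반부추) 3068.0
```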

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

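First rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
10214 | 얼갈이배추(얼갈이배추) | 상(2등 | 4.0 | kg | 3202
1467 | 고구마(호박고구마) | 특(1등 | 10.0 | kg | 14958
16096 | 홍고추(홍청양) | 특(1등 | 4.0 | kg | 15000
12749 | 토마토(토마토(일반)) | 4등 | 5.0 | kg | 7000
5422 | 멜론(네트계) | 상(2등 | 8.0 | kg | 15925
2787 | 기타식품(기타) | 상(2등 | 8.0 | kg | 12000
2019 | 고추잎(생고추잎) | 특(1등 | 4.0 | kg | 14889
10121 | 얼갈이배추(얼갈이배추) | 상(2등 | 4.0 | kg | 2355
1829 | 고사리(기타) | 특(1등 | 10.0 | kg | 24000
9822 | 양파(양파(일반)) | 특(1등 | 10.0 | kg | 11000

Last rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
11604 | 쪽파(깐쪽파) | 특(1등 | 10.0 | kg | 85000
12229 | 케일(쌈케일) | 특(1등 | 2.0 | kg | 12615
8659 | 수박(수박(일반)(꼭지절단)) | 특(1등 | 9.0 | kg | 7741
3533 | 단감(대안) | 상(2등 | 10.0 | kg | 17561
15995 | 홍고추(홍고추(일반)) | 특(1등 | 20.0 | kg | 42000
15941 | 호박(풋호박) | 상(2등 | 12.0 | kg | 8000
15716 | 호박(애호박) | 9등(등 | 8.0 | kg | 11500
11357 | 전분 및 사료제조(도토리묵) | 특(1등 | 6.0 | kg | 9875
6969 | 부추(일반부추) | 특(1등 | 0.5 | kg | 1534
8299 | 생강(생강(일반)) | 특(1등 | 5.0 | kg | 33000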

Duplicate rows

Most frequently occurring

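Index | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates
209 | 무청(건무청) | 특(1등 | 10.0 | kg | 20000 | 20
68 | 곡물제조(두부) | 특(1등 | 3.0 | kg | 5300 | 17
320 | 숙주나물(숙주나물(일반)) | 특(1등 | 3.5 | kg | 4500 | 17
42 | 고구마순(생고구마순) | 특(1등 | 2.0 | kg | 4000 | 16
100 | 꼬시래기(꼬시래기(일반)) | 특(1등 | 8.0 | kg | 10500 | 16
233 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 16
69 | 곡물제조(두부) | 특(1등 | 7.0 | kg | 7500 | 15
230 | 미역(줄기미역) | 특(1등 | 5.5 | kg | 8000 | 15
72 | 곡물제조(순두부) | 특(1등 | 16.0 | kg | 17800 | 14
73 | 곡물제조(연두부) | 특(1등 | 12.0 | kg | 17800 | 14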
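
Duplicate rows like these can be counted and removed with the standard library alone. The sketch below uses a toy list whose values are borrowed from the table above; the number of repetitions is illustrative and does not reproduce the reported counts.

```python
from collections import Counter

# Toy rows: (품목, 등급, 단량, 단위, 평균가). Repetitions are illustrative.
rows = [
    ("무청(건무청)", "특(1등", 10.0, "kg", 20000),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("무청(건무청)", "특(1등", 10.0, "kg", 20000),
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("무청(건무청)", "특(1등", 10.0, "kg", 20000),
]

# Count identical rows and keep only those that occur more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

# Drop repeats while keeping the first occurrence and the original order.
deduplicated = list(dict.fromkeys(rows))

print(duplicates)
print(len(deduplicated))  # 3
```

Whether exact duplicates should be dropped depends on how the data was assembled. In monthly price data, identical rows may be legitimate repeated observations rather than errors.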