Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 629
Duplicate rows (%): 6.3%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
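
The two derived figures in this block follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the dataset statistics of this report.
n_observations = 10_000
n_duplicate_rows = 629
total_size_kib = 488.3

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_observations

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, both values agree with the percentages and sizes listed in the report.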

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

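Description: Monthly auction price data for the Namchon Agricultural Products Wholesale Market in Incheon Metropolitan City, covering item (품목), grade (등급), unit quantity (단량), unit (단위), and average price (평균가).
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190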

Alerts

단위 has constant value "kg" (Constant)
Dataset has 629 (6.3%) duplicate rows (Duplicates)
단량 is highly overall correlated with 평균가 (High correlation)
평균가 is highly overall correlated with 단량 (High correlation)
등급 is highly imbalanced (54.3%) (Imbalance)
평균가 is highly skewed (γ1 = 99.50110846) (Skewed)

Reproduction

Analysis started: 2024-04-21 10:02:24.110557
Analysis finished: 2024-04-21 10:02:26.218288
Duration: 2.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

품목
Text

Distinct: 405
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.2901
Min length: 5

Characters and Unicode

Total characters: 92901
Distinct characters: 295
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
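
As an illustration of such character properties, the following sketch tallies the Unicode general category of each character using Python's standard `unicodedata` module. It runs on the five sample values of 품목 listed in this report, not on the full column, and the NFC normalization step is an assumption about how the text is encoded.

```python
import unicodedata
from collections import Counter

# The five sample values of 품목 listed in this report.
samples = [
    "새싹(새싹(일반))",
    "용과(용과(일반))",
    "기타(엽경채류(기타))",
    "상추(적포기)",
    "도라지(도라지(일반))",
]

# Tally the general category of every character after NFC normalization
# (assumption: Hangul should be counted as precomposed syllables).
# "Lo" = Other Letter, "Ps" = Open Punctuation, "Pe" = Close Punctuation.
category_counts = Counter(
    unicodedata.category(ch)
    for value in samples
    for ch in unicodedata.normalize("NFC", value)
)

for category, count in category_counts.most_common():
    print(category, count)
```

The two-letter codes correspond to the category names used in the tables of this section, for example `Lo` for Other Letter.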

Unique

Unique: 51
Unique (%): 0.5%

Sample

1st row: 새싹(새싹(일반))
2nd row: 용과(용과(일반))
3rd row: 기타(엽경채류(기타))
4th row: 상추(적포기)
5th row: 도라지(도라지(일반))
Value | Count | Frequency (%)
표고버섯(생표고 | 276 | 2.7%
오이(백다다기 | 220 | 2.1%
기타(엽경채류(기타 | 187 | 1.8%
수박(수박(일반)(꼭지절단 | 178 | 1.7%
표고버섯(표고버섯(일반 | 148 | 1.4%
가지(가지(일반 | 146 | 1.4%
새송이(새송이(일반 | 131 | 1.3%
풋고추(청양 | 130 | 1.3%
시금치(시금치(일반 | 128 | 1.2%
호박(애호박 | 113 | 1.1%
Other values (399) | 8618 | 83.9%

Most occurring characters

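Value | Count | Frequency (%)
( | 14063 | 15.1%
) | 14063 | 15.1%
[character not preserved] | 3516 | 3.8%
[character not preserved] | 3465 | 3.7%
[character not preserved] | 2978 | 3.2%
[character not preserved] | 2883 | 3.1%
[character not preserved] | 2465 | 2.7%
[character not preserved] | 2146 | 2.3%
[character not preserved] | 1350 | 1.5%
[character not preserved] | 1294 | 1.4%
Other values (285) | 44678 | 48.1%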

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64343 | 69.3%
Open Punctuation | 14063 | 15.1%
Close Punctuation | 14063 | 15.1%
Space Separator | 275 | 0.3%
Other Punctuation | 136 | 0.1%
Decimal Number | 21 | < 0.1%

Most frequent character per category

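Other Letter
Value | Count | Frequency (%)
[character not preserved] | 3516 | 5.5%
[character not preserved] | 3465 | 5.4%
[character not preserved] | 2978 | 4.6%
[character not preserved] | 2883 | 4.5%
[character not preserved] | 2465 | 3.8%
[character not preserved] | 2146 | 3.3%
[character not preserved] | 1350 | 2.1%
[character not preserved] | 1294 | 2.0%
[character not preserved] | 1254 | 1.9%
[character not preserved] | 1085 | 1.7%
Other values (279) | 41907 | 65.1%

Decimal Number
Value | Count | Frequency (%)
1 | 16 | 76.2%
8 | 5 | 23.8%

Open Punctuation
Value | Count | Frequency (%)
( | 14063 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14063 | 100.0%

Space Separator
Value | Count | Frequency (%)
[space] | 275 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 136 | 100.0%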

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64343 | 69.3%
Common | 28558 | 30.7%

Most frequent character per script

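Hangul
Value | Count | Frequency (%)
[character not preserved] | 3516 | 5.5%
[character not preserved] | 3465 | 5.4%
[character not preserved] | 2978 | 4.6%
[character not preserved] | 2883 | 4.5%
[character not preserved] | 2465 | 3.8%
[character not preserved] | 2146 | 3.3%
[character not preserved] | 1350 | 2.1%
[character not preserved] | 1294 | 2.0%
[character not preserved] | 1254 | 1.9%
[character not preserved] | 1085 | 1.7%
Other values (279) | 41907 | 65.1%

Common
Value | Count | Frequency (%)
( | 14063 | 49.2%
) | 14063 | 49.2%
[space] | 275 | 1.0%
, | 136 | 0.5%
1 | 16 | 0.1%
8 | 5 | < 0.1%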

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64343 | 69.3%
ASCII | 28558 | 30.7%

Most frequent character per block

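ASCII
Value | Count | Frequency (%)
( | 14063 | 49.2%
) | 14063 | 49.2%
[space] | 275 | 1.0%
, | 136 | 0.5%
1 | 16 | 0.1%
8 | 5 | < 0.1%

Hangul
Value | Count | Frequency (%)
[character not preserved] | 3516 | 5.5%
[character not preserved] | 3465 | 5.4%
[character not preserved] | 2978 | 4.6%
[character not preserved] | 2883 | 4.5%
[character not preserved] | 2465 | 3.8%
[character not preserved] | 2146 | 3.3%
[character not preserved] | 1350 | 2.1%
[character not preserved] | 1294 | 2.0%
[character not preserved] | 1254 | 1.9%
[character not preserved] | 1085 | 1.7%
Other values (279) | 41907 | 65.1%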

등급
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
특(1등 | 6379
상(2등 | 2559
보통(3 | 432
4등 | 198
9등(등 | 178
Other values (5) | 254

Length

Max length: 17
Median length: 16
Mean length: 16.0352
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 특(1등
3rd row: 특(1등
4th row: 특(1등
5th row: 상(2등

Common Values

Value | Count | Frequency (%)
특(1등 | 6379 | 63.8%
상(2등 | 2559 | 25.6%
보통(3 | 432 | 4.3%
4등 | 198 | 2.0%
9등(등 | 178 | 1.8%
없음 | 100 | 1.0%
8등 | 46 | 0.5%
5등 | 46 | 0.5%
6등 | 37 | 0.4%
7등 | 25 | 0.2%
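
The report does not state how the 54.3% imbalance figure for 등급 is derived. One formula that reproduces it from the counts above is one minus the normalized Shannon entropy, 1 - H / ln(k), where k is the number of distinct values. The sketch below is a reconstruction under that assumption. The function name and the choice to return 0 for a single class are illustrative.

```python
import math

# Counts per 등급 value, copied from the Common Values table.
counts = [6379, 2559, 432, 198, 178, 100, 46, 46, 37, 25]

def imbalance_score(counts):
    """Return 1 - H / ln(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Assumption: a single class is treated as having no imbalance to measure.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

score = imbalance_score(counts)
print(f"{score:.1%}")
```

A score of 0 would mean all ten grades are equally frequent. The two most common grades alone cover 89.4% of the rows, which pushes the score above one half.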

Length

Histogram of lengths of the category


단량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.427597
Minimum: 0.01
Maximum: 85
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95-th percentile: 15
Maximum: 85
Range: 84.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.8207514
Coefficient of variation (CV): 0.75000835
Kurtosis: 15.256558
Mean: 6.427597
Median Absolute Deviation (MAD): 3
Skewness: 1.8260225
Sum: 64275.97
Variance: 23.239644
Monotonicity: Not monotonic
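
Several of the 단량 statistics are functions of the others, which gives a quick consistency check. A minimal sketch using the standard definitions (CV as standard deviation over mean, variance as squared standard deviation), with inputs copied from this report and illustrative variable names:

```python
# Values copied from the 단량 statistics in this report.
mean = 6.427597
std = 4.8207514
minimum, maximum = 0.01, 85
q1, q3 = 3, 10

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.8f} variance={variance:.6f} range={value_range:.2f} IQR={iqr}")
```

The results agree with the reported CV, variance, range, and IQR to the precision shown in the report.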
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
10.0 | 2113 | 21.1%
4.0 | 2080 | 20.8%
2.0 | 1136 | 11.4%
5.0 | 827 | 8.3%
8.0 | 771 | 7.7%
1.0 | 479 | 4.8%
15.0 | 309 | 3.1%
3.0 | 304 | 3.0%
20.0 | 261 | 2.6%
0.5 | 232 | 2.3%
Other values (76) | 1488 | 14.9%

Minimum 10 values
Value | Count | Frequency (%)
0.01 | 14 | 0.1%
0.02 | 1 | < 0.1%
0.05 | 35 | 0.4%
0.06 | 9 | 0.1%
0.1 | 18 | 0.2%
0.12 | 5 | 0.1%
0.15 | 4 | < 0.1%
0.16 | 11 | 0.1%
0.2 | 70 | 0.7%
0.25 | 4 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
85.0 | 2 | < 0.1%
51.0 | 1 | < 0.1%
40.0 | 1 | < 0.1%
34.0 | 1 | < 0.1%
25.0 | 4 | < 0.1%
21.0 | 2 | < 0.1%
20.0 | 261 | 2.6%
18.0 | 83 | 0.8%
17.5 | 1 | < 0.1%
17.0 | 51 | 0.5%

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
kg | 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value | Count | Frequency (%)
kg | 10000 | 100.0%

Length

Histogram of lengths of the category


평균가
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 4485
Distinct (%): 44.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21956.965
Minimum: 100
Maximum: 40008000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 1500
Q1: 5898.75
Median: 12000
Q3: 22000
95-th percentile: 54334.6
Maximum: 40008000
Range: 40007900
Interquartile range (IQR): 16101.25
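
The gap between the 95-th percentile and the maximum suggests a few extreme values. The report itself does not apply any outlier rule. As an illustration only, the sketch below applies Tukey's 1.5 × IQR fences to the quartiles above and tests the five largest 평균가 values that appear in this report's listings.

```python
# Quartiles of 평균가 copied from the quantile statistics.
q1, q3 = 5898.75, 22000.0
iqr = q3 - q1

# Tukey's rule (an assumption, not the report's method):
# flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The five largest 평균가 values that appear in this report.
largest = [40008000, 532057, 531200, 448177, 406200]
flagged = [v for v in largest if v > upper_fence]

print(f"fences: [{lower_fence}, {upper_fence}]")
print("flagged:", flagged)
```

Under this rule the upper fence lies below the reported 95-th percentile, so more than 5% of the observations would be flagged. That is consistent with the heavy right skew noted in the alerts.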

Descriptive statistics

Standard deviation: 400569.22
Coefficient of variation (CV): 18.243378
Kurtosis: 9933.3634
Mean: 21956.965
Median Absolute Deviation (MAD): 7304
Skewness: 99.501108
Sum: 2.1956965 × 10^8
Variance: 1.604557 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
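
With a single extreme maximum, 50 fixed-size bins leave almost no resolution for the bulk of the data. The sketch below works out the bin width. It assumes equal-width bins spanning the minimum to the maximum with a closed last bin, which is the convention `numpy.histogram` uses; the report does not say how its bins are placed.

```python
minimum, maximum = 100, 40008000  # 평균가 extremes from the report
n_bins = 50                       # from the histogram caption

bin_width = (maximum - minimum) / n_bins

def bin_index(value):
    """Index of the equal-width bin holding value (the last bin is closed)."""
    idx = int((value - minimum) // bin_width)
    return min(idx, n_bins - 1)

second_largest = 532057  # second-largest 평균가 value listed in the report
print(bin_width, bin_index(second_largest), bin_index(maximum))
```

Under these assumptions the second-largest value still falls in the first bin, so every observation except the maximum would share a single bar.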
Common values
Value | Count | Frequency (%)
10000 | 225 | 2.2%
4000 | 175 | 1.8%
8000 | 173 | 1.7%
12000 | 164 | 1.6%
13000 | 159 | 1.6%
5000 | 156 | 1.6%
6000 | 154 | 1.5%
2000 | 148 | 1.5%
3000 | 144 | 1.4%
15000 | 141 | 1.4%
Other values (4475) | 8361 | 83.6%

Minimum 10 values
Value | Count | Frequency (%)
100 | 2 | < 0.1%
200 | 5 | 0.1%
250 | 1 | < 0.1%
300 | 23 | 0.2%
336 | 1 | < 0.1%
350 | 8 | 0.1%
367 | 1 | < 0.1%
400 | 13 | 0.1%
424 | 1 | < 0.1%
429 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
40008000 | 1 | < 0.1%
532057 | 1 | < 0.1%
531200 | 1 | < 0.1%
448177 | 1 | < 0.1%
406200 | 1 | < 0.1%
354200 | 1 | < 0.1%
240000 | 7 | 0.1%
215000 | 1 | < 0.1%
210000 | 2 | < 0.1%
200000 | 4 | < 0.1%

Interactions


Correlations

Variable | 등급 | 단량 | 평균가
등급 | 1.000 | 0.108 | 0.000
단량 | 0.108 | 1.000 | 0.000
평균가 | 0.000 | 0.000 | 1.000

Variable | 단량 | 평균가 | 등급
단량 | 1.000 | 0.581 | 0.054
평균가 | 0.581 | 1.000 | 0.000
등급 | 0.054 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

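First rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
8221 | 새싹(새싹(일반)) | 특(1등 | 0.05 | kg | 523
11041 | 용과(용과(일반)) | 특(1등 | 5.0 | kg | 15368
2732 | 기타(엽경채류(기타)) | 특(1등 | 4.0 | kg | 8000
7811 | 상추(적포기) | 특(1등 | 2.0 | kg | 10000
4312 | 도라지(도라지(일반)) | 상(2등 | 10.0 | kg | 30000
15229 | 풋고추(청초(일반)) | 특(1등 | 4.0 | kg | 6286
6648 | 배추(기타) | 특(1등 | 15.0 | kg | 10587
15134 | 풋고추(청초(일반)) | 특(1등 | 4.0 | kg | 3000
10879 | 오이(백다다기) | 4등 | 15.0 | kg | 12333
6991 | 부추(일반부추) | 특(1등 | 0.5 | kg | 1385

Last rows
Index | 품목 | 등급 | 단량 | 단위 | 평균가
4370 | 동부(동부(일반)) | 특(1등 | 4.0 | kg | 21333
9329 | 아욱(아욱(일반)) | 특(1등 | 0.7 | kg | 800
12382 | 콩(호랑이콩) | 상(2등 | 4.0 | kg | 23652
10081 | 얼갈이배추(알배기) | 상(2등 | 10.0 | kg | 7500
7144 | 비름(기타) | 특(1등 | 4.0 | kg | 3400
11265 | 적채(적채(일반)) | 특(1등 | 10.0 | kg | 17000
3363 | 느타리버섯(느타리버섯(일반)) | 상(2등 | 2.0 | kg | 17050
13487 | 포도(샤인마스캇) | 특(1등 | 4.0 | kg | 14753
6211 | 박(여주(일반)) | 특(1등 | 4.0 | kg | 5000
7291 | 사과(기꾸8) | 특(1등 | 10.0 | kg | 58455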

Duplicate rows

Most frequently occurring

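Index | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates
224 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 19
69 | 곡물제조(두부) | 특(1등 | 3.0 | kg | 5300 | 17
191 | 무청(건무청) | 특(1등 | 10.0 | kg | 20000 | 17
320 | 숙주나물(숙주나물(일반)) | 특(1등 | 3.5 | kg | 4500 | 17
473 | 콩나물(콩나물(일반)) | 특(1등 | 5.0 | kg | 7500 | 17
73 | 곡물제조(순두부) | 특(1등 | 16.0 | kg | 17800 | 15
99 | 꼬시래기(꼬시래기(일반)) | 특(1등 | 8.0 | kg | 10500 | 15
365 | 어묵,어분,어비(기타) | 특(1등 | 15.0 | kg | 72000 | 15
70 | 곡물제조(두부) | 특(1등 | 7.0 | kg | 7500 | 14
184 | 무순(무순(일반)) | 특(1등 | 0.05 | kg | 300 | 14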
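
A listing of this kind can be produced by counting identical rows and keeping those that occur more than once. The sketch below uses only the standard library and a handful of made-up rows in the report's column order. The rows are hypothetical and are not taken from the dataset, and the report does not state its own counting convention.

```python
from collections import Counter

# Hypothetical miniature rows in the report's column order
# (품목, 등급, 단량, 단위, 평균가); not taken from the dataset.
rows = [
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("곡물제조(두부)", "특(1등", 3.0, "kg", 5300),
    ("상추(적포기)", "특(1등", 2.0, "kg", 10000),
]

# Count how often each complete row occurs.
occurrences = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in occurrences.most_common() if n > 1]

for row, n in duplicates:
    print(row, n)
```

Rows are compared as whole tuples, so two records count as duplicates only when all five columns match.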