Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 632
Duplicate rows (%): 6.3%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
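
The percentage and per-record figures in this table follow from the raw counts. As a quick arithmetic check, the sketch below recomputes them from the numbers reported here (the variable names are illustrative, not part of the report):

```python
# Values copied from the Dataset statistics table.
n_rows = 10_000
duplicate_rows = 632
total_size_kib = 488.3

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```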

Variable types

Text: 1
Categorical: 2
Numeric: 2

Dataset

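Description: Monthly auction (successful-bid) price data for the Namchon Agricultural Products Wholesale Market in Incheon Metropolitan City. The columns are item (품목), grade (등급), unit quantity (단량), unit (단위), and average price (평균가).
Author: Incheon Metropolitan City (인천광역시)
URL: https://data.incheon.go.kr/findData/publicDataDetail?dataId=15051664&srcSe=7661IVAWM27C61E190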

Alerts

Constant: 단위 has constant value ""
Duplicates: Dataset has 632 (6.3%) duplicate rows
High correlation: 단량 is highly overall correlated with 평균가
High correlation: 평균가 is highly overall correlated with 단량
Imbalance: 등급 is highly imbalanced (54.6%)
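
The 54.6% figure for 등급 is consistent with an imbalance score defined as one minus the Shannon entropy of the class distribution divided by its maximum, log(k) for k classes. The sketch below assumes that definition; it is not copied from the profiling library, and the function name `imbalance_score` is hypothetical. The counts are the ten 등급 frequencies listed in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform distribution, 1 when one class holds every row."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Assumption: a single-class column is treated as not imbalanced.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Counts of the ten 등급 values, taken from the Common Values table of this report.
grade_counts = [6378, 2583, 416, 201, 183, 81, 49, 44, 40, 25]
score = imbalance_score(grade_counts)
print(f"Imbalance score: {score:.3f}")
```

Under this definition the score lands close to the reported 54.6%, because the two largest grades account for almost 90% of the rows.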

Reproduction

Analysis started: 2024-01-28 15:45:30.252272
Analysis finished: 2024-01-28 15:45:31.000099
Duration: 0.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
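
The reported duration is the difference between the two timestamps. A minimal check with the standard library, using the timestamps exactly as printed in this section:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-01-28 15:45:30.252272", FMT)
finished = datetime.strptime("2024-01-28 15:45:31.000099", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```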

Variables

품목
Text

Distinct: 403
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 17
Mean length: 9.3083
Min length: 5

Characters and Unicode

Total characters: 93083
Distinct characters: 297
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
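
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one 품목 value from this report. The two-letter codes map to the category names used in the tables: `Lo` is Other Letter, `Ps` is Open Punctuation, `Pe` is Close Punctuation, `Zs` is Space Separator, `Po` is Other Punctuation, and `Nd` is Decimal Number. The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Ps', 'Pe')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "로메인(통로메인)"  # first sample row of 품목
counts = category_counts(sample)
for category, n in counts.most_common():
    print(category, n)
```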

Unique

Unique: 59
Unique (%): 0.6%

Sample

1st row: 로메인(통로메인)
2nd row: 고구마(호박고구마)
3rd row: 표고버섯(표고버섯(일반))
4th row: 파세리(향미나리)(파세리(일반))
5th row: 양송이(기타)

Value | Count | Frequency (%)
표고버섯(생표고 | 284 | 2.8%
오이(백다다기 | 210 | 2.0%
수박(수박(일반)(꼭지절단 | 184 | 1.8%
기타(엽경채류(기타 | 181 | 1.8%
표고버섯(표고버섯(일반 | 156 | 1.5%
시금치(시금치(일반 | 154 | 1.5%
가지(가지(일반 | 141 | 1.4%
새송이(새송이(일반 | 133 | 1.3%
풋고추(청양 | 130 | 1.3%
밤(밤(일반 | 113 | 1.1%
Other values (397) | 8580 | 83.6%

Most occurring characters

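Value | Count | Frequency (%)
( | 14132 | 15.2%
) | 14132 | 15.2%
(not recovered) | 3559 | 3.8%
(not recovered) | 3516 | 3.8%
(not recovered) | 3051 | 3.3%
(not recovered) | 2880 | 3.1%
(not recovered) | 2419 | 2.6%
(not recovered) | 2129 | 2.3%
(not recovered) | 1360 | 1.5%
(not recovered) | 1307 | 1.4%
Other values (287) | 44598 | 47.9%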

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64407 | 69.2%
Open Punctuation | 14132 | 15.2%
Close Punctuation | 14132 | 15.2%
Space Separator | 266 | 0.3%
Other Punctuation | 125 | 0.1%
Decimal Number | 21 | < 0.1%

Most frequent character per category

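Other Letter
Value | Count | Frequency (%)
(not recovered) | 3559 | 5.5%
(not recovered) | 3516 | 5.5%
(not recovered) | 3051 | 4.7%
(not recovered) | 2880 | 4.5%
(not recovered) | 2419 | 3.8%
(not recovered) | 2129 | 3.3%
(not recovered) | 1360 | 2.1%
(not recovered) | 1307 | 2.0%
(not recovered) | 1258 | 2.0%
(not recovered) | 1108 | 1.7%
Other values (281) | 41820 | 64.9%

Decimal Number
Value | Count | Frequency (%)
1 | 15 | 71.4%
8 | 6 | 28.6%

Open Punctuation
Value | Count | Frequency (%)
( | 14132 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14132 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 266 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 125 | 100.0%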

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64407 | 69.2%
Common | 28676 | 30.8%

Most frequent character per script

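Hangul
Value | Count | Frequency (%)
(not recovered) | 3559 | 5.5%
(not recovered) | 3516 | 5.5%
(not recovered) | 3051 | 4.7%
(not recovered) | 2880 | 4.5%
(not recovered) | 2419 | 3.8%
(not recovered) | 2129 | 3.3%
(not recovered) | 1360 | 2.1%
(not recovered) | 1307 | 2.0%
(not recovered) | 1258 | 2.0%
(not recovered) | 1108 | 1.7%
Other values (281) | 41820 | 64.9%

Common
Value | Count | Frequency (%)
( | 14132 | 49.3%
) | 14132 | 49.3%
(space) | 266 | 0.9%
, | 125 | 0.4%
1 | 15 | 0.1%
8 | 6 | < 0.1%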

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64407 | 69.2%
ASCII | 28676 | 30.8%

Most frequent character per block

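ASCII
Value | Count | Frequency (%)
( | 14132 | 49.3%
) | 14132 | 49.3%
(space) | 266 | 0.9%
, | 125 | 0.4%
1 | 15 | 0.1%
8 | 6 | < 0.1%

Hangul
Value | Count | Frequency (%)
(not recovered) | 3559 | 5.5%
(not recovered) | 3516 | 5.5%
(not recovered) | 3051 | 4.7%
(not recovered) | 2880 | 4.5%
(not recovered) | 2419 | 3.8%
(not recovered) | 2129 | 3.3%
(not recovered) | 1360 | 2.1%
(not recovered) | 1307 | 2.0%
(not recovered) | 1258 | 2.0%
(not recovered) | 1108 | 1.7%
Other values (281) | 41820 | 64.9%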

등급
Categorical

IMBALANCE 

Distinct: 10
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
특(1등 | 6378
상(2등 | 2583
보통(3 | 416
4등 | 201
9등(등 | 183
Other values (5) | 239

Length

Max length: 17
Median length: 16
Mean length: 16.0359
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 특(1등
2nd row: 특(1등
3rd row: 특(1등
4th row: 특(1등
5th row: 특(1등

Common Values

Value | Count | Frequency (%)
특(1등 | 6378 | 63.8%
상(2등 | 2583 | 25.8%
보통(3 | 416 | 4.2%
4등 | 201 | 2.0%
9등(등 | 183 | 1.8%
없음 | 81 | 0.8%
5등 | 49 | 0.5%
8등 | 44 | 0.4%
6등 | 40 | 0.4%
7등 | 25 | 0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 등급 tabulated under Common Values]

단량
Real number (ℝ)

HIGH CORRELATION 

Distinct: 90
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.506306
Minimum: 0.01
Maximum: 102
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.01
5-th percentile: 0.5
Q1: 3
Median: 5
Q3: 10
95-th percentile: 16
Maximum: 102
Range: 101.99
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 5.0374768
Coefficient of variation (CV): 0.7742453
Kurtosis: 32.747612
Mean: 6.506306
Median Absolute Deviation (MAD): 3
Skewness: 2.7626948
Sum: 65063.06
Variance: 25.376173
Monotonicity: Not monotonic
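
Several of these 단량 statistics are simple functions of one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the range and IQR are differences of quantiles. A short consistency check, using only values printed in this report (variable names are illustrative):

```python
# Values copied from the 단량 statistics in this report.
mean, std = 6.506306, 5.0374768
q1, q3 = 3, 10
minimum, maximum = 0.01, 102

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.7f} variance={variance:.6f} IQR={iqr} range={value_range:.2f}")
```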
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10.0 | 2193 | 21.9%
4.0 | 2041 | 20.4%
2.0 | 1156 | 11.6%
5.0 | 779 | 7.8%
8.0 | 743 | 7.4%
1.0 | 464 | 4.6%
15.0 | 317 | 3.2%
3.0 | 314 | 3.1%
20.0 | 278 | 2.8%
0.5 | 248 | 2.5%
Other values (80) | 1467 | 14.7%

Minimum 10 values

Value | Count | Frequency (%)
0.01 | 17 | 0.2%
0.02 | 1 | < 0.1%
0.05 | 39 | 0.4%
0.06 | 11 | 0.1%
0.1 | 20 | 0.2%
0.12 | 6 | 0.1%
0.15 | 3 | < 0.1%
0.16 | 10 | 0.1%
0.2 | 58 | 0.6%
0.25 | 4 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
102.0 | 1 | < 0.1%
89.0 | 1 | < 0.1%
85.0 | 2 | < 0.1%
51.0 | 1 | < 0.1%
40.0 | 3 | < 0.1%
25.0 | 5 | 0.1%
21.0 | 2 | < 0.1%
20.0 | 278 | 2.8%
19.0 | 1 | < 0.1%
18.0 | 89 | 0.9%

단위
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
kg | 10000

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: kg
2nd row: kg
3rd row: kg
4th row: kg
5th row: kg

Common Values

Value | Count | Frequency (%)
kg | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values of 단위 tabulated under Common Values]

평균가
Real number (ℝ)

HIGH CORRELATION 

Distinct: 4512
Distinct (%): 45.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18319.487
Minimum: 100
Maximum: 1062500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 100
5-th percentile: 1500
Q1: 6000
Median: 12000
Q3: 22228.75
95-th percentile: 55000
Maximum: 1062500
Range: 1062400
Interquartile range (IQR): 16228.75

Descriptive statistics

Standard deviation: 25682.765
Coefficient of variation (CV): 1.4019369
Kurtosis: 340.52663
Mean: 18319.487
Median Absolute Deviation (MAD): 7400
Skewness: 11.806945
Sum: 1.8319487 × 10^8
Variance: 6.5960441 × 10^8
Monotonicity: Not monotonic
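
With a skewness near 11.8 and a mean well above the median, 평균가 has a long right tail. One common way to make that concrete is Tukey's 1.5 × IQR rule. The sketch below applies it as an illustration only: the report itself does not use this rule, and the quartiles and the ten largest values are simply copied from this report.

```python
# Quartiles copied from the 평균가 quantile statistics in this report.
q1, q3 = 6000, 22228.75
iqr = q3 - q1

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest 평균가 values listed in this report.
largest = [1062500, 604200, 531200, 406200, 240000,
           215000, 210000, 200000, 195000, 180000]
flagged = [v for v in largest if v > upper_fence]

print(f"IQR={iqr} upper fence={upper_fence} flagged={len(flagged)} of {len(largest)}")
```

Because the lower fence is negative and the minimum price is 100, this rule only ever flags values in the upper tail for this column.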
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10000 | 227 | 2.3%
4000 | 182 | 1.8%
8000 | 173 | 1.7%
15000 | 163 | 1.6%
13000 | 154 | 1.5%
3000 | 154 | 1.5%
5000 | 145 | 1.5%
7000 | 142 | 1.4%
6000 | 140 | 1.4%
12000 | 140 | 1.4%
Other values (4502) | 8380 | 83.8%

Minimum 10 values

Value | Count | Frequency (%)
100 | 1 | < 0.1%
150 | 1 | < 0.1%
200 | 5 | 0.1%
250 | 1 | < 0.1%
276 | 1 | < 0.1%
300 | 26 | 0.3%
336 | 1 | < 0.1%
350 | 10 | 0.1%
400 | 14 | 0.1%
424 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1062500 | 1 | < 0.1%
604200 | 1 | < 0.1%
531200 | 1 | < 0.1%
406200 | 3 | < 0.1%
240000 | 5 | 0.1%
215000 | 1 | < 0.1%
210000 | 1 | < 0.1%
200000 | 4 | < 0.1%
195000 | 9 | 0.1%
180000 | 2 | < 0.1%

Interactions

[Figure: interaction plots (4 panels)]

Correlations

 | 등급 | 단량 | 평균가
등급 | 1.000 | 0.034 | 0.000
단량 | 0.034 | 1.000 | 0.965
평균가 | 0.000 | 0.965 | 1.000

 | 단량 | 평균가 | 등급
단량 | 1.000 | 0.582 | 0.017
평균가 | 0.582 | 1.000 | 0.000
등급 | 0.017 | 0.000 | 1.000
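
Different correlation coefficients can give quite different values for the same pair of columns, as the 0.965 and 0.582 entries for 단량 and 평균가 suggest. The report text here does not say which coefficient each matrix uses, so the sketch below is only a hedged illustration of two common choices, Pearson's r and Spearman's rho. It uses the ten sample rows shown in this report, which are far too few to reproduce the reported figures. All function names are hypothetical helpers.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# 단량 and 평균가 from the first ten sample rows shown in this report.
unit_qty = [2.0, 10.0, 16.0, 4.0, 2.0, 8.0, 2.0, 10.0, 5.0, 5.0]
avg_price = [4000, 9760, 68000, 35000, 12458, 12301, 3988, 14628, 18000, 13750]

print(f"Pearson:  {pearson(unit_qty, avg_price):.3f}")
print(f"Spearman: {spearman(unit_qty, avg_price):.3f}")
```

Rank-based measures are less sensitive to extreme values, which matters for a column as skewed as 평균가.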

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

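Index | 품목 | 등급 | 단량 | 단위 | 평균가
4952 | 로메인(통로메인) | 특(1등 | 2.0 | kg | 4000
1560 | 고구마(호박고구마) | 특(1등 | 10.0 | kg | 9760
14197 | 표고버섯(표고버섯(일반)) | 특(1등 | 16.0 | kg | 68000
12885 | 파세리(향미나리)(파세리(일반)) | 특(1등 | 4.0 | kg | 35000
9699 | 양송이(기타) | 특(1등 | 2.0 | kg | 12458
7084 | 브로코리(녹색꽃양배추)(브로코리(일반)) | 특(1등 | 8.0 | kg | 12301
8022 | 새송이(새송이(일반)) | 상(2등 | 2.0 | kg | 3988
10960 | 오이(취청) | 특(1등 | 10.0 | kg | 14628
13011 | 파프리카(빨강파프리카) | 특(1등 | 5.0 | kg | 18000
953 | 강낭콩(강낭콩(일반)) | 상(2등 | 5.0 | kg | 13750

Index | 품목 | 등급 | 단량 | 단위 | 평균가
5381 | 멜론(기타) | 상(2등 | 8.0 | kg | 10000
15245 | 풋고추(청초(일반)) | 특(1등 | 15.0 | kg | 20000
13398 | 포도(마스캇베리에이) | 상(2등 | 3.0 | kg | 10945
14850 | 풋고추(오이맛고추) | 상(2등 | 4.0 | kg | 8500
14737 | 풋고추(애기초) | 특(1등 | 6.0 | kg | 26000
15358 | 피망(단고추)(청피망) | 상(2등 | 10.0 | kg | 25111
4997 | 마(기타) | 특(1등 | 10.0 | kg | 50000
3374 | 느타리버섯(느타리버섯(일반)) | 상(2등 | 2.0 | kg | 7121
8434 | 속새(속새(일반)) | 특(1등 | 4.0 | kg | 20000
14560 | 풋고추(롱그린) | 특(1등 | 10.0 | kg | 59800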

Duplicate rows

Most frequently occurring

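Index | 품목 | 등급 | 단량 | 단위 | 평균가 | # duplicates
218 | 미역(줄기미역) | 특(1등 | 7.5 | kg | 11000 | 18
69 | 곡물제조(두부) | 특(1등 | 7.0 | kg | 7500 | 17
183 | 무순(무순(일반)) | 특(1등 | 0.05 | kg | 300 | 17
352 | 어묵,어분,어비(기타) | 특(1등 | 3.0 | kg | 8250 | 17
72 | 곡물제조(순두부) | 특(1등 | 16.0 | kg | 17800 | 16
65 | 곡물제조(두부) | 특(1등 | 0.5 | kg | 1230 | 15
309 | 숙주나물(숙주나물(일반)) | 특(1등 | 3.5 | kg | 4500 | 15
41 | 고구마순(생고구마순) | 특(1등 | 2.0 | kg | 4000 | 14
101 | 꼬시래기(꼬시래기(일반)) | 특(1등 | 8.0 | kg | 10500 | 14
191 | 무청(건무청) | 특(1등 | 10.0 | kg | 20000 | 14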
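
A table like this can be produced by counting how often each complete row occurs and keeping the rows that appear more than once. The sketch below shows that idea with the standard library. The six rows are a small hypothetical sample in the report's column order, not the real data, and this is not necessarily how the profiling tool computes its counts.

```python
from collections import Counter

# A few hypothetical rows in the report's column order:
# (품목, 등급, 단량, 단위, 평균가)
rows = [
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("미역(줄기미역)", "특(1등", 7.5, "kg", 11000),
    ("곡물제조(두부)", "특(1등", 7.0, "kg", 7500),
    ("무순(무순(일반))", "특(1등", 0.05, "kg", 300),
]

# Count identical rows; tuples are hashable, so they can be Counter keys.
counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]
for row, n in duplicates:
    print(n, row)
```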