Overview

Dataset statistics

Number of variables: 12
Number of observations: 2388
Missing cells: 14328
Missing cells (%): 50.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 242.7 KiB
Average record size in memory: 104.1 B
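
The overview figures can be cross-checked against each other. The sketch below is plain arithmetic on the numbers reported in this section. The assumption that six fully empty columns account for every missing cell is taken from the Alerts section, not from the raw data.

```python
# Cross-check of the overview statistics using only the reported figures.
n_variables = 12
n_observations = 2388
empty_columns = 6  # assumption: the six "Unnamed" columns flagged as 100% missing

total_cells = n_variables * n_observations
missing_cells = empty_columns * n_observations
missing_share = missing_cells / total_cells

# 242.7 KiB spread over 2388 records, expressed in bytes per record
avg_record_bytes = 242.7 * 1024 / n_observations

print(missing_cells)                # 14328
print(f"{missing_share:.1%}")       # 50.0%
print(f"{avg_record_bytes:.1f} B")  # 104.1 B
```

If the assumption holds, the six named columns contain no missing cells at all, which agrees with the per-variable "Missing: 0" entries.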

Variable types

Categorical: 3
Text: 2
Numeric: 1
Unsupported: 6

Dataset

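Description: Inspection results for rice and other milled grains managed by the National Agricultural Products Quality Management Service (application year, city/county, crop year, use, origin, inspected quantity, etc.)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690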

Alerts

용도 is highly imbalanced (91.1%) [Imbalance]
Unnamed: 6 has 2388 (100.0%) missing values [Missing]
Unnamed: 7 has 2388 (100.0%) missing values [Missing]
Unnamed: 8 has 2388 (100.0%) missing values [Missing]
Unnamed: 9 has 2388 (100.0%) missing values [Missing]
Unnamed: 10 has 2388 (100.0%) missing values [Missing]
Unnamed: 11 has 2388 (100.0%) missing values [Missing]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
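
Six trailing columns that are entirely empty often point to stray delimiters at the end of each CSV row. A minimal sketch of dropping such columns with the standard library follows. The inline excerpt and the helper name `drop_empty_columns` are illustrative assumptions, not part of the published dataset or of ydata-profiling.

```python
import csv
import io

# Hypothetical two-row excerpt with trailing delimiters, mimicking the layout
# suggested by the alerts (real column names, assumed file layout).
RAW = (
    "신청년도,시도,연산,용도,원산지,검사수량(kg),,,,,,\n"
    '2009,강원도 고성군,2008,정곡,국산,"3,657,080",,,,,,\n'
    '2009,강원도 고성군,2008,정곡,미국,"137,800",,,,,,\n'
)

def drop_empty_columns(rows):
    """Return rows without the columns whose data cells are all blank."""
    header, *body = rows
    keep = [
        i for i in range(len(header))
        if any(i < len(r) and r[i].strip() for r in body)
    ]
    return [[r[i] for i in keep] for r in rows]

rows = list(csv.reader(io.StringIO(RAW)))
cleaned = drop_empty_columns(rows)
print(len(rows[0]), "->", len(cleaned[0]))  # 12 -> 6
```

On a frame that is already loaded with pandas, `DataFrame.dropna(axis=1, how="all")` removes fully empty columns in a comparable way.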

Reproduction

Analysis started: 2024-03-23 07:43:08.785148
Analysis finished: 2024-03-23 07:43:10.396607
Duration: 1.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
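
A report of this kind can be regenerated from the source CSV. The following is a sketch only: the function name, the file paths passed to it, and the report title are hypothetical, and the encoding of the published file is not stated in this report, so `pd.read_csv` may need an explicit `encoding` argument.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write the HTML report to html_path."""
    # Imported lazily so this module loads even without the optional packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Grain inspection results")
    report.to_file(html_path)
    return report
```

Calling `build_report("inspections.csv", "report.html")` with a local copy of the data would write the HTML file. Both file names are placeholders.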

Variables

신청년도
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2009
2nd row: 2009
3rd row: 2009
4th row: 2009
5th row: 2009

Common Values

Value  Count  Frequency (%)
2011     598  25.0%
2013     575  24.1%
2010     554  23.2%
2012     552  23.1%
2009     109  4.6%

Length

Figure: histogram of lengths of the category.

시도
Text

Distinct: 90
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1235343
Min length: 7

Characters and Unicode

Total characters: 19399
Distinct characters: 87
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도 고성군
2nd row: 강원도 고성군
3rd row: 강원도 고성군
4th row: 강원도 고성군
5th row: 강원도 인제군

Value              Count  Frequency (%)
전라남도             449  9.1%
경상북도             394  8.0%
전라북도             256  5.2%
경상남도             252  5.1%
경기도               245  5.0%
강원도               221  4.5%
충청북도             217  4.4%
충청남도             205  4.2%
북구                  52  1.1%
논산시                43  0.9%
Other values (97)   2600  52.7%
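
The table above counts whitespace-separated tokens rather than whole values, which is why a district name such as 북구 appears on its own. The totals are consistent with that reading: 2388 values plus 2546 spaces gives 4934 tokens, assuming single spaces between tokens and none at the ends, and 2600 of 4934 is 52.7%. Below is a small sketch of splitting a value into a province part and the remainder. It assumes the first token is always the province-level name, which holds for the sample rows shown but is not verified for the whole column. The helper name `split_region` is hypothetical.

```python
def split_region(value):
    """Split '도 시군구' style text into (province, remainder)."""
    province, _, rest = value.strip().partition(" ")
    return province, rest

samples = ["강원도 고성군", "충청북도 청주시 흥덕구"]
for s in samples:
    print(split_region(s))

# Token arithmetic from the reported character statistics
n_values, n_spaces = 2388, 2546
n_tokens = n_values + n_spaces
print(n_tokens, f"{2600 / n_tokens:.1%}")
```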

Most occurring characters

Value              Count  Frequency (%)
(space)             2546  13.1%
                    2239  11.5%
                    1274  6.6%
                    1151  5.9%
                     995  5.1%
                     968  5.0%
                     919  4.7%
                     757  3.9%
                     705  3.6%
                     672  3.5%
Other values (77)   7173  37.0%

Most occurring categories

Value            Count  Frequency (%)
Other Letter     16853  86.9%
Space Separator   2546  13.1%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                    2239  13.3%
                    1274  7.6%
                    1151  6.8%
                     995  5.9%
                     968  5.7%
                     919  5.5%
                     757  4.5%
                     705  4.2%
                     672  4.0%
                     479  2.8%
Other values (76)   6694  39.7%
Space Separator
Value    Count  Frequency (%)
(space)   2546  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  16853  86.9%
Common   2546  13.1%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                    2239  13.3%
                    1274  7.6%
                    1151  6.8%
                     995  5.9%
                     968  5.7%
                     919  5.5%
                     757  4.5%
                     705  4.2%
                     672  4.0%
                     479  2.8%
Other values (76)   6694  39.7%
Common
Value    Count  Frequency (%)
(space)   2546  100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  16853  86.9%
ASCII    2546  13.1%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
(space)   2546  100.0%
Hangul
Value              Count  Frequency (%)
                    2239  13.3%
                    1274  7.6%
                    1151  6.8%
                     995  5.9%
                     968  5.7%
                     919  5.5%
                     757  4.5%
                     705  4.2%
                     672  4.0%
                     479  2.8%
Other values (76)   6694  39.7%

연산
Real number (ℝ)

Distinct: 9
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.1323
Minimum: 2005
Maximum: 2013
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.1 KiB

Quantile statistics

Minimum: 2005
5th percentile: 2005
Q1: 2008
Median: 2009
Q3: 2011
95th percentile: 2012
Maximum: 2013
Range: 8
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.8817342
Coefficient of variation (CV): 0.00093659045
Kurtosis: -0.40265705
Mean: 2009.1323
Median Absolute Deviation (MAD): 1
Skewness: -0.45401308
Sum: 4797808
Variance: 3.5409234
Monotonicity: Not monotonic
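
Several of these statistics can be reproduced from the value counts reported for 연산 in this section. The sketch below expands the counts into one observation per row and recomputes the mean, variance and coefficient of variation. The reported variance appears to match the sample estimator (n - 1 denominator); that is an observation from the numbers, not a statement about how the profiling tool is implemented.

```python
import statistics

# Value counts for 연산 as reported in this section
counts = {2005: 140, 2006: 117, 2007: 168, 2008: 386, 2009: 491,
          2010: 447, 2011: 425, 2012: 204, 2013: 10}

# Expand the counts back into one observation per row
data = [year for year, n in counts.items() for _ in range(n)]

mean = statistics.fmean(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std = variance ** 0.5
cv = std / mean                       # coefficient of variation

print(len(data), sum(data))           # 2388 4797808
print(round(mean, 4), round(variance, 4), round(cv, 8))
```

The coefficient of variation is tiny because the values are calendar years: a spread of a few years is small relative to a mean near 2009.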

Figure: histogram with fixed size bins (bins=9).

Value  Count  Frequency (%)
2009     491  20.6%
2010     447  18.7%
2011     425  17.8%
2008     386  16.2%
2012     204  8.5%
2007     168  7.0%
2005     140  5.9%
2006     117  4.9%
2013      10  0.4%

용도
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value  Count  Frequency (%)
정곡    2361  98.9%
대북      27  1.1%
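
The 91.1% imbalance figure in the alert is consistent with an entropy-based score: one minus the Shannon entropy of the class shares, divided by the logarithm of the number of classes. The sketch below is a reconstruction from the reported counts, not a quotation of the ydata-profiling source, and the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """1 minus the Shannon entropy of the class shares, normalised by log(k)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

score = imbalance_score([2361, 27])  # counts for 정곡 and 대북
print(f"{score:.1%}")                 # 91.1%
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates.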

Length

Figure: histogram of lengths of the category.

원산지
Categorical

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.001675
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 국산
2nd row: 미국
3rd row: 중국
4th row: 태국
5th row: 국산

Common Values

Value     Count  Frequency (%)
국산       1275  53.4%
중국        455  19.1%
미국        415  17.4%
태국        232  9.7%
호주          5  0.2%
인도          3  0.1%
베트남        2  0.1%
파키스탄      1  < 0.1%

Length

Figure: histogram of lengths of the category.

검사수량(kg)
Text

Distinct: 1969
Distinct (%): 82.5%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 11
Median length: 8
Mean length: 8.2039363
Min length: 3

Characters and Unicode

Total characters: 19591
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1808
Unique (%): 75.7%

Sample

1st row: 3,657,080
2nd row: 137,800
3rd row: 482,000
4th row: 15,000
5th row: 2,423,600

Value                Count  Frequency (%)
100,000                 27  1.1%
50,000                  24  1.0%
200,000                 20  0.8%
150,000                 16  0.7%
30,000                  15  0.6%
20,000                  13  0.5%
120,000                 13  0.5%
10,000                  13  0.5%
80,000                  10  0.4%
140,000                  9  0.4%
Other values (1959)   2228  93.3%
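
This column is profiled as Text rather than as a number because the values carry thousands separators. The character statistics also report 2388 spaces across 2388 values, which suggests one space per value. A minimal sketch of converting such strings to integers follows. The trailing position of the space in the samples is an assumption, and the helper name `parse_quantity` is hypothetical.

```python
def parse_quantity(text):
    """Convert a string like '3,657,080 ' to an integer number of kilograms."""
    return int(text.strip().replace(",", ""))

# Sample values from this section, with an assumed trailing space
samples = ["3,657,080 ", "137,800 ", "482,000 ", "15,000 ", "2,423,600 "]
values = [parse_quantity(s) for s in samples]
print(values)       # [3657080, 137800, 482000, 15000, 2423600]
print(sum(values))  # 6715480
```

Once converted, the column can be profiled as numeric, which would make quantile and descriptive statistics available for it.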

Most occurring characters

Value             Count  Frequency (%)
0                  5106  26.1%
,                  2973  15.2%
(space)            2388  12.2%
2                  1455  7.4%
1                  1294  6.6%
4                  1188  6.1%
6                  1093  5.6%
8                   975  5.0%
3                   933  4.8%
5                   863  4.4%
Other values (2)   1323  6.8%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     14230  72.6%
Other Punctuation   2973  15.2%
Space Separator     2388  12.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0       5106  35.9%
2       1455  10.2%
1       1294  9.1%
4       1188  8.3%
6       1093  7.7%
8        975  6.9%
3        933  6.6%
5        863  6.1%
9        687  4.8%
7        636  4.5%
Other Punctuation
Value  Count  Frequency (%)
,       2973  100.0%
Space Separator
Value    Count  Frequency (%)
(space)   2388  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  19591  100.0%

Most frequent character per script

Common
Value             Count  Frequency (%)
0                  5106  26.1%
,                  2973  15.2%
(space)            2388  12.2%
2                  1455  7.4%
1                  1294  6.6%
4                  1188  6.1%
6                  1093  5.6%
8                   975  5.0%
3                   933  4.8%
5                   863  4.4%
Other values (2)   1323  6.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  19591  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
0                  5106  26.1%
,                  2973  15.2%
(space)            2388  12.2%
2                  1455  7.4%
1                  1294  6.6%
4                  1188  6.1%
6                  1093  5.6%
8                   975  5.0%
3                   933  4.8%
5                   863  4.4%
Other values (2)   1323  6.8%

Unnamed: 6
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Unnamed: 8
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Unnamed: 9
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Unnamed: 10
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Unnamed: 11
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 2388
Missing (%): 100.0%
Memory size: 21.1 KiB

Correlations

          신청년도  시도   연산   용도   원산지
신청년도  1.000    0.000  0.653  0.140  0.134
시도      0.000    1.000  0.000  0.000  0.233
연산      0.653    0.000  1.000  0.469  0.525
용도      0.140    0.000  0.469  1.000  0.100
원산지    0.134    0.233  0.525  0.100  1.000

          용도   신청년도  원산지
용도      1.000  0.171    0.075
신청년도  0.171  1.000    0.082
원산지    0.075  0.082    1.000

          연산   신청년도  용도   원산지
연산      1.000  0.475    0.353  0.213
신청년도  0.475  1.000    0.171  0.082
용도      0.353  0.171    1.000  0.075
원산지    0.213  0.082    0.075  1.000

Missing values

Figure: a simple visualization of nullity by column.
Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

      신청년도  시도                    연산  용도  원산지  검사수량(kg)  Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  Unnamed: 10  Unnamed: 11
0     2009      강원도 고성군           2008  정곡  국산    3,657,080     <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
1     2009      강원도 고성군           2008  정곡  미국    137,800       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2     2009      강원도 고성군           2008  정곡  중국    482,000       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
3     2009      강원도 고성군           2008  정곡  태국    15,000        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
4     2009      강원도 인제군           2008  정곡  국산    2,423,600     <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
5     2009      강원도 인제군           2008  정곡  중국    302,880       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
6     2009      강원도 인제군           2008  정곡  태국    200,000       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
7     2009      강원도 춘천시           2008  정곡  국산    583,440       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
8     2009      강원도 춘천시           2008  정곡  중국    504,000       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
9     2009      강원도 홍천군           2007  정곡  중국    56,760        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>

      신청년도  시도                    연산  용도  원산지  검사수량(kg)  Unnamed: 6  Unnamed: 7  Unnamed: 8  Unnamed: 9  Unnamed: 10  Unnamed: 11
2378  2013      충청북도 청주시 흥덕구  2012  정곡  미국    165,880       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2379  2013      충청북도 청주시 흥덕구  2012  정곡  중국    462,880       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2380  2013      충청북도 충주시         2009  정곡  국산    94,440        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2381  2013      충청북도 충주시         2010  정곡  국산    56,040        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2382  2013      충청북도 충주시         2010  정곡  중국    139,200       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2383  2013      충청북도 충주시         2011  정곡  중국    423,560       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2384  2013      충청북도 충주시         2011  정곡  태국    84,120        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2385  2013      충청북도 충주시         2012  정곡  국산    2,040,600     <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2386  2013      충청북도 충주시         2012  정곡  미국    80,000        <NA>        <NA>        <NA>        <NA>        <NA>         <NA>
2387  2013      충청북도 충주시         2012  정곡  중국    165,000       <NA>        <NA>        <NA>        <NA>        <NA>         <NA>