Overview

Dataset statistics

Number of variables: 7
Number of observations: 275
Missing cells: 305
Missing cells (%): 15.8%
Duplicate rows: 3
Duplicate rows (%): 1.1%
Total size in memory: 16.5 KiB
Average record size in memory: 61.5 B
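
The percentages in this block follow from the raw counts. As a quick sanity check, the sketch below recomputes them, assuming that missing cells are divided by rows × columns and duplicate rows by the row count. The per-column missing counts (30 and 275) are taken from the Alerts section of this report; the helper name `pct` is illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

n_rows, n_cols = 275, 7

# Columns with missing values, as listed in the report's alerts.
missing_per_column = {"답변텍스트": 30, "이동할페이지": 275}

missing_cells = sum(missing_per_column.values())
missing_pct = pct(missing_cells, n_rows * n_cols)   # share of all cells
duplicate_pct = pct(3, n_rows)                      # share of all rows

print(missing_cells, missing_pct, duplicate_pct)
```

Both results agree with the reported 15.8% and 1.1% under these assumptions.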

Variable types

Numeric: 2
Boolean: 1
Text: 1
Unsupported: 1
Categorical: 2

Dataset

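Description: Data from an internal management system for surveying and analyzing the management status of a panel of producer farms (farm scale, cultivation techniques, inputs, outputs, costs, and so on). It provides the following fields: 질문번호 (question number), 답변번호 (answer number), 기타여부 (whether the answer is an "other" option), 답변텍스트 (answer text), 이동할페이지 (page to move to), 이동할시작질문번호 (first question number to move to), and 이동할끝질문번호 (last question number to move to).
Author: 충청북도 (Chungcheongbuk-do)
URL: https://www.data.go.kr/data/15050270/fileData.do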

Alerts

Dataset has 3 (1.1%) duplicate rows [Duplicates]
이동할시작질문번호 is highly overall correlated with 질문번호 and 3 other fields [High correlation]
기타여부 is highly overall correlated with 이동할시작질문번호 and 1 other fields [High correlation]
이동할끝질문번호 is highly overall correlated with 기타여부 and 1 other fields [High correlation]
질문번호 is highly overall correlated with 이동할시작질문번호 [High correlation]
답변번호 is highly overall correlated with 이동할시작질문번호 [High correlation]
기타여부 is highly imbalanced (50.3%) [Imbalance]
이동할시작질문번호 is highly imbalanced (94.8%) [Imbalance]
이동할끝질문번호 is highly imbalanced (93.9%) [Imbalance]
답변텍스트 has 30 (10.9%) missing values [Missing]
이동할페이지 has 275 (100.0%) missing values [Missing]
이동할페이지 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
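
The three imbalance percentages are consistent with a normalized-entropy score: 1 minus the Shannon entropy of the class counts divided by log2 of the number of classes. This is an assumption checked against the value counts in this report, not a statement about the profiling library's internals. The function name `imbalance_score` and the single-class fallback are choices made for this sketch.

```python
import math

def imbalance_score(counts):
    """1 - entropy / log2(n_classes): 0 means balanced, values near 1 mean one class dominates."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption for this sketch: a single class gets a score of 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Class counts copied from the variable sections of this report.
class_counts = {
    "기타여부": [245, 30],
    "이동할시작질문번호": [272, 1, 1, 1],
    "이동할끝질문번호": [272, 2, 1],
}

scores = {name: round(100 * imbalance_score(c), 1) for name, c in class_counts.items()}
for name, score in scores.items():
    print(name, score)
```

Under this assumption the scores come out at 50.3, 94.8, and 93.9, matching the alerts.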

Reproduction

Analysis started: 2023-12-12 23:48:09.859151
Analysis finished: 2023-12-12 23:48:10.534125
Duration: 0.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

질문번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 46
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.767273
Minimum: 1
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1.7
Q1: 26.5
Median: 42
Q3: 49.5
95-th percentile: 77
Maximum: 81
Range: 80
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 19.831692
Coefficient of variation (CV): 0.51155757
Kurtosis: -0.1986765
Mean: 38.767273
Median Absolute Deviation (MAD): 10
Skewness: -0.050776399
Sum: 10661
Variance: 393.29601
Monotonicity: Not monotonic
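
Several of the statistics for 질문번호 are derived from one another. The sketch below recomputes the derived ones from the reported inputs, using the standard definitions (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, range = max - min, IQR = Q3 - Q1). All input numbers are copied from this report; nothing is read from the underlying data.

```python
# Reported inputs for 질문번호.
n, total = 275, 10661
std = 19.831692
q1, q3 = 26.5, 49.5
minimum, maximum = 1, 81

mean = total / n                 # arithmetic mean
variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(round(mean, 6), round(variance, 3), round(cv, 6), value_range, iqr)
```

The recomputed values agree with the reported mean, variance, CV, range, and IQR to the printed precision.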
Figure: Histogram with fixed size bins (bins=46)

Common values

Value | Count | Frequency (%)
45 | 15 | 5.5%
42 | 15 | 5.5%
1 | 14 | 5.1%
47 | 13 | 4.7%
23 | 12 | 4.4%
34 | 12 | 4.4%
32 | 10 | 3.6%
33 | 10 | 3.6%
46 | 10 | 3.6%
44 | 9 | 3.3%
Other values (36) | 155 | 56.4%

Minimum 10 values

Value | Count | Frequency (%)
1 | 14 | 5.1%
2 | 5 | 1.8%
3 | 3 | 1.1%
4 | 5 | 1.8%
7 | 3 | 1.1%
9 | 4 | 1.5%
11 | 2 | 0.7%
17 | 2 | 0.7%
21 | 2 | 0.7%
22 | 5 | 1.8%

Maximum 10 values

Value | Count | Frequency (%)
81 | 4 | 1.5%
79 | 8 | 2.9%
77 | 4 | 1.5%
74 | 3 | 1.1%
69 | 5 | 1.8%
68 | 2 | 0.7%
65 | 4 | 1.5%
61 | 5 | 1.8%
60 | 2 | 0.7%
58 | 4 | 1.5%

답변번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 10
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.0875019
Coefficient of variation (CV): 0.65234435
Kurtosis: 0.93894984
Mean: 3.2
Median Absolute Deviation (MAD): 1
Skewness: 1.1309636
Sum: 880
Variance: 4.3576642
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 64 | 23.3%
2 | 63 | 22.9%
3 | 48 | 17.5%
4 | 38 | 13.8%
5 | 28 | 10.2%
6 | 10 | 3.6%
7 | 9 | 3.3%
8 | 7 | 2.5%
9 | 5 | 1.8%
10 | 3 | 1.1%

Minimum 10 values

Value | Count | Frequency (%)
1 | 64 | 23.3%
2 | 63 | 22.9%
3 | 48 | 17.5%
4 | 38 | 13.8%
5 | 28 | 10.2%
6 | 10 | 3.6%
7 | 9 | 3.3%
8 | 7 | 2.5%
9 | 5 | 1.8%
10 | 3 | 1.1%

Maximum 10 values

Value | Count | Frequency (%)
10 | 3 | 1.1%
9 | 5 | 1.8%
8 | 7 | 2.5%
7 | 9 | 3.3%
6 | 10 | 3.6%
5 | 28 | 10.2%
4 | 38 | 13.8%
3 | 48 | 17.5%
2 | 63 | 22.9%
1 | 64 | 23.3%
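
Because 답변번호 has only ten distinct values and the frequency table lists all of them, its summary statistics can be recomputed from the table alone. The sketch below does this with the sample variance (divisor n - 1), which is the convention that reproduces the reported figure. The counts are copied from the table; the variable names are illustrative.

```python
import math
import statistics

# Value -> count, copied from the 답변번호 frequency table in this report.
freq = {1: 64, 2: 63, 3: 48, 4: 38, 5: 28, 6: 10, 7: 9, 8: 7, 9: 5, 10: 3}

n = sum(freq.values())                                        # number of observations
total = sum(value * count for value, count in freq.items())  # sum of all values
mean = total / n

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / (n - 1)
std = math.sqrt(variance)

# Expand the table back into individual observations to take the median.
observations = [value for value, count in sorted(freq.items()) for _ in range(count)]
median = statistics.median(observations)

print(n, total, round(mean, 1), round(variance, 7), round(std, 7), median)
```

The results agree with the reported sum (880), mean (3.2), variance (4.3576642), standard deviation (2.0875019), and median (3).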

기타여부
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 407.0 B

Value | Count | Frequency (%)
False | 245 | 89.1%
True | 30 | 10.9%

답변텍스트
Text

MISSING 

Distinct: 149
Distinct (%): 60.8%
Missing: 30
Missing (%): 10.9%
Memory size: 2.3 KiB

Length

Max length: 36
Median length: 18
Mean length: 6.8367347
Min length: 1

Characters and Unicode

Total characters: 1675
Distinct characters: 238
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 93
Unique (%): 38.0%
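
The percentages and mean length reported for 답변텍스트 are consistent with the non-missing rows (275 - 30 = 245) being used as the denominator rather than all 275 rows. This is an observation from the report's own numbers, not confirmed library behavior. The sketch below shows the arithmetic.

```python
n_rows, n_missing = 275, 30
n_present = n_rows - n_missing  # rows that actually carry an answer text

distinct, unique, total_chars = 149, 93, 1675  # copied from this report

distinct_pct = round(100 * distinct / n_present, 1)
unique_pct = round(100 * unique / n_present, 1)
mean_length = total_chars / n_present

print(n_present, distinct_pct, unique_pct, round(mean_length, 7))
```

With 245 as the denominator, the results match the reported 60.8%, 38.0%, and mean length of 6.8367347.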

Sample

1st row: 개별농가
2nd row: 영농조합법인
3rd row: 농업회사법인
4th row: 마을기업 / 농촌공동체 회사
5th row: 사회적 기업 / 협동조합(농협, 원협, 능금조합은 해당되지 않음)

Value | Count | Frequency (%)
(not recoverable) | 14 | 3.2%
수용한다 | 12 | 2.8%
발생하였다 | 12 | 2.8%
(not recoverable) | 8 | 1.8%
안한다 | 7 | 1.6%
한다 | 7 | 1.6%
아니오 | 7 | 1.6%
방제 | 7 | 1.6%
매우 | 5 | 1.1%
않았다 | 4 | 0.9%
Other values (197) | 353 | 81.0%

Most occurring characters

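Value | Count | Frequency (%)
(space) | 191 | 11.4%
(not recoverable) | 64 | 3.8%
(not recoverable) | 48 | 2.9%
. | 43 | 2.6%
(not recoverable) | 33 | 2.0%
(not recoverable) | 29 | 1.7%
(not recoverable) | 29 | 1.7%
(not recoverable) | 28 | 1.7%
(not recoverable) | 26 | 1.6%
(not recoverable) | 25 | 1.5%
Other values (228) | 1159 | 69.2%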

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1317 | 78.6%
Space Separator | 191 | 11.4%
Other Punctuation | 65 | 3.9%
Decimal Number | 49 | 2.9%
Open Punctuation | 15 | 0.9%
Close Punctuation | 15 | 0.9%
Uppercase Letter | 15 | 0.9%
Math Symbol | 7 | 0.4%
Dash Punctuation | 1 | 0.1%

Most frequent character per category

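Other Letter

Value | Count | Frequency (%)
(not recoverable) | 64 | 4.9%
(not recoverable) | 48 | 3.6%
(not recoverable) | 33 | 2.5%
(not recoverable) | 29 | 2.2%
(not recoverable) | 29 | 2.2%
(not recoverable) | 28 | 2.1%
(not recoverable) | 26 | 2.0%
(not recoverable) | 25 | 1.9%
(not recoverable) | 24 | 1.8%
(not recoverable) | 22 | 1.7%
Other values (206) | 989 | 75.1%

Decimal Number

Value | Count | Frequency (%)
1 | 15 | 30.6%
5 | 14 | 28.6%
6 | 11 | 22.4%
2 | 6 | 12.2%
3 | 2 | 4.1%
0 | 1 | 2.0%

Uppercase Letter

Value | Count | Frequency (%)
A | 5 | 33.3%
P | 3 | 20.0%
G | 3 | 20.0%
C | 2 | 13.3%
T | 1 | 6.7%
V | 1 | 6.7%

Other Punctuation

Value | Count | Frequency (%)
. | 43 | 66.2%
, | 10 | 15.4%
/ | 8 | 12.3%
· | 4 | 6.2%

Math Symbol

Value | Count | Frequency (%)
~ | 6 | 85.7%
+ | 1 | 14.3%

Space Separator

Value | Count | Frequency (%)
(space) | 191 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 15 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 15 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 1 | 100.0%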

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1317 | 78.6%
Common | 343 | 20.5%
Latin | 15 | 0.9%

Most frequent character per script

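Hangul

Value | Count | Frequency (%)
(not recoverable) | 64 | 4.9%
(not recoverable) | 48 | 3.6%
(not recoverable) | 33 | 2.5%
(not recoverable) | 29 | 2.2%
(not recoverable) | 29 | 2.2%
(not recoverable) | 28 | 2.1%
(not recoverable) | 26 | 2.0%
(not recoverable) | 25 | 1.9%
(not recoverable) | 24 | 1.8%
(not recoverable) | 22 | 1.7%
Other values (206) | 989 | 75.1%

Common

Value | Count | Frequency (%)
(space) | 191 | 55.7%
. | 43 | 12.5%
( | 15 | 4.4%
1 | 15 | 4.4%
) | 15 | 4.4%
5 | 14 | 4.1%
6 | 11 | 3.2%
, | 10 | 2.9%
/ | 8 | 2.3%
2 | 6 | 1.7%
Other values (6) | 15 | 4.4%

Latin

Value | Count | Frequency (%)
A | 5 | 33.3%
P | 3 | 20.0%
G | 3 | 20.0%
C | 2 | 13.3%
T | 1 | 6.7%
V | 1 | 6.7%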

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1317 | 78.6%
ASCII | 354 | 21.1%
None | 4 | 0.2%

Most frequent character per block

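ASCII

Value | Count | Frequency (%)
(space) | 191 | 54.0%
. | 43 | 12.1%
( | 15 | 4.2%
1 | 15 | 4.2%
) | 15 | 4.2%
5 | 14 | 4.0%
6 | 11 | 3.1%
, | 10 | 2.8%
/ | 8 | 2.3%
2 | 6 | 1.7%
Other values (11) | 26 | 7.3%

Hangul

Value | Count | Frequency (%)
(not recoverable) | 64 | 4.9%
(not recoverable) | 48 | 3.6%
(not recoverable) | 33 | 2.5%
(not recoverable) | 29 | 2.2%
(not recoverable) | 29 | 2.2%
(not recoverable) | 28 | 2.1%
(not recoverable) | 26 | 2.0%
(not recoverable) | 25 | 1.9%
(not recoverable) | 24 | 1.8%
(not recoverable) | 22 | 1.7%
Other values (206) | 989 | 75.1%

None

Value | Count | Frequency (%)
· | 4 | 100.0%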

이동할페이지
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 275
Missing (%): 100.0%
Memory size: 2.5 KiB

이동할시작질문번호
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9781818
Min length: 2

Unique

Unique: 3
Unique (%): 1.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 272 | 98.9%
36 | 1 | 0.4%
45 | 1 | 0.4%
49 | 1 | 0.4%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 272 | 98.9%
36 | 1 | 0.4%
45 | 1 | 0.4%
49 | 1 | 0.4%
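
이동할시작질문번호 is reported with 0 missing values, yet its most common value is the literal string `<NA>` (272 of 275 rows). The mean length of 3.9781818 equals (272 × 4 + 3 × 2) / 275, which suggests the placeholder was counted as a four-character category rather than as a missing value. A minimal sketch of normalizing such placeholders before profiling follows, assuming the column is available as a list of strings. The helper name `normalize_missing` and the placeholder set are hypothetical.

```python
from typing import Optional

# Strings assumed to stand for "no value"; adjust to the actual export.
PLACEHOLDERS = {"<NA>", "na", ""}

def normalize_missing(value: Optional[str]) -> Optional[str]:
    """Map placeholder strings to None and strip whitespace from real values."""
    if value is None:
        return None
    text = value.strip()
    return None if text in PLACEHOLDERS else text

# Illustrative column: 272 placeholders plus the three real values from the table.
column = ["<NA>"] * 272 + ["36", "45", "49"]

cleaned = [normalize_missing(v) for v in column]
missing = sum(1 for v in cleaned if v is None)

# Mean string length of the raw column, placeholders included.
mean_length = sum(len(v) for v in column) / len(column)

print(missing, round(mean_length, 7))
```

After normalization the column would count 272 missing values instead of a dominant `<NA>` category, which would also change the imbalance and correlation alerts that depend on it.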

이동할끝질문번호
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.3 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9781818
Min length: 2

Unique

Unique: 1
Unique (%): 0.4%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 272 | 98.9%
50 | 2 | 0.7%
36 | 1 | 0.4%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 272 | 98.9%
50 | 2 | 0.7%
36 | 1 | 0.4%

Interactions

Figures: four interaction plots (images not preserved)

Correlations

Variable | 질문번호 | 답변번호 | 기타여부 | 이동할시작질문번호 | 이동할끝질문번호
질문번호 | 1.000 | 0.000 | 0.000 | 1.000 | 0.000
답변번호 | 0.000 | 1.000 | 0.554 | 1.000 | 0.000
기타여부 | 0.000 | 0.554 | 1.000 | NaN | NaN
이동할시작질문번호 | 1.000 | 1.000 | NaN | 1.000 | 1.000
이동할끝질문번호 | 0.000 | 0.000 | NaN | 1.000 | 1.000

Variable | 이동할시작질문번호 | 기타여부 | 이동할끝질문번호
이동할시작질문번호 | 1.000 | 1.000 | 1.000
기타여부 | 1.000 | 1.000 | 1.000
이동할끝질문번호 | 1.000 | 1.000 | 1.000

Variable | 질문번호 | 답변번호 | 기타여부 | 이동할시작질문번호 | 이동할끝질문번호
질문번호 | 1.000 | -0.107 | 0.000 | 1.000 | 0.000
답변번호 | -0.107 | 1.000 | 0.421 | 1.000 | 0.000
기타여부 | 0.000 | 0.421 | 1.000 | 1.000 | 1.000
이동할시작질문번호 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
이동할끝질문번호 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

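First rows

Row | 질문번호 | 답변번호 | 기타여부 | 답변텍스트 | 이동할페이지 | 이동할시작질문번호 | 이동할끝질문번호
0 | 2 | 1 | N | 개별농가 | <NA> | <NA> | <NA>
1 | 2 | 2 | N | 영농조합법인 | <NA> | <NA> | <NA>
2 | 2 | 3 | N | 농업회사법인 | <NA> | <NA> | <NA>
3 | 2 | 4 | N | 마을기업 / 농촌공동체 회사 | <NA> | <NA> | <NA>
4 | 2 | 5 | N | 사회적 기업 / 협동조합(농협, 원협, 능금조합은 해당되지 않음) | <NA> | <NA> | <NA>
5 | 7 | 1 | N | 손 적화 | <NA> | <NA> | <NA>
6 | 7 | 2 | N | 약제 적화 | <NA> | <NA> | <NA>
7 | 7 | 3 | Y | <NA> | <NA> | <NA> | <NA>
8 | 9 | 1 | N | 꽃가루(인력) | <NA> | <NA> | <NA>
9 | 9 | 2 | N | 꽃가루(기계) | <NA> | <NA> | <NA>

Last rows

Row | 질문번호 | 답변번호 | 기타여부 | 답변텍스트 | 이동할페이지 | 이동할시작질문번호 | 이동할끝질문번호
265 | 79 | 1 | N | 생산없음 | <NA> | <NA> | <NA>
266 | 79 | 2 | N | (not recoverable) | <NA> | <NA> | <NA>
267 | 79 | 3 | N | (not recoverable) | <NA> | <NA> | <NA>
268 | 79 | 4 | N | 한과 | <NA> | <NA> | <NA>
269 | 79 | 5 | N | 과자 | <NA> | <NA> | <NA>
270 | 79 | 6 | Y | <NA> | <NA> | <NA> | <NA>
271 | 81 | 1 | N | 무농약농산물 | <NA> | <NA> | <NA>
272 | 81 | 2 | N | 유기농산물 | <NA> | <NA> | <NA>
273 | 81 | 3 | N | GAP | <NA> | <NA> | <NA>
274 | 81 | 4 | Y | <NA> | <NA> | <NA> | <NA>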

Duplicate rows

Most frequently occurring

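Row | 질문번호 | 답변번호 | 기타여부 | 답변텍스트 | 이동할시작질문번호 | 이동할끝질문번호 | # duplicates
0 | 31 | 1 | N | (not recoverable) | <NA> | <NA> | 2
1 | 31 | 2 | N | 아니오 | <NA> | <NA> | 2
2 | 35 | 2 | N | 안한다 | <NA> | <NA> | 2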
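
A duplicate-row table of this kind can be produced by counting identical rows. The sketch below shows one way to do that with the standard library, assuming each row is represented as a tuple. The five rows are illustrative, modeled on values that appear in this report, and are not the actual dataset.

```python
from collections import Counter

# Illustrative rows: (질문번호, 답변번호, 기타여부, 답변텍스트).
rows = [
    (31, 2, "N", "아니오"),
    (35, 2, "N", "안한다"),
    (31, 2, "N", "아니오"),
    (79, 1, "N", "생산없음"),
    (35, 2, "N", "안한다"),
]

# Count how often each distinct row occurs, then keep rows seen more than once.
counts = Counter(rows)
duplicates = {row: count for row, count in counts.items() if count > 1}

for row, count in duplicates.items():
    print(row, count)
```

Each key of `duplicates` corresponds to one line of the "Most frequently occurring" table, and its count to the "# duplicates" column.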