Overview

Dataset statistics

Number of variables: 7
Number of observations: 1004
Missing cells: 679
Missing cells (%): 9.7%
Duplicate rows: 2
Duplicate rows (%): 0.2%
Total size in memory: 58.0 KiB
Average record size in memory: 59.1 B
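
The percentages above follow directly from the raw counts. A minimal sketch of that arithmetic, using only the figures reported in this section (the variable names are illustrative, and because the 58.0 KiB total is itself rounded, the per-record size only approximates the reported 59.1 B):

```python
# Raw counts copied from the Dataset statistics above.
n_variables = 7
n_observations = 1004
missing_cells = 679
duplicate_rows = 2
total_size_kib = 58.0  # already rounded in the report

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Approximate only: derived from the rounded KiB total.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size (approx.): {avg_record_bytes:.2f} B")
```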

Variable types

Numeric: 2
Categorical: 3
Text: 1
Boolean: 1

Dataset

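Description: An internal management system for surveying and analysing the farm-management status of the producer-farm panel (management scale, cultivation techniques, inputs, outputs, costs, and so on). It provides the question number (질문번호), row/column (행/열), sequence number (순번), text (텍스트), user-input flag (사용자입력여부), unit (단위), and answer type (답변유형, same as ques_t_cd).
Author: Chungcheongbuk-do (충청북도)
URL: https://www.data.go.kr/data/15050272/fileData.do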

Alerts

Dataset has 2 (0.2%) duplicate rows [Duplicates]
행/열 is highly overall correlated with 사용자입력여부 and 2 other fields [High correlation]
답변유형 is highly overall correlated with 질문번호 and 3 other fields [High correlation]
단위 is highly overall correlated with 행/열 [High correlation]
사용자입력여부 is highly overall correlated with 행/열 and 1 other fields [High correlation]
질문번호 is highly overall correlated with 답변유형 [High correlation]
순번 is highly overall correlated with 답변유형 [High correlation]
사용자입력여부 is highly imbalanced (92.5%) [Imbalance]
단위 is highly imbalanced (71.3%) [Imbalance]
답변유형 is highly imbalanced (94.0%) [Imbalance]
사용자입력여부 has 676 (67.3%) missing values [Missing]
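
The two imbalance percentages whose class counts appear in this report (답변유형: 997 vs 7; 사용자입력여부: 325 vs 3, ignoring missing values) are consistent with one minus the normalised Shannon entropy of the class counts. The sketch below works under that assumption. It is a check against the reported numbers, not a statement of how the profiling library computes the score, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum possible entropy) for class counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    if len(counts) < 2:
        return 0.0  # a single class has no entropy to normalise against
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))


# Class counts taken from the variable sections of this report.
answer_type_score = imbalance_score([997, 7])  # 답변유형: <NA> vs 5
user_input_score = imbalance_score([325, 3])   # 사용자입력여부: False vs True

print(f"답변유형: {answer_type_score:.1%}")
print(f"사용자입력여부: {user_input_score:.1%}")
```

Under this assumption the scores come out at roughly 94.0% and 92.5%, matching the alerts. The 71.3% for 단위 cannot be checked the same way because the report does not list all 28 class counts.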

Reproduction

Analysis started: 2023-12-11 23:30:23.548065
Analysis finished: 2023-12-11 23:30:24.841617
Duration: 1.29 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

질문번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 98
Distinct (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.241036
Minimum: 2
Maximum: 105
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.0 KiB

Quantile statistics

Minimum: 2
5-th percentile: 5
Q1: 21
Median: 39
Q3: 78
95-th percentile: 98
Maximum: 105
Range: 103
Interquartile range (IQR): 57

Descriptive statistics

Standard deviation: 30.592124
Coefficient of variation (CV): 0.64757521
Kurtosis: -1.2798449
Mean: 47.241036
Median Absolute Deviation (MAD): 25.5
Skewness: 0.27269615
Sum: 47430
Variance: 935.87804
Monotonicity: Not monotonic
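
Several of the figures above are derived from one another. A short sketch of those relationships for 질문번호, using only values reported in this section (the variable names are illustrative):

```python
# Values copied from the 질문번호 statistics above.
mean = 47.241036
std = 30.592124
n_observations = 1004
minimum, maximum = 2, 105
q1, q3 = 21, 78

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
total = mean * n_observations     # sum of all values
value_range = maximum - minimum   # range
iqr = q3 - q1                     # interquartile range

print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.3f}")
print(f"Sum: {total:.0f}")
print(f"Range: {value_range}, IQR: {iqr}")
```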
Histogram with fixed size bins (bins=50)
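
As an illustration of fixed-size binning, the sketch below assigns a value to one of 50 equal-width bins spanning the reported minimum (2) and maximum (105). The helper `bin_index` is hypothetical, and closing the last bin on the right (so the maximum falls in bin 49) is an assumed convention, not something the report states.

```python
def bin_index(value, minimum, maximum, bins):
    """Return the zero-based index of the equal-width bin that contains `value`."""
    if not minimum <= value <= maximum:
        raise ValueError("value lies outside the histogram range")
    if value == maximum:
        return bins - 1  # assumed convention: the last bin includes its right edge
    width = (maximum - minimum) / bins
    return int((value - minimum) // width)


# Range and bin count taken from the 질문번호 section.
lowest_bin = bin_index(2, 2, 105, 50)
highest_bin = bin_index(105, 2, 105, 50)
bin_of_26 = bin_index(26, 2, 105, 50)

print(lowest_bin, highest_bin, bin_of_26)
```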
Value              Count  Frequency (%)
26                 24     2.4%
25                 24     2.4%
82                 24     2.4%
28                 23     2.3%
24                 22     2.2%
40                 22     2.2%
77                 21     2.1%
27                 21     2.1%
29                 21     2.1%
85                 20     2.0%
Other values (88)  782    77.9%

Value  Count  Frequency (%)
2      10     1.0%
3      14     1.4%
4      14     1.4%
5      19     1.9%
6      11     1.1%
7      13     1.3%
8      13     1.3%
9      11     1.1%
10     15     1.5%
11     11     1.1%

Value  Count  Frequency (%)
105    8      0.8%
104    12     1.2%
103    12     1.2%
102    6      0.6%
101    6      0.6%
100    2      0.2%
99     2      0.2%
98     6      0.6%
97     3      0.3%
96     3      0.3%

행/열
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB
C: 603
R: 401

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: R
2nd row: R
3rd row: R
4th row: R
5th row: R

Common Values

Value  Count  Frequency (%)
C      603    60.1%
R      401    39.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
c      603    60.1%
r      401    39.9%

순번
Real number (ℝ)

HIGH CORRELATION 

Distinct: 19
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.3286853
Minimum: 1
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 9.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 4
95-th percentile: 8
Maximum: 19
Range: 18
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.3741575
Coefficient of variation (CV): 0.71324182
Kurtosis: 6.1565359
Mean: 3.3286853
Median Absolute Deviation (MAD): 1
Skewness: 1.8807347
Sum: 3342
Variance: 5.636624
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value             Count  Frequency (%)
1                 233    23.2%
2                 223    22.2%
3                 169    16.8%
4                 129    12.8%
5                 115    11.5%
6                 44     4.4%
7                 31     3.1%
8                 24     2.4%
9                 19     1.9%
10                4      0.4%
Other values (9)  13     1.3%

Value  Count  Frequency (%)
1      233    23.2%
2      223    22.2%
3      169    16.8%
4      129    12.8%
5      115    11.5%
6      44     4.4%
7      31     3.1%
8      24     2.4%
9      19     1.9%
10     4      0.4%

Value  Count  Frequency (%)
19     1      0.1%
18     1      0.1%
17     1      0.1%
16     1      0.1%
15     1      0.1%
14     1      0.1%
13     1      0.1%
12     3      0.3%
11     3      0.3%
10     4      0.4%

텍스트
Text

Distinct: 449
Distinct (%): 44.9%
Missing: 3
Missing (%): 0.3%
Memory size: 8.0 KiB

Length

Max length: 36
Median length: 31
Mean length: 6.7192807
Min length: 1

Characters and Unicode

Total characters: 6726
Distinct characters: 334
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
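
A minimal sketch of this kind of analysis using Python's standard `unicodedata` module. The helper `category_counts` is a hypothetical name, and the two-letter codes it returns (for example `Lo` for Other Letter, `Zs` for Space Separator) are Unicode general-category abbreviations rather than the long names shown in the tables of this report.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the 텍스트 values listed in the Sample section of this report.
sample = "사과 재배경력"
counts = category_counts(sample)

for code, count in counts.most_common():
    print(code, count)

# Individual characters that appear in the per-category tables.
for ch in "(~-%":
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))
```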

Unique

Unique: 221
Unique (%): 22.1%

Sample

1st row: 성명
2nd row: 성별
3rd row: 주소
4th row: 이메일주소
5th row: 출생연도
ValueCountFrequency (%)
매우 43
 
2.4%
보통이다 36
 
2.0%
않다 32
 
1.8%
그렇다 30
 
1.7%
그렇지 30
 
1.7%
있다 28
 
1.6%
금액 27
 
1.5%
27
 
1.5%
비목 23
 
1.3%
22
 
1.2%
Other values (518) 1476
83.2%

Most occurring characters

ValueCountFrequency (%)
774
 
11.5%
317
 
4.7%
157
 
2.3%
123
 
1.8%
110
 
1.6%
. 109
 
1.6%
99
 
1.5%
93
 
1.4%
93
 
1.4%
87
 
1.3%
Other values (324) 4764
70.8%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       5377   79.9%
Space Separator    774    11.5%
Other Punctuation  174    2.6%
Control            93     1.4%
Decimal Number     91     1.4%
Open Punctuation   54     0.8%
Close Punctuation  54     0.8%
Other Number       36     0.5%
Lowercase Letter   33     0.5%
Dash Punctuation   26     0.4%
Other values (2)   14     0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
317
 
5.9%
157
 
2.9%
123
 
2.3%
110
 
2.0%
99
 
1.8%
93
 
1.7%
87
 
1.6%
83
 
1.5%
82
 
1.5%
81
 
1.5%
Other values (294) 4145
77.1%
Decimal Number
Value  Count  Frequency (%)
1      21     23.1%
2      18     19.8%
0      15     16.5%
5      13     14.3%
7      6      6.6%
6      5      5.5%
4      5      5.5%
3      4      4.4%
9      2      2.2%
8      2      2.2%
Other Punctuation
Value  Count  Frequency (%)
.      109    62.6%
,      41     23.6%
·      16     9.2%
%      5      2.9%
/      3      1.7%
Lowercase Letter
Value  Count  Frequency (%)
g      15     45.5%
k      15     45.5%
m      3      9.1%
Other Number
ValueCountFrequency (%)
12
33.3%
12
33.3%
12
33.3%
Uppercase Letter
Value  Count  Frequency (%)
X      5      38.5%
O      4      30.8%
P      4      30.8%
Space Separator
ValueCountFrequency (%)
774
100.0%
Control
ValueCountFrequency (%)
93
100.0%
Open Punctuation
Value  Count  Frequency (%)
(      54     100.0%
Close Punctuation
Value  Count  Frequency (%)
)      54     100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      26     100.0%
Math Symbol
Value  Count  Frequency (%)
~      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  5377   79.9%
Common  1303   19.4%
Latin   46     0.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
317
 
5.9%
157
 
2.9%
123
 
2.3%
110
 
2.0%
99
 
1.8%
93
 
1.7%
87
 
1.6%
83
 
1.5%
82
 
1.5%
81
 
1.5%
Other values (294) 4145
77.1%
Common
ValueCountFrequency (%)
774
59.4%
. 109
 
8.4%
93
 
7.1%
( 54
 
4.1%
) 54
 
4.1%
, 41
 
3.1%
- 26
 
2.0%
1 21
 
1.6%
2 18
 
1.4%
· 16
 
1.2%
Other values (14) 97
 
7.4%
Latin
Value  Count  Frequency (%)
g      15     32.6%
k      15     32.6%
X      5      10.9%
O      4      8.7%
P      4      8.7%
m      3      6.5%

Most occurring blocks

Value              Count  Frequency (%)
Hangul             5377   79.9%
ASCII              1297   19.3%
Enclosed Alphanum  36     0.5%
None               16     0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
774
59.7%
. 109
 
8.4%
93
 
7.2%
( 54
 
4.2%
) 54
 
4.2%
, 41
 
3.2%
- 26
 
2.0%
1 21
 
1.6%
2 18
 
1.4%
g 15
 
1.2%
Other values (16) 92
 
7.1%
Hangul
ValueCountFrequency (%)
317
 
5.9%
157
 
2.9%
123
 
2.3%
110
 
2.0%
99
 
1.8%
93
 
1.7%
87
 
1.6%
83
 
1.5%
82
 
1.5%
81
 
1.5%
Other values (294) 4145
77.1%
None
ValueCountFrequency (%)
· 16
100.0%
Enclosed Alphanum
ValueCountFrequency (%)
12
33.3%
12
33.3%
12
33.3%

사용자입력여부
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.6%
Missing: 676
Missing (%): 67.3%
Memory size: 2.1 KiB
Value      Count  Frequency (%)
False      325    32.4%
True       3      0.3%
(Missing)  676    67.3%

단위
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 28
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB
<NA>
811 
%
 
49
 
45
kg
 
16
 
12
Other values (23)
 
71

Length

Max length: 9
Median length: 4
Mean length: 3.5388446
Min length: 1

Unique

Unique: 13
Unique (%): 1.3%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

ValueCountFrequency (%)
<NA> 811
80.8%
% 49
 
4.9%
45
 
4.5%
kg 16
 
1.6%
12
 
1.2%
11
 
1.1%
(원) 10
 
1.0%
10
 
1.0%
7
 
0.7%
(%) 5
 
0.5%
Other values (18) 28
 
2.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
na 811
80.5%
55
 
5.5%
54
 
5.4%
kg 18
 
1.8%
12
 
1.2%
11
 
1.1%
10
 
1.0%
7
 
0.7%
원/kg 5
 
0.5%
5
 
0.5%
Other values (9) 20
 
2.0%

답변유형
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.0 KiB
<NA>: 997
5: 7

Length

Max length: 4
Median length: 4
Mean length: 3.9790837
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   997    99.3%
5      7      0.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     997    99.3%
5      7      0.7%

Interactions


Correlations

                질문번호  행/열   순번    사용자입력여부  단위
질문번호        1.000     0.316   0.342   0.000           0.717
행/열           0.316     1.000   0.418   NaN             NaN
순번            0.342     0.418   1.000   0.028           0.643
사용자입력여부  0.000     NaN     0.028   1.000           0.000
단위            0.717     NaN     0.643   0.000           1.000

                행/열   답변유형  단위    사용자입력여부
행/열           1.000   1.000     1.000   1.000
답변유형        1.000   1.000     NaN     1.000
단위            1.000   NaN       1.000   0.000
사용자입력여부  1.000   1.000     0.000   1.000

                질문번호  순번     행/열   사용자입력여부  단위    답변유형
질문번호        1.000     -0.069   0.242   0.000           0.336   1.000
순번            -0.069    1.000    0.320   0.034           0.370   1.000
행/열           0.242     0.320    1.000   1.000           1.000   1.000
사용자입력여부  0.000     0.034    1.000   1.000           0.000   1.000
단위            0.336     0.370    1.000   0.000           1.000   NaN
답변유형        1.000     1.000    1.000   1.000           NaN     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
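
One common way to compute such a nullity correlation is the Pearson correlation between per-column "is present" indicators. The sketch below takes that approach as an assumption. The helper `nullity_correlation` and the three toy columns are hypothetical and are not taken from this dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    A value is treated as missing when it is None. Returns NaN when either
    column is entirely present or entirely missing (zero variance).
    """
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy columns: x and y are missing together, z is missing
# exactly where x is present.
col_x = ["N", None, "N", None]
col_y = ["kg", None, "%", None]
col_z = [None, "a", None, "b"]

same_pattern = nullity_correlation(col_x, col_y)      # missing in the same rows
opposite_pattern = nullity_correlation(col_x, col_z)  # missing in opposite rows
```

A value near 1 means two columns tend to be missing in the same rows, and a value near -1 means one is present exactly where the other is missing.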

Sample

      질문번호  행/열  순번  텍스트           사용자입력여부  단위  답변유형
0     3         R      1     성명             <NA>            <NA>  <NA>
1     3         R      2     성별             <NA>            <NA>  <NA>
2     3         R      3     주소             <NA>            <NA>  <NA>
3     3         R      4     이메일주소       <NA>            <NA>  <NA>
4     3         R      5     출생연도         <NA>            <NA>  <NA>
5     3         R      6     전화번호         <NA>            <NA>  <NA>
6     3         R      7     휴대전화번호     <NA>            <NA>  <NA>
7     3         R      8     영농경력         <NA>            <NA>  <NA>
8     3         R      9     사과 재배경력    <NA>            <NA>  <NA>
9     4         C      1     품종명           N               <NA>  <NA>

      질문번호  행/열  순번  텍스트           사용자입력여부  단위  답변유형
994   104       C      4     희망적이다       <NA>            <NA>  <NA>
995   104       C      5     매우 희망적이다  <NA>            <NA>  <NA>
996   105       R      1     올해 전망        <NA>            <NA>  <NA>
997   105       R      2     5년 후 전망      <NA>            <NA>  <NA>
998   105       R      3     10년 후 전망     <NA>            <NA>  <NA>
999   105       C      1     매우 부정적이다  <NA>            <NA>  <NA>
1000  105       C      2     부정적이다       <NA>            <NA>  <NA>
1001  105       C      3     보통이다         <NA>            <NA>  <NA>
1002  105       C      4     희망적이다       <NA>            <NA>  <NA>
1003  105       C      5     매우 희망적이다  <NA>            <NA>  <NA>

Duplicate rows

Most frequently occurring

   질문번호  행/열  순번  텍스트    사용자입력여부  단위  답변유형  # duplicates
0  25        C      3     보통이다  <NA>            <NA>  <NA>      2
1  26        C      3     보통이다  <NA>            <NA>  <NA>      2
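
A minimal sketch of how duplicate rows like these can be found by counting identical tuples. The two duplicated rows are copied from the table above and one non-duplicated row from the Sample section is added for contrast. Representing `<NA>` as `None` is an assumption of this sketch.

```python
from collections import Counter

# Rows as (질문번호, 행/열, 순번, 텍스트, 사용자입력여부, 단위, 답변유형).
rows = [
    (25, "C", 3, "보통이다", None, None, None),
    (25, "C", 3, "보통이다", None, None, None),
    (26, "C", 3, "보통이다", None, None, None),
    (26, "C", 3, "보통이다", None, None, None),
    (3, "R", 1, "성명", None, None, None),
]

# Count how often each complete row occurs, then keep only repeated ones.
row_counts = Counter(rows)
duplicated = {row: count for row, count in row_counts.items() if count > 1}

# Number of surplus copies that deduplication would remove.
extra_copies = sum(count - 1 for count in duplicated.values())

for row, count in duplicated.items():
    print(row[:4], "occurs", count, "times")
print("Distinct duplicated rows:", len(duplicated))
print("Surplus copies:", extra_copies)
```

Both ways of counting give 2 here, so this sketch cannot tell which definition lies behind the "2 duplicate rows" in the dataset statistics.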