gimi9 Pandas Profiling

Dataset statistics

Number of variables	6
Number of observations	54
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	2.8 KiB
Average record size in memory	52.4 B

Variable types

Numeric	2
Categorical	3
Text	1

Dataset

Description	Analysis item data for crop diagnosis and prescription in Gyeongsangnam-do.
Author	경상남도
URL	https://www.data.go.kr/data/15049542/fileData.do

Alerts

`분류코드(소)` is highly overall correlated with `비용` and 2 other fields	High correlation
`비용` is highly overall correlated with `분류코드(소)` and 3 other fields	High correlation
`분류코드(대)` is highly overall correlated with `분류코드(소)` and 3 other fields	High correlation
`분석기준` is highly overall correlated with `분류코드(소)` and 3 other fields	High correlation
`단위` is highly overall correlated with `비용` and 2 other fields	High correlation
`분류코드(소)` has unique values	Unique

Reproduction

Analysis started	2023-12-12 02:53:01.527434
Analysis finished	2023-12-12 02:53:03.039543
Duration	1.51 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
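
The duration above can be checked against the two timestamps. This is a minimal sketch that uses only the values printed in this section; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section of this report.
started = datetime.fromisoformat("2023-12-12 02:53:01.527434")
finished = datetime.fromisoformat("2023-12-12 02:53:03.039543")

# Elapsed wall-clock time of the analysis, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```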

분류코드(소)
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	54
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	27.5

Minimum	1
Maximum	54
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	618.0 B

Quantile statistics

Minimum	1
5-th percentile	3.65
Q1	14.25
median	27.5
Q3	40.75
95-th percentile	51.35
Maximum	54
Range	53
Interquartile range (IQR)	26.5

Descriptive statistics

Standard deviation	15.732133
Coefficient of variation (CV)	0.57207755
Kurtosis	-1.2
Mean	27.5
Median Absolute Deviation (MAD)	13.5
Skewness	0
Sum	1485
Variance	247.5
Monotonicity	Strictly increasing
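
The statistics above can be reproduced with the Python standard library, assuming the column holds each integer from 1 to 54 exactly once. That assumption is not stated in the report, but it is consistent with Distinct 54, Minimum 1, Maximum 54, Sum 1485 and the strictly increasing order. The quantile method ("inclusive" linear interpolation) is also an assumption, chosen because it matches the reported percentiles.

```python
import statistics

# Assumption: the column contains the integers 1..54, one per row.
values = list(range(1, 55))

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (divides by n - 1)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation
median = statistics.median(values)
# Median absolute deviation: median of |x - median(x)|.
mad = statistics.median(abs(v - median) for v in values)

# Quartiles and 5th/95th percentiles with linear interpolation.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
twentieths = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

print(mean, variance, round(std, 6), round(cv, 8), mad)
print(p5, q1, q2, q3, p95, q3 - q1)
```

Under these assumptions the computed values agree with the Quantile statistics and Descriptive statistics listed for this variable.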

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
1	1	1.9%
42	1	1.9%
31	1	1.9%
32	1	1.9%
33	1	1.9%
34	1	1.9%
35	1	1.9%
36	1	1.9%
37	1	1.9%
38	1	1.9%
Other values (44)	44	81.5%

Minimum 10 values

Value	Count	Frequency (%)
1	1	1.9%
2	1	1.9%
3	1	1.9%
4	1	1.9%
5	1	1.9%
6	1	1.9%
7	1	1.9%
8	1	1.9%
9	1	1.9%
10	1	1.9%

Maximum 10 values

Value	Count	Frequency (%)
54	1	1.9%
53	1	1.9%
52	1	1.9%
51	1	1.9%
50	1	1.9%
49	1	1.9%
48	1	1.9%
47	1	1.9%
46	1	1.9%
45	1	1.9%

분류코드(대)
Categorical

HIGH CORRELATION

Distinct	5
Distinct (%)	9.3%
Missing	0
Missing (%)	0.0%
Memory size	564.0 B

C	16
E	12
A	10
B	8
D	8

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	A
2nd row	A
3rd row	A
4th row	A
5th row	A

Common Values

Value	Count	Frequency (%)
C	16	29.6%
E	12	22.2%
A	10	18.5%
B	8	14.8%
D	8	14.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
c	16	29.6%
e	12	22.2%
a	10	18.5%
b	8	14.8%
d	8	14.8%

항목명
Text

Distinct	38
Distinct (%)	70.4%
Missing	0
Missing (%)	0.0%
Memory size	564.0 B

Length

Max length	20
Median length	12
Mean length	5.4259259
Min length	1

Characters and Unicode

Total characters	293
Distinct characters	79
Distinct categories	9
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
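
As an illustration of these character properties, the sketch below counts Unicode general categories for the five sample values shown for this column. It uses Python's standard `unicodedata` module, which exposes the general category but not scripts or blocks, so only the category breakdown is covered. The `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative names, not part of the report.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this column in the report.
samples = ["수소이온(pH)", "전기전도도(EC)", "질산성질소(NO3-N)", "칼륨(K)", "칼슘(Ca)"]

# Two-letter general category codes and the labels used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "Zs": "Space Separator",
}

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

totals = Counter()
for sample in samples:
    totals += category_counts(sample)

for code, count in totals.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Only five of the 54 rows are used here, so the counts are far smaller than the column-wide totals reported below.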

Unique

Unique	29
Unique (%)	53.7%

Sample

1st row	수소이온(pH)
2nd row	전기전도도(EC)
3rd row	질산성질소(NO3-N)
4th row	칼륨(K)
5th row	칼슘(Ca)

Value	Count	Frequency (%)
수은	3	5.4%
납	3	5.4%
니켈	3	5.4%
카드뮴	3	5.4%
비소	3	5.4%
구리	3	5.4%
아연	3	5.4%
6가크롬	2	3.6%
전기전도도(ec	2	3.6%
v/v	2	3.6%
Other values (29)	29	51.8%

Most occurring characters

Value	Count	Frequency (%)
(	21	7.2%
)	21	7.2%
소	9	3.1%
도	9	3.1%
성	9	3.1%
환	8	2.7%
치	8	2.7%
C	8	2.7%
수	7	2.4%
슘	6	2.0%
Other values (69)	187	63.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	180	61.4%
Uppercase Letter	31	10.6%
Open Punctuation	23	7.8%
Close Punctuation	23	7.8%
Lowercase Letter	15	5.1%
Decimal Number	11	3.8%
Other Punctuation	6	2.0%
Dash Punctuation	2	0.7%
Space Separator	2	0.7%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
소	9	5.0%
도	9	5.0%
성	9	5.0%
환	8	4.4%
치	8	4.4%
수	7	3.9%
슘	6	3.3%
륨	6	3.3%
전	6	3.3%
칼	6	3.3%
Other values (40)	106	58.9%

Uppercase Letter

Value	Count	Frequency (%)
C	8	25.8%
O	5	16.1%
H	4	12.9%
N	4	12.9%
M	3	9.7%
E	3	9.7%
K	2	6.5%
P	1	3.2%
S	1	3.2%

Decimal Number

Value	Count	Frequency (%)
5	3	27.3%
1	2	18.2%
3	2	18.2%
6	2	18.2%
2	1	9.1%
4	1	9.1%

Lowercase Letter

Value	Count	Frequency (%)
v	4	26.7%
a	4	26.7%
p	3	20.0%
l	2	13.3%
g	2	13.3%

Other Punctuation

Value	Count	Frequency (%)
:	2	33.3%
/	2	33.3%
,	2	33.3%

Open Punctuation

Value	Count	Frequency (%)
(	21	91.3%
[	2	8.7%

Close Punctuation

Value	Count	Frequency (%)
)	21	91.3%
]	2	8.7%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Space Separator

Value	Count	Frequency (%)
	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	180	61.4%
Common	67	22.9%
Latin	46	15.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
소	9	5.0%
도	9	5.0%
성	9	5.0%
환	8	4.4%
치	8	4.4%
수	7	3.9%
슘	6	3.3%
륨	6	3.3%
전	6	3.3%
칼	6	3.3%
Other values (40)	106	58.9%

Common

Value	Count	Frequency (%)
(	21	31.3%
)	21	31.3%
5	3	4.5%
1	2	3.0%
3	2	3.0%
:	2	3.0%
/	2	3.0%
,	2	3.0%
-	2	3.0%
6	2	3.0%
Other values (5)	8	11.9%

Latin

Value	Count	Frequency (%)
C	8	17.4%
O	5	10.9%
H	4	8.7%
v	4	8.7%
N	4	8.7%
a	4	8.7%
M	3	6.5%
E	3	6.5%
p	3	6.5%
K	2	4.3%
Other values (4)	6	13.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	180	61.4%
ASCII	113	38.6%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(	21	18.6%
)	21	18.6%
C	8	7.1%
O	5	4.4%
H	4	3.5%
v	4	3.5%
N	4	3.5%
a	4	3.5%
M	3	2.7%
5	3	2.7%
Other values (19)	36	31.9%

Hangul

Value	Count	Frequency (%)
소	9	5.0%
도	9	5.0%
성	9	5.0%
환	8	4.4%
치	8	4.4%
수	7	3.9%
슘	6	3.3%
륨	6	3.3%
전	6	3.3%
칼	6	3.3%
Other values (40)	106	58.9%

비용
Real number (ℝ)

HIGH CORRELATION

Distinct	8
Distinct (%)	14.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	12072.222

Minimum	7500
Maximum	43800
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	618.0 B

Quantile statistics

Minimum	7500
5-th percentile	8500
Q1	9900
median	11700
Q3	11700
95-th percentile	16000
Maximum	43800
Range	36300
Interquartile range (IQR)	1800

Descriptive statistics

Standard deviation	5253.4774
Coefficient of variation (CV)	0.4351707
Kurtosis	25.288801
Mean	12072.222
Median Absolute Deviation (MAD)	1800
Skewness	4.4127823
Sum	651900
Variance	27599025
Monotonicity	Not monotonic
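
Because this variable has only eight distinct values, its summary statistics can be rebuilt from the value counts listed for it in this report. The sketch below expands that frequency table back into 54 observations and recomputes the sum, mean, sample variance, quartiles and frequency percentages. The "inclusive" quantile method is an assumption, and the names are illustrative.

```python
import statistics

# Value -> count, copied from the frequency table for 비용 in this report.
cost_counts = {7500: 1, 8500: 10, 9900: 15, 11700: 16,
               14300: 1, 16000: 9, 21600: 1, 43800: 1}

# Expand the table into one value per observation, in ascending order.
costs = sorted(v for v, c in cost_counts.items() for _ in range(c))

n = len(costs)
total = sum(costs)
mean = total / n
variance = statistics.variance(costs)   # sample variance (divides by n - 1)
q1, median, q3 = statistics.quantiles(costs, n=4, method="inclusive")

# Share of observations per value, as a percentage with one decimal.
share = {value: round(100 * count / n, 1) for value, count in cost_counts.items()}

print(n, total, round(mean, 3), round(variance))
print(q1, median, q3, q3 - q1)
print(share)
```

The single 43800 observation accounts for most of the distance between the mean (about 12072) and the median (11700), which fits the large positive skewness reported above.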

Histogram with fixed size bins (bins=8)

Value	Count	Frequency (%)
11700	16	29.6%
9900	15	27.8%
8500	10	18.5%
16000	9	16.7%
21600	1	1.9%
14300	1	1.9%
7500	1	1.9%
43800	1	1.9%

Minimum 10 values

Value	Count	Frequency (%)
7500	1	1.9%
8500	10	18.5%
9900	15	27.8%
11700	16	29.6%
14300	1	1.9%
16000	9	16.7%
21600	1	1.9%
43800	1	1.9%

Maximum 10 values

Value	Count	Frequency (%)
43800	1	1.9%
21600	1	1.9%
16000	9	16.7%
14300	1	1.9%
11700	16	29.6%
9900	15	27.8%
8500	10	18.5%
7500	1	1.9%

분석기준
Categorical

HIGH CORRELATION

Distinct	5
Distinct (%)	9.3%
Missing	0
Missing (%)	0.0%
Memory size	564.0 B

토양오염공정시험기준	16
농촌진흥청토양및식물체분석법	15
농촌진흥청고시 제2017-19	13
수질오염공정시험기준	8
STD. Method	2

Length

Max length	16
Median length	14
Mean length	12.592593
Min length	10

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	수질오염공정시험기준
2nd row	수질오염공정시험기준
3rd row	수질오염공정시험기준
4th row	수질오염공정시험기준
5th row	수질오염공정시험기준

Common Values

Value	Count	Frequency (%)
토양오염공정시험기준	16	29.6%
농촌진흥청토양및식물체분석법	15	27.8%
농촌진흥청고시 제2017-19	13	24.1%
수질오염공정시험기준	8	14.8%
STD. Method	2	3.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
토양오염공정시험기준	16	23.2%
농촌진흥청토양및식물체분석법	15	21.7%
농촌진흥청고시	13	18.8%
제2017-19	13	18.8%
수질오염공정시험기준	8	11.6%
std	2	2.9%
method	2	2.9%

단위
Categorical

HIGH CORRELATION

Distinct	8
Distinct (%)	14.8%
Missing	0
Missing (%)	0.0%
Memory size	564.0 B

mg/kg	25
mg/L	9
<NA>	4
cmolc/L	4
cmolc/kg	4
Other values (3)	8

Length

Max length	8
Median length	7
Mean length	4.7592593
Min length	1

Unique

Unique	1
Unique (%)	1.9%

Sample

1st row	<NA>
2nd row	dS/m
3rd row	mg/L
4th row	mg/L
5th row	mg/L

Common Values

Value	Count	Frequency (%)
mg/kg	25	46.3%
mg/L	9	16.7%
<NA>	4	7.4%
cmolc/L	4	7.4%
cmolc/kg	4	7.4%
%	4	7.4%
dS/m	3	5.6%
g/kg	1	1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
mg/kg	25	46.3%
mg/l	9	16.7%
na	4	7.4%
cmolc/l	4	7.4%
cmolc/kg	4	7.4%
	4	7.4%
ds/m	3	5.6%
g/kg	1	1.9%

Interactions

Interaction plots for 분류코드(소) and 비용 (images not included)

Correlations

	분류코드(소)	분류코드(대)	항목명	비용	분석기준	단위
분류코드(소)	1.000	0.989	0.000	0.923	0.983	0.746
분류코드(대)	0.989	1.000	0.000	0.904	0.968	0.756
항목명	0.000	0.000	1.000	0.862	0.874	1.000
비용	0.923	0.904	0.862	1.000	0.929	0.838
분석기준	0.983	0.968	0.874	0.929	1.000	0.757
단위	0.746	0.756	1.000	0.838	0.757	1.000

	분류코드(대)	단위	분석기준
분류코드(대)	1.000	0.603	0.746
단위	0.603	1.000	0.605
분석기준	0.746	0.605	1.000

	분류코드(소)	비용	분류코드(대)	분석기준	단위
분류코드(소)	1.000	0.809	0.806	0.770	0.483
비용	0.809	1.000	0.550	0.611	0.724
분류코드(대)	0.806	0.550	1.000	0.746	0.603
분석기준	0.770	0.611	0.746	1.000	0.605
단위	0.483	0.724	0.603	0.605	1.000
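
The "High correlation" alerts near the top of this report appear to follow from the table directly above. The sketch below lists, for each variable, the other variables whose coefficient is at least 0.5. The 0.5 cut-off is an assumption: the report does not state it, but it reproduces the number of correlated fields named in each alert. The function name is illustrative.

```python
# Correlation values copied from the table directly above.
names = ["분류코드(소)", "비용", "분류코드(대)", "분석기준", "단위"]
matrix = [
    [1.000, 0.809, 0.806, 0.770, 0.483],
    [0.809, 1.000, 0.550, 0.611, 0.724],
    [0.806, 0.550, 1.000, 0.746, 0.603],
    [0.770, 0.611, 0.746, 1.000, 0.605],
    [0.483, 0.724, 0.603, 0.605, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off; not stated in this report

def highly_correlated(names, matrix, threshold):
    """Map each variable to the other variables at or above the threshold."""
    result = {}
    for i, name in enumerate(names):
        result[name] = [
            names[j]
            for j, value in enumerate(matrix[i])
            if j != i and abs(value) >= threshold
        ]
    return result

partners = highly_correlated(names, matrix, THRESHOLD)
for name, others in partners.items():
    print(f"{name}: {len(others)} correlated fields -> {others}")
```

With this threshold, 분류코드(소) and 단위 each have three correlated fields and the other variables have four, matching the "and 2 other fields" and "and 3 other fields" wording of the alerts.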

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	분류코드(소)	분류코드(대)	항목명	비용	분석기준	단위
0	1	A	수소이온(pH)	8500	수질오염공정시험기준	<NA>
1	2	A	전기전도도(EC)	8500	수질오염공정시험기준	dS/m
2	3	A	질산성질소(NO3-N)	8500	수질오염공정시험기준	mg/L
3	4	A	칼륨(K)	8500	수질오염공정시험기준	mg/L
4	5	A	칼슘(Ca)	8500	수질오염공정시험기준	mg/L
5	6	A	마그네슘(Mg)	8500	수질오염공정시험기준	mg/L
6	7	A	나트륨(Na)	8500	수질오염공정시험기준	mg/L
7	8	A	염소이온(Cl-)	8500	수질오염공정시험기준	mg/L
8	9	A	황산이온(SO4)	8500	STD. Method	mg/L
9	10	A	중탄산(HCO3)	8500	STD. Method	mg/L

Last rows

	분류코드(소)	분류코드(대)	항목명	비용	분석기준	단위
44	45	E	비소	16000	농촌진흥청고시 제2017-19	mg/kg
45	46	E	니켈	16000	농촌진흥청고시 제2017-19	mg/kg
46	47	E	카드뮴	16000	농촌진흥청고시 제2017-19	mg/kg
47	48	E	구리	16000	농촌진흥청고시 제2017-19	mg/kg
48	49	E	크롬	16000	농촌진흥청고시 제2017-19	mg/kg
49	50	E	수은	16000	농촌진흥청고시 제2017-19	mg/kg
50	51	E	아연	16000	농촌진흥청고시 제2017-19	mg/kg
51	52	E	납	16000	농촌진흥청고시 제2017-19	mg/kg
52	53	E	부숙도(콤백법)	43800	농촌진흥청고시 제2017-19	<NA>
53	54	B	염소Cl	9900	농촌진흥청고시 제2017-19	%
