gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	420
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	10.4 KiB
Average record size in memory	25.3 B

Variable types

Numeric	1
Text	1
Categorical	1

Dataset

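Description	Keyword list (positive, negative, etc.) used for civil-complaint statistical analysis in Changwon City's big data system. The fields are 연번 (serial number), 키워드 (keyword), and 구분 (category: stopword, positive).
Author	Changwon City, Gyeongsangnam-do (경상남도 창원시)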
URL	https://www.data.go.kr/data/15063986/fileData.do

Alerts

`연번` is highly overall correlated with `구분`	High correlation
`구분` is highly overall correlated with `연번`	High correlation
`연번` has unique values	Unique
`키워드` has unique values	Unique

Reproduction

Analysis started	2023-12-12 15:15:16.452726
Analysis finished	2023-12-12 15:15:16.868327
Duration	0.42 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
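
A run like the one recorded above could be reproduced roughly as follows. This is a minimal sketch, not the configuration actually used (that is what config.json holds): the helper name `build_report`, the CSV path, and the `cp949` encoding default are assumptions for illustration. Only `pandas.read_csv`, `ProfileReport`, and `ProfileReport.to_file` are library calls.

```python
def build_report(csv_path, out_path="report.html", encoding="cp949"):
    """Profile a CSV file with ydata-profiling and write an HTML report.

    csv_path is a hypothetical local copy of the dataset; the cp949 default
    is an assumption (a common encoding for Korean CSV files), not something
    stated in the report.
    """
    # Imported lazily so the function can be defined even where the
    # libraries are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    profile = ProfileReport(df, title="gimi9 Pandas Profiling")
    profile.to_file(out_path)  # writes the HTML report to out_path
    return profile
```

Calling `build_report("keywords.csv")` would profile a file saved under that (hypothetical) name after downloading it from the URL in the Dataset section.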

연번
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	420
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	428.43571

Minimum	1
Maximum	849
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	3.8 KiB

Quantile statistics

Minimum	1
5-th percentile	24.95
Q1	215.75
median	437.5
Q3	631.25
95-th percentile	808.1
Maximum	849
Range	848
Interquartile range (IQR)	415.5

Descriptive statistics

Standard deviation	254.73686
Coefficient of variation (CV)	0.59457428
Kurtosis	-1.1385714
Mean	428.43571
Median Absolute Deviation (MAD)	197.5
Skewness	-0.13580887
Sum	179943
Variance	64890.867
Monotonicity	Strictly increasing
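
Several of the figures above are derived from one another, which makes them easy to cross-check. The sketch below recomputes the mean, variance, coefficient of variation, IQR, and range from the reported sum, count, standard deviation, quartiles, and extremes. It uses only numbers printed in this report; the variable names are illustrative.

```python
# Values copied from the report for the 연번 column.
n = 420                      # Number of observations
total = 179943               # Sum
std = 254.73686              # Standard deviation
q1, q3 = 215.75, 631.25      # Q1 and Q3
minimum, maximum = 1, 849    # Minimum and Maximum

mean = total / n                 # Mean = Sum / n
variance = std ** 2              # Variance = (Standard deviation)^2
cv = std / mean                  # Coefficient of variation = std / mean
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(mean, variance, cv, iqr, value_range)
```

Up to rounding, the results agree with the reported Mean (428.43571), Variance (64890.867), CV (0.59457428), IQR (415.5), and Range (848).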

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
1	1	0.2%
534	1	0.2%
595	1	0.2%
594	1	0.2%
593	1	0.2%
592	1	0.2%
591	1	0.2%
570	1	0.2%
569	1	0.2%
567	1	0.2%
Other values (410)	410	97.6%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.2%
2	1	0.2%
3	1	0.2%
4	1	0.2%
5	1	0.2%
6	1	0.2%
7	1	0.2%
8	1	0.2%
9	1	0.2%
10	1	0.2%

Maximum 10 values

Value	Count	Frequency (%)
849	1	0.2%
848	1	0.2%
847	1	0.2%
846	1	0.2%
845	1	0.2%
841	1	0.2%
837	1	0.2%
836	1	0.2%
835	1	0.2%
834	1	0.2%

키워드
Text

UNIQUE

Distinct	420
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

Length

Max length	7
Median length	2
Mean length	2.797619
Min length	2

Characters and Unicode

Total characters	1175
Distinct characters	334
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
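
As a rough illustration of those character properties, the sketch below inspects the five sample keywords listed in this report with Python's standard `unicodedata` module. It is not how the profiler itself computes these figures. The standard library exposes the general category (`Lo`, shown in the report as "Other Letter") but no script or block property, so the block check here is an assumed stand-in that tests the Hangul Syllables code point range U+AC00 to U+D7A3.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for the 키워드 column.
sample_keywords = ["보호구역", "주정차", "자기집", "매장", "블랙"]

chars = "".join(sample_keywords)

# General category of every character; "Lo" means "Letter, other".
category_counts = Counter(unicodedata.category(ch) for ch in chars)

# Stand-in for the block property: precomposed Hangul syllables
# occupy U+AC00..U+D7A3.
in_hangul_syllables = all(0xAC00 <= ord(ch) <= 0xD7A3 for ch in chars)

# Length is counted in code points, one per Hangul syllable.
mean_length = sum(len(k) for k in sample_keywords) / len(sample_keywords)

print(category_counts, in_hangul_syllables, mean_length)
```

Over the full column the same arithmetic gives the reported mean length: 1175 total characters / 420 keywords ≈ 2.797619.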

Unique

Unique	420
Unique (%)	100.0%

Sample

1st row	보호구역
2nd row	주정차
3rd row	자기집
4th row	매장
5th row	블랙

Value	Count	Frequency (%)
보호구역	1	0.2%
분수대	1	0.2%
전세	1	0.2%
탐색	1	0.2%
누수	1	0.2%
층주택	1	0.2%
경화	1	0.2%
신설	1	0.2%
소유	1	0.2%
개체	1	0.2%
Other values (410)	410	97.6%

Most occurring characters

Value	Count	Frequency (%)
지	26	2.2%
동	23	2.0%
주	20	1.7%
차	18	1.5%
산	18	1.5%
사	16	1.4%
도	16	1.4%
장	15	1.3%
자	14	1.2%
가	13	1.1%
Other values (324)	996	84.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	1175	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
지	26	2.2%
동	23	2.0%
주	20	1.7%
차	18	1.5%
산	18	1.5%
사	16	1.4%
도	16	1.4%
장	15	1.3%
자	14	1.2%
가	13	1.1%
Other values (324)	996	84.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	1175	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
지	26	2.2%
동	23	2.0%
주	20	1.7%
차	18	1.5%
산	18	1.5%
사	16	1.4%
도	16	1.4%
장	15	1.3%
자	14	1.2%
가	13	1.1%
Other values (324)	996	84.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	1175	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
지	26	2.2%
동	23	2.0%
주	20	1.7%
차	18	1.5%
산	18	1.5%
사	16	1.4%
도	16	1.4%
장	15	1.3%
자	14	1.2%
가	13	1.1%
Other values (324)	996	84.8%

구분
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	0.7%
Missing	0
Missing (%)	0.0%
Memory size	3.4 KiB

긍정	207
불용어	175
부정	38

Length

Max length	3
Median length	2
Mean length	2.4166667
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	불용어
2nd row	불용어
3rd row	불용어
4th row	불용어
5th row	불용어

Common Values

Value	Count	Frequency (%)
긍정	207	49.3%
불용어	175	41.7%
부정	38	9.0%
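
The frequencies and the mean label length for 구분 follow directly from the three counts. The short check below uses only the counts printed above (긍정 = positive, 불용어 = stopword, 부정 = negative); the variable names are illustrative.

```python
# Category counts copied from the report.
counts = {"긍정": 207, "불용어": 175, "부정": 38}

total = sum(counts.values())

# Share of each category, in percent, rounded to one decimal place.
freq = {label: round(100 * c / total, 1) for label, c in counts.items()}

# Mean label length, weighting each label's length by its count.
mean_length = sum(len(label) * c for label, c in counts.items()) / total

print(total, freq, mean_length)
```

The total matches the 420 observations, the shares match the table, and the weighted length (1015 / 420) matches the reported mean length of 2.4166667.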

Plots: Histogram of lengths of the category; Common Values (Plot)

Phik (φk)
Auto

Heatmap
Table

	연번	구분
연번	1.000	0.852
구분	0.852	1.000

Heatmap
Table

	연번	구분
연번	1.000	0.764
구분	0.764	1.000

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	연번	키워드	구분
0	1	보호구역	불용어
1	2	주정차	불용어
2	3	자기집	불용어
3	4	매장	불용어
4	5	블랙	불용어
5	6	합성	불용어
6	7	인도	불용어
7	8	마산합포구	불용어
8	9	불법주정차	불용어
9	10	다리	불용어

Last rows

	연번	키워드	구분
410	834	가수	부정
411	835	창원홀	부정
412	836	공연	부정
413	837	트로트	부정
414	841	노점상	부정
415	845	삼계	부정
416	846	입간판	부정
417	847	경남은행	부정
418	848	상가	부정
419	849	야간	부정
