Overview

Dataset statistics

Number of variables: 3
Number of observations: 420
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.4 KiB
Average record size in memory: 25.3 B
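The dataset-level counts above can be checked with a small sketch. It assumes each row is a tuple of cell values and that a missing cell is `None` or an empty string (the report does not say how missing values are encoded). The five rows are the first rows from the Sample section.

```python
def dataset_overview(rows, n_vars):
    """Observation, missing-cell and duplicate-row counts for a list of row tuples.

    Assumption: a cell counts as missing when it is None or "".
    """
    n_obs = len(rows)
    total_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    # A duplicate is a row identical to an earlier one.
    duplicates = n_obs - len(set(rows))
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# First five rows from the Sample section: (연번, 키워드, 구분).
first_rows = [
    (1, "보호구역", "불용어"),
    (2, "주정차", "불용어"),
    (3, "자기집", "불용어"),
    (4, "매장", "불용어"),
    (5, "블랙", "불용어"),
]
print(dataset_overview(first_rows, n_vars=3))
```

Average record size is the total in-memory size divided by the number of observations; since it depends on how the data frame is stored, it is not computed here.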

Variable types

Numeric: 1
Text: 1
Categorical: 1

Dataset

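Description: A list of keywords (positive, negative, and so on) used for statistical analysis of civil complaints in the Changwon City big data system. The fields are 연번 (serial number), 키워드 (keyword), and 구분 (category: stopword, positive).
Author: Changwon City, Gyeongsangnam-do (경상남도 창원시)
URL: https://www.data.go.kr/data/15063986/fileData.do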
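A minimal loader for the three fields named in the description might look like the sketch below. The file name, the exact header spelling in the downloaded CSV, and the cp949 encoding are all assumptions, not details taken from the portal.

```python
import csv
import io

# Hypothetical file name; the file downloaded from data.go.kr may be named differently.
CSV_PATH = "changwon_complaint_keywords.csv"


def load_keywords(stream):
    """Parse rows with the fields 연번 (serial number), 키워드 (keyword) and 구분 (category).

    Assumes the CSV header uses exactly these three column names.
    """
    rows = []
    for rec in csv.DictReader(stream):
        rows.append({
            "연번": int(rec["연번"]),
            "키워드": rec["키워드"].strip(),
            "구분": rec["구분"].strip(),
        })
    return rows


# Reading from disk; cp949 is an assumed encoding (try "utf-8-sig" if it fails):
# with open(CSV_PATH, encoding="cp949", newline="") as f:
#     rows = load_keywords(f)

# In-memory demo using two rows from the Sample section.
demo = io.StringIO("연번,키워드,구분\n1,보호구역,불용어\n834,가수,부정\n")
rows = load_keywords(demo)
print(rows)
```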

Alerts

연번 is highly overall correlated with 구분 (High correlation)
구분 is highly overall correlated with 연번 (High correlation)
연번 has unique values (Unique)
키워드 has unique values (Unique)
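The Unique alert corresponds to no value repeating within a column. The sketch below shows that check together with the Distinct (%) figure; treating "distinct count equals row count" as the alert rule is an assumption about ydata-profiling's internals, not something the report states.

```python
def distinct_pct(values):
    """Distinct values as a percentage of all values."""
    return 100.0 * len(set(values)) / len(values)


def is_unique(values):
    """True when no value repeats, i.e. Distinct (%) is 100."""
    return len(set(values)) == len(values)


# 구분 rebuilt from its reported counts (207 긍정, 175 불용어, 38 부정).
category = ["긍정"] * 207 + ["불용어"] * 175 + ["부정"] * 38
print(is_unique(category), round(distinct_pct(category), 1))  # False 0.7
```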

Reproduction

Analysis started: 2023-12-12 15:15:16.452726
Analysis finished: 2023-12-12 15:15:16.868327
Duration: 0.42 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
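The duration is the difference between the two timestamps above, rounded to two decimals:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 15:15:16.452726")
finished = datetime.fromisoformat("2023-12-12 15:15:16.868327")
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")  # Duration: 0.42 seconds
```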

Variables

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 420
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 428.43571
Minimum: 1
Maximum: 849
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 24.95
Q1: 215.75
Median: 437.5
Q3: 631.25
95th percentile: 808.1
Maximum: 849
Range: 848
Interquartile range (IQR): 415.5
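The fractional parts of the quantiles above are consistent with linear interpolation between neighbouring order statistics, the default of pandas `Series.quantile`; treating that as the report's method is an assumption. With n = 420, for example, Q1 falls at 0-based position (420 - 1) × 0.25 = 104.75 in the sorted data, which matches the .75 in 215.75. A stdlib sketch:

```python
import math


def quantile_linear(values, q):
    """q-quantile by linear interpolation between neighbouring order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty data")
    h = (len(xs) - 1) * q      # fractional 0-based position in the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_summary(values):
    """5th/25th/50th/75th/95th percentiles, range and IQR."""
    levels = {"p5": 0.05, "Q1": 0.25, "median": 0.5, "Q3": 0.75, "p95": 0.95}
    s = {name: quantile_linear(values, q) for name, q in levels.items()}
    s["range"] = max(values) - min(values)
    s["IQR"] = s["Q3"] - s["Q1"]
    return s


print(quantile_summary(list(range(1, 11))))  # toy input, not the 연번 column
```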

Descriptive statistics

Standard deviation: 254.73686
Coefficient of variation (CV): 0.59457428
Kurtosis: -1.1385714
Mean: 428.43571
Median Absolute Deviation (MAD): 197.5
Skewness: -0.13580887
Sum: 179943
Variance: 64890.867
Monotonicity: Strictly increasing
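Several of these figures follow from one another: the variance is the squared standard deviation (254.73686² ≈ 64890.87), the CV is the standard deviation over the mean (254.73686 / 428.43571 ≈ 0.5946), and the mean is the sum over the count (179943 / 420 ≈ 428.436). The sketch below assumes the sample standard deviation (n - 1 denominator, the pandas default) and takes MAD as the median absolute deviation from the median; kurtosis and skewness are left out.

```python
import math
import statistics


def describe(values):
    """Mean, sample variance/std, CV, MAD, sum and strict monotonicity."""
    values = list(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # n - 1 denominator
    std = math.sqrt(variance)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,
        "mad": mad,
        "sum": sum(values),
        # Strictly increasing: every value is larger than the one before it.
        "strictly_increasing": all(a < b for a, b in zip(values, values[1:])),
    }


print(describe([1, 2, 3, 4, 5]))  # toy input
```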
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value               Count   Frequency (%)
1                   1       0.2%
534                 1       0.2%
595                 1       0.2%
594                 1       0.2%
593                 1       0.2%
592                 1       0.2%
591                 1       0.2%
570                 1       0.2%
569                 1       0.2%
567                 1       0.2%
Other values (410)  410     97.6%

Minimum 10 values

Value               Count   Frequency (%)
1                   1       0.2%
2                   1       0.2%
3                   1       0.2%
4                   1       0.2%
5                   1       0.2%
6                   1       0.2%
7                   1       0.2%
8                   1       0.2%
9                   1       0.2%
10                  1       0.2%

Maximum 10 values

Value               Count   Frequency (%)
849                 1       0.2%
848                 1       0.2%
847                 1       0.2%
846                 1       0.2%
845                 1       0.2%
841                 1       0.2%
837                 1       0.2%
836                 1       0.2%
835                 1       0.2%
834                 1       0.2%
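The common-values and extreme-values tables can be sketched with `collections.Counter`. The top-10 cut-off and the "Other values (m)" remainder row mirror the tables above. Ties are ordered by first occurrence here, which need not match the report's ordering (its common-values list for 연번 is not in first-seen order), and the toy input below only shares the 420-distinct-values shape of 연번, not its values.

```python
from collections import Counter


def common_values(values, k=10):
    """Top-k (value, count, percent) rows plus an 'Other values (m)' remainder row."""
    n = len(values)
    counts = Counter(values)
    top = counts.most_common(k)  # ties keep first-seen order
    rows = [(v, c, round(100 * c / n, 1)) for v, c in top]
    other_distinct = len(counts) - len(top)
    if other_distinct:
        other = n - sum(c for _, c in top)
        rows.append((f"Other values ({other_distinct})", other, round(100 * other / n, 1)))
    return rows


def extreme_values(values, k=10):
    """The k smallest distinct values (ascending) and the k largest (descending)."""
    distinct = sorted(set(values))
    return distinct[:k], distinct[::-1][:k]


toy = list(range(1, 421))  # 420 distinct values, like 연번, but not its actual values
print(common_values(toy)[-1])  # ('Other values (410)', 410, 97.6)
```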

키워드 (keyword)
Text

UNIQUE 

Distinct: 420
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

Length

Max length: 7
Median length: 2
Mean length: 2.797619
Min length: 2
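Length statistics count characters per value, and the mean length times the row count gives the total character count reported below (2.797619 × 420 ≈ 1175). In Python, `len` counts code points, which equals the syllable count only for precomposed (NFC) Hangul; that the source text is NFC is an assumption. The five values are the sample rows shown further down.

```python
import statistics


def length_stats(texts):
    """Max, median, mean and min length; len() counts Unicode code points."""
    lengths = [len(t) for t in texts]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# The five sample values of 키워드.
sample_keywords = ["보호구역", "주정차", "자기집", "매장", "블랙"]
print(length_stats(sample_keywords))  # {'max': 4, 'median': 3, 'mean': 2.8, 'min': 2}
```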

Characters and Unicode

Total characters: 1175
Distinct characters: 334
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
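Python's `unicodedata.category` returns "Lo" (Other Letter) for precomposed Hangul syllables, which is why a single category appears. The standard library has no script property, so this sketch approximates the block check with the Hangul Syllables code-point range U+AC00 to U+D7AF; the report's own script and block lookup may group characters differently.

```python
import unicodedata
from collections import Counter

# Hangul Syllables block, U+AC00..U+D7AF (range end is exclusive).
HANGUL_SYLLABLES_BLOCK = range(0xAC00, 0xD7B0)


def unicode_profile(texts):
    """Character totals, general categories and a Hangul Syllables block count."""
    chars = [ch for text in texts for ch in text]
    return {
        "total": len(chars),
        "distinct": len(set(chars)),
        # "Lo" is the general category Other Letter.
        "categories": Counter(unicodedata.category(ch) for ch in chars),
        "hangul_block": sum(1 for ch in chars if ord(ch) in HANGUL_SYLLABLES_BLOCK),
    }


print(unicode_profile(["보호구역", "주정차", "자기집", "매장", "블랙"]))
```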

Unique

Unique: 420
Unique (%): 100.0%

Sample

1st row: 보호구역
2nd row: 주정차
3rd row: 자기집
4th row: 매장
5th row: 블랙

Value               Count   Frequency (%)
보호구역            1       0.2%
분수대              1       0.2%
전세                1       0.2%
탐색                1       0.2%
누수                1       0.2%
층주택              1       0.2%
경화                1       0.2%
신설                1       0.2%
소유                1       0.2%
개체                1       0.2%
Other values (410)  410     97.6%

Most occurring characters

Rank                Count   Frequency (%)
1                   26      2.2%
2                   23      2.0%
3                   20      1.7%
4                   18      1.5%
5                   18      1.5%
6                   16      1.4%
7                   16      1.4%
8                   15      1.3%
9                   14      1.2%
10                  13      1.1%
Other values (324)  996     84.8%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        1175    100.0%

Most frequent character per category

Other Letter: same ranking as the Most occurring characters table, since all 1175 characters belong to this category.
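Ranking characters within each Unicode general category can be sketched as below. Percentages use the number of characters in each group as the denominator (an assumption; in this dataset every character is Other Letter, so group and overall percentages coincide), and an "Other values (m)" row collects the rest of the group.

```python
import unicodedata
from collections import Counter, defaultdict


def top_chars_by_category(texts, k=10):
    """Top-k (char, count, percent) rows within each Unicode general category."""
    groups = defaultdict(Counter)
    for text in texts:
        for ch in text:
            groups[unicodedata.category(ch)][ch] += 1
    result = {}
    for cat, counts in groups.items():
        total = sum(counts.values())
        top = counts.most_common(k)
        rows = [(ch, n, round(100 * n / total, 1)) for ch, n in top]
        rest_distinct = len(counts) - len(top)
        if rest_distinct:
            rest = total - sum(n for _, n in top)
            rows.append((f"Other values ({rest_distinct})", rest, round(100 * rest / total, 1)))
        result[cat] = rows
    return result


print(top_chars_by_category(["보호구역", "주정차", "자기집", "매장", "블랙"], k=3))
```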

Most occurring scripts

Value               Count   Frequency (%)
Hangul              1175    100.0%

Most frequent character per script

Hangul: same ranking as the Most occurring characters table, since all 1175 characters are in this script.

Most occurring blocks

Value               Count   Frequency (%)
Hangul              1175    100.0%

Most frequent character per block

Hangul: same ranking as the Most occurring characters table, since all 1175 characters are in this block.

구분 (category)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 KiB

긍정 (positive): 207
불용어 (stopword): 175
부정 (negative): 38

Length

Max length: 3
Median length: 2
Mean length: 2.4166667
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 불용어
2nd row: 불용어
3rd row: 불용어
4th row: 불용어
5th row: 불용어

Common Values

Value               Count   Frequency (%)
긍정 (positive)     207     49.3%
불용어 (stopword)   175     41.7%
부정 (negative)     38      9.0%
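Frequencies are counts over the 420 rows, and the mean length weights each label's length by its count: (207 × 2 + 175 × 3 + 38 × 2) / 420 = 1015 / 420 ≈ 2.4166667. The sketch rebuilds the column from the reported counts rather than from the source file.

```python
from collections import Counter


def category_summary(labels):
    """Counts, percentages (one decimal) and mean label length for a categorical column."""
    n = len(labels)
    counts = Counter(labels)
    return {
        "counts": dict(counts.most_common()),
        "pct": {label: round(100 * c / n, 1) for label, c in counts.items()},
        # Mean length weights each label's length by how often it occurs.
        "mean_length": sum(len(label) for label in labels) / n,
    }


# 구분 rebuilt from the reported counts: 긍정 (positive), 불용어 (stopword), 부정 (negative).
labels = ["긍정"] * 207 + ["불용어"] * 175 + ["부정"] * 38
print(category_summary(labels)["pct"])  # {'긍정': 49.3, '불용어': 41.7, '부정': 9.0}
```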

Length

[Figure: Histogram of lengths of the category]

Interactions

[Figure: Interactions plot]

Correlations

      연번    구분
연번  1.000   0.852
구분  0.852   1.000

      연번    구분
연번  1.000   0.764
구분  0.764   1.000
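The report does not state which measures produced the two matrices above. One standard association measure for categorical pairs is Cramér's V; the sketch computes its plain, bias-uncorrected form and first bins the numeric 연번 into equal-width intervals with a hypothetical helper. Both choices are assumptions, so this is not expected to reproduce 0.852 or 0.764.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Plain (bias-uncorrected) Cramér's V between two categorical sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def equal_width_bins(values, n_bins):
    """Map numeric values to equal-width bin indices 0..n_bins-1 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# A few (연번, 구분) pairs from the Sample section; not the full column.
serial = [1, 2, 3, 834, 835, 849]
category = ["불용어", "불용어", "불용어", "부정", "부정", "부정"]
print(cramers_v(equal_width_bins(serial, 2), category))  # 1.0
```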

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completeness.]

Sample

First rows

     연번   키워드        구분
0    1      보호구역      불용어
1    2      주정차        불용어
2    3      자기집        불용어
3    4      매장          불용어
4    5      블랙          불용어
5    6      합성          불용어
6    7      인도          불용어
7    8      마산합포구    불용어
8    9      불법주정차    불용어
9    10     다리          불용어

Last rows

     연번   키워드        구분
410  834    가수          부정
411  835    창원홀        부정
412  836    공연          부정
413  837    트로트        부정
414  841    노점상        부정
415  845    삼계          부정
416  846    입간판        부정
417  847    경남은행      부정
418  848    상가          부정
419  849    야간          부정