Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 19837
Missing cells (%): 33.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
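The dataset-level figures follow from the per-variable sections. Only WORD (9890) and 단어 (9947) report missing values, so missing cells = 9890 + 9947 = 19837 out of 6 × 10000 = 60000 cells, and the average record size is the total memory divided by the row count. A short Python check, with the constants copied from this report:

```python
# Figures copied from this report.
N_VARIABLES = 6
N_OBSERVATIONS = 10_000
MISSING_BY_COLUMN = {"WORD": 9890, "단어": 9947}  # the only columns with Missing > 0
TOTAL_SIZE_KIB = 556.6


def missing_share(missing_by_column, n_variables, n_observations):
    """Return (missing cells, missing cells as a percentage of all cells)."""
    missing = sum(missing_by_column.values())
    return missing, 100 * missing / (n_variables * n_observations)


def average_record_bytes(total_kib, n_observations):
    """Average in-memory size of one row in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_observations


missing, pct = missing_share(MISSING_BY_COLUMN, N_VARIABLES, N_OBSERVATIONS)
print(missing, f"{pct:.1f}%")  # 19837 33.1%
print(f"{average_record_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS):.1f} B")  # 57.0 B
```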

Variable types

Numeric: 1
Text: 3
Categorical: 2

Dataset

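Description: A list of keywords (positive, negative, etc.) used for statistical analysis of civil complaints in the Changwon City big data system. The fields are serial number (연번), keyword (키워드), and classification (구분), with values such as stopword (불용어) and positive (긍정).
Author: Changwon City, Gyeongsangnam-do (경상남도 창원시)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15063986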

Alerts

연번 is highly overall correlated with TYPE and 1 other field [High correlation]
TYPE is highly overall correlated with 연번 [High correlation]
긍부정구분 is highly overall correlated with 연번 [High correlation]
TYPE is highly imbalanced (94.8%) [Imbalance]
긍부정구분 is highly imbalanced (96.7%) [Imbalance]
WORD has 9890 (98.9%) missing values [Missing]
단어 has 9947 (99.5%) missing values [Missing]
연번 has unique values [Unique]
KEYWORD has unique values [Unique]
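The imbalance percentages in these alerts match a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy (in bits) of a column's category frequencies and k is its number of distinct categories; the literal <NA> label counts as one of those categories. The sketch below is reconstructed from the reported numbers rather than taken from ydata-profiling's source, so treat it as an assumption that happens to reproduce both figures:

```python
import math


def imbalance_score(counts):
    """1 - H / log2(k): H = Shannon entropy in bits, k = number of categories."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no imbalance to measure
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Category counts copied from the TYPE and 긍부정구분 sections (<NA> included).
print(f"TYPE: {imbalance_score([9890, 55, 30, 25]):.1%}")   # TYPE: 94.8%
print(f"긍부정구분: {imbalance_score([9947, 29, 24]):.1%}")  # 긍부정구분: 96.7%
```

A perfectly balanced column scores 0 and a column dominated by one category approaches 1, which is why the <NA>-heavy TYPE and 긍부정구분 both trigger the alert.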

Reproduction

Analysis started: 2023-12-10 23:22:38.702028
Analysis finished: 2023-12-10 23:22:39.815826
Duration: 1.11 seconds
Software version: ydata-profiling v4.5.1
Configuration: config.json

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 431124.37
Minimum: 19
Maximum: 1078995
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 19
5th percentile: 13663.05
Q1: 119448.75
Median: 374967
Q3: 709924.25
95th percentile: 972240.5
Maximum: 1078995
Range: 1078976
Interquartile range (IQR): 590475.5
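Range and IQR are simple differences (Max - Min and Q3 - Q1). Because 연번 is integer-valued, the fractional quantiles (13663.05, 119448.75, 709924.25) point to interpolation between neighbouring sorted values. The sketch below assumes pandas' default linear rule, in which quantile q sits at 0-based position (n - 1) * q of the sorted data:

```python
import math


def quantile_linear(values, q):
    """Quantile by linear interpolation at 0-based position (n - 1) * q of the sorted data."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def spread(values):
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    q1 = quantile_linear(values, 0.25)
    q3 = quantile_linear(values, 0.75)
    return {"range": max(values) - min(values), "iqr": q3 - q1}


# The reported 연번 figures are consistent with these definitions.
assert 1078995 - 19 == 1078976            # Range
assert 709924.25 - 119448.75 == 590475.5  # IQR
print(spread([4, 1, 3, 2]))  # {'range': 3, 'iqr': 1.5}
```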

Descriptive statistics

Standard deviation: 327707.04
Coefficient of variation (CV): 0.76012182
Kurtosis: -1.1976801
Mean: 431124.37
Median Absolute Deviation (MAD): 283603.5
Skewness: 0.33501269
Sum: 4.3112437 × 10^9
Variance: 1.0739191 × 10^11
Monotonicity: Not monotonic
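These statistics are linked: the coefficient of variation is standard deviation / mean, the variance is the squared standard deviation, and the MAD is the median of absolute deviations from the median (unscaled). A sketch assuming the sample (n - 1) standard deviation, which is pandas' default; recomputing from the rounded standard deviation reproduces the reported CV, while the variance differs only in its last displayed digit because the displayed standard deviation is rounded:

```python
import statistics


def dispersion(values):
    """Sample std (n - 1), variance, coefficient of variation, and unscaled MAD."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # n - 1 denominator, pandas' default
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    return {"std": std, "variance": std ** 2, "cv": std / mean, "mad": mad}


# Cross-check against the rounded 연번 figures in this report.
REPORTED_STD, REPORTED_MEAN = 327707.04, 431124.37
print(round(REPORTED_STD / REPORTED_MEAN, 8))  # 0.76012182
print(f"{REPORTED_STD ** 2:.7e}")              # 1.0739190e+11
```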
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value                Count  Frequency (%)
269389               1      < 0.1%
382596               1      < 0.1%
87018                1      < 0.1%
43619                1      < 0.1%
11452                1      < 0.1%
193262               1      < 0.1%
537998               1      < 0.1%
215896               1      < 0.1%
42454                1      < 0.1%
724469               1      < 0.1%
Other values (9990)  9990   99.9%
Minimum 10 values
Value  Count  Frequency (%)
19     1      < 0.1%
20     1      < 0.1%
50     1      < 0.1%
59     1      < 0.1%
60     1      < 0.1%
69     1      < 0.1%
75     1      < 0.1%
88     1      < 0.1%
92     1      < 0.1%
95     1      < 0.1%
Maximum 10 values
Value    Count  Frequency (%)
1078995  1      < 0.1%
1078983  1      < 0.1%
1078897  1      < 0.1%
1078361  1      < 0.1%
1078360  1      < 0.1%
1078337  1      < 0.1%
1078304  1      < 0.1%
1078262  1      < 0.1%
1078145  1      < 0.1%
1078140  1      < 0.1%

KEYWORD
Text

UNIQUE 

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 33
Median length: 18
Mean length: 4.5212
Min length: 2

Characters and Unicode

Total characters: 45212
Distinct characters: 1170
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
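The category names in these summaries correspond to Unicode general-category codes that Python's standard `unicodedata` module also exposes (Lo = Other Letter, Zs = Space Separator, Lu/Ll = Uppercase/Lowercase Letter). The sketch below covers only total and distinct characters, mean length (for KEYWORD, 45212 / 10000 = 4.5212), and categories; script and block lookups are not in the standard library and are left out. The inputs are sample values from the KEYWORD and WORD sections:

```python
import unicodedata
from collections import Counter

# Unicode general-category codes and the names this report uses for them.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
}


def character_summary(texts):
    """Total and distinct characters, mean length, and general-category counts."""
    chars = [ch for text in texts for ch in text]
    categories = Counter(
        CATEGORY_NAMES.get(code, code) for code in map(unicodedata.category, chars)
    )
    return {
        "total": len(chars),
        "distinct": len(set(chars)),
        "mean_length": len(chars) / len(texts),
        "categories": dict(categories),
    }


print(character_summary(["인력부족", "회차로변", "QKF"]))
```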

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 인력부족
2nd row: 회차로변
3rd row: 좋치않
4th row: 토석채취허
5th row: 개판오분직전
Value                Count  Frequency (%)
                     2      < 0.1%
                     2      < 0.1%
아무                 2      < 0.1%
혼자사               1      < 0.1%
도로소통             1      < 0.1%
창워시               1      < 0.1%
설치를해야된다       1      < 0.1%
뒷길                 1      < 0.1%
동선리               1      < 0.1%
양덕동메트로시티     1      < 0.1%
Other values (9993)  9993   99.9%

Most occurring characters

Value                Count  Frequency (%)
                     949    2.1%
                     841    1.9%
                     781    1.7%
                     729    1.6%
                     682    1.5%
                     560    1.2%
                     532    1.2%
                     530    1.2%
                     488    1.1%
                     475    1.1%
Other values (1160)  38645  85.5%

Most occurring categories

Value            Count  Frequency (%)
Other Letter     45206  > 99.9%
Space Separator  6      < 0.1%

Most frequent character per category

Other Letter
Value                Count  Frequency (%)
                     949    2.1%
                     841    1.9%
                     781    1.7%
                     729    1.6%
                     682    1.5%
                     560    1.2%
                     532    1.2%
                     530    1.2%
                     488    1.1%
                     475    1.1%
Other values (1159)  38639  85.5%
Space Separator
Value  Count  Frequency (%)
       6      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  45206  > 99.9%
Common  6      < 0.1%

Most frequent character per script

Hangul
Value                Count  Frequency (%)
                     949    2.1%
                     841    1.9%
                     781    1.7%
                     729    1.6%
                     682    1.5%
                     560    1.2%
                     532    1.2%
                     530    1.2%
                     488    1.1%
                     475    1.1%
Other values (1159)  38639  85.5%
Common
Value  Count  Frequency (%)
       6      100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  45206  > 99.9%
ASCII   6      < 0.1%

Most frequent character per block

Hangul
Value                Count  Frequency (%)
                     949    2.1%
                     841    1.9%
                     781    1.7%
                     729    1.6%
                     682    1.5%
                     560    1.2%
                     532    1.2%
                     530    1.2%
                     488    1.1%
                     475    1.1%
Other values (1159)  38639  85.5%
ASCII
Value  Count  Frequency (%)
       6      100.0%

WORD
Text

MISSING 

Distinct: 110
Distinct (%): 100.0%
Missing: 9890
Missing (%): 98.9%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 2
Mean length: 2.5
Min length: 2

Characters and Unicode

Total characters: 275
Distinct characters: 186
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 110
Unique (%): 100.0%

Sample

1st row: QKF
2nd row: 시도새올
3rd row: 고객만족도
4th row: 어물
5th row: 아니다
Value               Count  Frequency (%)
이런건              1      0.9%
호기심              1      0.9%
정직                1      0.9%
위반                1      0.9%
창의적              1      0.9%
업체                1      0.9%
때문                1      0.9%
답변                1      0.9%
공개                1      0.9%
판사                1      0.9%
Other values (100)  100    90.9%

Most occurring characters

Value               Count  Frequency (%)
                    5      1.8%
                    5      1.8%
                    5      1.8%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    3      1.1%
Other values (176)  233    84.7%

Most occurring categories

Value             Count  Frequency (%)
Other Letter      269    97.8%
Lowercase Letter  3      1.1%
Uppercase Letter  3      1.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
                    5      1.9%
                    5      1.9%
                    5      1.9%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    3      1.1%
Other values (170)  227    84.4%
Lowercase Letter
Value  Count  Frequency (%)
o      1      33.3%
m      1      33.3%
c      1      33.3%
Uppercase Letter
Value  Count  Frequency (%)
Q      1      33.3%
K      1      33.3%
F      1      33.3%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  269    97.8%
Latin   6      2.2%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
                    5      1.9%
                    5      1.9%
                    5      1.9%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    3      1.1%
Other values (170)  227    84.4%
Latin
Value  Count  Frequency (%)
o      1      16.7%
m      1      16.7%
c      1      16.7%
Q      1      16.7%
K      1      16.7%
F      1      16.7%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  269    97.8%
ASCII   6      2.2%

Most frequent character per block

Hangul
Value               Count  Frequency (%)
                    5      1.9%
                    5      1.9%
                    5      1.9%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    4      1.5%
                    3      1.1%
Other values (170)  227    84.4%
ASCII
Value  Count  Frequency (%)
o      1      16.7%
m      1      16.7%
c      1      16.7%
Q      1      16.7%
K      1      16.7%
F      1      16.7%

TYPE
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9835
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value   Count  Frequency (%)
<NA>    9890   98.9%
불용어  55     0.5%
부정    30     0.3%
긍정    25     0.2%
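TYPE reports Missing 0 even though 98.9% of its rows read <NA>, and its mean length 3.9835 equals (9890 × 4 + 55 × 3 + 30 × 2 + 25 × 2) / 10000, i.e. each <NA> is measured as a 4-character string. That suggests <NA> is stored here as literal text, which is also why the dataset-level missing-cell count covers only WORD and 단어. A minimal pandas sketch for turning such a placeholder into a real missing value before analysis; the helper name `normalize_na_labels` and the toy column are hypothetical:

```python
import pandas as pd


def normalize_na_labels(series, sentinel="<NA>"):
    """Replace a literal placeholder string with a real missing value (NaN)."""
    return series.mask(series == sentinel)


# Hypothetical toy column using TYPE's labels.
s = pd.Series(["<NA>", "불용어", "<NA>", "부정", "긍정"], name="TYPE")
clean = normalize_na_labels(s)
print(int(s.isna().sum()), int(clean.isna().sum()))  # 0 2
print(clean.value_counts().to_dict())
```

After normalization, `value_counts()` lists only the three real labels, and a profiler would report the placeholder rows as missing instead of as a dominant category.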

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value   Count  Frequency (%)
na      9890   98.9%
불용어  55     0.5%
부정    30     0.3%
긍정    25     0.2%

긍부정구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 3.9894
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   9947   99.5%
부정   29     0.3%
긍정   24     0.2%
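The label counts line up with the two sparse text columns: TYPE has 55 + 30 + 25 = 110 non-<NA> labels, matching WORD's 110 non-missing values, and 긍부정구분 has 29 + 24 = 53, matching 단어's 53. One reading (an inference from the counts, not something the source states) is that the file holds two short labeled word lists alongside the 10000-row KEYWORD list, with each label on the same row as its word. Under that assumption, a hypothetical helper `labeled_pairs` can pull out the labeled pairs; the toy words are placeholders, not dataset values:

```python
import pandas as pd


def labeled_pairs(df, word_col, label_col, sentinel="<NA>"):
    """Return (word, label) rows where both are present, treating `sentinel` as missing."""
    labels = df[label_col].mask(df[label_col] == sentinel)
    pairs = pd.DataFrame({"word": df[word_col], "label": labels}).dropna()
    return pairs.reset_index(drop=True)


# Hypothetical toy frame; words are placeholders, labels follow TYPE's vocabulary.
toy = pd.DataFrame({
    "WORD": ["word_a", None, "word_b", "word_c"],
    "TYPE": ["긍정", "<NA>", "부정", "<NA>"],
})
print(labeled_pairs(toy, "WORD", "TYPE"))
```

The same call with `"단어"` and `"긍부정구분"` would apply to the second list, if the row-alignment assumption holds.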

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value  Count  Frequency (%)
na     9947   99.5%
부정   29     0.3%
긍정   24     0.2%

단어
Text

MISSING 

Distinct: 53
Distinct (%): 100.0%
Missing: 9947
Missing (%): 99.5%
Memory size: 156.2 KiB

Length

Max length: 3
Median length: 2
Mean length: 2.0943396
Min length: 2

Characters and Unicode

Total characters: 111
Distinct characters: 88
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 53
Unique (%): 100.0%

Sample

1st row: 시비
2nd row: 즐거움
3rd row: 거짓말
4th row: 불편함
5th row: 곤혹
Value              Count  Frequency (%)
대단               1      1.9%
안심               1      1.9%
진정               1      1.9%
다행               1      1.9%
활약               1      1.9%
지적               1      1.9%
합리               1      1.9%
해결               1      1.9%
온화               1      1.9%
지루               1      1.9%
Other values (43)  43     81.1%

Most occurring characters

Value              Count  Frequency (%)
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
Other values (78)  87     78.4%

Most occurring categories

Value         Count  Frequency (%)
Other Letter  111    100.0%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
Other values (78)  87     78.4%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  111    100.0%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
Other values (78)  87     78.4%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  111    100.0%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   3      2.7%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
                   2      1.8%
Other values (78)  87     78.4%

Interactions


Correlations

            연번   TYPE   긍부정구분  단어
연번        1.000  NaN    NaN         NaN
TYPE        NaN    1.000  0.024       1.000
긍부정구분  NaN    0.024  1.000       1.000
단어        NaN    1.000  1.000       1.000

            긍부정구분  TYPE
긍부정구분  1.000       0.028
TYPE        0.028       1.000

            연번   TYPE   긍부정구분
연번        1.000  1.000  1.000
TYPE        1.000  1.000  0.028
긍부정구분  1.000  0.028  1.000
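The report does not say which coefficient produced each matrix above. For a pair of categorical columns such as TYPE and 긍부정구분, one common choice is Cramér's V, sqrt(chi2 / (n * (min(r, c) - 1))), computed from their contingency table. The sketch below is the plain, uncorrected form and assumes every row and column total is positive; implementations that apply a bias correction give different values, so it is not expected to reproduce 0.024 or 0.028:

```python
import math


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


print(cramers_v([[10, 0], [0, 10]]))  # 1.0
print(cramers_v([[5, 5], [5, 5]]))    # 0.0
```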

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
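Nullity correlation, as popularized by the missingno library's heatmap, is the Pearson correlation between two columns' is-missing indicators: +1 when they are always missing together, 0 when unrelated, and undefined when a column is never (or always) missing. A minimal sketch, assuming missing values appear as None or float NaN:

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_correlation(a, b):
    """Pearson correlation between two columns' is-missing indicators."""
    x = [_is_missing(v) for v in a]
    y = [_is_missing(v) for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sum((xi - mx) ** 2 for xi in x) ** 0.5
    sy = sum((yi - my) ** 2 for yi in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if a column is never (or always) missing
    return cov / (sx * sy)


print(nullity_correlation([None, None, 1, 2], [None, None, "a", "b"]))  # 1.0
```

In this dataset only WORD and 단어 have missing values, so they are the only columns for which the coefficient is defined.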

Sample

First rows
       연번     KEYWORD           WORD  TYPE  긍부정구분  단어
38929  269389   인력부족          <NA>  <NA>  <NA>        <NA>
68667  661996   회차로변          <NA>  <NA>  <NA>        <NA>
47225  363889   좋치않            <NA>  <NA>  <NA>        <NA>
74254  735582   토석채취허        <NA>  <NA>  <NA>        <NA>
19132  81755    개판오분직전      <NA>  <NA>  <NA>        <NA>
74094  733856   방문에정          <NA>  <NA>  <NA>        <NA>
73033  721648   마산조각공원앞    <NA>  <NA>  <NA>        <NA>
68935  666819   연극제            <NA>  <NA>  <NA>        <NA>
38389  264373   창원종합운동장옆  <NA>  <NA>  <NA>        <NA>
76798  792338   끊겼다하          <NA>  <NA>  <NA>        <NA>

Last rows
       연번     KEYWORD           WORD  TYPE  긍부정구분  단어
82973  881097   처리하는점에대하  <NA>  <NA>  <NA>        <NA>
46712  361057   박점숙            <NA>  <NA>  <NA>        <NA>
73809  731290   들껑거리는소음    <NA>  <NA>  <NA>        <NA>
26152  142817   안돌아오고있음    <NA>  <NA>  <NA>        <NA>
39482  274345   문제있는부분      <NA>  <NA>  <NA>        <NA>
31287  199068   어쭙고싶습니다    <NA>  <NA>  <NA>        <NA>
11952  38836    느꼇습니다        <NA>  <NA>  <NA>        <NA>
81578  865780   주차해놓은거      <NA>  <NA>  <NA>        <NA>
41994  307437   한림리치빌        <NA>  <NA>  <NA>        <NA>
94572  1073604  하겠되            <NA>  <NA>  <NA>        <NA>