Overview

Dataset statistics

Number of variables: 5
Number of observations: 1140
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 48.0 KiB
Average record size in memory: 43.1 B
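The average record size appears to follow from dividing the total memory footprint by the number of observations. The sketch below reproduces that arithmetic, assuming 1 KiB = 1024 bytes and that the reported 48.0 KiB is itself a rounded figure; the variable names are illustrative and are not taken from the profiling tool.

```python
# Values copied from the dataset statistics.
total_size_kib = 48.0
n_observations = 1140

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")  # prints "43.1 B per record"
```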

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

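Description: One of the statistical indices developed by Gimhae City to assess the state of the city on a statistical basis. It consists of the statistical year (통계연도), the province or metropolitan city name (시도명), the city/county/district name (시군구명), the male smoking rate in percent (남자 흡연율(퍼센트)), and the current smoking rate in percent (현재 흡연율(퍼센트)). Because the index is centered on Gimhae City, information for regions outside Gimhae may be missing owing to difficulties in data collection and processing.
Author: Gimhae City, Gyeongsangnam-do (경상남도 김해시)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15110156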

Alerts

남자 흡연율(퍼센트) is highly correlated overall with 현재 흡연율(퍼센트) (High correlation)
현재 흡연율(퍼센트) is highly correlated overall with 남자 흡연율(퍼센트) (High correlation)

Reproduction

Analysis started: 2023-12-10 23:17:18.876193
Analysis finished: 2023-12-10 23:17:19.739481
Duration: 0.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

통계연도
Categorical

Distinct: 5
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.0 KiB

2017    228
2018    228
2019    228
2020    228
2021    228

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2017
2nd row: 2017
3rd row: 2017
4th row: 2017
5th row: 2017

Common Values

Value    Count    Frequency (%)
2017     228      20.0%
2018     228      20.0%
2019     228      20.0%
2020     228      20.0%
2021     228      20.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시도명
Categorical

Distinct: 16
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 9.0 KiB

경기도                155
서울특별시            125
경상북도              115
전라남도              110
강원도                90
Other values (11)     545

Length

Max length: 7
Median length: 5
Mean length: 4.1359649
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도
2nd row: 강원도
3rd row: 강원도
4th row: 강원도
5th row: 강원도

Common Values

Value                Count    Frequency (%)
경기도               155      13.6%
서울특별시           125      11.0%
경상북도             115      10.1%
전라남도             110      9.6%
강원도               90       7.9%
경상남도             90       7.9%
부산광역시           80       7.0%
충청남도             75       6.6%
전라북도             70       6.1%
충청북도             55       4.8%
Other values (6)     175      15.4%
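Each frequency in the table is the category count divided by the 1140 observations. The following sketch recomputes the percentages from the counts as a consistency check; the dictionary and the helper name `frequency_pct` are illustrative, and one-decimal rounding is assumed to match what the report displays.

```python
n_observations = 1140

# Counts copied from the Common Values table for 시도명.
counts = {
    "경기도": 155, "서울특별시": 125, "경상북도": 115, "전라남도": 110,
    "강원도": 90, "경상남도": 90, "부산광역시": 80, "충청남도": 75,
    "전라북도": 70, "충청북도": 55, "Other values (6)": 175,
}

def frequency_pct(count: int, total: int) -> str:
    """Format count/total as a percentage with one decimal place."""
    return f"{100 * count / total:.1f}%"

frequencies = {name: frequency_pct(c, n_observations) for name, c in counts.items()}

# The listed counts should account for every observation.
total_listed = sum(counts.values())

for name, pct in frequencies.items():
    print(name, counts[name], pct)
```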

Length

[Figure: Histogram of lengths of the category]

시군구명
Text

Distinct: 206
Distinct (%): 18.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.0 KiB

Length

Max length: 4
Median length: 3
Mean length: 2.9342105
Min length: 2

Characters and Unicode

Total characters: 3345
Distinct characters: 132
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강릉시
2nd row: 고성군
3rd row: 동해시
4th row: 삼척시
5th row: 속초시

Value                  Count    Frequency (%)
중구                   30       2.6%
동구                   30       2.6%
서구                   25       2.2%
북구                   20       1.8%
남구                   20       1.8%
고성군                 10       0.9%
강서구                 10       0.9%
아산시                 5        0.4%
태안군                 5        0.4%
청양군                 5        0.4%
Other values (196)     980      86.0%

Most occurring characters

Value                  Count    Frequency (%)
                       425      12.7%
                       390      11.7%
                       370      11.1%
                       110      3.3%
                       100      3.0%
                       90       2.7%
                       90       2.7%
                       85       2.5%
                       80       2.4%
                       65       1.9%
Other values (122)     1540     46.0%

Most occurring categories

Value            Count    Frequency (%)
Other Letter     3345     100.0%

Most frequent character per category

Other Letter
Value                  Count    Frequency (%)
                       425      12.7%
                       390      11.7%
                       370      11.1%
                       110      3.3%
                       100      3.0%
                       90       2.7%
                       90       2.7%
                       85       2.5%
                       80       2.4%
                       65       1.9%
Other values (122)     1540     46.0%

Most occurring scripts

Value      Count    Frequency (%)
Hangul     3345     100.0%

Most frequent character per script

Hangul
Value                  Count    Frequency (%)
                       425      12.7%
                       390      11.7%
                       370      11.1%
                       110      3.3%
                       100      3.0%
                       90       2.7%
                       90       2.7%
                       85       2.5%
                       80       2.4%
                       65       1.9%
Other values (122)     1540     46.0%

Most occurring blocks

Value      Count    Frequency (%)
Hangul     3345     100.0%

Most frequent character per block

Hangul
Value                  Count    Frequency (%)
                       425      12.7%
                       390      11.7%
                       370      11.1%
                       110      3.3%
                       100      3.0%
                       90       2.7%
                       90       2.7%
                       85       2.5%
                       80       2.4%
                       65       1.9%
Other values (122)     1540     46.0%

남자 흡연율(퍼센트)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 261
Distinct (%): 22.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.249386
Minimum: 20.9
Maximum: 55.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.1 KiB

Quantile statistics

Minimum: 20.9
5th percentile: 28.895
Q1: 34.7
Median: 38.3
Q3: 41.8
95th percentile: 47.6
Maximum: 55.6
Range: 34.7
Interquartile range (IQR): 7.1
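The range and interquartile range are derived from the other quantile figures: range = maximum - minimum and IQR = Q3 - Q1. Here is a short check of both for 남자 흡연율(퍼센트), rounding to one decimal place to match the report (the variable names are illustrative):

```python
# Quantile statistics copied from the report for 남자 흡연율(퍼센트).
minimum, q1, q3, maximum = 20.9, 34.7, 41.8, 55.6

# Round to one decimal to absorb binary floating-point noise.
value_range = round(maximum - minimum, 1)
iqr = round(q3 - q1, 1)

print(value_range, iqr)  # prints "34.7 7.1"
```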

Descriptive statistics

Standard deviation: 5.6442524
Coefficient of variation (CV): 0.14756452
Kurtosis: 0.086157958
Mean: 38.249386
Median Absolute Deviation (MAD): 3.6
Skewness: -0.022863285
Sum: 43604.3
Variance: 31.857585
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                  Count    Frequency (%)
38.0                   16       1.4%
41.0                   16       1.4%
37.8                   14       1.2%
34.1                   12       1.1%
39.2                   12       1.1%
42.0                   12       1.1%
38.4                   12       1.1%
36.0                   12       1.1%
40.4                   12       1.1%
35.5                   12       1.1%
Other values (251)     1010     88.6%

Minimum 10 values

Value    Count    Frequency (%)
20.9     1        0.1%
21.1     1        0.1%
21.8     1        0.1%
22.3     2        0.2%
22.4     1        0.1%
23.2     1        0.1%
23.8     1        0.1%
24.8     3        0.3%
25.0     2        0.2%
25.1     1        0.1%

Maximum 10 values

Value    Count    Frequency (%)
55.6     1        0.1%
55.4     1        0.1%
55.3     1        0.1%
54.6     1        0.1%
53.9     1        0.1%
53.5     1        0.1%
52.9     1        0.1%
52.6     1        0.1%
52.4     1        0.1%
52.3     1        0.1%

현재 흡연율(퍼센트)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 157
Distinct (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.504123
Minimum: 10.5
Maximum: 30.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 10.1 KiB

Quantile statistics

Minimum: 10.5
5th percentile: 15.5
Q1: 18.475
Median: 20.5
Q3: 22.425
95th percentile: 25.6
Maximum: 30.2
Range: 19.7
Interquartile range (IQR): 3.95

Descriptive statistics

Standard deviation: 3.0609625
Coefficient of variation (CV): 0.14928522
Kurtosis: 0.021345034
Mean: 20.504123
Median Absolute Deviation (MAD): 2
Skewness: 0.0012038759
Sum: 23374.7
Variance: 9.3694913
Monotonicity: Not monotonic
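Several of these descriptive statistics are tied together by simple identities: mean = sum / n, variance = (standard deviation)^2, and CV = standard deviation / mean. This sketch checks the reported figures for 현재 흡연율(퍼센트) against those identities. It does not recompute anything from the raw data, and the variable names are illustrative.

```python
# Figures copied from the report for 현재 흡연율(퍼센트).
n_observations = 1140
total = 23374.7
std_dev = 3.0609625

mean = total / n_observations      # reported mean: 20.504123
variance = std_dev ** 2            # reported variance: 9.3694913
cv = std_dev / mean                # reported CV: 0.14928522

print(f"mean={mean:.6f} variance={variance:.7f} cv={cv:.8f}")
```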
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                  Count    Frequency (%)
20.4                   23       2.0%
21.0                   21       1.8%
20.6                   20       1.8%
19.8                   20       1.8%
21.6                   20       1.8%
22.4                   19       1.7%
22.3                   19       1.7%
20.3                   19       1.7%
20.7                   18       1.6%
21.1                   17       1.5%
Other values (147)     944      82.8%

Minimum 10 values

Value    Count    Frequency (%)
10.5     1        0.1%
11.3     1        0.1%
11.4     1        0.1%
11.8     1        0.1%
12.1     1        0.1%
12.2     1        0.1%
12.3     1        0.1%
12.6     1        0.1%
13.1     3        0.3%
13.2     1        0.1%

Maximum 10 values

Value    Count    Frequency (%)
30.2     1        0.1%
29.3     1        0.1%
28.8     2        0.2%
28.3     3        0.3%
28.2     1        0.1%
28.1     1        0.1%
27.9     1        0.1%
27.7     2        0.2%
27.6     2        0.2%
27.5     3        0.3%

Interactions

[Figures: four interaction plots]

Correlations

                       통계연도    시도명    남자 흡연율(퍼센트)    현재 흡연율(퍼센트)
통계연도               1.000       0.000     0.389                  0.376
시도명                 0.000       1.000     0.363                  0.307
남자 흡연율(퍼센트)    0.389       0.363     1.000                  0.961
현재 흡연율(퍼센트)    0.376       0.307     0.961                  1.000

             통계연도    시도명
통계연도     1.000       0.000
시도명       0.000       1.000

                       남자 흡연율(퍼센트)    현재 흡연율(퍼센트)    통계연도    시도명
남자 흡연율(퍼센트)    1.000                  0.965                  0.171       0.150
현재 흡연율(퍼센트)    0.965                  1.000                  0.165       0.125
통계연도               0.171                  0.165                  1.000       0.000
시도명                 0.150                  0.125                  0.000       1.000
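To illustrate what a coefficient near 0.96 between the two smoking-rate columns means, the sketch below computes a plain Pearson correlation on the ten first rows listed in the report's Sample section. This is only an illustration: the report does not state which coefficient each matrix uses, and ten rows from a single province will not reproduce the full-data values. The function name `pearson` and the list names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# First ten sample rows (2017, 강원도): 남자 흡연율(퍼센트) and 현재 흡연율(퍼센트).
male_rate = [38.8, 37.5, 41.4, 38.5, 43.7, 44.0, 48.2, 44.0, 43.7, 44.6]
current_rate = [20.6, 19.6, 22.2, 20.9, 22.8, 22.1, 25.2, 23.0, 25.0, 23.8]

r = pearson(male_rate, current_rate)
print(f"r = {r:.3f}")
```

Even on this small subset the two columns move together strongly, which is consistent with the High correlation alert.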

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

Index    통계연도    시도명    시군구명    남자 흡연율(퍼센트)    현재 흡연율(퍼센트)
0        2017        강원도    강릉시      38.8                   20.6
1        2017        강원도    고성군      37.5                   19.6
2        2017        강원도    동해시      41.4                   22.2
3        2017        강원도    삼척시      38.5                   20.9
4        2017        강원도    속초시      43.7                   22.8
5        2017        강원도    양구군      44.0                   22.1
6        2017        강원도    양양군      48.2                   25.2
7        2017        강원도    영월군      44.0                   23.0
8        2017        강원도    원주시      43.7                   25.0
9        2017        강원도    인제군      44.6                   23.8

Last rows

Index    통계연도    시도명            시군구명    남자 흡연율(퍼센트)    현재 흡연율(퍼센트)
1130     2021        인천광역시        중구        35.0                   19.7
1131     2021        인천광역시        강화군      31.3                   17.8
1132     2021        인천광역시        계양구      35.2                   18.7
1133     2021        인천광역시        남동구      36.2                   19.2
1134     2021        인천광역시        부평구      29.2                   16.8
1135     2021        인천광역시        연수구      30.3                   16.9
1136     2021        인천광역시        옹진군      43.3                   22.8
1137     2021        인천광역시        미추홀구    47.8                   27.6
1138     2021        제주특별자치도    제주시      35.4                   19.4
1139     2021        제주특별자치도    서귀포시    38.0                   21.6