gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	34
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	982.0 B
Average record size in memory	28.9 B
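The average record size above appears to be the total in-memory size divided by the number of observations. A minimal arithmetic sketch, using only the figures reported in this section (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table above.
total_bytes = 982.0      # Total size in memory
n_observations = 34      # Number of observations
n_variables = 3          # Number of variables

# Average bytes per row, assuming a plain total / rows division.
avg_record_bytes = total_bytes / n_observations

# Number of cells that the "Missing cells (%)" figure is relative to.
n_cells = n_variables * n_observations

print(f"{avg_record_bytes:.1f} B per record, {n_cells} cells")
```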

Variable types

Text	1
Categorical	1
Numeric	1

Dataset

Description	Foreign resident status of Nam-gu, Busan Metropolitan City (부산광역시남구외국인현황_20210131)
Author	Nam-gu, Busan Metropolitan City (부산광역시 남구)
URL	http://data.busan.go.kr/dataSet/detail.nm?contentId=10&publicdatapk=15009663

Reproduction

Analysis started	2023-12-10 17:48:52.968748
Analysis finished	2023-12-10 17:48:53.703044
Duration	0.73 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

동명 (dong name, i.e. neighborhood name)
Text

Distinct	17
Distinct (%)	50.0%
Missing	0
Missing (%)	0.0%
Memory size	404.0 B

Length

Max length	4
Median length	4
Mean length	3.8823529
Min length	3

Characters and Unicode

Total characters	132
Distinct characters	18
Distinct categories	2
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
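As an illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module on the five sample values shown for this variable. It is not the profiler's own implementation, and it covers only the general category (`Lo` is Other Letter, `Nd` is Decimal Number); the standard library does not expose script or block names.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable (not the full column).
sample = ["대연1동", "대연1동", "대연3동", "대연3동", "대연4동"]

# Count every character, and the Unicode general category of every character.
char_counts = Counter(ch for value in sample for ch in value)
category_counts = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)

print(category_counts)             # general categories: Lo (letters), Nd (digits)
print(char_counts.most_common(3))  # most frequent characters in the sample
```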

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	대연1동
2nd row	대연1동
3rd row	대연3동
4th row	대연3동
5th row	대연4동

Value	Count	Frequency (%)
대연1동	2	5.9%
용당동	2	5.9%
문현3동	2	5.9%
문현2동	2	5.9%
문현1동	2	5.9%
우암동	2	5.9%
감만2동	2	5.9%
감만1동	2	5.9%
용호4동	2	5.9%
대연3동	2	5.9%
Other values (7)	14	41.2%

Most occurring characters

Value	Count	Frequency (%)
동	34	25.8%
대	10	7.6%
용	10	7.6%
연	10	7.6%
문	8	6.1%
호	8	6.1%
1	8	6.1%
현	8	6.1%
4	6	4.5%
3	6	4.5%
Other values (8)	24	18.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	102	77.3%
Decimal Number	30	22.7%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	34	33.3%
대	10	9.8%
용	10	9.8%
연	10	9.8%
문	8	7.8%
호	8	7.8%
현	8	7.8%
감	4	3.9%
만	4	3.9%
당	2	2.0%
Other values (2)	4	3.9%

Decimal Number

Value	Count	Frequency (%)
1	8	26.7%
4	6	20.0%
3	6	20.0%
2	6	20.0%
5	2	6.7%
6	2	6.7%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	102	77.3%
Common	30	22.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	34	33.3%
대	10	9.8%
용	10	9.8%
연	10	9.8%
문	8	7.8%
호	8	7.8%
현	8	7.8%
감	4	3.9%
만	4	3.9%
당	2	2.0%
Other values (2)	4	3.9%

Common

Value	Count	Frequency (%)
1	8	26.7%
4	6	20.0%
3	6	20.0%
2	6	20.0%
5	2	6.7%
6	2	6.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	102	77.3%
ASCII	30	22.7%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
동	34	33.3%
대	10	9.8%
용	10	9.8%
연	10	9.8%
문	8	7.8%
호	8	7.8%
현	8	7.8%
감	4	3.9%
만	4	3.9%
당	2	2.0%
Other values (2)	4	3.9%

ASCII

Value	Count	Frequency (%)
1	8	26.7%
4	6	20.0%
3	6	20.0%
2	6	20.0%
5	2	6.7%
6	2	6.7%

남성_여성 (male_female, i.e. sex)
Categorical

Distinct	2
Distinct (%)	5.9%
Missing	0
Missing (%)	0.0%
Memory size	404.0 B

남	17
여	17

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	남
2nd row	여
3rd row	남
4th row	여
5th row	남

Common Values

Value	Count	Frequency (%)
남	17	50.0%
여	17	50.0%

Length

Histogram of lengths of the category

외국인인구수 (number of foreign residents)
Real number (ℝ)

Distinct	30
Distinct (%)	88.2%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	145.82353

Minimum	4
Maximum	1158
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	438.0 B

Quantile statistics

Minimum	4
5-th percentile	11.95
Q1	25.25
median	45
Q3	84.5
95-th percentile	822.95
Maximum	1158
Range	1154
Interquartile range (IQR)	59.25

Descriptive statistics

Standard deviation	280.89149
Coefficient of variation (CV)	1.9262426
Kurtosis	7.6626788
Mean	145.82353
Median Absolute Deviation (MAD)	21
Skewness	2.8772852
Sum	4958
Variance	78900.029
Monotonicity	Not monotonic
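Several of the descriptive statistics above are derived from one another, so they can be cross-checked. The sketch below recomputes the mean, variance, and coefficient of variation from the reported sum, observation count, and standard deviation. It uses only numbers printed in this report; the variable names are illustrative.

```python
# Values copied from the report.
n_observations = 34
total = 4958                 # Sum
std_dev = 280.89149          # Standard deviation

# Mean is the sum divided by the number of observations.
mean = total / n_observations

# Variance is the square of the standard deviation.
variance = std_dev ** 2

# Coefficient of variation is the standard deviation relative to the mean.
cv = std_dev / mean

print(f"mean={mean:.5f} variance={variance:.3f} cv={cv:.7f}")
```

A CV close to 2 together with a skewness of about 2.9 is consistent with a few very large values dominating the distribution.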

Histogram with fixed size bins (bins=30)

Common Values

Value	Count	Frequency (%)
25	2	5.9%
85	2	5.9%
55	2	5.9%
37	2	5.9%
691	1	2.9%
83	1	2.9%
23	1	2.9%
47	1	2.9%
27	1	2.9%
29	1	2.9%
Other values (20)	20	58.8%

Minimum 10 values

Value	Count	Frequency (%)
4	1	2.9%
10	1	2.9%
13	1	2.9%
14	1	2.9%
21	1	2.9%
22	1	2.9%
23	1	2.9%
25	2	5.9%
26	1	2.9%
27	1	2.9%

Maximum 10 values

Value	Count	Frequency (%)
1158	1	2.9%
1068	1	2.9%
691	1	2.9%
468	1	2.9%
209	1	2.9%
202	1	2.9%
86	1	2.9%
85	2	5.9%
83	1	2.9%
65	1	2.9%
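The gap between the median (45) and the maximum (1158) suggests a few extreme observations. One common way to flag them, which the report itself does not apply, is Tukey's 1.5 × IQR rule. The sketch below is a hedged illustration using the reported quartiles and the values in the maximum table; every value not in that table is at most 65, so none of them can exceed the upper fence.

```python
# Quartiles copied from the "Quantile statistics" table.
q1, q3 = 25.25, 84.5
minimum, maximum = 4, 1158

# Observations from the "Maximum 10 values" table (85 occurs twice).
top_values = [1158, 1068, 691, 468, 209, 202, 86, 85, 85, 83, 65]

value_range = maximum - minimum
iqr = q3 - q1

# Tukey fences: an assumption of this sketch, not a statistic from the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The lower fence is negative, so only high outliers are possible here.
outliers = [v for v in top_values if v > upper_fence]

print(f"range={value_range} iqr={iqr} upper_fence={upper_fence}")
print("flagged:", outliers)
```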

Interactions

Scatter plot of 외국인인구수 (x) against 외국인인구수 (y)

Correlations

Phik (φk)

	동명	남성_여성	외국인인구수
동명	1.000	0.000	0.893
남성_여성	0.000	1.000	0.000
외국인인구수	0.893	0.000	1.000

Auto

	외국인인구수	남성_여성
외국인인구수	1.000	0.000
남성_여성	0.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	동명	남성_여성	외국인인구수
0	대연1동	남	691
1	대연1동	여	468
2	대연3동	남	1158
3	대연3동	여	1068
4	대연4동	남	13
5	대연4동	여	33
6	대연5동	남	85
7	대연5동	여	86
8	대연6동	남	55
9	대연6동	여	50

Last rows

	동명	남성_여성	외국인인구수
24	우암동	남	14
25	우암동	여	37
26	문현1동	남	10
27	문현1동	여	29
28	문현2동	남	25
29	문현2동	여	37
30	문현3동	남	27
31	문현3동	여	47
32	문현4동	남	25
33	문현4동	여	23
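Because each 동명 appears once per value of 남성_여성, the rows can be summed per neighborhood. The sketch below does this for the first ten rows only (the middle of the dataset is not shown in this report) and compares the two largest totals with the reported overall sum of 4958. The names `first_rows` and `REPORTED_SUM` are illustrative.

```python
from collections import defaultdict

# First ten rows exactly as shown in the sample: (동명, 남성_여성, 외국인인구수).
first_rows = [
    ("대연1동", "남", 691), ("대연1동", "여", 468),
    ("대연3동", "남", 1158), ("대연3동", "여", 1068),
    ("대연4동", "남", 13), ("대연4동", "여", 33),
    ("대연5동", "남", 85), ("대연5동", "여", 86),
    ("대연6동", "남", 55), ("대연6동", "여", 50),
]
REPORTED_SUM = 4958  # "Sum" from the descriptive statistics

# Total foreign residents per neighborhood, summing both sexes.
totals = defaultdict(int)
for dong, _sex, count in first_rows:
    totals[dong] += count

# Share of the reported overall sum held by the two largest visible totals.
top_two = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:2]
share = sum(count for _dong, count in top_two) / REPORTED_SUM

print(top_two)
print(f"{share:.1%} of the reported sum")
```

In this sample, 대연3동 and 대연1동 together account for roughly two thirds of the reported total, which matches the heavy right skew of 외국인인구수.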
