gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	250
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	6.2 KiB
Average record size in memory	25.5 B

Variable types

Numeric	1
Categorical	1
Text	1

Dataset

Description	JDC Designated Duty-Free Shop: Brands in Store (as of November 2015)
Author	Jeju Free International City Development Center (JDC, 제주국제자유도시개발센터)
URL	https://www.data.go.kr/data/15044052/fileData.do

Alerts

`연번` is highly overall correlated with `품종`	High correlation
`품종` is highly overall correlated with `연번`	High correlation
`연번` has unique values	Unique

Reproduction

Analysis started	2023-12-12 13:11:51.963979
Analysis finished	2023-12-12 13:11:52.388693
Duration	0.42 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
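
The reported duration follows directly from the two timestamps above. A minimal sketch of that arithmetic, assuming the timestamps use the `YYYY-MM-DD HH:MM:SS.ffffff` layout shown in the report (the variable names are illustrative, not part of the report):

```python
from datetime import datetime

# Timestamp layout as it appears in the report (assumed, not documented here).
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 13:11:51.963979", FMT)
finished = datetime.strptime("2023-12-12 13:11:52.388693", FMT)

# Elapsed wall-clock time in seconds between start and finish.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```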

연번 (serial number)
Real number (ℝ)

HIGH CORRELATION UNIQUE

Distinct	250
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	125.5

Minimum	1
Maximum	250
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	2.3 KiB

Quantile statistics

Minimum	1
5-th percentile	13.45
Q1	63.25
median	125.5
Q3	187.75
95-th percentile	237.55
Maximum	250
Range	249
Interquartile range (IQR)	124.5

Descriptive statistics

Standard deviation	72.312977
Coefficient of variation (CV)	0.57619902
Kurtosis	-1.2
Mean	125.5
Median Absolute Deviation (MAD)	62.5
Skewness	0
Sum	31375
Variance	5229.1667
Monotonicity	Strictly increasing
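
Because `연번` is strictly increasing from 1 to 250 with no gaps (250 distinct values, minimum 1, maximum 250), most of the statistics above can be recomputed by hand. The following is a sketch using only the Python standard library; the `percentile` helper is a hypothetical name and assumes linear interpolation between neighbouring ranks, which is consistent with the quantiles reported here.

```python
import math
import statistics

# The column is assumed to hold the integers 1..250 exactly once each.
values = list(range(1, 251))


def percentile(sorted_values, p):
    """Linear-interpolation percentile for p in [0, 1] (hypothetical helper)."""
    pos = p * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)           # sample standard deviation
cv = std / mean                          # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation
iqr = percentile(values, 0.75) - percentile(values, 0.25)
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

print(mean, median, sum(values), round(variance, 4), round(std, 6))
print(percentile(values, 0.05), percentile(values, 0.25), percentile(values, 0.95))
print(round(cv, 8), mad, iqr, strictly_increasing)
```

The sample (n - 1) variance is used because it reproduces the reported 5229.1667; the population variance would be slightly smaller.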

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
1	1	0.4%
173	1	0.4%
160	1	0.4%
161	1	0.4%
162	1	0.4%
163	1	0.4%
164	1	0.4%
165	1	0.4%
166	1	0.4%
167	1	0.4%
Other values (240)	240	96.0%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.4%
2	1	0.4%
3	1	0.4%
4	1	0.4%
5	1	0.4%
6	1	0.4%
7	1	0.4%
8	1	0.4%
9	1	0.4%
10	1	0.4%

Maximum 10 values

Value	Count	Frequency (%)
250	1	0.4%
249	1	0.4%
248	1	0.4%
247	1	0.4%
246	1	0.4%
245	1	0.4%
244	1	0.4%
243	1	0.4%
242	1	0.4%
241	1	0.4%

품종 (product category)
Categorical

HIGH CORRELATION

Distinct	12
Distinct (%)	4.8%
Missing	0
Missing (%)	0.0%
Memory size	2.1 KiB

선글라스	42
주류	36
화장품	35
패션	33
담배	27
Other values (7)	77

Length

Max length	4
Median length	2
Mean length	2.628
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	담배
2nd row	담배
3rd row	담배
4th row	담배
5th row	담배

Common Values

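Value	Count	Frequency (%)
선글라스 (sunglasses)	42	16.8%
주류 (liquor)	36	14.4%
화장품 (cosmetics)	35	14.0%
패션 (fashion)	33	13.2%
담배 (tobacco)	27	10.8%
시계 (watches)	25	10.0%
향수 (perfume)	19	7.6%
액세서리 (accessories)	13	5.2%
초콜렛 (chocolate)	12	4.8%
문구 (stationery)	3	1.2%
Other values (2)	5	2.0%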
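
Each frequency in the table above is the category count divided by the 250 observations. A short sketch of that calculation, using the counts exactly as reported (the dictionary and variable names are illustrative assumptions):

```python
# Counts copied from the Common Values table; the last entry aggregates
# the two least frequent categories, whose names the report does not list.
counts = {
    "선글라스": 42,
    "주류": 36,
    "화장품": 35,
    "패션": 33,
    "담배": 27,
    "시계": 25,
    "향수": 19,
    "액세서리": 13,
    "초콜렛": 12,
    "문구": 3,
    "Other values (2)": 5,
}

total = sum(counts.values())  # should equal the number of observations

# Frequency of each category as a percentage of all rows, to one decimal.
frequency = {name: round(100 * n / total, 1) for name, n in counts.items()}

# 10 named categories plus the 2 folded into "Other values (2)".
distinct = (len(counts) - 1) + 2
distinct_pct = round(100 * distinct / total, 1)

for name, pct in frequency.items():
    print(f"{name}\t{counts[name]}\t{pct}%")
```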

Length

Histogram of lengths of the category


브랜드 (brand)
Text

Distinct	220
Distinct (%)	88.0%
Missing	0
Missing (%)	0.0%
Memory size	2.1 KiB

Length

Max length	17
Median length	14
Mean length	7.732
Min length	3

Characters and Unicode

Total characters	1933
Distinct characters	96
Distinct categories	8
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
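
As an illustration of how such character properties can be read, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the `category_counts` helper and the `CATEGORY_NAMES` mapping are hypothetical names, and the sample strings are brand values that appear in this report. Note that `unicodedata` exposes the general category but not the script or block property, so those counts would need another data source.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Sm": "Math Symbol",
}


def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


samples = ["PIANISSIMO", "Caster", "SK-ll", "L'OREAL PARIS"]
totals = Counter()
for brand in samples:
    totals.update(category_counts(brand))

for code, n in totals.most_common():
    print(f"{CATEGORY_NAMES.get(code, code)}\t{n}")
```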

Unique

Unique	195
Unique (%)	78.0%

Sample

1st row	PIANISSIMO
2nd row	Caster
3rd row	Mevius
4th row	LARK
5th row	VIRGINIA SLIMS

Value	Count	Frequency (%)
kenzo	3	0.9%
kors	3	0.9%
the	3	0.9%
	3	0.9%
lanvin	3	0.9%
davidoff	3	0.9%
c.k	3	0.9%
gucci	3	0.9%
shiseido	2	0.6%
burberry	2	0.6%
Other values (271)	298	91.4%

Most occurring characters

Value	Count	Frequency (%)
a	105	5.4%
e	103	5.3%
i	91	4.7%
(space)	78	4.0%
A	75	3.9%
S	71	3.7%
E	67	3.5%
r	66	3.4%
o	66	3.4%
n	65	3.4%
Other values (86)	1146	59.3%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	892	46.1%
Uppercase Letter	890	46.0%
Space Separator	78	4.0%
Other Letter	35	1.8%
Other Punctuation	29	1.5%
Decimal Number	7	0.4%
Dash Punctuation	1	0.1%
Math Symbol	1	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
카	2	5.7%
정	2	5.7%
베	2	5.7%
루	1	2.9%
블	1	2.9%
천	1	2.9%
년	1	2.9%
치	1	2.9%
관	1	2.9%
성	1	2.9%
Other values (22)	22	62.9%

Lowercase Letter

Value	Count	Frequency (%)
a	105	11.8%
e	103	11.5%
i	91	10.2%
r	66	7.4%
o	66	7.4%
n	65	7.3%
l	59	6.6%
s	51	5.7%
u	33	3.7%
c	33	3.7%
Other values (16)	220	24.7%

Uppercase Letter

Value	Count	Frequency (%)
A	75	8.4%
S	71	8.0%
E	67	7.5%
O	59	6.6%
I	58	6.5%
L	58	6.5%
N	53	6.0%
R	50	5.6%
T	43	4.8%
C	43	4.8%
Other values (16)	313	35.2%

Decimal Number

Value	Count	Frequency (%)
5	3	42.9%
3	1	14.3%
7	1	14.3%
9	1	14.3%
2	1	14.3%

Other Punctuation

Value	Count	Frequency (%)
.	19	65.5%
'	5	17.2%
&	4	13.8%
/	1	3.4%

Space Separator

Value	Count	Frequency (%)
(space)	78	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Math Symbol

Value	Count	Frequency (%)
+	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	1782	92.2%
Common	116	6.0%
Hangul	35	1.8%

Most frequent character per script

Latin

Value	Count	Frequency (%)
a	105	5.9%
e	103	5.8%
i	91	5.1%
A	75	4.2%
S	71	4.0%
E	67	3.8%
r	66	3.7%
o	66	3.7%
n	65	3.6%
l	59	3.3%
Other values (42)	1014	56.9%

Hangul

Value	Count	Frequency (%)
카	2	5.7%
정	2	5.7%
베	2	5.7%
루	1	2.9%
블	1	2.9%
천	1	2.9%
년	1	2.9%
치	1	2.9%
관	1	2.9%
성	1	2.9%
Other values (22)	22	62.9%

Common

Value	Count	Frequency (%)
(space)	78	67.2%
.	19	16.4%
'	5	4.3%
&	4	3.4%
5	3	2.6%
3	1	0.9%
7	1	0.9%
/	1	0.9%
-	1	0.9%
+	1	0.9%
Other values (2)	2	1.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1898	98.2%
Hangul	35	1.8%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
a	105	5.5%
e	103	5.4%
i	91	4.8%
(space)	78	4.1%
A	75	4.0%
S	71	3.7%
E	67	3.5%
r	66	3.5%
o	66	3.5%
n	65	3.4%
Other values (54)	1111	58.5%

Hangul

Value	Count	Frequency (%)
카	2	5.7%
정	2	5.7%
베	2	5.7%
루	1	2.9%
블	1	2.9%
천	1	2.9%
년	1	2.9%
치	1	2.9%
관	1	2.9%
성	1	2.9%
Other values (22)	22	62.9%


Phik (φk)
Auto


	연번	품종
연번	1.000	0.930
품종	0.930	1.000


	연번	품종
연번	1.000	0.747
품종	0.747	1.000

Count: A simple visualization of nullity by column.

Matrix: A nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	연번	품종	브랜드
0	1	담배	PIANISSIMO
1	2	담배	Caster
2	3	담배	Mevius
3	4	담배	LARK
4	5	담배	VIRGINIA SLIMS
5	6	담배	Marlboro
6	7	담배	Parliament
7	8	담배	Davidoff
8	9	담배	LUCKY STRIKE
9	10	담배	SE 555

Last rows

	연번	품종	브랜드
240	241	화장품	Clinique
241	242	화장품	Estee Lauder
242	243	화장품	Lab
243	244	화장품	Mac
244	245	화장품	Origins
245	246	화장품	YSL
246	247	화장품	Elizabeth Arden
247	248	화장품	SK-ll
248	249	화장품	Make Up For Ever
249	250	화장품	L'OREAL PARIS
