gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	152
Missing cells	20
Missing cells (%)	2.6%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	6.2 KiB
Average record size in memory	41.9 B
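
The missing-cell percentage follows from the counts above: 20 missing cells out of 5 × 152 = 760 cells. A minimal sketch of that arithmetic (the function name is illustrative and not part of ydata-profiling):

```python
def missing_cells_pct(n_missing: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells among all cells, in percent."""
    return 100.0 * n_missing / (n_rows * n_cols)


# Figures taken from the Dataset statistics table.
pct = missing_cells_pct(n_missing=20, n_rows=152, n_cols=5)
print(f"{pct:.1f}%")  # 2.6%
```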

Variable types

Text	3
Categorical	1
DateTime	1

Dataset

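Description	Provides the store information history (일렬번호 serial number, 점포이름 store name, 전화번호 phone number, 구분 category, 등록일 registration date) for the underground shopping arcade in front of Daejeon Station (200 Jungang-ro underground, Dong-gu), operated by the Daejeon Metropolitan City Facilities Management Corporation.
Author	Daejeon Metropolitan City Facilities Management Corporation (대전광역시시설관리공단)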
URL	https://www.data.go.kr/data/15123949/fileData.do

Alerts

`구분` is highly imbalanced (54.1%)	Imbalance
`전화번호` has 20 (13.2%) missing values	Missing
`일렬번호` has unique values	Unique
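
The 54.1% imbalance figure for `구분` is consistent with a normalized-entropy score, 1 − H / log2(k), where H is the Shannon entropy of the value frequencies and k is the number of distinct values. The sketch below is an assumption about how such a score can be derived, not a copy of the library's code; the counts are the six distinct values reported for `구분` under Common Values.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """Return 1 - H/log2(k): 0 for a uniform split, approaching 1 as one class dominates.

    The score is undefined for fewer than two classes; 0.0 is returned
    here by convention.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts of the six distinct `구분` values reported under Common Values.
gubun_counts = [112, 28, 7, 3, 1, 1]
score = imbalance_score(gubun_counts)
print(f"{100 * score:.1f}%")  # 54.1%
```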

Reproduction

Analysis started	2023-12-12 12:45:03.956881
Analysis finished	2023-12-12 12:45:04.519407
Duration	0.56 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
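
The reported duration is the difference between the two timestamps above. A small sketch to check it, using only the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2023-12-12 12:45:03.956881")
finished = datetime.fromisoformat("2023-12-12 12:45:04.519407")

# Elapsed wall-clock time between start and finish of the analysis.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.56 seconds
```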

일렬번호
Text

UNIQUE

Distinct	152
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

Length

Max length	13
Median length	13
Mean length	13
Min length	13

Characters and Unicode

Total characters	1976
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
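
As an illustration of the category breakdown, Python's standard `unicodedata` module exposes the general category of each code point (`Nd` for Decimal Number, `Lu` for Uppercase Letter). The helper below is a sketch, not the report generator's own code. One 13-character identifier contributes 11 digits and 2 uppercase letters, which scales to the 1672 and 304 reported for 152 rows.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lu') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# One sample identifier from this column: 2 uppercase letters + 11 digits.
counts = category_counts(["MH20190611068"])
print(counts["Nd"], counts["Lu"])  # 11 2
print(counts["Nd"] * 152, counts["Lu"] * 152)  # 1672 304
```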

Unique

Unique	152
Unique (%)	100.0%

Sample

1st row	MH20190611068
2nd row	MH20190611069
3rd row	MH20190611070
4th row	MH20190611071
5th row	MH20190611072
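
The sample values suggest a fixed layout: the prefix `MH`, an eight-digit date, and a three-digit sequence number, 13 characters in total. That layout is inferred from the samples and is not documented in the report, so the parser below is a sketch under that assumption (`parse_serial` is a hypothetical helper name).

```python
import re
from datetime import date, datetime

# Assumed layout: "MH" + YYYYMMDD + 3-digit sequence (13 characters in total).
ID_PATTERN = re.compile(r"MH([0-9]{8})([0-9]{3})")


def parse_serial(serial: str) -> tuple[date, int]:
    """Split an identifier into its (date, sequence number) parts."""
    match = ID_PATTERN.fullmatch(serial)
    if match is None:
        raise ValueError(f"unexpected identifier format: {serial!r}")
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, int(match.group(2))


print(parse_serial("MH20190611068"))  # (datetime.date(2019, 6, 11), 68)
```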

Value	Count	Frequency (%)
mh20190611068	1	0.7%
mh20190611173	1	0.7%
mh20190612182	1	0.7%
mh20190611167	1	0.7%
mh20190611168	1	0.7%
mh20190611169	1	0.7%
mh20190611170	1	0.7%
mh20190611171	1	0.7%
mh20190611172	1	0.7%
mh20190611175	1	0.7%
Other values (142)	142	93.4%

Most occurring characters

Value	Count	Frequency (%)
1	534	27.0%
0	366	18.5%
2	222	11.2%
9	189	9.6%
6	182	9.2%
M	152	7.7%
H	152	7.7%
8	49	2.5%
7	39	2.0%
3	35	1.8%
Other values (2)	56	2.8%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1672	84.6%
Uppercase Letter	304	15.4%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	534	31.9%
0	366	21.9%
2	222	13.3%
9	189	11.3%
6	182	10.9%
8	49	2.9%
7	39	2.3%
3	35	2.1%
4	29	1.7%
5	27	1.6%

Uppercase Letter

Value	Count	Frequency (%)
M	152	50.0%
H	152	50.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1672	84.6%
Latin	304	15.4%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	534	31.9%
0	366	21.9%
2	222	13.3%
9	189	11.3%
6	182	10.9%
8	49	2.9%
7	39	2.3%
3	35	2.1%
4	29	1.7%
5	27	1.6%

Latin

Value	Count	Frequency (%)
M	152	50.0%
H	152	50.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1976	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	534	27.0%
0	366	18.5%
2	222	11.2%
9	189	9.6%
6	182	9.2%
M	152	7.7%
H	152	7.7%
8	49	2.5%
7	39	2.0%
3	35	1.8%
Other values (2)	56	2.8%

점포이름
Text

Distinct	74
Distinct (%)	48.7%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

Length

Max length	9
Median length	6.5
Mean length	4.0592105
Min length	1

Characters and Unicode

Total characters	617
Distinct characters	143
Distinct categories	6
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	28
Unique (%)	18.4%

Sample

1st row	몽실
2nd row	몽실
3rd row	마모스
4th row	흙비
5th row	흙비

Value	Count	Frequency (%)
기린전자통신	11	7.1%
삼광전자	5	3.2%
밤블비	5	3.2%
크로커다일	4	2.6%
올포유	4	2.6%
짱	4	2.6%
여성크로커다일	4	2.6%
청바지코너	3	1.9%
호키랜드	3	1.9%
예시점포	3	1.9%
Other values (65)	108	70.1%

Most occurring characters

Value	Count	Frequency (%)
통	30	4.9%
신	30	4.9%
자	25	4.1%
전	24	3.9%
스	14	2.3%
크	13	2.1%
린	13	2.1%
기	12	1.9%
지	12	1.9%
포	11	1.8%
Other values (133)	433	70.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	595	96.4%
Decimal Number	10	1.6%
Uppercase Letter	8	1.3%
Space Separator	2	0.3%
Close Punctuation	1	0.2%
Open Punctuation	1	0.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
통	30	5.0%
신	30	5.0%
자	25	4.2%
전	24	4.0%
스	14	2.4%
크	13	2.2%
린	13	2.2%
기	12	2.0%
지	12	2.0%
포	11	1.8%
Other values (120)	411	69.1%

Decimal Number

Value	Count	Frequency (%)
2	5	50.0%
6	1	10.0%
1	1	10.0%
0	1	10.0%
5	1	10.0%
4	1	10.0%

Uppercase Letter

Value	Count	Frequency (%)
N	2	25.0%
E	2	25.0%
W	2	25.0%
T	2	25.0%

Space Separator

Value	Count	Frequency (%)
(space)	2	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	595	96.4%
Common	14	2.3%
Latin	8	1.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
통	30	5.0%
신	30	5.0%
자	25	4.2%
전	24	4.0%
스	14	2.4%
크	13	2.2%
린	13	2.2%
기	12	2.0%
지	12	2.0%
포	11	1.8%
Other values (120)	411	69.1%

Common

Value	Count	Frequency (%)
2	5	35.7%
(space)	2	14.3%
6	1	7.1%
1	1	7.1%
0	1	7.1%
5	1	7.1%
4	1	7.1%
)	1	7.1%
(	1	7.1%

Latin

Value	Count	Frequency (%)
N	2	25.0%
E	2	25.0%
W	2	25.0%
T	2	25.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	595	96.4%
ASCII	22	3.6%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
통	30	5.0%
신	30	5.0%
자	25	4.2%
전	24	4.0%
스	14	2.4%
크	13	2.2%
린	13	2.2%
기	12	2.0%
지	12	2.0%
포	11	1.8%
Other values (120)	411	69.1%

ASCII

Value	Count	Frequency (%)
2	5	22.7%
(space)	2	9.1%
N	2	9.1%
E	2	9.1%
W	2	9.1%
T	2	9.1%
6	1	4.5%
1	1	4.5%
0	1	4.5%
5	1	4.5%
Other values (3)	3	13.6%

전화번호
Text

MISSING

Distinct	60
Distinct (%)	45.5%
Missing	20
Missing (%)	13.2%
Memory size	1.3 KiB
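
The percentages for this variable use two different denominators: Missing (%) is taken over all 152 rows, while Distinct (%) and Unique (%) match a denominator of the 132 non-missing values. The following check reproduces the reported figures from the counts in this section; it is a worked calculation, not the report generator's code.

```python
n_rows, n_missing = 152, 20
n_present = n_rows - n_missing  # 132 non-missing values

missing_pct = 100 * n_missing / n_rows   # over all rows
distinct_pct = 100 * 60 / n_present      # 60 distinct values
unique_pct = 100 * 19 / n_present        # 19 values that occur exactly once
print(f"{missing_pct:.1f} {distinct_pct:.1f} {unique_pct:.1f}")  # 13.2 45.5 14.4
```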

Length

Max length	10
Median length	8
Mean length	7.9166667
Min length	1

Characters and Unicode

Total characters	1045
Distinct characters	12
Distinct categories	3
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	19
Unique (%)	14.4%

Sample

1st row	221-0893
2nd row	221-0893
3rd row	242-4075
4th row	222-5636
5th row	222-5636

Value	Count	Frequency (%)
633-3701	11	8.3%
255-7838	5	3.8%
222-6999	5	3.8%
222-8525	4	3.0%
252-0036	4	3.0%
257-9064	4	3.0%
256-0842	3	2.3%
221-9867	3	2.3%
256-2990	3	2.3%
257-1754	3	2.3%
Other values (50)	87	65.9%
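
Most listed values follow a three-digit, hyphen, four-digit form, while the length statistics (min 1, max 10) and the two `~` characters suggest a few entries deviate from it. A minimal sketch for flagging such entries, assuming that form is the intended one; the pattern and the helper name are illustrative, and the last two sample inputs are invented for the example rather than taken from the dataset.

```python
import re

# Assumed canonical form: three digits, a hyphen, four digits (no area code).
LOCAL_NUMBER = re.compile(r"[0-9]{3}-[0-9]{4}")


def irregular_numbers(values):
    """Return the non-missing entries that do not match the assumed form."""
    return [v for v in values if v is not None and not LOCAL_NUMBER.fullmatch(v)]


# The first two entries appear in the report; "0" and "222-0000~1" are
# made-up examples of the kind of outlier the length statistics hint at.
sample = ["221-0893", "633-3701", None, "0", "222-0000~1"]
print(irregular_numbers(sample))  # ['0', '222-0000~1']
```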

Most occurring characters

Value	Count	Frequency (%)
2	223	21.3%
5	136	13.0%
-	131	12.5%
6	90	8.6%
3	89	8.5%
7	81	7.8%
0	70	6.7%
1	59	5.6%
4	59	5.6%
9	54	5.2%
Other values (2)	53	5.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	912	87.3%
Dash Punctuation	131	12.5%
Math Symbol	2	0.2%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
2	223	24.5%
5	136	14.9%
6	90	9.9%
3	89	9.8%
7	81	8.9%
0	70	7.7%
1	59	6.5%
4	59	6.5%
9	54	5.9%
8	51	5.6%

Dash Punctuation

Value	Count	Frequency (%)
-	131	100.0%

Math Symbol

Value	Count	Frequency (%)
~	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	1045	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
2	223	21.3%
5	136	13.0%
-	131	12.5%
6	90	8.6%
3	89	8.5%
7	81	7.8%
0	70	6.7%
1	59	5.6%
4	59	5.6%
9	54	5.2%
Other values (2)	53	5.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1045	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
2	223	21.3%
5	136	13.0%
-	131	12.5%
6	90	8.6%
3	89	8.5%
7	81	7.8%
0	70	6.7%
1	59	5.6%
4	59	5.6%
9	54	5.2%
Other values (2)	53	5.1%

구분
Categorical

IMBALANCE

Distinct	6
Distinct (%)	3.9%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

3	112
5	28
2	7
1	3
4	1

Length

Max length	4
Median length	1
Mean length	1.0197368
Min length	1

Unique

Unique	2
Unique (%)	1.3%

Sample

1st row	3
2nd row	3
3rd row	3
4th row	3
5th row	3

Common Values

Value	Count	Frequency (%)
3	112	73.7%
5	28	18.4%
2	7	4.6%
1	3	2.0%
4	1	0.7%
<NA>	1	0.7%
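
The length statistics for this variable (max length 4, mean length 1.0197368) together with Missing = 0 are consistent with `<NA>` being stored as a literal four-character string rather than as a true missing value. That reading is an inference from the numbers, which the short check below reproduces.

```python
# Five single-character codes plus one four-character string "<NA>".
value_counts = {"3": 112, "5": 28, "2": 7, "1": 3, "4": 1, "<NA>": 1}

total = sum(value_counts.values())
max_length = max(len(v) for v in value_counts)
# Length-weighted average over all 152 rows: (151 * 1 + 1 * 4) / 152.
mean_length = sum(len(v) * n for v, n in value_counts.items()) / total
print(total, max_length, round(mean_length, 7))  # 152 4 1.0197368
```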

Length

Histogram of lengths of the category

Common Values (Plot)

Bar chart of the value counts listed under Common Values.

등록일
Date

Distinct	12
Distinct (%)	7.9%
Missing	0
Missing (%)	0.0%
Memory size	1.3 KiB

Minimum	2019-06-11 00:00:00
Maximum	2019-07-31 00:00:00

Histogram

Histogram with fixed size bins (bins=12)
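
For orientation, the registration dates span 50 days, so 12 fixed-size bins are each a little over four days wide. The calculation assumes equal-width bins stretched between the reported minimum and maximum, which the report does not state explicitly.

```python
from datetime import date

first, last = date(2019, 6, 11), date(2019, 7, 31)
span_days = (last - first).days
bin_width = span_days / 12  # bins=12, as stated for the histogram
print(span_days, round(bin_width, 2))  # 50 4.17
```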

Phik (φk)

	점포이름	전화번호	구분	등록일
점포이름	1.000	1.000	0.999	0.827
전화번호	1.000	1.000	1.000	0.298
구분	0.999	1.000	1.000	0.375
등록일	0.827	0.298	0.375	1.000

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	일렬번호	점포이름	전화번호	구분	등록일
0	MH20190611068	몽실	221-0893	3	2019-06-11
1	MH20190611069	몽실	221-0893	3	2019-06-11
2	MH20190611070	마모스	242-4075	3	2019-06-11
3	MH20190611071	흙비	222-5636	3	2019-06-11
4	MH20190611072	흙비	222-5636	3	2019-06-11
5	MH20190611073	해풍사	256-2039	3	2019-06-11
6	MH20190611074	보라	252-1601	3	2019-06-11
7	MH20190611075	보라	<NA>	3	2019-06-11
8	MH20190611076	탐나라	252-4857	3	2019-06-11
9	MH20190611077	탐나라	252-4857	3	2019-06-11

Last rows

	일렬번호	점포이름	전화번호	구분	등록일
142	MH20190626238	나열16호	<NA>	2	2019-06-26
143	MH20190627244	크로커다일	256-2990	3	2019-06-27
144	MH20190628262	테스트점포	<NA>	3	2019-06-28
145	MH20190628265	테스트점포	<NA>	3	2019-06-28
146	MH20190628284	예시점포	<NA>	3	2019-06-28
147	MH20190628285	예시점포2	<NA>	3	2019-06-28
148	MH20190705288	예시점포2	<NA>	3	2019-07-05
149	MH20190705289	예시점포	<NA>	3	2019-07-05
150	MH20190708297	예시점포	<NA>	3	2019-07-08
151	MH20190731301	비네아	<NA>	3	2019-07-31
