gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	1001
Missing cells	3
Missing cells (%)	0.1%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	23.6 KiB
Average record size in memory	24.1 B
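
The percentage and per-record figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming "Missing cells (%)" is computed over all cells (variables × observations) and that 1 KiB = 1024 B; the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 3
n_observations = 1001
missing_cells = 3
total_size_kib = 23.6

# Assumption: the missing-cell percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the values the report shows (0.1% and 24.1 B).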

Variable types

DateTime	1
Text	2

Dataset

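Description	Public open data related to the work of the Housing Pension Department of the Korea Housing Finance Corporation (source data that can be disclosed from the department's business databases). It contains data on 기산일자 (calculation start date), 보증번호 (guarantee number), and 등록일시 (registration date and time).
Author	Korea Housing Finance Corporation (한국주택금융공사)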
URL	https://www.data.go.kr/data/15073005/fileData.do

Reproduction

Analysis started	2023-12-12 16:23:43.145014
Analysis finished	2023-12-12 16:23:43.760026
Duration	0.62 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

기산일자
Date

Distinct	235
Distinct (%)	23.5%
Missing	1
Missing (%)	0.1%
Memory size	7.9 KiB

Minimum	2017-01-26 00:00:00
Maximum	2020-10-14 00:00:00

Histogram with fixed-size bins (bins=50)

보증번호
Text

Distinct	945
Distinct (%)	94.4%
Missing	0
Missing (%)	0.0%
Memory size	7.9 KiB

Length

Max length	16
Median length	14
Mean length	14.001998
Min length	14

Characters and Unicode

Total characters	14016
Distinct characters	27
Distinct categories	5
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
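
As a rough illustration of how such per-category counts can be derived, the sketch below uses Python's standard `unicodedata` module. It is not the profiler's own code. The helper name `category_counts` is hypothetical, and the two input strings are sample values taken from this report. The two-letter codes correspond to the category names used in the tables: `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Pd` is Dash Punctuation, `Zs` is Space Separator, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# One 보증번호 sample and one 등록일시 sample from this report.
sample = ["RTNA2014000084", "2020-10-16 14:18"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```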

Unique

Unique	909
Unique (%)	90.8%
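
"Distinct" and "Unique" are different measures: here 945 values are distinct, but only 909 are unique. A minimal sketch of the difference, assuming "Distinct" counts the different values present and "Unique" counts the values that occur exactly once; the function name and the toy data are hypothetical:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy data, not taken from the dataset.
toy = ["A", "B", "B", "C", "C", "C", "D"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

In the toy list, four values are distinct (A, B, C, D) but only two (A and D) are unique.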

Sample

1st row	RTNA2014000084
2nd row	RTAC2019000175
3rd row	RTAD2019000401
4th row	RTLA2020000099
5th row	RTNA2020000171
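
The sample rows suggest a fixed layout of four uppercase letters followed by ten digits. Since the length statistics show a maximum of 16 against a minimum of 14, at least one value does not fit. The check below is a sketch under that assumption; the pattern is inferred from the samples and is not a documented specification of 보증번호, and `is_wellformed` is a hypothetical helper.

```python
import re

# Assumed layout inferred from the sample rows: 4 uppercase letters + 10 digits.
GUARANTEE_NUMBER = re.compile(r"[A-Z]{4}[0-9]{10}")


def is_wellformed(value):
    """True if the whole string matches the assumed 14-character layout."""
    return GUARANTEE_NUMBER.fullmatch(value) is not None


for candidate in ["RTNA2014000084", "2020-10-16 14:18"]:
    print(candidate, is_wellformed(candidate))
```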

Value	Count	Frequency (%)
rtna2018000176	7	0.7%
rtpa2016000197	5	0.5%
rqad2010000072	4	0.4%
rtqa2012000017	4	0.4%
rtqa2014000095	3	0.3%
rtqa2017000212	3	0.3%
rtac2019000098	3	0.3%
rtpa2018000235	3	0.3%
rtqa2013000027	3	0.3%
rtpb2017000071	3	0.3%
Other values (936)	964	96.2%

Most occurring characters

Value	Count	Frequency (%)
0	4533	32.3%
2	1441	10.3%
1	1421	10.1%
R	1002	7.1%
A	941	6.7%
T	883	6.3%
6	440	3.1%
3	413	2.9%
4	396	2.8%
5	369	2.6%
Other values (17)	2177	15.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	10012	71.4%
Uppercase Letter	4000	28.5%
Dash Punctuation	2	< 0.1%
Space Separator	1	< 0.1%
Other Punctuation	1	< 0.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
R	1002	25.1%
A	941	23.5%
T	883	22.1%
B	275	6.9%
H	234	5.9%
Q	179	4.5%
D	174	4.3%
O	80	2.0%
C	73	1.8%
P	49	1.2%
Other values (4)	110	2.8%

Decimal Number

Value	Count	Frequency (%)
0	4533	45.3%
2	1441	14.4%
1	1421	14.2%
6	440	4.4%
3	413	4.1%
4	396	4.0%
5	369	3.7%
8	365	3.6%
7	358	3.6%
9	276	2.8%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	1	100.0%

Other Punctuation

Value	Count	Frequency (%)
:	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	10016	71.5%
Latin	4000	28.5%

Most frequent character per script

Latin

Value	Count	Frequency (%)
R	1002	25.1%
A	941	23.5%
T	883	22.1%
B	275	6.9%
H	234	5.9%
Q	179	4.5%
D	174	4.3%
O	80	2.0%
C	73	1.8%
P	49	1.2%
Other values (4)	110	2.8%

Common

Value	Count	Frequency (%)
0	4533	45.3%
2	1441	14.4%
1	1421	14.2%
6	440	4.4%
3	413	4.1%
4	396	4.0%
5	369	3.7%
8	365	3.6%
7	358	3.6%
9	276	2.8%
Other values (3)	4	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	14016	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	4533	32.3%
2	1441	10.3%
1	1421	10.1%
R	1002	7.1%
A	941	6.7%
T	883	6.3%
6	440	3.1%
3	413	2.9%
4	396	2.8%
5	369	2.6%
Other values (17)	2177	15.5%

등록일시
Text

Distinct	297
Distinct (%)	29.7%
Missing	2
Missing (%)	0.2%
Memory size	7.9 KiB

Length

Max length	40
Median length	16
Mean length	16.002002
Min length	7

Characters and Unicode

Total characters	15986
Distinct characters	84
Distinct categories	7
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	284
Unique (%)	28.4%

Sample

1st row	2020-10-16 14:18
2nd row	2020-10-15 9:19
3rd row	2020-10-07 16:13
4th row	2020-10-05 13:05
5th row	2020-09-23 14:30
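
This column is typed as Text rather than DateTime. The samples mix zero-padded and unpadded hours ("14:18" and "9:19"), and the character tables show some Hangul text, so some cells are probably not plain timestamps. A minimal parsing sketch, assuming the intended layout is `YYYY-MM-DD H:MM`; the helper name `parse_registered_at` is hypothetical:

```python
from datetime import datetime


def parse_registered_at(text):
    """Return a datetime, or None if the text is not 'YYYY-MM-DD H:MM'."""
    try:
        # %H accepts both "9" and "09", so unpadded hours parse as well.
        return datetime.strptime(text.strip(), "%Y-%m-%d %H:%M")
    except ValueError:
        return None


for raw in ["2020-10-16 14:18", "2020-10-15 9:19", "기산일"]:
    print(repr(raw), parse_registered_at(raw))
```

Values that come back as `None` are candidates for manual review rather than automatic conversion.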

Value	Count	Frequency (%)
2019-03-05	447	22.1%
13:40	445	22.0%
13:35	246	12.2%
2017-02-23	244	12.1%
2019-12-23	15	0.7%
2020-06-17	6	0.3%
9:24	6	0.3%
9:25	5	0.2%
2018-06-18	5	0.2%
2017-06-19	5	0.2%
Other values (448)	595	29.5%
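
The counts in this table appear to be taken over whitespace-separated tokens rather than whole cells: dates and times are listed separately, and the counts sum to roughly twice the number of rows. A sketch of that kind of token count, assuming lowercasing and splitting on whitespace; the profiler's exact tokenisation may differ, and `token_counts` is a hypothetical helper:

```python
from collections import Counter


def token_counts(values):
    """Count lowercased, whitespace-separated tokens across all strings."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# Toy rows in the same layout as the column, not the full dataset.
rows = ["2019-03-05 13:40", "2019-03-05 13:40", "2017-02-23 13:35"]
counts = token_counts(rows)

for token, n in counts.most_common():
    print(token, n)
```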

Most occurring characters

Value	Count	Frequency (%)
0	3129	19.6%
1	2199	13.8%
-	1976	12.4%
3	1832	11.5%
2	1793	11.2%
(space)	1030	6.4%
:	988	6.2%
5	852	5.3%
9	668	4.2%
4	609	3.8%
Other values (74)	910	5.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	11802	73.8%
Dash Punctuation	1976	12.4%
Space Separator	1030	6.4%
Other Punctuation	988	6.2%
Other Letter	176	1.1%
Close Punctuation	10	0.1%
Open Punctuation	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
지	12	6.8%
기	12	6.8%
산	10	5.7%
사	9	5.1%
일	9	5.1%
래	8	4.5%
거	8	4.5%
요	8	4.5%
청	8	4.5%
부	5	2.8%
Other values (59)	87	49.4%

Decimal Number

Value	Count	Frequency (%)
0	3129	26.5%
1	2199	18.6%
3	1832	15.5%
2	1793	15.2%
5	852	7.2%
9	668	5.7%
4	609	5.2%
7	417	3.5%
8	183	1.6%
6	120	1.0%

Dash Punctuation

Value	Count	Frequency (%)
-	1976	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	1030	100.0%

Other Punctuation

Value	Count	Frequency (%)
:	988	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	10	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	15810	98.9%
Hangul	176	1.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
지	12	6.8%
기	12	6.8%
산	10	5.7%
사	9	5.1%
일	9	5.1%
래	8	4.5%
거	8	4.5%
요	8	4.5%
청	8	4.5%
부	5	2.8%
Other values (59)	87	49.4%

Common

Value	Count	Frequency (%)
0	3129	19.8%
1	2199	13.9%
-	1976	12.5%
3	1832	11.6%
2	1793	11.3%
(space)	1030	6.5%
:	988	6.2%
5	852	5.4%
9	668	4.2%
4	609	3.9%
Other values (5)	734	4.6%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	15810	98.9%
Hangul	176	1.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	3129	19.8%
1	2199	13.9%
-	1976	12.5%
3	1832	11.6%
2	1793	11.3%
(space)	1030	6.5%
:	988	6.2%
5	852	5.4%
9	668	4.2%
4	609	3.9%
Other values (5)	734	4.6%

Hangul

Value	Count	Frequency (%)
지	12	6.8%
기	12	6.8%
산	10	5.7%
사	9	5.1%
일	9	5.1%
래	8	4.5%
거	8	4.5%
요	8	4.5%
청	8	4.5%
부	5	2.8%
Other values (59)	87	49.4%

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

First rows

	기산일자	보증번호	등록일시
0	2020-09-22	RTNA2014000084	2020-10-16 14:18
1	2020-10-14	RTAC2019000175	2020-10-15 9:19
2	2020-10-05	RTAD2019000401	2020-10-07 16:13
3	2020-09-29	RTLA2020000099	2020-10-05 13:05
4	2020-09-14	RTNA2020000171	2020-09-23 14:30
5	2020-09-11	RTAB2016000317	2020-09-17 9:47
6	2020-09-11	RTPA2020000215	2020-09-14 15:40
7	2020-08-28	RTAD2017000890	2020-08-31 10:47
8	2020-08-20	RTOA2017000049	2020-08-21 16:06
9	2020-08-07	RTAC2012000604	2020-08-19 15:44

Last rows

	기산일자	보증번호	등록일시
991	2017-02-22	RTHA2012000063	2017-02-23 13:35
992	2017-02-22	RTHO2016000633	2017-02-23 13:35
993	2017-02-22	RTBA2016000020	2017-02-23 13:35
994	2017-02-22	RTHO2011000222	2017-02-23 13:35
995	2017-02-22	RQAD2011000199	2017-02-23 13:35
996	2017-02-22	RTMA2011000097	2017-02-23 13:35
997	2017-02-22	RTAB2013000185	2017-02-23 13:35
998	2017-02-22	RQAD2011000683	2017-02-23 13:35
999	2017-02-22	RQAD2014000425	2017-02-23 13:35
1000	2017-02-22	RTAA2014000004	2017-02-23 13:35
