gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	100
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	3.4 KiB
Average record size in memory	35.3 B

Variable types

Text	1
Categorical	1
Numeric	1
DateTime	1

Dataset

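Description	Open public data related to the work of the Housing Pension Department of the Korea Housing Finance Corporation (source data, from the databases related to that department's work, that can be made public)
Author	Korea Housing Finance Corporation (한국주택금융공사)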
URL	https://www.data.go.kr/data/15073020/fileData.do

Alerts

SEQ is highly imbalanced (65.6%)
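
The 65.6% figure is consistent with an entropy-based imbalance score: one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes. The sketch below is not taken from the profiling tool's source. It assumes that formula and uses a hypothetical helper name, `imbalance_score`, applied to the SEQ counts reported further down in this report (86, 12, 1, 1).

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (bits) of the class shares.

    Hypothetical helper: 0.0 means perfectly balanced classes, values near 1.0
    mean that one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no meaningful imbalance
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# SEQ value counts as listed in the Common Values table of this report.
seq_counts = {"1": 86, "2": 12, "4": 1, "3": 1}
score = imbalance_score(list(seq_counts.values()))
print(f"{score:.1%}")  # 65.6%
```

Under this assumption, a column with four equally frequent values would score 0%, so the alert reflects how strongly the value 1 dominates SEQ.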

Reproduction

Analysis started	2023-12-12 22:20:53.139642
Analysis finished	2023-12-12 22:20:53.522705
Duration	0.38 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
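
As a small check on the figures above, the reported duration can be recomputed from the two timestamps. This is only a sketch using the standard library, and the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 22:20:53.139642")
finished = datetime.fromisoformat("2023-12-12 22:20:53.522705")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 0.38 seconds
```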

GUARNT_NO
Text

Distinct	86
Distinct (%)	86.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Length

Max length	14
Median length	14
Mean length	14
Min length	14

Characters and Unicode

Total characters	1400
Distinct characters	24
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	74
Unique (%)	74.0%
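
The report gives both a "Distinct" count (86) and a "Unique" count (74) for GUARNT_NO. A reading consistent with these numbers is that "Distinct" counts the different values present, while "Unique" counts the values that occur exactly once. The sketch below illustrates the difference on the ten GUARNT_NO values shown in the first sample rows of this report. It is an illustration of that reading, not the tool's own code.

```python
from collections import Counter

# GUARNT_NO values of the first ten sample rows in this report.
ids = [
    "RTPA2020000519", "RTHO2020000496", "RTHB2020000590", "RTOB2020000067",
    "RTPB2020000155", "RTAD2020000688", "RTPB2020000155", "RTHO2020000499",
    "RTAD2020000688", "RTPA2020000531",
]

counts = Counter(ids)
distinct = len(counts)                              # different values present
unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(distinct, unique)  # 8 6
```

In the full column the same logic accounts for the totals: 74 values occur once and the remaining 12 distinct values cover the other 26 rows.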

Sample

1st row	RTPA2020000519
2nd row	RTHO2020000496
3rd row	RTHB2020000590
4th row	RTOB2020000067
5th row	RTPB2020000155

Value	Count	Frequency (%)
rtna2020000220	4	4.0%
rtma2020000256	2	2.0%
rtma2020000251	2	2.0%
rtha2020000742	2	2.0%
rtad2020000616	2	2.0%
rtab2020000727	2	2.0%
rtba2020000625	2	2.0%
rtpb2020000155	2	2.0%
rtac2020000662	2	2.0%
rtad2020000688	2	2.0%
Other values (76)	78	78.0%

Most occurring characters

Value	Count	Frequency (%)
0	518	37.0%
2	242	17.3%
R	100	7.1%
T	95	6.8%
A	83	5.9%
6	58	4.1%
1	37	2.6%
7	34	2.4%
B	33	2.4%
4	31	2.2%
Other values (14)	169	12.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1000	71.4%
Uppercase Letter	400	28.6%
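
The category split can be reproduced for a single identifier with Python's standard `unicodedata` module, which returns two-letter general category codes (`Lu` for Uppercase Letter, `Nd` for Decimal Number). This is a minimal sketch on one sample value. Script and block lookups are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

sample = "RTPA2020000519"  # first sample row of GUARNT_NO

# Count the Unicode general category of every character in the identifier.
categories = Counter(unicodedata.category(ch) for ch in sample)
print(dict(categories))  # {'Lu': 4, 'Nd': 10}
```

Each 14-character identifier contributes 4 uppercase letters and 10 digits, which matches the column totals of 400 and 1000 over 100 rows.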

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
R	100	25.0%
T	95	23.8%
A	83	20.8%
B	33	8.2%
H	24	6.0%
D	14	3.5%
O	11	2.8%
C	9	2.2%
P	9	2.2%
Q	7	1.8%
Other values (4)	15	3.8%

Decimal Number

Value	Count	Frequency (%)
0	518	51.8%
2	242	24.2%
6	58	5.8%
1	37	3.7%
7	34	3.4%
4	31	3.1%
5	29	2.9%
9	24	2.4%
8	16	1.6%
3	11	1.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	1000	71.4%
Latin	400	28.6%

Most frequent character per script

Latin

Value	Count	Frequency (%)
R	100	25.0%
T	95	23.8%
A	83	20.8%
B	33	8.2%
H	24	6.0%
D	14	3.5%
O	11	2.8%
C	9	2.2%
P	9	2.2%
Q	7	1.8%
Other values (4)	15	3.8%

Common

Value	Count	Frequency (%)
0	518	51.8%
2	242	24.2%
6	58	5.8%
1	37	3.7%
7	34	3.4%
4	31	3.1%
5	29	2.9%
9	24	2.4%
8	16	1.6%
3	11	1.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	1400	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	518	37.0%
2	242	17.3%
R	100	7.1%
T	95	6.8%
A	83	5.9%
6	58	4.1%
1	37	2.6%
7	34	2.4%
B	33	2.4%
4	31	2.2%
Other values (14)	169	12.1%

SEQ
Categorical

IMBALANCE

Distinct	4
Distinct (%)	4.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	2
Unique (%)	2.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	2

Common Values

Value	Count	Frequency (%)
1	86	86.0%
2	12	12.0%
4	1	1.0%
3	1	1.0%

Length

Histogram of lengths of the category

Common Values (Plot)


REG_ENO
Real number (ℝ)

Distinct	43
Distinct (%)	43.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1702.7

Minimum	1174
Maximum	2003
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.0 KiB

Quantile statistics

Minimum	1174
5-th percentile	1304
Q1	1557
median	1690
Q3	1917
95-th percentile	1982.25
Maximum	2003
Range	829
Interquartile range (IQR)	360

Descriptive statistics

Standard deviation	221.72011
Coefficient of variation (CV)	0.13021678
Kurtosis	-0.3826661
Mean	1702.7
Median Absolute Deviation (MAD)	152.5
Skewness	-0.50995131
Sum	170270
Variance	49159.808
Monotonicity	Not monotonic
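
Several of the REG_ENO statistics are derived from one another, so they can be cross-checked directly from the reported values. The sketch below uses only numbers copied from this report, and the variable names are illustrative.

```python
# Values copied from the REG_ENO statistics in this report.
n = 100
mean, std = 1702.7, 221.72011
minimum, maximum = 1174, 2003
q1, q3 = 1557, 1917

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean * n                 # sum of all observations

print(f"CV={cv:.8f} variance={variance:.3f} range={value_range} IQR={iqr} sum={total:.0f}")
```

The results agree with the table within rounding: the small difference in the variance comes from the standard deviation being printed with eight significant digits.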

Histogram with fixed size bins (bins=43)

Common Values

Value	Count	Frequency (%)
1656	8	8.0%
1970	6	6.0%
1557	5	5.0%
1689	5	5.0%
1956	5	5.0%
1753	5	5.0%
1917	4	4.0%
1174	4	4.0%
1691	4	4.0%
1385	4	4.0%
Other values (33)	50	50.0%

Minimum 10 values

Value	Count	Frequency (%)
1174	4	4.0%
1304	2	2.0%
1371	3	3.0%
1385	4	4.0%
1406	2	2.0%
1475	3	3.0%
1521	1	1.0%
1554	2	2.0%
1557	5	5.0%
1569	1	1.0%

Maximum 10 values

Value	Count	Frequency (%)
2003	1	1.0%
2001	2	2.0%
2000	1	1.0%
1987	1	1.0%
1982	1	1.0%
1980	2	2.0%
1977	1	1.0%
1970	6	6.0%
1968	2	2.0%
1956	5	5.0%

REG_TS
Date

Distinct	87
Distinct (%)	87.0%
Missing	0
Missing (%)	0.0%
Memory size	932.0 B

Minimum	2020-09-17 15:34:12
Maximum	2020-10-22 15:51:00
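
The sample rows in this report show REG_TS as slash-separated strings such as `2020/10/22 15:51:00`. The sketch below parses the two extreme values in that format with the standard library and computes the time span they cover. The format string is an assumption based on those sample rows.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"  # assumed from the sample rows of this report

# Earliest and latest REG_TS values reported above, in the raw string format.
raw = ["2020/10/22 15:51:00", "2020/09/17 15:34:12"]
parsed = [datetime.strptime(s, FMT) for s in raw]

span = max(parsed) - min(parsed)
print(span)  # 35 days, 0:16:48
```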

Histogram

Histogram with fixed size bins (bins=50)

Interactions

REG_ENO vs. REG_ENO (scatter plot)

Correlations

Phik (φk)

	GUARNT_NO	SEQ	REG_ENO	REG_TS
GUARNT_NO	1.000	0.000	1.000	1.000
SEQ	0.000	1.000	0.000	0.000
REG_ENO	1.000	0.000	1.000	1.000
REG_TS	1.000	0.000	1.000	1.000

Auto

	REG_ENO	SEQ
REG_ENO	1.000	0.000
SEQ	0.000	1.000

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	GUARNT_NO	SEQ	REG_ENO	REG_TS
0	RTPA2020000519	1	1982	2020/10/22 15:51:00
1	RTHO2020000496	1	1917	2020/10/22 13:53:17
2	RTHB2020000590	1	1371	2020/10/22 14:31:27
3	RTOB2020000067	1	1620	2020/10/22 11:45:18
4	RTPB2020000155	2	1980	2020/10/22 10:58:30
5	RTAD2020000688	2	1656	2020/10/22 10:34:13
6	RTPB2020000155	1	1980	2020/10/22 10:58:30
7	RTHO2020000499	1	1691	2020/10/22 09:51:06
8	RTAD2020000688	1	1656	2020/10/22 10:34:13
9	RTPA2020000531	1	1889	2020/10/22 09:26:53

Last rows

	GUARNT_NO	SEQ	REG_ENO	REG_TS
90	RTAB2020000727	1	1385	2020/09/21 17:35:17
91	RTAB2020000724	1	1689	2020/09/23 11:53:16
92	RTAA2020000554	1	1799	2020/09/18 14:58:14
93	RTAC2020000673	1	1788	2020/09/18 14:55:00
94	RTBA2020000607	1	1720	2020/09/22 11:23:05
95	RTAC2020000662	2	1753	2020/09/21 16:49:15
96	RTBB2020000178	1	1304	2020/09/18 14:16:57
97	RTAC2020000662	1	1753	2020/09/21 16:49:15
98	RTQA2020000271	1	1874	2020/09/18 10:27:21
99	RTHA2020000645	1	1569	2020/09/17 15:34:12
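
In the sample rows, some GUARNT_NO values appear twice with different SEQ values, while the overview reports 0 duplicate rows. That suggests, though the report does not state it, that SEQ distinguishes entries under the same guarantee number. The sketch below groups the first ten sample rows by GUARNT_NO to make this visible. The variable names are illustrative.

```python
from collections import defaultdict

# First ten sample rows of this report: (GUARNT_NO, SEQ, REG_ENO, REG_TS).
rows = [
    ("RTPA2020000519", 1, 1982, "2020/10/22 15:51:00"),
    ("RTHO2020000496", 1, 1917, "2020/10/22 13:53:17"),
    ("RTHB2020000590", 1, 1371, "2020/10/22 14:31:27"),
    ("RTOB2020000067", 1, 1620, "2020/10/22 11:45:18"),
    ("RTPB2020000155", 2, 1980, "2020/10/22 10:58:30"),
    ("RTAD2020000688", 2, 1656, "2020/10/22 10:34:13"),
    ("RTPB2020000155", 1, 1980, "2020/10/22 10:58:30"),
    ("RTHO2020000499", 1, 1691, "2020/10/22 09:51:06"),
    ("RTAD2020000688", 1, 1656, "2020/10/22 10:34:13"),
    ("RTPA2020000531", 1, 1889, "2020/10/22 09:26:53"),
]

# Collect the SEQ values seen for each guarantee number.
seq_by_guarantee = defaultdict(list)
for guarnt_no, seq, _reg_eno, _reg_ts in rows:
    seq_by_guarantee[guarnt_no].append(seq)

# Guarantee numbers that occur more than once, with their sorted SEQ values.
repeated = {g: sorted(s) for g, s in seq_by_guarantee.items() if len(s) > 1}

# Check whether (GUARNT_NO, SEQ) identifies each of these rows uniquely.
keys = [(g, s) for g, s, _, _ in rows]
composite_key_is_unique = len(set(keys)) == len(keys)

print(repeated)
print(composite_key_is_unique)
```

For these ten rows, the two repeated guarantee numbers each carry SEQ 1 and 2 with identical REG_ENO and REG_TS, and the (GUARNT_NO, SEQ) pair is unique.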
