gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	163
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	4.3 KiB
Average record size in memory	26.8 B

Variable types

Text	1
Numeric	1
Categorical	1

Dataset

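Description	Open public data related to the work of the Securitized Assets Department of the Korea Housing Finance Corporation (publicly releasable source data from the databases used in that department's work)
Author	Korea Housing Finance Corporation (한국주택금융공사)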
URL	https://www.data.go.kr/data/15072825/fileData.do

Alerts

MSPRTC_SEQ is highly imbalanced (66.7%) Imbalance
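
The 66.7% figure can be reproduced from the two class counts reported for MSPRTC_SEQ (153 and 10). The sketch below assumes the imbalance score is one minus the Shannon entropy of the class frequencies, normalized by the maximum entropy for that number of classes. That definition is an assumption rather than something stated in this report, and the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class frequencies.

    Assumed definition: 0 means perfectly balanced classes, 1 means a single class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts for MSPRTC_SEQ as listed in the report.
msprtc_seq_counts = {"1": 153, "2": 10}

score = imbalance_score(list(msprtc_seq_counts.values()))
print(f"{score:.1%}")  # prints 66.7%
```

A perfectly balanced two-class column would score 0% under this definition, so 66.7% reflects that 153 of the 163 rows share the value 1.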

Reproduction

Analysis started	2023-12-12 16:25:52.483902
Analysis finished	2023-12-12 16:25:52.831712
Duration	0.35 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
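
As a quick check, the reported duration follows from the two timestamps above. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 16:25:52.483902")
finished = datetime.fromisoformat("2023-12-12 16:25:52.831712")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # prints 0.35 seconds
```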

DBTR_JUMIN_NO
Text

Distinct	97
Distinct (%)	59.5%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

Length

Max length	13
Median length	13
Mean length	13
Min length	13

Characters and Unicode

Total characters	2119
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
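
To illustrate how the category counts arise, the sketch below classifies each character of the five sample values of DBTR_JUMIN_NO with Python's standard `unicodedata` module. The short codes `Nd` and `Po` are the Unicode general categories "Decimal Number" and "Other Punctuation". Script and block lookups are not available in the standard library, so they are left out here.

```python
import unicodedata
from collections import Counter

# The five sample values of DBTR_JUMIN_NO shown in the report.
samples = [
    "8408252******",
    "8302211******",
    "8002052******",
    "8002052******",
    "7911132******",
]

# Count the Unicode general category of every character in every value.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

Each 13-character value contributes 7 digits and 6 asterisks, which matches the report's totals for all 163 rows: 163 × 7 = 1141 decimal digits and 163 × 6 = 978 punctuation characters.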

Unique

Unique	62
Unique (%)	38.0%
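
The report distinguishes "Distinct" (the number of different values) from "Unique" (the number of values that occur exactly once). A minimal sketch of that distinction, using the first ten rows of DBTR_JUMIN_NO from the report's sample as input:

```python
from collections import Counter

# First ten DBTR_JUMIN_NO values from the report's "First rows" sample.
first_rows = [
    "8408252******", "8302211******", "8002052******", "8002052******",
    "7911132******", "7911132******", "7911132******", "7911132******",
    "7911132******", "7907091******",
]

counts = Counter(first_rows)
n_distinct = len(counts)                                  # different values
n_unique = sum(1 for c in counts.values() if c == 1)      # values seen exactly once

print(n_distinct, n_unique)

# The percentages in the report are relative to all 163 observations.
print(f"{97 / 163:.1%}", f"{62 / 163:.1%}")
```

For these ten rows there are 5 distinct values, 3 of which occur only once. The second print reproduces the report's 59.5% and 38.0%.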

Sample

1st row	8408252******
2nd row	8302211******
3rd row	8002052******
4th row	8002052******
5th row	7911132******

Value	Count	Frequency (%)
6602101	11	6.7%
6408201	5	3.1%
7911132	5	3.1%
7307202	5	3.1%
5612201	4	2.5%
6508122	4	2.5%
6903231	4	2.5%
6312222	3	1.8%
7010171	3	1.8%
6101161	3	1.8%
Other values (87)	116	71.2%

Most occurring characters

Value	Count	Frequency (%)
*	978	46.2%
1	280	13.2%
0	227	10.7%
2	181	8.5%
6	121	5.7%
7	96	4.5%
5	67	3.2%
3	59	2.8%
8	46	2.2%
4	39	1.8%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	1141	53.8%
Other Punctuation	978	46.2%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	280	24.5%
0	227	19.9%
2	181	15.9%
6	121	10.6%
7	96	8.4%
5	67	5.9%
3	59	5.2%
8	46	4.0%
4	39	3.4%
9	25	2.2%

Other Punctuation

Value	Count	Frequency (%)
*	978	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	2119	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
*	978	46.2%
1	280	13.2%
0	227	10.7%
2	181	8.5%
6	121	5.7%
7	96	4.5%
5	67	3.2%
3	59	2.8%
8	46	2.2%
4	39	1.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2119	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
*	978	46.2%
1	280	13.2%
0	227	10.7%
2	181	8.5%
6	121	5.7%
7	96	4.5%
5	67	3.2%
3	59	2.8%
8	46	2.2%
4	39	1.8%

ASSET_NO
Real number (ℝ)

Distinct	11
Distinct (%)	6.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	3.2515337

Minimum	1
Maximum	11
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.6 KiB

Quantile statistics

Minimum	1
5-th percentile	1
Q1	2
median	3
Q3	4
95-th percentile	6
Maximum	11
Range	10
Interquartile range (IQR)	2

Descriptive statistics

Standard deviation	1.7474622
Coefficient of variation (CV)	0.53742706
Kurtosis	3.3914823
Mean	3.2515337
Median Absolute Deviation (MAD)	1
Skewness	1.4418556
Sum	530
Variance	3.0536242
Monotonicity	Not monotonic
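
Because ASSET_NO takes only 11 distinct values, most of the statistics above can be recomputed from the value counts listed in the report's "Minimum 10 values" and "Maximum 10 values" tables. The sketch below uses the standard `statistics` module and assumes the variance and standard deviation are the sample versions (n - 1 denominator), which is consistent with the reported figures. Skewness and kurtosis are not recomputed here.

```python
import statistics

# Value counts for ASSET_NO taken from the report's extreme-values tables.
value_counts = {1: 20, 2: 38, 3: 48, 4: 28, 5: 14, 6: 8, 7: 2, 8: 2, 9: 1, 10: 1, 11: 1}

# Expand the counts back into the 163 individual observations.
data = [value for value, n in value_counts.items() for _ in range(n)]

total = sum(data)
mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance (n - 1 denominator)
std = statistics.stdev(data)           # sample standard deviation
cv = std / mean                        # coefficient of variation
median = statistics.median(data)
mad = statistics.median([abs(x - median) for x in data])  # median absolute deviation

print(len(data), total)
print(round(mean, 7), round(variance, 7), round(std, 7), round(cv, 8))
print(median, mad)
```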

Histogram with fixed size bins (bins=11)

Common Values

Value	Count	Frequency (%)
3	48	29.4%
2	38	23.3%
4	28	17.2%
1	20	12.3%
5	14	8.6%
6	8	4.9%
7	2	1.2%
8	2	1.2%
11	1	0.6%
10	1	0.6%

Minimum 10 values

Value	Count	Frequency (%)
1	20	12.3%
2	38	23.3%
3	48	29.4%
4	28	17.2%
5	14	8.6%
6	8	4.9%
7	2	1.2%
8	2	1.2%
9	1	0.6%
10	1	0.6%

Maximum 10 values

Value	Count	Frequency (%)
11	1	0.6%
10	1	0.6%
9	1	0.6%
8	2	1.2%
7	2	1.2%
6	8	4.9%
5	14	8.6%
4	28	17.2%
3	48	29.4%
2	38	23.3%

MSPRTC_SEQ
Categorical

IMBALANCE

Distinct	2
Distinct (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	153
2	10

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	2
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
1	153	93.9%
2	10	6.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
1	153	93.9%
2	10	6.1%

Interactions

ASSET_NO (x-axis) against ASSET_NO (y-axis)

Correlations

Phik (φk)

	DBTR_JUMIN_NO	ASSET_NO	MSPRTC_SEQ
DBTR_JUMIN_NO	1.000	0.000	0.000
ASSET_NO	0.000	1.000	0.000
MSPRTC_SEQ	0.000	0.000	1.000

Auto

	ASSET_NO	MSPRTC_SEQ
ASSET_NO	1.000	0.000
MSPRTC_SEQ	0.000	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	DBTR_JUMIN_NO	ASSET_NO	MSPRTC_SEQ
0	8408252******	2	1
1	8302211******	2	1
2	8002052******	6	2
3	8002052******	5	1
4	7911132******	7	1
5	7911132******	6	1
6	7911132******	5	1
7	7911132******	4	1
8	7911132******	3	1
9	7907091******	1	1

Last rows

	DBTR_JUMIN_NO	ASSET_NO	MSPRTC_SEQ
153	5210191******	1	1
154	5202282******	1	1
155	5108271******	2	1
156	5104021******	1	1
157	5007201******	3	1
158	4901231******	5	1
159	4803181******	3	1
160	4306101******	3	1
161	4111152******	2	1
162	4111152******	1	1
