Overview

Dataset statistics

Number of variables            5
Number of observations         1000
Missing cells                  0
Missing cells (%)              0.0%
Duplicate rows                 162
Duplicate rows (%)             16.2%
Total size in memory           40.2 KiB
Average record size in memory  41.1 B

Variable types

Categorical  3
Numeric      1
Boolean      1

Dataset

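Description  Open public data related to the work of the Securitization Assets Department (유동화자산부) of 한국주택금융공사 (Korea Housing Finance Corporation): source data, cleared for public release, drawn from the databases that support the department's work
Author       한국주택금융공사 (Korea Housing Finance Corporation)
URL          https://www.data.go.kr/data/15073299/fileData.do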

Alerts

ARTRGT_NOTICE_YN has constant value "N" (False)  [Constant]
Dataset has 162 (16.2%) duplicate rows  [Duplicates]
TREAT_ORG_CD is highly overall correlated with HOLD_CD  [High correlation]
HOLD_CD is highly overall correlated with LOAN_TREAT_DY and 2 other fields  [High correlation]
LIQD_PLAN_CD is highly overall correlated with LOAN_TREAT_DY and 1 other field  [High correlation]
LOAN_TREAT_DY is highly overall correlated with LIQD_PLAN_CD and 1 other field  [High correlation]
LIQD_PLAN_CD is highly imbalanced (71.4%)  [Imbalance]

Reproduction

Analysis started   2023-12-12 11:03:14.730752
Analysis finished  2023-12-12 11:03:15.400136
Duration           0.67 seconds
Software version   ydata-profiling v4.5.1
Configuration      config.json

Variables

LIQD_PLAN_CD
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct      21
Distinct (%)  2.1%
Missing       0
Missing (%)   0.0%
Memory size   7.9 KiB

KHFCMB2020S-34       708
KHFCMB2020S-33       243
KHFCMB2019S-08         7
KHFCMB2019S-03         6
KHFCMB2018S-30         5
Other values (16)     31

Length

Max length     14
Median length  14
Mean length    14
Min length     14

Unique

Unique      7
Unique (%)  0.7%

Sample

1st row  KHFCMB2020S-34
2nd row  KHFCMB2020S-34
3rd row  KHFCMB2020S-34
4th row  KHFCMB2020S-34
5th row  KHFCMB2020S-34

Common Values

Value              Count  Frequency (%)
KHFCMB2020S-34       708  70.8%
KHFCMB2020S-33       243  24.3%
KHFCMB2019S-08         7  0.7%
KHFCMB2019S-03         6  0.6%
KHFCMB2018S-30         5  0.5%
KHFCMB2019S-05         4  0.4%
KHFCMB2019S-24         4  0.4%
KHFCMB2019S-12         3  0.3%
KHFCMB2019S-13         3  0.3%
KHFCMB2019S-19         2  0.2%
Other values (11)     15  1.5%
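The 71.4% imbalance alert can be reproduced from the counts in this table. This is a minimal sketch, assuming the score is one minus the base-2 Shannon entropy of the value counts divided by log2 of the number of distinct values; that is our reading of ydata-profiling's imbalance measure, not something this report states. The eleven counts grouped under "Other values (11)" are inferred: they total 15 and none can exceed 2 (the smallest listed count), so they must be four 2s and seven 1s, which also agrees with Unique = 7.

```python
import math

def imbalance_score(counts: list[int]) -> float:
    """One minus the normalized base-2 Shannon entropy of a column's value counts (assumed definition)."""
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0  # a single class is treated as not imbalanced
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

# Top-10 LIQD_PLAN_CD counts, copied from the Common Values table.
listed = [708, 243, 7, 6, 5, 4, 4, 3, 3, 2]
# The 11 "Other values" (total 15, each at most 2), inferred as four 2s and seven 1s.
inferred_other = [2] * 4 + [1] * 7
liqd_counts = listed + inferred_other

# TREAT_ORG_CD: all ten counts are listed in its Common Values table, nothing inferred.
treat_counts = [299, 238, 123, 95, 91, 66, 56, 21, 10, 1]

liqd_score = imbalance_score(liqd_counts)
treat_score = imbalance_score(treat_counts)
unique_liqd = sum(1 for c in liqd_counts if c == 1)  # values that occur exactly once

print(f"LIQD_PLAN_CD imbalance: {liqd_score:.1%}")  # 71.4%
print(f"TREAT_ORG_CD imbalance: {treat_score:.1%}")
```

Under the same assumption TREAT_ORG_CD scores far lower (about 18%), consistent with it carrying no Imbalance alert.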

Length

[Figure: Histogram of lengths of the category]

Value counts (lowercased)

Value              Count  Frequency (%)
khfcmb2020s-34       708  70.8%
khfcmb2020s-33       243  24.3%
khfcmb2019s-08         7  0.7%
khfcmb2019s-03         6  0.6%
khfcmb2018s-30         5  0.5%
khfcmb2019s-05         4  0.4%
khfcmb2019s-24         4  0.4%
khfcmb2019s-12         3  0.3%
khfcmb2019s-13         3  0.3%
khfcmb2019s-07         2  0.2%
Other values (11)     15  1.5%

HOLD_CD
Categorical

HIGH CORRELATION 

Distinct      44
Distinct (%)  4.4%
Missing       0
Missing (%)   0.0%
Memory size   7.9 KiB

B004-2020-0099       177
B081-2020-0100       145
B081-2020-0101       115
B088-2020-0105       113
B10-2020-0081         91
Other values (39)    359

Length

Max length     14
Median length  14
Mean length    13.909
Min length     13
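The mean length of 13.909 follows from the minimum and maximum: if every value is 13 or 14 characters long, then 1000 × (14 - 13.909) = 91 rows have 13 characters, which is exactly the count of B10-2020-0081, the only listed code without a three-digit number after the "B". The sketch below assumes, from the listed values rather than any documented specification, that a well-formed HOLD_CD is "B", three digits, a four-digit year and a four-digit serial, and flags listed codes that break that pattern.

```python
import re

# Assumed layout inferred from the listed values: B + 3 digits, 4-digit year, 4-digit serial.
HOLD_CD_PATTERN = re.compile(r"B\d{3}-\d{4}-\d{4}")

# Top-10 HOLD_CD values and counts, copied from the Common Values table.
top_hold_cd = {
    "B004-2020-0099": 177,
    "B081-2020-0100": 145,
    "B081-2020-0101": 115,
    "B088-2020-0105": 113,
    "B10-2020-0081": 91,
    "B023-2020-0037": 62,
    "B004-2020-0098": 60,
    "B003-2020-0078": 54,
    "B020-2020-0101": 28,
    "B003-2020-0077": 22,
}

# Listed codes that do not fully match the assumed pattern, with their row counts.
nonconforming = {code: n for code, n in top_hold_cd.items() if not HOLD_CD_PATTERN.fullmatch(code)}

# With min length 13 and max length 14, the reported mean fixes the number of 13-character rows.
n_rows, mean_length = 1000, 13.909
implied_short_rows = round(n_rows * (14 - mean_length))

print(nonconforming)       # {'B10-2020-0081': 91}
print(implied_short_rows)  # 91
```

The 91 rows also match the TREAT_ORG_CD count for B010, so B10-2020-0081 may be a B010 code with a dropped digit; the report does not confirm this.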

Unique

Unique      17
Unique (%)  1.7%

Sample

1st row  B081-2020-0101
2nd row  B081-2020-0101
3rd row  B081-2020-0101
4th row  B081-2020-0101
5th row  B081-2020-0101

Common Values

Value              Count  Frequency (%)
B004-2020-0099       177  17.7%
B081-2020-0100       145  14.5%
B081-2020-0101       115  11.5%
B088-2020-0105       113  11.3%
B10-2020-0081         91  9.1%
B023-2020-0037        62  6.2%
B004-2020-0098        60  6.0%
B003-2020-0078        54  5.4%
B020-2020-0101        28  2.8%
B003-2020-0077        22  2.2%
Other values (34)    133  13.3%

Length

[Figure: Histogram of lengths of the category]

Value counts (lowercased)

Value              Count  Frequency (%)
b004-2020-0099       177  17.7%
b081-2020-0100       145  14.5%
b081-2020-0101       115  11.5%
b088-2020-0105       113  11.3%
b10-2020-0081         91  9.1%
b023-2020-0037        62  6.2%
b004-2020-0098        60  6.0%
b003-2020-0078        54  5.4%
b020-2020-0101        28  2.8%
b003-2020-0077        22  2.2%
Other values (34)    133  13.3%

TREAT_ORG_CD
Categorical

HIGH CORRELATION 

Distinct      10
Distinct (%)  1.0%
Missing       0
Missing (%)   0.0%
Memory size   7.9 KiB

B081               299
B004               238
B088               123
B003                95
B010                91
Other values (5)   154

Length

Max length     4
Median length  4
Mean length    4
Min length     4

Unique

Unique      1
Unique (%)  0.1%

Sample

1st row  B081
2nd row  B081
3rd row  B081
4th row  B081
5th row  B081

Common Values

Value   Count  Frequency (%)
B081      299  29.9%
B004      238  23.8%
B088      123  12.3%
B003       95  9.5%
B010       91  9.1%
B023       66  6.6%
B020       56  5.6%
B031       21  2.1%
B039       10  1.0%
B007        1  0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value counts (lowercased)

Value   Count  Frequency (%)
b081      299  29.9%
b004      238  23.8%
b088      123  12.3%
b003       95  9.5%
b010       91  9.1%
b023       66  6.6%
b020       56  5.6%
b031       21  2.1%
b039       10  1.0%
b007        1  0.1%

LOAN_TREAT_DY
Real number (ℝ)

HIGH CORRELATION 

Distinct      182
Distinct (%)  18.2%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          20199127
Minimum       20150327
Maximum       20200907
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   8.9 KiB

Quantile statistics

Minimum                    20150327
5-th percentile            20190621
Q1                         20200331
Median                     20200518
Q3                         20200529
95-th percentile           20200807
Maximum                    20200907
Range                      50580
Interquartile range (IQR)  198
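LOAN_TREAT_DY appears to hold dates written as YYYYMMDD integers (20150327 reads as 27 March 2015; this reading is inferred from the values, not documented in the report), yet it is profiled as a real number. Range (50580) and IQR (198) are therefore differences between integers, not day counts, and the mean, standard deviation and skewness should be read with the same caution. A minimal sketch that converts the reported quantiles to calendar dates before taking differences:

```python
from datetime import date

def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20200529 into datetime.date(2020, 5, 29)."""
    year, month_day = divmod(value, 10_000)
    month, day = divmod(month_day, 100)
    return date(year, month, day)  # raises ValueError for impossible dates such as 20200566

# Quantile statistics copied from the report.
minimum, q1, q3, maximum = 20150327, 20200331, 20200529, 20200907

numeric_range = maximum - minimum  # the figure the report shows as Range
numeric_iqr = q3 - q1              # the figure the report shows as IQR

range_days = (parse_yyyymmdd(maximum) - parse_yyyymmdd(minimum)).days
iqr_days = (parse_yyyymmdd(q3) - parse_yyyymmdd(q1)).days

print(numeric_range, numeric_iqr)  # 50580 198
print(range_days, iqr_days)        # 1991 59
```

Interpolated quantiles of YYYYMMDD integers are not guaranteed to be valid dates; they happen to be here, but parsing the column as dates before profiling avoids the problem.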

Descriptive statistics

Standard deviation               6397.0291
Coefficient of variation (CV)    0.00031669829
Kurtosis                         39.37088
Mean                             20199127
Median Absolute Deviation (MAD)  83
Skewness                         -5.9652355
Sum                              2.0199127 × 10^10
Variance                         40921982
Monotonicity                     Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value              Count  Frequency (%)
20200529              95  9.5%
20200515              51  5.1%
20200520              46  4.6%
20200508              35  3.5%
20200522              35  3.5%
20200511              26  2.6%
20200306              25  2.5%
20200518              25  2.5%
20200331              24  2.4%
20200528              23  2.3%
Other values (172)   615  61.5%

Extreme values: 10 smallest

Value     Count  Frequency (%)
20150327      1  0.1%
20150407      1  0.1%
20150410      1  0.1%
20150414      1  0.1%
20150415      1  0.1%
20150417      2  0.2%
20150420      1  0.1%
20150422      1  0.1%
20150427      1  0.1%
20150429      1  0.1%

Extreme values: 10 largest

Value     Count  Frequency (%)
20200907      1  0.1%
20200904      3  0.3%
20200903      1  0.1%
20200902      1  0.1%
20200901      3  0.3%
20200831      5  0.5%
20200828      4  0.4%
20200827      5  0.5%
20200826      2  0.2%
20200825      2  0.2%

ARTRGT_NOTICE_YN
Boolean

CONSTANT 

Distinct      1
Distinct (%)  0.1%
Missing       0
Missing (%)   0.0%
Memory size   1.1 KiB

Value   Count  Frequency (%)
False    1000  100.0%
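The sample rows store ARTRGT_NOTICE_YN as the string "N", while this section reports it as the Boolean False, so the Y/N text is evidently mapped to Booleans during profiling. A small sketch of a constant-column check under a hypothetical {"Y": True, "N": False} mapping; the four input rows are copied from the Sample section (indices 0, 1, 990, 991).

```python
# Hypothetical Y/N mapping; the report only shows "N" in raw rows and False in the summary.
YN_TO_BOOL = {"Y": True, "N": False}

COLUMNS = ["LIQD_PLAN_CD", "HOLD_CD", "TREAT_ORG_CD", "LOAN_TREAT_DY", "ARTRGT_NOTICE_YN"]

# Rows 0, 1, 990 and 991 from the Sample section.
rows = [
    ("KHFCMB2020S-34", "B081-2020-0101", "B081", 20200512, "N"),
    ("KHFCMB2020S-34", "B081-2020-0101", "B081", 20200318, "N"),
    ("KHFCMB2020S-33", "B088-2020-0105", "B088", 20200529, "N"),
    ("KHFCMB2020S-33", "B088-2020-0105", "B088", 20200529, "N"),
]

def constant_columns(rows, columns):
    """Return {column: value} for every column that holds a single distinct value."""
    result = {}
    for i, name in enumerate(columns):
        distinct = {row[i] for row in rows}
        if len(distinct) == 1:
            result[name] = distinct.pop()
    return result

constants = constant_columns(rows, COLUMNS)
flags = [YN_TO_BOOL[row[COLUMNS.index("ARTRGT_NOTICE_YN")]] for row in rows]

print(constants)  # {'ARTRGT_NOTICE_YN': 'N'}
print(flags)      # [False, False, False, False]
```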

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

               LIQD_PLAN_CD  HOLD_CD  TREAT_ORG_CD  LOAN_TREAT_DY
LIQD_PLAN_CD   1.000         1.000    0.742         0.911
HOLD_CD        1.000         1.000    1.000         0.982
TREAT_ORG_CD   0.742         1.000    1.000         0.485
LOAN_TREAT_DY  0.911         0.982    0.485         1.000

[Figure: correlation heatmap]

               TREAT_ORG_CD  HOLD_CD  LIQD_PLAN_CD
TREAT_ORG_CD   1.000         0.983    0.382
HOLD_CD        0.983         1.000    0.988
LIQD_PLAN_CD   0.382         0.988    1.000

[Figure: correlation heatmap]

               LOAN_TREAT_DY  LIQD_PLAN_CD  HOLD_CD  TREAT_ORG_CD
LOAN_TREAT_DY  1.000          0.741         0.885    0.219
LIQD_PLAN_CD   0.741          1.000         0.988    0.382
HOLD_CD        0.885          0.988         1.000    0.983
TREAT_ORG_CD   0.219          0.382         0.983    1.000
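The four High correlation alerts line up with the first (four-variable) matrix: every off-diagonal coefficient above 0.9 is flagged and nothing below it is. A sketch that recomputes each variable's alert partners from that matrix; the 0.9 cut-off is an assumption not stated in the report, and any value between 0.742 and 0.911 gives the same result.

```python
from itertools import combinations

# First (four-variable) correlation matrix, copied from this section.
COLUMNS = ["LIQD_PLAN_CD", "HOLD_CD", "TREAT_ORG_CD", "LOAN_TREAT_DY"]
MATRIX = [
    [1.000, 1.000, 0.742, 0.911],
    [1.000, 1.000, 1.000, 0.982],
    [0.742, 1.000, 1.000, 0.485],
    [0.911, 0.982, 0.485, 1.000],
]
THRESHOLD = 0.9  # assumed cut-off

def high_correlation_partners(columns, matrix, threshold):
    """Map each column to the other columns whose coefficient exceeds the threshold."""
    partners = {name: [] for name in columns}
    for i, j in combinations(range(len(columns)), 2):  # each unordered pair once
        if matrix[i][j] > threshold:
            partners[columns[i]].append(columns[j])
            partners[columns[j]].append(columns[i])
    return partners

for name, others in high_correlation_partners(COLUMNS, MATRIX, THRESHOLD).items():
    print(f"{name}: {', '.join(others)}")
```

The printed partner lists match the alerts: TREAT_ORG_CD with HOLD_CD only, HOLD_CD with three fields, and LIQD_PLAN_CD and LOAN_TREAT_DY with two each.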

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

     LIQD_PLAN_CD    HOLD_CD         TREAT_ORG_CD  LOAN_TREAT_DY  ARTRGT_NOTICE_YN
0    KHFCMB2020S-34  B081-2020-0101  B081          20200512       N
1    KHFCMB2020S-34  B081-2020-0101  B081          20200318       N
2    KHFCMB2020S-34  B081-2020-0101  B081          20200306       N
3    KHFCMB2020S-34  B081-2020-0101  B081          20200306       N
4    KHFCMB2020S-34  B081-2020-0101  B081          20200520       N
5    KHFCMB2020S-34  B081-2020-0101  B081          20200520       N
6    KHFCMB2020S-34  B081-2020-0101  B081          20200521       N
7    KHFCMB2020S-34  B081-2020-0101  B081          20200520       N
8    KHFCMB2020S-34  B081-2020-0101  B081          20200515       N
9    KHFCMB2020S-34  B081-2020-0101  B081          20200221       N

Last rows

     LIQD_PLAN_CD    HOLD_CD         TREAT_ORG_CD  LOAN_TREAT_DY  ARTRGT_NOTICE_YN
990  KHFCMB2020S-33  B088-2020-0105  B088          20200529       N
991  KHFCMB2020S-33  B088-2020-0105  B088          20200529       N
992  KHFCMB2020S-33  B088-2020-0105  B088          20200304       N
993  KHFCMB2020S-33  B088-2020-0105  B088          20200407       N
994  KHFCMB2020S-33  B088-2020-0105  B088          20200525       N
995  KHFCMB2020S-33  B088-2020-0105  B088          20200601       N
996  KHFCMB2020S-33  B088-2020-0105  B088          20200529       N
997  KHFCMB2020S-33  B088-2020-0105  B088          20200302       N
998  KHFCMB2020S-33  B088-2020-0105  B088          20200601       N
999  KHFCMB2020S-33  B088-2020-0105  B088          20200604       N

Duplicate rows

Most frequently occurring

     LIQD_PLAN_CD    HOLD_CD         TREAT_ORG_CD  LOAN_TREAT_DY  ARTRGT_NOTICE_YN  # duplicates
117  KHFCMB2020S-34  B081-2020-0100  B081          20200529       N                 38
37   KHFCMB2020S-33  B088-2020-0105  B088          20200529       N                 29
82   KHFCMB2020S-34  B004-2020-0099  B004          20200508       N                 25
87   KHFCMB2020S-34  B004-2020-0099  B004          20200515       N                 21
83   KHFCMB2020S-34  B004-2020-0099  B004          20200511       N                 16
32   KHFCMB2020S-33  B088-2020-0105  B088          20200522       N                 14
79   KHFCMB2020S-34  B004-2020-0099  B004          20200504       N                 14
90   KHFCMB2020S-34  B004-2020-0099  B004          20200520       N                 14
7    KHFCMB2020S-33  B003-2020-0077  B003          20200331       N                 12
38   KHFCMB2020S-33  B088-2020-0105  B088          20200601       N                 12
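The ten groups listed above have "# duplicates" counts that add up to 195, already more than the 162 duplicate rows reported under Dataset statistics, so 162 cannot be the number of surplus copies. It is consistent with counting distinct row combinations that occur more than once, although the report does not define the figure; that interpretation is ours. A sketch that computes both quantities, using the first ten sample rows as toy input:

```python
from collections import Counter

def duplicate_summary(rows):
    """Return (distinct duplicated rows, surplus copies, occurrence count per duplicated row)."""
    counts = Counter(rows)
    duplicated = {row: n for row, n in counts.items() if n > 1}
    n_groups = len(duplicated)                           # distinct rows seen more than once
    n_surplus = sum(n - 1 for n in duplicated.values())  # copies beyond each first occurrence
    return n_groups, n_surplus, duplicated

# First ten rows of the Sample section; only LOAN_TREAT_DY varies among them.
prefix = ("KHFCMB2020S-34", "B081-2020-0101", "B081")
dates = [20200512, 20200318, 20200306, 20200306, 20200520,
         20200520, 20200521, 20200520, 20200515, 20200221]
sample_rows = [prefix + (d, "N") for d in dates]

n_groups, n_surplus, duplicated = duplicate_summary(sample_rows)
print(n_groups, n_surplus)  # 2 3

# Top-10 "# duplicates" values from the table above.
top10 = [38, 29, 25, 21, 16, 14, 14, 14, 12, 12]
print(sum(top10))  # 195
```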