Overview

Dataset statistics

Number of variables: 13
Number of observations: 1000
Missing cells: 6772
Missing cells (%): 52.1%
Duplicate rows: 49
Duplicate rows (%): 4.9%
Total size in memory: 110.5 KiB
Average record size in memory: 113.1 B
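
The missing-cell and duplicate percentages can be rechecked from the per-column counts listed in the Alerts section. The sketch below is only an arithmetic check; the dictionary is typed in by hand from this report and assumes that columns without a "Missing" alert have no missing values.

```python
# Per-column missing counts copied from the "Missing" alerts of this report.
# Assumption: columns without such an alert have zero missing values.
missing_per_column = {
    "PAY_REQ_YN": 1000,
    "LND_SETUP_DY": 1000,
    "LND_REGIST_YN": 1000,
    "FMLY_150ABOVE_YN": 1000,
    "LRGT_UNREG_RSN_CD": 1000,
    "OWNRSHP_PSV_DY": 886,
    "OWNRSHP_PSV_5YR_YN": 886,
}

n_variables = 13
n_observations = 1000
duplicate_rows = 49

# Missing cells are counted over the whole variables x observations grid.
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Duplicate rows are expressed relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(missing_cells, f"{missing_pct:.1f}%", f"{duplicate_pct:.1f}%")
```

Five fully empty columns and two columns that are 88.6% empty account for all 6772 missing cells, so the 52.1% figure says little about the six populated columns.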

Variable types

Categorical: 4
Numeric: 3
Unsupported: 5
Boolean: 1

Dataset

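Description: Open public data related to the work of the Securitized Assets Department of the Korea Housing Finance Corporation (publicly releasable source data from the databases used for that department's work)
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15073321/fileData.do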

Alerts

LIQD_PLAN_CD has constant value "KHFCMB2020S-34" [Constant]
REG_DVCD has constant value "2" [Constant]
Dataset has 49 (4.9%) duplicate rows [Duplicates]
TREAT_ORG_CD is highly overall correlated with HOLD_CD [High correlation]
HOLD_CD is highly overall correlated with TREAT_ORG_CD [High correlation]
OWNRSHP_PSV_DY is highly overall correlated with OWNRSHP_PSV_5YR_YN [High correlation]
OWNRSHP_PSV_5YR_YN is highly overall correlated with OWNRSHP_PSV_DY [High correlation]
OWNRSHP_PSV_5YR_YN is highly imbalanced (74.0%) [Imbalance]
PAY_REQ_YN has 1000 (100.0%) missing values [Missing]
LND_SETUP_DY has 1000 (100.0%) missing values [Missing]
LND_REGIST_YN has 1000 (100.0%) missing values [Missing]
OWNRSHP_PSV_DY has 886 (88.6%) missing values [Missing]
OWNRSHP_PSV_5YR_YN has 886 (88.6%) missing values [Missing]
FMLY_150ABOVE_YN has 1000 (100.0%) missing values [Missing]
LRGT_UNREG_RSN_CD has 1000 (100.0%) missing values [Missing]
PAY_REQ_YN is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
LND_SETUP_DY is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
LND_REGIST_YN is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
FMLY_150ABOVE_YN is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
LRGT_UNREG_RSN_CD is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
MORT_SETUP_AMT has 25 (2.5%) zeros [Zeros]
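
The Constant and Missing alerts suggest a first cleaning pass: columns that are entirely empty or hold a single value carry no information for analysis. The following is a minimal sketch of that triage using only the standard library. The function name `triage_columns` is hypothetical, rows are assumed to be dictionaries with `None` for a missing cell, and the three input rows are a hand-copied subset of the Sample section restricted to five columns.

```python
def triage_columns(rows):
    """Split column names into all-missing and constant; None marks a missing cell."""
    result = {"all_missing": [], "constant": []}
    if not rows:
        return result
    for col in rows[0]:
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            # Every cell in this column is missing.
            result["all_missing"].append(col)
        elif len(present) == len(rows) and len(set(present)) == 1:
            # No missing cells and exactly one distinct value.
            result["constant"].append(col)
    return result


# Rows 0, 4 and 993 of the Sample section, restricted to a few columns.
rows = [
    {"LIQD_PLAN_CD": "KHFCMB2020S-34", "TREAT_ORG_CD": "B081", "REG_DVCD": "2",
     "PAY_REQ_YN": None, "OWNRSHP_PSV_DY": None},
    {"LIQD_PLAN_CD": "KHFCMB2020S-34", "TREAT_ORG_CD": "B031", "REG_DVCD": "2",
     "PAY_REQ_YN": None, "OWNRSHP_PSV_DY": None},
    {"LIQD_PLAN_CD": "KHFCMB2020S-34", "TREAT_ORG_CD": "B004", "REG_DVCD": "2",
     "PAY_REQ_YN": None, "OWNRSHP_PSV_DY": 20170609},
]

triage = triage_columns(rows)
print(triage)
```

On a three-row subset the result is only indicative; the same check over all 1000 rows would be needed before dropping anything.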

Reproduction

Analysis started: 2023-12-11 23:09:01.273696
Analysis finished: 2023-12-11 23:09:02.619159
Duration: 1.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

LIQD_PLAN_CD
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
KHFCMB2020S-34 | 1000

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: KHFCMB2020S-34
2nd row: KHFCMB2020S-34
3rd row: KHFCMB2020S-34
4th row: KHFCMB2020S-34
5th row: KHFCMB2020S-34

Common Values

Value | Count | Frequency (%)
KHFCMB2020S-34 | 1000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

HOLD_CD
Categorical

HIGH CORRELATION 

Distinct: 15
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
B081-2020-0101 | 233
B004-2020-0099 | 199
B020-2020-0101 | 115
B004-2020-0098 | 106
B081-2020-0100 | 105
Other values (10) | 242

Length

Max length: 14
Median length: 14
Mean length: 13.898
Min length: 13

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: B081-2020-0101
2nd row: B004-2020-0099
3rd row: B004-2020-0099
4th row: B081-2020-0100
5th row: B031-2020-0017

Common Values

Value | Count | Frequency (%)
B081-2020-0101 | 233 | 23.3%
B004-2020-0099 | 199 | 19.9%
B020-2020-0101 | 115 | 11.5%
B004-2020-0098 | 106 | 10.6%
B081-2020-0100 | 105 | 10.5%
B10-2020-0081 | 93 | 9.3%
B003-2020-0078 | 30 | 3.0%
B081-2020-0102 | 24 | 2.4%
B020-2020-0100 | 23 | 2.3%
B004-2020-0100 | 23 | 2.3%
Other values (5) | 49 | 4.9%

Length

[Figure: Histogram of lengths of the category]

TREAT_ORG_CD
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
B081 | 362
B004 | 328
B020 | 156
B010 | 102
B003 | 37

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B081
2nd row: B004
3rd row: B004
4th row: B081
5th row: B031

Common Values

Value | Count | Frequency (%)
B081 | 362 | 36.2%
B004 | 328 | 32.8%
B020 | 156 | 15.6%
B010 | 102 | 10.2%
B003 | 37 | 3.7%
B031 | 15 | 1.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

LOAN_TREAT_DY
Real number (ℝ)

Distinct: 136
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20199989
Minimum: 20171227
Maximum: 20200824
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20171227
5-th percentile: 20200131
Q1: 20200302
median: 20200320
Q3: 20200427
95-th percentile: 20200603
Maximum: 20200824
Range: 29597
Interquartile range (IQR): 125

Descriptive statistics

Standard deviation: 2020.6152
Coefficient of variation (CV): 0.00010003051
Kurtosis: 54.007955
Mean: 20199989
Median Absolute Deviation (MAD): 92
Skewness: -6.32985
Sum: 2.0199989 × 10¹⁰
Variance: 4082886
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
20200306 | 59 | 5.9%
20200302 | 53 | 5.3%
20200316 | 30 | 3.0%
20200320 | 28 | 2.8%
20200228 | 26 | 2.6%
20200227 | 25 | 2.5%
20200305 | 25 | 2.5%
20200504 | 24 | 2.4%
20200327 | 24 | 2.4%
20200313 | 23 | 2.3%
Other values (126) | 683 | 68.3%

Minimum 10 values

Value | Count | Frequency (%)
20171227 | 1 | 0.1%
20190621 | 2 | 0.2%
20190701 | 1 | 0.1%
20190705 | 1 | 0.1%
20190709 | 2 | 0.2%
20190711 | 1 | 0.1%
20190719 | 1 | 0.1%
20190801 | 1 | 0.1%
20190806 | 6 | 0.6%
20190808 | 3 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
20200824 | 1 | 0.1%
20200819 | 1 | 0.1%
20200814 | 1 | 0.1%
20200728 | 1 | 0.1%
20200724 | 6 | 0.6%
20200723 | 1 | 0.1%
20200722 | 2 | 0.2%
20200721 | 1 | 0.1%
20200710 | 1 | 0.1%
20200630 | 1 | 0.1%
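
LOAN_TREAT_DY is profiled as a real number, but its values look like dates encoded as YYYYMMDD integers, so statistics such as the range or the mean are computed on the integer encoding rather than on calendar time. The sketch below, which assumes that YYYYMMDD reading, parses the reported minimum and maximum with the standard library and compares the integer range with the calendar span. The helper names are hypothetical.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Parse an integer such as 20200316 as a calendar date (assumed encoding)."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def is_valid_yyyymmdd(value: int) -> bool:
    """Return True if the integer corresponds to a real calendar date."""
    try:
        parse_yyyymmdd(value)
        return True
    except ValueError:
        return False


minimum, maximum, mean = 20171227, 20200824, 20199989

# Range as reported: a difference of integer encodings, not of days.
integer_range = maximum - minimum

# Range in calendar days after parsing.
calendar_span_days = (parse_yyyymmdd(maximum) - parse_yyyymmdd(minimum)).days

print(integer_range, calendar_span_days, is_valid_yyyymmdd(mean))
```

The reported mean is not itself a valid date under this reading, which is one reason to convert the column to a date type before summarizing it.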

MORT_SETUP_AMT
Real number (ℝ)

ZEROS 

Distinct: 214
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0278584 × 10⁸
Minimum: 0
Maximum: 3.85 × 10⁸
Zeros: 25
Zeros (%): 2.5%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 62590000
Q1: 1.63845 × 10⁸
median: 2.134 × 10⁸
Q3: 2.53 × 10⁸
95-th percentile: 3.21255 × 10⁸
Maximum: 3.85 × 10⁸
Range: 3.85 × 10⁸
Interquartile range (IQR): 89155000

Descriptive statistics

Standard deviation: 73427369
Coefficient of variation (CV): 0.36209318
Kurtosis: 0.29984601
Mean: 2.0278584 × 10⁸
Median Absolute Deviation (MAD): 48400000
Skewness: -0.53427131
Sum: 2.0278584 × 10¹¹
Variance: 5.3915785 × 10¹⁵
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
220000000 | 84 | 8.4%
165000000 | 44 | 4.4%
330000000 | 42 | 4.2%
242000000 | 32 | 3.2%
0 | 25 | 2.5%
143000000 | 24 | 2.4%
198000000 | 23 | 2.3%
132000000 | 23 | 2.3%
187000000 | 22 | 2.2%
275000000 | 22 | 2.2%
Other values (204) | 659 | 65.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 25 | 2.5%
22000000 | 2 | 0.2%
33000000 | 5 | 0.5%
44000000 | 5 | 0.5%
49500000 | 1 | 0.1%
52800000 | 1 | 0.1%
55000000 | 10 | 1.0%
60500000 | 1 | 0.1%
62700000 | 1 | 0.1%
63800000 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
385000000 | 1 | 0.1%
330000000 | 42 | 4.2%
328900000 | 1 | 0.1%
327800000 | 1 | 0.1%
326700000 | 1 | 0.1%
325600000 | 1 | 0.1%
324500000 | 2 | 0.2%
322300000 | 1 | 0.1%
321200000 | 2 | 0.2%
320100000 | 1 | 0.1%

REG_DVCD
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value | Count
2 | 1000

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 1000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

PAY_REQ_YN
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

LND_SETUP_DY
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

LND_REGIST_YN
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

OWNRSHP_PSV_DY
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 70
Distinct (%): 61.4%
Missing: 886
Missing (%): 88.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 20188212
Minimum: 20020607
Maximum: 20200529
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20020607
5-th percentile: 20150804
Q1: 20180600
median: 20200206
Q3: 20200302
95-th percentile: 20200504
Maximum: 20200529
Range: 179922
Interquartile range (IQR): 19702

Descriptive statistics

Standard deviation: 25551.075
Coefficient of variation (CV): 0.0012656433
Kurtosis: 19.281681
Mean: 20188212
Median Absolute Deviation (MAD): 120.5
Skewness: -3.9122328
Sum: 2.3014562 × 10⁹
Variance: 6.5285742 × 10⁸
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
20200227 | 10 | 1.0%
20200529 | 4 | 0.4%
20200126 | 4 | 0.4%
20200221 | 3 | 0.3%
20180517 | 3 | 0.3%
20200228 | 3 | 0.3%
20200324 | 3 | 0.3%
20200130 | 3 | 0.3%
20200323 | 3 | 0.3%
20170609 | 3 | 0.3%
Other values (60) | 75 | 7.5%
(Missing) | 886 | 88.6%

Minimum 10 values

Value | Count | Frequency (%)
20020607 | 1 | 0.1%
20091111 | 2 | 0.2%
20110314 | 1 | 0.1%
20130415 | 1 | 0.1%
20150612 | 1 | 0.1%
20150908 | 1 | 0.1%
20170323 | 2 | 0.2%
20170609 | 3 | 0.3%
20170731 | 1 | 0.1%
20170825 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20200529 | 4 | 0.4%
20200506 | 1 | 0.1%
20200505 | 1 | 0.1%
20200504 | 1 | 0.1%
20200430 | 2 | 0.2%
20200429 | 1 | 0.1%
20200425 | 1 | 0.1%
20200423 | 1 | 0.1%
20200404 | 1 | 0.1%
20200402 | 1 | 0.1%

OWNRSHP_PSV_5YR_YN
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 1.8%
Missing: 886
Missing (%): 88.6%
Memory size: 2.1 KiB

Value | Count
True | 109
False | 5
(Missing) | 886

Common values

Value | Count | Frequency (%)
True | 109 | 10.9%
False | 5 | 0.5%
(Missing) | 886 | 88.6%
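
The 74.0% imbalance figure in the alert can be reproduced from the True/False counts if the score is taken to be one minus the Shannon entropy of the class distribution, normalized by its maximum. That definition is an assumption here, chosen because it matches the reported number; the function name `imbalance_score` is hypothetical.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for a balanced split, 1 for a single class."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2 or total == 0:
        return 0.0
    # Entropy in bits over the non-empty classes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    # log2(n_classes) is the entropy of a perfectly balanced distribution.
    return 1 - entropy / math.log2(n_classes)


# Non-missing counts for OWNRSHP_PSV_5YR_YN: 109 True, 5 False.
score = imbalance_score([109, 5])
print(f"{score:.1%}")
```

Missing values are excluded from the counts, so the score describes only the 114 populated rows.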

FMLY_150ABOVE_YN
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

LRGT_UNREG_RSN_CD
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1000
Missing (%): 100.0%
Memory size: 8.9 KiB

Interactions

[Figures: nine interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

Variable | HOLD_CD | TREAT_ORG_CD | LOAN_TREAT_DY | MORT_SETUP_AMT | OWNRSHP_PSV_DY | OWNRSHP_PSV_5YR_YN
HOLD_CD | 1.000 | 1.000 | 0.514 | 0.343 | 0.000 | 0.000
TREAT_ORG_CD | 1.000 | 1.000 | 0.582 | 0.294 | 0.000 | 0.000
LOAN_TREAT_DY | 0.514 | 0.582 | 1.000 | 0.000 | 0.000 | 0.000
MORT_SETUP_AMT | 0.343 | 0.294 | 0.000 | 1.000 | 0.095 | 0.174
OWNRSHP_PSV_DY | 0.000 | 0.000 | 0.000 | 0.095 | 1.000 | 1.000
OWNRSHP_PSV_5YR_YN | 0.000 | 0.000 | 0.000 | 0.174 | 1.000 | 1.000

[Figure: correlation heatmap]

Variable | TREAT_ORG_CD | HOLD_CD | OWNRSHP_PSV_5YR_YN
TREAT_ORG_CD | 1.000 | 0.995 | 0.000
HOLD_CD | 0.995 | 1.000 | 0.000
OWNRSHP_PSV_5YR_YN | 0.000 | 0.000 | 1.000

[Figure: correlation heatmap]

Variable | LOAN_TREAT_DY | MORT_SETUP_AMT | OWNRSHP_PSV_DY | HOLD_CD | TREAT_ORG_CD | OWNRSHP_PSV_5YR_YN
LOAN_TREAT_DY | 1.000 | -0.199 | -0.222 | 0.327 | 0.295 | 0.000
MORT_SETUP_AMT | -0.199 | 1.000 | 0.004 | 0.134 | 0.159 | 0.167
OWNRSHP_PSV_DY | -0.222 | 0.004 | 1.000 | 0.000 | 0.097 | 0.977
HOLD_CD | 0.327 | 0.134 | 0.000 | 1.000 | 0.995 | 0.000
TREAT_ORG_CD | 0.295 | 0.159 | 0.097 | 0.995 | 1.000 | 0.000
OWNRSHP_PSV_5YR_YN | 0.000 | 0.167 | 0.977 | 0.000 | 0.000 | 1.000
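
The near-perfect association between HOLD_CD and TREAT_ORG_CD has a likely structural explanation: in the rows shown in this report, the part of HOLD_CD before the first hyphen repeats the organization code. The sketch below checks that on pairs copied from the Sample section. The zero-padding rule that maps "B10" to "B010" is an assumption inferred from those rows, not a documented encoding, and `org_prefix` is a hypothetical helper.

```python
def org_prefix(hold_cd: str) -> str:
    """Return the HOLD_CD prefix normalized to one letter plus three digits."""
    prefix = hold_cd.split("-", 1)[0]
    letter, digits = prefix[0], prefix[1:]
    # Assumed rule: short prefixes such as "B10" are zero-padded to "B010".
    return letter + digits.zfill(3)


# (HOLD_CD, TREAT_ORG_CD) pairs copied from the Sample section.
pairs = [
    ("B081-2020-0101", "B081"),
    ("B004-2020-0099", "B004"),
    ("B031-2020-0017", "B031"),
    ("B020-2020-0101", "B020"),
    ("B10-2020-0082", "B010"),
]

all_match = all(org_prefix(hold) == org for hold, org in pairs)
print(all_match)
```

If the same check holds over the full file, TREAT_ORG_CD is derivable from HOLD_CD and one of the two columns is redundant for modeling purposes.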

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

Row | LIQD_PLAN_CD | HOLD_CD | TREAT_ORG_CD | LOAN_TREAT_DY | MORT_SETUP_AMT | REG_DVCD | PAY_REQ_YN | LND_SETUP_DY | LND_REGIST_YN | OWNRSHP_PSV_DY | OWNRSHP_PSV_5YR_YN | FMLY_150ABOVE_YN | LRGT_UNREG_RSN_CD
0 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200316 | 330000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
1 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200413 | 330000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
2 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200302 | 217800000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
3 | KHFCMB2020S-34 | B081-2020-0100 | B081 | 20200306 | 220000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
4 | KHFCMB2020S-34 | B031-2020-0017 | B031 | 20200421 | 44000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
5 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200227 | 220000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
6 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200309 | 310200000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
7 | KHFCMB2020S-34 | B020-2020-0101 | B020 | 20190806 | 258000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
8 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200401 | 220000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
9 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200320 | 242000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Last rows

Row | LIQD_PLAN_CD | HOLD_CD | TREAT_ORG_CD | LOAN_TREAT_DY | MORT_SETUP_AMT | REG_DVCD | PAY_REQ_YN | LND_SETUP_DY | LND_REGIST_YN | OWNRSHP_PSV_DY | OWNRSHP_PSV_5YR_YN | FMLY_150ABOVE_YN | LRGT_UNREG_RSN_CD
990 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200331 | 165000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
991 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200417 | 110000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
992 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200304 | 220000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
993 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200507 | 191400000 | 2 | <NA> | <NA> | <NA> | 20170609 | Y | <NA> | <NA>
994 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200324 | 110000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
995 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200310 | 220000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
996 | KHFCMB2020S-34 | B081-2020-0100 | B081 | 20200311 | 198000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
997 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200316 | 185900000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
998 | KHFCMB2020S-34 | B10-2020-0082 | B010 | 20200507 | 163680000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>
999 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200228 | 253000000 | 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA>

Duplicate rows

Most frequently occurring

Row | LIQD_PLAN_CD | HOLD_CD | TREAT_ORG_CD | LOAN_TREAT_DY | MORT_SETUP_AMT | REG_DVCD | OWNRSHP_PSV_DY | OWNRSHP_PSV_5YR_YN | # duplicates
26 | KHFCMB2020S-34 | B081-2020-0100 | B081 | 20200306 | 220000000 | 2 | <NA> | <NA> | 7
9 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200306 | 220000000 | 2 | <NA> | <NA> | 4
41 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200402 | 330000000 | 2 | <NA> | <NA> | 4
7 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200305 | 220000000 | 2 | <NA> | <NA> | 3
10 | KHFCMB2020S-34 | B004-2020-0099 | B004 | 20200309 | 220000000 | 2 | <NA> | <NA> | 3
24 | KHFCMB2020S-34 | B081-2020-0100 | B081 | 20200306 | 216700000 | 2 | <NA> | <NA> | 3
34 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200306 | 220000000 | 2 | <NA> | <NA> | 3
38 | KHFCMB2020S-34 | B081-2020-0101 | B081 | 20200313 | 330000000 | 2 | <NA> | <NA> | 3
0 | KHFCMB2020S-34 | B004-2020-0098 | B004 | 20200220 | 220000000 | 2 | <NA> | <NA> | 2
1 | KHFCMB2020S-34 | B004-2020-0098 | B004 | 20200302 | 242000000 | 2 | <NA> | <NA> | 2
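
A table of this kind can be rebuilt by grouping identical rows and counting occurrences. The sketch below does that with `collections.Counter`; it is a generic illustration, not the profiler's own implementation. The input is a toy list whose row values are borrowed from the table above but whose repetition counts are made up for brevity, and `duplicate_groups` is a hypothetical helper.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    groups = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(groups, key=lambda item: item[1], reverse=True)


# Toy input: row values come from the table above, repetition counts are illustrative.
row_a = ("B081-2020-0100", "B081", 20200306, 220000000)
row_b = ("B004-2020-0099", "B004", 20200306, 220000000)
row_c = ("B004-2020-0098", "B004", 20200220, 220000000)
rows = [row_a, row_b, row_a, row_c, row_a, row_b]

groups = duplicate_groups(rows)
for row, n in groups:
    print(n, row)
```

Because the dataset has no visible record identifier, identical rows may be distinct loans that share a date and an amount, so counting them is safer than dropping them outright.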