Overview

Dataset statistics

Number of variables: 6
Number of observations: 1000
Missing cells: 650
Missing cells (%): 10.8%
Duplicate rows: 44
Duplicate rows (%): 4.4%
Total size in memory: 51.9 KiB
Average record size in memory: 53.1 B
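The percentages and the average record size above can be re-derived from the raw counts. The following is a minimal sketch of that arithmetic, assuming that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB is 1024 bytes. The variable names are illustrative, and the inputs are the rounded figures printed in the report.

```python
# Figures copied from the "Dataset statistics" table (already rounded there).
n_variables = 6
n_observations = 1000
missing_cells = 650
duplicate_rows = 44
total_size_kib = 51.9

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The 650 missing cells also match the per-variable alerts: 631 for DISPOS_DY plus 19 for TELGRM_MAKE_DY.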

Variable types

Numeric: 4
Categorical: 1
DateTime: 1

Dataset

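Description: Public open data related to the work of the Securitized Assets Department of the Korea Housing Finance Corporation (raw source data, cleared for release, from the databases that support the department's work)
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15073197/fileData.do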

Alerts

Dataset has 44 (4.4%) duplicate rows [Duplicates]
BASIS_DY is highly overall correlated with SEQ and 2 other fields [High correlation]
SEQ is highly overall correlated with BASIS_DY and 2 other fields [High correlation]
DISPOS_DY is highly overall correlated with FMLY_RELTN_CD [High correlation]
TELGRM_MAKE_DY is highly overall correlated with BASIS_DY and 2 other fields [High correlation]
FMLY_RELTN_CD is highly overall correlated with BASIS_DY and 3 other fields [High correlation]
FMLY_RELTN_CD is highly imbalanced (98.6%) [Imbalance]
DISPOS_DY has 631 (63.1%) missing values [Missing]
TELGRM_MAKE_DY has 19 (1.9%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 16:43:04.412216
Analysis finished: 2023-12-12 16:43:07.137232
Duration: 2.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

BASIS_DY
Real number (ℝ)

HIGH CORRELATION 

Distinct: 272
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20117750
Minimum: 20100902
Maximum: 20150105
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20100902
5th percentile: 20101025
Q1: 20110207
Median: 20111219
Q3: 20130624
95th percentile: 20131127
Maximum: 20150105
Range: 49203
Interquartile range (IQR): 20417
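The BASIS_DY values look like calendar dates encoded as YYYYMMDD integers, although the report profiles them as plain real numbers. If that reading is right (it is an assumption here, not something the report states), the numeric range of 49203 is a difference of encoded integers rather than a number of days. A small sketch of the conversion, where `parse_yyyymmdd` is a hypothetical helper name:

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20100902 as a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Minimum and maximum of BASIS_DY as reported above.
basis_min = parse_yyyymmdd(20100902)
basis_max = parse_yyyymmdd(20150105)

# Difference of the encoded integers (what the report calls "Range").
numeric_range = 20150105 - 20100902

# Actual elapsed time, assuming the values really are dates.
span_days = (basis_max - basis_min).days

print(basis_min, basis_max, numeric_range, span_days)
```

Under the same assumption, the mean, standard deviation and IQR reported for this variable are also computed on the encoded integers and should not be read as dates or day counts.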

Descriptive statistics

Standard deviation: 11598.707
Coefficient of variation (CV): 0.00057654094
Kurtosis: -1.3183842
Mean: 20117750
Median Absolute Deviation (MAD): 10010
Skewness: 0.056871662
Sum: 2.011775 × 10^10
Variance: 1.3453 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
20101125               28  2.8%
20130417               24  2.4%
20110210               20  2.0%
20110113               18  1.8%
20110120               17  1.7%
20131002               16  1.6%
20130722               14  1.4%
20130624               14  1.4%
20101118               12  1.2%
20110127               12  1.2%
Other values (262)    825  82.5%

Minimum 10 values
Value               Count  Frequency (%)
20100902                2  0.2%
20100906                5  0.5%
20100909               11  1.1%
20100913                8  0.8%
20100930                5  0.5%
20101004                5  0.5%
20101011                4  0.4%
20101018                6  0.6%
20101025                6  0.6%
20101028                8  0.8%

Maximum 10 values
Value               Count  Frequency (%)
20150105                2  0.2%
20141229                2  0.2%
20141222                5  0.5%
20141203                3  0.3%
20140811                3  0.3%
20140101                1  0.1%
20131230                2  0.2%
20131227                2  0.2%
20131223                2  0.2%
20131220                6  0.6%

SEQ
Real number (ℝ)

HIGH CORRELATION 

Distinct: 83
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.505
Minimum: 0
Maximum: 928
Zeros: 2
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 6
Q3: 15
95th percentile: 51
Maximum: 928
Range: 928
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 79.996183
Coefficient of variation (CV): 4.1013168
Kurtosis: 105.2114
Mean: 19.505
Median Absolute Deviation (MAD): 4
Skewness: 10.027157
Sum: 19505
Variance: 6399.3894
Monotonicity: Not monotonic
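Two of the SEQ figures can be checked directly from the others: the coefficient of variation is the standard deviation divided by the mean, and the IQR is Q3 minus Q1. The sketch below recomputes both from the reported values and then applies the common 1.5 × IQR rule (Tukey's fence) to show how far the maximum sits from the bulk of the data. The fence is an illustrative addition; the report itself does not compute it.

```python
# Values copied from the SEQ statistics in this report.
mean = 19.505
std_dev = 79.996183
q1, q3 = 3, 15
maximum = 928

# Coefficient of variation and interquartile range, as defined above.
cv = std_dev / mean
iqr = q3 - q1

# Assumption: Tukey's 1.5 * IQR rule, used here only as an illustration.
upper_fence = q3 + 1.5 * iqr

print(f"CV = {cv:.7f}")
print(f"IQR = {iqr}")
print(f"Upper fence = {upper_fence}, maximum = {maximum}")
```

A maximum of 928 against an upper fence of 33 is consistent with the large skewness and kurtosis reported for this variable.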
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
2                     144  14.4%
3                      97  9.7%
4                      90  9.0%
5                      78  7.8%
1                      77  7.7%
7                      57  5.7%
6                      52  5.2%
8                      34  3.4%
10                     31  3.1%
15                     28  2.8%
Other values (73)     312  31.2%

Minimum 10 values
Value               Count  Frequency (%)
0                       2  0.2%
1                      77  7.7%
2                     144  14.4%
3                      97  9.7%
4                      90  9.0%
5                      78  7.8%
6                      52  5.2%
7                      57  5.7%
8                      34  3.4%
9                      27  2.7%

Maximum 10 values
Value               Count  Frequency (%)
928                     1  0.1%
927                     1  0.1%
926                     1  0.1%
925                     1  0.1%
924                     1  0.1%
923                     1  0.1%
568                     1  0.1%
567                     1  0.1%
566                     1  0.1%
565                     1  0.1%

FMLY_RELTN_CD
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Value               Count
<NA>                  998
5                       1
1                       1

Length

Max length: 4
Median length: 4
Mean length: 3.994
Min length: 1

Unique

Unique: 2
Unique (%): 0.2%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value               Count  Frequency (%)
<NA>                  998  99.8%
5                       1  0.1%
1                       1  0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value               Count  Frequency (%)
na                    998  99.8%
5                       1  0.1%
1                       1  0.1%

DISPOS_DY
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 250
Distinct (%): 67.8%
Missing: 631
Missing (%): 63.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 19936563
Minimum: 84.844
Maximum: 20130628
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 84.844
5th percentile: 19950342
Q1: 20020325
Median: 20060801
Q3: 20080509
95th percentile: 20110412
Maximum: 20130628
Range: 20130543
Interquartile range (IQR): 60184

Descriptive statistics

Standard deviation: 1474528.6
Coefficient of variation (CV): 0.073961022
Kurtosis: 181.58045
Mean: 19936563
Median Absolute Deviation (MAD): 29571
Skewness: -13.50542
Sum: 7.3565918 × 10^9
Variance: 2.1742346 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
20070907.0             42  4.2%
19950829.0              7  0.7%
20110831.0              6  0.6%
20061113.0              4  0.4%
20040503.0              4  0.4%
20031230.0              4  0.4%
20060823.0              4  0.4%
20081210.0              4  0.4%
20060214.0              4  0.4%
20061020.0              3  0.3%
Other values (240)    287  28.7%
(Missing)             631  63.1%

Minimum 10 values
Value               Count  Frequency (%)
84.844                  1  0.1%
99.44                   1  0.1%
19920814.0              1  0.1%
19920929.0              1  0.1%
19921007.0              1  0.1%
19921231.0              1  0.1%
19930307.0              2  0.2%
19930701.0              2  0.2%
19931112.0              1  0.1%
19940218.0              1  0.1%

Maximum 10 values
Value               Count  Frequency (%)
20130628.0              1  0.1%
20120913.0              1  0.1%
20120705.0              1  0.1%
20111216.0              1  0.1%
20111208.0              2  0.2%
20111031.0              1  0.1%
20111028.0              2  0.2%
20111017.0              1  0.1%
20110920.0              1  0.1%
20110831.0              6  0.6%
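The two smallest DISPOS_DY values (84.844 and 99.44) do not fit the YYYYMMDD pattern that the other values appear to follow, which would explain the extreme skewness and kurtosis reported for this variable. Assuming the column is meant to hold YYYYMMDD-encoded dates (an inference from the values, not a statement in the report), a check along these lines could flag such entries. `is_valid_yyyymmdd` is a hypothetical helper name.

```python
import math
from datetime import datetime


def is_valid_yyyymmdd(value: float) -> bool:
    """Return True if value is a whole number that parses as a YYYYMMDD date."""
    if not math.isfinite(value) or value != int(value):
        return False  # NaN, infinity, or a fractional value such as 84.844
    text = str(int(value))
    if len(text) != 8:
        return False  # strptime alone would accept some shorter strings
    try:
        datetime.strptime(text, "%Y%m%d")
    except ValueError:
        return False  # e.g. month 13 or day 32
    return True


# A few values taken from the DISPOS_DY tables in this report.
candidates = [84.844, 99.44, 19920814.0, 20130628.0]
suspicious = [v for v in candidates if not is_valid_yyyymmdd(v)]
print(suspicious)
```

Whether flagged values should be dropped, corrected or kept depends on what the field means at the source, which this report does not document.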

REG_DT
Date

Distinct: 633
Distinct (%): 63.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2010-09-07 11:59:43
Maximum: 2015-01-06 11:23:03
Histogram with fixed size bins (bins=50)

TELGRM_MAKE_DY
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 255
Distinct (%): 26.0%
Missing: 19
Missing (%): 1.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 20117641
Minimum: 20100907
Maximum: 20141231
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20100907
5th percentile: 20101101
Q1: 20110208
Median: 20111201
Q3: 20130618
95th percentile: 20131126
Maximum: 20141231
Range: 40324
Interquartile range (IQR): 20410

Descriptive statistics

Standard deviation: 11456.704
Coefficient of variation (CV): 0.00056948546
Kurtosis: -1.3797833
Mean: 20117641
Median Absolute Deviation (MAD): 9994
Skewness: 0.049704113
Sum: 1.9735406 × 10^10
Variance: 1.3125607 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
20101129               34  3.4%
20110215               27  2.7%
20130418               24  2.4%
20100914               19  1.9%
20110117               18  1.8%
20110124               17  1.7%
20131004               16  1.6%
20130723               14  1.4%
20130625               14  1.4%
20110111               12  1.2%
Other values (245)    786  78.6%
(Missing)              19  1.9%

Minimum 10 values
Value               Count  Frequency (%)
20100907                2  0.2%
20100909                5  0.5%
20100914               19  1.9%
20101004                5  0.5%
20101007                5  0.5%
20101022                6  0.6%
20101026                6  0.6%
20101101                8  0.8%
20101115                8  0.8%
20101116                2  0.2%

Maximum 10 values
Value               Count  Frequency (%)
20141231                2  0.2%
20141223                5  0.5%
20141204                3  0.3%
20140812                3  0.3%
20140103                1  0.1%
20131230                2  0.2%
20131227                6  0.6%
20131224                2  0.2%
20131219                2  0.2%
20131217                2  0.2%

Interactions

[Figures: 16 pairwise interaction plots; images not included]

Correlations

                BASIS_DY  SEQ     FMLY_RELTN_CD  DISPOS_DY  TELGRM_MAKE_DY
BASIS_DY        1.000     0.208   NaN            0.000      0.986
SEQ             0.208     1.000   NaN            0.000      0.225
FMLY_RELTN_CD   NaN       NaN     1.000          NaN        NaN
DISPOS_DY       0.000     0.000   NaN            1.000      NaN
TELGRM_MAKE_DY  0.986     0.225   NaN            NaN        1.000

                BASIS_DY  SEQ     DISPOS_DY  TELGRM_MAKE_DY  FMLY_RELTN_CD
BASIS_DY        1.000     -0.700  0.188      1.000           1.000
SEQ             -0.700    1.000   -0.187     -0.695          1.000
DISPOS_DY       0.188     -0.187  1.000      0.202           1.000
TELGRM_MAKE_DY  1.000     -0.695  0.202      1.000           1.000
FMLY_RELTN_CD   1.000     1.000   1.000      1.000           1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
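Nullity correlation can be illustrated by turning each column into a presence indicator (1 if a value is present, 0 if it is missing) and correlating the indicators. The sketch below uses a hand-written Pearson correlation and small hypothetical columns. It illustrates the idea only and is not the code used to draw the heatmap in this report.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column is never (or always) missing
    return cov / sqrt(var_x * var_y)


def nullity_correlation(col_a, col_b):
    """Correlate the presence indicators of two columns (None means missing)."""
    present_a = [0 if v is None else 1 for v in col_a]
    present_b = [0 if v is None else 1 for v in col_b]
    return pearson(present_a, present_b)


# Hypothetical columns: b is missing exactly where a is present.
a = [1, None, 3, None]
b = [None, 2, None, 4]

print(nullity_correlation(a, a))  # values always present together
print(nullity_correlation(a, b))  # one present exactly when the other is missing
```

A value near 1 means two columns tend to be present (or missing) together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.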

Sample

First rows
     BASIS_DY  SEQ  FMLY_RELTN_CD  DISPOS_DY   REG_DT               TELGRM_MAKE_DY
0    20150105  2    <NA>           <NA>        2015/01/06 11:23:03  <NA>
1    20150105  2    <NA>           <NA>        2015/01/06 11:23:02  <NA>
2    20141229  1    <NA>           <NA>        2014/12/31 11:04:32  20141231
3    20141229  1    <NA>           <NA>        2014/12/31 11:04:31  20141231
4    20141222  1    <NA>           20100531.0  2014/12/23 11:07:21  20141223
5    20141222  1    <NA>           <NA>        2014/12/23 11:07:00  20141223
6    20141222  1    <NA>           20070131.0  2014/12/23 11:07:00  20141223
7    20141222  1    <NA>           19980311.0  2014/12/23 11:07:00  20141223
8    20141222  1    <NA>           <NA>        2014/12/23 11:06:59  20141223
9    20141203  2    <NA>           <NA>        2014/12/04 11:13:35  20141204

Last rows
     BASIS_DY  SEQ  FMLY_RELTN_CD  DISPOS_DY   REG_DT               TELGRM_MAKE_DY
990  20100909  70   <NA>           20020416.0  2010/09/14 11:35:58  20100914
991  20100909  69   <NA>           <NA>        2010/09/14 11:35:58  20100914
992  20100909  68   <NA>           19960506.0  2010/09/14 11:35:58  20100914
993  20100909  67   <NA>           20070718.0  2010/09/14 11:35:58  20100914
994  20100909  66   <NA>           <NA>        2010/09/14 11:35:58  20100914
995  20100909  65   <NA>           <NA>        2010/09/14 11:35:58  20100914
996  20100909  64   <NA>           20021002.0  2010/09/14 11:35:58  20100914
997  20100909  63   <NA>           20041213.0  2010/09/14 11:35:58  20100914
998  20100909  62   <NA>           20080509.0  2010/09/14 11:35:58  20100914
999  20100902  0    1              20000306.0  2010/09/07 11:59:43  20100907

Duplicate rows

Most frequently occurring

    BASIS_DY  SEQ  FMLY_RELTN_CD  DISPOS_DY   REG_DT               TELGRM_MAKE_DY  # duplicates
23  20130624  3    <NA>           19950829.0  2013/06/25 11:29:19  20130625        5
0   20110127  15   <NA>           20040503.0  2011/01/31 15:55:02  20110131        3
3   20110210  37   <NA>           <NA>        2011/02/14 11:08:20  20110215        3
5   20110210  37   <NA>           <NA>        2011/02/14 11:08:22  20110215        3
6   20110221  17   <NA>           <NA>        2011/02/22 18:33:18  20110222        3
7   20110608  15   <NA>           <NA>        2011/06/09 11:10:56  20110609        3
10  20111019  4    <NA>           <NA>        2011/10/20 11:32:19  20111024        3
1   20110210  32   <NA>           <NA>        2011/02/14 11:08:24  20110215        2
2   20110210  37   <NA>           <NA>        2011/02/14 11:08:19  20110215        2
4   20110210  37   <NA>           <NA>        2011/02/14 11:08:21  20110215        2
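
A table like the one above can be produced by grouping identical rows and counting how often each occurs. The following is a minimal sketch using only the standard library. The rows are hypothetical tuples with a reduced set of columns, and `duplicate_groups` is an illustrative name rather than part of the profiling tool.

```python
from collections import Counter

# Hypothetical rows as (BASIS_DY, SEQ, REG_DT) tuples; illustrative only.
rows = [
    (20130624, 3, "2013/06/25 11:29:19"),
    (20130624, 3, "2013/06/25 11:29:19"),
    (20110127, 15, "2011/01/31 15:55:02"),
    (20141229, 1, "2014/12/31 11:04:32"),
    (20110127, 15, "2011/01/31 15:55:02"),
    (20130624, 3, "2013/06/25 11:29:19"),
]


def duplicate_groups(records):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(records)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)


groups = duplicate_groups(rows)
for row, n in groups:
    print(row, n)
```

Rows count as duplicates here only when every field matches exactly, so two records that differ by one second in REG_DT form separate groups, as several rows in the table above do.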