Overview

Dataset statistics

Number of variables: 11
Number of observations: 1000
Missing cells: 619
Missing cells (%): 5.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 93.9 KiB
Average record size in memory: 96.1 B
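
The missing-cell percentage can be checked by hand from the counts above. The short sketch below assumes the percentage is taken over all cells (variables × observations); the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 11
n_observations = 1000
missing_cells = 619

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing = {missing_pct:.1f}%")
```

Under that assumption the result rounds to the 5.6% shown in the report.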

Variable types

Numeric: 3
Categorical: 7
DateTime: 1

Dataset

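Description: Public data related to the work of the Claims Management Department (채권관리부) of the Korea Housing Finance Corporation (source data that can be released from the databases supporting that department's work)
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15072934/fileData.do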

Alerts

ACCT_DEPT_CD is highly overall correlated with REG_BR_CD [High correlation]
REG_BR_CD is highly overall correlated with ACCT_DEPT_CD [High correlation]
DPOSIT_ORGN_RCV_DY is highly overall correlated with DPOSIT_PRCSS_SEQ and 2 other fields [High correlation]
DPOSIT_PRCSS_SEQ is highly overall correlated with DPOSIT_ORGN_RCV_DY [High correlation]
DPOSIT_CLLCT_CANCEL_SEQ is highly overall correlated with DPOSIT_ORGN_RCV_DY [High correlation]
DPOSIT_CLLCT_CRRCT_SEQ is highly overall correlated with DPOSIT_ORGN_RCV_DY [High correlation]
DPOSIT_SEQ is highly imbalanced (82.0%) [Imbalance]
DPOSIT_PRCSS_SEQ is highly imbalanced (94.7%) [Imbalance]
UPDT_APPLY_DY is highly imbalanced (98.9%) [Imbalance]
DPOSIT_CLLCT_CANCEL_SEQ is highly imbalanced (94.6%) [Imbalance]
DPOSIT_CLLCT_CRRCT_SEQ is highly imbalanced (92.6%) [Imbalance]
DPOSIT_ORGN_RCV_DY has 619 (61.9%) missing values [Missing]
REG_ENO is highly skewed (γ1 = 29.78769685) [Skewed]
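
The imbalance percentages in the alerts are consistent with a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below is an assumption about how such a score can be computed, not a copy of the profiler's code; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k), with H the Shannon entropy (bits) of the class counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single-class column is scored as 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Class counts taken from the DPOSIT_SEQ and DPOSIT_PRCSS_SEQ sections of this report.
dposit_seq = imbalance_score([944, 43, 12, 1])
dposit_prcss_seq = imbalance_score([994, 6])
print(f"DPOSIT_SEQ: {dposit_seq:.1%}")
print(f"DPOSIT_PRCSS_SEQ: {dposit_prcss_seq:.1%}")
```

With these counts the scores come out near 0.820 and 0.947, which agrees with the 82.0% and 94.7% reported in the alerts.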

Reproduction

Analysis started: 2023-12-13 00:15:30.383383
Analysis finished: 2023-12-13 00:15:31.834217
Duration: 1.45 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

MDBTR_CUST_NO
Real number (ℝ)

Distinct: 886
Distinct (%): 88.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 91453628
Minimum: 86118
Maximum: 1.3603451 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 86118
5-th percentile: 38360015
Q1: 78895418
Median: 94808299
Q3: 1.1019683 × 10^8
95-th percentile: 1.2280051 × 10^8
Maximum: 1.3603451 × 10^8
Range: 1.3594839 × 10^8
Interquartile range (IQR): 31301411

Descriptive statistics

Standard deviation: 24425481
Coefficient of variation (CV): 0.2670805
Kurtosis: 0.91500053
Mean: 91453628
Median Absolute Deviation (MAD): 15492232
Skewness: -0.96151942
Sum: 9.1453628 × 10^10
Variance: 5.9660411 × 10^14
Monotonicity: Not monotonic
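
Several of the figures above are derived from others, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation and variance from the displayed values; because the report rounds to about eight significant digits, small differences in the last digit are expected. The variable names are illustrative.

```python
# Values copied from the MDBTR_CUST_NO statistics (rounded as displayed in the report).
minimum = 86118
maximum = 136034509      # exact maximum, as it appears in the list of largest values
q1 = 78895418
q3 = 110196830           # displayed as 1.1019683 × 10^8
mean = 91453628
std = 24425481

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr, round(cv, 7), f"{variance:.7e}")
```

The recomputed IQR may differ from the reported 31301411 by one unit, since Q3 is only available here in rounded form.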
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
93599789            4      0.4%
117767895           4      0.4%
118995297           4      0.4%
89418683            3      0.3%
40744164            3      0.3%
35453541            3      0.3%
117676122           3      0.3%
78191995            3      0.3%
128336338           3      0.3%
88164352            3      0.3%
Other values (876)  967    96.7%

Minimum 10 values

Value     Count  Frequency (%)
86118     1      0.1%
8975865   1      0.1%
14384482  3      0.3%
14470383  1      0.1%
14929656  3      0.3%
15360874  1      0.1%
15802523  1      0.1%
17194585  1      0.1%
18716043  2      0.2%
19006723  1      0.1%

Maximum 10 values

Value      Count  Frequency (%)
136034509  1      0.1%
130043473  1      0.1%
129907119  1      0.1%
129398236  1      0.1%
128943598  1      0.1%
128881391  1      0.1%
128690087  2      0.2%
128336338  3      0.3%
128236111  1      0.1%
128050287  1      0.1%

DPOSIT_SEQ
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
1      944
2      43
3      12
4      1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      944    94.4%
2      43     4.3%
3      12     1.2%
4      1      0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

DPOSIT_PRCSS_SEQ
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
1      994
2      6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      994    99.4%
2      6      0.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

UPDT_APPLY_DY
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value     Count
<NA>      999
20200901  1

Length

Max length: 8
Median length: 4
Mean length: 4.004
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value     Count  Frequency (%)
<NA>      999    99.9%
20200901  1      0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Value     Count  Frequency (%)
na        999    99.9%
20200901  1      0.1%

DPOSIT_CLLCT_CANCEL_SEQ
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
0      990
1      9
<NA>   1

Length

Max length: 4
Median length: 1
Mean length: 1.003
Min length: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      990    99.0%
1      9      0.9%
<NA>   1      0.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Value  Count  Frequency (%)
0      990    99.0%
1      9      0.9%
na     1      0.1%

DPOSIT_CLLCT_CRRCT_SEQ
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
0      991
1      9

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      991    99.1%
1      9      0.9%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

ACCT_DEPT_CD
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value              Count
THA                129
TAA                93
QAD                89
THO                86
TAC                83
Other values (21)  520

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: TAB
2nd row: TAA
3rd row: THB
4th row: THB
5th row: THB

Common Values

Value              Count  Frequency (%)
THA                129    12.9%
TAA                93     9.3%
QAD                89     8.9%
THO                86     8.6%
TAC                83     8.3%
THB                73     7.3%
TAB                56     5.6%
TAD                50     5.0%
TBA                48     4.8%
TPA                44     4.4%
Other values (16)  249    24.9%

Length

[Figure: Histogram of lengths of the category]

Value              Count  Frequency (%)
tha                129    12.9%
taa                93     9.3%
qad                89     8.9%
tho                86     8.6%
tac                83     8.3%
thb                73     7.3%
tab                56     5.6%
tad                50     5.0%
tba                48     4.8%
tpa                44     4.4%
Other values (16)  249    24.9%

REG_TS
Date

Distinct: 982
Distinct (%): 98.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2020-04-29 10:49:22
Maximum: 2020-10-30 10:54:08
[Figure: Histogram with fixed size bins (bins=50)]

REG_ENO
Real number (ℝ)

SKEWED 

Distinct: 81
Distinct (%): 8.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1821.732
Minimum: 1253
Maximum: 53620
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1253
5-th percentile: 1487
Q1: 1606
Median: 1823
Q3: 1883
95-th percentile: 1958
Maximum: 53620
Range: 52367
Interquartile range (IQR): 277

Descriptive statistics

Standard deviation: 1674.082
Coefficient of variation (CV): 0.91895077
Kurtosis: 920.17688
Mean: 1821.732
Median Absolute Deviation (MAD): 127
Skewness: 29.787697
Sum: 1821732
Variance: 2802550.6
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value              Count  Frequency (%)
1842               87     8.7%
1592               68     6.8%
1897               59     5.9%
1690               50     5.0%
1872               46     4.6%
1696               45     4.5%
1590               37     3.7%
1487               37     3.7%
1883               32     3.2%
1823               31     3.1%
Other values (71)  508    50.8%

Minimum 10 values

Value  Count  Frequency (%)
1253   1      0.1%
1339   12     1.2%
1348   1      0.1%
1406   6      0.6%
1455   4      0.4%
1476   7      0.7%
1487   37     3.7%
1513   4      0.4%
1518   8      0.8%
1520   12     1.2%

Maximum 10 values

Value  Count  Frequency (%)
53620  1      0.1%
6002   5      0.5%
2002   1      0.1%
2000   4      0.4%
1987   8      0.8%
1980   4      0.4%
1978   21     2.1%
1973   4      0.4%
1958   3      0.3%
1937   9      0.9%
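
The very large skewness of REG_ENO appears to come from a handful of values far above the bulk of the data. One common way to flag such values is Tukey's 1.5 × IQR rule; this is a technique brought in here for illustration, not something the report itself applies. The sketch uses the quartiles and the largest values listed for REG_ENO.

```python
# Quartiles taken from the REG_ENO quantile statistics.
q1, q3 = 1606, 1883
iqr = q3 - q1                      # matches the reported IQR of 277

# Tukey fences: values outside [lower_fence, upper_fence] are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# (value, count) pairs copied from the largest REG_ENO values in this report.
largest = [(53620, 1), (6002, 5), (2002, 1), (2000, 4), (1987, 8)]
outliers = [(value, count) for value, count in largest if value > upper_fence]

print(lower_fence, upper_fence, outliers)
```

Under this rule only 53620 and 6002 (six records in total) fall above the upper fence, and the reported minimum of 1253 lies above the lower fence.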

REG_BR_CD
Categorical

HIGH CORRELATION 

Distinct: 27
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value              Count
THA                129
TAA                93
QAD                89
THO                86
TAC                83
Other values (22)  520

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 2
Unique (%): 0.2%

Sample

1st row: TAB
2nd row: TAA
3rd row: THB
4th row: THB
5th row: THB

Common Values

Value              Count  Frequency (%)
THA                129    12.9%
TAA                93     9.3%
QAD                89     8.9%
THO                86     8.6%
TAC                83     8.3%
THB                73     7.3%
TAB                56     5.6%
TAD                50     5.0%
TBA                48     4.8%
TPA                44     4.4%
Other values (17)  249    24.9%

Length

[Figure: Histogram of lengths of the category]

Value              Count  Frequency (%)
tha                129    12.9%
taa                93     9.3%
qad                89     8.9%
tho                86     8.6%
tac                83     8.3%
thb                73     7.3%
tab                56     5.6%
tad                50     5.0%
tba                48     4.8%
tpa                44     4.4%
Other values (17)  249    24.9%

DPOSIT_ORGN_RCV_DY
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 111
Distinct (%): 29.1%
Missing: 619
Missing (%): 61.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 20200767
Minimum: 20200429
Maximum: 20201028
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20200429
5-th percentile: 20200514
Q1: 20200623
Median: 20200728
Q3: 20200915
95-th percentile: 20201020
Maximum: 20201028
Range: 599
Interquartile range (IQR): 292

Descriptive statistics

Standard deviation: 163.38713
Coefficient of variation (CV): 8.0881645 × 10^-6
Kurtosis: -1.1361828
Mean: 20200767
Median Absolute Deviation (MAD): 124
Skewness: -0.0036151564
Sum: 7.6964923 × 10^9
Variance: 26695.354
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
20200924            13     1.3%
20200728            12     1.2%
20200703            10     1.0%
20200720            9      0.9%
20200731            9      0.9%
20200721            8      0.8%
20201008            8      0.8%
20200630            8      0.8%
20200608            7      0.7%
20201021            7      0.7%
Other values (101)  290    29.0%
(Missing)           619    61.9%

Minimum 10 values

Value     Count  Frequency (%)
20200429  3      0.3%
20200506  2      0.2%
20200507  5      0.5%
20200508  2      0.2%
20200511  3      0.3%
20200512  2      0.2%
20200513  1      0.1%
20200514  4      0.4%
20200515  2      0.2%
20200518  3      0.3%

Maximum 10 values

Value     Count  Frequency (%)
20201028  2      0.2%
20201027  5      0.5%
20201026  1      0.1%
20201022  1      0.1%
20201021  7      0.7%
20201020  5      0.5%
20201019  1      0.1%
20201016  2      0.2%
20201014  1      0.1%
20201013  4      0.4%
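
DPOSIT_ORGN_RCV_DY is profiled as a real number, but its values look like calendar dates written as YYYYMMDD integers, so numeric statistics such as the range of 599 do not measure elapsed time. The sketch below assumes that encoding and converts the reported minimum and maximum into dates; `parse_yyyymmdd` is a hypothetical helper name.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert an integer such as 20200429 into a date (assumes YYYYMMDD encoding)."""
    return datetime.strptime(str(int(value)), "%Y%m%d").date()

first = parse_yyyymmdd(20200429)   # reported minimum
last = parse_yyyymmdd(20201028)    # reported maximum

numeric_range = 20201028 - 20200429        # what the profiler reports as Range
calendar_days = (last - first).days        # actual elapsed days between the two dates

print(numeric_range, calendar_days)
```

Parsing the column as a date before profiling would let it be summarized like REG_TS instead of as a number.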

Interactions

[Figures: 9 interaction plots]

Correlations

Variable                 MDBTR_CUST_NO  DPOSIT_SEQ  DPOSIT_PRCSS_SEQ  DPOSIT_CLLCT_CANCEL_SEQ  DPOSIT_CLLCT_CRRCT_SEQ  ACCT_DEPT_CD  REG_ENO  REG_BR_CD  DPOSIT_ORGN_RCV_DY
MDBTR_CUST_NO            1.000          0.000       0.132             0.101                    0.101                   0.311         0.139    0.305      0.000
DPOSIT_SEQ               0.000          1.000       0.000             0.000                    0.000                   0.069         0.000    0.033      0.000
DPOSIT_PRCSS_SEQ         0.132          0.000       1.000             0.000                    0.000                   0.409         0.000    0.376      NaN
DPOSIT_CLLCT_CANCEL_SEQ  0.101          0.000       0.000             1.000                    0.000                   0.321         0.000    0.321      NaN
DPOSIT_CLLCT_CRRCT_SEQ   0.101          0.000       0.000             0.000                    1.000                   0.321         0.000    0.295      NaN
ACCT_DEPT_CD             0.311          0.069       0.409             0.321                    0.321                   1.000         0.000    1.000      0.566
REG_ENO                  0.139          0.000       0.000             0.000                    0.000                   0.000         1.000    0.000      0.164
REG_BR_CD                0.305          0.033       0.376             0.321                    0.295                   1.000         0.000    1.000      0.568
DPOSIT_ORGN_RCV_DY       0.000          0.000       NaN               NaN                      NaN                     0.566         0.164    0.568      1.000

Variable                 DPOSIT_CLLCT_CRRCT_SEQ  DPOSIT_SEQ  DPOSIT_CLLCT_CANCEL_SEQ  UPDT_APPLY_DY  ACCT_DEPT_CD  DPOSIT_PRCSS_SEQ  REG_BR_CD
DPOSIT_CLLCT_CRRCT_SEQ   1.000                   0.000       0.000                    NaN            0.252         0.000             0.250
DPOSIT_SEQ               0.000                   1.000       0.000                    NaN            0.036         0.000             0.016
DPOSIT_CLLCT_CANCEL_SEQ  0.000                   0.000       1.000                    NaN            0.252         0.000             0.252
UPDT_APPLY_DY            NaN                     NaN         NaN                      1.000          NaN           NaN               NaN
ACCT_DEPT_CD             0.252                   0.036       0.252                    NaN            1.000         0.321             0.999
DPOSIT_PRCSS_SEQ         0.000                   0.000       0.000                    NaN            0.321         1.000             0.320
REG_BR_CD                0.250                   0.016       0.252                    NaN            0.999         0.320             1.000

Variable                 MDBTR_CUST_NO  REG_ENO  DPOSIT_ORGN_RCV_DY  DPOSIT_SEQ  DPOSIT_PRCSS_SEQ  UPDT_APPLY_DY  DPOSIT_CLLCT_CANCEL_SEQ  DPOSIT_CLLCT_CRRCT_SEQ  ACCT_DEPT_CD  REG_BR_CD
MDBTR_CUST_NO            1.000          0.018    -0.035              0.000       0.101             NaN            0.077                    0.077                   0.123         0.121
REG_ENO                  0.018          1.000    -0.115              0.000       0.000             NaN            0.000                    0.000                   0.000         0.000
DPOSIT_ORGN_RCV_DY       -0.035         -0.115   1.000               0.000       1.000             0.000          1.000                    1.000                   0.231         0.232
DPOSIT_SEQ               0.000          0.000    0.000               1.000       0.000             NaN            0.000                    0.000                   0.036         0.016
DPOSIT_PRCSS_SEQ         0.101          0.000    1.000               0.000       1.000             NaN            0.000                    0.000                   0.321         0.320
UPDT_APPLY_DY            NaN            NaN      0.000               NaN         NaN               1.000          NaN                      NaN                     NaN           NaN
DPOSIT_CLLCT_CANCEL_SEQ  0.077          0.000    1.000               0.000       0.000             NaN            1.000                    0.000                   0.252         0.252
DPOSIT_CLLCT_CRRCT_SEQ   0.077          0.000    1.000               0.000       0.000             NaN            0.000                    1.000                   0.252         0.250
ACCT_DEPT_CD             0.123          0.000    0.231               0.036       0.321             NaN            0.252                    0.252                   1.000         0.999
REG_BR_CD                0.121          0.000    0.232               0.016       0.320             NaN            0.252                    0.250                   0.999         1.000
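
The near-perfect association between ACCT_DEPT_CD and REG_BR_CD (0.999 to 1.000 above) is what one would expect if the two codes are almost always identical. This text does not say which association measure each matrix uses, so the sketch below uses plain (uncorrected) Cramér's V purely as an illustration; `cramers_v` is a hypothetical helper, and the inputs are the two code columns from the first ten rows of the Sample section.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    k = min(len(row_totals), len(col_totals))
    if n == 0 or k < 2:
        return float("nan")  # undefined when either variable is constant
    chi2 = 0.0
    # Sum over every cell of the contingency table, including empty cells.
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

# ACCT_DEPT_CD and REG_BR_CD values from the first ten sample rows of this report.
acct_dept_cd = ["TAB", "TAA", "THB", "THB", "THB", "THB", "TOA", "TNA", "TNA", "THB"]
reg_br_cd = list(acct_dept_cd)  # identical to ACCT_DEPT_CD in every sampled row
v = cramers_v(acct_dept_cd, reg_br_cd)
print(round(v, 3))
```

When two columns carry the same labels row for row, the statistic reaches its maximum of 1, which suggests one of the two fields is redundant for modelling purposes.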

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

Row  MDBTR_CUST_NO  DPOSIT_SEQ  DPOSIT_PRCSS_SEQ  UPDT_APPLY_DY  DPOSIT_CLLCT_CANCEL_SEQ  DPOSIT_CLLCT_CRRCT_SEQ  ACCT_DEPT_CD  REG_TS               REG_ENO  REG_BR_CD  DPOSIT_ORGN_RCV_DY
0    73382673       1           1                 <NA>           0                        0                       TAB           2020/10/30 10:54:08  1696     TAB        <NA>
1    81003672       1           1                 <NA>           0                        0                       TAA           2020/10/29 15:57:20  1823     TAA        <NA>
2    47014897       1           1                 <NA>           0                        0                       THB           2020/10/29 13:44:39  1592     THB        <NA>
3    110591385      1           1                 <NA>           0                        0                       THB           2020/10/29 13:40:31  1592     THB        <NA>
4    52326582       1           1                 <NA>           0                        0                       THB           2020/10/29 13:38:20  1592     THB        <NA>
5    81276894       1           1                 <NA>           0                        0                       THB           2020/10/29 13:35:19  1592     THB        <NA>
6    83737441       1           1                 <NA>           0                        0                       TOA           2020/10/29 12:06:25  1520     TOA        <NA>
7    78578767       1           1                 <NA>           0                        0                       TNA           2020/10/29 10:20:07  1648     TNA        <NA>
8    98281241       1           1                 <NA>           0                        0                       TNA           2020/10/29 10:17:41  1648     TNA        <NA>
9    93950386       1           1                 <NA>           0                        0                       THB           2020/10/28 15:05:56  1592     THB        20201028

Last rows

Row  MDBTR_CUST_NO  DPOSIT_SEQ  DPOSIT_PRCSS_SEQ  UPDT_APPLY_DY  DPOSIT_CLLCT_CANCEL_SEQ  DPOSIT_CLLCT_CRRCT_SEQ  ACCT_DEPT_CD  REG_TS               REG_ENO  REG_BR_CD  DPOSIT_ORGN_RCV_DY
990  109597808      2           1                 <NA>           0                        0                       TLB           2020/04/29 14:38:56  1476     TLB        20200610
991  109597808      1           1                 <NA>           0                        0                       TLB           2020/04/29 14:38:56  1476     TLB        20200610
992  88128877       1           1                 <NA>           0                        0                       TAA           2020/04/29 14:35:38  1823     TAA        20200521
993  80898451       1           1                 <NA>           0                        0                       THB           2020/04/29 14:14:07  1455     THB        20200429
994  52326582       1           1                 <NA>           0                        0                       THB           2020/04/29 14:13:39  1455     THB        20200429
995  82621176       1           1                 <NA>           0                        0                       TAA           2020/04/29 13:52:21  1883     TAA        20200507
996  77761711       1           1                 <NA>           0                        0                       TAC           2020/04/29 13:48:04  1590     TAC        20200429
997  118995297      2           1                 <NA>           0                        0                       TQB           2020/04/29 11:34:39  1817     TQB        20200506
998  118995297      1           1                 <NA>           0                        0                       TQB           2020/04/29 11:34:39  1817     TQB        20200506
999  74434957       1           1                 <NA>           0                        0                       TNA           2020/04/29 10:49:22  1915     TNA        20200508