Overview

Dataset statistics

Number of variables: 10
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 86.1 KiB
Average record size in memory: 88.1 B

Variable types

Text: 1
Numeric: 4
Categorical: 4
DateTime: 1

Dataset

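Description: Public open data from the Housing Pension Department of the Korea Housing Finance Corporation, relating to its approval-information work (source data from the department's operational database that may be disclosed publicly).
Author: 한국주택금융공사 (Korea Housing Finance Corporation)
URL: https://www.data.go.kr/data/15072953/fileData.do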

Alerts

APPRV_DVCD is highly overall correlated with CANCEL_DY (High correlation)
INSP_PP_KIND_CD is highly overall correlated with REG_ENO and 1 other field (High correlation)
CANCEL_DY is highly overall correlated with INDIV_SEQ and 5 other fields (High correlation)
INDIV_SEQ is highly overall correlated with REG_ENO and 1 other field (High correlation)
APPRV_DY is highly overall correlated with APPRV_NOTI_DY and 1 other field (High correlation)
APPRV_NOTI_DY is highly overall correlated with APPRV_DY and 1 other field (High correlation)
REG_ENO is highly overall correlated with INDIV_SEQ and 2 other fields (High correlation)
APPRV_DVCD is highly imbalanced (97.1%) (Imbalance)
CANCEL_DY is highly imbalanced (98.0%) (Imbalance)
INDIV_SEQ has 712 (71.2%) zeros (Zeros)
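
The two imbalance percentages are consistent with a normalized-entropy score: one minus the Shannon entropy of the class frequencies, divided by the maximum entropy possible for that number of classes. The report does not state its formula, so the definition below is an assumption, and the function name `imbalance_score` is illustrative only. The class counts come from the Common Values tables for APPRV_DVCD and CANCEL_DY.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/H_max for a list of class counts (values near 1 mean one class dominates)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single-class column gets a score of 0
    # Shannon entropy (base 2) of the observed class frequencies
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(n_classes)

apprv_dvcd = [997, 3]      # counts of codes 1 and 3
cancel_dy = [997, 2, 1]    # counts of <NA>, 20201023, 20201020

print(f"{imbalance_score(apprv_dvcd):.1%}")  # 97.1%
print(f"{imbalance_score(cancel_dy):.1%}")   # 98.0%
```

Under this definition both alert figures are reproduced from the counts alone, which supports the reading but does not confirm it.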

Reproduction

Analysis started: 2023-12-12 18:37:56.441751
Analysis finished: 2023-12-12 18:38:01.969463
Duration: 5.53 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

GUARNT_NO
Text

Distinct: 788
Distinct (%): 78.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 14000
Distinct characters: 24
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
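
As a small illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for one identifier taken from the sample rows. It only approximates what the report computes, and it does not reproduce the script or block breakdowns.

```python
import unicodedata
from collections import Counter

value = "RTHB2020000607"  # first sample row of this variable

# General category per character: "Lu" = Uppercase Letter, "Nd" = Decimal Number
categories = Counter(unicodedata.category(ch) for ch in value)

print(len(value))        # 14
print(dict(categories))  # {'Lu': 4, 'Nd': 10}
```

If all 1000 identifiers follow this pattern of 4 letters and 10 digits, that would account for the 4000 uppercase letters and 10000 decimal digits reported.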

Unique

Unique: 577
Unique (%): 57.7%
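
The gap between the distinct count (788) and the unique count (577) follows from how the two are usually defined: distinct counts every different value once, while unique counts only the values that occur exactly once. A minimal sketch using the five rows listed under Sample, assuming those definitions:

```python
from collections import Counter

rows = [
    "RTHB2020000607",
    "RTHB2020000607",
    "RTBA2015000097",
    "RTAA2019000046",
    "RTAA2013000232",
]

counts = Counter(rows)
distinct = len(counts)                              # number of different values
unique = sum(1 for n in counts.values() if n == 1)  # values seen exactly once

print(distinct, unique)  # 4 3
```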

Sample

1st row: RTHB2020000607
2nd row: RTHB2020000607
3rd row: RTBA2015000097
4th row: RTAA2019000046
5th row: RTAA2013000232

Value               Count  Frequency (%)
rtab2018000639      3      0.3%
rtab2020000777      2      0.2%
rtba2020000647      2      0.2%
rtad2020000629      2      0.2%
rtab2020000767      2      0.2%
rtad2020000640      2      0.2%
rtac2020000730      2      0.2%
rtaa2020000611      2      0.2%
rtaa2020000570      2      0.2%
rtab2020000801      2      0.2%
Other values (778)  979    97.9%

Most occurring characters

Value              Count  Frequency (%)
0                  5147   36.8%
2                  2227   15.9%
R                  1002   7.2%
T                  909    6.5%
A                  886    6.3%
6                  575    4.1%
7                  406    2.9%
5                  355    2.5%
1                  349    2.5%
B                  338    2.4%
Other values (14)  1806   12.9%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    10000  71.4%
Uppercase Letter  4000   28.6%

Most frequent character per category

Uppercase Letter
Value             Count  Frequency (%)
R                 1002   25.1%
T                 909    22.7%
A                 886    22.1%
B                 338    8.5%
H                 221    5.5%
D                 187    4.7%
Q                 125    3.1%
C                 94     2.4%
O                 75     1.9%
P                 69     1.7%
Other values (4)  94     2.4%

Decimal Number
Value  Count  Frequency (%)
0      5147   51.5%
2      2227   22.3%
6      575    5.8%
7      406    4.1%
5      355    3.5%
1      349    3.5%
4      272    2.7%
3      246    2.5%
8      212    2.1%
9      211    2.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  71.4%
Latin   4000   28.6%

Most frequent character per script

Latin
Value             Count  Frequency (%)
R                 1002   25.1%
T                 909    22.7%
A                 886    22.1%
B                 338    8.5%
H                 221    5.5%
D                 187    4.7%
Q                 125    3.1%
C                 94     2.4%
O                 75     1.9%
P                 69     1.7%
Other values (4)  94     2.4%

Common
Value  Count  Frequency (%)
0      5147   51.5%
2      2227   22.3%
6      575    5.8%
7      406    4.1%
5      355    3.5%
1      349    3.5%
4      272    2.7%
3      246    2.5%
8      212    2.1%
9      211    2.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  14000  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
0                  5147   36.8%
2                  2227   15.9%
R                  1002   7.2%
T                  909    6.5%
A                  886    6.3%
6                  575    4.1%
7                  406    2.9%
5                  355    2.5%
1                  349    2.5%
B                  338    2.4%
Other values (14)  1806   12.9%

INDIV_SEQ
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 12
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.485
Minimum: 0
Maximum: 36
Zeros: 712
Zeros (%): 71.2%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 2
Maximum: 36
Range: 36
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.0717305
Coefficient of variation (CV): 4.2716092
Kurtosis: 232.01665
Mean: 0.485
Median Absolute Deviation (MAD): 0
Skewness: 14.288464
Sum: 485
Variance: 4.2920671
Monotonicity: Not monotonic
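
The descriptive statistics above can be cross-checked against the value counts reported for this variable. The sketch below expands the frequency table into raw values and recomputes a few figures with the standard library. It assumes the report uses the sample (n - 1) variance, which is the convention that matches the printed numbers.

```python
import statistics

# value -> count, taken from the frequency tables for INDIV_SEQ
freq = {0: 712, 1: 237, 2: 29, 3: 11, 4: 3, 6: 2,
        7: 1, 9: 1, 12: 1, 34: 1, 35: 1, 36: 1}

# Expand the table back into 1000 individual observations
values = [v for v, n in freq.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1)
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation
zeros_pct = 100 * freq[0] / len(values)

print(len(values), sum(values))         # 1000 485
print(mean, variance, cv, zeros_pct)
```

The three values 34, 35 and 36 are what drive the large skewness and kurtosis: without them the maximum would be 12.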
Histogram with fixed size bins (bins=12)

Common values

Value             Count  Frequency (%)
0                 712    71.2%
1                 237    23.7%
2                 29     2.9%
3                 11     1.1%
4                 3      0.3%
6                 2      0.2%
7                 1      0.1%
36                1      0.1%
35                1      0.1%
34                1      0.1%
Other values (2)  2      0.2%

Minimum 10 values

Value  Count  Frequency (%)
0      712    71.2%
1      237    23.7%
2      29     2.9%
3      11     1.1%
4      3      0.3%
6      2      0.2%
7      1      0.1%
9      1      0.1%
12     1      0.1%
34     1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
36     1      0.1%
35     1      0.1%
34     1      0.1%
12     1      0.1%
9      1      0.1%
7      1      0.1%
6      2      0.2%
4      3      0.3%
3      11     1.1%
2      29     2.9%

INSP_PP_KIND_CD
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
1      712
<NA>   208
2      80

Length

Max length: 4
Median length: 1
Mean length: 1.624
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: <NA>
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
1      712    71.2%
<NA>   208    20.8%
2      80     8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      712    71.2%
na     208    20.8%
2      80     8.0%

APPRV_DY
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20201009
Minimum: 20200928
Maximum: 20201026
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20200928
5-th percentile: 20200929
Q1: 20201008
median: 20201015
Q3: 20201020
95-th percentile: 20201023
Maximum: 20201026
Range: 98
Interquartile range (IQR): 12.25

Descriptive statistics

Standard deviation: 23.180595
Coefficient of variation (CV): 1.1474968 × 10^-6
Kurtosis: 7.72704
Mean: 20201009
Median Absolute Deviation (MAD): 6
Skewness: -2.9981353
Sum: 2.0201009 × 10^10
Variance: 537.33997
Monotonicity: Decreasing
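
Because APPRV_DY appears to store dates as YYYYMMDD integers but is profiled as a real number, figures such as the range of 98 are arithmetic on the encoded integers rather than elapsed days. A minimal sketch of decoding the reported minimum and maximum with the standard library, assuming the YYYYMMDD reading holds (the helper name `parse_yyyymmdd` is illustrative):

```python
from datetime import datetime

def parse_yyyymmdd(n):
    """Decode an integer such as 20201026 into a date."""
    return datetime.strptime(str(n), "%Y%m%d").date()

lo, hi = 20200928, 20201026  # reported minimum and maximum

integer_range = hi - lo                                     # what the report calls "Range"
elapsed_days = (parse_yyyymmdd(hi) - parse_yyyymmdd(lo)).days

print(integer_range)  # 98
print(elapsed_days)   # 28
```

The same caveat would apply to the mean, standard deviation, skewness and the other moments reported for this variable.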
Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
20201015          88     8.8%
20201023          85     8.5%
20201022          80     8.0%
20201013          79     7.9%
20201020          74     7.4%
20201016          72     7.2%
20201019          69     6.9%
20201021          64     6.4%
20201012          63     6.3%
20201007          60     6.0%
Other values (7)  266    26.6%

Minimum 10 values

Value     Count  Frequency (%)
20200928  24     2.4%
20200929  48     4.8%
20201005  26     2.6%
20201006  51     5.1%
20201007  60     6.0%
20201008  42     4.2%
20201012  63     6.3%
20201013  79     7.9%
20201014  54     5.4%
20201015  88     8.8%

Maximum 10 values

Value     Count  Frequency (%)
20201026  21     2.1%
20201023  85     8.5%
20201022  80     8.0%
20201021  64     6.4%
20201020  74     7.4%
20201019  69     6.9%
20201016  72     7.2%
20201015  88     8.8%
20201014  54     5.4%
20201013  79     7.9%

APPRV_DVCD
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
1      997
3      3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      997    99.7%
3      3      0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1      997    99.7%
3      3      0.3%

APPRV_NOTI_DY
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20201009
Minimum: 20200928
Maximum: 20201026
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 20200928
5-th percentile: 20200929
Q1: 20201008
median: 20201015
Q3: 20201020
95-th percentile: 20201023
Maximum: 20201026
Range: 98
Interquartile range (IQR): 12.25

Descriptive statistics

Standard deviation: 23.180595
Coefficient of variation (CV): 1.1474968 × 10^-6
Kurtosis: 7.72704
Mean: 20201009
Median Absolute Deviation (MAD): 6
Skewness: -2.9981353
Sum: 2.0201009 × 10^10
Variance: 537.33997
Monotonicity: Decreasing
Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
20201015          88     8.8%
20201023          85     8.5%
20201022          80     8.0%
20201013          79     7.9%
20201020          74     7.4%
20201016          72     7.2%
20201019          69     6.9%
20201021          64     6.4%
20201012          63     6.3%
20201007          60     6.0%
Other values (7)  266    26.6%

Minimum 10 values

Value     Count  Frequency (%)
20200928  24     2.4%
20200929  48     4.8%
20201005  26     2.6%
20201006  51     5.1%
20201007  60     6.0%
20201008  42     4.2%
20201012  63     6.3%
20201013  79     7.9%
20201014  54     5.4%
20201015  88     8.8%

Maximum 10 values

Value     Count  Frequency (%)
20201026  21     2.1%
20201023  85     8.5%
20201022  80     8.0%
20201021  64     6.4%
20201020  74     7.4%
20201019  69     6.9%
20201016  72     7.2%
20201015  88     8.8%
20201014  54     5.4%
20201013  79     7.9%

TEAM_CD
Categorical

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
<NA>   647
1      189
2      82
3      82

Length

Max length: 4
Median length: 4
Mean length: 2.941
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
<NA>   647    64.7%
1      189    18.9%
2      82     8.2%
3      82     8.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     647    64.7%
1      189    18.9%
2      82     8.2%
3      82     8.2%

CANCEL_DY
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value     Count
<NA>      997
20201023  2
20201020  1

Length

Max length: 8
Median length: 4
Mean length: 4.012
Min length: 4

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value     Count  Frequency (%)
<NA>      997    99.7%
20201023  2      0.2%
20201020  1      0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
na        997    99.7%
20201023  2      0.2%
20201020  1      0.1%

REG_ENO
Real number (ℝ)

HIGH CORRELATION 

Distinct: 93
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1297.536
Minimum: 1007
Maximum: 6018
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1007
5-th percentile: 1023
Q1: 1088
median: 1175
Q3: 1385
95-th percentile: 1937.05
Maximum: 6018
Range: 5011
Interquartile range (IQR): 297

Descriptive statistics

Standard deviation: 325.81898
Coefficient of variation (CV): 0.25110592
Kurtosis: 43.196059
Mean: 1297.536
Median Absolute Deviation (MAD): 94
Skewness: 3.8341379
Sum: 1297536
Variance: 106158
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value              Count  Frequency (%)
1175               80     8.0%
1086               68     6.8%
1104               67     6.7%
1023               59     5.9%
1279               51     5.1%
1224               49     4.9%
1037               49     4.9%
1088               48     4.8%
1194               27     2.7%
1113               25     2.5%
Other values (83)  477    47.7%

Minimum 10 values

Value  Count  Frequency (%)
1007   19     1.9%
1021   19     1.9%
1023   59     5.9%
1032   14     1.4%
1037   49     4.9%
1081   15     1.5%
1086   68     6.8%
1088   48     4.8%
1098   23     2.3%
1103   8      0.8%

Maximum 10 values

Value  Count  Frequency (%)
6018   1      0.1%
2003   1      0.1%
2001   11     1.1%
2000   3      0.3%
1982   1      0.1%
1980   1      0.1%
1977   10     1.0%
1973   2      0.2%
1970   9      0.9%
1968   3      0.3%

REG_TS
Date

Distinct: 715
Distinct (%): 71.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2020-09-28 15:38:29
Maximum: 2020-10-26 15:11:01
Histogram with fixed size bins (bins=50)
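
The sample rows at the end of the report show REG_TS values written as `2020/10/26 15:11:01`. A short sketch of parsing that format and measuring the span between the reported minimum and maximum follows. The format string is an assumption based on those sample rows, not something the report states.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"  # assumed from the sample rows

earliest = datetime.strptime("2020/09/28 15:38:29", FMT)
latest = datetime.strptime("2020/10/26 15:11:01", FMT)

span = latest - earliest
print(span)  # 27 days, 23:32:32
```

The span agrees with the four-week window seen in APPRV_DY and APPRV_NOTI_DY (20200928 to 20201026).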

Interactions


Correlations

                 INDIV_SEQ  INSP_PP_KIND_CD  APPRV_DY  APPRV_DVCD  APPRV_NOTI_DY  TEAM_CD  CANCEL_DY  REG_ENO
INDIV_SEQ        1.000      0.285            0.000     0.000       0.000          0.150    NaN        0.721
INSP_PP_KIND_CD  0.285      1.000            0.021     0.000       0.021          0.166    NaN        0.600
APPRV_DY         0.000      0.021            1.000     0.031       1.000          0.000    NaN        0.000
APPRV_DVCD       0.000      0.000            0.031     1.000       0.031          0.000    NaN        0.000
APPRV_NOTI_DY    0.000      0.021            1.000     0.031       1.000          0.000    NaN        0.000
TEAM_CD          0.150      0.166            0.000     0.000       0.000          1.000    NaN        0.125
CANCEL_DY        NaN        NaN              NaN       NaN         NaN            NaN      1.000      NaN
REG_ENO          0.721      0.600            0.000     0.000       0.000          0.125    NaN        1.000

                 APPRV_DVCD  TEAM_CD  INSP_PP_KIND_CD  CANCEL_DY
APPRV_DVCD       1.000       0.000    0.000            1.000
TEAM_CD          0.000       1.000    0.272            NaN
INSP_PP_KIND_CD  0.000       0.272    1.000            1.000
CANCEL_DY        1.000       NaN      1.000            1.000

                 INDIV_SEQ  APPRV_DY  APPRV_NOTI_DY  REG_ENO  INSP_PP_KIND_CD  APPRV_DVCD  TEAM_CD  CANCEL_DY
INDIV_SEQ        1.000      -0.016    -0.016         0.763    0.347            0.000       0.113    1.000
APPRV_DY         -0.016     1.000     1.000          0.052    0.051            0.024       0.040    1.000
APPRV_NOTI_DY    -0.016     1.000     1.000          0.052    0.051            0.024       0.040    1.000
REG_ENO          0.763      0.052     0.052          1.000    0.869            0.000       0.037    1.000
INSP_PP_KIND_CD  0.347      0.051     0.051          0.869    1.000            0.000       0.272    1.000
APPRV_DVCD       0.000      0.024     0.024          0.000    0.000            1.000       0.000    1.000
TEAM_CD          0.113      0.040     0.040          0.037    0.272            0.000       1.000    NaN
CANCEL_DY        1.000      1.000     1.000          1.000    1.000            1.000       NaN      1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
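
The report counts 0 missing cells even though three categorical variables contain the literal value `<NA>`, which suggests the placeholder was read as an ordinary string. The sketch below assumes `<NA>` does mark a missing value in the source file, which the report does not confirm. Under that reading it tallies what the missing-cell figures would be, using the counts from the Common Values tables.

```python
# Count of "<NA>" entries per variable, from the Common Values tables
placeholder_counts = {
    "INSP_PP_KIND_CD": 208,
    "TEAM_CD": 647,
    "CANCEL_DY": 997,
}

n_rows, n_cols = 1000, 10  # observations and variables from the overview

missing_cells = sum(placeholder_counts.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

for name, n in placeholder_counts.items():
    print(f"{name}: {n} ({100 * n / n_rows:.1f}%)")
print(f"total: {missing_cells} ({missing_pct:.1f}%)")
```

If that assumption is right, mapping the placeholder to a true null before profiling would let the missing-value plots above reflect these columns.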

Sample

First rows

     GUARNT_NO       INDIV_SEQ  INSP_PP_KIND_CD  APPRV_DY  APPRV_DVCD  APPRV_NOTI_DY  TEAM_CD  CANCEL_DY  REG_ENO  REG_TS
0    RTHB2020000607  0          1                20201026  1           20201026       <NA>     <NA>       1279     2020/10/26 15:11:01
1    RTHB2020000607  1          <NA>             20201026  1           20201026       2        <NA>       2001     2020/10/26 15:11:01
2    RTBA2015000097  1          2                20201026  1           20201026       2        <NA>       1901     2020/10/26 14:55:40
3    RTAA2019000046  2          2                20201026  1           20201026       1        <NA>       1799     2020/10/26 14:55:13
4    RTAA2013000232  7          2                20201026  1           20201026       1        <NA>       1799     2020/10/26 14:50:18
5    RTBA2020000667  0          1                20201026  1           20201026       <NA>     <NA>       1224     2020/10/26 14:24:28
6    RTBA2020000673  0          1                20201026  1           20201026       <NA>     <NA>       1224     2020/10/26 14:24:28
7    RTBA2020000665  0          1                20201026  1           20201026       <NA>     <NA>       1224     2020/10/26 14:24:27
8    RTBA2020000672  0          1                20201026  1           20201026       <NA>     <NA>       1224     2020/10/26 14:24:26
9    RTBA2020000655  0          1                20201026  1           20201026       <NA>     <NA>       1224     2020/10/26 14:24:25

Last rows

     GUARNT_NO       INDIV_SEQ  INSP_PP_KIND_CD  APPRV_DY  APPRV_DVCD  APPRV_NOTI_DY  TEAM_CD  CANCEL_DY  REG_ENO  REG_TS
990  RTHA2020000668  0          1                20200928  1           20200928       <NA>     <NA>       1175     2020/09/28 16:48:54
991  RTHA2020000670  0          1                20200928  1           20200928       <NA>     <NA>       1175     2020/09/28 16:48:53
992  RTHA2020000666  0          1                20200928  1           20200928       <NA>     <NA>       1175     2020/09/28 16:48:52
993  RTHA2020000652  0          1                20200928  1           20200928       <NA>     <NA>       1175     2020/09/28 16:48:52
994  RQAD2020000602  0          1                20200928  1           20200928       <NA>     <NA>       1023     2020/09/28 16:05:51
995  RQAD2020000602  1          <NA>             20200928  1           20200928       1        <NA>       1406     2020/09/28 16:05:51
996  RTBB2020000186  0          1                20200928  1           20200928       <NA>     <NA>       1032     2020/09/28 16:02:21
997  RTAA2020000522  0          1                20200928  1           20200928       1        <NA>       1316     2020/09/28 15:42:01
998  RTAA2020000522  1          <NA>             20200928  1           20200928       1        <NA>       1799     2020/09/28 15:42:01
999  RTBB2020000187  0          1                20200928  1           20200928       <NA>     <NA>       1032     2020/09/28 15:38:29