Overview

Dataset statistics

Number of variables: 7
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 8.0%
Total size in memory: 5.7 KiB
Average record size in memory: 58.3 B

Variable types

Text: 2
Categorical: 4
Numeric: 1

Dataset

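Description: Data on the DM (direct mail) dispatch records of the Korea Housing Finance Corporation, including fields such as dispatch date (발송일) and demand notice type (최고서종류). It is published under the open public data policy.
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15073300/fileData.do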

Alerts

Duplicates: Dataset has 8 (8.0%) duplicate rows
High correlation: 등록자사번 is highly overall correlated with 최고서종류 and 3 other fields
High correlation: 발송부점 is highly overall correlated with 최고서종류 and 2 other fields
High correlation: 최고서종류 is highly overall correlated with 최고일 and 2 other fields
High correlation: 발송일 is highly overall correlated with 최고일 and 2 other fields
High correlation: 최고일 is highly overall correlated with 최고서종류 and 2 other fields
Imbalance: 발송일 is highly imbalanced (89.8%)
Imbalance: 발송부점 is highly imbalanced (63.0%)
Imbalance: 등록자사번 is highly imbalanced (51.5%)
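
The imbalance percentages in the alerts can be reproduced with a short sketch. It assumes the score is one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy for that number of classes. The report does not state this definition, so treat it as an assumption; it does match all three reported figures when applied to the value counts listed for each column. The function name `imbalance` is illustrative only.

```python
import math


def imbalance(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no entropy to normalise
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Value counts copied from the common-values tables of this report.
counts_by_column = {
    "발송일": [98, 1, 1],
    "발송부점": [83, 6, 4, 4, 1, 1, 1],
    "등록자사번": [73, 9, 6, 4, 3, 2, 1, 1, 1],
}

scores = {name: round(100 * imbalance(c), 1) for name, c in counts_by_column.items()}
for name, score in scores.items():
    print(f"{name}: {score}%")
```

A score near 100% means one value dominates the column, and a score near 0% means the values are evenly spread.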

Reproduction

Analysis started: 2023-12-12 00:30:21.440052
Analysis finished: 2023-12-12 00:30:22.215337
Duration: 0.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

유동화계획코드
Text

Distinct: 64
Distinct (%): 64.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 1400
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
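
As a rough illustration of these character properties, the sketch below counts Unicode general categories using Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the three input strings are taken from the sample rows of this variable. The standard library exposes general categories (`Lu`, `Nd`, `Pc`) but not script or block properties, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every string."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Three sample values from this variable.
sample = ["KHFCMB2017S_14", "KHFCMB2013S_22", "KHFCMB2017S_02"]
counts = category_counts(sample)

# Lu = Uppercase Letter, Nd = Decimal Number, Pc = Connector Punctuation
for category, n in counts.most_common():
    print(category, n)
```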

Unique

Unique: 38
Unique (%): 38.0%

Sample

1st row: KHFCMB2017S_14
2nd row: KHFCMB2013S_22
3rd row: KHFCMB2017S_02
4th row: KHFCMB2017S_13
5th row: KHFCMB2017S_04

Value              Count  Frequency (%)
khfcmb2014s_19     4      4.0%
khfcmb2012s_31     4      4.0%
khfcmb2017s_05     4      4.0%
khfcmb2013s_27     3      3.0%
khfcmb2017s_04     3      3.0%
khfcmb2012s_17     3      3.0%
khfcmb2015s_26     3      3.0%
khfcmb2017s_27     2      2.0%
khfcmb2016s_25     2      2.0%
khfcmb2015s_17     2      2.0%
Other values (54)  70     70.0%

Most occurring characters

Value              Count  Frequency (%)
2                  151    10.8%
1                  145    10.4%
0                  135    9.6%
B                  101    7.2%
K                  100    7.1%
H                  100    7.1%
F                  100    7.1%
C                  100    7.1%
M                  100    7.1%
_                  100    7.1%
Other values (10)  268    19.1%

Most occurring categories

Value                  Count  Frequency (%)
Uppercase Letter       702    50.1%
Decimal Number         598    42.7%
Connector Punctuation  100    7.1%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
2      151    25.3%
1      145    24.2%
0      135    22.6%
7      38     6.4%
3      32     5.4%
5      29     4.8%
6      24     4.0%
8      22     3.7%
4      15     2.5%
9      7      1.2%

Uppercase Letter

Value  Count  Frequency (%)
B      101    14.4%
K      100    14.2%
H      100    14.2%
F      100    14.2%
C      100    14.2%
M      100    14.2%
S      97     13.8%
L      2      0.3%
A      2      0.3%

Connector Punctuation

Value  Count  Frequency (%)
_      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   702    50.1%
Common  698    49.9%

Most frequent character per script

Common

Value  Count  Frequency (%)
2      151    21.6%
1      145    20.8%
0      135    19.3%
_      100    14.3%
7      38     5.4%
3      32     4.6%
5      29     4.2%
6      24     3.4%
8      22     3.2%
4      15     2.1%

Latin

Value  Count  Frequency (%)
B      101    14.4%
K      100    14.2%
H      100    14.2%
F      100    14.2%
C      100    14.2%
M      100    14.2%
S      97     13.8%
L      2      0.3%
A      2      0.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1400   100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
2                  151    10.8%
1                  145    10.4%
0                  135    9.6%
B                  101    7.2%
K                  100    7.1%
H                  100    7.1%
F                  100    7.1%
C                  100    7.1%
M                  100    7.1%
_                  100    7.1%
Other values (10)  268    19.1%

보유목적코드
Text

Distinct: 77
Distinct (%): 77.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 1400
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 57.0%

Sample

1st row: B088_2017_0078
2nd row: B088_2013_0019
3rd row: B020_2017_0010
4th row: B081_2017_0063
5th row: B003_2017_0024

Value              Count  Frequency (%)
b081_2018_0026     3      3.0%
b003_2012_0035     3      3.0%
b003_2017_0030     3      3.0%
b081_2015_0092     2      2.0%
b081_2017_0137     2      2.0%
b010_2017_0071     2      2.0%
b081_2012_0043     2      2.0%
b010_2017_0121     2      2.0%
b004_2012_0027     2      2.0%
b004_2015_0022     2      2.0%
Other values (67)  77     77.0%

Most occurring characters

Value             Count  Frequency (%)
0                 465    33.2%
_                 200    14.3%
1                 181    12.9%
2                 167    11.9%
B                 100    7.1%
8                 74     5.3%
3                 61     4.4%
4                 42     3.0%
7                 41     2.9%
6                 30     2.1%
Other values (2)  39     2.8%

Most occurring categories

Value                  Count  Frequency (%)
Decimal Number         1100   78.6%
Connector Punctuation  200    14.3%
Uppercase Letter       100    7.1%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      465    42.3%
1      181    16.5%
2      167    15.2%
8      74     6.7%
3      61     5.5%
4      42     3.8%
7      41     3.7%
6      30     2.7%
5      28     2.5%
9      11     1.0%

Connector Punctuation

Value  Count  Frequency (%)
_      200    100.0%

Uppercase Letter

Value  Count  Frequency (%)
B      100    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1300   92.9%
Latin   100    7.1%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      465    35.8%
_      200    15.4%
1      181    13.9%
2      167    12.8%
8      74     5.7%
3      61     4.7%
4      42     3.2%
7      41     3.2%
6      30     2.3%
5      28     2.2%

Latin

Value  Count  Frequency (%)
B      100    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1400   100.0%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
0                 465    33.2%
_                 200    14.3%
1                 181    12.9%
2                 167    11.9%
B                 100    7.1%
8                 74     5.3%
3                 61     4.4%
4                 42     3.0%
7                 41     2.9%
6                 30     2.1%
Other values (2)  39     2.8%

발송일
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: 2020-01-09
2nd row: 2020-01-09
3rd row: 2020-01-09
4th row: 2020-01-09
5th row: 2020-01-09

Common Values

Value       Count  Frequency (%)
2020-01-09  98     98.0%
2019-12-26  1      1.0%
2019-12-30  1      1.0%
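
The difference between the "Distinct" and "Unique" figures can be sketched as follows, assuming "Distinct" counts the different values present and "Unique" counts the values that occur exactly once. That reading is an assumption, but it agrees with the numbers reported for 발송일 (3 distinct, 2 unique). The list below is rebuilt from the value counts in the table above rather than read from the original file.

```python
from collections import Counter

# Rebuild the 발송일 column from its reported value counts.
values = ["2020-01-09"] * 98 + ["2019-12-26", "2019-12-30"]

freq = Counter(values)
distinct = len(freq)                               # number of different values
unique = sum(1 for n in freq.values() if n == 1)   # values that appear exactly once

print(f"Distinct: {distinct} ({100 * distinct / len(values):.1f}%)")
print(f"Unique: {unique} ({100 * unique / len(values):.1f}%)")
```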

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values listed above]

최고서종류
Real number (ℝ)

HIGH CORRELATION 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.18
Minimum: 12
Maximum: 27
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 12
5-th percentile: 13
Q1: 22
Median: 22
Q3: 23
95-th percentile: 23.05
Maximum: 27
Range: 15
Interquartile range (IQR): 1
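
The quantile statistics can be checked with a minimal sketch, assuming linear interpolation between the two nearest order statistics (position `q * (n - 1)` in the sorted data). The interpolation rule is an assumption, though it reproduces the reported values, including the fractional 95-th percentile. The `quantile` helper is illustrative, and the data are rebuilt from the value counts this report lists for 최고서종류.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# Value -> count, as reported for 최고서종류.
value_counts = {12: 1, 13: 9, 14: 4, 15: 1, 16: 1, 22: 41, 23: 38, 24: 3, 27: 2}
data = sorted(v for v, n in value_counts.items() for _ in range(n))

p05 = quantile(data, 0.05)
q1 = quantile(data, 0.25)
median = quantile(data, 0.50)
q3 = quantile(data, 0.75)
p95 = quantile(data, 0.95)

print(p05, q1, median, q3, round(p95, 2), "IQR:", q3 - q1)
```

The 95-th percentile falls between the 95th and 96th sorted observations (23 and 24), which is why it is not a whole number.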

Descriptive statistics

Standard deviation: 3.4855546
Coefficient of variation (CV): 0.16456821
Kurtosis: 1.4968898
Mean: 21.18
Median Absolute Deviation (MAD): 1
Skewness: -1.6653702
Sum: 2118
Variance: 12.149091
Monotonicity: Not monotonic
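
Several of these descriptive statistics can be recomputed from the value counts alone. The sketch below uses Python's `statistics` module and assumes the sample (n - 1) form of the variance and standard deviation, which is the form that matches the reported numbers. Skewness and kurtosis are left out because their bias corrections are not stated in the report.

```python
import statistics

# Value -> count, as reported for 최고서종류.
value_counts = {12: 1, 13: 9, 14: 4, 15: 1, 16: 1, 22: 41, 23: 38, 24: 3, 27: 2}
data = [v for v, n in value_counts.items() for _ in range(n)]

total = sum(data)
mean = statistics.fmean(data)
variance = statistics.variance(data)   # sample variance, divides by n - 1
std = statistics.stdev(data)           # square root of the sample variance
cv = std / mean                        # coefficient of variation
median = statistics.median(data)
mad = statistics.median(abs(x - median) for x in data)  # median absolute deviation

print(f"Sum: {total}")
print(f"Mean: {mean:.2f}")
print(f"Variance: {variance:.6f}")
print(f"Standard deviation: {std:.7f}")
print(f"CV: {cv:.8f}")
print(f"MAD: {mad}")
```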
[Figure: histogram with fixed size bins (bins=9)]

Common values

Value  Count  Frequency (%)
22     41     41.0%
23     38     38.0%
13     9      9.0%
14     4      4.0%
24     3      3.0%
27     2      2.0%
15     1      1.0%
12     1      1.0%
16     1      1.0%

Minimum values

Value  Count  Frequency (%)
12     1      1.0%
13     9      9.0%
14     4      4.0%
15     1      1.0%
16     1      1.0%
22     41     41.0%
23     38     38.0%
24     3      3.0%
27     2      2.0%

Maximum values

Value  Count  Frequency (%)
27     2      2.0%
24     3      3.0%
23     38     38.0%
22     41     41.0%
16     1      1.0%
15     1      1.0%
14     4      4.0%
13     9      9.0%
12     1      1.0%

최고일
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 3
Unique (%): 3.0%

Sample

1st row: 2020-01-23
2nd row: 2020-01-23
3rd row: 2020-01-19
4th row: 2020-01-09
5th row: 2020-01-09

Common Values

Value       Count  Frequency (%)
2020-02-19  37     37.0%
2020-01-29  33     33.0%
2020-01-23  12     12.0%
2020-01-09  10     10.0%
2020-02-10  3      3.0%
2020-01-19  2      2.0%
2020-01-16  1      1.0%
2020-02-02  1      1.0%
2020-01-30  1      1.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values listed above]

발송부점
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 7
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 3
Unique (%): 3.0%

Sample

1st row: TPA
2nd row: TLB
3rd row: TRA
4th row: AAZ
5th row: AAZ

Common Values

Value  Count  Frequency (%)
AAZ    83     83.0%
THB    6      6.0%
TLB    4      4.0%
TMB    4      4.0%
TPA    1      1.0%
TRA    1      1.0%
TQA    1      1.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values listed above]
1.0%

등록자사번
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 9
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.73
Min length: 4

Unique

Unique: 3
Unique (%): 3.0%

Sample

1st row: 1913
2nd row: 1878
3rd row: 1604
4th row: 8889
5th row: 8889

Common Values

Value  Count  Frequency (%)
aaz01  73     73.0%
8889   9      9.0%
1601   6      6.0%
1854   4      4.0%
1598   3      3.0%
1679   2      2.0%
1913   1      1.0%
1878   1      1.0%
1604   1      1.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the common values listed above]

Interactions

[Figure: interactions plot]

Correlations

Matrix 1 (all seven variables)

                유동화계획코드  보유목적코드  발송일  최고서종류  최고일  발송부점  등록자사번
유동화계획코드  1.000           1.000         1.000   0.884       0.937   0.996     0.979
보유목적코드    1.000           1.000         1.000   0.796       0.927   1.000     0.979
발송일          1.000           1.000         1.000   0.452       0.940   0.753     0.922
최고서종류      0.884           0.796         0.452   1.000       0.856   0.785     0.894
최고일          0.937           0.927         0.940   0.856       1.000   0.732     0.936
발송부점        0.996           1.000         0.753   0.785       0.732   1.000     0.976
등록자사번      0.979           0.979         0.922   0.894       0.936   0.976     1.000

Matrix 2 (four categorical variables)

            등록자사번  발송일  발송부점  최고일
등록자사번  1.000       0.654   0.943     0.594
발송일      0.654       1.000   0.670     0.689
발송부점    0.943       0.670   1.000     0.498
최고일      0.594       0.689   0.498     1.000

Matrix 3 (numeric variable and four categorical variables)

            최고서종류  발송일  최고일  발송부점  등록자사번
최고서종류  1.000       0.205   0.670   0.576     0.618
발송일      0.205       1.000   0.689   0.670     0.654
최고일      0.670       0.689   1.000   0.498     0.594
발송부점    0.576       0.670   0.498   1.000     0.943
등록자사번  0.618       0.654   0.594   0.943     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index  유동화계획코드   보유목적코드    발송일      최고서종류  최고일      발송부점  등록자사번
0      KHFCMB2017S_14  B088_2017_0078  2020-01-09  13          2020-01-23  TPA       1913
1      KHFCMB2013S_22  B088_2013_0019  2020-01-09  15          2020-01-23  TLB       1878
2      KHFCMB2017S_02  B020_2017_0010  2020-01-09  13          2020-01-19  TRA       1604
3      KHFCMB2017S_13  B081_2017_0063  2020-01-09  22          2020-01-09  AAZ       8889
4      KHFCMB2017S_04  B003_2017_0024  2020-01-09  22          2020-01-09  AAZ       8889
5      KHFCMB2016S_03  B081_2016_0010  2019-12-26  13          2020-01-09  TQA       1679
6      KHFCMB2017S_25  B020_2017_0150  2019-12-30  12          2020-01-16  AAZ       1679
7      KHFCMB2015S_23  B081_2015_0092  2020-01-09  14          2020-01-23  TMB       1854
8      KHFCMB2015S_23  B081_2015_0092  2020-01-09  14          2020-01-23  TMB       1854
9      KHFCMB2014S_19  B020_2014_0049  2020-01-09  23          2020-01-09  AAZ       8889

Last rows

Index  유동화계획코드   보유목적코드    발송일      최고서종류  최고일      발송부점  등록자사번
90     KHFCMB2015S_17  B020_2015_0054  2020-01-09  22          2020-02-19  AAZ       aaz01
91     KHFCMB2013S_27  B020_2013_0014  2020-01-09  22          2020-02-19  AAZ       aaz01
92     KHFCMB2019S_17  B020_2019_0070  2020-01-09  22          2020-02-19  AAZ       aaz01
93     KHFCMB2018S_13  B020_2018_0065  2020-01-09  22          2020-02-19  AAZ       aaz01
94     KHFCMB2014S_18  B004_2014_0030  2020-01-09  22          2020-02-19  AAZ       aaz01
95     KHFCMB2015S_21  B010_2015_0074  2020-01-09  22          2020-02-19  AAZ       aaz01
96     KHFCMB2015S_25  B010_2015_0106  2020-01-09  22          2020-02-19  AAZ       aaz01
97     KHFCMB2012S_31  B088_2012_0026  2020-01-09  22          2020-02-19  AAZ       aaz01
98     KHFCMB2012S_03  B088_2012_0007  2020-01-09  22          2020-02-19  AAZ       aaz01
99     KHFCMB2011S_21  B004_2011_0021  2020-01-09  22          2020-02-19  AAZ       aaz01

Duplicate rows

Most frequently occurring

Index  유동화계획코드   보유목적코드    발송일      최고서종류  최고일      발송부점  등록자사번  # duplicates
0      KHFCMB2012S_17  B003_2012_0035  2020-01-09  23          2020-01-29  AAZ       aaz01       3
1      KHFCMB2012S_20  B004_2012_0027  2020-01-09  23          2020-01-29  AAZ       aaz01       2
2      KHFCMB2012S_31  B081_2012_0043  2020-01-09  23          2020-01-29  AAZ       aaz01       2
3      KHFCMB2015S_22  B081_2015_0088  2020-01-09  14          2020-01-23  TMB       1854        2
4      KHFCMB2015S_23  B081_2015_0092  2020-01-09  14          2020-01-23  TMB       1854        2
5      KHFCMB2017S_05  B003_2017_0030  2020-01-09  13          2020-01-23  TLB       1598        2
6      KHFCMB2017S_17  B010_2017_0071  2020-01-09  22          2020-02-19  AAZ       aaz01       2
7      KHFCMB2018S_12  B088_2018_0037  2020-01-09  13          2020-01-23  THB       1601        2
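
The table lists eight distinct row patterns that occur more than once, which matches the "8 duplicate rows" figure in the overview. A minimal sketch of that grouping, using only the standard library, is shown below. It treats each record as a tuple and counts identical tuples. The three input rows are rows 7, 8 and 9 from the sample section, and the variable names are illustrative.

```python
from collections import Counter

# Rows 7, 8 and 9 from the sample section; rows 7 and 8 are identical.
rows = [
    ("KHFCMB2015S_23", "B081_2015_0092", "2020-01-09", 14, "2020-01-23", "TMB", "1854"),
    ("KHFCMB2015S_23", "B081_2015_0092", "2020-01-09", 14, "2020-01-23", "TMB", "1854"),
    ("KHFCMB2014S_19", "B020_2014_0049", "2020-01-09", 23, "2020-01-09", "AAZ", "8889"),
]

occurrences = Counter(rows)
# Keep only the row patterns that appear more than once.
duplicated = {row: n for row, n in occurrences.items() if n > 1}

for row, n in duplicated.items():
    print(n, row)
```

Before modelling, it is usually worth deciding whether such repeats are genuine repeated mailings or accidental double entries, since the report alone cannot tell them apart.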