Overview

Dataset statistics

Number of variables: 7
Number of observations: 1000
Missing cells: 133
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 57.7 KiB
Average record size in memory: 59.1 B
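
The derived figures in this block can be checked by hand. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by variables × observations, and that the average record size is the total memory divided by the number of observations; both assumptions reproduce the reported values. The per-variable missing counts are taken from the Alerts section.

```python
# Reported dataset statistics (copied from the report).
n_variables = 7
n_observations = 1000
missing_cells = 133
total_size_kib = 57.7

# Per-variable missing counts as listed under Alerts.
missing_per_variable = {"본부수관여부": 42, "변경사번": 36, "등록사번": 55}

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over all observations.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(sum(missing_per_variable.values()))  # total missing cells
print(round(missing_pct, 1))               # missing cells (%)
print(round(avg_record_bytes, 1))          # average record size in bytes
```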

Variable types

Text: 1
Boolean: 1
Numeric: 3
Categorical: 2

Dataset

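Description: Public data related to the work of the Claims Management Department (채권관리부) of the Korea Housing Finance Corporation, that is, source data that can be disclosed from the databases used for the department's work. The data contains the columns 보증번호 (guarantee number), 본부수관여부 (whether the case is handled by headquarters), 주채무자고객번호 (principal debtor customer number), 변경사번 (employee number of the last modifier), 변경부점코드 (branch code of the last modification), 등록사번 (employee number of the registrant), and 등록부점코드 (branch code of registration).
Author: 한국주택금융공사 (Korea Housing Finance Corporation)
URL: https://www.data.go.kr/data/15072970/fileData.do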

Alerts

[High correlation] 등록부점코드 is highly overall correlated with 등록사번 and 1 other field
[High correlation] 변경부점코드 is highly overall correlated with 등록사번 and 1 other field
[High correlation] 주채무자고객번호 is highly overall correlated with 본부수관여부
[High correlation] 변경사번 is highly overall correlated with 등록사번
[High correlation] 등록사번 is highly overall correlated with 변경사번 and 2 other fields
[High correlation] 본부수관여부 is highly overall correlated with 주채무자고객번호
[Imbalance] 본부수관여부 is highly imbalanced (89.0%)
[Missing] 본부수관여부 has 42 (4.2%) missing values
[Missing] 변경사번 has 36 (3.6%) missing values
[Missing] 등록사번 has 55 (5.5%) missing values
[Unique] 보증번호 has unique values

Reproduction

Analysis started: 2023-12-12 23:37:04.808224
Analysis finished: 2023-12-12 23:37:06.556225
Duration: 1.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

보증번호
Text

UNIQUE 

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 13000
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1000
Unique (%): 100.0%

Sample

1st row: TQH2000000140
2nd row: TAA2014130650
3rd row: TAC2018087561
4th row: TAC2014068194
5th row: TAC2014055963
Value               Count  Frequency (%)
tqh2000000140       1      0.1%
tla2017007324       1      0.1%
taa2011145898       1      0.1%
qad2003111447       1      0.1%
tpa2016004380       1      0.1%
tpa2016003042       1      0.1%
taa2015098181       1      0.1%
taa2017054958       1      0.1%
tac2012017816       1      0.1%
tha2011074772       1      0.1%
Other values (990)  990    99.0%

Most occurring characters

Value              Count  Frequency (%)
0                  2747   21.1%
2                  1683   12.9%
1                  1563   12.0%
A                  994    7.6%
T                  850    6.5%
8                  643    4.9%
6                  632    4.9%
3                  597    4.6%
4                  591    4.5%
7                  552    4.2%
Other values (15)  2148   16.5%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    10000  76.9%
Uppercase Letter  3000   23.1%
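
As a rough illustration of how the category breakdown arises, the sketch below counts Unicode general categories for the five sample values of 보증번호 using Python's standard `unicodedata` module. The function name `category_counts` is made up for this example, and this is not necessarily how the profiling tool computes its table. In the Unicode category codes, "Lu" is Uppercase Letter and "Nd" is Decimal Number.

```python
import unicodedata
from collections import Counter

# The five sample values of 보증번호 shown in the report.
samples = [
    "TQH2000000140",
    "TAA2014130650",
    "TAC2018087561",
    "TAC2014068194",
    "TAC2014055963",
]


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(samples)
total = sum(counts.values())
for code, count in counts.most_common():
    print(code, count, f"{count / total:.1%}")
```

Each sample is 3 uppercase letters followed by 10 digits, so the sample shares match the full-column shares of 23.1% and 76.9%.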

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
A                 994    33.1%
T                 850    28.3%
Q                 213    7.1%
B                 200    6.7%
D                 195    6.5%
H                 186    6.2%
O                 109    3.6%
P                 83     2.8%
C                 65     2.2%
L                 34     1.1%
Other values (5)  71     2.4%

Decimal Number

Value  Count  Frequency (%)
0      2747   27.5%
2      1683   16.8%
1      1563   15.6%
8      643    6.4%
6      632    6.3%
3      597    6.0%
4      591    5.9%
7      552    5.5%
5      515    5.1%
9      477    4.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  76.9%
Latin   3000   23.1%

Most frequent character per script

Latin

Value             Count  Frequency (%)
A                 994    33.1%
T                 850    28.3%
Q                 213    7.1%
B                 200    6.7%
D                 195    6.5%
H                 186    6.2%
O                 109    3.6%
P                 83     2.8%
C                 65     2.2%
L                 34     1.1%
Other values (5)  71     2.4%

Common

Value  Count  Frequency (%)
0      2747   27.5%
2      1683   16.8%
1      1563   15.6%
8      643    6.4%
6      632    6.3%
3      597    6.0%
4      591    5.9%
7      552    5.5%
5      515    5.1%
9      477    4.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  13000  100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
0                  2747   21.1%
2                  1683   12.9%
1                  1563   12.0%
A                  994    7.6%
T                  850    6.5%
8                  643    4.9%
6                  632    4.9%
3                  597    4.6%
4                  591    4.5%
7                  552    4.2%
Other values (15)  2148   16.5%

본부수관여부
Boolean

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 0.2%
Missing: 42
Missing (%): 4.2%
Memory size: 2.1 KiB

Value      Count  Frequency (%)
False      944    94.4%
True       14     1.4%
(Missing)  42     4.2%
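
The 89.0% imbalance reported for 본부수관여부 is consistent with an entropy-based score: one minus the Shannon entropy of the class frequencies, normalised by the maximum entropy for that number of classes. The sketch below assumes that definition and assumes missing values are excluded, so only the False/True counts are used. `imbalance_score` is an illustrative name, not a function from the profiling library.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Non-missing value counts from the table: False = 944, True = 14.
score = imbalance_score([944, 14])
print(f"{score:.1%}")
```

A perfectly balanced Boolean column would score 0 under this definition, which makes the 944-to-14 split easy to compare against other columns.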

주채무자고객번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 974
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88416508
Minimum: 2510268
Maximum: 1.3988455 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 2510268
5-th percentile: 31414687
Q1: 71068820
Median: 95491068
Q3: 1.1451872 × 10^8
95-th percentile: 1.2480046 × 10^8
Maximum: 1.3988455 × 10^8
Range: 1.3737428 × 10^8
Interquartile range (IQR): 43449902

Descriptive statistics

Standard deviation: 30845874
Coefficient of variation (CV): 0.34887007
Kurtosis: -0.51301306
Mean: 88416508
Median Absolute Deviation (MAD): 20727141
Skewness: -0.70103416
Sum: 8.8416508 × 10^10
Variance: 9.5146792 × 10^14
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
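
Several of the descriptive statistics for 주채무자고객번호 are simple functions of the others. As a minimal sketch using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum), the reported values can be cross-checked as follows. The inputs are the rounded figures printed in the report, so agreement is only expected to within rounding. Note that this column is a customer identifier, so these statistics describe how the ID numbers are spread rather than any real quantity.

```python
import math

# Values as printed in the report (rounded by the report itself).
mean = 88416508
std = 30845874
minimum = 2510268
maximum = 139884549  # exact maximum, taken from the listing of largest values

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance as the square of the standard deviation
value_range = maximum - minimum  # range

print(round(cv, 6))
print(f"{variance:.7e}")
print(value_range)
```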
Value               Count  Frequency (%)
42841285            3      0.3%
97772337            2      0.2%
110741025           2      0.2%
75201697            2      0.2%
85816289            2      0.2%
123320734           2      0.2%
97692462            2      0.2%
108367194           2      0.2%
95441563            2      0.2%
78272555            2      0.2%
Other values (964)  979    97.9%

Value     Count  Frequency (%)
2510268   1      0.1%
6246347   1      0.1%
7850583   1      0.1%
9368743   1      0.1%
11338936  1      0.1%
12493634  1      0.1%
13136631  1      0.1%
14712588  1      0.1%
14929656  1      0.1%
15226565  1      0.1%

Value      Count  Frequency (%)
139884549  1      0.1%
138104880  1      0.1%
137655932  1      0.1%
136387201  1      0.1%
136202904  1      0.1%
136104060  1      0.1%
131765404  1      0.1%
130954009  1      0.1%
130928776  1      0.1%
130715235  1      0.1%

변경사번
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 104
Distinct (%): 10.8%
Missing: 36
Missing (%): 3.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2229.5612
Minimum: 1159
Maximum: 61794
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1159
5-th percentile: 1487
Q1: 1605
Median: 1690
Q3: 1872
95-th percentile: 2000.95
Maximum: 61794
Range: 60635
Interquartile range (IQR): 267

Descriptive statistics

Standard deviation: 4277.34
Coefficient of variation (CV): 1.9184672
Kurtosis: 144.88309
Mean: 2229.5612
Median Absolute Deviation (MAD): 146
Skewness: 11.844126
Sum: 2149297
Variance: 18295637
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
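
For a column with missing values, the mean and "Distinct (%)" appear to be computed over the non-missing observations only. The sketch below checks that assumption against the figures reported for 변경사번 (1000 observations, 36 missing, 104 distinct values, sum 2149297).

```python
# Figures reported for 변경사번.
n_observations = 1000
n_missing = 36
n_distinct = 104
total = 2149297

# Assumption: missing values are excluded from the denominator.
n_valid = n_observations - n_missing
mean = total / n_valid
distinct_pct = 100 * n_distinct / n_valid

print(n_valid)
print(round(mean, 4))
print(round(distinct_pct, 1))
```

Dividing by all 1000 rows instead would give a mean of about 2149.3 and a distinct share of 10.4%, neither of which matches the report.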
Value              Count  Frequency (%)
1690               98     9.8%
1487               74     7.4%
1605               73     7.3%
1696               54     5.4%
1590               34     3.4%
1592               32     3.2%
1842               29     2.9%
1867               26     2.6%
1872               26     2.6%
1883               25     2.5%
Other values (94)  493    49.3%
(Missing)          36     3.6%

Value  Count  Frequency (%)
1159   1      0.1%
1179   1      0.1%
1248   1      0.1%
1253   2      0.2%
1360   1      0.1%
1406   15     1.5%
1438   1      0.1%
1455   1      0.1%
1459   1      0.1%
1461   2      0.2%

Value  Count  Frequency (%)
61794  1      0.1%
53655  1      0.1%
53569  2      0.2%
53567  1      0.1%
53566  1      0.1%
6201   1      0.1%
6153   1      0.1%
6151   3      0.3%
6070   2      0.2%
6066   3      0.3%
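
The very high skewness (11.8) and kurtosis (144.9) of 변경사번 come from a handful of values far above the bulk of the distribution. One conventional way to make this concrete is Tukey's 1.5 × IQR fence, sketched below with the reported quartiles. The report itself does not apply this rule; it is used here only as an illustration. Because the column holds employee numbers, the flagged values may simply follow a different numbering scheme rather than being errors.

```python
# Reported quartiles for 변경사번.
q1, q3 = 1605, 1872
iqr = q3 - q1

# Tukey fences (a common heuristic, not something the report computes).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest values and their counts, as listed in the report.
largest = {
    61794: 1, 53655: 1, 53569: 2, 53567: 1, 53566: 1,
    6201: 1, 6153: 1, 6151: 3, 6070: 2, 6066: 3,
}

flagged = {value: count for value, count in largest.items() if value > upper_fence}

print(lower_fence, upper_fence)
print(len(flagged), sum(flagged.values()))
```

Only the ten largest values are listed in the report, so this is a lower bound on how many rows exceed the upper fence.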

변경부점코드
Categorical

HIGH CORRELATION 

Distinct: 28
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value              Count
QAD                118
TAA                115
TAD                106
TAC                67
THA                60
Other values (23)  534

Length

Max length: 4
Median length: 3
Mean length: 3.036
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: ACS
2nd row: TAA
3rd row: TAC
4th row: TAC
5th row: TAC

Common Values

Value              Count  Frequency (%)
QAD                118    11.8%
TAA                115    11.5%
TAD                106    10.6%
TAC                67     6.7%
THA                60     6.0%
TAB                58     5.8%
TPA                56     5.6%
THO                50     5.0%
TQA                45     4.5%
TBA                39     3.9%
Other values (18)  286    28.6%

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
qad                118    11.8%
taa                115    11.5%
tad                106    10.6%
tac                67     6.7%
tha                60     6.0%
tab                58     5.8%
tpa                56     5.6%
tho                50     5.0%
tqa                45     4.5%
tba                39     3.9%
Other values (18)  286    28.6%

등록사번
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 114
Distinct (%): 12.1%
Missing: 55
Missing (%): 5.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1676.9365
Minimum: 1020
Maximum: 2002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1020
5-th percentile: 1365.2
Q1: 1590
Median: 1690
Q3: 1843
95-th percentile: 1937
Maximum: 2002
Range: 982
Interquartile range (IQR): 253

Descriptive statistics

Standard deviation: 183.70802
Coefficient of variation (CV): 0.10954978
Kurtosis: 0.41946445
Mean: 1676.9365
Median Absolute Deviation (MAD): 133
Skewness: -0.49830097
Sum: 1584705
Variance: 33748.638
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
Value               Count  Frequency (%)
1690                105    10.5%
1605                84     8.4%
1487                74     7.4%
1696                57     5.7%
1592                33     3.3%
1590                33     3.3%
1867                29     2.9%
1872                27     2.7%
1842                27     2.7%
1883                24     2.4%
Other values (104)  452    45.2%
(Missing)           55     5.5%

Value  Count  Frequency (%)
1020   1      0.1%
1088   1      0.1%
1096   1      0.1%
1098   1      0.1%
1129   1      0.1%
1130   1      0.1%
1133   1      0.1%
1137   1      0.1%
1141   1      0.1%
1145   1      0.1%

Value  Count  Frequency (%)
2002   1      0.1%
1995   7      0.7%
1987   5      0.5%
1980   4      0.4%
1978   21     2.1%
1973   5      0.5%
1958   4      0.4%
1937   10     1.0%
1934   10     1.0%
1929   1      0.1%

등록부점코드
Categorical

HIGH CORRELATION 

Distinct: 25
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value              Count
TAA                129
QAD                129
TAD                105
THO                76
TAC                66
Other values (20)  495

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: TBA
2nd row: TAA
3rd row: TAC
4th row: TAC
5th row: TAC

Common Values

Value              Count  Frequency (%)
TAA                129    12.9%
QAD                129    12.9%
TAD                105    10.5%
THO                76     7.6%
TAC                66     6.6%
THA                66     6.6%
TAB                58     5.8%
TPA                57     5.7%
TBA                50     5.0%
TQA                47     4.7%
Other values (15)  217    21.7%

Length

[Figure: histogram of lengths of the category]

Value              Count  Frequency (%)
taa                129    12.9%
qad                129    12.9%
tad                105    10.5%
tho                76     7.6%
tac                66     6.6%
tha                66     6.6%
tab                58     5.8%
tpa                57     5.7%
tba                50     5.0%
tqa                47     4.7%
Other values (15)  217    21.7%

Interactions

[Figures: nine interaction plots, rendered with Matplotlib v3.7.2]

Correlations

                  본부수관여부  주채무자고객번호  변경사번  변경부점코드  등록사번  등록부점코드
본부수관여부      1.000  0.746  0.000  0.391  0.184  0.183
주채무자고객번호  0.746  1.000  0.138  0.392  0.446  0.289
변경사번          0.000  0.138  1.000  0.356  0.278  0.000
변경부점코드      0.391  0.392  0.356  1.000  0.872  0.998
등록사번          0.184  0.446  0.278  0.872  1.000  0.855
등록부점코드      0.183  0.289  0.000  0.998  0.855  1.000

              본부수관여부  등록부점코드  변경부점코드
본부수관여부  1.000  0.156  0.307
등록부점코드  0.156  1.000  0.958
변경부점코드  0.307  0.958  1.000

                  주채무자고객번호  변경사번  등록사번  본부수관여부  변경부점코드  등록부점코드
주채무자고객번호  1.000  0.069  0.250  0.583  0.150  0.105
변경사번          0.069  1.000  0.752  0.000  0.170  0.000
등록사번          0.250  0.752  1.000  0.141  0.546  0.504
본부수관여부      0.583  0.000  0.141  1.000  0.307  0.156
변경부점코드      0.150  0.170  0.546  0.307  1.000  0.958
등록부점코드      0.105  0.000  0.504  0.156  0.958  1.000
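
The "High correlation" alerts can be traced back to the last of the three matrices above. As a sketch, assuming an alert is raised for every off-diagonal coefficient above 0.5, the code below derives each variable's highly correlated partners from that matrix. The 0.5 cutoff and the helper name `highly_correlated` are assumptions for illustration, although that cutoff does reproduce all six alerts in the report.

```python
names = ["주채무자고객번호", "변경사번", "등록사번", "본부수관여부", "변경부점코드", "등록부점코드"]

# Third correlation matrix from the report, rows and columns in the order of `names`.
matrix = [
    [1.000, 0.069, 0.250, 0.583, 0.150, 0.105],
    [0.069, 1.000, 0.752, 0.000, 0.170, 0.000],
    [0.250, 0.752, 1.000, 0.141, 0.546, 0.504],
    [0.583, 0.000, 0.141, 1.000, 0.307, 0.156],
    [0.150, 0.170, 0.546, 0.307, 1.000, 0.958],
    [0.105, 0.000, 0.504, 0.156, 0.958, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff; not stated in the report itself


def highly_correlated(names, matrix, threshold):
    """Map each variable to the other variables whose coefficient exceeds the threshold."""
    result = {}
    for i, name in enumerate(names):
        partners = [
            names[j]
            for j, value in enumerate(matrix[i])
            if j != i and value > threshold
        ]
        if partners:
            result[name] = partners
    return result


alerts = highly_correlated(names, matrix, THRESHOLD)
for name, partners in alerts.items():
    print(name, "->", ", ".join(partners))
```

For example, 등록사번 exceeds the cutoff with three variables (변경사번, 변경부점코드, 등록부점코드), matching the alert "변경사번 and 2 other fields". In the first matrix the 변경사번/등록사번 coefficient is only 0.278, so that matrix cannot be the source of the alerts under this cutoff.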

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

     보증번호       본부수관여부  주채무자고객번호  변경사번  변경부점코드  등록사번  등록부점코드
0    TQH2000000140  Y  22391645   1890  ACS  <NA>  TBA
1    TAA2014130650  N  99749119   1883  TAA  1605  TAA
2    TAC2018087561  N  124058610  1590  TAC  1590  TAC
3    TAC2014068194  N  97554474   1590  TAC  1590  TAC
4    TAC2014055963  N  97554474   1590  TAC  1590  TAC
5    TAC2016051061  N  110402283  1590  TAC  1590  TAC
6    TLB2018010688  N  121545090  1476  TLB  1476  TLB
7    THO2017055226  N  119431462  1614  THO  1614  THO
8    TOA2017021994  N  118421318  1934  TOA  1934  TOA
9    TJA2015000856  N  100785538  1530  TJB  1530  TJB

     보증번호       본부수관여부  주채무자고객번호  변경사번  변경부점코드  등록사번  등록부점코드
990  TPA2012017978  N  87508773   1521  TPB  1521  TPB
991  TLB2016003060  N  87825948   6054  TLB  1476  TLB
992  TLB2016011220  N  110434523  1926  TLB  1926  TLB
993  QAD2011041447  N  82435553   1686  QAD  1686  QAD
994  TOA2012023195  N  89820381   1934  TOA  1520  TOA
995  TQA2018019079  N  70411657   1679  TQA  1679  TQA
996  TPA2015045490  N  51611008   1668  TPA  1668  TPA
997  TMA2019036614  N  61787002   6151  TMA  1606  TMA
998  TMA2013008997  N  90641465   1606  TMA  1606  TMA
999  TRA2019006076  N  49563854   1726  TRA  1726  TRA