Overview

Dataset statistics

Number of variables: 8
Number of observations: 1000
Missing cells: 847
Missing cells (%): 10.6%
Duplicate rows: 6
Duplicate rows (%): 0.6%
Total size in memory: 65.6 KiB
Average record size in memory: 67.1 B
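
The percentages above follow directly from the raw counts. A minimal sketch of that arithmetic, using only figures printed in this report (the variable names are illustrative and are not part of the report's output):

```python
# Figures copied from the Dataset statistics and Alerts sections of this report.
n_variables = 8
n_observations = 1000
missing_cells = 847
duplicate_rows = 6

# Per-column missing counts as listed under Alerts.
missing_by_column = {"불량규제해제취급사번": 632, "등록사번": 215}

# Missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
# The two columns flagged as Missing account for every missing cell.
print(sum(missing_by_column.values()) == missing_cells)
```

All missing cells come from the two columns flagged under Alerts, so the other six columns are complete.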

Variable types

Numeric: 3
DateTime: 3
Categorical: 1
Text: 1

Dataset

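Description: Public data related to the work of the Claims Management Department (채권관리부) of the Korea Housing Finance Corporation (source data that can be released publicly from the databases related to that department's work)
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15072967/fileData.do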

Alerts

Dataset has 6 (0.6%) duplicate rows (Duplicates)
채무관계자고객번호 is highly overall correlated with 주채무자고객번호 (High correlation)
주채무자고객번호 is highly overall correlated with 채무관계자고객번호 (High correlation)
불량규제해제취급사번 has 632 (63.2%) missing values (Missing)
등록사번 has 215 (21.5%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 23:48:59.190136
Analysis finished: 2023-12-12 23:49:00.379541
Duration: 1.19 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
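
The reported duration is the difference between the two timestamps. A small sketch that recomputes it with the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Assumed layout of the timestamps printed in the Reproduction section.
fmt = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 23:48:59.190136", fmt)
finished = datetime.strptime("2023-12-12 23:49:00.379541", fmt)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```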

Variables

채무관계자고객번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 784
Distinct (%): 78.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88939476
Minimum: 4937933
Maximum: 1.4527953 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 4937933
5-th percentile: 23733654
Q1: 69086806
Median: 94001878
Q3: 1.1573302 × 10^8
95-th percentile: 1.299901 × 10^8
Maximum: 1.4527953 × 10^8
Range: 1.403416 × 10^8
Interquartile range (IQR): 46646209

Descriptive statistics

Standard deviation: 33339845
Coefficient of variation (CV): 0.37485993
Kurtosis: -0.5308824
Mean: 88939476
Median Absolute Deviation (MAD): 22523322
Skewness: -0.57266867
Sum: 8.8939476 × 10^10
Variance: 1.1115453 × 10^15
Monotonicity: Not monotonic
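
Several of these statistics are derived from others in the same listing. The sketch below recomputes them from the rounded figures printed for 채무관계자고객번호, so small rounding differences against the report are expected. The equal-width bin calculation assumes the 50 histogram bins span the minimum to the maximum, which the report does not state explicitly.

```python
# Rounded inputs copied from the report for 채무관계자고객번호.
minimum = 4_937_933
maximum = 145_279_533   # largest value in the extreme-values listing
mean = 88_939_476
std_dev = 33_339_845
n_observations = 1000
bins = 50               # "Histogram with fixed size bins (bins=50)"

value_range = maximum - minimum   # Range
cv = std_dev / mean               # Coefficient of variation
variance = std_dev ** 2           # Variance is the squared standard deviation
total = mean * n_observations     # Sum (no values are missing)
bin_width = value_range / bins    # Assumed equal-width bins over [min, max]

print(f"Range:     {value_range:.7e}")
print(f"CV:        {cv:.6f}")
print(f"Variance:  {variance:.7e}")
print(f"Sum:       {total:.7e}")
print(f"Bin width: {bin_width:,.0f}")
```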
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
123982693           7      0.7%
126690283           4      0.4%
106203645           4      0.4%
106203661           4      0.4%
100916019           4      0.4%
73365179            3      0.3%
97225172            3      0.3%
75249701            3      0.3%
98565226            3      0.3%
24901169            3      0.3%
Other values (774)  962    96.2%
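
The "Other values" row is what remains after the ten most frequent values are listed. A short sketch of that bookkeeping, using the counts reported for 채무관계자고객번호 (variable names are illustrative):

```python
# Counts of the ten most frequent values, as listed in the report.
top_counts = [7, 4, 4, 4, 4, 3, 3, 3, 3, 3]

n_observations = 1000   # from Dataset statistics
n_distinct = 784        # "Distinct" for this variable

# Distinct values and rows not covered by the top-ten listing.
other_distinct = n_distinct - len(top_counts)
other_count = n_observations - sum(top_counts)
other_pct = 100 * other_count / n_observations

print(f"Other values ({other_distinct}): {other_count} ({other_pct:.1f}%)")
```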

Minimum 10 values

Value     Count  Frequency (%)
4937933   1      0.1%
5941012   1      0.1%
8029245   1      0.1%
11367538  1      0.1%
11436692  1      0.1%
11483904  1      0.1%
12435456  1      0.1%
13136631  2      0.2%
13496232  1      0.1%
13668536  2      0.2%

Maximum 10 values

Value      Count  Frequency (%)
145279533  1      0.1%
145279371  1      0.1%
145279216  1      0.1%
145278301  1      0.1%
145278259  1      0.1%
145271023  1      0.1%
145191352  1      0.1%
145172083  1      0.1%
145172070  1      0.1%
145171929  1      0.1%

주채무자고객번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 702
Distinct (%): 70.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 85956824
Minimum: 4937933
Maximum: 1.4156242 × 10^8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 4937933
5-th percentile: 22872939
Q1: 64877811
Median: 90556123
Q3: 1.1343893 × 10^8
95-th percentile: 1.2849781 × 10^8
Maximum: 1.4156242 × 10^8
Range: 1.3662449 × 10^8
Interquartile range (IQR): 48561116

Descriptive statistics

Standard deviation: 33387754
Coefficient of variation (CV): 0.3884247
Kurtosis: -0.67951675
Mean: 85956824
Median Absolute Deviation (MAD): 23824940
Skewness: -0.55531088
Sum: 8.5956824 × 10^10
Variance: 1.1147421 × 10^15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
106203645           8      0.8%
73889192            7      0.7%
123982693           7      0.7%
35412045            6      0.6%
28875110            6      0.6%
88198564            6      0.6%
56650597            4      0.4%
100916019           4      0.4%
82385179            4      0.4%
86024300            4      0.4%
Other values (692)  944    94.4%

Minimum 10 values

Value     Count  Frequency (%)
4937933   1      0.1%
5941012   2      0.2%
8016960   1      0.1%
8029245   1      0.1%
9828720   3      0.3%
11367538  1      0.1%
11436692  1      0.1%
11483904  1      0.1%
12435456  1      0.1%
13136631  2      0.2%

Maximum 10 values

Value      Count  Frequency (%)
141562419  1      0.1%
141274545  1      0.1%
140734963  1      0.1%
139188164  1      0.1%
138977512  1      0.1%
138447013  1      0.1%
138322594  1      0.1%
137636728  1      0.1%
137468983  1      0.1%
136345735  1      0.1%

갱신일자
DateTime

Distinct: 491
Distinct (%): 49.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2020-10-16 15:22:00
Maximum: 2020-10-26 10:10:00
Histogram with fixed size bins (bins=50)

불량규제일자
DateTime

Distinct: 318
Distinct (%): 31.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2000-12-13 00:00:00
Maximum: 2020-10-26 00:00:00
Histogram with fixed size bins (bins=50)

처리일자
Categorical

Distinct: 7
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value             Count
2020-10-23        245
2020-10-19        202
2020-10-22        194
2020-10-20        176
2020-10-21        126
Other values (2)  57

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-10-26
2nd row: 2020-10-26
3rd row: 2020-10-26
4th row: 2020-10-26
5th row: 2020-10-26

Common Values

Value       Count  Frequency (%)
2020-10-23  245    24.5%
2020-10-19  202    20.2%
2020-10-22  194    19.4%
2020-10-20  176    17.6%
2020-10-21  126    12.6%
2020-10-16  35     3.5%
2020-10-26  22     2.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of the Common Values counts]

불량규제해제취급사번
Real number (ℝ)

MISSING 

Distinct: 75
Distinct (%): 20.4%
Missing: 632
Missing (%): 63.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1394.5652
Minimum: 1032
Maximum: 2002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1032
5-th percentile: 1121
Q1: 1166
Median: 1221
Q3: 1593
95-th percentile: 1916.45
Maximum: 2002
Range: 970
Interquartile range (IQR): 427

Descriptive statistics

Standard deviation: 273.11993
Coefficient of variation (CV): 0.19584594
Kurtosis: -0.79082615
Mean: 1394.5652
Median Absolute Deviation (MAD): 103
Skewness: 0.75843971
Sum: 513200
Variance: 74594.497
Monotonicity: Not monotonic
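
For a column with missing values, the figures reported here are consistent with statistics taken over the non-missing rows only. The sketch below checks that reading against the printed numbers for 불량규제해제취급사번. It is a consistency check on the report, not a description of the profiler's internals.

```python
# Figures copied from the report for 불량규제해제취급사번.
n_observations = 1000
n_missing = 632
n_distinct = 75
reported_sum = 513200
reported_std = 273.11993

# Rows that actually carry a value.
n_present = n_observations - n_missing

# Distinct (%) and Mean, both taken over the non-missing rows.
distinct_pct = 100 * n_distinct / n_present
mean = reported_sum / n_present
cv = reported_std / mean

print(f"Non-missing rows: {n_present}")
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Mean: {mean:.4f}")
print(f"CV: {cv:.6f}")
```

Dividing the 75 distinct values by all 1000 rows would give 7.5%, not the reported 20.4%, which is why the non-missing denominator is the consistent one.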
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
1121               25     2.5%
1221               24     2.4%
1160               20     2.0%
1185               16     1.6%
1186               14     1.4%
1593               14     1.4%
1535               11     1.1%
1198               11     1.1%
1339               11     1.1%
1166               10     1.0%
Other values (65)  212    21.2%
(Missing)          632    63.2%

Minimum 10 values

Value  Count  Frequency (%)
1032   1      0.1%
1037   1      0.1%
1086   2      0.2%
1118   5      0.5%
1121   25     2.5%
1136   1      0.1%
1138   3      0.3%
1141   8      0.8%
1142   10     1.0%
1149   5      0.5%

Maximum 10 values

Value  Count  Frequency (%)
2002   1      0.1%
1985   3      0.3%
1978   1      0.1%
1973   2      0.2%
1935   7      0.7%
1934   2      0.2%
1926   1      0.1%
1921   2      0.2%
1908   3      0.3%
1905   1      0.1%

등록일시
DateTime

Distinct: 491
Distinct (%): 49.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2020-10-16 15:22:00
Maximum: 2020-10-26 10:10:00
Histogram with fixed size bins (bins=50)

등록사번
Text

MISSING 

Distinct: 63
Distinct (%): 8.0%
Missing: 215
Missing (%): 21.5%
Memory size: 7.9 KiB

Length

Max length: 5
Median length: 5
Mean length: 4.5808917
Min length: 4

Characters and Unicode

Total characters: 3596
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
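
As an illustration of those character properties, the sketch below counts Unicode general categories for three 등록사번 values that appear elsewhere in this report. It uses Python's standard `unicodedata` module, where `Nd` corresponds to "Decimal Number" and `Lu` to "Uppercase Letter". The module does not expose the script property, so scripts and blocks are left out here.

```python
import unicodedata
from collections import Counter

# Three 등록사번 values taken from this report (not the full column).
values = ["99011", "990HF", "1690"]

# Count the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in values for ch in value
)

# Average string length over these values, analogous to "Mean length".
mean_length = sum(len(value) for value in values) / len(values)

print(dict(category_counts))
print(f"Mean length: {mean_length:.2f}")
```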

Unique

Unique: 2
Unique (%): 0.3%

Sample

1st row: 99011
2nd row: 99011
3rd row: 99011
4th row: 99011
5th row: 99021

Value              Count  Frequency (%)
99020              136    17.3%
99011              69     8.8%
99010              46     5.9%
1690               41     5.2%
1605               32     4.1%
99019              31     3.9%
99006              28     3.6%
99004              28     3.6%
99088              28     3.6%
99026              27     3.4%
Other values (53)  319    40.6%

Most occurring characters

Value             Count  Frequency (%)
9                 1070   29.8%
0                 831    23.1%
1                 602    16.7%
6                 289    8.0%
2                 225    6.3%
8                 208    5.8%
4                 107    3.0%
5                 86     2.4%
7                 85     2.4%
3                 67     1.9%
Other values (2)  26     0.7%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    3570   99.3%
Uppercase Letter  26     0.7%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
9      1070   30.0%
0      831    23.3%
1      602    16.9%
6      289    8.1%
2      225    6.3%
8      208    5.8%
4      107    3.0%
5      86     2.4%
7      85     2.4%
3      67     1.9%

Uppercase Letter
Value  Count  Frequency (%)
H      13     50.0%
F      13     50.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3570   99.3%
Latin   26     0.7%

Most frequent character per script

Common
Value  Count  Frequency (%)
9      1070   30.0%
0      831    23.3%
1      602    16.9%
6      289    8.1%
2      225    6.3%
8      208    5.8%
4      107    3.0%
5      86     2.4%
7      85     2.4%
3      67     1.9%

Latin
Value  Count  Frequency (%)
H      13     50.0%
F      13     50.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3596   100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
9                 1070   29.8%
0                 831    23.1%
1                 602    16.7%
6                 289    8.0%
2                 225    6.3%
8                 208    5.8%
4                 107    3.0%
5                 86     2.4%
7                 85     2.4%
3                 67     1.9%
Other values (2)  26     0.7%

Interactions

Correlations

Variable  채무관계자고객번호  주채무자고객번호  처리일자  불량규제해제취급사번  등록사번
채무관계자고객번호  1.000  0.983  0.173  0.436  0.680
주채무자고객번호  0.983  1.000  0.158  0.544  0.659
처리일자  0.173  0.158  1.000  0.423  0.713
불량규제해제취급사번  0.436  0.544  0.423  1.000  1.000
등록사번  0.680  0.659  0.713  1.000  1.000

Variable  채무관계자고객번호  주채무자고객번호  불량규제해제취급사번  처리일자
채무관계자고객번호  1.000  0.909  -0.373  0.088
주채무자고객번호  0.909  1.000  -0.488  0.080
불량규제해제취급사번  -0.373  -0.488  1.000  0.228
처리일자  0.088  0.080  0.228  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
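
To make nullity correlation concrete, the sketch below computes a Pearson correlation between the missingness indicators of 불량규제해제취급사번 and 등록사번, using only the first ten rows shown in the Sample section of this report. It assumes the heatmap is a Pearson correlation of such indicators; the `pearson` helper and variable names are illustrative, and the value for the full 1000-row dataset may differ.

```python
import math

# Missingness flags (1 = missing) for rows 0-9 of the Sample section.
handler_missing = [1, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # 불량규제해제취급사번
registrar_missing = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 등록사번


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


nullity_corr = pearson(handler_missing, registrar_missing)
print(f"Nullity correlation (10 sample rows): {nullity_corr:.3f}")
```

In these ten rows exactly one of the two columns is missing in every row, so the indicators are perfectly anti-correlated.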

Sample

First rows

Index  채무관계자고객번호  주채무자고객번호  갱신일자  불량규제일자  처리일자  불량규제해제취급사번  등록일시  등록사번
0  112782396  112782396  2020-10-26 10:10  2020-10-26  2020-10-26  <NA>  2020-10-26 10:10  99011
1  80905593  80905593  2020-10-26 10:08  2015-02-05  2020-10-26  1487  2020-10-26 10:08  <NA>
2  78338868  78338868  2020-10-26 10:04  2012-10-23  2020-10-26  1487  2020-10-26 10:04  <NA>
3  80905263  80905263  2020-10-26 10:03  2014-07-22  2020-10-26  1487  2020-10-26 10:03  <NA>
4  100045906  100045906  2020-10-26 09:52  2016-01-08  2020-10-26  1360  2020-10-26 09:52  <NA>
5  83714260  83714260  2020-10-26 09:49  2020-06-23  2020-10-26  <NA>  2020-10-26 09:49  99011
6  101817272  101817272  2020-10-26 09:48  2020-10-22  2020-10-26  <NA>  2020-10-26 09:48  99011
7  115497277  115497277  2020-10-26 09:47  2020-10-22  2020-10-26  <NA>  2020-10-26 09:47  99011
8  90790989  90790989  2020-10-26 08:41  2020-10-26  2020-10-26  <NA>  2020-10-26 08:41  99021
9  99739158  99739158  2020-10-26 08:40  2016-11-23  2020-10-26  <NA>  2020-10-26 08:40  99020

Last rows

Index  채무관계자고객번호  주채무자고객번호  갱신일자  불량규제일자  처리일자  불량규제해제취급사번  등록일시  등록사번
990  113350523  113350523  2020-10-16 16:00  2020-10-13  2020-10-16  <NA>  2020-10-16 16:00  990HF
991  99491564  99491564  2020-10-16 15:51  2020-10-16  2020-10-16  <NA>  2020-10-16 15:51  99026
992  91982822  91982822  2020-10-16 15:49  2019-12-23  2020-10-16  <NA>  2020-10-16 15:49  99020
993  113498634  123704703  2020-10-16 15:48  2020-09-23  2020-10-16  <NA>  2020-10-16 15:48  99004
994  123704703  123704703  2020-10-16 15:48  2020-09-23  2020-10-16  <NA>  2020-10-16 15:48  99004
995  39192743  39192743  2020-10-16 15:42  2005-06-24  2020-10-16  1648  2020-10-16 15:42  <NA>
996  81165125  39192743  2020-10-16 15:42  2011-01-11  2020-10-16  1648  2020-10-16 15:42  <NA>
997  81165109  39192743  2020-10-16 15:42  2011-01-11  2020-10-16  1648  2020-10-16 15:42  <NA>
998  78137580  78137580  2020-10-16 15:41  2019-11-12  2020-10-16  <NA>  2020-10-16 15:41  99003
999  98462002  98462002  2020-10-16 15:22  2020-10-16  2020-10-16  <NA>  2020-10-16 15:22  99003

Duplicate rows

Most frequently occurring

Index  채무관계자고객번호  주채무자고객번호  갱신일자  불량규제일자  처리일자  불량규제해제취급사번  등록일시  등록사번  # duplicates
0  13668536  13668536  2020-10-21 14:45  2020-04-20  2020-10-21  1195  2020-10-21 14:45  <NA>  2
1  69129934  69129934  2020-10-23 09:43  2018-08-13  2020-10-23  <NA>  2020-10-23 09:43  99011  2
2  73365179  73365179  2020-10-22 14:06  2020-10-22  2020-10-22  <NA>  2020-10-22 14:06  1690  2
3  78243940  78243940  2020-10-20 13:27  2013-10-07  2020-10-20  1518  2020-10-20 13:27  <NA>  2
4  78988234  78988234  2020-10-19 14:10  2020-10-05  2020-10-19  <NA>  2020-10-19 14:10  99021  2
5  108367194  108367194  2020-10-19 13:20  2020-10-19  2020-10-19  <NA>  2020-10-19 13:20  1883  2
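
A listing like this can be produced by counting how often each complete row occurs. The sketch below does so with the standard library on a small hypothetical subset: three columns only, and five rows chosen for illustration. It is not the profiler's own implementation.

```python
from collections import Counter

# Hypothetical mini-dataset of (채무관계자고객번호, 처리일자, 등록사번) tuples.
# Values are borrowed from this report, but the row selection is illustrative.
rows = [
    (69129934, "2020-10-23", "99011"),
    (69129934, "2020-10-23", "99011"),
    (73365179, "2020-10-22", "1690"),
    (73365179, "2020-10-22", "1690"),
    (112782396, "2020-10-26", "99011"),
]

# Count identical rows and keep those that occur more than once.
row_counts = Counter(rows)
duplicates = {row: count for row, count in row_counts.items() if count > 1}

for row, count in duplicates.items():
    print(row, "# duplicates:", count)
```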