Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 42.1 KiB
Average record size in memory: 43.1 B
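
The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 B and using the rounded figures as reported (the variable names are illustrative, not part of the report):

```python
# Values as reported under "Dataset statistics" (already rounded in the report).
total_size_kib = 42.1
n_observations = 1000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Under these assumptions the result rounds to the 43.1 B shown in the report.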

Variable types

Text: 1
Categorical: 1
Numeric: 2
Boolean: 1

Dataset

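Description: Public open data related to the work of the Housing Pension Department of the Korea Housing Finance Corporation. It includes the guarantee number and whether credit information is present. (Source data that can be made public, taken from the database related to the department's work.)
Author: Korea Housing Finance Corporation (한국주택금융공사)
URL: https://www.data.go.kr/data/15072830/fileData.do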

Alerts

Imbalance: 신용정보여부 is highly imbalanced (60.5%)
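
The 60.5% figure is consistent with an entropy-based imbalance score, 1 - H / log2(k), applied to the class counts reported for 신용정보여부 (True = 922, False = 78). The report itself does not state the formula, so the sketch below is an assumption about how the score may have been derived, and `imbalance_score` is a hypothetical helper name:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class shares.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts as reported for 신용정보여부: True = 922, False = 78.
score = imbalance_score([922, 78])
print(f"{score:.1%}")
```

With these counts the score comes out at roughly 0.605, which matches the percentage shown in the alert.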

Reproduction

Analysis started: 2023-12-12 00:53:28.709180
Analysis finished: 2023-12-12 00:53:29.910238
Duration: 1.2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

보증번호 (guarantee number)
Text

Distinct: 706
Distinct (%): 70.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 14000
Distinct characters: 23
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
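
As an illustration of per-code-point properties, the sketch below counts Unicode general categories over the five sample values listed for this variable, using Python's standard `unicodedata` module. It covers only those five values, not the full column, and the variable names are illustrative:

```python
import unicodedata
from collections import Counter

# First five sample values of the text variable, as listed in this report.
samples = [
    "RTHA2011000308",
    "RTHA2011000308",
    "RQAD2011000516",
    "RQAD2011000516",
    "RQAD2011000515",
]

# unicodedata.category returns the two-letter general category of a character:
# "Lu" = Uppercase Letter, "Nd" = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(category_counts["Lu"], category_counts["Nd"])
```

Each sample value contributes 4 uppercase letters and 10 decimal digits, which is in line with the report's totals of 4000 uppercase letters and 10000 decimal numbers over 1000 values of length 14.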

Unique

Unique: 412
Unique (%): 41.2%
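
Distinct counts the different values in the column, while Unique counts the values that occur exactly once. A minimal sketch of the difference, run on the five sample values listed for this variable rather than on the full column (variable names are illustrative):

```python
from collections import Counter

# First five sample values of the text variable, as listed in this report.
samples = [
    "RTHA2011000308",
    "RTHA2011000308",
    "RQAD2011000516",
    "RQAD2011000516",
    "RQAD2011000515",
]

counts = Counter(samples)
n_distinct = len(counts)                              # number of different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
print(n_distinct, n_unique)
```

Applied to the reported totals, 706 distinct values of which 412 are unique leaves 294 values that repeat; those cover the remaining 1000 - 412 = 588 rows, an average of two rows per repeated value.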

Sample

1st row: RTHA2011000308
2nd row: RTHA2011000308
3rd row: RQAD2011000516
4th row: RQAD2011000516
5th row: RQAD2011000515

Value               Count  Frequency (%)
rtha2011000308          2   0.2%
rtab2011000329          2   0.2%
rtna2011000038          2   0.2%
rtba2011000076          2   0.2%
rqad2011000382          2   0.2%
rqad2011000459          2   0.2%
rtaa2011000347          2   0.2%
rtab2011000324          2   0.2%
rtaa2011000290          2   0.2%
rtab2011000245          2   0.2%
Other values (696)    980  98.0%

Most occurring characters

Value              Count  Frequency (%)
0                   4382  31.3%
1                   2461  17.6%
2                   1376   9.8%
A                   1019   7.3%
R                   1002   7.2%
T                    806   5.8%
3                    390   2.8%
4                    334   2.4%
H                    265   1.9%
B                    262   1.9%
Other values (13)   1703  12.2%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    10000  71.4%
Uppercase Letter   4000  28.6%

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
A                  1019  25.5%
R                  1002  25.1%
T                   806  20.2%
H                   265   6.6%
B                   262   6.6%
Q                   211   5.3%
D                   194   4.9%
O                   109   2.7%
P                    45   1.1%
M                    38   0.9%
Other values (3)     49   1.2%

Decimal Number

Value  Count  Frequency (%)
0       4382  43.8%
1       2461  24.6%
2       1376  13.8%
3        390   3.9%
4        334   3.3%
5        227   2.3%
7        217   2.2%
8        216   2.2%
9        209   2.1%
6        188   1.9%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  71.4%
Latin    4000  28.6%

Most frequent character per script

Latin

Value             Count  Frequency (%)
A                  1019  25.5%
R                  1002  25.1%
T                   806  20.2%
H                   265   6.6%
B                   262   6.6%
Q                   211   5.3%
D                   194   4.9%
O                   109   2.7%
P                    45   1.1%
M                    38   0.9%
Other values (3)     49   1.2%

Common

Value  Count  Frequency (%)
0       4382  43.8%
1       2461  24.6%
2       1376  13.8%
3        390   3.9%
4        334   3.3%
5        227   2.3%
7        217   2.2%
8        216   2.2%
9        209   2.1%
6        188   1.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  14000  100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
0                   4382  31.3%
1                   2461  17.6%
2                   1376   9.8%
A                   1019   7.3%
R                   1002   7.2%
T                    806   5.8%
3                    390   2.8%
4                    334   2.4%
H                    265   1.9%
B                    262   1.9%
Other values (13)   1703  12.2%

회차 (round number)
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 1
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value  Count  Frequency (%)
1        624  62.4%
2        376  37.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values. Value 1: 624 (62.4%); value 2: 376 (37.6%)]

고객번호 (customer number)
Real number (ℝ)

Distinct: 967
Distinct (%): 96.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 78469957
Minimum: 7297641
Maximum: 84308662
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 7297641
5-th percentile: 35800774
Q1: 82859389
Median: 83534622
Q3: 84074168
95-th percentile: 84245389
Maximum: 84308662
Range: 77011021
Interquartile range (IQR): 1214779.8

Descriptive statistics

Standard deviation: 15934879
Coefficient of variation (CV): 0.20306981
Kurtosis: 10.43606
Mean: 78469957
Median Absolute Deviation (MAD): 575552
Skewness: -3.3731329
Sum: 7.8469957 × 10^10
Variance: 2.5392037 × 10^14
Monotonicity: Not monotonic
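
Several of the figures for 고객번호 can be cross-checked against each other. The sketch below recomputes the range, interquartile range, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. Because the report rounds its inputs, small differences in the last digits are expected; the variable names are illustrative:

```python
# Values as reported for 고객번호 (already rounded in the report).
minimum, maximum = 7297641, 84308662
q1, q3 = 82859389, 84074168
mean, std = 78469957, 15934879

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(value_range, iqr, round(cv, 6), f"{variance:.4e}")
```

The recomputed range matches the reported 77011021 exactly. The IQR comes out as 1214779 against the reported 1214779.8, which suggests the displayed quartiles are rounded.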
[Figure: histogram with fixed size bins (bins=50)]

Value               Count  Frequency (%)
83944661                2   0.2%
84153435                2   0.2%
84077508                2   0.2%
83393762                2   0.2%
17564102                2   0.2%
84069396                2   0.2%
84147399                2   0.2%
84147289                2   0.2%
84069189                2   0.2%
83944768                2   0.2%
Other values (957)    980  98.0%

Value     Count  Frequency (%)
7297641       1  0.1%
7703614       1  0.1%
8571162       1  0.1%
8613884       1  0.1%
8695161       1  0.1%
8724922       1  0.1%
9021491       1  0.1%
9031083       1  0.1%
9263602       1  0.1%
9440218       1  0.1%

Value     Count  Frequency (%)
84308662      1  0.1%
84308565      1  0.1%
84307155      1  0.1%
84307126      1  0.1%
84304996      1  0.1%
84304938      2  0.2%
84302105      1  0.1%
84302082      1  0.1%
84295322      1  0.1%
84295241      1  0.1%

신용정보여부 (credit information flag)
Boolean

IMBALANCE

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count  Frequency (%)
True     922  92.2%
False     78   7.8%

등록사번 (registering employee number)
Real number (ℝ)

Distinct: 56
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5432.264
Minimum: 1050
Maximum: 7481
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 8.9 KiB

Quantile statistics

Minimum: 1050
5-th percentile: 1173
Q1: 1431
Median: 7313
Q3: 7394
95-th percentile: 7471
Maximum: 7481
Range: 6431
Interquartile range (IQR): 5963

Descriptive statistics

Standard deviation: 2848.3652
Coefficient of variation (CV): 0.5243422
Kurtosis: -1.399501
Mean: 5432.264
Median Absolute Deviation (MAD): 144
Skewness: -0.77387095
Sum: 5432264
Variance: 8113184.6
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Value              Count  Frequency (%)
1198                  74   7.4%
7383                  57   5.7%
7313                  56   5.6%
7394                  54   5.4%
7309                  53   5.3%
7300                  50   5.0%
7382                  47   4.7%
7471                  45   4.5%
7457                  44   4.4%
7350                  41   4.1%
Other values (46)    479  47.9%

Value  Count  Frequency (%)
1050       3  0.3%
1104      12  1.2%
1130       1  0.1%
1133       6  0.6%
1141       9  0.9%
1143       1  0.1%
1147       4  0.4%
1149       1  0.1%
1152       2  0.2%
1170       2  0.2%

Value  Count  Frequency (%)
7481       3  0.3%
7476      27  2.7%
7471      45  4.5%
7468       3  0.3%
7462      38  3.8%
7461      30  3.0%
7457      44  4.4%
7444      35  3.5%
7394      54  5.4%
7390      15  1.5%

Interactions

[Figures: four interaction plots (images not reproduced)]

Correlations

              회차     고객번호  신용정보여부  등록사번
회차          1.000  0.039     0.056         0.000
고객번호      0.039  1.000     0.000         0.103
신용정보여부  0.056  0.000     1.000         0.386
등록사번      0.000  0.103     0.386         1.000

              신용정보여부  회차
신용정보여부  1.000         0.035
회차          0.035         1.000

              고객번호  등록사번  회차   신용정보여부
고객번호      1.000     0.031     0.000  0.000
등록사번      0.031     1.000     0.000  0.250
회차          0.000     0.000     1.000  0.035
신용정보여부  0.000     0.250     0.035  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     보증번호        회차  고객번호  신용정보여부  등록사번
0    RTHA2011000308  2     28830676  Y             7394
1    RTHA2011000308  1     73411667  Y             7394
2    RQAD2011000516  2     84308662  Y             7350
3    RQAD2011000516  1     84308565  Y             7350
4    RQAD2011000515  2     84307155  Y             7309
5    RQAD2011000515  1     84307126  Y             7309
6    RTHB2011000179  2     84304996  Y             7462
7    RTHB2011000179  1     84304938  Y             7462
8    RTHA2011000307  1     84304938  Y             7394
9    RTAA2011000422  1     8724922   N             1173

     보증번호        회차  고객번호  신용정보여부  등록사번
990  RQAD2011000434  2     83713135  Y             7309
991  RTHB2011000144  2     83706867  Y             7462
992  RTBA2011000142  1     83710222  Y             7471
993  RTPA2011000110  2     83384357  Y             7457
994  RTQA2011000038  2     79647448  Y             1133
995  RTHB2011000140  2     83658326  Y             7462
996  RTHA2011000250  1     83661339  Y             7394
997  RTOA2011000059  2     83622132  Y             7299
998  RTMA2011000096  2     83618489  Y             7444
999  RTHA2011000246  1     83617260  Y             7394