Overview

Dataset statistics

Number of variables: 4
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.4 KiB
Average record size in memory: 35.3 B

Variable types

Text: 1
Categorical: 1
Numeric: 1
DateTime: 1

Dataset

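Description: Public open data related to the work of the Housing Pension Department of 한국주택금융공사 (Korea Housing Finance Corporation): source data that can be disclosed from the databases related to the department's work.
Author: 한국주택금융공사 (Korea Housing Finance Corporation)
URL: https://www.data.go.kr/data/15073020/fileData.do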

Alerts

Imbalance: SEQ is highly imbalanced (65.6%)
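
The 65.6% figure in this alert can be reproduced from the SEQ value counts reported later in this report (86, 12, 1 and 1 occurrences). The sketch below assumes the score is one minus the Shannon entropy of the class frequencies, normalised by the maximum entropy for the number of classes. That formula is an assumption that happens to match the reported number, not a statement of how the profiling tool is implemented, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (0 = balanced, 1 = one class)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    # Shannon entropy in bits, skipping empty classes.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    # Maximum entropy for n_classes equally likely classes is log2(n_classes).
    return 1.0 - entropy / math.log2(n_classes)


# SEQ value counts taken from the Common Values table of this report.
seq_counts = {"1": 86, "2": 12, "4": 1, "3": 1}
score = imbalance_score(list(seq_counts.values()))
print(f"{score:.1%}")  # 65.6%
```

A perfectly balanced variable would score 0% under this definition, so a value near 66% reflects how strongly the value 1 dominates the column.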

Reproduction

Analysis started: 2023-12-12 22:20:53.139642
Analysis finished: 2023-12-12 22:20:53.522705
Duration: 0.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

GUARNT_NO
Text

Distinct: 86
Distinct (%): 86.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 1400
Distinct characters: 24
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 74
Unique (%): 74.0%
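
The report gives 86 distinct values but only 74 unique ones for this text variable. The sketch below illustrates the difference under the usual reading: "distinct" counts different values, while "unique" counts values that occur exactly once. The frequency list is reconstructed from the value frequency table for this variable (one value seen 4 times, ten seen twice, and 76 other values covering 78 rows, which implies two more values seen twice). The identifiers are placeholders, not real guarantee numbers, and `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Occurrence counts reconstructed from the report: 1 value x4, 11 values x2, 74 values x1.
frequencies = [4] + [2] * 11 + [1] * 74
# Placeholder identifiers standing in for the real values.
values = [f"id{i}" for i, n in enumerate(frequencies) for _ in range(n)]

distinct, unique = distinct_and_unique(values)
print(len(values), distinct, unique)  # 100 86 74
```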

Sample

1st row: RTPA2020000519
2nd row: RTHO2020000496
3rd row: RTHB2020000590
4th row: RTOB2020000067
5th row: RTPB2020000155

Value              Count  Frequency (%)
rtna2020000220     4      4.0%
rtma2020000256     2      2.0%
rtma2020000251     2      2.0%
rtha2020000742     2      2.0%
rtad2020000616     2      2.0%
rtab2020000727     2      2.0%
rtba2020000625     2      2.0%
rtpb2020000155     2      2.0%
rtac2020000662     2      2.0%
rtad2020000688     2      2.0%
Other values (76)  78     78.0%

Most occurring characters

Value              Count  Frequency (%)
0                  518    37.0%
2                  242    17.3%
R                  100    7.1%
T                  95     6.8%
A                  83     5.9%
6                  58     4.1%
1                  37     2.6%
7                  34     2.4%
B                  33     2.4%
4                  31     2.2%
Other values (14)  169    12.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    1000   71.4%
Uppercase Letter  400    28.6%
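
The category split can be checked with Python's standard `unicodedata` module, where the general category `Lu` corresponds to "Uppercase Letter" and `Nd` to "Decimal Number". The sketch below runs over the five sample rows shown earlier for this variable only, not the full 100-row dataset, so it yields 20 letters and 50 digits rather than 400 and 1000. `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# The five sample rows listed for this variable in the report.
sample = [
    "RTPA2020000519",
    "RTHO2020000496",
    "RTHB2020000590",
    "RTOB2020000067",
    "RTPB2020000155",
]

counts = category_counts(sample)
print(counts["Lu"], counts["Nd"])  # 20 50
```

Each identifier has 4 uppercase letters followed by 10 digits, which is the same 28.6% to 71.4% ratio the report shows for the whole column.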

Most frequent character per category

Uppercase Letter

Value             Count  Frequency (%)
R                 100    25.0%
T                 95     23.8%
A                 83     20.8%
B                 33     8.2%
H                 24     6.0%
D                 14     3.5%
O                 11     2.8%
C                 9      2.2%
P                 9      2.2%
Q                 7      1.8%
Other values (4)  15     3.8%

Decimal Number

Value  Count  Frequency (%)
0      518    51.8%
2      242    24.2%
6      58     5.8%
1      37     3.7%
7      34     3.4%
4      31     3.1%
5      29     2.9%
9      24     2.4%
8      16     1.6%
3      11     1.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  1000   71.4%
Latin   400    28.6%

Most frequent character per script

Latin

Value             Count  Frequency (%)
R                 100    25.0%
T                 95     23.8%
A                 83     20.8%
B                 33     8.2%
H                 24     6.0%
D                 14     3.5%
O                 11     2.8%
C                 9      2.2%
P                 9      2.2%
Q                 7      1.8%
Other values (4)  15     3.8%

Common

Value  Count  Frequency (%)
0      518    51.8%
2      242    24.2%
6      58     5.8%
1      37     3.7%
7      34     3.4%
4      31     3.1%
5      29     2.9%
9      24     2.4%
8      16     1.6%
3      11     1.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1400   100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
0                  518    37.0%
2                  242    17.3%
R                  100    7.1%
T                  95     6.8%
A                  83     5.9%
6                  58     4.1%
1                  37     2.6%
7                  34     2.4%
B                  33     2.4%
4                  31     2.2%
Other values (14)  169    12.1%

SEQ
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 2
Unique (%): 2.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 2

Common Values

Value  Count  Frequency (%)
1      86     86.0%
2      12     12.0%
4      1      1.0%
3      1      1.0%

Length

Histogram of lengths of the category


REG_ENO
Real number (ℝ)

Distinct: 43
Distinct (%): 43.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1702.7
Minimum: 1174
Maximum: 2003
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 KiB

Quantile statistics

Minimum: 1174
5-th percentile: 1304
Q1: 1557
median: 1690
Q3: 1917
95-th percentile: 1982.25
Maximum: 2003
Range: 829
Interquartile range (IQR): 360
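
The last two entries are derived from the others: the range is the maximum minus the minimum, and the interquartile range is Q3 minus Q1. A minimal check using only the quantiles reported above (the dictionary keys are illustrative names, not identifiers from the profiling tool):

```python
# Quantiles of REG_ENO as reported in this section.
quantiles = {"min": 1174, "q1": 1557, "median": 1690, "q3": 1917, "max": 2003}

# Range: spread between the smallest and largest observed value.
value_range = quantiles["max"] - quantiles["min"]

# Interquartile range: spread of the middle 50% of the observations.
iqr = quantiles["q3"] - quantiles["q1"]

print(value_range, iqr)  # 829 360
```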

Descriptive statistics

Standard deviation: 221.72011
Coefficient of variation (CV): 0.13021678
Kurtosis: -0.3826661
Mean: 1702.7
Median Absolute Deviation (MAD): 152.5
Skewness: -0.50995131
Sum: 170270
Variance: 49159.808
Monotonicity: Not monotonic
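
Several of these figures follow from one another, which gives a quick consistency check: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations (100, from the overview). The sketch below uses the rounded values printed in this report, so agreement is only expected up to rounding. The variable names are illustrative.

```python
# Values of REG_ENO as printed in this report (already rounded).
n_observations = 100
mean = 1702.7
std = 221.72011

# Coefficient of variation: relative spread, std / mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# The sum of all observations is mean * n.
total = mean * n_observations

print(f"CV={cv:.8f} variance={variance:.3f} sum={total:.0f}")
```

The results land on the reported 0.13021678 and 170270, and within rounding error of the reported variance of 49159.808.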
Histogram with fixed size bins (bins=43)

Common Values

Value              Count  Frequency (%)
1656               8      8.0%
1970               6      6.0%
1557               5      5.0%
1689               5      5.0%
1956               5      5.0%
1753               5      5.0%
1917               4      4.0%
1174               4      4.0%
1691               4      4.0%
1385               4      4.0%
Other values (33)  50     50.0%

Minimum 10 values

Value  Count  Frequency (%)
1174   4      4.0%
1304   2      2.0%
1371   3      3.0%
1385   4      4.0%
1406   2      2.0%
1475   3      3.0%
1521   1      1.0%
1554   2      2.0%
1557   5      5.0%
1569   1      1.0%

Maximum 10 values

Value  Count  Frequency (%)
2003   1      1.0%
2001   2      2.0%
2000   1      1.0%
1987   1      1.0%
1982   1      1.0%
1980   2      2.0%
1977   1      1.0%
1970   6      6.0%
1968   2      2.0%
1956   5      5.0%

REG_TS
Date

Distinct: 87
Distinct (%): 87.0%
Missing: 0
Missing (%): 0.0%
Memory size: 932.0 B
Minimum: 2020-09-17 15:34:12
Maximum: 2020-10-22 15:51:00
Histogram with fixed size bins (bins=50)
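
The minimum and maximum above are printed with dashes, while the sample rows at the end of this report show the same timestamps with slashes (for example `2020/10/22 15:51:00`). The sketch below parses both forms with the standard `datetime` module and computes the time span covered by REG_TS. The two format strings are assumptions inferred from how the values are displayed here, not documented properties of the source file.

```python
from datetime import datetime

# Format used for Minimum/Maximum in this section (assumed from the displayed text).
REPORT_FORMAT = "%Y-%m-%d %H:%M:%S"
# Format used in the sample rows of this report (assumed from the displayed text).
SAMPLE_FORMAT = "%Y/%m/%d %H:%M:%S"

earliest = datetime.strptime("2020-09-17 15:34:12", REPORT_FORMAT)
latest = datetime.strptime("2020-10-22 15:51:00", REPORT_FORMAT)

# Time covered by the registration timestamps.
span = latest - earliest
print(span)  # 35 days, 0:16:48

# The first sample row carries the same instant as the reported maximum.
first_sample = datetime.strptime("2020/10/22 15:51:00", SAMPLE_FORMAT)
print(first_sample == latest)  # True
```

All 100 records therefore fall within a window of roughly five weeks.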


Correlations

           GUARNT_NO  SEQ    REG_ENO  REG_TS
GUARNT_NO  1.000      0.000  1.000    1.000
SEQ        0.000      1.000  0.000    0.000
REG_ENO    1.000      0.000  1.000    1.000
REG_TS     1.000      0.000  1.000    1.000

         REG_ENO  SEQ
REG_ENO  1.000    0.000
SEQ      0.000    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

    GUARNT_NO       SEQ  REG_ENO  REG_TS
0   RTPA2020000519  1    1982     2020/10/22 15:51:00
1   RTHO2020000496  1    1917     2020/10/22 13:53:17
2   RTHB2020000590  1    1371     2020/10/22 14:31:27
3   RTOB2020000067  1    1620     2020/10/22 11:45:18
4   RTPB2020000155  2    1980     2020/10/22 10:58:30
5   RTAD2020000688  2    1656     2020/10/22 10:34:13
6   RTPB2020000155  1    1980     2020/10/22 10:58:30
7   RTHO2020000499  1    1691     2020/10/22 09:51:06
8   RTAD2020000688  1    1656     2020/10/22 10:34:13
9   RTPA2020000531  1    1889     2020/10/22 09:26:53

    GUARNT_NO       SEQ  REG_ENO  REG_TS
90  RTAB2020000727  1    1385     2020/09/21 17:35:17
91  RTAB2020000724  1    1689     2020/09/23 11:53:16
92  RTAA2020000554  1    1799     2020/09/18 14:58:14
93  RTAC2020000673  1    1788     2020/09/18 14:55:00
94  RTBA2020000607  1    1720     2020/09/22 11:23:05
95  RTAC2020000662  2    1753     2020/09/21 16:49:15
96  RTBB2020000178  1    1304     2020/09/18 14:16:57
97  RTAC2020000662  1    1753     2020/09/21 16:49:15
98  RTQA2020000271  1    1874     2020/09/18 10:27:21
99  RTHA2020000645  1    1569     2020/09/17 15:34:12