Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 984
Missing cells (%): 32.8%
Duplicate rows: 138
Duplicate rows (%): 13.8%
Total size in memory: 23.6 KiB
Average record size in memory: 24.1 B
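
The percentages above follow from the counts. As a minimal sketch (the variable names are illustrative and not taken from ydata-profiling), the arithmetic can be checked like this:

```python
# Figures copied from the dataset statistics above.
n_variables = 3
n_observations = 1000
missing_cells = 984
duplicate_rows = 138
total_size_kib = 23.6  # already rounded in the report

total_cells = n_variables * n_observations             # 3000 cells in total
missing_pct = 100 * missing_cells / total_cells         # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations   # share relative to the row count

# Approximate bytes per record. The report shows 24.1 B, presumably computed
# from the unrounded size, so this value can differ in the last digit.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"missing: {missing_pct:.1f}%")
print(f"duplicates: {duplicate_pct:.1f}%")
print(f"average record: about {avg_record_bytes:.1f} B")
```

The missing-cell share is taken over all cells (variables times observations), while the duplicate share is taken over rows only.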

Variable types

DateTime: 1
Text: 2

Dataset

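Description: Open public data related to the work of the Housing Pension Department of the Korea Housing Finance Corporation (source data from the department's work-related database that can be made public). It contains the columns 보증해지기표일자(구)거래일자 (guarantee termination posting date, formerly transaction date), 보증번호 (guarantee number), and 비고 (remarks), along with their values.
Author: 한국주택금융공사 (Korea Housing Finance Corporation)
URL: https://www.data.go.kr/data/15072859/fileData.do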

Alerts

Dataset has 138 (13.8%) duplicate rows [Duplicates]
비고 has 983 (98.3%) missing values [Missing]

Reproduction

Analysis started: 2023-12-12 00:42:05.669199
Analysis finished: 2023-12-12 00:42:06.056636
Duration: 0.39 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

보증해지기표일자(구)거래일자
DateTime

Distinct: 70
Distinct (%): 7.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB
Minimum: 2020-06-17 00:00:00
Maximum: 2020-10-26 00:00:00
Histogram with fixed size bins (bins=50)
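
To make the fixed-size binning concrete, here is a hedged sketch of how 50 equal-width bins could be laid over the Minimum and Maximum shown above. The helper `bin_index` is hypothetical, and placing the maximum in the last bin is an assumption borrowed from common histogram conventions, not something the report states.

```python
from datetime import datetime

# Range and bin count as reported for this variable.
minimum = datetime(2020, 6, 17)
maximum = datetime(2020, 10, 26)
bins = 50

span = maximum - minimum   # total range covered by the histogram
bin_width = span / bins    # each bin covers the same amount of time


def bin_index(value):
    """Return the 0-based bin for a timestamp inside [minimum, maximum]."""
    if not minimum <= value <= maximum:
        raise ValueError("value lies outside the histogram range")
    index = int((value - minimum) / bin_width)
    # Assumption: the right edge belongs to the last bin.
    return min(index, bins - 1)


print(span.days, "days in total")
print(bin_width, "per bin")
print(bin_index(datetime(2020, 6, 20)))
```

With 70 distinct dates spread over 50 bins, some bins necessarily hold more than one distinct date.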
보증번호
Text

Distinct: 809
Distinct (%): 81.0%
Missing: 1
Missing (%): 0.1%
Memory size: 7.9 KiB

Length

Max length: 14
Median length: 14
Mean length: 14
Min length: 14

Characters and Unicode

Total characters: 13986
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
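
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `category_counts` is hypothetical, and the two inputs are sample values that appear in this report. The standard library does not expose script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lu', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# Sample values taken from the 보증번호 and 비고 columns of this report.
guarantee_number = category_counts("RTPA2020000504")
remark = category_counts("10/22,581원 수납")

print(dict(guarantee_number))
print(dict(remark))
```

The category codes map to the names used in the report: `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Lo` is Other Letter, `Zs` is Space Separator, and `Po` is Other Punctuation.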

Unique

Unique: 710
Unique (%): 71.1%

Sample

1st row: RTPA2020000504
2nd row: RTPA2020000504
3rd row: RTPA2020000435
4th row: RTPA2020000435
5th row: RTAB2019000098
Value                 Count   Frequency (%)
rtab2019000888        10      1.0%
rtab2019000651        8       0.8%
rqad2019000922        8       0.8%
rtad2020000099        8       0.8%
rtab2019000098        8       0.8%
rtma2020000140        8       0.8%
rtqa2019000485        8       0.8%
rtaa2019000456        8       0.8%
rtad2019001015        8       0.8%
rtma2019000089        8       0.8%
Other values (799)    917     91.8%

Most occurring characters

Value                Count   Frequency (%)
0                    4580    32.7%
2                    1593    11.4%
1                    1136    8.1%
A                    1033    7.4%
R                    1002    7.2%
T                    914     6.5%
P                    706     5.0%
4                    435     3.1%
9                    434     3.1%
3                    385     2.8%
Other values (12)    1768    12.6%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      9989    71.4%
Uppercase Letter    3997    28.6%
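
The sample values suggest a 14-character pattern of four uppercase letters followed by ten digits. That pattern is an inference from the samples, not something the report states. If it held for all 999 non-missing values, the totals would be 3996 letters and 9990 digits; the reported 3997 and 9989 mean the values carry one more letter than the pattern allows, so at least one value deviates. A hedged sketch for locating such values (the function name and the third test value are hypothetical):

```python
import re

# Assumed pattern, inferred from samples such as RTPA2020000504.
ASSUMED_PATTERN = re.compile(r"[A-Z]{4}[0-9]{10}")

# Totals reported above for the 999 non-missing values.
non_missing = 999
reported_letters = 3997
reported_digits = 9989

# Letters and digits beyond what the assumed pattern would produce.
extra_letters = reported_letters - 4 * non_missing
extra_digits = reported_digits - 10 * non_missing


def find_deviations(values):
    """Return the values that do not fully match the assumed pattern."""
    return [v for v in values if ASSUMED_PATTERN.fullmatch(v) is None]


# The first two values appear in the report; the third is made up for the demo.
values = ["RTPA2020000504", "RTAB2019000098", "RTPAA202000050"]

print(extra_letters, extra_digits)
print(find_deviations(values))
```

Running `find_deviations` over the real column would show which guarantee numbers break the assumed layout.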

Most frequent character per category

Uppercase Letter
Value               Count   Frequency (%)
A                   1033    25.8%
R                   1002    25.1%
T                   914     22.9%
P                   706     17.7%
D                   112     2.8%
Q                   94      2.4%
B                   75      1.9%
M                   17      0.4%
H                   17      0.4%
C                   12      0.3%
Other values (2)    15      0.4%

Decimal Number
Value    Count   Frequency (%)
0        4580    45.9%
2        1593    15.9%
1        1136    11.4%
4        435     4.4%
9        434     4.3%
3        385     3.9%
8        379     3.8%
6        365     3.7%
5        358     3.6%
7        324     3.2%

Most occurring scripts

Value     Count   Frequency (%)
Common    9989    71.4%
Latin     3997    28.6%

Most frequent character per script

Latin
Value               Count   Frequency (%)
A                   1033    25.8%
R                   1002    25.1%
T                   914     22.9%
P                   706     17.7%
D                   112     2.8%
Q                   94      2.4%
B                   75      1.9%
M                   17      0.4%
H                   17      0.4%
C                   12      0.3%
Other values (2)    15      0.4%

Common
Value    Count   Frequency (%)
0        4580    45.9%
2        1593    15.9%
1        1136    11.4%
4        435     4.4%
9        434     4.3%
3        385     3.9%
8        379     3.8%
6        365     3.7%
5        358     3.6%
7        324     3.2%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    13986    100.0%

Most frequent character per block

ASCII
Value                Count   Frequency (%)
0                    4580    32.7%
2                    1593    11.4%
1                    1136    8.1%
A                    1033    7.4%
R                    1002    7.2%
T                    914     6.5%
P                    706     5.0%
4                    435     3.1%
9                    434     3.1%
3                    385     2.8%
Other values (12)    1768    12.6%

비고
Text

MISSING 

Distinct: 15
Distinct (%): 88.2%
Missing: 983
Missing (%): 98.3%
Memory size: 7.9 KiB

Length

Max length: 41
Median length: 30
Mean length: 25.882353
Min length: 7

Characters and Unicode

Total characters: 440
Distinct characters: 93
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13
Unique (%): 76.5%

Sample

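1st row: 9/22일자 기산일거래(기표 10/16)에 의한 581원 납부 (payment of 581 won arising from a transaction value-dated 9/22, posted 10/16)
2nd row: 10/22,581원 수납 (10/22, 581 won received)
3rd row: 보증서 발급 오류건으로 수납 확인(9.14) 및 기표 처리(9/23) 완료 (guarantee certificate issuance error case: receipt confirmed (9.14) and posting processed (9/23), completed)
4th row: 9/11일자 은행 실행 오류 발생하여 공사 원장에만 기표된 건 (case posted only to the Corporation's ledger because a bank execution error occurred on 9/11)
5th row: 은행 실행 오류로 인한 9/11일자 기표건 취소 (cancellation of the 9/11 posting caused by a bank execution error)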
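Value                              Count   Frequency (%)
환급 (refund)                       5       5.0%
(token not preserved)              4       4.0%
수납 (receipt)                      3       3.0%
따른 (following)                    3       3.0%
취소 (cancellation)                 3       3.0%
기표건 (posting case)                3       3.0%
보증서 (guarantee certificate)       2       2.0%
9/11일자 (dated 9/11)               2       2.0%
발생(환출 (arose (refund)            2       2.0%
차액 (difference)                   2       2.0%
Other values (55)                  71      71.0%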

Most occurring characters

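Value                                Count   Frequency (%)
(space)                              84      19.1%
1                                    17      3.9%
/                                    17      3.9%
(Hangul character not preserved)     13      3.0%
7                                    11      2.5%
2                                    10      2.3%
(                                    10      2.3%
)                                    10      2.3%
(Hangul character not preserved)     10      2.3%
(Hangul character not preserved)     10      2.3%
Other values (83)                    248     56.4%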

Most occurring categories

Value                Count   Frequency (%)
Other Letter         245     55.7%
Space Separator      84      19.1%
Decimal Number       68      15.5%
Other Punctuation    23      5.2%
Open Punctuation     10      2.3%
Close Punctuation    10      2.3%

Most frequent character per category

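Other Letter
Value                                Count   Frequency (%)
(Hangul character not preserved)     13      5.3%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     7       2.9%
(Hangul character not preserved)     6       2.4%
Other values (67)                    157     64.1%

Decimal Number
Value    Count   Frequency (%)
1        17      25.0%
7        11      16.2%
2        10      14.7%
8        8       11.8%
9        7       10.3%
5        5       7.4%
0        3       4.4%
3        3       4.4%
4        2       2.9%
6        2       2.9%

Other Punctuation
Value    Count   Frequency (%)
/        17      73.9%
.        5       21.7%
,        1       4.3%

Space Separator
Value      Count   Frequency (%)
(space)    84      100.0%

Open Punctuation
Value    Count   Frequency (%)
(        10      100.0%

Close Punctuation
Value    Count   Frequency (%)
)        10      100.0%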

Most occurring scripts

Value     Count   Frequency (%)
Hangul    245     55.7%
Common    195     44.3%

Most frequent character per script

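Hangul
Value                                Count   Frequency (%)
(Hangul character not preserved)     13      5.3%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     7       2.9%
(Hangul character not preserved)     6       2.4%
Other values (67)                    157     64.1%

Common
Value               Count   Frequency (%)
(space)             84      43.1%
1                   17      8.7%
/                   17      8.7%
7                   11      5.6%
2                   10      5.1%
(                   10      5.1%
)                   10      5.1%
8                   8       4.1%
9                   7       3.6%
5                   5       2.6%
Other values (6)    16      8.2%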

Most occurring blocks

Value     Count   Frequency (%)
Hangul    245     55.7%
ASCII     195     44.3%

Most frequent character per block

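ASCII
Value               Count   Frequency (%)
(space)             84      43.1%
1                   17      8.7%
/                   17      8.7%
7                   11      5.6%
2                   10      5.1%
(                   10      5.1%
)                   10      5.1%
8                   8       4.1%
9                   7       3.6%
5                   5       2.6%
Other values (6)    16      8.2%

Hangul
Value                                Count   Frequency (%)
(Hangul character not preserved)     13      5.3%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     10      4.1%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     9       3.7%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     8       3.3%
(Hangul character not preserved)     7       2.9%
(Hangul character not preserved)     6       2.4%
Other values (67)                    157     64.1%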

Correlations

                                보증해지기표일자(구)거래일자    비고
보증해지기표일자(구)거래일자       1.000                         0.842
비고                             0.842                         1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
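
One common way to quantify nullity correlation is the Pearson correlation between per-column "value is present" indicators. The sketch below assumes that definition; the report does not state the exact formula it uses, and the function name and toy columns are made up for illustration.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the correlation is undefined without variation.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Toy columns (not from the dataset); None marks a missing value.
always_together = nullity_correlation(["x", None, "y", None], [1, None, 2, None])
never_together = nullity_correlation(["x", None, "y", None], [None, 1, None, 2])
no_variation = nullity_correlation(["x", "y", "z", "w"], [1, None, 2, None])

print(always_together, never_together, no_variation)
```

Under this definition a column with no missing values, such as 보증해지기표일자(구)거래일자 here, has no defined nullity correlation with any other column.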

Sample

First rows

Index   보증해지기표일자(구)거래일자   보증번호          비고
0       2020-10-26                    RTPA2020000504    <NA>
1       2020-10-26                    RTPA2020000504    <NA>
2       2020-10-26                    RTPA2020000435    <NA>
3       2020-10-26                    RTPA2020000435    <NA>
4       2020-10-26                    RTAB2019000098    <NA>
5       2020-10-26                    RTAB2019000098    <NA>
6       2020-10-23                    RTPA2020000468    <NA>
7       2020-10-23                    RTPA2020000468    <NA>
8       2020-10-23                    RTQA2019000485    <NA>
9       2020-10-23                    RTQA2019000485    <NA>

Last rows

Index   보증해지기표일자(구)거래일자   보증번호          비고
990     2020-06-18                    RTAC2016000945    <NA>
991     2020-06-18                    RTAC2016000145    <NA>
992     2020-06-18                    RTAC2015000127    <NA>
993     2020-06-18                    RTAC2013000153    <NA>
994     2020-06-18                    RTAC2013000141    <NA>
995     2020-06-18                    RTAB2020000068    <NA>
996     2020-06-18                    RTAB2018000895    <NA>
997     2020-06-18                    RTAB2018000018    <NA>
998     2020-06-18                    RTAB2017000927    <NA>
999     2020-06-18                    RTAB2017000902    <NA>

Duplicate rows

Most frequently occurring

Index   보증해지기표일자(구)거래일자   보증번호          비고    # duplicates
22      2020-07-16                    RTPA2020000363    <NA>    3
39      2020-07-29                    RTPA2020000369    <NA>    3
42      2020-07-30                    RTPA2020000322    <NA>    3
116     2020-09-29                    RTPA2020000475    <NA>    3
0       2020-06-25                    RTAB2019000888    <NA>    2
1       2020-06-29                    RTAA2019000350    <NA>    2
2       2020-06-30                    RTAB2019000651    <NA>    2
3       2020-06-30                    RTPA2020000317    <NA>    2
4       2020-06-30                    RTPA2020000319    <NA>    2
5       2020-07-01                    RTPA2020000186    <NA>    2
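
A table like this can be approximated by counting identical rows. The sketch below uses the first six rows from the Sample section and assumes that "# duplicates" means the number of times a repeated row occurs; the report does not spell out that definition, and the variable names are illustrative.

```python
from collections import Counter

# First six sample rows from this report; None stands for <NA>.
rows = [
    ("2020-10-26", "RTPA2020000504", None),
    ("2020-10-26", "RTPA2020000504", None),
    ("2020-10-26", "RTPA2020000435", None),
    ("2020-10-26", "RTPA2020000435", None),
    ("2020-10-26", "RTAB2019000098", None),
    ("2020-10-26", "RTAB2019000098", None),
]

# Count how often each complete row occurs.
counts = Counter(rows)

# Keep only rows seen more than once, most frequent first.
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]

for (date, number, remark), n in duplicates:
    print(date, number, remark, n)
```

Rows are compared across all three columns, so two records with the same 보증번호 but different dates would not count as duplicates.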