Overview

Dataset statistics

Number of variables: 4
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 32.4 KiB
Average record size in memory: 33.1 B

Variable types

Categorical: 3
Text: 1

Dataset

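Description: Information on the mortgage-linked guarantee rate reference (모기지연계보증요율참고) of the Korea Housing Finance Corporation (한국주택금융공사). The data includes the reference date (기준일자), corporation name (법인명), and registrant employee number (등록사번).
Author: 한국주택금융공사 (Korea Housing Finance Corporation)
URL: https://www.data.go.kr/data/15049771/fileData.do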

Alerts

Dataset has 1 (0.1%) duplicate row [Duplicates]
등록사번 is highly overall correlated with 기준일자 and 1 other field [High correlation]
등록일시 is highly overall correlated with 기준일자 and 1 other field [High correlation]
기준일자 is highly overall correlated with 등록사번 and 1 other field [High correlation]
등록사번 is highly imbalanced (75.8%) [Imbalance]
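
The 75.8% imbalance figure can be reproduced from the value counts reported for 등록사번 (960 rows of 1505, 40 rows of 1249). The sketch below assumes the score is one minus the Shannon entropy normalized by the maximum entropy for the number of classes. That definition is an assumption that happens to match the reported value, not a statement of the tool's internals, and `imbalance_score` is a hypothetical helper name.

```python
import math
from collections import Counter


def imbalance_score(values):
    """Return 1 - H / log2(k), where H is the Shannon entropy and k the class count."""
    counts = Counter(values)
    n_classes = len(counts)
    if n_classes < 2:
        # Assumption: a single-class column is reported as not imbalanced.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / math.log2(n_classes)


# Value counts taken from the 등록사번 section of this report.
values = ["1505"] * 960 + ["1249"] * 40
score = imbalance_score(values)
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as one class dominates.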

Reproduction

Analysis started: 2023-12-12 06:49:10.707362
Analysis finished: 2023-12-12 06:49:11.136061
Duration: 0.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

기준일자
Categorical

HIGH CORRELATION 

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value       Count
2020-08-01  320
2019-08-01  320
2018-08-01  320
2017-08-01  40

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-08-01
2nd row: 2020-08-01
3rd row: 2020-08-01
4th row: 2020-08-01
5th row: 2020-08-01

Common Values

Value       Count  Frequency (%)
2020-08-01  320    32.0%
2019-08-01  320    32.0%
2018-08-01  320    32.0%
2017-08-01  40     4.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values for 기준일자]

법인명
Text

Distinct: 395
Distinct (%): 39.5%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Length

Max length: 16
Median length: 13
Mean length: 7.835
Min length: 4

Characters and Unicode

Total characters: 7835
Distinct characters: 211
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
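
As an illustration of these character properties, the following sketch counts Unicode general categories for one value from the 법인명 sample using Python's standard `unicodedata` module. The helper name `category_counts` and the abbreviation-to-label mapping are introduced here for illustration. The report's own counting may differ in details.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in this report.
CATEGORY_LABELS = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}


def category_counts(text):
    """Count characters in `text` by Unicode general category label."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_LABELS.get(code, code)] += 1
    return counts


sample = "동신건설(주)"  # first sample row of 법인명
counts = category_counts(sample)
for label, count in counts.most_common():
    print(label, count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this column.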

Unique

Unique: 75
Unique (%): 7.5%
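
The report lists both a "Distinct" and a "Unique" count. A minimal sketch of the difference follows, assuming "Distinct" means the number of different values and "Unique" means the number of values that occur exactly once. This reading is consistent with 기준일자 having 4 distinct values and 0 unique ones. The company names in the example are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical values: "A건설" repeats, the other two appear once.
names = ["A건설", "B건설", "A건설", "C건설"]
distinct, unique = distinct_and_unique(names)
print(distinct, unique)
```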

Sample

1st row: 동신건설(주)
2nd row: 신해공영(주)
3rd row: 래미안건설(주)
4th row: (주)대양산업건설
5th row: 동호건설(주)

Value               Count  Frequency (%)
주식회사            16     1.5%
개발공사            15     1.4%
도시개발공사        12     1.1%
지방공사            6      0.6%
유)한백종합건설     4      0.4%
주)신               4      0.4%
주)대창건설         4      0.4%
피엔지건설(주       4      0.4%
동성건설(주         4      0.4%
정우개발(주         4      0.4%
Other values (402)  1010   93.3%

Most occurring characters

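Value               Count  Frequency (%)
[not preserved]     968    12.4%
(                   918    11.7%
)                   918    11.7%
[not preserved]     632    8.1%
[not preserved]     561    7.2%
[not preserved]     140    1.8%
[not preserved]     140    1.8%
[not preserved]     140    1.8%
[not preserved]     111    1.4%
[not preserved]     111    1.4%
Other values (201)  3196   40.8%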

Most occurring categories

Value              Count  Frequency (%)
Other Letter       5853   74.7%
Open Punctuation   918    11.7%
Close Punctuation  918    11.7%
Space Separator    140    1.8%
Uppercase Letter   6      0.1%

Most frequent character per category

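Other Letter
Value               Count  Frequency (%)
[not preserved]     968    16.5%
[not preserved]     632    10.8%
[not preserved]     561    9.6%
[not preserved]     140    2.4%
[not preserved]     140    2.4%
[not preserved]     111    1.9%
[not preserved]     111    1.9%
[not preserved]     105    1.8%
[not preserved]     103    1.8%
[not preserved]     102    1.7%
Other values (196)  2880   49.2%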
Uppercase Letter
Value  Count  Frequency (%)
H      3      50.0%
S      3      50.0%
Open Punctuation
Value  Count  Frequency (%)
(      918    100.0%
Close Punctuation
Value  Count  Frequency (%)
)      918    100.0%
Space Separator
Value    Count  Frequency (%)
(space)  140    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  5853   74.7%
Common  1976   25.2%
Latin   6      0.1%

Most frequent character per script

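Hangul
Value               Count  Frequency (%)
[not preserved]     968    16.5%
[not preserved]     632    10.8%
[not preserved]     561    9.6%
[not preserved]     140    2.4%
[not preserved]     140    2.4%
[not preserved]     111    1.9%
[not preserved]     111    1.9%
[not preserved]     105    1.8%
[not preserved]     103    1.8%
[not preserved]     102    1.7%
Other values (196)  2880   49.2%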
Common
Value    Count  Frequency (%)
(        918    46.5%
)        918    46.5%
(space)  140    7.1%
Latin
Value  Count  Frequency (%)
H      3      50.0%
S      3      50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  5853   74.7%
ASCII   1982   25.3%

Most frequent character per block

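Hangul
Value               Count  Frequency (%)
[not preserved]     968    16.5%
[not preserved]     632    10.8%
[not preserved]     561    9.6%
[not preserved]     140    2.4%
[not preserved]     140    2.4%
[not preserved]     111    1.9%
[not preserved]     111    1.9%
[not preserved]     105    1.8%
[not preserved]     103    1.8%
[not preserved]     102    1.7%
Other values (196)  2880   49.2%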
ASCII
Value    Count  Frequency (%)
(        918    46.3%
)        918    46.3%
(space)  140    7.1%
H        3      0.2%
S        3      0.2%

등록사번
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value  Count
1505   960
1249   40

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1505
2nd row: 1505
3rd row: 1505
4th row: 1505
5th row: 1505

Common Values

Value  Count  Frequency (%)
1505   960    96.0%
1249   40     4.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values for 등록사번]

등록일시
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value             Count
2020-07-31 17:04  320
2019-08-01 10:17  320
2018-07-31 17:47  200
2018-07-31 17:49  100
2017-08-07 13:19  40

Length

Max length: 16
Median length: 16
Mean length: 16
Min length: 16

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-07-31 17:04
2nd row: 2020-07-31 17:04
3rd row: 2020-07-31 17:04
4th row: 2020-07-31 17:04
5th row: 2020-07-31 17:04

Common Values

Value             Count  Frequency (%)
2020-07-31 17:04  320    32.0%
2019-08-01 10:17  320    32.0%
2018-07-31 17:47  200    20.0%
2018-07-31 17:49  100    10.0%
2017-08-07 13:19  40     4.0%
2018-07-31 17:50  20     2.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar chart of the common values for 등록일시]
Value       Count  Frequency (%)
2020-07-31  320    16.0%
17:04       320    16.0%
2019-08-01  320    16.0%
10:17       320    16.0%
2018-07-31  320    16.0%
17:47       200    10.0%
17:49       100    5.0%
2017-08-07  40     2.0%
13:19       40     2.0%
17:50       20     1.0%

Correlations

          기준일자  등록사번  등록일시
기준일자  1.000     1.000     1.000
등록사번  1.000     1.000     1.000
등록일시  1.000     1.000     1.000

          등록사번  등록일시  기준일자
등록사번  1.000     0.998     0.999
등록일시  0.998     1.000     0.999
기준일자  0.999     0.999     1.000

          기준일자  등록사번  등록일시
기준일자  1.000     0.999     0.999
등록사번  0.999     1.000     0.998
등록일시  0.999     0.998     1.000
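
All three correlated columns are categorical, so an association measure such as Cramér's V is one plausible way to arrive at values near 1. The sketch below is an uncorrected Cramér's V in plain Python. It is not claimed to be the exact statistic or bias correction behind the matrices above. The value counts come from this report, while the pairing of 2017-08-01 with 1249 (and every other date with 1505) is an assumption based on the sample rows.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of categories."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


# Assumed pairing: rows dated 2017-08-01 carry 1249, all other rows carry 1505.
dates = ["2020-08-01"] * 320 + ["2019-08-01"] * 320 + ["2018-08-01"] * 320 + ["2017-08-01"] * 40
ids = ["1505"] * 960 + ["1249"] * 40
v = cramers_v(dates, ids)
print(round(v, 3))
```

Under that assumed pairing, 등록사번 is fully determined by 기준일자, which is the situation in which this measure reaches its maximum.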

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

     기준일자    법인명              등록사번  등록일시
0    2020-08-01  동신건설(주)        1505      2020-07-31 17:04
1    2020-08-01  신해공영(주)        1505      2020-07-31 17:04
2    2020-08-01  래미안건설(주)      1505      2020-07-31 17:04
3    2020-08-01  (주)대양산업건설    1505      2020-07-31 17:04
4    2020-08-01  동호건설(주)        1505      2020-07-31 17:04
5    2020-08-01  은일종합건설(주)    1505      2020-07-31 17:04
6    2020-08-01  (주)광양종합건설    1505      2020-07-31 17:04
7    2020-08-01  (주)두손건설        1505      2020-07-31 17:04
8    2020-08-01  서림종합건설(주)    1505      2020-07-31 17:04
9    2020-08-01  화성종합건설(주)    1505      2020-07-31 17:04

     기준일자    법인명              등록사번  등록일시
990  2017-08-01  고운시티아이(주)    1249      2017-08-07 13:19
991  2017-08-01  (주)삼희종합건설    1249      2017-08-07 13:19
992  2017-08-01  (주)문영엔지니어링  1249      2017-08-07 13:19
993  2017-08-01  정상종합건설(주)    1249      2017-08-07 13:19
994  2017-08-01  (유)한백종합건설    1249      2017-08-07 13:19
995  2017-08-01  (주)신화종합건설    1249      2017-08-07 13:19
996  2017-08-01  (주)송학건설        1249      2017-08-07 13:19
997  2017-08-01  (주)대창건설        1249      2017-08-07 13:19
998  2017-08-01  신안종합건설        1249      2017-08-07 13:19
999  2017-08-01  (주)대건            1249      2017-08-07 13:19

Duplicate rows

Most frequently occurring

   기준일자    법인명        등록사번  등록일시          # duplicates
0  2020-08-01  우경건설(주)  1505      2020-07-31 17:04  2
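
The duplicate figures can be read as follows: the row above occurs twice, so one of its copies is surplus, which corresponds to the "1 (0.1%)" duplicate row in the overview. A minimal sketch of that counting follows, assuming a duplicate is an exact match on all four columns. `duplicate_summary` is a hypothetical helper, and the three-row input is a toy excerpt rather than the full dataset.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (number of surplus copies, {row: occurrences} for repeated rows)."""
    counts = Counter(rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    surplus = sum(c - 1 for c in repeated.values())
    return surplus, repeated


# Toy excerpt: the duplicated row from the table above plus one sample row.
rows = [
    ("2020-08-01", "우경건설(주)", "1505", "2020-07-31 17:04"),
    ("2020-08-01", "동신건설(주)", "1505", "2020-07-31 17:04"),
    ("2020-08-01", "우경건설(주)", "1505", "2020-07-31 17:04"),
]
surplus, repeated = duplicate_summary(rows)
print(surplus, repeated)
```

Dividing the surplus count by the number of observations gives the percentage; with 1 surplus copy in 1000 rows that is 0.1%.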