Overview

Dataset statistics

Number of variables: 9
Number of observations: 501
Missing cells: 1284
Missing cells (%): 28.5%
Duplicate rows: 1
Duplicate rows (%): 0.2%
Total size in memory: 36.8 KiB
Average record size in memory: 75.3 B
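The missing-cell percentage follows from the counts above. The short check below is a sketch that only uses figures printed in this report; the variable names are illustrative.

```python
# Figures copied from the dataset statistics in this report.
n_variables = 9
n_observations = 501
missing_cells = 1284

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# The alerts list three columns with 428 missing values each.
missing_per_column = 428
columns_with_missing = 3

print(total_cells)
print(round(missing_pct, 1))
print(missing_per_column * columns_with_missing == missing_cells)
```

The three numeric or text columns with 428 missing values account for all 1284 missing cells; the categorical columns report no missing values because their empty entries appear as the category `<NA>`.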

Variable types

Numeric: 2
Categorical: 6
Text: 1

Dataset

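Description: Vacant house status for Guro-gu, Seoul. Provides the administrative dong (행정동), detailed address (상세주소), use (용도), area in ㎡ (면적), owner (소유자), grade (등급), and vacancy determination date (빈집판정일).
Author: Guro-gu, Seoul Metropolitan City (서울특별시 구로구)
URL: https://www.data.go.kr/data/15127329/fileData.do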

Alerts

Dataset has 1 (0.2%) duplicate rows [Duplicates]
용도 is highly overall correlated with 데이터기준일자 [High correlation]
행정동 is highly overall correlated with 빈집판정일 and 1 other field [High correlation]
소유자 is highly overall correlated with 연번 and 1 other field [High correlation]
데이터기준일자 is highly overall correlated with 연번 and 6 other fields [High correlation]
등급 is highly overall correlated with 연번 and 1 other field [High correlation]
빈집판정일 is highly overall correlated with 행정동 and 1 other field [High correlation]
연번 is highly overall correlated with 소유자 and 2 other fields [High correlation]
면적 is highly overall correlated with 데이터기준일자 [High correlation]
행정동 is highly imbalanced (70.0%) [Imbalance]
용도 is highly imbalanced (66.2%) [Imbalance]
소유자 is highly imbalanced (63.8%) [Imbalance]
등급 is highly imbalanced (62.0%) [Imbalance]
빈집판정일 is highly imbalanced (72.6%) [Imbalance]
연번 has 428 (85.4%) missing values [Missing]
상세주소 has 428 (85.4%) missing values [Missing]
면적 has 428 (85.4%) missing values [Missing]
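The imbalance percentages in the alerts appear consistent with a normalised-entropy score, 1 - H / log(k), where H is the Shannon entropy of the category counts and k is the number of categories. This is an assumption about how the figure is derived, checked here only against the category counts reported for 용도 and 소유자 in this report. The function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k) for a list of category counts.

    H is the Shannon entropy of the empirical distribution and k is the
    number of categories. 0 means perfectly balanced, 1 means one category
    holds every row. Assumed formula, not taken from the report itself.
    """
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Category counts as listed in this report (including the "<NA>" category).
use_counts = [428, 52, 12, 5, 4]    # 용도
owner_counts = [428, 56, 16, 1]     # 소유자

use_score = imbalance_score(use_counts)
owner_score = imbalance_score(owner_counts)
print(f"{use_score:.1%}", f"{owner_score:.1%}")
```

Under this assumption the `<NA>` category, which holds 428 of 501 rows, is what drives every imbalance alert.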

Reproduction

Analysis started: 2024-04-06 08:21:27.198847
Analysis finished: 2024-04-06 08:21:30.482278
Duration: 3.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

연번
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 73
Distinct (%): 100.0%
Missing: 428
Missing (%): 85.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 37
Minimum: 1
Maximum: 73
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4.6
Q1: 19
Median: 37
Q3: 55
95-th percentile: 69.4
Maximum: 73
Range: 72
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 21.217131
Coefficient of variation (CV): 0.57343598
Kurtosis: -1.2
Mean: 37
Median Absolute Deviation (MAD): 18
Skewness: 0
Sum: 2701
Variance: 450.16667
Monotonicity: Strictly increasing
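Because 연번 is a strictly increasing serial number with 73 distinct values between 1 and 73, its statistics can be reproduced from the integers 1 to 73. The sketch below assumes sample variance (n - 1 denominator) and linear interpolation for percentiles, which is what the reported figures match; the `percentile` helper is illustrative.

```python
import math
import statistics

values = [float(v) for v in range(1, 74)]  # 1, 2, ..., 73

total = sum(values)
mean = statistics.mean(values)
variance = statistics.variance(values)     # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                            # coefficient of variation
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])

def percentile(sorted_vals, q):
    """Percentile with linear interpolation between neighbouring ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

p05 = percentile(values, 0.05)
q1 = percentile(values, 0.25)
q3 = percentile(values, 0.75)
p95 = percentile(values, 0.95)

print(total, mean, round(variance, 5), round(std, 6), round(cv, 8))
print(mad, round(p05, 1), q1, q3, round(p95, 1))
```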
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
56                 1      0.2%
54                 1      0.2%
53                 1      0.2%
52                 1      0.2%
51                 1      0.2%
50                 1      0.2%
49                 1      0.2%
48                 1      0.2%
47                 1      0.2%
46                 1      0.2%
Other values (63)  63     12.6%
(Missing)          428    85.4%

Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%

Value  Count  Frequency (%)
73     1      0.2%
72     1      0.2%
71     1      0.2%
70     1      0.2%
69     1      0.2%
68     1      0.2%
67     1      0.2%
66     1      0.2%
65     1      0.2%
64     1      0.2%

행정동
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 11
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value             Count
<NA>              428
고척동            17
구로동            12
오류동            10
항동              8
Other values (6)  26

Length

Max length: 4
Median length: 4
Mean length: 3.8403194
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 고척동
2nd row: 오류동
3rd row: 고척동
4th row: 고척동
5th row: 고척동

Common Values

Value     Count  Frequency (%)
<NA>      428    85.4%
고척동    17     3.4%
구로동    12     2.4%
오류동    10     2.0%
항동      8      1.6%
개봉동    7      1.4%
가리봉동  7      1.4%
궁동      7      1.4%
온수동    2      0.4%
천왕동    2      0.4%

Length

Histogram of lengths of the category

Value     Count  Frequency (%)
na        428    85.4%
고척동    17     3.4%
구로동    12     2.4%
오류동    10     2.0%
항동      8      1.6%
개봉동    7      1.4%
가리봉동  7      1.4%
궁동      7      1.4%
온수동    2      0.4%
천왕동    2      0.4%

상세주소
Text

MISSING 

Distinct: 66
Distinct (%): 90.4%
Missing: 428
Missing (%): 85.4%
Memory size: 4.0 KiB

Length

Max length: 18
Median length: 16
Mean length: 9.8767123
Min length: 7

Characters and Unicode

Total characters: 721
Distinct characters: 39
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
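As a small illustration of these character properties, Python's standard `unicodedata` module returns the general category of each code point. The sketch below classifies the first sample address from this report; the mapping from two-letter codes to the long names used in the tables is written out by hand and covers only the categories that occur in this string.

```python
import unicodedata
from collections import Counter

address = "고척동 52-68"  # first sample row of 상세주소

# Long names as they appear in the report's category tables.
LONG_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
}

category_counts = Counter(unicodedata.category(ch) for ch in address)

for code, count in category_counts.most_common():
    print(LONG_NAMES.get(code, code), count)
```

Scripts and blocks are not exposed by `unicodedata`, so reproducing those tables would need another source of Unicode data.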

Unique

Unique: 65
Unique (%): 89.0%

Sample

1st row: 고척동 52-68
2nd row: 오류동 279-1
3rd row: 고척동 98-30
4th row: 고척동 145-21
5th row: 고척동 98-31

Value              Count  Frequency (%)
고척동             17     11.1%
구로동             12     7.8%
오류동             10     6.5%
16-13              8      5.2%
항동               8      5.2%
개봉동             7      4.6%
궁동               7      4.6%
가리봉동           7      4.6%
429-71             2      1.3%
202                2      1.3%
Other values (71)  73     47.7%

Most occurring characters

ValueCountFrequency (%)
1 85
 
11.8%
83
 
11.5%
73
 
10.1%
- 72
 
10.0%
2 48
 
6.7%
4 34
 
4.7%
3 32
 
4.4%
0 28
 
3.9%
9 27
 
3.7%
7 23
 
3.2%
Other values (29) 216
30.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 335
46.5%
Other Letter 216
30.0%
Space Separator 84
 
11.7%
Dash Punctuation 72
 
10.0%
Open Punctuation 6
 
0.8%
Close Punctuation 6
 
0.8%
Uppercase Letter 2
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
73
33.8%
17
 
7.9%
17
 
7.9%
14
 
6.5%
12
 
5.6%
12
 
5.6%
10
 
4.6%
10
 
4.6%
8
 
3.7%
8
 
3.7%
Other values (13) 35
16.2%
Decimal Number
ValueCountFrequency (%)
1 85
25.4%
2 48
14.3%
4 34
 
10.1%
3 32
 
9.6%
0 28
 
8.4%
9 27
 
8.1%
7 23
 
6.9%
5 23
 
6.9%
6 18
 
5.4%
8 17
 
5.1%
Space Separator
ValueCountFrequency (%)
83
98.8%
  1
 
1.2%
Dash Punctuation
ValueCountFrequency (%)
- 72
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6
100.0%
Uppercase Letter
ValueCountFrequency (%)
B 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 503
69.8%
Hangul 216
30.0%
Latin 2
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
73
33.8%
17
 
7.9%
17
 
7.9%
14
 
6.5%
12
 
5.6%
12
 
5.6%
10
 
4.6%
10
 
4.6%
8
 
3.7%
8
 
3.7%
Other values (13) 35
16.2%
Common
ValueCountFrequency (%)
1 85
16.9%
83
16.5%
- 72
14.3%
2 48
9.5%
4 34
 
6.8%
3 32
 
6.4%
0 28
 
5.6%
9 27
 
5.4%
7 23
 
4.6%
5 23
 
4.6%
Other values (5) 48
9.5%
Latin
ValueCountFrequency (%)
B 2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 504
69.9%
Hangul 216
30.0%
None 1
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 85
16.9%
83
16.5%
- 72
14.3%
2 48
9.5%
4 34
 
6.7%
3 32
 
6.3%
0 28
 
5.6%
9 27
 
5.4%
7 23
 
4.6%
5 23
 
4.6%
Other values (5) 49
9.7%
Hangul
ValueCountFrequency (%)
73
33.8%
17
 
7.9%
17
 
7.9%
14
 
6.5%
12
 
5.6%
12
 
5.6%
10
 
4.6%
10
 
4.6%
8
 
3.7%
8
 
3.7%
Other values (13) 35
16.2%
None
ValueCountFrequency (%)
  1
100.0%

용도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value   Count
<NA>    428
단독    52
다세대  12
다가구  5
연립    4

Length

Max length: 4
Median length: 4
Mean length: 3.742515
Min length: 2
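The length statistics make sense only if the placeholder `<NA>` is counted as a four-character string. The sketch below recomputes the mean length of 용도 from the category counts in this report under that assumption; the NFC normalisation is a precaution so that each Hangul syllable counts as one character.

```python
import unicodedata

# Category counts for 용도 as listed in this report.
value_counts = {"<NA>": 428, "단독": 52, "다세대": 12, "다가구": 5, "연립": 4}

def text_length(value):
    """Length in code points after NFC normalisation."""
    return len(unicodedata.normalize("NFC", value))

total_rows = sum(value_counts.values())
mean_length = sum(text_length(v) * c for v, c in value_counts.items()) / total_rows
max_length = max(text_length(v) for v in value_counts)
min_length = min(text_length(v) for v in value_counts)

print(total_rows, round(mean_length, 6), max_length, min_length)
```

The result matches the reported mean, which suggests the empty rows were read as the literal category `<NA>` rather than as true missing values. That would also explain why the categorical columns show 0 missing.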

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 단독
2nd row: 단독
3rd row: 단독
4th row: 단독
5th row: 단독

Common Values

Value   Count  Frequency (%)
<NA>    428    85.4%
단독    52     10.4%
다세대  12     2.4%
다가구  5      1.0%
연립    4      0.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
na      428    85.4%
단독    52     10.4%
다세대  12     2.4%
다가구  5      1.0%
연립    4      0.8%

면적
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 52
Distinct (%): 71.2%
Missing: 428
Missing (%): 85.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 163.39726
Minimum: 30
Maximum: 750
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 30
5-th percentile: 47.8
Q1: 79
Median: 138.8
Q3: 215
95-th percentile: 388.14
Maximum: 750
Range: 720
Interquartile range (IQR): 136

Descriptive statistics

Standard deviation: 120.76064
Coefficient of variation (CV): 0.73906163
Kurtosis: 8.0492005
Mean: 163.39726
Median Absolute Deviation (MAD): 65.8
Skewness: 2.3952954
Sum: 11928
Variance: 14583.133
Monotonicity: Not monotonic
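Several of the 면적 figures are derived from one another, which gives a quick consistency check. The sketch below uses only numbers printed in this report and the standard definitions (mean = sum / count, CV = standard deviation / mean, variance = standard deviation squared); it does not recompute anything from the raw data.

```python
# Figures copied from the 면적 summary in this report.
n_total, n_missing = 501, 428
total_sum = 11928
std = 120.76064
q1, q3 = 79, 215
minimum, maximum = 30, 750

count = n_total - n_missing       # non-missing observations
mean = total_sum / count          # reported: 163.39726
cv = std / mean                   # reported: 0.73906163
variance = std ** 2               # reported: 14583.133
iqr = q3 - q1                     # reported: 136
value_range = maximum - minimum   # reported: 720

print(count, round(mean, 5), round(cv, 8), round(variance, 3), iqr, value_range)
```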
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
215.0              8      1.6%
69.0               3      0.6%
63.0               2      0.4%
218.0              2      0.4%
102.0              2      0.4%
165.0              2      0.4%
93.0               2      0.4%
166.0              2      0.4%
112.0              2      0.4%
33.0               2      0.4%
Other values (42)  46     9.2%
(Missing)          428    85.4%

Value  Count  Frequency (%)
30.0   1      0.2%
33.0   2      0.4%
40.0   1      0.2%
53.0   2      0.4%
56.0   1      0.2%
59.5   1      0.2%
60.0   1      0.2%
63.0   2      0.4%
63.1   1      0.2%
69.0   3      0.6%

Value  Count  Frequency (%)
750.0  1      0.2%
502.0  1      0.2%
499.0  1      0.2%
465.0  1      0.2%
336.9  1      0.2%
331.0  1      0.2%
299.0  1      0.2%
251.0  1      0.2%
245.0  1      0.2%
225.0  1      0.2%

소유자
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value   Count
<NA>    428
개인    56
SH공사  16
서울시  1

Length

Max length: 4
Median length: 4
Mean length: 3.7744511
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: SH공사
2nd row: 개인
3rd row: SH공사
4th row: 개인
5th row: 개인

Common Values

Value   Count  Frequency (%)
<NA>    428    85.4%
개인    56     11.2%
SH공사  16     3.2%
서울시  1      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
na      428    85.4%
개인    56     11.2%
sh공사  16     3.2%
서울시  1      0.2%

등급
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
<NA>   428
3      26
4      18
2      17
1      12

Length

Max length: 4
Median length: 4
Mean length: 3.5628743
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value  Count  Frequency (%)
<NA>   428    85.4%
3      26     5.2%
4      18     3.6%
2      17     3.4%
1      12     2.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
na     428    85.4%
3      26     5.2%
4      18     3.6%
2      17     3.4%
1      12     2.4%

빈집판정일
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 14
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value             Count
<NA>              431
2019-07-23        15
2019-05-02        12
2019-05-26        8
2019-04-25        8
Other values (9)  27

Length

Max length: 10
Median length: 4
Mean length: 4.8383234
Min length: 4

Unique

Unique: 2
Unique (%): 0.4%

Sample

1st row: 2019-05-15
2nd row: 2019-05-24
3rd row: 2019-05-26
4th row: 2019-07-23
5th row: 2019-05-26

Common Values

Value             Count  Frequency (%)
<NA>              431    86.0%
2019-07-23        15     3.0%
2019-05-02        12     2.4%
2019-05-26        8      1.6%
2019-04-25        8      1.6%
2019-04-02        6      1.2%
2019-07-24        6      1.2%
2019-05-15        3      0.6%
2019-02-08        3      0.6%
2019-04-19        3      0.6%
Other values (4)  6      1.2%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
na                431    86.0%
2019-07-23        15     3.0%
2019-05-02        12     2.4%
2019-05-26        8      1.6%
2019-04-25        8      1.6%
2019-04-02        6      1.2%
2019-07-24        6      1.2%
2019-05-15        3      0.6%
2019-02-08        3      0.6%
2019-04-19        3      0.6%
Other values (4)  6      1.2%

데이터기준일자
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value       Count
<NA>        428
2024-03-22  73

Length

Max length: 10
Median length: 4
Mean length: 4.8742515
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024-03-22
2nd row: 2024-03-22
3rd row: 2024-03-22
4th row: 2024-03-22
5th row: 2024-03-22

Common Values

Value       Count  Frequency (%)
<NA>        428    85.4%
2024-03-22  73     14.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count  Frequency (%)
na          428    85.4%
2024-03-22  73     14.6%

Interactions

[Four interaction plots; images not preserved.]

Correlations

            연번   행정동  상세주소  용도   면적   소유자  등급   빈집판정일
연번        1.000  0.786   0.961     0.635  0.534  0.727   0.869  0.539
행정동      0.786  1.000   1.000     0.631  0.500  0.256   0.428  0.879
상세주소    0.961  1.000   1.000     1.000  1.000  1.000   1.000  1.000
용도        0.635  0.631   1.000     1.000  0.595  0.325   0.609  0.534
면적        0.534  0.500   1.000     0.595  1.000  0.000   0.390  0.000
소유자      0.727  0.256   1.000     0.325  0.000  1.000   0.107  0.488
등급        0.869  0.428   1.000     0.609  0.390  0.107   1.000  0.000
빈집판정일  0.539  0.879   1.000     0.534  0.000  0.488   0.000  1.000
                용도   행정동  소유자  데이터기준일자  등급   빈집판정일
용도            1.000  0.412   0.311   1.000           0.275  0.313
행정동          0.412  1.000   0.144   1.000           0.254  0.609
소유자          0.311  0.144   1.000   1.000           0.098  0.289
데이터기준일자  1.000  1.000   1.000   1.000           1.000  1.000
등급            0.275  0.254   0.098   1.000           1.000  0.000
빈집판정일      0.313  0.609   0.289   1.000           0.000  1.000
                연번   면적   행정동  용도   소유자  등급   빈집판정일  데이터기준일자
연번            1.000  0.195  0.349   0.415  0.560   0.695  0.246       1.000
면적            0.195  1.000  0.272   0.441  0.000   0.276  0.000       1.000
행정동          0.349  0.272  1.000   0.412  0.144   0.254  0.609       1.000
용도            0.415  0.441  0.412   1.000  0.311   0.275  0.313       1.000
소유자          0.560  0.000  0.144   0.311  1.000   0.098  0.289       1.000
등급            0.695  0.276  0.254   0.275  0.098   1.000  0.000       1.000
빈집판정일      0.246  0.000  0.609   0.313  0.289   0.000  1.000       1.000
데이터기준일자  1.000  1.000  1.000   1.000  1.000   1.000  1.000       1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
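One common way to compute a nullity correlation is to turn each column into a 0/1 "is missing" indicator and take the Pearson correlation between indicators. The sketch below shows the idea on a tiny made-up table; the rows, the column names `a`, `b`, `c`, and the helper functions are all hypothetical and are not taken from the dataset or from the profiling tool.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when a column is never or always missing
    return cov / math.sqrt(vx * vy)

# Hypothetical rows: None marks a missing value.
rows = [
    {"a": 1, "b": "x", "c": None},
    {"a": 2, "b": "y", "c": 3.0},
    {"a": None, "b": None, "c": 1.5},
    {"a": None, "b": None, "c": 2.5},
]

def nullity(column):
    """1 where the column is missing, 0 where it is present."""
    return [1 if row[column] is None else 0 for row in rows]

corr_ab = pearson(nullity("a"), nullity("b"))  # always missing together
corr_ac = pearson(nullity("a"), nullity("c"))  # never missing together

print(round(corr_ab, 3), round(corr_ac, 3))
```

Columns that are always missing in the same rows, as 연번, 상세주소 and 면적 appear to be here with 428 missing values each, would score 1 under this definition.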

Sample

     연번  행정동  상세주소       용도    면적   소유자  등급  빈집판정일  데이터기준일자
0    1     고척동  고척동 52-68   단독    33.0   SH공사  2     2019-05-15  2024-03-22
1    2     오류동  오류동 279-1   단독    166.0  개인    3     2019-05-24  2024-03-22
2    3     고척동  고척동 98-30   단독    73.0   SH공사  4     2019-05-26  2024-03-22
3    4     고척동  고척동 145-21  단독    76.0   개인    4     2019-07-23  2024-03-22
4    5     고척동  고척동 98-31   단독    69.0   개인    4     2019-05-26  2024-03-22
5    6     고척동  고척동 52-191  단독    33.0   개인    4     2019-05-26  2024-03-22
6    7     개봉동  개봉동 55-20   다가구  99.0   SH공사  2     2019-07-23  2024-03-22
7    8     고척동  고척동 249-77  단독    205.0  SH공사  2     2019-05-02  2024-03-22
8    9     오류동  오류동 1-34    단독    136.0  SH공사  1     2019-05-02  2024-03-22
9    10    오류동  오류동 19-93   단독    155.0  SH공사  3     2019-05-02  2024-03-22

     연번  행정동  상세주소  용도  면적  소유자  등급  빈집판정일  데이터기준일자
491  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
492  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
493  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
494  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
495  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
496  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
497  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
498  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
499  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>
500  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>

Duplicate rows

Most frequently occurring

   연번  행정동  상세주소  용도  면적  소유자  등급  빈집판정일  데이터기준일자  # duplicates
0  <NA>  <NA>    <NA>      <NA>  <NA>  <NA>    <NA>  <NA>        <NA>            428