Overview

Dataset statistics

Number of variables: 5
Number of observations: 39
Missing cells: 24
Missing cells (%): 12.3%
Duplicate rows: 1
Duplicate rows (%): 2.6%
Total size in memory: 1.7 KiB
Average record size in memory: 45.4 B
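
The two percentage rows follow from the counts in this block. A minimal sketch in plain Python, assuming each percentage is a simple ratio rounded to one decimal place (the variable names are illustrative and not part of the report):

```python
# Counts copied from the dataset statistics above.
n_variables = 5
n_observations = 39
missing_cells = 24
duplicate_rows = 1

# Assumption: each percentage is a plain ratio, rounded to one decimal place.
total_cells = n_variables * n_observations  # every row has one cell per variable
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Under that assumption the sketch gives the same one-decimal figures as the report.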

Variable types

Text: 2
Categorical: 2
Numeric: 1

Dataset

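Description: Construction status data for multi-family houses (다가구주택), multi-unit houses (다세대주택), and lodging facilities (숙박시설) within Dong-gu, Daegu Metropolitan City. It includes fields such as site location (대지위치), use (용도), number of housing units (세대수), and number of households (가구수).
Author: Dong-gu, Daegu Metropolitan City (대구광역시 동구)
URL: https://www.data.go.kr/data/3055355/fileData.do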

Alerts

Dataset has 1 (2.6%) duplicate row | Duplicates
세대수 is highly overall correlated with 주용도 | High correlation
주용도 is highly overall correlated with 가구수 and 1 other field | High correlation
가구수 is highly overall correlated with 주용도 | High correlation
주용도 is highly imbalanced (82.8%) | Imbalance
세대수 is highly imbalanced (63.6%) | Imbalance
부속용도 has 19 (48.7%) missing values | Missing
가구수 has 5 (12.8%) missing values | Missing
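
The imbalance percentages in these alerts are consistent with one minus the normalized Shannon entropy of the class frequencies. The sketch below assumes that definition; `imbalance_score` is an illustrative helper, not a function confirmed by the report, and the counts are the value counts listed for each variable later in this report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k): H is the Shannon entropy (in bits) of the
    class frequencies and k is the number of classes."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is scored as 0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts taken from the per-variable sections of this report.
print(f"주용도: {imbalance_score([38, 1]):.1%}")
print(f"세대수: {imbalance_score([34, 3, 1, 1]):.1%}")
```

A score near 100% means almost all rows fall into a single class, which is the case for 주용도 (38 of 39 rows are 단독주택).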

Reproduction

Analysis started: 2024-03-16 04:13:19.568213
Analysis finished: 2024-03-16 04:13:20.534540
Duration: 0.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

대지위치
Text

Distinct: 38
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 444.0 B

Length

Max length: 33
Median length: 23
Mean length: 20.230769
Min length: 15

Characters and Unicode

Total characters: 789
Distinct characters: 61
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
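
As a rough illustration of these character properties, the sketch below counts general categories for one of the sample addresses using Python's standard `unicodedata` module. It covers categories only, since the standard library does not expose script or block properties. `CATEGORY_NAMES` and `category_counts` are hypothetical names introduced for this example.

```python
import unicodedata
from collections import Counter

# Hypothetical lookup covering only the general categories seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
}

def category_counts(text):
    """Count the characters of `text` by Unicode general category name."""
    codes = (unicodedata.category(ch) for ch in text)
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

sample = "대구광역시 동구 진인동 산 203-7"
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the address field.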

Unique

Unique: 37
Unique (%): 94.9%

Sample

1st row: 대구광역시 동구 상매동 506
2nd row: 대구광역시 동구 율암동 1312
3rd row: 대구광역시 동구 도동 445
4th row: 대구광역시 동구 진인동 산 203-7
5th row: 대구광역시 동구 봉무동 1042

Value | Count | Frequency (%)
대구광역시 | 39 | 21.9%
동구 | 39 | 21.9%
외1필지 | 8 | 4.5%
지묘동 | 6 | 3.4%
중대동 | 5 | 2.8%
숙천동 | 4 | 2.2%
각산동 | 4 | 2.2%
453-11 | 2 | 1.1%
신암동 | 2 | 1.1%
율암동 | 2 | 1.1%
Other values (62) | 67 | 37.6%

Most occurring characters

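Value | Count | Frequency (%)
(space) | 139 | 17.6%
(unreadable) | 85 | 10.8%
(unreadable) | 78 | 9.9%
(unreadable) | 47 | 6.0%
1 | 40 | 5.1%
(unreadable) | 39 | 4.9%
(unreadable) | 39 | 4.9%
(unreadable) | 39 | 4.9%
- | 27 | 3.4%
2 | 24 | 3.0%
Other values (51) | 232 | 29.4%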

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 452 | 57.3%
Decimal Number | 168 | 21.3%
Space Separator | 139 | 17.6%
Dash Punctuation | 27 | 3.4%
Uppercase Letter | 3 | 0.4%

Most frequent character per category

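Other Letter
Value | Count | Frequency (%)
(unreadable) | 85 | 18.8%
(unreadable) | 78 | 17.3%
(unreadable) | 47 | 10.4%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 21 | 4.6%
(unreadable) | 11 | 2.4%
(unreadable) | 11 | 2.4%
(unreadable) | 6 | 1.3%
Other values (38) | 76 | 16.8%

Decimal Number
Value | Count | Frequency (%)
1 | 40 | 23.8%
2 | 24 | 14.3%
3 | 21 | 12.5%
5 | 18 | 10.7%
4 | 16 | 9.5%
7 | 14 | 8.3%
0 | 14 | 8.3%
6 | 10 | 6.0%
8 | 7 | 4.2%
9 | 4 | 2.4%

Space Separator
Value | Count | Frequency (%)
(space) | 139 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 27 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
A | 3 | 100.0%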

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 452 | 57.3%
Common | 334 | 42.3%
Latin | 3 | 0.4%

Most frequent character per script

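Hangul
Value | Count | Frequency (%)
(unreadable) | 85 | 18.8%
(unreadable) | 78 | 17.3%
(unreadable) | 47 | 10.4%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 21 | 4.6%
(unreadable) | 11 | 2.4%
(unreadable) | 11 | 2.4%
(unreadable) | 6 | 1.3%
Other values (38) | 76 | 16.8%

Common
Value | Count | Frequency (%)
(space) | 139 | 41.6%
1 | 40 | 12.0%
- | 27 | 8.1%
2 | 24 | 7.2%
3 | 21 | 6.3%
5 | 18 | 5.4%
4 | 16 | 4.8%
7 | 14 | 4.2%
0 | 14 | 4.2%
6 | 10 | 3.0%
Other values (2) | 11 | 3.3%

Latin
Value | Count | Frequency (%)
A | 3 | 100.0%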

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 452 | 57.3%
ASCII | 337 | 42.7%

Most frequent character per block

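ASCII
Value | Count | Frequency (%)
(space) | 139 | 41.2%
1 | 40 | 11.9%
- | 27 | 8.0%
2 | 24 | 7.1%
3 | 21 | 6.2%
5 | 18 | 5.3%
4 | 16 | 4.7%
7 | 14 | 4.2%
0 | 14 | 4.2%
6 | 10 | 3.0%
Other values (3) | 14 | 4.2%

Hangul
Value | Count | Frequency (%)
(unreadable) | 85 | 18.8%
(unreadable) | 78 | 17.3%
(unreadable) | 47 | 10.4%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 39 | 8.6%
(unreadable) | 21 | 4.6%
(unreadable) | 11 | 2.4%
(unreadable) | 11 | 2.4%
(unreadable) | 6 | 1.3%
Other values (38) | 76 | 16.8%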

주용도
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Memory size: 444.0 B

Value | Count
단독주택 | 38
숙박시설 | 1

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 1
Unique (%): 2.6%

Sample

1st row: 숙박시설
2nd row: 단독주택
3rd row: 단독주택
4th row: 단독주택
5th row: 단독주택

Common Values

Value | Count | Frequency (%)
단독주택 | 38 | 97.4%
숙박시설 | 1 | 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
단독주택 | 38 | 97.4%
숙박시설 | 1 | 2.6%

부속용도
Text

MISSING 

Distinct: 15
Distinct (%): 75.0%
Missing: 19
Missing (%): 48.7%
Memory size: 444.0 B

Length

Max length: 26
Median length: 21
Mean length: 9.8
Min length: 2

Characters and Unicode

Total characters: 196
Distinct characters: 39
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11
Unique (%): 55.0%

Sample

1st row: 관광호텔
2nd row: 다가구주택,제1종근린생활시설(소매점,휴게음식점)
3rd row: 다가구(10가구), 제2종근린생활시설(사무소)
4th row: 사무소
5th row: 1가구

Value | Count | Frequency (%)
다가구주택 | 6 | 22.2%
1가구 | 2 | 7.4%
단독주택 | 2 | 7.4%
제1종근린생활시설 | 2 | 7.4%
(unreadable) | 2 | 7.4%
제1종근린생활시설(소매점 | 2 | 7.4%
근린생활시설 | 2 | 7.4%
관광호텔 | 1 | 3.7%
다가구주택,제1종근린생활시설(소매점,휴게음식점 | 1 | 3.7%
다가구(10가구 | 1 | 3.7%
Other values (6) | 6 | 22.2%

Most occurring characters

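Value | Count | Frequency (%)
(unreadable) | 14 | 7.1%
(unreadable) | 13 | 6.6%
(unreadable) | 12 | 6.1%
(unreadable) | 11 | 5.6%
(unreadable) | 11 | 5.6%
1 | 9 | 4.6%
(unreadable) | 9 | 4.6%
(unreadable) | 9 | 4.6%
(unreadable) | 8 | 4.1%
(unreadable) | 8 | 4.1%
Other values (29) | 92 | 46.9%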

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 162 | 82.7%
Decimal Number | 11 | 5.6%
Space Separator | 7 | 3.6%
Other Punctuation | 6 | 3.1%
Close Punctuation | 5 | 2.6%
Open Punctuation | 5 | 2.6%

Most frequent character per category

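Other Letter
Value | Count | Frequency (%)
(unreadable) | 14 | 8.6%
(unreadable) | 13 | 8.0%
(unreadable) | 12 | 7.4%
(unreadable) | 11 | 6.8%
(unreadable) | 11 | 6.8%
(unreadable) | 9 | 5.6%
(unreadable) | 9 | 5.6%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
Other values (21) | 59 | 36.4%

Decimal Number
Value | Count | Frequency (%)
1 | 9 | 81.8%
2 | 1 | 9.1%
0 | 1 | 9.1%

Other Punctuation
Value | Count | Frequency (%)
, | 5 | 83.3%
/ | 1 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 7 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 5 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 5 | 100.0%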

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 162 | 82.7%
Common | 34 | 17.3%

Most frequent character per script

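Hangul
Value | Count | Frequency (%)
(unreadable) | 14 | 8.6%
(unreadable) | 13 | 8.0%
(unreadable) | 12 | 7.4%
(unreadable) | 11 | 6.8%
(unreadable) | 11 | 6.8%
(unreadable) | 9 | 5.6%
(unreadable) | 9 | 5.6%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
Other values (21) | 59 | 36.4%

Common
Value | Count | Frequency (%)
1 | 9 | 26.5%
(space) | 7 | 20.6%
, | 5 | 14.7%
) | 5 | 14.7%
( | 5 | 14.7%
2 | 1 | 2.9%
0 | 1 | 2.9%
/ | 1 | 2.9%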

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 162 | 82.7%
ASCII | 34 | 17.3%

Most frequent character per block

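Hangul
Value | Count | Frequency (%)
(unreadable) | 14 | 8.6%
(unreadable) | 13 | 8.0%
(unreadable) | 12 | 7.4%
(unreadable) | 11 | 6.8%
(unreadable) | 11 | 6.8%
(unreadable) | 9 | 5.6%
(unreadable) | 9 | 5.6%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
(unreadable) | 8 | 4.9%
Other values (21) | 59 | 36.4%

ASCII
Value | Count | Frequency (%)
1 | 9 | 26.5%
(space) | 7 | 20.6%
, | 5 | 14.7%
) | 5 | 14.7%
( | 5 | 14.7%
2 | 1 | 2.9%
0 | 1 | 2.9%
/ | 1 | 2.9%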

세대수
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 10.3%
Missing: 0
Missing (%): 0.0%
Memory size: 444.0 B

Value | Count
<NA> | 34
1 | 3
0 | 1
13 | 1

Length

Max length: 4
Median length: 4
Mean length: 3.6410256
Min length: 1

Unique

Unique: 2
Unique (%): 5.1%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
<NA> | 34 | 87.2%
1 | 3 | 7.7%
0 | 1 | 2.6%
13 | 1 | 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
na | 34 | 87.2%
1 | 3 | 7.7%
0 | 1 | 2.6%
13 | 1 | 2.6%

가구수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 6
Distinct (%): 17.6%
Missing: 5
Missing (%): 12.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.7058824
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 483.0 B

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 3
95-th percentile: 10
Maximum: 12
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.9800286
Coefficient of variation (CV): 1.1013149
Kurtosis: 3.5582826
Mean: 2.7058824
Median Absolute Deviation (MAD): 0
Skewness: 2.1065062
Sum: 92
Variance: 8.8805704
Monotonicity: Not monotonic
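
Several of these figures can be cross-checked from the value counts listed for 가구수 in this section. The sketch below rebuilds the 34 non-missing values and uses Python's `statistics` module, assuming the report uses the sample (n - 1) variance. Skewness and kurtosis are left out because the standard library has no functions for them.

```python
import statistics

# Non-missing 가구수 values rebuilt from the value/count table in this section.
value_counts = {1: 20, 3: 9, 4: 1, 9: 1, 10: 2, 12: 1}
values = [v for v, n in value_counts.items() for _ in range(n)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)
median = statistics.median(values)
# Median absolute deviation, computed by hand around the median.
mad = statistics.median(abs(v - median) for v in values)

print(f"Sum: {sum(values)}")
print(f"Mean: {mean:.7f}")
print(f"Variance: {variance:.7f}")
print(f"Standard deviation: {std_dev:.7f}")
print(f"Coefficient of variation (CV): {std_dev / mean:.7f}")
print(f"Median: {median}, MAD: {mad}")
```

The MAD is 0 because 20 of the 34 values equal the median, so more than half of the absolute deviations are zero.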
Histogram with fixed size bins (bins=6)

Value | Count | Frequency (%)
1 | 20 | 51.3%
3 | 9 | 23.1%
10 | 2 | 5.1%
12 | 1 | 2.6%
4 | 1 | 2.6%
9 | 1 | 2.6%
(Missing) | 5 | 12.8%

Value | Count | Frequency (%)
1 | 20 | 51.3%
3 | 9 | 23.1%
4 | 1 | 2.6%
9 | 1 | 2.6%
10 | 2 | 5.1%
12 | 1 | 2.6%

Value | Count | Frequency (%)
12 | 1 | 2.6%
10 | 2 | 5.1%
9 | 1 | 2.6%
4 | 1 | 2.6%
3 | 9 | 23.1%
1 | 20 | 51.3%

Interactions


Correlations

Variable | 대지위치 | 주용도 | 부속용도 | 세대수 | 가구수
대지위치 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
주용도 | 1.000 | 1.000 | 1.000 | NaN | NaN
부속용도 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000
세대수 | 1.000 | NaN | 0.000 | 1.000 | NaN
가구수 | 1.000 | NaN | 0.000 | NaN | 1.000

Variable | 세대수 | 주용도
세대수 | 1.000 | 1.000
주용도 | 1.000 | 1.000

Variable | 가구수 | 주용도 | 세대수
가구수 | 1.000 | 1.000 | NaN
주용도 | 1.000 | 1.000 | 1.000
세대수 | NaN | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

Index | 대지위치 | 주용도 | 부속용도 | 세대수 | 가구수
0 | 대구광역시 동구 상매동 506 | 숙박시설 | 관광호텔 | <NA> | <NA>
1 | 대구광역시 동구 율암동 1312 | 단독주택 | 다가구주택,제1종근린생활시설(소매점,휴게음식점) | <NA> | 3
2 | 대구광역시 동구 도동 445 | 단독주택 | <NA> | <NA> | 1
3 | 대구광역시 동구 진인동 산 203-7 | 단독주택 | <NA> | <NA> | 1
4 | 대구광역시 동구 봉무동 1042 | 단독주택 | <NA> | <NA> | 1
5 | 대구광역시 동구 신암동 603-405 | 단독주택 | 다가구(10가구), 제2종근린생활시설(사무소) | <NA> | 10
6 | 대구광역시 동구 각산동 921-11 | 단독주택 | 사무소 | 0 | 1
7 | 대구광역시 동구 신평동 760-38 외1필지 | 단독주택 | <NA> | <NA> | 12
8 | 대구광역시 동구 능성동 160-1 | 단독주택 | <NA> | <NA> | 1
9 | 대구광역시 동구 미곡동 215-16 | 단독주택 | <NA> | <NA> | 1

Index | 대지위치 | 주용도 | 부속용도 | 세대수 | 가구수
29 | 대구광역시 동구 지묘동 대구연경 공공주택지구 A1블록 9로트 | 단독주택 | 다가구주택,일반음식점 | <NA> | 3
30 | 대구광역시 동구 신암동 183-42 | 단독주택 | 1가구 | <NA> | 1
31 | 대구광역시 동구 각산동 1102-7 | 단독주택 | 다가구주택, 제1종근린생활시설(소매점) | <NA> | 3
32 | 대구광역시 동구 숙천동 380-2 | 단독주택 | 다가구및 근린생활시설 | <NA> | 3
33 | 대구광역시 동구 각산동 370-24 | 단독주택 | 다가구주택 | <NA> | 9
34 | 대구광역시 동구 덕곡동 217 외1필지 | 단독주택 | 단독주택 | 13 | <NA>
35 | 대구광역시 동구 중대동 655-2 외1필지 | 단독주택 | <NA> | 1 | <NA>
36 | 대구광역시 동구 숙천동 362-5 | 단독주택 | 단독 | <NA> | 1
37 | 대구광역시 동구 송정동 730 외1필지 | 단독주택 | <NA> | <NA> | 1
38 | 대구광역시 동구 숙천동 376-7 | 단독주택 | 다가주주택/1종근생 | <NA> | 3

Duplicate rows

Most frequently occurring

Index | 대지위치 | 주용도 | 부속용도 | 세대수 | 가구수 | # duplicates
0 | 대구광역시 동구 중대동 453-11 | 단독주택 | <NA> | <NA> | 1 | 2
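
A duplicate count like this can be reproduced by treating each record as a tuple and counting repeats. The sketch below is a standard-library illustration on a hypothetical three-row excerpt, not the method the report itself uses. It assumes missing values are represented as `None` and that the duplicated record has 가구수 = 1.

```python
from collections import Counter

# Columns: 대지위치, 주용도, 부속용도, 세대수, 가구수 (None marks a missing value).
# Hypothetical excerpt: the duplicated record entered twice, plus one other sample row.
rows = [
    ("대구광역시 동구 중대동 453-11", "단독주택", None, None, 1),
    ("대구광역시 동구 도동 445", "단독주택", None, None, 1),
    ("대구광역시 동구 중대동 453-11", "단독주택", None, None, 1),
]

# A row is a duplicate when the full tuple of column values occurs more than once.
row_counts = Counter(rows)
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(row, "occurs", n, "times")
```

Matching on the full tuple means two records that share an address but differ in any other column are not counted as duplicates.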