Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 7
Duplicate rows (%): 0.1%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
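
The two derived figures in this block (duplicate-row percentage and average record size) follow from the raw counts. The sketch below recomputes them from the values reported above; the helper names are hypothetical and are not part of ydata-profiling.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_rows = 10_000
duplicate_rows = 7
total_size_kib = 654.3


def duplicate_pct(duplicates: int, rows: int) -> float:
    """Share of duplicate rows, in percent."""
    return 100.0 * duplicates / rows


def avg_record_bytes(total_kib: float, rows: int) -> float:
    """Average in-memory size of one record, in bytes (1 KiB = 1024 B)."""
    return total_kib * 1024 / rows


print(f"Duplicate rows (%): {duplicate_pct(duplicate_rows, n_rows):.1f}%")
print(f"Average record size: {avg_record_bytes(total_size_kib, n_rows):.1f} B")
```

The unrounded duplicate share is 0.07%, which the report displays as 0.1%.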

Variable types

Categorical: 3
Text: 1
Numeric: 3

Dataset

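Description: Individual housing price data for Dong-gu, Daejeon Metropolitan City (대전광역시 동구), including the lot number, land area, total building floor area, building structure, and housing price.
Author: Dong-gu, Daejeon Metropolitan City (대전광역시 동구)
URL: https://www.data.go.kr/data/15013449/fileData.do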

Alerts

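Dataset has 7 (0.1%) duplicate rows [Duplicates]
건물연면적 (total building floor area) is highly correlated overall with 개별주택가격 (individual housing price) [High correlation]
개별주택가격 (individual housing price) is highly correlated overall with 건물연면적 (total building floor area) [High correlation]
주택소재지 (housing location) is highly correlated overall with 용도지역 (zoning district) [High correlation]
용도지역 (zoning district) is highly correlated overall with 주택소재지 (housing location) [High correlation]
용도지역 (zoning district) is highly imbalanced (67.7%) [Imbalance]
대지면적 (land area) is highly skewed (γ1 = 50.82520388) [Skewed]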
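
The 67.7% imbalance figure can be reproduced from the category counts of 용도지역 listed in its Common Values table. The sketch below assumes the score is one minus the Shannon entropy of the category distribution divided by the logarithm of the number of categories. That definition is an assumption made here for illustration, and the function name is hypothetical, but it does yield the reported value.

```python
import math

# Category counts of 용도지역, copied from its Common Values table.
zoning_counts = {
    "주거지역": 8536,
    "자연환경보전지역": 803,
    "상업지역": 402,
    "개발제한구역": 151,
    "관리지역": 107,
    "녹지지역": 1,
}


def imbalance_score(counts):
    """Assumed definition: 1 - H(p) / log(k), where H is the Shannon entropy
    of the category shares and k is the number of categories."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


score = imbalance_score(list(zoning_counts.values()))
print(f"Imbalance: {score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, and a variable dominated by a single category approaches 1.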

Reproduction

Analysis started: 2023-12-12 11:24:37.033405
Analysis finished: 2023-12-12 11:24:39.822370
Duration: 2.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
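
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the standard library; the timestamp format string is an assumption based on how the values are printed above.

```python
from datetime import datetime

# Assumed layout of the timestamps as printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2023-12-12 11:24:37.033405", FMT)
finished = datetime.strptime("2023-12-12 11:24:39.822370", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```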

Variables

주택소재지 (housing location)
Categorical

HIGH CORRELATION

Distinct: 43
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
가양동: 2074
삼성동: 916
자양동: 884
용운동: 796
대동: 698
Other values (38): 4632

Length

Max length: 3
Median length: 3
Mean length: 2.8733
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가양동
2nd row: 자양동
3rd row: 장척동
4th row: 소제동
5th row: 삼성동

Common Values

Value | Count | Frequency (%)
가양동 | 2074 | 20.7%
삼성동 | 916 | 9.2%
자양동 | 884 | 8.8%
용운동 | 796 | 8.0%
대동 | 698 | 7.0%
용전동 | 672 | 6.7%
성남동 | 625 | 6.2%
판암동 | 424 | 4.2%
홍도동 | 394 | 3.9%
소제동 | 351 | 3.5%
Other values (33) | 2166 | 21.7%

Length

[Figure: histogram of lengths of the category]

지번 (lot number)
Text

Distinct: 8156
Distinct (%): 81.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 7
Median length: 6
Mean length: 5.3209
Min length: 3

Characters and Unicode

Total characters: 53209
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
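
To illustrate how character properties are read from code points, the sketch below classifies the characters of the five sample 지번 values from this section with Python's standard `unicodedata` module. The helper name is hypothetical. The general-category codes `Nd` and `Pd` correspond to the "Decimal Number" and "Dash Punctuation" categories reported for this variable.

```python
import unicodedata
from collections import Counter

# The five sample 지번 values shown in this section of the report.
samples = ["445-27", "188-14", "35-0", "299-151", "314-10"]


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Nd', 'Pd')."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(samples)
total = sum(counts.values())
for category, n in counts.most_common():
    print(category, n, f"{n / total:.1%}")
```

Every value contains exactly one hyphen, which is consistent with the full-column count of 10000 dash characters across 10000 observations.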

Unique

Unique: 6699
Unique (%): 67.0%

Sample

1st row: 445-27
2nd row: 188-14
3rd row: 35-0
4th row: 299-151
5th row: 314-10

Value | Count | Frequency (%)
10-4 | 14 | 0.1%
104-1 | 11 | 0.1%
10-19 | 9 | 0.1%
389-1 | 8 | 0.1%
60-1 | 8 | 0.1%
97-3 | 7 | 0.1%
40-1 | 6 | 0.1%
14-4 | 6 | 0.1%
142-1 | 5 | < 0.1%
299-47 | 5 | < 0.1%
Other values (8146) | 9921 | 99.2%

Most occurring characters

Value | Count | Frequency (%)
- | 10000 | 18.8%
1 | 8124 | 15.3%
2 | 5762 | 10.8%
3 | 5337 | 10.0%
4 | 4424 | 8.3%
5 | 3989 | 7.5%
0 | 3652 | 6.9%
9 | 3056 | 5.7%
7 | 2978 | 5.6%
6 | 2974 | 5.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 43209 | 81.2%
Dash Punctuation | 10000 | 18.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 8124 | 18.8%
2 | 5762 | 13.3%
3 | 5337 | 12.4%
4 | 4424 | 10.2%
5 | 3989 | 9.2%
0 | 3652 | 8.5%
9 | 3056 | 7.1%
7 | 2978 | 6.9%
6 | 2974 | 6.9%
8 | 2913 | 6.7%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 53209 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
- | 10000 | 18.8%
1 | 8124 | 15.3%
2 | 5762 | 10.8%
3 | 5337 | 10.0%
4 | 4424 | 8.3%
5 | 3989 | 7.5%
0 | 3652 | 6.9%
9 | 3056 | 5.7%
7 | 2978 | 5.6%
6 | 2974 | 5.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 53209 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 10000 | 18.8%
1 | 8124 | 15.3%
2 | 5762 | 10.8%
3 | 5337 | 10.0%
4 | 4424 | 8.3%
5 | 3989 | 7.5%
0 | 3652 | 6.9%
9 | 3056 | 5.7%
7 | 2978 | 5.6%
6 | 2974 | 5.6%

대지면적 (land area)
Real number (ℝ)

SKEWED

Distinct: 3029
Distinct (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 792.00183
Minimum: 10.6
Maximum: 1334568
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 10.6
5-th percentile: 88.2
Q1: 147.8
Median: 193.7
Q3: 265.4
95-th percentile: 621.81
Maximum: 1334568
Range: 1334557.4
Interquartile range (IQR): 117.6

Descriptive statistics

Standard deviation: 18341.234
Coefficient of variation (CV): 23.158071
Kurtosis: 3137.6474
Mean: 792.00183
Median Absolute Deviation (MAD): 53.7
Skewness: 50.825204
Sum: 7920018.3
Variance: 3.3640088 × 10^8
Monotonicity: Not monotonic
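
Several of the statistics above are simple functions of others: the coefficient of variation is the standard deviation divided by the mean, the IQR is Q3 minus Q1, the range is the maximum minus the minimum, and the variance is the square of the standard deviation. The sketch below recomputes them from the reported inputs. Small differences in the last digits come from the rounding of those inputs.

```python
import math

# Summary statistics for 대지면적, copied from the report.
mean = 792.00183
std = 18341.234
q1, q3 = 147.8, 265.4
minimum, maximum = 10.6, 1334568.0

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance

print(f"CV       = {cv:.4f}")
print(f"IQR      = {iqr:.1f}")
print(f"Range    = {value_range:.1f}")
print(f"Variance = {variance:.4e}")
```

A mean (792.0) far above the median (193.7), together with a CV above 23, agrees with the skewness alert: a handful of very large lots dominate the moments.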
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
165.0 | 45 | 0.4%
132.0 | 45 | 0.4%
119.0 | 42 | 0.4%
109.0 | 38 | 0.4%
129.0 | 38 | 0.4%
122.0 | 37 | 0.4%
116.0 | 37 | 0.4%
195.0 | 35 | 0.4%
152.0 | 35 | 0.4%
136.0 | 35 | 0.4%
Other values (3019) | 9613 | 96.1%

Extreme values (minimum 10)

Value | Count | Frequency (%)
10.6 | 1 | < 0.1%
13.0 | 1 | < 0.1%
13.2 | 4 | < 0.1%
15.5 | 1 | < 0.1%
16.8 | 1 | < 0.1%
17.5 | 1 | < 0.1%
18.5 | 1 | < 0.1%
19.8 | 1 | < 0.1%
20.0 | 3 | < 0.1%
20.8 | 2 | < 0.1%

Extreme values (maximum 10)

Value | Count | Frequency (%)
1334568.0 | 1 | < 0.1%
600992.0 | 1 | < 0.1%
513119.0 | 3 | < 0.1%
393168.0 | 1 | < 0.1%
339405.0 | 2 | < 0.1%
139185.0 | 1 | < 0.1%
77885.0 | 1 | < 0.1%
73062.0 | 3 | < 0.1%
56459.0 | 1 | < 0.1%
24119.0 | 8 | 0.1%

건물연면적 (total building floor area)
Real number (ℝ)

HIGH CORRELATION

Distinct: 7996
Distinct (%): 80.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 195.46797
Minimum: 5
Maximum: 3193.59
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 5
5-th percentile: 38.7745
Q1: 79.9575
Median: 126.565
Q3: 216.085
95-th percentile: 606.7915
Maximum: 3193.59
Range: 3188.59
Interquartile range (IQR): 136.1275

Descriptive statistics

Standard deviation: 198.28586
Coefficient of variation (CV): 1.0144161
Kurtosis: 17.429879
Mean: 195.46797
Median Absolute Deviation (MAD): 56.885
Skewness: 2.9947942
Sum: 1954679.7
Variance: 39317.282
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
26.45 | 18 | 0.2%
26.4 | 15 | 0.1%
45.61 | 12 | 0.1%
42.98 | 9 | 0.1%
49.59 | 9 | 0.1%
33.06 | 9 | 0.1%
66.12 | 9 | 0.1%
46.28 | 9 | 0.1%
59.5 | 8 | 0.1%
56.2 | 8 | 0.1%
Other values (7986) | 9894 | 98.9%

Extreme values (minimum 10)

Value | Count | Frequency (%)
5.0 | 1 | < 0.1%
5.6 | 1 | < 0.1%
8.4 | 2 | < 0.1%
9.5 | 1 | < 0.1%
9.92 | 1 | < 0.1%
10.45 | 1 | < 0.1%
10.7 | 1 | < 0.1%
11.0 | 2 | < 0.1%
11.01 | 1 | < 0.1%
11.4 | 1 | < 0.1%

Extreme values (maximum 10)

Value | Count | Frequency (%)
3193.59 | 1 | < 0.1%
3149.16 | 1 | < 0.1%
2142.24 | 1 | < 0.1%
1844.53 | 1 | < 0.1%
1621.74 | 1 | < 0.1%
1612.4 | 1 | < 0.1%
1497.84 | 1 | < 0.1%
1478.13 | 1 | < 0.1%
1471.11 | 1 | < 0.1%
1468.23 | 1 | < 0.1%

용도지역 (zoning district)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
주거지역: 8536
자연환경보전지역: 803
상업지역: 402
개발제한구역: 151
관리지역: 107

Length

Max length: 8
Median length: 4
Mean length: 4.3514
Min length: 4

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 주거지역
2nd row: 주거지역
3rd row: 개발제한구역
4th row: 주거지역
5th row: 주거지역

Common Values

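Value | Count | Frequency (%)
주거지역 (residential area) | 8536 | 85.4%
자연환경보전지역 (natural environment conservation area) | 803 | 8.0%
상업지역 (commercial area) | 402 | 4.0%
개발제한구역 (development restriction zone) | 151 | 1.5%
관리지역 (management area) | 107 | 1.1%
녹지지역 (green area) | 1 | < 0.1%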

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]

건물구조 (building structure)
Categorical

Distinct: 20
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
연와: 3900
철근: 2303
벽돌: 1760
블록: 764
(blank): 743
Other values (15): 530

Length

Max length: 3
Median length: 2
Mean length: 1.9264
Min length: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 연와
2nd row: 연와
3rd row: 경철
4th row: 벽돌
5th row: 연와

Common Values

Value | Count | Frequency (%)
연와 | 3900 | 39.0%
철근 | 2303 | 23.0%
벽돌 | 1760 | 17.6%
블록 | 764 | 7.6%
(blank) | 743 | 7.4%
경철 | 234 | 2.3%
석회 | 177 | 1.8%
철골 | 46 | 0.5%
목구 | 31 | 0.3%
조판 | 14 | 0.1%
Other values (10) | 28 | 0.3%

Length

[Figure: histogram of lengths of the category]

개별주택가격 (individual housing price)
Real number (ℝ)

HIGH CORRELATION

Distinct: 1557
Distinct (%): 15.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6621418 × 10^8
Minimum: 992000
Maximum: 1.177 × 10^9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 992000
5-th percentile: 36000000
Q1: 82600000
Median: 1.18 × 10^8
Q3: 1.81 × 10^8
95-th percentile: 4.65 × 10^8
Maximum: 1.177 × 10^9
Range: 1.176008 × 10^9
Interquartile range (IQR): 98400000

Descriptive statistics

Standard deviation: 1.4799819 × 10^8
Coefficient of variation (CV): 0.89040652
Kurtosis: 6.932132
Mean: 1.6621418 × 10^8
Median Absolute Deviation (MAD): 42700000
Skewness: 2.41107
Sum: 1.6621418 × 10^12
Variance: 2.1903463 × 10^16
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
107000000 | 82 | 0.8%
104000000 | 79 | 0.8%
103000000 | 78 | 0.8%
118000000 | 77 | 0.8%
102000000 | 77 | 0.8%
114000000 | 73 | 0.7%
110000000 | 73 | 0.7%
111000000 | 72 | 0.7%
100000000 | 71 | 0.7%
124000000 | 70 | 0.7%
Other values (1547) | 9248 | 92.5%

Extreme values (minimum 10)

Value | Count | Frequency (%)
992000 | 1 | < 0.1%
1000000 | 1 | < 0.1%
1010000 | 1 | < 0.1%
1070000 | 2 | < 0.1%
1200000 | 1 | < 0.1%
1390000 | 1 | < 0.1%
1410000 | 1 | < 0.1%
1520000 | 1 | < 0.1%
1670000 | 1 | < 0.1%
1690000 | 1 | < 0.1%

Extreme values (maximum 10)

Value | Count | Frequency (%)
1177000000 | 1 | < 0.1%
1106000000 | 1 | < 0.1%
1071000000 | 1 | < 0.1%
1070000000 | 1 | < 0.1%
1054000000 | 1 | < 0.1%
1040000000 | 1 | < 0.1%
1028000000 | 1 | < 0.1%
1022000000 | 1 | < 0.1%
1007000000 | 1 | < 0.1%
1006000000 | 1 | < 0.1%

Interactions

[Figures: interaction plots (9 panels)]

Correlations

[Figure: correlation heatmap]
Variable | 주택소재지 | 대지면적 | 건물연면적 | 용도지역 | 건물구조 | 개별주택가격
주택소재지 | 1.000 | 0.309 | 0.305 | 0.923 | 0.580 | 0.401
대지면적 | 0.309 | 1.000 | 0.000 | 0.242 | 0.085 | 0.000
건물연면적 | 0.305 | 0.000 | 1.000 | 0.131 | 0.577 | 0.614
용도지역 | 0.923 | 0.242 | 0.131 | 1.000 | 0.409 | 0.152
건물구조 | 0.580 | 0.085 | 0.577 | 0.409 | 1.000 | 0.663
개별주택가격 | 0.401 | 0.000 | 0.614 | 0.152 | 0.663 | 1.000

[Figure: correlation heatmap]
Variable | 주택소재지 | 용도지역 | 건물구조
주택소재지 | 1.000 | 0.705 | 0.178
용도지역 | 0.705 | 1.000 | 0.203
건물구조 | 0.178 | 0.203 | 1.000

[Figure: correlation heatmap]
Variable | 대지면적 | 건물연면적 | 개별주택가격 | 주택소재지 | 용도지역 | 건물구조
대지면적 | 1.000 | 0.402 | 0.475 | 0.137 | 0.090 | 0.039
건물연면적 | 0.402 | 1.000 | 0.619 | 0.121 | 0.073 | 0.272
개별주택가격 | 0.475 | 0.619 | 1.000 | 0.149 | 0.080 | 0.270
주택소재지 | 0.137 | 0.121 | 0.149 | 1.000 | 0.705 | 0.178
용도지역 | 0.090 | 0.073 | 0.080 | 0.705 | 1.000 | 0.203
건물구조 | 0.039 | 0.272 | 0.270 | 0.178 | 0.203 | 1.000
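
To read a correlation matrix like the first one above, it helps to list the variable pairs from strongest to weakest. The sketch below does this with the standard library only. The values are copied from the first matrix, and the function names are hypothetical. It does not reproduce the report's alert rule, whose threshold is not stated here.

```python
# First correlation matrix of the report (rows and columns in the same order).
names = ["주택소재지", "대지면적", "건물연면적", "용도지역", "건물구조", "개별주택가격"]
matrix = [
    [1.000, 0.309, 0.305, 0.923, 0.580, 0.401],
    [0.309, 1.000, 0.000, 0.242, 0.085, 0.000],
    [0.305, 0.000, 1.000, 0.131, 0.577, 0.614],
    [0.923, 0.242, 0.131, 1.000, 0.409, 0.152],
    [0.580, 0.085, 0.577, 0.409, 1.000, 0.663],
    [0.401, 0.000, 0.614, 0.152, 0.663, 1.000],
]


def is_symmetric(m):
    """True if m[i][j] == m[j][i] for every pair of indices."""
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))


def ranked_pairs(labels, m):
    """Off-diagonal pairs (value, row label, column label), strongest first."""
    pairs = [
        (m[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    ]
    return sorted(pairs, reverse=True)


pairs = ranked_pairs(names, matrix)
for value, a, b in pairs[:3]:
    print(f"{a} / {b}: {value:.3f}")
```

The strongest pair in this matrix is 주택소재지 and 용도지역 (0.923), and the pair 건물연면적 and 개별주택가격 (0.614) is also among the top three; both pairs appear in the Alerts section.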

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

Index | 주택소재지 | 지번, 대지면적, 건물연면적 | 용도지역 | 건물구조 | 개별주택가격
9801 | 가양동 | 445-27198.099.09 | 주거지역 | 연와 | 116000000
5509 | 자양동 | 188-14118.5136.01 | 주거지역 | 연와 | 103000000
17111 | 장척동 | 35-0579.038.24 | 개발제한구역 | 경철 | 41000000
6430 | 소제동 | 299-151228.051.5 | 주거지역 | 벽돌 | 199000000
14141 | 삼성동 | 314-10252.9225.43 | 주거지역 | 연와 | 193000000
15099 | 삼성동 | 396-17153.480.32 | 주거지역 | 연와 | 137000000
17373 | 삼괴동 | 688-0294.081.86 | 자연환경보전지역 | 목구 | 36000000
3233 | 용운동 | 700-0244.0451.49 | 주거지역 | 철근 | 369000000
5674 | 자양동 | 199-6313.7597.45 | 주거지역 | 철근 | 445000000
8638 | 가양동 | 317-21173.7172.23 | 주거지역 | 연와 | 134000000

Index | 주택소재지 | 지번, 대지면적, 건물연면적 | 용도지역 | 건물구조 | 개별주택가격
6630 | 소제동 | 305-20999.058.48 | 주거지역 | 블록 | 74800000
154 | 인동 | 101-14198.3264.27 | 주거지역 | 벽돌 | 153000000
762 | 신흥동 | 12-1165.4143.4 | 주거지역 | 연와 | 110000000
6099 | 신안동 | 254-6102.025.32 | 주거지역 | (blank) | 93600000
13499 | 삼성동 | 100-12218.0119.6 | 주거지역 | 벽돌 | 241000000
7271 | 가양동 | 47-15191.7148.69 | 주거지역 | 연와 | 122000000
1794 | 용운동 | 178-6222.1548.18 | 주거지역 | 철근 | 326000000
13797 | 삼성동 | 282-6106.4136.72 | 상업지역 | 벽돌 | 29200000
3285 | 대동 | 1-45883.0102.6 | 주거지역 | 연와 | 61700000
1612 | 판암동 | 630-6615.047.58 | 자연환경보전지역 | 철골 | 206000000

Duplicate rows

Most frequently occurring

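Index | 주택소재지 | 지번, 대지면적, 건물연면적 | 용도지역 | 건물구조 | 개별주택가격 | # duplicates
1 | 삼성동 | 389-1310.750.0 | 주거지역 | 철근 | 20800000 | 4
2 | 삼성동 | 389-1310.753.6 | 주거지역 | 철근 | 21100000 | 3
6 | 자양동 | 97-3479.122.3 | 주거지역 | 블록 | 37100000 | 3
0 | 대별동 | 284-11884.029.75 | 자연환경보전지역 | (blank) | 42100000 | 2
3 | 신안동 | 245-15109.028.8 | 주거지역 | 블록 | 92800000 | 2
4 | 인동 | 40-1162.666.12 | 상업지역 | (blank) | 29200000 | 2
5 | 자양동 | 281-41218.070.72 | 주거지역 | 벽돌 | 81900000 | 2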