Overview

Dataset statistics

Number of variables: 7
Number of observations: 713
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 40.5 KiB
Average record size in memory: 58.2 B
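
The two memory figures are consistent with each other: the average record size is the total size divided by the number of observations. A quick check, using only the rounded values printed in this block (so the agreement is approximate, and the helper name is illustrative):

```python
# Values copied from the "Dataset statistics" block.
TOTAL_SIZE_KIB = 40.5   # total size in memory, in KiB (already rounded by the report)
N_OBSERVATIONS = 713    # number of rows


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Return the mean number of bytes per row (1 KiB = 1024 bytes)."""
    return total_kib * 1024 / n_rows


avg = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"{avg:.1f} B")  # prints "58.2 B"
```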

Variable types

Categorical: 2
Text: 2
Numeric: 2
Unsupported: 1

Dataset

Description: File download
Author: SH공사
URL: https://data.seoul.go.kr/dataList/OA-12027/F/1/datasetView.do

Alerts

우편번호 is highly overall correlated with 자치구명 (High correlation)
자치구명 is highly overall correlated with 우편번호 (High correlation)
단지명 has unique values (Unique)
입주개시일 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-11 03:56:43.278883
Analysis finished: 2023-12-11 03:56:44.772803
Duration: 1.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

자치구명
Categorical

HIGH CORRELATION 

Distinct: 26
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
은평구 | 61
강서구 | 50
성북구 | 46
서초구 | 43
마포구 | 43
Other values (21) | 470

Length

Max length: 4
Median length: 3
Mean length: 3.1206171
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강남구
2nd row: 강남구
3rd row: 강남구
4th row: 강남구
5th row: 강남구

Common Values

Value | Count | Frequency (%)
은평구 | 61 | 8.6%
강서구 | 50 | 7.0%
성북구 | 46 | 6.5%
서초구 | 43 | 6.0%
마포구 | 43 | 6.0%
강동구 | 42 | 5.9%
동대문구 | 40 | 5.6%
강남구 | 36 | 5.0%
송파구 | 33 | 4.6%
서대문구 | 33 | 4.6%
Other values (16) | 286 | 40.1%
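
The Frequency (%) column is each count divided by the 713 observations, rounded to one decimal place. A minimal sketch that reproduces the first rows from the counts reported for 자치구명 (the function name is illustrative, not part of the report):

```python
N_OBSERVATIONS = 713  # from "Dataset statistics"

# Counts copied from the Common Values table for 자치구명.
counts = {"은평구": 61, "강서구": 50, "성북구": 46, "서초구": 43, "마포구": 43}


def frequency_percent(count: int, total: int) -> float:
    """Share of rows holding a value, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


for district, count in counts.items():
    print(f"{district}\t{count}\t{frequency_percent(count, N_OBSERVATIONS)}%")
```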

Length

[Figure: Histogram of lengths of the category]

단지명
Text

UNIQUE 

Distinct: 713
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 30
Median length: 23
Mean length: 12.757363
Min length: 2

Characters and Unicode

Total characters: 9096
Distinct characters: 377
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
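
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five 단지명 sample values from this report. It is not the profiler's own implementation. The two-letter codes map to the names used in the tables: `Lo` is Other Letter, `Nd` is Decimal Number, `Pd` is Dash Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character of every value."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# The five sample rows shown for 단지명.
sample = ["대치1", "수서1-1단지", "수서6", "신사삼지래미안", "래미안그레이튼"]
counts = category_counts(sample)

for code, n in counts.most_common():
    print(code, n)
```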

Unique

Unique: 713
Unique (%): 100.0%

Sample

1st row: 대치1
2nd row: 수서1-1단지
3rd row: 수서6
4th row: 신사삼지래미안
5th row: 래미안그레이튼

Value | Count | Frequency (%)
역세권 | 9 | 1.0%
두레주택 | 8 | 0.9%
종로구 | 6 | 0.6%
충신동 | 6 | 0.6%
연극인 | 6 | 0.6%
e편한세상 | 5 | 0.5%
마곡지구 | 5 | 0.5%
행복주택(다세대 | 5 | 0.5%
행복주택(빈집정비 | 5 | 0.5%
힐스테이트 | 4 | 0.4%
Other values (846) | 865 | 93.6%

Most occurring characters

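Value | Count | Frequency (%)
) | 492 | 5.4%
( | 492 | 5.4%
1 | 292 | 3.2%
(not preserved) | 272 | 3.0%
2 | 221 | 2.4%
(space) | 211 | 2.3%
- | 191 | 2.1%
(not preserved) | 176 | 1.9%
(not preserved) | 149 | 1.6%
3 | 141 | 1.6%
Other values (367) | 6459 | 71.0%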

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6368 | 70.0%
Decimal Number | 1101 | 12.1%
Close Punctuation | 492 | 5.4%
Open Punctuation | 492 | 5.4%
Space Separator | 211 | 2.3%
Dash Punctuation | 191 | 2.1%
Uppercase Letter | 128 | 1.4%
Lowercase Letter | 79 | 0.9%
Other Punctuation | 31 | 0.3%
Connector Punctuation | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 272 | 4.3%
(not preserved) | 176 | 2.8%
(not preserved) | 149 | 2.3%
(not preserved) | 140 | 2.2%
(not preserved) | 136 | 2.1%
(not preserved) | 132 | 2.1%
(not preserved) | 113 | 1.8%
(not preserved) | 106 | 1.7%
(not preserved) | 101 | 1.6%
(not preserved) | 97 | 1.5%
Other values (328) | 4946 | 77.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 24 | 18.8%
K | 13 | 10.2%
H | 12 | 9.4%
C | 11 | 8.6%
M | 11 | 8.6%
D | 11 | 8.6%
V | 10 | 7.8%
I | 8 | 6.2%
E | 7 | 5.5%
L | 6 | 4.7%
Other values (7) | 15 | 11.7%

Decimal Number
Value | Count | Frequency (%)
1 | 292 | 26.5%
2 | 221 | 20.1%
3 | 141 | 12.8%
4 | 96 | 8.7%
6 | 78 | 7.1%
5 | 72 | 6.5%
7 | 62 | 5.6%
8 | 48 | 4.4%
9 | 46 | 4.2%
0 | 45 | 4.1%

Lowercase Letter
Value | Count | Frequency (%)
e | 28 | 35.4%
l | 24 | 30.4%
i | 12 | 15.2%
v | 9 | 11.4%
h | 3 | 3.8%
s | 3 | 3.8%

Close Punctuation
Value | Count | Frequency (%)
) | 492 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 492 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 211 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 191 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 31 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6368 | 70.0%
Common | 2521 | 27.7%
Latin | 207 | 2.3%

Most frequent character per script

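Hangul
Value | Count | Frequency (%)
(not preserved) | 272 | 4.3%
(not preserved) | 176 | 2.8%
(not preserved) | 149 | 2.3%
(not preserved) | 140 | 2.2%
(not preserved) | 136 | 2.1%
(not preserved) | 132 | 2.1%
(not preserved) | 113 | 1.8%
(not preserved) | 106 | 1.7%
(not preserved) | 101 | 1.6%
(not preserved) | 97 | 1.5%
Other values (328) | 4946 | 77.7%

Latin
Value | Count | Frequency (%)
e | 28 | 13.5%
S | 24 | 11.6%
l | 24 | 11.6%
K | 13 | 6.3%
i | 12 | 5.8%
H | 12 | 5.8%
C | 11 | 5.3%
M | 11 | 5.3%
D | 11 | 5.3%
V | 10 | 4.8%
Other values (13) | 51 | 24.6%

Common
Value | Count | Frequency (%)
) | 492 | 19.5%
( | 492 | 19.5%
1 | 292 | 11.6%
2 | 221 | 8.8%
(space) | 211 | 8.4%
- | 191 | 7.6%
3 | 141 | 5.6%
4 | 96 | 3.8%
6 | 78 | 3.1%
5 | 72 | 2.9%
Other values (6) | 235 | 9.3%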

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6368 | 70.0%
ASCII | 2728 | 30.0%

Most frequent character per block

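ASCII
Value | Count | Frequency (%)
) | 492 | 18.0%
( | 492 | 18.0%
1 | 292 | 10.7%
2 | 221 | 8.1%
(space) | 211 | 7.7%
- | 191 | 7.0%
3 | 141 | 5.2%
4 | 96 | 3.5%
6 | 78 | 2.9%
5 | 72 | 2.6%
Other values (29) | 442 | 16.2%

Hangul
Value | Count | Frequency (%)
(not preserved) | 272 | 4.3%
(not preserved) | 176 | 2.8%
(not preserved) | 149 | 2.3%
(not preserved) | 140 | 2.2%
(not preserved) | 136 | 2.1%
(not preserved) | 132 | 2.1%
(not preserved) | 113 | 1.8%
(not preserved) | 106 | 1.7%
(not preserved) | 101 | 1.6%
(not preserved) | 97 | 1.5%
Other values (328) | 4946 | 77.7%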

주택유형
Categorical

Distinct: 28
Distinct (%): 3.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Value | Count
재개발 | 199
국민,장기 | 106
행복(재건축) | 82
재건축 | 72
재개발,행복(리츠2호) | 48
Other values (23) | 206

Length

Max length: 20
Median length: 16
Mean length: 5.2215989
Min length: 2

Unique

Unique: 6
Unique (%): 0.8%

Sample

1st row: 영구임대
2nd row: 영구,공공
3rd row: 영구임대
4th row: 재건축
5th row: 재건축

Common Values

Value | Count | Frequency (%)
재개발 | 199 | 27.9%
국민,장기 | 106 | 14.9%
행복(재건축) | 82 | 11.5%
재건축 | 72 | 10.1%
재개발,행복(리츠2호) | 48 | 6.7%
행복(공사 건설형) | 32 | 4.5%
기타임대 | 32 | 4.5%
역세권청년 | 30 | 4.2%
국민 | 18 | 2.5%
공공임대 | 17 | 2.4%
Other values (18) | 77 | 10.8%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
재개발 | 199 | 26.6%
국민,장기 | 106 | 14.2%
행복(재건축 | 82 | 11.0%
재건축 | 72 | 9.6%
재개발,행복(리츠2호 | 48 | 6.4%
건설형 | 34 | 4.6%
행복(공사 | 32 | 4.3%
기타임대 | 32 | 4.3%
역세권청년 | 30 | 4.0%
국민 | 18 | 2.4%
Other values (19) | 94 | 12.6%

세대수
Real number (ℝ)

Distinct: 538
Distinct (%): 75.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 596.44881
Minimum: 2
Maximum: 9510
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 16
Q1: 173
Median: 410
Q3: 799
95-th percentile: 1769.8
Maximum: 9510
Range: 9508
Interquartile range (IQR): 626

Descriptive statistics

Standard deviation: 698.80742
Coefficient of variation (CV): 1.1716134
Kurtosis: 41.582848
Mean: 596.44881
Median Absolute Deviation (MAD): 283
Skewness: 4.5311634
Sum: 425268
Variance: 488331.81
Monotonicity: Not monotonic
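
Several of these figures are derived from others: the mean is the sum over the row count, the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation over the mean, and the variance is the squared standard deviation. The sketch below checks those relations against the values reported for 세대수. It works from the printed (rounded) numbers only, so comparisons use a tolerance.

```python
import math

# Values copied from the 세대수 statistics in this report.
n, total = 713, 425268
mean, std, variance, cv = 596.44881, 698.80742, 488331.81, 1.1716134
minimum, maximum, q1, q3 = 2, 9510, 173, 799

# Recompute each derived statistic from its inputs.
derived = {
    "mean": total / n,
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
}

reported = {"mean": mean, "range": 9508, "iqr": 626, "cv": cv, "variance": variance}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.6f} reported={reported[name]} consistent={ok}")
```

The large gap between the mean (596.4) and the median (410), together with a skewness of about 4.5, points to a long right tail, which matches the single maximum of 9510 in the extreme values.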
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
4 | 6 | 0.8%
6 | 5 | 0.7%
188 | 4 | 0.6%
30 | 4 | 0.6%
497 | 4 | 0.6%
15 | 4 | 0.6%
16 | 4 | 0.6%
48 | 4 | 0.6%
297 | 4 | 0.6%
87 | 3 | 0.4%
Other values (528) | 671 | 94.1%

Minimum 10 values
Value | Count | Frequency (%)
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 6 | 0.8%
5 | 3 | 0.4%
6 | 5 | 0.7%
7 | 3 | 0.4%
8 | 2 | 0.3%
9 | 2 | 0.3%
10 | 3 | 0.4%
11 | 2 | 0.3%

Maximum 10 values
Value | Count | Frequency (%)
9510 | 1 | 0.1%
4932 | 1 | 0.1%
4300 | 1 | 0.1%
4066 | 1 | 0.1%
3658 | 1 | 0.1%
3410 | 1 | 0.1%
3293 | 1 | 0.1%
3045 | 1 | 0.1%
2998 | 1 | 0.1%
2652 | 1 | 0.1%

입주개시일
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB
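
The report does not say why 입주개시일 was rejected. The sample rows show values such as "1992-01-15 00:00:00", so one plausible cleaning step is to parse the column into dates before profiling again. The sketch below assumes every value is a string in that format; the function name and the choice to return `None` on failure are illustrative.

```python
from datetime import datetime

# Format observed in the sample rows, e.g. "1992-01-15 00:00:00".
RAW_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_move_in_date(raw: str):
    """Parse one 입주개시일 value into a date, or return None if it does not match."""
    try:
        return datetime.strptime(raw.strip(), RAW_FORMAT).date()
    except ValueError:
        return None


# Two values from the sample rows plus one deliberately malformed value.
samples = ["1992-01-15 00:00:00", "2009-03-20 00:00:00", "not a date"]
parsed = [parse_move_in_date(s) for s in samples]
print(parsed)
```

Returning `None` instead of raising makes it easy to count how many rows fail to parse, which is the information the "Unsupported" alert asks you to check.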

주소
Text

Distinct: 711
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 42
Median length: 34
Mean length: 26.409537
Min length: 14

Characters and Unicode

Total characters: 18830
Distinct characters: 384
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
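
The category and block counts for 주소 include three Letter Number characters (block Number Forms) and two Space Separator characters outside ASCII, but the report does not identify the code points. Assuming, purely for illustration, that they are compatibility characters such as ROMAN NUMERAL TWO (U+2161) and IDEOGRAPHIC SPACE (U+3000), NFKC normalization would fold them into plain ASCII. The address string below is made up for the example and does not come from the dataset.

```python
import unicodedata


def describe(ch: str):
    """Return the code point, general category and Unicode name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "UNKNOWN")


def normalize_address(text: str) -> str:
    """Apply NFKC so compatibility characters map to their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


# Hypothetical value containing an ideographic space and a Roman numeral character.
example = "마곡\u3000\u2161단지"
cleaned = normalize_address(example)

print(describe("\u2161"))  # category "Nl" is Letter Number
print(describe("\u3000"))  # category "Zs" is Space Separator
print(repr(cleaned))
```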

Unique

Unique: 709
Unique (%): 99.4%

Sample

1st row: 개포로109길 9 (개포동 12, 대치아파트)
2nd row: 양재대로55길 10 (일원동 711, 수서1단지 에스에이치빌)
3rd row: 광평로56길 11 (수서동 723, 수서6단지아파트)
4th row: 압구정로2길 20 (신사동 506-9, 래미안 신사)
5th row: 도곡로43길 20 (역삼동 763-16, 래미안 그레이튼)

Value | Count | Frequency (%)
진관동 | 26 | 0.8%
신정동 | 19 | 0.6%
강일동 | 17 | 0.5%
20 | 15 | 0.4%
마곡동 | 15 | 0.4%
11 | 14 | 0.4%
봉천동 | 14 | 0.4%
천왕동 | 13 | 0.4%
50 | 13 | 0.4%
미아동 | 13 | 0.4%
Other values (2024) | 3272 | 95.4%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2726 | 14.5%
1 | 1033 | 5.5%
(not preserved) | 793 | 4.2%
) | 716 | 3.8%
( | 715 | 3.8%
(not preserved) | 697 | 3.7%
2 | 648 | 3.4%
3 | 557 | 3.0%
, | 477 | 2.5%
4 | 467 | 2.5%
Other values (374) | 10001 | 53.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8784 | 46.6%
Decimal Number | 5050 | 26.8%
Space Separator | 2728 | 14.5%
Close Punctuation | 716 | 3.8%
Open Punctuation | 715 | 3.8%
Other Punctuation | 477 | 2.5%
Dash Punctuation | 277 | 1.5%
Uppercase Letter | 68 | 0.4%
Lowercase Letter | 12 | 0.1%
Letter Number | 3 | < 0.1%

Most frequent character per category

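Other Letter
Value | Count | Frequency (%)
(not preserved) | 793 | 9.0%
(not preserved) | 697 | 7.9%
(not preserved) | 407 | 4.6%
(not preserved) | 298 | 3.4%
(not preserved) | 276 | 3.1%
(not preserved) | 253 | 2.9%
(not preserved) | 172 | 2.0%
(not preserved) | 166 | 1.9%
(not preserved) | 133 | 1.5%
(not preserved) | 128 | 1.5%
Other values (339) | 5461 | 62.2%

Uppercase Letter
Value | Count | Frequency (%)
S | 11 | 16.2%
L | 8 | 11.8%
K | 7 | 10.3%
I | 6 | 8.8%
C | 6 | 8.8%
D | 5 | 7.4%
M | 5 | 7.4%
H | 4 | 5.9%
V | 4 | 5.9%
E | 3 | 4.4%
Other values (4) | 9 | 13.2%

Decimal Number
Value | Count | Frequency (%)
1 | 1033 | 20.5%
2 | 648 | 12.8%
3 | 557 | 11.0%
4 | 467 | 9.2%
5 | 459 | 9.1%
7 | 431 | 8.5%
0 | 425 | 8.4%
6 | 407 | 8.1%
8 | 320 | 6.3%
9 | 303 | 6.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 9 | 75.0%
l | 2 | 16.7%
i | 1 | 8.3%

Space Separator
Value | Count | Frequency (%)
(space) | 2726 | 99.9%
(non-ASCII space) | 2 | 0.1%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 2 | 66.7%
(not preserved) | 1 | 33.3%

Close Punctuation
Value | Count | Frequency (%)
) | 716 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 715 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 477 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 277 | 100.0%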

Most occurring scripts

Value | Count | Frequency (%)
Common | 9963 | 52.9%
Hangul | 8784 | 46.6%
Latin | 83 | 0.4%

Most frequent character per script

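Hangul
Value | Count | Frequency (%)
(not preserved) | 793 | 9.0%
(not preserved) | 697 | 7.9%
(not preserved) | 407 | 4.6%
(not preserved) | 298 | 3.4%
(not preserved) | 276 | 3.1%
(not preserved) | 253 | 2.9%
(not preserved) | 172 | 2.0%
(not preserved) | 166 | 1.9%
(not preserved) | 133 | 1.5%
(not preserved) | 128 | 1.5%
Other values (339) | 5461 | 62.2%

Latin
Value | Count | Frequency (%)
S | 11 | 13.3%
e | 9 | 10.8%
L | 8 | 9.6%
K | 7 | 8.4%
I | 6 | 7.2%
C | 6 | 7.2%
D | 5 | 6.0%
M | 5 | 6.0%
H | 4 | 4.8%
V | 4 | 4.8%
Other values (9) | 18 | 21.7%

Common
Value | Count | Frequency (%)
(space) | 2726 | 27.4%
1 | 1033 | 10.4%
) | 716 | 7.2%
( | 715 | 7.2%
2 | 648 | 6.5%
3 | 557 | 5.6%
, | 477 | 4.8%
4 | 467 | 4.7%
5 | 459 | 4.6%
7 | 431 | 4.3%
Other values (6) | 1734 | 17.4%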

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 10041 | 53.3%
Hangul | 8784 | 46.6%
Number Forms | 3 | < 0.1%
None | 2 | < 0.1%

Most frequent character per block

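ASCII
Value | Count | Frequency (%)
(space) | 2726 | 27.1%
1 | 1033 | 10.3%
) | 716 | 7.1%
( | 715 | 7.1%
2 | 648 | 6.5%
3 | 557 | 5.5%
, | 477 | 4.8%
4 | 467 | 4.7%
5 | 459 | 4.6%
7 | 431 | 4.3%
Other values (22) | 1812 | 18.0%

Hangul
Value | Count | Frequency (%)
(not preserved) | 793 | 9.0%
(not preserved) | 697 | 7.9%
(not preserved) | 407 | 4.6%
(not preserved) | 298 | 3.4%
(not preserved) | 276 | 3.1%
(not preserved) | 253 | 2.9%
(not preserved) | 172 | 2.0%
(not preserved) | 166 | 1.9%
(not preserved) | 133 | 1.5%
(not preserved) | 128 | 1.5%
Other values (339) | 5461 | 62.2%

None
Value | Count | Frequency (%)
(non-ASCII space) | 2 | 100.0%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 2 | 66.7%
(not preserved) | 1 | 33.3%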

우편번호
Real number (ℝ)

HIGH CORRELATION 

Distinct: 584
Distinct (%): 81.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4981.9776
Minimum: 1005
Maximum: 11727
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1005
5-th percentile: 1665.8
Q1: 3166
Median: 4734
Q3: 6919
95-th percentile: 8365
Maximum: 11727
Range: 10722
Interquartile range (IQR): 3753

Descriptive statistics

Standard deviation: 2224.5544
Coefficient of variation (CV): 0.44652036
Kurtosis: -0.99346588
Mean: 4981.9776
Median Absolute Deviation (MAD): 1936
Skewness: 0.15993934
Sum: 3552150
Variance: 4948642.4
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
8368 | 7 | 1.0%
3098 | 6 | 0.8%
3303 | 5 | 0.7%
5748 | 5 | 0.7%
6764 | 5 | 0.7%
3310 | 4 | 0.6%
8365 | 4 | 0.6%
8364 | 4 | 0.6%
8350 | 4 | 0.6%
3304 | 4 | 0.6%
Other values (574) | 665 | 93.3%

Minimum 10 values
Value | Count | Frequency (%)
1005 | 1 | 0.1%
1029 | 1 | 0.1%
1040 | 1 | 0.1%
1096 | 1 | 0.1%
1127 | 1 | 0.1%
1129 | 1 | 0.1%
1133 | 1 | 0.1%
1155 | 1 | 0.1%
1180 | 1 | 0.1%
1183 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
11727 | 2 | 0.3%
11650 | 1 | 0.1%
8865 | 1 | 0.1%
8861 | 1 | 0.1%
8830 | 2 | 0.3%
8800 | 1 | 0.1%
8788 | 1 | 0.1%
8775 | 1 | 0.1%
8757 | 1 | 0.1%
8754 | 1 | 0.1%
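
우편번호 was profiled as a real number, and most values have only four digits (for example 6335 for an address in 강남구). South Korean postal codes are five digits long, which suggests that leading zeros were dropped when the column was read as numeric. Statistics such as the mean or skewness are not meaningful for an identifier of this kind. A minimal sketch of restoring the text form, assuming every value is meant to be a five-digit code (the function name is illustrative):

```python
def format_postal_code(value: int) -> str:
    """Return the value as a five-character, zero-padded postal code string."""
    if not 0 <= value <= 99999:
        raise ValueError(f"not a five-digit postal code: {value}")
    return f"{value:05d}"


# One sample-row value, the reported minimum, and the reported maximum.
codes = [6335, 1005, 11727]
formatted = [format_postal_code(c) for c in codes]
print(formatted)  # prints ['06335', '01005', '11727']
```

Reading the column as text in the first place avoids the problem. Because the leading digits of a postal code identify a delivery area, treating it as a category also helps explain the High correlation alert with 자치구명.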

Interactions

[Figures: four interaction plots, not preserved]

Correlations

Variable | 자치구명 | 주택유형 | 세대수 | 우편번호
자치구명 | 1.000 | 0.612 | 0.000 | 0.991
주택유형 | 0.612 | 1.000 | 0.279 | 0.467
세대수 | 0.000 | 0.279 | 1.000 | 0.068
우편번호 | 0.991 | 0.467 | 0.068 | 1.000

Variable | 주택유형 | 자치구명
주택유형 | 1.000 | 0.188
자치구명 | 0.188 | 1.000

Variable | 세대수 | 우편번호 | 자치구명 | 주택유형
세대수 | 1.000 | 0.104 | 0.000 | 0.106
우편번호 | 0.104 | 1.000 | 0.924 | 0.190
자치구명 | 0.000 | 0.924 | 1.000 | 0.188
주택유형 | 0.106 | 0.190 | 0.188 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
Index | 자치구명 | 단지명 | 주택유형 | 세대수 | 입주개시일 | 주소 | 우편번호
0 | 강남구 | 대치1 | 영구임대 | 1623 | 1992-01-15 00:00:00 | 개포로109길 9 (개포동 12, 대치아파트) | 6335
1 | 강남구 | 수서1-1단지 | 영구,공공 | 2214 | 1992-11-01 00:00:00 | 양재대로55길 10 (일원동 711, 수서1단지 에스에이치빌) | 6341
2 | 강남구 | 수서6 | 영구임대 | 1508 | 1992-12-01 00:00:00 | 광평로56길 11 (수서동 723, 수서6단지아파트) | 6368
3 | 강남구 | 신사삼지래미안 | 재건축 | 63 | 2009-03-20 00:00:00 | 압구정로2길 20 (신사동 506-9, 래미안 신사) | 6027
4 | 강남구 | 래미안그레이튼 | 재건축 | 476 | 2010-03-05 00:00:00 | 도곡로43길 20 (역삼동 763-16, 래미안 그레이튼) | 6219
5 | 강남구 | 래미안그레이튼2차 | 재건축 | 464 | 2011-01-20 00:00:00 | 도곡로43길 21 (역삼동 762-3, 래미안그레이튼) | 6219
6 | 강남구 | 강남신동아파밀리에1단지(세곡리엔파크1단지) | 국민,장기 | 395 | 2011-03-01 00:00:00 | 헌릉로590길 100 (세곡동 508, 세곡리엔파크1단지) | 6366
7 | 강남구 | 강남신동아파밀리에2단지(세곡리엔파크2단지) | 국민,장기 | 410 | 2011-03-01 00:00:00 | 헌릉로590길 10 (세곡동 511, 세곡리엔파크2단지) | 6365
8 | 강남구 | 세곡리엔파크3단지 | 국민,장기 | 363 | 2011-03-01 00:00:00 | 헌릉로590길 11 (세곡동 516, 세곡리엔파크3단지) | 6365
9 | 강남구 | 세곡리엔파크4단지 | 국민,장기 | 407 | 2011-06-22 00:00:00 | 헌릉로590길 88 (세곡동 522, 세곡리엔파크4단지) | 6366

Last rows
Index | 자치구명 | 단지명 | 주택유형 | 세대수 | 입주개시일 | 주소 | 우편번호
703 | 중랑구 | 우디안 4단지(신내3지구 4단지) | 행복(공사 건설형) | 289 | 2018-09-28 00:00:00 | 신내역로 160 (신내동 831) | 2263
704 | 중랑구 | 신내 글로리움(신내동 640) | 행복(공사 건설형) | 229 | 2019-10-17 00:00:00 | 신내로 208 (신내동 640) | 2052
705 | 중랑구 | 한양수자인 사가정파크(면목1) | 행복(재건축) | 497 | 2019-12-16 00:00:00 | 사가정로72길 26 (면목동 1543) | 2260
706 | 중랑구 | 사가정센트럴아이파크(면목3) | 행복(재건축) | 1505 | 2020-08-01 00:00:00 | 동일로 92길 40 (면목동 1545) | 2226
707 | 중랑구 | 면목라온프라이빗(면목5) | 행복(재건축) | 453 | 2020-12-28 00:00:00 | 동일로91길 23 (면목동 171-7) | 2234
708 | 중랑구 | 제이스타상봉 | 역세권청년 | 6 | 2021-12-15 00:00:00 | 봉우재로 111 (상봉동 109-34) | 2138
709 | 중랑구 | 블리스다임빌(상봉동 104-33) | 행복(재건축) | 12 | 2022-01-03 00:00:00 | 면목로92가길 15-11 (상봉동 104-33) | 2154
710 | 중랑구 | 칼튼테라스(묵동 176-39외 3필지) | 역세권청년 | 24 | 2022-05-31 00:00:00 | 공릉로2길 8-4 (묵동 176-39) | 2037
711 | 중랑구 | 상봉태솔 | 행복(재건축) | 44 | 2023-01-09 00:00:00 | 봉우재로 41길 29 (상봉동 105-97) | 2153
712 | 중랑구 | 용마산모아엘가파크포레 | 행복(재건축) | 243 | 2023-01-31 00:00:00 | 용마산로 84길 9-16 (면목동 55-14) | 2193