Overview

Dataset statistics

Number of variables: 6
Number of observations: 27
Missing cells: 3
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 KiB
Average record size in memory: 53.9 B
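The derived figures in this table follow from simple ratios. The sketch below is a rough consistency check, assuming that the missing-cell percentage is taken over every cell (rows × columns) and that the total size is approximately the average record size multiplied by the row count; neither assumption is stated in the report itself.

```python
# Counts taken from the dataset statistics above.
n_variables = 6
n_observations = 27
missing_cells = 3
avg_record_bytes = 53.9

# Assumption: the missing-cell percentage is taken over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size is roughly average record size times the row count.
total_kib = avg_record_bytes * n_observations / 1024

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"total size: {total_kib:.1f} KiB")
```

Under these assumptions the printed values agree with the reported 1.9% and 1.4 KiB.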

Variable types

Text: 2
Numeric: 1
DateTime: 1
Categorical: 2

Dataset

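Description: Status of Happy Housing (행복주택) in Gyeonggi Province, Gyeonggi Housing and Urban Development Corporation (경기주택도시공사)
Author: Gyeonggi Housing and Urban Development Corporation (경기주택도시공사)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=QTK09GUQ7B70TW8QCWRV30324653&infSeq=1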

Alerts

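데이터기준일자 (data reference date) has constant value "2024-03-05" [Constant]
공급세대수 (number of households supplied) is highly overall correlated with 유형 (type) [High correlation]
유형 is highly overall correlated with 공급세대수 [High correlation]
준공일자 (completion date) has 3 (11.1%) missing values [Missing]
사업지구명 (project district name) has unique values [Unique]
위치정보 (location) has unique values [Unique]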

Reproduction

Analysis started: 2024-03-23 02:37:12.622745
Analysis finished: 2024-03-23 02:37:13.951364
Duration: 1.33 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사업지구명 (project district name)
Text

UNIQUE

Distinct: 27
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 348.0 B

Length

Max length: 13
Median length: 8
Mean length: 5.962963
Min length: 3

Characters and Unicode

Total characters: 161
Distinct characters: 78
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
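
As an illustration of what these character properties look like, the following minimal sketch uses Python's standard `unicodedata` module to count the general categories in the first sample value of this variable, 평택 BIX. The helper name `category_counts` is hypothetical, and this is not necessarily how ydata-profiling computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Lu', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample value of this variable.
sample = "평택 BIX"
counts = category_counts(sample)

# 'Lo' = Other Letter (Hangul syllables), 'Lu' = Uppercase Letter,
# 'Zs' = Space Separator.
for code, n in sorted(counts.items()):
    print(code, n)
```

The two-letter codes map onto the category names used in the tables of this section: Lo is Other Letter, Lu is Uppercase Letter, and Zs is Space Separator.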

Unique

Unique: 27
Unique (%): 100.0%

Sample

1st row: 평택 BIX
2nd row: 광교 원천
3rd row: 동탄 호수공원
4th row: 성남 판교
5th row: 오산 가장
Value | Count | Frequency (%)
화성 | 2 | 4.2%
광교 | 2 | 4.2%
bix | 2 | 4.2%
양평 | 2 | 4.2%
수원 | 2 | 4.2%
고덕 | 1 | 2.1%
남한강 | 1 | 2.1%
연천 | 1 | 2.1%
용인영덕(중고층 | 1 | 2.1%
모듈러 | 1 | 2.1%
Other values (33) | 33 | 68.8%

Most occurring characters

ValueCountFrequency (%)
21
 
13.0%
6
 
3.7%
4
 
2.5%
4
 
2.5%
4
 
2.5%
2 4
 
2.5%
4
 
2.5%
4
 
2.5%
4
 
2.5%
4
 
2.5%
Other values (68) 102
63.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 120 | 74.5%
Space Separator | 21 | 13.0%
Decimal Number | 9 | 5.6%
Uppercase Letter | 9 | 5.6%
Close Punctuation | 1 | 0.6%
Open Punctuation | 1 | 0.6%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
6
 
5.0%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
Other values (57) 79
65.8%
Decimal Number
Value | Count | Frequency (%)
2 | 4 | 44.4%
1 | 2 | 22.2%
5 | 2 | 22.2%
0 | 1 | 11.1%
Uppercase Letter
Value | Count | Frequency (%)
A | 3 | 33.3%
B | 2 | 22.2%
I | 2 | 22.2%
X | 2 | 22.2%
Space Separator
Value | Count | Frequency (%)
(space) | 21 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 120 | 74.5%
Common | 32 | 19.9%
Latin | 9 | 5.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
6
 
5.0%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
Other values (57) 79
65.8%
Common
Value | Count | Frequency (%)
(space) | 21 | 65.6%
2 | 4 | 12.5%
1 | 2 | 6.2%
5 | 2 | 6.2%
) | 1 | 3.1%
( | 1 | 3.1%
0 | 1 | 3.1%
Latin
Value | Count | Frequency (%)
A | 3 | 33.3%
B | 2 | 22.2%
I | 2 | 22.2%
X | 2 | 22.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 120 | 74.5%
ASCII | 41 | 25.5%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 21 | 51.2%
2 | 4 | 9.8%
A | 3 | 7.3%
1 | 2 | 4.9%
B | 2 | 4.9%
5 | 2 | 4.9%
I | 2 | 4.9%
X | 2 | 4.9%
) | 1 | 2.4%
( | 1 | 2.4%
Hangul
ValueCountFrequency (%)
6
 
5.0%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
4
 
3.3%
3
 
2.5%
Other values (57) 79
65.8%

위치정보 (location)
Text

UNIQUE

Distinct: 27
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 348.0 B

Length

Max length: 31
Median length: 24
Mean length: 20.333333
Min length: 14

Characters and Unicode

Total characters: 549
Distinct characters: 106
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 27
Unique (%): 100.0%

Sample

1st row: 경기도 평택시 포승읍 황해희곡6로 36
2nd row: 경기도 수원시 영통구 광교중앙로49번길 40
3rd row: 경기도 화성시 동탄순환대로10길 20
4th row: 경기도 성남시 분당구 판교로319번길 14
5th row: 경기도 오산시 가장산업동로 38
Value | Count | Frequency (%)
경기도 | 27 | 21.4%
화성시 | 4 | 3.2%
성남시 | 3 | 2.4%
수원시 | 3 | 2.4%
영통구 | 3 | 2.4%
38 | 2 | 1.6%
남양주시 | 2 | 1.6%
평택시 | 2 | 1.6%
40 | 2 | 1.6%
용인시 | 2 | 1.6%
Other values (73) | 76 | 60.3%

Most occurring characters

ValueCountFrequency (%)
99
 
18.0%
28
 
5.1%
28
 
5.1%
27
 
4.9%
25
 
4.6%
1 24
 
4.4%
18
 
3.3%
0 15
 
2.7%
4 15
 
2.7%
11
 
2.0%
Other values (96) 259
47.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 333 | 60.7%
Decimal Number | 103 | 18.8%
Space Separator | 99 | 18.0%
Uppercase Letter | 6 | 1.1%
Dash Punctuation | 5 | 0.9%
Open Punctuation | 1 | 0.2%
Close Punctuation | 1 | 0.2%
Other Punctuation | 1 | 0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
28
 
8.4%
28
 
8.4%
27
 
8.1%
25
 
7.5%
18
 
5.4%
11
 
3.3%
11
 
3.3%
9
 
2.7%
8
 
2.4%
8
 
2.4%
Other values (76) 160
48.0%
Decimal Number
Value | Count | Frequency (%)
1 | 24 | 23.3%
0 | 15 | 14.6%
4 | 15 | 14.6%
3 | 10 | 9.7%
9 | 9 | 8.7%
7 | 7 | 6.8%
2 | 7 | 6.8%
8 | 6 | 5.8%
6 | 6 | 5.8%
5 | 4 | 3.9%
Uppercase Letter
Value | Count | Frequency (%)
B | 2 | 33.3%
I | 1 | 16.7%
X | 1 | 16.7%
A | 1 | 16.7%
L | 1 | 16.7%
Space Separator
Value | Count | Frequency (%)
(space) | 99 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
, | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 333 | 60.7%
Common | 210 | 38.3%
Latin | 6 | 1.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
28
 
8.4%
28
 
8.4%
27
 
8.1%
25
 
7.5%
18
 
5.4%
11
 
3.3%
11
 
3.3%
9
 
2.7%
8
 
2.4%
8
 
2.4%
Other values (76) 160
48.0%
Common
Value | Count | Frequency (%)
(space) | 99 | 47.1%
1 | 24 | 11.4%
0 | 15 | 7.1%
4 | 15 | 7.1%
3 | 10 | 4.8%
9 | 9 | 4.3%
7 | 7 | 3.3%
2 | 7 | 3.3%
8 | 6 | 2.9%
6 | 6 | 2.9%
Other values (5) | 12 | 5.7%
Latin
Value | Count | Frequency (%)
B | 2 | 33.3%
I | 1 | 16.7%
X | 1 | 16.7%
A | 1 | 16.7%
L | 1 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 333 | 60.7%
ASCII | 216 | 39.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 99 | 45.8%
1 | 24 | 11.1%
0 | 15 | 6.9%
4 | 15 | 6.9%
3 | 10 | 4.6%
9 | 9 | 4.2%
7 | 7 | 3.2%
2 | 7 | 3.2%
8 | 6 | 2.8%
6 | 6 | 2.8%
Other values (10) | 18 | 8.3%
Hangul
ValueCountFrequency (%)
28
 
8.4%
28
 
8.4%
27
 
8.1%
25
 
7.5%
18
 
5.4%
11
 
3.3%
11
 
3.3%
9
 
2.7%
8
 
2.4%
8
 
2.4%
Other values (76) 160
48.0%

공급세대수 (number of households supplied)
Real number (ℝ)

HIGH CORRELATION

Distinct: 22
Distinct (%): 81.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 348.62963
Minimum: 14
Maximum: 2078
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 375.0 B

Quantile statistics

Minimum: 14
5th percentile: 15.3
Q1: 50
Median: 106
Q3: 315
95th percentile: 1348.5
Maximum: 2078
Range: 2064
Interquartile range (IQR): 265

Descriptive statistics

Standard deviation: 505.90354
Coefficient of variation (CV): 1.4511203
Kurtosis: 4.8916296
Mean: 348.62963
Median Absolute Deviation (MAD): 91
Skewness: 2.2211786
Sum: 9413
Variance: 255938.4
Monotonicity: Not monotonic
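
Several of these statistics can be re-derived with Python's standard `statistics` module. The sketch below is a check under stated assumptions rather than the profiler's own code: the 27 values are reassembled from this report's value tables (the ten smallest and ten largest values with their counts, plus the median 106 and the value 131 listed among the common values), the variance uses the sample (n − 1) denominator, and quartiles use linear interpolation between order statistics.

```python
import statistics

# The 27 values of 공급세대수, reassembled from the value tables in this report.
units = [14, 15, 16, 40, 42, 49, 50, 50, 50, 56, 85, 100, 100, 106, 131,
         204, 232, 300, 300, 300, 330, 500, 800, 970, 995, 1500, 2078]

total = sum(units)
mean = statistics.mean(units)
median = statistics.median(units)
variance = statistics.variance(units)  # sample variance (n - 1 denominator)
std = statistics.stdev(units)          # square root of the sample variance
cv = std / mean                        # coefficient of variation

# "inclusive" interpolates linearly between order statistics.
q1, _, q3 = statistics.quantiles(units, n=4, method="inclusive")

# Median absolute deviation around the median.
mad = statistics.median(abs(x - median) for x in units)

print(total, round(mean, 5), median, round(variance, 1), round(std, 5))
print(q1, q3, q3 - q1, mad, round(cv, 4))
```

With these inputs the sum, mean, median, variance, quartiles, IQR, and MAD agree with the figures listed above.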
Histogram with fixed size bins (bins=22)
Value | Count | Frequency (%)
50 | 3 | 11.1%
300 | 3 | 11.1%
100 | 2 | 7.4%
330 | 1 | 3.7%
1500 | 1 | 3.7%
232 | 1 | 3.7%
500 | 1 | 3.7%
131 | 1 | 3.7%
2078 | 1 | 3.7%
800 | 1 | 3.7%
Other values (12) | 12 | 44.4%
Value | Count | Frequency (%)
14 | 1 | 3.7%
15 | 1 | 3.7%
16 | 1 | 3.7%
40 | 1 | 3.7%
42 | 1 | 3.7%
49 | 1 | 3.7%
50 | 3 | 11.1%
56 | 1 | 3.7%
85 | 1 | 3.7%
100 | 2 | 7.4%
Value | Count | Frequency (%)
2078 | 1 | 3.7%
1500 | 1 | 3.7%
995 | 1 | 3.7%
970 | 1 | 3.7%
800 | 1 | 3.7%
500 | 1 | 3.7%
330 | 1 | 3.7%
300 | 3 | 11.1%
232 | 1 | 3.7%
204 | 1 | 3.7%

준공일자 (completion date)
Date

MISSING

Distinct: 23
Distinct (%): 95.8%
Missing: 3
Missing (%): 11.1%
Memory size: 348.0 B
Minimum: 2017-12-21 00:00:00
Maximum: 2023-05-04 00:00:00
Histogram with fixed size bins (bins=23)

유형 (type)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 18.5%
Missing: 0
Missing (%): 0.0%
Memory size: 348.0 B

Length

Max length: 15
Median length: 5
Mean length: 3.8518519
Min length: 2

Unique

Unique: 2
Unique (%): 7.4%

Sample

1st row: 산업단지형
2nd row: 청년
3rd row: 신혼부부
4th row: 산업단지형
5th row: 산업단지형

Common Values

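Value | Count | Frequency (%)
청년 (youth) | 10 | 37.0%
신혼부부 (newlyweds) | 9 | 33.3%
산업단지형 (industrial complex type) | 6 | 22.2%
실버형 (senior type) | 1 | 3.7%
청년, 대학생, 신혼부부 등 (youth, university students, newlyweds, etc.) | 1 | 3.7%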

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
청년 | 11 | 36.7%
신혼부부 | 10 | 33.3%
산업단지형 | 6 | 20.0%
실버형 | 1 | 3.3%
대학생 | 1 | 3.3%
등 | 1 | 3.3%
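
The counts in this table differ from the category counts under Common Values (청년 appears 11 times rather than 10). One reading that reproduces the numbers is that the plot counts individual words, so the combined label 청년, 대학생, 신혼부부 등 contributes one occurrence to each of its words. The sketch below tests that reading; the splitting rule (whitespace, with commas stripped) is an assumption inferred from the figures, not documented ydata-profiling behaviour.

```python
from collections import Counter

# Category counts from the "Common Values" table of 유형.
category_counts = {
    "청년": 10,
    "신혼부부": 9,
    "산업단지형": 6,
    "실버형": 1,
    "청년, 대학생, 신혼부부 등": 1,
}

# Assumption: labels are split on whitespace and surrounding commas are dropped.
word_counts = Counter()
for label, count in category_counts.items():
    for word in label.split():
        word_counts[word.strip(",")] += count

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word}\t{n}\t{100 * n / total_words:.1f}%")
```

Under this assumption the total is 30 words, which matches the percentages shown (for example, 11 of 30 is 36.7%).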

데이터기준일자 (data reference date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 348.0 B

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2024-03-05
2nd row: 2024-03-05
3rd row: 2024-03-05
4th row: 2024-03-05
5th row: 2024-03-05

Common Values

Value | Count | Frequency (%)
2024-03-05 | 27 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
2024-03-05 | 27 | 100.0%

Interactions


Correlations

 | 사업지구명 | 위치정보 | 공급세대수 | 준공일자 | 유형
사업지구명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
위치정보 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
공급세대수 | 1.000 | 1.000 | 1.000 | 1.000 | 0.701
준공일자 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
유형 | 1.000 | 1.000 | 0.701 | 1.000 | 1.000

 | 공급세대수 | 유형
공급세대수 | 1.000 | 0.518
유형 | 0.518 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
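
Since the plots themselves are not reproduced here, a text rendering of nullity by column can be sketched from the per-variable missing counts reported in this profile. The helper `nullity_bar` is hypothetical and only illustrates the idea; it draws one character per row, with `#` for present values and `.` for missing ones.

```python
# Row count and per-column missing counts as reported in this profile.
n_rows = 27
missing = {
    "사업지구명": 0,
    "위치정보": 0,
    "공급세대수": 0,
    "준공일자": 3,
    "유형": 0,
    "데이터기준일자": 0,
}


def nullity_bar(n_missing: int, total: int, width: int = 27) -> str:
    """Render present cells as '#' and missing cells as '.'."""
    filled = round(width * (total - n_missing) / total)
    return "#" * filled + "." * (width - filled)


for column, n_missing in missing.items():
    present = n_rows - n_missing
    print(f"{nullity_bar(n_missing, n_rows)} {present:>2}/{n_rows} {column}")
```

Only 준공일자 shows gaps: 24 of 27 values are present, consistent with the 11.1% missing rate reported for that variable.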

Sample

 | 사업지구명 | 위치정보 | 공급세대수 | 준공일자 | 유형 | 데이터기준일자
0 | 평택 BIX | 경기도 평택시 포승읍 황해희곡6로 36 | 330 | 2021-04-30 | 산업단지형 | 2024-03-05
1 | 광교 원천 | 경기도 수원시 영통구 광교중앙로49번길 40 | 300 | 2020-10-27 | 청년 | 2024-03-05
2 | 동탄 호수공원 | 경기도 화성시 동탄순환대로10길 20 | 995 | 2020-10-16 | 신혼부부 | 2024-03-05
3 | 성남 판교 | 경기도 성남시 분당구 판교로319번길 14 | 300 | 2020-08-10 | 산업단지형 | 2024-03-05
4 | 오산 가장 | 경기도 오산시 가장산업동로 38 | 50 | 2020-02-04 | 산업단지형 | 2024-03-05
5 | 의왕역 | 경기도 의왕시 부곡시장1길 38 | 50 | 2020-01-22 | 청년 | 2024-03-05
6 | 다산역A2 | 경기도 남양주시 다산중앙로145번길 36 | 970 | 2019-09-02 | 신혼부부 | 2024-03-05
7 | 파주병원복합 | 경기도 파주시 황골로 90 | 50 | 2019-08-22 | 실버형 | 2024-03-05
8 | 성남하대원 | 경기도 성남시 중원구 둔촌대로217번길 4 | 14 | 2019-06-26 | 청년 | 2024-03-05
9 | 가평청사복합 | 경기도 가평군 가평읍 석봉로191번길 10 | 42 | 2019-06-17 | 청년 | 2024-03-05
 | 사업지구명 | 위치정보 | 공급세대수 | 준공일자 | 유형 | 데이터기준일자
17 | 양평 남한강 | 경기도 양평군 창대리 701, 700-3 | 49 | <NA> | 신혼부부 | 2024-03-05
18 | 연천 BIX | 경기도 연천군 통현리 812번지(연천BIX 주거1) | 100 | <NA> | 산업단지형 | 2024-03-05
19 | 용인영덕(중고층 모듈러) | 경기도 용인시 기흥구 흥덕2로 13 | 106 | 2023-05-04 | 청년 | 2024-03-05
20 | 용인 죽전 | 경기도 용인시 수지구 죽전동 494-5 | 85 | 2022-08-01 | 청년 | 2024-03-05
21 | 고덕 서정리역 | 경기도 평택시 고덕갈평3로 40 | 800 | 2022-05-31 | 신혼부부 | 2024-03-05
22 | 다산지금 A5 | 경기도 남양주시 다산동 6110 | 2078 | 2022-04-01 | 신혼부부 | 2024-03-05
23 | 판교2밸리 | 경기도 성남시 수정구 금토동 411-6 | 300 | 2022-01-01 | 산업단지형 | 2024-03-05
24 | 하남 덕풍 | 경기도 하남시 덕풍동로 35 | 131 | 2021-12-01 | 신혼부부 | 2024-03-05
25 | 경기 광주역 | 경기도 광주시 역동 169-11 | 500 | 2021-11-01 | 신혼부부 | 2024-03-05
26 | 안산 스마트허브 | 경기도 안산시 단원구 산단로 94 | 232 | 2021-04-30 | 산업단지형 | 2024-03-05