Overview

Dataset statistics

Number of variables: 8
Number of observations: 3084
Missing cells: 6146
Missing cells (%): 24.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 192.9 KiB
Average record size in memory: 64.0 B
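
The derived figures in this table can be re-checked from the raw counts. The sketch below only uses numbers printed in the report; the variable names are illustrative. Note that 6146 missing cells equals 2 × 3073, which matches the two 행정처분 date columns flagged under Alerts.

```python
# Figures copied from the "Dataset statistics" table of the report.
n_variables = 8
n_observations = 3084
missing_cells = 6146
total_size_kib = 192.9

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row, from the reported total size (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```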

Variable types

Categorical: 3
Text: 2
DateTime: 3

Dataset

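Description: Status of real estate brokerage businesses in Daejeon Metropolitan City (business name, location, registration date, operating status, etc.).
Author: 대전광역시 (Daejeon Metropolitan City)
URL: https://www.data.go.kr/data/15073578/fileData.do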

Alerts

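시도 (province/city) has constant value "대전광역시" [Constant]
상태 (status) is highly imbalanced (97.3%) [Imbalance]
행정처분 시작일자 (administrative disposition start date) has 3073 (99.6%) missing values [Missing]
행정처분 종료일자 (administrative disposition end date) has 3073 (99.6%) missing values [Missing]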
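
The report does not state how the 97.3% imbalance figure is derived. One reading that reproduces it is a normalized-entropy score, 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. The sketch below assumes that formula; `imbalance_score` is an illustrative name, not a library function. The class counts are the 상태 counts printed in the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (assumed formula)."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Not meaningful for a single class; 0.0 is an arbitrary convention here.
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts of the three 상태 values reported in the variable section.
status_counts = [3071, 10, 3]
score = imbalance_score(status_counts)
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as a single class dominates.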

Reproduction

Analysis started: 2023-12-12 12:33:21.336070
Analysis finished: 2023-12-12 12:33:22.552828
Duration: 1.22 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
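
A report of this kind can be regenerated roughly as follows. This is a minimal sketch, not the configuration actually used (that lives in config.json). The file paths, the report title, and the `cp949` encoding are assumptions; `build_report` is an illustrative helper name.

```python
def build_report(csv_path, html_path, encoding="cp949"):
    """Profile a CSV file and write an HTML report (sketch, assumptions noted below)."""
    # Imported lazily so that defining the function does not require the libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the encoding must match the downloaded file; cp949 is only a guess.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Default settings; the original run may have used different options.
    report = ProfileReport(df, title="Daejeon real estate brokerages")
    report.to_file(html_path)
    return report
```

Calling `build_report("brokerages.csv", "report.html")` with hypothetical file names would write the HTML report next to the script.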

Variables

시도 (province / metropolitan city)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB

Value | Count
대전광역시 | 3084

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 대전광역시
2nd row: 대전광역시
3rd row: 대전광역시
4th row: 대전광역시
5th row: 대전광역시

Common Values

Value | Count | Frequency (%)
대전광역시 | 3084 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

시군구 (district)
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB

Value | Count
서구 | 1071
유성구 | 892
동구 | 432
중구 | 429
대덕구 | 260

Length

Max length: 3
Median length: 2
Mean length: 2.3735409
Min length: 2
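
The length statistics follow directly from the five district counts, since each value's length is its number of characters. The sketch below rebuilds the list of lengths from the counts printed in the report and recomputes the four figures; it assumes the names are stored as precomposed Hangul syllables, so `len` counts one per syllable.

```python
from statistics import mean, median

# District counts as printed in the 시군구 section.
district_counts = {"서구": 1071, "유성구": 892, "동구": 432, "중구": 429, "대덕구": 260}

# One length entry per observation.
lengths = [len(name) for name, count in district_counts.items() for _ in range(count)]

print("Observations:", len(lengths))
print("Max length:", max(lengths))
print("Median length:", median(lengths))
print("Mean length:", round(mean(lengths), 7))
print("Min length:", min(lengths))
```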

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동구
2nd row: 동구
3rd row: 동구
4th row: 동구
5th row: 동구

Common Values

Value | Count | Frequency (%)
서구 | 1071 | 34.7%
유성구 | 892 | 28.9%
동구 | 432 | 14.0%
중구 | 429 | 13.9%
대덕구 | 260 | 8.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

상호 (business name)
Text

Distinct: 2635
Distinct (%): 85.4%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB

Length

Max length: 26
Median length: 23
Mean length: 11.438716
Min length: 7

Characters and Unicode

Total characters: 35277
Distinct characters: 542
Distinct categories: 11
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
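
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lo` for Other Letter, `Lu` for Uppercase Letter, `Ll` for Lowercase Letter, `Nd` for Decimal Number). It does not expose script or block, so the sketch covers categories only. The sample string is one of the business names shown in this report; `category_counts` is an illustrative helper.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the characters of `text` by Unicode general category."""
    return Counter(unicodedata.category(ch) for ch in text)

# A business name taken from the report's sample rows.
sample = "New한밭공인중개사사무소"
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category tables for this variable.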

Unique

Unique: 2327
Unique (%): 75.5%
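
"Distinct" and "Unique" measure different things here: the figures are consistent with Distinct counting how many different values occur, and Unique counting the values that occur exactly once. The sketch below shows the difference on a hypothetical toy column (the letters are made up), then re-derives the two percentages from the counts printed for 상호.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column: B and C repeat, A and D occur once.
toy = ["A", "B", "B", "C", "C", "C", "D"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)

# Percentages for 상호, from the counts in the report.
n_observations = 3084
distinct_pct = 100 * 2635 / n_observations
unique_pct = 100 * 2327 / n_observations
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```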

Sample

1st row: 114골드부동산중개인사무소
2nd row: New한밭공인중개사사무소
3rd row: OK공인중개사사무소
4th row: OK한숲공인중개사사무소
5th row: ON공인중개사사무소

Value | Count | Frequency (%)
사무소 | 41 | 1.3%
공인중개사무소 | 21 | 0.7%
공인중개사 | 16 | 0.5%
공인중개사사무소 | 12 | 0.4%
대전공인중개사사무소 | 6 | 0.2%
현대공인중개사사무소 | 6 | 0.2%
스마트공인중개사사무소 | 5 | 0.2%
대박공인중개사사무소 | 5 | 0.2%
장원공인중개사사무소 | 5 | 0.2%
한빛공인중개사사무소 | 5 | 0.2%
Other values (2635) | 3064 | 96.2%

Most occurring characters

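Value | Count | Frequency (%)
(not preserved) | 6033 | 17.1%
(not preserved) | 3106 | 8.8%
(not preserved) | 3098 | 8.8%
(not preserved) | 3079 | 8.7%
(not preserved) | 3068 | 8.7%
(not preserved) | 3034 | 8.6%
(not preserved) | 2966 | 8.4%
(not preserved) | 410 | 1.2%
(not preserved) | 350 | 1.0%
(not preserved) | 347 | 1.0%
Other values (532) | 9786 | 27.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 34706 | 98.4%
Decimal Number | 190 | 0.5%
Uppercase Letter | 152 | 0.4%
Space Separator | 102 | 0.3%
Lowercase Letter | 84 | 0.2%
Close Punctuation | 14 | < 0.1%
Open Punctuation | 14 | < 0.1%
Dash Punctuation | 8 | < 0.1%
Other Punctuation | 5 | < 0.1%
Letter Number | 1 | < 0.1%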

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 6033 | 17.4%
(not preserved) | 3106 | 8.9%
(not preserved) | 3098 | 8.9%
(not preserved) | 3079 | 8.9%
(not preserved) | 3068 | 8.8%
(not preserved) | 3034 | 8.7%
(not preserved) | 2966 | 8.5%
(not preserved) | 410 | 1.2%
(not preserved) | 350 | 1.0%
(not preserved) | 347 | 1.0%
Other values (482) | 9215 | 26.6%

Uppercase Letter
Value | Count | Frequency (%)
K | 45 | 29.6%
S | 22 | 14.5%
O | 19 | 12.5%
T | 16 | 10.5%
N | 9 | 5.9%
E | 9 | 5.9%
B | 6 | 3.9%
H | 4 | 2.6%
C | 3 | 2.0%
L | 3 | 2.0%
Other values (10) | 16 | 10.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 44 | 52.4%
h | 12 | 14.3%
w | 6 | 7.1%
n | 5 | 6.0%
s | 4 | 4.8%
a | 4 | 4.8%
t | 3 | 3.6%
o | 2 | 2.4%
p | 1 | 1.2%
y | 1 | 1.2%
Other values (2) | 2 | 2.4%

Decimal Number
Value | Count | Frequency (%)
1 | 94 | 49.5%
4 | 38 | 20.0%
2 | 25 | 13.2%
5 | 10 | 5.3%
8 | 6 | 3.2%
3 | 6 | 3.2%
7 | 5 | 2.6%
9 | 4 | 2.1%
0 | 2 | 1.1%

Other Punctuation
Value | Count | Frequency (%)
. | 3 | 60.0%
/ | 1 | 20.0%
& | 1 | 20.0%

Space Separator
Value | Count | Frequency (%)
(space) | 102 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 14 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 14 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 8 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

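Value | Count | Frequency (%)
Hangul | 34703 | 98.4%
Common | 334 | 0.9%
Latin | 237 | 0.7%
Han | 3 | < 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 6033 | 17.4%
(not preserved) | 3106 | 9.0%
(not preserved) | 3098 | 8.9%
(not preserved) | 3079 | 8.9%
(not preserved) | 3068 | 8.8%
(not preserved) | 3034 | 8.7%
(not preserved) | 2966 | 8.5%
(not preserved) | 410 | 1.2%
(not preserved) | 350 | 1.0%
(not preserved) | 347 | 1.0%
Other values (479) | 9212 | 26.5%

Latin
Value | Count | Frequency (%)
K | 45 | 19.0%
e | 44 | 18.6%
S | 22 | 9.3%
O | 19 | 8.0%
T | 16 | 6.8%
h | 12 | 5.1%
N | 9 | 3.8%
E | 9 | 3.8%
B | 6 | 2.5%
w | 6 | 2.5%
Other values (23) | 49 | 20.7%

Common
Value | Count | Frequency (%)
(space) | 102 | 30.5%
1 | 94 | 28.1%
4 | 38 | 11.4%
2 | 25 | 7.5%
) | 14 | 4.2%
( | 14 | 4.2%
5 | 10 | 3.0%
- | 8 | 2.4%
8 | 6 | 1.8%
3 | 6 | 1.8%
Other values (7) | 17 | 5.1%

Han
Value | Count | Frequency (%)
(not preserved) | 1 | 33.3%
(not preserved) | 1 | 33.3%
(not preserved) | 1 | 33.3%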

Most occurring blocks

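Value | Count | Frequency (%)
Hangul | 34703 | 98.4%
ASCII | 570 | 1.6%
CJK | 3 | < 0.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 6033 | 17.4%
(not preserved) | 3106 | 9.0%
(not preserved) | 3098 | 8.9%
(not preserved) | 3079 | 8.9%
(not preserved) | 3068 | 8.8%
(not preserved) | 3034 | 8.7%
(not preserved) | 2966 | 8.5%
(not preserved) | 410 | 1.2%
(not preserved) | 350 | 1.0%
(not preserved) | 347 | 1.0%
Other values (479) | 9212 | 26.5%

ASCII
Value | Count | Frequency (%)
(space) | 102 | 17.9%
1 | 94 | 16.5%
K | 45 | 7.9%
e | 44 | 7.7%
4 | 38 | 6.7%
2 | 25 | 4.4%
S | 22 | 3.9%
O | 19 | 3.3%
T | 16 | 2.8%
) | 14 | 2.5%
Other values (39) | 151 | 26.5%

CJK
Value | Count | Frequency (%)
(not preserved) | 1 | 33.3%
(not preserved) | 1 | 33.3%
(not preserved) | 1 | 33.3%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%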

소재지 (location)
Text

Distinct: 2944
Distinct (%): 95.5%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB

Length

Max length: 60
Median length: 51
Mean length: 31.976654
Min length: 14

Characters and Unicode

Total characters: 98616
Distinct characters: 414
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2812
Unique (%): 91.2%

Sample

1st row: 대전광역시 동구 계족로382번길 9 (성남동)
2nd row: 대전광역시 동구 우암로 323 1층(가양동)
3rd row: 대전광역시 동구 은어송로 47 (가오동)
4th row: 대전광역시 동구 계족로489번길 73 상가동 105호(용전동, 한숲아파트)
5th row: 대전광역시 동구 한밭대로1254번길 106 101호(용전동)

Value | Count | Frequency (%)
대전광역시 | 3084 | 17.5%
서구 | 1071 | 6.1%
유성구 | 892 | 5.1%
동구 | 432 | 2.5%
중구 | 429 | 2.4%
상가동 | 373 | 2.1%
1층 | 274 | 1.6%
대덕구 | 260 | 1.5%
상가 | 188 | 1.1%
15 | 65 | 0.4%
Other values (3375) | 10549 | 59.9%

Most occurring characters

Value | Count | Frequency (%)
(space) | 17617 | 17.9%
1 | 5270 | 5.3%
(not preserved) | 4289 | 4.3%
(not preserved) | 4219 | 4.3%
(not preserved) | 3310 | 3.4%
(not preserved) | 3245 | 3.3%
(not preserved) | 3189 | 3.2%
(not preserved) | 3097 | 3.1%
(not preserved) | 3089 | 3.1%
(not preserved) | 3056 | 3.1%
Other values (404) | 48235 | 48.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 56352 | 57.1%
Space Separator | 17617 | 17.9%
Decimal Number | 16925 | 17.2%
Close Punctuation | 3019 | 3.1%
Open Punctuation | 3016 | 3.1%
Other Punctuation | 1126 | 1.1%
Dash Punctuation | 417 | 0.4%
Uppercase Letter | 131 | 0.1%
Lowercase Letter | 12 | < 0.1%
Math Symbol | 1 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 4289 | 7.6%
(not preserved) | 4219 | 7.5%
(not preserved) | 3310 | 5.9%
(not preserved) | 3245 | 5.8%
(not preserved) | 3189 | 5.7%
(not preserved) | 3097 | 5.5%
(not preserved) | 3089 | 5.5%
(not preserved) | 3056 | 5.4%
(not preserved) | 1681 | 3.0%
(not preserved) | 1300 | 2.3%
Other values (361) | 25877 | 45.9%

Uppercase Letter
Value | Count | Frequency (%)
B | 32 | 24.4%
A | 21 | 16.0%
C | 10 | 7.6%
K | 10 | 7.6%
L | 8 | 6.1%
H | 8 | 6.1%
S | 8 | 6.1%
T | 7 | 5.3%
D | 6 | 4.6%
V | 5 | 3.8%
Other values (8) | 16 | 12.2%

Decimal Number
Value | Count | Frequency (%)
1 | 5270 | 31.1%
0 | 2218 | 13.1%
2 | 1952 | 11.5%
3 | 1447 | 8.5%
4 | 1326 | 7.8%
5 | 1259 | 7.4%
6 | 1093 | 6.5%
7 | 921 | 5.4%
8 | 771 | 4.6%
9 | 668 | 3.9%

Lowercase Letter
Value | Count | Frequency (%)
n | 4 | 33.3%
a | 3 | 25.0%
e | 2 | 16.7%
t | 1 | 8.3%
o | 1 | 8.3%
w | 1 | 8.3%

Other Punctuation
Value | Count | Frequency (%)
, | 1084 | 96.3%
@ | 37 | 3.3%
/ | 3 | 0.3%
. | 2 | 0.2%

Space Separator
Value | Count | Frequency (%)
(space) | 17617 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 3019 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 3016 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 417 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 1 | 100.0%

Most occurring scripts

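Value | Count | Frequency (%)
Hangul | 56352 | 57.1%
Common | 42121 | 42.7%
Latin | 143 | 0.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 4289 | 7.6%
(not preserved) | 4219 | 7.5%
(not preserved) | 3310 | 5.9%
(not preserved) | 3245 | 5.8%
(not preserved) | 3189 | 5.7%
(not preserved) | 3097 | 5.5%
(not preserved) | 3089 | 5.5%
(not preserved) | 3056 | 5.4%
(not preserved) | 1681 | 3.0%
(not preserved) | 1300 | 2.3%
Other values (361) | 25877 | 45.9%

Latin
Value | Count | Frequency (%)
B | 32 | 22.4%
A | 21 | 14.7%
C | 10 | 7.0%
K | 10 | 7.0%
L | 8 | 5.6%
H | 8 | 5.6%
S | 8 | 5.6%
T | 7 | 4.9%
D | 6 | 4.2%
V | 5 | 3.5%
Other values (14) | 28 | 19.6%

Common
Value | Count | Frequency (%)
(space) | 17617 | 41.8%
1 | 5270 | 12.5%
) | 3019 | 7.2%
( | 3016 | 7.2%
0 | 2218 | 5.3%
2 | 1952 | 4.6%
3 | 1447 | 3.4%
4 | 1326 | 3.1%
5 | 1259 | 3.0%
6 | 1093 | 2.6%
Other values (9) | 3904 | 9.3%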

Most occurring blocks

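Value | Count | Frequency (%)
Hangul | 56352 | 57.1%
ASCII | 42264 | 42.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 17617 | 41.7%
1 | 5270 | 12.5%
) | 3019 | 7.1%
( | 3016 | 7.1%
0 | 2218 | 5.2%
2 | 1952 | 4.6%
3 | 1447 | 3.4%
4 | 1326 | 3.1%
5 | 1259 | 3.0%
6 | 1093 | 2.6%
Other values (33) | 4047 | 9.6%

Hangul
Value | Count | Frequency (%)
(not preserved) | 4289 | 7.6%
(not preserved) | 4219 | 7.5%
(not preserved) | 3310 | 5.9%
(not preserved) | 3245 | 5.8%
(not preserved) | 3189 | 5.7%
(not preserved) | 3097 | 5.5%
(not preserved) | 3089 | 5.5%
(not preserved) | 3056 | 5.4%
(not preserved) | 1681 | 3.0%
(not preserved) | 1300 | 2.3%
Other values (361) | 25877 | 45.9%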

등록일자 (registration date)
DateTime

Distinct: 1964
Distinct (%): 63.7%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB
Minimum: 1980-01-01 00:00:00
Maximum: 2020-11-17 00:00:00

[Figure: Histogram with fixed size bins (bins=50)]
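
In the sample rows of this report the registration dates appear in `YYYY.MM.DD` form (for example `2009.07.17`), while the minimum and maximum above are shown as timestamps. The sketch below shows one way to parse that textual form with the standard library. Whether every row of the source file follows this format is an assumption, and `parse_registration_date` is an illustrative helper.

```python
from datetime import datetime

def parse_registration_date(text):
    """Parse a 'YYYY.MM.DD' string into a datetime (raises ValueError otherwise)."""
    return datetime.strptime(text, "%Y.%m.%d")

# Three registration dates copied from the report's sample rows.
samples = ["2009.07.17", "2020.01.16", "2008.01.28"]
parsed = [parse_registration_date(s) for s in samples]

print("Earliest:", min(parsed))
print("Latest:", max(parsed))
```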

상태 (status)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 24.2 KiB

Value | Count
영업 (operating) | 3071
휴업 (temporarily closed) | 10
정지 (suspended) | 3

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 영업
2nd row: 영업
3rd row: 영업
4th row: 영업
5th row: 영업

Common Values

Value | Count | Frequency (%)
영업 | 3071 | 99.6%
휴업 | 10 | 0.3%
정지 | 3 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

행정처분 시작일자 (administrative disposition start date)
DateTime

MISSING

Distinct: 9
Distinct (%): 81.8%
Missing: 3073
Missing (%): 99.6%
Memory size: 24.2 KiB
Minimum: 2020-05-19 00:00:00
Maximum: 2020-11-09 00:00:00

[Figure: Histogram with fixed size bins (bins=9)]

행정처분 종료일자 (administrative disposition end date)
DateTime

MISSING

Distinct: 11
Distinct (%): 100.0%
Missing: 3073
Missing (%): 99.6%
Memory size: 24.2 KiB
Minimum: 2020-11-18 00:00:00
Maximum: 2021-06-28 00:00:00

[Figure: Histogram with fixed size bins (bins=11)]
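
The distinct percentages for the two disposition-date columns only make sense relative to the non-missing rows: 3084 − 3073 leaves 11 values, and 9 of 11 is 81.8%. The sketch below re-derives those figures and, as a worked example, computes the length of the one disposition period visible in the report's sample rows (2020-11-09 to 2021-02-08). Variable names are illustrative.

```python
from datetime import date

n_rows = 3084
n_missing = 3073
n_present = n_rows - n_missing  # rows that carry a disposition date

# Distinct (%) is taken over the non-missing values.
start_distinct_pct = 100 * 9 / n_present
end_distinct_pct = 100 * 11 / n_present
missing_pct = 100 * n_missing / n_rows

# Disposition period from the sample row with 상태 = 정지.
period = date(2021, 2, 8) - date(2020, 11, 9)

print(n_present, f"{start_distinct_pct:.1f}%", f"{end_distinct_pct:.1f}%")
print(f"Missing: {missing_pct:.1f}%")
print("Period length in days:", period.days)
```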

Correlations

[Figure: correlation heatmap]

Variable | 시군구 | 상태 | 행정처분 시작일자 | 행정처분 종료일자
시군구 | 1.000 | 0.031 | 0.750 | 1.000
상태 | 0.031 | 1.000 | 1.000 | 1.000
행정처분 시작일자 | 0.750 | 1.000 | 1.000 | 1.000
행정처분 종료일자 | 1.000 | 1.000 | 1.000 | 1.000

[Figure: correlation heatmap]

Variable | 상태 | 시군구
상태 | 1.000 | 0.023
시군구 | 0.023 | 1.000

[Figure: correlation heatmap]

Variable | 시군구 | 상태
시군구 | 1.000 | 0.023
상태 | 0.023 | 1.000
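
The report text does not say which coefficient each matrix uses. For pairs of categorical variables, an association measure such as Cramér's V is a common choice, and it is assumed here purely for illustration. The sketch implements the plain (uncorrected) form on hypothetical toy data; it is not claimed to reproduce the values above, and `cramers_v` is an illustrative name.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    pair_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(x_counts), len(y_counts)) - 1
    if k == 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(chi2 / (n * k))

# Hypothetical toy data: perfectly associated, then independent.
perfect = cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"])
independent = cramers_v(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

With only 11 non-missing disposition dates, values of 1.000 in the first matrix should be read with caution: very small samples can produce perfect-looking association by construction.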

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
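
Nullity correlation can be sketched as the Pearson correlation between the present/absent indicators of two columns. The example below uses hypothetical toy columns in which two date fields are missing on exactly the same rows, the pattern one would expect if the two 행정처분 date columns are always filled in together (an assumption, since the report only gives their missing counts). `nullity_correlation` is an illustrative helper.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column with no missing (or only missing) values
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns: start and end are missing on the same rows.
start = [None, None, "2020-11-09", None]
end = [None, None, "2021-02-08", None]
name = ["a", "b", "c", "d"]  # never missing

print(nullity_correlation(start, end))
print(nullity_correlation(start, name))
```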

Sample

First rows

Row | 시도 | 시군구 | 상호 | 소재지 | 등록일자 | 상태 | 행정처분 시작일자 | 행정처분 종료일자
0 | 대전광역시 | 동구 | 114골드부동산중개인사무소 | 대전광역시 동구 계족로382번길 9 (성남동) | 2009.07.17 | 영업 | <NA> | <NA>
1 | 대전광역시 | 동구 | New한밭공인중개사사무소 | 대전광역시 동구 우암로 323 1층(가양동) | 2020.01.16 | 영업 | <NA> | <NA>
2 | 대전광역시 | 동구 | OK공인중개사사무소 | 대전광역시 동구 은어송로 47 (가오동) | 2008.01.28 | 영업 | <NA> | <NA>
3 | 대전광역시 | 동구 | OK한숲공인중개사사무소 | 대전광역시 동구 계족로489번길 73 상가동 105호(용전동, 한숲아파트) | 2017.02.24 | 영업 | <NA> | <NA>
4 | 대전광역시 | 동구 | ON공인중개사사무소 | 대전광역시 동구 한밭대로1254번길 106 101호(용전동) | 2019.03.08 | 영업 | <NA> | <NA>
5 | 대전광역시 | 동구 | SK공인중개사사무소 | 대전광역시 동구 충무로 252 (신흥동) | 2015.03.24 | 영업 | <NA> | <NA>
6 | 대전광역시 | 동구 | SK대박공인중개사사무소 | 대전광역시 동구 계족로 144-1 (대동) | 2018.11.27 | 정지 | 2020-11-09 | 2021-02-08
7 | 대전광역시 | 동구 | SK뷰공인중개사사무소 | 대전광역시 동구 새터1길 17 1층(신흥동) | 2017.12.20 | 영업 | <NA> | <NA>
8 | 대전광역시 | 동구 | SK시티공인중개사사무소 | 대전광역시 동구 계족로 105 1층(신흥동) | 2019.07.30 | 영업 | <NA> | <NA>
9 | 대전광역시 | 동구 | SK신흥공인중개사사무소 | 대전광역시 동구 충무로 268 102호(신흥동) | 2019.11.25 | 영업 | <NA> | <NA>

Last rows

Row | 시도 | 시군구 | 상호 | 소재지 | 등록일자 | 상태 | 행정처분 시작일자 | 행정처분 종료일자
3074 | 대전광역시 | 대덕구 | 홈즈공인중개사사무소 | 대전광역시 대덕구 중리북로 46 상가동 111호 | 2020.06.04 | 영업 | <NA> | <NA>
3075 | 대전광역시 | 대덕구 | 화신공인중개사사무소 | 대전광역시 대덕구 비래서로25번길 59 (비래동) | 2014.12.30 | 영업 | <NA> | <NA>
3076 | 대전광역시 | 대덕구 | 황금공인중개사 사무소 | 대전광역시 대덕구 대덕대로1470번길 61 (목상동) | 2011.07.05 | 영업 | <NA> | <NA>
3077 | 대전광역시 | 대덕구 | 황금공인중개사사무소 | 대전광역시 대덕구 중리남로40번길 1 (중리동) | 2006.04.24 | 영업 | <NA> | <NA>
3078 | 대전광역시 | 대덕구 | 효성공인중개사사무소 | 대전광역시 대덕구 오정로63번길 12 2층(오정동) | 2014.03.19 | 영업 | <NA> | <NA>
3079 | 대전광역시 | 대덕구 | 효자지구공인중개사사무소 | 대전광역시 대덕구 대전로 1387 (읍내동) | 2018.03.02 | 영업 | <NA> | <NA>
3080 | 대전광역시 | 대덕구 | 효진공인중개사 사무소 | 대전광역시 대덕구 신탄진북로 23 (신탄진동) | 2011.11.10 | 영업 | <NA> | <NA>
3081 | 대전광역시 | 대덕구 | 효진합동공인중개사사무소 | 대전광역시 대덕구 신탄진북로 23 (신탄진동) | 2018.07.24 | 영업 | <NA> | <NA>
3082 | 대전광역시 | 대덕구 | 휴플러스 공인중개사 사무소 | 대전광역시 대덕구 비래서로 25 (비래동) | 2013.03.13 | 영업 | <NA> | <NA>
3083 | 대전광역시 | 대덕구 | 희망공인중개사 사무소 | 대전광역시 대덕구 석봉로 28 (석봉동) | 2013.12.27 | 영업 | <NA> | <NA>