Overview

Dataset statistics

Number of variables: 5
Number of observations: 494
Missing cells: 490
Missing cells (%): 19.8%
Duplicate rows: 3
Duplicate rows (%): 0.6%
Total size in memory: 20.9 KiB
Average record size in memory: 43.3 B
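The percentage and per-record figures above are consistent with simple ratios of the reported counts. The following is a minimal Python sketch, assuming that missing cells are measured against rows × columns, duplicates against the number of rows, and that 1 KiB = 1024 bytes. The function and key names are illustrative and are not part of ydata-profiling.

```python
def dataset_ratios(n_vars, n_obs, missing_cells, duplicate_rows, total_kib):
    """Recompute the derived overview figures from the raw counts."""
    total_cells = n_vars * n_obs  # every variable has one cell per observation
    return {
        "missing_cells_pct": 100 * missing_cells / total_cells,
        "duplicate_rows_pct": 100 * duplicate_rows / n_obs,
        "avg_record_bytes": total_kib * 1024 / n_obs,
    }


# Counts taken from the dataset statistics listed above.
ratios = dataset_ratios(n_vars=5, n_obs=494, missing_cells=490,
                        duplicate_rows=3, total_kib=20.9)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

Rounded to one decimal place, the three results agree with the reported 19.8%, 0.6% and 43.3 B.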

Variable types

Text: 2
Numeric: 3

Dataset

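Description: Information on buildings in Pyeongtaek-si, Gyeonggi-do that are subject to mechanical equipment performance inspection (general buildings with a gross floor area of 10,000 m² or more and apartment complexes of 500 households or more). The dataset provides the building name, road-name address, postal code, gross floor area (m²), and number of households. ※ Inquiries: Pyeongtaek-si Building Permit Division (031-8024-4182)
Author: Pyeongtaek-si, Gyeonggi-do
URL: https://www.data.go.kr/data/15124643/fileData.do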

Alerts

Dataset has 3 (0.6%) duplicate rows [Duplicates]
연면적 (gross floor area) has 151 (30.6%) missing values [Missing]
세대수 (number of households) has 339 (68.6%) missing values [Missing]

Reproduction

Analysis started: 2024-03-14 10:46:38.015610
Analysis finished: 2024-03-14 10:46:40.455044
Duration: 2.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

건물명 (building name)
Text

Distinct: 429
Distinct (%): 86.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 45
Median length: 29
Mean length: 10.362348
Min length: 3

Characters and Unicode

Total characters: 5119
Distinct characters: 383
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
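As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two-letter codes map to the category names used in this report (`Lo` = Other Letter, `So` = Other Symbol, `Zs` = Space Separator). The helper name `category_counts` is an assumption made for this example, and the standard library does not expose script or block properties, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'So', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of this variable: Hangul syllables,
# the enclosed symbol ㈜, and a single space.
sample = "롯데알미늄㈜ 평택공장"
counts = category_counts(sample)
print(counts)
```

Applied to every value of a text column and summed, counts of this kind would give a per-category breakdown like the one reported for the variable.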

Unique

Unique: 396
Unique (%): 80.2%

Sample

1st row: 평택대학교
2nd row: 한국단자공업㈜
3rd row: 롯데알미늄㈜ 평택공장
4th row: ㈜새한산업
5th row: 율촌화학㈜ 평택공장

Value | Count | Frequency (%)
아파트 | 35 | 5.1%
평택공장 | 25 | 3.7%
엘지전자㈜ | 15 | 2.2%
입주자대표회의 | 11 | 1.6%
쌍용자동차㈜ | 8 | 1.2%
한국토지주택공사 | 6 | 0.9%
평택발전본부 | 6 | 0.9%
한국서부발전㈜ | 6 | 0.9%
동우화인켐㈜ | 5 | 0.7%
평택 | 4 | 0.6%
Other values (498) | 563 | 82.3%

Most occurring characters

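Value | Count | Frequency (%)
(glyph not preserved) | 216 | 4.2%
(space) | 192 | 3.8%
(glyph not preserved) | 155 | 3.0%
(glyph not preserved) | 144 | 2.8%
(glyph not preserved) | 139 | 2.7%
(glyph not preserved) | 138 | 2.7%
(glyph not preserved) | 133 | 2.6%
(glyph not preserved) | 131 | 2.6%
(glyph not preserved) | 123 | 2.4%
(glyph not preserved) | 113 | 2.2%
Other values (373) | 3635 | 71.0%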

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4364 | 85.3%
Other Symbol | 216 | 4.2%
Space Separator | 192 | 3.8%
Decimal Number | 91 | 1.8%
Uppercase Letter | 85 | 1.7%
Open Punctuation | 80 | 1.6%
Close Punctuation | 78 | 1.5%
Lowercase Letter | 5 | 0.1%
Dash Punctuation | 4 | 0.1%
Other Punctuation | 3 | 0.1%

Most frequent character per category

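Other Letter
Value | Count | Frequency (%)
(glyph not preserved) | 155 | 3.6%
(glyph not preserved) | 144 | 3.3%
(glyph not preserved) | 139 | 3.2%
(glyph not preserved) | 138 | 3.2%
(glyph not preserved) | 133 | 3.0%
(glyph not preserved) | 131 | 3.0%
(glyph not preserved) | 123 | 2.8%
(glyph not preserved) | 113 | 2.6%
(glyph not preserved) | 81 | 1.9%
(glyph not preserved) | 66 | 1.5%
Other values (337) | 3141 | 72.0%

Uppercase Letter
Value | Count | Frequency (%)
K | 12 | 14.1%
L | 11 | 12.9%
S | 8 | 9.4%
E | 7 | 8.2%
A | 5 | 5.9%
I | 5 | 5.9%
G | 5 | 5.9%
R | 4 | 4.7%
O | 4 | 4.7%
P | 4 | 4.7%
Other values (9) | 20 | 23.5%

Decimal Number
Value | Count | Frequency (%)
2 | 28 | 30.8%
1 | 27 | 29.7%
3 | 16 | 17.6%
5 | 8 | 8.8%
4 | 5 | 5.5%
6 | 3 | 3.3%
7 | 3 | 3.3%
0 | 1 | 1.1%

Other Punctuation
Value | Count | Frequency (%)
/ | 2 | 66.7%
& | 1 | 33.3%

Other Symbol
Value | Count | Frequency (%)
(glyph not preserved) | 216 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 192 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 80 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 78 | 100.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 5 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 4 | 100.0%

Letter Number
Value | Count | Frequency (%)
(glyph not preserved) | 1 | 100.0%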

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4580 | 89.5%
Common | 448 | 8.8%
Latin | 91 | 1.8%

Most frequent character per script

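Hangul
Value | Count | Frequency (%)
(glyph not preserved) | 216 | 4.7%
(glyph not preserved) | 155 | 3.4%
(glyph not preserved) | 144 | 3.1%
(glyph not preserved) | 139 | 3.0%
(glyph not preserved) | 138 | 3.0%
(glyph not preserved) | 133 | 2.9%
(glyph not preserved) | 131 | 2.9%
(glyph not preserved) | 123 | 2.7%
(glyph not preserved) | 113 | 2.5%
(glyph not preserved) | 81 | 1.8%
Other values (338) | 3207 | 70.0%

Latin
Value | Count | Frequency (%)
K | 12 | 13.2%
L | 11 | 12.1%
S | 8 | 8.8%
E | 7 | 7.7%
e | 5 | 5.5%
A | 5 | 5.5%
I | 5 | 5.5%
G | 5 | 5.5%
R | 4 | 4.4%
O | 4 | 4.4%
Other values (11) | 25 | 27.5%

Common
Value | Count | Frequency (%)
(space) | 192 | 42.9%
( | 80 | 17.9%
) | 78 | 17.4%
2 | 28 | 6.2%
1 | 27 | 6.0%
3 | 16 | 3.6%
5 | 8 | 1.8%
4 | 5 | 1.1%
- | 4 | 0.9%
6 | 3 | 0.7%
Other values (4) | 7 | 1.6%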

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4364 | 85.3%
ASCII | 538 | 10.5%
None | 216 | 4.2%
Number Forms | 1 | < 0.1%

Most frequent character per block

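None
Value | Count | Frequency (%)
(glyph not preserved) | 216 | 100.0%

ASCII
Value | Count | Frequency (%)
(space) | 192 | 35.7%
( | 80 | 14.9%
) | 78 | 14.5%
2 | 28 | 5.2%
1 | 27 | 5.0%
3 | 16 | 3.0%
K | 12 | 2.2%
L | 11 | 2.0%
S | 8 | 1.5%
5 | 8 | 1.5%
Other values (24) | 78 | 14.5%

Hangul
Value | Count | Frequency (%)
(glyph not preserved) | 155 | 3.6%
(glyph not preserved) | 144 | 3.3%
(glyph not preserved) | 139 | 3.2%
(glyph not preserved) | 138 | 3.2%
(glyph not preserved) | 133 | 3.0%
(glyph not preserved) | 131 | 3.0%
(glyph not preserved) | 123 | 2.8%
(glyph not preserved) | 113 | 2.6%
(glyph not preserved) | 81 | 1.9%
(glyph not preserved) | 66 | 1.5%
Other values (337) | 3141 | 72.0%

Number Forms
Value | Count | Frequency (%)
(glyph not preserved) | 1 | 100.0%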

도로명주소 (road-name address)
Text

Distinct: 434
Distinct (%): 87.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 32
Median length: 29
Mean length: 21.495951
Min length: 15

Characters and Unicode

Total characters: 10619
Distinct characters: 160
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 404
Unique (%): 81.8%

Sample

1st row: 경기도 평택시 서동대로 3825 (용이동)
2nd row: 경기도 평택시 포승읍 평택항로156번길 36
3rd row: 경기도 평택시 산단로16번길 43(모곡동)
4th row: 경기도 평택시 포승읍 포승공단로118번길 103
5th row: 경기도 평택시 포승읍 평택항만길 246

Value | Count | Frequency (%)
경기도 | 494 | 21.5%
평택시 | 494 | 21.5%
포승읍 | 75 | 3.3%
진위면 | 48 | 2.1%
청북읍 | 44 | 1.9%
고덕동 | 33 | 1.4%
안중읍 | 22 | 1.0%
경기대로 | 18 | 0.8%
엘지로 | 15 | 0.7%
222외 | 15 | 0.7%
Other values (569) | 1044 | 45.4%

Most occurring characters

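Value | Count | Frequency (%)
(space) | 1808 | 17.0%
(glyph not preserved) | 564 | 5.3%
(glyph not preserved) | 541 | 5.1%
(glyph not preserved) | 512 | 4.8%
(glyph not preserved) | 512 | 4.8%
(glyph not preserved) | 505 | 4.8%
(glyph not preserved) | 502 | 4.7%
(glyph not preserved) | 424 | 4.0%
1 | 357 | 3.4%
(glyph not preserved) | 303 | 2.9%
Other values (150) | 4591 | 43.2%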

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 6560 | 61.8%
Space Separator | 1808 | 17.0%
Decimal Number | 1672 | 15.7%
Close Punctuation | 254 | 2.4%
Open Punctuation | 254 | 2.4%
Dash Punctuation | 58 | 0.5%
Other Punctuation | 10 | 0.1%
Uppercase Letter | 3 | < 0.1%

Most frequent character per category

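Other Letter
Value | Count | Frequency (%)
(glyph not preserved) | 564 | 8.6%
(glyph not preserved) | 541 | 8.2%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 505 | 7.7%
(glyph not preserved) | 502 | 7.7%
(glyph not preserved) | 424 | 6.5%
(glyph not preserved) | 303 | 4.6%
(glyph not preserved) | 164 | 2.5%
(glyph not preserved) | 155 | 2.4%
Other values (132) | 2378 | 36.2%

Decimal Number
Value | Count | Frequency (%)
1 | 357 | 21.4%
2 | 281 | 16.8%
5 | 180 | 10.8%
3 | 175 | 10.5%
4 | 131 | 7.8%
6 | 127 | 7.6%
7 | 112 | 6.7%
0 | 111 | 6.6%
9 | 103 | 6.2%
8 | 95 | 5.7%

Uppercase Letter
Value | Count | Frequency (%)
L | 1 | 33.3%
I | 1 | 33.3%
G | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1808 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 254 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 254 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 58 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 10 | 100.0%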

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 6560 | 61.8%
Common | 4056 | 38.2%
Latin | 3 | < 0.1%

Most frequent character per script

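Hangul
Value | Count | Frequency (%)
(glyph not preserved) | 564 | 8.6%
(glyph not preserved) | 541 | 8.2%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 505 | 7.7%
(glyph not preserved) | 502 | 7.7%
(glyph not preserved) | 424 | 6.5%
(glyph not preserved) | 303 | 4.6%
(glyph not preserved) | 164 | 2.5%
(glyph not preserved) | 155 | 2.4%
Other values (132) | 2378 | 36.2%

Common
Value | Count | Frequency (%)
(space) | 1808 | 44.6%
1 | 357 | 8.8%
2 | 281 | 6.9%
) | 254 | 6.3%
( | 254 | 6.3%
5 | 180 | 4.4%
3 | 175 | 4.3%
4 | 131 | 3.2%
6 | 127 | 3.1%
7 | 112 | 2.8%
Other values (5) | 377 | 9.3%

Latin
Value | Count | Frequency (%)
L | 1 | 33.3%
I | 1 | 33.3%
G | 1 | 33.3%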

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 6560 | 61.8%
ASCII | 4059 | 38.2%

Most frequent character per block

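ASCII
Value | Count | Frequency (%)
(space) | 1808 | 44.5%
1 | 357 | 8.8%
2 | 281 | 6.9%
) | 254 | 6.3%
( | 254 | 6.3%
5 | 180 | 4.4%
3 | 175 | 4.3%
4 | 131 | 3.2%
6 | 127 | 3.1%
7 | 112 | 2.8%
Other values (8) | 380 | 9.4%

Hangul
Value | Count | Frequency (%)
(glyph not preserved) | 564 | 8.6%
(glyph not preserved) | 541 | 8.2%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 512 | 7.8%
(glyph not preserved) | 505 | 7.7%
(glyph not preserved) | 502 | 7.7%
(glyph not preserved) | 424 | 6.5%
(glyph not preserved) | 303 | 4.6%
(glyph not preserved) | 164 | 2.5%
(glyph not preserved) | 155 | 2.4%
Other values (132) | 2378 | 36.2%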

우편번호 (postal code)
Real number (ℝ)

Distinct: 169
Distinct (%): 34.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17869.166
Minimum: 17700
Maximum: 18034
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 17700
5th percentile: 17709
Q1: 17780.5
Median: 17858
Q3: 17960
95th percentile: 18021
Maximum: 18034
Range: 334
Interquartile range (IQR): 179.5

Descriptive statistics

Standard deviation: 107.05556
Coefficient of variation (CV): 0.0059910777
Kurtosis: -1.3714919
Mean: 17869.166
Median Absolute Deviation (MAD): 101
Skewness: -0.0070986801
Sum: 8827368
Variance: 11460.893
Monotonicity: Not monotonic
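Several of the statistics reported for 우편번호 follow from one another by standard definitions: range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below recomputes them from the rounded values printed in this report. The function name `derived_stats` is illustrative, and small differences in the last digits are expected because the inputs are already rounded.

```python
def derived_stats(mean, std, q1, q3, minimum, maximum):
    """Derive range, IQR, CV and variance from the basic summary values."""
    return {
        "range": maximum - minimum,
        "iqr": q3 - q1,
        "cv": std / mean,        # coefficient of variation
        "variance": std ** 2,
    }


# Inputs copied from the quantile and descriptive statistics of 우편번호.
stats = derived_stats(mean=17869.166, std=107.05556, q1=17780.5, q3=17960,
                      minimum=17700, maximum=18034)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The same relations can be used as a quick consistency check on the 연면적 and 세대수 summaries.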
Histogram with fixed size bins (bins=50)
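A fixed-size-bin histogram divides the interval between the minimum and the maximum into equally wide bins and counts the values falling into each. The sketch below shows that idea in plain Python. It is an illustration of equal-width binning under the stated assumptions, not the plotting code used by the report, and the input list is a small hand-picked set of postal codes rather than the full column.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # Degenerate case: all values identical, so everything goes in bin 0.
        counts[0] = len(values)
        return counts, [lo] * (bins + 1)
    width = (hi - lo) / bins
    for v in values:
        # The maximum lies on the last edge, so clamp it into the final bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Illustrative subset only: a few postal codes that appear in this report.
codes = [17700, 17709, 17709, 17812, 17959, 17960, 18014, 18014, 18034]
counts, edges = fixed_width_histogram(codes, bins=50)
print(edges[1] - edges[0])  # bin width
print(counts)
```

With a minimum of 17700 and a maximum of 18034, 50 bins give a bin width of 334 / 50 = 6.68.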
Value | Count | Frequency (%)
18014 | 23 | 4.7%
17709 | 20 | 4.0%
17959 | 16 | 3.2%
17960 | 14 | 2.8%
18008 | 12 | 2.4%
17956 | 12 | 2.4%
17749 | 9 | 1.8%
17812 | 9 | 1.8%
17792 | 8 | 1.6%
18021 | 8 | 1.6%
Other values (159) | 363 | 73.5%

Value | Count | Frequency (%)
17700 | 1 | 0.2%
17703 | 1 | 0.2%
17704 | 5 | 1.0%
17706 | 2 | 0.4%
17708 | 6 | 1.2%
17709 | 20 | 4.0%
17712 | 6 | 1.2%
17713 | 2 | 0.4%
17714 | 3 | 0.6%
17715 | 1 | 0.2%

Value | Count | Frequency (%)
18034 | 1 | 0.2%
18033 | 2 | 0.4%
18032 | 6 | 1.2%
18031 | 3 | 0.6%
18030 | 1 | 0.2%
18029 | 1 | 0.2%
18028 | 2 | 0.4%
18027 | 1 | 0.2%
18026 | 1 | 0.2%
18025 | 5 | 1.0%

연면적 (gross floor area, m²)
Real number (ℝ)

MISSING

Distinct: 336
Distinct (%): 98.0%
Missing: 151
Missing (%): 30.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 20435.729
Minimum: 730
Maximum: 101671.15
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 730
5th percentile: 10357.384
Q1: 12150.22
Median: 14908.79
Q3: 22811.99
95th percentile: 49210.114
Maximum: 101671.15
Range: 100941.15
Interquartile range (IQR): 10661.77

Descriptive statistics

Standard deviation: 13985.095
Coefficient of variation (CV): 0.68434529
Kurtosis: 7.7337391
Mean: 20435.729
Median Absolute Deviation (MAD): 3736.77
Skewness: 2.5084178
Sum: 7009455.2
Variance: 1.9558289 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
17483.16 | 4 | 0.8%
38128.8 | 2 | 0.4%
13723.77 | 2 | 0.4%
12894.62 | 2 | 0.4%
11337.23 | 2 | 0.4%
24106.24 | 1 | 0.2%
22682.35 | 1 | 0.2%
19061.5 | 1 | 0.2%
22941.63 | 1 | 0.2%
23675.29 | 1 | 0.2%
Other values (326) | 326 | 66.0%
(Missing) | 151 | 30.6%

Value | Count | Frequency (%)
730.0 | 1 | 0.2%
3266.33 | 1 | 0.2%
10034.42 | 1 | 0.2%
10121.24 | 1 | 0.2%
10132.55 | 1 | 0.2%
10134.47 | 1 | 0.2%
10172.56 | 1 | 0.2%
10175.09 | 1 | 0.2%
10225.0 | 1 | 0.2%
10236.29 | 1 | 0.2%

Value | Count | Frequency (%)
101671.15 | 1 | 0.2%
88750.9 | 1 | 0.2%
84669.36 | 1 | 0.2%
81264.73 | 1 | 0.2%
73580.41 | 1 | 0.2%
71421.44 | 1 | 0.2%
66560.64 | 1 | 0.2%
61415.76 | 1 | 0.2%
59627.71 | 1 | 0.2%
57806.13 | 1 | 0.2%

세대수 (number of households)
Real number (ℝ)

MISSING

Distinct: 138
Distinct (%): 89.0%
Missing: 339
Missing (%): 68.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 837.43226
Minimum: 13
Maximum: 2530
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 13
5th percentile: 394.2
Q1: 612.5
Median: 737
Q3: 946.5
95th percentile: 1593.7
Maximum: 2530
Range: 2517
Interquartile range (IQR): 334

Descriptive statistics

Standard deviation: 401.1253
Coefficient of variation (CV): 0.47899433
Kurtosis: 3.6085066
Mean: 837.43226
Median Absolute Deviation (MAD): 154
Skewness: 1.540428
Sum: 129802
Variance: 160901.51
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
715 | 3 | 0.6%
550 | 3 | 0.6%
632 | 2 | 0.4%
1035 | 2 | 0.4%
690 | 2 | 0.4%
684 | 2 | 0.4%
761 | 2 | 0.4%
383 | 2 | 0.4%
650 | 2 | 0.4%
29 | 2 | 0.4%
Other values (128) | 133 | 26.9%
(Missing) | 339 | 68.6%

Value | Count | Frequency (%)
13 | 1 | 0.2%
29 | 2 | 0.4%
48 | 1 | 0.2%
291 | 1 | 0.2%
350 | 1 | 0.2%
383 | 2 | 0.4%
399 | 1 | 0.2%
447 | 1 | 0.2%
500 | 1 | 0.2%
506 | 1 | 0.2%

Value | Count | Frequency (%)
2530 | 1 | 0.2%
2324 | 1 | 0.2%
2124 | 1 | 0.2%
1999 | 1 | 0.2%
1943 | 1 | 0.2%
1884 | 1 | 0.2%
1674 | 1 | 0.2%
1600 | 1 | 0.2%
1591 | 1 | 0.2%
1590 | 1 | 0.2%

Interactions

[Figures: nine interaction plots, one for each pairing of the numeric variables 우편번호, 연면적 and 세대수]

Correlations

 | 우편번호 | 연면적 | 세대수
우편번호 | 1.000 | 0.000 | 0.296
연면적 | 0.000 | 1.000 | NaN
세대수 | 0.296 | NaN | 1.000

 | 우편번호 | 연면적 | 세대수
우편번호 | 1.000 | 0.044 | 0.214
연면적 | 0.044 | 1.000 | 0.051
세대수 | 0.214 | 0.051 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
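One common way to compute a nullity correlation is to turn each column into a 0/1 "value is present" indicator and take the Pearson correlation of the two indicators. The sketch below does this in plain Python under that assumption. The function name and the toy columns are illustrative and are not taken from the report's implementation or from actual dataset rows.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation.
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Toy columns: every record has either a floor area or a household count,
# never both, so presence in one column implies absence in the other.
area = [10461.48, None, 12919.54, None]
households = [None, 737, None, 1582]
print(nullity_correlation(area, households))
```

A value near -1 means that one variable tends to be present exactly when the other is missing, a value near +1 means they tend to be present or missing together, and a value near 0 means the two patterns are unrelated.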

Sample

Index | 건물명 | 도로명주소 | 우편번호 | 연면적 | 세대수
0 | 평택대학교 | 경기도 평택시 서동대로 3825 (용이동) | 17869 | 10461.48 | <NA>
1 | 한국단자공업㈜ | 경기도 평택시 포승읍 평택항로156번길 36 | 17960 | 10617.23 | <NA>
2 | 롯데알미늄㈜ 평택공장 | 경기도 평택시 산단로16번길 43(모곡동) | 17746 | 10421.99 | <NA>
3 | ㈜새한산업 | 경기도 평택시 포승읍 포승공단로118번길 103 | 17960 | 13618.77 | <NA>
4 | 율촌화학㈜ 평택공장 | 경기도 평택시 포승읍 평택항만길 246 | 17958 | 11163.9 | <NA>
5 | EPS KOREA㈜ | 경기도 평택시 포승읍 평택항로 294 | 17956 | 13154.62 | <NA>
6 | LG전자㈜ 평택칠러공장 | 경기도 평택시 진위면 동부대로 120 | 17712 | 12919.54 | <NA>
7 | (주)경동나비엔(㈜경동나비엔 서탄공장) | 경기도 평택시 서탄면 수월암길 95 | 17704 | 14948.12 | <NA>
8 | 롯데제과㈜ 평택공장 | 경기도 평택시 진위면 경기대로 1952 | 17713 | 12629.87 | <NA>
9 | 한국서부발전㈜ 평택발전본부 | 경기도 평택시 포승읍 남양만로 227 | 17949 | 11159.46 | <NA>

Index | 건물명 | 도로명주소 | 우편번호 | 연면적 | 세대수
484 | 동우화인켐㈜ 평택공장 | 경기도 평택시 포승읍 포승공단로117번길 35 | 17956 | 52497.12 | <NA>
485 | AKS&D㈜ AK평택점 | 경기도 평택시 평택로 51(평택동) | 17917 | 81264.73 | <NA>
486 | 삼성전자㈜ 평택캠퍼스 | 경기도 평택시 삼성로 114 (고덕동) | 17786 | <NA> | <NA>
487 | 마제스트타워 | 경기도 평택시 산단로16번길 12 (모곡동) | 17746 | 55555.76 | <NA>
488 | (주)경동나비엔(㈜경동나비엔 서탄공장) | 경기도 평택시 서탄면 수월암길 95 | 17704 | 40353.33 | <NA>
489 | ㈜코람코자산신탁(고덕지식공작소아이타워) | 경기도 평택시 고덕면 도시지원로 121 | 18021 | 30814.81 | <NA>
490 | 우리자산신탁㈜(G1지식산업센터) | 경기도 평택시 고덕면 도시지원1길 116 | 18021 | 54458.74 | <NA>
491 | ㈜아이엠코퍼레이션(고덕헤리움시그니어) | 경기도 평택시 고덕여염9길 51 (고덕동) | 18014 | 84669.36 | <NA>
492 | LG전자㈜ 평택칠러공장 | 경기도 평택시 진위면 동부대로 120 | 17712 | 39432.47 | <NA>
493 | 삼아알미늄㈜ | 경기도 평택시 포승읍 평택항로 92 | 17960 | 38347.18 | <NA>

Duplicate rows

Most frequently occurring

Index | 건물명 | 도로명주소 | 우편번호 | 연면적 | 세대수 | # duplicates
2 | 한국서부발전㈜ 평택발전본부 | 경기도 평택시 포승읍 남양만로 227 | 17949 | 17483.16 | <NA> | 4
0 | ㈜베스트원(평택고덕아이파크) | 경기도 평택시 경기대로 945(장당동) | 17787 | 38128.8 | <NA> | 2
1 | 고덕국제신도시금호어울림 | 경기도 평택시 고덕로 191 (고덕동) | 18019 | <NA> | 1582 | 2
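A table like the one above can be produced by counting how often each complete record occurs and keeping the records seen more than once. The following is a minimal sketch using `collections.Counter`. The helper name `duplicate_groups` and the miniature table are hypothetical, and the report's own duplicate detection may treat missing values differently.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return {row: occurrences} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical miniature table; each tuple stands in for a full record
# (name placeholder, postal code, gross floor area).
rows = [
    ("A", 17949, 17483.16),
    ("A", 17949, 17483.16),
    ("B", 17787, 38128.8),
    ("C", 18019, None),
    ("B", 17787, 38128.8),
    ("A", 17949, 17483.16),
]
groups = duplicate_groups(rows)
for row, n in sorted(groups.items(), key=lambda item: -item[1]):
    print(row, n)
```

Sorting by the occurrence count in descending order gives the "most frequently occurring" ordering used in the table.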