Overview

Dataset statistics

Number of variables: 6
Number of observations: 54
Missing cells: 54
Missing cells (%): 16.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.8 KiB
Average record size in memory: 53.4 B
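The missing-cell percentage follows from the counts above: 54 empty cells out of 6 × 54 cells. A minimal sketch of that arithmetic, using only figures quoted in this report (the per-variable counts come from the Alerts section); the variable names are illustrative:

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 6
n_observations = 54
missing_cells = 54

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# Per-variable missing counts as listed in the Alerts section.
missing_by_variable = {"연면적(제곱미터)": 12, "세대수": 42}
missing_pct_by_variable = {
    name: round(100 * count / n_observations, 1)
    for name, count in missing_by_variable.items()
}

print(f"{missing_pct:.1f}%")
print(missing_pct_by_variable)
```

The two per-variable counts add up to the 54 missing cells reported for the whole dataset.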

Variable types

Numeric: 3
Text: 2
Categorical: 1

Dataset

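Description: 대구광역시 서구_기계설비유지관리자 선임 대상 건축물 현황_20231228 (Daegu Metropolitan City Seo-gu: status of buildings subject to appointment of a mechanical equipment maintenance manager, 2023-12-28)
Author: 대구광역시 서구 (Seo-gu, Daegu Metropolitan City)
URL: http://data.daegu.go.kr/open/data/dataView.do?dataSetId=15107600&dataSetDetailId=15107600198c8bec40466&provdMethod=FILE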

Alerts

연번 is highly overall correlated with 연면적(제곱미터) and 1 other field (High correlation)
연면적(제곱미터) is highly overall correlated with 연번 (High correlation)
세대수 is highly overall correlated with 연번 and 1 other field (High correlation)
용도 is highly overall correlated with 세대수 (High correlation)
연면적(제곱미터) has 12 (22.2%) missing values (Missing)
세대수 has 42 (77.8%) missing values (Missing)
연번 has unique values (Unique)
건물명 has unique values (Unique)
주소 has unique values (Unique)

Reproduction

Analysis started: 2023-12-29 21:34:02.816350
Analysis finished: 2023-12-29 21:34:11.668508
Duration: 8.85 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
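A report of this kind can be regenerated with ydata-profiling from the source file. The sketch below is an assumption-laden outline, not the exact command used for this run: the CSV path, the output path, and the `cp949` encoding are hypothetical (the report does not record them), and the default settings stand in for the unknown contents of config.json.

```python
from pathlib import Path


def build_report(csv_path: str, out_path: str = "report.html",
                 encoding: str = "cp949") -> Path:
    """Profile a CSV file and write the HTML report to out_path.

    csv_path, out_path and encoding are hypothetical placeholders; the
    original run's input file and settings are not stated in the report.
    """
    # Third-party imports are kept local so this module loads without them.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Building status profile")
    report.to_file(out_path)
    return Path(out_path)
```

Calling `build_report("buildings.csv")` with a local copy of the dataset would write `report.html`; the file name here is illustrative only.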

Variables

연번
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 54
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.5
Minimum: 1
Maximum: 54
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 618.0 B

Quantile statistics

Minimum: 1
5-th percentile: 3.65
Q1: 14.25
Median: 27.5
Q3: 40.75
95-th percentile: 51.35
Maximum: 54
Range: 53
Interquartile range (IQR): 26.5
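The fractional percentiles above are what linear interpolation between neighbouring ranks produces. A small sketch, assuming 연번 takes every integer from 1 to 54 (consistent with the reported minimum, maximum, distinct count, and sum, but not stated outright); the `quantile` helper is written for this illustration and is not the profiler's own code:

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Assumption: the serial numbers are exactly 1, 2, ..., 54.
serial_numbers = list(range(1, 55))

q05, q1, median, q3, q95 = (
    quantile(serial_numbers, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)
)
iqr = q3 - q1
print(round(q05, 2), q1, median, q3, round(q95, 2), iqr)
```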

Descriptive statistics

Standard deviation: 15.732133
Coefficient of variation (CV): 0.57207755
Kurtosis: -1.2
Mean: 27.5
Median Absolute Deviation (MAD): 13.5
Skewness: 0
Sum: 1485
Variance: 247.5
Monotonicity: Strictly increasing
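These figures can be cross-checked with Python's standard library. The sketch again assumes the column holds the integers 1 to 54; it uses the sample (n − 1) variance, which is the convention that reproduces the reported 247.5:

```python
import statistics

# Assumption: 연번 is the sequence 1.0, 2.0, ..., 54.0.
serial_numbers = [float(i) for i in range(1, 55)]

total = sum(serial_numbers)
mean = statistics.mean(serial_numbers)
variance = statistics.variance(serial_numbers)   # sample variance (n - 1)
std = statistics.stdev(serial_numbers)
cv = std / mean                                  # coefficient of variation
center = statistics.median(serial_numbers)
mad = statistics.median(abs(x - center) for x in serial_numbers)

print(total, mean, variance, round(std, 6), round(cv, 8), mad)
```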
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 1
 
1.9%
42 1
 
1.9%
31 1
 
1.9%
32 1
 
1.9%
33 1
 
1.9%
34 1
 
1.9%
35 1
 
1.9%
36 1
 
1.9%
37 1
 
1.9%
38 1
 
1.9%
Other values (44) 44
81.5%
ValueCountFrequency (%)
1 1
1.9%
2 1
1.9%
3 1
1.9%
4 1
1.9%
5 1
1.9%
6 1
1.9%
7 1
1.9%
8 1
1.9%
9 1
1.9%
10 1
1.9%
ValueCountFrequency (%)
54 1
1.9%
53 1
1.9%
52 1
1.9%
51 1
1.9%
50 1
1.9%
49 1
1.9%
48 1
1.9%
47 1
1.9%
46 1
1.9%
45 1
1.9%

건물명
Text

UNIQUE 

Distinct: 54
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Length

Max length: 19
Median length: 14
Mean length: 8.462963
Min length: 3

Characters and Unicode

Total characters: 457
Distinct characters: 167
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
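The category names used in this section ("Other Letter", "Uppercase Letter", "Decimal Number", and so on) correspond to Unicode general categories such as `Lo`, `Lu`, and `Nd`. As a rough illustration, the sketch below counts those categories for the five 건물명 sample rows quoted in this section, using Python's `unicodedata` module; it covers only those five names, not the full column, and is not the profiler's own implementation:

```python
import unicodedata
from collections import Counter

# The five sample values of 건물명 quoted in this report.
sample_names = [
    "M-월드",
    "디센터 1976 지식산업센터",
    "대구의료원",
    "한국폴리텍대학 대구캠퍼스",
    "대한방직",
]

# Normalize to NFC so each Hangul syllable is a single code point,
# then look up the general category of every character.
category_counts = Counter(
    unicodedata.category(ch)
    for name in sample_names
    for ch in unicodedata.normalize("NFC", name)
)

print(category_counts.most_common())
```

Hangul syllables fall under `Lo` (Other Letter), which is why that category dominates the tables in this section.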

Unique

Unique: 54
Unique (%): 100.0%

Sample

1st row: M-월드
2nd row: 디센터 1976 지식산업센터
3rd row: 대구의료원
4th row: 한국폴리텍대학 대구캠퍼스
5th row: 대한방직
ValueCountFrequency (%)
m-월드 1
 
1.3%
sk텔레콤 1
 
1.3%
광장코아 1
 
1.3%
중리중학교 1
 
1.3%
평리중학교 1
 
1.3%
경운중학교 1
 
1.3%
대구서부경찰서 1
 
1.3%
제일고등학교 1
 
1.3%
대구지사 1
 
1.3%
달성초등학교 1
 
1.3%
Other values (66) 66
86.8%

Most occurring characters

ValueCountFrequency (%)
22
 
4.8%
22
 
4.8%
20
 
4.4%
15
 
3.3%
14
 
3.1%
13
 
2.8%
12
 
2.6%
8
 
1.8%
8
 
1.8%
7
 
1.5%
Other values (157) 316
69.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 411
89.9%
Space Separator 22
 
4.8%
Uppercase Letter 12
 
2.6%
Decimal Number 5
 
1.1%
Open Punctuation 2
 
0.4%
Close Punctuation 2
 
0.4%
Lowercase Letter 1
 
0.2%
Other Symbol 1
 
0.2%
Dash Punctuation 1
 
0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
22
 
5.4%
20
 
4.9%
15
 
3.6%
14
 
3.4%
13
 
3.2%
12
 
2.9%
8
 
1.9%
8
 
1.9%
7
 
1.7%
7
 
1.7%
Other values (140) 285
69.3%
Uppercase Letter
ValueCountFrequency (%)
K 3
25.0%
S 2
16.7%
T 2
16.7%
M 2
16.7%
X 1
 
8.3%
B 1
 
8.3%
D 1
 
8.3%
Decimal Number
ValueCountFrequency (%)
1 2
40.0%
6 1
20.0%
9 1
20.0%
7 1
20.0%
Space Separator
ValueCountFrequency (%)
22
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Lowercase Letter
ValueCountFrequency (%)
e 1
100.0%
Other Symbol
ValueCountFrequency (%)
1
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 412
90.2%
Common 32
 
7.0%
Latin 13
 
2.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
22
 
5.3%
20
 
4.9%
15
 
3.6%
14
 
3.4%
13
 
3.2%
12
 
2.9%
8
 
1.9%
8
 
1.9%
7
 
1.7%
7
 
1.7%
Other values (141) 286
69.4%
Common
ValueCountFrequency (%)
22
68.8%
( 2
 
6.2%
) 2
 
6.2%
1 2
 
6.2%
6 1
 
3.1%
9 1
 
3.1%
7 1
 
3.1%
- 1
 
3.1%
Latin
ValueCountFrequency (%)
K 3
23.1%
S 2
15.4%
T 2
15.4%
M 2
15.4%
e 1
 
7.7%
X 1
 
7.7%
B 1
 
7.7%
D 1
 
7.7%

Most occurring blocks

ValueCountFrequency (%)
Hangul 411
89.9%
ASCII 45
 
9.8%
None 1
 
0.2%

Most frequent character per block

Hangul
ValueCountFrequency (%)
22
 
5.4%
20
 
4.9%
15
 
3.6%
14
 
3.4%
13
 
3.2%
12
 
2.9%
8
 
1.9%
8
 
1.9%
7
 
1.7%
7
 
1.7%
Other values (140) 285
69.3%
ASCII
ValueCountFrequency (%)
22
48.9%
K 3
 
6.7%
S 2
 
4.4%
( 2
 
4.4%
T 2
 
4.4%
) 2
 
4.4%
1 2
 
4.4%
M 2
 
4.4%
6 1
 
2.2%
9 1
 
2.2%
Other values (6) 6
 
13.3%
None
ValueCountFrequency (%)
1
100.0%

용도
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 25.9%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
교육연구시설: 13
공동주택: 12
공장: 7
판매시설: 4
교육연구및복지시설: 3
Other values (9): 15

Length

Max length: 10
Median length: 9
Mean length: 5.1851852
Min length: 2

Unique

Unique: 5
Unique (%): 9.3%

Sample

1st row: 자동차관련시설
2nd row: 공장
3rd row: 의료시설
4th row: 교육연구시설
5th row: 공장

Common Values

ValueCountFrequency (%)
교육연구시설 13
24.1%
공동주택 12
22.2%
공장 7
13.0%
판매시설 4
 
7.4%
교육연구및복지시설 3
 
5.6%
제1종근린생활시설 3
 
5.6%
업무시설 3
 
5.6%
자동차관련시설 2
 
3.7%
의료시설 2
 
3.7%
위험물저장및처리시설 1
 
1.9%
Other values (4) 4
 
7.4%

Length

Histogram of lengths of the category

연면적(제곱미터)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 42
Distinct (%): 100.0%
Missing: 12
Missing (%): 22.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 22462.185
Minimum: 10031.735
Maximum: 105101.65
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 618.0 B

Quantile statistics

Minimum: 10031.735
5-th percentile: 10743.623
Q1: 12222.903
Median: 15795.212
Q3: 25812.285
95-th percentile: 48712.784
Maximum: 105101.65
Range: 95069.911
Interquartile range (IQR): 13589.382

Descriptive statistics

Standard deviation: 18875.394
Coefficient of variation (CV): 0.84031871
Kurtosis: 10.14583
Mean: 22462.185
Median Absolute Deviation (MAD): 4293.2075
Skewness: 2.9890054
Sum: 943411.77
Variance: 3.5628051 × 10^8
Monotonicity: Strictly decreasing
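Several of the 연면적(제곱미터) figures are derived from one another, so they can be sanity-checked without the raw data. A minimal sketch using only numbers quoted in this report; because the inputs are rounded, the results agree with the reported values only up to rounding:

```python
import math

# Values quoted in this report for 연면적(제곱미터).
n_observations = 54
n_missing = 12
total = 943411.77
std = 18875.394
q1, q3 = 12222.903, 25812.285

count = n_observations - n_missing   # non-missing values
mean = total / count                 # mean = sum / count
cv = std / mean                      # coefficient of variation
iqr = q3 - q1                        # interquartile range
variance = std ** 2                  # about 3.56e8, i.e. 3.56 x 10^8

print(count, round(mean, 3), round(cv, 6), round(iqr, 3), f"{variance:.4e}")
```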
Histogram with fixed size bins (bins=42)
ValueCountFrequency (%)
11918.03 1
 
1.9%
13960.86 1
 
1.9%
13157.38 1
 
1.9%
13073.56 1
 
1.9%
12726.604 1
 
1.9%
12369.211 1
 
1.9%
12349.3 1
 
1.9%
12313.27 1
 
1.9%
12192.78 1
 
1.9%
11884.6 1
 
1.9%
Other values (32) 32
59.3%
(Missing) 12
 
22.2%
ValueCountFrequency (%)
10031.735 1
1.9%
10112.8 1
1.9%
10731.77 1
1.9%
10968.84 1
1.9%
11109.96 1
1.9%
11253.02 1
1.9%
11477.7 1
1.9%
11526.31 1
1.9%
11884.6 1
1.9%
11918.03 1
1.9%
ValueCountFrequency (%)
105101.646 1
1.9%
81858.15 1
1.9%
49020.37 1
1.9%
42868.66 1
1.9%
40101.01 1
1.9%
33661.73 1
1.9%
30697.24 1
1.9%
30580.63 1
1.9%
30049.83 1
1.9%
27838.68 1
1.9%

세대수
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 12
Distinct (%): 100.0%
Missing: 42
Missing (%): 77.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 1332.75
Minimum: 503
Maximum: 1968
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 618.0 B

Quantile statistics

Minimum: 503
5-th percentile: 595.95
Q1: 890.5
Median: 1472
Q3: 1702.5
95-th percentile: 1886.05
Maximum: 1968
Range: 1465
Interquartile range (IQR): 812

Descriptive statistics

Standard deviation: 487.02289
Coefficient of variation (CV): 0.36542704
Kurtosis: -1.1529735
Mean: 1332.75
Median Absolute Deviation (MAD): 325.5
Skewness: -0.49385259
Sum: 15993
Variance: 237191.3
Monotonicity: Strictly decreasing
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
1968 1
 
1.9%
1819 1
 
1.9%
1776 1
 
1.9%
1678 1
 
1.9%
1594 1
 
1.9%
1526 1
 
1.9%
1418 1
 
1.9%
1281 1
 
1.9%
902 1
 
1.9%
856 1
 
1.9%
Other values (2) 2
 
3.7%
(Missing) 42
77.8%
ValueCountFrequency (%)
503 1
1.9%
672 1
1.9%
856 1
1.9%
902 1
1.9%
1281 1
1.9%
1418 1
1.9%
1526 1
1.9%
1594 1
1.9%
1678 1
1.9%
1776 1
1.9%
ValueCountFrequency (%)
1968 1
1.9%
1819 1
1.9%
1776 1
1.9%
1678 1
1.9%
1594 1
1.9%
1526 1
1.9%
1418 1
1.9%
1281 1
1.9%
902 1
1.9%
856 1
1.9%
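Because 세대수 has only 12 non-missing values, all of them appear in the minimum and maximum tables above, so the summary statistics can be recomputed directly. A short sketch using the standard library; the standard deviation uses the sample (n − 1) convention:

```python
import statistics

# The 12 non-missing 세대수 values, read off the tables above.
households = [503, 672, 856, 902, 1281, 1418, 1526, 1594, 1678, 1776, 1819, 1968]

total = sum(households)
mean = statistics.mean(households)
median = statistics.median(households)
std = statistics.stdev(households)    # sample standard deviation (n - 1)
cv = std / mean
mad = statistics.median(abs(x - median) for x in households)

print(total, mean, median, round(std, 5), round(cv, 8), mad)
```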

주소
Text

UNIQUE 

Distinct: 54
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Length

Max length: 22
Median length: 20
Mean length: 17.888889
Min length: 13

Characters and Unicode

Total characters: 966
Distinct characters: 57
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 54
Unique (%): 100.0%

Sample

1st row: 대구광역시 서구 문화로 37
2nd row: 대구광역시 서구 와룡로 307
3rd row: 대구광역시 서구 평리로 157
4th row: 대구광역시 서구 국채보상로43길 15
5th row: 대구광역시 서구 염색공단로 26
ValueCountFrequency (%)
서구 54
25.0%
대구광역시 53
24.5%
국채보상로 7
 
3.2%
달구벌대로 6
 
2.8%
33 3
 
1.4%
국채보상로34길 3
 
1.4%
41 2
 
0.9%
국채보상로53길 2
 
0.9%
32 2
 
0.9%
와룡로 2
 
0.9%
Other values (73) 82
38.0%

Most occurring characters

ValueCountFrequency (%)
162
16.8%
119
12.3%
66
 
6.8%
59
 
6.1%
54
 
5.6%
53
 
5.5%
53
 
5.5%
53
 
5.5%
3 34
 
3.5%
1 30
 
3.1%
Other values (47) 283
29.3%

Most occurring categories

ValueCountFrequency (%)
Other Letter 630
65.2%
Decimal Number 173
 
17.9%
Space Separator 162
 
16.8%
Dash Punctuation 1
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
119
18.9%
66
10.5%
59
9.4%
54
8.6%
53
8.4%
53
8.4%
53
8.4%
17
 
2.7%
16
 
2.5%
16
 
2.5%
Other values (35) 124
19.7%
Decimal Number
ValueCountFrequency (%)
3 34
19.7%
1 30
17.3%
2 26
15.0%
5 17
9.8%
7 16
9.2%
4 14
8.1%
0 12
 
6.9%
6 10
 
5.8%
9 9
 
5.2%
8 5
 
2.9%
Space Separator
ValueCountFrequency (%)
162
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 630
65.2%
Common 336
34.8%

Most frequent character per script

Hangul
ValueCountFrequency (%)
119
18.9%
66
10.5%
59
9.4%
54
8.6%
53
8.4%
53
8.4%
53
8.4%
17
 
2.7%
16
 
2.5%
16
 
2.5%
Other values (35) 124
19.7%
Common
ValueCountFrequency (%)
162
48.2%
3 34
 
10.1%
1 30
 
8.9%
2 26
 
7.7%
5 17
 
5.1%
7 16
 
4.8%
4 14
 
4.2%
0 12
 
3.6%
6 10
 
3.0%
9 9
 
2.7%
Other values (2) 6
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
Hangul 630
65.2%
ASCII 336
34.8%

Most frequent character per block

ASCII
ValueCountFrequency (%)
162
48.2%
3 34
 
10.1%
1 30
 
8.9%
2 26
 
7.7%
5 17
 
5.1%
7 16
 
4.8%
4 14
 
4.2%
0 12
 
3.6%
6 10
 
3.0%
9 9
 
2.7%
Other values (2) 6
 
1.8%
Hangul
ValueCountFrequency (%)
119
18.9%
66
10.5%
59
9.4%
54
8.6%
53
8.4%
53
8.4%
53
8.4%
17
 
2.7%
16
 
2.5%
16
 
2.5%
Other values (35) 124
19.7%

Interactions

[Figures: 9 pairwise interaction plots for the numeric variables 연번, 연면적(제곱미터), and 세대수]

Correlations

 | 연번 | 건물명 | 용도 | 연면적(제곱미터) | 세대수 | 주소
연번 | 1.000 | 1.000 | 0.695 | 0.622 | 0.887 | 1.000
건물명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
용도 | 0.695 | 1.000 | 1.000 | 0.064 | NaN | 1.000
연면적(제곱미터) | 0.622 | 1.000 | 0.064 | 1.000 | NaN | 1.000
세대수 | 0.887 | 1.000 | NaN | NaN | 1.000 | 1.000
주소 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
 | 연번 | 연면적(제곱미터) | 세대수 | 용도
연번 | 1.000 | -1.000 | -1.000 | 0.344
연면적(제곱미터) | -1.000 | 1.000 | NaN | 0.023
세대수 | -1.000 | NaN | 1.000 | 1.000
용도 | 0.344 | 0.023 | 1.000 | 1.000
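The report does not say which coefficient each matrix uses, but a value of -1.000 between 연번 and 세대수 is what a rank (Spearman) correlation gives when one column strictly decreases as the other increases. A minimal sketch under that assumption: the 세대수 values are those listed in this report, the pairing with 연번 45 to 54 follows the sample rows, and the pairing of 1968 and 1819 with 연번 43 and 44 is inferred rather than shown. The `ranks` and `pearson` helpers are written for this illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Assumption: the 12 rows with a 세대수 value are 연번 43 through 54.
serial = list(range(43, 55))
households = [1968, 1819, 1776, 1678, 1594, 1526, 1418, 1281, 902, 856, 672, 503]

# Spearman's rho is the Pearson correlation of the ranks.
rho = pearson(ranks(serial), ranks(households))
print(round(rho, 3))
```

Since 연번 is just a row counter, a perfect rank correlation here reflects the sort order of the file rather than a relationship between buildings.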

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
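Nullity correlation can be illustrated with the two incomplete columns of this dataset. The sketch assumes, based on the sample rows and the missing counts (12 and 42 out of 54), that 연면적(제곱미터) is present exactly where 세대수 is missing; under that assumption the two presence indicators are perfectly anti-correlated. The `pearson` helper is written for this illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Assumption: rows 1-42 have a floor area and no household count,
# rows 43-54 have a household count and no floor area.
area_present = [1] * 42 + [0] * 12
households_present = [0] * 42 + [1] * 12

nullity_corr = pearson(area_present, households_present)
print(round(nullity_corr, 3))
```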

Sample

(index) | 연번 | 건물명 | 용도 | 연면적(제곱미터) | 세대수 | 주소
0 | 1 | M-월드 | 자동차관련시설 | 105101.646 | <NA> | 대구광역시 서구 문화로 37
1 | 2 | 디센터 1976 지식산업센터 | 공장 | 81858.15 | <NA> | 대구광역시 서구 와룡로 307
2 | 3 | 대구의료원 | 의료시설 | 49020.37 | <NA> | 대구광역시 서구 평리로 157
3 | 4 | 한국폴리텍대학 대구캠퍼스 | 교육연구시설 | 42868.66 | <NA> | 대구광역시 서구 국채보상로43길 15
4 | 5 | 대한방직 | 공장 | 40101.01 | <NA> | 대구광역시 서구 염색공단로 26
5 | 6 | 서대구 복합지식산업센터 | 공장 | 33661.73 | <NA> | 대구광역시 서구 와룡로90길 41
6 | 7 | 상리동음식물쓰레기처리장 | 위험물저장및처리시설 | 30697.24 | <NA> | 대구광역시 서구 가르뱅이로10길 31
7 | 8 | 이마트 트레이더스 홀세일클럽 비산점 | 판매시설 | 30580.63 | <NA> | 대구광역시 서구 팔달로 54
8 | 9 | 대구과학기술고등학교 | 교육연구및복지시설 | 30049.83 | <NA> | 대구광역시 서구 당산로 228
9 | 10 | 한국섬유개발연구원 | 교육연구시설 | 27838.68 | <NA> | 대구광역시 서구 국채보상로 136
(index) | 연번 | 건물명 | 용도 | 연면적(제곱미터) | 세대수 | 주소
44 | 45 | 삼익뉴타운 | 공동주택 | <NA> | 1776 | 대구광역시 서구 평리로 236
45 | 46 | 서대구역반도유보라센텀 | 공동주택 | <NA> | 1678 | 대구광역시 서구 문화로 230
46 | 47 | 서대구역화성파크드림 | 공동주택 | <NA> | 1594 | 대구광역시 서구 서대구로29길 30
47 | 48 | 서대구센트럴자이 | 공동주택 | <NA> | 1526 | 대구시 서구 고성로 33
48 | 49 | 서대구KTX영무예다음 | 공동주택 | <NA> | 1418 | 대구광역시 서구 당산로 446
49 | 50 | 평리롯데캐슬 | 공동주택 | <NA> | 1281 | 대구광역시 서구 국채보상로 316
50 | 51 | e편한세상 두류역 | 공동주택 | <NA> | 902 | 대구광역시 서구 달구벌대로361길 41
51 | 52 | 서대구역서한이다음 더 퍼스트 | 공동주택 | <NA> | 856 | 대구광역시 서구 국채보상로37길 38
52 | 53 | 내당광장1차아파트 | 공동주택 | <NA> | 672 | 대구광역시 서구 달구벌대로 1707
53 | 54 | 삼익맨션 | 공동주택 | <NA> | 503 | 대구광역시 서구 서대구로 25