Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 30
Missing cells (%): 0.1%
Duplicate rows: 734
Duplicate rows (%): 7.3%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
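
The percentage and per-record figures in these statistics follow from the raw counts. A minimal sketch of that arithmetic, assuming the missing-cell share is taken over all cells (variables × observations), the duplicate share over all observations, and 1 KiB = 1024 bytes. The variable names are illustrative and not part of the report:

```python
# Raw counts copied from the dataset statistics.
n_variables = 5
n_observations = 10_000
missing_cells = 30
duplicate_rows = 734
total_size_kib = 488.3

# Assumption: missing cells are measured against every cell in the table.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Assumption: duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.2f}")
print(f"Duplicate rows (%): {duplicate_pct:.2f}")
print(f"Average record size (B): {avg_record_bytes:.1f}")
```

Under these assumptions the unrounded missing-cell share is 0.06%, which appears as 0.1% at one decimal place, and the duplicate share is 7.34%.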

Variable types

Categorical: 1
Numeric: 2
Text: 2

Dataset

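Description: Per-floor overview data from the building register in the Urban Planning Information System of Jinan-gun, Jeonbuk Special Self-Governing Province (전북특별자치도 진안군). It provides the floor name (층명), structure (구조), structure name (구조명), use (용도), and area (면적).
Author: Jinan-gun, Jeonbuk Special Self-Governing Province (전북특별자치도 진안군)
URL: https://www.data.go.kr/data/15119152/fileData.do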

Alerts

Dataset has 734 (7.3%) duplicate rows [Duplicates]
층명 is highly imbalanced (90.1%) [Imbalance]
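
The imbalance figure can be read as an entropy-based score. The sketch below assumes the score is one minus the Shannon entropy of the class counts, normalized by log2 of the number of classes. That is how this alert appears to be derived, but treat it as an assumption rather than a statement of the library's implementation. The function name and the example counts are hypothetical:

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, close to 1 when one class dominates."""
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    # Shannon entropy in bits; classes with zero count contribute nothing.
    entropy_bits = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy_bits / math.log2(k)


# Hypothetical class counts, not taken from the report.
balanced = [25, 25, 25, 25]
skewed = [97, 1, 1, 1]

print(imbalance_score(balanced))
print(imbalance_score(skewed))
```

Applying the same formula to 층명 would need the individual counts of all 23 classes, which the report only partly lists.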

Reproduction

Analysis started: 2024-03-14 08:54:44.829867
Analysis finished: 2024-03-14 08:54:47.258614
Duration: 2.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

층명
Categorical

IMBALANCE 

Distinct: 23
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
1층  9401
2층  364
지상1층  113
3층  50
4층  12
Other values (18)  60

Length

Max length: 5
Median length: 2
Mean length: 2.0305
Min length: 2

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 1층
2nd row: 1층
3rd row: 1층
4th row: 1층
5th row: 1층

Common Values

Value  Count  Frequency (%)
1층  9401  94.0%
2층  364  3.6%
지상1층  113  1.1%
3층  50  0.5%
4층  12  0.1%
옥탑1층  10  0.1%
지하1층  9  0.1%
옥탑층  7  0.1%
5층  5  0.1%
옥탑  4  < 0.1%
Other values (13)  25  0.2%

Length

Histogram of lengths of the category
Value  Count  Frequency (%)
1층  9403  94.0%
2층  365  3.6%
지상1층  113  1.1%
3층  50  0.5%
4층  12  0.1%
옥탑1층  10  0.1%
지하1층  9  0.1%
옥탑층  7  0.1%
옥탑  7  0.1%
5층  5  < 0.1%
Other values (11)  22  0.2%

구조
Real number (ℝ)

Distinct: 15
Distinct (%): 0.2%
Missing: 23
Missing (%): 0.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.88684
Minimum: 11
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 11
5-th percentile: 11
Q1: 21
Median: 51
Q3: 51
95-th percentile: 51
Maximum: 99
Range: 88
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 16.614675
Coefficient of variation (CV): 0.45042285
Kurtosis: -1.2660234
Mean: 36.88684
Median Absolute Deviation (MAD): 0
Skewness: -0.48690296
Sum: 368020
Variance: 276.04744
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Value  Count  Frequency (%)
51  5339  53.4%
11  1412  14.1%
32  1034  10.3%
12  723  7.2%
21  705  7.0%
31  303  3.0%
33  286  2.9%
19  121  1.2%
39  22  0.2%
99  9  0.1%
Other values (5)  23  0.2%
(Missing)  23  0.2%
Minimum 10 values
Value  Count  Frequency (%)
11  1412  14.1%
12  723  7.2%
13  7  0.1%
19  121  1.2%
21  705  7.0%
29  4  < 0.1%
31  303  3.0%
32  1034  10.3%
33  286  2.9%
39  22  0.2%
Maximum 10 values
Value  Count  Frequency (%)
99  9  0.1%
52  4  < 0.1%
51  5339  53.4%
49  1  < 0.1%
41  7  0.1%
39  22  0.2%
33  286  2.9%
32  1034  10.3%
31  303  3.0%
29  4  < 0.1%
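
The sum, mean and median reported for 구조 can be rebuilt from its frequency tables, because the common-value and extreme-value listings together cover all 15 distinct values. A minimal sketch, assuming those listings are complete (their counts add up to the 9977 non-missing observations); the variable names are illustrative:

```python
import statistics

# (value: count) pairs for 구조, copied from the frequency tables.
freq = {
    11: 1412, 12: 723, 13: 7, 19: 121, 21: 705,
    29: 4, 31: 303, 32: 1034, 33: 286, 39: 22,
    41: 7, 49: 1, 51: 5339, 52: 4, 99: 9,
}

n = sum(freq.values())                        # non-missing observations
total = sum(v * c for v, c in freq.items())   # weighted sum of values
mean = total / n

# Expand the table back into individual observations to take the median.
values = [v for v, c in sorted(freq.items()) for _ in range(c)]
median = statistics.median(values)

print(n, total, round(mean, 5), median)
```

Because more than half of the observations share the value 51, the median and the third quartile both fall on it.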

구조명
Text

Distinct: 765
Distinct (%): 7.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 6.5202
Min length: 2

Characters and Unicode

Total characters: 65202
Distinct characters: 159
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 408
Unique (%): 4.1%

Sample

1st row: 조적조(적벽돌)
2nd row: 목조+스레트
3rd row: 목조+스레트
4th row: 목조+스레트
5th row: 적벽돌
Value  Count  Frequency (%)
목조+스레트  3286  32.2%
목조+함석  427  4.2%
경량철골구조  391  3.8%
목조  382  3.7%
목조+스레이트  317  3.1%
목조+세멘기와  275  2.7%
철근콘크리트구조  225  2.2%
경량철골조  162  1.6%
일반철골구조  156  1.5%
강파이프구조  130  1.3%
Other values (738)  4442  43.6%

Most occurring characters

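Value  Count  Frequency (%)
(glyph lost)  9060  13.9%
+  7232  11.1%
(glyph lost)  5566  8.5%
(glyph lost)  5365  8.2%
(glyph lost)  4820  7.4%
(glyph lost)  4336  6.7%
(glyph lost)  2079  3.2%
(glyph lost)  1372  2.1%
(glyph lost)  1370  2.1%
(glyph lost)  1340  2.1%
Other values (149)  22662  34.8%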

Most occurring categories

Value  Count  Frequency (%)
Other Letter  56868  87.2%
Math Symbol  7232  11.1%
Open Punctuation  440  0.7%
Close Punctuation  440  0.7%
Space Separator  194  0.3%
Other Punctuation  21  < 0.1%
Uppercase Letter  7  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(glyph lost)  9060  15.9%
(glyph lost)  5566  9.8%
(glyph lost)  5365  9.4%
(glyph lost)  4820  8.5%
(glyph lost)  4336  7.6%
(glyph lost)  2079  3.7%
(glyph lost)  1372  2.4%
(glyph lost)  1370  2.4%
(glyph lost)  1340  2.4%
(glyph lost)  1326  2.3%
Other values (138)  20234  35.6%

Uppercase Letter
Value  Count  Frequency (%)
H  3  42.9%
C  2  28.6%
O  1  14.3%
N  1  14.3%

Other Punctuation
Value  Count  Frequency (%)
,  19  90.5%
:  1  4.8%
.  1  4.8%

Math Symbol
Value  Count  Frequency (%)
+  7232  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  440  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  440  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  194  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  56868  87.2%
Common  8327  12.8%
Latin  7  < 0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(glyph lost)  9060  15.9%
(glyph lost)  5566  9.8%
(glyph lost)  5365  9.4%
(glyph lost)  4820  8.5%
(glyph lost)  4336  7.6%
(glyph lost)  2079  3.7%
(glyph lost)  1372  2.4%
(glyph lost)  1370  2.4%
(glyph lost)  1340  2.4%
(glyph lost)  1326  2.3%
Other values (138)  20234  35.6%

Common
Value  Count  Frequency (%)
+  7232  86.9%
(  440  5.3%
)  440  5.3%
(space)  194  2.3%
,  19  0.2%
:  1  < 0.1%
.  1  < 0.1%

Latin
Value  Count  Frequency (%)
H  3  42.9%
C  2  28.6%
O  1  14.3%
N  1  14.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  56868  87.2%
ASCII  8334  12.8%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(glyph lost)  9060  15.9%
(glyph lost)  5566  9.8%
(glyph lost)  5365  9.4%
(glyph lost)  4820  8.5%
(glyph lost)  4336  7.6%
(glyph lost)  2079  3.7%
(glyph lost)  1372  2.4%
(glyph lost)  1370  2.4%
(glyph lost)  1340  2.4%
(glyph lost)  1326  2.3%
Other values (138)  20234  35.6%

ASCII
Value  Count  Frequency (%)
+  7232  86.8%
(  440  5.3%
)  440  5.3%
(space)  194  2.3%
,  19  0.2%
H  3  < 0.1%
C  2  < 0.1%
:  1  < 0.1%
O  1  < 0.1%
N  1  < 0.1%

용도
Text

Distinct: 667
Distinct (%): 6.7%
Missing: 6
Missing (%): 0.1%
Memory size: 156.2 KiB

Length

Max length: 32
Median length: 2
Mean length: 2.8412047
Min length: 1

Characters and Unicode

Total characters: 28395
Distinct characters: 294
Distinct categories: 7
Distinct scripts: 4
Distinct blocks: 4
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 430
Unique (%): 4.3%

Sample

1st row: 주택
2nd row: 주택
3rd row: 주택
4th row: 부속
5th row: 단독주택
Value  Count  Frequency (%)
주택  3280  32.6%
부속  2201  21.9%
창고  822  8.2%
단독주택  750  7.5%
축사  418  4.2%
퇴비사  138  1.4%
부속(창고  137  1.4%
저온창고  118  1.2%
화장실  118  1.2%
근린생활시설  106  1.1%
Other values (644)  1977  19.6%

Most occurring characters

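Value  Count  Frequency (%)
(glyph lost)  4227  14.9%
(glyph lost)  4147  14.6%
(glyph lost)  2466  8.7%
(glyph lost)  2462  8.7%
(glyph lost)  1245  4.4%
(glyph lost)  1213  4.3%
(glyph lost)  1023  3.6%
(glyph lost)  783  2.8%
(glyph lost)  765  2.7%
)  505  1.8%
Other values (284)  9559  33.7%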

Most occurring categories

Value  Count  Frequency (%)
Other Letter  27117  95.5%
Close Punctuation  505  1.8%
Open Punctuation  505  1.8%
Other Punctuation  117  0.4%
Space Separator  73  0.3%
Decimal Number  67  0.2%
Uppercase Letter  11  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(glyph lost)  4227  15.6%
(glyph lost)  4147  15.3%
(glyph lost)  2466  9.1%
(glyph lost)  2462  9.1%
(glyph lost)  1245  4.6%
(glyph lost)  1213  4.5%
(glyph lost)  1023  3.8%
(glyph lost)  783  2.9%
(glyph lost)  765  2.8%
(glyph lost)  489  1.8%
Other values (261)  8297  30.6%

Decimal Number
Value  Count  Frequency (%)
2  20  29.9%
1  15  22.4%
7  10  14.9%
6  7  10.4%
9  5  7.5%
3  3  4.5%
5  3  4.5%
4  3  4.5%
8  1  1.5%

Uppercase Letter
Value  Count  Frequency (%)
E  4  36.4%
V  2  18.2%
M  1  9.1%
D  1  9.1%
F  1  9.1%
A  1  9.1%
L  1  9.1%

Other Punctuation
Value  Count  Frequency (%)
,  85  72.6%
/  17  14.5%
.  13  11.1%
:  2  1.7%

Close Punctuation
Value  Count  Frequency (%)
)  505  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  505  100.0%

Space Separator
Value  Count  Frequency (%)
(space)  73  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  27115  95.5%
Common  1267  4.5%
Latin  11  < 0.1%
Han  2  < 0.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(glyph lost)  4227  15.6%
(glyph lost)  4147  15.3%
(glyph lost)  2466  9.1%
(glyph lost)  2462  9.1%
(glyph lost)  1245  4.6%
(glyph lost)  1213  4.5%
(glyph lost)  1023  3.8%
(glyph lost)  783  2.9%
(glyph lost)  765  2.8%
(glyph lost)  489  1.8%
Other values (259)  8295  30.6%

Common
Value  Count  Frequency (%)
)  505  39.9%
(  505  39.9%
,  85  6.7%
(space)  73  5.8%
2  20  1.6%
/  17  1.3%
1  15  1.2%
.  13  1.0%
7  10  0.8%
6  7  0.6%
Other values (6)  17  1.3%

Latin
Value  Count  Frequency (%)
E  4  36.4%
V  2  18.2%
M  1  9.1%
D  1  9.1%
F  1  9.1%
A  1  9.1%
L  1  9.1%

Han
Value  Count  Frequency (%)
(glyph lost)  1  50.0%
(glyph lost)  1  50.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  27115  95.5%
ASCII  1278  4.5%
CJK  1  < 0.1%
CJK Compat Ideographs  1  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(glyph lost)  4227  15.6%
(glyph lost)  4147  15.3%
(glyph lost)  2466  9.1%
(glyph lost)  2462  9.1%
(glyph lost)  1245  4.6%
(glyph lost)  1213  4.5%
(glyph lost)  1023  3.8%
(glyph lost)  783  2.9%
(glyph lost)  765  2.8%
(glyph lost)  489  1.8%
Other values (259)  8295  30.6%

ASCII
Value  Count  Frequency (%)
)  505  39.5%
(  505  39.5%
,  85  6.7%
(space)  73  5.7%
2  20  1.6%
/  17  1.3%
1  15  1.2%
.  13  1.0%
7  10  0.8%
6  7  0.5%
Other values (13)  28  2.2%

CJK
Value  Count  Frequency (%)
(glyph lost)  1  100.0%

CJK Compat Ideographs
Value  Count  Frequency (%)
(glyph lost)  1  100.0%

면적
Real number (ℝ)

Distinct: 4860
Distinct (%): 48.6%
Missing: 1
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.023692
Minimum: 0.88
Maximum: 3174.37
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0.88
5-th percentile: 11.157
Q1: 23.235
Median: 36.18
Q3: 78.94
95-th percentile: 273.104
Maximum: 3174.37
Range: 3173.49
Interquartile range (IQR): 55.705

Descriptive statistics

Standard deviation: 162.46269
Coefficient of variation (CV): 2.0558731
Kurtosis: 68.383963
Mean: 79.023692
Median Absolute Deviation (MAD): 17.7
Skewness: 7.1776871
Sum: 790157.89
Variance: 26394.124
Monotonicity: Not monotonic
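
Several of the 면적 statistics are derived from others in the same listing: the range from the extremes, the IQR from the quartiles, the CV as standard deviation over mean, and the variance as the squared standard deviation. A short sketch of those relationships using the reported values; results agree with the report up to rounding of the printed inputs, and the variable names are illustrative:

```python
# Values copied from the 면적 statistics.
minimum, maximum = 0.88, 3174.37
q1, q3 = 23.235, 78.94
mean, std = 79.023692, 162.46269

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance from the rounded standard deviation

print(f"Range: {value_range:.2f}")
print(f"IQR: {iqr:.3f}")
print(f"CV: {cv:.4f}")
print(f"Variance: {variance:.2f}")
```

A CV above 2 together with a mean (79.02) more than twice the median (36.18) is consistent with the strong right skew reported for this variable.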
Histogram with fixed size bins (bins=50)
Value  Count  Frequency (%)
18.0  65  0.7%
32.0  41  0.4%
21.0  33  0.3%
27.0  32  0.3%
36.0  29  0.3%
32.4  27  0.3%
28.0  27  0.3%
24.0  25  0.2%
16.5  25  0.2%
33.6  23  0.2%
Other values (4850)  9672  96.7%
Minimum 10 values
Value  Count  Frequency (%)
0.88  1  < 0.1%
1.0  1  < 0.1%
1.21  1  < 0.1%
1.32  1  < 0.1%
1.43  1  < 0.1%
1.44  4  < 0.1%
1.5  1  < 0.1%
1.65  1  < 0.1%
1.8  2  < 0.1%
1.88  1  < 0.1%
Maximum 10 values
Value  Count  Frequency (%)
3174.37  1  < 0.1%
2442.0  1  < 0.1%
2129.3  1  < 0.1%
2122.33  1  < 0.1%
1974.94  1  < 0.1%
1922.94  1  < 0.1%
1909.05  1  < 0.1%
1900.8  1  < 0.1%
1893.0  1  < 0.1%
1892.88  1  < 0.1%

Interactions

[Figure: interaction plots, 4 panels]

Correlations

      층명   구조   면적
층명  1.000  0.519  0.282
구조  0.519  1.000  0.310
면적  0.282  0.310  1.000

      구조    면적    층명
구조  1.000   -0.430  0.264
면적  -0.430  1.000   0.112
층명  0.264   0.112   1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

(index)  층명  구조  구조명  용도  면적
11638  1층  11  조적조(적벽돌)  주택  98.07
11851  1층  51  목조+스레트  주택  37.15
2928  1층  51  목조+스레트  주택  36.98
3181  1층  51  목조+스레트  부속  24.64
12358  1층  11  적벽돌  단독주택  97.2
6310  1층  51  목조+스레트  주택  28.46
13812  1층  12  시멘트블록+스라브  화장실  2.0
6091  1층  32  경량철골구조  제조업소  60.0
23820  1층  11  조적조(적벽돌)+스라브  주택  86.98
1042  1층  32  경량파이프+갈바륨  퇴비사  67.2

(index)  층명  구조  구조명  용도  면적
19167  1층  21  철근콘크리트구조  단독주택  83.39
4834  1층  51  목조+스레트  주택  21.19
503  1층  51  목조+스레이트  부속  26.5
20693  1층  32  경량철골구조+판넬  총인처리실  104.16
17443  1층  33  강파이프조+스레이트  축사  535.5
2392  1층  51  목조+스레트  주택  30.77
11971  1층  19  조적조+슬라브  단독주택  76.95
30140  1층  31  철골조+스레트  축사  184.0
20137  1층  51  목조  주택  39.14
18533  1층  11  시멘트벽돌조+칼라강판  저온창고  16.94

Duplicate rows

Most frequently occurring

(index)  층명  구조  구조명  용도  면적  # duplicates
586  1층  51  목조+스레트  주택  34.44  12
580  1층  51  목조+스레트  주택  34.02  11
261  1층  51  목조+스레트  부속  18.0  10
566  1층  51  목조+스레트  주택  32.4  10
572  1층  51  목조+스레트  주택  33.21  10
296  1층  51  목조+스레트  부속  21.0  9
374  1층  51  목조+스레트  부속  27.0  9
552  1층  51  목조+스레트  주택  31.2  9
561  1층  51  목조+스레트  주택  32.0  9
84  1층  33  강파이프구조  축사  336.0  8
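
A listing like "Most frequently occurring" can be produced by counting identical rows and keeping those that appear more than once. The sketch below uses only the standard library on a handful of hypothetical rows in the report's column order (층명, 구조, 구조명, 용도, 면적). The rows and their multiplicities are made up for illustration, and the sketch does not claim to match how the report itself counts its 734 duplicate rows:

```python
from collections import Counter

# Hypothetical rows in the report's column order; not the actual dataset.
rows = [
    ("1층", 51, "목조+스레트", "주택", 34.44),
    ("1층", 51, "목조+스레트", "주택", 34.44),
    ("1층", 51, "목조+스레트", "주택", 34.44),
    ("1층", 51, "목조+스레트", "부속", 18.0),
    ("1층", 51, "목조+스레트", "부속", 18.0),
    ("1층", 33, "강파이프구조", "축사", 336.0),
]

counts = Counter(rows)

# Keep only row patterns that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in counts.most_common() if n > 1]

# Two ways of summarizing duplication; which one a given tool reports varies.
repeated_patterns = len(duplicates)                # distinct rows that repeat
surplus_rows = sum(n - 1 for _, n in duplicates)   # copies beyond the first

for row, n in duplicates:
    print(row, n)
print(repeated_patterns, surplus_rows)
```

Exact matching on a floating-point column such as 면적 is only meaningful when values are stored with consistent precision.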