Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 4088
Missing cells (%): 6.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 576.2 KiB
Average record size in memory: 59.0 B
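
The two derived figures above follow from the raw counts. A minimal check, assuming the missing-cell percentage is taken over every cell (variables × observations) and that 1 KiB is 1024 bytes:

```python
# Raw figures copied from the "Dataset statistics" block.
n_variables = 6
n_observations = 10_000
missing_cells = 4_088
total_size_kib = 576.2

# Assumption: the percentage is relative to all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Both values agree with the reported 6.8% and 59.0 B under these assumptions.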

Variable types

Text: 3
Categorical: 2
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15749/S/1/datasetView.do

Alerts

작업_일자" has constant value "20111227" [Constant]
층_구분_코드 is highly imbalanced (80.5%) [Imbalance]
동명칭 has 3940 (39.4%) missing values [Missing]
호_명 has 148 (1.5%) missing values [Missing]
관리_폐쇄말소대장_PK has unique values [Unique]
층_번호 has 4372 (43.7%) zeros [Zeros]
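
The 80.5% imbalance figure for 층_구분_코드 can be reproduced from the class counts reported later in this report (9698 and 302). The sketch below assumes the score is one minus the Shannon entropy normalised by its maximum; that assumption matches the reported value, but it is not taken from the report itself.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for a list of class counts."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0  # assumption: a single class is scored as not imbalanced
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(n_classes)

# Class counts of 층_구분_코드 (values 20 and 10).
score = imbalance_score([9698, 302])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 and a column dominated by one class approaches 1.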

Reproduction

Analysis started: 2024-05-11 05:37:01.089673
Analysis finished: 2024-05-11 05:37:02.771217
Duration: 1.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
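
A report of this kind can be regenerated with ydata-profiling. The sketch below is a hedged outline, not the configuration actually used: the file paths, the report title, and the `cp949` encoding in the example call are hypothetical, and only the dataset metadata is copied from this report.

```python
from typing import Optional

def build_report(csv_path: str, html_path: str, encoding: Optional[str] = None) -> None:
    """Profile a CSV file and write the result as an HTML report."""
    # Imported lazily so the function can be defined without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="Profiling report",  # hypothetical title
        dataset={
            "description": "파일 다운로드",
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15749/S/1/datasetView.do",
        },
    )
    report.to_file(html_path)

# Example call with hypothetical paths and encoding:
# build_report("data.csv", "report.html", encoding="cp949")
```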

Variables

관리_폐쇄말소대장_PK
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 15
Median length: 11
Mean length: 11.1824
Min length: 10

Characters and Unicode

Total characters: 111824
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
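
As an illustration of those character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample values listed for this variable. The standard library exposes the general category but not the script or block, so only the category breakdown is shown; the mapping from two-letter codes to long names covers just the categories that occur in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["11620-12907", "11470-13534", "11620-20385", "11500-23424", "11650-15591"]
print(category_counts(sample))
```

Each sample value contributes ten digits and one hyphen, which is why only the Decimal Number and Dash Punctuation categories appear for this variable.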

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11620-12907
2nd row: 11470-13534
3rd row: 11620-20385
4th row: 11500-23424
5th row: 11650-15591

Value                 Count  Frequency (%)
11620-12907               1  < 0.1%
11500-10544               1  < 0.1%
11680-14238               1  < 0.1%
11470-12803               1  < 0.1%
11530-6202                1  < 0.1%
11470-15001               1  < 0.1%
11650-11279               1  < 0.1%
11500-19733               1  < 0.1%
11620-19913               1  < 0.1%
11560-100009255           1  < 0.1%
Other values (9990)    9990  99.9%

Most occurring characters

Value   Count  Frequency (%)
1       31111  27.8%
0       18068  16.2%
5       10457  9.4%
-       10000  8.9%
6        7611  6.8%
4        7398  6.6%
2        7013  6.3%
9        5241  4.7%
7        5233  4.7%
3        4879  4.4%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     101824  91.1%
Dash Punctuation    10000  8.9%

Most frequent character per category

Decimal Number
Value   Count  Frequency (%)
1       31111  30.6%
0       18068  17.7%
5       10457  10.3%
6        7611  7.5%
4        7398  7.3%
2        7013  6.9%
9        5241  5.1%
7        5233  5.1%
3        4879  4.8%
8        4813  4.7%

Dash Punctuation
Value   Count  Frequency (%)
-       10000  100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   111824  100.0%

Most frequent character per script

Common
Value   Count  Frequency (%)
1       31111  27.8%
0       18068  16.2%
5       10457  9.4%
-       10000  8.9%
6        7611  6.8%
4        7398  6.6%
2        7013  6.3%
9        5241  4.7%
7        5233  4.7%
3        4879  4.4%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   111824  100.0%

Most frequent character per block

ASCII
Value   Count  Frequency (%)
1       31111  27.8%
0       18068  16.2%
5       10457  9.4%
-       10000  8.9%
6        7611  6.8%
4        7398  6.6%
2        7013  6.3%
9        5241  4.7%
7        5233  4.7%
3        4879  4.4%

동명칭
Text

MISSING 

Distinct: 468
Distinct (%): 7.7%
Missing: 3940
Missing (%): 39.4%
Memory size: 156.2 KiB

Length

Max length: 18
Median length: 4
Mean length: 3.8858086
Min length: 1

Characters and Unicode

Total characters: 23548
Distinct characters: 263
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 130
Unique (%): 2.1%
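
"Distinct" counts the different values in a column, while "Unique" counts the values that occur exactly once. The sketch below shows the difference on a toy column (hypothetical values) and then checks the reported percentages for 동명칭, assuming they are taken relative to the non-missing rows; that assumption reproduces both 7.7% and 2.1%.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) over the non-missing entries of `values`."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy column (hypothetical values); None marks a missing cell.
toy = ["101동", "101동", "가동", None, "5동"]
print(distinct_and_unique(toy))

# Reported figures for 동명칭: 10000 observations, 3940 missing.
non_missing = 10_000 - 3_940
distinct_pct = 100 * 468 / non_missing
unique_pct = 100 * 130 / non_missing
print(f"Distinct (%): {distinct_pct:.1f}%")
print(f"Unique (%): {unique_pct:.1f}%")
```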

Sample

1st row: 109동
2nd row: 가동
3rd row: 5동
4th row: 223동
5th row: 1동

Value                Count  Frequency (%)
101동                  412  6.5%
102동                  276  4.4%
1동                    169  2.7%
201동                  156  2.5%
2동                    155  2.4%
가동                   154  2.4%
103동                  133  2.1%
여의도자이             114  1.8%
105동                   98  1.5%
나동                    92  1.5%
Other values (478)    4585  72.3%

Most occurring characters

Value                Count  Frequency (%)
[character lost]      5183  22.0%
1                     3654  15.5%
0                     2280  9.7%
2                     1560  6.6%
3                     1287  5.5%
4                      596  2.5%
6                      576  2.4%
5                      524  2.2%
7                      414  1.8%
8                      372  1.6%
Other values (253)    7102  30.2%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      11525  48.9%
Other Letter        11515  48.9%
Space Separator       284  1.2%
Uppercase Letter      164  0.7%
Close Punctuation      20  0.1%
Open Punctuation       20  0.1%
Other Punctuation      14  0.1%
Dash Punctuation        6  < 0.1%

Most frequent character per category

Other Letter
Value                Count  Frequency (%)
[character lost]      5183  45.0%
[character lost]       300  2.6%
[character lost]       236  2.0%
[character lost]       225  2.0%
[character lost]       205  1.8%
[character lost]       158  1.4%
[character lost]       150  1.3%
[character lost]       149  1.3%
[character lost]       140  1.2%
[character lost]       139  1.2%
Other values (227)    4630  40.2%

Decimal Number
Value   Count  Frequency (%)
1        3654  31.7%
0        2280  19.8%
2        1560  13.5%
3        1287  11.2%
4         596  5.2%
6         576  5.0%
5         524  4.5%
7         414  3.6%
8         372  3.2%
9         262  2.3%

Uppercase Letter
Value   Count  Frequency (%)
A          54  32.9%
T          32  19.5%
V          32  19.5%
B          32  19.5%
C           5  3.0%
D           3  1.8%
G           2  1.2%
S           2  1.2%
E           2  1.2%

Other Punctuation
Value   Count  Frequency (%)
.          12  85.7%
,           1  7.1%
*           1  7.1%

Space Separator
Value     Count  Frequency (%)
(space)     284  100.0%

Close Punctuation
Value   Count  Frequency (%)
)          20  100.0%

Open Punctuation
Value   Count  Frequency (%)
(          20  100.0%

Dash Punctuation
Value   Count  Frequency (%)
-           6  100.0%

Most occurring scripts

Value    Count  Frequency (%)
Common   11869  50.4%
Hangul   11515  48.9%
Latin      164  0.7%

Most frequent character per script

Hangul
Value                Count  Frequency (%)
[character lost]      5183  45.0%
[character lost]       300  2.6%
[character lost]       236  2.0%
[character lost]       225  2.0%
[character lost]       205  1.8%
[character lost]       158  1.4%
[character lost]       150  1.3%
[character lost]       149  1.3%
[character lost]       140  1.2%
[character lost]       139  1.2%
Other values (227)    4630  40.2%

Common
Value              Count  Frequency (%)
1                   3654  30.8%
0                   2280  19.2%
2                   1560  13.1%
3                   1287  10.8%
4                    596  5.0%
6                    576  4.9%
5                    524  4.4%
7                    414  3.5%
8                    372  3.1%
(space)              284  2.4%
Other values (7)     322  2.7%

Latin
Value   Count  Frequency (%)
A          54  32.9%
T          32  19.5%
V          32  19.5%
B          32  19.5%
C           5  3.0%
D           3  1.8%
G           2  1.2%
S           2  1.2%
E           2  1.2%

Most occurring blocks

Value    Count  Frequency (%)
ASCII    12033  51.1%
Hangul   11515  48.9%

Most frequent character per block

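Hangul
Value                Count  Frequency (%)
[character lost]      5183  45.0%
[character lost]       300  2.6%
[character lost]       236  2.0%
[character lost]       225  2.0%
[character lost]       205  1.8%
[character lost]       158  1.4%
[character lost]       150  1.3%
[character lost]       149  1.3%
[character lost]       140  1.2%
[character lost]       139  1.2%
Other values (227)    4630  40.2%

ASCII
Value               Count  Frequency (%)
1                    3654  30.4%
0                    2280  18.9%
2                    1560  13.0%
3                    1287  10.7%
4                     596  5.0%
6                     576  4.8%
5                     524  4.4%
7                     414  3.4%
8                     372  3.1%
(space)               284  2.4%
Other values (16)     486  4.0%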

호_명
Text

MISSING 

Distinct: 2011
Distinct (%): 20.4%
Missing: 148
Missing (%): 1.5%
Memory size: 156.2 KiB

Length

Max length: 14
Median length: 4
Mean length: 4.1098254
Min length: 1
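
The mean length can be cross-checked against the total character count reported under "Characters and Unicode" for this variable. The check below assumes the mean is taken over non-missing values only, which reproduces the reported figure.

```python
# Figures reported for 호_명 in this section.
n_observations = 10_000
missing = 148
total_characters = 40_490

# Assumption: missing values are excluded from the average.
non_missing = n_observations - missing
mean_length = total_characters / non_missing
print(f"Mean length: {mean_length:.7f}")
```

The same relation holds for 동명칭 (23548 characters over 6060 non-missing values).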

Characters and Unicode

Total characters: 40490
Distinct characters: 78
Distinct categories: 9
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1238
Unique (%): 12.6%

Sample

1st row: 802
2nd row: 303호
3rd row: B01호
4th row: 312
5th row: 306호

Value                 Count  Frequency (%)
101호                   262  2.6%
201호                   257  2.6%
202호                   199  2.0%
102호                   183  1.8%
302호                   148  1.5%
301호                   142  1.4%
103호                   134  1.3%
203호                   125  1.3%
303호                   112  1.1%
201                     104  1.0%
Other values (1957)    8260  83.2%

Most occurring characters

Value               Count  Frequency (%)
0                    8646  21.4%
1                    7163  17.7%
[character lost]     5913  14.6%
2                    4721  11.7%
3                    3100  7.7%
4                    2007  5.0%
5                    1859  4.6%
[character lost]     1378  3.4%
6                    1238  3.1%
7                     909  2.2%
Other values (68)    3556  8.8%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      31100  76.8%
Other Letter         8523  21.0%
Dash Punctuation      525  1.3%
Uppercase Letter      236  0.6%
Space Separator        74  0.2%
Close Punctuation      13  < 0.1%
Open Punctuation       13  < 0.1%
Lowercase Letter        4  < 0.1%
Other Punctuation       2  < 0.1%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[character lost]     5913  69.4%
[character lost]     1378  16.2%
[character lost]      419  4.9%
[character lost]      195  2.3%
[character lost]      106  1.2%
[character lost]       74  0.9%
[character lost]       73  0.9%
[character lost]       60  0.7%
[character lost]       51  0.6%
[character lost]       49  0.6%
Other values (40)     205  2.4%

Decimal Number
Value   Count  Frequency (%)
0        8646  27.8%
1        7163  23.0%
2        4721  15.2%
3        3100  10.0%
4        2007  6.5%
5        1859  6.0%
6        1238  4.0%
7         909  2.9%
8         795  2.6%
9         662  2.1%

Uppercase Letter
Value   Count  Frequency (%)
B         201  85.2%
A          14  5.9%
D          10  4.2%
C           4  1.7%
O           2  0.8%
E           2  0.8%
L           1  0.4%
S           1  0.4%
F           1  0.4%

Lowercase Letter
Value   Count  Frequency (%)
a           2  50.0%
b           1  25.0%
c           1  25.0%

Other Punctuation
Value   Count  Frequency (%)
*           1  50.0%
/           1  50.0%

Dash Punctuation
Value   Count  Frequency (%)
-         525  100.0%

Space Separator
Value     Count  Frequency (%)
(space)      74  100.0%

Close Punctuation
Value   Count  Frequency (%)
)          13  100.0%

Open Punctuation
Value   Count  Frequency (%)
(          13  100.0%

Most occurring scripts

Value    Count  Frequency (%)
Common   31727  78.4%
Hangul    8523  21.0%
Latin      240  0.6%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[character lost]     5913  69.4%
[character lost]     1378  16.2%
[character lost]      419  4.9%
[character lost]      195  2.3%
[character lost]      106  1.2%
[character lost]       74  0.9%
[character lost]       73  0.9%
[character lost]       60  0.7%
[character lost]       51  0.6%
[character lost]       49  0.6%
Other values (40)     205  2.4%

Common
Value              Count  Frequency (%)
0                   8646  27.3%
1                   7163  22.6%
2                   4721  14.9%
3                   3100  9.8%
4                   2007  6.3%
5                   1859  5.9%
6                   1238  3.9%
7                    909  2.9%
8                    795  2.5%
9                    662  2.1%
Other values (6)     627  2.0%

Latin
Value              Count  Frequency (%)
B                    201  83.8%
A                     14  5.8%
D                     10  4.2%
C                      4  1.7%
a                      2  0.8%
O                      2  0.8%
E                      2  0.8%
L                      1  0.4%
S                      1  0.4%
b                      1  0.4%
Other values (2)       2  0.8%

Most occurring blocks

Value    Count  Frequency (%)
ASCII    31967  79.0%
Hangul    8523  21.0%

Most frequent character per block

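ASCII
Value               Count  Frequency (%)
0                    8646  27.0%
1                    7163  22.4%
2                    4721  14.8%
3                    3100  9.7%
4                    2007  6.3%
5                    1859  5.8%
6                    1238  3.9%
7                     909  2.8%
8                     795  2.5%
9                     662  2.1%
Other values (18)     867  2.7%

Hangul
Value               Count  Frequency (%)
[character lost]     5913  69.4%
[character lost]     1378  16.2%
[character lost]      419  4.9%
[character lost]      195  2.3%
[character lost]      106  1.2%
[character lost]       74  0.9%
[character lost]       73  0.9%
[character lost]       60  0.7%
[character lost]       51  0.6%
[character lost]       49  0.6%
Other values (40)     205  2.4%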

층_구분_코드
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20
2nd row: 20
3rd row: 20
4th row: 20
5th row: 20

Common Values

Value   Count  Frequency (%)
20       9698  97.0%
10        302  3.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count  Frequency (%)
20       9698  97.0%
10        302  3.0%

층_번호
Real number (ℝ)

ZEROS 

Distinct: 40
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8178
Minimum: 0
Maximum: 39
Zeros: 4372
Zeros (%): 43.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 5
95-th percentile: 16
Maximum: 39
Range: 39
Interquartile range (IQR): 5
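
The quartiles can be recovered from the frequency table reported for this variable. The sketch below computes quantiles from (value, count) pairs using linear interpolation between neighbouring order statistics; that interpolation rule is an assumption about how the report computes quantiles. Floors 10 to 39 are lumped into a single bucket because only their combined count (1525) is listed, so the result is reliable only for quantiles that fall below that bucket.

```python
def quantile_from_counts(value_counts, q):
    """Quantile of a discrete variable given (value, count) pairs."""
    pairs = sorted(value_counts)
    n = sum(count for _, count in pairs)
    position = q * (n - 1)      # zero-based fractional rank
    lower = int(position)
    upper = min(lower + 1, n - 1)
    frac = position - lower

    def value_at(rank):
        # Walk the cumulative counts until the rank is covered.
        seen = 0
        for value, count in pairs:
            seen += count
            if rank < seen:
                return value
        raise IndexError(rank)

    low, high = value_at(lower), value_at(upper)
    return low + frac * (high - low)

# Counts for floors 0-9 from the 층_번호 frequency table; the last pair
# lumps the 1525 "other" observations (floors 10-39) at the value 10.
counts = [(0, 4372), (1, 997), (2, 734), (3, 599), (4, 428),
          (5, 378), (6, 265), (7, 286), (8, 206), (9, 210), (10, 1525)]

q1 = quantile_from_counts(counts, 0.25)
median = quantile_from_counts(counts, 0.50)
q3 = quantile_from_counts(counts, 0.75)
print(q1, median, q3, "IQR:", q3 - q1)
```

The 95th percentile (16) cannot be checked this way, since the per-floor counts above 9 are not all listed.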

Descriptive statistics

Standard deviation: 5.6528222
Coefficient of variation (CV): 1.4806491
Kurtosis: 3.721426
Mean: 3.8178
Median Absolute Deviation (MAD): 1
Skewness: 1.8930508
Sum: 38178
Variance: 31.954399
Monotonicity: Not monotonic
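
Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A small consistency check using only the figures reported for 층_번호:

```python
# Figures reported for 층_번호.
n_observations = 10_000
total = 38_178
std = 5.6528222
zeros = 4_372

mean = total / n_observations
cv = std / mean                      # coefficient of variation
variance = std ** 2
zeros_pct = 100 * zeros / n_observations

print(f"Mean: {mean}")
print(f"CV: {cv:.7f}")
print(f"Variance: {variance:.6f}")
print(f"Zeros (%): {zeros_pct:.1f}%")
```

Skewness and kurtosis are not checked here because they need the full value distribution, which the report lists only partially.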
Histogram with fixed size bins (bins=40)

Common values

Value               Count  Frequency (%)
0                    4372  43.7%
1                     997  10.0%
2                     734  7.3%
3                     599  6.0%
4                     428  4.3%
5                     378  3.8%
7                     286  2.9%
6                     265  2.6%
9                     210  2.1%
8                     206  2.1%
Other values (30)    1525  15.2%

Minimum 10 values

Value   Count  Frequency (%)
0        4372  43.7%
1         997  10.0%
2         734  7.3%
3         599  6.0%
4         428  4.3%
5         378  3.8%
6         265  2.6%
7         286  2.9%
8         206  2.1%
9         210  2.1%

Maximum 10 values

Value   Count  Frequency (%)
39          3  < 0.1%
38          1  < 0.1%
37          2  < 0.1%
36          1  < 0.1%
35          3  < 0.1%
34          3  < 0.1%
33          3  < 0.1%
32          1  < 0.1%
31          4  < 0.1%
30          4  < 0.1%

작업_일자"
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20111227
2nd row: 20111227
3rd row: 20111227
4th row: 20111227
5th row: 20111227

Common Values

Value      Count  Frequency (%)
20111227   10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
20111227   10000  100.0%

Interactions


Correlations

                층_구분_코드   층_번호
층_구분_코드    1.000          0.155
층_번호         0.155          1.000

                층_번호   층_구분_코드
층_번호         1.000     0.119
층_구분_코드    0.119     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

        관리_폐쇄말소대장_PK   동명칭    호_명    층_구분_코드   층_번호   작업_일자"
39687   11620-12907            109동     802      20             8         20111227
11888   11470-13534            가동      303호    20             3         20111227
52433   11620-20385            <NA>      B01호    20             0         20111227
7989    11500-23424            5동       312      20             0         20111227
64074   11650-15591            223동     306호    20             0         20111227
55659   11650-21515            1동       303호    20             0         20111227
67089   11620-16219            102동     301호    20             0         20111227
19312   11440-22980            104동     1004호   20             10        20111227
11679   11500-100017854        상가1동   B126     10             1         20111227
49191   11410-10054            <NA>      202호    20             0         20111227

Last rows

        관리_폐쇄말소대장_PK   동명칭    호_명      층_구분_코드   층_번호   작업_일자"
48398   11440-14578            <NA>      101        20             1         20111227
36905   11500-19206            32동      2층205호   20             0         20111227
43185   11590-14121            104동     1103       20             11        20111227
16635   11470-7046             <NA>      202호      20             2         20111227
47123   11590-17375            5동       101호      20             0         20111227
40170   11620-13651            117동     2102       20             21        20111227
41650   11590-11755            107동     603호      20             6         20111227
63170   11650-12767            <NA>      301호      20             0         20111227
22308   11545-100009514        <NA>      2-1호      20             2         20111227
6168    11380-100021974        <NA>      3층2       20             1         20111227