Overview

Dataset statistics

Number of variables: 10
Number of observations: 10000
Missing cells: 10000
Missing cells (%): 10.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 947.3 KiB
Average record size in memory: 97.0 B
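
The derived figures above follow from the raw counts. The sketch below re-derives the missing-cell percentage and the average record size, assuming that "cells" means variables times observations and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 10
n_observations = 10_000
missing_cells = 10_000
total_size_kib = 947.3

# Assumption: the cell count is variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")         # 10.0%
print(f"{avg_record_bytes:.1f} B")   # 97.0 B
```

All 10000 missing cells come from a single column, 기타_함유_유무, which the alerts report as 100.0% missing.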

Variable types

Text: 1
Categorical: 7
Unsupported: 1
Numeric: 1

Dataset

Description: 관리_허가대장_pk, 천장재_함유_유무, 단열재_함유_유무, 지붕재_함유_유무, 보온재_함유_유무, 기타_함유_유무, 해당없음_유무, 기타_유무, 바닥재함유유무, 작업_일자
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15661/S/1/datasetView.do

Alerts

[Imbalance] 천장재_함유_유무 is highly imbalanced (79.5%)
[Imbalance] 단열재_함유_유무 is highly imbalanced (89.1%)
[Imbalance] 지붕재_함유_유무 is highly imbalanced (78.9%)
[Imbalance] 보온재_함유_유무 is highly imbalanced (87.5%)
[Imbalance] 해당없음_유무 is highly imbalanced (60.1%)
[Imbalance] 기타_유무 is highly imbalanced (81.8%)
[Missing] 기타_함유_유무 has 10000 (100.0%) missing values
[Unique] 관리_허가대장_pk has unique values
[Unsupported] 기타_함유_유무 is an unsupported type; check whether it needs cleaning or further analysis
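
The imbalance percentages in the alerts are consistent with one minus the normalised Shannon entropy of each column's value counts. That formula is an assumption inferred from the numbers in this report, not something the report itself states. A minimal sketch, with `imbalance_score` as a hypothetical helper name:

```python
import math
from typing import Sequence

def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / log2(k), where H is the Shannon entropy of the
    class frequencies and k is the number of classes.
    0 means perfectly balanced, 1 means a single class dominates entirely."""
    k = len(counts)
    if k < 2:
        return 0.0  # Assumption: a single-class column is scored as 0.
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Value counts taken from the Common Values tables in this report.
print(round(100 * imbalance_score([9522, 257, 221]), 1))   # 천장재_함유_유무 -> 79.5
print(round(100 * imbalance_score([8542, 1403, 55]), 1))   # 해당없음_유무 -> 60.1
```

Note that the `<NA>` category is included as a class here, because the report counts it as a distinct value rather than as missing.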

Reproduction

Analysis started: 2024-05-11 06:47:22.947548
Analysis finished: 2024-05-11 06:47:24.511557
Duration: 1.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

관리_허가대장_pk
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 14.7184
Min length: 8

Characters and Unicode

Total characters: 147184
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
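
As a rough illustration of the category breakdown reported for this variable, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The two identifiers are taken from the value table of this report; `category_counts` is a hypothetical helper, and the script and block breakdowns are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Nd', 'Pd')."""
    return Counter(unicodedata.category(ch) for ch in text)

# Two identifiers from the 관리_허가대장_pk value table.
sample_ids = ["11470-100012901", "11260-2314"]

totals = Counter()
for identifier in sample_ids:
    totals.update(category_counts(identifier))

# 'Nd' is Decimal Number and 'Pd' is Dash Punctuation, the two
# categories the report lists for this variable.
print(dict(totals))
```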

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11470-100012901
2nd row: 11380-100023714
3rd row: 11500-100036400
4th row: 11110-100031568
5th row: 11440-100061225

Value                           Count   Frequency (%)
11470-100012901                 1       < 0.1%
11440-100030057                 1       < 0.1%
11500-100007689                 1       < 0.1%
11410-100048627                 1       < 0.1%
11110-100012608                 1       < 0.1%
11260-2314                      1       < 0.1%
11140-100081180                 1       < 0.1%
11200-1000000000000000020754    1       < 0.1%
11200-1000000000000000070683    1       < 0.1%
11170-100076112                 1       < 0.1%
Other values (9990)             9990    99.9%

Most occurring characters

Value   Count   Frequency (%)
0       43465   29.5%
1       38559   26.2%
2       10031   6.8%
-       10000   6.8%
4       8308    5.6%
3       8238    5.6%
5       7673    5.2%
6       5742    3.9%
7       5233    3.6%
8       4990    3.4%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     137184   93.2%
Dash Punctuation   10000    6.8%

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
0       43465   31.7%
1       38559   28.1%
2       10031   7.3%
4       8308    6.1%
3       8238    6.0%
5       7673    5.6%
6       5742    4.2%
7       5233    3.8%
8       4990    3.6%
9       4945    3.6%

Dash Punctuation

Value   Count   Frequency (%)
-       10000   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   147184   100.0%

Most frequent character per script

Common

Value   Count   Frequency (%)
0       43465   29.5%
1       38559   26.2%
2       10031   6.8%
-       10000   6.8%
4       8308    5.6%
3       8238    5.6%
5       7673    5.2%
6       5742    3.9%
7       5233    3.6%
8       4990    3.4%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   147184   100.0%

Most frequent character per block

ASCII

Value   Count   Frequency (%)
0       43465   29.5%
1       38559   26.2%
2       10031   6.8%
-       10000   6.8%
4       8308    5.6%
3       8238    5.6%
5       7673    5.2%
6       5742    3.9%
7       5233    3.6%
8       4990    3.4%

천장재_함유_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0663
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       9522    95.2%
1       257     2.6%
<NA>    221     2.2%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values
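
The Common Values table lists `<NA>` as a category, and the length statistics (max length 4, mean length 1.0663) count it as a four-character string, while Missing is reported as 0. This suggests the placeholder reached the profiler as literal text rather than as a null. A minimal sketch of one way to normalise it before profiling, using only the standard library; the helper name `normalize_missing` and the placeholder set are hypothetical and not part of the dataset or of ydata-profiling.

```python
from typing import Iterable, List, Optional

# Assumed placeholder spellings; "<NA>" is the only one seen in this report.
PLACEHOLDERS = {"<NA>"}

def normalize_missing(values: Iterable[str]) -> List[Optional[str]]:
    """Replace placeholder strings with None so they count as missing."""
    return [None if v in PLACEHOLDERS else v for v in values]

# Rebuild the 천장재_함유_유무 column from the counts in its Common Values table.
column = ["0"] * 9522 + ["1"] * 257 + ["<NA>"] * 221
cleaned = normalize_missing(column)

missing = sum(v is None for v in cleaned)
print(missing, f"{100 * missing / len(cleaned):.1f}%")
```

The same treatment would apply to the other categorical columns, each of which shows an `<NA>` category with Missing reported as 0.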

단열재_함유_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0672
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       9759    97.6%
<NA>    224     2.2%
1       17      0.2%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

지붕재_함유_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0663
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       9502    95.0%
1       277     2.8%
<NA>    221     2.2%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

보온재_함유_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0672
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       9728    97.3%
<NA>    224     2.2%
1       48      0.5%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

기타_함유_유무
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 10000
Missing (%): 100.0%
Memory size: 166.0 KiB

해당없음_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0165
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value   Count   Frequency (%)
1       8542    85.4%
0       1403    14.0%
<NA>    55      0.5%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

기타_유무
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 1
Mean length: 1.0663
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value   Count   Frequency (%)
0       9590    95.9%
<NA>    221     2.2%
1       189     1.9%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

바닥재함유유무
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 4
Median length: 4
Mean length: 2.6953
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 0
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value   Count   Frequency (%)
<NA>    5651    56.5%
0       4340    43.4%
1       9       0.1%

Length

Figure (image not preserved): Histogram of lengths of the category

Common Values (Plot)

Figure (image not preserved): plot of the value counts listed under Common Values

작업_일자
Real number (ℝ)

Distinct: 976
Distinct (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20151752
Minimum: 20111227
Maximum: 20240510
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20111227
5-th percentile: 20111227
Q1: 20120204
Median: 20150312
Q3: 20180123
95-th percentile: 20220803
Maximum: 20240510
Range: 129283
Interquartile range (IQR): 59919

Descriptive statistics

Standard deviation: 36038.48
Coefficient of variation (CV): 0.0017883546
Kurtosis: -0.4396753
Mean: 20151752
Median Absolute Deviation (MAD): 30000
Skewness: 0.65296221
Sum: 2.0151752 × 10^11
Variance: 1.298772 × 10^9
Monotonicity: Not monotonic

Figure (image not preserved): Histogram with fixed size bins (bins=50)

Common Values

Value                Count   Frequency (%)
20111227             2397    24.0%
20170406             183     1.8%
20201111             178     1.8%
20211029             138     1.4%
20240510             130     1.3%
20170928             115     1.1%
20201007             111     1.1%
20180927             107     1.1%
20130831             103     1.0%
20141115             103     1.0%
Other values (966)   6435    64.3%

Minimum 10 values

Value      Count   Frequency (%)
20111227   2397    24.0%
20120102   14      0.1%
20120104   2       < 0.1%
20120106   3       < 0.1%
20120111   2       < 0.1%
20120112   3       < 0.1%
20120113   4       < 0.1%
20120118   1       < 0.1%
20120119   2       < 0.1%
20120120   2       < 0.1%

Maximum 10 values

Value      Count   Frequency (%)
20240510   130     1.3%
20240507   12      0.1%
20240425   6       0.1%
20240420   8       0.1%
20240417   2       < 0.1%
20240411   5       0.1%
20240406   6       0.1%
20240402   1       < 0.1%
20240330   5       0.1%
20240327   61      0.6%
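
The values of 작업_일자 look like dates written as YYYYMMDD integers (for example 20111227 and 20240510), although the report profiles the column as a real number. Under that assumption, statistics such as the mean, range and standard deviation are computed on the integer encoding and do not correspond to calendar arithmetic. The sketch below parses the minimum and maximum with the standard library and compares the integer range with the span in days; `parse_yyyymmdd` is a hypothetical helper name.

```python
from datetime import date, datetime

def parse_yyyymmdd(value: int) -> date:
    """Parse an integer such as 20111227 as a calendar date.
    Assumes the column encodes dates as YYYYMMDD."""
    return datetime.strptime(str(value), "%Y%m%d").date()

# Minimum and maximum taken from the statistics above.
earliest = parse_yyyymmdd(20111227)
latest = parse_yyyymmdd(20240510)

integer_range = 20240510 - 20111227      # the "Range" the report shows
span_days = (latest - earliest).days     # the actual calendar span

print(earliest.isoformat(), latest.isoformat())
print(integer_range, span_days)
```

Converting the column to a date type before profiling would let the report treat it as a date variable instead, which is likely to give more interpretable summaries.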

Correlations

Variable          천장재_함유_유무  단열재_함유_유무  지붕재_함유_유무  보온재_함유_유무  해당없음_유무  기타_유무  바닥재함유유무  작업_일자
천장재_함유_유무  1.000  0.288  0.349  0.147  0.573  0.451  0.218  0.072
단열재_함유_유무  0.288  1.000  0.139  0.241  0.142  0.145  0.381  0.000
지붕재_함유_유무  0.349  0.139  1.000  0.290  0.596  0.329  0.102  0.068
보온재_함유_유무  0.147  0.241  0.290  1.000  0.256  0.160  0.023  0.149
해당없음_유무     0.573  0.142  0.596  0.256  1.000  0.496  0.158  0.163
기타_유무         0.451  0.145  0.329  0.160  0.496  1.000  0.129  0.085
바닥재함유유무    0.218  0.381  0.102  0.023  0.158  0.129  1.000  0.000
작업_일자         0.072  0.000  0.068  0.149  0.163  0.085  0.000  1.000

Variable          바닥재함유유무  지붕재_함유_유무  천장재_함유_유무  보온재_함유_유무  해당없음_유무  기타_유무  단열재_함유_유무
바닥재함유유무    1.000  0.065  0.140  0.015  0.101  0.082  0.249
지붕재_함유_유무  0.065  1.000  0.227  0.187  0.407  0.213  0.089
천장재_함유_유무  0.140  0.227  1.000  0.094  0.389  0.298  0.186
보온재_함유_유무  0.015  0.187  0.094  1.000  0.165  0.102  0.155
해당없음_유무     0.101  0.407  0.389  0.165  1.000  0.330  0.091
기타_유무         0.082  0.213  0.298  0.102  0.330  1.000  0.092
단열재_함유_유무  0.249  0.089  0.186  0.155  0.091  0.092  1.000

Variable          작업_일자  천장재_함유_유무  단열재_함유_유무  지붕재_함유_유무  보온재_함유_유무  해당없음_유무  기타_유무  바닥재함유유무
작업_일자         1.000  0.059  0.000  0.050  0.111  0.160  0.071  0.000
천장재_함유_유무  0.059  1.000  0.186  0.227  0.094  0.389  0.298  0.140
단열재_함유_유무  0.000  0.186  1.000  0.089  0.155  0.091  0.092  0.249
지붕재_함유_유무  0.050  0.227  0.089  1.000  0.187  0.407  0.213  0.065
보온재_함유_유무  0.111  0.094  0.155  0.187  1.000  0.165  0.102  0.015
해당없음_유무     0.160  0.389  0.091  0.407  0.165  1.000  0.330  0.101
기타_유무         0.071  0.298  0.092  0.213  0.102  0.330  1.000  0.082
바닥재함유유무    0.000  0.140  0.249  0.065  0.015  0.101  0.082  1.000

Missing values

Figure (image not preserved): A simple visualization of nullity by column.
Figure (image not preserved): The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index  관리_허가대장_pk  천장재_함유_유무  단열재_함유_유무  지붕재_함유_유무  보온재_함유_유무  기타_함유_유무  해당없음_유무  기타_유무  바닥재함유유무  작업_일자
89853  11470-100012901  0  0  0  0  <NA>  1  0  <NA>  20111227
68833  11380-100023714  0  0  0  0  <NA>  1  0  0     20120504
97721  11500-100036400  0  0  0  0  <NA>  1  0  <NA>  20131214
4108   11110-100031568  0  0  0  0  <NA>  1  0  <NA>  20191016
84666  11440-100061225  0  0  0  0  <NA>  1  0  <NA>  20201111
60287  11320-100025641  0  0  0  0  <NA>  1  0  0     20130219
46672  11290-100031035  0  0  0  0  <NA>  1  0  <NA>  20150226
29279  11230-100037375  0  0  0  0  <NA>  0  1  0     20140521
3309   11110-100025546  0  0  0  0  <NA>  1  0  <NA>  20160922
77831  11410-100034192  0  0  0  0  <NA>  1  0  0     20141115

Index  관리_허가대장_pk  천장재_함유_유무  단열재_함유_유무  지붕재_함유_유무  보온재_함유_유무  기타_함유_유무  해당없음_유무  기타_유무  바닥재함유유무  작업_일자
91204  11470-100028775  0  0  0  0  <NA>  1  0  <NA>  20140308
99875  11500-100056618  0  0  0  0  <NA>  1  0  <NA>  20180725
69155  11380-100026314  0  0  0  0  <NA>  1  0  0     20120104
84809  11440-100063008  0  0  0  0  <NA>  1  0  <NA>  20201111
96196  11500-100021764  0  0  0  0  <NA>  1  0  0     20111227
39778  11260-100026109  0  0  0  0  <NA>  0  1  0     20170407
45109  11290-100012338  0  0  0  0  <NA>  1  0  0     20111227
76089  11410-100010936  0  0  0  0  <NA>  1  0  <NA>  20111227
13622  11200-100004526  0  0  0  0  <NA>  1  0  <NA>  20111227
18674  11215-100016542  0  0  0  0  <NA>  1  0  0     20111227