Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 10000 |
Missing cells | 3444 |
Missing cells (%) | 5.7% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 566.4 KiB |
Average record size in memory | 58.0 B |
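The two derived figures in this table can be checked from the raw counts. A minimal sketch, assuming the missing-cell percentage is taken over the full grid of variables × observations and that 1 KiB = 1024 B; the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 6
n_observations = 10_000
missing_cells = 3_444
total_size_kib = 566.4

# Share of missing cells over the full variables x observations grid.
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures reported above (5.7% and 58.0 B).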
Variable types
Text | 3 |
---|---|
Categorical | 1 |
Numeric | 2 |
Dataset
Description | 관리_폐쇄말소대장_PK,동명칭,호_명,층_구분_코드,층_번호,작업_일자 |
---|---|
Author | Seoul Metropolitan Government (서울특별시) |
URL | https://data.seoul.go.kr/dataList/OA-15749/S/1/datasetView.do |
층_구분_코드 is highly imbalanced (81.9%) | Imbalance |
동명칭 has 3416 (34.2%) missing values | Missing |
관리_폐쇄말소대장_PK has unique values | Unique |
층_번호 has 651 (6.5%) zeros | Zeros |
Reproduction
Analysis started | 2024-05-11 05:37:13.167563 |
---|---|
Analysis finished | 2024-05-11 05:37:15.564202 |
Duration | 2.4 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
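A report of this kind could be regenerated roughly as follows. This is a sketch under assumptions, not the exact run behind this report: the CSV path is hypothetical, the contents of `config.json` are not reproduced here (so library defaults are used), and reading the three text columns as strings is a guess at how the source data was loaded. The function name `build_report` is illustrative.

```python
# Columns reported as "Text" above; read as strings so keys such as
# "11650-100777865" or unit names such as "101" are not coerced to numbers.
TEXT_COLUMNS = ["관리_폐쇄말소대장_PK", "동명칭", "호_명"]


def build_report(csv_path, out_html="report.html"):
    """Profile a CSV export of the dataset and write an HTML report."""
    # Third-party imports are local so this sketch can be loaded even
    # where pandas / ydata-profiling are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, dtype={col: "string" for col in TEXT_COLUMNS})

    # Default settings; the original run used a config.json not shown here.
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(out_html)
    return profile
```

Calling `build_report("closed_register.csv")` (hypothetical file name) would write `report.html` next to the script.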
관리_폐쇄말소대장_PK
Text
UNIQUE
 
Distinct | 10000 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 28 |
---|---|
Median length | 15 |
Mean length | 15.948 |
Min length | 10 |
Characters and Unicode
Total characters | 159480 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 10000 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 11650-100777865 |
---|---|
2nd row | 11710-101330516 |
3rd row | 11215-100674522 |
4th row | 11710-100159170 |
5th row | 11530-1000000000000001491424 |
Value | Count | Frequency (%) |
11650-100777865 | 1 | < 0.1% |
11410-100382287 | 1 | < 0.1% |
11530-1000000000000001491061 | 1 | < 0.1% |
11470-100124086 | 1 | < 0.1% |
11710-100299574 | 1 | < 0.1% |
11650-97280 | 1 | < 0.1% |
11530-1000000000000001496691 | 1 | < 0.1% |
11710-100313482 | 1 | < 0.1% |
11710-100298785 | 1 | < 0.1% |
11440-100511145 | 1 | < 0.1% |
Other values (9990) | 9990 |
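Every key shown above consists of digits, one hyphen, and more digits, which matches the 11 distinct characters reported for this column. A minimal sketch of splitting such a key at the hyphen, assuming that pattern holds for all rows; the report does not state what the two parts mean, and `split_pk` is a hypothetical helper:

```python
def split_pk(pk: str) -> tuple[str, str]:
    """Split a key such as '11650-100777865' at its first hyphen."""
    prefix, sep, serial = pk.partition("-")
    # Reject anything that is not <digits>-<digits>.
    if not sep or not prefix.isdigit() or not serial.isdigit():
        raise ValueError(f"unexpected key format: {pk!r}")
    return prefix, serial


# Keys taken from the sample rows above.
samples = [
    "11650-100777865",
    "11710-101330516",
    "11530-1000000000000001491424",
    "11650-97280",
]

for pk in samples:
    prefix, serial = split_pk(pk)
    print(prefix, serial, len(pk))
```

The serial part is kept as a string because its length varies widely between rows.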
Most occurring characters
Value | Count | Frequency (%) |
0 | 44590 | 28.0% |
1 | 41898 | 26.3% |
- | 10000 | 6.3% |
5 | 9469 | 5.9% |
3 | 9394 | 5.9% |
7 | 9089 | 5.7% |
2 | 8046 | 5.0% |
4 | 7809 | 4.9% |
6 | 7349 | 4.6% |
8 | 6033 | 3.8% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 149480 | 93.7% |
Dash Punctuation | 10000 | 6.3% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 44590 | 29.8% |
1 | 41898 | 28.0% |
5 | 9469 | 6.3% |
3 | 9394 | 6.3% |
7 | 9089 | 6.1% |
2 | 8046 | 5.4% |
4 | 7809 | 5.2% |
6 | 7349 | 4.9% |
8 | 6033 | 4.0% |
9 | 5803 | 3.9% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 10000 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 159480 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 44590 | 28.0% |
1 | 41898 | 26.3% |
- | 10000 | 6.3% |
5 | 9469 | 5.9% |
3 | 9394 | 5.9% |
7 | 9089 | 5.7% |
2 | 8046 | 5.0% |
4 | 7809 | 4.9% |
6 | 7349 | 4.6% |
8 | 6033 | 3.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 159480 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 44590 | 28.0% |
1 | 41898 | 26.3% |
- | 10000 | 6.3% |
5 | 9469 | 5.9% |
3 | 9394 | 5.9% |
7 | 9089 | 5.7% |
2 | 8046 | 5.0% |
4 | 7809 | 4.9% |
6 | 7349 | 4.6% |
8 | 6033 | 3.8% |
동명칭
Text
MISSING
 
Distinct | 560 |
---|---|
Distinct (%) | 8.5% |
Missing | 3416 |
Missing (%) | 34.2% |
Memory size | 156.2 KiB |
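The two percentages in this table appear to use different denominators: the missing share is taken over all observations, while the distinct share matches the count of non-missing values only. This is inferred from the numbers rather than stated in the report. A short check, with illustrative variable names:

```python
# Figures copied from the 동명칭 summary table.
n_observations = 10_000
missing = 3_416
distinct = 560

non_missing = n_observations - missing

missing_pct = 100 * missing / n_observations
distinct_pct_all = 100 * distinct / n_observations   # denominator: all rows
distinct_pct_present = 100 * distinct / non_missing  # denominator: non-missing rows

print(f"Missing: {missing_pct:.1f}%")
print(f"Distinct over all rows: {distinct_pct_all:.1f}%")
print(f"Distinct over non-missing rows: {distinct_pct_present:.1f}%")
```

Only the non-missing denominator reproduces the reported 8.5%.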
Value | Count | Frequency (%) |
가든파이브라이프 | 1029 | 14.4% |
101동 | 313 | 4.4% |
가든파이브툴 | 210 | 2.9% |
102동 | 200 | 2.8% |
더 | 135 | 1.9% |
상가 | 128 | 1.8% |
비동 | 112 | 1.6% |
103동 | 89 | 1.2% |
2동 | 88 | 1.2% |
시티 | 84 | 1.2% |
Other values (572) | 4736 |
Most occurring characters
Value | Count | Frequency (%) |
동 | 4017 | 12.8% |
1 | 3095 | 9.9% |
이 | 2454 | 7.8% |
0 | 2409 | 7.7% |
2 | 1623 | 5.2% |
가 | 1498 | 4.8% |
파 | 1397 | 4.5% |
3 | 1358 | 4.3% |
브 | 1241 | 4.0% |
든 | 1239 | 4.0% |
Other values (263) | 10964 |
Most occurring categories
Value | Count | Frequency (%) |
Other Letter | 19417 | |
Decimal Number | 11151 | |
Space Separator | 540 | 1.7% |
Uppercase Letter | 99 | 0.3% |
Open Punctuation | 28 | 0.1% |
Close Punctuation | 28 | 0.1% |
Dash Punctuation | 18 | 0.1% |
Other Punctuation | 10 | < 0.1% |
Lowercase Letter | 4 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
동 | 4017 | |
이 | 2454 | |
가 | 1498 | 7.7% |
파 | 1397 | 7.2% |
브 | 1241 | 6.4% |
든 | 1239 | 6.4% |
라 | 1188 | 6.1% |
프 | 1065 | 5.5% |
트 | 227 | 1.2% |
툴 | 210 | 1.1% |
Other values (226) | 4881 |
Uppercase Letter
Value | Count | Frequency (%) |
A | 37 | |
B | 20 | |
C | 10 | 10.1% |
D | 7 | 7.1% |
E | 4 | 4.0% |
I | 3 | 3.0% |
H | 3 | 3.0% |
V | 3 | 3.0% |
T | 3 | 3.0% |
J | 2 | 2.0% |
Other values (5) | 7 | 7.1% |
Decimal Number
Value | Count | Frequency (%) |
1 | 3095 | |
0 | 2409 | |
2 | 1623 | |
3 | 1358 | |
4 | 802 | 7.2% |
5 | 656 | 5.9% |
6 | 386 | 3.5% |
7 | 322 | 2.9% |
8 | 278 | 2.5% |
9 | 222 | 2.0% |
Other Punctuation
Value | Count | Frequency (%) |
* | 6 | |
, | 2 | 20.0% |
& | 1 | 10.0% |
. | 1 | 10.0% |
Lowercase Letter
Value | Count | Frequency (%) |
u | 1 | |
s | 1 | |
e | 1 | |
o | 1 |
Space Separator
Value | Count | Frequency (%) |
(space) | 540 |
Open Punctuation
Value | Count | Frequency (%) |
( | 28 |
Close Punctuation
Value | Count | Frequency (%) |
) | 28 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 18 |
Most occurring scripts
Value | Count | Frequency (%) |
Hangul | 19417 | |
Common | 11775 | |
Latin | 103 | 0.3% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
동 | 4017 | |
이 | 2454 | |
가 | 1498 | 7.7% |
파 | 1397 | 7.2% |
브 | 1241 | 6.4% |
든 | 1239 | 6.4% |
라 | 1188 | 6.1% |
프 | 1065 | 5.5% |
트 | 227 | 1.2% |
툴 | 210 | 1.1% |
Other values (226) | 4881 |
Latin
Value | Count | Frequency (%) |
A | 37 | |
B | 20 | |
C | 10 | 9.7% |
D | 7 | 6.8% |
E | 4 | 3.9% |
I | 3 | 2.9% |
H | 3 | 2.9% |
V | 3 | 2.9% |
T | 3 | 2.9% |
J | 2 | 1.9% |
Other values (9) | 11 | 10.7% |
Common
Value | Count | Frequency (%) |
1 | 3095 | |
0 | 2409 | |
2 | 1623 | |
3 | 1358 | |
4 | 802 | 6.8% |
5 | 656 | 5.6% |
(space) | 540 | 4.6% |
6 | 386 | 3.3% |
7 | 322 | 2.7% |
8 | 278 | 2.4% |
Other values (8) | 306 | 2.6% |
Most occurring blocks
Value | Count | Frequency (%) |
Hangul | 19417 | |
ASCII | 11878 |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
동 | 4017 | |
이 | 2454 | |
가 | 1498 | 7.7% |
파 | 1397 | 7.2% |
브 | 1241 | 6.4% |
든 | 1239 | 6.4% |
라 | 1188 | 6.1% |
프 | 1065 | 5.5% |
트 | 227 | 1.2% |
툴 | 210 | 1.1% |
Other values (226) | 4881 |
ASCII
Value | Count | Frequency (%) |
1 | 3095 | |
0 | 2409 | |
2 | 1623 | |
3 | 1358 | |
4 | 802 | 6.8% |
5 | 656 | 5.5% |
(space) | 540 | 4.5% |
6 | 386 | 3.2% |
7 | 322 | 2.7% |
8 | 278 | 2.3% |
Other values (27) | 409 | 3.4% |
호_명
Text
Distinct | 3376 |
---|---|
Distinct (%) | 33.9% |
Missing | 28 |
Missing (%) | 0.3% |
Memory size | 156.2 KiB |
Value | Count | Frequency (%) |
401 | 143 | 1.4% |
201 | 134 | 1.3% |
301 | 132 | 1.3% |
302 | 131 | 1.3% |
202 | 130 | 1.3% |
101호 | 110 | 1.1% |
101 | 103 | 1.0% |
202호 | 103 | 1.0% |
501 | 101 | 1.0% |
201호 | 101 | 1.0% |
Other values (3287) | 8905 |
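The table above lists the same unit number both with and without the 호 suffix (for example `101` and `101호`). If these are meant to be the same unit, the counts could be merged after a light normalisation. The sketch below is illustrative only: whether the two spellings should be treated as equal is an assumption, and `normalize_unit` is a hypothetical helper.

```python
from collections import Counter


def normalize_unit(name: str) -> str:
    """Strip surrounding whitespace and a trailing 호 suffix, if present."""
    name = name.strip()
    if name.endswith("호"):
        name = name[:-1].rstrip()
    return name


# Top counts copied from the table above.
top_counts = {
    "401": 143, "201": 134, "301": 132, "302": 131, "202": 130,
    "101호": 110, "101": 103, "202호": 103, "501": 101, "201호": 101,
}

merged = Counter()
for value, count in top_counts.items():
    merged[normalize_unit(value)] += count

for value, count in merged.most_common(3):
    print(value, count)
```

Values without the suffix, such as `Y-3034`, pass through unchanged.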
Most occurring characters
Value | Count | Frequency (%) |
0 | 8773 | |
1 | 7304 | |
2 | 4548 | |
호 | 3379 | 8.0% |
3 | 3154 | 7.5% |
4 | 2398 | 5.7% |
5 | 1965 | 4.7% |
6 | 1483 | 3.5% |
- | 1468 | 3.5% |
7 | 1303 | 3.1% |
Other values (82) | 6203 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 33085 | |
Other Letter | 5256 | 12.5% |
Uppercase Letter | 1952 | 4.7% |
Dash Punctuation | 1468 | 3.5% |
Space Separator | 121 | 0.3% |
Close Punctuation | 41 | 0.1% |
Open Punctuation | 41 | 0.1% |
Other Punctuation | 14 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
호 | 3379 | |
층 | 918 | 17.5% |
지 | 225 | 4.3% |
동 | 140 | 2.7% |
하 | 90 | 1.7% |
제 | 82 | 1.6% |
오 | 73 | 1.4% |
비 | 49 | 0.9% |
상 | 33 | 0.6% |
가 | 30 | 0.6% |
Other values (52) | 237 | 4.5% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 535 | |
T | 291 | |
Y | 274 | |
L | 266 | |
F | 239 | |
A | 117 | 6.0% |
E | 77 | 3.9% |
S | 51 | 2.6% |
D | 47 | 2.4% |
C | 42 | 2.2% |
Other values (4) | 13 | 0.7% |
Decimal Number
Value | Count | Frequency (%) |
0 | 8773 | |
1 | 7304 | |
2 | 4548 | |
3 | 3154 | 9.5% |
4 | 2398 | 7.2% |
5 | 1965 | 5.9% |
6 | 1483 | 4.5% |
7 | 1303 | 3.9% |
8 | 1134 | 3.4% |
9 | 1023 | 3.1% |
Other Punctuation
Value | Count | Frequency (%) |
, | 9 | |
. | 5 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 1468 |
Space Separator
Value | Count | Frequency (%) |
(space) | 121 |
Close Punctuation
Value | Count | Frequency (%) |
) | 41 |
Open Punctuation
Value | Count | Frequency (%) |
( | 41 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 34770 | |
Hangul | 5256 | 12.5% |
Latin | 1952 | 4.7% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
호 | 3379 | |
층 | 918 | 17.5% |
지 | 225 | 4.3% |
동 | 140 | 2.7% |
하 | 90 | 1.7% |
제 | 82 | 1.6% |
오 | 73 | 1.4% |
비 | 49 | 0.9% |
상 | 33 | 0.6% |
가 | 30 | 0.6% |
Other values (52) | 237 | 4.5% |
Common
Value | Count | Frequency (%) |
0 | 8773 | |
1 | 7304 | |
2 | 4548 | |
3 | 3154 | 9.1% |
4 | 2398 | 6.9% |
5 | 1965 | 5.7% |
6 | 1483 | 4.3% |
- | 1468 | 4.2% |
7 | 1303 | 3.7% |
8 | 1134 | 3.3% |
Other values (6) | 1240 | 3.6% |
Latin
Value | Count | Frequency (%) |
B | 535 | |
T | 291 | |
Y | 274 | |
L | 266 | |
F | 239 | |
A | 117 | 6.0% |
E | 77 | 3.9% |
S | 51 | 2.6% |
D | 47 | 2.4% |
C | 42 | 2.2% |
Other values (4) | 13 | 0.7% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 36722 | |
Hangul | 5252 | 12.5% |
Compat Jamo | 4 | < 0.1% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 8773 | |
1 | 7304 | |
2 | 4548 | |
3 | 3154 | 8.6% |
4 | 2398 | 6.5% |
5 | 1965 | 5.4% |
6 | 1483 | 4.0% |
- | 1468 | 4.0% |
7 | 1303 | 3.5% |
8 | 1134 | 3.1% |
Other values (20) | 3192 | 8.7% |
Hangul
Value | Count | Frequency (%) |
호 | 3379 | |
층 | 918 | 17.5% |
지 | 225 | 4.3% |
동 | 140 | 2.7% |
하 | 90 | 1.7% |
제 | 82 | 1.6% |
오 | 73 | 1.4% |
비 | 49 | 0.9% |
상 | 33 | 0.6% |
가 | 30 | 0.6% |
Other values (48) | 233 | 4.4% |
Compat Jamo
Value | Count | Frequency (%) |
ㄹ | 1 | |
ㅇ | 1 | |
ㄴ | 1 | |
ㅁ | 1 |
층_구분_코드
Categorical
IMBALANCE
 
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
지상 | 9316 |
---|---|
지하 | 682 |
옥탑 | 1 |
<NA> | 1 |
Length
Max length | 4 |
---|---|
Median length | 2 |
Mean length | 2.0002 |
Min length | 2 |
Unique
Unique | 2 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 지상 |
---|---|
2nd row | 지상 |
3rd row | 지상 |
4th row | 지상 |
5th row | 지상 |
Common Values
Value | Count | Frequency (%) |
지상 | 9316 | 93.2% |
지하 | 682 | 6.8% |
옥탑 | 1 | < 0.1% |
<NA> | 1 | < 0.1% |
Common Values (Plot)
Value | Count | Frequency (%) |
지상 | 9316 | 93.2% |
지하 | 682 | 6.8% |
옥탑 | 1 | < 0.1% |
na | 1 | < 0.1% |
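The 81.9% imbalance figure in the alert for 층_구분_코드 is consistent with one minus the normalised Shannon entropy of the class counts above. The sketch below reproduces that number; treating this as the formula behind the alert is an inference from the figures, and `imbalance_score` is an illustrative name.

```python
import math

# Class counts copied from the "Common Values" table.
counts = {"지상": 9316, "지하": 682, "옥탑": 1, "<NA>": 1}


def imbalance_score(class_counts):
    """Return 1 - H / log(k): 0 for a uniform split, near 1 when one class dominates."""
    class_counts = [c for c in class_counts if c > 0]
    k = len(class_counts)
    if k < 2:
        return 0.0
    total = sum(class_counts)
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts)
    return 1 - entropy / math.log(k)


score = imbalance_score(counts.values())
print(f"Imbalance: {score:.1%}")
```

A perfectly even split across the four classes would score 0 under the same definition.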
층_번호
Real number (ℝ)
ZEROS
 
Distinct | 44 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5.4313 |
Minimum | 0 |
---|---|
Maximum | 49 |
Zeros | 651 |
Zeros (%) | 6.5% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 2 |
median | 4 |
Q3 | 8 |
95-th percentile | 16 |
Maximum | 49 |
Range | 49 |
Interquartile range (IQR) | 6 |
Descriptive statistics
Standard deviation | 5.3727616 |
---|---|
Coefficient of variation (CV) | 0.98922202 |
Kurtosis | 5.5267689 |
Mean | 5.4313 |
Median Absolute Deviation (MAD) | 3 |
Skewness | 1.9371405 |
Sum | 54313 |
Variance | 28.866567 |
Monotonicity | Not monotonic |
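Several entries in the quantile and descriptive tables for 층_번호 are simple functions of the others. A short consistency check using the reported values, with illustrative variable names and the observation count taken from the dataset statistics:

```python
# Figures copied from the 층_번호 tables.
n_observations = 10_000
mean = 5.4313
std = 5.3727616
q1, q3 = 2, 8
minimum, maximum = 0, 49

cv = std / mean               # coefficient of variation
variance = std ** 2           # variance is the squared standard deviation
iqr = q3 - q1                 # interquartile range
value_range = maximum - minimum
total = mean * n_observations # sum of all values

print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.4f}")
print(f"IQR: {iqr}, range: {value_range}, sum: {total:.0f}")
```

These agree with the reported CV (0.98922202), variance (28.866567), IQR (6), range (49), and sum (54313) up to rounding.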
Value | Count | Frequency (%) |
1 | 1745 | |
2 | 1324 | |
3 | 1108 | |
4 | 859 | |
5 | 722 | |
0 | 651 | 6.5% |
6 | 524 | 5.2% |
7 | 468 | 4.7% |
9 | 418 | 4.2% |
8 | 416 | 4.2% |
Other values (34) | 1765 |
Value | Count | Frequency (%) |
0 | 651 | 6.5% |
1 | 1745 | |
2 | 1324 | |
3 | 1108 | |
4 | 859 | |
5 | 722 | |
6 | 524 | 5.2% |
7 | 468 | 4.7% |
8 | 416 | 4.2% |
9 | 418 | 4.2% |
Value | Count | Frequency (%) |
49 | 1 | < 0.1% |
44 | 1 | < 0.1% |
42 | 1 | < 0.1% |
40 | 4 | |
39 | 1 | < 0.1% |
38 | 2 | < 0.1% |
37 | 5 | |
36 | 6 | |
35 | 4 | |
34 | 1 | < 0.1% |
작업_일자
Real number (ℝ)
Distinct | 123 |
---|---|
Distinct (%) | 1.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20225085 |
Minimum | 20201201 |
---|---|
Maximum | 20240327 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 20201201 |
---|---|
5-th percentile | 20211029 |
Q1 | 20211214 |
median | 20231104 |
Q3 | 20231104 |
95-th percentile | 20231124 |
Maximum | 20240327 |
Range | 39126 |
Interquartile range (IQR) | 19890 |
Descriptive statistics
Standard deviation | 8922.0596 |
---|---|
Coefficient of variation (CV) | 0.00044113831 |
Kurtosis | -0.85324708 |
Mean | 20225085 |
Median Absolute Deviation (MAD) | 20 |
Skewness | -0.96863241 |
Sum | 2.0225085 × 10^11 |
Variance | 79603147 |
Monotonicity | Not monotonic |
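The values of 작업_일자 look like dates encoded as YYYYMMDD integers, so the range and IQR above are differences between integers rather than numbers of days. A minimal sketch of the difference, assuming that encoding; `parse_yyyymmdd` is a hypothetical helper:

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20231104 as the date 2023-11-04."""
    return datetime.strptime(str(value), "%Y%m%d").date()


# Quantiles copied from the table above.
minimum, q1, q3, maximum = 20201201, 20211214, 20231104, 20240327

# Differences as the profiler computed them, on raw integers.
numeric_range = maximum - minimum
numeric_iqr = q3 - q1

# The same differences measured in calendar days.
span_days = (parse_yyyymmdd(maximum) - parse_yyyymmdd(minimum)).days
iqr_days = (parse_yyyymmdd(q3) - parse_yyyymmdd(q1)).days

print(numeric_range, numeric_iqr)
print(span_days, iqr_days)
```

Under this reading the data spans 1212 days, with 690 days between Q1 and Q3. The mean, standard deviation, and skewness reported for the raw integers are similarly hard to interpret as date statistics.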
Value | Count | Frequency (%) |
20231104 | 3171 | 31.7% |
20211029 | 2068 | 20.7% |
20231124 | 1092 | 10.9% |
20231110 | 778 | 7.8% |
20230329 | 467 | 4.7% |
20230321 | 415 | 4.2% |
20231028 | 407 | 4.1% |
20230324 | 164 | 1.6% |
20211127 | 106 | 1.1% |
20230411 | 73 | 0.7% |
Other values (113) | 1259 | 12.6% |
Value | Count | Frequency (%) |
20201201 | 11 | 0.1% |
20201204 | 3 | < 0.1% |
20201208 | 6 | 0.1% |
20201216 | 36 | 0.4% |
20201230 | 33 | 0.3% |
20210106 | 8 | 0.1% |
20210108 | 2 | < 0.1% |
20210119 | 60 | 0.6% |
20210126 | 10 | 0.1% |
20210130 | 11 | 0.1% |
Value | Count | Frequency (%) |
20240327 | 1 | < 0.1% |
20231124 | 1092 | 10.9% |
20231110 | 778 | 7.8% |
20231104 | 3171 | 31.7% |
20231028 | 407 | 4.1% |
20230908 | 15 | 0.1% |
20230708 | 2 | < 0.1% |
20230602 | 2 | < 0.1% |
20230527 | 3 | < 0.1% |
20230422 | 35 | 0.4% |
Variable | 층_구분_코드 | 층_번호 | 작업_일자 |
---|---|---|---|
층_구분_코드 | 1.000 | 0.263 | 0.207 |
층_번호 | 0.263 | 1.000 | 0.232 |
작업_일자 | 0.207 | 0.232 | 1.000 |
Variable | 층_번호 | 작업_일자 | 층_구분_코드 |
---|---|---|---|
층_번호 | 1.000 | 0.259 | 0.162 |
작업_일자 | 0.259 | 1.000 | 0.084 |
층_구분_코드 | 0.162 | 0.084 | 1.000 |
Index | 관리_폐쇄말소대장_PK | 동명칭 | 호_명 | 층_구분_코드 | 층_번호 | 작업_일자 |
---|---|---|---|---|---|---|
42648 | 11650-100777865 | <NA> | 102호 | 지상 | 1 | 20211029 |
31041 | 11710-101330516 | 509동 | 1702 | 지상 | 17 | 20211029 |
57546 | 11215-100674522 | 더 라움 펜트하우스 | 2동-501호 | 지상 | 5 | 20231110 |
8555 | 11710-100159170 | 가든파이브라이프 | Y-3034 | 지상 | 3 | 20231104 |
86755 | 11530-1000000000000001491424 | 305동 | 204 | 지상 | 2 | 20230329 |
51294 | 11650-100864310 | 341동 | 304호 | 지상 | 0 | 20220309 |
96569 | 11545-61161 | <NA> | 3층 301호 | 지상 | 3 | 20230104 |
65677 | 11710-101406822 | 1503동 | 803 | 지상 | 8 | 20211029 |
39530 | 11650-100776270 | <NA> | 301호 | 지상 | 3 | 20211029 |
92190 | 11650-97586 | 엘동 | 제1의5호 | 지상 | 1 | 20230321 |
Index | 관리_폐쇄말소대장_PK | 동명칭 | 호_명 | 층_구분_코드 | 층_번호 | 작업_일자 |
---|---|---|---|---|---|---|
56716 | 11590-100403851 | <NA> | 202 | 지상 | 2 | 20211029 |
2250 | 11545-100353173 | <NA> | 1006 | 지상 | 10 | 20231104 |
20695 | 11710-100313448 | 가든파이브라이프 | T-6084 | 지상 | 6 | 20231104 |
71572 | 11440-100735347 | 102동 | 1403 | 지상 | 14 | 20231104 |
99216 | 11650-97329 | 102동 | 306호 | 지상 | 3 | 20230411 |
13520 | 11260-100132900 | <NA> | A(EAST)동603호 | 지상 | 6 | 20231110 |
16608 | 11710-101527957 | 1208동 | 1801 | 지상 | 18 | 20231104 |
66921 | 11650-100827557 | 319동 | 202호 | 지상 | 0 | 20211029 |
4288 | 11545-100310328 | <NA> | 1309 | 지상 | 13 | 20231104 |
44685 | 11650-100779189 | <NA> | 101호 | 지상 | 1 | 20211029 |