Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 10000 |
Missing cells | 3825 |
Missing cells (%) | 9.6% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 390.6 KiB |
Average record size in memory | 40.0 B |
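The two derived figures in this table can be checked from the raw counts. The sketch below assumes the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is total memory divided by the number of rows; the function names are illustrative and not part of the report.

```python
def missing_cell_pct(missing_cells, n_variables, n_observations):
    """Percentage of empty cells among all cells in the table."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def avg_record_size_bytes(total_kib, n_observations):
    """Average bytes per row, given total memory in KiB (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_observations


print(round(missing_cell_pct(3825, 4, 10000), 1))      # 9.6
print(round(avg_record_size_bytes(390.6, 10000), 1))   # 40.0
```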
Variable types
Text | 3 |
---|---|
Categorical | 1 |
Dataset
Description | 관리_건축물대장_PK,동명칭,호_명,층_구분_코드 |
---|---|
Author | 서울특별시 |
URL | https://data.seoul.go.kr/dataList/OA-15393/S/1/datasetView.do |
층_구분_코드 is highly imbalanced (86.5%) | Imbalance |
동명칭 has 3816 (38.2%) missing values | Missing |
관리_건축물대장_PK has unique values | Unique |
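The 86.5% imbalance figure can be reproduced from the category counts reported for 층_구분_코드 (9661, 338, 1). The sketch below assumes the score is one minus the Shannon entropy normalized by the log of the number of classes; that assumption matches the reported value, but the function name is hypothetical and this is not presented as the tool's own code.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/H_max, where H is the Shannon entropy of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    n_classes = len(counts)
    if n_classes <= 1:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)


print(round(100 * imbalance_score([9661, 338, 1]), 1))  # 86.5
```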
Reproduction
Analysis started | 2024-05-18 03:51:22.758348 |
---|---|
Analysis finished | 2024-05-18 03:51:24.850108 |
Duration | 2.09 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
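The reported duration follows directly from the two timestamps. A minimal check with the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-05-18 03:51:22.758348")
finished = datetime.fromisoformat("2024-05-18 03:51:24.850108")

# Elapsed wall-clock time between analysis start and finish.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 2.09 seconds
```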
관리_건축물대장_PK
Text
UNIQUE
Distinct | 10000 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
Length
Max length | 28 |
---|---|
Median length | 11 |
Mean length | 12.8371 |
Min length | 11 |
Characters and Unicode
Total characters | 128371 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique
Unique | 10000 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 11320-20542 |
---|---|
2nd row | 11530-100243033 |
3rd row | 11380-100182917 |
4th row | 11170-75974 |
5th row | 11350-92272 |
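The sample rows suggest a key made of a five-digit prefix, a hyphen, and a numeric serial of varying length. The sketch below splits a key on that assumed shape; the pattern is inferred from the samples only, and `split_pk` is a hypothetical helper name.

```python
import re

# Assumed shape, inferred from the sample rows: 5 digits, "-", then digits.
PK_PATTERN = re.compile(r"(\d{5})-(\d+)")


def split_pk(pk):
    """Split a key into (prefix, serial); raise if it does not fit the assumed shape."""
    match = PK_PATTERN.fullmatch(pk)
    if match is None:
        raise ValueError(f"unexpected key format: {pk!r}")
    return match.group(1), match.group(2)


samples = ["11320-20542", "11530-100243033", "11380-100182917",
           "11170-75974", "11350-92272"]
for pk in samples:
    prefix, serial = split_pk(pk)
    print(prefix, serial, len(pk))
```

Keys that do not fit the pattern raise an error rather than being silently truncated, which makes it easier to spot the longer values implied by the maximum length of 28.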
Value | Count | Frequency (%) |
11320-20542 | 1 | < 0.1% |
11410-91737 | 1 | < 0.1% |
11710-74309 | 1 | < 0.1% |
11440-49148 | 1 | < 0.1% |
11170-43779 | 1 | < 0.1% |
11710-74768 | 1 | < 0.1% |
11350-52486 | 1 | < 0.1% |
11380-111671 | 1 | < 0.1% |
11590-100201159 | 1 | < 0.1% |
11260-100270427 | 1 | < 0.1% |
Other values (9990) | 9990 | 99.9% |
Most occurring characters
Value | Count | Frequency (%) |
1 | 31827 | 24.8% |
0 | 25347 | 19.7% |
- | 10000 | 7.8% |
5 | 9838 | 7.7% |
3 | 9738 | 7.6% |
2 | 9385 | 7.3% |
4 | 6899 | 5.4% |
8 | 6766 | 5.3% |
6 | 6474 | 5.0% |
7 | 6453 | 5.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 118371 | 92.2% |
Dash Punctuation | 10000 | 7.8% |
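The category names in this table correspond to Unicode general categories ("Decimal Number" is `Nd`, "Dash Punctuation" is `Pd`). A small sketch of the same kind of tally using Python's `unicodedata` module, run on two of the sample keys; the helper name is illustrative.

```python
import unicodedata
from collections import Counter

# Long names for the two general categories that occur in this column.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Pd": "Dash Punctuation"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(["11320-20542", "11530-100243033"])
for code, n in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), n)
```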
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 31827 | 26.9% |
0 | 25347 | 21.4% |
5 | 9838 | 8.3% |
3 | 9738 | 8.2% |
2 | 9385 | 7.9% |
4 | 6899 | 5.8% |
8 | 6766 | 5.7% |
6 | 6474 | 5.5% |
7 | 6453 | 5.5% |
9 | 5644 | 4.8% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 10000 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 128371 | 100.0% |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 31827 | 24.8% |
0 | 25347 | 19.7% |
- | 10000 | 7.8% |
5 | 9838 | 7.7% |
3 | 9738 | 7.6% |
2 | 9385 | 7.3% |
4 | 6899 | 5.4% |
8 | 6766 | 5.3% |
6 | 6474 | 5.0% |
7 | 6453 | 5.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 128371 | 100.0% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 31827 | 24.8% |
0 | 25347 | 19.7% |
- | 10000 | 7.8% |
5 | 9838 | 7.7% |
3 | 9738 | 7.6% |
2 | 9385 | 7.3% |
4 | 6899 | 5.4% |
8 | 6766 | 5.3% |
6 | 6474 | 5.0% |
7 | 6453 | 5.0% |
동명칭
Text
MISSING
Distinct | 778 |
---|---|
Distinct (%) | 12.6% |
Missing | 3816 |
Missing (%) | 38.2% |
Memory size | 156.2 KiB |
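The distinct percentage here is not 778 / 10000. The reported 12.6% is consistent with dividing by the number of non-missing values instead, and the same reading reproduces the 18.2% shown for 호_명. A short check under that assumption (the function name is illustrative):

```python
def distinct_pct(n_distinct, n_rows, n_missing):
    """Distinct values as a percentage of non-missing rows."""
    return 100.0 * n_distinct / (n_rows - n_missing)


print(round(distinct_pct(778, 10000, 3816), 1))  # 12.6  (동명칭)
print(round(distinct_pct(1822, 10000, 9), 1))    # 18.2  (호_명)
print(round(100.0 * 3816 / 10000, 1))            # 38.2  (missing share of 동명칭)
```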
Value | Count | Frequency (%) |
101동 | 505 | 7.8% |
102동 | 376 | 5.8% |
103동 | 237 | 3.7% |
104동 | 223 | 3.5% |
105동 | 217 | 3.4% |
106동 | 216 | 3.4% |
108동 | 115 | 1.8% |
110동 | 92 | 1.4% |
109동 | 90 | 1.4% |
203동 | 84 | 1.3% |
Other values (832) | 4279 | 66.5% |
Most occurring characters
Value | Count | Frequency (%) |
동 | 5602 | 22.0% |
1 | 5039 | 19.8% |
0 | 3914 | 15.3% |
2 | 1685 | 6.6% |
3 | 1174 | 4.6% |
4 | 943 | 3.7% |
5 | 729 | 2.9% |
6 | 684 | 2.7% |
8 | 520 | 2.0% |
7 | 367 | 1.4% |
Other values (325) | 4843 | 19.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 15361 | 60.2% |
Other Letter | 9329 | 36.6% |
Uppercase Letter | 407 | 1.6% |
Space Separator | 250 | 1.0% |
Close Punctuation | 43 | 0.2% |
Open Punctuation | 43 | 0.2% |
Lowercase Letter | 32 | 0.1% |
Dash Punctuation | 25 | 0.1% |
Other Punctuation | 8 | < 0.1% |
Letter Number | 2 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
동 | 5602 | 60.0% |
가 | 219 | 2.3% |
빌 | 166 | 1.8% |
상 | 147 | 1.6% |
스 | 129 | 1.4% |
리 | 94 | 1.0% |
아 | 90 | 1.0% |
트 | 89 | 1.0% |
이 | 88 | 0.9% |
주 | 73 | 0.8% |
Other values (280) | 2632 | 28.2% |
Uppercase Letter
Value | Count | Frequency (%) |
A | 76 | 18.7% |
B | 55 | 13.5% |
T | 50 | 12.3% |
S | 25 | 6.1% |
E | 25 | 6.1% |
W | 23 | 5.7% |
V | 20 | 4.9% |
R | 20 | 4.9% |
O | 18 | 4.4% |
I | 15 | 3.7% |
Other values (12) | 80 | 19.7% |
Decimal Number
Value | Count | Frequency (%) |
1 | 5039 | 32.8% |
0 | 3914 | 25.5% |
2 | 1685 | 11.0% |
3 | 1174 | 7.6% |
4 | 943 | 6.1% |
5 | 729 | 4.7% |
6 | 684 | 4.5% |
8 | 520 | 3.4% |
7 | 367 | 2.4% |
9 | 306 | 2.0% |
Lowercase Letter
Value | Count | Frequency (%) |
l | 14 | 43.8% |
e | 6 | 18.8% |
z | 6 | 18.8% |
i | 6 | 18.8% |
Other Punctuation
Value | Count | Frequency (%) |
. | 6 | 75.0% |
& | 1 | 12.5% |
, | 1 | 12.5% |
Letter Number
Value | Count | Frequency (%) |
Ⅴ | 1 | 50.0% |
Ⅱ | 1 | 50.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 250 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 43 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 43 | 100.0% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 25 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 15730 | 61.7% |
Hangul | 9329 | 36.6% |
Latin | 441 | 1.7% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
동 | 5602 | 60.0% |
가 | 219 | 2.3% |
빌 | 166 | 1.8% |
상 | 147 | 1.6% |
스 | 129 | 1.4% |
리 | 94 | 1.0% |
아 | 90 | 1.0% |
트 | 89 | 1.0% |
이 | 88 | 0.9% |
주 | 73 | 0.8% |
Other values (280) | 2632 | 28.2% |
Latin
Value | Count | Frequency (%) |
A | 76 | 17.2% |
B | 55 | 12.5% |
T | 50 | 11.3% |
S | 25 | 5.7% |
E | 25 | 5.7% |
W | 23 | 5.2% |
V | 20 | 4.5% |
R | 20 | 4.5% |
O | 18 | 4.1% |
I | 15 | 3.4% |
Other values (18) | 114 | 25.9% |
Common
Value | Count | Frequency (%) |
1 | 5039 | 32.0% |
0 | 3914 | 24.9% |
2 | 1685 | 10.7% |
3 | 1174 | 7.5% |
4 | 943 | 6.0% |
5 | 729 | 4.6% |
6 | 684 | 4.3% |
8 | 520 | 3.3% |
7 | 367 | 2.3% |
9 | 306 | 1.9% |
Other values (7) | 369 | 2.3% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 16169 | 63.4% |
Hangul | 9329 | 36.6% |
Number Forms | 2 | < 0.1% |
Most frequent character per block
Hangul
Value | Count | Frequency (%) |
동 | 5602 | 60.0% |
가 | 219 | 2.3% |
빌 | 166 | 1.8% |
상 | 147 | 1.6% |
스 | 129 | 1.4% |
리 | 94 | 1.0% |
아 | 90 | 1.0% |
트 | 89 | 1.0% |
이 | 88 | 0.9% |
주 | 73 | 0.8% |
Other values (280) | 2632 | 28.2% |
ASCII
Value | Count | Frequency (%) |
1 | 5039 | 31.2% |
0 | 3914 | 24.2% |
2 | 1685 | 10.4% |
3 | 1174 | 7.3% |
4 | 943 | 5.8% |
5 | 729 | 4.5% |
6 | 684 | 4.2% |
8 | 520 | 3.2% |
7 | 367 | 2.3% |
9 | 306 | 1.9% |
Other values (33) | 808 | 5.0% |
Number Forms
Value | Count | Frequency (%) |
Ⅴ | 1 | 50.0% |
Ⅱ | 1 | 50.0% |
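The two Number Forms characters (Ⅴ and Ⅱ) are single-code-point Roman numerals, which will not match names typed with plain Latin letters. One possible cleanup, shown here as a sketch and not as something the report itself does, is Unicode NFKC normalization; `normalize_name` is a hypothetical helper.

```python
import unicodedata


def normalize_name(name):
    """Apply NFKC compatibility normalization and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", name).strip()


print(unicodedata.category("Ⅴ"))   # Nl  (Letter Number)
print(normalize_name("Ⅴ"))         # V
print(normalize_name("Ⅱ"))         # II
```

Precomposed Hangul syllables such as 동 are unchanged by NFKC, so ordinary building names pass through as they are.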
호_명
Text
Distinct | 1822 |
---|---|
Distinct (%) | 18.2% |
Missing | 9 |
Missing (%) | 0.1% |
Memory size | 156.2 KiB |
Value | Count | Frequency (%) |
301 | 211 | 2.1% |
401 | 189 | 1.9% |
201 | 187 | 1.9% |
202 | 166 | 1.7% |
302 | 159 | 1.6% |
402 | 155 | 1.5% |
501 | 148 | 1.5% |
201호 | 136 | 1.4% |
101 | 130 | 1.3% |
301호 | 116 | 1.2% |
Other values (1779) | 8456 | 84.1% |
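The table lists both bare numbers (301, 201) and the same numbers with the 호 suffix (301호, 201호) as separate values. If these denote the same unit, which is an assumption the report does not confirm, they could be merged before counting. A minimal sketch using counts from the table above; `normalize_unit` is a hypothetical helper.

```python
from collections import Counter


def normalize_unit(label):
    """Drop a trailing '호' only when the rest of the label is purely numeric."""
    label = label.strip()
    if label.endswith("호") and label[:-1].isdigit():
        return label[:-1]
    return label


# Counts copied from the value table; other rows omitted for brevity.
reported = {"301": 211, "401": 189, "201": 187, "201호": 136, "301호": 116}

merged = Counter()
for label, count in reported.items():
    merged[normalize_unit(label)] += count

print(merged["301"], merged["201"])  # 327 323
```

Labels with extra structure, such as 3층1호 or 아-502 from the sample rows, are left untouched because the part before 호 is not purely numeric.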
Most occurring characters
Value | Count | Frequency (%) |
0 | 9331 | 23.4% |
1 | 7517 | 18.9% |
호 | 4909 | 12.3% |
2 | 4449 | 11.2% |
3 | 3027 | 7.6% |
4 | 2400 | 6.0% |
5 | 1902 | 4.8% |
6 | 1371 | 3.4% |
7 | 1147 | 2.9% |
8 | 868 | 2.2% |
Other values (70) | 2931 | 7.4% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 32842 | 82.4% |
Other Letter | 6192 | 15.5% |
Uppercase Letter | 381 | 1.0% |
Dash Punctuation | 324 | 0.8% |
Space Separator | 62 | 0.2% |
Open Punctuation | 18 | < 0.1% |
Close Punctuation | 18 | < 0.1% |
Connector Punctuation | 8 | < 0.1% |
Other Punctuation | 6 | < 0.1% |
Lowercase Letter | 1 | < 0.1% |
Most frequent character per category
Other Letter
Value | Count | Frequency (%) |
호 | 4909 | 79.3% |
층 | 611 | 9.9% |
지 | 188 | 3.0% |
동 | 107 | 1.7% |
하 | 53 | 0.9% |
아 | 36 | 0.6% |
오 | 32 | 0.5% |
비 | 26 | 0.4% |
상 | 24 | 0.4% |
가 | 24 | 0.4% |
Other values (36) | 182 | 2.9% |
Uppercase Letter
Value | Count | Frequency (%) |
B | 188 | 49.3% |
A | 80 | 21.0% |
S | 20 | 5.2% |
E | 19 | 5.0% |
T | 17 | 4.5% |
W | 12 | 3.1% |
C | 11 | 2.9% |
F | 10 | 2.6% |
O | 9 | 2.4% |
G | 4 | 1.0% |
Other values (5) | 11 | 2.9% |
Decimal Number
Value | Count | Frequency (%) |
0 | 9331 | 28.4% |
1 | 7517 | 22.9% |
2 | 4449 | 13.5% |
3 | 3027 | 9.2% |
4 | 2400 | 7.3% |
5 | 1902 | 5.8% |
6 | 1371 | 4.2% |
7 | 1147 | 3.5% |
8 | 868 | 2.6% |
9 | 830 | 2.5% |
Other Punctuation
Value | Count | Frequency (%) |
. | 4 | 66.7% |
: | 1 | 16.7% |
, | 1 | 16.7% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 324 | 100.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 62 | 100.0% |
Open Punctuation
Value | Count | Frequency (%) |
( | 18 | 100.0% |
Close Punctuation
Value | Count | Frequency (%) |
) | 18 | 100.0% |
Connector Punctuation
Value | Count | Frequency (%) |
_ | 8 | 100.0% |
Lowercase Letter
Value | Count | Frequency (%) |
b | 1 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 33278 | 83.5% |
Hangul | 6192 | 15.5% |
Latin | 382 | 1.0% |
Most frequent character per script
Hangul
Value | Count | Frequency (%) |
호 | 4909 | 79.3% |
층 | 611 | 9.9% |
지 | 188 | 3.0% |
동 | 107 | 1.7% |
하 | 53 | 0.9% |
아 | 36 | 0.6% |
오 | 32 | 0.5% |
비 | 26 | 0.4% |
상 | 24 | 0.4% |
가 | 24 | 0.4% |
Other values (36) | 182 | 2.9% |
Common
Value | Count | Frequency (%) |
0 | 9331 | 28.0% |
1 | 7517 | 22.6% |
2 | 4449 | 13.4% |
3 | 3027 | 9.1% |
4 | 2400 | 7.2% |
5 | 1902 | 5.7% |
6 | 1371 | 4.1% |
7 | 1147 | 3.4% |
8 | 868 | 2.6% |
9 | 830 | 2.5% |
Other values (8) | 436 | 1.3% |
Latin
Value | Count | Frequency (%) |
B | 188 | 49.2% |
A | 80 | 20.9% |
S | 20 | 5.2% |
E | 19 | 5.0% |
T | 17 | 4.5% |
W | 12 | 3.1% |
C | 11 | 2.9% |
F | 10 | 2.6% |
O | 9 | 2.4% |
G | 4 | 1.0% |
Other values (6) | 12 | 3.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 33660 | 84.5% |
Hangul | 6192 | 15.5% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 9331 | 27.7% |
1 | 7517 | 22.3% |
2 | 4449 | 13.2% |
3 | 3027 | 9.0% |
4 | 2400 | 7.1% |
5 | 1902 | 5.7% |
6 | 1371 | 4.1% |
7 | 1147 | 3.4% |
8 | 868 | 2.6% |
9 | 830 | 2.5% |
Other values (24) | 818 | 2.4% |
Hangul
Value | Count | Frequency (%) |
호 | 4909 | 79.3% |
층 | 611 | 9.9% |
지 | 188 | 3.0% |
동 | 107 | 1.7% |
하 | 53 | 0.9% |
아 | 36 | 0.6% |
오 | 32 | 0.5% |
비 | 26 | 0.4% |
상 | 24 | 0.4% |
가 | 24 | 0.4% |
Other values (36) | 182 | 2.9% |
층_구분_코드
Categorical
IMBALANCE
Distinct | 3 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
지상 | 9661 |
---|---|
지하 | 338 |
옥탑 | 1 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | < 0.1% |
Sample
1st row | 지상 |
---|---|
2nd row | 지상 |
3rd row | 지상 |
4th row | 지상 |
5th row | 지상 |
Common Values
Value | Count | Frequency (%) |
지상 | 9661 | 96.6% |
지하 | 338 | 3.4% |
옥탑 | 1 | < 0.1% |
Index | 관리_건축물대장_PK | 동명칭 | 호_명 | 층_구분_코드 |
---|---|---|---|---|
32179 | 11320-20542 | 111동 | 605호 | 지상 |
73807 | 11530-100243033 | 804동 | 1004 | 지상 |
61660 | 11380-100182917 | 811동 | 708 | 지상 |
69431 | 11170-75974 | <NA> | 102호 | 지상 |
39957 | 11350-92272 | 104동 | 809호 | 지상 |
9880 | 11590-95572 | 204동 | 606 | 지상 |
42331 | 11590-100219793 | 에이동 | 202 | 지상 |
69308 | 11170-79517 | (2단지) | 202-2705 | 지상 |
54539 | 11350-91206 | 10동 | 507호 | 지상 |
39483 | 11440-58822 | <NA> | 아-502 | 지상 |
Index | 관리_건축물대장_PK | 동명칭 | 호_명 | 층_구분_코드 |
---|---|---|---|---|
88217 | 11380-100200182 | 332동 | 1002 | 지상 |
32551 | 11470-89216 | <NA> | 402호 | 지상 |
29091 | 11350-95910 | 203동 | 401호 | 지상 |
59883 | 11170-64086 | <NA> | 209호 | 지상 |
21739 | 11710-152434 | 상가 | 3층1호 | 지상 |
67417 | 11470-113238 | 106동 | 309호 | 지상 |
67990 | 11230-100256270 | <NA> | 604 | 지상 |
80807 | 11230-100181318 | <NA> | 501 | 지상 |
75080 | 11380-100199474 | 317동 | 605 | 지상 |
65976 | 11560-69081 | <NA> | 1층마-8호 | 지상 |