Overview

Dataset statistics

Number of variables: 9
Number of observations: 38
Missing cells: 162
Missing cells (%): 47.4%
Duplicate rows: 1
Duplicate rows (%): 2.6%
Total size in memory: 2.8 KiB
Average record size in memory: 76.5 B
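The percentages follow directly from the raw counts. Missing cells are measured against all 9 × 38 = 342 cells, and duplicate rows against the 38 observations. The total memory size is roughly the average record size times the number of records. The Python check below uses only the figures shown above. Because the report rounds its values, the last step agrees only to one decimal place.

```python
# Raw counts copied from the "Dataset statistics" block.
n_variables = 9
n_observations = 38
missing_cells = 162
duplicate_rows = 1
avg_record_bytes = 76.5

total_cells = n_variables * n_observations             # every cell in the table
missing_pct = 100 * missing_cells / total_cells        # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations  # share of duplicated rows
total_kib = avg_record_bytes * n_observations / 1024   # approximate total size

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```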

Variable types

Unsupported: 3
Text: 5
Categorical: 1

Dataset

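Description: Building information created so that a road-name address (도로명주소) can be assigned before the building receives its use approval
Author: Ministry of the Interior and Safety (행정안전부)
URL: https://www.vworld.kr/dtmk/dtmk_ntads_s002.do?dsId=30056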

Alerts

Unnamed: 8 has constant value "참조테이블명/비고" [Constant]
Dataset has 1 (2.6%) duplicate row [Duplicates]
테이블정의서 ("table definition document") has 1 (2.6%) missing value [Missing]
Unnamed: 1 has 6 (15.8%) missing values [Missing]
Unnamed: 2 has 5 (13.2%) missing values [Missing]
Unnamed: 4 has 6 (15.8%) missing values [Missing]
Unnamed: 5 has 38 (100.0%) missing values [Missing]
Unnamed: 6 has 33 (86.8%) missing values [Missing]
Unnamed: 7 has 36 (94.7%) missing values [Missing]
Unnamed: 8 has 37 (97.4%) missing values [Missing]
테이블정의서 has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 4 has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
Unnamed: 5 has an unsupported type; check whether it needs cleaning or further analysis [Unsupported]

Reproduction

Analysis started: 2024-04-17 03:36:06.328001
Analysis finished: 2024-04-17 03:36:06.818149
Duration: 0.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

테이블정의서
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 1
Missing (%): 2.6%
Memory size: 436.0 B

Unnamed: 1
Text

MISSING 

Distinct: 32
Distinct (%): 100.0%
Missing: 6
Missing (%): 15.8%
Memory size: 436.0 B

Length

Max length: 15
Median length: 11
Mean length: 10.09375
Min length: 4

Characters and Unicode

Total characters: 323
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
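The category counts can be reproduced with Python's standard `unicodedata.category` function. It returns the two-letter Unicode general category of a character. The sketch below tallies categories over the five sample rows listed under Sample for this variable. It illustrates the idea and is not ydata-profiling's own implementation. The `CATEGORY_NAMES` table is a hand-written mapping that covers only the categories appearing in this report.

```python
import unicodedata
from collections import Counter

# Hand-written names for the general-category codes seen in this report (not exhaustive).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Pc": "Connector Punctuation",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count the Unicode general category of every character in every value."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Lu" for "B", "Lo" for a Hangul syllable
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# First five sample rows of "Unnamed: 1", copied from the Sample list.
sample = ["컬럼ID", "XGEOMETRY", "BUL_MAN_NO (PK)", "SIG_CD (PK)", "RN_CD"]
for name, n in category_counts(sample).most_common():
    print(f"{name}: {n}")
```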

Unique

Unique: 32
Unique (%): 100.0%

Sample

1st row: 컬럼ID (column ID)
2nd row: XGEOMETRY
3rd row: BUL_MAN_NO (PK)
4th row: SIG_CD (PK)
5th row: RN_CD
Value | Count | Frequency (%)
pk | 2 | 5.9%
컬럼id | 1 | 2.9%
mvmn_resn | 1 | 2.9%
li_cd | 1 | 2.9%
mntn_yn | 1 | 2.9%
lnbr_mnnm | 1 | 2.9%
lnbr_slno | 1 | 2.9%
ntfc_de | 1 | 2.9%
mvm_res_cd | 1 | 2.9%
mvmn_de | 1 | 2.9%
Other values (23) | 23 | 67.6%
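The word table appears to be built by lower-casing each value, splitting it on whitespace, and trimming leading and trailing punctuation. Under that reading, "BUL_MAN_NO (PK)" contributes both "bul_man_no" and "pk", and internal underscores survive. The frequencies are consistent with this: the 32 values yield 34 words, so pk's 2 occurrences are 2/34 = 5.9%. The sketch below applies that tokenization to the five sample rows. It is an assumption inferred from the output, not code taken from the library source.

```python
import string
from collections import Counter


def word_counts(values):
    """Lower-case, split on whitespace, and trim punctuation at both ends of each word."""
    counts = Counter()
    for value in values:
        for word in value.lower().split():
            # "_" is in string.punctuation, but strip() only removes leading/trailing
            # characters, so "bul_man_no" keeps its inner underscores while "(pk)" becomes "pk".
            word = word.strip(string.punctuation + string.whitespace)
            if word:
                counts[word] += 1
    return counts


sample = ["컬럼ID", "XGEOMETRY", "BUL_MAN_NO (PK)", "SIG_CD (PK)", "RN_CD"]
counts = word_counts(sample)
total = sum(counts.values())
for word, n in counts.most_common():
    print(f"{word} {n} {100 * n / total:.1f}%")
```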

Most occurring characters

Value | Count | Frequency (%)
_ | 45 | 13.9%
(space) | 42 | 13.0%
N | 34 | 10.5%
D | 25 | 7.7%
M | 21 | 6.5%
L | 16 | 5.0%
S | 16 | 5.0%
B | 16 | 5.0%
O | 14 | 4.3%
E | 13 | 4.0%
Other values (19) | 81 | 25.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 230 | 71.2%
Connector Punctuation | 45 | 13.9%
Space Separator | 42 | 13.0%
Open Punctuation | 2 | 0.6%
Close Punctuation | 2 | 0.6%
Other Letter | 2 | 0.6%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 34 | 14.8%
D | 25 | 10.9%
M | 21 | 9.1%
L | 16 | 7.0%
S | 16 | 7.0%
B | 16 | 7.0%
O | 14 | 6.1%
E | 13 | 5.7%
C | 12 | 5.2%
U | 10 | 4.3%
Other values (13) | 53 | 23.0%
Other Letter
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 45 | 100.0%
Space Separator
Value | Count | Frequency (%)
(space) | 42 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 2 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 230 | 71.2%
Common | 91 | 28.2%
Hangul | 2 | 0.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 34 | 14.8%
D | 25 | 10.9%
M | 21 | 9.1%
L | 16 | 7.0%
S | 16 | 7.0%
B | 16 | 7.0%
O | 14 | 6.1%
E | 13 | 5.7%
C | 12 | 5.2%
U | 10 | 4.3%
Other values (13) | 53 | 23.0%
Common
Value | Count | Frequency (%)
_ | 45 | 49.5%
(space) | 42 | 46.2%
( | 2 | 2.2%
) | 2 | 2.2%
Hangul
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 321 | 99.4%
Hangul | 2 | 0.6%
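Block membership depends only on a character's code point range. The sketch below is deliberately simplified and recognises just the two blocks in this report: Basic Latin, shown here as ASCII (U+0000 to U+007F), and Hangul Syllables (U+AC00 to U+D7AF). Applied to the five sample rows of this variable, it separates the Latin identifiers from the two syllables of 컬럼ID. It is not how ydata-profiling itself looks up blocks.

```python
from collections import Counter


def block_of(ch):
    """Return the Unicode block for the two ranges seen in this report, else "Other"."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"  # Basic Latin: U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables: U+AC00..U+D7AF
    return "Other"  # any other block is outside this simplified sketch


sample = ["컬럼ID", "XGEOMETRY", "BUL_MAN_NO (PK)", "SIG_CD (PK)", "RN_CD"]
blocks = Counter(block_of(ch) for value in sample for ch in value)
print(dict(blocks))
```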

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 45 | 14.0%
(space) | 42 | 13.1%
N | 34 | 10.6%
D | 25 | 7.8%
M | 21 | 6.5%
L | 16 | 5.0%
S | 16 | 5.0%
B | 16 | 5.0%
O | 14 | 4.4%
E | 13 | 4.0%
Other values (17) | 79 | 24.6%
Hangul
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Unnamed: 2
Text

MISSING 

Distinct: 32
Distinct (%): 97.0%
Missing: 5
Missing (%): 13.2%
Memory size: 436.0 B

Length

Max length: 9
Median length: 8
Mean length: 5.2121212
Min length: 3

Characters and Unicode

Total characters: 172
Distinct characters: 59
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 31
Unique (%): 93.9%
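Distinct and Unique measure different things. Distinct counts how many different values occur, while Unique counts values that occur exactly once. For this column, the 33 non-missing values include 이동사유코드 twice. That gives 32 distinct values (32/33 = 97.0%) but only 31 unique ones (31/33 = 93.9%). The sketch below reproduces those figures. The `value_i` strings are hypothetical placeholders for the column names not listed in this report.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) over the non-missing values."""
    counts = Counter(v for v in values if v is not None)  # None marks a missing cell
    n = sum(counts.values())
    distinct = len(counts)  # how many different values occur
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n


# Hypothetical stand-in for "Unnamed: 2": 31 one-off placeholder names,
# 이동사유코드 twice, and 5 missing cells (38 rows in total).
values = [f"value_{i}" for i in range(31)] + ["이동사유코드", "이동사유코드"] + [None] * 5
distinct, distinct_pct, unique, unique_pct = distinct_and_unique(values)
print(f"Distinct: {distinct} ({distinct_pct:.1f}%), Unique: {unique} ({unique_pct:.1f}%)")
```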

Sample

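1st row: 컬럼명 (column name)
2nd row: 공간이미지정보 (spatial image information)
3rd row: 건물일련번호 (building serial number)
4th row: 시군구코드 (municipality code)
5th row: 도로명코드 (road name code)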
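Value | Count | Frequency (%)
이동사유코드 (movement reason code) | 2 | 6.1%
공간이미지정보 (spatial image information) | 1 | 3.0%
건물명 (building name) | 1 | 3.0%
도로구간시군구코드 (road section municipality code) | 1 | 3.0%
지하층수 (number of underground floors) | 1 | 3.0%
지상층수 (number of above-ground floors) | 1 | 3.0%
건물관리번호 (building management number) | 1 | 3.0%
기초구역번호 (basic zone number) | 1 | 3.0%
작업일시 (work date and time) | 1 | 3.0%
이동일자 (movement date) | 1 | 3.0%
Other values (22) | 22 | 66.7%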

Most occurring characters

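Value | Count | Frequency (%)
(character lost) | 13 | 7.6%
(character lost) | 13 | 7.6%
(character lost) | 12 | 7.0%
(character lost) | 9 | 5.2%
(character lost) | 9 | 5.2%
(character lost) | 8 | 4.7%
(character lost) | 7 | 4.1%
(character lost) | 6 | 3.5%
(character lost) | 6 | 3.5%
(character lost) | 5 | 2.9%
Other values (49) | 84 | 48.8%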

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 172 | 100.0%

Most frequent character per category

Other Letter
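Value | Count | Frequency (%)
(character lost) | 13 | 7.6%
(character lost) | 13 | 7.6%
(character lost) | 12 | 7.0%
(character lost) | 9 | 5.2%
(character lost) | 9 | 5.2%
(character lost) | 8 | 4.7%
(character lost) | 7 | 4.1%
(character lost) | 6 | 3.5%
(character lost) | 6 | 3.5%
(character lost) | 5 | 2.9%
Other values (49) | 84 | 48.8%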

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 172 | 100.0%

Most frequent character per script

Hangul
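Value | Count | Frequency (%)
(character lost) | 13 | 7.6%
(character lost) | 13 | 7.6%
(character lost) | 12 | 7.0%
(character lost) | 9 | 5.2%
(character lost) | 9 | 5.2%
(character lost) | 8 | 4.7%
(character lost) | 7 | 4.1%
(character lost) | 6 | 3.5%
(character lost) | 6 | 3.5%
(character lost) | 5 | 2.9%
Other values (49) | 84 | 48.8%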

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 172 | 100.0%

Most frequent character per block

Hangul
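Value | Count | Frequency (%)
(character lost) | 13 | 7.6%
(character lost) | 13 | 7.6%
(character lost) | 12 | 7.0%
(character lost) | 9 | 5.2%
(character lost) | 9 | 5.2%
(character lost) | 8 | 4.7%
(character lost) | 7 | 4.1%
(character lost) | 6 | 3.5%
(character lost) | 6 | 3.5%
(character lost) | 5 | 2.9%
Other values (49) | 84 | 48.8%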

Unnamed: 3
Categorical

Distinct: 5
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Memory size: 436.0 B

Length

Max length: 8
Median length: 7
Mean length: 6.5526316
Min length: 2

Unique

Unique: 2
Unique (%): 5.3%

Sample

1st row: <NA>
2nd row: 테이블ID (table ID)
3rd row: <NA>
4th row: 타입 (type)
5th row: <NA>

Common Values

Value | Count | Frequency (%)
VARCHAR2 | 19 | 50.0%
NUMBER | 11 | 28.9%
<NA> | 6 | 15.8%
테이블ID | 1 | 2.6%
타입 | 1 | 2.6%
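The length statistics can be recomputed from the category counts alone. The mean of 6.5526316 (249/38) is reproduced only if each <NA> entry is counted as the 4-character string "<NA>". That suggests those cells were read as text rather than as missing values, which would explain why Missing reports 0 despite six <NA> rows. The check below uses Python's `statistics` module.

```python
import statistics

# Category counts of "Unnamed: 3", copied from its Common Values table.
counts = {"VARCHAR2": 19, "NUMBER": 11, "<NA>": 6, "테이블ID": 1, "타입": 1}

# One length per row: each category contributes its string length n times.
# "<NA>" is treated as the 4-character text it displays as.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", round(statistics.mean(lengths), 7))
print("Min length:", min(lengths))
```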

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
varchar2 | 19 | 50.0%
number | 11 | 28.9%
na | 6 | 15.8%
테이블id | 1 | 2.6%
타입 | 1 | 2.6%

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 6
Missing (%): 15.8%
Memory size: 436.0 B

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 38
Missing (%): 100.0%
Memory size: 474.0 B

Unnamed: 6
Text

MISSING 

Distinct: 4
Distinct (%): 80.0%
Missing: 33
Missing (%): 86.8%
Memory size: 436.0 B

Length

Max length: 5
Median length: 4 (the five values have lengths 2, 2, 3, 4 and 5, whose plain median is 3)
Mean length: 3.2
Min length: 2

Characters and Unicode

Total characters: 16
Distinct characters: 11
Distinct categories: 3
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 3
Unique (%): 60.0%

Sample

1st row: 작성일 (date written)
2nd row: 테이블명 (table name)
3rd row: PK/FK
4th row: PK
5th row: PK
Value | Count | Frequency (%)
pk | 2 | 40.0%
작성일 | 1 | 20.0%
테이블명 | 1 | 20.0%
pk/fk | 1 | 20.0%

Most occurring characters

Value | Count | Frequency (%)
K | 4 | 25.0%
P | 3 | 18.8%
작 | 1 | 6.2%
성 | 1 | 6.2%
일 | 1 | 6.2%
테 | 1 | 6.2%
이 | 1 | 6.2%
블 | 1 | 6.2%
명 | 1 | 6.2%
/ | 1 | 6.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8 | 50.0%
Other Letter | 7 | 43.8%
Other Punctuation | 1 | 6.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%
Uppercase Letter
Value | Count | Frequency (%)
K | 4 | 50.0%
P | 3 | 37.5%
F | 1 | 12.5%
Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8 | 50.0%
Hangul | 7 | 43.8%
Common | 1 | 6.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%
Latin
Value | Count | Frequency (%)
K | 4 | 50.0%
P | 3 | 37.5%
F | 1 | 12.5%
Common
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9 | 56.2%
Hangul | 7 | 43.8%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
K | 4 | 44.4%
P | 3 | 33.3%
/ | 1 | 11.1%
F | 1 | 11.1%
Hangul
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%

Unnamed: 7
Text

MISSING 

Distinct: 2
Distinct (%): 100.0%
Missing: 36
Missing (%): 94.7%
Memory size: 436.0 B

Length

Max length: 7
Median length: 4.5
Mean length: 4.5
Min length: 2

Characters and Unicode

Total characters: 9
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 2
Unique (%): 100.0%

Sample

1st row: 건물 (building)
2nd row: Default
Value | Count | Frequency (%)
건물 | 1 | 50.0%
default | 1 | 50.0%

Most occurring characters

Value | Count | Frequency (%)
건 | 1 | 11.1%
물 | 1 | 11.1%
D | 1 | 11.1%
e | 1 | 11.1%
f | 1 | 11.1%
a | 1 | 11.1%
u | 1 | 11.1%
l | 1 | 11.1%
t | 1 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 6 | 66.7%
Other Letter | 2 | 22.2%
Uppercase Letter | 1 | 11.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1 | 16.7%
f | 1 | 16.7%
a | 1 | 16.7%
u | 1 | 16.7%
l | 1 | 16.7%
t | 1 | 16.7%
Other Letter
Value | Count | Frequency (%)
건 | 1 | 50.0%
물 | 1 | 50.0%
Uppercase Letter
Value | Count | Frequency (%)
D | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 7 | 77.8%
Hangul | 2 | 22.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
D | 1 | 14.3%
e | 1 | 14.3%
f | 1 | 14.3%
a | 1 | 14.3%
u | 1 | 14.3%
l | 1 | 14.3%
t | 1 | 14.3%
Hangul
Value | Count | Frequency (%)
건 | 1 | 50.0%
물 | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 7 | 77.8%
Hangul | 2 | 22.2%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
건 | 1 | 50.0%
물 | 1 | 50.0%
ASCII
Value | Count | Frequency (%)
D | 1 | 14.3%
e | 1 | 14.3%
f | 1 | 14.3%
a | 1 | 14.3%
u | 1 | 14.3%
l | 1 | 14.3%
t | 1 | 14.3%

Unnamed: 8
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 37
Missing (%): 97.4%
Memory size: 436.0 B

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 9
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: 참조테이블명/비고 (referenced table name / remarks)
Value | Count | Frequency (%)
참조테이블명/비고 | 1 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
참 | 1 | 11.1%
조 | 1 | 11.1%
테 | 1 | 11.1%
이 | 1 | 11.1%
블 | 1 | 11.1%
명 | 1 | 11.1%
/ | 1 | 11.1%
비 | 1 | 11.1%
고 | 1 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8 | 88.9%
Other Punctuation | 1 | 11.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%
Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
Common | 1 | 11.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%
Common
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
ASCII | 1 | 11.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%
ASCII
Value | Count | Frequency (%)
/ | 1 | 100.0%

Correlations

Variable | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 6 | Unnamed: 7
Unnamed: 1 | 1.000 | 1.000 | 1.000 | 1.000 | NaN
Unnamed: 2 | 1.000 | 1.000 | 1.000 | 1.000 | NaN
Unnamed: 3 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000
Unnamed: 6 | 1.000 | 1.000 | 1.000 | 1.000 | 0.000
Unnamed: 7 | NaN | NaN | 0.000 | 0.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
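Nullity correlation is commonly computed (for example, by the missingno library) as the Pearson correlation between columns' missing-value indicators, where 1 marks an empty cell. As a worked example, the sketch below uses the empty rows of Unnamed: 1 and Unnamed: 2, which can be read off the Sample section. Rows 10 to 27 are not shown, but the reported missing counts of 6 and 5 are fully accounted for by the visible rows. The `pearson` helper is written out by hand to stay within the standard library, and this may not match exactly how ydata-profiling builds its heatmap.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


N_ROWS = 38
# Rows in which each column is empty, read off the Sample section.
missing_unnamed_1 = {0, 1, 2, 35, 36, 37}
missing_unnamed_2 = {0, 1, 2, 36, 37}

# Nullity indicators: 1 where the cell is missing, 0 where it holds a value.
x = [1 if i in missing_unnamed_1 else 0 for i in range(N_ROWS)]
y = [1 if i in missing_unnamed_2 else 0 for i in range(N_ROWS)]
print(f"Nullity correlation: {pearson(x, y):.3f}")
```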

Sample

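Row | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 작성자 | <NA> | <NA> | <NA> | NaN | <NA> | 작성일 | <NA> | <NA>
1 | 주제영역명 | <NA> | <NA> | 테이블ID | Z_KAIS_TL_SPBD_BULD | <NA> | 테이블명 | 건물 | <NA>
2 | 테이블설명 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
3 | No | 컬럼ID | 컬럼명 | 타입 | 길이(Byte) | <NA> | PK/FK | Default | 참조테이블명/비고
4 | 1 | XGEOMETRY | 공간이미지정보 | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
5 | 2 | BUL_MAN_NO (PK) | 건물일련번호 | NUMBER | 7 | <NA> | PK | <NA> | <NA>
6 | 3 | SIG_CD (PK) | 시군구코드 | VARCHAR2 | 5 | <NA> | PK | <NA> | <NA>
7 | 4 | RN_CD | 도로명코드 | VARCHAR2 | 7 | <NA> | <NA> | <NA> | <NA>
8 | 5 | RDS_MAN_NO | 도로구간일련번호 | NUMBER | 12 | <NA> | <NA> | <NA> | <NA>
9 | 6 | BSI_INT_SN | 기초구간일련번호 | NUMBER | 10 | <NA> | <NA> | <NA> | <NA>

Row | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
28 | 25 | MVMN_DE | 이동일자 | VARCHAR2 | 8 | <NA> | <NA> | <NA> | <NA>
29 | 26 | OPERT_DE | 작업일시 | VARCHAR2 | 14 | <NA> | <NA> | <NA> | <NA>
30 | 27 | BSI_ZON_NO | 기초구역번호 | NUMBER | 5 | <NA> | <NA> | <NA> | <NA>
31 | 28 | BD_MGT_SN | 건물관리번호 | VARCHAR2 | 25 | <NA> | <NA> | <NA> | <NA>
32 | 29 | GRO_FLO_CO | 지상층수 | NUMBER | 3 | <NA> | <NA> | <NA> | <NA>
33 | 30 | UND_FLO_CO | 지하층수 | NUMBER | 3 | <NA> | <NA> | <NA> | <NA>
34 | 31 | RDS_SIG_CD | 도로구간시군구코드 | VARCHAR2 | 5 | <NA> | <NA> | <NA> | <NA>
35 | 인덱스명 | <NA> | 인덱스키 | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
36 | NaN | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>
37 | 업무규칙 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | <NA> | <NA>

Row 3 holds the spreadsheet's own column headers: No, 컬럼ID (column ID), 컬럼명 (column name), 타입 (type), 길이(Byte) (length in bytes), PK/FK, Default and 참조테이블명/비고 (referenced table name / remarks). Rows 0 to 2 are document metadata: 작성자 (author), 주제영역명 (subject area name), 테이블ID (table ID), 테이블명 (table name) 건물 (building), 테이블설명 (table description) and 작성일 (date written). Rows 35 and 37 hold 인덱스명 (index name), 인덱스키 (index key) and 업무규칙 (business rules). Other column names shown include 도로구간일련번호 (road section serial number), 기초구간일련번호 (basic section serial number), 이동일자 (movement date), 작업일시 (work date and time), 기초구역번호 (basic zone number), 건물관리번호 (building management number), 지상층수 and 지하층수 (number of above-ground and underground floors) and 도로구간시군구코드 (road section municipality code).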
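The sample shows that this sheet is a table definition rather than a flat table. Rows 0 to 2 carry metadata, row 3 holds the real headers, and each later row with a numeric No describes one column of the building table. The sketch below uses a hypothetical helper, `column_definitions`, to turn such rows into records. It keeps only rows whose first cell is a number and moves the "(PK)" suffix into a flag. It works on the first five cells of a few rows copied from the sample, with None standing in for <NA> and NaN.

```python
# First five cells (No, column ID, column name, type, length in bytes) of a few
# rows copied from the Sample; None stands in for <NA> and NaN.
rows = [
    ("No", "컬럼ID", "컬럼명", "타입", "길이(Byte)"),
    ("1", "XGEOMETRY", "공간이미지정보", None, None),
    ("2", "BUL_MAN_NO (PK)", "건물일련번호", "NUMBER", "7"),
    ("3", "SIG_CD (PK)", "시군구코드", "VARCHAR2", "5"),
    ("4", "RN_CD", "도로명코드", "VARCHAR2", "7"),
    ("인덱스명", None, "인덱스키", None, None),
]


def column_definitions(rows):
    """Hypothetical helper: keep rows whose No cell is a number, one dict per column."""
    defs = []
    for no, col_id, name, col_type, length in rows:
        if not (no and no.isdigit()):
            continue  # skip header, metadata and index/footer rows
        defs.append({
            "no": int(no),
            "id": col_id.replace("(PK)", "").strip(),
            "name": name,
            "type": col_type,
            "length": int(length) if length else None,
            "pk": col_id.endswith("(PK)"),
        })
    return defs


for d in column_definitions(rows):
    print(d)
```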

Duplicate rows

Most frequently occurring

Row | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | 3
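The two duplicate figures measure different things. The Overview counts 1 duplicate row because a single row pattern repeats. The # duplicates column counts how often that pattern occurs: rows 2, 36 and 37 are empty in all six compared columns. The comparison leaves out 테이블정의서 and the other unsupported columns, which is why those three rows match despite different first cells. The sketch below illustrates that distinction. It assumes the report counts each repeated pattern once, a reading inferred from the 1 (2.6%) figure. Two non-duplicate rows from the sample are included for contrast.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (number of row patterns that repeat, {pattern: occurrences})."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    return len(repeated), repeated


NA = None  # stands in for <NA>
# Compared columns: Unnamed: 1, 2, 3, 6, 7, 8 (the unsupported ones are left out).
rows = [
    (NA, NA, NA, "작성일", NA, NA),    # row 0
    (NA, NA, NA, NA, NA, NA),          # row 2
    (NA, "인덱스키", NA, NA, NA, NA),  # row 35
    (NA, NA, NA, NA, NA, NA),          # row 36
    (NA, NA, NA, NA, NA, NA),          # row 37
]
n_duplicates, occurrences = duplicate_summary(rows)
print(n_duplicates, list(occurrences.values()))
```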