Overview

Dataset statistics

Number of variables: 9
Number of observations: 78
Missing cells: 244
Missing cells (%): 34.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.6 KiB
Average record size in memory: 73.7 B
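
The missing-cell figures can be cross-checked against the per-variable missing counts that the report lists in its Alerts and Variables sections. The following is a minimal sketch of that arithmetic; the dictionary simply transcribes the counts from this report, and the variable names used in the code are illustrative only.

```python
# Per-variable missing counts as listed elsewhere in this report.
missing_per_variable = {
    "테이블정의서": 0,
    "Unnamed: 1": 6,
    "Unnamed: 2": 1,
    "Unnamed: 3": 0,
    "Unnamed: 4": 5,
    "Unnamed: 5": 7,
    "Unnamed: 6": 73,
    "Unnamed: 7": 75,
    "Unnamed: 8": 77,
}
n_observations = 78

# Total cells = variables x observations; missing share is relative to that.
n_variables = len(missing_per_variable)
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / (n_variables * n_observations)

print(f"{missing_cells} missing cells ({missing_pct:.1f}%)")
```

The per-variable counts sum to the reported total, and dividing by 9 x 78 cells gives the reported percentage after rounding to one decimal place.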

Variable types

Unsupported: 3
Text: 4
Categorical: 1
Boolean: 1

Dataset

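Description: Officially assessed land price, land characteristics 2016 (공시지가 토지특성 2016)
Author: Ministry of Land, Infrastructure and Transport (국토교통부)
URL: https://www.vworld.kr/dtmk/dtmk_ntads_s002.do?dsId=30536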

Alerts

Unnamed: 8 has constant value "" (Constant)
Unnamed: 5 is highly imbalanced (68.7%) (Imbalance)
Unnamed: 1 has 6 (7.7%) missing values (Missing)
Unnamed: 2 has 1 (1.3%) missing values (Missing)
Unnamed: 4 has 5 (6.4%) missing values (Missing)
Unnamed: 5 has 7 (9.0%) missing values (Missing)
Unnamed: 6 has 73 (93.6%) missing values (Missing)
Unnamed: 7 has 75 (96.2%) missing values (Missing)
Unnamed: 8 has 77 (98.7%) missing values (Missing)
테이블정의서 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2024-04-18 00:53:06.880295
Analysis finished: 2024-04-18 00:53:08.359536
Duration: 1.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

테이블정의서
Unsupported

REJECTED  UNSUPPORTED 

Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Unnamed: 1
Text

MISSING 

Distinct: 72
Distinct (%): 100.0%
Missing: 6
Missing (%): 7.7%
Memory size: 756.0 B

Length

Max length: 15
Median length: 12
Mean length: 8.7777778
Min length: 3

Characters and Unicode

Total characters: 632
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
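
As a rough illustration of how such per-character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample values shown for this variable, not the full column, so its counts are smaller than the report's; it is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# The five sample values listed for Unnamed: 1 in this report.
sample_values = ["컬럼ID", "STDMT", "PNU", "LAND_SEQNO", "SGG_CD"]

# unicodedata.category returns a two-letter general category code,
# e.g. "Lu" (Uppercase Letter), "Lo" (Other Letter), "Pc" (Connector Punctuation).
category_counts = Counter(
    unicodedata.category(ch) for value in sample_values for ch in value
)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Lo" (Other Letter) and the underscore under "Pc" (Connector Punctuation), which matches the category names used in the tables for this variable.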

Unique

Unique: 72
Unique (%): 100.0%

Sample

1st row: 컬럼ID
2nd row: STDMT
3rd row: PNU
4th row: LAND_SEQNO
5th row: SGG_CD
Value | Count | Frequency (%)
컬럼id | 1 | 1.4%
stdmt | 1 | 1.4%
calc_jiga | 1 | 1.4%
prev_jiga | 1 | 1.4%
py_jiga | 1 | 1.4%
handwk_yn | 1 | 1.4%
lclw_step_cd | 1 | 1.4%
lclw_mthd_cd | 1 | 1.4%
harm_wast | 1 | 1.4%
harm_rail | 1 | 1.4%
Other values (62) | 62 | 86.1%

Most occurring characters

Value | Count | Frequency (%)
_ | 77 | 12.2%
A | 59 | 9.3%
R | 41 | 6.5%
C | 37 | 5.9%
D | 34 | 5.4%
N | 34 | 5.4%
E | 32 | 5.1%
T | 32 | 5.1%
S | 30 | 4.7%
P | 27 | 4.3%
Other values (23) | 229 | 36.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 532 | 84.2%
Connector Punctuation | 77 | 12.2%
Decimal Number | 21 | 3.3%
Other Letter | 2 | 0.3%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 59 | 11.1%
R | 41 | 7.7%
C | 37 | 7.0%
D | 34 | 6.4%
N | 34 | 6.4%
E | 32 | 6.0%
T | 32 | 6.0%
S | 30 | 5.6%
P | 27 | 5.1%
L | 25 | 4.7%
Other values (16) | 181 | 34.0%

Decimal Number
Value | Count | Frequency (%)
2 | 12 | 57.1%
1 | 7 | 33.3%
3 | 1 | 4.8%
4 | 1 | 4.8%

Other Letter
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 77 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 532 | 84.2%
Common | 98 | 15.5%
Hangul | 2 | 0.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 59 | 11.1%
R | 41 | 7.7%
C | 37 | 7.0%
D | 34 | 6.4%
N | 34 | 6.4%
E | 32 | 6.0%
T | 32 | 6.0%
S | 30 | 5.6%
P | 27 | 5.1%
L | 25 | 4.7%
Other values (16) | 181 | 34.0%

Common
Value | Count | Frequency (%)
_ | 77 | 78.6%
2 | 12 | 12.2%
1 | 7 | 7.1%
3 | 1 | 1.0%
4 | 1 | 1.0%

Hangul
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 630 | 99.7%
Hangul | 2 | 0.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
_ | 77 | 12.2%
A | 59 | 9.4%
R | 41 | 6.5%
C | 37 | 5.9%
D | 34 | 5.4%
N | 34 | 5.4%
E | 32 | 5.1%
T | 32 | 5.1%
S | 30 | 4.8%
P | 27 | 4.3%
Other values (21) | 227 | 36.0%

Hangul
Value | Count | Frequency (%)
컬 | 1 | 50.0%
럼 | 1 | 50.0%

Unnamed: 2
Text

MISSING 

Distinct: 77
Distinct (%): 100.0%
Missing: 1
Missing (%): 1.3%
Memory size: 756.0 B

Length

Max length: 14
Median length: 10
Mean length: 5.4805195
Min length: 2

Characters and Unicode

Total characters: 422
Distinct characters: 130
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 77
Unique (%): 100.0%

Sample

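1st row: 김민호
2nd row: 가격업무 (price affairs)
3rd row: 공시지가 토지특성 2016 (officially assessed land price, land characteristics 2016)
4th row: 컬럼명 (column name)
5th row: 기준월 (base month)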
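Value | Count | Frequency (%)
김민호 | 1 | 1.2%
토지구분 (land classification) | 1 | 1.2%
산정지가 (calculated land price) | 1 | 1.2%
종전지가 (previous land price) | 1 | 1.2%
전년지가 (prior-year land price) | 1 | 1.2%
수작업여부 (manual entry flag) | 1 | 1.2%
대규모개발사업단계코드 (large-scale development project stage code) | 1 | 1.2%
대규모개발사업방식코드 (large-scale development project method code) | 1 | 1.2%
3년전지가 (land price three years ago) | 1 | 1.2%
유해철도 (harmful facility: railway) | 1 | 1.2%
Other values (70) | 70 | 87.5%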

Most occurring characters

Value | Count | Frequency (%)
 | 38 | 9.0%
 | 17 | 4.0%
 | 13 | 3.1%
2 | 13 | 3.1%
 | 12 | 2.8%
 | 12 | 2.8%
 | 12 | 2.8%
 | 11 | 2.6%
 | 10 | 2.4%
 | 10 | 2.4%
Other values (120) | 274 | 64.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 385 | 91.2%
Decimal Number | 25 | 5.9%
Uppercase Letter | 8 | 1.9%
Space Separator | 3 | 0.7%
Other Punctuation | 1 | 0.2%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 38 | 9.9%
 | 17 | 4.4%
 | 13 | 3.4%
 | 12 | 3.1%
 | 12 | 3.1%
 | 12 | 3.1%
 | 11 | 2.9%
 | 10 | 2.6%
 | 10 | 2.6%
 | 9 | 2.3%
Other values (105) | 241 | 62.6%

Uppercase Letter
Value | Count | Frequency (%)
T | 2 | 25.0%
N | 1 | 12.5%
P | 1 | 12.5%
M | 1 | 12.5%
D | 1 | 12.5%
S | 1 | 12.5%
U | 1 | 12.5%

Decimal Number
Value | Count | Frequency (%)
2 | 13 | 52.0%
1 | 8 | 32.0%
4 | 1 | 4.0%
0 | 1 | 4.0%
6 | 1 | 4.0%
3 | 1 | 4.0%

Space Separator
Value | Count | Frequency (%)
(space) | 3 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
, | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 385 | 91.2%
Common | 29 | 6.9%
Latin | 8 | 1.9%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 38 | 9.9%
 | 17 | 4.4%
 | 13 | 3.4%
 | 12 | 3.1%
 | 12 | 3.1%
 | 12 | 3.1%
 | 11 | 2.9%
 | 10 | 2.6%
 | 10 | 2.6%
 | 9 | 2.3%
Other values (105) | 241 | 62.6%

Common
Value | Count | Frequency (%)
2 | 13 | 44.8%
1 | 8 | 27.6%
(space) | 3 | 10.3%
, | 1 | 3.4%
4 | 1 | 3.4%
0 | 1 | 3.4%
6 | 1 | 3.4%
3 | 1 | 3.4%

Latin
Value | Count | Frequency (%)
T | 2 | 25.0%
N | 1 | 12.5%
P | 1 | 12.5%
M | 1 | 12.5%
D | 1 | 12.5%
S | 1 | 12.5%
U | 1 | 12.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 385 | 91.2%
ASCII | 37 | 8.8%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 38 | 9.9%
 | 17 | 4.4%
 | 13 | 3.4%
 | 12 | 3.1%
 | 12 | 3.1%
 | 12 | 3.1%
 | 11 | 2.9%
 | 10 | 2.6%
 | 10 | 2.6%
 | 9 | 2.3%
Other values (105) | 241 | 62.6%

ASCII
Value | Count | Frequency (%)
2 | 13 | 35.1%
1 | 8 | 21.6%
(space) | 3 | 8.1%
T | 2 | 5.4%
N | 1 | 2.7%
P | 1 | 2.7%
, | 1 | 2.7%
M | 1 | 2.7%
D | 1 | 2.7%
S | 1 | 2.7%
Other values (5) | 5 | 13.5%

Unnamed: 3
Categorical

Distinct: 6
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Length

Max length: 8
Median length: 4
Mean length: 5.1923077
Min length: 2

Unique

Unique: 2
Unique (%): 2.6%

Sample

1st row: <NA>
2nd row: 테이블ID (table ID)
3rd row: <NA>
4th row: 타입 (type)
5th row: CHAR

Common Values

Value | Count | Frequency (%)
CHAR | 37 | 47.4%
NUMBER | 21 | 26.9%
VARCHAR2 | 13 | 16.7%
<NA> | 5 | 6.4%
테이블ID | 1 | 1.3%
타입 | 1 | 1.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values]

Unnamed: 4
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 5
Missing (%): 6.4%
Memory size: 756.0 B

Unnamed: 5
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): 2.8%
Missing: 7
Missing (%): 9.0%
Memory size: 288.0 B
Value | Count | Frequency (%)
True | 67 | 85.9%
False | 4 | 5.1%
(Missing) | 7 | 9.0%
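
The report does not state how the 68.7% imbalance figure for this variable is derived. One definition that reproduces it, offered here as an assumption rather than as the profiler's documented formula, is one minus the Shannon entropy of the non-missing class frequencies, normalised by the maximum entropy for that number of classes. The function name `imbalance_score` is hypothetical.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class
    frequencies and k is the number of classes (assumed definition)."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(k)

# Non-missing counts for Unnamed: 5 from the table above: True = 67, False = 4.
score = imbalance_score([67, 4])
print(f"{score:.1%}")
```

Under this definition a perfectly balanced column scores 0 and a column dominated by a single class approaches 1.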

Unnamed: 6
Text

MISSING 

Distinct: 5
Distinct (%): 100.0%
Missing: 73
Missing (%): 93.6%
Memory size: 756.0 B

Length

Max length: 5
Median length: 3
Mean length: 3.6
Min length: 3

Characters and Unicode

Total characters: 18
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): 100.0%

Sample

1st row: 작성일 (date written)
2nd row: 테이블명 (table name)
3rd row: PK/FK
4th row: PK1
5th row: PK2
Value | Count | Frequency (%)
작성일 | 1 | 20.0%
테이블명 | 1 | 20.0%
pk/fk | 1 | 20.0%
pk1 | 1 | 20.0%
pk2 | 1 | 20.0%

Most occurring characters

Value | Count | Frequency (%)
K | 4 | 22.2%
P | 3 | 16.7%
작 | 1 | 5.6%
성 | 1 | 5.6%
일 | 1 | 5.6%
테 | 1 | 5.6%
이 | 1 | 5.6%
블 | 1 | 5.6%
명 | 1 | 5.6%
/ | 1 | 5.6%
Other values (3) | 3 | 16.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 8 | 44.4%
Other Letter | 7 | 38.9%
Decimal Number | 2 | 11.1%
Other Punctuation | 1 | 5.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%

Uppercase Letter
Value | Count | Frequency (%)
K | 4 | 50.0%
P | 3 | 37.5%
F | 1 | 12.5%

Decimal Number
Value | Count | Frequency (%)
1 | 1 | 50.0%
2 | 1 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8 | 44.4%
Hangul | 7 | 38.9%
Common | 3 | 16.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%

Latin
Value | Count | Frequency (%)
K | 4 | 50.0%
P | 3 | 37.5%
F | 1 | 12.5%

Common
Value | Count | Frequency (%)
/ | 1 | 33.3%
1 | 1 | 33.3%
2 | 1 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11 | 61.1%
Hangul | 7 | 38.9%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
K | 4 | 36.4%
P | 3 | 27.3%
/ | 1 | 9.1%
F | 1 | 9.1%
1 | 1 | 9.1%
2 | 1 | 9.1%

Hangul
Value | Count | Frequency (%)
작 | 1 | 14.3%
성 | 1 | 14.3%
일 | 1 | 14.3%
테 | 1 | 14.3%
이 | 1 | 14.3%
블 | 1 | 14.3%
명 | 1 | 14.3%

Unnamed: 7
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 75
Missing (%): 96.2%
Memory size: 756.0 B

Unnamed: 8
Text

CONSTANT  MISSING 

Distinct: 1
Distinct (%): 100.0%
Missing: 77
Missing (%): 98.7%
Memory size: 756.0 B

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 9
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: 참조테이블명/비고 (reference table name / remarks)
Value | Count | Frequency (%)
참조테이블명/비고 | 1 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
참 | 1 | 11.1%
조 | 1 | 11.1%
테 | 1 | 11.1%
이 | 1 | 11.1%
블 | 1 | 11.1%
명 | 1 | 11.1%
/ | 1 | 11.1%
비 | 1 | 11.1%
고 | 1 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8 | 88.9%
Other Punctuation | 1 | 11.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%

Other Punctuation
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
Common | 1 | 11.1%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%

Common
Value | Count | Frequency (%)
/ | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8 | 88.9%
ASCII | 1 | 11.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
참 | 1 | 12.5%
조 | 1 | 12.5%
테 | 1 | 12.5%
이 | 1 | 12.5%
블 | 1 | 12.5%
명 | 1 | 12.5%
비 | 1 | 12.5%
고 | 1 | 12.5%

ASCII
Value | Count | Frequency (%)
/ | 1 | 100.0%

Correlations

[Figure: correlation heatmap]

 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 5 | Unnamed: 6
Unnamed: 1 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Unnamed: 2 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Unnamed: 3 | 1.000 | 1.000 | 1.000 | 0.071 | 1.000
Unnamed: 5 | 1.000 | 1.000 | 0.071 | 1.000 | NaN
Unnamed: 6 | 1.000 | 1.000 | 1.000 | NaN | 1.000

[Figure: correlation heatmap]

 | Unnamed: 3 | Unnamed: 5
Unnamed: 3 | 1.000 | 0.115
Unnamed: 5 | 0.115 | 1.000

[Figure: correlation heatmap]

 | Unnamed: 3 | Unnamed: 5
Unnamed: 3 | 1.000 | 0.115
Unnamed: 5 | 0.115 | 1.000

Missing values

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
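
The report does not spell out how nullity correlation is computed. A common reading, assumed here, is the Pearson correlation between per-column presence indicators (1 where a value is present, 0 where it is missing). The sketch below uses small made-up columns rather than this dataset, and the function name `nullity_correlation` is hypothetical.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between presence indicators of two columns.

    None marks a missing value. Returns None when either column is fully
    present or fully missing, since the correlation is undefined then.
    """
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

# Illustrative columns (not taken from the dataset).
col_x = ["a", None, "b", None]
col_y = [1, None, 2, None]      # missing in exactly the same rows as col_x
col_z = [None, 1, None, 2]      # present exactly where col_x is missing
col_full = [1, 2, 3, 4]         # never missing

same = nullity_correlation(col_x, col_y)
opposite = nullity_correlation(col_x, col_z)
undefined = nullity_correlation(col_x, col_full)
```

A value near 1 means two variables tend to be missing together, a value near -1 means one is present when the other is missing, and fully populated columns have no defined nullity correlation.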

Sample

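First rows

 | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 작성자 | <NA> | 김민호 | <NA> | NaN | <NA> | 작성일 | 2017-05-10 00:00:00 | <NA>
1 | 주제영역명 | <NA> | 가격업무 | 테이블ID | APMM_NV_LAND_2016 | <NA> | 테이블명 | 공시지가 토지특성 2016 | <NA>
2 | 테이블설명 | <NA> | 공시지가 토지특성 2016 | <NA> | NaN | <NA> | <NA> | NaN | <NA>
3 | No | 컬럼ID | 컬럼명 | 타입 | 길이(Byte) | <NA> | PK/FK | Default | 참조테이블명/비고
4 | 1 | STDMT | 기준월 | CHAR | 2 | N | PK1 | NaN | <NA>
5 | 2 | PNU | 토지코드 | VARCHAR2 | 19 | N | PK2 | NaN | <NA>
6 | 3 | LAND_SEQNO | 토지일련번호 | NUMBER | 6,0 | N | <NA> | NaN | <NA>
7 | 4 | SGG_CD | 시군구코드 | CHAR | 5 | Y | <NA> | NaN | <NA>
8 | 5 | LAND_LOC_CD | 토지소재지코드 | CHAR | 5 | Y | <NA> | NaN | <NA>
9 | 6 | LAND_GBN | 토지구분 | CHAR | 1 | Y | <NA> | NaN | <NA>

Last rows

 | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
68 | 65 | CNFER_CD | 확인자코드 | VARCHAR2 | 3 | Y | <NA> | NaN | <NA>
69 | 66 | VRFY_GBN | 검증구분 | CHAR | 2 | Y | <NA> | NaN | <NA>
70 | 67 | PY_VRFY_GBN | 전년검증구분 | CHAR | 2 | Y | <NA> | NaN | <NA>
71 | 68 | LAND_MOV_YMD | 토지이동일자 | VARCHAR2 | 8 | N | <NA> | NaN | <NA>
72 | 69 | LAND_MOV_RSN_CD | 토지이동사유코드 | VARCHAR2 | 5 | Y | <NA> | NaN | <NA>
73 | 70 | HOUSE_PANN_YN | 주택공시여부 | CHAR | 1 | Y | <NA> | NaN | <NA>
74 | 71 | COL_ADM_SECT_CD | 원천시군구코드 | VARCHAR2 | 5 | Y | <NA> | NaN | <NA>
75 | 인덱스명 | <NA> | 인덱스키 | <NA> | NaN | <NA> | <NA> | NaN | <NA>
76 | APMM_NV_LAND_2016_INX1 | <NA> | STDMT, PNU | <NA> | NaN | <NA> | <NA> | NaN | <NA>
77 | 업무규칙 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | NaN | <NA>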
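
The sample suggests that the sheet is a table-definition form: the real column header sits in row 3 (starting with "No"), metadata rows precede it, and index and rule rows follow it, which would explain the "Unnamed" columns and the high missing rates. The sketch below shows one way to pull out just the column definitions. It is an illustration on a few rows transcribed from the sample, with None standing in for <NA>/NaN; the function name `extract_column_definitions` and the `col_{i}` fallback name for a blank header cell are hypothetical.

```python
# A few rows transcribed from the sample; None stands in for <NA>/NaN.
RAW_ROWS = [
    ["작성자", None, "김민호", None, None, None, "작성일", "2017-05-10 00:00:00", None],
    ["No", "컬럼ID", "컬럼명", "타입", "길이(Byte)", None, "PK/FK", "Default", "참조테이블명/비고"],
    ["1", "STDMT", "기준월", "CHAR", "2", "N", "PK1", None, None],
    ["2", "PNU", "토지코드", "VARCHAR2", "19", "N", "PK2", None, None],
    ["인덱스명", None, "인덱스키", None, None, None, None, None, None],
]

def extract_column_definitions(rows, header_marker="No"):
    """Locate the header row by its first cell, then collect the numbered
    rows that follow it as dictionaries keyed by the header names."""
    header_idx = next(i for i, row in enumerate(rows) if row[0] == header_marker)
    # A blank header cell gets a positional placeholder name (hypothetical).
    header = [
        name if name is not None else f"col_{i}"
        for i, name in enumerate(rows[header_idx])
    ]
    records = []
    for row in rows[header_idx + 1:]:
        # Column definitions are numbered; stop at the first non-numbered row.
        if row[0] is None or not str(row[0]).isdigit():
            break
        records.append(dict(zip(header, row)))
    return records

definitions = extract_column_definitions(RAW_ROWS)
for record in definitions:
    print(record["컬럼ID"], record["타입"], record["길이(Byte)"])
```

Profiling only the extracted definitions, rather than the raw sheet, would give each column a meaningful name and remove most of the structural missing values.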