Overview

Dataset statistics

Number of variables: 9
Number of observations: 25
Missing cells: 113
Missing cells (%): 50.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 KiB
Average record size in memory: 78.1 B
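
The missing-cell percentage follows from the counts above: missing cells divided by the total number of cells (variables times observations). A minimal sketch of that arithmetic; the function name `missing_cell_pct` is illustrative and not part of pandas-profiling:

```python
def missing_cell_pct(n_missing: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells as a percentage of all cells in the table."""
    total_cells = n_vars * n_obs
    return 100.0 * n_missing / total_cells


# Counts taken from the statistics above: 113 missing cells, 9 variables, 25 observations.
pct = missing_cell_pct(113, 9, 25)
print(f"{pct:.1f}%")  # 50.2%
```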

Variable types

Unsupported: 4
Categorical: 5

Dataset

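Description: Land characteristics for officially assessed land prices (2022), used to calculate individual officially assessed land prices under the Act on the Public Announcement of Real Estate Values
Author: Ministry of Land, Infrastructure and Transport (국토교통부)
URL: http://data.nsdi.go.kr/dataset/20220727ds00002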

Alerts

Unnamed: 8 has constant value "참조테이블명/비고" Constant
Unnamed: 6 is highly correlated with Unnamed: 2 and 1 other fields High correlation
Unnamed: 2 is highly correlated with Unnamed: 6 and 2 other fields High correlation
Unnamed: 1 is highly correlated with Unnamed: 2 and 1 other fields High correlation
Unnamed: 3 is highly correlated with Unnamed: 6 and 2 other fields High correlation
테이블정의서 has 1 (4.0%) missing values Missing
Unnamed: 1 has 6 (24.0%) missing values Missing
Unnamed: 2 has 3 (12.0%) missing values Missing
Unnamed: 3 has 5 (20.0%) missing values Missing
Unnamed: 4 has 5 (20.0%) missing values Missing
Unnamed: 5 has 25 (100.0%) missing values Missing
Unnamed: 6 has 22 (88.0%) missing values Missing
Unnamed: 7 has 22 (88.0%) missing values Missing
Unnamed: 8 has 24 (96.0%) missing values Missing
테이블정의서 is an unsupported type, check if it needs cleaning or further analysis Unsupported
Unnamed: 4 is an unsupported type, check if it needs cleaning or further analysis Unsupported
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis Unsupported
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis Unsupported

Reproduction

Analysis started: 2022-08-14 13:01:46.075690
Analysis finished: 2022-08-14 13:01:48.920493
Duration: 2.84 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

테이블정의서
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 1
Missing (%): 4.0%
Memory size: 328.0 B

Unnamed: 1
Categorical

HIGH CORRELATION
MISSING

Distinct: 19
Distinct (%): 100.0%
Missing: 6
Missing (%): 24.0%
Memory size: 328.0 B
Top values (count): 컬럼ID (1), STDMT (1), PNU (1), LAND_SEQNO (1), SGG_CD (1), Other values (14)

Length

Max length: 11
Median length: 9
Mean length: 6.368421053
Min length: 3

Unique

Unique: 19
Unique (%): 100.0%

Sample

1st row: 컬럼ID
2nd row: STDMT
3rd row: PNU
4th row: LAND_SEQNO
5th row: SGG_CD

Common Values

Value | Count | Frequency (%)
컬럼ID | 1 | 4.0%
STDMT | 1 | 4.0%
PNU | 1 | 4.0%
LAND_SEQNO | 1 | 4.0%
SGG_CD | 1 | 4.0%
LAND_LOC_CD | 1 | 4.0%
LAND_GBN | 1 | 4.0%
BOBN | 1 | 4.0%
BUBN | 1 | 4.0%
ADM_UMD_CD | 1 | 4.0%
Other values (9) | 9 | 36.0%
(Missing) | 6 | 24.0%

Length

Figure: Histogram of lengths of the category

Value | Count | Frequency (%)
컬럼id | 1 | 5.3%
pnilp | 1 | 5.3%
geo_form | 1 | 5.3%
geo_hl | 1 | 5.3%
land_use | 1 | 5.3%
spfc2 | 1 | 5.3%
spfc1 | 1 | 5.3%
parea | 1 | 5.3%
jimok | 1 | 5.3%
adm_umd_cd | 1 | 5.3%
Other values (9) | 9 | 47.4%

Unnamed: 2
Categorical

HIGH CORRELATION
MISSING

Distinct: 22
Distinct (%): 100.0%
Missing: 3
Missing (%): 12.0%
Memory size: 328.0 B
Top values (count): 부번 (1), 공시지가 토지특성 (1), 컬럼명 (1), 기준월 (1), 필지고유번호 (1), Other values (17)

Length

Max length: 20
Median length: 7
Mean length: 5.272727273
Min length: 2

Unique

Unique: 22
Unique (%): 100.0%

Sample

1st row: 부동산 제공 표준 데이터셋 v1.83
2nd row: 공시지가 토지특성
3rd row: 컬럼명
4th row: 기준월
5th row: 필지고유번호

Common Values

Value | Count | Frequency (%)
부번 | 1 | 4.0%
공시지가 토지특성 | 1 | 4.0%
컬럼명 | 1 | 4.0%
기준월 | 1 | 4.0%
필지고유번호 | 1 | 4.0%
토지일련번호 | 1 | 4.0%
시군구코드 | 1 | 4.0%
토지소재지코드 | 1 | 4.0%
토지구분 | 1 | 4.0%
본번 | 1 | 4.0%
Other values (12) | 12 | 48.0%
(Missing) | 3 | 12.0%

Length

Figure: Histogram of lengths of the category

Value | Count | Frequency (%)
부번 | 1 | 3.7%
표준 | 1 | 3.7%
행정읍면동코드 | 1 | 3.7%
도로접면 | 1 | 3.7%
지형형상 | 1 | 3.7%
지형고저 | 1 | 3.7%
토지이용상황 | 1 | 3.7%
용도지역2 | 1 | 3.7%
용도지역1 | 1 | 3.7%
면적 | 1 | 3.7%
Other values (17) | 17 | 63.0%

Unnamed: 3
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): 25.0%
Missing: 5
Missing (%): 20.0%
Memory size: 328.0 B
Top values (count): CHAR (12), VARCHAR2 (3), NUMBER (3), 테이블ID (1), 타입 (1)

Length

Max length: 8
Median length: 4
Mean length: 4.85
Min length: 2

Unique

Unique: 2
Unique (%): 10.0%

Sample

1st row: 테이블ID
2nd row: 타입
3rd row: CHAR
4th row: VARCHAR2
5th row: NUMBER

Common Values

Value | Count | Frequency (%)
CHAR | 12 | 48.0%
VARCHAR2 | 3 | 12.0%
NUMBER | 3 | 12.0%
테이블ID | 1 | 4.0%
타입 | 1 | 4.0%
(Missing) | 5 | 20.0%

Length

Figure: Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
char | 12 | 60.0%
varchar2 | 3 | 15.0%
number | 3 | 15.0%
테이블id | 1 | 5.0%
타입 | 1 | 5.0%

Unnamed: 4
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 5
Missing (%): 20.0%
Memory size: 328.0 B

Unnamed: 5
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 25
Missing (%): 100.0%
Memory size: 353.0 B

Unnamed: 6
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 100.0%
Missing: 22
Missing (%): 88.0%
Memory size: 328.0 B
Top values: 작성일, 테이블명, PK/FK

Length

Max length: 5
Median length: 4
Mean length: 4
Min length: 3

Unique

Unique: 3
Unique (%): 100.0%

Sample

1st row: 작성일
2nd row: 테이블명
3rd row: PK/FK

Common Values

Value | Count | Frequency (%)
작성일 | 1 | 4.0%
테이블명 | 1 | 4.0%
PK/FK | 1 | 4.0%
(Missing) | 22 | 88.0%

Length

Figure: Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
작성일 | 1 | 33.3%
테이블명 | 1 | 33.3%
pk/fk | 1 | 33.3%

Unnamed: 7
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 22
Missing (%): 88.0%
Memory size: 328.0 B

Unnamed: 8
Categorical

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 100.0%
Missing: 24
Missing (%): 96.0%
Memory size: 328.0 B
Top values: 참조테이블명/비고

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Unique

Unique: 1
Unique (%): 100.0%

Sample

1st row: 참조테이블명/비고

Common Values

Value | Count | Frequency (%)
참조테이블명/비고 | 1 | 4.0%
(Missing) | 24 | 96.0%

Length

Figure: Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
참조테이블명/비고 | 1 | 100.0%

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the slope of the line (its angle to the x-axis) affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
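
The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the stated formula, not the code the report generator runs; the function name `pearson_r` is hypothetical. The 1/n factors in the covariance and the standard deviations cancel, so sums of deviations are enough.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of deviation products; the shared 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2 * v + 1 for v in x]))  # increasing line: r close to +1
print(pearson_r(x, [-3 * v for v in x]))     # decreasing line: r close to -1
```

A constant input has zero variance, so this sketch raises `ZeroDivisionError` in that case rather than returning a value.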

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
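
As a rough sketch of that definition: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of the ranks they occupy is an assumption of this example (it is the common convention), and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman_rho(x, y))  # close to +1, because the ranks agree exactly
```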

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
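
The pair-counting rule above can be written out directly. The sketch below implements exactly the formula as stated (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries may apply a tie correction (tau-b), so results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether both variables move the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# 6 pairs in total: 5 concordant, 1 discordant (the swapped middle values).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # (5 - 1) / 6
```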

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
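
For illustration, the bias correction is commonly written as follows: with φ² = χ²/n for an r × k contingency table, the corrected quantities are φ²' = max(0, φ² - (k-1)(r-1)/(n-1)), r' = r - (r-1)²/(n-1), k' = k - (k-1)²/(n-1), and V = sqrt(φ²' / min(k'-1, r'-1)). The sketch below follows that formulation in plain Python; it is an assumption that this matches the report generator's implementation in every detail, and the function name is hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)


x = ["a"] * 50 + ["b"] * 50
print(cramers_v_corrected(x, ["u"] * 50 + ["v"] * 50))  # perfect association
print(cramers_v_corrected(x, ["u", "v"] * 50))          # independent labels
```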

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
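
Nullity correlation, as described for the heatmap above, can be approximated by correlating the missing/present indicators of each column. The sketch below uses a small made-up frame (columns `a`, `b`, `c` are illustrative, not from this dataset) and plain pandas; it is an assumption that this mirrors how the report's heatmap is computed.

```python
import pandas as pd

# Toy data: "b" is missing exactly where "a" is missing,
# while "c" is present exactly where "a" is missing.
df = pd.DataFrame({
    "a": [1, None, 3, None],
    "b": [1.0, None, 2.0, None],
    "c": [None, "x", None, "y"],
})

# 1 where a cell is missing, 0 where it is present.
nullity = df.isnull().astype(int)

# Pairwise Pearson correlation of the missing-value indicators.
nullity_corr = nullity.corr()
print(nullity_corr)
```

A value near +1 means two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing. Columns that are entirely missing or entirely present have zero variance and yield NaN in this sketch, which is relevant here because Unnamed: 5 is 100% missing.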

Sample

First rows

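(index) | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
0 | 작성자 | <NA> | 부동산 제공 표준 데이터셋 v1.83 | <NA> | NaN | <NA> | 작성일 | 2017 | <NA>
1 | 주제영역명 | <NA> | <NA> | 테이블ID | APMM_NV_LAND_OPEN | <NA> | 테이블명 | 공시지가 토지특성 | <NA>
2 | 테이블설명 | <NA> | 공시지가 토지특성 | <NA> | NaN | <NA> | <NA> | NaN | <NA>
3 | No | 컬럼ID | 컬럼명 | 타입 | 길이(Byte) | <NA> | PK/FK | Default | 참조테이블명/비고
4 | 1 | STDMT | 기준월 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
5 | 2 | PNU | 필지고유번호 | VARCHAR2 | 19 | <NA> | <NA> | NaN | <NA>
6 | 3 | LAND_SEQNO | 토지일련번호 | NUMBER | 6 | <NA> | <NA> | NaN | <NA>
7 | 4 | SGG_CD | 시군구코드 | CHAR | 5 | <NA> | <NA> | NaN | <NA>
8 | 5 | LAND_LOC_CD | 토지소재지코드 | CHAR | 5 | <NA> | <NA> | NaN | <NA>
9 | 6 | LAND_GBN | 토지구분 | CHAR | 1 | <NA> | <NA> | NaN | <NA>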

Last rows

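(index) | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8
15 | 12 | PAREA | 면적 | NUMBER | 17,5 | <NA> | <NA> | NaN | <NA>
16 | 13 | SPFC1 | 용도지역1 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
17 | 14 | SPFC2 | 용도지역2 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
18 | 15 | LAND_USE | 토지이용상황 | VARCHAR2 | 3 | <NA> | <NA> | NaN | <NA>
19 | 16 | GEO_HL | 지형고저 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
20 | 17 | GEO_FORM | 지형형상 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
21 | 18 | ROAD_SIDE | 도로접면 | CHAR | 2 | <NA> | <NA> | NaN | <NA>
22 | 인덱스명 | <NA> | 인덱스키 | <NA> | NaN | <NA> | <NA> | NaN | <NA>
23 | NaN | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | NaN | <NA>
24 | 업무규칙 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | NaN | <NA>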
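
The sample rows suggest that the sheet is a table definition whose real header (No, 컬럼ID, 컬럼명, 타입, ...) sits on row 3, below a few metadata rows. That would explain the "Unnamed" column names and the unsupported mixed-type columns. A minimal sketch of promoting that row to the header, using a small stand-in frame built from values in the sample above; the helper `promote_header` and its parameters are hypothetical:

```python
import pandas as pd

# Stand-in for the parsed sheet: a metadata row, the real header row,
# a few column definitions, and a trailing note row.
raw = pd.DataFrame(
    [
        ["작성자", None, "부동산 제공 표준 데이터셋 v1.83", None, None],
        ["No", "컬럼ID", "컬럼명", "타입", "길이(Byte)"],
        [1, "STDMT", "기준월", "CHAR", 2],
        [2, "PNU", "필지고유번호", "VARCHAR2", 19],
        [3, "LAND_SEQNO", "토지일련번호", "NUMBER", 6],
        ["업무규칙", None, None, None, None],
    ],
    columns=["테이블정의서", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "Unnamed: 4"],
)


def promote_header(frame, marker="컬럼ID", marker_col="Unnamed: 1"):
    """Use the row whose `marker_col` equals `marker` as the header row."""
    mask = (frame[marker_col] == marker).to_numpy()
    positions = mask.nonzero()[0]
    if len(positions) == 0:
        raise ValueError(f"no row with {marker!r} in column {marker_col!r}")
    pos = int(positions[0])
    body = frame.iloc[pos + 1:].copy()
    body.columns = frame.iloc[pos].tolist()
    # Keep only rows that actually define a column (non-missing column ID).
    return body.dropna(subset=[marker]).reset_index(drop=True)


columns_table = promote_header(raw)
print(columns_table)
```

When re-reading the source file rather than repairing an already parsed frame, the `header` and `skiprows` parameters of `pandas.read_excel` are the more direct route, assuming the header really is on row index 3 as the sample indicates.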