Dataset statistics
Number of variables | 9 |
---|---|
Number of observations | 25 |
Missing cells | 113 |
Missing cells (%) | 50.2% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1.9 KiB |
Average record size in memory | 78.1 B |
Variable types
Unsupported | 4 |
---|---|
Categorical | 5 |
Dataset
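Description | Land characteristics for officially assessed land prices (2022), used to calculate individual officially assessed land prices under the Act on the Public Announcement of Real Estate Values |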
---|---|
Author | Ministry of Land, Infrastructure and Transport (국토교통부) |
URL | http://data.nsdi.go.kr/dataset/20220727ds00002 |
Unnamed: 8 has constant value "참조테이블명/비고" | Constant |
Unnamed: 6 is highly correlated with Unnamed: 2 and 1 other field | High correlation |
Unnamed: 2 is highly correlated with Unnamed: 6 and 2 other fields | High correlation |
Unnamed: 1 is highly correlated with Unnamed: 2 and 1 other field | High correlation |
Unnamed: 3 is highly correlated with Unnamed: 6 and 2 other fields | High correlation |
테이블정의서 has 1 (4.0%) missing value | Missing |
Unnamed: 1 has 6 (24.0%) missing values | Missing |
Unnamed: 2 has 3 (12.0%) missing values | Missing |
Unnamed: 3 has 5 (20.0%) missing values | Missing |
Unnamed: 4 has 5 (20.0%) missing values | Missing |
Unnamed: 5 has 25 (100.0%) missing values | Missing |
Unnamed: 6 has 22 (88.0%) missing values | Missing |
Unnamed: 7 has 22 (88.0%) missing values | Missing |
Unnamed: 8 has 24 (96.0%) missing values | Missing |
테이블정의서 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
Unnamed: 4 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
Unnamed: 5 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
Unnamed: 7 has an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
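The dataset-level missing-cell figures follow from the per-column counts in the alerts: 1 + 6 + 3 + 5 + 5 + 25 + 22 + 22 + 24 = 113 missing cells out of 9 × 25 = 225 cells, or 50.2%. A minimal Python sketch of that arithmetic, with the counts copied from the alerts above:

```python
# Per-column missing counts, copied from the "Missing" alerts above.
missing_per_column = {
    "테이블정의서": 1,
    "Unnamed: 1": 6,
    "Unnamed: 2": 3,
    "Unnamed: 3": 5,
    "Unnamed: 4": 5,
    "Unnamed: 5": 25,
    "Unnamed: 6": 22,
    "Unnamed: 7": 22,
    "Unnamed: 8": 24,
}
n_observations = 25
n_cells = len(missing_per_column) * n_observations  # 9 variables x 25 rows

missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / n_cells

print(missing_cells, n_cells, f"{missing_pct:.1f}%")
```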
Reproduction
Analysis started | 2022-08-14 13:01:46.075690 |
---|---|
Analysis finished | 2022-08-14 13:01:48.920493 |
Duration | 2.84 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
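The reported duration is simply the difference between the two timestamps above. A quick check with Python's standard `datetime` module:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table above.
started = datetime.fromisoformat("2022-08-14 13:01:46.075690")
finished = datetime.fromisoformat("2022-08-14 13:01:48.920493")

# timedelta -> float seconds, then rounded the way the report displays it.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```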
Unnamed: 1 (Categorical)
Distinct | 19 |
---|---|
Distinct (%) | 100.0% |
Missing | 6 |
Missing (%) | 24.0% |
Memory size | 328.0 B |
컬럼ID | 1 |
---|---|
STDMT | 1 |
PNU | 1 |
LAND_SEQNO | 1 |
SGG_CD | 1 |
Other values (14) | 14 |
Length
Max length | 11 |
---|---|
Median length | 9 |
Mean length | 6.368421053 |
Min length | 3 |
Unique
Unique | 19 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 컬럼ID |
---|---|
2nd row | STDMT |
3rd row | PNU |
4th row | LAND_SEQNO |
5th row | SGG_CD |
Common Values
Value | Count | Frequency (%) |
---|---|---|
컬럼ID | 1 | 4.0% |
STDMT | 1 | 4.0% |
PNU | 1 | 4.0% |
LAND_SEQNO | 1 | 4.0% |
SGG_CD | 1 | 4.0% |
LAND_LOC_CD | 1 | 4.0% |
LAND_GBN | 1 | 4.0% |
BOBN | 1 | 4.0% |
BUBN | 1 | 4.0% |
ADM_UMD_CD | 1 | 4.0% |
Other values (9) | 9 | 36.0% |
(Missing) | 6 | 24.0% |
Words
Value | Count | Frequency (%) |
---|---|---|
컬럼id | 1 | 5.3% |
pnilp | 1 | 5.3% |
geo_form | 1 | 5.3% |
geo_hl | 1 | 5.3% |
land_use | 1 | 5.3% |
spfc2 | 1 | 5.3% |
spfc1 | 1 | 5.3% |
parea | 1 | 5.3% |
jimok | 1 | 5.3% |
adm_umd_cd | 1 | 5.3% |
Other values (9) | 9 | 47.4% |
Unnamed: 2 (Categorical)
Distinct | 22 |
---|---|
Distinct (%) | 100.0% |
Missing | 3 |
Missing (%) | 12.0% |
Memory size | 328.0 B |
부번 | 1 |
---|---|
공시지가 토지특성 | 1 |
컬럼명 | 1 |
기준월 | 1 |
필지고유번호 | 1 |
Other values (17) | 17 |
Length
Max length | 20 |
---|---|
Median length | 7 |
Mean length | 5.272727273 |
Min length | 2 |
Unique
Unique | 22 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 부동산 제공 표준 데이터셋 v1.83 |
---|---|
2nd row | 공시지가 토지특성 |
3rd row | 컬럼명 |
4th row | 기준월 |
5th row | 필지고유번호 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
부번 | 1 | 4.0% |
공시지가 토지특성 | 1 | 4.0% |
컬럼명 | 1 | 4.0% |
기준월 | 1 | 4.0% |
필지고유번호 | 1 | 4.0% |
토지일련번호 | 1 | 4.0% |
시군구코드 | 1 | 4.0% |
토지소재지코드 | 1 | 4.0% |
토지구분 | 1 | 4.0% |
본번 | 1 | 4.0% |
Other values (12) | 12 | 48.0% |
(Missing) | 3 | 12.0% |
Words
Value | Count | Frequency (%) |
---|---|---|
부번 | 1 | 3.7% |
표준 | 1 | 3.7% |
행정읍면동코드 | 1 | 3.7% |
도로접면 | 1 | 3.7% |
지형형상 | 1 | 3.7% |
지형고저 | 1 | 3.7% |
토지이용상황 | 1 | 3.7% |
용도지역2 | 1 | 3.7% |
용도지역1 | 1 | 3.7% |
면적 | 1 | 3.7% |
Other values (17) | 17 | 63.0% |
Unnamed: 3 (Categorical)
Distinct | 5 |
---|---|
Distinct (%) | 25.0% |
Missing | 5 |
Missing (%) | 20.0% |
Memory size | 328.0 B |
CHAR | 12 |
---|---|
VARCHAR2 | 3 |
NUMBER | 3 |
테이블ID | 1 |
타입 | 1 |
Length
Max length | 8 |
---|---|
Median length | 4 |
Mean length | 4.85 |
Min length | 2 |
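The Length statistics for Unnamed: 3 can be reproduced from its 20 non-missing values (12 × CHAR, 3 × VARCHAR2, 3 × NUMBER, plus the header cells 테이블ID and 타입, per the Common Values counts). Lengths are counted in characters, so 테이블ID and 타입 contribute 5 and 2. A minimal sketch with the standard `statistics` module, assuming missing cells are dropped before lengths are taken:

```python
import statistics

# Non-missing values of Unnamed: 3, reconstructed from the Common Values counts.
values = ["CHAR"] * 12 + ["VARCHAR2"] * 3 + ["NUMBER"] * 3 + ["테이블ID", "타입"]

# len() counts Unicode code points, so "타입" has length 2.
lengths = [len(v) for v in values]

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", statistics.mean(lengths))
print("Min length:", min(lengths))
```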
Unique
Unique | 2 |
---|---|
Unique (%) | 10.0% |
Sample
1st row | 테이블ID |
---|---|
2nd row | 타입 |
3rd row | CHAR |
4th row | VARCHAR2 |
5th row | NUMBER |
Common Values
Value | Count | Frequency (%) |
---|---|---|
CHAR | 12 | 48.0% |
VARCHAR2 | 3 | 12.0% |
NUMBER | 3 | 12.0% |
테이블ID | 1 | 4.0% |
타입 | 1 | 4.0% |
(Missing) | 5 | 20.0% |
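The figures for Unnamed: 3 are consistent with these definitions. Distinct counts different non-missing values. Unique counts values that occur exactly once. Both percentages are taken over the 20 non-missing values (5 / 20 = 25.0%, 2 / 20 = 10.0%). Frequency (%) in Common Values, by contrast, is taken over all 25 rows, missing included (12 / 25 = 48.0% for CHAR). A sketch with `collections.Counter` that reproduces them; the `None` entries stand in for the five missing cells:

```python
from collections import Counter

# Unnamed: 3 column: 20 non-missing values plus 5 missing cells (None).
column = ["CHAR"] * 12 + ["VARCHAR2"] * 3 + ["NUMBER"] * 3 + ["테이블ID", "타입"] + [None] * 5

n_rows = len(column)
counts = Counter(v for v in column if v is not None)
n_present = sum(counts.values())

distinct = len(counts)                                   # different non-missing values
unique = sum(1 for c in counts.values() if c == 1)       # values seen exactly once

print(f"Distinct: {distinct} ({100 * distinct / n_present:.1f}%)")
print(f"Unique: {unique} ({100 * unique / n_present:.1f}%)")

# Common Values frequencies are relative to all rows, missing included.
for value, count in counts.most_common():
    print(f"{value} | {count} | {100 * count / n_rows:.1f}% |")
n_missing = n_rows - n_present
print(f"(Missing) | {n_missing} | {100 * n_missing / n_rows:.1f}% |")
```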
Words
Value | Count | Frequency (%) |
---|---|---|
char | 12 | 60.0% |
varchar2 | 3 | 15.0% |
number | 3 | 15.0% |
테이블id | 1 | 5.0% |
타입 | 1 | 5.0% |
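The word tables appear to lowercase each value, split it on whitespace, and compute percentages over the total number of words rather than rows. Every value of Unnamed: 3 is a single word, so char gets 12 / 20 = 60.0%. The sketch below assumes exactly that tokenization (lowercase, then whitespace split). It is an approximation; pandas-profiling's own rules, for example for punctuation, may differ.

```python
from collections import Counter

def word_frequencies(values):
    """Count lowercased, whitespace-separated words across non-missing values.

    Assumption: approximates the report's tokenization; this is not
    pandas-profiling's own implementation.
    """
    words = [w for v in values if v is not None for w in v.lower().split()]
    total = len(words)
    return [(w, c, round(100 * c / total, 1)) for w, c in Counter(words).most_common()]

column = ["CHAR"] * 12 + ["VARCHAR2"] * 3 + ["NUMBER"] * 3 + ["테이블ID", "타입"] + [None] * 5

for word, count, pct in word_frequencies(column):
    print(f"{word} | {count} | {pct}% |")
```

Under the same rule, the Unnamed: 2 value 부동산 제공 표준 데이터셋 v1.83 yields five words. That is consistent with the 3.7% (1 / 27) per-word frequency in that variable's word table.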
Unnamed: 6 (Categorical)
Distinct | 3 |
---|---|
Distinct (%) | 100.0% |
Missing | 22 |
Missing (%) | 88.0% |
Memory size | 328.0 B |
작성일 | 1 |
---|---|
테이블명 | 1 |
PK/FK | 1 |
Length
Max length | 5 |
---|---|
Median length | 4 |
Mean length | 4 |
Min length | 3 |
Unique
Unique | 3 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 작성일 |
---|---|
2nd row | 테이블명 |
3rd row | PK/FK |
Common Values
Value | Count | Frequency (%) |
---|---|---|
작성일 | 1 | 4.0% |
테이블명 | 1 | 4.0% |
PK/FK | 1 | 4.0% |
(Missing) | 22 | 88.0% |
Words
Value | Count | Frequency (%) |
---|---|---|
작성일 | 1 | 33.3% |
테이블명 | 1 | 33.3% |
pk/fk | 1 | 33.3% |
Unnamed: 8 (Categorical)
Distinct | 1 |
---|---|
Distinct (%) | 100.0% |
Missing | 24 |
Missing (%) | 96.0% |
Memory size | 328.0 B |
참조테이블명/비고 | 1 |
---|---|
Length
Max length | 9 |
---|---|
Median length | 9 |
Mean length | 9 |
Min length | 9 |
Unique
Unique | 1 |
---|---|
Unique (%) | 100.0% |
Sample
1st row | 참조테이블명/비고 |
---|---|
Common Values
Value | Count | Frequency (%) |
---|---|---|
참조테이블명/비고 | 1 | 4.0% |
(Missing) | 24 | 96.0% |
Words
Value | Count | Frequency (%) |
---|---|---|
참조테이블명/비고 | 1 | 100.0% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. For an exact linear relationship, therefore, the steepness of the line does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations: r = cov(X, Y) / (σ_X σ_Y).
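A minimal pure-Python sketch of that definition. It uses population covariance and standard deviations; the 1/n factors cancel in the ratio, so the n versus n-1 choice does not matter. It also checks the invariance under shifting and positive rescaling:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_transformed = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(round(r, 6), round(r_transformed, 6))
```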
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
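Equivalently, ρ is Pearson's r applied to the ranks. The sketch below assigns average ranks to ties and reuses a small Pearson helper. For a strictly increasing but nonlinear relationship such as y = x³, it gives ρ = 1, while r stays below 1:

```python
import math

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```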
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. Ignoring ties, τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
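Counting pairs directly gives an O(n²) sketch of τ as described above. Tied pairs count as neither concordant nor discordant, which is the τ-a convention. Library implementations such as `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with n*(n-1)/2 total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in x or y: counted in neither group
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(kendall_tau_a(x, y))
```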
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Its authors provide extensive documentation.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
First rows
 | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 |
---|---|---|---|---|---|---|---|---|---|
0 | 작성자 | <NA> | 부동산 제공 표준 데이터셋 v1.83 | <NA> | NaN | <NA> | 작성일 | 2017 | <NA> |
1 | 주제영역명 | <NA> | <NA> | 테이블ID | APMM_NV_LAND_OPEN | <NA> | 테이블명 | 공시지가 토지특성 | <NA> |
2 | 테이블설명 | <NA> | 공시지가 토지특성 | <NA> | NaN | <NA> | <NA> | NaN | <NA> |
3 | No | 컬럼ID | 컬럼명 | 타입 | 길이(Byte) | <NA> | PK/FK | Default | 참조테이블명/비고 |
4 | 1 | STDMT | 기준월 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
5 | 2 | PNU | 필지고유번호 | VARCHAR2 | 19 | <NA> | <NA> | NaN | <NA> |
6 | 3 | LAND_SEQNO | 토지일련번호 | NUMBER | 6 | <NA> | <NA> | NaN | <NA> |
7 | 4 | SGG_CD | 시군구코드 | CHAR | 5 | <NA> | <NA> | NaN | <NA> |
8 | 5 | LAND_LOC_CD | 토지소재지코드 | CHAR | 5 | <NA> | <NA> | NaN | <NA> |
9 | 6 | LAND_GBN | 토지구분 | CHAR | 1 | <NA> | <NA> | NaN | <NA> |
Last rows
 | 테이블정의서 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 |
---|---|---|---|---|---|---|---|---|---|
15 | 12 | PAREA | 면적 | NUMBER | 17,5 | <NA> | <NA> | NaN | <NA> |
16 | 13 | SPFC1 | 용도지역1 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
17 | 14 | SPFC2 | 용도지역2 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
18 | 15 | LAND_USE | 토지이용상황 | VARCHAR2 | 3 | <NA> | <NA> | NaN | <NA> |
19 | 16 | GEO_HL | 지형고저 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
20 | 17 | GEO_FORM | 지형형상 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
21 | 18 | ROAD_SIDE | 도로접면 | CHAR | 2 | <NA> | <NA> | NaN | <NA> |
22 | 인덱스명 | <NA> | 인덱스키 | <NA> | NaN | <NA> | <NA> | NaN | <NA> |
23 | NaN | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | NaN | <NA> |
24 | 업무규칙 | <NA> | <NA> | <NA> | NaN | <NA> | <NA> | NaN | <NA> |
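The first and last rows show that this file is a table-definition sheet (테이블정의서) rather than a data table. Rows 0 to 2 hold header metadata. Row 3 holds the column headers: No, 컬럼ID (column ID), 컬럼명 (column name), 타입 (type) and 길이(Byte) (length in bytes). Each following row describes one column of the table APMM_NV_LAND_OPEN. That layout is why the report shows Unnamed column names and so many missing cells.

The sketch below turns such rows into Oracle-style column definitions. It assumes the definition rows have already been read as tuples of strings; only a few rows are copied here from the tables above. The helper names are hypothetical, and treating a length such as 17,5 as precision and scale is an assumption.

```python
# Column-definition rows copied from the "First rows"/"Last rows" tables:
# (No, 컬럼ID, 컬럼명, 타입, 길이(Byte)). Only a subset is shown here.
rows = [
    ("1", "STDMT", "기준월", "CHAR", "2"),
    ("2", "PNU", "필지고유번호", "VARCHAR2", "19"),
    ("3", "LAND_SEQNO", "토지일련번호", "NUMBER", "6"),
    ("12", "PAREA", "면적", "NUMBER", "17,5"),
    ("18", "ROAD_SIDE", "도로접면", "CHAR", "2"),
]

def parse_length(raw):
    """Split '17,5' into (17, 5); a plain '2' becomes (2, None).

    Assumption: a comma-separated length means (precision, scale).
    """
    parts = [int(p) for p in raw.split(",")]
    return parts[0], (parts[1] if len(parts) > 1 else None)

def column_ddl(col_id, col_type, raw_length):
    """Render one column as an Oracle-style definition, e.g. 'PAREA NUMBER(17,5)'."""
    size, scale = parse_length(raw_length)
    spec = f"{size},{scale}" if scale is not None else str(size)
    return f"{col_id} {col_type}({spec})"

for no, col_id, col_name, col_type, raw_length in rows:
    print(f"{column_ddl(col_id, col_type, raw_length)}  -- {no}. {col_name}")
```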