Overview

Dataset statistics

Number of variables: 8
Number of observations: 10000
Missing cells: 8189
Missing cells (%): 10.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 664.2 KiB
Average record size in memory: 68.0 B
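
The percentage and per-record figures above follow from the raw counts. A quick arithmetic check is sketched below; the variable names are illustrative, and the factor of 1024 bytes per KiB is an assumption about how the report sizes memory.

```python
# Figures copied from the dataset statistics above.
n_variables = 8
n_observations = 10_000
missing_cells = 8_189
total_size_kib = 664.2

# Missing cells (%) = missing cells / (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report, which suggests the missing-cell percentage is taken over all 80000 cells rather than over rows.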

Variable types

Numeric: 4
Categorical: 4

Dataset

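Description: Individual housing price data for Changwon-si, Gyeongsangnam-do, as of 2015. The data lists the officially published housing price for each lot number, by eup/myeon/dong.
Author: Gyeongsangnam-do (경상남도)
URL: https://bigdata.gyeongnam.go.kr/index.gn?menuCd=DOM_000000114002001000&publicdatapk=15046125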

Alerts

읍면동 has a high cardinality: 158 distinct values High cardinality
has a high cardinality: 108 distinct values High cardinality
has 8189 (81.9%) missing values Missing
df_index has unique values Unique
부번 has 1136 (11.4%) zeros Zeros

Reproduction

Analysis started: 2022-08-11 19:32:18.394387
Analysis finished: 2022-08-11 19:32:23.028554
Duration: 4.63 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40750.2057
Minimum: 31
Maximum: 81008
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 31
5-th percentile: 4192.6
Q1: 20712.75
Median: 40658
Q3: 61242.25
95-th percentile: 77207.75
Maximum: 81008
Range: 80977
Interquartile range (IQR): 40529.5

Descriptive statistics

Standard deviation: 23456.85859
Coefficient of variation (CV): 0.575625526
Kurtosis: -1.204165096
Mean: 40750.2057
Median Absolute Deviation (MAD): 20317.5
Skewness: -0.00903744776
Sum: 407502057
Variance: 550224214.9
Monotonicity: Not monotonic
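
Several of the df_index statistics above are derived from the others. The sketch below recomputes them from the reported figures only; the variable names are illustrative, and the last digits can differ slightly because the report rounds its inputs.

```python
import math

# Figures copied from the df_index statistics above.
minimum, maximum = 31, 81008
q1, q3 = 20712.75, 61242.25
mean = 40750.2057
std = 23456.85859

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(value_range, iqr)
print(f"{cv:.6f}", f"{variance:.1f}")
```

The same relations hold for the other numeric variables in this report.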
Histogram with fixed size bins (bins=50)

Common values
Value                Count  Frequency (%)
75973                1      < 0.1%
23376                1      < 0.1%
6997                 1      < 0.1%
53476                1      < 0.1%
28602                1      < 0.1%
76034                1      < 0.1%
68057                1      < 0.1%
6261                 1      < 0.1%
31441                1      < 0.1%
54825                1      < 0.1%
Other values (9990)  9990   99.9%

Minimum 10 values
Value  Count  Frequency (%)
31     1      < 0.1%
42     1      < 0.1%
48     1      < 0.1%
49     1      < 0.1%
65     1      < 0.1%
77     1      < 0.1%
90     1      < 0.1%
98     1      < 0.1%
127    1      < 0.1%
134    1      < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
81008  1      < 0.1%
81007  1      < 0.1%
80996  1      < 0.1%
80995  1      < 0.1%
80984  1      < 0.1%
80979  1      < 0.1%
80975  1      < 0.1%
80974  1      < 0.1%
80973  1      < 0.1%
80971  1      < 0.1%

시군구
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

경상남도 창원시 마산합포구: 2713
경상남도 창원시 의창구: 2467
경상남도 창원시 마산회원구: 2078
경상남도 창원시 진해구: 2046
경상남도 창원시 성산구: 696

Length

Max length: 14
Median length: 12
Mean length: 12.9582
Min length: 12

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경상남도 창원시 진해구
2nd row: 경상남도 창원시 마산회원구
3rd row: 경상남도 창원시 마산회원구
4th row: 경상남도 창원시 진해구
5th row: 경상남도 창원시 의창구

Common Values

Value                       Count  Frequency (%)
경상남도 창원시 마산합포구  2713   27.1%
경상남도 창원시 의창구      2467   24.7%
경상남도 창원시 마산회원구  2078   20.8%
경상남도 창원시 진해구      2046   20.5%
경상남도 창원시 성산구      696    7.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Words
Value       Count  Frequency (%)
경상남도    10000  33.3%
창원시      10000  33.3%
마산합포구  2713   9.0%
의창구      2467   8.2%
마산회원구  2078   6.9%
진해구      2046   6.8%
성산구      696    2.3%

읍면동
Categorical

HIGH CARDINALITY

Distinct: 158
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

회원동: 493
합성동: 371
명서동: 363
산호동: 339
경화동: 329
Other values (153): 8105

Length

Max length: 5
Median length: 3
Mean length: 2.9465
Min length: 2

Unique

Unique: 11
Unique (%): 0.1%

Sample

1st row: 자은동
2nd row: 회원동
3rd row: 구암동
4th row: 제황산동
5th row: 용호동

Common Values

Value               Count  Frequency (%)
회원동              493    4.9%
합성동              371    3.7%
명서동              363    3.6%
산호동              339    3.4%
경화동              329    3.3%
여좌동              325    3.2%
구암동              287    2.9%
북면                274    2.7%
봉곡동              271    2.7%
동읍                266    2.7%
Other values (148)  6682   66.8%

Length

Histogram of lengths of the category

Words
Value               Count  Frequency (%)
회원동              493    4.9%
합성동              371    3.7%
명서동              363    3.6%
산호동              339    3.4%
경화동              329    3.3%
여좌동              325    3.2%
구암동              287    2.9%
북면                274    2.7%
봉곡동              271    2.7%
동읍                266    2.7%
Other values (148)  6682   66.8%


Categorical

HIGH CARDINALITY
MISSING

Distinct: 108
Distinct (%): 6.0%
Missing: 8189
Missing (%): 81.9%
Memory size: 78.2 KiB

진동리: 69
가술리: 66
중리: 59
삼계리: 55
신촌리: 45
Other values (103): 1517

Length

Max length: 3
Median length: 3
Mean length: 2.945886251
Min length: 2

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 구복리
2nd row: 평성리
3rd row: 우암리
4th row: 화천리
5th row: 대산리

Common Values

Value              Count  Frequency (%)
진동리             69     0.7%
가술리             66     0.7%
중리               59     0.6%
삼계리             55     0.5%
신촌리             45     0.4%
갈전리             43     0.4%
오서리             40     0.4%
심리               39     0.4%
수정리             36     0.4%
내포리             35     0.4%
Other values (98)  1324   13.2%
(Missing)          8189   81.9%
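
The percentages reported for this variable appear to use two different denominators: the common-values table divides by all 10000 rows, while Distinct (%) and the later frequency table (where 진동리 is shown as 3.8%) match a denominator of only the non-missing rows. This reading is inferred from the reported figures, not taken from the tool's documentation. A quick check, with illustrative variable names:

```python
# Figures copied from this variable's summary above.
n_rows = 10_000
n_missing = 8_189
n_present = n_rows - n_missing      # rows with a non-missing value

count_top = 69      # count of the most common value (진동리)
n_distinct = 108

share_of_all_rows = 100 * count_top / n_rows      # common-values table
share_of_present = 100 * count_top / n_present    # later frequency table
distinct_pct = 100 * n_distinct / n_present       # Distinct (%)
missing_pct = 100 * n_missing / n_rows            # Missing (%)

print(n_present)
print(f"{share_of_all_rows:.1f}% {share_of_present:.1f}%")
print(f"{distinct_pct:.1f}% {missing_pct:.1f}%")
```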

Length

Histogram of lengths of the category

Words
Value              Count  Frequency (%)
진동리             69     3.8%
가술리             66     3.6%
중리               59     3.3%
삼계리             55     3.0%
신촌리             45     2.5%
갈전리             43     2.4%
오서리             40     2.2%
심리               39     2.2%
수정리             36     2.0%
내포리             35     1.9%
Other values (98)  1324   73.1%

토지구분(1 일반, 2 산, 5 블록노트)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

1: 9991
2: 9

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      9991   99.9%
2      9      0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Words
Value  Count  Frequency (%)
1      9991   99.9%
2      9      0.1%

본번
Real number (ℝ≥0)

Distinct: 1171
Distinct (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 299.0439
Minimum: 1
Maximum: 1773
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 55
Median: 155
Q3: 451
95-th percentile: 1084.1
Maximum: 1773
Range: 1772
Interquartile range (IQR): 396

Descriptive statistics

Standard deviation: 330.32938
Coefficient of variation (CV): 1.104618352
Kurtosis: 1.639337542
Mean: 299.0439
Median Absolute Deviation (MAD): 130
Skewness: 1.483659625
Sum: 2990439
Variance: 109117.4993
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count  Frequency (%)
12                   94     0.9%
1                    86     0.9%
3                    75     0.8%
28                   73     0.7%
14                   66     0.7%
4                    64     0.6%
5                    63     0.6%
26                   61     0.6%
8                    60     0.6%
6                    57     0.6%
Other values (1161)  9301   93.0%

Minimum 10 values
Value  Count  Frequency (%)
1      86     0.9%
2      55     0.5%
3      75     0.8%
4      64     0.6%
5      63     0.6%
6      57     0.6%
7      51     0.5%
8      60     0.6%
9      38     0.4%
10     41     0.4%

Maximum 10 values
Value  Count  Frequency (%)
1773   1      < 0.1%
1771   1      < 0.1%
1769   1      < 0.1%
1766   2      < 0.1%
1763   1      < 0.1%
1762   1      < 0.1%
1760   1      < 0.1%
1759   2      < 0.1%
1643   1      < 0.1%
1640   2      < 0.1%

부번
Real number (ℝ≥0)

ZEROS

Distinct: 278
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.3043
Minimum: 0
Maximum: 624
Zeros: 1136
Zeros (%): 11.4%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 8
Q3: 18
95-th percentile: 62
Maximum: 624
Range: 624
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 45.32950229
Coefficient of variation (CV): 2.348155711
Kurtosis: 63.3146324
Mean: 19.3043
Median Absolute Deviation (MAD): 7
Skewness: 7.066784556
Sum: 193043
Variance: 2054.763778
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
0                   1136   11.4%
1                   801    8.0%
2                   626    6.3%
3                   551    5.5%
4                   465    4.7%
6                   432    4.3%
5                   431    4.3%
7                   377    3.8%
8                   337    3.4%
10                  305    3.0%
Other values (268)  4539   45.4%

Minimum 10 values
Value  Count  Frequency (%)
0      1136   11.4%
1      801    8.0%
2      626    6.3%
3      551    5.5%
4      465    4.7%
5      431    4.3%
6      432    4.3%
7      377    3.8%
8      337    3.4%
9      300    3.0%

Maximum 10 values
Value  Count  Frequency (%)
624    1      < 0.1%
603    1      < 0.1%
601    1      < 0.1%
600    1      < 0.1%
593    1      < 0.1%
568    1      < 0.1%
567    1      < 0.1%
566    1      < 0.1%
564    1      < 0.1%
556    1      < 0.1%

주택공시가격
Real number (ℝ≥0)

Distinct: 1379
Distinct (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 115900613.3
Minimum: 582000
Maximum: 5540000000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 582000
5-th percentile: 22000000
Q1: 51600000
Median: 81200000
Q3: 161000000
95-th percentile: 312000000
Maximum: 5540000000
Range: 5539418000
Interquartile range (IQR): 109400000

Descriptive statistics

Standard deviation: 108640030.4
Coefficient of variation (CV): 0.9373550954
Kurtosis: 622.4290489
Mean: 115900613.3
Median Absolute Deviation (MAD): 38900000
Skewness: 13.5346705
Sum: 1.159006133 × 10^12
Variance: 1.180265621 × 10^16
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                Count  Frequency (%)
101000000            53     0.5%
102000000            49     0.5%
100000000            46     0.5%
103000000            42     0.4%
109000000            38     0.4%
113000000            36     0.4%
114000000            35     0.4%
105000000            35     0.4%
112000000            35     0.4%
111000000            33     0.3%
Other values (1369)  9598   96.0%

Minimum 10 values
Value    Count  Frequency (%)
582000   1      < 0.1%
646000   1      < 0.1%
749000   1      < 0.1%
886000   1      < 0.1%
942000   1      < 0.1%
948000   1      < 0.1%
1060000  1      < 0.1%
1070000  1      < 0.1%
1170000  1      < 0.1%
1210000  2      < 0.1%

Maximum 10 values
Value       Count  Frequency (%)
5540000000  1      < 0.1%
710000000   1      < 0.1%
697000000   1      < 0.1%
693000000   1      < 0.1%
684000000   1      < 0.1%
676000000   1      < 0.1%
665000000   1      < 0.1%
659000000   1      < 0.1%
650000000   1      < 0.1%
645000000   1      < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
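
As a minimal sketch of that calculation (plain Python, not the code the report itself runs; the function name and sample data are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)
# Shifting and rescaling one variable leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], y)
print(r, r_shifted)
```

The two printed values agree up to floating-point error, which illustrates the invariance under changes of location and scale.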

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
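
A minimal sketch of that procedure, assuming tied values receive the average of their ranks (a common convention, not confirmed by the report); the helper names are illustrative:

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # monotonic but nonlinear in x
print(spearman_rho(x, y), pearson_r(x, y))
```

On this example ρ reaches 1 while r stays below 1, because the relationship is monotonic but not linear.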

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
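
The formula as stated corresponds to the variant usually called tau-a. A minimal sketch follows; note that statistics libraries often compute the tie-adjusted tau-b instead (for example, SciPy's `kendalltau` defaults to tau-b), so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: the pair is ordered the same way in x and y.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count toward the total but neither tally.
    return (concordant - discordant) / len(pairs)

# Six pairs in total: five concordant, one discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```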

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
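
The sketch below shows one way to compute a bias-corrected Cramér's V from a contingency table, following the correction commonly attributed to Bergsma (2013). It is an illustration in plain Python, not the report's own implementation. The function name is hypothetical, and the code assumes every row and column total is nonzero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated table
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independent table
```

The first table gives a value of 1 up to floating-point error and the second gives 0, matching the two ends of the range described above.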

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

df_index시군구읍면동토지구분(1 일반, 2 산, 5 블록노트)본번부번주택공시가격
075973경상남도 창원시 진해구자은동<NA>18885124000000
161646경상남도 창원시 마산회원구회원동<NA>14805041100000
248730경상남도 창원시 마산회원구구암동<NA>1903466300000
365356경상남도 창원시 진해구제황산동<NA>1271447900000
412301경상남도 창원시 의창구용호동<NA>1363270000000
542517경상남도 창원시 마산합포구구산면구복리11943415100000
663102경상남도 창원시 마산회원구내서읍평성리1266248500000
724838경상남도 창원시 성산구사파동<NA>1871213000000
826425경상남도 창원시 마산합포구교방동<NA>11372122000000
952503경상남도 창원시 마산회원구석전동<NA>1266734900000

Last rows

df_index시군구읍면동토지구분(1 일반, 2 산, 5 블록노트)본번부번주택공시가격
999039007경상남도 창원시 마산합포구자산동<NA>132333100000000
999160831경상남도 창원시 마산회원구회원동<NA>13172056000000
999231634경상남도 창원시 마산합포구산호동<NA>15141055200000
999367768경상남도 창원시 진해구여좌동<NA>1996347000000
999424331경상남도 창원시 성산구사파동<NA>13518289000000
999566982경상남도 창원시 진해구인사동<NA>16308440000
999666003경상남도 창원시 진해구안곡동<NA>142211900000
999777607경상남도 창원시 진해구제덕동<NA>11500289000000
999831625경상남도 창원시 마산합포구산호동<NA>15105517600000
999943190경상남도 창원시 마산합포구진동면고현리174207850000