Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 322.4 KiB
Average record size in memory: 33.0 B

Variable types

Numeric: 2
Categorical: 2

Dataset

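Description: The asset management ledger of Korea South-East Power (한국남동발전). It includes fields such as the management number (관리번호), asset unit name (자산단위명), acquisition date (취득일자), opening acquisition cost (기초취득가액), opening accumulated depreciation (기초상각누계액), and opening book value (기초장부가액).
Author: 한국남동발전㈜ (Korea South-East Power Co., Ltd.)
URL: https://www.data.go.kr/data/15064135/fileData.do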

Alerts

취득일자 has a high cardinality: 1364 distinct values (High cardinality)
df_index is highly correlated with 관리번호 (High correlation)
관리번호 is highly correlated with df_index (High correlation)
df_index has unique values (Unique)
관리번호 has unique values (Unique)

Reproduction

Analysis started: 2022-10-03 07:13:18.559136
Analysis finished: 2022-10-03 07:13:19.913612
Duration: 1.35 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11684.0763
Minimum: 3
Maximum: 23402
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 1153.95
Q1: 5833.75
Median: 11689.5
Q3: 17558.25
95-th percentile: 22239.1
Maximum: 23402
Range: 23399
Interquartile range (IQR): 11724.5
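
As a quick sanity check on the figures above, the range and the interquartile range can be re-derived from the reported minimum, maximum, and quartiles. The sketch below only redoes that arithmetic; the variable names are illustrative and not part of the report.

```python
# Values copied from the df_index quantile statistics above.
minimum = 3
maximum = 23402
q1 = 5833.75
q3 = 17558.25

# Range is the spread between the extremes; IQR is the spread of the middle 50%.
value_range = maximum - minimum
iqr = q3 - q1

print("Range:", value_range)
print("IQR:", iqr)
```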

Descriptive statistics

Standard deviation: 6752.891927
Coefficient of variation (CV): 0.5779568495
Kurtosis: -1.197417564
Mean: 11684.0763
Median Absolute Deviation (MAD): 5865.5
Skewness: 0.003004149179
Sum: 116840763
Variance: 45601549.38
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
11178                1      < 0.1%
13760                1      < 0.1%
22419                1      < 0.1%
5735                 1      < 0.1%
9982                 1      < 0.1%
3480                 1      < 0.1%
13025                1      < 0.1%
9105                 1      < 0.1%
9572                 1      < 0.1%
2070                 1      < 0.1%
Other values (9990)  9990   99.9%

Smallest values
Value  Count  Frequency (%)
3      1      < 0.1%
7      1      < 0.1%
10     1      < 0.1%
11     1      < 0.1%
16     1      < 0.1%
17     1      < 0.1%
24     1      < 0.1%
25     1      < 0.1%
30     1      < 0.1%
32     1      < 0.1%

Largest values
Value  Count  Frequency (%)
23402  1      < 0.1%
23400  1      < 0.1%
23396  1      < 0.1%
23394  1      < 0.1%
23392  1      < 0.1%
23382  1      < 0.1%
23374  1      < 0.1%
23369  1      < 0.1%
23365  1      < 0.1%
23363  1      < 0.1%

관리번호
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4281111865
Minimum: 1010000062
Maximum: 9010000076
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 88.0 KiB

Quantile statistics

Minimum: 1010000062
5-th percentile: 1010002261
Q1: 4010007716
Median: 4010022478
Q3: 4710007799
95-th percentile: 6510010733
Maximum: 9010000076
Range: 8000000014
Interquartile range (IQR): 700000083.5

Descriptive statistics

Standard deviation: 1610727564
Coefficient of variation (CV): 0.3762404756
Kurtosis: 0.2120058872
Mean: 4281111865
Median Absolute Deviation (MAD): 699981277.5
Skewness: -0.1173307152
Sum: 4.281111865 × 10^13
Variance: 2.594443286 × 10^18
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                Count  Frequency (%)
4010021425           1      < 0.1%
4710007009           1      < 0.1%
4010005899           1      < 0.1%
1010002725           1      < 0.1%
4010008933           1      < 0.1%
4710001102           1      < 0.1%
4010011042           1      < 0.1%
4010013424           1      < 0.1%
4010025788           1      < 0.1%
4010019121           1      < 0.1%
Other values (9990)  9990   99.9%

Smallest values
Value       Count  Frequency (%)
1010000062  1      < 0.1%
1010000066  1      < 0.1%
1010000069  1      < 0.1%
1010000070  1      < 0.1%
1010000075  1      < 0.1%
1010000076  1      < 0.1%
1010000085  1      < 0.1%
1010000086  1      < 0.1%
1010000091  1      < 0.1%
1010000093  1      < 0.1%

Largest values
Value       Count  Frequency (%)
9010000076  1      < 0.1%
9010000073  1      < 0.1%
9010000069  1      < 0.1%
9010000067  1      < 0.1%
9010000065  1      < 0.1%
9010000050  1      < 0.1%
9010000040  1      < 0.1%
9010000034  1      < 0.1%
9010000030  1      < 0.1%
9010000027  1      < 0.1%

자산단위명
Categorical

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

기계장치: 4226
기계장치-저장품: 1600
비품: 1494
토지: 891
건물: 527
Other values (10): 1262

Length

Max length: 8
Median length: 7
Mean length: 4.1017
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 기계장치
2nd row: 비품
3rd row: 기계장치
4th row: 토지
5th row: 기계장치

Common Values

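Value                                      Count  Frequency (%)
기계장치 (machinery)                       4226   42.3%
기계장치-저장품 (machinery, stored goods)  1600   16.0%
비품 (fixtures and equipment)              1494   14.9%
토지 (land)                                891    8.9%
건물 (buildings)                           527    5.3%
공구와기구 (tools and instruments)         468    4.7%
구축물 (structures)                        453    4.5%
소프트웨어 (software)                      186    1.9%
차량운반구 (vehicles)                      71     0.7%
종합검사원가 (overall inspection cost)     50     0.5%
Other values (5)                           34     0.3%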

Length

Histogram of lengths of the category

취득일자
Categorical

HIGH CARDINALITY

Distinct: 1364
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

2001-04-02: 1116
2010-01-01: 652
2014-06-30: 324
2015-12-31: 239
2015-01-01: 190
Other values (1359): 7479

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 641
Unique (%): 6.4%

Sample

1st row: 2014-12-31
2nd row: 2012-12-31
3rd row: 2007-10-26
4th row: 2008-04-01
5th row: 2010-02-28

Common Values

Value                Count  Frequency (%)
2001-04-02           1116   11.2%
2010-01-01           652    6.5%
2014-06-30           324    3.2%
2015-12-31           239    2.4%
2015-01-01           190    1.9%
2008-04-01           167    1.7%
1997-03-01           162    1.6%
2019-12-31           158    1.6%
2016-08-31           156    1.6%
2004-07-12           154    1.5%
Other values (1354)  6682   66.8%

Length

Histogram of lengths of the category

Interactions

[Figures: four interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
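
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration on made-up toy data, not the code pandas-profiling runs internally; the function name `pearson_r` and the sample values are assumptions introduced here.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (std_x * std_y)

# Toy data (hypothetical, not taken from the dataset).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

r_pos = pearson_r(xs, ys)                            # perfectly linear, increasing
r_shift = pearson_r([10 * x + 3 for x in xs], ys)    # shifted and rescaled x
r_neg = pearson_r(xs, [-y for y in ys])              # perfectly linear, decreasing
```

Shifting and rescaling `xs` by a positive factor leaves the result unchanged, which is the invariance property mentioned above; negating one variable flips the sign.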

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
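
A minimal sketch of that recipe in plain Python follows: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of their rank positions, which is a common convention assumed here; the helper names and toy data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy data (hypothetical): nonlinear but strictly increasing.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
tied = average_ranks([10, 20, 20, 30])
```

Because only the ordering matters, the cubic relationship still yields a coefficient of 1, whereas Pearson's r on the same data would be below 1.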

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
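
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. Library implementations often use tie-adjusted variants such as tau-b, so results on data with ties may differ from this sketch; the function name and toy data are assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1    # both variables move in the same direction
        elif sign < 0:
            discordant += 1    # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (hypothetical): one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.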

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

   df_index  관리번호    자산단위명       취득일자
0  11178     4010021425  기계장치         2014-12-31
1  20046     6510004460  비품             2012-12-31
2  5229      4010005899  기계장치         2007-10-26
3  1463      1010002725  토지             2008-04-01
4  6448      4010008933  기계장치         2010-02-28
5  15043     4710001102  기계장치-저장품  2010-01-01
6  7060      4010011042  기계장치         1984-02-01
7  8293      4010013424  기계장치         2012-07-31
8  14227     4010025788  기계장치         2019-12-31
9  17143     4710007009  기계장치-저장품  2017-05-17

Last rows

      df_index  관리번호    자산단위명       취득일자
9990  22906     8200000220  소프트웨어       2012-03-29
9991  12828     4010024000  기계장치         2016-08-31
9992  14881     4710000860  기계장치-저장품  2010-01-01
9993  14705     4710000597  기계장치-저장품  2010-01-01
9994  14531     4710000360  기계장치-저장품  2010-01-01
9995  17362     4710007457  기계장치-저장품  2017-06-27
9996  12896     4010024068  기계장치         2016-08-31
9997  5871      4010007762  기계장치         2001-04-02
9998  14962     4710000989  기계장치-저장품  2010-01-01
9999  20538     6510008643  비품             2015-09-30