Overview

Dataset statistics

Number of variables: 9
Number of observations: 500
Missing cells: 59
Missing cells (%): 1.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 37.7 KiB
Average record size in memory: 77.3 B
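
The missing-cell percentage follows from the counts in this table. The sketch below reproduces it in plain Python; the variable names are illustrative, and the per-column counts are the ones listed in the Alerts section of this report.

```python
# Figures reported in the dataset statistics.
n_variables = 9
n_observations = 500
missing_cells = 59

total_cells = n_variables * n_observations        # 4500 cells in total
missing_pct = 100 * missing_cells / total_cells   # share of empty cells

# Per-column missing counts as listed in the Alerts section.
missing_by_column = {"성별(SEX_CCD)": 27, "연령대별(AGE_GB)": 32}
column_pct = {
    name: 100 * count / n_observations
    for name, count in missing_by_column.items()
}

print(f"{missing_pct:.1f}%")  # 1.3%
for name, pct in column_pct.items():
    print(name, f"{pct:.1f}%")
```

The two per-column counts add up to the 59 missing cells, so no other column contributes missing values.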

Variable types

Numeric: 5
Categorical: 4

Dataset

Description: Sample data
Author: Shinhan Card (신한카드)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51

Alerts

기준년월(TS_YM) is highly correlated with 일별(TS_YMD) [High correlation]
일별(TS_YMD) is highly correlated with 기준년월(TS_YM) [High correlation]
성별(SEX_CCD) has 27 (5.4%) missing values [Missing]
연령대별(AGE_GB) has 32 (6.4%) missing values [Missing]

Reproduction

Analysis started: 2022-10-29 06:25:57.210318
Analysis finished: 2022-10-29 06:26:05.898145
Duration: 8.69 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

가맹점집계구코드(TOT_REG_CD)
Real number (ℝ≥0)

Distinct: 470
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.114428809 × 10^12
Minimum: 1.10105301 × 10^12
Maximum: 1.125073031 × 10^12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1.10105301 × 10^12
5-th percentile: 1.10206702 × 10^12
Q1: 1.109063018 × 10^12
Median: 1.115061015 × 10^12
Q3: 1.12106802 × 10^12
95-th percentile: 1.124071021 × 10^12
Maximum: 1.125073031 × 10^12
Range: 2.40200208 × 10^10
Interquartile range (IQR): 1.20050025 × 10^10

Descriptive statistics

Standard deviation: 7147337232
Coefficient of variation (CV): 0.006413453397
Kurtosis: -1.078478973
Mean: 1.114428809 × 10^12
Median Absolute Deviation (MAD): 6000995004
Skewness: -0.2703069853
Sum: 5.572144046 × 10^14
Variance: 5.10844295 × 10^19
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                  Count  Frequency (%)
1.11307504 × 10^12     3      0.6%
1.12307103 × 10^12     3      0.6%
1.10206901 × 10^12     2      0.4%
1.11606403 × 10^12     2      0.4%
1.11506301 × 10^12     2      0.4%
1.12206002 × 10^12     2      0.4%
1.11407701 × 10^12     2      0.4%
1.12107203 × 10^12     2      0.4%
1.12306404 × 10^12     2      0.4%
1.12106802 × 10^12     2      0.4%
Other values (460)     478    95.6%

Minimum 10 values

Value                  Count  Frequency (%)
1.10105301 × 10^12     1      0.2%
1.101053021 × 10^12    1      0.2%
1.10105502 × 10^12     1      0.2%
1.10105701 × 10^12     1      0.2%
1.10106303 × 10^12     1      0.2%
1.10106701 × 10^12     1      0.2%
1.10106702 × 10^12     1      0.2%
1.10107101 × 10^12     1      0.2%
1.10107201 × 10^12     1      0.2%
1.10107301 × 10^12     1      0.2%

Maximum 10 values

Value                  Count  Frequency (%)
1.125073031 × 10^12    1      0.2%
1.12507303 × 10^12     1      0.2%
1.12507302 × 10^12     1      0.2%
1.12507301 × 10^12     1      0.2%
1.12507202 × 10^12     1      0.2%
1.12507102 × 10^12     1      0.2%
1.12507102 × 10^12     1      0.2%
1.12507102 × 10^12     1      0.2%
1.12507102 × 10^12     1      0.2%
1.12506702 × 10^12     1      0.2%

내국인업종코드(SB_UPJONG_CD)
Categorical

Distinct: 46
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

SB016: 61
SB008: 56
SB001: 53
SB013: 41
SB020: 31
Other values (41): 258

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 7
Unique (%): 1.4%

Sample

1st row: SB008
2nd row: SB007
3rd row: SB016
4th row: SB008
5th row: SB051

Common Values

Value               Count  Frequency (%)
SB016               61     12.2%
SB008               56     11.2%
SB001               53     10.6%
SB013               41     8.2%
SB020               31     6.2%
SB054               28     5.6%
SB006               19     3.8%
SB051               19     3.8%
SB007               18     3.6%
SB005               17     3.4%
Other values (36)   157    31.4%

Length

[Figure: histogram of lengths of the category]

Value               Count  Frequency (%)
sb016               61     12.2%
sb008               56     11.2%
sb001               53     10.6%
sb013               41     8.2%
sb020               31     6.2%
sb054               28     5.6%
sb006               19     3.8%
sb051               19     3.8%
sb007               18     3.6%
sb005               17     3.4%
Other values (36)   157    31.4%

기준년월(TS_YM)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 55
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201890.788
Minimum: 201701
Maximum: 202107
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 201701
5-th percentile: 201705
Q1: 201803
Median: 201903.5
Q3: 202005.25
95-th percentile: 202105
Maximum: 202107
Range: 406
Interquartile range (IQR): 202.25

Descriptive statistics

Standard deviation: 130.7724816
Coefficient of variation (CV): 0.0006477387249
Kurtosis: -1.141198686
Mean: 201890.788
Median Absolute Deviation (MAD): 101.5
Skewness: 0.09052048919
Sum: 100945394
Variance: 17101.44194
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
201710              24     4.8%
202010              16     3.2%
201806              16     3.2%
201903              13     2.6%
201901              13     2.6%
202101              13     2.6%
201907              12     2.4%
201902              12     2.4%
201812              12     2.4%
201801              11     2.2%
Other values (45)   358    71.6%

Minimum 10 values

Value    Count  Frequency (%)
201701   5      1.0%
201702   9      1.8%
201703   6      1.2%
201704   3      0.6%
201705   9      1.8%
201706   6      1.2%
201707   10     2.0%
201708   6      1.2%
201709   5      1.0%
201710   24     4.8%

Maximum 10 values

Value    Count  Frequency (%)
202107   11     2.2%
202106   8      1.6%
202105   8      1.6%
202104   7      1.4%
202103   11     2.2%
202102   6      1.2%
202101   13     2.6%
202012   6      1.2%
202011   6      1.2%
202010   16     3.2%

일별(TS_YMD)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 441
Distinct (%): 88.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20189095.03
Minimum: 20170102
Maximum: 20210731
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 20170102
5-th percentile: 20170519.65
Q1: 20180327.75
Median: 20190364
Q3: 20200547.75
95-th percentile: 20210511
Maximum: 20210731
Range: 40629
Interquartile range (IQR): 20220

Descriptive statistics

Standard deviation: 13077.02662
Coefficient of variation (CV): 0.0006477272313
Kurtosis: -1.141123368
Mean: 20189095.03
Median Absolute Deviation (MAD): 10141
Skewness: 0.09056341695
Sum: 1.009454751 × 10^10
Variance: 171008625.3
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value                Count  Frequency (%)
20190304             3      0.6%
20171021             3      0.6%
20190207             3      0.6%
20170826             3      0.6%
20191209             2      0.4%
20180119             2      0.4%
20180305             2      0.4%
20180617             2      0.4%
20181207             2      0.4%
20190715             2      0.4%
Other values (431)   476    95.2%

Minimum 10 values

Value      Count  Frequency (%)
20170102   1      0.2%
20170106   1      0.2%
20170115   1      0.2%
20170118   1      0.2%
20170129   1      0.2%
20170202   1      0.2%
20170208   1      0.2%
20170210   1      0.2%
20170213   1      0.2%
20170214   2      0.4%

Maximum 10 values

Value      Count  Frequency (%)
20210731   1      0.2%
20210729   1      0.2%
20210727   1      0.2%
20210725   1      0.2%
20210722   1      0.2%
20210717   1      0.2%
20210714   1      0.2%
20210711   1      0.2%
20210710   1      0.2%
20210709   1      0.2%
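
Because 일별(TS_YMD) is profiled as a real number, some of its statistics are not calendar dates: the reported median, 20190364, has no valid day. The sketch below assumes the values encode dates as YYYYMMDD (an assumption based on how the values look, not something the report states) and parses them with the standard library; `parse_yyyymmdd` is an illustrative helper name.

```python
from datetime import date, datetime

def parse_yyyymmdd(value):
    """Parse an integer such as 20170102 into a date (assumes YYYYMMDD encoding)."""
    return datetime.strptime(str(int(value)), "%Y%m%d").date()

first_day = parse_yyyymmdd(20170102)   # reported minimum
last_day = parse_yyyymmdd(20210731)    # reported maximum
span_days = (last_day - first_day).days

# The numeric median is not a real date, so parsing it fails.
try:
    parse_yyyymmdd(20190364)
    median_is_date = True
except ValueError:
    median_is_date = False

print(first_day, last_day, span_days, median_is_date)
```

Under that assumption, converting the column to dates before profiling would make the quantile statistics meaningful.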

개인법인구분(PSN_CPR)
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

개인 (individual): 471
법인 (corporation): 29

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 개인
2nd row: 개인
3rd row: 법인
4th row: 개인
5th row: 개인

Common Values

Value   Count  Frequency (%)
개인    471    94.2%
법인    29     5.8%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

Value   Count  Frequency (%)
개인    471    94.2%
법인    29     5.8%

성별(SEX_CCD)
Categorical

MISSING

Distinct: 2
Distinct (%): 0.4%
Missing: 27
Missing (%): 5.4%
Memory size: 4.0 KiB

M: 263
F: 210

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: M
4th row: M
5th row: F

Common Values

Value       Count  Frequency (%)
M           263    52.6%
F           210    42.0%
(Missing)   27     5.4%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

Value   Count  Frequency (%)
m       263    55.6%
f       210    44.4%
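
The two frequency tables for 성별(SEX_CCD) use different denominators: the Common Values table divides by all 500 rows, while the last table divides by the 473 non-missing rows. A short sketch of both calculations, using only the counts reported here (the variable names are illustrative):

```python
counts = {"M": 263, "F": 210}
missing = 27

non_missing = sum(counts.values())    # 473 rows with a value
total_rows = non_missing + missing    # 500 rows overall

# Share of all rows, as in the Common Values table.
share_of_all = {k: round(100 * v / total_rows, 1) for k, v in counts.items()}

# Share of non-missing rows, as in the last table.
share_of_valid = {k: round(100 * v / non_missing, 1) for k, v in counts.items()}

print(share_of_all)    # {'M': 52.6, 'F': 42.0}
print(share_of_valid)  # {'M': 55.6, 'F': 44.4}
```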

연령대별(AGE_GB)
Categorical

MISSING

Distinct: 7
Distinct (%): 1.5%
Missing: 32
Missing (%): 6.4%
Memory size: 4.0 KiB

30대 (30s): 113
50대 (50s): 89
40대 (40s): 83
20대 (20s): 70
60대 (60s): 66
Other values (2): 47

Length

Max length: 5
Median length: 3
Mean length: 3.128205128
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 30대
2nd row: 40대
3rd row: 20대
4th row: 20대
5th row: 30대

Common Values

Value       Count  Frequency (%)
30대        113    22.6%
50대        89     17.8%
40대        83     16.6%
20대        70     14.0%
60대        66     13.2%
70대이상    30     6.0%
10대        17     3.4%
(Missing)   32     6.4%

Length

[Figure: histogram of lengths of the category]

Category Frequency Plot

Value       Count  Frequency (%)
30대        113    24.1%
50대        89     19.0%
40대        83     17.7%
20대        70     15.0%
60대        66     14.1%
70대이상    30     6.4%
10대        17     3.6%

카드이용금액계(AMT_CORR)
Real number (ℝ≥0)

Distinct: 370
Distinct (%): 74.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 581545.8756
Minimum: 5030
Maximum: 18832823
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 5030
5-th percentile: 16070.85
Q1: 52312
Median: 134552.5
Q3: 375741
95-th percentile: 2114536.55
Maximum: 18832823
Range: 18827793
Interquartile range (IQR): 323429

Descriptive statistics

Standard deviation: 1700597.688
Coefficient of variation (CV): 2.924270912
Kurtosis: 50.03216233
Mean: 581545.8756
Median Absolute Deviation (MAD): 99342.5
Skewness: 6.482024953
Sum: 290772937.8
Variance: 2.892032497 × 10^12
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]
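
Several of the 카드이용금액계(AMT_CORR) statistics are simple functions of the others, which gives a quick consistency check on the report. The sketch below recomputes them from the reported values; the variable names are illustrative, and small differences are expected because the inputs are already rounded.

```python
# Values copied from the 카드이용금액계(AMT_CORR) statistics in this report.
n_rows = 500
total = 290772937.8
minimum, q1, q3, maximum = 5030, 52312, 375741, 18832823
std_dev = 1700597.688

mean = total / n_rows              # reported mean: 581545.8756
value_range = maximum - minimum    # reported range: 18827793
iqr = q3 - q1                      # reported IQR: 323429
cv = std_dev / mean                # reported CV: 2.924270912
variance = std_dev ** 2            # reported variance: 2.892032497 × 10^12

print(mean, value_range, iqr, round(cv, 6), f"{variance:.6e}")
```

A coefficient of variation near 3, together with the large skewness and kurtosis, is consistent with a few very large amounts dominating the distribution.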

Common values

Value                Count  Frequency (%)
45270                8      1.6%
35210                7      1.4%
40240                6      1.2%
75450                6      1.2%
60360                5      1.0%
85510                5      1.0%
33198                5      1.0%
50300                5      1.0%
80480                4      0.8%
100600               4      0.8%
Other values (360)   445    89.0%

Minimum 10 values

Value     Count  Frequency (%)
5030      3      0.6%
5331.8    1      0.2%
6539      1      0.2%
7042      1      0.2%
7545      2      0.4%
8048      1      0.2%
9054      2      0.4%
10060     2      0.4%
11066     1      0.2%
12072     1      0.2%

Maximum 10 values

Value         Count  Frequency (%)
18832823      1      0.2%
14038327.6    1      0.2%
13191360      1      0.2%
12469923.3    1      0.2%
11320518      1      0.2%
8907551.55    1      0.2%
8898774.2     1      0.2%
7606164.8     1      0.2%
6417022.5     1      0.2%
6047991.52    1      0.2%

카드이용건수(USECT_CORR)
Real number (ℝ≥0)

Distinct: 34
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.56754
Minimum: 5.03
Maximum: 352.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 5.03
5-th percentile: 5.03
Q1: 5.03
Median: 10.06
Q3: 25.15
95-th percentile: 65.39
Maximum: 352.1
Range: 347.07
Interquartile range (IQR): 20.12

Descriptive statistics

Standard deviation: 29.68060645
Coefficient of variation (CV): 1.44308004
Kurtosis: 42.08403046
Mean: 20.56754
Median Absolute Deviation (MAD): 5.03
Skewness: 5.21428054
Sum: 10283.77
Variance: 880.9383993
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=34)]

Common values

Value               Count  Frequency (%)
5.03                200    40.0%
10.06               75     15.0%
15.09               38     7.6%
20.12               34     6.8%
30.18               23     4.6%
25.15               20     4.0%
35.21               15     3.0%
9.1                 12     2.4%
45.27               12     2.4%
40.24               11     2.2%
Other values (24)   60     12.0%

Minimum 10 values

Value   Count  Frequency (%)
5.03    200    40.0%
9.1     12     2.4%
10.06   75     15.0%
15.09   38     7.6%
18.2    9      1.8%
20.12   34     6.8%
25.15   20     4.0%
27.3    3      0.6%
30.18   23     4.6%
35.21   15     3.0%

Maximum 10 values

Value    Count  Frequency (%)
352.1    1      0.2%
246.47   1      0.2%
206.23   1      0.2%
160.96   1      0.2%
150.9    1      0.2%
127.4    1      0.2%
120.72   2      0.4%
109.2    1      0.2%
95.57    2      0.4%
90.54    1      0.2%

Interactions

[Figures: 25 pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
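
A minimal sketch of that calculation in plain Python. The function name `pearson_r` is illustrative, and the five data points are the 기준년월(TS_YM) and 일별(TS_YMD) values from the first rows of the sample table at the end of this report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)

ts_ym = [202103, 202003, 201807, 201907, 201905]
ts_ymd = [20210319, 20200331, 20180711, 20190720, 20190524]
r = pearson_r(ts_ym, ts_ymd)
print(r)
```

On these five rows r is very close to 1, which is consistent with the high-correlation alert for the two date columns.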

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
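
The sketch below follows that recipe in plain Python: rank both variables (ties get the average of their positions, an assumption the text does not spell out), then apply Pearson's formula to the ranks. All function names and the example data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r from sums of products and squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: y = x cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

For this example ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.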

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
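
A minimal sketch of the pair-counting definition given above, often called tau-a. It treats tied pairs as neither concordant nor discordant; statistics libraries commonly use a tie-corrected variant (tau-b) instead, which this sketch does not implement. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Six pairs in total: five concordant, one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```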

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
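
A minimal sketch of a bias-corrected Cramér's V in plain Python, written from the commonly cited form of Bergsma's correction rather than from the report's own source code. The function name and the contingency counts are hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at 0.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Hypothetical 2 x 2 tables of counts.
print(cramers_v_corrected([[30, 10], [10, 30]]))   # partial association
print(cramers_v_corrected([[10, 10], [10, 10]]))   # independence
print(cramers_v_corrected([[50, 0], [0, 50]]))     # perfect association
```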

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.
Dendrogram: the dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

index  가맹점집계구코드(TOT_REG_CD)  내국인업종코드(SB_UPJONG_CD)  기준년월(TS_YM)  일별(TS_YMD)  개인법인구분(PSN_CPR)  성별(SEX_CCD)  연령대별(AGE_GB)  카드이용금액계(AMT_CORR)  카드이용건수(USECT_CORR)
0  1122060020002  SB008  202103  20210319  개인  <NA>  30대  41246.0  15.09
1  1118054020017  SB007  202003  20200331  개인  M  40대  40240.0  5.03
2  1102060020001  SB016  201807  20180711  법인  F  20대  11320518.0  5.03
3  1104055020002  SB008  201907  20190720  개인  M  20대  5331.8  65.39
4  1112068010404  SB051  201905  20190524  개인  M  30대  50400.6  30.18
5  1106090020001  SB044  201710  20171014  개인  F  60대  65390.0  10.06
6  1114074020006  SB005  201712  20171203  개인  <NA>  20대  56084.5  10.06
7  1118054010002  SB004  201907  20190705  개인  M  50대  56336.0  5.03
8  1119074060001  SB054  201902  20190207  개인  M  40대  310351.0  15.09
9  1113075040005  SB013  201710  20171023  개인  F  50대  186110.0  35.21

Last rows

index  가맹점집계구코드(TOT_REG_CD)  내국인업종코드(SB_UPJONG_CD)  기준년월(TS_YM)  일별(TS_YMD)  개인법인구분(PSN_CPR)  성별(SEX_CCD)  연령대별(AGE_GB)  카드이용금액계(AMT_CORR)  카드이용건수(USECT_CORR)
490  1107070010606  SB008  201711  20171123  개인  M  20대  186110.0  10.06
491  1107060030012  SB019  202104  20210425  개인  M  70대이상  160960.0  90.54
492  1110051040002  SB008  202002  20200220  개인  F  50대  85510.0  9.1
493  1121071010002  SB045  202001  20200109  개인  F  50대  150900.0  5.03
494  1102060060001  SB051  202106  20210608  개인  F  30대  35210.0  5.03
495  1106081030003  SB005  201812  20181225  개인  F  30대  63478.6  40.24
496  1103063020101  SB020  201907  20190729  개인  M  50대  87622.6  5.03
497  1111058030001  SB054  201710  20171021  개인  F  50대  28168.0  10.06
498  1115059030005  SB007  201802  20180224  개인  M  20대  32946.5  5.03
499  1120072030004  SB039  201810  20181028  개인  F  40대  191643.0  45.27