Dataset statistics
Number of variables | 9 |
---|---|
Number of observations | 500 |
Missing cells | 59 |
Missing cells (%) | 1.3% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 37.7 KiB |
Average record size in memory | 77.3 B |
Variable types
Numeric | 5 |
---|---|
Categorical | 4 |
Dataset
Description | Sample data |
---|---|
Author | Shinhan Card (신한카드) |
URL | https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51 |
기준년월(TS_YM) is highly correlated with 일별(TS_YMD) | High correlation |
일별(TS_YMD) is highly correlated with 기준년월(TS_YM) | High correlation |
성별(SEX_CCD) has 27 (5.4%) missing values | Missing |
연령대별(AGE_GB) has 32 (6.4%) missing values | Missing |
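The missing-value figures in the alerts and in the dataset statistics are related by simple arithmetic. The sketch below recomputes them from the counts shown in this report; the helper name `pct` is illustrative only and is not part of pandas-profiling.

```python
# Figures copied from this report: 500 observations, 9 variables,
# and the two "Missing" alerts listed above.
n_rows, n_cols = 500, 9
missing_per_column = {"성별(SEX_CCD)": 27, "연령대별(AGE_GB)": 32}


def pct(part, whole):
    """Percentage rounded to one decimal place, as displayed in the report."""
    return round(100 * part / whole, 1)


# Per-column share is relative to the number of rows.
per_column_pct = {name: pct(count, n_rows) for name, count in missing_per_column.items()}

# Dataset-level share is relative to the number of cells (rows x columns).
missing_cells = sum(missing_per_column.values())
missing_cells_pct = pct(missing_cells, n_rows * n_cols)

print(per_column_pct)
print(missing_cells, missing_cells_pct)
```

The per-column percentages use the row count as denominator, while "Missing cells (%)" uses the total cell count, which is why the dataset-level figure is much smaller.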
Reproduction
Analysis started | 2022-10-29 06:25:57.210318 |
---|---|
Analysis finished | 2022-10-29 06:26:05.898145 |
Duration | 8.69 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
가맹점집계구코드(TOT_REG_CD)
Real number (ℝ≥0)
Distinct | 470 |
---|---|
Distinct (%) | 94.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.114428809 × 10¹² |
Minimum | 1.10105301 × 10¹² |
---|---|
Maximum | 1.125073031 × 10¹² |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 1.10105301 × 10¹² |
---|---|
5-th percentile | 1.10206702 × 10¹² |
Q1 | 1.109063018 × 10¹² |
median | 1.115061015 × 10¹² |
Q3 | 1.12106802 × 10¹² |
95-th percentile | 1.124071021 × 10¹² |
Maximum | 1.125073031 × 10¹² |
Range | 2.40200208 × 10¹⁰ |
Interquartile range (IQR) | 1.20050025 × 10¹⁰ |
Descriptive statistics
Standard deviation | 7147337232 |
---|---|
Coefficient of variation (CV) | 0.006413453397 |
Kurtosis | -1.078478973 |
Mean | 1.114428809 × 10¹² |
Median Absolute Deviation (MAD) | 6000995004 |
Skewness | -0.2703069853 |
Sum | 5.572144046 × 10¹⁴ |
Variance | 5.10844295 × 10¹⁹ |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
1.11307504 × 10¹² | 3 | 0.6% |
1.12307103 × 10¹² | 3 | 0.6% |
1.10206901 × 10¹² | 2 | 0.4% |
1.11606403 × 10¹² | 2 | 0.4% |
1.11506301 × 10¹² | 2 | 0.4% |
1.12206002 × 10¹² | 2 | 0.4% |
1.11407701 × 10¹² | 2 | 0.4% |
1.12107203 × 10¹² | 2 | 0.4% |
1.12306404 × 10¹² | 2 | 0.4% |
1.12106802 × 10¹² | 2 | 0.4% |
Other values (460) | 478 | 95.6% |
Value | Count | Frequency (%) |
1.10105301 × 10¹² | 1 | 0.2% |
1.101053021 × 10¹² | 1 | 0.2% |
1.10105502 × 10¹² | 1 | 0.2% |
1.10105701 × 10¹² | 1 | 0.2% |
1.10106303 × 10¹² | 1 | 0.2% |
1.10106701 × 10¹² | 1 | 0.2% |
1.10106702 × 10¹² | 1 | 0.2% |
1.10107101 × 10¹² | 1 | 0.2% |
1.10107201 × 10¹² | 1 | 0.2% |
1.10107301 × 10¹² | 1 | 0.2% |
Value | Count | Frequency (%) |
1.125073031 × 10¹² | 1 | 0.2% |
1.12507303 × 10¹² | 1 | 0.2% |
1.12507302 × 10¹² | 1 | 0.2% |
1.12507301 × 10¹² | 1 | 0.2% |
1.12507202 × 10¹² | 1 | 0.2% |
1.12507102 × 10¹² | 1 | 0.2% |
1.12507102 × 10¹² | 1 | 0.2% |
1.12507102 × 10¹² | 1 | 0.2% |
1.12507102 × 10¹² | 1 | 0.2% |
1.12506702 × 10¹² | 1 | 0.2% |
내국인업종코드(SB_UPJONG_CD)
Categorical
Distinct | 46 |
---|---|
Distinct (%) | 9.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
SB016 | 61 |
---|---|
SB008 | 56 |
SB001 | 53 |
SB013 | 41 |
SB020 | 31 |
Other values (41) | 258 |
Length
Max length | 5 |
---|---|
Median length | 5 |
Mean length | 5 |
Min length | 5 |
Unique
Unique | 7 |
---|---|
Unique (%) | 1.4% |
Sample
1st row | SB008 |
---|---|
2nd row | SB007 |
3rd row | SB016 |
4th row | SB008 |
5th row | SB051 |
Common Values
Value | Count | Frequency (%) |
SB016 | 61 | 12.2% |
SB008 | 56 | 11.2% |
SB001 | 53 | 10.6% |
SB013 | 41 | 8.2% |
SB020 | 31 | 6.2% |
SB054 | 28 | 5.6% |
SB006 | 19 | 3.8% |
SB051 | 19 | 3.8% |
SB007 | 18 | 3.6% |
SB005 | 17 | 3.4% |
Other values (36) | 157 | 31.4% |
Length
Value | Count | Frequency (%) |
sb016 | 61 | 12.2% |
sb008 | 56 | 11.2% |
sb001 | 53 | 10.6% |
sb013 | 41 | 8.2% |
sb020 | 31 | 6.2% |
sb054 | 28 | 5.6% |
sb006 | 19 | 3.8% |
sb051 | 19 | 3.8% |
sb007 | 18 | 3.6% |
sb005 | 17 | 3.4% |
Other values (36) | 157 | 31.4% |
기준년월(TS_YM)
Real number (ℝ≥0)
Distinct | 55 |
---|---|
Distinct (%) | 11.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 201890.788 |
Minimum | 201701 |
---|---|
Maximum | 202107 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 201701 |
---|---|
5-th percentile | 201705 |
Q1 | 201803 |
median | 201903.5 |
Q3 | 202005.25 |
95-th percentile | 202105 |
Maximum | 202107 |
Range | 406 |
Interquartile range (IQR) | 202.25 |
Descriptive statistics
Standard deviation | 130.7724816 |
---|---|
Coefficient of variation (CV) | 0.0006477387249 |
Kurtosis | -1.141198686 |
Mean | 201890.788 |
Median Absolute Deviation (MAD) | 101.5 |
Skewness | 0.09052048919 |
Sum | 100945394 |
Variance | 17101.44194 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
201710 | 24 | 4.8% |
202010 | 16 | 3.2% |
201806 | 16 | 3.2% |
201903 | 13 | 2.6% |
201901 | 13 | 2.6% |
202101 | 13 | 2.6% |
201907 | 12 | 2.4% |
201902 | 12 | 2.4% |
201812 | 12 | 2.4% |
201801 | 11 | 2.2% |
Other values (45) | 358 | 71.6% |
Value | Count | Frequency (%) |
201701 | 5 | 1.0% |
201702 | 9 | 1.8% |
201703 | 6 | 1.2% |
201704 | 3 | 0.6% |
201705 | 9 | 1.8% |
201706 | 6 | 1.2% |
201707 | 10 | 2.0% |
201708 | 6 | 1.2% |
201709 | 5 | 1.0% |
201710 | 24 | 4.8% |
Value | Count | Frequency (%) |
202107 | 11 | 2.2% |
202106 | 8 | 1.6% |
202105 | 8 | 1.6% |
202104 | 7 | 1.4% |
202103 | 11 | 2.2% |
202102 | 6 | 1.2% |
202101 | 13 | 2.6% |
202012 | 6 | 1.2% |
202011 | 6 | 1.2% |
202010 | 16 | 3.2% |
일별(TS_YMD)
Real number (ℝ≥0)
Distinct | 441 |
---|---|
Distinct (%) | 88.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20189095.03 |
Minimum | 20170102 |
---|---|
Maximum | 20210731 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 20170102 |
---|---|
5-th percentile | 20170519.65 |
Q1 | 20180327.75 |
median | 20190364 |
Q3 | 20200547.75 |
95-th percentile | 20210511 |
Maximum | 20210731 |
Range | 40629 |
Interquartile range (IQR) | 20220 |
Descriptive statistics
Standard deviation | 13077.02662 |
---|---|
Coefficient of variation (CV) | 0.0006477272313 |
Kurtosis | -1.141123368 |
Mean | 20189095.03 |
Median Absolute Deviation (MAD) | 10141 |
Skewness | 0.09056341695 |
Sum | 1.009454751 × 10¹⁰ |
Variance | 171008625.3 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
20190304 | 3 | 0.6% |
20171021 | 3 | 0.6% |
20190207 | 3 | 0.6% |
20170826 | 3 | 0.6% |
20191209 | 2 | 0.4% |
20180119 | 2 | 0.4% |
20180305 | 2 | 0.4% |
20180617 | 2 | 0.4% |
20181207 | 2 | 0.4% |
20190715 | 2 | 0.4% |
Other values (431) | 476 | 95.2% |
Value | Count | Frequency (%) |
20170102 | 1 | 0.2% |
20170106 | 1 | 0.2% |
20170115 | 1 | 0.2% |
20170118 | 1 | 0.2% |
20170129 | 1 | 0.2% |
20170202 | 1 | 0.2% |
20170208 | 1 | 0.2% |
20170210 | 1 | 0.2% |
20170213 | 1 | 0.2% |
20170214 | 2 | 0.4% |
Value | Count | Frequency (%) |
20210731 | 1 | 0.2% |
20210729 | 1 | 0.2% |
20210727 | 1 | 0.2% |
20210725 | 1 | 0.2% |
20210722 | 1 | 0.2% |
20210717 | 1 | 0.2% |
20210714 | 1 | 0.2% |
20210711 | 1 | 0.2% |
20210710 | 1 | 0.2% |
20210709 | 1 | 0.2% |
개인법인구분(PSN_CPR)
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
개인 | 471 |
---|---|
법인 | 29 |
Length
Max length | 2 |
---|---|
Median length | 2 |
Mean length | 2 |
Min length | 2 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 개인 |
---|---|
2nd row | 개인 |
3rd row | 법인 |
4th row | 개인 |
5th row | 개인 |
Common Values
Value | Count | Frequency (%) |
개인 | 471 | 94.2% |
법인 | 29 | 5.8% |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
개인 | 471 | 94.2% |
법인 | 29 | 5.8% |
성별(SEX_CCD)
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 27 |
Missing (%) | 5.4% |
Memory size | 4.0 KiB |
M | 263 |
---|---|
F | 210 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | M |
---|---|
2nd row | F |
3rd row | M |
4th row | M |
5th row | F |
Common Values
Value | Count | Frequency (%) |
M | 263 | 52.6% |
F | 210 | 42.0% |
(Missing) | 27 | 5.4% |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
m | 263 | 55.6% |
f | 210 | 44.4% |
연령대별(AGE_GB)
Categorical
Distinct | 7 |
---|---|
Distinct (%) | 1.5% |
Missing | 32 |
Missing (%) | 6.4% |
Memory size | 4.0 KiB |
30대 | 113 |
---|---|
50대 | 89 |
40대 | 83 |
20대 | 70 |
60대 | 66 |
Other values (2) | 47 |
Length
Max length | 5 |
---|---|
Median length | 3 |
Mean length | 3.128205128 |
Min length | 3 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 30대 |
---|---|
2nd row | 40대 |
3rd row | 20대 |
4th row | 20대 |
5th row | 30대 |
Common Values
Value | Count | Frequency (%) |
30대 | 113 | 22.6% |
50대 | 89 | 17.8% |
40대 | 83 | 16.6% |
20대 | 70 | 14.0% |
60대 | 66 | 13.2% |
70대이상 | 30 | 6.0% |
10대 | 17 | 3.4% |
(Missing) | 32 | 6.4% |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
30대 | 113 | 24.1% |
50대 | 89 | 19.0% |
40대 | 83 | 17.7% |
20대 | 70 | 15.0% |
60대 | 66 | 14.1% |
70대이상 | 30 | 6.4% |
10대 | 17 | 3.6% |
카드이용금액계(AMT_CORR)
Real number (ℝ≥0)
Distinct | 370 |
---|---|
Distinct (%) | 74.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 581545.8756 |
Minimum | 5030 |
---|---|
Maximum | 18832823 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 5030 |
---|---|
5-th percentile | 16070.85 |
Q1 | 52312 |
median | 134552.5 |
Q3 | 375741 |
95-th percentile | 2114536.55 |
Maximum | 18832823 |
Range | 18827793 |
Interquartile range (IQR) | 323429 |
Descriptive statistics
Standard deviation | 1700597.688 |
---|---|
Coefficient of variation (CV) | 2.924270912 |
Kurtosis | 50.03216233 |
Mean | 581545.8756 |
Median Absolute Deviation (MAD) | 99342.5 |
Skewness | 6.482024953 |
Sum | 290772937.8 |
Variance | 2.892032497 × 10¹² |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
45270 | 8 | 1.6% |
35210 | 7 | 1.4% |
40240 | 6 | 1.2% |
75450 | 6 | 1.2% |
60360 | 5 | 1.0% |
85510 | 5 | 1.0% |
33198 | 5 | 1.0% |
50300 | 5 | 1.0% |
80480 | 4 | 0.8% |
100600 | 4 | 0.8% |
Other values (360) | 445 | 89.0% |
Value | Count | Frequency (%) |
5030 | 3 | 0.6% |
5331.8 | 1 | 0.2% |
6539 | 1 | 0.2% |
7042 | 1 | 0.2% |
7545 | 2 | 0.4% |
8048 | 1 | 0.2% |
9054 | 2 | 0.4% |
10060 | 2 | 0.4% |
11066 | 1 | 0.2% |
12072 | 1 | 0.2% |
Value | Count | Frequency (%) |
18832823 | 1 | 0.2% |
14038327.6 | 1 | 0.2% |
13191360 | 1 | 0.2% |
12469923.3 | 1 | 0.2% |
11320518 | 1 | 0.2% |
8907551.55 | 1 | 0.2% |
8898774.2 | 1 | 0.2% |
7606164.8 | 1 | 0.2% |
6417022.5 | 1 | 0.2% |
6047991.52 | 1 | 0.2% |
카드이용건수(USECT_CORR)
Real number (ℝ≥0)
Distinct | 34 |
---|---|
Distinct (%) | 6.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20.56754 |
Minimum | 5.03 |
---|---|
Maximum | 352.1 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 5.03 |
---|---|
5-th percentile | 5.03 |
Q1 | 5.03 |
median | 10.06 |
Q3 | 25.15 |
95-th percentile | 65.39 |
Maximum | 352.1 |
Range | 347.07 |
Interquartile range (IQR) | 20.12 |
Descriptive statistics
Standard deviation | 29.68060645 |
---|---|
Coefficient of variation (CV) | 1.44308004 |
Kurtosis | 42.08403046 |
Mean | 20.56754 |
Median Absolute Deviation (MAD) | 5.03 |
Skewness | 5.21428054 |
Sum | 10283.77 |
Variance | 880.9383993 |
Monotonicity | Not monotonic |
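Several of the statistics above are derived from others: range from minimum and maximum, IQR from the quartiles, CV from standard deviation and mean, variance from standard deviation, and sum from mean and observation count. The following sketch checks those relationships using the USECT_CORR figures copied from this report; it is a consistency check, not how pandas-profiling computes them internally.

```python
import math

# Values copied from the 카드이용건수(USECT_CORR) tables above.
n = 500
mean, std = 20.56754, 29.68060645
q1, q3 = 5.03, 25.15
minimum, maximum = 5.03, 352.1

derived = {
    "range": maximum - minimum,   # max - min
    "iqr": q3 - q1,               # Q3 - Q1
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,         # square of the standard deviation
    "sum": mean * n,              # mean times number of observations
}
reported = {
    "range": 347.07,
    "iqr": 20.12,
    "cv": 1.44308004,
    "variance": 880.9383993,
    "sum": 10283.77,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.6f} reported={reported[name]} match={ok}")
```

The relative tolerance absorbs the rounding of the displayed figures.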
Value | Count | Frequency (%) |
5.03 | 200 | 40.0% |
10.06 | 75 | 15.0% |
15.09 | 38 | 7.6% |
20.12 | 34 | 6.8% |
30.18 | 23 | 4.6% |
25.15 | 20 | 4.0% |
35.21 | 15 | 3.0% |
9.1 | 12 | 2.4% |
45.27 | 12 | 2.4% |
40.24 | 11 | 2.2% |
Other values (24) | 60 | 12.0% |
Value | Count | Frequency (%) |
5.03 | 200 | 40.0% |
9.1 | 12 | 2.4% |
10.06 | 75 | 15.0% |
15.09 | 38 | 7.6% |
18.2 | 9 | 1.8% |
20.12 | 34 | 6.8% |
25.15 | 20 | 4.0% |
27.3 | 3 | 0.6% |
30.18 | 23 | 4.6% |
35.21 | 15 | 3.0% |
Value | Count | Frequency (%) |
352.1 | 1 | 0.2% |
246.47 | 1 | 0.2% |
206.23 | 1 | 0.2% |
160.96 | 1 | 0.2% |
150.9 | 1 | 0.2% |
127.4 | 1 | 0.2% |
120.72 | 2 | 0.4% |
109.2 | 1 | 0.2% |
95.57 | 2 | 0.4% |
90.54 | 1 | 0.2% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
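As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations. The function name `pearson_r` is illustrative; the input values are the 기준년월(TS_YM) and 일별(TS_YMD) columns from the ten first rows of this report, which is only a small excerpt of the 500 observations behind the "High correlation" alert.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


# TS_YM and TS_YMD taken from the ten "First rows" of this report.
ts_ym = [202103, 202003, 201807, 201907, 201905,
         201710, 201712, 201907, 201902, 201710]
ts_ymd = [20210319, 20200331, 20180711, 20190720, 20190524,
          20171014, 20171203, 20190705, 20190207, 20171023]

r = pearson_r(ts_ym, ts_ymd)
print(f"r = {r:.6f}")

# Invariance under location and scale: a linear rescaling leaves r unchanged.
r_scaled = pearson_r([3 * x + 7 for x in ts_ym], ts_ymd)
print(f"r after rescaling = {r_scaled:.6f}")
```

Because each TS_YMD value is its TS_YM value followed by a two-digit day, the two columns move almost in lockstep, which is what the alert reflects.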
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
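The rank-based definition can be sketched as Pearson's r applied to ranks. The helper names below are illustrative, and tied values are assumed to receive the average of their rank positions, a common convention that this report does not itself specify. The toy data is invented for illustration and is not taken from the dataset.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Toy data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"rho = {rho:.4f}, r = {r:.4f}")
```

On a strictly increasing but nonlinear relationship the ranks agree perfectly, so ρ reaches its maximum while r stays below it.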
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
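The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that description; the function name is illustrative, the toy data is invented, and libraries such as SciPy default to the tie-adjusted tau-b variant, so results can differ when the data contains ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in either variable count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: of the 6 pairs, 5 are concordant and 1 is discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(f"tau = {tau:.4f}")
```

This brute-force loop visits every pair and is quadratic in the number of observations, which is fine for a 500-row sample like this one.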
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
(index) | 가맹점집계구코드(TOT_REG_CD) | 내국인업종코드(SB_UPJONG_CD) | 기준년월(TS_YM) | 일별(TS_YMD) | 개인법인구분(PSN_CPR) | 성별(SEX_CCD) | 연령대별(AGE_GB) | 카드이용금액계(AMT_CORR) | 카드이용건수(USECT_CORR) |
---|---|---|---|---|---|---|---|---|---|
0 | 1122060020002 | SB008 | 202103 | 20210319 | 개인 | <NA> | 30대 | 41246.0 | 15.09 |
1 | 1118054020017 | SB007 | 202003 | 20200331 | 개인 | M | 40대 | 40240.0 | 5.03 |
2 | 1102060020001 | SB016 | 201807 | 20180711 | 법인 | F | 20대 | 11320518.0 | 5.03 |
3 | 1104055020002 | SB008 | 201907 | 20190720 | 개인 | M | 20대 | 5331.8 | 65.39 |
4 | 1112068010404 | SB051 | 201905 | 20190524 | 개인 | M | 30대 | 50400.6 | 30.18 |
5 | 1106090020001 | SB044 | 201710 | 20171014 | 개인 | F | 60대 | 65390.0 | 10.06 |
6 | 1114074020006 | SB005 | 201712 | 20171203 | 개인 | <NA> | 20대 | 56084.5 | 10.06 |
7 | 1118054010002 | SB004 | 201907 | 20190705 | 개인 | M | 50대 | 56336.0 | 5.03 |
8 | 1119074060001 | SB054 | 201902 | 20190207 | 개인 | M | 40대 | 310351.0 | 15.09 |
9 | 1113075040005 | SB013 | 201710 | 20171023 | 개인 | F | 50대 | 186110.0 | 35.21 |
Last rows
(index) | 가맹점집계구코드(TOT_REG_CD) | 내국인업종코드(SB_UPJONG_CD) | 기준년월(TS_YM) | 일별(TS_YMD) | 개인법인구분(PSN_CPR) | 성별(SEX_CCD) | 연령대별(AGE_GB) | 카드이용금액계(AMT_CORR) | 카드이용건수(USECT_CORR) |
---|---|---|---|---|---|---|---|---|---|
490 | 1107070010606 | SB008 | 201711 | 20171123 | 개인 | M | 20대 | 186110.0 | 10.06 |
491 | 1107060030012 | SB019 | 202104 | 20210425 | 개인 | M | 70대이상 | 160960.0 | 90.54 |
492 | 1110051040002 | SB008 | 202002 | 20200220 | 개인 | F | 50대 | 85510.0 | 9.1 |
493 | 1121071010002 | SB045 | 202001 | 20200109 | 개인 | F | 50대 | 150900.0 | 5.03 |
494 | 1102060060001 | SB051 | 202106 | 20210608 | 개인 | F | 30대 | 35210.0 | 5.03 |
495 | 1106081030003 | SB005 | 201812 | 20181225 | 개인 | F | 30대 | 63478.6 | 40.24 |
496 | 1103063020101 | SB020 | 201907 | 20190729 | 개인 | M | 50대 | 87622.6 | 5.03 |
497 | 1111058030001 | SB054 | 201710 | 20171021 | 개인 | F | 50대 | 28168.0 | 10.06 |
498 | 1115059030005 | SB007 | 201802 | 20180224 | 개인 | M | 20대 | 32946.5 | 5.03 |
499 | 1120072030004 | SB039 | 201810 | 20181028 | 개인 | F | 40대 | 191643.0 | 45.27 |