Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 500 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 29.9 KiB |
Average record size in memory | 61.3 B |
Variable types
Numeric | 5 |
---|---|
Categorical | 2 |
Dataset
Description | Sample data |
---|---|
Author | Shinhan Card (신한카드) |
URL | https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51 |
Alerts
기준년월(TS_YM) is highly correlated with 일별(TS_YMD) | High correlation |
일별(TS_YMD) is highly correlated with 기준년월(TS_YM) | High correlation |
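The high correlation is what one would expect if 기준년월(TS_YM) is simply the year-month prefix of 일별(TS_YMD). The sketch below checks that assumption against pairs copied from the first-rows sample of this report. The helper name `year_month` and the YYYYMMDD integer encoding are assumptions for illustration, not something the report states.

```python
# (TS_YM, TS_YMD) pairs copied from the first rows of the sample.
sample_pairs = [
    (201712, 20171202),
    (201811, 20181103),
    (201706, 20170607),
    (201912, 20191225),
    (202105, 20210508),
]


def year_month(ts_ymd: int) -> int:
    """Drop the two day digits from a YYYYMMDD integer (assumed encoding)."""
    return ts_ymd // 100


# True when every sampled TS_YM equals the prefix of its TS_YMD.
derived_matches = all(year_month(ymd) == ym for ym, ymd in sample_pairs)
print(derived_matches)
```

If the check holds for the full dataset, one of the two columns is redundant for modelling purposes.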
Reproduction
Analysis started | 2022-10-29 06:26:31.517898 |
---|---|
Analysis finished | 2022-10-29 06:26:39.814250 |
Duration | 8.3 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
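A report like this one can probably be regenerated along the following lines. This is a minimal sketch, assuming the data is available as a local CSV file: the function name `build_report`, the file paths and the report title are hypothetical, and the `dataset` keys are an assumption based on the metadata fields shown above.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (paths are placeholders)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(
        df,
        title="Sample data profile",  # hypothetical title
        # Metadata shown in the "Dataset" table; key names are assumed.
        dataset={
            "description": "샘플 데이터",
            "author": "신한카드",
            "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51",
        },
    )
    profile.to_file(out_path)
```

Calling `build_report("sample.csv")` would then write `report.html` next to the script, given a file of that (hypothetical) name.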
가맹점집계구코드(TOT_REG_CD): merchant census district code
Real number (ℝ≥0)
Distinct | 414 |
---|---|
Distinct (%) | 82.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 1.114012363 × 10¹² |
Minimum | 1.10105302 × 10¹² |
---|---|
Maximum | 1.12507202 × 10¹² |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 1.10105302 × 10¹² |
---|---|
5-th percentile | 1.102056922 × 10¹² |
Q1 | 1.10607101 × 10¹² |
median | 1.11406951 × 10¹² |
Q3 | 1.12205403 × 10¹² |
95-th percentile | 1.12405721 × 10¹² |
Maximum | 1.12507202 × 10¹² |
Range | 2.401900002 × 10¹⁰ |
Interquartile range (IQR) | 1.598302 × 10¹⁰ |
Descriptive statistics
Standard deviation | 7935546698 |
---|---|
Coefficient of variation (CV) | 0.007123391948 |
Kurtosis | -1.434694349 |
Mean | 1.114012363 × 10¹² |
Median Absolute Deviation (MAD) | 7987500022 |
Skewness | -0.2440411236 |
Sum | 5.570061815 × 10¹⁴ |
Variance | 6.297290139 × 10¹⁹ |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
1.11905403 × 10¹² | 6 | 1.2% |
1.12306403 × 10¹² | 5 | 1.0% |
1.11406608 × 10¹² | 4 | 0.8% |
1.10205504 × 10¹² | 4 | 0.8% |
1.10307302 × 10¹² | 4 | 0.8% |
1.11406607 × 10¹² | 3 | 0.6% |
1.12306403 × 10¹² | 3 | 0.6% |
1.11307504 × 10¹² | 3 | 0.6% |
1.12205403 × 10¹² | 3 | 0.6% |
1.12306502 × 10¹² | 3 | 0.6% |
Other values (404) | 462 | 92.4% |
Minimum 10 values
Value | Count | Frequency (%) |
1.10105302 × 10¹² | 2 | 0.4% |
1.10105401 × 10¹² | 1 | 0.2% |
1.10105602 × 10¹² | 1 | 0.2% |
1.10106103 × 10¹² | 1 | 0.2% |
1.10106103 × 10¹² | 2 | 0.4% |
1.10106103 × 10¹² | 1 | 0.2% |
1.10106104 × 10¹² | 1 | 0.2% |
1.10106302 × 10¹² | 1 | 0.2% |
1.10106303 × 10¹² | 1 | 0.2% |
1.10107201 × 10¹² | 1 | 0.2% |
Maximum 10 values
Value | Count | Frequency (%) |
1.12507202 × 10¹² | 1 | 0.2% |
1.12507201 × 10¹² | 1 | 0.2% |
1.12506702 × 10¹² | 1 | 0.2% |
1.12505901 × 10¹² | 1 | 0.2% |
1.12505101 × 10¹² | 1 | 0.2% |
1.124082021 × 10¹² | 1 | 0.2% |
1.12408002 × 10¹² | 2 | 0.4% |
1.12407703 × 10¹² | 2 | 0.4% |
1.12407105 × 10¹² | 1 | 0.2% |
1.12407104 × 10¹² | 1 | 0.2% |
외국인관광업종코드(SF_UPJONG_CD): foreign-tourist business category code
Categorical
Distinct | 44 |
---|---|
Distinct (%) | 8.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Top categories (bar chart): SF020816, SF010408, SF010101, SF020713, SF010306, Other values (39)
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Unique
Unique | 7 |
---|---|
Unique (%) | 1.4% |
Sample
1st row | SF010408 |
---|---|
2nd row | SF020713 |
3rd row | SF061535 |
4th row | SF010306 |
5th row | SF010305 |
Common Values
Value | Count | Frequency (%) |
SF020816 | 86 | 17.2% |
SF010408 | 85 | 17.0% |
SF010101 | 62 | 12.4% |
SF020713 | 31 | 6.2% |
SF010306 | 22 | 4.4% |
SF010305 | 21 | 4.2% |
SF010203 | 18 | 3.6% |
SF082148 | 16 | 3.2% |
SF010202 | 14 | 2.8% |
SF082045 | 12 | 2.4% |
Other values (34) | 133 | 26.6% |
Length (histogram)
Words
Value | Count | Frequency (%) |
sf020816 | 86 | 17.2% |
sf010408 | 85 | 17.0% |
sf010101 | 62 | 12.4% |
sf020713 | 31 | 6.2% |
sf010306 | 22 | 4.4% |
sf010305 | 21 | 4.2% |
sf010203 | 18 | 3.6% |
sf082148 | 16 | 3.2% |
sf010202 | 14 | 2.8% |
sf082045 | 12 | 2.4% |
Other values (34) | 133 | 26.6% |
기준년월(TS_YM): reference year-month
Real number (ℝ≥0)
Distinct | 55 |
---|---|
Distinct (%) | 11.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 201878.378 |
Minimum | 201701 |
---|---|
Maximum | 202107 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 201701 |
---|---|
5-th percentile | 201703 |
Q1 | 201802 |
median | 201903 |
Q3 | 202004 |
95-th percentile | 202104 |
Maximum | 202107 |
Range | 406 |
Interquartile range (IQR) | 202 |
Descriptive statistics
Standard deviation | 129.7829972 |
---|---|
Coefficient of variation (CV) | 0.0006428771546 |
Kurtosis | -1.104051813 |
Mean | 201878.378 |
Median Absolute Deviation (MAD) | 101 |
Skewness | 0.1516933422 |
Sum | 100939189 |
Variance | 16843.62637 |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
201903 | 17 | 3.4% |
201710 | 16 | 3.2% |
201705 | 15 | 3.0% |
201908 | 14 | 2.8% |
202007 | 14 | 2.8% |
201808 | 14 | 2.8% |
201912 | 13 | 2.6% |
201701 | 13 | 2.6% |
201809 | 13 | 2.6% |
202001 | 13 | 2.6% |
Other values (45) | 358 | 71.6% |
Minimum 10 values
Value | Count | Frequency (%) |
201701 | 13 | 2.6% |
201702 | 10 | 2.0% |
201703 | 11 | 2.2% |
201704 | 9 | 1.8% |
201705 | 15 | 3.0% |
201706 | 6 | 1.2% |
201707 | 10 | 2.0% |
201708 | 3 | 0.6% |
201709 | 9 | 1.8% |
201710 | 16 | 3.2% |
Maximum 10 values
Value | Count | Frequency (%) |
202107 | 8 | 1.6% |
202106 | 3 | 0.6% |
202105 | 8 | 1.6% |
202104 | 7 | 1.4% |
202103 | 9 | 1.8% |
202102 | 10 | 2.0% |
202101 | 7 | 1.4% |
202012 | 7 | 1.4% |
202011 | 8 | 1.6% |
202010 | 7 | 1.4% |
일별(TS_YMD): date
Real number (ℝ≥0)
Distinct | 429 |
---|---|
Distinct (%) | 85.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20187854.08 |
Minimum | 20170101 |
---|---|
Maximum | 20210727 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 20170101 |
---|---|
5-th percentile | 20170302.95 |
Q1 | 20180213.75 |
median | 20190319.5 |
Q3 | 20200421.25 |
95-th percentile | 20210401.3 |
Maximum | 20210727 |
Range | 40626 |
Interquartile range (IQR) | 20207.5 |
Descriptive statistics
Standard deviation | 12979.11085 |
---|---|
Coefficient of variation (CV) | 0.0006429168151 |
Kurtosis | -1.104151269 |
Mean | 20187854.08 |
Median Absolute Deviation (MAD) | 10105 |
Skewness | 0.1514788756 |
Sum | 1.009392704 × 10¹⁰ |
Variance | 168457318.5 |
Monotonicity | Not monotonic |
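Because 일별(TS_YMD) is profiled as a plain number, statistics such as the range are computed on the YYYYMMDD integers and do not measure elapsed time. The sketch below, which assumes that integer encoding, parses the reported minimum and maximum into calendar dates and compares the integer range with the span in days. The helper `parse_ymd` is illustrative only.

```python
from datetime import date, datetime


def parse_ymd(value: int) -> date:
    """Interpret an integer such as 20170101 as a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first_day = parse_ymd(20170101)  # reported minimum
last_day = parse_ymd(20210727)   # reported maximum

integer_range = 20210727 - 20170101              # what the report calls "Range"
calendar_span_days = (last_day - first_day).days  # actual elapsed days

print(integer_range, calendar_span_days)  # 40626 1668
```

Converting the column to a date type before profiling would make the quantiles and range interpretable as time.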
Common Values
Value | Count | Frequency (%) |
20190620 | 3 | 0.6% |
20190322 | 3 | 0.6% |
20171208 | 3 | 0.6% |
20190325 | 3 | 0.6% |
20171009 | 2 | 0.4% |
20200731 | 2 | 0.4% |
20171007 | 2 | 0.4% |
20180803 | 2 | 0.4% |
20171021 | 2 | 0.4% |
20170919 | 2 | 0.4% |
Other values (419) | 476 | 95.2% |
Minimum 10 values
Value | Count | Frequency (%) |
20170101 | 1 | 0.2% |
20170104 | 2 | 0.4% |
20170105 | 1 | 0.2% |
20170107 | 1 | 0.2% |
20170108 | 1 | 0.2% |
20170117 | 1 | 0.2% |
20170124 | 1 | 0.2% |
20170125 | 2 | 0.4% |
20170127 | 1 | 0.2% |
20170128 | 1 | 0.2% |
Maximum 10 values
Value | Count | Frequency (%) |
20210727 | 1 | 0.2% |
20210720 | 1 | 0.2% |
20210719 | 1 | 0.2% |
20210718 | 1 | 0.2% |
20210714 | 1 | 0.2% |
20210713 | 2 | 0.4% |
20210703 | 1 | 0.2% |
20210616 | 1 | 0.2% |
20210606 | 1 | 0.2% |
20210604 | 1 | 0.2% |
국가별(COUNTRY_ENG_NM): country
Categorical
Distinct | 11 |
---|---|
Distinct (%) | 2.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Top categories (bar chart): 미국, ETC, 일본, 중국, 영국, Other values (6)
Length
Max length | 3 |
---|---|
Median length | 2 |
Mean length | 2.276 |
Min length | 2 |
Unique
Unique | 1 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | ETC |
---|---|
2nd row | 미국 |
3rd row | 미국 |
4th row | 러시아 |
5th row | ETC |
Common Values
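Value | Count | Frequency (%) |
미국 (United States) | 223 | 44.6% |
ETC | 119 | 23.8% |
일본 (Japan) | 52 | 10.4% |
중국 (China) | 33 | 6.6% |
영국 (United Kingdom) | 23 | 4.6% |
홍콩 (Hong Kong) | 15 | 3.0% |
러시아 (Russia) | 14 | 2.8% |
대만 (Taiwan) | 9 | 1.8% |
태국 (Thailand) | 7 | 1.4% |
싱가폴 (Singapore) | 4 | 0.8% |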
Length (histogram)
Words
Value | Count | Frequency (%) |
미국 | 223 | 44.6% |
etc | 119 | 23.8% |
일본 | 52 | 10.4% |
중국 | 33 | 6.6% |
영국 | 23 | 4.6% |
홍콩 | 15 | 3.0% |
러시아 | 14 | 2.8% |
대만 | 9 | 1.8% |
태국 | 7 | 1.4% |
싱가폴 | 4 | 0.8% |
카드이용금액계(AMT_CORR): total card spending amount
Real number (ℝ≥0)
Distinct | 357 |
---|---|
Distinct (%) | 71.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 294458.1392 |
Minimum | 400 |
---|---|
Maximum | 25133533 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 400 |
---|---|
5-th percentile | 2595 |
Q1 | 8500 |
median | 21025 |
Q3 | 63790 |
95-th percentile | 521615.5 |
Maximum | 25133533 |
Range | 25133133 |
Interquartile range (IQR) | 55290 |
Descriptive statistics
Standard deviation | 1667494.508 |
---|---|
Coefficient of variation (CV) | 5.662925511 |
Kurtosis | 122.5312819 |
Mean | 294458.1392 |
Median Absolute Deviation (MAD) | 15800 |
Skewness | 10.11052004 |
Sum | 147229069.6 |
Variance | 2.780537935 × 10¹² |
Monotonicity | Not monotonic |
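Several of the figures reported for 카드이용금액계(AMT_CORR) are simple functions of the others, which gives a quick consistency check on the report. The sketch below recomputes them from the printed values; the variable names are illustrative, and the inputs are the rounded numbers shown above, so results agree only to the displayed precision.

```python
n_observations = 500
minimum, maximum = 400, 25133533
q1, q3 = 8500, 63790
mean, std = 294458.1392, 1667494.508
distinct = 357

value_range = maximum - minimum              # reported Range: 25133133
iqr = q3 - q1                                # reported IQR: 55290
cv = std / mean                              # reported CV: 5.662925511
total = mean * n_observations                # reported Sum: 147229069.6
variance = std ** 2                          # reported Variance: 2.780537935e12
distinct_pct = 100 * distinct / n_observations  # reported Distinct (%): 71.4

print(value_range, iqr, round(cv, 6), round(total, 1), round(distinct_pct, 1))
```

A coefficient of variation above 5 together with a median of 21025 against a mean of about 294458 points to a heavily right-skewed distribution, consistent with the reported skewness.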
Common Values
Value | Count | Frequency (%) |
6000 | 8 | 1.6% |
4500 | 7 | 1.4% |
24000 | 6 | 1.2% |
9000 | 6 | 1.2% |
4000 | 6 | 1.2% |
8500 | 5 | 1.0% |
8000 | 5 | 1.0% |
7000 | 4 | 0.8% |
10000 | 4 | 0.8% |
10400 | 4 | 0.8% |
Other values (347) | 445 | 89.0% |
Minimum 10 values
Value | Count | Frequency (%) |
400 | 1 | 0.2% |
600 | 1 | 0.2% |
900 | 1 | 0.2% |
950 | 2 | 0.4% |
1000 | 2 | 0.4% |
1400 | 1 | 0.2% |
1500 | 2 | 0.4% |
1700 | 1 | 0.2% |
1800 | 2 | 0.4% |
1900 | 1 | 0.2% |
Maximum 10 values
Value | Count | Frequency (%) |
25133533 | 1 | 0.2% |
15766000 | 1 | 0.2% |
11960000 | 1 | 0.2% |
10250000 | 1 | 0.2% |
9213603.09 | 1 | 0.2% |
7933600 | 1 | 0.2% |
5700000 | 1 | 0.2% |
5150000 | 1 | 0.2% |
4950000 | 1 | 0.2% |
4565252.8 | 1 | 0.2% |
카드이용건수(USECT_CORR): number of card transactions
Real number (ℝ≥0)
Distinct | 21 |
---|---|
Distinct (%) | 4.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.54154 |
Minimum | 1 |
---|---|
Maximum | 107 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
median | 1 |
Q3 | 2 |
95-th percentile | 6 |
Maximum | 107 |
Range | 106 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 6.74558404 |
---|---|
Coefficient of variation (CV) | 2.654132549 |
Kurtosis | 145.3715005 |
Mean | 2.54154 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 11.07360889 |
Sum | 1270.77 |
Variance | 45.50290404 |
Monotonicity | Not monotonic |
Common Values
Value | Count | Frequency (%) |
1 | 316 | 63.2% |
2 | 72 | 14.4% |
2.86 | 34 | 6.8% |
3 | 23 | 4.6% |
4 | 16 | 3.2% |
5 | 7 | 1.4% |
6 | 5 | 1.0% |
11 | 4 | 0.8% |
8 | 4 | 0.8% |
5.71 | 3 | 0.6% |
Other values (11) | 16 | 3.2% |
Minimum 10 values
Value | Count | Frequency (%) |
1 | 316 | 63.2% |
2 | 72 | 14.4% |
2.86 | 34 | 6.8% |
3 | 23 | 4.6% |
4 | 16 | 3.2% |
5 | 7 | 1.4% |
5.71 | 3 | 0.6% |
6 | 5 | 1.0% |
7 | 3 | 0.6% |
8 | 4 | 0.8% |
Maximum 10 values
Value | Count | Frequency (%) |
107 | 1 | 0.2% |
72 | 1 | 0.2% |
52 | 1 | 0.2% |
42.84 | 1 | 0.2% |
27 | 1 | 0.2% |
17.14 | 2 | 0.4% |
14.28 | 1 | 0.2% |
14 | 1 | 0.2% |
11 | 4 | 0.8% |
10 | 2 | 0.4% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
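The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` is illustrative, and the inputs are the five 기준년월(TS_YM) and 일별(TS_YMD) values from the first rows of the sample, so the result is not the coefficient computed for the full dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/(n-1) factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


ts_ym = [201712, 201811, 201706, 201912, 202105]
ts_ymd = [20171202, 20181103, 20170607, 20191225, 20210508]

r = pearson_r(ts_ym, ts_ymd)
# Shifting and (positively) rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(ts_ym, [2 * v + 7 for v in ts_ymd])
print(round(r, 6), round(r_rescaled, 6))
```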
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
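As a rough illustration of the rank-based calculation, the sketch below converts each variable to ranks (ties receive the average of their positions, a common convention assumed here) and then applies the Pearson formula to the ranks. All function names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4]
y = [1, 8, 27, 64]  # monotonic but not linear in x
print(spearman_rho(x, y), round(pearson_r(x, y), 4))
```

For this monotonic, nonlinear example ρ reaches 1 while r stays below 1, which is the difference the paragraph above describes.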
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
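The pair-counting definition above corresponds to the variant usually called tau-a, sketched below. Tied pairs are counted as neither concordant nor discordant; library implementations often apply a tie correction (tau-b) instead, so results can differ on data with many ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    return (concordant - discordant) / len(pairs)


# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))  # 0.6667
```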
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available separately.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | 가맹점집계구코드(TOT_REG_CD) | 외국인관광업종코드(SF_UPJONG_CD) | 기준년월(TS_YM) | 일별(TS_YMD) | 국가별(COUNTRY_ENG_NM) | 카드이용금액계(AMT_CORR) | 카드이용건수(USECT_CORR) |
---|---|---|---|---|---|---|---|
0 | 1114069010029 | SF010408 | 201712 | 20171202 | ETC | 19000.0 | 1.0 |
1 | 1105066010704 | SF020713 | 201811 | 20181103 | 미국 | 6000.0 | 1.0 |
2 | 1103066030003 | SF061535 | 201706 | 20170607 | 미국 | 921360.31 | 4.0 |
3 | 1114069010019 | SF010306 | 201912 | 20191225 | 러시아 | 4700.0 | 1.0 |
4 | 1103069010101 | SF010305 | 202105 | 20210508 | ETC | 106900.0 | 2.0 |
5 | 1123052020005 | SF010306 | 201912 | 20191218 | 미국 | 200000.0 | 4.0 |
6 | 1122054030002 | SF010305 | 201707 | 20170716 | 미국 | 59490.0 | 1.0 |
7 | 1105054030011 | SF020713 | 202102 | 20210219 | 중국 | 9213603.09 | 1.0 |
8 | 1102055040001 | SF061536 | 202007 | 20200711 | 일본 | 6000.0 | 2.0 |
9 | 1114071010001 | SF010408 | 201711 | 20171105 | 미국 | 10300.0 | 2.86 |
Last rows
| | 가맹점집계구코드(TOT_REG_CD) | 외국인관광업종코드(SF_UPJONG_CD) | 기준년월(TS_YM) | 일별(TS_YMD) | 국가별(COUNTRY_ENG_NM) | 카드이용금액계(AMT_CORR) | 카드이용건수(USECT_CORR) |
---|---|---|---|---|---|---|---|
490 | 1120055030008 | SF010101 | 201811 | 20181120 | 미국 | 15050.0 | 2.86 |
491 | 1113075030010 | SF010101 | 202107 | 20210713 | 미국 | 65200.0 | 1.0 |
492 | 1123066010003 | SF031020 | 201903 | 20190308 | ETC | 82922.43 | 1.0 |
493 | 1121073020007 | SF020816 | 201811 | 20181124 | 중국 | 73710.0 | 2.86 |
494 | 1106071020007 | SF010306 | 202010 | 20201011 | 미국 | 16500.0 | 1.0 |
495 | 1121058010021 | SF020816 | 202009 | 20200901 | ETC | 4500.0 | 1.0 |
496 | 1123064030006 | SF020713 | 201909 | 20190920 | 일본 | 17200.0 | 2.0 |
497 | 1121061020004 | SF010408 | 201909 | 20190917 | 미국 | 22100.0 | 1.0 |
498 | 1122054030002 | SF010101 | 202011 | 20201117 | 미국 | 6000.0 | 5.0 |
499 | 1123064050003 | SF010408 | 201904 | 20190406 | ETC | 13820.4 | 1.0 |