Overview

Dataset statistics

Number of variables: 7
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 29.9 KiB
Average record size in memory: 61.3 B

Variable types

Numeric: 5
Categorical: 2

Dataset

Description: Sample data
Author: Shinhan Card (신한카드)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51

Alerts

High correlation: Reference year-month (TS_YM) is highly correlated with Date (TS_YMD)
High correlation: Date (TS_YMD) is highly correlated with Reference year-month (TS_YM)
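
This alert is expected from how the two fields are encoded: in the sample rows of this report, TS_YM equals the first six digits of TS_YMD (for example 201712 and 20171202). A minimal sketch, assuming both columns hold plain integers in YYYYMM and YYYYMMDD form; the helper name `year_month_from_date` is hypothetical:

```python
def year_month_from_date(ts_ymd: int) -> int:
    """Drop the two day digits from a YYYYMMDD integer to get YYYYMM."""
    return ts_ymd // 100


# (TS_YM, TS_YMD) pairs copied from the sample rows of this report.
sample_pairs = [
    (201712, 20171202),
    (201811, 20181103),
    (201706, 20170607),
    (202105, 20210508),
]

for ts_ym, ts_ymd in sample_pairs:
    # Every sampled month value is recoverable from the day value alone.
    assert year_month_from_date(ts_ymd) == ts_ym
```

If this relationship holds for all 500 rows, TS_YM carries no information beyond TS_YMD, which is consistent with the two alerts.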

Reproduction

Analysis started: 2022-10-29 06:26:31.517898
Analysis finished: 2022-10-29 06:26:39.814250
Duration: 8.3 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
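
A report like this one can be regenerated roughly as follows. This is a sketch, not the exact script behind this report: the CSV path, the output path, the report title, and the function name `build_report` are assumptions, and the original run may have used different settings (see config.json).

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report with pandas-profiling 3.x."""
    # Imports are local so the function can be defined without the packages installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(
        df,
        title="Sample data",  # assumed title
        # Dataset metadata as shown in the "Dataset" section of this report.
        dataset={
            "description": "샘플 데이터",
            "author": "신한카드",
            "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=51",
        },
    )
    profile.to_file(out_path)
```

Calling `build_report("sample.csv")` (a hypothetical file name) would write `report.html` to the working directory.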

Variables

Merchant census block code (TOT_REG_CD)
Real number (ℝ≥0)

Distinct: 414
Distinct (%): 82.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.114012363 × 10^12
Minimum: 1.10105302 × 10^12
Maximum: 1.12507202 × 10^12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1.10105302 × 10^12
5-th percentile: 1.102056922 × 10^12
Q1: 1.10607101 × 10^12
Median: 1.11406951 × 10^12
Q3: 1.12205403 × 10^12
95-th percentile: 1.12405721 × 10^12
Maximum: 1.12507202 × 10^12
Range: 2.401900002 × 10^10
Interquartile range (IQR): 1.598302 × 10^10

Descriptive statistics

Standard deviation: 7935546698
Coefficient of variation (CV): 0.007123391948
Kurtosis: -1.434694349
Mean: 1.114012363 × 10^12
Median Absolute Deviation (MAD): 7987500022
Skewness: -0.2440411236
Sum: 5.570061815 × 10^14
Variance: 6.297290139 × 10^19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1.11905403 × 10^12 | 6 | 1.2%
1.12306403 × 10^12 | 5 | 1.0%
1.11406608 × 10^12 | 4 | 0.8%
1.10205504 × 10^12 | 4 | 0.8%
1.10307302 × 10^12 | 4 | 0.8%
1.11406607 × 10^12 | 3 | 0.6%
1.12306403 × 10^12 | 3 | 0.6%
1.11307504 × 10^12 | 3 | 0.6%
1.12205403 × 10^12 | 3 | 0.6%
1.12306502 × 10^12 | 3 | 0.6%
Other values (404) | 462 | 92.4%

Minimum 10 values

Value | Count | Frequency (%)
1.10105302 × 10^12 | 2 | 0.4%
1.10105401 × 10^12 | 1 | 0.2%
1.10105602 × 10^12 | 1 | 0.2%
1.10106103 × 10^12 | 1 | 0.2%
1.10106103 × 10^12 | 2 | 0.4%
1.10106103 × 10^12 | 1 | 0.2%
1.10106104 × 10^12 | 1 | 0.2%
1.10106302 × 10^12 | 1 | 0.2%
1.10106303 × 10^12 | 1 | 0.2%
1.10107201 × 10^12 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
1.12507202 × 10^12 | 1 | 0.2%
1.12507201 × 10^12 | 1 | 0.2%
1.12506702 × 10^12 | 1 | 0.2%
1.12505901 × 10^12 | 1 | 0.2%
1.12505101 × 10^12 | 1 | 0.2%
1.124082021 × 10^12 | 1 | 0.2%
1.12408002 × 10^12 | 2 | 0.4%
1.12407703 × 10^12 | 2 | 0.4%
1.12407105 × 10^12 | 1 | 0.2%
1.12407104 × 10^12 | 1 | 0.2%
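
Foreign tourist business category code (SF_UPJONG_CD)
Categorical

Distinct: 44
Distinct (%): 8.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value | Count
SF020816 | 86
SF010408 | 85
SF010101 | 62
SF020713 | 31
SF010306 | 22
Other values (39) | 214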

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Unique

Unique: 7
Unique (%): 1.4%

Sample

1st row: SF010408
2nd row: SF020713
3rd row: SF061535
4th row: SF010306
5th row: SF010305

Common Values

Value | Count | Frequency (%)
SF020816 | 86 | 17.2%
SF010408 | 85 | 17.0%
SF010101 | 62 | 12.4%
SF020713 | 31 | 6.2%
SF010306 | 22 | 4.4%
SF010305 | 21 | 4.2%
SF010203 | 18 | 3.6%
SF082148 | 16 | 3.2%
SF010202 | 14 | 2.8%
SF082045 | 12 | 2.4%
Other values (34) | 133 | 26.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
sf020816 | 86 | 17.2%
sf010408 | 85 | 17.0%
sf010101 | 62 | 12.4%
sf020713 | 31 | 6.2%
sf010306 | 22 | 4.4%
sf010305 | 21 | 4.2%
sf010203 | 18 | 3.6%
sf082148 | 16 | 3.2%
sf010202 | 14 | 2.8%
sf082045 | 12 | 2.4%
Other values (34) | 133 | 26.6%

Reference year-month (TS_YM)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 55
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201878.378
Minimum: 201701
Maximum: 202107
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 201701
5-th percentile: 201703
Q1: 201802
Median: 201903
Q3: 202004
95-th percentile: 202104
Maximum: 202107
Range: 406
Interquartile range (IQR): 202

Descriptive statistics

Standard deviation: 129.7829972
Coefficient of variation (CV): 0.0006428771546
Kurtosis: -1.104051813
Mean: 201878.378
Median Absolute Deviation (MAD): 101
Skewness: 0.1516933422
Sum: 100939189
Variance: 16843.62637
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
201903 | 17 | 3.4%
201710 | 16 | 3.2%
201705 | 15 | 3.0%
201908 | 14 | 2.8%
202007 | 14 | 2.8%
201808 | 14 | 2.8%
201912 | 13 | 2.6%
201701 | 13 | 2.6%
201809 | 13 | 2.6%
202001 | 13 | 2.6%
Other values (45) | 358 | 71.6%

Minimum 10 values

Value | Count | Frequency (%)
201701 | 13 | 2.6%
201702 | 10 | 2.0%
201703 | 11 | 2.2%
201704 | 9 | 1.8%
201705 | 15 | 3.0%
201706 | 6 | 1.2%
201707 | 10 | 2.0%
201708 | 3 | 0.6%
201709 | 9 | 1.8%
201710 | 16 | 3.2%

Maximum 10 values

Value | Count | Frequency (%)
202107 | 8 | 1.6%
202106 | 3 | 0.6%
202105 | 8 | 1.6%
202104 | 7 | 1.4%
202103 | 9 | 1.8%
202102 | 10 | 2.0%
202101 | 7 | 1.4%
202012 | 7 | 1.4%
202011 | 8 | 1.6%
202010 | 7 | 1.4%

Date (TS_YMD)
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 429
Distinct (%): 85.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20187854.08
Minimum: 20170101
Maximum: 20210727
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 20170101
5-th percentile: 20170302.95
Q1: 20180213.75
Median: 20190319.5
Q3: 20200421.25
95-th percentile: 20210401.3
Maximum: 20210727
Range: 40626
Interquartile range (IQR): 20207.5

Descriptive statistics

Standard deviation: 12979.11085
Coefficient of variation (CV): 0.0006429168151
Kurtosis: -1.104151269
Mean: 20187854.08
Median Absolute Deviation (MAD): 10105
Skewness: 0.1514788756
Sum: 1.009392704 × 10^10
Variance: 168457318.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
20190620 | 3 | 0.6%
20190322 | 3 | 0.6%
20171208 | 3 | 0.6%
20190325 | 3 | 0.6%
20171009 | 2 | 0.4%
20200731 | 2 | 0.4%
20171007 | 2 | 0.4%
20180803 | 2 | 0.4%
20171021 | 2 | 0.4%
20170919 | 2 | 0.4%
Other values (419) | 476 | 95.2%

Minimum 10 values

Value | Count | Frequency (%)
20170101 | 1 | 0.2%
20170104 | 2 | 0.4%
20170105 | 1 | 0.2%
20170107 | 1 | 0.2%
20170108 | 1 | 0.2%
20170117 | 1 | 0.2%
20170124 | 1 | 0.2%
20170125 | 2 | 0.4%
20170127 | 1 | 0.2%
20170128 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
20210727 | 1 | 0.2%
20210720 | 1 | 0.2%
20210719 | 1 | 0.2%
20210718 | 1 | 0.2%
20210714 | 1 | 0.2%
20210713 | 2 | 0.4%
20210703 | 1 | 0.2%
20210616 | 1 | 0.2%
20210606 | 1 | 0.2%
20210604 | 1 | 0.2%
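
Country (COUNTRY_ENG_NM)
Categorical

Distinct: 11
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value | Count
미국 (United States) | 223
ETC | 119
일본 (Japan) | 52
중국 (China) | 33
영국 (United Kingdom) | 23
Other values (6) | 50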

Length

Max length: 3
Median length: 2
Mean length: 2.276
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: ETC
2nd row: 미국 (United States)
3rd row: 미국 (United States)
4th row: 러시아 (Russia)
5th row: ETC

Common Values

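Value | Count | Frequency (%)
미국 (United States) | 223 | 44.6%
ETC | 119 | 23.8%
일본 (Japan) | 52 | 10.4%
중국 (China) | 33 | 6.6%
영국 (United Kingdom) | 23 | 4.6%
홍콩 (Hong Kong) | 15 | 3.0%
러시아 (Russia) | 14 | 2.8%
대만 (Taiwan) | 9 | 1.8%
태국 (Thailand) | 7 | 1.4%
싱가폴 (Singapore) | 4 | 0.8%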

Length

Histogram of lengths of the category
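
Value | Count | Frequency (%)
미국 (United States) | 223 | 44.6%
etc | 119 | 23.8%
일본 (Japan) | 52 | 10.4%
중국 (China) | 33 | 6.6%
영국 (United Kingdom) | 23 | 4.6%
홍콩 (Hong Kong) | 15 | 3.0%
러시아 (Russia) | 14 | 2.8%
대만 (Taiwan) | 9 | 1.8%
태국 (Thailand) | 7 | 1.4%
싱가폴 (Singapore) | 4 | 0.8%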

Total card spending amount (AMT_CORR)
Real number (ℝ≥0)

Distinct: 357
Distinct (%): 71.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 294458.1392
Minimum: 400
Maximum: 25133533
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 400
5-th percentile: 2595
Q1: 8500
Median: 21025
Q3: 63790
95-th percentile: 521615.5
Maximum: 25133533
Range: 25133133
Interquartile range (IQR): 55290

Descriptive statistics

Standard deviation: 1667494.508
Coefficient of variation (CV): 5.662925511
Kurtosis: 122.5312819
Mean: 294458.1392
Median Absolute Deviation (MAD): 15800
Skewness: 10.11052004
Sum: 147229069.6
Variance: 2.780537935 × 10^12
Monotonicity: Not monotonic
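
Several of the AMT_CORR figures are derived from the others: the range from the extremes, the IQR from the quartiles, the CV and variance from the mean and standard deviation, and the mean from the sum and the 500 observations. A quick arithmetic check, using only numbers copied from this report (the variable names are illustrative):

```python
import math

# Figures copied from the AMT_CORR statistics in this report.
minimum, maximum = 400, 25133533
q1, q3 = 8500, 63790
mean, std = 294458.1392, 1667494.508
total, n_rows = 147229069.6, 500

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of rows
cv = std / mean                  # standard deviation relative to the mean
variance = std ** 2              # variance is the squared standard deviation

assert value_range == 25133133
assert iqr == 55290
assert math.isclose(cv, 5.662925511, rel_tol=1e-6)
assert math.isclose(variance, 2.780537935e12, rel_tol=1e-6)
assert math.isclose(total / n_rows, mean, rel_tol=1e-9)
```

A CV above 5 together with a median (21025) far below the mean is what the strong positive skewness of this variable would lead one to expect.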
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
6000 | 8 | 1.6%
4500 | 7 | 1.4%
24000 | 6 | 1.2%
9000 | 6 | 1.2%
4000 | 6 | 1.2%
8500 | 5 | 1.0%
8000 | 5 | 1.0%
7000 | 4 | 0.8%
10000 | 4 | 0.8%
10400 | 4 | 0.8%
Other values (347) | 445 | 89.0%

Minimum 10 values

Value | Count | Frequency (%)
400 | 1 | 0.2%
600 | 1 | 0.2%
900 | 1 | 0.2%
950 | 2 | 0.4%
1000 | 2 | 0.4%
1400 | 1 | 0.2%
1500 | 2 | 0.4%
1700 | 1 | 0.2%
1800 | 2 | 0.4%
1900 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
25133533 | 1 | 0.2%
15766000 | 1 | 0.2%
11960000 | 1 | 0.2%
10250000 | 1 | 0.2%
9213603.09 | 1 | 0.2%
7933600 | 1 | 0.2%
5700000 | 1 | 0.2%
5150000 | 1 | 0.2%
4950000 | 1 | 0.2%
4565252.8 | 1 | 0.2%

Number of card transactions (USECT_CORR)
Real number (ℝ≥0)

Distinct: 21
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.54154
Minimum: 1
Maximum: 107
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 6
Maximum: 107
Range: 106
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 6.74558404
Coefficient of variation (CV): 2.654132549
Kurtosis: 145.3715005
Mean: 2.54154
Median Absolute Deviation (MAD): 0
Skewness: 11.07360889
Sum: 1270.77
Variance: 45.50290404
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)

Common values

Value | Count | Frequency (%)
1 | 316 | 63.2%
2 | 72 | 14.4%
2.86 | 34 | 6.8%
3 | 23 | 4.6%
4 | 16 | 3.2%
5 | 7 | 1.4%
6 | 5 | 1.0%
11 | 4 | 0.8%
8 | 4 | 0.8%
5.71 | 3 | 0.6%
Other values (11) | 16 | 3.2%

Minimum 10 values

Value | Count | Frequency (%)
1 | 316 | 63.2%
2 | 72 | 14.4%
2.86 | 34 | 6.8%
3 | 23 | 4.6%
4 | 16 | 3.2%
5 | 7 | 1.4%
5.71 | 3 | 0.6%
6 | 5 | 1.0%
7 | 3 | 0.6%
8 | 4 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
107 | 1 | 0.2%
72 | 1 | 0.2%
52 | 1 | 0.2%
42.84 | 1 | 0.2%
27 | 1 | 0.2%
17.14 | 2 | 0.4%
14.28 | 1 | 0.2%
14 | 1 | 0.2%
11 | 4 | 0.8%
10 | 2 | 0.4%

Interactions

[Figures: 25 pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
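
The calculation can be sketched in plain Python as follows. This is an illustrative implementation of the formula, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


xs = [1.0, 2.0, 3.0, 4.0]
# Shifting and rescaling ys leaves r at (approximately) +1.
r_up = pearson_r(xs, [10.0 + 3.0 * x for x in xs])
# A perfectly decreasing linear relationship gives (approximately) -1.
r_down = pearson_r(xs, [-x for x in xs])
print(r_up, r_down)
```

The same 1/n factor appears in the covariance and in both standard deviations, so it cancels; using n - 1 throughout would give the same r.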

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
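
A minimal sketch of that calculation, assuming tied values receive the average of the ranks they span (a common convention); the function names `average_ranks` and `spearman_rho` are illustrative, not part of any library:

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4, 5]
# A cubic relationship is nonlinear but monotonic, so rho is (approximately) 1.
print(spearman_rho(xs, [x ** 3 for x in xs]))
```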

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
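
The pair-counting definition can be sketched as below. This is the tie-free form (often called τ-a); statistical libraries commonly apply a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau` and the sample data are illustrative.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Of the 6 pairs, 5 are concordant and 1 is discordant: (5 - 1) / 6.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```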

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
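
A minimal sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013), assuming a contingency table of observed counts with no empty row or column. The function name `cramers_v_corrected` is illustrative, and this is not necessarily line-for-line what pandas-profiling computes.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias and shrink the table dimensions accordingly.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[25, 25], [25, 25]]))  # independent table
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated table
```

In this report the measure applies to the two categorical variables, SF_UPJONG_CD and COUNTRY_ENG_NM.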

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

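(index) | Merchant census block code (TOT_REG_CD) | Foreign tourist business category code (SF_UPJONG_CD) | Reference year-month (TS_YM) | Date (TS_YMD) | Country (COUNTRY_ENG_NM) | Total card spending amount (AMT_CORR) | Number of card transactions (USECT_CORR)
0 | 1114069010029 | SF010408 | 201712 | 20171202 | ETC | 19000.0 | 1.0
1 | 1105066010704 | SF020713 | 201811 | 20181103 | 미국 (United States) | 6000.0 | 1.0
2 | 1103066030003 | SF061535 | 201706 | 20170607 | 미국 (United States) | 921360.314.0
3 | 1114069010019 | SF010306 | 201912 | 20191225 | 러시아 (Russia) | 4700.0 | 1.0
4 | 1103069010101 | SF010305 | 202105 | 20210508 | ETC | 106900.0 | 2.0
5 | 1123052020005 | SF010306 | 201912 | 20191218 | 미국 (United States) | 200000.0 | 4.0
6 | 1122054030002 | SF010305 | 201707 | 20170716 | 미국 (United States) | 59490.0 | 1.0
7 | 1105054030011 | SF020713 | 202102 | 20210219 | 중국 (China) | 9213603.09 | 1.0
8 | 1102055040001 | SF061536 | 202007 | 20200711 | 일본 (Japan) | 6000.0 | 2.0
9 | 1114071010001 | SF010408 | 201711 | 20171105 | 미국 (United States) | 10300.0 | 2.86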

Last rows

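(index) | Merchant census block code (TOT_REG_CD) | Foreign tourist business category code (SF_UPJONG_CD) | Reference year-month (TS_YM) | Date (TS_YMD) | Country (COUNTRY_ENG_NM) | Total card spending amount (AMT_CORR) | Number of card transactions (USECT_CORR)
490 | 1120055030008 | SF010101 | 201811 | 20181120 | 미국 (United States) | 15050.0 | 2.86
491 | 1113075030010 | SF010101 | 202107 | 20210713 | 미국 (United States) | 65200.0 | 1.0
492 | 1123066010003 | SF031020 | 201903 | 20190308 | ETC | 82922.43 | 1.0
493 | 1121073020007 | SF020816 | 201811 | 20181124 | 중국 (China) | 73710.0 | 2.86
494 | 1106071020007 | SF010306 | 202010 | 20201011 | 미국 (United States) | 16500.0 | 1.0
495 | 1121058010021 | SF020816 | 202009 | 20200901 | ETC | 4500.0 | 1.0
496 | 1123064030006 | SF020713 | 201909 | 20190920 | 일본 (Japan) | 17200.0 | 2.0
497 | 1121061020004 | SF010408 | 201909 | 20190917 | 미국 (United States) | 22100.0 | 1.0
498 | 1122054030002 | SF010101 | 202011 | 20201117 | 미국 (United States) | 6000.0 | 5.0
499 | 1123064050003 | SF010408 | 201904 | 20190406 | ETC | 13820.4 | 1.0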