Overview

Dataset statistics

Number of variables: 7
Number of observations: 500
Missing cells: 58
Missing cells (%): 1.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 29.4 KiB
Average record size in memory: 60.3 B
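
The missing-cell percentage follows from the other figures in this block. As a quick arithmetic check (a sketch that only reuses the numbers reported above; the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 7
n_observations = 500
missing_cells = 58

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are empty, as a percentage.
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```

Rounded to one decimal place, this agrees with the reported 1.7%.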

Variable types

Categorical: 2
Text: 1
Numeric: 4

Dataset

Description: Sample data
Author: Shinhan Card (신한카드)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=318

Alerts

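가맹점주소광역시도(SIDO) (merchant address: province or metropolitan city) is highly imbalanced (60.1%). [Imbalance]
가맹점주소시군구(SGG) (merchant address: city, county, or district) has 58 (11.6%) missing values. [Missing]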
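
The report does not say how the 60.1% imbalance figure is computed. One definition that is consistent with it is one minus the normalized Shannon entropy of the class shares, and the sketch below assumes that definition. The ten largest counts come from the SIDO common-values table; the report only gives a total of 13 for the remaining six values, so the split used for them here is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention assumed here: a single class is treated as balanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts of the ten most common SIDO values, as listed in the report.
known_counts = [362, 72, 10, 9, 7, 6, 6, 5, 5, 5]

# HYPOTHETICAL split of the remaining 13 rows over the other 6 values;
# the report only states the total, not the individual counts.
hypothetical_tail = [3, 3, 2, 2, 2, 1]

score = imbalance_score(known_counts + hypothetical_tail)
print(f"imbalance score: {score:.3f}")
```

A score of 0 means all classes are equally frequent; a score near 1 means one class dominates. With the assumed tail the result lands close to the reported 60.1%.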

Reproduction

Analysis started: 2023-12-10 14:58:52.954675
Analysis finished: 2023-12-10 14:58:57.193635
Duration: 4.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

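가맹점주소광역시도(SIDO) (merchant address: province or metropolitan city)
Categorical

Alert: IMBALANCE

Distinct: 16
Distinct (%): 3.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Top values (count):
서울 (Seoul): 362
경기 (Gyeonggi): 72
강원 (Gangwon): 10
인천 (Incheon): 9
부산 (Busan): 7
Other values (11): 40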

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 서울 (Seoul)
2nd row: 서울 (Seoul)
3rd row: 서울 (Seoul)
4th row: 서울 (Seoul)
5th row: 서울 (Seoul)

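Common Values

Value | Count | Frequency (%)
서울 (Seoul) | 362 | 72.4%
경기 (Gyeonggi) | 72 | 14.4%
강원 (Gangwon) | 10 | 2.0%
인천 (Incheon) | 9 | 1.8%
부산 (Busan) | 7 | 1.4%
경북 (Gyeongbuk) | 6 | 1.2%
대구 (Daegu) | 6 | 1.2%
제주 (Jeju) | 5 | 1.0%
경남 (Gyeongnam) | 5 | 1.0%
충남 (Chungnam) | 5 | 1.0%
Other values (6) | 13 | 2.6%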

Length

[Figure: Histogram of lengths of the category]

가맹점주소시군구(SGG) (merchant address: city, county, or district)
Text

Alert: MISSING

Distinct: 53
Distinct (%): 12.0%
Missing: 58
Missing (%): 11.6%
Memory size: 4.0 KiB

Length

Max length: 4
Median length: 3
Mean length: 3.0113122
Min length: 2

Characters and Unicode

Total characters: 1331
Distinct characters: 58
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 1.8%
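
The percentages reported for this variable appear to use two different denominators: the missing share is relative to all 500 rows, while the distinct and unique shares (and the mean length) line up with the 442 non-missing rows. This is inferred from the figures, not stated in the report. A small check using only the reported numbers:

```python
# Figures reported for the SGG variable.
n_observations = 500
n_missing = 58
n_distinct = 53
n_unique = 8            # values that occur exactly once
total_characters = 1331

# Rows that actually hold a value.
n_present = n_observations - n_missing

missing_pct = 100 * n_missing / n_observations   # relative to all rows
distinct_pct = 100 * n_distinct / n_present      # relative to non-missing rows
unique_pct = 100 * n_unique / n_present          # relative to non-missing rows
mean_length = total_characters / n_present       # characters per non-missing value

print(f"missing {missing_pct:.1f}%, distinct {distinct_pct:.1f}%, "
      f"unique {unique_pct:.1f}%, mean length {mean_length:.7f}")
```

Each value matches the report after rounding, which supports reading Distinct (%) and Unique (%) as shares of the non-missing rows.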

Sample

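1st row: 강남구 (Gangnam-gu)
2nd row: 관악구 (Gwanak-gu)
3rd row: 마포구 (Mapo-gu)
4th row: 동작구 (Dongjak-gu)
5th row: 마포구 (Mapo-gu)

Value | Count | Frequency (%)
중구 (Jung-gu) | 39 | 8.8%
마포구 (Mapo-gu) | 38 | 8.6%
강남구 (Gangnam-gu) | 29 | 6.6%
송파구 (Songpa-gu) | 26 | 5.9%
용산구 (Yongsan-gu) | 23 | 5.2%
서초구 (Seocho-gu) | 18 | 4.1%
성남시 (Seongnam-si) | 18 | 4.1%
영등포구 (Yeongdeungpo-gu) | 17 | 3.8%
서대문구 (Seodaemun-gu) | 16 | 3.6%
동작구 (Dongjak-gu) | 14 | 3.2%
Other values (43) | 204 | 46.2%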

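Most occurring characters

The character glyphs of the Value column are absent from this text; counts and frequencies are listed in rank order.

Rank | Count | Frequency (%)
1 | 361 | 27.1%
2 | 96 | 7.2%
3 | 57 | 4.3%
4 | 54 | 4.1%
5 | 49 | 3.7%
6 | 49 | 3.7%
7 | 47 | 3.5%
8 | 45 | 3.4%
9 | 38 | 2.9%
10 | 35 | 2.6%
Other values (48) | 500 | 37.6%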

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 1331 | 100.0%

Most frequent character per category

Other Letter: same counts as under "Most occurring characters", since all 1331 characters belong to this category.

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 1331 | 100.0%

Most frequent character per script

Hangul: same counts as under "Most occurring characters", since all 1331 characters belong to this script.

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 1331 | 100.0%

Most frequent character per block

Hangul: same counts as under "Most occurring characters", since all 1331 characters belong to this block.
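
업종대분류(UPJONG_CLASS1) (business category, top level)
Categorical

Distinct: 14
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Top values (count):
요식/유흥 (Dining/Entertainment): 127
유통 (Retail/Distribution): 116
전자상거래 (E-commerce): 71
음/식료품 (Food and beverages): 32
의료 (Medical): 31
Other values (9): 123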

Length

Max length: 9
Median length: 5
Mean length: 4.208
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 전자상거래 (E-commerce)
2nd row: 요식/유흥 (Dining/Entertainment)
3rd row: 주유 (Fuel)
4th row: 의료 (Medical)
5th row: 가정생활/서비스 (Household/Services)

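Common Values

Value | Count | Frequency (%)
요식/유흥 (Dining/Entertainment) | 127 | 25.4%
유통 (Retail/Distribution) | 116 | 23.2%
전자상거래 (E-commerce) | 71 | 14.2%
음/식료품 (Food and beverages) | 32 | 6.4%
의료 (Medical) | 31 | 6.2%
주유 (Fuel) | 26 | 5.2%
가정생활/서비스 (Household/Services) | 25 | 5.0%
스포츠/문화/레저 (Sports/Culture/Leisure) | 23 | 4.6%
여행/교통 (Travel/Transport) | 12 | 2.4%
미용 (Beauty) | 10 | 2.0%
Other values (4) | 27 | 5.4%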

Length

[Figure: Histogram of lengths of the category]

기준일자(YMD) (reference date, YYYYMMDD)
Real number (ℝ)

Distinct: 440
Distinct (%): 88.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20183077
Minimum: 20160101
Maximum: 20210729
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 20160101
5-th percentile: 20160321
Q1: 20170324
median: 20180762
Q3: 20200310
95-th percentile: 20210504
Maximum: 20210729
Range: 50628
Interquartile range (IQR): 29986

Descriptive statistics

Standard deviation: 16729.635
Coefficient of variation (CV): 0.00082889419
Kurtosis: -1.2265561
Mean: 20183077
Median Absolute Deviation (MAD): 10596
Skewness: 0.14832841
Sum: 1.0091538 × 10^10
Variance: 2.7988069 × 10^8
Monotonicity: Not monotonic
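
These statistics treat YMD as an ordinary number, which is why the reported median, 20180762, is not a calendar date (there is no day 62). If the column is meant to be read as dates, one option is to parse the YYYYMMDD integers before summarizing them. The sketch below assumes that encoding; `parse_ymd` and `is_calendar_date` are illustrative helper names, not part of the report or of any profiling library.

```python
from datetime import date, datetime

def parse_ymd(value: int) -> date:
    """Convert an integer such as 20160101 to a date.

    Raises ValueError when the digits do not form a real calendar date.
    """
    return datetime.strptime(str(value), "%Y%m%d").date()

def is_calendar_date(value: int) -> bool:
    """Return True if the integer can be parsed as a YYYYMMDD date."""
    try:
        parse_ymd(value)
        return True
    except ValueError:
        return False

first = parse_ymd(20160101)   # reported minimum
last = parse_ymd(20210729)    # reported maximum
span_days = (last - first).days

print(first, last, span_days)
print(is_calendar_date(20180762))  # the reported numeric median
```

Working with parsed dates gives a span in days rather than the numeric Range of 50628, which has no direct calendar meaning.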

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
20160715 | 3 | 0.6%
20171027 | 3 | 0.6%
20180901 | 3 | 0.6%
20170116 | 3 | 0.6%
20200727 | 3 | 0.6%
20201021 | 2 | 0.4%
20190327 | 2 | 0.4%
20180428 | 2 | 0.4%
20160705 | 2 | 0.4%
20180707 | 2 | 0.4%
Other values (430) | 475 | 95.0%

Minimum 10 values

Value | Count | Frequency (%)
20160101 | 1 | 0.2%
20160104 | 1 | 0.2%
20160109 | 2 | 0.4%
20160112 | 1 | 0.2%
20160113 | 1 | 0.2%
20160131 | 2 | 0.4%
20160204 | 2 | 0.4%
20160206 | 1 | 0.2%
20160208 | 1 | 0.2%
20160217 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
20210729 | 2 | 0.4%
20210715 | 1 | 0.2%
20210714 | 2 | 0.4%
20210710 | 1 | 0.2%
20210709 | 1 | 0.2%
20210706 | 1 | 0.2%
20210705 | 1 | 0.2%
20210630 | 1 | 0.2%
20210628 | 1 | 0.2%
20210621 | 2 | 0.4%

고객주소집계구별(TOT_REG_CD) (customer address: census output area code)
Real number (ℝ)

Distinct: 492
Distinct (%): 98.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.1148495 × 10^12
Minimum: 1.101055 × 10^12
Maximum: 1.125074 × 10^12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1.101055 × 10^12
5-th percentile: 1.104055 × 10^12
Q1: 1.1100538 × 10^12
median: 1.115059 × 10^12
Q3: 1.1210645 × 10^12
95-th percentile: 1.12408 × 10^12
Maximum: 1.125074 × 10^12
Range: 2.4019 × 10^10
Interquartile range (IQR): 1.1010763 × 10^10

Descriptive statistics

Standard deviation: 6.6449191 × 10^9
Coefficient of variation (CV): 0.0059603734
Kurtosis: -1.0952925
Mean: 1.1148495 × 10^12
Median Absolute Deviation (MAD): 5.9950152 × 10^9
Skewness: -0.14973229
Sum: 5.5742473 × 10^14
Variance: 4.415495 × 10^19
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
1114075010206 | 2 | 0.4%
1117061010005 | 2 | 0.4%
1121062020001 | 2 | 0.4%
1105059010004 | 2 | 0.4%
1107068020011 | 2 | 0.4%
1124081022104 | 2 | 0.4%
1121054010205 | 2 | 0.4%
1123058010501 | 2 | 0.4%
1103073030002 | 1 | 0.2%
1107062080101 | 1 | 0.2%
Other values (482) | 482 | 96.4%

Minimum 10 values

Value | Count | Frequency (%)
1101055020004 | 1 | 0.2%
1101056010003 | 1 | 0.2%
1101069010001 | 1 | 0.2%
1101072010002 | 1 | 0.2%
1102060070001 | 1 | 0.2%
1102068020005 | 1 | 0.2%
1102069030008 | 1 | 0.2%
1102070010006 | 1 | 0.2%
1102070020003 | 1 | 0.2%
1102071020002 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
1125074020031 | 1 | 0.2%
1125073020006 | 1 | 0.2%
1125073010801 | 1 | 0.2%
1125073010003 | 1 | 0.2%
1125067020020 | 1 | 0.2%
1125067020001 | 1 | 0.2%
1125066020801 | 1 | 0.2%
1125065020012 | 1 | 0.2%
1125063010004 | 1 | 0.2%
1125059010220 | 1 | 0.2%

카드이용금액계(AMT_CORR) (total card spending amount)
Real number (ℝ)

Distinct: 349
Distinct (%): 69.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 321541.93
Minimum: 2515
Maximum: 6850860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 2515
5-th percentile: 14561.85
Q1: 37712.5
median: 112295
Q3: 274147.5
95-th percentile: 1200892.4
Maximum: 6850860
Range: 6848345
Interquartile range (IQR): 236435

Descriptive statistics

Standard deviation: 707236.83
Coefficient of variation (CV): 2.1995167
Kurtosis: 38.225801
Mean: 321541.93
Median Absolute Deviation (MAD): 88905
Skewness: 5.528861
Sum: 1.6077096 × 10^8
Variance: 5.0018393 × 10^11
Monotonicity: Not monotonic
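
Several of these figures are derived from others in the same block. The sketch below recomputes the range, interquartile range, coefficient of variation, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation, using the standard textbook definitions (the report itself does not spell them out). Because the inputs are the rounded values printed in the report, the results agree only up to rounding.

```python
# Figures reported for AMT_CORR.
minimum, maximum = 2515, 6850860
q1, q3 = 37712.5, 274147.5
mean, std = 321541.93, 707236.83

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values
cv = std / mean                   # standard deviation relative to the mean
variance = std ** 2               # square of the standard deviation

print(f"range={value_range}, IQR={iqr}, CV={cv:.4f}, variance={variance:.4e}")
```

A coefficient of variation above 2, together with the reported skewness of about 5.5, points to a strongly right-skewed distribution in which a few large amounts dominate the mean.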

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
22635 | 17 | 3.4%
45270 | 10 | 2.0%
251500 | 8 | 1.6%
40240 | 8 | 1.6%
50300 | 7 | 1.4%
15090 | 7 | 1.4%
30180 | 6 | 1.2%
75450 | 6 | 1.2%
25150 | 6 | 1.2%
12575 | 5 | 1.0%
Other values (339) | 420 | 84.0%

Minimum 10 values

Value | Count | Frequency (%)
2515 | 1 | 0.2%
5030 | 1 | 0.2%
5533 | 1 | 0.2%
5785 | 1 | 0.2%
6036 | 1 | 0.2%
6539 | 1 | 0.2%
7042 | 1 | 0.2%
8048 | 2 | 0.4%
9557 | 1 | 0.2%
10060 | 4 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
6850860 | 1 | 0.2%
6614450 | 1 | 0.2%
5149865 | 1 | 0.2%
5099414 | 1 | 0.2%
4200503 | 1 | 0.2%
3311299 | 1 | 0.2%
3281673 | 1 | 0.2%
3219200 | 1 | 0.2%
2863780 | 1 | 0.2%
2615600 | 1 | 0.2%

카드이용건수계(USECT_CORR) (total card transaction count)
Real number (ℝ)

Distinct: 24
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.648
Minimum: 5
Maximum: 186
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 5
5-th percentile: 5
Q1: 5
median: 5
Q3: 10
95-th percentile: 50
Maximum: 186
Range: 181
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 21.971016
Coefficient of variation (CV): 1.6098341
Kurtosis: 23.863151
Mean: 13.648
Median Absolute Deviation (MAD): 0
Skewness: 4.5162268
Sum: 6824
Variance: 482.72555
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=24)]

Common Values

Value | Count | Frequency (%)
5 | 297 | 59.4%
10 | 84 | 16.8%
15 | 37 | 7.4%
20 | 22 | 4.4%
30 | 11 | 2.2%
25 | 10 | 2.0%
35 | 7 | 1.4%
45 | 4 | 0.8%
50 | 4 | 0.8%
121 | 3 | 0.6%
Other values (14) | 21 | 4.2%

Minimum 10 values

Value | Count | Frequency (%)
5 | 297 | 59.4%
10 | 84 | 16.8%
15 | 37 | 7.4%
20 | 22 | 4.4%
25 | 10 | 2.0%
30 | 11 | 2.2%
35 | 7 | 1.4%
40 | 2 | 0.4%
45 | 4 | 0.8%
50 | 4 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
186 | 1 | 0.2%
171 | 1 | 0.2%
161 | 1 | 0.2%
136 | 1 | 0.2%
121 | 3 | 0.6%
116 | 2 | 0.4%
101 | 1 | 0.2%
96 | 2 | 0.4%
86 | 2 | 0.4%
80 | 1 | 0.2%

Interactions

[Figures: 16 interaction plots]

Correlations

[Figure: correlation plot]

Variable | 가맹점주소광역시도(SIDO) | 가맹점주소시군구(SGG) | 업종대분류(UPJONG_CLASS1) | 기준일자(YMD) | 고객주소집계구별(TOT_REG_CD) | 카드이용금액계(AMT_CORR) | 카드이용건수계(USECT_CORR)
가맹점주소광역시도(SIDO) | 1.000 | 0.000 | 0.000 | 0.130 | 0.189 | 0.000 | 0.000
가맹점주소시군구(SGG) | 0.000 | 1.000 | 0.000 | 0.000 | 0.194 | 0.000 | 0.000
업종대분류(UPJONG_CLASS1) | 0.000 | 0.000 | 1.000 | 0.000 | 0.048 | 0.000 | 0.147
기준일자(YMD) | 0.130 | 0.000 | 0.000 | 1.000 | 0.115 | 0.115 | 0.000
고객주소집계구별(TOT_REG_CD) | 0.189 | 0.194 | 0.048 | 0.115 | 1.000 | 0.120 | 0.162
카드이용금액계(AMT_CORR) | 0.000 | 0.000 | 0.000 | 0.115 | 0.120 | 1.000 | 0.179
카드이용건수계(USECT_CORR) | 0.000 | 0.000 | 0.147 | 0.000 | 0.162 | 0.179 | 1.000

[Figure: correlation plot]

Variable | 업종대분류(UPJONG_CLASS1) | 가맹점주소광역시도(SIDO)
업종대분류(UPJONG_CLASS1) | 1.000 | 0.000
가맹점주소광역시도(SIDO) | 0.000 | 1.000

[Figure: correlation plot]

Variable | 기준일자(YMD) | 고객주소집계구별(TOT_REG_CD) | 카드이용금액계(AMT_CORR) | 카드이용건수계(USECT_CORR) | 가맹점주소광역시도(SIDO) | 업종대분류(UPJONG_CLASS1)
기준일자(YMD) | 1.000 | -0.025 | -0.020 | -0.073 | 0.051 | 0.000
고객주소집계구별(TOT_REG_CD) | -0.025 | 1.000 | 0.070 | 0.077 | 0.072 | 0.021
카드이용금액계(AMT_CORR) | -0.020 | 0.070 | 1.000 | -0.017 | 0.000 | 0.000
카드이용건수계(USECT_CORR) | -0.073 | 0.077 | -0.017 | 1.000 | 0.000 | 0.059
가맹점주소광역시도(SIDO) | 0.051 | 0.072 | 0.000 | 0.000 | 1.000 | 0.000
업종대분류(UPJONG_CLASS1) | 0.000 | 0.021 | 0.000 | 0.059 | 0.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

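In both tables, the last column holds the digits of 카드이용금액계(AMT_CORR) and 카드이용건수계(USECT_CORR) exactly as they appear in the source, where the two values run together without a separator.

First rows

Index | 가맹점주소광역시도(SIDO) | 가맹점주소시군구(SGG) | 업종대분류(UPJONG_CLASS1) | 기준일자(YMD) | 고객주소집계구별(TOT_REG_CD) | AMT_CORR and USECT_CORR (run together)
0 | 서울 (Seoul) | 강남구 (Gangnam-gu) | 전자상거래 (E-commerce) | 20201021 | 1123077020119 | 25150025
1 | 서울 (Seoul) | 관악구 (Gwanak-gu) | 요식/유흥 (Dining/Entertainment) | 20170312 | 1112052010107 | 15844520
2 | 서울 (Seoul) | 마포구 (Mapo-gu) | 주유 (Fuel) | 20161205 | 1113068050004 | 804815
3 | 서울 (Seoul) | 동작구 (Dongjak-gu) | 의료 (Medical) | 20201210 | 1105066020601 | 226355
4 | 서울 (Seoul) | <NA> | 가정생활/서비스 (Household/Services) | 20210216 | 1117052020001 | 6539010
5 | 경기 (Gyeonggi) | 마포구 (Mapo-gu) | 유통 (Retail/Distribution) | 20200319 | 1108060020010 | 2012025
6 | 인천 (Incheon) | 종로구 (Jongno-gu) | 요식/유흥 (Dining/Entertainment) | 20181123 | 1108058020503 | 1104095
7 | 경기 (Gyeonggi) | 송파구 (Songpa-gu) | 요식/유흥 (Dining/Entertainment) | 20160418 | 1112058010033 | 16045750
8 | 경기 (Gyeonggi) | 용인시 (Yongin-si) | 미용 (Beauty) | 20201002 | 1113074020002 | 8048005
9 | 서울 (Seoul) | 중구 (Jung-gu) | 유통 (Retail/Distribution) | 20170405 | 1119076020602 | 18271015

Last rows

Index | 가맹점주소광역시도(SIDO) | 가맹점주소시군구(SGG) | 업종대분류(UPJONG_CLASS1) | 기준일자(YMD) | 고객주소집계구별(TOT_REG_CD) | AMT_CORR and USECT_CORR (run together)
490 | 서울 (Seoul) | 영등포구 (Yeongdeungpo-gu) | 자동차 (Automotive) | 20190729 | 1121068020022 | 10603245
491 | 경기 (Gyeonggi) | 구리시 (Guri-si) | 의료 (Medical) | 20161123 | 1121083010007 | 8143575
492 | 서울 (Seoul) | 중구 (Jung-gu) | 의료 (Medical) | 20160223 | 1111073040109 | 2012005
493 | 서울 (Seoul) | 구로구 (Guro-gu) | 미용 (Beauty) | 20210428 | 1124059020102 | 165995
494 | 서울 (Seoul) | 중구 (Jung-gu) | 주유 (Fuel) | 20210309 | 1110054010002 | 794745
495 | 서울 (Seoul) | 종로구 (Jongno-gu) | 가전/가구 (Appliances/Furniture) | 20201108 | 1120055030005 | 5533010
496 | 부산 (Busan) | 부평구 (Bupyeong-gu) | 의료 (Medical) | 20180901 | 1122058020102 | 8535915
497 | 서울 (Seoul) | 강남구 (Gangnam-gu) | 유통 (Retail/Distribution) | 20160326 | 1123064020008 | 20522405
498 | 서울 (Seoul) | 서초구 (Seocho-gu) | 스포츠/문화/레저 (Sports/Culture/Leisure) | 20160804 | 1124061010014 | 226355
499 | 인천 (Incheon) | 마포구 (Mapo-gu) | 음/식료품 (Food and beverages) | 20210628 | 1122052030002 | 667481116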