Overview

Dataset statistics

Number of variables: 7
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 29.4 KiB
Average record size in memory: 60.3 B

Variable types

Text: 1
Numeric: 4
Categorical: 2

Dataset

Description: Sample data
Author: Shinhan Card
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=318

Reproduction

Analysis started: 2023-12-10 14:58:21.092946
Analysis finished: 2023-12-10 14:58:28.174410
Duration: 7.08 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
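
A report of this kind can be regenerated with ydata-profiling. The following is a minimal sketch, assuming the sample data has been downloaded as a CSV file. The helper names `dataset_metadata` and `build_report`, the report title and the default output path are hypothetical; the metadata values are those of the Dataset section, given in English.

```python
from typing import Dict


def dataset_metadata() -> Dict[str, str]:
    """Metadata for the report's Dataset section (values taken from this report)."""
    return {
        "description": "Sample data",
        "author": "Shinhan Card",
        "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=318",
    }


def build_report(csv_path: str, out_path: str = "report.html"):
    """Profile the CSV at csv_path and write an HTML report to out_path."""
    # Imported lazily so this sketch can be loaded without the libraries installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the file is UTF-8; pass encoding=... to read_csv otherwise.
    df = pd.read_csv(csv_path)
    profile = ProfileReport(
        df,
        title="Card usage sample data",  # hypothetical title
        dataset=dataset_metadata(),
    )
    profile.to_file(out_path)
    return profile
```

Calling `build_report("sample_data.csv")` (a hypothetical file name) would write the HTML report next to the script. Timings and memory figures will differ from run to run.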

Variables

Seoul citizen industry code (UPJONG_CD)
Text

Distinct: 65
Distinct (%): 13.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 2500
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
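
As a small illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample rows listed for this variable. The function name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its tables.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for this variable.
sample_rows = ["SS013", "SS048", "SS016", "SS001", "SS044"]


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Nd') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


counts = category_counts(sample_rows)
total = sum(counts.values())
for category, count in counts.most_common():
    print(f"{category}: {count} ({count / total:.1%})")
# Nd: 15 (60.0%)
# Lu: 10 (40.0%)
```

`Nd` is Decimal Number and `Lu` is Uppercase Letter, so the 60/40 split in these five rows mirrors the split reported for the full column.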

Unique

Unique: 10
Unique (%): 2.0%

Sample

1st row: SS013
2nd row: SS048
3rd row: SS016
4th row: SS001
5th row: SS044

Value              Count  Frequency (%)
ss016              31     6.2%
ss069              25     5.0%
ss001              25     5.0%
ss058              22     4.4%
ss008              21     4.2%
ss013              20     4.0%
ss048              18     3.6%
ss055              17     3.4%
ss004              16     3.2%
ss068              15     3.0%
Other values (55)  290    58.0%

Most occurring characters

Value  Count  Frequency (%)
S      1000   40.0%
0      650    26.0%
1      152    6.1%
4      130    5.2%
6      126    5.0%
5      114    4.6%
8      92     3.7%
3      75     3.0%
2      73     2.9%
9      52     2.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    1500   60.0%
Uppercase Letter  1000   40.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      650    43.3%
1      152    10.1%
4      130    8.7%
6      126    8.4%
5      114    7.6%
8      92     6.1%
3      75     5.0%
2      73     4.9%
9      52     3.5%
7      36     2.4%

Uppercase Letter
Value  Count  Frequency (%)
S      1000   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1500   60.0%
Latin   1000   40.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      650    43.3%
1      152    10.1%
4      130    8.7%
6      126    8.4%
5      114    7.6%
8      92     6.1%
3      75     5.0%
2      73     4.9%
9      52     3.5%
7      36     2.4%

Latin
Value  Count  Frequency (%)
S      1000   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2500   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
S      1000   40.0%
0      650    26.0%
1      152    6.1%
4      130    5.2%
6      126    5.0%
5      114    4.6%
8      92     3.7%
3      75     3.0%
2      73     2.9%
9      52     2.1%

Reference year-month (YM)
Real number (ℝ)

Distinct: 67
Distinct (%): 13.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201834.71
Minimum: 201601
Maximum: 202107
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 201601
5th percentile: 201603
Q1: 201705
Median: 201810
Q3: 202003
95th percentile: 202104
Maximum: 202107
Range: 506
Interquartile range (IQR): 298

Descriptive statistics

Standard deviation: 163.52945
Coefficient of variation (CV): 0.00081021469
Kurtosis: -1.1957063
Mean: 201834.71
Median Absolute Deviation (MAD): 107.5
Skewness: 0.086923505
Sum: 1.0091736 × 10^8
Variance: 26741.881
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value              Count  Frequency (%)
201812             13     2.6%
202012             13     2.6%
201907             12     2.4%
202104             12     2.4%
202003             11     2.2%
201903             11     2.2%
201604             11     2.2%
201804             11     2.2%
202011             11     2.2%
202106             10     2.0%
Other values (57)  385    77.0%

Minimum 10 values
Value   Count  Frequency (%)
201601  10     2.0%
201602  9      1.8%
201603  7      1.4%
201604  11     2.2%
201605  9      1.8%
201606  10     2.0%
201607  8      1.6%
201608  10     2.0%
201609  2      0.4%
201610  7      1.4%

Maximum 10 values
Value   Count  Frequency (%)
202107  6      1.2%
202106  10     2.0%
202105  7      1.4%
202104  12     2.4%
202103  6      1.2%
202102  6      1.2%
202101  6      1.2%
202012  13     2.6%
202011  11     2.2%
202010  5      1.0%
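
Because YM stores year and month as a single YYYYMM integer, the mean, range and other numeric summaries for this variable are computed on the encoded number, not on elapsed time. The sketch below converts the reported minimum and maximum to a running month count to show the difference. The function name `month_index` is hypothetical.

```python
def month_index(yyyymm: int) -> int:
    """Map a YYYYMM integer to a running month count (year * 12 + month - 1)."""
    year, month = divmod(yyyymm, 100)
    if not 1 <= month <= 12:
        raise ValueError(f"not a valid YYYYMM value: {yyyymm}")
    return year * 12 + (month - 1)


ym_min, ym_max = 201601, 202107  # Minimum and Maximum from the report

span_months = month_index(ym_max) - month_index(ym_min)
print(ym_max - ym_min)   # 506, the reported Range on the raw integers
print(span_months)       # 66 months between the first and last month
print(span_months + 1)   # 67 calendar months, equal to the Distinct count
```

The reported Distinct count of 67 equals the number of calendar months from 2016-01 to 2021-07 inclusive, so every month in that window appears at least once.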

Customer address block code (BLOCK_CD)
Real number (ℝ)

Distinct: 498
Distinct (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 209152.71
Minimum: 1700
Maximum: 502146
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 1700
5th percentile: 12691.4
Q1: 147162.5
Median: 218893.5
Q3: 320160.25
95th percentile: 415276.5
Maximum: 502146
Range: 500446
Interquartile range (IQR): 172997.75

Descriptive statistics

Standard deviation: 130034.57
Coefficient of variation (CV): 0.6217207
Kurtosis: -0.92883365
Mean: 209152.71
Median Absolute Deviation (MAD): 78651.5
Skewness: -0.079739872
Sum: 1.0457635 × 10^8
Variance: 1.6908989 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
363705              2      0.4%
152747              2      0.4%
14037               1      0.2%
217781              1      0.2%
215407              1      0.2%
231070              1      0.2%
283313              1      0.2%
282459              1      0.2%
228606              1      0.2%
278916              1      0.2%
Other values (488)  488    97.6%

Minimum 10 values
Value  Count  Frequency (%)
1700   1      0.2%
1795   1      0.2%
5648   1      0.2%
7175   1      0.2%
8321   1      0.2%
8406   1      0.2%
8470   1      0.2%
8705   1      0.2%
8880   1      0.2%
9022   1      0.2%

Maximum 10 values
Value   Count  Frequency (%)
502146  1      0.2%
501646  1      0.2%
423014  1      0.2%
422551  1      0.2%
422515  1      0.2%
421887  1      0.2%
421556  1      0.2%
421419  1      0.2%
420859  1      0.2%
420603  1      0.2%
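
The histogram for this variable uses 50 fixed-size bins. The sketch below shows how a value maps to a bin, assuming the bins are equal-width over the interval from Minimum to Maximum, as the caption suggests. The function name `bin_index` is hypothetical, and the exact edge handling of the plotting library may differ.

```python
def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Index of the equal-width bin containing value; hi falls in the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / bins
    return min(int((value - lo) / width), bins - 1)


lo, hi, bins = 1700, 502146, 50   # Minimum, Maximum and bins=50 from the report
width = (hi - lo) / bins          # 500446 / 50 = 10008.92

# The ten largest values listed for this variable.
largest = [502146, 501646, 423014, 422551, 422515,
           421887, 421556, 421419, 420859, 420603]
print(sorted({bin_index(v, lo, hi, bins) for v in largest}))
# [41, 42, 49]
```

Under this assumption the two largest values sit alone in the last bin (index 49), and bins 43 to 48 are empty. That fits the gap between 423014 and 501646 in the list of maximum values.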

Gender (GEDNER)
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value  Count
M      260
F      240

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: M
4th row: F
5th row: M

Common Values

Value  Count  Frequency (%)
M      260    52.0%
F      240    48.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
m      260    52.0%
f      240    48.0%

Age group (AGE)
Categorical

Distinct: 7
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Value             Count
40대 (40s)         109
30대 (30s)         104
20대 (20s)         97
50대 (50s)         77
60대 (60s)         63
Other values (2)  50

Length

Max length: 5
Median length: 3
Mean length: 3.152
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 50대 (50s)
2nd row: 30대 (30s)
3rd row: 50대 (50s)
4th row: 30대 (30s)
5th row: 30대 (30s)

Common Values

Value                   Count  Frequency (%)
40대 (40s)               109    21.8%
30대 (30s)               104    20.8%
20대 (20s)               97     19.4%
50대 (50s)               77     15.4%
60대 (60s)               63     12.6%
70대이상 (70s and older)  38     7.6%
10대 (10s)               12     2.4%
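
The length statistics for this variable follow directly from the category counts. As a worked check, the sketch below recomputes the mean length from the raw category strings and their counts in the Common Values table.

```python
# Raw category strings and their counts from the Common Values table.
counts = {"40대": 109, "30대": 104, "20대": 97, "50대": 77,
          "60대": 63, "70대이상": 38, "10대": 12}

n = sum(counts.values())
total_chars = sum(len(value) * count for value, count in counts.items())

print(n)                                          # 500
print(min(map(len, counts)), max(map(len, counts)))  # 3 5
print(total_chars / n)                            # 3.152
```

Only the 38 rows with the five-character value 70대이상 differ from the three-character values, which gives (462 × 3 + 38 × 5) / 500 = 3.152.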

Length

Histogram of lengths of the category

Common Values (Plot)

Value                   Count  Frequency (%)
40대 (40s)               109    21.8%
30대 (30s)               104    20.8%
20대 (20s)               97     19.4%
50대 (50s)               77     15.4%
60대 (60s)               63     12.6%
70대이상 (70s and older)  38     7.6%
10대 (10s)               12     2.4%

Total card spending amount (AMT_CORR)
Real number (ℝ)

Distinct: 413
Distinct (%): 82.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 907636.86
Minimum: 15
Maximum: 50385007
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 15
5th percentile: 20044.55
Q1: 80480
Median: 238422
Q3: 717554.5
95th percentile: 3602858.2
Maximum: 50385007
Range: 50384992
Interquartile range (IQR): 637074.5

Descriptive statistics

Standard deviation: 2952486.3
Coefficient of variation (CV): 3.2529379
Kurtosis: 168.53302
Mean: 907636.86
Median Absolute Deviation (MAD): 188625
Skewness: 11.341779
Sum: 4.5381843 × 10^8
Variance: 8.7171756 × 10^12
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
50300               8      1.6%
75450               7      1.4%
15090               5      1.0%
60360               5      1.0%
100600              4      0.8%
251500              4      0.8%
20120               4      0.8%
30180               4      0.8%
226350              4      0.8%
65390               4      0.8%
Other values (403)  451    90.2%

Minimum 10 values
Value  Count  Frequency (%)
15     1      0.2%
2515   1      0.2%
3521   1      0.2%
4024   1      0.2%
5533   2      0.4%
6036   1      0.2%
7545   2      0.4%
8048   3      0.6%
10060  1      0.2%
11066  1      0.2%

Maximum 10 values
Value     Count  Frequency (%)
50385007  1      0.2%
23551179  1      0.2%
19322795  1      0.2%
13478242  1      0.2%
11102382  1      0.2%
9590198   1      0.2%
8722976   1      0.2%
8620414   1      0.2%
7207291   1      0.2%
7108134   1      0.2%
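
The skewness (11.34) and kurtosis (168.5) reported for this variable point to a heavy right tail. One rough way to see the tail from the reported quartiles alone is the 1.5 × IQR rule (Tukey's fences). The rule is not part of the report and is used here only as an illustration.

```python
q1, q3 = 80480, 717554.5          # Q1 and Q3 from the report
iqr = q3 - q1                     # matches the reported IQR
upper_fence = q3 + 1.5 * iqr      # values above this are flagged as high outliers

# The ten largest values listed for this variable.
largest = [50385007, 23551179, 19322795, 13478242, 11102382,
           9590198, 8722976, 8620414, 7207291, 7108134]

flagged = [v for v in largest if v > upper_fence]
print(iqr)            # 637074.5
print(upper_fence)    # 1673166.25
print(len(flagged))   # 10
```

All ten listed maxima lie beyond the fence. The 95th percentile (3602858.2) does as well, so under this rule at least 5% of the observations would be flagged.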

Total card transaction count (USECT_CORR)
Real number (ℝ)

Distinct: 45
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.316
Minimum: 5
Maximum: 795
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 5
5th percentile: 5
Q1: 5
Median: 10
Q3: 35
95th percentile: 161
Maximum: 795
Range: 790
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 72.176555
Coefficient of variation (CV): 1.9874588
Kurtosis: 44.012155
Mean: 36.316
Median Absolute Deviation (MAD): 5
Skewness: 5.6742988
Sum: 18158
Variance: 5209.4551
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=45)

Common values
Value              Count  Frequency (%)
5                  165    33.0%
10                 86     17.2%
15                 55     11.0%
25                 27     5.4%
20                 23     4.6%
30                 18     3.6%
45                 14     2.8%
40                 10     2.0%
55                 9      1.8%
50                 8      1.6%
Other values (35)  85     17.0%

Minimum 10 values
Value  Count  Frequency (%)
5      165    33.0%
10     86     17.2%
15     55     11.0%
20     23     4.6%
25     27     5.4%
30     18     3.6%
35     7      1.4%
40     10     2.0%
45     14     2.8%
50     8      1.6%

Maximum 10 values
Value  Count  Frequency (%)
795    1      0.2%
684    1      0.2%
488    1      0.2%
463    1      0.2%
423    1      0.2%
297    1      0.2%
282    1      0.2%
267    1      0.2%
246    1      0.2%
231    3      0.6%
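
Several of the descriptive statistics are tied together by simple identities: mean = sum / n, variance = (standard deviation)², and CV = standard deviation / mean. As a sanity check, the sketch below verifies these identities on the figures reported for this variable, up to the rounding used in the report.

```python
import math

n = 500                 # Number of observations
total = 18158           # Sum
mean = 36.316           # Mean
std = 72.176555         # Standard deviation
variance = 5209.4551    # Variance
cv = 1.9874588          # Coefficient of variation (CV)

# Each identity should hold up to the rounding of the reported values.
checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-6),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'mismatch'}")
```

The same three checks can be applied to the other numeric variables by substituting their reported values.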

Interactions

[16 pairwise interaction plots]

Correlations

            UPJONG_CD  YM     BLOCK_CD  GEDNER  AGE    AMT_CORR  USECT_CORR
UPJONG_CD   1.000      0.000  0.339     0.000   0.206  0.000     0.632
YM          0.000      1.000  0.000     0.066   0.004  0.000     0.000
BLOCK_CD    0.339      0.000  1.000     0.132   0.000  0.072     0.000
GEDNER      0.000      0.066  0.132     1.000   0.028  0.000     0.000
AGE         0.206      0.004  0.000     0.028   1.000  0.162     0.000
AMT_CORR    0.000      0.000  0.072     0.000   0.162  1.000     0.000
USECT_CORR  0.632      0.000  0.000     0.000   0.000  0.000     1.000

        GEDNER  AGE
GEDNER  1.000   0.030
AGE     0.030   1.000

            YM      BLOCK_CD  AMT_CORR  USECT_CORR  GEDNER  AGE
YM          1.000   -0.042    0.075     -0.062      0.033   0.016
BLOCK_CD    -0.042  1.000     -0.004    0.103       0.100   0.000
AMT_CORR    0.075   -0.004    1.000     0.010       0.000   0.097
USECT_CORR  -0.062  0.103     0.010     1.000       0.000   0.000
GEDNER      0.033   0.100     0.000     0.000       1.000   0.030
AGE         0.016   0.000     0.097     0.000       0.030   1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
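
The per-column nullity behind such a chart is a count of missing cells per variable. The following is a minimal pure-Python sketch using the first three rows of the Sample section, restricted to five columns and keyed by the short column codes. The function name `nullity_by_column` and the extra row with a gap are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def nullity_by_column(rows: List[Row]) -> Dict[str, int]:
    """Count missing (None) cells per column."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(value is None)
    return counts


# First three sample rows, restricted to five columns.
rows: List[Row] = [
    {"UPJONG_CD": "SS013", "YM": "201906", "BLOCK_CD": "14037", "GEDNER": "F", "AGE": "50대"},
    {"UPJONG_CD": "SS048", "YM": "201608", "BLOCK_CD": "156830", "GEDNER": "M", "AGE": "30대"},
    {"UPJONG_CD": "SS016", "YM": "202009", "BLOCK_CD": "32925", "GEDNER": "M", "AGE": "50대"},
]

missing = nullity_by_column(rows)
total_cells = sum(len(row) for row in rows)
print(missing)
print(f"{sum(missing.values()) / total_cells:.1%}")  # 0.0%

# Hypothetical row with a gap, to show a non-zero count.
with_gap = rows + [
    {"UPJONG_CD": "SS001", "YM": "201805", "BLOCK_CD": None, "GEDNER": "F", "AGE": "30대"}
]
print(nullity_by_column(with_gap)["BLOCK_CD"])  # 1
```

For the profiled dataset every count is zero, which matches the 0 missing cells reported in the overview.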

Sample

First rows
Index  UPJONG_CD  YM      BLOCK_CD  GEDNER  AGE         AMT_CORR, USECT_CORR (run together)
0      SS013      201906  14037     F       50대 (50s)   4828865
1      SS048      201608  156830    M       30대 (30s)   13279210
2      SS016      202009  32925     M       50대 (50s)   10060025
3      SS001      201805  214245    F       30대 (30s)   21387640
4      SS044      201705  279472    M       30대 (30s)   1559305
5      SS048      201911  220797    M       40대 (40s)   2012015
6      SS058      202006  152728    F       40대 (40s)   28520191
7      SS020      201711  83127     F       60대 (60s)   1509015
8      SS007      201808  366838    M       30대 (30s)   1979315
9      SS005      201601  15146     M       50대 (50s)   1559305

Last rows
Index  UPJONG_CD  YM      BLOCK_CD  GEDNER  AGE                      AMT_CORR, USECT_CORR (run together)
490    SS015      202007  17960     M       20대 (20s)                7298535
491    SS044      201908  17038     F       30대 (30s)                8043475
492    SS069      201907  24110     F       20대 (20s)                150905
493    SS007      201905  227869    F       30대 (30s)                3420405
494    SS068      201710  25185     F       60대 (60s)                603605
495    SS012      201709  11449     F       30대 (30s)                7545045
496    SS044      201707  353037    F       30대 (30s)                11870825
497    SS054      201901  418149    F       20대 (20s)                105630015
498    SS021      202009  269015    F       70대이상 (70s and older)   251505
499    SS003      201812  28101     M       40대 (40s)                5690945