Overview

Dataset statistics

Number of variables: 9
Number of observations: 500
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 37.7 KiB
Average record size in memory: 77.3 B

Variable types

Categorical: 6
Numeric: 3

Dataset

Description: Sample data
Author: 신한카드 (Shinhan Card)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=4

Alerts

Sector code (SECTOR_CD) has constant value "" (Constant)
Total sales count (SUM_COUNT) is highly imbalanced (76.3%) (Imbalance)
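
The 76.3% figure is consistent with an entropy-based imbalance score, 1 - H / log(k), where H is the Shannon entropy of the class frequencies and k is the number of distinct classes. The sketch below is an assumption about how such a score can be derived, not a quotation of the profiler's implementation. The counts are the SUM_COUNT frequencies listed in this report, and the function name `imbalance_score` is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform distribution, close to 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is reported as not imbalanced
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1.0 - entropy / math.log(k)

# Counts of the SUM_COUNT values 1, 2, 3 and 4 as listed in this report.
sum_count = [458, 36, 5, 1]
score = imbalance_score(sum_count)
print(f"{score:.1%}")  # prints 76.3%
```

A score near 0 would mean the four values occur about equally often; here a single value accounts for 458 of 500 rows, which pushes the score up.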

Reproduction

Analysis started: 2023-12-10 14:53:24.278482
Analysis finished: 2023-12-10 14:53:26.374195
Duration: 2.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Sector code (SECTOR_CD)
Categorical

CONSTANT

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      500    100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Date (DATE)
Real number (ℝ)

Distinct: 33
Distinct (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20151086
Minimum: 20151022
Maximum: 20151123
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 20151022
5-th percentile: 20151023
Q1: 20151030
Median: 20151106
Q3: 20151115
95-th percentile: 20151121
Maximum: 20151123
Range: 101
Interquartile range (IQR): 85
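
The minimum and maximum suggest that DATE holds calendar dates encoded as YYYYMMDD integers; this is an assumption based on the values shown here. Under that reading, Range and IQR are differences between encoded integers rather than day counts, because the step from 20151031 to 20151101 alone adds 70. A minimal sketch of converting the reported quantiles to dates before subtracting follows; `parse_yyyymmdd` is a hypothetical helper name.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert an integer such as 20151022 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first = parse_yyyymmdd(20151022)  # reported minimum
last = parse_yyyymmdd(20151123)   # reported maximum
q1 = parse_yyyymmdd(20151030)     # reported Q1
q3 = parse_yyyymmdd(20151115)     # reported Q3

span_days = (last - first).days
iqr_days = (q3 - q1).days
print(span_days, iqr_days)  # prints 32 16
```

A span of 32 days matches the 33 distinct values reported for this variable, so the sample appears to cover every day between the minimum and the maximum.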

Descriptive statistics

Standard deviation: 39.582779
Coefficient of variation (CV): 1.9643 × 10^-6
Kurtosis: -1.2570635
Mean: 20151086
Median Absolute Deviation (MAD): 11
Skewness: -0.80420889
Sum: 1.0075543 × 10^10
Variance: 1566.7964
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=33)]

Common values

Value              Count  Frequency (%)
20151102           23     4.6%
20151030           21     4.2%
20151105           21     4.2%
20151117           21     4.2%
20151103           19     3.8%
20151026           19     3.8%
20151120           19     3.8%
20151111           19     3.8%
20151121           19     3.8%
20151113           18     3.6%
Other values (23)  301    60.2%

Minimum 10 values

Value     Count  Frequency (%)
20151022  18     3.6%
20151023  15     3.0%
20151024  11     2.2%
20151025  6      1.2%
20151026  19     3.8%
20151027  18     3.6%
20151028  15     3.0%
20151029  14     2.8%
20151030  21     4.2%
20151031  14     2.8%

Maximum 10 values

Value     Count  Frequency (%)
20151123  16     3.2%
20151122  9      1.8%
20151121  19     3.8%
20151120  19     3.8%
20151119  16     3.2%
20151118  12     2.4%
20151117  21     4.2%
20151116  12     2.4%
20151115  8      1.6%
20151114  13     2.6%
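
Business category code (C_CD)
Categorical

Distinct: 32
Distinct (%): 6.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 7
Median length: 6
Mean length: 4.074
Min length: 2

Unique

Unique: 10
Unique (%): 2.0%

Sample

1st row: 한식음식점 (Korean restaurant)
2nd row: 제과점 (bakery)
3rd row: 분식집 (snack bar)
4th row: 중국집 (Chinese restaurant)
5th row: 한식음식점 (Korean restaurant)

Common Values

Value  Count  Frequency (%)
편의점 (convenience store)  85  17.0%
한식음식점 (Korean restaurant)  78  15.6%
커피음료 (coffee and beverages)  69  13.8%
패스트푸드점 (fast-food restaurant)  50  10.0%
슈퍼마켓 (supermarket)  25  5.0%
외식업기타 (other food service)  25  5.0%
제과점 (bakery)  22  4.4%
분식집 (snack bar)  19  3.8%
호프간이주점 (pub)  19  3.8%
주차장 (parking lot)  16  3.2%
Other values (22)  92  18.4%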

Length

[Figure: histogram of lengths of the category]

Gender code (GEN_CD)
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: M
2nd row: F
3rd row: F
4th row: M
5th row: M

Common Values

Value  Count  Frequency (%)
M      258    51.6%
F      242    48.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Age group code (AGE_CD)
Categorical

Distinct: 9
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 7
Median length: 5
Mean length: 5.308
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 20대후반 (late 20s)
2nd row: 20대초반이하 (early 20s or younger)
3rd row: 30대후반 (late 30s)
4th row: 30대초반 (early 30s)
5th row: 20대초반이하 (early 20s or younger)

Common Values

Value  Count  Frequency (%)
20대후반 (late 20s)  92  18.4%
30대초반 (early 30s)  84  16.8%
20대초반이하 (early 20s or younger)  77  15.4%
30대후반 (late 30s)  75  15.0%
40대초반 (early 40s)  60  12.0%
40대후반 (late 40s)  48  9.6%
50대초반 (early 50s)  27  5.4%
60대이상 (60 or older)  22  4.4%
50대후반 (late 50s)  15  3.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Inflow origin code (INFLOW_CD)
Real number (ℝ)

Distinct: 219
Distinct (%): 43.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7355324.7
Minimum: 26320
Maximum: 11740110
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 26320
5-th percentile: 28245
Q1: 41440
Median: 11260102
Q3: 11500104
95-th percentile: 11710102
Maximum: 11740110
Range: 11713790
Interquartile range (IQR): 11458664
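
The gap between Q1 (41440) and the median (11260102) hints that INFLOW_CD mixes two code formats, a 5-digit one and an 8-digit one, so numeric statistics such as the mean or IQR may not be meaningful for it. That reading is an observation from the values shown in this report, not something the report states. The sketch below groups the ten most common codes listed in this report by digit count; the names `common_codes` and `by_length` are illustrative.

```python
from collections import defaultdict

# Ten most common INFLOW_CD values and their counts, as listed in this report.
common_codes = {
    41281: 12, 41135: 10, 41465: 9, 11650107: 9, 41287: 8,
    11500103: 8, 28260: 7, 41463: 7, 11410111: 7, 28237: 6,
}

# Sum the row counts per number of digits in the code.
by_length = defaultdict(int)
for code, count in common_codes.items():
    by_length[len(str(code))] += count

print(dict(by_length))  # prints {5: 59, 8: 24}
```

If the two formats do represent different kinds of codes, treating the column as categorical (or splitting it by code length) would likely give more useful summaries than the real-number statistics.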

Descriptive statistics

Standard deviation: 5471235.3
Coefficient of variation (CV): 0.74384688
Kurtosis: -1.6530753
Mean: 7355324.7
Median Absolute Deviation (MAD): 374999
Skewness: -0.59169431
Sum: 3.6776623 × 10^9
Variance: 2.9934415 × 10^13
Monotonicity: Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
41281               12     2.4%
41135               10     2.0%
41465               9      1.8%
11650107            9      1.8%
41287               8      1.6%
11500103            8      1.6%
28260               7      1.4%
41463               7      1.4%
11410111            7      1.4%
28237               6      1.2%
Other values (209)  417    83.4%

Minimum 10 values

Value  Count  Frequency (%)
26320  2      0.4%
26500  1      0.2%
27110  2      0.4%
27140  1      0.2%
27230  1      0.2%
27290  1      0.2%
28170  2      0.4%
28185  2      0.4%
28200  2      0.4%
28237  6      1.2%

Maximum 10 values

Value     Count  Frequency (%)
11740110  1      0.2%
11740109  4      0.8%
11740108  2      0.4%
11740106  2      0.4%
11740105  1      0.2%
11740103  1      0.2%
11710114  1      0.2%
11710111  2      0.4%
11710108  2      0.4%
11710107  3      0.6%

Time slot (TIME)
Categorical

Distinct: 6
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 7
Median length: 5
Mean length: 5.132
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 04:저녁 (evening)
2nd row: 02:점심 (lunch)
3rd row: 02:점심 (lunch)
4th row: 03:오후 (afternoon)
5th row: 01:오전 (morning)

Common Values

Value  Count  Frequency (%)
02:점심 (lunch)  156  31.2%
04:저녁 (evening)  133  26.6%
03:오후 (afternoon)  88  17.6%
01:오전 (morning)  80  16.0%
05:늦은저녁 (late evening)  33  6.6%
06:심야 (late night)  10  2.0%
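
The TIME values carry a two-digit prefix before the colon, which appears to encode the order of the slots within a day. The sketch below splits each value into that numeric prefix and its label, then lists the counts from this report in slot order rather than frequency order. The helper name `split_slot` is hypothetical, and the ordering interpretation is an assumption.

```python
# TIME values and their counts, as listed in this report.
time_counts = {
    "02:점심": 156, "04:저녁": 133, "03:오후": 88,
    "01:오전": 80, "05:늦은저녁": 33, "06:심야": 10,
}

def split_slot(value):
    """Split a value such as '02:점심' into (2, '점심')."""
    code, label = value.split(":", 1)
    return int(code), label

# Sort by the numeric prefix instead of by frequency.
ordered = sorted(time_counts.items(), key=lambda item: split_slot(item[0])[0])
total = sum(time_counts.values())

for value, count in ordered:
    code, label = split_slot(value)
    print(code, label, count, f"{count / total:.1%}")
```

Sorting by the prefix makes the daily pattern easier to read: counts rise from morning to lunch, dip in the afternoon, peak again in the evening, and fall off late at night.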

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Total sales amount (SUM_MONEY)
Real number (ℝ)

Distinct: 207
Distinct (%): 41.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20308.29
Minimum: 670
Maximum: 635200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.5 KiB

Quantile statistics

Minimum: 670
5-th percentile: 1998
Q1: 4500
Median: 7200
Q3: 18000
95-th percentile: 64010
Maximum: 635200
Range: 634530
Interquartile range (IQR): 13500

Descriptive statistics

Standard deviation: 50648.386
Coefficient of variation (CV): 2.4939759
Kurtosis: 76.434631
Mean: 20308.29
Median Absolute Deviation (MAD): 4300
Skewness: 7.9702541
Sum: 10154145
Variance: 2.565259 × 10^9
Monotonicity: Not monotonic
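
Several of the SUM_MONEY figures are derived from others, which gives a quick consistency check on the report. The sketch below recomputes the mean, range, IQR, coefficient of variation, and variance from the reported sum, quantiles, and standard deviation. All input numbers are copied from this report, and the variable names are illustrative.

```python
# Figures copied from the SUM_MONEY statistics in this report.
n = 500
total = 10154145
minimum, q1, q3, maximum = 670, 4500, 18000, 635200
std = 50648.386

mean = total / n               # reported as 20308.29
value_range = maximum - minimum  # reported as 634530
iqr = q3 - q1                  # reported as 13500
cv = std / mean                # reported as 2.4939759
variance = std ** 2            # reported as 2.565259 x 10^9

print(mean, value_range, iqr, round(cv, 4), f"{variance:.6e}")
```

The mean is roughly 2.8 times the median of 7200, which, together with the skewness of about 7.97, points to a small number of very large amounts pulling the average up.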

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value               Count  Frequency (%)
4500                18     3.6%
6000                18     3.6%
7000                15     3.0%
2500                15     3.0%
8000                12     2.4%
3000                11     2.2%
9000                11     2.2%
5500                11     2.2%
15000               11     2.2%
4000                10     2.0%
Other values (197)  368    73.6%

Minimum 10 values

Value  Count  Frequency (%)
670    1      0.2%
800    2      0.4%
900    1      0.2%
1000   5      1.0%
1100   1      0.2%
1200   3      0.6%
1350   1      0.2%
1400   3      0.6%
1500   3      0.6%
1700   1      0.2%

Maximum 10 values

Value   Count  Frequency (%)
635200  1      0.2%
500000  1      0.2%
435600  1      0.2%
412000  1      0.2%
298000  1      0.2%
196000  1      0.2%
191000  1      0.2%
150000  1      0.2%
119900  1      0.2%
116000  1      0.2%

Total sales count (SUM_COUNT)
Categorical

IMBALANCE

Distinct: 4
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 1
Unique (%): 0.2%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      458    91.6%
2      36     7.2%
3      5      1.0%
4      1      0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Interactions

[Figures: nine interaction plots; images not included]

Correlations

[Figure: correlation heatmap]

           DATE   C_CD   GEN_CD  AGE_CD  INFLOW_CD  TIME   SUM_MONEY  SUM_COUNT
DATE       1.000  0.123  0.000   0.000   0.000      0.050  0.000      0.082
C_CD       0.123  1.000  0.113   0.243   0.125      0.000  0.499      0.123
GEN_CD     0.000  0.113  1.000   0.000   0.000      0.076  0.000      0.000
AGE_CD     0.000  0.243  0.000   1.000   0.084      0.000  0.152      0.000
INFLOW_CD  0.000  0.125  0.000   0.084   1.000      0.122  0.009      0.000
TIME       0.050  0.000  0.076   0.000   0.122      1.000  0.000      0.000
SUM_MONEY  0.000  0.499  0.000   0.152   0.009      0.000  1.000      0.000
SUM_COUNT  0.082  0.123  0.000   0.000   0.000      0.000  0.000      1.000

[Figure: correlation heatmap]

           AGE_CD  SUM_COUNT  C_CD   GEN_CD  TIME
AGE_CD     1.000   0.000      0.089  0.000   0.000
SUM_COUNT  0.000   1.000      0.056  0.000   0.000
C_CD       0.089   0.056      1.000  0.087   0.000
GEN_CD     0.000   0.000      0.087  1.000   0.054
TIME       0.000   0.000      0.000  0.054   1.000

[Figure: correlation heatmap]

           DATE    INFLOW_CD  SUM_MONEY  C_CD   GEN_CD  AGE_CD  TIME   SUM_COUNT
DATE       1.000   -0.011     0.019      0.077  0.000   0.000   0.052  0.018
INFLOW_CD  -0.011  1.000      -0.003     0.104  0.000   0.084   0.086  0.000
SUM_MONEY  0.019   -0.003     1.000      0.209  0.000   0.074   0.000  0.000
C_CD       0.077   0.104      0.209      1.000  0.087   0.089   0.000  0.056
GEN_CD     0.000   0.000      0.000      0.087  1.000   0.000   0.054  0.000
AGE_CD     0.000   0.084      0.074      0.089  0.000   1.000   0.000  0.000
TIME       0.052   0.086      0.000      0.000  0.054   0.000   1.000  0.000
SUM_COUNT  0.018   0.000      0.000      0.056  0.000   0.000   0.000  1.000
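
To read a correlation matrix of this size quickly, it can help to rank the off-diagonal entries. The sketch below does that for the first 8 × 8 matrix in this section, with the values copied from the report. The function name `strongest_pairs` is hypothetical, and the report does not say which correlation measure the matrix uses, so the ranking is purely descriptive.

```python
names = ["DATE", "C_CD", "GEN_CD", "AGE_CD", "INFLOW_CD", "TIME", "SUM_MONEY", "SUM_COUNT"]

# First correlation matrix of this report, row by row.
matrix = [
    [1.000, 0.123, 0.000, 0.000, 0.000, 0.050, 0.000, 0.082],
    [0.123, 1.000, 0.113, 0.243, 0.125, 0.000, 0.499, 0.123],
    [0.000, 0.113, 1.000, 0.000, 0.000, 0.076, 0.000, 0.000],
    [0.000, 0.243, 0.000, 1.000, 0.084, 0.000, 0.152, 0.000],
    [0.000, 0.125, 0.000, 0.084, 1.000, 0.122, 0.009, 0.000],
    [0.050, 0.000, 0.076, 0.000, 0.122, 1.000, 0.000, 0.000],
    [0.000, 0.499, 0.000, 0.152, 0.009, 0.000, 1.000, 0.000],
    [0.082, 0.123, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000],
]

def strongest_pairs(names, matrix, top=3):
    """Return the `top` variable pairs with the largest absolute off-diagonal value."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only; the matrix is symmetric
            pairs.append((abs(matrix[i][j]), names[i], names[j]))
    pairs.sort(reverse=True)
    return [(a, b, value) for value, a, b in pairs[:top]]

result = strongest_pairs(names, matrix)
for a, b, value in result:
    print(a, b, value)
```

In this matrix the largest entry links C_CD with SUM_MONEY (0.499), followed by C_CD with AGE_CD (0.243) and AGE_CD with SUM_MONEY (0.152); every other pair is at or below 0.125.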

Missing values

[Figure] A simple visualization of nullity by column.

[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row  SECTOR_CD  DATE      C_CD  GEN_CD  AGE_CD  INFLOW_CD  TIME  SUM_MONEY  SUM_COUNT
0    1  20151118  한식음식점  M  20대후반  11215103  04:저녁  9800  1
1    1  20151118  제과점  F  20대초반이하  11350105  02:점심  7000  1
2    1  20151110  분식집  F  30대후반  11170101  02:점심  31000  1
3    1  20151121  중국집  M  30대초반  11290133  03:오후  62000  1
4    1  20151116  한식음식점  M  20대초반이하  11170119  01:오전  4000  1
5    1  20151024  편의점  M  20대초반이하  41115  03:오후  8000  1
6    1  20151117  편의점  F  20대초반이하  41290  01:오전  6500  1
7    1  20151119  커피음료  F  20대후반  41463  02:점심  28900  1
8    1  20151121  슈퍼마켓  F  20대초반이하  11230105  04:저녁  2000  1
9    1  20151122  편의점  F  30대후반  41281  03:오후  10350  1

Last rows

Row  SECTOR_CD  DATE      C_CD  GEN_CD  AGE_CD  INFLOW_CD  TIME  SUM_MONEY  SUM_COUNT
490  1  20151115  주차장  M  20대후반  11170101  02:점심  63000  1
491  1  20151102  한식음식점  M  40대초반  11710103  03:오후  112000  1
492  1  20151112  한식음식점  F  40대후반  11500103  01:오전  7400  1
493  1  20151105  커피음료  F  30대후반  11170130  04:저녁  5900  1
494  1  20151022  분식집  F  60대이상  41310  03:오후  4800  2
495  1  20151123  편의점  M  40대후반  28260  05:늦은저녁  4900  1
496  1  20151027  한식음식점  F  30대후반  44200  04:저녁  20000  2
497  1  20151029  커피음료  M  40대초반  46110  04:저녁  89000  1
498  1  20151111  편의점  F  20대후반  28170  04:저녁  3600  1
499  1  20151121  슈퍼마켓  F  20대초반이하  28260  03:오후  8300  1