Dataset statistics
Number of variables | 9 |
---|---|
Number of observations | 500 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 37.7 KiB |
Average record size in memory | 77.3 B |
Variable types
Categorical | 6 |
---|---|
Numeric | 3 |
Dataset
Description | Sample data |
---|---|
Author | Shinhan Card (신한카드) |
URL | https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=4 |
Alerts
섹터코드(SECTOR_CD) has constant value "1" | Constant |
---|---|
매출건수합(SUM_COUNT) is highly imbalanced (73.4%) | Imbalance |
Reproduction
Analysis started | 2023-12-10 14:53:30.544661 |
---|---|
Analysis finished | 2023-12-10 14:53:32.808612 |
Duration | 2.26 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
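A report like this one can likely be regenerated with the same library. The sketch below assumes the sample has been saved locally as `sample.csv` and that pandas and ydata-profiling are installed; the file names, the function name `build_profile`, and the report title are hypothetical, and the original `config.json` settings are not reproduced here.

```python
def build_profile(csv_path: str = "sample.csv", out_path: str = "report.html"):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(
        df,
        title="Profiling report",  # assumed title, not taken from the source
        dataset={
            # URL copied from the Dataset section of this report.
            "url": "https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=4",
        },
    )
    report.to_file(out_path)
    return report


if __name__ == "__main__":
    build_profile()
```

Timings and inferred variable types may differ from the values shown here, depending on library version and configuration.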
섹터코드(SECTOR_CD)
Categorical
CONSTANT
Distinct | 1 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 1 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 500 | 100.0% |
년월일(DATE)
Real number (ℝ)
Distinct | 47 |
---|---|
Distinct (%) | 9.4% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 20151109 |
Minimum | 20151022 |
---|---|
Maximum | 20151207 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 20151022 |
---|---|
5-th percentile | 20151023 |
Q1 | 20151101 |
median | 20151112 |
Q3 | 20151125 |
95-th percentile | 20151205 |
Maximum | 20151207 |
Range | 185 |
Interquartile range (IQR) | 24 |
Descriptive statistics
Standard deviation | 54.719055 |
---|---|
Coefficient of variation (CV) | 2.7154364 × 10⁻⁶ |
Kurtosis | -0.40562706 |
Mean | 20151109 |
Median Absolute Deviation (MAD) | 11.5 |
Skewness | 0.052677185 |
Sum | 1.0075554 × 10¹⁰ |
Variance | 2994.175 |
Monotonicity | Not monotonic |
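Because 년월일(DATE) was profiled as a real number, the range (185) and the other spread statistics are computed on the raw `YYYYMMDD` integers, not on calendar time. The sketch below parses the reported minimum and maximum as dates to show the difference; the helper name `parse_yyyymmdd` is illustrative only.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Interpret an integer such as 20151022 as a calendar date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


first = parse_yyyymmdd(20151022)  # reported minimum
last = parse_yyyymmdd(20151207)   # reported maximum

numeric_range = 20151207 - 20151022  # what the report calls "Range"
elapsed_days = (last - first).days   # actual span in days
calendar_days = elapsed_days + 1     # days in the window, inclusive

print(numeric_range, elapsed_days, calendar_days)
```

The inclusive window covers 47 calendar days, which matches the 47 distinct values reported for this column.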
Common Values
Value | Count | Frequency (%) |
---|---|---|
20151103 | 20 | 4.0% |
20151022 | 17 | 3.4% |
20151201 | 17 | 3.4% |
20151027 | 17 | 3.4% |
20151119 | 15 | 3.0% |
20151029 | 15 | 3.0% |
20151023 | 15 | 3.0% |
20151106 | 15 | 3.0% |
20151127 | 15 | 3.0% |
20151101 | 14 | 2.8% |
Other values (37) | 340 | 68.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
20151022 | 17 | 3.4% |
20151023 | 15 | 3.0% |
20151024 | 7 | 1.4% |
20151025 | 4 | 0.8% |
20151026 | 9 | 1.8% |
20151027 | 17 | 3.4% |
20151028 | 11 | 2.2% |
20151029 | 15 | 3.0% |
20151030 | 12 | 2.4% |
20151031 | 5 | 1.0% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
20151207 | 11 | 2.2% |
20151206 | 8 | 1.6% |
20151205 | 13 | 2.6% |
20151204 | 11 | 2.2% |
20151203 | 8 | 1.6% |
20151202 | 9 | 1.8% |
20151201 | 17 | 3.4% |
20151130 | 6 | 1.2% |
20151129 | 6 | 1.2% |
20151128 | 8 | 1.6% |
업종코드(C_CD)
Categorical
Distinct | 35 |
---|---|
Distinct (%) | 7.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 7 |
---|---|
Median length | 6 |
Mean length | 4.028 |
Min length | 2 |
Unique
Unique | 9 |
---|---|
Unique (%) | 1.8% |
Sample
1st row | 한식음식점 |
---|---|
2nd row | 커피음료 |
3rd row | 커피음료 |
4th row | 편의점 |
5th row | 커피음료 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
한식음식점 | 91 | 18.2% |
편의점 | 84 | 16.8% |
커피음료 | 63 | 12.6% |
외식업기타 | 42 | 8.4% |
슈퍼마켓 | 29 | 5.8% |
호프간이주점 | 22 | 4.4% |
제과점 | 20 | 4.0% |
패스트푸드점 | 18 | 3.6% |
약국 | 18 | 3.6% |
일반의원 | 16 | 3.2% |
Other values (25) | 97 | 19.4% |
성별코드(GEN_CD)
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | F |
---|---|
2nd row | F |
3rd row | M |
4th row | M |
5th row | F |
Common Values
Value | Count | Frequency (%) |
---|---|---|
M | 285 | 57.0% |
F | 215 | 43.0% |
연령대별코드(AGE_CD)
Categorical
Distinct | 9 |
---|---|
Distinct (%) | 1.8% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 7 |
---|---|
Median length | 5 |
Mean length | 5.392 |
Min length | 5 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 20대초반이하 |
---|---|
2nd row | 50대초반 |
3rd row | 20대초반이하 |
4th row | 40대후반 |
5th row | 20대후반 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
20대후반 | 110 | 22.0% |
20대초반이하 | 98 | 19.6% |
30대초반 | 77 | 15.4% |
30대후반 | 57 | 11.4% |
40대후반 | 50 | 10.0% |
40대초반 | 45 | 9.0% |
50대초반 | 32 | 6.4% |
60대이상 | 17 | 3.4% |
50대후반 | 14 | 2.8% |
유입지코드(INFLOW_CD)
Real number (ℝ)
Distinct | 248 |
---|---|
Distinct (%) | 49.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 7368269.8 |
Minimum | 26260 |
---|---|
Maximum | 11740110 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 26260 |
---|---|
5-th percentile | 28237 |
Q1 | 41410 |
median | 11230106 |
Q3 | 11500102 |
95-th percentile | 11710108 |
Maximum | 11740110 |
Range | 11713850 |
Interquartile range (IQR) | 11458692 |
Descriptive statistics
Standard deviation | 5457130.7 |
---|---|
Coefficient of variation (CV) | 0.7406258 |
Kurtosis | -1.6420943 |
Mean | 7368269.8 |
Median Absolute Deviation (MAD) | 389996.5 |
Skewness | -0.60058829 |
Sum | 3.6841349 × 10⁹ |
Variance | 2.9780276 × 10¹³ |
Monotonicity | Not monotonic |
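유입지코드(INFLOW_CD) is an identifier code, so the mean, standard deviation, and kurtosis above say little about it. The values listed in this report fall into 5-digit and 8-digit codes, which would explain why Q1 (41410) and the median (11230106) are so far apart. The sketch below groups the ten codes from the first sample rows by digit count; treating digit count as the meaningful split is an assumption, since the source does not document the coding scheme.

```python
from collections import Counter

# INFLOW_CD values copied from the first ten sample rows of this report.
codes = [11710111, 41131, 41610, 11680105, 45140,
         26350, 28185, 28237, 11350104, 29155]

# Count how many codes have each number of digits.
digits = Counter(len(str(code)) for code in codes)

# Storing the codes as strings keeps a profiler from treating them as numbers.
labels = [str(code) for code in codes]

print(dict(digits))
```

Casting the column to a string type before profiling is one way to have it reported as categorical instead.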
Common Values
Value | Count | Frequency (%) |
---|---|---|
41135 | 8 | 1.6% |
11500103 | 8 | 1.6% |
41173 | 8 | 1.6% |
41287 | 6 | 1.2% |
28200 | 6 | 1.2% |
41199 | 6 | 1.2% |
41210 | 6 | 1.2% |
41285 | 6 | 1.2% |
11710111 | 6 | 1.2% |
11710101 | 5 | 1.0% |
Other values (238) | 435 | 87.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
26260 | 1 | 0.2% |
26320 | 2 | 0.4% |
26350 | 2 | 0.4% |
26380 | 1 | 0.2% |
26410 | 1 | 0.2% |
27290 | 1 | 0.2% |
28140 | 1 | 0.2% |
28170 | 3 | 0.6% |
28185 | 3 | 0.6% |
28200 | 6 | 1.2% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
11740110 | 2 | 0.4% |
11740109 | 1 | 0.2% |
11740108 | 2 | 0.4% |
11740107 | 1 | 0.2% |
11740106 | 2 | 0.4% |
11740105 | 2 | 0.4% |
11740103 | 1 | 0.2% |
11740101 | 3 | 0.6% |
11710113 | 1 | 0.2% |
11710112 | 4 | 0.8% |
시간대(TIME)
Categorical
Distinct | 6 |
---|---|
Distinct (%) | 1.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 7 |
---|---|
Median length | 5 |
Mean length | 5.276 |
Min length | 5 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 01:오전 |
---|---|
2nd row | 04:저녁 |
3rd row | 01:오전 |
4th row | 01:오전 |
5th row | 02:점심 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
04:저녁 | 135 | 27.0% |
02:점심 | 126 | 25.2% |
01:오전 | 79 | 15.8% |
03:오후 | 75 | 15.0% |
05:늦은저녁 | 69 | 13.8% |
06:심야 | 16 | 3.2% |
매출금액합(SUM_MONEY)
Real number (ℝ)
Distinct | 234 |
---|---|
Distinct (%) | 46.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 21437.884 |
Minimum | 840 |
---|---|
Maximum | 1042000 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 4.5 KiB |
Quantile statistics
Minimum | 840 |
---|---|
5-th percentile | 1547.5 |
Q1 | 4400 |
median | 7850 |
Q3 | 20000 |
95-th percentile | 65171 |
Maximum | 1042000 |
Range | 1041160 |
Interquartile range (IQR) | 15600 |
Descriptive statistics
Standard deviation | 60601.339 |
---|---|
Coefficient of variation (CV) | 2.826834 |
Kurtosis | 172.73607 |
Mean | 21437.884 |
Median Absolute Deviation (MAD) | 4850 |
Skewness | 11.540924 |
Sum | 10718942 |
Variance | 3.6725223 × 10⁹ |
Monotonicity | Not monotonic |
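The descriptive statistics for 매출금액합(SUM_MONEY) are linked by simple identities, which makes them easy to cross-check. The sketch below recomputes the coefficient of variation (standard deviation divided by mean) and the variance (standard deviation squared) from the figures reported above; it uses only the published summary values, not the underlying data.

```python
# Summary values copied from the report for SUM_MONEY.
mean = 21437.884
std = 60601.339
median = 7850

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
mean_to_median = mean / median  # a rough indicator of right skew

print(f"CV = {cv:.6f}")
print(f"Variance = {variance:.7e}")
print(f"Mean / median = {mean_to_median:.2f}")
```

A mean well over twice the median, together with the reported skewness of 11.5, points to a small number of very large transactions such as the 1042000 maximum.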
Common Values
Value | Count | Frequency (%) |
---|---|---|
4500 | 19 | 3.8% |
6000 | 14 | 2.8% |
8000 | 12 | 2.4% |
5000 | 11 | 2.2% |
3500 | 9 | 1.8% |
2000 | 9 | 1.8% |
20000 | 9 | 1.8% |
9000 | 9 | 1.8% |
6500 | 9 | 1.8% |
7000 | 9 | 1.8% |
Other values (224) | 390 | 78.0% |
Minimum 10 values
Value | Count | Frequency (%) |
---|---|---|
840 | 1 | 0.2% |
900 | 5 | 1.0% |
1000 | 2 | 0.4% |
1020 | 1 | 0.2% |
1100 | 1 | 0.2% |
1200 | 2 | 0.4% |
1250 | 1 | 0.2% |
1280 | 1 | 0.2% |
1300 | 3 | 0.6% |
1350 | 1 | 0.2% |
Maximum 10 values
Value | Count | Frequency (%) |
---|---|---|
1042000 | 1 | 0.2% |
495000 | 1 | 0.2% |
426700 | 1 | 0.2% |
208800 | 1 | 0.2% |
201580 | 1 | 0.2% |
198100 | 1 | 0.2% |
197800 | 1 | 0.2% |
194500 | 1 | 0.2% |
182000 | 1 | 0.2% |
140000 | 1 | 0.2% |
매출건수합(SUM_COUNT)
Categorical
IMBALANCE
Distinct | 5 |
---|---|
Distinct (%) | 1.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.0 KiB |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 1 |
---|---|
Unique (%) | 0.2% |
Sample
1st row | 1 |
---|---|
2nd row | 1 |
3rd row | 2 |
4th row | 1 |
5th row | 1 |
Common Values
Value | Count | Frequency (%) |
---|---|---|
1 | 443 | 88.6% |
2 | 46 | 9.2% |
3 | 8 | 1.6% |
5 | 2 | 0.4% |
4 | 1 | 0.2% |
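The 73.4% imbalance alert for 매출건수합(SUM_COUNT) is consistent with a score of one minus the normalized Shannon entropy of the class frequencies. That this is the exact formula the tool applies is an assumption, although the sketch below reproduces the reported figure from the counts in the table above.

```python
import math

# Class counts copied from the Common Values table for SUM_COUNT.
counts = {"1": 443, "2": 46, "3": 8, "5": 2, "4": 1}
total = sum(counts.values())

# Shannon entropy of the observed class distribution (natural log).
entropy = -sum((n / total) * math.log(n / total) for n in counts.values())

# Normalize by the maximum entropy for this many classes, then invert:
# 0 means perfectly balanced, 1 means a single class dominates completely.
imbalance = 1 - entropy / math.log(len(counts))

print(f"Imbalance = {imbalance:.1%}")
```

Under this definition the score is driven mostly by the 88.6% share of the value 1.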
Correlations
Variable | 년월일(DATE) | 업종코드(C_CD) | 성별코드(GEN_CD) | 연령대별코드(AGE_CD) | 유입지코드(INFLOW_CD) | 시간대(TIME) | 매출금액합(SUM_MONEY) | 매출건수합(SUM_COUNT) |
---|---|---|---|---|---|---|---|---|
년월일(DATE) | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.117 | 0.000 | 0.103 |
업종코드(C_CD) | 0.000 | 1.000 | 0.063 | 0.000 | 0.157 | 0.000 | 0.115 | 0.277 |
성별코드(GEN_CD) | 0.000 | 0.063 | 1.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.060 |
연령대별코드(AGE_CD) | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 | 0.000 | 0.130 | 0.162 |
유입지코드(INFLOW_CD) | 0.000 | 0.157 | 0.000 | 0.000 | 1.000 | 0.201 | 0.000 | 0.007 |
시간대(TIME) | 0.117 | 0.000 | 0.000 | 0.000 | 0.201 | 1.000 | 0.000 | 0.000 |
매출금액합(SUM_MONEY) | 0.000 | 0.115 | 0.000 | 0.130 | 0.000 | 0.000 | 1.000 | 0.000 |
매출건수합(SUM_COUNT) | 0.103 | 0.277 | 0.060 | 0.162 | 0.007 | 0.000 | 0.000 | 1.000 |
Variable | 연령대별코드(AGE_CD) | 매출건수합(SUM_COUNT) | 업종코드(C_CD) | 성별코드(GEN_CD) | 시간대(TIME) |
---|---|---|---|---|---|
연령대별코드(AGE_CD) | 1.000 | 0.093 | 0.000 | 0.000 | 0.000 |
매출건수합(SUM_COUNT) | 0.093 | 1.000 | 0.119 | 0.073 | 0.000 |
업종코드(C_CD) | 0.000 | 0.119 | 1.000 | 0.050 | 0.000 |
성별코드(GEN_CD) | 0.000 | 0.073 | 0.050 | 1.000 | 0.000 |
시간대(TIME) | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
Variable | 년월일(DATE) | 유입지코드(INFLOW_CD) | 매출금액합(SUM_MONEY) | 업종코드(C_CD) | 성별코드(GEN_CD) | 연령대별코드(AGE_CD) | 시간대(TIME) | 매출건수합(SUM_COUNT) |
---|---|---|---|---|---|---|---|---|
년월일(DATE) | 1.000 | 0.034 | -0.055 | 0.000 | 0.000 | 0.000 | 0.080 | 0.087 |
유입지코드(INFLOW_CD) | 0.034 | 1.000 | -0.086 | 0.125 | 0.000 | 0.000 | 0.141 | 0.000 |
매출금액합(SUM_MONEY) | -0.055 | -0.086 | 1.000 | 0.052 | 0.000 | 0.083 | 0.000 | 0.000 |
업종코드(C_CD) | 0.000 | 0.125 | 0.052 | 1.000 | 0.050 | 0.000 | 0.000 | 0.119 |
성별코드(GEN_CD) | 0.000 | 0.000 | 0.000 | 0.050 | 1.000 | 0.000 | 0.000 | 0.073 |
연령대별코드(AGE_CD) | 0.000 | 0.000 | 0.083 | 0.000 | 0.000 | 1.000 | 0.000 | 0.093 |
시간대(TIME) | 0.080 | 0.141 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.000 |
매출건수합(SUM_COUNT) | 0.087 | 0.000 | 0.000 | 0.119 | 0.073 | 0.093 | 0.000 | 1.000 |
First rows
Index | 섹터코드(SECTOR_CD) | 년월일(DATE) | 업종코드(C_CD) | 성별코드(GEN_CD) | 연령대별코드(AGE_CD) | 유입지코드(INFLOW_CD) | 시간대(TIME) | 매출금액합(SUM_MONEY) | 매출건수합(SUM_COUNT) |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 20151118 | 한식음식점 | F | 20대초반이하 | 11710111 | 01:오전 | 3400 | 1 |
1 | 1 | 20151023 | 커피음료 | F | 50대초반 | 41131 | 04:저녁 | 11300 | 1 |
2 | 1 | 20151128 | 커피음료 | M | 20대초반이하 | 41610 | 01:오전 | 10800 | 2 |
3 | 1 | 20151201 | 편의점 | M | 40대후반 | 11680105 | 01:오전 | 21000 | 1 |
4 | 1 | 20151027 | 커피음료 | F | 20대후반 | 45140 | 02:점심 | 8000 | 1 |
5 | 1 | 20151118 | 의류점 | F | 20대초반이하 | 26350 | 05:늦은저녁 | 5000 | 1 |
6 | 1 | 20151104 | 한식음식점 | F | 40대후반 | 28185 | 03:오후 | 20000 | 1 |
7 | 1 | 20151104 | 약국 | M | 20대초반이하 | 28237 | 04:저녁 | 9000 | 1 |
8 | 1 | 20151101 | 편의점 | F | 20대초반이하 | 11350104 | 01:오전 | 20000 | 1 |
9 | 1 | 20151110 | 문구점 | M | 40대후반 | 29155 | 05:늦은저녁 | 7700 | 1 |
Last rows
Index | 섹터코드(SECTOR_CD) | 년월일(DATE) | 업종코드(C_CD) | 성별코드(GEN_CD) | 연령대별코드(AGE_CD) | 유입지코드(INFLOW_CD) | 시간대(TIME) | 매출금액합(SUM_MONEY) | 매출건수합(SUM_COUNT) |
---|---|---|---|---|---|---|---|---|---|
490 | 1 | 20151106 | 한식음식점 | F | 20대초반이하 | 28170 | 02:점심 | 40500 | 1 |
491 | 1 | 20151111 | 분식집 | M | 30대후반 | 11380109 | 02:점심 | 16000 | 1 |
492 | 1 | 20151028 | 슈퍼마켓 | M | 30대초반 | 11140110 | 04:저녁 | 25000 | 1 |
493 | 1 | 20151120 | 당구장 | F | 30대초반 | 11710101 | 02:점심 | 5500 | 2 |
494 | 1 | 20151116 | 패션잡화 | M | 50대초반 | 11170130 | 06:심야 | 4500 | 2 |
495 | 1 | 20151130 | 의류점 | F | 30대초반 | 28140 | 04:저녁 | 22600 | 2 |
496 | 1 | 20151025 | 호프간이주점 | F | 50대후반 | 11215105 | 01:오전 | 426700 | 2 |
497 | 1 | 20151112 | 커피음료 | F | 30대후반 | 11560133 | 04:저녁 | 4900 | 1 |
498 | 1 | 20151116 | 한식음식점 | F | 30대초반 | 11230110 | 04:저녁 | 5500 | 2 |
499 | 1 | 20151029 | 약국 | F | 40대후반 | 11440127 | 02:점심 | 17000 | 1 |