Overview

Dataset statistics

Number of variables: 11
Number of observations: 400
Missing cells: 400
Missing cells (%): 9.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 36.8 KiB
Average record size in memory: 94.3 B
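
The missing-cell percentage follows from the counts above: the table has 11 variables × 400 observations, and 400 of those cells are empty. The short check below is only an illustrative sketch using the figures reported in this section; the variable names are made up for the example.

```python
# Figures taken from the dataset statistics above.
n_variables = 11
n_observations = 400
missing_cells = 400

# Total number of cells in the table.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells
print(f"{missing_pct:.1f}%")
```

Since exactly 400 cells are missing and the dataset has 400 rows, the figure is consistent with a single fully empty column, which matches the Missing alert for 소상공인시스템로그ID.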

Variable types

Categorical: 6
Unsupported: 1
Numeric: 3
Text: 1

Dataset

Description: Sample
Author: 소상공인연합회
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KFMZEROSTT004

Alerts

소상공인결제분류코드 has constant value "ZEROP48000" [Constant]
년월 has constant value "202008" [Constant]
광역시도코드 has constant value "48" [Constant]
광역시도명 has constant value "경상남도" [Constant]
소상공인시스템로그일시 has constant value "2020-10-21 12:28:44.0" [Constant]
결제건수 is highly overall correlated with 합계금액 [High correlation]
합계금액 is highly overall correlated with 결제건수 [High correlation]
표준산업업종상세분류코드 is highly overall correlated with 표준산업업종대분류코드 [High correlation]
표준산업업종대분류코드 is highly overall correlated with 표준산업업종상세분류코드 [High correlation]
표준산업업종대분류코드 is highly imbalanced (51.3%) [Imbalance]
소상공인시스템로그ID has 400 (100.0%) missing values [Missing]
표준산업업종상세분류코드 has unique values [Unique]
표준산업업종상세분류명 has unique values [Unique]
소상공인시스템로그ID is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
결제건수 has 273 (68.2%) zeros [Zeros]
합계금액 has 273 (68.2%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-10 06:25:59.358135
Analysis finished: 2023-12-10 06:26:01.871494
Duration: 2.51 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
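
A report of this kind can be regenerated with ydata-profiling. The sketch below is a minimal example under stated assumptions: the function name, the CSV path, the report title, and the output file name are hypothetical, and the `dataset` metadata simply mirrors the Description, Author, and URL fields listed above. The imports sit inside the function so the snippet can be loaded even where pandas or ydata-profiling is not installed.

```python
def build_report(csv_path: str, out_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (sketch; paths are hypothetical)."""
    # Third-party packages, imported lazily so that defining the function
    # does not require them to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)

    profile = ProfileReport(
        df,
        title="Profiling Report",  # hypothetical title
        dataset={
            # Metadata copied from the Dataset section of this report.
            "description": "Sample",
            "author": "소상공인연합회",
            "url": "https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KFMZEROSTT004",
        },
    )
    # Writes a standalone HTML file.
    profile.to_file(out_path)
```

Calling `build_report("sample.csv")` with a local copy of the data would write `report.html`; the exact variable types inferred may differ from this report depending on how the columns are parsed.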

Variables

소상공인결제분류코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
ZEROP48000 | 400

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ZEROP48000
2nd row: ZEROP48000
3rd row: ZEROP48000
4th row: ZEROP48000
5th row: ZEROP48000

Common Values

Value | Count | Frequency (%)
ZEROP48000 | 400 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

년월
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
202008 | 400

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202008
2nd row: 202008
3rd row: 202008
4th row: 202008
5th row: 202008

Common Values

Value | Count | Frequency (%)
202008 | 400 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

소상공인시스템로그ID
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 400
Missing (%): 100.0%
Memory size: 3.6 KiB

광역시도코드
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
48 | 400

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 48
2nd row: 48
3rd row: 48
4th row: 48
5th row: 48

Common Values

Value | Count | Frequency (%)
48 | 400 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

광역시도명
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
경상남도 | 400

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 경상남도
2nd row: 경상남도
3rd row: 경상남도
4th row: 경상남도
5th row: 경상남도

Common Values

Value | Count | Frequency (%)
경상남도 | 400 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

결제건수
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 50
Distinct (%): 12.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 19.685
Minimum: 0
Maximum: 1116
Zeros: 273
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 3.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 2
95-th percentile: 84.2
Maximum: 1116
Range: 1116
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 96.917792
Coefficient of variation (CV): 4.9234337
Kurtosis: 67.755351
Mean: 19.685
Median Absolute Deviation (MAD): 0
Skewness: 7.7285232
Sum: 7874
Variance: 9393.0584
Monotonicity: Not monotonic
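
Several of the descriptive statistics above are related to each other by simple arithmetic. The sketch below cross-checks them using only the reported values for 결제건수 (mean, standard deviation, and the 400 observations); it is an illustrative consistency check, not the profiler's own computation.

```python
# Reported values for 결제건수.
mean = 19.685
std = 96.917792
n_observations = 400

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# The sum equals the mean times the number of observations.
total = mean * n_observations

print(f"CV={cv:.4f} variance={variance:.2f} sum={total:.0f}")
```

A CV near 4.9 together with a median of 0 reflects how strongly the distribution is dominated by zeros and a few very large counts.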
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 273 | 68.2%
1 | 18 | 4.5%
2 | 16 | 4.0%
3 | 10 | 2.5%
6 | 7 | 1.8%
11 | 7 | 1.8%
4 | 6 | 1.5%
5 | 4 | 1.0%
22 | 4 | 1.0%
8 | 4 | 1.0%
Other values (40) | 51 | 12.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 273 | 68.2%
1 | 18 | 4.5%
2 | 16 | 4.0%
3 | 10 | 2.5%
4 | 6 | 1.5%
5 | 4 | 1.0%
6 | 7 | 1.8%
7 | 3 | 0.8%
8 | 4 | 1.0%
9 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
1116 | 1 | 0.2%
860 | 1 | 0.2%
731 | 1 | 0.2%
585 | 1 | 0.2%
519 | 1 | 0.2%
505 | 1 | 0.2%
436 | 1 | 0.2%
231 | 1 | 0.2%
217 | 1 | 0.2%
166 | 2 | 0.5%

합계금액
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 121
Distinct (%): 30.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1254825.6
Minimum: 0
Maximum: 77606270
Zeros: 273
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 3.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 111410
95-th percentile: 4093432.2
Maximum: 77606270
Range: 77606270
Interquartile range (IQR): 111410

Descriptive statistics

Standard deviation: 6624895.6
Coefficient of variation (CV): 5.2795348
Kurtosis: 89.132432
Mean: 1254825.6
Median Absolute Deviation (MAD): 0
Skewness: 8.9436194
Sum: 5.0193026 × 10^8
Variance: 4.3889242 × 10^13
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 273 | 68.2%
30000 | 3 | 0.8%
160000 | 2 | 0.5%
100000 | 2 | 0.5%
1020000 | 2 | 0.5%
40000 | 2 | 0.5%
15000 | 2 | 0.5%
77606270 | 1 | 0.2%
1960000 | 1 | 0.2%
2950000 | 1 | 0.2%
Other values (111) | 111 | 27.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 273 | 68.2%
1000 | 1 | 0.2%
7500 | 1 | 0.2%
10000 | 1 | 0.2%
12000 | 1 | 0.2%
15000 | 2 | 0.5%
25000 | 1 | 0.2%
26500 | 1 | 0.2%
30000 | 3 | 0.8%
37900 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
77606270 | 1 | 0.2%
72746000 | 1 | 0.2%
52655200 | 1 | 0.2%
39250020 | 1 | 0.2%
24810950 | 1 | 0.2%
21898300 | 1 | 0.2%
15072580 | 1 | 0.2%
13459230 | 1 | 0.2%
12882300 | 1 | 0.2%
11013718 | 1 | 0.2%

표준산업업종대분류코드
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
C | 305
F | 38
A | 27
G | 19
E | 8

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: A
4th row: A
5th row: A

Common Values

Value | Count | Frequency (%)
C | 305 | 76.2%
F | 38 | 9.5%
A | 27 | 6.8%
G | 19 | 4.8%
E | 8 | 2.0%
D | 3 | 0.8%
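
The 51.3% imbalance reported for 표준산업업종대분류코드 can be reproduced from the six class counts above. The sketch assumes the score is one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy (the logarithm of the number of classes). That formula is an assumption about how the alert is derived, adopted here because it matches the reported figure; the function name is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (assumed definition)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        # A single class has no entropy to normalise against.
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(probs))
    return 1.0 - entropy / max_entropy


# Class counts from the Common Values table.
counts = {"C": 305, "F": 38, "A": 27, "G": 19, "E": 8, "D": 3}
score = imbalance_score(list(counts.values()))
print(f"{score:.1%}")  # prints 51.3%
```

A score of 0 would mean perfectly balanced classes and a score near 1 a single dominant class; here class C alone accounts for 76.2% of the rows.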

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

표준산업업종상세분류코드
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24124.443
Minimum: 1110
Maximum: 46202
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.6 KiB

Quantile statistics

Minimum: 1110
5-th percentile: 3057.45
Q1: 14957.25
median: 25112.5
Q3: 30577.75
95-th percentile: 42505
Maximum: 46202
Range: 45092
Interquartile range (IQR): 15620.5

Descriptive statistics

Standard deviation: 11617.213
Coefficient of variation (CV): 0.48155365
Kurtosis: -0.61071012
Mean: 24124.443
Median Absolute Deviation (MAD): 8190
Skewness: 9.608659 × 10^-6
Sum: 9649777
Variance: 1.3495964 × 10^8
Monotonicity: Strictly increasing

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1110 | 1 | 0.2%
28909 | 1 | 0.2%
29180 | 1 | 0.2%
29176 | 1 | 0.2%
29175 | 1 | 0.2%
29171 | 1 | 0.2%
29169 | 1 | 0.2%
29162 | 1 | 0.2%
29133 | 1 | 0.2%
29132 | 1 | 0.2%
Other values (390) | 390 | 97.5%

Minimum 10 values

Value | Count | Frequency (%)
1110 | 1 | 0.2%
1121 | 1 | 0.2%
1122 | 1 | 0.2%
1123 | 1 | 0.2%
1131 | 1 | 0.2%
1132 | 1 | 0.2%
1140 | 1 | 0.2%
1151 | 1 | 0.2%
1152 | 1 | 0.2%
1159 | 1 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
46202 | 1 | 0.2%
46201 | 1 | 0.2%
46109 | 1 | 0.2%
46107 | 1 | 0.2%
46106 | 1 | 0.2%
46105 | 1 | 0.2%
46104 | 1 | 0.2%
46103 | 1 | 0.2%
46102 | 1 | 0.2%
46101 | 1 | 0.2%

표준산업업종상세분류명
Text

UNIQUE 

Distinct: 400
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Length

Max length: 27
Median length: 20
Mean length: 13.78
Min length: 3

Characters and Unicode

Total characters: 5512
Distinct characters: 323
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 400
Unique (%): 100.0%

Sample

1st row: 곡물 및 기타 식량작물 재배업
2nd row: 채소작물 재배업
3rd row: 화훼작물 재배업
4th row: 종자 및 묘목 생산업
5th row: 과실작물 재배업
ValueCountFrequency (%)
제조업 263
 
15.9%
181
 
10.9%
기타 102
 
6.1%
29
 
1.7%
28
 
1.7%
공사업 20
 
1.2%
유사 16
 
1.0%
금속 14
 
0.8%
기기 13
 
0.8%
처리업 12
 
0.7%
Other values (634) 981
59.1%

Most occurring characters

ValueCountFrequency (%)
1259
22.8%
422
 
7.7%
354
 
6.4%
311
 
5.6%
211
 
3.8%
181
 
3.3%
109
 
2.0%
107
 
1.9%
101
 
1.8%
68
 
1.2%
Other values (313) 2389
43.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 4246 | 77.0%
Space Separator | 1259 | 22.8%
Decimal Number | 3 | 0.1%
Open Punctuation | 2 | < 0.1%
Close Punctuation | 2 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
422
 
9.9%
354
 
8.3%
311
 
7.3%
211
 
5.0%
181
 
4.3%
109
 
2.6%
107
 
2.5%
101
 
2.4%
68
 
1.6%
64
 
1.5%
Other values (309) 2318
54.6%
Space Separator
ValueCountFrequency (%)
1259
100.0%
Decimal Number
ValueCountFrequency (%)
1 3
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 4246 | 77.0%
Common | 1266 | 23.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
422
 
9.9%
354
 
8.3%
311
 
7.3%
211
 
5.0%
181
 
4.3%
109
 
2.6%
107
 
2.5%
101
 
2.4%
68
 
1.6%
64
 
1.5%
Other values (309) 2318
54.6%
Common
ValueCountFrequency (%)
1259
99.4%
1 3
 
0.2%
( 2
 
0.2%
) 2
 
0.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 4231 | 76.8%
ASCII | 1266 | 23.0%
Compat Jamo | 15 | 0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1259
99.4%
1 3
 
0.2%
( 2
 
0.2%
) 2
 
0.2%
Hangul
ValueCountFrequency (%)
422
 
10.0%
354
 
8.4%
311
 
7.4%
211
 
5.0%
181
 
4.3%
109
 
2.6%
107
 
2.5%
101
 
2.4%
68
 
1.6%
64
 
1.5%
Other values (308) 2303
54.4%
Compat Jamo
ValueCountFrequency (%)
15
100.0%

소상공인시스템로그일시
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value | Count
2020-10-21 12:28:44.0 | 400

Length

Max length: 21
Median length: 21
Mean length: 21
Min length: 21

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-10-21 12:28:44.0
2nd row: 2020-10-21 12:28:44.0
3rd row: 2020-10-21 12:28:44.0
4th row: 2020-10-21 12:28:44.0
5th row: 2020-10-21 12:28:44.0

Common Values

Value | Count | Frequency (%)
2020-10-21 12:28:44.0 | 400 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value | Count | Frequency (%)
2020-10-21 | 400 | 50.0%
12:28:44.0 | 400 | 50.0%

Interactions

[Figures: nine interaction plots]

Correlations

[Figure: correlation heatmap]

Variable | 결제건수 | 합계금액 | 표준산업업종대분류코드 | 표준산업업종상세분류코드
결제건수 | 1.000 | 0.846 | 0.123 | 0.496
합계금액 | 0.846 | 1.000 | 0.245 | 0.222
표준산업업종대분류코드 | 0.123 | 0.245 | 1.000 | 0.863
표준산업업종상세분류코드 | 0.496 | 0.222 | 0.863 | 1.000

[Figure: correlation heatmap]

Variable | 결제건수 | 합계금액 | 표준산업업종상세분류코드 | 표준산업업종대분류코드
결제건수 | 1.000 | 0.989 | -0.177 | 0.060
합계금액 | 0.989 | 1.000 | -0.147 | 0.148
표준산업업종상세분류코드 | -0.177 | -0.147 | 1.000 | 0.683
표준산업업종대분류코드 | 0.060 | 0.148 | 0.683 | 1.000
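
The High correlation alerts can be related back to the first correlation matrix in this section. The sketch below lists the variable pairs whose absolute coefficient reaches a cutoff. The 0.5 cutoff is an assumption made for illustration (it is consistent with the two pairs flagged in the Alerts section, but the report itself does not state the threshold), and the function name is hypothetical.

```python
columns = ["결제건수", "합계금액", "표준산업업종대분류코드", "표준산업업종상세분류코드"]

# Values copied from the first correlation matrix in this section.
matrix = [
    [1.000, 0.846, 0.123, 0.496],
    [0.846, 1.000, 0.245, 0.222],
    [0.123, 0.245, 1.000, 0.863],
    [0.496, 0.222, 0.863, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff, not stated in the report


def high_correlation_pairs(names, values, threshold):
    """Return (a, b, coefficient) for each pair above the diagonal at or over the cutoff."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(values[i][j]) >= threshold:
                pairs.append((names[i], names[j], values[i][j]))
    return pairs


pairs = high_correlation_pairs(columns, matrix, THRESHOLD)
for a, b, coefficient in pairs:
    print(f"{a} <-> {b}: {coefficient:.3f}")
```

Under this assumed cutoff, the 0.496 coefficient between 결제건수 and 표준산업업종상세분류코드 falls just short, which agrees with that pair not appearing among the alerts.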

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.]

Sample

소상공인결제분류코드년월소상공인시스템로그ID광역시도코드광역시도명결제건수합계금액표준산업업종대분류코드표준산업업종상세분류코드표준산업업종상세분류명소상공인시스템로그일시
0ZEROP48000202008<NA>48경상남도16999000A1110곡물 및 기타 식량작물 재배업2020-10-21 12:28:44.0
1ZEROP48000202008<NA>48경상남도00A1121채소작물 재배업2020-10-21 12:28:44.0
2ZEROP48000202008<NA>48경상남도00A1122화훼작물 재배업2020-10-21 12:28:44.0
3ZEROP48000202008<NA>48경상남도115000A1123종자 및 묘목 생산업2020-10-21 12:28:44.0
4ZEROP48000202008<NA>48경상남도00A1131과실작물 재배업2020-10-21 12:28:44.0
5ZEROP48000202008<NA>48경상남도00A1132음료용 및 향신용 작물 재배업2020-10-21 12:28:44.0
6ZEROP48000202008<NA>48경상남도226500A1140기타 작물 재배업2020-10-21 12:28:44.0
7ZEROP48000202008<NA>48경상남도00A1151콩나물 재배업2020-10-21 12:28:44.0
8ZEROP48000202008<NA>48경상남도240000A1152채소화훼 및 과실작물 시설 재배업2020-10-21 12:28:44.0
9ZEROP48000202008<NA>48경상남도00A1159기타 시설작물 재배업2020-10-21 12:28:44.0
소상공인결제분류코드년월소상공인시스템로그ID광역시도코드광역시도명결제건수합계금액표준산업업종대분류코드표준산업업종상세분류코드표준산업업종상세분류명소상공인시스템로그일시
390ZEROP48000202008<NA>48경상남도5608100G46101산업용 농ㆍ축산물섬유 원료 및 동물 중개업2020-10-21 12:28:44.0
391ZEROP48000202008<NA>48경상남도70567090G46102음ㆍ식료품 및 담배 중개업2020-10-21 12:28:44.0
392ZEROP48000202008<NA>48경상남도111922100G46103섬유의복신발 및 가죽제품 중개업2020-10-21 12:28:44.0
393ZEROP48000202008<NA>48경상남도2365400G46104목재 및 건축자재 중개업2020-10-21 12:28:44.0
394ZEROP48000202008<NA>48경상남도00G46105연료광물1차 금속비료 및 화학제품 중개업2020-10-21 12:28:44.0
395ZEROP48000202008<NA>48경상남도00G46106기계 및 장비 중개업2020-10-21 12:28:44.0
396ZEROP48000202008<NA>48경상남도4351000G46107그 외 기타 특정 상품 중개업2020-10-21 12:28:44.0
397ZEROP48000202008<NA>48경상남도11625300G46109상품 종합 중개업2020-10-21 12:28:44.0
398ZEROP48000202008<NA>48경상남도58515072580G46201곡물 및 유지작물 도매업2020-10-21 12:28:44.0
399ZEROP48000202008<NA>48경상남도51913459230G46202종자 및 묘목 도매업2020-10-21 12:28:44.0