Overview

Dataset statistics

Number of variables: 8
Number of observations: 70
Missing cells: 93
Missing cells (%): 16.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.9 KiB
Average record size in memory: 71.9 B
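
The missing-cell percentage can be re-derived from figures elsewhere in this report. The sketch below is a plain arithmetic check, not the profiler's own code. The per-column missing counts are taken from the Alerts section, and the variable names are illustrative. Because the total size is rounded to 4.9 KiB, the average record size only comes out approximately.

```python
# Figures copied from this report; per-column missing counts come from the Alerts section.
n_rows, n_cols = 70, 8
missing_per_column = {"상위업체카테고리번호": 41, "기준카테고리번호": 52}

# Missing cells are counted over the whole rows x columns grid.
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)

# The reported total (4.9 KiB) is rounded, so this is only an approximation
# of the reported 71.9 B per record.
total_bytes = 4.9 * 1024
avg_record_bytes = total_bytes / n_rows

print(missing_cells, round(missing_pct, 1))
print(f"about {avg_record_bytes:.1f} B per record")
```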

Variable types

Categorical: 2
Numeric: 4
Text: 1
Boolean: 1

Dataset

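Description: Provides category information for each 아임셀러 vendor, including the reference year (기준연도), reference month (기준월), vendor category (업체카테고리), and whether the category is in use (사용여부).
Author: (주)중소기업유통센터
URL: https://www.data.go.kr/data/15067139/fileData.do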

Alerts

기준연도 has constant value "2020" [Constant]
기준월 has constant value "9" [Constant]
업체카테고리번호 is highly overall correlated with 상위업체카테고리번호 and 1 other field [High correlation]
상위업체카테고리번호 is highly overall correlated with 업체카테고리번호 [High correlation]
기준카테고리번호 is highly overall correlated with 업체카테고리번호 and 1 other field [High correlation]
사용여부 is highly overall correlated with 기준카테고리번호 [High correlation]
사용여부 is highly imbalanced (89.2%) [Imbalance]
상위업체카테고리번호 has 41 (58.6%) missing values [Missing]
기준카테고리번호 has 52 (74.3%) missing values [Missing]
업체카테고리번호 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 12:54:42.778028
Analysis finished: 2023-12-12 12:54:45.316713
Duration: 2.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
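
A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used here (that lives in config.json). The helper name, the file paths, the report title and the default encoding are all assumptions, and the settings are the library defaults.

```python
def build_profile_report(csv_path: str, html_path: str, encoding: str = "utf-8") -> None:
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so this module loads even where the libraries are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # Default settings; the original run may have used different options.
    report = ProfileReport(df, title="Profiling report")
    report.to_file(html_path)


# Example call (both file names are placeholders):
# build_profile_report("categories.csv", "report.html")
```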

Variables

기준연도
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 692.0 B

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020
2nd row: 2020
3rd row: 2020
4th row: 2020
5th row: 2020

Common Values

Value | Count | Frequency (%)
2020 | 70 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

기준월
Categorical

CONSTANT 

Distinct: 1
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Memory size: 692.0 B

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9
2nd row: 9
3rd row: 9
4th row: 9
5th row: 9

Common Values

Value | Count | Frequency (%)
9 | 70 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

업체번호
Real number (ℝ)

Distinct: 19
Distinct (%): 27.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 985.57143
Minimum: 152
Maximum: 1652
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 762.0 B

Quantile statistics

Minimum: 152
5-th percentile: 171
Q1: 556.75
Median: 1234
Q3: 1234
95-th percentile: 1459.1
Maximum: 1652
Range: 1500
Interquartile range (IQR): 677.25

Descriptive statistics

Standard deviation: 441.80029
Coefficient of variation (CV): 0.44826816
Kurtosis: -0.63385659
Mean: 985.57143
Median Absolute Deviation (MAD): 55
Skewness: -0.93787704
Sum: 68990
Variance: 195187.49
Monotonicity: Not monotonic
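
Several of the 업체번호 figures above are derived from one another. The short check below uses only the numbers printed in this report and the textbook definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared). The variable names are illustrative.

```python
# Values copied from the 업체번호 statistics in this report.
mean, std_dev = 985.57143, 441.80029
minimum, maximum = 152, 1652
q1, q3 = 556.75, 1234

cv = std_dev / mean              # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
variance = std_dev ** 2          # variance from the standard deviation

print(f"CV={cv:.8f} range={value_range} IQR={iqr} variance={variance:.2f}")
```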
[Figure: Histogram with fixed size bins (bins=19)]

Common values

Value | Count | Frequency (%)
1234 | 30 | 42.9%
1001 | 6 | 8.6%
171 | 6 | 8.6%
1284 | 5 | 7.1%
1174 | 4 | 5.7%
152 | 3 | 4.3%
283 | 2 | 2.9%
510 | 2 | 2.9%
1537 | 2 | 2.9%
442 | 1 | 1.4%
Other values (9) | 9 | 12.9%

Minimum values

Value | Count | Frequency (%)
152 | 3 | 4.3%
171 | 6 | 8.6%
253 | 1 | 1.4%
283 | 2 | 2.9%
368 | 1 | 1.4%
442 | 1 | 1.4%
466 | 1 | 1.4%
479 | 1 | 1.4%
510 | 2 | 2.9%
697 | 1 | 1.4%

Maximum values

Value | Count | Frequency (%)
1652 | 1 | 1.4%
1537 | 2 | 2.9%
1469 | 1 | 1.4%
1447 | 1 | 1.4%
1433 | 1 | 1.4%
1284 | 5 | 7.1%
1234 | 30 | 42.9%
1174 | 4 | 5.7%
1001 | 6 | 8.6%
697 | 1 | 1.4%

업체카테고리번호
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 70
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.357143
Minimum: 1
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 762.0 B

Quantile statistics

Minimum: 1
5-th percentile: 4.45
Q1: 18.25
Median: 35.5
Q3: 62.75
95-th percentile: 76.55
Maximum: 80
Range: 79
Interquartile range (IQR): 44.5

Descriptive statistics

Standard deviation: 24.626506
Coefficient of variation (CV): 0.62571885
Kurtosis: -1.3657145
Mean: 39.357143
Median Absolute Deviation (MAD): 22.5
Skewness: 0.13632963
Sum: 2755
Variance: 606.4648
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
1 | 1 | 1.4%
67 | 1 | 1.4%
38 | 1 | 1.4%
30 | 1 | 1.4%
29 | 1 | 1.4%
28 | 1 | 1.4%
10 | 1 | 1.4%
3 | 1 | 1.4%
63 | 1 | 1.4%
40 | 1 | 1.4%
Other values (60) | 60 | 85.7%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 1.4%
2 | 1 | 1.4%
3 | 1 | 1.4%
4 | 1 | 1.4%
5 | 1 | 1.4%
6 | 1 | 1.4%
7 | 1 | 1.4%
8 | 1 | 1.4%
9 | 1 | 1.4%
10 | 1 | 1.4%

Maximum values

Value | Count | Frequency (%)
80 | 1 | 1.4%
79 | 1 | 1.4%
78 | 1 | 1.4%
77 | 1 | 1.4%
76 | 1 | 1.4%
75 | 1 | 1.4%
74 | 1 | 1.4%
73 | 1 | 1.4%
72 | 1 | 1.4%
71 | 1 | 1.4%

상위업체카테고리번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 9
Distinct (%): 31.0%
Missing: 41
Missing (%): 58.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.241379
Minimum: 1
Maximum: 76
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 762.0 B

Quantile statistics

Minimum: 1
5-th percentile: 14
Q1: 14
Median: 24
Q3: 31
95-th percentile: 76
Maximum: 76
Range: 75
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 20.272527
Coefficient of variation (CV): 0.69328219
Kurtosis: 1.8282228
Mean: 29.241379
Median Absolute Deviation (MAD): 7
Skewness: 1.6217943
Sum: 848
Variance: 410.97537
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Common values

Value | Count | Frequency (%)
14 | 7 | 10.0%
31 | 6 | 8.6%
23 | 4 | 5.7%
24 | 3 | 4.3%
76 | 3 | 4.3%
20 | 2 | 2.9%
28 | 2 | 2.9%
1 | 1 | 1.4%
75 | 1 | 1.4%
(Missing) | 41 | 58.6%

Minimum values

Value | Count | Frequency (%)
1 | 1 | 1.4%
14 | 7 | 10.0%
20 | 2 | 2.9%
23 | 4 | 5.7%
24 | 3 | 4.3%
28 | 2 | 2.9%
31 | 6 | 8.6%
75 | 1 | 1.4%
76 | 3 | 4.3%

Maximum values

Value | Count | Frequency (%)
76 | 3 | 4.3%
75 | 1 | 1.4%
31 | 6 | 8.6%
28 | 2 | 2.9%
24 | 3 | 4.3%
23 | 4 | 5.7%
20 | 2 | 2.9%
14 | 7 | 10.0%
1 | 1 | 1.4%
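
Because the frequency table for 상위업체카테고리번호 lists every non-missing value, its summary statistics can be re-derived with the Python standard library. This is a sketch for cross-checking, not the profiler's implementation. It assumes the sample variance uses an n - 1 denominator and that quartiles use linear interpolation over the sorted values (the "inclusive" method); both choices happen to reproduce the figures reported above.

```python
import statistics

# Value -> count, copied from the frequency table (the 41 missing rows are excluded).
frequency = {1: 1, 14: 7, 20: 2, 23: 4, 24: 3, 28: 2, 31: 6, 75: 1, 76: 3}
values = sorted(v for v, n in frequency.items() for _ in range(n))

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)

# Quartiles by linear interpolation between order statistics.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Median absolute deviation around the median.
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values), round(mean, 6), median, q1, q3, mad)
print(round(variance, 5), round(std_dev, 6))
```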

업체카테고리명
Text

Distinct: 69
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 692.0 B

Length

Max length: 12
Median length: 8
Mean length: 4.9
Min length: 2

Characters and Unicode

Total characters: 343
Distinct characters: 171
Distinct categories: 4
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
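
As an illustration of those character properties, the standard library's `unicodedata` module returns the general category of each code point. The sketch below counts categories for two category names taken from the Sample section of this report; the helper name is hypothetical. The two-letter codes correspond to the labels used in this report: `Lo` is Other Letter, `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(strings):
    """Count Unicode general categories over all characters (hypothetical helper)."""
    counts = Counter()
    for text in strings:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two values that appear in the Sample section of this report.
sample = ["TheGoatWorld", "헤어/바디"]
counts = category_counts(sample)
print(dict(counts))
```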

Unique

Unique: 68
Unique (%): 97.1%

Sample

1st row: 장갑
2nd row: 제초기
3rd row: 통기타
4th row: 전두부
5th row: 캐릭터상품

Common values

Value | Count | Frequency (%)
오리지날산양유 | 2 | 2.9%
가방 | 1 | 1.4%
팜핑/팜파티 | 1 | 1.4%
방문 | 1 | 1.4%
의약용품 | 1 | 1.4%
산양유화장품 | 1 | 1.4%
미용/건강/의약품 | 1 | 1.4%
건조대 | 1 | 1.4%
테스트 | 1 | 1.4%
장갑 | 1 | 1.4%
Other values (59) | 59 | 84.3%

Most occurring characters

ValueCountFrequency (%)
16
 
4.7%
12
 
3.5%
11
 
3.2%
11
 
3.2%
10
 
2.9%
/ 10
 
2.9%
7
 
2.0%
r 6
 
1.7%
e 6
 
1.7%
5
 
1.5%
Other values (161) 249
72.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 290 | 84.5%
Lowercase Letter | 31 | 9.0%
Uppercase Letter | 12 | 3.5%
Other Punctuation | 10 | 2.9%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
16
 
5.5%
12
 
4.1%
11
 
3.8%
11
 
3.8%
10
 
3.4%
7
 
2.4%
5
 
1.7%
5
 
1.7%
4
 
1.4%
4
 
1.4%
Other values (135) 205
70.7%
Lowercase Letter
ValueCountFrequency (%)
r 6
19.4%
e 6
19.4%
t 3
9.7%
o 3
9.7%
i 2
 
6.5%
a 2
 
6.5%
y 1
 
3.2%
h 1
 
3.2%
l 1
 
3.2%
d 1
 
3.2%
Other values (5) 5
16.1%
Uppercase Letter
ValueCountFrequency (%)
G 2
16.7%
E 2
16.7%
S 1
8.3%
M 1
8.3%
T 1
8.3%
W 1
8.3%
K 1
8.3%
D 1
8.3%
L 1
8.3%
P 1
8.3%
Other Punctuation
ValueCountFrequency (%)
/ 10
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 290 | 84.5%
Latin | 43 | 12.5%
Common | 10 | 2.9%

Most frequent character per script

Hangul
ValueCountFrequency (%)
16
 
5.5%
12
 
4.1%
11
 
3.8%
11
 
3.8%
10
 
3.4%
7
 
2.4%
5
 
1.7%
5
 
1.7%
4
 
1.4%
4
 
1.4%
Other values (135) 205
70.7%
Latin
ValueCountFrequency (%)
r 6
 
14.0%
e 6
 
14.0%
t 3
 
7.0%
o 3
 
7.0%
i 2
 
4.7%
G 2
 
4.7%
E 2
 
4.7%
a 2
 
4.7%
y 1
 
2.3%
S 1
 
2.3%
Other values (15) 15
34.9%
Common
ValueCountFrequency (%)
/ 10
100.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 290 | 84.5%
ASCII | 53 | 15.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
16
 
5.5%
12
 
4.1%
11
 
3.8%
11
 
3.8%
10
 
3.4%
7
 
2.4%
5
 
1.7%
5
 
1.7%
4
 
1.4%
4
 
1.4%
Other values (135) 205
70.7%
ASCII
ValueCountFrequency (%)
/ 10
18.9%
r 6
 
11.3%
e 6
 
11.3%
t 3
 
5.7%
o 3
 
5.7%
i 2
 
3.8%
G 2
 
3.8%
E 2
 
3.8%
a 2
 
3.8%
y 1
 
1.9%
Other values (16) 16
30.2%

기준카테고리번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 17
Distinct (%): 94.4%
Missing: 52
Missing (%): 74.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 4550.6667
Minimum: 36
Maximum: 8066
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 762.0 B

Quantile statistics

Minimum: 36
5-th percentile: 501.8
Q1: 1681
Median: 4938
Q3: 7735.5
95-th percentile: 8026.9
Maximum: 8066
Range: 8030
Interquartile range (IQR): 6054.5

Descriptive statistics

Standard deviation: 3048.7521
Coefficient of variation (CV): 0.66995724
Kurtosis: -1.8520343
Mean: 4550.6667
Median Absolute Deviation (MAD): 3061
Skewness: -0.083999633
Sum: 81912
Variance: 9294889.3
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=17)]

Common values

Value | Count | Frequency (%)
7997 | 2 | 2.9%
8001 | 1 | 1.4%
584 | 1 | 1.4%
1711 | 1 | 1.4%
1634 | 1 | 1.4%
6951 | 1 | 1.4%
2987 | 1 | 1.4%
36 | 1 | 1.4%
6491 | 1 | 1.4%
3427 | 1 | 1.4%
Other values (7) | 7 | 10.0%
(Missing) | 52 | 74.3%

Minimum values

Value | Count | Frequency (%)
36 | 1 | 1.4%
584 | 1 | 1.4%
1370 | 1 | 1.4%
1634 | 1 | 1.4%
1671 | 1 | 1.4%
1711 | 1 | 1.4%
2040 | 1 | 1.4%
2987 | 1 | 1.4%
3427 | 1 | 1.4%
6449 | 1 | 1.4%

Maximum values

Value | Count | Frequency (%)
8066 | 1 | 1.4%
8020 | 1 | 1.4%
8001 | 1 | 1.4%
7997 | 2 | 2.9%
6951 | 1 | 1.4%
6491 | 1 | 1.4%
6480 | 1 | 1.4%
6449 | 1 | 1.4%
3427 | 1 | 1.4%
2987 | 1 | 1.4%

사용여부
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.9%
Missing: 0
Missing (%): 0.0%
Memory size: 202.0 B

Value | Count | Frequency (%)
True | 69 | 98.6%
False | 1 | 1.4%
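
The 89.2% imbalance figure reported for 사용여부 in the Alerts section is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution, normalised by the maximum entropy for that number of classes. The sketch below is one formula that reproduces the figure from the 69/1 split. It is offered as an assumption about how the score could be computed, not as the profiler's confirmed implementation, and the function name is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0  # a single class (or no data) has no imbalance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Class counts for 사용여부: 69 True, 1 False.
score = imbalance_score([69, 1])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and the score approaches 1 as one class dominates.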

Interactions

[Figures: 16 pairwise interaction plots; images not preserved]

Correlations

[Figure: correlation heatmap]

Variable | 업체번호 | 업체카테고리번호 | 상위업체카테고리번호 | 업체카테고리명 | 기준카테고리번호 | 사용여부
업체번호 | 1.000 | 0.821 | 1.000 | 1.000 | 0.897 | 0.000
업체카테고리번호 | 0.821 | 1.000 | 0.902 | 0.953 | 0.655 | 0.000
상위업체카테고리번호 | 1.000 | 0.902 | 1.000 | 1.000 | NaN | 0.000
업체카테고리명 | 1.000 | 0.953 | 1.000 | 1.000 | 1.000 | 0.000
기준카테고리번호 | 0.897 | 0.655 | NaN | 1.000 | 1.000 | NaN
사용여부 | 0.000 | 0.000 | 0.000 | 0.000 | NaN | 1.000

[Figure: correlation heatmap]

Variable | 업체번호 | 업체카테고리번호 | 상위업체카테고리번호 | 기준카테고리번호 | 사용여부
업체번호 | 1.000 | 0.338 | -0.357 | -0.377 | 0.000
업체카테고리번호 | 0.338 | 1.000 | 0.872 | -0.521 | 0.000
상위업체카테고리번호 | -0.357 | 0.872 | 1.000 | NaN | 0.000
기준카테고리번호 | -0.377 | -0.521 | NaN | 1.000 | 1.000
사용여부 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

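First rows

Row | 기준연도 | 기준월 | 업체번호 | 업체카테고리번호 | 상위업체카테고리번호 | 업체카테고리명 | 기준카테고리번호 | 사용여부
0 | 2020 | 9 | 510 | 1 | <NA> | 장갑 | <NA> | Y
1 | 2020 | 9 | 152 | 4 | <NA> | 제초기 | 6491 | Y
2 | 2020 | 9 | 697 | 11 | <NA> | 통기타 | 3427 | Y
3 | 2020 | 9 | 479 | 13 | <NA> | 전두부 | 8020 | Y
4 | 2020 | 9 | 1234 | 25 | 24 | 캐릭터상품 | <NA> | Y
5 | 2020 | 9 | 1234 | 26 | 24 | 의류 | <NA> | Y
6 | 2020 | 9 | 1234 | 27 | 24 | 양가죽제품 | <NA> | Y
7 | 2020 | 9 | 1234 | 31 | <NA> | TheGoatWorld | <NA> | Y
8 | 2020 | 9 | 171 | 62 | <NA> | 네일클리퍼 | <NA> | Y
9 | 2020 | 9 | 1537 | 64 | <NA> | 방향제 | 1671 | Y

Last rows

Row | 기준연도 | 기준월 | 업체번호 | 업체카테고리번호 | 상위업체카테고리번호 | 업체카테고리명 | 기준카테고리번호 | 사용여부
60 | 2020 | 9 | 1284 | 66 | <NA> | 칼라복합기 | <NA> | Y
61 | 2020 | 9 | 1284 | 68 | <NA> | 칼라프린터 | <NA> | Y
62 | 2020 | 9 | 1469 | 74 | <NA> | 향초선물세트 | 1711 | Y
63 | 2020 | 9 | 1001 | 76 | <NA> | 스킨케어 | <NA> | Y
64 | 2020 | 9 | 1001 | 75 | <NA> | 헤어/바디 | <NA> | Y
65 | 2020 | 9 | 1001 | 77 | 75 | 바디로션/오일 | <NA> | Y
66 | 2020 | 9 | 1001 | 79 | 76 | 토너/에멀젼 | <NA> | Y
67 | 2020 | 9 | 1001 | 80 | 76 | 에센스/크림 | <NA> | Y
68 | 2020 | 9 | 1447 | 73 | <NA> | 가방 | 584 | Y
69 | 2020 | 9 | 1001 | 78 | 76 | 입술 | <NA> | Y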