Overview

Dataset statistics

Number of variables: 3
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 563
Duplicate rows (%): 5.6%
Total size in memory: 312.5 KiB
Average record size in memory: 32.0 B
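The two derived figures in this block follow from the raw counts. A minimal check, assuming the report uses 1 KiB = 1024 B (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the Dataset statistics block.
n_rows = 10_000
n_duplicate_rows = 563
total_kib = 312.5

# Share of duplicate rows, and memory per row assuming 1 KiB = 1024 B.
duplicate_pct = 100 * n_duplicate_rows / n_rows
avg_record_bytes = total_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```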

Variable types

Categorical: 2
Text: 1

Dataset

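Description: Provides information on corporate members who sign up on the GobizKorea website to conduct online-export B2B transactions, including the sign-up channel by member type and the Gobiz ID.
URL: https://www.data.go.kr/data/15118979/fileData.do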

Alerts

Dataset has 563 (5.6%) duplicate rows [Duplicates]
회원가입경로 (sign-up channel) is highly imbalanced (82.8%) [Imbalance]
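The 82.8% imbalance figure is consistent with a normalised-entropy score, 1 - H / log(k), computed over the value counts of 회원가입경로. The sketch below assumes that definition; the function name `imbalance_score` is hypothetical and is not claimed to be the profiler's own API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k), where H is the Shannon entropy of the class shares."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Value counts for 회원가입경로, copied from the report.
signup_channel_counts = [9553, 432, 15]
score = imbalance_score(signup_channel_counts)
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by a single value approaches 1.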

Reproduction

Analysis started: 2023-12-12 16:31:42.159649
Analysis finished: 2023-12-12 16:31:42.630414
Duration: 0.47 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

회원유형 (member type)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

중소기업회원 (SME member): 6448
데이터 미집계 (data not tallied): 3552

Length

Max length: 7
Median length: 6
Mean length: 6.3552
Min length: 6
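These length statistics can be recomputed from the two value counts of 회원유형. A small sketch, assuming length is measured in code points (so the space in "데이터 미집계" counts as one character):

```python
from statistics import median

# Value counts for 회원유형, copied from the report.
value_counts = {"중소기업회원": 6448, "데이터 미집계": 3552}

# Expand to one length per row; len() counts code points, spaces included.
lengths = [len(value) for value, n in value_counts.items() for _ in range(n)]

max_length = max(lengths)
median_length = median(lengths)
mean_length = sum(lengths) / len(lengths)
min_length = min(lengths)

print(max_length, median_length, mean_length, min_length)
```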

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 데이터 미집계
2nd row: 중소기업회원
3rd row: 중소기업회원
4th row: 중소기업회원
5th row: 중소기업회원

Common Values

Value | Count | Frequency (%)
중소기업회원 | 6448 | 64.5%
데이터 미집계 | 3552 | 35.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
중소기업회원 | 6448 | 47.6%
데이터 | 3552 | 26.2%
미집계 | 3552 | 26.2%
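The percentages in this table differ from those under Common Values because the plot appears to count whitespace-separated words rather than whole values, so "데이터 미집계" contributes two words per row. That reading is an assumption, but the sketch below reproduces the figures shown:

```python
from collections import Counter

# Value counts for 회원유형, copied from the report.
value_counts = {"중소기업회원": 6448, "데이터 미집계": 3552}

# Count each whitespace-separated word once per row it appears in.
word_counts = Counter()
for value, n in value_counts.items():
    for word in value.split():
        word_counts[word] += n

total_words = sum(word_counts.values())
for word, n in word_counts.most_common():
    print(f"{word} {n} {n / total_words:.1%}")
```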

회원가입경로 (sign-up channel)
Categorical

IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

고비즈코리아 (GobizKorea): 9553
플랫폼(웹) (platform, web): 432
데이터 미집계 (data not tallied): 15

Length

Max length: 7
Median length: 6
Mean length: 6.0015
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 고비즈코리아
2nd row: 고비즈코리아
3rd row: 고비즈코리아
4th row: 고비즈코리아
5th row: 고비즈코리아

Common Values

Value | Count | Frequency (%)
고비즈코리아 | 9553 | 95.5%
플랫폼(웹) | 432 | 4.3%
데이터 미집계 | 15 | 0.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
고비즈코리아 | 9553 | 95.4%
플랫폼(웹 | 432 | 4.3%
데이터 | 15 | 0.1%
미집계 | 15 | 0.1%

고비즈 아이디 (Gobiz ID)
Text

Distinct: 4768
Distinct (%): 47.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 31
Median length: 27
Mean length: 9.9
Min length: 3

Characters and Unicode

Total characters: 99000
Distinct characters: 72
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
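As an illustration of these character properties, Python's standard `unicodedata` module returns the general category of each code point. The sketch below tallies categories for a few masked IDs taken from the report's sample rows plus the placeholder value; the choice of these four strings is only for illustration.

```python
import unicodedata
from collections import Counter

# A few masked IDs from the report's sample rows, plus the placeholder value.
sample_ids = ["201*********", "dsp***", "hwa*******", "데이터 미집계"]

# Tally the two-letter Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in sample_ids for ch in value
)
print(category_counts)
```

The two-letter codes correspond to the names used in the tables that follow: `Po` is Other Punctuation, `Ll` Lowercase Letter, `Lu` Uppercase Letter, `Nd` Decimal Number, `Lo` Other Letter, `Zs` Space Separator, `Pd` Dash Punctuation, and `Pc` Connector Punctuation.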

Unique

Unique: 4141
Unique (%): 41.4%

Sample

1st row: 201*********
2nd row: dsp***
3rd row: hwa*******
4th row: par*****
5th row: kev******

Value | Count | Frequency (%)
201 | 2936 | 28.2%
미집계 | 395 | 3.8%
데이터 | 395 | 3.8%
199 | 320 | 3.1%
sys | 300 | 2.9%
200 | 206 | 2.0%
han | 75 | 0.7%
a05 | 58 | 0.6%
sun | 54 | 0.5%
sam | 46 | 0.4%
Other values (2968) | 5610 | 54.0%

Most occurring characters

Value | Count | Frequency (%)
* | 67420 | 68.1%
0 | 3515 | 3.6%
1 | 3458 | 3.5%
2 | 3259 | 3.3%
s | 1968 | 2.0%
a | 1415 | 1.4%
e | 1169 | 1.2%
o | 1111 | 1.1%
n | 1073 | 1.1%
i | 1040 | 1.1%
Other values (62) | 13572 | 13.7%

Most occurring categories

Value | Count | Frequency (%)
Other Punctuation | 67432 | 68.1%
Lowercase Letter | 17413 | 17.6%
Decimal Number | 11271 | 11.4%
Other Letter | 2370 | 2.4%
Space Separator | 395 | 0.4%
Uppercase Letter | 104 | 0.1%
Dash Punctuation | 14 | < 0.1%
Connector Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
s | 1968 | 11.3%
a | 1415 | 8.1%
e | 1169 | 6.7%
o | 1111 | 6.4%
n | 1073 | 6.2%
i | 1040 | 6.0%
h | 773 | 4.4%
m | 772 | 4.4%
c | 772 | 4.4%
y | 733 | 4.2%
Other values (16) | 6587 | 37.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 15 | 14.4%
S | 8 | 7.7%
E | 7 | 6.7%
B | 5 | 4.8%
G | 5 | 4.8%
T | 5 | 4.8%
J | 5 | 4.8%
H | 5 | 4.8%
C | 5 | 4.8%
O | 4 | 3.8%
Other values (14) | 40 | 38.5%

Decimal Number
Value | Count | Frequency (%)
0 | 3515 | 31.2%
1 | 3458 | 30.7%
2 | 3259 | 28.9%
9 | 671 | 6.0%
5 | 108 | 1.0%
3 | 85 | 0.8%
6 | 62 | 0.6%
8 | 43 | 0.4%
4 | 41 | 0.4%
7 | 29 | 0.3%

Other Letter
Value | Count | Frequency (%)
데 | 395 | 16.7%
이 | 395 | 16.7%
터 | 395 | 16.7%
미 | 395 | 16.7%
집 | 395 | 16.7%
계 | 395 | 16.7%

Other Punctuation
Value | Count | Frequency (%)
* | 67420 | > 99.9%
. | 7 | < 0.1%
@ | 5 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 395 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 14 | 100.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 79113 | 79.9%
Latin | 17517 | 17.7%
Hangul | 2370 | 2.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
s | 1968 | 11.2%
a | 1415 | 8.1%
e | 1169 | 6.7%
o | 1111 | 6.3%
n | 1073 | 6.1%
i | 1040 | 5.9%
h | 773 | 4.4%
m | 772 | 4.4%
c | 772 | 4.4%
y | 733 | 4.2%
Other values (40) | 6691 | 38.2%

Common
Value | Count | Frequency (%)
* | 67420 | 85.2%
0 | 3515 | 4.4%
1 | 3458 | 4.4%
2 | 3259 | 4.1%
9 | 671 | 0.8%
(space) | 395 | 0.5%
5 | 108 | 0.1%
3 | 85 | 0.1%
6 | 62 | 0.1%
8 | 43 | 0.1%
Other values (6) | 97 | 0.1%

Hangul
Value | Count | Frequency (%)
데 | 395 | 16.7%
이 | 395 | 16.7%
터 | 395 | 16.7%
미 | 395 | 16.7%
집 | 395 | 16.7%
계 | 395 | 16.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 96630 | 97.6%
Hangul | 2370 | 2.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
* | 67420 | 69.8%
0 | 3515 | 3.6%
1 | 3458 | 3.6%
2 | 3259 | 3.4%
s | 1968 | 2.0%
a | 1415 | 1.5%
e | 1169 | 1.2%
o | 1111 | 1.1%
n | 1073 | 1.1%
i | 1040 | 1.1%
Other values (56) | 11202 | 11.6%

Hangul
Value | Count | Frequency (%)
데 | 395 | 16.7%
이 | 395 | 16.7%
터 | 395 | 16.7%
미 | 395 | 16.7%
집 | 395 | 16.7%
계 | 395 | 16.7%
16.7%

Correlations

[Figure: correlation plot]
 | 회원유형 | 회원가입경로
회원유형 | 1.000 | 0.100
회원가입경로 | 0.100 | 1.000

[Figure: correlation plot]
 | 회원유형 | 회원가입경로
회원유형 | 1.000 | 0.165
회원가입경로 | 0.165 | 1.000

[Figure: correlation plot]
 | 회원유형 | 회원가입경로
회원유형 | 1.000 | 0.165
회원가입경로 | 0.165 | 1.000

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

Index | 회원유형 | 회원가입경로 | 고비즈 아이디
77553 | 데이터 미집계 | 고비즈코리아 | 201*********
20216 | 중소기업회원 | 고비즈코리아 | dsp***
6524 | 중소기업회원 | 고비즈코리아 | hwa*******
20808 | 중소기업회원 | 고비즈코리아 | par*****
41276 | 중소기업회원 | 고비즈코리아 | kev******
93986 | 중소기업회원 | 고비즈코리아 | kjs***************
28911 | 중소기업회원 | 고비즈코리아 | 199*******
86848 | 중소기업회원 | 플랫폼(웹) | 데이터 미집계
76591 | 데이터 미집계 | 고비즈코리아 | wzl*****
89580 | 데이터 미집계 | 고비즈코리아 | 201*********

Index | 회원유형 | 회원가입경로 | 고비즈 아이디
26488 | 중소기업회원 | 고비즈코리아 | 199*******
72425 | 데이터 미집계 | 고비즈코리아 | 201*********
52979 | 데이터 미집계 | 고비즈코리아 | 201*********
12298 | 중소기업회원 | 고비즈코리아 | uni***
87471 | 데이터 미집계 | 고비즈코리아 | 201*********
19499 | 중소기업회원 | 고비즈코리아 | com**
34349 | 중소기업회원 | 고비즈코리아 | sys************
87922 | 중소기업회원 | 고비즈코리아 | jkm**************
3493 | 중소기업회원 | 고비즈코리아 | kon*******
85089 | 중소기업회원 | 고비즈코리아 | bra*********************

Duplicate rows

Most frequently occurring

Index | 회원유형 | 회원가입경로 | 고비즈 아이디 | # duplicates
0 | 데이터 미집계 | 고비즈코리아 | 201********* | 2933
562 | 중소기업회원 | 플랫폼(웹) | 데이터 미집계 | 377
26 | 중소기업회원 | 고비즈코리아 | 199******* | 319
501 | 중소기업회원 | 고비즈코리아 | sys************ | 293
27 | 중소기업회원 | 고비즈코리아 | 200******* | 190
50 | 중소기업회원 | 고비즈코리아 | a05********** | 56
228 | 중소기업회원 | 고비즈코리아 | han***** | 22
490 | 중소기업회원 | 고비즈코리아 | sun***** | 15
28 | 중소기업회원 | 고비즈코리아 | 200******** | 14
230 | 중소기업회원 | 고비즈코리아 | han******* | 13
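
A duplicate-row table of this kind can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below uses six rows copied from the Sample section; whether the report's "# duplicates" column counts every occurrence or only the extra copies is not stated here, so this version simply counts occurrences.

```python
from collections import Counter

# Rows copied from the Sample section: (회원유형, 회원가입경로, 고비즈 아이디).
rows = [
    ("데이터 미집계", "고비즈코리아", "201*********"),
    ("중소기업회원", "고비즈코리아", "dsp***"),
    ("중소기업회원", "고비즈코리아", "199*******"),
    ("데이터 미집계", "고비즈코리아", "201*********"),
    ("중소기업회원", "고비즈코리아", "199*******"),
    ("데이터 미집계", "고비즈코리아", "201*********"),
]

# Count identical rows and keep only those seen more than once.
row_counts = Counter(rows)
duplicated = {row: n for row, n in row_counts.items() if n > 1}

for row, n in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(row, n)
```

Because the IDs are masked after the first three characters, distinct accounts can collapse into the same masked string, which may account for many of the duplicates reported above.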