Overview

Dataset statistics

Number of variables: 10
Number of observations: 77
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.1 KiB
Average record size in memory: 81.7 B

Variable types

Text: 2
Categorical: 5
Boolean: 3

Dataset

Description: Sample
Author: 소상공인연합회
URL: https://www.bigdata-telecom.kr/invoke/SOKBP2603/?goodsCode=KFMECMS003

Alerts

High correlation: schtwr_at is highly overall correlated with ordtm_labrr_co
High correlation: ordtm_labrr_co is highly overall correlated with schtwr_at
Imbalance: area_nm is highly imbalanced (56.1%)
Imbalance: ordtm_labrr_co is highly imbalanced (62.0%)
Imbalance: schtwr_at is highly imbalanced (90.0%)
Imbalance: ten_phospho_belo_bplc_at is highly imbalanced (56.1%)
Unique: bizrno has unique values
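
The imbalance percentages in these alerts can be checked against the value counts listed for each variable further down. The sketch below assumes the score is one minus the Shannon entropy of the class frequencies divided by the maximum possible entropy (log2 of the number of classes). The report itself does not state the formula, and the function name imbalance_score is hypothetical.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts (assumed formula)."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Value counts copied from the per-variable tables of this report.
area_nm = [59, 7, 3, 2, 2, 1, 1, 1, 1]
ordtm_labrr_co = [64, 9, 2, 1, 1]
schtwr_at = [76, 1]

for name, counts in [("area_nm", area_nm),
                     ("ordtm_labrr_co", ordtm_labrr_co),
                     ("schtwr_at", schtwr_at)]:
    print(f"{name}: {imbalance_score(counts):.1%}")
```

Under this assumption the three columns come out at 56.1%, 62.0% and 90.0%, which agrees with the alert values, so the assumed formula is at least consistent with what the report shows.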

Reproduction

Analysis started: 2023-12-10 06:39:19.163074
Analysis finished: 2023-12-10 06:39:20.625194
Duration: 1.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

cmpnm_nm
Text

Distinct: 76
Distinct (%): 98.7%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 23
Median length: 10
Mean length: 5.0649351
Min length: 1

Characters and Unicode

Total characters: 390
Distinct characters: 208
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
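
As a minimal sketch of how such properties can be read per code point, Python's standard unicodedata module exposes the general category of each character. The helper name category_counts is hypothetical, and this is not necessarily how the profiling tool builds its tables; it only illustrates the idea on one of the sample values from this column.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count the Unicode general category (e.g. 'Lo', 'Lu') of each character."""
    return Counter(unicodedata.category(ch) for ch in text)

# One company name taken from the sample rows of this report.
sample = "Sae wha hostel(세화 호스텔)"
counts = category_counts(sample)
for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes map onto the category names used in the tables: Lo is Other Letter (the Hangul syllables), Ll and Lu are Lowercase and Uppercase Letter, Zs is Space Separator, and Ps and Pe are Open and Close Punctuation.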

Unique

Unique: 75
Unique (%): 97.4%
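
Distinct and Unique measure different things: Distinct counts the different values present, while Unique counts the values that occur exactly once. The sketch below uses made-up placeholder names (not the actual column contents) and assumes 77 rows in which a single value appears twice, which is the situation the sample rows suggest. The function name distinct_and_unique is hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# 75 placeholder names that occur once, plus one name that occurs twice.
values = [f"shop_{i}" for i in range(75)] + ["따요몰", "따요몰"]
print(distinct_and_unique(values))
```

With one duplicated value among 77 rows, the result is 76 distinct and 75 unique values, matching the figures reported for this column.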

Sample

1st row: 개인택시
2nd row: Jeep진주모다
3rd row: 따요몰
4th row: 따요몰
5th row: 단비와그린비

Value              Count  Frequency (%)
따요몰             2      2.4%
생각나는사람들     1      1.2%
1                  1      1.2%
호스텔바닐라       1      1.2%
호스텔바닐라2      1      1.2%
엘리슈슈           1      1.2%
종로트래블         1      1.2%
에이스인터내셔널   1      1.2%
브릿지             1      1.2%
아트               1      1.2%
Other values (72)  72     86.7%

Most occurring characters

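Value               Count  Frequency (%)
(glyph lost)        12     3.1%
(glyph lost)        12     3.1%
(glyph lost)        9      2.3%
(glyph lost)        8      2.1%
(glyph lost)        7      1.8%
(glyph lost)        6      1.5%
(glyph lost)        5      1.3%
(glyph lost)        5      1.3%
(glyph lost)        5      1.3%
(glyph lost)        5      1.3%
Other values (198)  316    81.0%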

Most occurring categories

Value              Count  Frequency (%)
Other Letter       339    86.9%
Lowercase Letter   18     4.6%
Space Separator    12     3.1%
Uppercase Letter   12     3.1%
Open Punctuation   3      0.8%
Close Punctuation  3      0.8%
Decimal Number     3      0.8%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
(glyph lost)        12     3.5%
(glyph lost)        9      2.7%
(glyph lost)        8      2.4%
(glyph lost)        7      2.1%
(glyph lost)        6      1.8%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
Other values (173)  272    80.2%

Lowercase Letter
Value  Count  Frequency (%)
e      4      22.2%
h      3      16.7%
a      3      16.7%
i      1      5.6%
r      1      5.6%
l      1      5.6%
t      1      5.6%
s      1      5.6%
o      1      5.6%
w      1      5.6%

Uppercase Letter
Value  Count  Frequency (%)
S      3      25.0%
O      2      16.7%
F      1      8.3%
D      1      8.3%
M      1      8.3%
L      1      8.3%
J      1      8.3%
N      1      8.3%
B      1      8.3%

Decimal Number
Value  Count  Frequency (%)
2      2      66.7%
1      1      33.3%

Space Separator
Value    Count  Frequency (%)
(space)  12     100.0%

Open Punctuation
Value  Count  Frequency (%)
(      3      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      3      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  339    86.9%
Latin   30     7.7%
Common  21     5.4%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
(glyph lost)        12     3.5%
(glyph lost)        9      2.7%
(glyph lost)        8      2.4%
(glyph lost)        7      2.1%
(glyph lost)        6      1.8%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
Other values (173)  272    80.2%

Latin
Value              Count  Frequency (%)
e                  4      13.3%
h                  3      10.0%
S                  3      10.0%
a                  3      10.0%
O                  2      6.7%
F                  1      3.3%
D                  1      3.3%
i                  1      3.3%
r                  1      3.3%
M                  1      3.3%
Other values (10)  10     33.3%

Common
Value    Count  Frequency (%)
(space)  12     57.1%
(        3      14.3%
)        3      14.3%
2        2      9.5%
1        1      4.8%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  339    86.9%
ASCII   51     13.1%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            12     23.5%
e                  4      7.8%
(                  3      5.9%
h                  3      5.9%
)                  3      5.9%
S                  3      5.9%
a                  3      5.9%
2                  2      3.9%
O                  2      3.9%
F                  1      2.0%
Other values (15)  15     29.4%

Hangul
Value               Count  Frequency (%)
(glyph lost)        12     3.5%
(glyph lost)        9      2.7%
(glyph lost)        8      2.4%
(glyph lost)        7      2.1%
(glyph lost)        6      1.8%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
(glyph lost)        5      1.5%
Other values (173)  272    80.2%

bizrno
Text

UNIQUE 

Distinct: 77
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 13
Median length: 10
Mean length: 10.090909
Min length: 9

Characters and Unicode

Total characters: 777
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 77
Unique (%): 100.0%

Sample

1st row: (1191797694)
2nd row: 000000000
3rd row: 1002448612700
4th row: 1002448612722
5th row: 1010116480

Value              Count  Frequency (%)
1191797694         1      1.3%
1011040982         1      1.3%
1011275786         1      1.3%
1011271344         1      1.3%
1011248182         1      1.3%
1011194660         1      1.3%
1011191722         1      1.3%
1011176911         1      1.3%
1011157632         1      1.3%
1011149447         1      1.3%
Other values (67)  67     87.0%

Most occurring characters

Value             Count  Frequency (%)
1                 225    29.0%
0                 155    19.9%
2                 68     8.8%
6                 54     6.9%
9                 50     6.4%
7                 49     6.3%
4                 49     6.3%
8                 44     5.7%
3                 42     5.4%
5                 39     5.0%
Other values (2)  2      0.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     775    99.7%
Open Punctuation   1      0.1%
Close Punctuation  1      0.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      225    29.0%
0      155    20.0%
2      68     8.8%
6      54     7.0%
9      50     6.5%
7      49     6.3%
4      49     6.3%
8      44     5.7%
3      42     5.4%
5      39     5.0%

Open Punctuation
Value  Count  Frequency (%)
(      1      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  777    100.0%

Most frequent character per script

Common
Value             Count  Frequency (%)
1                 225    29.0%
0                 155    19.9%
2                 68     8.8%
6                 54     6.9%
9                 50     6.4%
7                 49     6.3%
4                 49     6.3%
8                 44     5.7%
3                 42     5.4%
5                 39     5.0%
Other values (2)  2      0.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  777    100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
1                 225    29.0%
0                 155    19.9%
2                 68     8.8%
6                 54     6.9%
9                 50     6.4%
7                 49     6.3%
4                 49     6.3%
8                 44     5.7%
3                 42     5.4%
5                 39     5.0%
Other values (2)  2      0.3%

rprsntv_nm
Categorical

Distinct: 28
Distinct (%): 36.4%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 17
Unique (%): 22.1%

Sample

1st row: 정**
2nd row: 정**
3rd row: L**
4th row: L**
5th row: 박**

Common Values

Value              Count  Frequency (%)
김**               22     28.6%
이**               12     15.6%
박**               7      9.1%
정**               3      3.9%
손**               3      3.9%
최**               3      3.9%
L**                2      2.6%
서**               2      2.6%
백**               2      2.6%
오**               2      2.6%
Other values (18)  19     24.7%

Length

Histogram of lengths of the category
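Value              Count  Frequency (%)
(glyph lost)       22     28.6%
(glyph lost)       12     15.6%
(glyph lost)       7      9.1%
(glyph lost)       3      3.9%
(glyph lost)       3      3.9%
(glyph lost)       3      3.9%
(glyph lost)       2      2.6%
(glyph lost)       2      2.6%
(glyph lost)       2      2.6%
(glyph lost)       2      2.6%
Other values (18)  19     24.7%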

rprsntv_age
Categorical

Distinct: 5
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 60대
2nd row: 50대
3rd row: 30대
4th row: 30대
5th row: 50대

Common Values

Value  Count  Frequency (%)
50대   27     35.1%
40대   27     35.1%
60대   15     19.5%
30대   6      7.8%
70대   2      2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
50대   27     35.1%
40대   27     35.1%
60대   15     19.5%
30대   6      7.8%
70대   2      2.6%

area_nm
Categorical

IMBALANCE 

Distinct: 9
Distinct (%): 11.7%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 4
Unique (%): 5.2%

Sample

1st row: 서울
2nd row: 경남
3rd row: 서울
4th row: 서울
5th row: 경기

Common Values

Value  Count  Frequency (%)
서울   59     76.6%
경기   7      9.1%
전남   3      3.9%
경남   2      2.6%
부산   2      2.6%
대구   1      1.3%
전북   1      1.3%
제주   1      1.3%
충남   1      1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
서울   59     76.6%
경기   7      9.1%
전남   3      3.9%
경남   2      2.6%
부산   2      2.6%
대구   1      1.3%
전북   1      1.3%
제주   1      1.3%
충남   1      1.3%

ordtm_labrr_co
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 6.5%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 2
Unique (%): 2.6%

Sample

1st row: 1인
2nd row: 1인
3rd row: 2인
4th row: 1인
5th row: 1인

Common Values

Value  Count  Frequency (%)
1인    64     83.1%
2인    9      11.7%
3인    2      2.6%
5인    1      1.3%
7인    1      1.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
1인    64     83.1%
2인    9      11.7%
3인    2      2.6%
5인    1      1.3%
7인    1      1.3%

bsns_pd_value
Categorical

Distinct: 4
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Memory size: 748.0 B

Length

Max length: 12
Median length: 11
Mean length: 7.8701299
Min length: 5

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5년 이상~10년 미만
2nd row: 3년 미만
3rd row: 10년 이상
4th row: 3년 미만
5th row: 5년 이상~10년 미만

Common Values

Value               Count  Frequency (%)
3년 미만            25     32.5%
10년 이상           22     28.6%
5년 이상~10년 미만  19     24.7%
3년 이상~5년 미만   11     14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Value      Count  Frequency (%)
미만       55     29.9%
3년        36     19.6%
10년       22     12.0%
이상       22     12.0%
5년        19     10.3%
이상~10년  19     10.3%
이상~5년   11     6.0%
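
This last table counts whitespace-separated tokens rather than whole category values, which is why 미만 appears 55 times even though no single category has more than 25 rows. A minimal sketch, assuming plain whitespace splitting (the tokenizer the tool actually uses is not stated in this report) and using the counts from the Common Values table; the name value_counts is illustrative.

```python
from collections import Counter

# Category counts copied from the Common Values table of bsns_pd_value.
value_counts = {
    "3년 미만": 25,
    "10년 이상": 22,
    "5년 이상~10년 미만": 19,
    "3년 이상~5년 미만": 11,
}

# Split each category on whitespace and weight every token by the row count.
tokens = Counter()
for value, count in value_counts.items():
    for token in value.split():
        tokens[token] += count

total = sum(tokens.values())
for token, count in tokens.most_common():
    print(f"{token}\t{count}\t{count / total:.1%}")
```

The token totals (184 tokens in all) reproduce the counts and percentages listed above, so the percentages in that table are shares of tokens, not shares of the 77 rows.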

schtwr_at
Boolean

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 209.0 B
Value  Count  Frequency (%)
True   76     98.7%
False  1      1.3%

five_phospho_belo_bplc_at
Boolean

Distinct: 2
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 209.0 B
Value  Count  Frequency (%)
True   66     85.7%
False  11     14.3%

ten_phospho_belo_bplc_at
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Memory size: 209.0 B
Value  Count  Frequency (%)
False  70     90.9%
True   7      9.1%

Correlations

Columns, in order: cmpnm_nm, bizrno, rprsntv_nm, rprsntv_age, area_nm, ordtm_labrr_co, bsns_pd_value, schtwr_at, five_phospho_belo_bplc_at, ten_phospho_belo_bplc_at
cmpnm_nm                   1.000  1.000  1.000  1.000  1.000  0.880  0.908  1.000  0.000  1.000
bizrno                     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
rprsntv_nm                 1.000  1.000  1.000  0.521  0.000  0.000  0.101  0.000  0.000  0.272
rprsntv_age                1.000  1.000  0.521  1.000  0.000  0.169  0.000  0.000  0.000  0.083
area_nm                    1.000  1.000  0.000  0.000  1.000  0.647  0.000  0.000  0.000  0.000
ordtm_labrr_co             0.880  1.000  0.000  0.169  0.647  1.000  0.000  1.000  0.000  0.000
bsns_pd_value              0.908  1.000  0.101  0.000  0.000  0.000  1.000  0.000  0.000  0.258
schtwr_at                  1.000  1.000  0.000  0.000  0.000  1.000  0.000  1.000  0.000  0.000
five_phospho_belo_bplc_at  0.000  1.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000
ten_phospho_belo_bplc_at   1.000  1.000  0.272  0.083  0.000  0.000  0.258  0.000  0.000  1.000

Columns, in order: schtwr_at, rprsntv_age, ordtm_labrr_co, ten_phospho_belo_bplc_at, area_nm, bsns_pd_value, rprsntv_nm, five_phospho_belo_bplc_at
schtwr_at                  1.000  0.000  0.980  0.000  0.000  0.000  0.000  0.000
rprsntv_age                0.000  1.000  0.058  0.096  0.000  0.000  0.222  0.000
ordtm_labrr_co             0.980  0.058  1.000  0.000  0.432  0.000  0.000  0.000
ten_phospho_belo_bplc_at   0.000  0.096  0.000  1.000  0.000  0.168  0.161  0.000
area_nm                    0.000  0.000  0.432  0.000  1.000  0.000  0.000  0.000
bsns_pd_value              0.000  0.000  0.000  0.168  0.000  1.000  0.000  0.000
rprsntv_nm                 0.000  0.222  0.000  0.161  0.000  0.000  1.000  0.000
five_phospho_belo_bplc_at  0.000  0.000  0.000  0.000  0.000  0.000  0.000  1.000

Columns, in order: rprsntv_nm, rprsntv_age, area_nm, ordtm_labrr_co, bsns_pd_value, schtwr_at, five_phospho_belo_bplc_at, ten_phospho_belo_bplc_at
rprsntv_nm                 1.000  0.222  0.000  0.000  0.000  0.000  0.000  0.161
rprsntv_age                0.222  1.000  0.000  0.058  0.000  0.000  0.000  0.096
area_nm                    0.000  0.000  1.000  0.432  0.000  0.000  0.000  0.000
ordtm_labrr_co             0.000  0.058  0.432  1.000  0.000  0.980  0.000  0.000
bsns_pd_value              0.000  0.000  0.000  0.000  1.000  0.000  0.000  0.168
schtwr_at                  0.000  0.000  0.000  0.980  0.000  1.000  0.000  0.000
five_phospho_belo_bplc_at  0.000  0.000  0.000  0.000  0.000  0.000  1.000  0.000
ten_phospho_belo_bplc_at   0.161  0.096  0.000  0.000  0.168  0.000  0.000  1.000
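
The report does not say which coefficient fills these matrices. For pairs of categorical variables a common choice is Cramér's V, so the sketch below shows a bias-corrected Cramér's V computed from a contingency table using only the standard library. The function name cramers_v and the toy tables are illustrative assumptions, not values taken from this dataset, and the sketch is not claimed to reproduce the numbers above.

```python
import math

def cramers_v(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    if n < 2 or r < 2 or k < 2:
        return 0.0
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Small-sample bias correction applied to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy tables: one perfectly associated pair, one independent pair.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[5, 5], [5, 5]]))
```

A value near 1 means one variable almost determines the other, and a value of 0 means no detectable association. With only 77 rows and several categories that occur once or twice, coefficients of this kind are sensitive to individual records.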

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

    cmpnm_nm                bizrno         rprsntv_nm  rprsntv_age  area_nm  ordtm_labrr_co  bsns_pd_value       schtwr_at  five_phospho_belo_bplc_at  ten_phospho_belo_bplc_at
0   개인택시                (1191797694)   정**        60대         서울     1인             5년 이상~10년 미만  y          n                          n
1   Jeep진주모다            000000000      정**        50대         경남     1인             3년 미만            y          y                          n
2   따요몰                  1002448612700  L**         30대         서울     2인             10년 이상           y          y                          n
3   따요몰                  1002448612722  L**         30대         서울     1인             3년 미만            y          n                          n
4   단비와그린비            1010116480     박**        50대         경기     1인             5년 이상~10년 미만  y          y                          n
5   신세계투어              1010202208     박**        60대         서울     1인             5년 이상~10년 미만  y          y                          n
6   아라반도체              1010242442     백**        60대         서울     1인             10년 이상           y          y                          n
7   상하정                  1010273834     이**        50대         서울     1인             3년 미만            y          y                          n
8   에이스산업              1010328205     전**        50대         서울     1인             3년 이상~5년 미만   y          y                          n
9   Sae wha hostel(세화 호스텔)  1010427334     임**        40대         서울     3인             10년 이상           y          y                          n

    cmpnm_nm                bizrno         rprsntv_nm  rprsntv_age  area_nm  ordtm_labrr_co  bsns_pd_value       schtwr_at  five_phospho_belo_bplc_at  ten_phospho_belo_bplc_at
67  통뼈감자탕 사직점       1011805808     김**        50대         부산     1인             10년 이상           y          y                          n
68  컴퓨터수리              1011896992     이**        40대         서울     3인             5년 이상~10년 미만  y          y                          n
69  담소정                  1011902206     박**        40대         전남     1인             10년 이상           y          y                          n
70  영광중기                1012067845     이**        50대         전남     1인             3년 미만            y          n                          n
71  여우뷰티                1012180923     김**        40대         경기     1인             5년 이상~10년 미만  y          y                          n
72  손원정hair              1012181131     손**        40대         충남     7인             3년 미만            y          y                          n
73  2층이발관               1012196385     김**        60대         경기     1인             3년 이상~5년 미만   y          y                          n
74  아트존                  1012592688     김**        50대         서울     1인             3년 미만            y          y                          y
75  백현건기                1012617560     홍**        50대         경기     1인             3년 미만            y          n                          n
76  한평여관                1012709994     김**        60대         서울     1인             3년 미만            y          y                          n