Overview

Dataset statistics

Number of variables: 4
Number of observations: 75
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.5 KiB
Average record size in memory: 33.8 B

Variable types

Text: 2
Categorical: 2

Dataset

Description: Sample data (샘플 데이터)
Author: Shinhan Card (신한카드)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=318

Alerts

대분류(CLASS1) is highly overall correlated with 중분류(CLASS2) (High correlation)
중분류(CLASS2) is highly overall correlated with 대분류(CLASS1) (High correlation)
업종코드(UPJONG_CD) has unique values (Unique)
소분류(CLASS3) has unique values (Unique)

Reproduction

Analysis started: 2023-12-10 14:58:47.893814
Analysis finished: 2023-12-10 14:58:48.601318
Duration: 0.71 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
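
A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used here: the function name `build_report`, the CSV path, the output path, and the report title are all hypothetical, and the real settings live in the config.json referenced above.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report.

    Both paths are hypothetical placeholders; the original report's input
    file and configuration are not given in this document.
    """
    # Imported lazily so this module loads even when the optional
    # dependencies (pandas, ydata-profiling) are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(html_path)  # writes the HTML report to disk
    return report
```

Calling `build_report("sample.csv")` with a real file would write `report.html` next to the script; the exact statistics depend on the library version and configuration.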

Variables

업종코드(UPJONG_CD)
Text

UNIQUE

Distinct: 75
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 732.0 B

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 375
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
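
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and only the five sample codes shown in this report are used. Script and block properties are not exposed by `unicodedata`, so they are left out here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # e.g. "Ll" = Lowercase Letter, "Nd" = Decimal Number
            counts[unicodedata.category(ch)] += 1
    return counts


# The first five codes listed in the Sample section of this variable.
codes = ["ss001", "ss002", "ss003", "ss004", "ss005"]
counts = category_counts(codes)
```

For these five codes the split is two letters and three digits per value, the same 40% / 60% proportion the report shows for the full column.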

Unique

Unique: 75
Unique (%): 100.0%

Sample

1st row: ss001
2nd row: ss002
3rd row: ss003
4th row: ss004
5th row: ss005

Value | Count | Frequency (%)
ss001 | 1 | 1.3%
ss041 | 1 | 1.3%
ss058 | 1 | 1.3%
ss057 | 1 | 1.3%
ss056 | 1 | 1.3%
ss055 | 1 | 1.3%
ss054 | 1 | 1.3%
ss053 | 1 | 1.3%
ss052 | 1 | 1.3%
ss050 | 1 | 1.3%
Other values (65) | 65 | 86.7%

Most occurring characters

Value | Count | Frequency (%)
s | 150 | 40.0%
0 | 92 | 24.5%
3 | 19 | 5.1%
1 | 18 | 4.8%
4 | 18 | 4.8%
2 | 17 | 4.5%
6 | 16 | 4.3%
5 | 15 | 4.0%
8 | 11 | 2.9%
9 | 11 | 2.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 225 | 60.0%
Lowercase Letter | 150 | 40.0%

Most frequent character per category

Decimal Number

Value | Count | Frequency (%)
0 | 92 | 40.9%
3 | 19 | 8.4%
1 | 18 | 8.0%
4 | 18 | 8.0%
2 | 17 | 7.6%
6 | 16 | 7.1%
5 | 15 | 6.7%
8 | 11 | 4.9%
9 | 11 | 4.9%
7 | 8 | 3.6%

Lowercase Letter

Value | Count | Frequency (%)
s | 150 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 225 | 60.0%
Latin | 150 | 40.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
0 | 92 | 40.9%
3 | 19 | 8.4%
1 | 18 | 8.0%
4 | 18 | 8.0%
2 | 17 | 7.6%
6 | 16 | 7.1%
5 | 15 | 6.7%
8 | 11 | 4.9%
9 | 11 | 4.9%
7 | 8 | 3.6%

Latin

Value | Count | Frequency (%)
s | 150 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 375 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
s | 150 | 40.0%
0 | 92 | 24.5%
3 | 19 | 5.1%
1 | 18 | 4.8%
4 | 18 | 4.8%
2 | 17 | 4.5%
6 | 16 | 4.3%
5 | 15 | 4.0%
8 | 11 | 2.9%
9 | 11 | 2.9%

대분류(CLASS1)
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): 18.7%
Missing: 0
Missing (%): 0.0%
Memory size: 732.0 B
[Bar chart of the most frequent categories: 스포츠/문화/레저 (12), 요식/유흥 (11), 교육/학원, 유통, 여행/교통, Other values (9): 33]

Length

Max length: 9
Median length: 8
Mean length: 5.12
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 요식/유흥
2nd row: 요식/유흥
3rd row: 요식/유흥
4th row: 요식/유흥
5th row: 요식/유흥

Common Values

Value | Count | Frequency (%)
스포츠/문화/레저 | 12 | 16.0%
요식/유흥 | 11 | 14.7%
교육/학원 | 7 | 9.3%
유통 | 6 | 8.0%
여행/교통 | 6 | 8.0%
가정생활/서비스 | 6 | 8.0%
의료 | 6 | 8.0%
의류/잡화 | 4 | 5.3%
음/식료품 | 3 | 4.0%
미용 | 3 | 4.0%
Other values (4) | 11 | 14.7%

Length

Histogram of lengths of the category

중분류(CLASS2)
Categorical

HIGH CORRELATION 

Distinct: 33
Distinct (%): 44.0%
Missing: 0
Missing (%): 0.0%
Memory size: 732.0 B
[Bar chart of the most frequent categories: 스포츠/문화/레저, 학원 (5), 병원 (4), 음/식료품 (3), 서비스 (3), Other values (28): 51]

Length

Max length: 11
Median length: 9
Mean length: 5.16
Min length: 2

Unique

Unique: 14
Unique (%): 18.7%
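
The Distinct and Unique figures measure different things: Distinct counts how many different values occur, while Unique counts only the values that occur exactly once. The sketch below illustrates the difference on the 중분류(CLASS2) values of the 20 rows shown in the Sample section at the end of this report, not on the full 75-row column; the helper name `distinct_and_unique` is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# 중분류(CLASS2) values of the first and last ten rows listed under Sample.
class2_sample = (
    ["한식"]
    + ["일식/중식/양식"] * 3
    + ["제과/커피/패스트푸드"] * 3
    + ["기타요식"]
    + ["유흥"] * 2
    + ["전자상거래"] * 2
    + ["스포츠/문화/레저"] * 4
    + ["학원"] * 4
)
distinct, unique = distinct_and_unique(class2_sample)
```

On this 20-row subset, `distinct` is 8 and `unique` is 2 (한식 and 기타요식 each appear once).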

Sample

1st row: 한식
2nd row: 일식/중식/양식
3rd row: 일식/중식/양식
4th row: 일식/중식/양식
5th row: 제과/커피/패스트푸드

Common Values

Value | Count | Frequency (%)
스포츠/문화/레저 | 9 | 12.0%
학원 | 5 | 6.7%
병원 | 4 | 5.3%
음/식료품 | 3 | 4.0%
서비스 | 3 | 4.0%
일식/중식/양식 | 3 | 4.0%
여행 | 3 | 4.0%
스포츠/문화/레저용품 | 3 | 4.0%
전자상거래 | 3 | 4.0%
가전/가구 | 3 | 4.0%
Other values (23) | 36 | 48.0%

Length

Histogram of lengths of the category

소분류(CLASS3)
Text

UNIQUE 

Distinct: 75
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 732.0 B

Length

Max length: 10
Median length: 7
Mean length: 4.08
Min length: 2

Characters and Unicode

Total characters: 306
Distinct characters: 144
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 75
Unique (%): 100.0%

Sample

1st row: 한식
2nd row: 일식
3rd row: 양식
4th row: 중식
5th row: 제과점

Value | Count | Frequency (%)
한식 | 1 | 1.3%
미용실 | 1 | 1.3%
약국 | 1 | 1.3%
한의원 | 1 | 1.3%
치과병원 | 1 | 1.3%
일반병원 | 1 | 1.3%
종합병원 | 1 | 1.3%
교육용품 | 1 | 1.3%
유아교육 | 1 | 1.3%
독서실 | 1 | 1.3%
Other values (65) | 65 | 86.7%

Most occurring characters

ValueCountFrequency (%)
/ 13
 
4.2%
9
 
2.9%
9
 
2.9%
9
 
2.9%
7
 
2.3%
7
 
2.3%
7
 
2.3%
6
 
2.0%
6
 
2.0%
6
 
2.0%
Other values (134) 227
74.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 286 | 93.5%
Other Punctuation | 13 | 4.2%
Uppercase Letter | 5 | 1.6%
Open Punctuation | 1 | 0.3%
Close Punctuation | 1 | 0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
9
 
3.1%
9
 
3.1%
9
 
3.1%
7
 
2.4%
7
 
2.4%
7
 
2.4%
6
 
2.1%
6
 
2.1%
6
 
2.1%
6
 
2.1%
Other values (128) 214
74.8%

Uppercase Letter

Value | Count | Frequency (%)
P | 2 | 40.0%
G | 2 | 40.0%
L | 1 | 20.0%

Other Punctuation

Value | Count | Frequency (%)
/ | 13 | 100.0%

Open Punctuation

Value | Count | Frequency (%)
( | 1 | 100.0%

Close Punctuation

Value | Count | Frequency (%)
) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 286 | 93.5%
Common | 15 | 4.9%
Latin | 5 | 1.6%

Most frequent character per script

Hangul
ValueCountFrequency (%)
9
 
3.1%
9
 
3.1%
9
 
3.1%
7
 
2.4%
7
 
2.4%
7
 
2.4%
6
 
2.1%
6
 
2.1%
6
 
2.1%
6
 
2.1%
Other values (128) 214
74.8%

Common

Value | Count | Frequency (%)
/ | 13 | 86.7%
( | 1 | 6.7%
) | 1 | 6.7%

Latin

Value | Count | Frequency (%)
P | 2 | 40.0%
G | 2 | 40.0%
L | 1 | 20.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 286 | 93.5%
ASCII | 20 | 6.5%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
/ | 13 | 65.0%
P | 2 | 10.0%
G | 2 | 10.0%
( | 1 | 5.0%
L | 1 | 5.0%
) | 1 | 5.0%

Hangul
ValueCountFrequency (%)
9
 
3.1%
9
 
3.1%
9
 
3.1%
7
 
2.4%
7
 
2.4%
7
 
2.4%
6
 
2.1%
6
 
2.1%
6
 
2.1%
6
 
2.1%
Other values (128) 214
74.8%

Correlations

Variable | 업종코드(UPJONG_CD) | 대분류(CLASS1) | 중분류(CLASS2) | 소분류(CLASS3)
업종코드(UPJONG_CD) | 1.000 | 1.000 | 1.000 | 1.000
대분류(CLASS1) | 1.000 | 1.000 | 1.000 | 1.000
중분류(CLASS2) | 1.000 | 1.000 | 1.000 | 1.000
소분류(CLASS3) | 1.000 | 1.000 | 1.000 | 1.000

Variable | 중분류(CLASS2) | 대분류(CLASS1)
중분류(CLASS2) | 1.000 | 0.830
대분류(CLASS1) | 0.830 | 1.000

Variable | 대분류(CLASS1) | 중분류(CLASS2)
대분류(CLASS1) | 1.000 | 0.830
중분류(CLASS2) | 0.830 | 1.000
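
The report does not name the statistic behind the 0.830 association between 대분류(CLASS1) and 중분류(CLASS2). One common measure for a pair of categorical variables is Cramér's V, and the sketch below assumes that measure in its plain, uncorrected form; it is not guaranteed to reproduce the figure above, which may come from a bias-corrected variant. The function name `cramers_v` is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V between two equal-length category lists."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")

    joint = Counter(zip(xs, ys))   # observed cell counts
    rows = Counter(xs)             # row marginals
    cols = Counter(ys)             # column marginals

    k = min(len(rows), len(cols)) - 1
    if k == 0:
        # A constant column carries no association information.
        return 0.0

    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * k))
```

A value of 1 means one variable fully determines the other and 0 means no association. Because each 중분류(CLASS2) value tends to sit under a single 대분류(CLASS1) value in a hierarchy like this one, a high association between the two columns is expected.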

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

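First rows

Index | 업종코드(UPJONG_CD) | 대분류(CLASS1) | 중분류(CLASS2) | 소분류(CLASS3)
0 | ss001 | 요식/유흥 | 한식 | 한식
1 | ss002 | 요식/유흥 | 일식/중식/양식 | 일식
2 | ss003 | 요식/유흥 | 일식/중식/양식 | 양식
3 | ss004 | 요식/유흥 | 일식/중식/양식 | 중식
4 | ss005 | 요식/유흥 | 제과/커피/패스트푸드 | 제과점
5 | ss006 | 요식/유흥 | 제과/커피/패스트푸드 | 커피전문점
6 | ss007 | 요식/유흥 | 제과/커피/패스트푸드 | 패스트푸드
7 | ss008 | 요식/유흥 | 기타요식 | 기타요식
8 | ss009 | 요식/유흥 | 유흥 | 노래방
9 | ss010 | 요식/유흥 | 유흥 | 기타유흥업소

Last rows

Index | 업종코드(UPJONG_CD) | 대분류(CLASS1) | 중분류(CLASS2) | 소분류(CLASS3)
65 | ss069 | 전자상거래 | 전자상거래 | 결제대행(PG)
66 | ss070 | 전자상거래 | 전자상거래 | 홈쇼핑
67 | ss081 | 스포츠/문화/레저 | 스포츠/문화/레저 | 실내골프
68 | ss082 | 스포츠/문화/레저 | 스포츠/문화/레저 | 헬스
69 | ss083 | 스포츠/문화/레저 | 스포츠/문화/레저 | 실외골프
70 | ss084 | 스포츠/문화/레저 | 스포츠/문화/레저 | 스키
71 | ss090 | 교육/학원 | 학원 | 입시보습학원
72 | ss091 | 교육/학원 | 학원 | 외국어학원
73 | ss092 | 교육/학원 | 학원 | 예체능학원
74 | ss093 | 교육/학원 | 학원 | 취미/전문학원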