Overview

Dataset statistics

Number of variables: 6
Number of observations: 122
Missing cells: 410
Missing cells (%): 56.0%
Duplicate rows: 4
Duplicate rows (%): 3.3%
Total size in memory: 6.1 KiB
Average record size in memory: 51.1 B

Variable types

Unsupported: 2
Text: 2
Numeric: 1
Categorical: 1

Dataset

Description: Sample data
Author: Consumer Insight (컨슈머인사이트)
URL: http://www.datastore.or.kr/product/file/d79bd378-0b0a-406d-9b59-df472f66b6c7

Alerts

Dataset has 4 (3.3%) duplicate rows [Duplicates]
Unnamed: 3 is highly overall correlated with Unnamed: 4 [High correlation]
Unnamed: 4 is highly overall correlated with Unnamed: 3 [High correlation]
Unnamed: 0 has 122 (100.0%) missing values [Missing]
Column 정의서 has 82 (67.2%) missing values [Missing]
Unnamed: 2 has 83 (68.0%) missing values [Missing]
Unnamed: 3 has 40 (32.8%) missing values [Missing]
Unnamed: 5 has 83 (68.0%) missing values [Missing]
Unnamed: 0 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 5 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
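
The missing-value alerts can be cross-checked against the overview figures. The sketch below sums the per-column missing counts reported in this profile (Unnamed: 4 has none) and divides by the total number of cells. The dictionary and variable names are illustrative and not part of the report.

```python
# Per-column missing counts as reported in this profile.
missing_by_column = {
    "Unnamed: 0": 122,
    "Column 정의서": 82,
    "Unnamed: 2": 83,
    "Unnamed: 3": 40,
    "Unnamed: 4": 0,
    "Unnamed: 5": 83,
}
n_rows = 122  # number of observations

total_cells = n_rows * len(missing_by_column)    # rows x columns
missing_cells = sum(missing_by_column.values())  # all missing cells
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

The result agrees with the "Missing cells" and "Missing cells (%)" entries in the dataset statistics.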

Reproduction

Analysis started: 2024-03-11 03:18:48.043105
Analysis finished: 2024-03-11 03:18:49.645628
Duration: 1.6 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Unnamed: 0
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 122
Missing (%): 100.0%
Memory size: 1.2 KiB

Column 정의서
Text

MISSING 

Distinct: 40
Distinct (%): 100.0%
Missing: 82
Missing (%): 67.2%
Memory size: 1.1 KiB

Length

Max length: 44
Median length: 8
Mean length: 6.275
Min length: 2

Characters and Unicode

Total characters: 251
Distinct characters: 44
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
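
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories for the first sample value of this column using Python's standard `unicodedata` module. It is not the profiler's own implementation. The two-letter codes it prints correspond to the category names used in this report (for example `Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator).

```python
import unicodedata
from collections import Counter

# First sample value of this column (the 44-character title row).
text = "1. 컨슈머인사이트 이동통신 기획조사 _ 통신사 및 통신서비스 브랜드 Index"
# Normalise to NFC so that each Hangul syllable is a single code point.
text = unicodedata.normalize("NFC", text)

# unicodedata.category returns the two-letter general category code.
category_counts = Counter(unicodedata.category(ch) for ch in text)

for code, count in category_counts.most_common():
    print(code, count)
```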

Unique

Unique: 40
Unique (%): 100.0%

Sample

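1st row: 1. 컨슈머인사이트 이동통신 기획조사 _ 통신사 및 통신서비스 브랜드 Index (1. Consumer Insight mobile telecommunications planning survey _ carrier and telecom service brand index)
2nd row: 항목 (Item)
3rd row: idx
4th row: A01011
5th row: A0101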
Value | Count | Frequency (%)
1 | 1 | 2.0%
g0102 | 1 | 2.0%
g0104 | 1 | 2.0%
g0105 | 1 | 2.0%
g0106 | 1 | 2.0%
g0107 | 1 | 2.0%
g0108 | 1 | 2.0%
g0109 | 1 | 2.0%
g010101 | 1 | 2.0%
g010601 | 1 | 2.0%
Other values (39) | 39 | 79.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 74 | 29.5%
1 | 51 | 20.3%
G | 25 | 10.0%
A | 12 | 4.8%
4 | 11 | 4.4%
(space) | 9 | 3.6%
2 | 7 | 2.8%
6 | 6 | 2.4%
3 | 6 | 2.4%
(character not preserved) | 3 | 1.2%
Other values (34) | 47 | 18.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 163 | 64.9%
Uppercase Letter | 41 | 16.3%
Other Letter | 29 | 11.6%
Space Separator | 9 | 3.6%
Lowercase Letter | 7 | 2.8%
Other Punctuation | 1 | 0.4%
Connector Punctuation | 1 | 0.4%

Most frequent character per category

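Other Letter
Value | Count | Frequency (%)
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 2 | 6.9%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
Other values (12) | 12 | 41.4%
Decimal Number
Value | Count | Frequency (%)
0 | 74 | 45.4%
1 | 51 | 31.3%
4 | 11 | 6.7%
2 | 7 | 4.3%
6 | 6 | 3.7%
3 | 6 | 3.7%
9 | 2 | 1.2%
8 | 2 | 1.2%
7 | 2 | 1.2%
5 | 2 | 1.2%
Lowercase Letter
Value | Count | Frequency (%)
x | 2 | 28.6%
d | 2 | 28.6%
i | 1 | 14.3%
e | 1 | 14.3%
n | 1 | 14.3%
Uppercase Letter
Value | Count | Frequency (%)
G | 25 | 61.0%
A | 12 | 29.3%
M | 3 | 7.3%
I | 1 | 2.4%
Space Separator
Value | Count | Frequency (%)
(space) | 9 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
. | 1 | 100.0%
Connector Punctuation
Value | Count | Frequency (%)
_ | 1 | 100.0%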

Most occurring scripts

Value | Count | Frequency (%)
Common | 174 | 69.3%
Latin | 48 | 19.1%
Hangul | 29 | 11.6%

Most frequent character per script

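Hangul
Value | Count | Frequency (%)
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 2 | 6.9%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
Other values (12) | 12 | 41.4%
Common
Value | Count | Frequency (%)
0 | 74 | 42.5%
1 | 51 | 29.3%
4 | 11 | 6.3%
(space) | 9 | 5.2%
2 | 7 | 4.0%
6 | 6 | 3.4%
3 | 6 | 3.4%
9 | 2 | 1.1%
8 | 2 | 1.1%
7 | 2 | 1.1%
Other values (3) | 4 | 2.3%
Latin
Value | Count | Frequency (%)
G | 25 | 52.1%
A | 12 | 25.0%
M | 3 | 6.2%
x | 2 | 4.2%
d | 2 | 4.2%
i | 1 | 2.1%
e | 1 | 2.1%
n | 1 | 2.1%
I | 1 | 2.1%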

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 222 | 88.4%
Hangul | 29 | 11.6%
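
The block split above can be approximated by testing code point ranges. This is a minimal sketch, assuming that the report's "ASCII" label corresponds to the Basic Latin block (U+0000 to U+007F) and its "Hangul" label to the Hangul Syllables block (U+AC00 to U+D7AF). The helper `unicode_block` and the sample string (a fragment of the column's first value) are illustrative only.

```python
import unicodedata

def unicode_block(ch):
    """Classify a character into one of the two blocks seen in this column."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"    # Basic Latin, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"   # Hangul Syllables, U+AC00..U+D7AF (assumed mapping)
    return "Other"

# NFC keeps every Hangul syllable as one precomposed code point.
sample = unicodedata.normalize("NFC", "통신사 및 통신서비스 브랜드 Index")

block_counts = {}
for ch in sample:
    block = unicode_block(ch)
    block_counts[block] = block_counts.get(block, 0) + 1

print(block_counts)
```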

Most frequent character per block

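ASCII
Value | Count | Frequency (%)
0 | 74 | 33.3%
1 | 51 | 23.0%
G | 25 | 11.3%
A | 12 | 5.4%
4 | 11 | 5.0%
(space) | 9 | 4.1%
2 | 7 | 3.2%
6 | 6 | 2.7%
3 | 6 | 2.7%
M | 3 | 1.4%
Other values (12) | 18 | 8.1%
Hangul
Value | Count | Frequency (%)
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 3 | 10.3%
(character not preserved) | 2 | 6.9%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
(character not preserved) | 1 | 3.4%
Other values (12) | 12 | 41.4%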

Unnamed: 2
Text

MISSING 

Distinct: 39
Distinct (%): 100.0%
Missing: 83
Missing (%): 68.0%
Memory size: 1.1 KiB

Length

Max length: 31
Median length: 20
Mean length: 14.076923
Min length: 2

Characters and Unicode

Total characters: 549
Distinct characters: 107
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
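
The reported mean lengths follow from the total character counts and the number of non-missing values. The short check below uses only figures stated in this report; the variable names are illustrative.

```python
n_rows = 122

# (total characters, missing values) as reported for each text column.
text_columns = {
    "Column 정의서": (251, 82),
    "Unnamed: 2": (549, 83),
}

# Mean length = total characters / number of non-missing values.
mean_lengths = {
    name: total_chars / (n_rows - missing)
    for name, (total_chars, missing) in text_columns.items()
}

for name, mean_len in mean_lengths.items():
    print(f"{name}: mean length {mean_len:.6f}")
```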

Unique

Unique: 39
Unique (%): 100.0%

Sample

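1st row: 변수명 (Variable name)
2nd row: 사용자 식별 번호 (User identification number)
3rd row: 통신사 비보조 인지 [최초 인지] (Carrier unaided awareness [first mention])
4th row: 통신사 비보조 인지 [1+2+3순위] (Carrier unaided awareness [1st+2nd+3rd rank])
5th row: 5G 서비스 비보조 인지 [최초 인지] (5G service unaided awareness [first mention])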
Value | Count | Frequency (%)
인지 (awareness) | 18 | 12.7%
비보조 (unaided) | 12 | 8.5%
브랜드 (brand) | 10 | 7.0%
5g | 8 | 5.6%
서비스 (service) | 6 | 4.2%
데이터 (data) | 6 | 4.2%
최초 (first) | 6 | 4.2%
1+2+3순위 (1st+2nd+3rd rank) | 5 | 3.5%
통신사 (carrier) | 4 | 2.8%
최선호 (most preferred) | 4 | 2.8%
Other values (47) | 63 | 44.4%

Most occurring characters

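Value | Count | Frequency (%)
(space) | 103 | 18.8%
(character not preserved) | 21 | 3.8%
(character not preserved) | 20 | 3.6%
(character not preserved) | 20 | 3.6%
(character not preserved) | 14 | 2.6%
(character not preserved) | 13 | 2.4%
(character not preserved) | 12 | 2.2%
] | 12 | 2.2%
[ | 12 | 2.2%
(character not preserved) | 12 | 2.2%
Other values (97) | 310 | 56.5%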

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 355 | 64.7%
Space Separator | 103 | 18.8%
Decimal Number | 26 | 4.7%
Uppercase Letter | 24 | 4.4%
Close Punctuation | 14 | 2.6%
Open Punctuation | 14 | 2.6%
Math Symbol | 11 | 2.0%
Other Punctuation | 2 | 0.4%

Most frequent character per category

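Other Letter
Value | Count | Frequency (%)
(character not preserved) | 21 | 5.9%
(character not preserved) | 20 | 5.6%
(character not preserved) | 20 | 5.6%
(character not preserved) | 14 | 3.9%
(character not preserved) | 13 | 3.7%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 10 | 2.8%
(character not preserved) | 10 | 2.8%
Other values (79) | 211 | 59.4%
Uppercase Letter
Value | Count | Frequency (%)
G | 8 | 33.3%
I | 5 | 20.8%
P | 3 | 12.5%
T | 3 | 12.5%
V | 3 | 12.5%
A | 2 | 8.3%
Decimal Number
Value | Count | Frequency (%)
5 | 9 | 34.6%
1 | 7 | 26.9%
2 | 5 | 19.2%
3 | 5 | 19.2%
Close Punctuation
Value | Count | Frequency (%)
] | 12 | 85.7%
) | 2 | 14.3%
Open Punctuation
Value | Count | Frequency (%)
[ | 12 | 85.7%
( | 2 | 14.3%
Math Symbol
Value | Count | Frequency (%)
+ | 10 | 90.9%
~ | 1 | 9.1%
Space Separator
Value | Count | Frequency (%)
(space) | 103 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 2 | 100.0%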

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 355 | 64.7%
Common | 170 | 31.0%
Latin | 24 | 4.4%

Most frequent character per script

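Hangul
Value | Count | Frequency (%)
(character not preserved) | 21 | 5.9%
(character not preserved) | 20 | 5.6%
(character not preserved) | 20 | 5.6%
(character not preserved) | 14 | 3.9%
(character not preserved) | 13 | 3.7%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 10 | 2.8%
(character not preserved) | 10 | 2.8%
Other values (79) | 211 | 59.4%
Common
Value | Count | Frequency (%)
(space) | 103 | 60.6%
] | 12 | 7.1%
[ | 12 | 7.1%
+ | 10 | 5.9%
5 | 9 | 5.3%
1 | 7 | 4.1%
2 | 5 | 2.9%
3 | 5 | 2.9%
/ | 2 | 1.2%
( | 2 | 1.2%
Other values (2) | 3 | 1.8%
Latin
Value | Count | Frequency (%)
G | 8 | 33.3%
I | 5 | 20.8%
P | 3 | 12.5%
T | 3 | 12.5%
V | 3 | 12.5%
A | 2 | 8.3%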

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 355 | 64.7%
ASCII | 194 | 35.3%

Most frequent character per block

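ASCII
Value | Count | Frequency (%)
(space) | 103 | 53.1%
] | 12 | 6.2%
[ | 12 | 6.2%
+ | 10 | 5.2%
5 | 9 | 4.6%
G | 8 | 4.1%
1 | 7 | 3.6%
I | 5 | 2.6%
2 | 5 | 2.6%
3 | 5 | 2.6%
Other values (8) | 18 | 9.3%
Hangul
Value | Count | Frequency (%)
(character not preserved) | 21 | 5.9%
(character not preserved) | 20 | 5.6%
(character not preserved) | 20 | 5.6%
(character not preserved) | 14 | 3.9%
(character not preserved) | 13 | 3.7%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 12 | 3.4%
(character not preserved) | 10 | 2.8%
(character not preserved) | 10 | 2.8%
Other values (79) | 211 | 59.4%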

Unnamed: 3
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 11
Distinct (%): 13.4%
Missing: 40
Missing (%): 32.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.7073171
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 2
Q3: 4.75
95-th percentile: 10
Maximum: 99
Range: 98
Interquartile range (IQR): 3.75

Descriptive statistics

Standard deviation: 18.218487
Coefficient of variation (CV): 2.7162108
Kurtosis: 22.052087
Mean: 6.7073171
Median Absolute Deviation (MAD): 1
Skewness: 4.7596449
Sum: 550
Variance: 331.91328
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=11)]
Common values
Value | Count | Frequency (%)
1 | 37 | 30.3%
2 | 22 | 18.0%
10 | 15 | 12.3%
99 | 1 | 0.8%
3 | 1 | 0.8%
4 | 1 | 0.8%
5 | 1 | 0.8%
6 | 1 | 0.8%
7 | 1 | 0.8%
97 | 1 | 0.8%
(Missing) | 40 | 32.8%
Extreme values (10 smallest)
Value | Count | Frequency (%)
1 | 37 | 30.3%
2 | 22 | 18.0%
3 | 1 | 0.8%
4 | 1 | 0.8%
5 | 1 | 0.8%
6 | 1 | 0.8%
7 | 1 | 0.8%
10 | 15 | 12.3%
97 | 1 | 0.8%
98 | 1 | 0.8%
Extreme values (10 largest)
Value | Count | Frequency (%)
99 | 1 | 0.8%
98 | 1 | 0.8%
97 | 1 | 0.8%
10 | 15 | 12.3%
7 | 1 | 0.8%
6 | 1 | 0.8%
5 | 1 | 0.8%
4 | 1 | 0.8%
3 | 1 | 0.8%
2 | 22 | 18.0%
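
Taken together, the value tables for Unnamed: 3 cover all 82 non-missing observations, so the summary statistics can be recomputed from them. The sketch below does this with Python's standard `statistics` module. The `percentile` helper is a hypothetical illustration that assumes linearly interpolated quantiles and a sample (n - 1) variance, which is what the reported figures are consistent with.

```python
import statistics

# Value -> count for the 82 non-missing observations of Unnamed: 3,
# collected from the common-value and extreme-value tables.
value_counts = {1: 37, 2: 22, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1,
                10: 15, 97: 1, 98: 1, 99: 1}
values = sorted(v for v, n in value_counts.items() for _ in range(n))

def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in [0, 1]) of sorted data."""
    pos = p * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std = statistics.stdev(values)
median = statistics.median(values)
q1, q3 = percentile(values, 0.25), percentile(values, 0.75)

print(f"n={len(values)} sum={sum(values)} mean={mean:.7f}")
print(f"variance={variance:.5f} std={std:.6f} cv={std / mean:.7f}")
print(f"median={median} Q1={q1} Q3={q3} IQR={q3 - q1}")
```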

Unnamed: 4
Categorical

HIGH CORRELATION 

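Distinct: 29
Distinct (%): 23.8%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB
Value | Count
(empty or not preserved) | 35
KT | 20
매우 불만족한다 (Very dissatisfied) | 14
매우 만족한다 (Very satisfied) | 14
SKT | 6
Other values (24) | 33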

Length

Max length: 37
Median length: 27
Mean length: 5.2704918
Min length: 1

Unique

Unique: 19
Unique (%): 15.6%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: 내용 (Content)
5th row: -

Common Values

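Value | Count | Frequency (%)
(empty or not preserved) | 35 | 28.7%
KT | 20 | 16.4%
매우 불만족한다 (Very dissatisfied) | 14 | 11.5%
매우 만족한다 (Very satisfied) | 14 | 11.5%
SKT | 6 | 4.9%
SK텔레콤 (SK Telecom) | 5 | 4.1%
<NA> | 3 | 2.5%
SKB(B TV) | 2 | 1.6%
SK | 2 | 1.6%
SKB | 2 | 1.6%
Other values (19) | 19 | 15.6%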

Length

[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
(empty or not preserved) | 35 | 16.9%
매우 (very) | 28 | 13.5%
kt | 20 | 9.7%
불만족한다 (dissatisfied) | 14 | 6.8%
만족한다 (satisfied) | 14 | 6.8%
5g | 6 | 2.9%
skt | 6 | 2.9%
sk텔레콤 (SK Telecom) | 5 | 2.4%
sk | 4 | 1.9%
(not preserved) | 4 | 1.9%
Other values (54) | 71 | 34.3%

Unnamed: 5
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 83
Missing (%): 68.0%
Memory size: 1.1 KiB

Interactions

[Figure: interactions plot, image not preserved]

Correlations

[Figure: correlation heatmap, image not preserved]
Variable | Column 정의서 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
Column 정의서 | 1.000 | 1.000 | NaN | 1.000
Unnamed: 2 | 1.000 | 1.000 | NaN | 1.000
Unnamed: 3 | NaN | NaN | 1.000 | 1.000
Unnamed: 4 | 1.000 | 1.000 | 1.000 | 1.000
[Figure: correlation heatmap, image not preserved]
Variable | Unnamed: 3 | Unnamed: 4
Unnamed: 3 | 1.000 | 0.844
Unnamed: 4 | 0.844 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
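
To make the idea of nullity correlation concrete, the sketch below computes a Pearson correlation between presence indicators of two columns. It assumes the heatmap uses Pearson correlation on missingness flags, which the report does not confirm. The flags are taken from the first ten sample rows only, so the result illustrates the calculation and is not the value shown in the heatmap.

```python
from math import sqrt

# Presence flags (1 = value present, 0 = missing) for the first ten
# sample rows of two columns in this report.
present_col_def = [0, 1, 0, 1, 1, 1, 0, 0, 1, 0]   # Column 정의서
present_unnamed2 = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]  # Unnamed: 2

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

nullity_corr = pearson(present_col_def, present_unnamed2)
print(f"nullity correlation: {nullity_corr:.3f}")
```

A value near 1 means the two columns tend to be present or missing in the same rows; a value near -1 means one tends to be present when the other is missing.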

Sample

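First 10 rows
Index | Unnamed: 0 | Column 정의서 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5
0 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN
1 | <NA> | 1. 컨슈머인사이트 이동통신 기획조사 _ 통신사 및 통신서비스 브랜드 Index (1. Consumer Insight mobile telecommunications planning survey _ carrier and telecom service brand index) | <NA> | <NA> | <NA> | NaN
2 | <NA> | <NA> | <NA> | <NA> | <NA> | NaN
3 | <NA> | 항목 (Item) | 변수명 (Variable name) | <NA> | 내용 (Content) | DATA 예시 (Data example)
4 | <NA> | idx | 사용자 식별 번호 (User identification number) | <NA> | - | 29906
5 | <NA> | A01011 | 통신사 비보조 인지 [최초 인지] (Carrier unaided awareness [first mention]) | 1 | SKT | SKT
6 | <NA> | <NA> | <NA> | 2 | KT | NaN
7 | <NA> | <NA> | <NA> | <NA> | (empty or not preserved) | NaN
8 | <NA> | A0101 | 통신사 비보조 인지 [1+2+3순위] (Carrier unaided awareness [1st+2nd+3rd rank]) | 1 | SKT | SKT
9 | <NA> | <NA> | <NA> | 2 | KT | NaN
Last 10 rows
Index | Unnamed: 0 | Column 정의서 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5
112 | <NA> | <NA> | <NA> | 98 | 특별한 이유 없음 (No particular reason) | NaN
113 | <NA> | G10 | 최선호 유/무선 통합 브랜드 (Most preferred wired/wireless bundled brand) | 1 | SK통신계열사(SK텔레콤, SK브로드밴드) (SK telecom affiliates (SK Telecom, SK Broadband)) | 1
114 | <NA> | <NA> | <NA> | 2 | KT | NaN
115 | <NA> | <NA> | <NA> | <NA> | (empty or not preserved) | NaN
116 | <NA> | G11 | 최선호 유선 초고속 인터넷 브랜드 (Most preferred wired high-speed internet brand) | 1 | SK B인터넷 (SK B Internet) | 1
117 | <NA> | <NA> | <NA> | 2 | KT | NaN
118 | <NA> | <NA> | <NA> | <NA> | (empty or not preserved) | NaN
119 | <NA> | G12 | 최선호 IPTV 브랜드 (Most preferred IPTV brand) | 1 | SK B TV | 1
120 | <NA> | <NA> | <NA> | 2 | KT | NaN
121 | <NA> | <NA> | <NA> | <NA> | (empty or not preserved) | NaN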

Duplicate rows

Most frequently occurring

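Index | Column 정의서 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | # duplicates
2 | <NA> | <NA> | <NA> | (empty or not preserved) | 35
0 | <NA> | <NA> | 2 | KT | 20
1 | <NA> | <NA> | 10 | 매우 만족한다 (Very satisfied) | 14
3 | <NA> | <NA> | <NA> | <NA> | 2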