gimi9 Pandas Profiling

Dataset statistics

Number of variables	1
Number of observations	4993
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	39.1 KiB
Average record size in memory	8.0 B

Variable types

Text	1

Dataset

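Description	Provides information (IDs) about participants who responded to surveys of the general public conducted on the Korea Advancing Schools Foundation (한국사학진흥재단) website.
Author	Korea Advancing Schools Foundation (한국사학진흥재단)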
URL	https://www.data.go.kr/data/15067219/fileData.do

Alerts

메시지아이디 has unique values	Unique

Reproduction

Analysis started	2023-12-11 23:22:50.017341
Analysis finished	2023-12-11 23:22:50.230284
Duration	0.21 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
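
The Duration row follows directly from the two timestamps above. A minimal sketch of that arithmetic, using only the Python standard library (the variable names are illustrative and not part of the report):

```python
from datetime import datetime

# Timestamps copied from the "Analysis started" and "Analysis finished" rows.
started = datetime.fromisoformat("2023-12-11 23:22:50.017341")
finished = datetime.fromisoformat("2023-12-11 23:22:50.230284")

# Subtracting two datetimes yields a timedelta; convert it to seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # prints "0.21 seconds"
```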

메시지아이디
Text

UNIQUE

Distinct	4993
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	39.1 KiB

Length

Max length	12
Median length	11
Mean length	7.7810935
Min length	3
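
As a rough illustration of how such length statistics can be derived, the sketch below computes them for the five IDs listed under Sample in this report. Because it uses only those five values, its results are not expected to match the figures above, which cover all 4993 observations. It also checks that the reported mean length is consistent with the Total characters and Number of observations figures. All variable names are illustrative.

```python
from statistics import mean, median

# The five IDs shown under "Sample"; the full column has 4993 values.
sample_ids = ["YMHan1201", "dsthih", "elee2060", "amls01", "hjfreea"]

lengths = [len(value) for value in sample_ids]
length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)

# Cross-check on the report's own figures:
# mean length = total characters / number of observations.
report_mean = 38851 / 4993
print(round(report_mean, 7))  # 7.7810935
```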

Characters and Unicode

Total characters	38851
Distinct characters	58
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
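
A minimal sketch of this kind of analysis, assuming only Python's standard `unicodedata` module: it tallies the Unicode general category of every character in the five IDs listed under Sample. This is not necessarily how the profiling tool computes its tables. The standard library exposes general categories as two-letter codes ("Ll" for Lowercase Letter, "Lu" for Uppercase Letter, "Nd" for Decimal Number, "Pc" for Connector Punctuation) but does not expose scripts or blocks, so those are left out here.

```python
import unicodedata
from collections import Counter

# The five IDs shown under "Sample"; the full column has 4993 values.
sample_ids = ["YMHan1201", "dsthih", "elee2060", "amls01", "hjfreea"]

# Look up the general category of each character and count occurrences.
category_counts = Counter(
    unicodedata.category(char) for value in sample_ids for char in value
)

total = sum(category_counts.values())
for code, count in category_counts.most_common():
    # One row per category: code, count, share of all characters.
    print(f"{code}\t{count}\t{count / total:.1%}")
```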

Unique

Unique	4993
Unique (%)	100.0%

Sample

1st row	YMHan1201
2nd row	dsthih
3rd row	elee2060
4th row	amls01
5th row	hjfreea

Words

Value	Count	Frequency (%)
andrei	2	< 0.1%
csrcha	2	< 0.1%
fortress17	1	< 0.1%
snshin33	1	< 0.1%
brang1123	1	< 0.1%
csleere	1	< 0.1%
penderah	1	< 0.1%
jhkim3273	1	< 0.1%
try2000	1	< 0.1%
yhl05050	1	< 0.1%
Other values (4981)	4981	99.8%

Most occurring characters

Value	Count	Frequency (%)
n	2136	5.5%
o	2105	5.4%
a	2101	5.4%
s	2071	5.3%
e	1974	5.1%
0	1937	5.0%
1	1799	4.6%
i	1693	4.4%
k	1627	4.2%
h	1553	4.0%
Other values (48)	19855	51.1%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	27918	71.9%
Decimal Number	10856	27.9%
Uppercase Letter	76	0.2%
Connector Punctuation	1	< 0.1%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
n	2136	7.7%
o	2105	7.5%
a	2101	7.5%
s	2071	7.4%
e	1974	7.1%
i	1693	6.1%
k	1627	5.8%
h	1553	5.6%
j	1177	4.2%
m	1175	4.2%
Other values (16)	10306	36.9%

Uppercase Letter

Value	Count	Frequency (%)
H	9	11.8%
A	7	9.2%
J	5	6.6%
Y	5	6.6%
C	4	5.3%
M	4	5.3%
I	4	5.3%
K	4	5.3%
S	4	5.3%
T	4	5.3%
Other values (11)	26	34.2%

Decimal Number

Value	Count	Frequency (%)
0	1937	17.8%
1	1799	16.6%
2	1406	13.0%
7	1009	9.3%
3	926	8.5%
9	850	7.8%
8	790	7.3%
5	788	7.3%
4	707	6.5%
6	644	5.9%

Connector Punctuation

Value	Count	Frequency (%)
_	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	27994	72.1%
Common	10857	27.9%

Most frequent character per script

Latin

Value	Count	Frequency (%)
n	2136	7.6%
o	2105	7.5%
a	2101	7.5%
s	2071	7.4%
e	1974	7.1%
i	1693	6.0%
k	1627	5.8%
h	1553	5.5%
j	1177	4.2%
m	1175	4.2%
Other values (37)	10382	37.1%

Common

Value	Count	Frequency (%)
0	1937	17.8%
1	1799	16.6%
2	1406	13.0%
7	1009	9.3%
3	926	8.5%
9	850	7.8%
8	790	7.3%
5	788	7.3%
4	707	6.5%
6	644	5.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	38851	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
n	2136	5.5%
o	2105	5.4%
a	2101	5.4%
s	2071	5.3%
e	1974	5.1%
0	1937	5.0%
1	1799	4.6%
i	1693	4.4%
k	1627	4.2%
h	1553	4.0%
Other values (48)	19855	51.1%

Count	A simple visualization of nullity by column.
Matrix	A nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
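
To make the two views concrete, here is a small sketch in plain Python. The rows and column names are hypothetical and chosen only for illustration; the profiled column itself has no missing cells, so both views are trivially complete for this dataset.

```python
# Hypothetical records for illustration only (not taken from the dataset).
rows = [
    {"id": "a01", "score": 3},
    {"id": "b02", "score": None},
    {"id": None, "score": 5},
]
columns = ["id", "score"]

# Matrix view: one row per record, True where the cell is missing.
nullity_matrix = [[row[col] is None for col in columns] for row in rows]

# Count view: number of non-missing cells in each column.
present_counts = {
    col: sum(not missing[i] for missing in nullity_matrix)
    for i, col in enumerate(columns)
}
print(nullity_matrix)
print(present_counts)
```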

First rows

	메시지아이디
0	YMHan1201
1	dsthih
2	elee2060
3	amls01
4	hjfreea
5	romane
6	longli22
7	a0075a
8	jhs123123
9	kmj2047

Last rows

	메시지아이디
4983	ksl8787
4984	bjh2413
4985	simongsp
4986	tm0880
4987	dlcktodi
4988	nagnehoon
4989	kuis2007
4990	todana
4991	yhpark00
4992	sumi2223
