gimi9 Pandas Profiling

Dataset statistics

Number of variables	7
Number of observations	10000
Missing cells	13
Missing cells (%)	< 0.1%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	634.8 KiB
Average record size in memory	65.0 B
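
The two derived figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell share is taken over rows × columns and the average record size is total memory divided by the row count (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 13
total_memory_kib = 634.8

# Share of missing cells over all cells (rows x columns), in percent.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total memory in bytes divided by the row count.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"missing cells: {missing_pct:.4f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share is well below the 0.1% threshold shown in the table, and the record size rounds to the reported 65.0 B.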

Variable types

Numeric	1
Text	5
DateTime	1

Dataset

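Description	Provides the registration number (등록번호), call number (청구기호), title (도서명), author (저자명), publisher (발행자명), and reference date (기준일자) for the English-language books held by the Songpa Children's English Library (송파어린이영어도서관).
Author	Songpa-gu, Seoul (서울특별시 송파구)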
URL	https://www.data.go.kr/data/15112402/fileData.do

Alerts

`기준일자` has constant value "2023-02-17"	Constant
`연번` has unique values	Unique
`등록번호` has unique values	Unique
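
The Constant and Unique alerts can be reproduced with a simple distinct-value count. The following is a minimal sketch, not the profiler's own implementation: `column_alerts` is a hypothetical helper, and the three records are taken from the sample rows of this report.

```python
from typing import Dict, List


def column_alerts(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Flag columns whose values are all identical or all distinct."""
    alerts: Dict[str, str] = {}
    if not rows:
        return alerts
    for column in rows[0]:
        values = [row[column] for row in rows]
        distinct = len(set(values))
        if distinct == 1:
            alerts[column] = "Constant"
        elif distinct == len(values):
            alerts[column] = "Unique"
    return alerts


# Three records taken from the sample rows of this report.
rows = [
    {"연번": "13340", "등록번호": "IM0000016815", "발행자명": "HarperCollins", "기준일자": "2023-02-17"},
    {"연번": "14823", "등록번호": "IM0000015936", "발행자명": "HarperCollins", "기준일자": "2023-02-17"},
    {"연번": "13545", "등록번호": "IM0000017023", "발행자명": "Scholastic", "기준일자": "2023-02-17"},
]
alerts = column_alerts(rows)
print(alerts)
```

On this small subset, `발행자명` receives no alert because it is neither constant nor fully distinct.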

Reproduction

Analysis started	2023-12-12 22:29:44.286753
Analysis finished	2023-12-12 22:29:45.940828
Duration	1.65 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

연번
Real number (ℝ)

UNIQUE

Distinct	10000
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	14764.645

Minimum	5
Maximum	29590
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	166.0 KiB

Quantile statistics

Minimum	5
5th percentile	1470.8
Q1	7343.75
Median	14752
Q3	22115.5
95th percentile	28166.1
Maximum	29590
Range	29585
Interquartile range (IQR)	14771.75

Descriptive statistics

Standard deviation	8549.954
Coefficient of variation (CV)	0.57908292
Kurtosis	-1.2013744
Mean	14764.645
Median Absolute Deviation (MAD)	7392.5
Skewness	0.0081907472
Sum	1.4764645 × 10⁸
Variance	73101713
Monotonicity	Not monotonic
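
Several of the statistics above are derived from one another. A short sketch of those relationships, using only values copied from this section (small differences are expected because the inputs are already rounded):

```python
# Values copied from the statistics reported for 연번.
n = 10_000
mean = 14764.645
std = 8549.954
q1, q3 = 7343.75, 22115.5
minimum, maximum = 5, 29590

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum recovered from mean and count

print(f"CV={cv:.6f} variance={variance:.0f} IQR={iqr} range={value_range} sum={total:.7e}")
```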

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
13340	1	< 0.1%
4128	1	< 0.1%
24503	1	< 0.1%
10216	1	< 0.1%
12840	1	< 0.1%
24700	1	< 0.1%
19300	1	< 0.1%
7331	1	< 0.1%
1563	1	< 0.1%
20206	1	< 0.1%
Other values (9990)	9990	99.9%

Minimum 10 values

Value	Count	Frequency (%)
5	1	< 0.1%
12	1	< 0.1%
21	1	< 0.1%
26	1	< 0.1%
31	1	< 0.1%
33	1	< 0.1%
35	1	< 0.1%
37	1	< 0.1%
38	1	< 0.1%
39	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
29590	1	< 0.1%
29589	1	< 0.1%
29587	1	< 0.1%
29585	1	< 0.1%
29584	1	< 0.1%
29576	1	< 0.1%
29573	1	< 0.1%
29571	1	< 0.1%
29570	1	< 0.1%
29560	1	< 0.1%

등록번호
Text

UNIQUE

Distinct	10000
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	12
Median length	12
Mean length	12
Min length	12

Characters and Unicode

Total characters	120000
Distinct characters	15
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
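
As a rough illustration of these character properties, Python's standard `unicodedata` module exposes the general category of a code point (it does not expose scripts or blocks, so those are left out here). The sketch counts categories for one registration number taken from this report's sample rows:

```python
import unicodedata
from collections import Counter

# One registration number from the sample rows of this report.
value = "IM0000016815"

# unicodedata.category returns the two-letter general category of a character,
# e.g. "Lu" (Uppercase Letter) or "Nd" (Decimal Number).
categories = Counter(unicodedata.category(ch) for ch in value)
print(dict(categories))
```

Two uppercase letters and ten decimal digits per value is consistent with the 20000 / 100000 split reported for the 10000 values of this variable.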

Unique

Unique	10000
Unique (%)	100.0%

Sample

1st row	IM0000016815
2nd row	IM0000008302
3rd row	IM0000023762
4th row	IM0000031052
5th row	IM0000017023
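
The length and category statistics suggest a fixed layout for `등록번호`. A minimal validation sketch, assuming the pattern is two uppercase ASCII letters followed by ten digits (this pattern is inferred from the samples and is not stated by the data provider):

```python
import re

# Assumed pattern: two uppercase ASCII letters followed by ten ASCII digits.
REGISTRATION_RE = re.compile(r"[A-Z]{2}[0-9]{10}")

samples = [
    "IM0000016815",
    "IM0000008302",
    "IM0000023762",
    "IM0000031052",
    "IM0000017023",
]

# fullmatch requires the whole string to fit the pattern.
valid = [bool(REGISTRATION_RE.fullmatch(s)) for s in samples]
print(valid)
```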

Value	Count	Frequency (%)
im0000016815	1	< 0.1%
im0000008316	1	< 0.1%
im0000007152	1	< 0.1%
im0000010448	1	< 0.1%
im0000026286	1	< 0.1%
im0000012642	1	< 0.1%
im0000018054	1	< 0.1%
im0000026171	1	< 0.1%
xs0000000731	1	< 0.1%
im0000003596	1	< 0.1%
Other values (9990)	9990	99.9%

Most occurring characters

Value	Count	Frequency (%)
0	58274	48.6%
M	9369	7.8%
I	9066	7.6%
1	7325	6.1%
2	6928	5.8%
3	4528	3.8%
5	3918	3.3%
8	3880	3.2%
7	3874	3.2%
9	3801	3.2%
Other values (5)	9037	7.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	100000	83.3%
Uppercase Letter	20000	16.7%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	58274	58.3%
1	7325	7.3%
2	6928	6.9%
3	4528	4.5%
5	3918	3.9%
8	3880	3.9%
7	3874	3.9%
9	3801	3.8%
6	3743	3.7%
4	3729	3.7%

Uppercase Letter

Value	Count	Frequency (%)
M	9369	46.8%
I	9066	45.3%
X	631	3.2%
S	631	3.2%
H	303	1.5%

Most occurring scripts

Value	Count	Frequency (%)
Common	100000	83.3%
Latin	20000	16.7%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	58274	58.3%
1	7325	7.3%
2	6928	6.9%
3	4528	4.5%
5	3918	3.9%
8	3880	3.9%
7	3874	3.9%
9	3801	3.8%
6	3743	3.7%
4	3729	3.7%

Latin

Value	Count	Frequency (%)
M	9369	46.8%
I	9066	45.3%
X	631	3.2%
S	631	3.2%
H	303	1.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	120000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	58274	48.6%
M	9369	7.8%
I	9066	7.6%
1	7325	6.1%
2	6928	5.8%
3	4528	3.8%
5	3918	3.3%
8	3880	3.2%
7	3874	3.2%
9	3801	3.2%
Other values (5)	9037	7.5%

청구기호
Text

Distinct	9282
Distinct (%)	92.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	25
Mean length	16.2113
Min length	11

Characters and Unicode

Total characters	162113
Distinct characters	88
Distinct categories	9
Distinct scripts	3
Distinct blocks	3

Unique

Unique	8921
Unique (%)	89.2%

Sample

1st row	AU 843.5-W651f=2
2nd row	AU 843.5-B966wb
3rd row	BET 843.5-J12c=2
4th row	AU 843.6-C278v-el
5th row	GR 843-S368l-L1-ab
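
The sample call numbers share a visible structure: a short prefix, a space, then hyphen-separated segments, optionally ending in `=` and a number. The sketch below splits a value along those delimiters. The field names (`prefix`, `classification`, `marks`, `copy`) are guesses made for illustration and `split_call_number` is a hypothetical helper; the report does not document the scheme.

```python
from typing import Dict, List


def split_call_number(call_number: str) -> Dict[str, object]:
    """Split a call number into hypothetical parts; the field names are guesses."""
    prefix, _, rest = call_number.partition(" ")
    # Text after "=" is treated as a copy number (an assumption).
    body, _, copy = rest.partition("=")
    segments: List[str] = body.split("-")
    return {
        "prefix": prefix,
        "classification": segments[0],
        "marks": segments[1:],
        "copy": copy or None,
    }


first = split_call_number("AU 843.5-W651f=2")
fifth = split_call_number("GR 843-S368l-L1-ab")
print(first)
print(fifth)
```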

Value	Count	Frequency (%)
gr	2005	10.0%
ch	1766	8.8%
bet	1691	8.4%
nf	867	4.3%
au	793	4.0%
pk	656	3.3%
lil	583	2.9%
grl	550	2.7%
wds	383	1.9%
fav	287	1.4%
Other values (9234)	10488	52.3%
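
The counts in this table sum to 20069, which equals the 10000 values plus the 10069 spaces reported for this variable, so the table appears to count lowercased, whitespace-delimited tokens. A minimal sketch of that tokenization, applied to the five sample values only (the exact tokenizer used by the profiler is an assumption):

```python
from collections import Counter

# The five sample call numbers shown in this section.
samples = [
    "AU 843.5-W651f=2",
    "AU 843.5-B966wb",
    "BET 843.5-J12c=2",
    "AU 843.6-C278v-el",
    "GR 843-S368l-L1-ab",
]

# Lowercase each value and split on whitespace, then count the tokens.
tokens = Counter(token for value in samples for token in value.lower().split())
print(tokens.most_common(1))
```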

Most occurring characters

Value	Count	Frequency (%)
-	16643	10.3%
8	12915	8.0%
3	11749	7.2%
4	11699	7.2%
(space)	10069	6.2%
6	8970	5.5%
.	7149	4.4%
2	5664	3.5%
1	5093	3.1%
5	4947	3.1%
Other values (78)	67215	41.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	72014	44.4%
Uppercase Letter	37230	23.0%
Lowercase Letter	17028	10.5%
Dash Punctuation	16643	10.3%
Space Separator	10069	6.2%
Other Punctuation	7157	4.4%
Math Symbol	1923	1.2%
Other Letter	34	< 0.1%
Connector Punctuation	15	< 0.1%

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
R	3975	10.7%
G	3210	8.6%
B	2832	7.6%
C	2727	7.3%
H	2560	6.9%
S	2439	6.6%
T	2317	6.2%
L	2292	6.2%
E	2071	5.6%
A	1777	4.8%
Other values (16)	11030	29.6%

Lowercase Letter

Value	Count	Frequency (%)
s	1218	7.2%
a	1201	7.1%
m	1078	6.3%
t	982	5.8%
h	972	5.7%
i	957	5.6%
e	957	5.6%
p	915	5.4%
l	871	5.1%
c	864	5.1%
Other values (16)	7013	41.2%

Other Letter

Value	Count	Frequency (%)
안	12	35.3%
영	3	8.8%
권	2	5.9%
전	2	5.9%
능	2	5.9%
손	1	2.9%
시	1	2.9%
이	1	2.9%
ㅣ	1	2.9%
ㄱ	1	2.9%
Other values (8)	8	23.5%

Decimal Number

Value	Count	Frequency (%)
8	12915	17.9%
3	11749	16.3%
4	11699	16.2%
6	8970	12.5%
2	5664	7.9%
1	5093	7.1%
5	4947	6.9%
7	4885	6.8%
9	4540	6.3%
0	1552	2.2%

Other Punctuation

Value	Count	Frequency (%)
.	7149	99.9%
,	7	0.1%
/	1	< 0.1%

Math Symbol

Value	Count	Frequency (%)
=	1851	96.3%
+	72	3.7%

Dash Punctuation

Value	Count	Frequency (%)
-	16643	100.0%

Space Separator

Value	Count	Frequency (%)
(space)	10069	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	15	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	107821	66.5%
Latin	54258	33.5%
Hangul	34	< 0.1%

Most frequent character per script

Latin

Value	Count	Frequency (%)
R	3975	7.3%
G	3210	5.9%
B	2832	5.2%
C	2727	5.0%
H	2560	4.7%
S	2439	4.5%
T	2317	4.3%
L	2292	4.2%
E	2071	3.8%
A	1777	3.3%
Other values (42)	28058	51.7%

Common

Value	Count	Frequency (%)
-	16643	15.4%
8	12915	12.0%
3	11749	10.9%
4	11699	10.9%
(space)	10069	9.3%
6	8970	8.3%
.	7149	6.6%
2	5664	5.3%
1	5093	4.7%
5	4947	4.6%
Other values (8)	12923	12.0%

Hangul

Value	Count	Frequency (%)
안	12	35.3%
영	3	8.8%
권	2	5.9%
전	2	5.9%
능	2	5.9%
손	1	2.9%
시	1	2.9%
이	1	2.9%
ㅣ	1	2.9%
ㄱ	1	2.9%
Other values (8)	8	23.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	162079	> 99.9%
Hangul	32	< 0.1%
Compat Jamo	2	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
-	16643	10.3%
8	12915	8.0%
3	11749	7.2%
4	11699	7.2%
(space)	10069	6.2%
6	8970	5.5%
.	7149	4.4%
2	5664	3.5%
1	5093	3.1%
5	4947	3.1%
Other values (60)	67181	41.4%

Hangul

Value	Count	Frequency (%)
안	12	37.5%
영	3	9.4%
권	2	6.2%
전	2	6.2%
능	2	6.2%
손	1	3.1%
시	1	3.1%
이	1	3.1%
길	1	3.1%
김	1	3.1%
Other values (6)	6	18.8%

Compat Jamo

Value	Count	Frequency (%)
ㅣ	1	50.0%
ㄱ	1	50.0%

도서명
Text

Distinct	9202
Distinct (%)	92.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	166
Median length	92
Mean length	26.6691
Min length	2

Characters and Unicode

Total characters	266691
Distinct characters	248
Distinct categories	13
Distinct scripts	4
Distinct blocks	6

Unique

Unique	8554
Unique (%)	85.5%

Sample

1st row	Free Fall
2nd row	Where's Julius?
3rd row	(The) Caterpillar and the Polliwog
4th row	(The) Very Hungry Caterpillar Eats Lunch : A Colors Book
5th row	(Lego) City, All Aboard!

Value	Count	Frequency (%)
the	3980	8.3%
	1729	3.6%
and	1310	2.7%
a	1165	2.4%
of	1076	2.2%
to	507	1.1%
in	482	1.0%
book	341	0.7%
is	293	0.6%
my	280	0.6%
Other values (7728)	36732	76.7%

Most occurring characters

Value	Count	Frequency (%)
(space)	38043	14.3%
e	23465	8.8%
o	15697	5.9%
a	15593	5.8%
t	13174	4.9%
r	13174	4.9%
n	12821	4.8%
i	12623	4.7%
s	11740	4.4%
h	9095	3.4%
Other values (238)	101266	38.0%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	179081	67.1%
Space Separator	38043	14.3%
Uppercase Letter	33497	12.6%
Other Punctuation	8108	3.0%
Close Punctuation	2587	1.0%
Open Punctuation	2587	1.0%
Decimal Number	1928	0.7%
Dash Punctuation	535	0.2%
Other Letter	270	0.1%
Math Symbol	48	< 0.1%
Other values (3)	7	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
리	7	2.6%
이	7	2.6%
아	6	2.2%
트	6	2.2%
는	5	1.9%
영	5	1.9%
어	5	1.9%
한	4	1.5%
의	4	1.5%
도	4	1.5%
Other values (145)	217	80.4%

Lowercase Letter

Value	Count	Frequency (%)
e	23465	13.1%
o	15697	8.8%
a	15593	8.7%
t	13174	7.4%
r	13174	7.4%
n	12821	7.2%
i	12623	7.0%
s	11740	6.6%
h	9095	5.1%
l	8238	4.6%
Other values (16)	43461	24.3%

Uppercase Letter

Value	Count	Frequency (%)
T	4062	12.1%
S	3048	9.1%
B	2425	7.2%
M	2310	6.9%
A	2304	6.9%
W	1914	5.7%
C	1907	5.7%
D	1676	5.0%
P	1676	5.0%
F	1583	4.7%
Other values (16)	10592	31.6%

Other Punctuation

Value	Count	Frequency (%)
,	1929	23.8%
.	1642	20.3%
:	1564	19.3%
'	1272	15.7%
!	1004	12.4%
?	455	5.6%
&	174	2.1%
;	18	0.2%
/	15	0.2%
"	12	0.1%
Other values (6)	23	0.3%

Decimal Number

Value	Count	Frequency (%)
1	512	26.6%
2	311	16.1%
3	259	13.4%
0	193	10.0%
4	166	8.6%
5	122	6.3%
6	103	5.3%
9	98	5.1%
7	83	4.3%
8	81	4.2%

Math Symbol

Value	Count	Frequency (%)
=	24	50.0%
+	19	39.6%
~	3	6.2%
>	1	2.1%
<	1	2.1%

Close Punctuation

Value	Count	Frequency (%)
)	2573	99.5%
]	14	0.5%

Open Punctuation

Value	Count	Frequency (%)
(	2573	99.5%
[	14	0.5%

Letter Number

Value	Count	Frequency (%)
Ⅱ	2	66.7%
Ⅴ	1	33.3%

Space Separator

Value	Count	Frequency (%)
(space)	38043	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	535	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	2	100.0%

Modifier Symbol

Value	Count	Frequency (%)
`	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	212581	79.7%
Common	53840	20.2%
Hangul	268	0.1%
Han	2	< 0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
리	7	2.6%
이	7	2.6%
아	6	2.2%
트	6	2.2%
는	5	1.9%
영	5	1.9%
어	5	1.9%
한	4	1.5%
의	4	1.5%
도	4	1.5%
Other values (143)	215	80.2%

Latin

Value	Count	Frequency (%)
e	23465	11.0%
o	15697	7.4%
a	15593	7.3%
t	13174	6.2%
r	13174	6.2%
n	12821	6.0%
i	12623	5.9%
s	11740	5.5%
h	9095	4.3%
l	8238	3.9%
Other values (44)	76961	36.2%

Common

Value	Count	Frequency (%)
(space)	38043	70.7%
)	2573	4.8%
(	2573	4.8%
,	1929	3.6%
.	1642	3.0%
:	1564	2.9%
'	1272	2.4%
!	1004	1.9%
-	535	1.0%
1	512	1.0%
Other values (29)	2193	4.1%

Han

Value	Count	Frequency (%)
睡	1	50.0%
上	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	266404	99.9%
Hangul	268	0.1%
None	8	< 0.1%
Punctuation	6	< 0.1%
Number Forms	3	< 0.1%
CJK	2	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	38043	14.3%
e	23465	8.8%
o	15697	5.9%
a	15593	5.9%
t	13174	4.9%
r	13174	4.9%
n	12821	4.8%
i	12623	4.7%
s	11740	4.4%
h	9095	3.4%
Other values (76)	100979	37.9%

Hangul

Value	Count	Frequency (%)
리	7	2.6%
이	7	2.6%
아	6	2.2%
트	6	2.2%
는	5	1.9%
영	5	1.9%
어	5	1.9%
한	4	1.5%
의	4	1.5%
도	4	1.5%
Other values (143)	215	80.2%

None

Value	Count	Frequency (%)
·	5	62.5%
＆	2	25.0%
＇	1	12.5%

Punctuation

Value	Count	Frequency (%)
…	4	66.7%
’	2	33.3%

Number Forms

Value	Count	Frequency (%)
Ⅱ	2	66.7%
Ⅴ	1	33.3%

CJK

Value	Count	Frequency (%)
睡	1	50.0%
上	1	50.0%

저자명
Text

Distinct	5426
Distinct (%)	54.3%
Missing	11
Missing (%)	0.1%
Memory size	156.2 KiB

Length

Max length	237
Median length	147
Mean length	29.347282
Min length	1

Characters and Unicode

Total characters	293150
Distinct characters	170
Distinct categories	13
Distinct scripts	4
Distinct blocks	6
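
The reported mean length for `저자명` appears to be computed over non-missing values only. This is an inference from the figures in this section, not something the report states; the sketch below compares both denominators.

```python
# Figures copied from the 저자명 section.
n_observations = 10_000
missing = 11
total_characters = 293_150

# Dividing by all rows does not reproduce the reported mean length...
mean_all_rows = total_characters / n_observations
# ...but dividing by the non-missing rows does.
mean_non_missing = total_characters / (n_observations - missing)

print(f"{mean_all_rows:.6f} vs {mean_non_missing:.6f}")
```

Only the second value agrees with the reported mean length of 29.347282.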

Unique

Unique	4077
Unique (%)	40.8%

Sample

1st row	David Wiesner
2nd row	John Burningham
3rd row	by Jack Kent
4th row	Eric Carle
5th row	Quinlan B. Lee
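
Many `저자명` values pack several contributors into one field, separated by `;` and sometimes introduced by a phrase such as "by" or "illustrated by". A minimal sketch of splitting such a field, assuming those phrases mark roles (`split_contributors` and `ROLE_PREFIXES` are hypothetical names, and the role list is not exhaustive):

```python
from typing import List, Tuple

# Role phrases seen in this report's rows; treating them as roles is an assumption.
ROLE_PREFIXES = ("illustrated by", "pictures by", "original by", "by")


def split_contributors(field: str) -> List[Tuple[str, str]]:
    """Split an author field on ';' and peel off a leading role phrase."""
    result: List[Tuple[str, str]] = []
    for part in field.split(";"):
        name = part.strip()
        if not name:
            continue
        role = "author"  # default when no role phrase is present
        lowered = name.lower()
        for prefix in ROLE_PREFIXES:
            if lowered.startswith(prefix + " "):
                role = prefix
                name = name[len(prefix):].strip()
                break
        result.append((role, name))
    return result


parsed = split_contributors("Drew Daywalt ; illustrated by Adam Rex")
print(parsed)
```

Longer phrases are listed before the bare "by" so that "illustrated by" is not cut short.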

Value	Count	Frequency (%)
	5312	11.0%
by	4594	9.5%
illustrated	3096	6.4%
david	329	0.7%
pictures	318	0.7%
alex	312	0.6%
hunt	292	0.6%
roderick	283	0.6%
brychta	278	0.6%
john	241	0.5%
Other values (6245)	33398	68.9%

Most occurring characters

Value	Count	Frequency (%)
(space)	38776	13.2%
e	24067	8.2%
a	22259	7.6%
r	18437	6.3%
l	17417	5.9%
i	16618	5.7%
t	16090	5.5%
n	15976	5.4%
o	11504	3.9%
s	11450	3.9%
Other values (160)	100556	34.3%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	212433	72.5%
Space Separator	38776	13.2%
Uppercase Letter	34187	11.7%
Other Punctuation	7061	2.4%
Other Letter	412	0.1%
Dash Punctuation	159	0.1%
Open Punctuation	58	< 0.1%
Close Punctuation	56	< 0.1%
Final Punctuation	3	< 0.1%
Math Symbol	2	< 0.1%
Other values (3)	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
수	24	5.8%
김	19	4.6%
지	17	4.1%
음	15	3.6%
그	15	3.6%
림	15	3.6%
희	13	3.2%
안	12	2.9%
감	12	2.9%
형	12	2.9%
Other values (85)	258	62.6%

Lowercase Letter

Value	Count	Frequency (%)
e	24067	11.3%
a	22259	10.5%
r	18437	8.7%
l	17417	8.2%
i	16618	7.8%
t	16090	7.6%
n	15976	7.5%
o	11504	5.4%
s	11450	5.4%
y	9497	4.5%
Other values (17)	49118	23.1%

Uppercase Letter

Value	Count	Frequency (%)
M	3084	9.0%
S	2911	8.5%
B	2699	7.9%
J	2386	7.0%
A	2274	6.7%
D	2219	6.5%
C	2155	6.3%
R	2012	5.9%
L	1982	5.8%
H	1736	5.1%
Other values (16)	10729	31.4%

Other Punctuation

Value	Count	Frequency (%)
;	5246	74.3%
.	1139	16.1%
,	436	6.2%
'	130	1.8%
&	52	0.7%
:	24	0.3%
?	21	0.3%
/	9	0.1%
＆	2	< 0.1%
＇	2	< 0.1%

Open Punctuation

Value	Count	Frequency (%)
[	49	84.5%
(	9	15.5%

Close Punctuation

Value	Count	Frequency (%)
]	47	83.9%
)	9	16.1%

Math Symbol

Value	Count	Frequency (%)
>	1	50.0%
<	1	50.0%

Space Separator

Value	Count	Frequency (%)
(space)	38776	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	159	100.0%

Final Punctuation

Value	Count	Frequency (%)
’	3	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅲ	1	100.0%

Modifier Symbol

Value	Count	Frequency (%)
`	1	100.0%

Decimal Number

Value	Count	Frequency (%)
2	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	246621	84.1%
Common	46117	15.7%
Hangul	409	0.1%
Han	3	< 0.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
수	24	5.9%
김	19	4.6%
지	17	4.2%
음	15	3.7%
그	15	3.7%
림	15	3.7%
희	13	3.2%
안	12	2.9%
감	12	2.9%
형	12	2.9%
Other values (82)	255	62.3%

Latin

Value	Count	Frequency (%)
e	24067	9.8%
a	22259	9.0%
r	18437	7.5%
l	17417	7.1%
i	16618	6.7%
t	16090	6.5%
n	15976	6.5%
o	11504	4.7%
s	11450	4.6%
y	9497	3.9%
Other values (44)	83306	33.8%

Common

Value	Count	Frequency (%)
(space)	38776	84.1%
;	5246	11.4%
.	1139	2.5%
,	436	0.9%
-	159	0.3%
'	130	0.3%
&	52	0.1%
[	49	0.1%
]	47	0.1%
:	24	0.1%
Other values (11)	59	0.1%

Han

Value	Count	Frequency (%)
丁	1	33.3%
姬	1	33.3%
童	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	292727	99.9%
Hangul	409	0.1%
None	7	< 0.1%
Punctuation	3	< 0.1%
CJK	3	< 0.1%
Number Forms	1	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	38776	13.2%
e	24067	8.2%
a	22259	7.6%
r	18437	6.3%
l	17417	5.9%
i	16618	5.7%
t	16090	5.5%
n	15976	5.5%
o	11504	3.9%
s	11450	3.9%
Other values (60)	100133	34.2%

Hangul

Value	Count	Frequency (%)
수	24	5.9%
김	19	4.6%
지	17	4.2%
음	15	3.7%
그	15	3.7%
림	15	3.7%
희	13	3.2%
안	12	2.9%
감	12	2.9%
형	12	2.9%
Other values (82)	255	62.3%

Punctuation

Value	Count	Frequency (%)
’	3	100.0%

None

Value	Count	Frequency (%)
ø	3	42.9%
＆	2	28.6%
＇	2	28.6%

Number Forms

Value	Count	Frequency (%)
Ⅲ	1	100.0%

CJK

Value	Count	Frequency (%)
丁	1	33.3%
姬	1	33.3%
童	1	33.3%

발행자명
Text

Distinct	1277
Distinct (%)	12.8%
Missing	2
Missing (%)	< 0.1%
Memory size	156.2 KiB

Length

Max length	77
Median length	47
Mean length	14.493399
Min length	1

Characters and Unicode

Total characters	144905
Distinct characters	137
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

Unique

Unique	666
Unique (%)	6.7%

Sample

1st row	HarperCollins
2nd row	Red fox
3rd row	Simon & Schuster
4th row	Penguin Young Readers
5th row	Scholastic

Value	Count	Frequency (%)
books	2125	10.5%
scholastic	1316	6.5%
press	945	4.7%
house	465	2.3%
	463	2.3%
oxford	459	2.3%
puffin	450	2.2%
random	407	2.0%
university	404	2.0%
harpercollins	350	1.7%
Other values (956)	12928	63.6%

Most occurring characters

Value	Count	Frequency (%)
o	13702	9.5%
s	10623	7.3%
(space)	10340	7.1%
r	9593	6.6%
e	9016	6.2%
i	8753	6.0%
n	8157	5.6%
a	7464	5.2%
l	6973	4.8%
t	4935	3.4%
Other values (127)	55349	38.2%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	110990	76.6%
Uppercase Letter	21130	14.6%
Space Separator	10340	7.1%
Other Punctuation	1691	1.2%
Other Letter	656	0.5%
Dash Punctuation	28	< 0.1%
Close Punctuation	22	< 0.1%
Open Punctuation	22	< 0.1%
Math Symbol	15	< 0.1%
Decimal Number	10	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
스	81	12.3%
사	69	10.5%
어	68	10.4%
리	66	10.1%
국	66	10.1%
외	65	9.9%
플	65	9.9%
애	65	9.9%
북	16	2.4%
다	13	2.0%
Other values (52)	82	12.5%

Lowercase Letter

Value	Count	Frequency (%)
o	13702	12.3%
s	10623	9.6%
r	9593	8.6%
e	9016	8.1%
i	8753	7.9%
n	8157	7.3%
a	7464	6.7%
l	6973	6.3%
t	4935	4.4%
c	4477	4.0%
Other values (16)	27297	24.6%

Uppercase Letter

Value	Count	Frequency (%)
B	2741	13.0%
P	2577	12.2%
S	2448	11.6%
H	2161	10.2%
C	1959	9.3%
R	1046	5.0%
M	825	3.9%
O	767	3.6%
A	756	3.6%
D	752	3.6%
Other values (16)	5098	24.1%

Other Punctuation

Value	Count	Frequency (%)
&	485	28.7%
'	342	20.2%
.	308	18.2%
:	297	17.6%
,	181	10.7%
＆	24	1.4%
/	14	0.8%
·	13	0.8%
!	12	0.7%
;	11	0.7%

Decimal Number

Value	Count	Frequency (%)
6	3	30.0%
3	3	30.0%
0	3	30.0%
4	1	10.0%

Close Punctuation

Value	Count	Frequency (%)
)	21	95.5%
]	1	4.5%

Open Punctuation

Value	Count	Frequency (%)
(	21	95.5%
[	1	4.5%

Space Separator

Value	Count	Frequency (%)
(space)	10340	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	28	100.0%

Math Symbol

Value	Count	Frequency (%)
+	15	100.0%

Currency Symbol

Value	Count	Frequency (%)
$	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	132120	91.2%
Common	12129	8.4%
Hangul	656	0.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
스	81	12.3%
사	69	10.5%
어	68	10.4%
리	66	10.1%
국	66	10.1%
외	65	9.9%
플	65	9.9%
애	65	9.9%
북	16	2.4%
다	13	2.0%
Other values (52)	82	12.5%

Latin

Value	Count	Frequency (%)
o	13702	10.4%
s	10623	8.0%
r	9593	7.3%
e	9016	6.8%
i	8753	6.6%
n	8157	6.2%
a	7464	5.6%
l	6973	5.3%
t	4935	3.7%
c	4477	3.4%
Other values (42)	48427	36.7%

Common

Value	Count	Frequency (%)
(space)	10340	85.3%
&	485	4.0%
'	342	2.8%
.	308	2.5%
:	297	2.4%
,	181	1.5%
-	28	0.2%
＆	24	0.2%
)	21	0.2%
(	21	0.2%
Other values (13)	82	0.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	144208	99.5%
Hangul	656	0.5%
None	41	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
o	13702	9.5%
s	10623	7.4%
(space)	10340	7.2%
r	9593	6.7%
e	9016	6.3%
i	8753	6.1%
n	8157	5.7%
a	7464	5.2%
l	6973	4.8%
t	4935	3.4%
Other values (62)	54652	37.9%

Hangul

Value	Count	Frequency (%)
스	81	12.3%
사	69	10.5%
어	68	10.4%
리	66	10.1%
국	66	10.1%
외	65	9.9%
플	65	9.9%
애	65	9.9%
북	16	2.4%
다	13	2.0%
Other values (52)	82	12.5%

None

Value	Count	Frequency (%)
＆	24	58.5%
·	13	31.7%
＇	4	9.8%
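
The fullwidth ampersand (＆) and fullwidth apostrophe (＇) counted here look like their ASCII counterparts but are different code points, so otherwise identical publisher names may be counted as distinct values. A minimal sketch of folding them together with Unicode NFKC normalization, using two spellings that both occur in this report's rows (whether such normalization suits the source catalogue is a judgment call):

```python
import unicodedata

# Two spellings of the same publisher, both present in this report's rows.
variants = ["Grosset ＆ Dunlap", "Grosset & Dunlap"]

# NFKC folds compatibility characters such as the fullwidth ampersand (U+FF06)
# into their ASCII equivalents.
normalized = {unicodedata.normalize("NFKC", name) for name in variants}
print(normalized)
```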

기준일자
Date

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2023-02-17 00:00:00
Maximum	2023-02-17 00:00:00

Histogram

Histogram with fixed size bins (bins=1)

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
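
The nullity plots themselves are not reproduced in this text, but the per-column missing counts are listed in the variable sections. A short sketch that tabulates them and checks that they add up to the dataset-level figure (values copied from this report; the dictionary layout is illustrative):

```python
# Missing counts per variable, copied from the variable sections of this report.
n_observations = 10_000
missing_by_column = {
    "연번": 0,
    "등록번호": 0,
    "청구기호": 0,
    "도서명": 0,
    "저자명": 11,
    "발행자명": 2,
    "기준일자": 0,
}

total_missing = sum(missing_by_column.values())
missing_pct = {
    column: 100 * count / n_observations
    for column, count in missing_by_column.items()
}
print(total_missing, missing_pct["저자명"], missing_pct["발행자명"])
```

The total matches the 13 missing cells reported in the dataset statistics.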

First rows

	연번	등록번호	청구기호	도서명	저자명	발행자명	기준일자
13339	13340	IM0000016815	AU 843.5-W651f=2	Free Fall	David Wiesner	HarperCollins	2023-02-17
7254	7255	IM0000008302	AU 843.5-B966wb	Where's Julius?	John Burningham	Red fox	2023-02-17
21278	21279	IM0000023762	BET 843.5-J12c=2	(The) Caterpillar and the Polliwog	by Jack Kent	Simon & Schuster	2023-02-17
27401	27402	IM0000031052	AU 843.6-C278v-el	(The) Very Hungry Caterpillar Eats Lunch : A Colors Book	Eric Carle	Penguin Young Readers	2023-02-17
13544	13545	IM0000017023	GR 843-S368l-L1-ab	(Lego) City, All Aboard!	Quinlan B. Lee	Scholastic	2023-02-17
17363	17364	IM0000020710	BET 843.6-D276l	(The) Legend of Rock Paper Scissors	Drew Daywalt ; illustrated by Adam Rex	Balzer + Bray	2023-02-17
2925	2926	IM0000001889	GRL 744-H313r-S	Red Balloons, The	Cindy Harris	HMH	2023-02-17
14822	14823	IM0000015936	NF 408-H293l-LFS 1=2	(A) Nest Full of Eggs	Priscilla Belz Jenkins ; illustrated by Lizzy Rockwell	HarperCollins	2023-02-17
12012	12013	IM0000014770	CH 808.9-S838c-1=3	Classic starts: the adventures of Huckleberry Finn	Mark Twain ; original by Oliver Ho ; Illustrated by Dan Andreasen	Sterling	2023-02-17
14935	14936	IM0000015904	NF 408-N277sl-Kids-Pr	Sleep, Bear!	Shelby Alinsky	National Geographic	2023-02-17

Last rows

	연번	등록번호	청구기호	도서명	저자명	발행자명	기준일자
20495	20496	IM0000022375	CH 843.6-K14g-4=2	Go Girl!. 4, (The) New Girl	Rowan Mcauley	Square Fish	2023-02-17
1231	1232	IM0000002138	GRL 556-K192d-A	Drive Toward The Future	Ann Kaske	Scholastic	2023-02-17
4916	4917	IM0000007248	AU 843.6-C278wh-ba=2	Baby Bear, Baby Bear, What do you See?	Bill Martin ; pictures by Eric Carle	Puffin Books	2023-02-17
354	355	IM0000000736	GR 843.6-G878m-All2	Martin luther king, jr. and the march on washington	Frances E. ruffin	Scholastic	2023-02-17
21907	21908	IM0000023483	CH 843.6-K94m-8	Magic Bone. 8, Rootin' Tootin' Cow Dog	Nancy Krulik ; illustrated by Sevastien Braun	Grosset ＆ Dunlap	2023-02-17
4348	4349	IM0000007362	AU 843.5-D419m	Tomie dePaola's more mother goose favorites	Tomie DePaola	Grosset & Dunlap	2023-02-17
28097	28098	IM0000030522	BET 843.6-B885d	(A) Dark Dark Tale	Ruth Brown	TWOPONDS	2023-02-17
4025	4026	IM0000003581	GRL 082-R813p-S	(The)panda bear	Deborah Chilek	Rosen Pub.	2023-02-17
5322	5323	IM0000002981	GR 843.6-G878l-All3	Lightning : it's electrifying	Jennifer Dussling ; Lori Osiecki	Grosset & Dunlap	2023-02-17
146	147	IM0000001636	GRL 843-M216w-S	When the king rides by	Margaret Mahy ; Betina Ogden	Mondo	2023-02-17
