gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	29
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.0 KiB
Average record size in memory	36.6 B
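
As a rough illustration of how the counts above are defined, the sketch below derives the missing-cell and duplicate-row figures from a few rows using only the standard library. The three rows are copied from the first rows of this dataset. The helper name `dataset_stats` is hypothetical and is not part of ydata-profiling, and the duplicate definition used here is one reasonable choice that may not match the tool's exactly.

```python
from collections import Counter

# Three rows copied from the "First rows" table of this report.
rows = [
    ("경영관리본부", "Administrative Management Office", "054-530-0709", "2022-10-28"),
    ("담수생물연구본부", "Freshwater Bioresources Research Office", "054-530-0709", "2022-10-28"),
    ("전략기획실", "Strategic Planning Department", "054-530-0719", "2022-10-28"),
]


def dataset_stats(rows):
    """Hypothetical helper: summarise missing cells and duplicate rows."""
    n_cells = sum(len(row) for row in rows)
    # Treat None and empty strings as missing (an assumption for this sketch).
    missing = sum(1 for row in rows for cell in row if cell is None or cell == "")
    # A row counts as a duplicate each time it repeats an earlier row.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "observations": len(rows),
        "variables": len(rows[0]) if rows else 0,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
    }


stats = dataset_stats(rows)
```

On the full 29-row dataset the same logic should agree with the zeros reported above, although that has not been checked here.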

Variable types

Text	3
Categorical	1

Dataset

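Description	Department information for the Nakdonggang National Institute of Biological Resources (국립낙동강생물자원관). The data includes the department name (부서명), the English department name (부서영문명), the fax number (팩스번호), and the data extraction date.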
Author	국립낙동강생물자원관
URL	https://www.data.go.kr/data/15039048/fileData.do

Alerts

`데이터 기준일` has constant value "2022-10-28"	Constant
`부서명` has unique values	Unique

Reproduction

Analysis started	2023-12-12 22:48:22.374279
Analysis finished	2023-12-12 22:48:22.694207
Duration	0.32 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
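
A report like this one could probably be regenerated with a few lines of Python. The sketch below is a minimal version under stated assumptions: the CSV path and its encoding are placeholders (files from Korean public data portals are sometimes CP949-encoded rather than UTF-8), the helper name `build_report` is hypothetical, and the settings stored in `config.json` are not reproduced. It requires `pandas` and `ydata-profiling` to be installed.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Hypothetical helper: profile a CSV file and write an HTML report."""
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read every column as text so values such as fax numbers keep their format.
    df = pd.read_csv(csv_path, encoding=encoding, dtype=str)
    report = ProfileReport(df, title="gimi9 Pandas Profiling")
    report.to_file(output_path)
    return output_path
```

Calling `build_report("departments.csv", encoding="cp949")` would write `report.html` next to the script. The file name here is illustrative only.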

부서명
Text

UNIQUE

Distinct	29
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	364.0 B

Length

Max length	8
Median length	7
Mean length	5.4827586
Min length	3
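
For reference, length statistics of this kind can be computed directly from the string values. The sketch below uses only the five sample values listed for this variable, so its median and mean describe that small sample and are not expected to match the figures above, which cover all 29 rows. Lengths are counted in Unicode code points after NFC normalisation, which is an assumption about how the tool counts them.

```python
import statistics
import unicodedata

# The five sample values shown for this variable.
sample = ["경영관리본부", "담수생물연구본부", "전략기획실", "기획부", "혁신성과부"]

# NFC keeps each Hangul syllable as a single code point, so len() counts syllables.
lengths = [len(unicodedata.normalize("NFC", value)) for value in sample]

length_stats = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
```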

Characters and Unicode

Total characters	159
Distinct characters	63
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
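
The general category of each character can be looked up with Python's standard `unicodedata` module. The sketch below counts categories over the five sample values of this variable. It is an illustration of the idea and not the code ydata-profiling itself runs. The two-letter code `Lo` corresponds to the "Other Letter" category named in this report.

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable.
sample = ["경영관리본부", "담수생물연구본부", "전략기획실", "기획부", "혁신성과부"]

# Join the values and normalise to NFC so each syllable is one code point.
text = unicodedata.normalize("NFC", "".join(sample))

# Count characters per Unicode general category (for example "Lo", "Lu", "Zs").
category_counts = Counter(unicodedata.category(ch) for ch in text)

total_characters = len(text)
distinct_characters = len(set(text))
```

The standard library does not expose the script or block properties, so reproducing those two breakdowns would need extra data or a third-party package.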

Unique

Unique	29
Unique (%)	100.0%

Sample

1st row	경영관리본부
2nd row	담수생물연구본부
3rd row	전략기획실
4th row	기획부
5th row	혁신성과부

Value	Count	Frequency (%)
경영관리본부	1	3.4%
식물연구팀	1	3.4%
소재상용화연구팀	1	3.4%
산업화지원팀	1	3.4%
산업화지원센터	1	3.4%
생물정보팀	1	3.4%
자원은행팀	1	3.4%
자원은행정보실	1	3.4%
균류연구팀	1	3.4%
원생생물연구팀	1	3.4%
Other values (19)	19	65.5%

Most occurring characters

Value	Count	Frequency (%)
연	11	6.9%
팀	11	6.9%
구	11	6.9%
부	10	6.3%
물	9	5.7%
실	7	4.4%
생	7	4.4%
원	6	3.8%
전	4	2.5%
시	3	1.9%
Other values (53)	80	50.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	159	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
연	11	6.9%
팀	11	6.9%
구	11	6.9%
부	10	6.3%
물	9	5.7%
실	7	4.4%
생	7	4.4%
원	6	3.8%
전	4	2.5%
시	3	1.9%
Other values (53)	80	50.3%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	159	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
연	11	6.9%
팀	11	6.9%
구	11	6.9%
부	10	6.3%
물	9	5.7%
실	7	4.4%
생	7	4.4%
원	6	3.8%
전	4	2.5%
시	3	1.9%
Other values (53)	80	50.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	159	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
연	11	6.9%
팀	11	6.9%
구	11	6.9%
부	10	6.3%
물	9	5.7%
실	7	4.4%
생	7	4.4%
원	6	3.8%
전	4	2.5%
시	3	1.9%
Other values (53)	80	50.3%

부서영문명
Text

Distinct	28
Distinct (%)	96.6%
Missing	0
Missing (%)	0.0%
Memory size	364.0 B

Length

Max length	57
Median length	36
Mean length	31.241379
Min length	17

Characters and Unicode

Total characters	906
Distinct characters	38
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	27
Unique (%)	93.1%

Sample

1st row	Administrative Management Office
2nd row	Freshwater Bioresources Research Office
3rd row	Strategic Planning Department
4th row	Planning Division
5th row	Performance Management Division

Value	Count	Frequency (%)
division	12	12.1%
research	12	12.1%
team	9	9.1%
department	6	6.1%
	6	6.1%
bioresources	5	5.1%
management	5	5.1%
animal	2	2.0%
support	2	2.0%
industrialization	2	2.0%
Other values (28)	38	38.4%

Most occurring characters

Value	Count	Frequency (%)
i	97	10.7%
e	90	9.9%
(space)	74	8.2%
n	72	7.9%
a	69	7.6%
o	62	6.8%
t	53	5.8%
r	52	5.7%
s	43	4.7%
c	37	4.1%
Other values (28)	257	28.4%

Most occurring categories

Value	Count	Frequency (%)
Lowercase Letter	734	81.0%
Uppercase Letter	92	10.2%
Space Separator	74	8.2%
Other Punctuation	6	0.7%

Most frequent character per category

Lowercase Letter

Value	Count	Frequency (%)
i	97	13.2%
e	90	12.3%
n	72	9.8%
a	69	9.4%
o	62	8.4%
t	53	7.2%
r	52	7.1%
s	43	5.9%
c	37	5.0%
m	32	4.4%
Other values (13)	127	17.3%

Uppercase Letter

Value	Count	Frequency (%)
D	18	19.6%
R	13	14.1%
T	10	10.9%
B	9	9.8%
M	7	7.6%
P	7	7.6%
A	6	6.5%
E	6	6.5%
I	4	4.3%
F	4	4.3%
Other values (3)	8	8.7%

Space Separator

Value	Count	Frequency (%)
(space)	74	100.0%

Other Punctuation

Value	Count	Frequency (%)
&	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	826	91.2%
Common	80	8.8%

Most frequent character per script

Latin

Value	Count	Frequency (%)
i	97	11.7%
e	90	10.9%
n	72	8.7%
a	69	8.4%
o	62	7.5%
t	53	6.4%
r	52	6.3%
s	43	5.2%
c	37	4.5%
m	32	3.9%
Other values (26)	219	26.5%

Common

Value	Count	Frequency (%)
(space)	74	92.5%
&	6	7.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	906	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
i	97	10.7%
e	90	9.9%
(space)	74	8.2%
n	72	7.9%
a	69	7.6%
o	62	6.8%
t	53	5.8%
r	52	5.7%
s	43	4.7%
c	37	4.1%
Other values (28)	257	28.4%

팩스번호
Text

Distinct	19
Distinct (%)	65.5%
Missing	0
Missing (%)	0.0%
Memory size	364.0 B

Length

Max length	12
Median length	12
Mean length	12
Min length	12

Characters and Unicode

Total characters	348
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	11
Unique (%)	37.9%

Sample

1st row	054-530-0709
2nd row	054-530-0709
3rd row	054-530-0719
4th row	054-530-0719
5th row	054-530-0729

Value	Count	Frequency (%)
054-530-0829	3	10.3%
054-530-0889	3	10.3%
054-530-0779	2	6.9%
054-530-0899	2	6.9%
054-530-0869	2	6.9%
054-530-0719	2	6.9%
054-530-0709	2	6.9%
054-530-0739	2	6.9%
054-530-0799	1	3.4%
054-530-0769	1	3.4%
Other values (9)	9	31.0%
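
The Distinct and Unique figures for this variable differ because they measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. The table above is consistent with that reading, since 8 of the 19 distinct fax numbers appear more than once, leaving 11 unique ones. A minimal sketch on the five sample rows, using only the standard library:

```python
from collections import Counter

# The five sample rows shown for this variable.
sample = [
    "054-530-0709",
    "054-530-0709",
    "054-530-0719",
    "054-530-0719",
    "054-530-0729",
]

counts = Counter(sample)

# Distinct: number of different values.
n_distinct = len(counts)
# Unique: number of values that occur exactly once.
n_unique = sum(1 for count in counts.values() if count == 1)

# Frequency (%) is each value's count divided by the number of rows.
frequency_pct = {
    value: round(100 * count / len(sample), 1) for value, count in counts.items()
}
```

Applied to all 29 rows, the same frequency calculation gives 3 / 29 ≈ 10.3% for the two most common numbers, matching the table.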

Most occurring characters

Value	Count	Frequency (%)
0	90	25.9%
5	60	17.2%
-	58	16.7%
9	34	9.8%
3	32	9.2%
4	31	8.9%
8	17	4.9%
7	16	4.6%
2	4	1.1%
6	3	0.9%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	290	83.3%
Dash Punctuation	58	16.7%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	90	31.0%
5	60	20.7%
9	34	11.7%
3	32	11.0%
4	31	10.7%
8	17	5.9%
7	16	5.5%
2	4	1.4%
6	3	1.0%
1	3	1.0%

Dash Punctuation

Value	Count	Frequency (%)
-	58	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	348	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	90	25.9%
5	60	17.2%
-	58	16.7%
9	34	9.8%
3	32	9.2%
4	31	8.9%
8	17	4.9%
7	16	4.6%
2	4	1.1%
6	3	0.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	348	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	90	25.9%
5	60	17.2%
-	58	16.7%
9	34	9.8%
3	32	9.2%
4	31	8.9%
8	17	4.9%
7	16	4.6%
2	4	1.1%
6	3	0.9%

데이터 기준일
Categorical

CONSTANT

Distinct	1
Distinct (%)	3.4%
Missing	0
Missing (%)	0.0%
Memory size	364.0 B

2022-10-28	29

Length

Max length	10
Median length	10
Mean length	10
Min length	10

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2022-10-28
2nd row	2022-10-28
3rd row	2022-10-28
4th row	2022-10-28
5th row	2022-10-28

Common Values

Value	Count	Frequency (%)
2022-10-28	29	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
2022-10-28	29	100.0%

Phik (φk)

Heatmap
Table

	부서명	부서영문명	팩스번호
부서명	1.000	1.000	1.000
부서영문명	1.000	1.000	0.948
팩스번호	1.000	0.948	1.000

Count
Matrix

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

First rows

	부서명	부서영문명	팩스번호	데이터 기준일
0	경영관리본부	Administrative Management Office	054-530-0709	2022-10-28
1	담수생물연구본부	Freshwater Bioresources Research Office	054-530-0709	2022-10-28
2	전략기획실	Strategic Planning Department	054-530-0719	2022-10-28
3	기획부	Planning Division	054-530-0719	2022-10-28
4	혁신성과부	Performance Management Division	054-530-0729	2022-10-28
5	경영관리실	Administrative Management Department	054-530-0739	2022-10-28
6	인사총무부	Performance Management Division	054-530-0739	2022-10-28
7	재무회계부	Finance & Accounting Division	054-530-0749	2022-10-28
8	시설안전부	Facilities Management Division	054-530-0759	2022-10-28
9	전시교육실	Exhibition & Education Department	054-530-0779	2022-10-28

Last rows

	부서명	부서영문명	팩스번호	데이터 기준일
19	환경미생물연구팀	Environmental Microbiology Research Team	054-530-0879	2022-10-28
20	원생생물연구팀	Protozoan Research Team	054-530-0849	2022-10-28
21	균류연구팀	Fungi Research Team	054-530-0859	2022-10-28
22	자원은행정보실	Bioresources Collection & Information Technology Division	054-530-0899	2022-10-28
23	자원은행팀	Bioresources Collection & Research Division	054-530-0899	2022-10-28
24	생물정보팀	Bioinformation technology Team	054-530-0909	2022-10-28
25	산업화지원센터	Bioresources Industrialization Support Department	054-530-0889	2022-10-28
26	산업화지원팀	Bioresources Industrialization Support Division	054-530-0889	2022-10-28
27	소재상용화연구팀	Biomaterial Commercialization Research Team	054-530-0889	2022-10-28
28	감사실	Audit & Inspection Division	054-530-0919	2022-10-28
