gimi9 Pandas Profiling

Dataset statistics

Number of variables	1
Number of observations	128
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	6
Duplicate rows (%)	4.7%
Total size in memory	1.1 KiB
Average record size in memory	9.0 B

Variable types

Text	1

Dataset

Description	Status of carbon-industry-related companies in Jeollabuk-do (전라북도탄소산업연관기업현황)
Author	Jeollabuk-do (전라북도)
URL	https://www.bigdatahub.go.kr/opendata/dataSet/detail.nm?contentId=37&rlik=49451aebf056b486&serviceId=202132

Alerts

Dataset has 6 (4.7%) duplicate rows	Duplicates

Reproduction

Analysis started	2024-03-14 02:48:01.979334
Analysis finished	2024-03-14 02:48:02.126573
Duration	0.15 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

전라북도 탄소산업 연관 기업 현황 (Status of carbon-industry-related companies in Jeollabuk-do)
Text

Distinct	116
Distinct (%)	90.6%
Missing	0
Missing (%)	0.0%
Memory size	1.1 KiB

Length

Max length	17
Median length	2
Mean length	2.2265625
Min length	1
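
The length figures can be cross-checked with a few lines of standard-library Python. This is only a sketch: `length_stats` is a hypothetical helper, and `sample_rows` holds a handful of rows quoted in this report rather than the full 128-row column, so its median and mean will not match the reported values. The final lines check that the reported mean length equals total characters divided by the number of observations.

```python
from statistics import mean, median

def length_stats(values):
    """Return max, median, mean and min string length for a list of cell values."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# A few rows quoted in this report (\u2018 is the left single quotation mark).
sample_rows = [
    "(\u201816. 3. 16기준)",
    ";[1]",
    "연번",
    "총계",
    "1",
    "탄소소재 적용 관심 기업체 현황",
]
stats = length_stats(sample_rows)
print(stats)

# Cross-check of the reported figures: 285 total characters over 128 observations.
reported_mean = 285 / 128
print(reported_mean)  # 2.2265625
```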

Characters and Unicode

Total characters	285
Distinct characters	35
Distinct categories	7
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
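
As a rough illustration of the category breakdown, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and the `category_counts` helper are assumptions made for this example, and only three rows quoted in this report are used, so the counts are not the report's totals. Script and block lookups are left out because `unicodedata` does not expose them.

```python
import unicodedata
from collections import Counter

# Two-letter category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lo": "Other Letter",
    "Po": "Other Punctuation",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pi": "Initial Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Three rows quoted in this report (\u2018 is the left single quotation mark).
sample_rows = ["(\u201816. 3. 16기준)", ";[1]", "연번"]
counts = category_counts(sample_rows)
for name, n in counts.most_common():
    print(name, n)
```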

Unique

Unique	110
Unique (%)	85.9%
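
The difference between "Distinct" and "Unique" is easy to miss. The sketch below assumes that "Distinct" counts different values and "Unique" counts values that occur exactly once, which is consistent with the figures in this report. `distinct_and_unique` is a hypothetical helper, and `rows` contains only the twenty first and last rows listed in this report, not the whole column.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for n in freq.values() if n == 1)
    return distinct, unique

# First ten and last ten rows as listed in this report.
rows = [
    "(\u201816. 3. 16기준)", ";[1]", "연번", "총계", "1", "2", "3", "4", "5", "6",
    "106", ";", "탄소소재 적용 관심 기업체 현황", ";[5]", "연번", "1", "2", "3", "4", ";",
]
distinct, unique = distinct_and_unique(rows)
print(distinct, unique)  # 14 8
```

In this subset, six values (`연번`, `;`, `1`, `2`, `3`, `4`) occur twice, so they count as distinct but not as unique.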

Sample

1st row	(‘16. 3. 16기준)
2nd row	;[1]
3rd row	연번
4th row	총계
5th row	1

Value	Count	Frequency (%)
	5	3.7%
연번	5	3.7%
3	4	3.0%
1	3	2.2%
2	3	2.2%
4	3	2.2%
5	2	1.5%
62	1	0.7%
63	1	0.7%
66	1	0.7%
Other values (106)	106	79.1%

Most occurring characters

Value	Count	Frequency (%)
1	32	11.2%
3	24	8.4%
6	23	8.1%
2	23	8.1%
4	23	8.1%
5	22	7.7%
7	20	7.0%
9	20	7.0%
8	20	7.0%
0	17	6.0%
Other values (25)	61	21.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	224	78.6%
Other Letter	27	9.5%
Other Punctuation	12	4.2%
Space Separator	9	3.2%
Open Punctuation	6	2.1%
Close Punctuation	6	2.1%
Initial Punctuation	1	0.4%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
번	5	18.5%
연	5	18.5%
기	2	7.4%
소	2	7.4%
적	1	3.7%
현	1	3.7%
체	1	3.7%
업	1	3.7%
심	1	3.7%
관	1	3.7%
Other values (7)	7	25.9%

Decimal Number

Value	Count	Frequency (%)
1	32	14.3%
3	24	10.7%
6	23	10.3%
2	23	10.3%
4	23	10.3%
5	22	9.8%
7	20	8.9%
9	20	8.9%
8	20	8.9%
0	17	7.6%

Other Punctuation

Value	Count	Frequency (%)
;	10	83.3%
.	2	16.7%

Open Punctuation

Value	Count	Frequency (%)
[	5	83.3%
(	1	16.7%

Close Punctuation

Value	Count	Frequency (%)
]	5	83.3%
)	1	16.7%

Space Separator

Value	Count	Frequency (%)
(space)	9	100.0%

Initial Punctuation

Value	Count	Frequency (%)
‘	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	258	90.5%
Hangul	27	9.5%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	32	12.4%
3	24	9.3%
6	23	8.9%
2	23	8.9%
4	23	8.9%
5	22	8.5%
7	20	7.8%
9	20	7.8%
8	20	7.8%
0	17	6.6%
Other values (8)	34	13.2%

Hangul

Value	Count	Frequency (%)
번	5	18.5%
연	5	18.5%
기	2	7.4%
소	2	7.4%
적	1	3.7%
현	1	3.7%
체	1	3.7%
업	1	3.7%
심	1	3.7%
관	1	3.7%
Other values (7)	7	25.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	257	90.2%
Hangul	27	9.5%
Punctuation	1	0.4%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	32	12.5%
3	24	9.3%
6	23	8.9%
2	23	8.9%
4	23	8.9%
5	22	8.6%
7	20	7.8%
9	20	7.8%
8	20	7.8%
0	17	6.6%
Other values (7)	33	12.8%

Hangul

Value	Count	Frequency (%)
번	5	18.5%
연	5	18.5%
기	2	7.4%
소	2	7.4%
적	1	3.7%
현	1	3.7%
체	1	3.7%
업	1	3.7%
심	1	3.7%
관	1	3.7%
Other values (7)	7	25.9%

Punctuation

Value	Count	Frequency (%)
‘	1	100.0%

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

	전라북도 탄소산업 연관 기업 현황
0	(‘16. 3. 16기준)
1	;[1]
2	연번
3	총계
4	1
5	2
6	3
7	4
8	5
9	6

Last rows

	전라북도 탄소산업 연관 기업 현황
118	106
119	;
120	탄소소재 적용 관심 기업체 현황
121	;[5]
122	연번
123	1
124	2
125	3
126	4
127	;

Duplicate rows

Most frequently occurring

	전라북도 탄소산업 연관 기업 현황	# duplicates
4	;	5
5	연번	5
0	1	2
1	2	2
2	3	2
3	4	2
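
The duplicate table can be reconciled with the overview figures by simple arithmetic. The sketch below assumes that "# duplicates" is the number of times each repeated value occurs and that "Duplicate rows" counts the distinct repeated values; both assumptions are inferred from the numbers in this report rather than confirmed by it.

```python
# Occurrence counts taken from the "# duplicates" column above.
duplicate_counts = {";": 5, "연번": 5, "1": 2, "2": 2, "3": 2, "4": 2}
n_unique = 110        # reported "Unique"
n_observations = 128  # reported "Number of observations"

n_duplicate_values = len(duplicate_counts)           # repeated values
n_distinct = n_unique + n_duplicate_values           # should match "Distinct"
n_rows = n_unique + sum(duplicate_counts.values())   # should match the row count
duplicate_pct = 100 * n_duplicate_values / n_observations

print(n_distinct, n_rows, f"{duplicate_pct:.1f}%")  # 116 128 4.7%
```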
