gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	500
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	124
Duplicate rows (%)	24.8%
Total size in memory	12.3 KiB
Average record size in memory	25.3 B

Variable types

Categorical	1
Text	2

Dataset

Description	This file contains the Korea Credit Guarantee Fund's usage log for other administrative information. Please refer to it when using the data.
Author	Korea Credit Guarantee Fund (신용보증기금)
URL	https://www.data.go.kr/data/15093048/fileData.do

Alerts

`최종수정수` has constant value "1"	Constant
Dataset has 124 (24.8%) duplicate rows	Duplicates

Reproduction

Analysis started	2023-12-12 02:00:56.260173
Analysis finished	2023-12-12 02:00:56.490367
Duration	0.23 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

최종수정수
Categorical

CONSTANT

Distinct	1
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	4.0 KiB

1	500

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
1	500	100.0%

[Figure: histogram of lengths of the category]

처리직원번호
Text

Distinct	206
Distinct (%)	41.2%
Missing	0
Missing (%)	0.0%
Memory size	4.0 KiB

Length

Max length	5
Median length	4
Mean length	4.156
Min length	4
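The length summary can be cross-checked with a few lines of Python. This is a minimal sketch using only the ten values visible in the "First rows" table, so the sample mean differs from the full-column mean of 4.156. The last step assumes every value is either 4 or 5 characters long, as the reported min and max suggest.

```python
from statistics import mean, median

# Values copied from the "First rows" table of this report (10 of 500 rows).
sample = ["4120", "3927", "9C713", "6077", "5928",
          "5473", "9C708", "5407", "4223", "5893"]

lengths = [len(value) for value in sample]
summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Cross-check on the reported full-column figures: if lengths are only 4 or 5,
# a mean of 4.156 over 500 rows implies (4.156 - 4) * 500 five-character values.
n_rows = 500
mean_length = 4.156
five_char_values = round((mean_length - 4) * n_rows)
print(five_char_values)
```

The implied number of five-character values matches the count of the letter C reported in the character tables for this variable, which fits the pattern of IDs such as 9C713.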

Characters and Unicode

Total characters	2078
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
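As an illustration of the category breakdown, the sketch below classifies each character of the five sample values with Python's standard `unicodedata` module. The `LABELS` mapping is a hypothetical helper added here to translate the two-letter general-category codes into the names used in this report. Script and block properties are not exposed by the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable.
sample = ["4120", "3927", "9C713", "6077", "5928"]

# Hypothetical helper: map general-category codes to the report's labels.
LABELS = {"Nd": "Decimal Number", "Lu": "Uppercase Letter"}

category_counts = Counter(
    LABELS.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in sample
    for ch in value
)

total = sum(category_counts.values())
for label, count in category_counts.most_common():
    # Same column layout as the report: value, count, frequency.
    print(f"{label}\t{count}\t{count / total:.1%}")
```

On the full column the same tally would be expected to give the 2000 / 78 split shown under "Most occurring categories".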

Unique

Unique	82
Unique (%)	16.4%

Sample

1st row	4120
2nd row	3927
3rd row	9C713
4th row	6077
5th row	5928

Value	Count	Frequency (%)
4005	9	1.8%
9C656	8	1.6%
4051	8	1.6%
5892	7	1.4%
5975	7	1.4%
4134	7	1.4%
3984	7	1.4%
5416	7	1.4%
3602	6	1.2%
9C647	6	1.2%
Other values (196)	428	85.6%

Most occurring characters

Value	Count	Frequency (%)
4	302	14.5%
5	256	12.3%
9	254	12.2%
6	223	10.7%
3	187	9.0%
0	183	8.8%
1	173	8.3%
7	154	7.4%
8	134	6.4%
2	134	6.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	2000	96.2%
Uppercase Letter	78	3.8%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
4	302	15.1%
5	256	12.8%
9	254	12.7%
6	223	11.2%
3	187	9.3%
0	183	9.2%
1	173	8.6%
7	154	7.7%
8	134	6.7%
2	134	6.7%

Uppercase Letter

Value	Count	Frequency (%)
C	78	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	2000	96.2%
Latin	78	3.8%

Most frequent character per script

Common

Value	Count	Frequency (%)
4	302	15.1%
5	256	12.8%
9	254	12.7%
6	223	11.2%
3	187	9.3%
0	183	9.2%
1	173	8.6%
7	154	7.7%
8	134	6.7%
2	134	6.7%

Latin

Value	Count	Frequency (%)
C	78	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2078	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
4	302	14.5%
5	256	12.3%
9	254	12.2%
6	223	10.7%
3	187	9.0%
0	183	8.8%
1	173	8.3%
7	154	7.4%
8	134	6.4%
2	134	6.4%

최초처리직원번호
Text

Distinct	206
Distinct (%)	41.2%
Missing	0
Missing (%)	0.0%
Memory size	4.0 KiB

Length

Max length	5
Median length	4
Mean length	4.156
Min length	4

Characters and Unicode

Total characters	2078
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	82
Unique (%)	16.4%

Sample

1st row	4120
2nd row	3927
3rd row	9C713
4th row	6077
5th row	5928

Value	Count	Frequency (%)
4005	9	1.8%
9C656	8	1.6%
4051	8	1.6%
5892	7	1.4%
5975	7	1.4%
4134	7	1.4%
3984	7	1.4%
5416	7	1.4%
3602	6	1.2%
9C647	6	1.2%
Other values (196)	428	85.6%

Most occurring characters

Value	Count	Frequency (%)
4	302	14.5%
5	256	12.3%
9	254	12.2%
6	223	10.7%
3	187	9.0%
0	183	8.8%
1	173	8.3%
7	154	7.4%
8	134	6.4%
2	134	6.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	2000	96.2%
Uppercase Letter	78	3.8%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
4	302	15.1%
5	256	12.8%
9	254	12.7%
6	223	11.2%
3	187	9.3%
0	183	9.2%
1	173	8.6%
7	154	7.7%
8	134	6.7%
2	134	6.7%

Uppercase Letter

Value	Count	Frequency (%)
C	78	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	2000	96.2%
Latin	78	3.8%

Most frequent character per script

Common

Value	Count	Frequency (%)
4	302	15.1%
5	256	12.8%
9	254	12.7%
6	223	11.2%
3	187	9.3%
0	183	9.2%
1	173	8.6%
7	154	7.7%
8	134	6.7%
2	134	6.7%

Latin

Value	Count	Frequency (%)
C	78	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2078	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
4	302	14.5%
5	256	12.3%
9	254	12.2%
6	223	10.7%
3	187	9.0%
0	183	8.8%
1	173	8.3%
7	154	7.4%
8	134	6.4%
2	134	6.4%

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

	최종수정수	처리직원번호	최초처리직원번호
0	1	4120	4120
1	1	3927	3927
2	1	9C713	9C713
3	1	6077	6077
4	1	5928	5928
5	1	5473	5473
6	1	9C708	9C708
7	1	5407	5407
8	1	4223	4223
9	1	5893	5893

Last rows

	최종수정수	처리직원번호	최초처리직원번호
490	1	9C713	9C713
491	1	9C656	9C656
492	1	5440	5440
493	1	5539	5539
494	1	5482	5482
495	1	9C691	9C691
496	1	5168	5168
497	1	9C642	9C642
498	1	5925	5925
499	1	4051	4051

Duplicate rows

Most frequently occurring

	최종수정수	처리직원번호	최초처리직원번호	# duplicates
18	1	4005	4005	9
21	1	4051	4051	8
108	1	9C656	9C656	8
17	1	3984	3984	7
25	1	4134	4134	7
72	1	5416	5416	7
83	1	5892	5892	7
87	1	5975	5975	7
7	1	3602	3602	6
20	1	4049	4049	6
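
The duplicate figures in this report are consistent with counting distinct row patterns that occur more than once, rather than counting surplus copies. With 최종수정수 constant and the two ID columns agreeing in every row shown, 206 distinct values minus 82 values that occur exactly once leaves 124 repeated patterns, and 124 / 500 = 24.8%. The sketch below applies that count to the 20 rows visible in the sample tables. It rests on two assumptions: the remaining 480 rows are not reproduced here, and this is not necessarily how ydata-profiling computes the figure internally.

```python
from collections import Counter

# IDs copied from the "First rows" and "Last rows" tables (20 of 500 rows).
ids = ["4120", "3927", "9C713", "6077", "5928",
       "5473", "9C708", "5407", "4223", "5893",
       "9C713", "9C656", "5440", "5539", "5482",
       "9C691", "5168", "9C642", "5925", "4051"]

# Each row is (최종수정수, 처리직원번호, 최초처리직원번호); in the rows shown,
# the first column is always 1 and the two ID columns are identical.
rows = [("1", emp_id, emp_id) for emp_id in ids]

row_counts = Counter(rows)
n_distinct = len(row_counts)                                  # distinct row patterns
n_unique = sum(1 for c in row_counts.values() if c == 1)      # patterns seen once
duplicated = {row: c for row, c in row_counts.items() if c > 1}
n_duplicates = len(duplicated)                                # patterns seen 2+ times

print(n_distinct, n_unique, n_duplicates)
print(f"{n_duplicates / len(rows):.1%}")

# Rank repeated patterns by count, as in "Most frequently occurring".
for row, count in sorted(duplicated.items(), key=lambda kv: -kv[1]):
    print(*row, count, sep="\t")
```

In this 20-row subset only 9C713 repeats, so the subset figures are much smaller than the full-dataset ones; the identity distinct − unique = duplicated patterns holds in both.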
