gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	10
Duplicate rows (%)	0.1%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
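
The two memory figures above are related by simple arithmetic. The following minimal sketch only re-derives the average record size from the total size and the row count reported in this table; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table above.
total_kib = 312.5
n_rows = 10_000

total_bytes = total_kib * 1024            # KiB -> bytes
avg_record_bytes = total_bytes / n_rows   # bytes per observation

print(f"{total_bytes:.0f} B total, {avg_record_bytes:.1f} B per record")
```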

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/F/1/datasetView.do

Alerts

Dataset has 10 (0.1%) duplicate rows	Duplicates

Reproduction

Analysis started	2024-03-13 19:20:55.642144
Analysis finished	2024-03-13 19:20:55.838346
Duration	0.2 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	7956
Distinct (%)	79.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
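
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the code the report generator uses; the helper name `category_counts` is hypothetical, and the sample values are the first three rows shown in this report. The standard library exposes general categories (`Lu`, `Nd`, `Pd`, ...) but not script or block properties, so only categories are covered here.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# First three sample rows of the bicycle-number column.
sample = ["SPB-50859", "SPB-50211", "SPB-32017"]
counts = category_counts(sample)

# Lu = Uppercase Letter, Nd = Decimal Number, Pd = Dash Punctuation
for category, n in counts.most_common():
    print(category, n)
```

Each nine-character identifier contributes three uppercase letters, one dash and five digits, which matches the 33.3% / 11.1% / 55.6% split reported for the full column.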

Unique

Unique	6293
Unique (%)	62.9%
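
The report lists both a "Distinct" and a "Unique" count for this column. Assuming "Distinct" means the number of different values and "Unique" means the number of values that occur exactly once, the difference can be sketched as follows. The function name and the toy identifiers are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for n in freq.values() if n == 1)
    return distinct, unique

# Hypothetical toy identifiers: SPB-1 and SPB-3 repeat, SPB-2 and SPB-4 do not.
toy = ["SPB-1", "SPB-1", "SPB-2", "SPB-3", "SPB-3", "SPB-4"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

Under that reading, the figures above imply that 7956 - 6293 = 1663 bicycle numbers appear more than once in the 10000 rows.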

Sample

1st row	SPB-50859
2nd row	SPB-50211
3rd row	SPB-32017
4th row	SPB-46110
5th row	SPB-35521

Value	Count	Frequency (%)
spb-53008	8	0.1%
spb-34154	7	0.1%
spb-46499	5	< 0.1%
spb-48861	5	< 0.1%
spb-34434	5	< 0.1%
spb-32419	5	< 0.1%
spb-43158	5	< 0.1%
spb-42771	5	< 0.1%
spb-48648	5	< 0.1%
spb-33543	4	< 0.1%
Other values (7946)	9946	99.5%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	8157	9.1%
4	7413	8.2%
5	6703	7.4%
1	4416	4.9%
0	4354	4.8%
2	4227	4.7%
Other values (4)	14730	16.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	8157	16.3%
4	7413	14.8%
5	6703	13.4%
1	4416	8.8%
0	4354	8.7%
2	4227	8.5%
8	3902	7.8%
7	3806	7.6%
6	3659	7.3%
9	3363	6.7%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
3	8157	13.6%
4	7413	12.4%
5	6703	11.2%
1	4416	7.4%
0	4354	7.3%
2	4227	7.0%
8	3902	6.5%
7	3806	6.3%
6	3659	6.1%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	8157	9.1%
4	7413	8.2%
5	6703	7.4%
1	4416	4.9%
0	4354	4.8%
2	4227	4.7%
Other values (4)	14730	16.4%

등록일시 (registration date and time)
Date

Distinct	183
Distinct (%)	1.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2021-07-01 00:00:00
Maximum	2021-12-30 00:00:00
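
The minimum, maximum and distinct count can be cross-checked against each other. The sketch below assumes the values are whole days with no time-of-day component (the bounds above are both at 00:00:00); under that assumption it compares the number of calendar days in the range with the reported distinct count.

```python
from datetime import date

# Bounds and distinct count copied from the report above.
first = date(2021, 7, 1)
last = date(2021, 12, 30)
distinct_dates = 183

# Inclusive number of calendar days between the two bounds.
calendar_days = (last - first).days + 1

print(calendar_days, calendar_days == distinct_dates)
```

If the whole-day assumption holds, equal numbers mean that every day in the range has at least one record.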

Histogram

Histogram with fixed size bins (bins=50)

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타 (other)	2818
타이어 (tire)	2057
체인 (chain)	1951
안장 (saddle)	1847
페달 (pedal)	879

Length

Max length	4
Median length	3
Mean length	2.738
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	체인 (chain)
2nd row	타이어 (tire)
3rd row	기타 (other)
4th row	타이어 (tire)
5th row	체인 (chain)

Common Values

Value	Count	Frequency (%)
기타 (other)	2818	28.2%
타이어 (tire)	2057	20.6%
체인 (chain)	1951	19.5%
안장 (saddle)	1847	18.5%
페달 (pedal)	879	8.8%
단말기 (terminal)	448	4.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Missing values

Count
A simple visualization of nullity by column.

Matrix
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
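
To make the idea of a nullity matrix concrete, here is a minimal sketch that builds one as a grid of booleans, one row per record and one column per variable. The function name is hypothetical, and the missing value in the second record is invented for illustration; the report itself shows zero missing cells.

```python
def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where a value is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]

columns = ["자전거번호", "등록일시", "고장구분"]
rows = [
    {"자전거번호": "SPB-50859", "등록일시": "2021-11-23", "고장구분": "체인"},
    # Hypothetical gap, not present in the real data.
    {"자전거번호": "SPB-50211", "등록일시": None, "고장구분": "타이어"},
]

matrix = nullity_matrix(rows, columns)
for line in matrix:
    # "#" marks a present value, "." marks a missing one.
    print("".join("#" if present else "." for present in line))
```

For this dataset every cell would be marked as present, so the plotted matrix is a solid block.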

Sample

First rows

	자전거번호 (bicycle number)	등록일시 (registration date)	고장구분 (fault type)
85289	SPB-50859	2021-11-23	체인
18983	SPB-50211	2021-08-02	타이어
30583	SPB-32017	2021-08-19	기타
19583	SPB-46110	2021-08-03	타이어
76393	SPB-35521	2021-11-01	체인
25062	SPB-43149	2021-08-11	타이어
36163	SPB-47135	2021-08-29	기타
80375	SPB-48179	2021-11-11	안장
73957	SPB-40683	2021-10-28	체인
91455	SPB-48731	2021-12-13	안장

Last rows

	자전거번호 (bicycle number)	등록일시 (registration date)	고장구분 (fault type)
4934	SPB-31325	2021-07-09	체인
74759	SPB-40236	2021-10-29	기타
60910	SPB-43231	2021-10-05	단말기
36778	SPB-45876	2021-08-30	타이어
55804	SPB-42269	2021-09-26	기타
19541	SPB-38013	2021-08-03	안장
30164	SPB-31092	2021-08-19	체인
67834	SPB-47111	2021-10-17	안장
78012	SPB-43273	2021-11-04	체인
61066	SPB-47488	2021-10-05	단말기

Duplicate rows

Most frequently occurring

	자전거번호 (bicycle number)	등록일시 (registration date)	고장구분 (fault type)	# duplicates
0	SPB-31187	2021-09-10	페달	2
1	SPB-31279	2021-09-02	체인	2
2	SPB-32593	2021-08-28	타이어	2
3	SPB-33401	2021-07-04	안장	2
4	SPB-33517	2021-08-10	체인	2
5	SPB-37222	2021-09-16	체인	2
6	SPB-43277	2021-07-06	타이어	2
7	SPB-47682	2021-09-21	타이어	2
8	SPB-51582	2021-08-05	타이어	2
9	SPB-53899	2021-08-11	안장	2
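
A table like the one above can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below is one plain-Python way to do that and is not necessarily how the report generator computes it; the function name is hypothetical, and the three rows are taken from the tables in this report, with the first one repeated to mimic a duplicate.

```python
from collections import Counter

def duplicate_groups(rows):
    """Map each fully identical row to its count, keeping only repeated rows."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [
    ("SPB-31187", "2021-09-10", "페달"),
    ("SPB-31187", "2021-09-10", "페달"),   # exact repeat of the row above
    ("SPB-50859", "2021-11-23", "체인"),
]

groups = duplicate_groups(rows)
for row, n in groups.items():
    print(*row, n, sep="\t")
```

Rows must be hashable for `Counter`, which is why each record is represented as a tuple.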