gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	20
Duplicate rows (%)	0.2%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
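
As a quick consistency check, the average record size should equal the total memory divided by the number of observations. A minimal sketch, assuming 1 KiB is 1024 bytes and using only the figures listed above:

```python
# Figures copied from the dataset statistics above.
total_kib = 312.5        # total size in memory, in KiB
n_observations = 10_000  # number of observations

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Under that assumption the result agrees with the reported 32.0 B.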

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government
URL	https://data.seoul.go.kr/dataList/OA-15644/A/1/datasetView.do

Alerts

Dataset has 20 (0.2%) duplicate rows	Duplicates

Reproduction

Analysis started	2024-05-11 00:11:49.477560
Analysis finished	2024-05-11 00:11:50.137805
Duration	0.66 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	7501
Distinct (%)	75.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
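
To illustrate what a character property looks like in practice, here is a minimal sketch that uses Python's standard `unicodedata` module to count the general categories in one sample value. The helper name `category_counts` is hypothetical, and this is not necessarily how the report computes its figures. Scripts and blocks are not exposed by the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lu', 'Nd', 'Pd')."""
    return Counter(unicodedata.category(ch) for ch in text)

# First sample value from the report.
counts = category_counts("SPB-33048")

# Lu = Uppercase Letter, Pd = Dash Punctuation, Nd = Decimal Number
for code, n in counts.items():
    print(code, n)
```

The sample value contains three uppercase letters, one dash, and five decimal digits, which are the same three categories the report lists.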

Unique

Unique	5654
Unique (%)	56.5%
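
The report lists both "Distinct" and "Unique". A minimal sketch of the difference, assuming "distinct" means the number of different values and "unique" means the number of values that occur exactly once; the toy column and the helper name are hypothetical and not taken from the dataset:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Hypothetical toy column: one value appears once, the others repeat.
sample = ["SPB-1", "SPB-2", "SPB-2", "SPB-3", "SPB-3", "SPB-3"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```

Under this reading, unique can never exceed distinct, which is consistent with 5654 unique values against 7501 distinct ones.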

Sample

1st row	SPB-33048
2nd row	SPB-31122
3rd row	SPB-41290
4th row	SPB-32481
5th row	SPB-34211

Value	Count	Frequency (%)
spb-46909	8	0.1%
spb-46529	8	0.1%
spb-30524	7	0.1%
spb-41634	7	0.1%
spb-32055	7	0.1%
spb-32397	6	0.1%
spb-41083	6	0.1%
spb-32401	6	0.1%
spb-32617	6	0.1%
spb-46553	6	0.1%
Other values (7491)	9933	99.3%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	9169	10.2%
4	7298	8.1%
5	6027	6.7%
0	4487	5.0%
1	4463	5.0%
2	4462	5.0%
Other values (4)	14094	15.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	9169	18.3%
4	7298	14.6%
5	6027	12.1%
0	4487	9.0%
1	4463	8.9%
2	4462	8.9%
6	3698	7.4%
7	3582	7.2%
8	3462	6.9%
9	3352	6.7%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
3	9169	15.3%
4	7298	12.2%
5	6027	10.0%
0	4487	7.5%
1	4463	7.4%
2	4462	7.4%
6	3698	6.2%
7	3582	6.0%
8	3462	5.8%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	9169	10.2%
4	7298	8.1%
5	6027	6.7%
0	4487	5.0%
1	4463	5.0%
2	4462	5.0%
Other values (4)	14094	15.7%

등록일시 (registration date)
Date

Distinct	149
Distinct (%)	1.5%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2021-02-01 00:00:00
Maximum	2021-06-29 00:00:00

Histogram

Histogram with fixed size bins (bins=50)
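
The dates in the sample rows are not zero-padded (for example `2021-4-9`), and the histogram uses equal-width bins. The sketch below shows one way to parse such dates and count them in fixed-size bins using only the standard library. The function names are hypothetical, the binning rule is a generic equal-width scheme rather than the report's own implementation, and three bins are used only because the example has five dates.

```python
from datetime import datetime

def parse_date(text: str) -> datetime:
    """Parse dates such as '2021-5-25' (month and day may lack zero padding)."""
    return datetime.strptime(text, "%Y-%m-%d")

def fixed_bin_counts(dates, bins):
    """Count dates in `bins` equal-width intervals between the min and max date."""
    lo, hi = min(dates), max(dates)
    width = (hi - lo) / bins  # a timedelta per bin
    counts = [0] * bins
    for d in dates:
        # The maximum date is placed in the last bin instead of overflowing.
        idx = min(int((d - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts

# Registration dates of the first five sample rows in the report.
dates = [parse_date(s) for s in
         ["2021-5-25", "2021-6-24", "2021-3-25", "2021-4-9", "2021-6-14"]]
counts = fixed_bin_counts(dates, bins=3)
print(min(dates).date(), max(dates).date(), counts)
```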

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타 (other)	2823
체인 (chain)	2331
안장 (saddle)	1984
타이어 (tire)	1462
페달 (pedal)	860

Length

Max length	4
Median length	2
Mean length	2.6287
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	체인
2nd row	체인
3rd row	안장
4th row	타이어
5th row	체인

Common Values

Value	Count	Frequency (%)
기타 (other)	2823	28.2%
체인 (chain)	2331	23.3%
안장 (saddle)	1984	19.8%
타이어 (tire)	1462	14.6%
페달 (pedal)	860	8.6%
단말기 (terminal)	540	5.4%
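
The frequency column can be rederived from the counts. A minimal sketch, assuming each percentage is the count divided by the total and rounded to one decimal place:

```python
# Counts copied from the Common Values table.
counts = {
    "기타": 2823,
    "체인": 2331,
    "안장": 1984,
    "타이어": 1462,
    "페달": 860,
    "단말기": 540,
}

total = sum(counts.values())

# Percentage of all observations, rounded to one decimal place.
freq = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in freq.items():
    print(f"{value}\t{counts[value]}\t{pct}%")
```

The six counts add up to the 10000 observations, so no category is hidden behind an "Other values" row.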

Length

Histogram of lengths of the category

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
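
The per-column nullity count can be sketched as follows. This is a minimal illustration that assumes a cell is missing when it is `None` or an empty string; the helper name is hypothetical, and the report's own definition of missing is not shown in the source.

```python
def missing_per_column(rows, columns):
    """Count missing cells (None or empty string) for each column."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                missing[col] += 1
    return missing

columns = ["자전거번호", "등록일시", "고장구분"]

# Two sample rows copied from the report; neither has a missing cell.
rows = [
    {"자전거번호": "SPB-33048", "등록일시": "2021-5-25", "고장구분": "체인"},
    {"자전거번호": "SPB-31122", "등록일시": "2021-6-24", "고장구분": "체인"},
]

missing = missing_per_column(rows, columns)
print(missing)
```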

First rows

	자전거번호	등록일시	고장구분
33396	SPB-33048	2021-5-25	체인
50541	SPB-31122	2021-6-24	체인
5062	SPB-41290	2021-3-25	안장
9705	SPB-32481	2021-4-9	타이어
44355	SPB-34211	2021-6-14	체인
45252	SPB-51574	2021-6-15	기타
50833	SPB-51912	2021-6-24	타이어
16466	SPB-37051	2021-4-23	체인
27473	SPB-41215	2021-5-13	기타
51142	SPB-43305	2021-6-25	타이어

Last rows

	자전거번호	등록일시	고장구분
5361	SPB-41620	2021-3-26	체인
49254	SPB-42028	2021-6-22	단말기
39618	SPB-54643	2021-6-6	기타
9852	SPB-31766	2021-4-9	체인
5591	SPB-42575	2021-3-26	타이어
6139	SPB-38283	2021-3-30	기타
35834	SPB-31900	2021-5-30	체인
40968	SPB-36313	2021-6-8	타이어
5114	SPB-41481	2021-3-25	체인
50133	SPB-36764	2021-6-23	체인

Most frequently occurring

	자전거번호	등록일시	고장구분	# duplicates
0	SPB-33486	2021-4-19	기타	2
1	SPB-33834	2021-5-13	체인	2
2	SPB-34029	2021-4-29	체인	2
3	SPB-36594	2021-3-31	안장	2
4	SPB-36693	2021-4-7	타이어	2
5	SPB-38274	2021-5-17	타이어	2
6	SPB-38347	2021-6-4	체인	2
7	SPB-39490	2021-5-22	타이어	2
8	SPB-39586	2021-4-6	체인	2
9	SPB-40365	2021-5-19	타이어	2
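
Exact duplicate rows like these can be found by counting identical tuples. A minimal sketch with the standard library; the helper name is hypothetical, and the source does not state whether the "Duplicate rows" figure counts groups of identical rows or the extra copies, so the sketch reports both.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: occurrences} for rows that appear more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [
    ("SPB-33486", "2021-4-19", "기타"),
    ("SPB-33486", "2021-4-19", "기타"),  # exact repeat, as in the table above
    ("SPB-33048", "2021-5-25", "체인"),  # appears only once in this sample
]

groups = duplicate_groups(rows)
n_groups = len(groups)                             # distinct duplicated rows
n_extra_copies = sum(n - 1 for n in groups.values())  # rows beyond the first copy

print(groups, n_groups, n_extra_copies)
```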
