gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	18
Duplicate rows (%)	0.2%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
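
The derived figures in the table above follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes; the variable names are illustrative and are not part of the report:

```python
# Figures copied from the "Dataset statistics" table above.
n_rows = 10_000
n_duplicate_rows = 18
total_size_kib = 312.5

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (assumption: 1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```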

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/A/1/datasetView.do

Alerts

Dataset has 18 (0.2%) duplicate rows

Duplicates

Reproduction

Analysis started	2024-05-11 00:11:44.560938
Analysis finished	2024-05-11 00:11:45.170125
Duration	0.61 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	7967
Distinct (%)	79.7%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
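
As a rough illustration of the category breakdown, the sketch below classifies the characters of one sample value with Python's standard `unicodedata` module. This is an assumption-laden stand-in, not necessarily how the report computes its figures, and it covers general categories only, since the standard library does not expose scripts or blocks.

```python
import unicodedata
from collections import Counter

# One value from the report's sample rows. All values are 9 characters long,
# consistent with the per-character counts reported for this column.
value = "SPB-51759"

# General category of each character:
# Lu = Uppercase Letter, Nd = Decimal Number, Pd = Dash Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in value)

for category, count in category_counts.most_common():
    share = 100 * count / len(value)
    print(f"{category}\t{count}\t{share:.1f}%")
```

For this value the shares come out as 5/9, 3/9 and 1/9, the same proportions the report lists for Decimal Number, Uppercase Letter and Dash Punctuation.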

Unique

Unique	6343
Unique (%)	63.4%
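
"Distinct" and "Unique" measure different things: of the 7967 distinct values, 6343 occur exactly once, so the remaining 1624 occur more than once. A minimal sketch of the distinction on a hypothetical mini-column (the values below are invented for illustration):

```python
from collections import Counter

# Hypothetical mini-column; the real column has 10,000 values.
values = ["SPB-1", "SPB-2", "SPB-2", "SPB-3", "SPB-3", "SPB-3", "SPB-4"]

counts = Counter(values)

# Distinct: how many different values are present.
n_distinct = len(counts)

# Unique: how many values occur exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(f"Distinct: {n_distinct}, Unique: {n_unique}")
```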

Sample

1st row	SPB-51759
2nd row	SPB-39634
3rd row	SPB-30136
4th row	SPB-31922
5th row	SPB-51618

Value	Count	Frequency (%)
spb-53491	7	0.1%
spb-33819	7	0.1%
spb-51821	6	0.1%
spb-35080	5	< 0.1%
spb-46356	5	< 0.1%
spb-32122	5	< 0.1%
spb-37659	5	< 0.1%
spb-47973	5	< 0.1%
spb-52258	5	< 0.1%
spb-41321	5	< 0.1%
Other values (7957)	9945	99.5%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	8062	9.0%
4	7508	8.3%
5	6643	7.4%
1	4385	4.9%
2	4309	4.8%
0	4196	4.7%
Other values (4)	14897	16.6%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	8062	16.1%
4	7508	15.0%
5	6643	13.3%
1	4385	8.8%
2	4309	8.6%
0	4196	8.4%
8	3865	7.7%
7	3826	7.7%
6	3754	7.5%
9	3452	6.9%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
3	8062	13.4%
4	7508	12.5%
5	6643	11.1%
1	4385	7.3%
2	4309	7.2%
0	4196	7.0%
8	3865	6.4%
7	3826	6.4%
6	3754	6.3%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	8062	9.0%
4	7508	8.3%
5	6643	7.4%
1	4385	4.9%
2	4309	4.8%
0	4196	4.7%
Other values (4)	14897	16.6%

등록일시 (registration date and time)
Date

Distinct	183
Distinct (%)	1.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2021-07-01 00:00:00
Maximum	2021-12-30 00:00:00

Histogram

Histogram with fixed size bins (bins=50)
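
The histogram image is not preserved here, but fixed-size binning over the reported date range can be sketched as follows. The helper `bin_index` is hypothetical and uses integer day arithmetic, so bin edges may differ slightly from those of the plotting library behind the report; it also assumes the maximum is later than the minimum.

```python
from datetime import date


def bin_index(day, start, end, bins=50):
    """Return the index of the fixed-width bin that `day` falls into."""
    span_days = (end - start).days      # assumes end > start
    offset_days = (day - start).days
    # Scale the offset into [0, bins); the maximum lands in the last bin.
    return min(offset_days * bins // span_days, bins - 1)


start = date(2021, 7, 1)    # Minimum from the report
end = date(2021, 12, 30)    # Maximum from the report

print("Range in days:", (end - start).days)
print("Bin of 2021-11-17:", bin_index(date(2021, 11, 17), start, end))
```

The range spans 182 days, i.e. 183 calendar dates, which matches the 183 distinct values reported for this variable.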

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타	2791
타이어	2116
체인	1975
안장	1817
페달	902

Length

Max length	4
Median length	3
Mean length	2.7422
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	단말기
2nd row	안장
3rd row	체인
4th row	기타
5th row	타이어

Common Values

Value	Count	Frequency (%)
기타 (other)	2791	27.9%
타이어 (tire)	2116	21.2%
체인 (chain)	1975	19.8%
안장 (saddle)	1817	18.2%
페달 (pedal)	902	9.0%
단말기 (terminal)	399	4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
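
Since the plots themselves are not preserved here, the following sketch shows what the two views summarize. The rows are hypothetical, with `None` marking a missing cell; the profiled dataset has no missing cells, so its matrix would be completely filled.

```python
# Hypothetical rows; None marks a missing cell.
columns = ["자전거번호", "등록일시", "고장구분"]
rows = [
    ("SPB-51759", "2021-11-17", "단말기"),
    ("SPB-39634", None, "안장"),
    (None, "2021-08-18", None),
]

# Matrix view: True where a value is present, False where it is missing.
nullity = [[cell is not None for cell in row] for row in rows]

# Count view: number of present values per column.
present_per_column = [sum(col) for col in zip(*nullity)]

for row in nullity:
    print("".join("#" if present else "." for present in row))
print(dict(zip(columns, present_per_column)))
```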

First rows

	자전거번호	등록일시	고장구분
83293	SPB-51759	2021-11-17	단말기
79739	SPB-39634	2021-11-07	안장
29544	SPB-30136	2021-08-18	체인
91373	SPB-31922	2021-12-13	기타
42370	SPB-51618	2021-09-08	타이어
80046	SPB-36171	2021-11-10	기타
72942	SPB-47538	2021-10-26	타이어
81226	SPB-41996	2021-11-12	타이어
38690	SPB-43434	2021-09-02	기타
71782	SPB-38219	2021-10-24	안장

Last rows

	자전거번호	등록일시	고장구분
60972	SPB-36072	2021-10-05	기타
3165	SPB-49032	2021-07-06	기타
48513	SPB-33480	2021-09-16	체인
32115	SPB-47981	2021-08-22	체인
83532	SPB-57848	2021-11-18	기타
67187	SPB-56075	2021-10-15	타이어
69253	SPB-32670	2021-10-20	기타
91172	SPB-54861	2021-12-12	안장
12236	SPB-54021	2021-07-21	안장
89418	SPB-58508	2021-12-07	안장

Most frequently occurring

	자전거번호	등록일시	고장구분	# duplicates
0	SPB-30128	2021-07-12	타이어	2
1	SPB-30647	2021-08-28	페달	2
2	SPB-31122	2021-09-02	안장	2
3	SPB-32312	2021-07-29	체인	2
4	SPB-32404	2021-10-09	페달	2
5	SPB-33179	2021-08-10	체인	2
6	SPB-33570	2021-10-20	페달	2
7	SPB-34163	2021-07-05	페달	2
8	SPB-34167	2021-09-02	안장	2
9	SPB-35643	2021-08-20	타이어	2
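
A table like the one above can be approximated by counting identical rows. The sketch below uses a handful of rows in (자전거번호, 등록일시, 고장구분) order, with one row repeated for illustration; it is not the report's own implementation.

```python
from collections import Counter

# Small illustrative subset; the first row is repeated to create a duplicate.
rows = [
    ("SPB-30128", "2021-07-12", "타이어"),
    ("SPB-30128", "2021-07-12", "타이어"),
    ("SPB-30647", "2021-08-28", "페달"),
    ("SPB-51759", "2021-11-17", "단말기"),
]

row_counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in row_counts.most_common() if n > 1]

for row, n in duplicates:
    print(*row, n, sep="\t")
```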
