gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	18
Duplicate rows (%)	0.2%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
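
The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming that KiB means 1024 bytes and that percentages are rounded to one decimal place:

```python
# Raw figures copied from the "Dataset statistics" table.
n_rows = 10_000
n_duplicate_rows = 18
total_size_kib = 312.5  # assumption: 1 KiB = 1024 bytes

# Share of duplicate rows, rounded the way the report displays it.
duplicate_pct = round(100 * n_duplicate_rows / n_rows, 1)

# Average record size is total memory divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows (%): {duplicate_pct}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```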

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/F/1/datasetView.do

Alerts

Dataset has 18 (0.2%) duplicate rows

Duplicates

Reproduction

Analysis started	2024-03-13 19:20:57.366397
Analysis finished	2024-03-13 19:20:57.589063
Duration	0.22 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
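
The reported duration can be recomputed from the two timestamps. A small sketch using only the standard library, with both timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the "Reproduction" section.
started = datetime.fromisoformat("2024-03-13 19:20:57.366397")
finished = datetime.fromisoformat("2024-03-13 19:20:57.589063")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```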

자전거번호 (bicycle number)
Text

Distinct	7499
Distinct (%)	75.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
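
As an illustration of this kind of analysis, the sketch below counts Unicode general categories for the five sample values shown later in this section. It uses Python's standard `unicodedata` module, which exposes the general category (`Lu`, `Nd`, `Pd`) but not the script or block properties, so only the category breakdown is reproduced here. This is not necessarily how the report itself computes the figures.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable.
sample = ["SPB-41758", "SPB-31142", "SPB-45328", "SPB-32630", "SPB-35372"]

# Count the general category of every character in every value.
# Lu = Uppercase Letter, Nd = Decimal Number, Pd = Dash Punctuation.
category_counts = Counter(
    unicodedata.category(ch) for value in sample for ch in value
)

distinct_characters = {ch for value in sample for ch in value}

for category, count in category_counts.most_common():
    print(category, count)
```

Each identifier contributes three uppercase letters, one dash, and five digits, which matches the 33.3% / 11.1% / 55.6% split reported for the full column.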

Unique

Unique	5646
Unique (%)	56.5%
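
"Distinct" and "Unique" measure different things: as these reports are usually read, distinct counts the different values present, while unique counts the values that occur exactly once. A minimal sketch on a hypothetical mini-column (the identifiers below are made up for illustration and are not from the dataset):

```python
from collections import Counter

# Hypothetical values, chosen so that some repeat and one does not.
values = ["SPB-1", "SPB-2", "SPB-2", "SPB-3", "SPB-3", "SPB-3"]

counts = Counter(values)

# Distinct: how many different values appear at all.
distinct = len(counts)

# Unique: how many values appear exactly once.
unique = sum(1 for n in counts.values() if n == 1)

print(f"Distinct: {distinct}, Unique: {unique}")
```

Under that reading, the 7499 distinct bicycle numbers include 5646 that appear only once, and the remainder appear in more than one record.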

Sample

1st row	SPB-41758
2nd row	SPB-31142
3rd row	SPB-45328
4th row	SPB-32630
5th row	SPB-35372

Value	Count	Frequency (%)
spb-46515	11	0.1%
spb-46909	8	0.1%
spb-32409	6	0.1%
spb-37064	6	0.1%
spb-41620	6	0.1%
spb-37011	6	0.1%
spb-33034	6	0.1%
spb-52324	6	0.1%
spb-45373	6	0.1%
spb-34165	6	0.1%
Other values (7489)	9933	99.3%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	9076	10.1%
4	7433	8.3%
5	5878	6.5%
0	4489	5.0%
1	4467	5.0%
2	4383	4.9%
Other values (4)	14274	15.9%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	9076	18.2%
4	7433	14.9%
5	5878	11.8%
0	4489	9.0%
1	4467	8.9%
2	4383	8.8%
6	3772	7.5%
7	3599	7.2%
8	3496	7.0%
9	3407	6.8%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
3	9076	15.1%
4	7433	12.4%
5	5878	9.8%
0	4489	7.5%
1	4467	7.4%
2	4383	7.3%
6	3772	6.3%
7	3599	6.0%
8	3496	5.8%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	9076	10.1%
4	7433	8.3%
5	5878	6.5%
0	4489	5.0%
1	4467	5.0%
2	4383	4.9%
Other values (4)	14274	15.9%

등록일시 (registration date and time)
Date

Distinct	149
Distinct (%)	1.5%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2021-02-01 00:00:00
Maximum	2021-06-29 00:00:00

Histogram

Histogram with fixed size bins (bins=50)
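
To give a sense of the bin resolution, the sketch below derives the bin width from the reported minimum and maximum. It assumes 50 equal-width bins spanning exactly that range, which is a common convention for fixed-size histograms but is an assumption about how this plot was drawn.

```python
from datetime import datetime

# Range copied from the Minimum / Maximum rows above.
minimum = datetime(2021, 2, 1)
maximum = datetime(2021, 6, 29)
bins = 50  # from the histogram caption

# Assumption: bins are equal-width and cover [minimum, maximum].
span = maximum - minimum
bin_width = span / bins
edges = [minimum + i * bin_width for i in range(bins + 1)]

print(f"Span: {span.days} days")
print(f"Bin width: {bin_width.total_seconds() / 86400:.2f} days")
```

With a span of 148 days, each bin covers just under three days, so individual bars aggregate roughly three of the 149 distinct dates.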

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타 (other)	2807
체인 (chain)	2356
안장 (saddle)	2004
타이어 (tire)	1452
페달 (pedal)	878

Length

Max length	4
Median length	2
Mean length	2.6214
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	안장
2nd row	체인
3rd row	기타
4th row	안장
5th row	기타

Common Values

Value	Count	Frequency (%)
기타 (other)	2807	28.1%
체인 (chain)	2356	23.6%
안장 (saddle)	2004	20.0%
타이어 (tire)	1452	14.5%
페달 (pedal)	878	8.8%
단말기 (terminal)	503	5.0%
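
The Frequency (%) column is each count divided by the number of observations. A short sketch that recomputes it from the counts in the table, assuming rounding to one decimal place:

```python
# Counts copied from the "Common Values" table.
counts = {
    "기타": 2807,
    "체인": 2356,
    "안장": 2004,
    "타이어": 1452,
    "페달": 878,
    "단말기": 503,
}

total = sum(counts.values())

# Frequency as a percentage of all observations, one decimal place.
frequencies = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in frequencies.items():
    print(f"{value}\t{counts[value]}\t{pct}%")
```

The six counts sum to the 10000 observations, consistent with the variable having no missing values.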

Length

Histogram of lengths of the category

Common Values (Plot)

Missing values

Count
A simple visualization of nullity by column.

Matrix
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

First rows

	자전거번호	등록일시	고장구분
35206	SPB-41758	2021-5-28	안장
11030	SPB-31142	2021-4-11	체인
42442	SPB-45328	2021-6-11	기타
48856	SPB-32630	2021-6-21	안장
44751	SPB-35372	2021-6-14	기타
28552	SPB-51681	2021-5-14	페달
24443	SPB-32949	2021-5-8	기타
48387	SPB-33575	2021-6-21	기타
49658	SPB-30850	2021-6-23	페달
52594	SPB-44877	2021-6-27	페달

Last rows

	자전거번호	등록일시	고장구분
39497	SPB-50417	2021-6-6	타이어
44111	SPB-49068	2021-6-14	체인
9037	SPB-32583	2021-4-8	페달
8601	SPB-32360	2021-4-7	단말기
30507	SPB-43408	2021-5-19	안장
41491	SPB-38649	2021-6-9	기타
39812	SPB-38168	2021-6-6	기타
3382	SPB-46026	2021-3-14	페달
31659	SPB-33348	2021-5-22	기타
47189	SPB-37009	2021-6-19	타이어
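
The dates in these sample rows are written without zero padding (for example `2021-5-8`), whereas the 등록일시 summary shows full timestamps. A minimal sketch of parsing such strings with the standard library, using three values copied from the rows above:

```python
from datetime import datetime

# Date strings as they appear in the sample rows (no zero padding).
raw_dates = ["2021-5-28", "2021-4-11", "2021-5-8"]

# strptime accepts single-digit months and days for %m and %d.
parsed = [datetime.strptime(text, "%Y-%m-%d") for text in raw_dates]

for text, value in zip(raw_dates, parsed):
    print(text, "->", value.isoformat(sep=" "))
```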

Most frequently occurring

	자전거번호	등록일시	고장구분	# duplicates
0	SPB-30143	2021-4-26	체인	2
1	SPB-30445	2021-2-26	안장	2
2	SPB-31450	2021-4-17	기타	2
3	SPB-32176	2021-4-9	페달	2
4	SPB-32484	2021-5-13	체인	2
5	SPB-33034	2021-4-23	체인	2
6	SPB-33420	2021-4-10	체인	2
7	SPB-34233	2021-6-2	체인	2
8	SPB-35595	2021-2-2	단말기	2
9	SPB-37779	2021-5-6	기타	2
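
A table like this can be produced by counting identical rows. The sketch below does so with the standard library on three rows excerpted from this report (the first row is repeated, as the "# duplicates" value of 2 indicates); it is an illustration of the idea, not the profiler's own implementation.

```python
from collections import Counter

# Rows excerpted from the tables in this report. The first record
# is listed twice because the report gives it "# duplicates" = 2.
rows = [
    ("SPB-30143", "2021-4-26", "체인"),
    ("SPB-30143", "2021-4-26", "체인"),
    ("SPB-41758", "2021-5-28", "안장"),
]

# Tuples are hashable, so whole rows can be counted directly.
row_counts = Counter(rows)

# Keep only the rows that occur more than once.
duplicates = {row: n for row, n in row_counts.items() if n > 1}

for row, n in duplicates.items():
    print(*row, n, sep="\t")
```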
