gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	6749
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	158.3 KiB
Average record size in memory	24.0 B
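
The average record size follows from the two figures above it: total size in memory divided by the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 158.3   # Total size in memory, in KiB
n_observations = 6749    # Number of observations

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```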

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/A/1/datasetView.do

Reproduction

Analysis started	2024-05-11 00:11:54.464972
Analysis finished	2024-05-11 00:11:55.116410
Duration	0.65 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	4974
Distinct (%)	73.7%
Missing	0
Missing (%)	0.0%
Memory size	52.9 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	60741
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
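
As a rough illustration of how character categories like those in this report can be derived, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiling tool computes its figures. The codes `Nd`, `Lu`, and `Pd` stand for Decimal Number, Uppercase Letter, and Dash Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Lu', 'Nd', 'Pd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample rows shown for this variable in the report.
sample = ["SPB-37924", "SPB-43077", "SPB-32462", "SPB-39801", "SPB-37194"]
counts = category_counts(sample)
print(counts)
```

Each 9-character value contributes five decimal digits, three uppercase letters, and one dash, which matches the 55.6% / 33.3% / 11.1% split reported for the full column.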

Unique

Unique	3691
Unique (%)	54.7%
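
Distinct and Unique measure different things here: Distinct is the number of different values present, while Unique is the number of values that occur exactly once. A small sketch of the distinction, using made-up bicycle numbers (hypothetical data, not taken from the dataset) and a hypothetical helper name:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

# Hypothetical values: one appears once, one twice, one three times.
values = ["SPB-00001", "SPB-00002", "SPB-00002",
          "SPB-00003", "SPB-00003", "SPB-00003"]
n_distinct, n_unique = distinct_and_unique(values)
print(n_distinct, n_unique)
```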

Sample

1st row	SPB-37924
2nd row	SPB-43077
3rd row	SPB-32462
4th row	SPB-39801
5th row	SPB-37194

Value	Count	Frequency (%)
spb-31084	10	0.1%
spb-30722	7	0.1%
spb-34307	7	0.1%
spb-32743	7	0.1%
spb-50197	6	0.1%
spb-32106	6	0.1%
spb-42382	6	0.1%
spb-02189	6	0.1%
spb-35868	6	0.1%
spb-36054	6	0.1%
Other values (4964)	6682	99.0%

Most occurring characters

Value	Count	Frequency (%)
3	6792	11.2%
S	6749	11.1%
P	6749	11.1%
B	6749	11.1%
-	6749	11.1%
4	4287	7.1%
5	3536	5.8%
2	3332	5.5%
1	3310	5.4%
0	3266	5.4%
Other values (4)	9222	15.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	33745	55.6%
Uppercase Letter	20247	33.3%
Dash Punctuation	6749	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	6792	20.1%
4	4287	12.7%
5	3536	10.5%
2	3332	9.9%
1	3310	9.8%
0	3266	9.7%
9	2327	6.9%
6	2325	6.9%
8	2289	6.8%
7	2281	6.8%

Uppercase Letter

Value	Count	Frequency (%)
S	6749	33.3%
P	6749	33.3%
B	6749	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	6749	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	40494	66.7%
Latin	20247	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
3	6792	16.8%
-	6749	16.7%
4	4287	10.6%
5	3536	8.7%
2	3332	8.2%
1	3310	8.2%
0	3266	8.1%
9	2327	5.7%
6	2325	5.7%
8	2289	5.7%

Latin

Value	Count	Frequency (%)
S	6749	33.3%
P	6749	33.3%
B	6749	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	60741	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
3	6792	11.2%
S	6749	11.1%
P	6749	11.1%
B	6749	11.1%
-	6749	11.1%
4	4287	7.1%
5	3536	5.8%
2	3332	5.5%
1	3310	5.4%
0	3266	5.4%
Other values (4)	9222	15.2%

등록일시 (registration date and time)
Date

Distinct	5778
Distinct (%)	85.6%
Missing	0
Missing (%)	0.0%
Memory size	52.9 KiB

Minimum	2020-11-01 00:10:00
Maximum	2021-01-30 23:23:00

Histogram

Histogram with fixed size bins (bins=50)
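
A histogram with fixed-size bins divides the range between the minimum and the maximum into equal-width intervals. The sketch below shows one way to assign a timestamp to one of 50 such bins, using the minimum and maximum reported above. The function name `bin_index` and the choice to place the maximum in the last bin are assumptions made for illustration, not a description of the tool's internals.

```python
from datetime import datetime

def bin_index(ts, start, end, bins=50):
    """Return the 0-based index of the equal-width bin that contains ts."""
    width = (end - start) / bins        # time span covered by one bin
    idx = int((ts - start) / width)     # timedelta / timedelta gives a float
    return min(idx, bins - 1)           # the maximum goes into the last bin

start = datetime(2020, 11, 1, 0, 10)    # Minimum from the report
end = datetime(2021, 1, 30, 23, 23)     # Maximum from the report

print(bin_index(start, start, end))
print(bin_index(end, start, end))
print(bin_index(datetime(2020, 12, 16), start, end))
```

With these endpoints each bin spans a little under two days.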

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	52.9 KiB

Length

Max length	4
Median length	3
Mean length	2.6703215
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	기타
2nd row	안장
3rd row	체인
4th row	안장
5th row	체인

Common Values

Value	Count	Frequency (%)
기타 (other)	2507	37.1%
체인 (chain)	1270	18.8%
안장 (saddle)	1147	17.0%
단말기 (terminal)	705	10.4%
타이어 (tire)	656	9.7%
페달 (pedal)	464	6.9%
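
The Frequency (%) column is each count divided by the total number of observations. A minimal sketch that recomputes it from the counts in the table above (the dictionary layout is an illustrative choice, and rounding to one decimal place is assumed to match the report's display):

```python
# Counts copied from the "Common Values" table; keys are the original category values.
counts = {
    "기타": 2507,    # other
    "체인": 1270,    # chain
    "안장": 1147,    # saddle
    "단말기": 705,   # terminal
    "타이어": 656,   # tire
    "페달": 464,     # pedal
}

total = sum(counts.values())

# Percentage of all observations, rounded to one decimal place.
freq = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in freq.items():
    print(f"{value}\t{counts[value]}\t{pct}%")
```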

Length

Histogram of lengths of the category

Common Values (Plot)

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	자전거번호	등록일시	고장구분
0	SPB-37924	2020-11-01 0:10	기타
1	SPB-43077	2020-11-01 0:24	안장
2	SPB-32462	2020-11-01 0:53	체인
3	SPB-39801	2020-11-01 1:09	안장
4	SPB-37194	2020-11-01 1:16	체인
5	SPB-50851	2020-11-01 1:19	타이어
6	SPB-44717	2020-11-01 1:49	체인
7	SPB-30172	2020-11-01 1:50	체인
8	SPB-33807	2020-11-01 2:01	단말기
9	SPB-30675	2020-11-01 2:08	체인

Last rows

	자전거번호	등록일시	고장구분
6739	SPB-53705	2021-01-30 18:54	기타
6740	SPB-35024	2021-01-30 18:54	타이어
6741	SPB-50987	2021-01-30 19:18	기타
6742	SPB-40224	2021-01-30 20:28	기타
6743	SPB-34069	2021-01-30 21:11	기타
6744	SPB-30121	2021-01-30 21:33	기타
6745	SPB-51556	2021-01-30 22:25	기타
6746	SPB-31682	2021-01-30 22:50	기타
6747	SPB-35788	2021-01-30 23:00	기타
6748	SPB-38803	2021-01-30 23:23	타이어
