gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
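
The average record size appears to be the total in-memory size divided by the number of observations. A minimal sketch of that arithmetic, using only the two figures from the table above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 312.5
n_observations = 10_000

total_size_bytes = total_size_kib * 1024        # 1 KiB = 1024 bytes
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```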

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/A/1/datasetView.do

Reproduction

Analysis started	2024-05-03 21:46:47.556627
Analysis finished	2024-05-03 21:46:48.806032
Duration	1.25 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	8015
Distinct (%)	80.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
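
As a rough illustration of how such properties can be looked up, the sketch below uses Python's standard `unicodedata` module on the first sample value of this column. It is not necessarily how the profiler computes its figures. The two-letter codes it prints correspond to the category names used in this report: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Pd` is Dash Punctuation.

```python
import unicodedata
from collections import Counter

sample = "SPB-63014"  # first sample row of this column

# General category of every character in the value.
categories = Counter(unicodedata.category(ch) for ch in sample)

for code, count in categories.most_common():
    print(code, count, f"{count / len(sample):.1%}")
```

Every value in the column has the same shape (three letters, a dash, five digits), so the per-value shares match the column-wide category shares.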

Unique

Unique	6402
Unique (%)	64.0%
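
"Distinct" and "Unique" measure different things: distinct counts the different values present, while unique counts the values that occur exactly once. A minimal sketch of the distinction on a hypothetical four-row column (the IDs are borrowed from this report's samples, but the repetition is made up for illustration):

```python
from collections import Counter

# Hypothetical mini-column; one ID is deliberately repeated.
values = ["SPB-63014", "SPB-36020", "SPB-63014", "SPB-47983"]

counts = Counter(values)
distinct = len(counts)                                # different values
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

print(distinct, unique)
```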

Sample

1st row	SPB-63014
2nd row	SPB-36020
3rd row	SPB-47983
4th row	SPB-57182
5th row	SPB-32611

Value	Count	Frequency (%)
spb-58572	6	0.1%
spb-52473	5	< 0.1%
spb-45722	5	< 0.1%
spb-53016	5	< 0.1%
spb-44200	5	< 0.1%
spb-53137	5	< 0.1%
spb-45403	5	< 0.1%
spb-38781	5	< 0.1%
spb-31055	5	< 0.1%
spb-33489	5	< 0.1%
Other values (8005)	9949	99.5%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
4	7131	7.9%
3	7121	7.9%
5	7018	7.8%
6	4443	4.9%
0	4259	4.7%
1	4252	4.7%
Other values (4)	15776	17.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
4	7131	14.3%
3	7121	14.2%
5	7018	14.0%
6	4443	8.9%
0	4259	8.5%
1	4252	8.5%
8	4236	8.5%
2	4183	8.4%
7	3904	7.8%
9	3453	6.9%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
4	7131	11.9%
3	7121	11.9%
5	7018	11.7%
6	4443	7.4%
0	4259	7.1%
1	4252	7.1%
8	4236	7.1%
2	4183	7.0%
7	3904	6.5%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
4	7131	7.9%
3	7121	7.9%
5	7018	7.8%
6	4443	4.9%
0	4259	4.7%
1	4252	4.7%
Other values (4)	15776	17.5%

등록일시 (registration date and time)
Date

Distinct	9392
Distinct (%)	93.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2022-01-01 01:55:00
Maximum	2022-06-29 22:51:00

Histogram

Histogram with fixed size bins (bins=50)
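
A fixed-size-bin histogram splits the range between the minimum and maximum into equal-width intervals and counts the values in each. The sketch below shows one way to assign a timestamp to one of 50 such bins using the minimum and maximum reported above. The helper `bin_index` is a hypothetical name, and the profiler's own binning may differ in edge handling.

```python
from datetime import datetime

def bin_index(ts, lo, hi, bins=50):
    """Return the 0-based equal-width bin that ts falls into."""
    if not lo <= ts <= hi:
        raise ValueError("timestamp outside [lo, hi]")
    fraction = (ts - lo) / (hi - lo)              # timedelta / timedelta -> float
    return min(int(fraction * bins), bins - 1)    # the maximum goes in the last bin

lo = datetime(2022, 1, 1, 1, 55)     # Minimum from the report
hi = datetime(2022, 6, 29, 22, 51)   # Maximum from the report

counts = [0] * 50
# The middle timestamp is taken from the sample rows of this report.
for ts in (lo, datetime(2022, 4, 29, 20, 45), hi):
    counts[bin_index(ts, lo, hi)] += 1
```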

고장구분 (fault category)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타	3337
안장	1910
체인	1847
타이어	1534
페달	857

Length

Max length	4
Median length	3
Mean length	2.692
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	기타
2nd row	체인
3rd row	기타
4th row	체인
5th row	페달

Common Values

Value	Count	Frequency (%)
기타 (other)	3337	33.4%
안장 (saddle)	1910	19.1%
체인 (chain)	1847	18.5%
타이어 (tire)	1534	15.3%
페달 (pedal)	857	8.6%
단말기 (terminal)	515	5.1%
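
The frequency column is each count divided by the total number of rows. A short sketch that recomputes the shares from the counts above and checks that they add up to the number of observations (the dictionary is transcribed by hand from this table, and the profiler's rounding of the last digit may differ slightly):

```python
# Counts copied from the Common Values table above.
counts = {"기타": 3337, "안장": 1910, "체인": 1847,
          "타이어": 1534, "페달": 857, "단말기": 515}

total = sum(counts.values())   # should equal the number of observations
freq = {value: n / total for value, n in counts.items()}

for value, n in counts.items():
    print(value, n, f"{freq[value]:.1%}")
```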

Length

Histogram of lengths of the category

Common Values (Plot)

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	자전거번호	등록일시	고장구분
71679	SPB-63014	2022-06-22 17:26	기타
3084	SPB-36020	2022-01-13 8:53	체인
70981	SPB-47983	2022-06-21 19:08	기타
36137	SPB-57182	2022-04-29 20:45	체인
42096	SPB-32611	2022-05-10 16:29	페달
71462	SPB-34374	2022-06-22 9:14	체인
19088	SPB-44550	2022-03-26 17:15	기타
50690	SPB-39879	2022-05-22 19:59	안장
65363	SPB-56993	2022-06-13 12:58	기타
32459	SPB-40917	2022-04-23 9:03	체인

Last rows

	자전거번호	등록일시	고장구분
44285	SPB-39764	2022-05-13 13:31	타이어
68714	SPB-61667	2022-06-18 14:03	안장
45109	SPB-62564	2022-05-14 17:53	타이어
57679	SPB-46813	2022-06-01 20:38	페달
67852	SPB-45102	2022-06-17 8:30	안장
24794	SPB-57959	2022-04-08 18:58	단말기
68673	SPB-40176	2022-06-18 12:42	안장
26161	SPB-61787	2022-04-11 15:48	기타
16834	SPB-46463	2022-03-18 11:57	체인
72622	SPB-52754	2022-06-24 18:08	기타
