gimi9 Pandas Profiling

Dataset statistics

Number of variables	3
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	312.5 KiB
Average record size in memory	32.0 B
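
The two memory figures above are related by simple division. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table above.
total_kib = 312.5          # Total size in memory, in KiB
n_observations = 10_000    # Number of observations

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")  # 32.0 B
```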

Variable types

Text	1
DateTime	1
Categorical	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15644/F/1/datasetView.do

Reproduction

Analysis started	2024-03-13 19:20:53.595005
Analysis finished	2024-03-13 19:20:54.060057
Duration	0.47 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

자전거번호 (bicycle number)
Text

Distinct	8015
Distinct (%)	80.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	14
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
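
The category counts reported below can be reproduced for a single value with Python's standard `unicodedata` module. This is only a sketch of the idea, not necessarily how the profiler computes it; the helper name `category_counts` is hypothetical. The standard library does not expose the script or block properties, so only general categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lu', 'Nd', 'Pd')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "SPB-46239"  # 1st sample row of this variable
counts = category_counts(sample)

# Lu = Uppercase Letter, Pd = Dash Punctuation, Nd = Decimal Number
print(dict(counts))  # {'Lu': 3, 'Pd': 1, 'Nd': 5}
```

With every value 9 characters long, 3 letters, 1 dash, and 5 digits per value is consistent with the 30000 / 10000 / 50000 totals in the category table.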

Unique

Unique	6403
Unique (%)	64.0%
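
"Distinct" and "Unique" measure different things: as understood here, distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch on a hypothetical toy column (the IDs below are made up for illustration and are not taken from the dataset):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column: SPB-1 once, SPB-2 twice, SPB-3 three times.
toy = ["SPB-1", "SPB-2", "SPB-2", "SPB-3", "SPB-3", "SPB-3"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)  # 3 1
```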

Sample

1st row	SPB-46239
2nd row	SPB-51099
3rd row	SPB-51301
4th row	SPB-35056
5th row	SPB-43780

Value	Count	Frequency (%)
spb-35003	6	0.1%
spb-50744	6	0.1%
spb-45722	6	0.1%
spb-55846	5	< 0.1%
spb-37032	5	< 0.1%
spb-44393	5	< 0.1%
spb-50747	5	< 0.1%
spb-39866	5	< 0.1%
spb-38782	5	< 0.1%
spb-51516	5	< 0.1%
Other values (8005)	9947	99.5%

Most occurring characters

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	7269	8.1%
4	7100	7.9%
5	6915	7.7%
6	4526	5.0%
1	4293	4.8%
2	4292	4.8%
Other values (4)	15605	17.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	50000	55.6%
Uppercase Letter	30000	33.3%
Dash Punctuation	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
3	7269	14.5%
4	7100	14.2%
5	6915	13.8%
6	4526	9.1%
1	4293	8.6%
2	4292	8.6%
8	4236	8.5%
0	4168	8.3%
7	3628	7.3%
9	3573	7.1%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Dash Punctuation

Value	Count	Frequency (%)
-	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	60000	66.7%
Latin	30000	33.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
-	10000	16.7%
3	7269	12.1%
4	7100	11.8%
5	6915	11.5%
6	4526	7.5%
1	4293	7.2%
2	4292	7.2%
8	4236	7.1%
0	4168	6.9%
7	3628	6.0%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
P	10000	33.3%
B	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
S	10000	11.1%
P	10000	11.1%
B	10000	11.1%
-	10000	11.1%
3	7269	8.1%
4	7100	7.9%
5	6915	7.7%
6	4526	5.0%
1	4293	4.8%
2	4292	4.8%
Other values (4)	15605	17.3%

등록일시 (registration date and time)
Date

Distinct	9403
Distinct (%)	94.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Minimum	2022-01-01 02:25:00
Maximum	2022-06-29 21:18:00

Histogram

Histogram with fixed size bins (bins=50)
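
To get a feel for the resolution of this histogram, the bin width can be derived from the minimum and maximum above. A sketch, assuming the 50 bins are equal-width and span exactly the range from minimum to maximum (the report does not state the bin edges):

```python
from datetime import datetime

# Minimum and maximum copied from the summary above.
minimum = datetime(2022, 1, 1, 2, 25, 0)
maximum = datetime(2022, 6, 29, 21, 18, 0)
bins = 50

# Assumption: equal-width bins covering [minimum, maximum].
bin_width = (maximum - minimum) / bins
edges = [minimum + i * bin_width for i in range(bins + 1)]

print(bin_width)            # width of one bin, roughly 3.6 days
print(edges[0], edges[-1])  # first and last bin edges
```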

고장구분 (fault type)
Categorical

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

기타 (other)	3388
안장 (saddle)	1944
체인 (chain)	1730
타이어 (tire)	1548
페달 (pedal)	827

Length

Max length	4
Median length	3
Mean length	2.7047
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	기타 (other)
2nd row	기타 (other)
3rd row	체인 (chain)
4th row	안장 (saddle)
5th row	안장 (saddle)

Common Values

Value	Count	Frequency (%)
기타 (other)	3388	33.9%
안장 (saddle)	1944	19.4%
체인 (chain)	1730	17.3%
타이어 (tire)	1548	15.5%
페달 (pedal)	827	8.3%
단말기 (terminal)	563	5.6%
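
The frequency column follows directly from the counts. A minimal sketch that recomputes it, assuming percentages are rounded to one decimal place:

```python
# Counts copied from the "Common Values" table above.
counts = {
    "기타": 3388,    # other
    "안장": 1944,    # saddle
    "체인": 1730,    # chain
    "타이어": 1548,  # tire
    "페달": 827,     # pedal
    "단말기": 563,   # terminal
}

total = sum(counts.values())  # equals the number of observations
frequency = {value: round(100 * n / total, 1) for value, n in counts.items()}

for value, pct in frequency.items():
    print(f"{value}\t{counts[value]}\t{pct}%")
```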

Length

Histogram of lengths of the category

Missing values

Matrix

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

	자전거번호 (bicycle number)	등록일시 (registration date and time)	고장구분 (fault type)
34080	SPB-46239	2022-04-26 13:32	기타 (other)
14114	SPB-51099	2022-03-09 10:09	기타 (other)
15971	SPB-51301	2022-03-15 22:13	체인 (chain)
46089	SPB-35056	2022-05-16 13:17	안장 (saddle)
36450	SPB-43780	2022-04-30 17:08	안장 (saddle)
20793	SPB-33779	2022-03-30 22:58	체인 (chain)
56189	SPB-53985	2022-05-30 18:46	기타 (other)
38833	SPB-50876	2022-05-04 20:03	페달 (pedal)
66022	SPB-32749	2022-06-14 8:07	체인 (chain)
42659	SPB-38845	2022-05-11 8:51	기타 (other)

Last rows

	자전거번호 (bicycle number)	등록일시 (registration date and time)	고장구분 (fault type)
50500	SPB-52593	2022-05-22 16:03	기타 (other)
44908	SPB-56092	2022-05-14 12:35	안장 (saddle)
32340	SPB-36901	2022-04-22 20:36	체인 (chain)
44971	SPB-51818	2022-05-14 14:32	페달 (pedal)
25717	SPB-43481	2022-04-10 18:15	안장 (saddle)
24119	SPB-43979	2022-04-07 15:16	타이어 (tire)
16938	SPB-31946	2022-03-18 19:05	체인 (chain)
43587	SPB-62033	2022-05-12 15:44	체인 (chain)
50228	SPB-48818	2022-05-21 22:08	페달 (pedal)
26377	SPB-30309	2022-04-11 19:27	타이어 (tire)
