gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	10000
Missing cells	275
Missing cells (%)	1.4%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	244.1 KiB
Average record size in memory	25.0 B
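The derived figures in this table follow from the raw counts. Missing cells are counted over every variable × observation cell (2 × 10,000 = 20,000 here). The average record size is total memory divided by the number of rows. Below is a minimal sketch of that arithmetic using only the numbers reported above. The helper name `summarize` is illustrative and is not part of ydata-profiling.

```python
def summarize(n_vars: int, n_obs: int, missing_cells: int, total_kib: float) -> dict:
    """Recompute the derived dataset statistics from the raw counts."""
    total_cells = n_vars * n_obs  # every (row, column) pair is one cell
    return {
        "missing_pct": round(100 * missing_cells / total_cells, 1),
        # 1 KiB = 1024 bytes; spread the total evenly over all rows
        "avg_record_bytes": round(total_kib * 1024 / n_obs, 1),
    }

stats = summarize(n_vars=2, n_obs=10_000, missing_cells=275, total_kib=244.1)
print(stats)  # {'missing_pct': 1.4, 'avg_record_bytes': 25.0}
```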

Variable types

Text	1
Numeric	1

Dataset

Description	사업예정지일련번호 (planned project site serial number), 년도 (year)
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-21182/S/1/datasetView.do

Alerts

`년도` has 275 (2.8%) missing values	Missing
`사업예정지일련번호` has unique values	Unique

Reproduction

Analysis started	2024-05-11 09:52:37.425580
Analysis finished	2024-05-11 09:52:38.066624
Duration	0.64 seconds
Software version	ydata-profiling v4.5.1
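A comparable report could be regenerated with ydata-profiling's `ProfileReport` class and its `to_file` method. The sketch below assumes several things that this report does not state: the data has been downloaded from the URL above to a local CSV file, the file uses the `cp949` (Korean Windows) encoding, and the filename is whatever you pass in. The row labels in the sample tables go up to 73901, which suggests the 10,000 profiled rows were drawn from a larger table. The sampling method and seed are not recorded, so `random_state=0` is arbitrary.

```python
def build_report(csv_path: str, out_path: str = "report.html", n_rows: int = 10_000):
    """Profile a random sample of the dataset with ydata-profiling (sketch)."""
    # Imported inside the function so this module loads even without the packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the exported CSV is cp949-encoded.
    df = pd.read_csv(csv_path, encoding="cp949")
    # Assumption: a simple random sample; the original seed is unknown.
    sample = df.sample(n=min(n_rows, len(df)), random_state=0)
    profile = ProfileReport(sample, title="gimi9 Pandas Profiling")
    profile.to_file(out_path)
    return profile
```

Usage, assuming a downloaded file named `seoul_project_sites.csv` (a hypothetical name): `build_report("seoul_project_sites.csv")`.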

사업예정지일련번호 (planned project site serial number)
Text

UNIQUE

Distinct	10000
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	18
Median length	18
Mean length	18
Min length	18

Characters and Unicode

Total characters	180000
Distinct characters	13
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
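As an illustration, Python's standard `unicodedata` module exposes the general category of each character. `Nd` is "Decimal Number" and `Lu` is "Uppercase Letter", the two categories reported below. The module does not provide script or block properties, so this sketch covers categories only. It uses two IDs from the sample rows.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Tally the Unicode general category of every character in the values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1  # 'Nd', 'Lu', ...
    return counts

sample = ["SVR001200806250062", "SVR001201801030130"]
print(category_counts(sample))  # Counter({'Nd': 30, 'Lu': 6})
```

Each 18-character ID contributes 15 digits and 3 letters. Scaled to 10,000 IDs, that gives the 150,000 / 30,000 split shown in the category table.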

Unique

Unique	10000
Unique (%)	100.0%

Sample

1st row	SVR001200806250062
2nd row	SVR001201801030130
3rd row	SVR001201506230027
4th row	SVR001200507010240
5th row	SVR001200701120522

Most occurring words (shown in lowercase)

Value	Count	Frequency (%)
svr001200806250062	1	< 0.1%
svr001200707050283	1	< 0.1%
svr001201912300266	1	< 0.1%
svr001200901120097	1	< 0.1%
svr001201106200057	1	< 0.1%
svr001201701200174	1	< 0.1%
svr001202101040129	1	< 0.1%
svr001201801120111	1	< 0.1%
svr001201307030042	1	< 0.1%
svr001200906280011	1	< 0.1%
Other values (9990)	9990	99.9%

Most occurring characters

Value	Count	Frequency (%)
0	67789	37.7%
1	30851	17.1%
2	18678	10.4%
S	10000	5.6%
V	10000	5.6%
R	10000	5.6%
6	5636	3.1%
7	5323	3.0%
3	5311	3.0%
4	4205	2.3%
Other values (3)	12207	6.8%
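The character tables use a fixed layout: the ten most frequent values first, then every remaining distinct value folded into a single "Other values (k)" row. Percentages are taken relative to all counted characters. For example, `0` accounts for 67,789 / 180,000 ≈ 37.7%. The sketch below reproduces that layout on a single ID. The function name `top_with_other` is hypothetical.

```python
from collections import Counter

def top_with_other(items, n=10):
    """Return the n most common items plus an 'Other values (k)' bucket row."""
    counts = Counter(items)
    total = sum(counts.values())
    rows = [(value, c, round(100 * c / total, 1)) for value, c in counts.most_common(n)]
    rest = counts.most_common()[n:]  # everything outside the top n
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows

for row in top_with_other("SVR001200806250062", n=3):
    print(row)
```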

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	150000	83.3%
Uppercase Letter	30000	16.7%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	67789	45.2%
1	30851	20.6%
2	18678	12.5%
6	5636	3.8%
7	5323	3.5%
3	5311	3.5%
4	4205	2.8%
5	4192	2.8%
9	4050	2.7%
8	3965	2.6%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
V	10000	33.3%
R	10000	33.3%

Most occurring scripts

Value	Count	Frequency (%)
Common	150000	83.3%
Latin	30000	16.7%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	67789	45.2%
1	30851	20.6%
2	18678	12.5%
6	5636	3.8%
7	5323	3.5%
3	5311	3.5%
4	4205	2.8%
5	4192	2.8%
9	4050	2.7%
8	3965	2.6%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
V	10000	33.3%
R	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	180000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	67789	37.7%
1	30851	17.1%
2	18678	10.4%
S	10000	5.6%
V	10000	5.6%
R	10000	5.6%
6	5636	3.1%
7	5323	3.0%
3	5311	3.0%
4	4205	2.3%
Other values (3)	12207	6.8%

년도 (year)
Real number (ℝ)

MISSING

Distinct	17
Distinct (%)	0.2%
Missing	275
Missing (%)	2.8%
Infinite	0
Infinite (%)	0.0%
Mean	2011.6347

Minimum	2005
Maximum	2021
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	166.0 KiB

Quantile statistics

Minimum	2005
5th percentile	2005
Q1	2007
Median	2011
Q3	2016
95th percentile	2019
Maximum	2021
Range	16
Interquartile range (IQR)	9
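Both spread measures in this table are differences of values listed above. As a worked check:

```latex
\begin{aligned}
\text{Range} &= \text{Maximum} - \text{Minimum} = 2021 - 2005 = 16 \\
\text{IQR}   &= Q_3 - Q_1 = 2016 - 2007 = 9
\end{aligned}
```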

Descriptive statistics

Standard deviation	4.7366574
Coefficient of variation (CV)	0.0023546311
Kurtosis	-1.1032426
Mean	2011.6347
Median Absolute Deviation (MAD)	4
Skewness	0.28806424
Sum	19563147
Variance	22.435924
Monotonicity	Not monotonic
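The mean, variance and coefficient of variation are linked by their standard definitions. There are n = 10000 − 275 = 9725 non-missing years. Plugging the reported values into these definitions reproduces the table. The variance matches the sample estimator with an n − 1 denominator.

```latex
\begin{aligned}
\bar{x} &= \frac{\text{Sum}}{n} = \frac{19\,563\,147}{9725} \approx 2011.6347 \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 = 4.7366574^2 \approx 22.4359 \\
\text{CV} &= \frac{s}{\bar{x}} = \frac{4.7366574}{2011.6347} \approx 0.0023546
\end{aligned}
```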

[Figure: histogram of 년도 with fixed-size bins (bins=17)]

Common values

Value	Count	Frequency (%)
2007	983	9.8%
2005	976	9.8%
2013	707	7.1%
2019	698	7.0%
2009	690	6.9%
2014	666	6.7%
2012	637	6.4%
2011	625	6.2%
2006	614	6.1%
2008	601	6.0%
Other values (7)	2528	25.3%

Minimum 10 values

Value	Count	Frequency (%)
2005	976	9.8%
2006	614	6.1%
2007	983	9.8%
2008	601	6.0%
2009	690	6.9%
2010	534	5.3%
2011	625	6.2%
2012	637	6.4%
2013	707	7.1%
2014	666	6.7%

Maximum 10 values

Value	Count	Frequency (%)
2021	173	1.7%
2020	302	3.0%
2019	698	7.0%
2018	555	5.5%
2017	165	1.7%
2016	595	5.9%
2015	204	2.0%
2014	666	6.7%
2013	707	7.1%
2012	637	6.4%
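Together, the minimum-10 and maximum-10 tables list all 17 distinct years. That means the 9,725 non-missing values of `년도` can be rebuilt exactly, and the statistics above can be re-derived. This is a stdlib-only sketch. The `inclusive` quantile method is assumed to correspond to pandas' default linear interpolation. For these data the quartiles land exactly on observations, so the interpolation rule does not change the result.

```python
import statistics

# Year counts copied from the minimum-10 and maximum-10 tables (all 17 years).
YEAR_COUNTS = {
    2005: 976, 2006: 614, 2007: 983, 2008: 601, 2009: 690, 2010: 534,
    2011: 625, 2012: 637, 2013: 707, 2014: 666, 2015: 204, 2016: 595,
    2017: 165, 2018: 555, 2019: 698, 2020: 302, 2021: 173,
}

def expand(counts):
    """Rebuild the sorted list of non-missing values from a value-count table."""
    return [year for year in sorted(counts) for _ in range(counts[year])]

years = expand(YEAR_COUNTS)
n = len(years)                            # non-missing rows
mean = statistics.fmean(years)
variance = statistics.variance(years)     # sample variance (n - 1 denominator)
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
mad = statistics.median(abs(y - median) for y in years)  # median absolute deviation

print(n, sum(years), round(mean, 4), round(variance, 4))  # 9725 19563147 2011.6347 22.4359
print(q1, median, q3, q3 - q1, mad)                       # 2007.0 2011.0 2016.0 9.0 4.0
```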

Missing values

Count: a simple visualization of nullity by column.
[Figure: nullity count chart (axis label: 년도)]

Matrix: a nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
[Figure: nullity matrix (axis label: 년도)]
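Both views reduce to a per-cell present/missing test. The plain-Python sketch below treats `None` as missing, standing in for pandas' `NaN`. The helper names are hypothetical, and so are the rows, which use placeholder IDs rather than values from this dataset.

```python
def nullity_counts(rows, columns):
    """'Count' view: number of non-missing values in each column."""
    return {col: sum(row.get(col) is not None for row in rows) for col in columns}

def nullity_matrix(rows, columns):
    """'Matrix' view: one string per row, '#' = present, '.' = missing."""
    return ["".join("#" if row.get(col) is not None else "." for col in columns)
            for row in rows]

# Hypothetical rows with placeholder IDs; the second one lacks a year.
COLUMNS = ["사업예정지일련번호", "년도"]
ROWS = [
    {"사업예정지일련번호": "ID-A", "년도": 2008},
    {"사업예정지일련번호": "ID-B", "년도": None},
    {"사업예정지일련번호": "ID-C", "년도": 2015},
]

print(nullity_counts(ROWS, COLUMNS))             # {'사업예정지일련번호': 3, '년도': 2}
print("\n".join(nullity_matrix(ROWS, COLUMNS)))  # ##, #., ## on separate lines
```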

First rows

	사업예정지일련번호	년도
31272	SVR001200806250062	2008
66410	SVR001201801030130	2018
63311	SVR001201506230027	2015
14342	SVR001200507010240	2005
6376	SVR001200701120522	2007
38141	SVR001200901080013	2009
66390	SVR001201801030099	2018
32445	SVR001200801280121	2008
17890	SVR001200505010066	2005
73901	SVR002201906210008	2019

Last rows

	사업예정지일련번호	년도
8592	SVR001200503012795	2005
63408	SVR001201301170290	2013
57798	SVR001201206270055	2012
42142	SVR001201001150072	2010
54083	SVR001201201110111	2012
35983	SVR001200810130010	2008
10666	SVR001200701160146	2007
72696	SVR001201507150054	2015
21902	SVR001200706280114	2007
70256	SVR001201801130003	2018
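In all 20 rows above, characters 7 to 10 of each ID equal the 년도 value. The IDs also look like "SVR", a 3-digit code, a YYYYMMDD date and a 4-digit sequence, which is consistent with the fixed length of 18. This layout is inferred from these rows only and is not documented by the publisher. The hypothetical helper `parse_site_id` below encodes that assumption and checks it against the listed rows.

```python
import re
from datetime import date

# Assumed layout, inferred from the 20 rows above (not documented by the publisher):
# "SVR" + 3-digit code + YYYYMMDD + 4-digit sequence = 18 characters.
ID_PATTERN = re.compile(r"SVR(\d{3})(\d{4})(\d{2})(\d{2})(\d{4})")

def parse_site_id(site_id):
    """Split an ID into (code, date, sequence) under the assumed layout."""
    m = ID_PATTERN.fullmatch(site_id)
    if m is None:
        raise ValueError(f"unexpected ID format: {site_id!r}")
    code, yyyy, mm, dd, seq = m.groups()
    return code, date(int(yyyy), int(mm), int(dd)), int(seq)

# (사업예정지일련번호, 년도) pairs copied from the first and last rows above.
SAMPLE_ROWS = [
    ("SVR001200806250062", 2008), ("SVR001201801030130", 2018),
    ("SVR001201506230027", 2015), ("SVR001200507010240", 2005),
    ("SVR001200701120522", 2007), ("SVR001200901080013", 2009),
    ("SVR001201801030099", 2018), ("SVR001200801280121", 2008),
    ("SVR001200505010066", 2005), ("SVR002201906210008", 2019),
    ("SVR001200503012795", 2005), ("SVR001201301170290", 2013),
    ("SVR001201206270055", 2012), ("SVR001201001150072", 2010),
    ("SVR001201201110111", 2012), ("SVR001200810130010", 2008),
    ("SVR001200701160146", 2007), ("SVR001201507150054", 2015),
    ("SVR001200706280114", 2007), ("SVR001201801130003", 2018),
]

# Rows whose embedded year disagrees with the 년도 column.
mismatches = [(sid, year) for sid, year in SAMPLE_ROWS
              if parse_site_id(sid)[1].year != year]
print(len(SAMPLE_ROWS), mismatches)  # 20 []
```

If the layout holds for all 10,000 IDs, the embedded year could be used to cross-check, or possibly recover, the 275 missing 년도 values. This is untested beyond the rows listed here.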
