gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	244.1 KiB
Average record size in memory	25.0 B

Variable types

Text	1
Numeric	1

Dataset

Description	굴착예정지일련번호 (excavation planned site serial number), 년도 (year)
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-21183/S/1/datasetView.do

Alerts

굴착예정지일련번호 has unique values	Unique

Reproduction

Analysis started	2024-05-11 01:05:59.384946
Analysis finished	2024-05-11 01:06:00.351362
Duration	0.97 seconds
Software version	ydata-profiling v4.5.1

굴착예정지일련번호
Text

UNIQUE

Distinct	10000
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	18
Median length	18
Mean length	18
Min length	18

Characters and Unicode

Total characters	180000
Distinct characters	13
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
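
The category figures reported for this variable can be checked by hand with Python's standard `unicodedata` module. The sketch below classifies the characters of the five sample values listed in this report. It is an illustration over those five values only, not the profiler's own implementation, and it covers general categories but not scripts or blocks, which `unicodedata` does not expose.

```python
import unicodedata
from collections import Counter

# The five sample values shown in this report's "Sample" table.
samples = [
    "SVR001200603150187",
    "SVR001200705130002",
    "SVR001200605040239",
    "SVR001200604030012",
    "SVR001200511070193",
]

# Count the Unicode general category of every character:
# "Lu" = Uppercase Letter, "Nd" = Decimal Number.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

# Share of each category among all characters, in percent.
total_chars = sum(category_counts.values())
shares = {
    cat: round(100 * n / total_chars, 1) for cat, n in category_counts.items()
}

print(dict(category_counts))
print(shares)
```

Each sample value holds 3 uppercase letters and 15 digits, so the sample gives the same 16.7% / 83.3% split that the report shows for the full column.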

Unique

Unique	10000
Unique (%)	100.0%

Sample

1st row	SVR001200603150187
2nd row	SVR001200705130002
3rd row	SVR001200605040239
4th row	SVR001200604030012
5th row	SVR001200511070193

Value	Count	Frequency (%)
svr001200603150187	1	< 0.1%
svr001200505160029	1	< 0.1%
svr001200605270006	1	< 0.1%
svr001200506030037	1	< 0.1%
svr001200507060056	1	< 0.1%
svr001200608020181	1	< 0.1%
svr001200505300063	1	< 0.1%
svr001200705110288	1	< 0.1%
svr001200602160097	1	< 0.1%
svr001200704190158	1	< 0.1%
Other values (9990)	9990	99.9%

Most occurring characters

Value	Count	Frequency (%)
0	70379	39.1%
1	22992	12.8%
2	18154	10.1%
S	10000	5.6%
V	10000	5.6%
R	10000	5.6%
5	7758	4.3%
6	7713	4.3%
7	6240	3.5%
3	4869	2.7%
Other values (3)	11895	6.6%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	150000	83.3%
Uppercase Letter	30000	16.7%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	70379	46.9%
1	22992	15.3%
2	18154	12.1%
5	7758	5.2%
6	7713	5.1%
7	6240	4.2%
3	4869	3.2%
4	4337	2.9%
8	4113	2.7%
9	3445	2.3%

Uppercase Letter

Value	Count	Frequency (%)
S	10000	33.3%
V	10000	33.3%
R	10000	33.3%

Most occurring scripts

Value	Count	Frequency (%)
Common	150000	83.3%
Latin	30000	16.7%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	70379	46.9%
1	22992	15.3%
2	18154	12.1%
5	7758	5.2%
6	7713	5.1%
7	6240	4.2%
3	4869	3.2%
4	4337	2.9%
8	4113	2.7%
9	3445	2.3%

Latin

Value	Count	Frequency (%)
S	10000	33.3%
V	10000	33.3%
R	10000	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	180000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	70379	39.1%
1	22992	12.8%
2	18154	10.1%
S	10000	5.6%
V	10000	5.6%
R	10000	5.6%
5	7758	4.3%
6	7713	4.3%
7	6240	3.5%
3	4869	2.7%
Other values (3)	11895	6.6%

년도
Real number (ℝ)

Distinct	6
Distinct (%)	0.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2005.8928

Minimum	2005
Maximum	2013
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	166.0 KiB

Quantile statistics

Minimum	2005
5-th percentile	2005
Q1	2005
median	2006
Q3	2006
95-th percentile	2007
Maximum	2013
Range	8
Interquartile range (IQR)	1
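
The quantile figures above can be reproduced from the value counts that this report lists for 년도. The sketch below expands those counts into 10,000 observations and applies linear interpolation between closest ranks. That interpolation rule is an assumption, since the report does not state which rule the profiler uses, and the helper `quantile` is a hypothetical name introduced for illustration.

```python
# Value counts for 년도 as listed in this report: value -> count.
year_counts = {2005: 3476, 2006: 4146, 2007: 2365, 2008: 10, 2011: 1, 2013: 2}

# Expand the counts into a sorted list of 10,000 observations.
years = sorted(y for y, n in year_counts.items() for _ in range(n))


def quantile(sorted_vals, q):
    """Quantile with linear interpolation between closest ranks (assumed rule)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


q05, q1, med, q3, q95 = (quantile(years, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95))
value_range = years[-1] - years[0]  # maximum minus minimum
iqr = q3 - q1                       # interquartile range

print(q05, q1, med, q3, q95, value_range, iqr)
```

Because 2005, 2006 and 2007 together account for 9,987 of the 10,000 rows, every listed quantile falls on one of those three years, and the long right tail (2008 to 2013) shows up only in the maximum and the range.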

Descriptive statistics

Standard deviation	0.76744189
Coefficient of variation (CV)	0.00038259367
Kurtosis	0.38696699
Mean	2005.8928
Median Absolute Deviation (MAD)	1
Skewness	0.3731773
Sum	20058928
Variance	0.58896706
Monotonicity	Not monotonic
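
Several of the descriptive statistics above follow directly from the value counts listed for 년도 in this report. The sketch below recomputes the sum, mean, variance, standard deviation, coefficient of variation and median absolute deviation with the standard library. It assumes the sample variance (n - 1 denominator), and it leaves out skewness and kurtosis, whose exact estimators the report does not specify.

```python
import math
import statistics

# Value counts for 년도 as listed in this report: value -> count.
year_counts = {2005: 3476, 2006: 4146, 2007: 2365, 2008: 10, 2011: 1, 2013: 2}

# Expand the counts into individual observations.
years = [y for y, n in year_counts.items() for _ in range(n)]

total = sum(years)
mean = total / len(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator (assumed)
std = math.sqrt(variance)
cv = std / mean                        # coefficient of variation
median = statistics.median(years)
mad = statistics.median(abs(y - median) for y in years)  # median absolute deviation

print(total, mean, variance, std, cv, mad)
```

Under that assumption the results agree with the reported values to the precision shown, for example a sum of 20058928 and a variance of about 0.58896706.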

Histogram with fixed size bins (bins=6)

Value	Count	Frequency (%)
2006	4146	41.5%
2005	3476	34.8%
2007	2365	23.6%
2008	10	0.1%
2013	2	< 0.1%
2011	1	< 0.1%

Minimum 10 values

Value	Count	Frequency (%)
2005	3476	34.8%
2006	4146	41.5%
2007	2365	23.6%
2008	10	0.1%
2011	1	< 0.1%
2013	2	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
2013	2	< 0.1%
2011	1	< 0.1%
2008	10	0.1%
2007	2365	23.6%
2006	4146	41.5%
2005	3476	34.8%

[Figure: 년도 plotted against 년도]

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	굴착예정지일련번호	년도
33164	SVR001200603150187	2006
99195	SVR001200705130002	2007
51242	SVR001200605040239	2006
47645	SVR001200604030012	2006
33085	SVR001200511070193	2005
45315	SVR001200604250063	2006
26840	SVR001200512290011	2005
34804	SVR001200511140151	2005
99383	SVR001200705150175	2007
31635	SVR001200601170049	2006

Last rows

	굴착예정지일련번호	년도
27791	SVR001200512010059	2005
87145	SVR001200709030079	2007
54503	SVR001200606080012	2006
640	SVR001200504180029	2005
82976	SVR001200608210332	2006
85123	SVR001200704300306	2007
90338	SVR001200708240176	2007
37037	SVR001200511010478	2005
35689	SVR001200511300011	2005
59619	SVR001200703260183	2007
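
In the rows shown above, characters 7 to 10 of each serial number coincide with the 년도 value. The source does not say whether the identifier is formally defined this way, so the sketch below only checks the pattern on the first ten rows listed in this report. The helper `embedded_year` is a hypothetical name, and the layout it reads is an assumption.

```python
# The first ten rows listed in this report: (굴착예정지일련번호, 년도).
rows = [
    ("SVR001200603150187", 2006),
    ("SVR001200705130002", 2007),
    ("SVR001200605040239", 2006),
    ("SVR001200604030012", 2006),
    ("SVR001200511070193", 2005),
    ("SVR001200604250063", 2006),
    ("SVR001200512290011", 2005),
    ("SVR001200511140151", 2005),
    ("SVR001200705150175", 2007),
    ("SVR001200601170049", 2006),
]


def embedded_year(serial):
    """Read characters 7-10 as a year (assumed layout, not documented)."""
    return int(serial[6:10])


# Collect any rows where the embedded year disagrees with the 년도 column.
mismatches = [(s, y) for s, y in rows if embedded_year(s) != y]
print(mismatches)
```

An empty list means the pattern holds for these ten rows. It says nothing about the remaining 9,990 rows, which would need the full dataset to check.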
