Overview

Dataset statistics

Number of variables: 2
Number of observations: 10000
Missing cells: 275
Missing cells (%): 1.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 244.1 KiB
Average record size in memory: 25.0 B
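As a cross-check of the overview figures, the sketch below recomputes them from an in-memory list of rows. It assumes each row is a Python tuple with `None` marking a missing cell. Duplicates are counted as rows that repeat an earlier row, which is the convention of pandas' `DataFrame.duplicated()`; that ydata-profiling counts them identically is an assumption. The missing-cell percentage uses every cell as its denominator: 275 / (10000 × 2) = 1.375%, displayed as 1.4%.

```python
from typing import Optional, Sequence

# A row is a tuple of cell values; None marks a missing cell (assumption).
Row = tuple[Optional[object], ...]


def dataset_stats(rows: Sequence[Row]) -> dict[str, float]:
    """Recompute the 'Dataset statistics' counts for a list of rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Rows that repeat an earlier row (the first occurrence is not counted).
    duplicates = n_obs - len(set(rows))
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        # Denominator is every cell (rows x columns), not the row count.
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Synthetic stand-in with the report's shape: 10000 unique IDs, 275 missing years.
rows = [(f"ID{i:05d}", None if i < 275 else 2011) for i in range(10000)]
stats = dataset_stats(rows)
print(stats["missing_cells"], f"{stats['missing_cells_pct']:.1f}%")  # 275 1.4%
```

The row data here is invented; only its shape (2 columns, 10000 rows, 275 missing years, no duplicates) mirrors the report.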

Variable types

Text: 1
Numeric: 1

Dataset

Description: 사업예정지일련번호 (planned project site serial number), 년도 (year)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-21182/S/1/datasetView.do

Alerts

년도 (year) has 275 (2.8%) missing values. [Missing]
사업예정지일련번호 (planned project site serial number) has unique values. [Unique]
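Both alerts can be reproduced column by column. In this minimal sketch the rules are a simplification, not ydata-profiling's actual thresholds: a column gets a Missing alert if any value is `None`, and a Unique alert if it has no missing values and no repeats. The per-column denominator is the 10000 rows, so 275 missing values is 2.75% (shown as 2.8%), twice the 1.4% dataset-level figure, which spreads the same 275 over 20000 cells.

```python
from typing import Optional, Sequence


def column_alerts(name: str, values: Sequence[Optional[object]]) -> list[str]:
    """Return simplified Missing/Unique alert messages for one column."""
    n = len(values)
    present = [v for v in values if v is not None]
    n_missing = n - len(present)
    alerts = []
    if n_missing:
        # Percentage of this column's rows, not of all cells in the dataset.
        alerts.append(f"{name} has {n_missing} ({100 * n_missing / n:.1f}%) missing values")
    if n and not n_missing and len(set(present)) == n:
        alerts.append(f"{name} has unique values")
    return alerts


# Hypothetical stand-in columns with the report's counts.
ids = [f"SVR001{i:012d}" for i in range(10000)]
years = [None] * 275 + [2011] * 9725
print(column_alerts("년도", years))  # ['년도 has 275 (2.8%) missing values']
print(column_alerts("사업예정지일련번호", ids))  # ['사업예정지일련번호 has unique values']
```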

Reproduction

Analysis started: 2024-05-11 09:52:37.425580
Analysis finished: 2024-05-11 09:52:38.066624
Duration: 0.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

사업예정지일련번호 (planned project site serial number)
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 18
Median length: 18
Mean length: 18
Min length: 18

Characters and Unicode

Total characters: 180000
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
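Python's standard `unicodedata` module exposes the general category of each code point (`Nd` for Decimal Number, `Lu` for Uppercase Letter), which is enough to reproduce the category breakdown; it has no script lookup, so the Common/Latin split is not covered. A minimal sketch, where the `CATEGORY_NAMES` mapping is an assumption that covers only the two categories in this column:

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general categories seen in this column; others keep their code.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter"}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def all_ascii(values: Iterable[str]) -> bool:
    """True if every character of every value lies in the ASCII block."""
    return all(value.isascii() for value in values)


sample = ["SVR001200806250062", "SVR001201801030130", "SVR001201506230027"]
print(category_counts(sample))  # Counter({'Decimal Number': 45, 'Uppercase Letter': 9})
print(all_ascii(sample))  # True
```

Each 18-character ID contributes 15 digits and 3 letters, so 10000 IDs give the 150000 / 30000 split reported for this column.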

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: SVR001200806250062
2nd row: SVR001201801030130
3rd row: SVR001201506230027
4th row: SVR001200507010240
5th row: SVR001200701120522

Value                Count  Frequency (%)
svr001200806250062       1  < 0.1%
svr001200707050283       1  < 0.1%
svr001201912300266       1  < 0.1%
svr001200901120097       1  < 0.1%
svr001201106200057       1  < 0.1%
svr001201701200174       1  < 0.1%
svr001202101040129       1  < 0.1%
svr001201801120111       1  < 0.1%
svr001201307030042       1  < 0.1%
svr001200906280011       1  < 0.1%
Other values (9990)   9990  99.9%

Most occurring characters

Value             Count  Frequency (%)
0                 67789  37.7%
1                 30851  17.1%
2                 18678  10.4%
S                 10000  5.6%
V                 10000  5.6%
R                 10000  5.6%
6                  5636  3.1%
7                  5323  3.0%
3                  5311  3.0%
4                  4205  2.3%
Other values (3)  12207  6.8%
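The "Other values (3)" row folds everything outside the ten most frequent characters into one bucket: here 5, 9 and 8, with 4192 + 4050 + 3965 = 12207 occurrences. A minimal sketch of that presentation, fed with the counts copied from this table; percentages are of all 180000 characters:

```python
from collections import Counter

# Character counts copied from this report (18 characters x 10000 IDs = 180000).
CHAR_COUNTS = Counter({
    "0": 67789, "1": 30851, "2": 18678, "S": 10000, "V": 10000, "R": 10000,
    "6": 5636, "7": 5323, "3": 5311, "4": 4205, "5": 4192, "9": 4050, "8": 3965,
})


def top_n_with_other(counts: Counter, n: int = 10) -> list[tuple[str, int, float]]:
    """Top-n (value, count, percent) rows plus one aggregated 'Other values' row."""
    total = sum(counts.values())
    ranked = counts.most_common()  # ties keep insertion order (stable sort)
    rows = [(value, count, 100 * count / total) for value, count in ranked[:n]]
    rest = ranked[n:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / total))
    return rows


for value, count, pct in top_n_with_other(CHAR_COUNTS):
    print(f"{value:<18}{count:>5}  {pct:.1f}%")
```

Running it prints the same eleven rows as the table above.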

Most occurring categories

Value              Count  Frequency (%)
Decimal Number    150000  83.3%
Uppercase Letter   30000  16.7%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      67789  45.2%
1      30851  20.6%
2      18678  12.5%
6       5636  3.8%
7       5323  3.5%
3       5311  3.5%
4       4205  2.8%
5       4192  2.8%
9       4050  2.7%
8       3965  2.6%

Uppercase Letter

Value  Count  Frequency (%)
S      10000  33.3%
V      10000  33.3%
R      10000  33.3%
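Within each category, percentages are taken over that category's own total rather than over all 180000 characters. That is why 0 is 37.7% in the overall table but 67789 / 150000 = 45.2% among decimal digits, and each of S, V and R is 10000 / 30000 = 33.3% of the uppercase letters. A sketch of that grouping, reusing the report's counts and `unicodedata.category` as the grouping key:

```python
import unicodedata
from collections import Counter, defaultdict

# Character counts copied from this report.
CHAR_COUNTS = Counter({
    "0": 67789, "1": 30851, "2": 18678, "S": 10000, "V": 10000, "R": 10000,
    "6": 5636, "7": 5323, "3": 5311, "4": 4205, "5": 4192, "9": 4050, "8": 3965,
})


def per_category_percent(char_counts: Counter) -> dict[str, dict[str, float]]:
    """Group characters by general category; each percent is of that category's total."""
    groups = defaultdict(Counter)
    for ch, count in char_counts.items():
        groups[unicodedata.category(ch)][ch] = count
    result = {}
    for category, counts in groups.items():
        total = sum(counts.values())
        result[category] = {ch: 100 * c / total for ch, c in counts.most_common()}
    return result


pct = per_category_percent(CHAR_COUNTS)
print(f"{pct['Nd']['0']:.1f}")  # 45.2
print(f"{pct['Lu']['S']:.1f}")  # 33.3
```

For this column the per-script tables that follow (Common, Latin) carry the same numbers, because every ASCII digit belongs to the Common script and every letter here to Latin.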

Most occurring scripts

Value    Count  Frequency (%)
Common  150000  83.3%
Latin    30000  16.7%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      67789  45.2%
1      30851  20.6%
2      18678  12.5%
6       5636  3.8%
7       5323  3.5%
3       5311  3.5%
4       4205  2.8%
5       4192  2.8%
9       4050  2.7%
8       3965  2.6%

Latin

Value  Count  Frequency (%)
S      10000  33.3%
V      10000  33.3%
R      10000  33.3%

Most occurring blocks

Value   Count  Frequency (%)
ASCII  180000  100.0%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
0                 67789  37.7%
1                 30851  17.1%
2                 18678  10.4%
S                 10000  5.6%
V                 10000  5.6%
R                 10000  5.6%
6                  5636  3.1%
7                  5323  3.0%
3                  5311  3.0%
4                  4205  2.3%
Other values (3)  12207  6.8%

년도 (year)
Real number (ℝ)

MISSING 

Distinct: 17
Distinct (%): 0.2%
Missing: 275
Missing (%): 2.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.6347
Minimum: 2005
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 2005
5th percentile: 2005
Q1: 2007
Median: 2011
Q3: 2016
95th percentile: 2019
Maximum: 2021
Range: 16
Interquartile range (IQR): 9
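Because the frequency tables list the count of all 17 distinct years, the 9725 non-missing values can be rebuilt exactly and the quantiles recomputed. The sketch uses `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics (the same rule as the pandas `Series.quantile` default). That ydata-profiling computes its quantiles this way is an assumption, though the results agree with the figures above.

```python
import statistics

# Non-missing year counts copied from the frequency tables of this report.
YEAR_COUNTS = {
    2005: 976, 2006: 614, 2007: 983, 2008: 601, 2009: 690, 2010: 534,
    2011: 625, 2012: 637, 2013: 707, 2014: 666, 2015: 204, 2016: 595,
    2017: 165, 2018: 555, 2019: 698, 2020: 302, 2021: 173,
}

# Expand the counts back into the 9725 individual values, in sorted order.
years = [year for year, count in sorted(YEAR_COUNTS.items()) for _ in range(count)]

# Linear interpolation between order statistics ("inclusive" method).
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
ventiles = statistics.quantiles(years, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(len(years), p5, q1, median, q3, p95)  # 9725 2005.0 2007.0 2011.0 2016.0 2019.0
print("range:", max(years) - min(years), "IQR:", q3 - q1)  # range: 16 IQR: 9.0
```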

Descriptive statistics

Standard deviation: 4.7366574
Coefficient of variation (CV): 0.0023546311
Kurtosis: -1.1032426
Mean: 2011.6347
Median Absolute Deviation (MAD): 4
Skewness: 0.28806424
Sum: 19563147
Variance: 22.435924
Monotonicity: Not monotonic
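The same reconstruction reproduces the descriptive statistics. In this sketch variance and standard deviation use the sample (n - 1) denominator, MAD is the median of absolute deviations from the median, and skewness and kurtosis are the bias-corrected estimators (G1, and excess kurtosis G2) that pandas' `Series.skew()` and `Series.kurt()` compute. Treating these as ydata-profiling's exact definitions is an assumption, one that the reported numbers are consistent with.

```python
import math
import statistics

# Non-missing year counts copied from the frequency tables of this report.
YEAR_COUNTS = {
    2005: 976, 2006: 614, 2007: 983, 2008: 601, 2009: 690, 2010: 534,
    2011: 625, 2012: 637, 2013: 707, 2014: 666, 2015: 204, 2016: 595,
    2017: 165, 2018: 555, 2019: 698, 2020: 302, 2021: 173,
}
years = [year for year, count in sorted(YEAR_COUNTS.items()) for _ in range(count)]
n = len(years)

mean = statistics.fmean(years)
variance = statistics.variance(years)  # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean
med = statistics.median(years)
mad = statistics.median([abs(y - med) for y in years])


def central_moment(k: int) -> float:
    """k-th central moment with a 1/n denominator."""
    return sum((y - mean) ** k for y in years) / n


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
# Adjusted Fisher-Pearson skewness (G1).
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
# Bias-corrected excess kurtosis (G2).
g2 = m4 / m2 ** 2 - 3
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(f"sum={sum(years)} mean={mean:.4f} var={variance:.6f} std={std:.7f}")
print(f"cv={cv:.10f} mad={mad} skew={skewness:.8f} kurt={kurtosis:.7f}")
```

The printed values match the report to the precision it displays; for example, the sum is 19563147 and the mean 19563147 / 9725 ≈ 2011.6347.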
Histogram with fixed size bins (bins=17)

Common values

Value             Count  Frequency (%)
2007                983  9.8%
2005                976  9.8%
2013                707  7.1%
2019                698  7.0%
2009                690  6.9%
2014                666  6.7%
2012                637  6.4%
2011                625  6.2%
2006                614  6.1%
2008                601  6.0%
Other values (7)   2528  25.3%

Minimum 10 values

Value  Count  Frequency (%)
2005     976  9.8%
2006     614  6.1%
2007     983  9.8%
2008     601  6.0%
2009     690  6.9%
2010     534  5.3%
2011     625  6.2%
2012     637  6.4%
2013     707  7.1%
2014     666  6.7%

Maximum 10 values

Value  Count  Frequency (%)
2021     173  1.7%
2020     302  3.0%
2019     698  7.0%
2018     555  5.5%
2017     165  1.7%
2016     595  5.9%
2015     204  2.0%
2014     666  6.7%
2013     707  7.1%
2012     637  6.4%
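The three tables are different orderings of the same 17 counts: by count for the common values, by ascending value for the minimum 10, and by descending value for the maximum 10. Their percentages appear to use all 10000 rows as the denominator, missing ones included (983 / 10000 = 9.8%, whereas 983 / 9725 would be 10.1%), so the 17 years together account for 97.25% rather than 100%. A sketch of the three views from the copied counts:

```python
# Non-missing year counts copied from the frequency tables of this report.
YEAR_COUNTS = {
    2005: 976, 2006: 614, 2007: 983, 2008: 601, 2009: 690, 2010: 534,
    2011: 625, 2012: 637, 2013: 707, 2014: 666, 2015: 204, 2016: 595,
    2017: 165, 2018: 555, 2019: 698, 2020: 302, 2021: 173,
}
TOTAL_ROWS = 10000  # all rows, including the 275 with a missing year (assumed denominator)


def with_percent(items):
    """Attach a percentage of all rows to each (year, count) pair."""
    return [(year, count, 100 * count / TOTAL_ROWS) for year, count in items]


common = with_percent(sorted(YEAR_COUNTS.items(), key=lambda kv: kv[1], reverse=True)[:10])
minimum_10 = with_percent(sorted(YEAR_COUNTS.items())[:10])
maximum_10 = with_percent(sorted(YEAR_COUNTS.items(), reverse=True)[:10])
other = sum(YEAR_COUNTS.values()) - sum(count for _, count, _ in common)

print([year for year, _, _ in common])  # [2007, 2005, 2013, 2019, 2009, 2014, 2012, 2011, 2006, 2008]
print("Other values (7):", other)  # Other values (7): 2528
```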

Interactions


Missing values

Figure: a simple visualization of nullity by column.
Figure: the nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.
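Both charts are views of a per-cell present/missing mask. A minimal text stand-in, assuming the data is held as a dict of equal-length column lists with `None` for missing values; the `#`/`.` rendering and the toy rows (including the missing year, which is invented) are purely illustrative:

```python
from typing import Optional, Sequence

Columns = dict[str, Sequence[Optional[object]]]


def non_null_counts(columns: Columns) -> dict[str, int]:
    """Per-column count of present values (what a missingno-style bar chart plots)."""
    return {name: sum(v is not None for v in values) for name, values in columns.items()}


def nullity_matrix(columns: Columns) -> list[str]:
    """One string per row: '#' where a value is present, '.' where it is missing."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    return [
        "".join("#" if columns[name][i] is not None else "." for name in names)
        for i in range(n_rows)
    ]


data = {
    "사업예정지일련번호": ["SVR001200806250062", "SVR001201801030130", "SVR001201506230027"],
    "년도": [2008, None, 2015],
}
print(non_null_counts(data))  # {'사업예정지일련번호': 3, '년도': 2}
print("\n".join(nullity_matrix(data)))
```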

Sample

First rows

(index)  사업예정지일련번호  년도
31272    SVR001200806250062  2008
66410    SVR001201801030130  2018
63311    SVR001201506230027  2015
14342    SVR001200507010240  2005
6376     SVR001200701120522  2007
38141    SVR001200901080013  2009
66390    SVR001201801030099  2018
32445    SVR001200801280121  2008
17890    SVR001200505010066  2005
73901    SVR002201906210008  2019

Last rows

(index)  사업예정지일련번호  년도
8592     SVR001200503012795  2005
63408    SVR001201301170290  2013
57798    SVR001201206270055  2012
42142    SVR001201001150072  2010
54083    SVR001201201110111  2012
35983    SVR001200810130010  2008
10666    SVR001200701160146  2007
72696    SVR001201507150054  2015
21902    SVR001200706280114  2007
70256    SVR001201801130003  2018
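In all 20 sample rows the four digits after the `SVR001`/`SVR002` prefix equal 년도, and the next four read as a valid month and day. That suggests, as an inference from the samples and not a documented format, that an ID is `SVR`, a three-digit code, a YYYYMMDD date and a four-digit sequence number. If that holds, the ID could be used to check 년도 or to fill in its 275 missing values. A sketch in which `parse_site_id`, `infer_year` and the layout itself are hypothetical:

```python
import re
from datetime import date
from typing import Optional

# Hypothetical ID layout inferred from the samples:
# "SVR" + 3-digit code + YYYYMMDD date + 4-digit sequence number.
ID_PATTERN = re.compile(r"SVR(\d{3})(\d{4})(\d{2})(\d{2})(\d{4})")


def parse_site_id(site_id: str) -> Optional[tuple[str, date, int]]:
    """Split an ID into (code, date, sequence), or return None if it does not fit."""
    match = ID_PATTERN.fullmatch(site_id)
    if match is None:
        return None
    code, yyyy, mm, dd, seq = match.groups()
    try:
        day = date(int(yyyy), int(mm), int(dd))
    except ValueError:  # e.g. month 13 or day 32
        return None
    return code, day, int(seq)


def infer_year(site_id: str, year: Optional[float]) -> Optional[int]:
    """Return the recorded year, falling back to the year embedded in the ID."""
    if year is not None:
        return int(year)
    parsed = parse_site_id(site_id)
    return parsed[1].year if parsed else None


samples = [("SVR001200806250062", 2008), ("SVR002201906210008", 2019), ("SVR001200503012795", 2005)]
mismatches = [
    (site_id, year) for site_id, year in samples
    if parse_site_id(site_id) is None or parse_site_id(site_id)[1].year != year
]
print(mismatches)  # []
print(infer_year("SVR001201801130003", None))  # 2018
```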