Overview

Dataset statistics

Number of variables: 2
Number of observations: 1283
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 4
Duplicate rows (%): 0.3%
Total size in memory: 21.4 KiB
Average record size in memory: 17.1 B

Variable types

Numeric: 1
Categorical: 1

Dataset

Description: 등록번호 (registration number), DEL_YN
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-21197/S/1/datasetView.do

Alerts

Duplicates: Dataset has 4 (0.3%) duplicate rows
Imbalance: DEL_YN is highly imbalanced (86.3%)

Reproduction

Analysis started: 2024-05-11 06:08:35.325967
Analysis finished: 2024-05-11 06:08:36.158988
Duration: 0.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

등록번호
Real number (ℝ)

Distinct: 1277
Distinct (%): 99.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2523.8605
Minimum: 1
Maximum: 4162
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.4 KiB

Quantile statistics

Minimum: 1
5th percentile: 1059.1
Q1: 1316.5
Median: 1638
Q3: 3838.5
95th percentile: 4097.9
Maximum: 4162
Range: 4161
Interquartile range (IQR): 2522

Descriptive statistics

Standard deviation: 1281.1784
Coefficient of variation (CV): 0.50762649
Kurtosis: -1.8716095
Mean: 2523.8605
Median Absolute Deviation (MAD): 610
Skewness: 0.076506358
Sum: 3238113
Variance: 1641418.2
Monotonicity: Not monotonic
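
Several of the figures above are derived from one another, so they can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the rounded values printed in this report, not the profiler's own computation; the variable names are illustrative.

```python
# Values copied from the report for 등록번호 (all are rounded as printed).
n_rows = 1283
mean = 2523.8605
std = 1281.1784
minimum, maximum = 1, 4162
q1, q3 = 1316.5, 3838.5

value_range = maximum - minimum   # reported Range
iqr = q3 - q1                     # reported Interquartile range (IQR)
cv = std / mean                   # reported Coefficient of variation (CV)
variance = std ** 2               # reported Variance
total = mean * n_rows             # reported Sum

print(f"range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"CV       = {cv:.8f}")
print(f"variance = {variance:.1f}")
print(f"sum      = {total:.0f}")
```

Because the inputs are rounded, the recomputed CV and variance can differ from the reported values in the last printed digit.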
Histogram with fixed size bins (bins=50)
Most frequent values

Value                Count  Frequency (%)
1664                 4      0.3%
1665                 2      0.2%
3544                 2      0.2%
1667                 2      0.2%
3631                 1      0.1%
3630                 1      0.1%
3629                 1      0.1%
3628                 1      0.1%
3627                 1      0.1%
3626                 1      0.1%
Other values (1267)  1267   98.8%

Smallest 10 values

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
14     1      0.1%
15     1      0.1%
1001   1      0.1%
1002   1      0.1%
1003   1      0.1%
1004   1      0.1%

Largest 10 values

Value  Count  Frequency (%)
4162   1      0.1%
4161   1      0.1%
4160   1      0.1%
4159   1      0.1%
4158   1      0.1%
4157   1      0.1%
4156   1      0.1%
4155   1      0.1%
4154   1      0.1%
4153   1      0.1%

DEL_YN
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 10.2 KiB

Value    Count
N        1241
Y        40
(blank)  2

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N
2nd row: N
3rd row: N
4th row: N
5th row: N

Common Values

Value    Count  Frequency (%)
N        1241   96.7%
Y        40     3.1%
(blank)  2      0.2%
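
The report does not say how the 86.3% imbalance figure for DEL_YN is derived. One definition that reproduces it from the counts above is one minus the Shannon entropy of the class shares, normalized by the logarithm of the number of classes. The sketch below assumes that definition; `imbalance_score` is a hypothetical helper name, and the return value for fewer than two classes is an arbitrary choice made here.

```python
import math

# Counts copied from the Common Values table; "(blank)" stands for the
# value that the report displays as empty.
counts = {"N": 1241, "Y": 40, "(blank)": 2}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 for one dominant class.

    Assumed definition, chosen because it matches the reported figure.
    """
    total = sum(class_counts.values())
    probs = [c / total for c in class_counts.values() if c > 0]
    if len(probs) < 2:
        return 0.0  # degenerate case; arbitrary convention for this sketch
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


score = imbalance_score(counts)
shares = {value: count / sum(counts.values()) for value, count in counts.items()}

print(f"imbalance = {score:.1%}")
for value, share in shares.items():
    print(f"{value}: {share:.1%}")
```

A score this close to 1 means almost all of the information in the column comes from the rare Y and blank rows.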

Length

Histogram of lengths of the category

Common Values (Plot)

Plot legend (shares are computed over the 1281 non-blank values, so they differ slightly from the table above):

Value  Count  Frequency (%)
n      1241   96.9%
y      40     3.1%

Interactions

[Figure: interactions plot]

Correlations

          등록번호  DEL_YN
등록번호  1.000     0.670
DEL_YN    0.670     1.000

          등록번호  DEL_YN
등록번호  1.000     0.356
DEL_YN    0.356     1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

Index  등록번호  DEL_YN
0      3785      N
1      3786      N
2      3787      N
3      3788      N
4      3789      N
5      3790      N
6      3791      N
7      3792      N
8      3793      N
9      3794      N

Last rows

Index  등록번호  DEL_YN
1273   1413      N
1274   1409      N
1275   1404      N
1276   1378      N
1277   1371      N
1278   1368      N
1279   1367      N
1280   1420      N
1281   1419      N
1282   1417      N

Duplicate rows

Most frequently occurring

Index  등록번호  DEL_YN   # duplicates
0      1664      Y        4
1      1665      Y        2
2      1667      Y        2
3      3544      (blank)  2
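
The headline figure of 4 duplicate rows (0.3%) appears to count the distinct duplicated row patterns listed here, not the number of surplus rows. The sketch below works through that arithmetic using only the values in this table and the overview. The variable names are illustrative, and the empty string for the blank DEL_YN value is an assumption about how it is stored.

```python
# Duplicate groups copied from the table: (등록번호, DEL_YN) -> occurrences.
duplicate_groups = {
    (1664, "Y"): 4,
    (1665, "Y"): 2,
    (1667, "Y"): 2,
    (3544, ""): 2,  # DEL_YN is shown blank in the report (representation assumed)
}
n_rows = 1283  # Number of observations from the overview

n_groups = len(duplicate_groups)               # distinct duplicated patterns
rows_involved = sum(duplicate_groups.values()) # rows that belong to some group
redundant_rows = rows_involved - n_groups      # rows beyond the first of each group
rows_after_dedup = n_rows - redundant_rows     # rows left if one copy per group is kept
group_share = n_groups / n_rows                # share behind the headline percentage

print(f"duplicate groups : {n_groups} ({group_share:.1%} of rows)")
print(f"rows involved    : {rows_involved}")
print(f"redundant rows   : {redundant_rows}")
print(f"rows after dedup : {rows_after_dedup}")
```

The row count after deduplication equals the Distinct count reported for 등록번호, which is consistent with these four groups being the only repeated registration numbers.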