gimi9 Pandas Profiling

Dataset statistics

Number of variables	2
Number of observations	1283
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	4
Duplicate rows (%)	0.3%
Total size in memory	21.4 KiB
Average record size in memory	17.1 B

Variable types

Numeric	1
Categorical	1

Dataset

Description	등록번호,DEL_YN
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-21197/S/1/datasetView.do

Alerts

Dataset has 4 (0.3%) duplicate rows	Duplicates
`DEL_YN` is highly imbalanced (86.3%)	Imbalance
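
The 86.3% imbalance figure is consistent with one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition and uses the `DEL_YN` counts reported in this profile (1241, 40, 2). The function name `imbalance_score` is illustrative and is not taken from the library's API.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the counts."""
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0  # a single class (or no data) has no meaningful imbalance score
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# DEL_YN counts from this report: N, Y, and the blank value
del_yn_counts = [1241, 40, 2]
score = imbalance_score(del_yn_counts)
print(f"{score:.1%}")
```

A score of 0 corresponds to perfectly balanced classes and a score near 1 to a single dominant class, so a value this high reflects how strongly `N` dominates the column.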

Reproduction

Analysis started	2024-05-11 06:08:35.325967
Analysis finished	2024-05-11 06:08:36.158988
Duration	0.83 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
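
The reported duration can be recomputed from the two timestamps above. A minimal sketch using only the standard library; the format string is an assumption based on how the timestamps are printed here.

```python
from datetime import datetime

# Timestamp layout assumed from the values printed in this report
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 06:08:35.325967", FMT)
finished = datetime.strptime("2024-05-11 06:08:36.158988", FMT)

# Elapsed wall-clock time in seconds
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```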

등록번호
Real number (ℝ)

Distinct	1277
Distinct (%)	99.5%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	2523.8605

Minimum	1
Maximum	4162
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	11.4 KiB

Quantile statistics

Minimum	1
5-th percentile	1059.1
Q1	1316.5
median	1638
Q3	3838.5
95-th percentile	4097.9
Maximum	4162
Range	4161
Interquartile range (IQR)	2522

Descriptive statistics

Standard deviation	1281.1784
Coefficient of variation (CV)	0.50762649
Kurtosis	-1.8716095
Mean	2523.8605
Median Absolute Deviation (MAD)	610
Skewness	0.076506358
Sum	3238113
Variance	1641418.2
Monotonicity	Not monotonic
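
Several of the statistics for 등록번호 are derived from others in the same tables. The sketch below recomputes them from the reported values as a consistency check. The inputs are copied from this report and the variable names are illustrative. Because the inputs are already rounded, small differences in the last digit are expected.

```python
# Values copied from the report for 등록번호
n = 1283                    # number of observations
total = 3238113             # Sum
q1, q3 = 1316.5, 3838.5     # Q1 and Q3
minimum, maximum = 1, 4162
std = 1281.1784             # Standard deviation (rounded in the report)

mean = total / n                 # Mean = Sum / n
value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared

print(f"mean={mean:.4f} range={value_range} iqr={iqr} cv={cv:.6f} variance={variance:.1f}")
```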

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
1664	4	0.3%
1665	2	0.2%
3544	2	0.2%
1667	2	0.2%
3631	1	0.1%
3630	1	0.1%
3629	1	0.1%
3628	1	0.1%
3627	1	0.1%
3626	1	0.1%
Other values (1267)	1267	98.8%

Minimum 10 values

Value	Count	Frequency (%)
1	1	0.1%
2	1	0.1%
3	1	0.1%
4	1	0.1%
14	1	0.1%
15	1	0.1%
1001	1	0.1%
1002	1	0.1%
1003	1	0.1%
1004	1	0.1%

Maximum 10 values

Value	Count	Frequency (%)
4162	1	0.1%
4161	1	0.1%
4160	1	0.1%
4159	1	0.1%
4158	1	0.1%
4157	1	0.1%
4156	1	0.1%
4155	1	0.1%
4154	1	0.1%
4153	1	0.1%

DEL_YN
Categorical

IMBALANCE

Distinct	3
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	10.2 KiB

N	1241
Y	40
(blank)	2

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	N
2nd row	N
3rd row	N
4th row	N
5th row	N

Common Values

Value	Count	Frequency (%)
N	1241	96.7%
Y	40	3.1%
(blank)	2	0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
n	1241	96.9%
y	40	3.1%

Interaction plot: 등록번호 (x-axis) vs. 등록번호 (y-axis)

Phik (φk)

	등록번호	DEL_YN
등록번호	1.000	0.670
DEL_YN	0.670	1.000

Auto

	등록번호	DEL_YN
등록번호	1.000	0.356
DEL_YN	0.356	1.000

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	등록번호	DEL_YN
0	3785	N
1	3786	N
2	3787	N
3	3788	N
4	3789	N
5	3790	N
6	3791	N
7	3792	N
8	3793	N
9	3794	N

Last rows

	등록번호	DEL_YN
1273	1413	N
1274	1409	N
1275	1404	N
1276	1378	N
1277	1371	N
1278	1368	N
1279	1367	N
1280	1420	N
1281	1419	N
1282	1417	N

Most frequently occurring

	등록번호	DEL_YN	# duplicates
0	1664	Y	4
1	1665	Y	2
2	1667	Y	2
3	3544	(blank)	2
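
The duplicate counts above can be cross-checked against the overview figures. The sketch below rebuilds only the duplicated rows from this table and counts them with `collections.Counter`. It assumes that "Duplicate rows" counts distinct repeated rows, that every repeated 등록번호 value appears in this table, and that the blank `DEL_YN` value is a single whitespace character (the report gives a minimum length of 1).

```python
from collections import Counter

n_rows = 1283  # Number of observations from the overview

# (등록번호, DEL_YN) rows repeated as often as the table above reports;
# " " stands in for the blank DEL_YN value (an assumption)
rows = (
    [(1664, "Y")] * 4
    + [(1665, "Y")] * 2
    + [(1667, "Y")] * 2
    + [(3544, " ")] * 2
)

counts = Counter(rows)
duplicate_groups = {row: c for row, c in counts.items() if c > 1}

n_duplicate_rows = len(duplicate_groups)                          # distinct rows that repeat
redundant_copies = sum(c - 1 for c in duplicate_groups.values())  # copies beyond the first
duplicate_pct = n_duplicate_rows / n_rows
distinct_ids = n_rows - redundant_copies                          # implied distinct 등록번호 values

print(n_duplicate_rows, f"{duplicate_pct:.1%}", distinct_ids)
```

Under these assumptions the result agrees with the 4 duplicate rows (0.3%) in the overview and with the 1277 distinct 등록번호 values reported for that variable.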
