gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	9189
Missing cells	15
Missing cells (%)	< 0.1%
Duplicate rows	1
Duplicate rows (%)	< 0.1%
Total size in memory	305.2 KiB
Average record size in memory	34.0 B
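
The derived figures in this table follow from the raw counts. A minimal sketch, using only numbers copied from the table (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 4
n_observations = 9189
missing_cells = 15
total_memory_kib = 305.2

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations

# Share of missing cells, in percent (the report rounds this to "< 0.1%").
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record: total memory in bytes divided by the row count.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The missing share comes out well under 0.1%, and the average record size agrees with the reported 34.0 B.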

Variable types

Categorical	1
Text	1
Numeric	2

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15249/F/1/datasetView.do

Alerts

Dataset has 1 (< 0.1%) duplicate rows	Duplicates
`대여일자` is highly overall correlated with `대여건수`	High correlation
`대여건수` is highly overall correlated with `대여일자`	High correlation

Reproduction

Analysis started	2024-03-13 09:53:50.234809
Analysis finished	2024-03-13 09:53:51.247929
Duration	1.01 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

구분
Categorical

Distinct	26
Distinct (%)	0.3%
Missing	0
Missing (%)	0.0%
Memory size	71.9 KiB

송파구	591
강남구	582
영등포구	528
서초구	527
강서구	510
Other values (21)	6451

Length

Max length	4
Median length	3
Mean length	3.0960932
Min length	2

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	강남구
2nd row	강남구
3rd row	강남구
4th row	강남구
5th row	강남구

Common Values

Value	Count	Frequency (%)
송파구	591	6.4%
강남구	582	6.3%
영등포구	528	5.7%
서초구	527	5.7%
강서구	510	5.6%
마포구	470	5.1%
노원구	405	4.4%
종로구	383	4.2%
성동구	379	4.1%
은평구	372	4.0%
Other values (16)	4442	48.3%
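
The Frequency (%) column is each count divided by the total number of observations. A small sketch that recomputes it from the counts in this table (the `pct` helper is hypothetical, introduced only for illustration):

```python
# Counts copied from the Common Values table for 구분.
top_counts = {
    "송파구": 591, "강남구": 582, "영등포구": 528, "서초구": 527, "강서구": 510,
    "마포구": 470, "노원구": 405, "종로구": 383, "성동구": 379, "은평구": 372,
}
other_count = 4442  # the "Other values (16)" row

# The listed rows plus the remainder should account for every observation.
total = sum(top_counts.values()) + other_count

def pct(count: int) -> str:
    """Format a count as a percentage of all observations, one decimal place."""
    return f"{100 * count / total:.1f}%"

for district, count in top_counts.items():
    print(district, count, pct(count))
print("Other values (16)", other_count, pct(other_count))
```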

Length

Histogram of lengths of the category

대여소명
Text

Distinct	1546
Distinct (%)	16.8%
Missing	5
Missing (%)	0.1%
Memory size	71.9 KiB

Length

Max length	47
Median length	31
Mean length	15.464612
Min length	8

Characters and Unicode

Total characters	142027
Distinct characters	521
Distinct categories	12
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
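
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count the general categories in one sample value from this column. The `category_counts` helper and the `CATEGORY_NAMES` mapping are assumptions made for this example; this is not necessarily how the report itself was computed.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Zs": "Space Separator", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
    "Sm": "Math Symbol", "So": "Other Symbol", "Pc": "Connector Punctuation",
}

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    codes = (unicodedata.category(ch) for ch in text)
    # Fall back to the raw two-letter code for categories not named above.
    return Counter(CATEGORY_NAMES.get(code, code) for code in codes)

# One of the sample values of 대여소명 shown in this report.
sample = "2302. 교보타워 버스정류장(신논현역 3번출구 후면)"
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the column.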

Unique

Unique	5
Unique (%)	0.1%

Sample

1st row	2301. 현대고등학교 건너편
2nd row	2302. 교보타워 버스정류장(신논현역 3번출구 후면)
3rd row	2303. 논현역 7번출구
4th row	2304. 신영 ROYAL PALACE 앞
5th row	2305. MCM 본사 직영점 앞

Value	Count	Frequency (%)
앞	2392	8.3%
옆	471	1.6%
출구	330	1.1%
1번출구	288	1.0%
사거리	252	0.9%
교차로	242	0.8%
입구	239	0.8%
뒤	228	0.8%
2번출구	220	0.8%
3번출구	196	0.7%
Other values (3396)	24135	83.2%

Most occurring characters

Value	Count	Frequency (%)
(space)	19809	13.9%
.	9220	6.5%
1	8250	5.8%
2	6166	4.3%
3	4348	3.1%
5	3232	2.3%
구	3178	2.2%
0	3063	2.2%
4	2986	2.1%
앞	2877	2.0%
Other values (511)	78898	55.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	72931	51.4%
Decimal Number	37248	26.2%
Space Separator	19809	13.9%
Other Punctuation	9297	6.5%
Uppercase Letter	1100	0.8%
Close Punctuation	717	0.5%
Open Punctuation	717	0.5%
Lowercase Letter	119	0.1%
Dash Punctuation	53	< 0.1%
Math Symbol	24	< 0.1%
Other values (2)	12	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
구	3178	4.4%
앞	2877	3.9%
역	2388	3.3%
번	2097	2.9%
출	2067	2.8%
동	1769	2.4%
교	1528	2.1%
리	1268	1.7%
원	1136	1.6%
파	1085	1.5%
Other values (453)	53538	73.4%

Uppercase Letter

Value	Count	Frequency (%)
K	162	14.7%
S	131	11.9%
C	119	10.8%
G	84	7.6%
L	84	7.6%
T	66	6.0%
A	54	4.9%
M	54	4.9%
B	53	4.8%
I	47	4.3%
Other values (14)	246	22.4%

Decimal Number

Value	Count	Frequency (%)
1	8250	22.1%
2	6166	16.6%
3	4348	11.7%
5	3232	8.7%
0	3063	8.2%
4	2986	8.0%
6	2733	7.3%
7	2208	5.9%
9	2140	5.7%
8	2122	5.7%

Lowercase Letter

Value	Count	Frequency (%)
e	39	32.8%
k	13	10.9%
t	12	10.1%
l	12	10.1%
n	12	10.1%
s	7	5.9%
y	6	5.0%
c	6	5.0%
o	6	5.0%
m	6	5.0%

Other Punctuation

Value	Count	Frequency (%)
.	9220	99.2%
,	36	0.4%
?	18	0.2%
&	12	0.1%
@	6	0.1%
·	5	0.1%

Math Symbol

Value	Count	Frequency (%)
~	18	75.0%
+	6	25.0%

Space Separator

Value	Count	Frequency (%)
(space)	19809	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	717	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	717	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	53	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	6	100.0%

Connector Punctuation

Value	Count	Frequency (%)
_	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	72937	51.4%
Common	67871	47.8%
Latin	1219	0.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
구	3178	4.4%
앞	2877	3.9%
역	2388	3.3%
번	2097	2.9%
출	2067	2.8%
동	1769	2.4%
교	1528	2.1%
리	1268	1.7%
원	1136	1.6%
파	1085	1.5%
Other values (454)	53544	73.4%

Latin

Value	Count	Frequency (%)
K	162	13.3%
S	131	10.7%
C	119	9.8%
G	84	6.9%
L	84	6.9%
T	66	5.4%
A	54	4.4%
M	54	4.4%
B	53	4.3%
I	47	3.9%
Other values (24)	365	29.9%

Common

Value	Count	Frequency (%)
(space)	19809	29.2%
.	9220	13.6%
1	8250	12.2%
2	6166	9.1%
3	4348	6.4%
5	3232	4.8%
0	3063	4.5%
4	2986	4.4%
6	2733	4.0%
7	2208	3.3%
Other values (13)	5856	8.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	72931	51.4%
ASCII	69085	48.6%
None	11	< 0.1%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	19809	28.7%
.	9220	13.3%
1	8250	11.9%
2	6166	8.9%
3	4348	6.3%
5	3232	4.7%
0	3063	4.4%
4	2986	4.3%
6	2733	4.0%
7	2208	3.2%
Other values (46)	7070	10.2%

Hangul

Value	Count	Frequency (%)
구	3178	4.4%
앞	2877	3.9%
역	2388	3.3%
번	2097	2.9%
출	2067	2.8%
동	1769	2.4%
교	1528	2.1%
리	1268	1.7%
원	1136	1.6%
파	1085	1.5%
Other values (453)	53538	73.4%

None

Value	Count	Frequency (%)
㈜	6	54.5%
·	5	45.5%

대여일자
Real number (ℝ)

HIGH CORRELATION

Distinct	6
Distinct (%)	0.1%
Missing	5
Missing (%)	0.1%
Infinite	0
Infinite (%)	0.0%
Mean	201887.91

Minimum	201812
Maximum	201905
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	80.9 KiB

Quantile statistics

Minimum	201812
5-th percentile	201812
Q1	201901
median	201903
Q3	201904
95-th percentile	201905
Maximum	201905
Range	93
Interquartile range (IQR)	3

Descriptive statistics

Standard deviation	33.873285
Coefficient of variation (CV)	0.00016778263
Kurtosis	1.2197688
Mean	201887.91
Median Absolute Deviation (MAD)	1
Skewness	-1.7913755
Sum	1.8541386 × 10⁹
Variance	1147.3994
Monotonicity	Increasing
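
These statistics treat 대여일자 as a plain number. The six distinct values look like year-month codes (an assumption based on their shape, YYYYMM), in which case the numeric range of 93 corresponds to a span of only five months. A minimal sketch using the value counts reported for this variable; `month_index` is a hypothetical helper:

```python
# Value counts copied from this variable's frequency table.
counts = {
    201812: 1523, 201901: 1528, 201902: 1526,
    201903: 1536, 201904: 1535, 201905: 1536,
}

n = sum(counts.values())  # non-missing observations
mean_as_number = sum(value * count for value, count in counts.items()) / n

def month_index(yyyymm: int) -> int:
    """Convert a YYYYMM integer to a running month count (assumed encoding)."""
    year, month = divmod(yyyymm, 100)
    return year * 12 + month

numeric_range = max(counts) - min(counts)
month_span = month_index(max(counts)) - month_index(min(counts))

print(f"n = {n}, mean = {mean_as_number:.2f}")
print(f"numeric range = {numeric_range}, span in months = {month_span}")
```

The recomputed mean agrees with the reported 201887.91, but under the year-month reading it does not correspond to any calendar month, so the mean, variance, and skewness above are best read with caution.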

Histogram with fixed size bins (bins=6)

Value	Count	Frequency (%)
201903	1536	16.7%
201905	1536	16.7%
201904	1535	16.7%
201901	1528	16.6%
201902	1526	16.6%
201812	1523	16.6%
(Missing)	5	0.1%

Minimum 10 values

Value	Count	Frequency (%)
201812	1523	16.6%
201901	1528	16.6%
201902	1526	16.6%
201903	1536	16.7%
201904	1535	16.7%
201905	1536	16.7%

Maximum 10 values

Value	Count	Frequency (%)
201905	1536	16.7%
201904	1535	16.7%
201903	1536	16.7%
201902	1526	16.6%
201901	1528	16.6%
201812	1523	16.6%

대여건수
Real number (ℝ)

HIGH CORRELATION

Distinct	2033
Distinct (%)	22.1%
Missing	5
Missing (%)	0.1%
Infinite	0
Infinite (%)	0.0%
Mean	636.1679

Minimum	0
Maximum	16080
Zeros	5
Zeros (%)	0.1%
Negative	0
Negative (%)	0.0%
Memory size	80.9 KiB

Quantile statistics

Minimum	0
5-th percentile	74
Q1	215
median	407
Q3	780
95-th percentile	1924.85
Maximum	16080
Range	16080
Interquartile range (IQR)	565

Descriptive statistics

Standard deviation	748.96096
Coefficient of variation (CV)	1.1773008
Kurtosis	56.210929
Mean	636.1679
Median Absolute Deviation (MAD)	238.5
Skewness	4.9942804
Sum	5842566
Variance	560942.52
Monotonicity	Not monotonic
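
Several of these values are derived from others in the same section. A short consistency check, using only figures copied from this report and assuming the mean is taken over non-missing rows:

```python
# Figures copied from the 대여건수 statistics.
n_rows, n_missing = 9189, 5
total = 5842566      # Sum
std = 748.96096      # Standard deviation
q1, q3 = 215, 780    # Quartiles

n_valid = n_rows - n_missing   # rows with a value present
mean = total / n_valid         # Mean = Sum / count
cv = std / mean                # Coefficient of variation
variance = std ** 2            # Variance
iqr = q3 - q1                  # Interquartile range

print(f"mean = {mean:.4f}, CV = {cv:.4f}")
print(f"variance = {variance:.2f}, IQR = {iqr}")
```

The mean sits well above the median of 407, which is consistent with the strong positive skewness reported here.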

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
201	27	0.3%
226	22	0.2%
275	21	0.2%
415	21	0.2%
142	21	0.2%
198	21	0.2%
144	21	0.2%
243	20	0.2%
228	20	0.2%
315	20	0.2%
Other values (2023)	8970	97.6%

Minimum 10 values

Value	Count	Frequency (%)
0	5	0.1%
1	1	< 0.1%
2	5	0.1%
3	1	< 0.1%
4	2	< 0.1%
5	5	0.1%
6	5	0.1%
7	2	< 0.1%
8	1	< 0.1%
9	3	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
16080	1	< 0.1%
15588	1	< 0.1%
9996	1	< 0.1%
9977	1	< 0.1%
9366	1	< 0.1%
8719	1	< 0.1%
8378	1	< 0.1%
8081	1	< 0.1%
7795	1	< 0.1%
7344	1	< 0.1%

Interactions

[Figure: interaction plots for 대여일자 and 대여건수]

Correlations

Phik (φk)

	구분	대여일자	대여건수
구분	1.000	NaN	0.146
대여일자	NaN	1.000	NaN
대여건수	0.146	NaN	1.000

Auto

	대여일자	대여건수	구분
대여일자	1.000	0.574	0.000
대여건수	0.574	1.000	0.058
구분	0.000	0.058	1.000

Missing values

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

	구분	대여소명	대여일자	대여건수
0	강남구	2301. 현대고등학교 건너편	201812	364
1	강남구	2302. 교보타워 버스정류장(신논현역 3번출구 후면)	201812	500
2	강남구	2303. 논현역 7번출구	201812	286
3	강남구	2304. 신영 ROYAL PALACE 앞	201812	149
4	강남구	2305. MCM 본사 직영점 앞	201812	145
5	강남구	2306. 압구정역 2번 출구 옆	201812	457
6	강남구	2307. 압구정 한양 3차 아파트	201812	279
7	강남구	2308. 압구정파출소 앞	201812	292
8	강남구	2309. 청담역(우리들병원 앞)	201812	152
9	강남구	2310. 청담동 맥도날드 옆(위치)	201812	214

Last rows

	구분	대여소명	대여일자	대여건수
9179	중랑구	1450. 화랑대역 7번출구	201905	1013
9180	중랑구	1451. 중랑세무서	201905	2097
9181	중랑구	1452. 겸재교 진입부	201905	1743
9182	중랑구	1453. 중랑캠핑숲	201905	218
9183	중랑구	1454. 한국전력공사(동대문 중랑지사)	201905	1141
9184	중랑구	1455. 상봉역 2번 출구	201905	1362
9185	중랑구	1456. 상아빌딩(우림시장 교차로)	201905	826
9186	중랑구	1457. 동원사거리	201905	827
9187	중랑구	1458. 상봉터미널2	201905	1421
9188	중랑구	1459. 용마한신아파트사거리	201905	447

Duplicate rows

Most frequently occurring

	구분	대여소명	대여일자	대여건수	# duplicates
0	<NA>	<NA>	<NA>	<NA>	5
