gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202005"	Constant
`금액` has 2199 (22.0%) zeros	Zeros
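The two alerts can be reproduced by hand. The following is a minimal sketch, not the profiler's own implementation: the helper names `is_constant` and `zero_share` are hypothetical, and the input is only the ten "First rows" shown in the Sample section of this report, so the resulting share differs from the full-data figure of 22.0%.

```python
# (년월일, 금액) pairs copied from the ten "First rows" of this report.
rows = [
    ("202005", 0), ("202005", 42821000), ("202005", 12240025),
    ("202005", 561499445), ("202005", 4249480), ("202005", 1417640),
    ("202005", 5230), ("202005", 1706170486), ("202005", 0), ("202005", 0),
]


def is_constant(values):
    """True when every value in the column is identical (hypothetical helper)."""
    return len(set(values)) == 1


def zero_share(values):
    """Fraction of values that are exactly zero (hypothetical helper)."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


dates = [d for d, _ in rows]
amounts = [a for _, a in rows]

print(is_constant(dates))            # the 년월일 column holds a single value
print(f"{zero_share(amounts):.1%}")  # share of zeros in the ten sample rows

# Full-data figure from the report: 2199 zeros out of 10000 observations.
full_share = f"{2199 / 10000:.1%}"
print(full_share)
```

On the full dataset the same ratio is 2199 / 10000, which rounds to the 22.0% shown in the alert.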

Reproduction

Analysis started	2024-05-11 06:00:19.272426
Analysis finished	2024-05-11 06:00:20.291037
Duration	1.02 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2176
Distinct (%)	21.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	24
Median length	21
Mean length	7.2214
Min length	2

Characters and Unicode

Total characters	72214
Distinct characters	433
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
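As a rough illustration of how the category tables below can be derived, the sketch here counts Unicode general categories with Python's standard `unicodedata` module over the five sample names listed for this variable. The helper name `category_counts` is an assumption for illustration. The standard library exposes general categories only; scripts and blocks would need another data source.

```python
import unicodedata
from collections import Counter

# Sample apartment names copied from the "Sample" rows of this variable.
names = ["천호한신", "장안위더스빌", "길음삼부", "목동10단지", "양평경남1차"]


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(names)
total = sum(counts.values())
for cat, n in counts.most_common():
    print(f"{cat}\t{n}\t{n / total:.1%}")

# Two characters that appear in the tables of this report:
print(unicodedata.category("Ⅰ"))  # Nl = Letter Number
print(unicodedata.category("~"))  # Sm = Math Symbol
```

The two-letter codes map to the names used in the report: `Lo` is Other Letter (Hangul syllables fall here) and `Nd` is Decimal Number.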

Unique

Unique	91
Unique (%)	0.9%

Sample

1st row	천호한신
2nd row	장안위더스빌
3rd row	길음삼부
4th row	목동10단지
5th row	양평경남1차

Value	Count	Frequency (%)
아파트	135	1.3%
래미안	37	0.3%
힐스테이트	19	0.2%
sk뷰	17	0.2%
아이파크	17	0.2%
서울숲2차푸르지오임대	16	0.2%
북한산	15	0.1%
신반포	14	0.1%
해모로	14	0.1%
고덕	14	0.1%
Other values (2237)	10318	97.2%
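The counts in the table above sum to more than the 10000 observations, and values such as `sk뷰` appear in lower case, which suggests the table counts lowercased, whitespace-separated words rather than whole cell values. The sketch below works under that assumption only; it is not taken from the profiler's source, and the helper name `word_counts` is hypothetical. The inputs are three names from the "Last rows" sample.

```python
from collections import Counter

# Three apartment names copied from the "Last rows" sample of this report.
names = ["백련산 sk뷰 아이파크", "역삼2차아이파크", "한강아파트"]


def word_counts(values):
    """Count lowercased, whitespace-separated tokens (assumed tokenization)."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


counts = word_counts(names)
for word, n in counts.most_common():
    print(f"{word}\t{n}")
```

Under this tokenization a name without spaces, such as `역삼2차아이파크`, stays a single word, so it does not add to the count for `아이파크`.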

Most occurring characters

Value	Count	Frequency (%)
아	2374	3.3%
파	2300	3.2%
트	2081	2.9%
대	1861	2.6%
지	1750	2.4%
동	1689	2.3%
차	1565	2.2%
신	1545	2.1%
성	1401	1.9%
단	1387	1.9%
Other values (423)	54261	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66074	91.5%
Decimal Number	3724	5.2%
Uppercase Letter	758	1.0%
Space Separator	694	1.0%
Lowercase Letter	363	0.5%
Open Punctuation	165	0.2%
Close Punctuation	165	0.2%
Dash Punctuation	157	0.2%
Other Punctuation	106	0.1%
Math Symbol	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2374	3.6%
파	2300	3.5%
트	2081	3.1%
대	1861	2.8%
지	1750	2.6%
동	1689	2.6%
차	1565	2.4%
신	1545	2.3%
성	1401	2.1%
단	1387	2.1%
Other values (377)	48121	72.8%

Uppercase Letter

Value	Count	Frequency (%)
S	139	18.3%
K	105	13.9%
C	87	11.5%
L	57	7.5%
H	54	7.1%
M	52	6.9%
D	52	6.9%
E	43	5.7%
I	42	5.5%
V	32	4.2%
Other values (7)	95	12.5%

Lowercase Letter

Value	Count	Frequency (%)
e	211	58.1%
l	38	10.5%
i	29	8.0%
v	22	6.1%
s	21	5.8%
k	18	5.0%
w	8	2.2%
c	6	1.7%
h	6	1.7%
g	2	0.6%

Decimal Number

Value	Count	Frequency (%)
1	1157	31.1%
2	1039	27.9%
3	503	13.5%
4	248	6.7%
5	214	5.7%
6	158	4.2%
7	131	3.5%
8	99	2.7%
0	92	2.5%
9	83	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	79	74.5%
.	27	25.5%

Space Separator

Value	Count	Frequency (%)
(space)	694	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	165	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	165	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	157	100.0%

Math Symbol

Value	Count	Frequency (%)
~	4	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66074	91.5%
Common	5015	6.9%
Latin	1125	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2374	3.6%
파	2300	3.5%
트	2081	3.1%
대	1861	2.8%
지	1750	2.6%
동	1689	2.6%
차	1565	2.4%
신	1545	2.3%
성	1401	2.1%
단	1387	2.1%
Other values (377)	48121	72.8%

Latin

Value	Count	Frequency (%)
e	211	18.8%
S	139	12.4%
K	105	9.3%
C	87	7.7%
L	57	5.1%
H	54	4.8%
M	52	4.6%
D	52	4.6%
E	43	3.8%
I	42	3.7%
Other values (19)	283	25.2%

Common

Value	Count	Frequency (%)
1	1157	23.1%
2	1039	20.7%
(space)	694	13.8%
3	503	10.0%
4	248	4.9%
5	214	4.3%
(	165	3.3%
)	165	3.3%
6	158	3.2%
-	157	3.1%
Other values (7)	515	10.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66074	91.5%
ASCII	6136	8.5%
Number Forms	4	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2374	3.6%
파	2300	3.5%
트	2081	3.1%
대	1861	2.8%
지	1750	2.6%
동	1689	2.6%
차	1565	2.4%
신	1545	2.3%
성	1401	2.1%
단	1387	2.1%
Other values (377)	48121	72.8%

ASCII

Value	Count	Frequency (%)
1	1157	18.9%
2	1039	16.9%
(space)	694	11.3%
3	503	8.2%
4	248	4.0%
5	214	3.5%
e	211	3.4%
(	165	2.7%
)	165	2.7%
6	158	2.6%
Other values (35)	1582	25.8%

Number Forms

Value	Count	Frequency (%)
Ⅰ	4	100.0%

아파트코드
Text

Distinct	2182
Distinct (%)	21.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	91
Unique (%)	0.9%

Sample

1st row	A13486601
2nd row	A13078701
3rd row	A13611004
4th row	A15873701
5th row	A15010302

Value	Count	Frequency (%)
b13380801	13	0.1%
a12071002	12	0.1%
a13881701	12	0.1%
a12071102	12	0.1%
a15685206	12	0.1%
a13816101	11	0.1%
a13003005	11	0.1%
a13684605	11	0.1%
a15205104	11	0.1%
a13407002	11	0.1%
Other values (2172)	9884	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18204	20.2%
1	17663	19.6%
A	9978	11.1%
3	9016	10.0%
2	8118	9.0%
5	6227	6.9%
8	5789	6.4%
7	4802	5.3%
4	3766	4.2%
6	3419	3.8%
Other values (2)	3018	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18204	22.8%
1	17663	22.1%
3	9016	11.3%
2	8118	10.1%
5	6227	7.8%
8	5789	7.2%
7	4802	6.0%
4	3766	4.7%
6	3419	4.3%
9	2996	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9978	99.8%
B	22	0.2%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18204	22.8%
1	17663	22.1%
3	9016	11.3%
2	8118	10.1%
5	6227	7.8%
8	5789	7.2%
7	4802	6.0%
4	3766	4.7%
6	3419	4.3%
9	2996	3.7%

Latin

Value	Count	Frequency (%)
A	9978	99.8%
B	22	0.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18204	20.2%
1	17663	19.6%
A	9978	11.1%
3	9016	10.0%
2	8118	9.0%
5	6227	6.9%
8	5789	6.4%
7	4802	5.3%
4	3766	4.2%
6	3419	3.8%
Other values (2)	3018	3.4%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	6.0151
Min length	2

Characters and Unicode

Total characters	60151
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	승강기유지비충당부채
2nd row	관리비예치금
3rd row	연차수당충당부채
4th row	미부과관리비
5th row	미지급금

Value	Count	Frequency (%)
미처분이익잉여금	324	3.2%
수선유지비충당부채	320	3.2%
연차수당충당부채	313	3.1%
당기순이익	311	3.1%
예수금	310	3.1%
선급비용	304	3.0%
예금	303	3.0%
장기수선충당예금	301	3.0%
퇴직급여충당부채	300	3.0%
장기수선충당부채	299	3.0%
Other values (67)	6915	69.2%

Most occurring characters

Value	Count	Frequency (%)
금	4646	7.7%
당	3863	6.4%
수	3159	5.3%
충	3155	5.2%
부	3056	5.1%
비	2998	5.0%
채	2748	4.6%
기	2433	4.0%
선	1936	3.2%
예	1742	2.9%
Other values (97)	30415	50.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60151	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4646	7.7%
당	3863	6.4%
수	3159	5.3%
충	3155	5.2%
부	3056	5.1%
비	2998	5.0%
채	2748	4.6%
기	2433	4.0%
선	1936	3.2%
예	1742	2.9%
Other values (97)	30415	50.6%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60151	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4646	7.7%
당	3863	6.4%
수	3159	5.3%
충	3155	5.2%
부	3056	5.1%
비	2998	5.0%
채	2748	4.6%
기	2433	4.0%
선	1936	3.2%
예	1742	2.9%
Other values (97)	30415	50.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60151	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4646	7.7%
당	3863	6.4%
수	3159	5.3%
충	3155	5.2%
부	3056	5.1%
비	2998	5.0%
채	2748	4.6%
기	2433	4.0%
선	1936	3.2%
예	1742	2.9%
Other values (97)	30415	50.6%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202005	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202005
2nd row	202005
3rd row	202005
4th row	202005
5th row	202005

Common Values

Value	Count	Frequency (%)
202005	10000	100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values; 202005 accounts for all 10000 rows (100.0%)]

금액
Real number (ℝ)

ZEROS

Distinct	7459
Distinct (%)	74.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	75018849

Minimum	-3.7314823 × 10⁸
Maximum	8.9980122 × 10⁹
Zeros	2199
Zeros (%)	22.0%
Negative	312
Negative (%)	3.1%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.7314823 × 10⁸
5-th percentile	0
Q1	0
median	3150368
Q3	34787275
95-th percentile	3.6606834 × 10⁸
Maximum	8.9980122 × 10⁹
Range	9.3711605 × 10⁹
Interquartile range (IQR)	34787275

Descriptive statistics

Standard deviation	3.0069243 × 10⁸
Coefficient of variation (CV)	4.008225
Kurtosis	258.87072
Mean	75018849
Median Absolute Deviation (MAD)	3150368
Skewness	12.693987
Sum	7.5018849 × 10¹¹
Variance	9.0415936 × 10¹⁶
Monotonicity	Not monotonic
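
Several of the figures above are derived from one another, so they can be cross-checked with plain arithmetic. The sketch below recomputes the coefficient of variation, range, interquartile range, variance, and sum from the rounded values displayed in this report; small differences in the last digits come from that rounding.

```python
import math

# Values copied from the report, rounded as displayed.
mean = 75018849
std = 3.0069243e8
minimum = -3.7314823e8
maximum = 8.9980122e9
q1, q3 = 0, 34787275
n = 10000  # number of observations

cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum equals mean times number of observations

print(f"CV       {cv:.6f}")
print(f"Range    {value_range:.7e}")
print(f"IQR      {iqr}")
print(f"Variance {variance:.7e}")
print(f"Sum      {total:.7e}")
```

A coefficient of variation of about 4, together with skewness near 12.7, indicates a distribution dominated by a small number of very large amounts.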

[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value	Count	Frequency (%)
0	2199	22.0%
500000	31	0.3%
250000	20	0.2%
300000	13	0.1%
1000000	11	0.1%
242000	11	0.1%
400000	10	0.1%
10000	9	0.1%
484000	9	0.1%
100000	9	0.1%
Other values (7449)	7678	76.8%

Minimum 10 values

Value	Count	Frequency (%)
-373148226	1	< 0.1%
-274511234	1	< 0.1%
-243688934	1	< 0.1%
-185960982	1	< 0.1%
-135178250	1	< 0.1%
-133385584	1	< 0.1%
-125211700	1	< 0.1%
-121299737	1	< 0.1%
-116349116	1	< 0.1%
-102207246	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
8998012236	1	< 0.1%
8959412051	1	< 0.1%
7301297897	1	< 0.1%
5987905932	1	< 0.1%
5409417731	1	< 0.1%
5115662860	1	< 0.1%
5084404131	1	< 0.1%
4443367362	1	< 0.1%
4350317710	1	< 0.1%
4034231841	1	< 0.1%

Interactions

[Figure: interaction plot of 금액 against 금액]

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.469
금액	0.469	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
25778	천호한신	A13486601	승강기유지비충당부채	202005	0
14758	장안위더스빌	A13078701	관리비예치금	202005	42821000
31128	길음삼부	A13611004	연차수당충당부채	202005	12240025
67090	목동10단지	A15873701	미부과관리비	202005	561499445
50784	양평경남1차	A15010302	미지급금	202005	4249480
15418	제기안암골벽산	A13086101	공동체활성화단체지원적립금	202005	1417640
66956	신월삼정그린뷰	A15809402	현금	202005	5230
19355	창동주공17단지	A13204408	장기수선충당부채	202005	1706170486
7939	북가좌삼호	A12013202	주차장충당부채	202005	0
36871	오금대림	A13813008	주차장충당부채	202005	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
68524	은평뉴타운박석고개1단지	A41279910	미지급금	202005	194180405
66697	목동우성2차	A15807703	선급비용	202005	23756280
1139	백련산 sk뷰 아이파크	A10025310	미처분이익잉여금	202005	17670340
28687	역삼2차아이파크	A13579503	연차수당충당부채	202005	0
64607	방화청솔3단지	A15785709	선급금	202005	3864690
42247	공릉대주파크빌	A13980706	공동주택적립금	202005	22194591
51380	한강아파트	A15080501	상여충당부채	202005	0
50455	문래현대5차아파트	A15009504	예수금	202005	1020680
64569	한숲대림아파트	A15785703	기타당좌자산	202005	0
42394	상계불암대림	A13981006	주차장충당부채	202005	0
