gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
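
The average record size follows from the two figures above it. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and using only the rounded values shown in the table (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 488.3
n_observations = 10_000

# Convert KiB to bytes, then divide by the number of rows.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Because the input is already rounded to one decimal place, the result only matches the reported figure to the same precision.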

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202009"	Constant
`금액` has 2178 (21.8%) zeros	Zeros
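
Both alerts can be checked by hand. The sketch below is an illustration only and does not reproduce the profiler's internal rules or thresholds. The helper names are hypothetical, and the toy columns use the first five sample rows shown later in the report.

```python
def is_constant(values):
    """True when every entry in the column holds the same value."""
    return len(set(values)) == 1


def zeros_fraction(values):
    """Share of entries that are exactly zero (expects a non-empty list)."""
    return sum(1 for v in values if v == 0) / len(values)


# First five sample rows of 년월일 and 금액 from the report.
dates = ["202009", "202009", "202009", "202009", "202009"]
amounts = [42651720, 0, 12757370, 0, 6494760]

print(is_constant(dates))
print(zeros_fraction(amounts))

# Reported figure for the full column: 2178 zeros in 10000 rows.
reported_share = 2178 / 10000
print(f"{reported_share:.1%}")
```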

Reproduction

Analysis started	2024-05-11 05:59:53.357215
Analysis finished	2024-05-11 05:59:54.280447
Duration	0.92 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
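
The reported duration is the difference between the two timestamps. A short check, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-05-11 05:59:53.357215")
finished = datetime.fromisoformat("2024-05-11 05:59:54.280447")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```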

아파트명
Text

Distinct	2201
Distinct (%)	22.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.2501
Min length	2
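
Length statistics count characters (Unicode code points) per value. As a rough illustration, the sketch below computes the same four statistics for the five sample rows listed further down, so its results differ from the full-column figures. It also checks the mean length against the report's total character count for this column (72501 over 10000 rows).

```python
from statistics import mean, median

# The five sample rows of 아파트명 shown in the report.
names = [
    "올림픽파크한양수자인",
    "염창동아3차",
    "공릉두산힐스빌",
    "은평뉴타운구파발10단지2관리",
    "염창한마음삼성",
]

# len() counts code points, so each Hangul syllable counts as one character.
lengths = [len(name) for name in names]
print(max(lengths), median(lengths), mean(lengths), min(lengths))

# Mean length of the full column = total characters / number of rows.
full_column_mean = 72501 / 10000
print(full_column_mean)
```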

Characters and Unicode

Total characters	72501
Distinct characters	436
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
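
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point as a two-letter code (for example `Lo` for Other Letter and `Nd` for Decimal Number). The sketch below is not the profiler's own implementation; the function name is hypothetical. Scripts and blocks are not available from `unicodedata`, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of 아파트명: Hangul syllables plus digits.
sample = "은평뉴타운구파발10단지2관리"
counts = category_counts(sample)
print(counts)

# Single characters that appear in the per-category tables of the report.
print(unicodedata.category("Ⅰ"))  # Roman numeral one, U+2160
print(unicodedata.category("~"))
```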

Unique

Unique	127
Unique (%)	1.3%
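
"Distinct" and "Unique" measure different things: as commonly defined, distinct is the number of different values, while unique is the number of values that occur exactly once. A small sketch under that assumption, with a hypothetical helper and a toy list:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Toy column: three different names, only one of which appears once.
toy = ["래미안", "아이파크", "아이파크", "고덕", "고덕", "고덕"]
print(distinct_and_unique(toy))

# Percentages in the report are relative to the 10000 rows.
print(f"{2201 / 10000:.1%}", f"{127 / 10000:.1%}")
```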

Sample

1st row	올림픽파크한양수자인
2nd row	염창동아3차
3rd row	공릉두산힐스빌
4th row	은평뉴타운구파발10단지2관리
5th row	염창한마음삼성

Value	Count	Frequency (%)
아파트	145	1.4%
래미안	24	0.2%
고덕	21	0.2%
아이파크	20	0.2%
래미안밤섬리베뉴	18	0.2%
e편한세상	16	0.2%
코오롱하늘채아파트	15	0.1%
신도림현대	15	0.1%
신동아파밀리에	14	0.1%
신당남산타운(분양	13	0.1%
Other values (2267)	10343	97.2%

Most occurring characters

Value	Count	Frequency (%)
아	2441	3.4%
파	2357	3.3%
트	2135	2.9%
대	1837	2.5%
지	1767	2.4%
동	1651	2.3%
신	1566	2.2%
차	1555	2.1%
단	1377	1.9%
성	1353	1.9%
Other values (426)	54462	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66449	91.7%
Decimal Number	3677	5.1%
Uppercase Letter	761	1.0%
Space Separator	722	1.0%
Lowercase Letter	337	0.5%
Close Punctuation	144	0.2%
Open Punctuation	144	0.2%
Dash Punctuation	128	0.2%
Other Punctuation	125	0.2%
Letter Number	11	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2441	3.7%
파	2357	3.5%
트	2135	3.2%
대	1837	2.8%
지	1767	2.7%
동	1651	2.5%
신	1566	2.4%
차	1555	2.3%
단	1377	2.1%
성	1353	2.0%
Other values (380)	48410	72.9%

Uppercase Letter

Value	Count	Frequency (%)
S	117	15.4%
C	116	15.2%
K	97	12.7%
D	74	9.7%
M	74	9.7%
L	56	7.4%
H	48	6.3%
I	38	5.0%
E	31	4.1%
G	24	3.2%
Other values (7)	86	11.3%

Lowercase Letter

Value	Count	Frequency (%)
e	204	60.5%
l	30	8.9%
i	25	7.4%
v	20	5.9%
k	17	5.0%
s	16	4.7%
c	8	2.4%
w	6	1.8%
g	4	1.2%
a	4	1.2%

Decimal Number

Value	Count	Frequency (%)
1	1140	31.0%
2	1029	28.0%
3	476	12.9%
4	279	7.6%
5	208	5.7%
6	178	4.8%
7	119	3.2%
0	86	2.3%
9	81	2.2%
8	81	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	100	80.0%
.	25	20.0%

Space Separator

Value	Count	Frequency (%)
(space)	722	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	144	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	144	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	128	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	11	100.0%

Math Symbol

Value	Count	Frequency (%)
~	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66449	91.7%
Common	4943	6.8%
Latin	1109	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2441	3.7%
파	2357	3.5%
트	2135	3.2%
대	1837	2.8%
지	1767	2.7%
동	1651	2.5%
신	1566	2.4%
차	1555	2.3%
단	1377	2.1%
성	1353	2.0%
Other values (380)	48410	72.9%

Latin

Value	Count	Frequency (%)
e	204	18.4%
S	117	10.6%
C	116	10.5%
K	97	8.7%
D	74	6.7%
M	74	6.7%
L	56	5.0%
H	48	4.3%
I	38	3.4%
E	31	2.8%
Other values (19)	254	22.9%

Common

Value	Count	Frequency (%)
1	1140	23.1%
2	1029	20.8%
(space)	722	14.6%
3	476	9.6%
4	279	5.6%
5	208	4.2%
6	178	3.6%
)	144	2.9%
(	144	2.9%
-	128	2.6%
Other values (7)	495	10.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66449	91.7%
ASCII	6041	8.3%
Number Forms	11	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2441	3.7%
파	2357	3.5%
트	2135	3.2%
대	1837	2.8%
지	1767	2.7%
동	1651	2.5%
신	1566	2.4%
차	1555	2.3%
단	1377	2.1%
성	1353	2.0%
Other values (380)	48410	72.9%

ASCII

Value	Count	Frequency (%)
1	1140	18.9%
2	1029	17.0%
(space)	722	12.0%
3	476	7.9%
4	279	4.6%
5	208	3.4%
e	204	3.4%
6	178	2.9%
)	144	2.4%
(	144	2.4%
Other values (35)	1517	25.1%

Number Forms

Value	Count	Frequency (%)
Ⅰ	11	100.0%

아파트코드
Text

Distinct	2209
Distinct (%)	22.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	128
Unique (%)	1.3%

Sample

1st row	A10027354
2nd row	A15786227
3rd row	A13980415
4th row	A41279928
5th row	A15786118

Value	Count	Frequency (%)
a10045302	13	0.1%
a13817202	12	0.1%
a15606003	11	0.1%
a13311101	11	0.1%
a10028177	11	0.1%
a15722102	11	0.1%
a13822004	11	0.1%
a15678101	11	0.1%
a15205108	11	0.1%
a13381701	11	0.1%
Other values (2199)	9887	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18506	20.6%
1	17643	19.6%
A	9990	11.1%
3	8821	9.8%
2	8145	9.0%
5	6305	7.0%
8	5675	6.3%
7	4770	5.3%
4	3791	4.2%
6	3432	3.8%
Other values (2)	2922	3.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18506	23.1%
1	17643	22.1%
3	8821	11.0%
2	8145	10.2%
5	6305	7.9%
8	5675	7.1%
7	4770	6.0%
4	3791	4.7%
6	3432	4.3%
9	2912	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	9990	99.9%
B	10	0.1%
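
The counts above (every value is 9 characters long; 80000 digits and 10000 uppercase letters over 10000 rows; only the letters A and B occur) are consistent with a code made of one letter and eight digits. The sketch below validates that shape. It assumes the letter is always the first character, which matches the sample rows but is not stated by the report, and the function name is hypothetical.

```python
import re

# Assumed format: one uppercase A or B followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def looks_like_apartment_code(value):
    """True when `value` matches the assumed 아파트코드 format."""
    return CODE_PATTERN.fullmatch(value) is not None


print(looks_like_apartment_code("A10027354"))  # sample row from the report
print(looks_like_apartment_code("A1002735"))   # too short
```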

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18506	23.1%
1	17643	22.1%
3	8821	11.0%
2	8145	10.2%
5	6305	7.9%
8	5675	7.1%
7	4770	6.0%
4	3791	4.7%
6	3432	4.3%
9	2912	3.6%

Latin

Value	Count	Frequency (%)
A	9990	99.9%
B	10	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18506	20.6%
1	17643	19.6%
A	9990	11.1%
3	8821	9.8%
2	8145	9.0%
5	6305	7.0%
8	5675	6.3%
7	4770	5.3%
4	3791	4.2%
6	3432	3.8%
Other values (2)	2922	3.2%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9574
Min length	2

Characters and Unicode

Total characters	59574
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	관리비미수금
2nd row	청소비충당부채
3rd row	선급비용
4th row	선수전기료
5th row	수선유지비충당부채

Value	Count	Frequency (%)
당기순이익	343	3.4%
예수금	324	3.2%
연차수당충당부채	319	3.2%
비품	316	3.2%
퇴직급여충당부채	315	3.1%
관리비미수금	309	3.1%
공동주택적립금	306	3.1%
선급비용	301	3.0%
예금	300	3.0%
미처분이익잉여금	296	3.0%
Other values (67)	6871	68.7%

Most occurring characters

Value	Count	Frequency (%)
금	4655	7.8%
당	3787	6.4%
수	3160	5.3%
비	3071	5.2%
충	3033	5.1%
부	2946	4.9%
채	2636	4.4%
기	2390	4.0%
선	1864	3.1%
예	1777	3.0%
Other values (97)	30255	50.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59574	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4655	7.8%
당	3787	6.4%
수	3160	5.3%
비	3071	5.2%
충	3033	5.1%
부	2946	4.9%
채	2636	4.4%
기	2390	4.0%
선	1864	3.1%
예	1777	3.0%
Other values (97)	30255	50.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59574	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3787	6.4%
수	3160	5.3%
비	3071	5.2%
충	3033	5.1%
부	2946	4.9%
채	2636	4.4%
기	2390	4.0%
선	1864	3.1%
예	1777	3.0%
Other values (97)	30255	50.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59574	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3787	6.4%
수	3160	5.3%
비	3071	5.2%
충	3033	5.1%
부	2946	4.9%
채	2636	4.4%
기	2390	4.0%
선	1864	3.1%
예	1777	3.0%
Other values (97)	30255	50.8%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202009	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202009
2nd row	202009
3rd row	202009
4th row	202009
5th row	202009

Common Values

Value	Count	Frequency (%)
202009	10000	100.0%
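
The single value 202009 has six digits, so it reads most naturally as a year and month (September 2020) rather than a full date. That reading is an assumption, since the report treats the column as a plain category. A hedged sketch of parsing it, with a hypothetical helper name:

```python
from datetime import datetime


def parse_year_month(value):
    """Parse a YYYYMM string; the day defaults to the 1st of the month."""
    return datetime.strptime(value, "%Y%m")


period = parse_year_month("202009")
print(period.year, period.month)
```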

Length

Histogram of lengths of the category

금액
Real number (ℝ)

ZEROS

Distinct	7476
Distinct (%)	74.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	71073712

Minimum	-4.09024 × 10⁹
Maximum	1.1736729 × 10¹⁰
Zeros	2178
Zeros (%)	21.8%
Negative	333
Negative (%)	3.3%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.09024 × 10⁹
5-th percentile	0
Q1	0
median	3512020
Q3	36703212
95-th percentile	3.3641574 × 10⁸
Maximum	1.1736729 × 10¹⁰
Range	1.5826969 × 10¹⁰
Interquartile range (IQR)	36703212
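
Range and interquartile range are simple differences of the quantiles above. A quick check, using the exact extreme values listed in the report's minimum and maximum tables:

```python
# Exact extremes from the "Minimum 10 values" / "Maximum 10 values" tables.
minimum = -4_090_240_000
maximum = 11_736_728_832

# Quartiles from the "Quantile statistics" table.
q1, q3 = 0, 36_703_212

value_range = maximum - minimum
iqr = q3 - q1

print(f"{value_range:.7e}")
print(iqr)
```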

Descriptive statistics

Standard deviation	2.9967182 × 10⁸
Coefficient of variation (CV)	4.2163524
Kurtosis	383.32557
Mean	71073712
Median Absolute Deviation (MAD)	3512020
Skewness	14.630203
Sum	7.1073712 × 10¹¹
Variance	8.9803197 × 10¹⁶
Monotonicity	Not monotonic
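
Several of these statistics are derived from one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of rows. The sketch below checks those relations using the rounded figures from the table, so agreement is only approximate.

```python
import math

# Rounded figures copied from the report.
mean_value = 71_073_712
std_dev = 2.9967182e8
n_rows = 10_000

cv = std_dev / mean_value        # coefficient of variation
variance = std_dev ** 2          # variance = standard deviation squared
total = mean_value * n_rows      # sum = mean * number of observations

print(f"{cv:.5f}", f"{variance:.4e}", f"{total:.7e}")
```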

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
0	2178	21.8%
250000	26	0.3%
500000	20	0.2%
484000	20	0.2%
100000	14	0.1%
242000	14	0.1%
300000	12	0.1%
3000000	9	0.1%
10000000	9	0.1%
200000	9	0.1%
Other values (7466)	7689	76.9%

Minimum 10 values

Value	Count	Frequency (%)
-4090240000	1	< 0.1%
-389001283	1	< 0.1%
-304675700	1	< 0.1%
-275232430	1	< 0.1%
-242649140	1	< 0.1%
-160448120	1	< 0.1%
-158111490	1	< 0.1%
-133850640	1	< 0.1%
-130567060	1	< 0.1%
-114204475	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
11736728832	1	< 0.1%
8111691181	1	< 0.1%
6810387320	1	< 0.1%
5628446971	1	< 0.1%
5253154921	1	< 0.1%
5244582330	1	< 0.1%
5182339176	1	< 0.1%
4836080065	1	< 0.1%
4376421585	1	< 0.1%
4309876602	1	< 0.1%

Interactions

[Plot: 금액 against 금액]

Phik (φk)

	비용명	금액
비용명	1.000	0.458
금액	0.458	1.000

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
4376	올림픽파크한양수자인	A10027354	관리비미수금	202009	42651720
65026	염창동아3차	A15786227	청소비충당부채	202009	0
42338	공릉두산힐스빌	A13980415	선급비용	202009	12757370
69253	은평뉴타운구파발10단지2관리	A41279928	선수전기료	202009	0
64923	염창한마음삼성	A15786118	수선유지비충당부채	202009	6494760
22229	성수금호3차	A13311101	비품	202009	1827000
13793	래미안아름숲	A13002002	예수금	202009	3403625
66626	목동삼성쉐르빌2차	A15807601	수선유지비충당부채	202009	299375
21915	행당대림	A13307204	기타유형자산감가상각누계액	202009	0
37162	오금현대백조	A13813006	퇴직급여충당예금	202009	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
14590	휘경동일스위트리버	A13009206	퇴직급여충당예금	202009	57015648
7020	명륜아남1차	A11052201	미수금	202009	2043000
30062	역삼래미안	A13592706	기타충당부채	202009	0
21774	서울숲더샵	A13307003	미부과관리비	202009	222021294
38481	잠실푸르지오월드마크	A13872503	단기보증금	202009	9430000
47291	수유벽산	A14207203	장기수선충당예금	202009	525600555
58175	가산삼익아파트	A15380101	현금	202009	506348
8014	홍은풍림아이원	A12010202	연차수당충당부채	202009	1950330
57172	구로주공	A15286809	기타유형자산감가상각누계액	202009	-1111000
5065	인왕산2차아이파크아파트	A10027708	선수관리비	202009	32962000
