gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
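
The average record size follows from the two figures above: total memory divided by the number of observations. A minimal Python check, assuming 1 KiB = 1024 bytes and using only the rounded values printed in this report (the variable names are illustrative):

```python
# Values copied from the "Dataset statistics" table (rounded as displayed).
total_size_kib = 488.3
n_observations = 10000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```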

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202312"	Constant
`금액` has 2513 (25.1%) zeros	Zeros
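
The two alerts above reduce to simple checks on a column: the share of entries equal to zero, and whether only one distinct value occurs. The sketch below is an illustration only; the helper names are hypothetical and it does not claim to reproduce the profiler's own alert logic or thresholds.

```python
def zero_share(values):
    """Return the fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


def is_constant(values):
    """Return True when every entry holds the same value."""
    return len(set(values)) == 1


# Counts reported in the alert: 2513 zeros among 10000 observations.
reported_share = 2513 / 10000
print(f"{reported_share:.1%}")

# Tiny hypothetical columns, not taken from the dataset.
amounts = [0, 400, 0, 500000]
dates = ["202312", "202312", "202312"]
print(zero_share(amounts), is_constant(dates))
```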

Reproduction

Analysis started	2024-05-11 05:56:58.625332
Analysis finished	2024-05-11 05:56:59.974691
Duration	1.35 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
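
The reported duration is the difference between the two timestamps above. A short check with the standard library, assuming both timestamps share the same (unstated) time zone:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-11 05:56:58.625332", fmt)
finished = datetime.strptime("2024-05-11 05:56:59.974691", fmt)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```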

아파트명 (apartment name)
Text

Distinct	2157
Distinct (%)	21.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.4893
Min length	2
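
Length statistics of this kind can be computed by measuring each value in code points. The sketch below uses the five sample rows listed for this variable, so its numbers describe those five names only, not the full column; the last line cross-checks the reported mean length against the reported total of 74893 characters over 10000 observations.

```python
from statistics import mean, median

# The five sample rows listed for this variable.
names = ["신사현대2차", "성수청구강변", "역삼래미안", "용산파크자이", "마천우방"]
lengths = [len(name) for name in names]  # len() counts Unicode code points

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Cross-check with the report: total characters / observations.
mean_length_reported = 74893 / 10000
print(mean_length_reported)
```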

Characters and Unicode

Total characters	74893
Distinct characters	435
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
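
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for two strings that appear in this report; the helper name is hypothetical. Scripts and blocks are not exposed by `unicodedata`, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by two-letter Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# "Lo" = Other Letter, "Nd" = Decimal Number,
# "Ll" = Lowercase Letter, "Nl" = Letter Number.
print(category_counts("신사현대2차"))
print(category_counts("e편한세상"))
print(unicodedata.category("Ⅰ"))
```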

Unique

Unique	120
Unique (%)	1.2%
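
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of that distinction on a hypothetical column (the helper name and sample data are illustrative), followed by the percentages implied by the counts reported for this variable:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical column: B and D occur once, A and C occur twice.
sample = ["A", "A", "B", "C", "C", "D"]
print(distinct_and_unique(sample))

# Percentages implied by the reported counts over 10000 observations.
distinct_pct = f"{2157 / 10000:.1%}"
unique_pct = f"{120 / 10000:.1%}"
print(distinct_pct, unique_pct)
```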

Sample

1st row	신사현대2차
2nd row	성수청구강변
3rd row	역삼래미안
4th row	용산파크자이
5th row	마천우방

Value	Count	Frequency (%)
아파트	181	1.7%
래미안	40	0.4%
e편한세상	24	0.2%
아이파크	22	0.2%
경남아너스빌	19	0.2%
고덕	16	0.1%
푸르지오	16	0.1%
힐스테이트	16	0.1%
이편한세상	16	0.1%
북한산	15	0.1%
Other values (2237)	10487	96.6%

Most occurring characters

Value	Count	Frequency (%)
아	2613	3.5%
파	2581	3.4%
트	2486	3.3%
지	1847	2.5%
대	1668	2.2%
동	1659	2.2%
이	1467	2.0%
차	1429	1.9%
단	1419	1.9%
신	1381	1.8%
Other values (425)	56343	75.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	68634	91.6%
Decimal Number	3618	4.8%
Space Separator	938	1.3%
Uppercase Letter	823	1.1%
Lowercase Letter	354	0.5%
Open Punctuation	135	0.2%
Close Punctuation	135	0.2%
Dash Punctuation	131	0.2%
Other Punctuation	120	0.2%
Letter Number	5	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2613	3.8%
파	2581	3.8%
트	2486	3.6%
지	1847	2.7%
대	1668	2.4%
동	1659	2.4%
이	1467	2.1%
차	1429	2.1%
단	1419	2.1%
신	1381	2.0%
Other values (380)	50084	73.0%

Uppercase Letter

Value	Count	Frequency (%)
S	144	17.5%
C	120	14.6%
K	102	12.4%
D	94	11.4%
M	94	11.4%
E	40	4.9%
H	37	4.5%
I	34	4.1%
L	33	4.0%
G	29	3.5%
Other values (7)	96	11.7%

Lowercase Letter

Value	Count	Frequency (%)
e	201	56.8%
l	42	11.9%
i	34	9.6%
v	23	6.5%
s	12	3.4%
k	9	2.5%
w	8	2.3%
c	8	2.3%
h	7	2.0%
a	5	1.4%

Decimal Number

Value	Count	Frequency (%)
1	1092	30.2%
2	1075	29.7%
3	465	12.9%
4	292	8.1%
5	183	5.1%
6	149	4.1%
7	111	3.1%
8	90	2.5%
0	83	2.3%
9	78	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	101	84.2%
.	19	15.8%

Space Separator

Value	Count	Frequency (%)
(space)	938	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	135	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	135	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	131	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	68634	91.6%
Common	5077	6.8%
Latin	1182	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2613	3.8%
파	2581	3.8%
트	2486	3.6%
지	1847	2.7%
대	1668	2.4%
동	1659	2.4%
이	1467	2.1%
차	1429	2.1%
단	1419	2.1%
신	1381	2.0%
Other values (380)	50084	73.0%

Latin

Value	Count	Frequency (%)
e	201	17.0%
S	144	12.2%
C	120	10.2%
K	102	8.6%
D	94	8.0%
M	94	8.0%
l	42	3.6%
E	40	3.4%
H	37	3.1%
I	34	2.9%
Other values (19)	274	23.2%

Common

Value	Count	Frequency (%)
1	1092	21.5%
2	1075	21.2%
(space)	938	18.5%
3	465	9.2%
4	292	5.8%
5	183	3.6%
6	149	2.9%
(	135	2.7%
)	135	2.7%
-	131	2.6%
Other values (6)	482	9.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	68634	91.6%
ASCII	6254	8.4%
Number Forms	5	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2613	3.8%
파	2581	3.8%
트	2486	3.6%
지	1847	2.7%
대	1668	2.4%
동	1659	2.4%
이	1467	2.1%
차	1429	2.1%
단	1419	2.1%
신	1381	2.0%
Other values (380)	50084	73.0%

ASCII

Value	Count	Frequency (%)
1	1092	17.5%
2	1075	17.2%
(space)	938	15.0%
3	465	7.4%
4	292	4.7%
e	201	3.2%
5	183	2.9%
6	149	2.4%
S	144	2.3%
(	135	2.2%
Other values (34)	1580	25.3%

Number Forms

Value	Count	Frequency (%)
Ⅰ	5	100.0%

아파트코드 (apartment code)
Text

Distinct	2161
Distinct (%)	21.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	121
Unique (%)	1.2%

Sample

1st row	A12208105
2nd row	A13383003
3rd row	A13592706
4th row	A14075201
5th row	A13812004

Value	Count	Frequency (%)
a13611007	13	0.1%
a15683402	13	0.1%
a41279918	13	0.1%
a15805115	12	0.1%
a12284701	12	0.1%
a13186801	12	0.1%
a13120001	12	0.1%
a15882104	11	0.1%
a13788208	11	0.1%
a13302001	11	0.1%
Other values (2151)	9880	98.8%
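
The sample rows show codes with an uppercase prefix, while the table above lists them in lowercase. One plausible explanation, stated here as an assumption about the profiler rather than something this report confirms, is that text values are lowercased and split on whitespace before counting. A minimal sketch of that kind of counting, with a hypothetical helper name:

```python
from collections import Counter


def token_counts(values):
    """Lowercase each value, split on whitespace, and count the tokens."""
    counter = Counter()
    for value in values:
        counter.update(value.lower().split())
    return counter


# Codes taken from the sample rows; the repeat is added for illustration only.
codes = ["A12208105", "A13383003", "A12208105"]
print(token_counts(codes).most_common(1))
```

Under the same assumption, a name containing a space would contribute one count per word, which would explain why the counts in a word table can sum to more than the number of rows.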

Most occurring characters

Value	Count	Frequency (%)
0	18619	20.7%
1	17558	19.5%
A	9986	11.1%
3	8707	9.7%
2	8581	9.5%
5	6073	6.7%
8	5470	6.1%
7	4619	5.1%
4	4033	4.5%
6	3301	3.7%
Other values (2)	3053	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18619	23.3%
1	17558	21.9%
3	8707	10.9%
2	8581	10.7%
5	6073	7.6%
8	5470	6.8%
7	4619	5.8%
4	4033	5.0%
6	3301	4.1%
9	3039	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9986	99.9%
B	14	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18619	23.3%
1	17558	21.9%
3	8707	10.9%
2	8581	10.7%
5	6073	7.6%
8	5470	6.8%
7	4619	5.8%
4	4033	5.0%
6	3301	4.1%
9	3039	3.8%

Latin

Value	Count	Frequency (%)
A	9986	99.9%
B	14	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18619	20.7%
1	17558	19.5%
A	9986	11.1%
3	8707	9.7%
2	8581	9.5%
5	6073	6.7%
8	5470	6.1%
7	4619	5.1%
4	4033	4.5%
6	3301	3.7%
Other values (2)	3053	3.4%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	6.0122
Min length	2

Characters and Unicode

Total characters	60122
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	예수금
2nd row	관리비미수금
3rd row	선급금
4th row	기타당좌자산
5th row	수선유지비충당부채

Value	Count	Frequency (%)
비품	330	3.3%
장기수선충당부채	316	3.2%
예수금	303	3.0%
예금	303	3.0%
관리비미수금	303	3.0%
선급비용	301	3.0%
연차수당충당부채	298	3.0%
공동주택적립금	297	3.0%
미지급금	296	3.0%
미처분이익잉여금	294	2.9%
Other values (67)	6959	69.6%

Most occurring characters

Value	Count	Frequency (%)
금	4563	7.6%
당	3811	6.3%
수	3168	5.3%
비	3110	5.2%
충	3041	5.1%
부	2918	4.9%
채	2620	4.4%
기	2505	4.2%
선	1935	3.2%
예	1770	2.9%
Other values (97)	30681	51.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60122	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4563	7.6%
당	3811	6.3%
수	3168	5.3%
비	3110	5.2%
충	3041	5.1%
부	2918	4.9%
채	2620	4.4%
기	2505	4.2%
선	1935	3.2%
예	1770	2.9%
Other values (97)	30681	51.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60122	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4563	7.6%
당	3811	6.3%
수	3168	5.3%
비	3110	5.2%
충	3041	5.1%
부	2918	4.9%
채	2620	4.4%
기	2505	4.2%
선	1935	3.2%
예	1770	2.9%
Other values (97)	30681	51.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60122	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4563	7.6%
당	3811	6.3%
수	3168	5.3%
비	3110	5.2%
충	3041	5.1%
부	2918	4.9%
채	2620	4.4%
기	2505	4.2%
선	1935	3.2%
예	1770	2.9%
Other values (97)	30681	51.0%

년월일 (date, recorded as YYYYMM)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202312	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202312
2nd row	202312
3rd row	202312
4th row	202312
5th row	202312

Common Values

Value	Count	Frequency (%)
202312	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7139
Distinct (%)	71.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	86316338

Minimum	-2.6482084 × 10⁸
Maximum	6.9310378 × 10⁹
Zeros	2513
Zeros (%)	25.1%
Negative	353
Negative (%)	3.5%
Memory size	166.0 KiB

Quantile statistics

Minimum	-2.6482084 × 10⁸
5-th percentile	0
Q1	0
median	3323710
Q3	44404518
95-th percentile	4.3164009 × 10⁸
Maximum	6.9310378 × 10⁹
Range	7.1958587 × 10⁹
Interquartile range (IQR)	44404518
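
The range and interquartile range follow directly from the other entries in this table: range = maximum - minimum and IQR = Q3 - Q1. A quick check using the exact extreme values from the minimum and maximum value tables of this variable:

```python
# Exact extremes as listed in the extreme value tables for this variable.
minimum = -264820840
maximum = 6931037845
# Quartiles copied from the quantile table.
q1, q3 = 0, 44404518

value_range = maximum - minimum
iqr = q3 - q1
print(f"{value_range:.7e}", iqr)
```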

Descriptive statistics

Standard deviation	3.087784 × 10⁸
Coefficient of variation (CV)	3.577288
Kurtosis	104.22158
Mean	86316338
Median Absolute Deviation (MAD)	3323710
Skewness	8.6204177
Sum	8.6316338 × 10¹¹
Variance	9.5344102 × 10¹⁶
Monotonicity	Not monotonic
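
Several of these statistics are related by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The check below uses only the rounded figures printed in this report, so it agrees with the table up to display rounding.

```python
import math

# Rounded figures copied from this report.
mean = 86316338
std = 3.087784e8
n = 10000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all values

print(f"{cv:.4f}", f"{variance:.4e}", f"{total:.7e}")
```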

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2513	25.1%
500000	28	0.3%
300000	21	0.2%
250000	17	0.2%
242000	12	0.1%
200000	11	0.1%
484000	11	0.1%
400	11	0.1%
5000000	11	0.1%
3000000	10	0.1%
Other values (7129)	7355	73.6%

Minimum 10 values

Value	Count	Frequency (%)
-264820840	1	< 0.1%
-206158420	1	< 0.1%
-198071844	1	< 0.1%
-195908810	1	< 0.1%
-159515815	1	< 0.1%
-147774640	1	< 0.1%
-144624600	1	< 0.1%
-142530421	1	< 0.1%
-137854113	1	< 0.1%
-132142869	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
6931037845	1	< 0.1%
4980841618	1	< 0.1%
4909755056	1	< 0.1%
4888235156	1	< 0.1%
4853035871	1	< 0.1%
4777615630	1	< 0.1%
4699245158	1	< 0.1%
4663387098	1	< 0.1%
4271166575	1	< 0.1%
4132811129	1	< 0.1%

Phik (φk)

	비용명	금액
비용명	1.000	0.501
금액	0.501	1.000

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
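
To make the idea of a nullity matrix concrete, the sketch below builds one as a grid of booleans (present or missing) and derives the per-column missing counts from it. The rows are hypothetical: this dataset reports 0 missing cells, so one missing value is added purely for illustration, and the helper name is not part of any library.

```python
def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where a value is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]


columns = ["아파트명", "금액"]
# Hypothetical rows; the second one has a missing 금액 for illustration.
rows = [
    {"아파트명": "마천우방", "금액": 0},
    {"아파트명": "창신두산", "금액": None},
]

matrix = nullity_matrix(rows, columns)
for line in matrix:
    # "#" marks a present value, "." a missing one.
    print("".join("#" if present else "." for present in line))

# Per-column missing counts, as in the "Count" view.
missing_per_column = [sum(not r[i] for r in matrix) for i in range(len(columns))]
print(missing_per_column)
```

Note that a value of 0 counts as present: zeros and missing values are reported separately in this report.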

First rows

	아파트명	아파트코드	비용명	년월일	금액
15061	신사현대2차	A12208105	예수금	202312	2312497
25766	성수청구강변	A13383003	관리비미수금	202312	22937320
32044	역삼래미안	A13592706	선급금	202312	3129000
48228	용산파크자이	A14075201	기타당좌자산	202312	529000
38896	마천우방	A13812004	수선유지비충당부채	202312	0
42761	중계주공5단지	A13922114	선급비용	202312	40673170
62516	상도동중앙하이츠빌아파트	A15683402	현금	202312	452410
41023	거여현대2차	A13881401	청소비충당부채	202312	6044880
66727	목동금호어울림	A15805403	선급금	202312	3438520
56731	개봉삼환	A15209205	예수금	202312	2055662

Last rows

	아파트명	아파트코드	비용명	년월일	금액
65076	강서센트레빌4차	A15781201	임차보증금	202312	400000
14749	북한산현대홈타운	A12204102	선수수도료	202312	427820
42138	상계극동늘푸른	A13920106	퇴직급여충당부채	202312	34408615
65510	방화동 개화아파트	A15785608	당기순이익	202312	26187328
9672	창신두산	A11054101	기타충당예금	202312	0
29982	수서가람	A13523003	비품감가상각누계액	202312	-30846550
17196	래미안크레시티	A13071302	비품	202312	80263380
2782	DMC롯데캐슬더퍼스트	A10024828	복리후생비충당부채	202312	3663620
41020	거여현대2차	A13881401	퇴직급여충당부채	202312	22882110
18229	청량리홍릉동부	A13086802	가수금	202312	215536
