gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202203"	Constant
`금액` is highly skewed (γ1 = 25.94465494)	Skewed
`금액` has 2356 (23.6%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 05:58:12.982950
Analysis finished	2024-05-11 05:58:14.209721
Duration	1.23 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명 (apartment name)
Text

Distinct	2233
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.32
Min length	2

Characters and Unicode

Total characters	73200
Distinct characters	435
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
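As a rough illustration of how such category counts can be derived, the sketch below uses Python's standard `unicodedata` module on the five sample rows shown for this variable. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are illustrative assumptions, not part of ydata-profiling.

```python
import unicodedata
from collections import Counter

# Readable names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows listed in the report for 아파트명.
sample = ["하월곡동신", "홍제성원아파트", "힐스테이트서초젠트리스", "월계6-2초안", "푸른마을아파트"]
counts = category_counts(sample)
print(counts.most_common())
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables that follow.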

Unique

Unique	136
Unique (%)	1.4%
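The gap between "Distinct" and "Unique" is easier to see with a small example. In the sketch below, "distinct" is taken to mean the number of different values and "unique" the number of values that occur exactly once. The function name and the toy list are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Toy input: "A" repeats, so it is distinct but not unique.
toy = ["A", "B", "A", "C"]
result = distinct_and_unique(toy)
print(result)  # (3, 2)
```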

Sample

1st row	하월곡동신
2nd row	홍제성원아파트
3rd row	힐스테이트서초젠트리스
4th row	월계6-2초안
5th row	푸른마을아파트

Value	Count	Frequency (%)
아파트	158	1.5%
래미안	32	0.3%
아이파크	22	0.2%
디에이치	19	0.2%
sk뷰	17	0.2%
경남아너스빌	16	0.1%
e편한세상	16	0.1%
해모로	14	0.1%
고덕	14	0.1%
도화현대1차아파트	14	0.1%
Other values (2313)	10396	97.0%

Most occurring characters

Value	Count	Frequency (%)
아	2510	3.4%
파	2422	3.3%
트	2230	3.0%
지	1873	2.6%
동	1750	2.4%
대	1736	2.4%
신	1498	2.0%
단	1470	2.0%
차	1450	2.0%
이	1354	1.8%
Other values (425)	54907	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67053	91.6%
Decimal Number	3655	5.0%
Uppercase Letter	835	1.1%
Space Separator	792	1.1%
Lowercase Letter	354	0.5%
Open Punctuation	130	0.2%
Close Punctuation	130	0.2%
Dash Punctuation	130	0.2%
Other Punctuation	114	0.2%
Letter Number	7	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2510	3.7%
파	2422	3.6%
트	2230	3.3%
지	1873	2.8%
동	1750	2.6%
대	1736	2.6%
신	1498	2.2%
단	1470	2.2%
차	1450	2.2%
이	1354	2.0%
Other values (380)	48760	72.7%

Uppercase Letter

Value	Count	Frequency (%)
S	140	16.8%
C	116	13.9%
K	103	12.3%
M	82	9.8%
D	82	9.8%
L	53	6.3%
I	44	5.3%
E	43	5.1%
H	40	4.8%
G	30	3.6%
Other values (7)	102	12.2%

Lowercase Letter

Value	Count	Frequency (%)
e	199	56.2%
l	39	11.0%
i	28	7.9%
v	21	5.9%
s	18	5.1%
k	16	4.5%
h	11	3.1%
w	8	2.3%
c	8	2.3%
g	3	0.8%

Decimal Number

Value	Count	Frequency (%)
1	1112	30.4%
2	1039	28.4%
3	483	13.2%
4	262	7.2%
5	236	6.5%
6	158	4.3%
7	115	3.1%
9	88	2.4%
0	83	2.3%
8	79	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	101	88.6%
.	13	11.4%

Space Separator

Value	Count	Frequency (%)
(space)	792	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	130	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	130	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	130	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67053	91.6%
Common	4951	6.8%
Latin	1196	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2510	3.7%
파	2422	3.6%
트	2230	3.3%
지	1873	2.8%
동	1750	2.6%
대	1736	2.6%
신	1498	2.2%
단	1470	2.2%
차	1450	2.2%
이	1354	2.0%
Other values (380)	48760	72.7%

Latin

Value	Count	Frequency (%)
e	199	16.6%
S	140	11.7%
C	116	9.7%
K	103	8.6%
M	82	6.9%
D	82	6.9%
L	53	4.4%
I	44	3.7%
E	43	3.6%
H	40	3.3%
Other values (19)	294	24.6%

Common

Value	Count	Frequency (%)
1	1112	22.5%
2	1039	21.0%
(space)	792	16.0%
3	483	9.8%
4	262	5.3%
5	236	4.8%
6	158	3.2%
(	130	2.6%
)	130	2.6%
-	130	2.6%
Other values (6)	479	9.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67053	91.6%
ASCII	6140	8.4%
Number Forms	7	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2510	3.7%
파	2422	3.6%
트	2230	3.3%
지	1873	2.8%
동	1750	2.6%
대	1736	2.6%
신	1498	2.2%
단	1470	2.2%
차	1450	2.2%
이	1354	2.0%
Other values (380)	48760	72.7%

ASCII

Value	Count	Frequency (%)
1	1112	18.1%
2	1039	16.9%
(space)	792	12.9%
3	483	7.9%
4	262	4.3%
5	236	3.8%
e	199	3.2%
6	158	2.6%
S	140	2.3%
(	130	2.1%
Other values (34)	1589	25.9%

Number Forms

Value	Count	Frequency (%)
Ⅰ	7	100.0%

아파트코드 (apartment code)
Text

Distinct	2239
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	138
Unique (%)	1.4%

Sample

1st row	A13613005
2nd row	A12009201
3rd row	A10028046
4th row	A13905208
5th row	A13594203

Value	Count	Frequency (%)
a12181406	14	0.1%
a12013003	12	0.1%
a13080401	12	0.1%
a13471501	12	0.1%
a13204301	11	0.1%
a13822002	11	0.1%
a10024216	11	0.1%
a15210211	11	0.1%
a13611007	11	0.1%
a13302204	11	0.1%
Other values (2229)	9884	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18454	20.5%
1	17649	19.6%
A	9996	11.1%
3	8863	9.8%
2	8300	9.2%
5	6144	6.8%
8	5568	6.2%
7	4596	5.1%
4	4072	4.5%
6	3345	3.7%
Other values (2)	3013	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18454	23.1%
1	17649	22.1%
3	8863	11.1%
2	8300	10.4%
5	6144	7.7%
8	5568	7.0%
7	4596	5.7%
4	4072	5.1%
6	3345	4.2%
9	3009	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9996	> 99.9%
B	4	< 0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18454	23.1%
1	17649	22.1%
3	8863	11.1%
2	8300	10.4%
5	6144	7.7%
8	5568	7.0%
7	4596	5.7%
4	4072	5.1%
6	3345	4.2%
9	3009	3.8%

Latin

Value	Count	Frequency (%)
A	9996	> 99.9%
B	4	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18454	20.5%
1	17649	19.6%
A	9996	11.1%
3	8863	9.8%
2	8300	9.2%
5	6144	6.8%
8	5568	6.2%
7	4596	5.1%
4	4072	4.5%
6	3345	3.7%
Other values (2)	3013	3.3%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9689
Min length	2

Characters and Unicode

Total characters	59689
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	연차수당충당부채
2nd row	장기수선충당예금
3rd row	미수수익
4th row	장기수선충당부채
5th row	가수금

Value	Count	Frequency (%)
관리비미수금	330	3.3%
퇴직급여충당부채	319	3.2%
예금	315	3.1%
장기수선충당부채	311	3.1%
연차수당충당부채	308	3.1%
당기순이익	308	3.1%
장기수선충당예금	305	3.0%
공동주택적립금	302	3.0%
선급비용	300	3.0%
미처분이익잉여금	297	3.0%
Other values (67)	6905	69.0%

Most occurring characters

Value	Count	Frequency (%)
금	4640	7.8%
당	3869	6.5%
수	3098	5.2%
충	3044	5.1%
부	2953	4.9%
비	2950	4.9%
채	2660	4.5%
기	2556	4.3%
선	1876	3.1%
예	1689	2.8%
Other values (97)	30354	50.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59689	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4640	7.8%
당	3869	6.5%
수	3098	5.2%
충	3044	5.1%
부	2953	4.9%
비	2950	4.9%
채	2660	4.5%
기	2556	4.3%
선	1876	3.1%
예	1689	2.8%
Other values (97)	30354	50.9%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59689	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4640	7.8%
당	3869	6.5%
수	3098	5.2%
충	3044	5.1%
부	2953	4.9%
비	2950	4.9%
채	2660	4.5%
기	2556	4.3%
선	1876	3.1%
예	1689	2.8%
Other values (97)	30354	50.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59689	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4640	7.8%
당	3869	6.5%
수	3098	5.2%
충	3044	5.1%
부	2953	4.9%
비	2950	4.9%
채	2660	4.5%
기	2556	4.3%
선	1876	3.1%
예	1689	2.8%
Other values (97)	30354	50.9%

년월일 (date)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202203	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202203
2nd row	202203
3rd row	202203
4th row	202203
5th row	202203

Common Values

Value	Count	Frequency (%)
202203	10000	100.0%

금액 (amount)
Real number (ℝ)

SKEWED ZEROS

Distinct	7330
Distinct (%)	73.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	79936787

Minimum	-6.0442056 × 10⁸
Maximum	2.2863029 × 10¹⁰
Zeros	2356
Zeros (%)	23.6%
Negative	329
Negative (%)	3.3%
Memory size	166.0 KiB

Quantile statistics

Minimum	-6.0442056 × 10⁸
5-th percentile	0
Q1	0
median	2815730
Q3	31867436
95-th percentile	3.765935 × 10⁸
Maximum	2.2863029 × 10¹⁰
Range	2.346745 × 10¹⁰
Interquartile range (IQR)	31867436
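The range and IQR rows follow directly from the other quantile figures. The short check below recomputes them from the exact minimum and maximum listed in the extreme-value tables of this report. The variable names are illustrative only.

```python
# Exact extremes as listed in the report's minimum/maximum value tables.
minimum = -604420565
maximum = 22863029101

# Quartiles as listed in the quantile statistics.
q1 = 0
q3 = 31867436

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"{value_range:.6e}")
print(iqr)
```

Because Q1 is 0, the IQR equals Q3, which is consistent with roughly a quarter of the values being zero.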

Descriptive statistics

Standard deviation	3.9812528 × 10⁸
Coefficient of variation (CV)	4.9805014
Kurtosis	1191.0794
Mean	79936787
Median Absolute Deviation (MAD)	2815730
Skewness	25.944655
Sum	7.9936787 × 10¹¹
Variance	1.5850374 × 10¹⁷
Monotonicity	Not monotonic
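Several of these figures are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below computes the statistics on a small hypothetical list and then checks the two identities against the reported numbers. It uses the plain moment-based skewness; the profiling tool may apply a bias-adjusted estimator, so this is an assumption rather than its exact method.

```python
import math


def describe(values):
    """Return mean, sample variance, std, CV and moment-based skewness."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with n - 1 in the denominator (assumed convention).
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std = math.sqrt(variance)
    # Unadjusted moment coefficient of skewness: m3 / m2^1.5.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "skewness": m3 / m2 ** 1.5,
    }


# Hypothetical toy data: mostly zeros with one large value, so right-skewed.
toy = [0, 0, 0, 1, 10]
stats = describe(toy)

# Consistency check on the figures reported above.
reported_mean = 79936787
reported_std = 3.9812528e8
reported_cv = reported_std / reported_mean  # close to 4.9805014
reported_variance = reported_std ** 2       # close to 1.5850374e17
print(round(reported_cv, 4))
```

A large positive skewness together with many zeros, as flagged in the alerts, typically indicates a long right tail driven by a few very large amounts.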

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
0	2356	23.6%
500000	26	0.3%
250000	17	0.2%
484000	13	0.1%
300000	13	0.1%
242000	12	0.1%
30000000	11	0.1%
200000	11	0.1%
3000000	10	0.1%
1000000	10	0.1%
Other values (7320)	7521	75.2%

Minimum 10 values

Value	Count	Frequency (%)
-604420565	1	< 0.1%
-302175540	1	< 0.1%
-279779260	1	< 0.1%
-271719290	1	< 0.1%
-230922000	1	< 0.1%
-197026520	1	< 0.1%
-136451812	1	< 0.1%
-123413690	1	< 0.1%
-120813335	1	< 0.1%
-116368256	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
22863029101	1	< 0.1%
10667455557	1	< 0.1%
7821550005	1	< 0.1%
7491393593	1	< 0.1%
7257482358	1	< 0.1%
6545163308	1	< 0.1%
5724597640	1	< 0.1%
5255507426	1	< 0.1%
5133279263	1	< 0.1%
4962590333	1	< 0.1%

Interactions

금액 vs. 금액 (plot not included)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.232
금액	0.232	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
34489	하월곡동신	A13613005	연차수당충당부채	202203	6236980
9527	홍제성원아파트	A12009201	장기수선충당예금	202203	150172096
7662	힐스테이트서초젠트리스	A10028046	미수수익	202203	659930
42943	월계6-2초안	A13905208	장기수선충당부채	202203	380190529
32650	푸른마을아파트	A13594203	가수금	202203	4070600
61250	시흥베르빌	A15303102	수선유지비충당부채	202203	3416960
54394	양평현대2차	A15010305	관리비예치금	202203	38688000
42585	현대리버빌2차아파트	A13887403	경비비충당부채	202203	8761155
25931	옥수중앙하이츠	A13383801	예수금	202203	1418800
46635	상계대림e-편한세상	A13983803	가수금	202203	918700

Last rows

	아파트명	아파트코드	비용명	년월일	금액
42254	방이코오롱	A13883602	예수금	202203	2785027
28794	아크로힐스논현	A13501006	주차장충당부채	202203	0
66836	마곡수명산파크5단지	A15728007	미처분이익잉여금	202203	32401122
36394	반포삼호가든맨션5차	A13704101	일반관리비충당부채	202203	0
23240	창동동아	A13290003	미지급금	202203	164008310
65412	등촌주공2단지	A15703304	가수금	202203	1075640
59158	천왕이펜하우스1단지	A15213006	전신전화가입권	202203	0
61156	서울가든빌라	A15289508	미처분이익잉여금	202203	0
39860	오금삼성	A13813003	관리비예치금	202203	19212000
33307	돈암삼성임대	A13606106	기타충당부채	202203	0
