gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
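
The average record size follows from the two figures above it. As a quick cross-check (a minimal sketch; the variable names are illustrative and not part of the report), dividing the total memory by the number of observations gives the reported value:

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 488.3
n_observations = 10_000

# 1 KiB is 1024 bytes, so convert before dividing by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")
```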

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202001"	Constant
`금액` has 2008 (20.1%) zeros	Zeros
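
The two alerts above can be illustrated with a simplified check. This is only a sketch: `simple_alerts` is a hypothetical helper, it flags any column containing zeros, and it does not reproduce the thresholds or logic ydata-profiling actually uses. The sample amounts are the first five `금액` values shown in the report's first rows.

```python
def simple_alerts(column):
    """Return illustrative alert labels for a list of values (not the profiler's logic)."""
    alerts = []
    # A column with a single distinct value is constant.
    if len(set(column)) == 1:
        alerts.append("Constant")
    # Count exact zeros; non-numeric values never compare equal to 0.
    zeros = sum(1 for value in column if value == 0)
    if zeros:
        alerts.append(f"Zeros ({100 * zeros / len(column):.1f}%)")
    return alerts


date_sample = ["202001"] * 5
amount_sample = [598280, 37310250, 1206774, 0, 0]

print(simple_alerts(date_sample))
print(simple_alerts(amount_sample))
```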

Reproduction

Analysis started	2024-05-11 06:00:46.318136
Analysis finished	2024-05-11 06:00:47.232889
Duration	0.91 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
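
The reported duration is the difference between the two timestamps. A minimal sketch of that subtraction, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "Reproduction" table.
started = datetime.strptime("2024-05-11 06:00:46.318136", FMT)
finished = datetime.strptime("2024-05-11 06:00:47.232889", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```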

아파트명
Text

Distinct	2000
Distinct (%)	20.0%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.1868
Min length	2

Characters and Unicode

Total characters	71868
Distinct characters	428
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
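
As an illustration of the category breakdown, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample apartment names from this report. The helper name `category_counts` is an assumption made for this example. The standard library exposes general categories only, so the script and block breakdowns would need another source of character data.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample rows of the 아파트명 column.
sample = ["가락상아1차", "북한산힐스테이트3차", "방학동부센트레빌", "자양현대", "상계주공6단지"]

counts = category_counts(sample)
print(counts)
```

Here `Lo` corresponds to "Other Letter" (Hangul syllables fall in this category) and `Nd` to "Decimal Number"; the Roman numeral `Ⅰ` seen in the data is `Nl`, "Letter Number".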

Unique

Unique	72
Unique (%)	0.7%

Sample

1st row	가락상아1차
2nd row	북한산힐스테이트3차
3rd row	방학동부센트레빌
4th row	자양현대
5th row	상계주공6단지

Value	Count	Frequency (%)
아파트	98	0.9%
래미안	32	0.3%
신반포	15	0.1%
서울숲힐스테이트	14	0.1%
신동아파밀리에	14	0.1%
신도림현대	14	0.1%
2단지	13	0.1%
현대	13	0.1%
홍제원	13	0.1%
신당남산타운(분양	13	0.1%
Other values (2059)	10272	97.7%

Most occurring characters

Value	Count	Frequency (%)
아	2207	3.1%
파	2206	3.1%
지	1984	2.8%
트	1928	2.7%
대	1749	2.4%
동	1628	2.3%
단	1566	2.2%
신	1519	2.1%
차	1491	2.1%
성	1289	1.8%
Other values (418)	54301	75.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	65999	91.8%
Decimal Number	3802	5.3%
Uppercase Letter	713	1.0%
Space Separator	562	0.8%
Lowercase Letter	321	0.4%
Close Punctuation	131	0.2%
Open Punctuation	131	0.2%
Dash Punctuation	118	0.2%
Other Punctuation	85	0.1%
Letter Number	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2207	3.3%
파	2206	3.3%
지	1984	3.0%
트	1928	2.9%
대	1749	2.7%
동	1628	2.5%
단	1566	2.4%
신	1519	2.3%
차	1491	2.3%
성	1289	2.0%
Other values (373)	48432	73.4%

Uppercase Letter

Value	Count	Frequency (%)
S	129	18.1%
K	102	14.3%
C	90	12.6%
L	54	7.6%
D	49	6.9%
M	49	6.9%
H	41	5.8%
G	36	5.0%
I	34	4.8%
E	30	4.2%
Other values (7)	99	13.9%

Lowercase Letter

Value	Count	Frequency (%)
e	183	57.0%
i	30	9.3%
l	28	8.7%
v	21	6.5%
s	13	4.0%
k	12	3.7%
w	11	3.4%
c	8	2.5%
a	5	1.6%
g	5	1.6%

Decimal Number

Value	Count	Frequency (%)
1	1219	32.1%
2	1105	29.1%
3	483	12.7%
4	257	6.8%
5	225	5.9%
6	148	3.9%
9	108	2.8%
7	98	2.6%
8	87	2.3%
0	72	1.9%

Other Punctuation

Value	Count	Frequency (%)
,	76	89.4%
.	9	10.6%

Space Separator

Value	Count	Frequency (%)
(space)	562	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	131	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	131	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	118	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	65999	91.8%
Common	4829	6.7%
Latin	1040	1.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2207	3.3%
파	2206	3.3%
지	1984	3.0%
트	1928	2.9%
대	1749	2.7%
동	1628	2.5%
단	1566	2.4%
신	1519	2.3%
차	1491	2.3%
성	1289	2.0%
Other values (373)	48432	73.4%

Latin

Value	Count	Frequency (%)
e	183	17.6%
S	129	12.4%
K	102	9.8%
C	90	8.7%
L	54	5.2%
D	49	4.7%
M	49	4.7%
H	41	3.9%
G	36	3.5%
I	34	3.3%
Other values (19)	273	26.2%

Common

Value	Count	Frequency (%)
1	1219	25.2%
2	1105	22.9%
(space)	562	11.6%
3	483	10.0%
4	257	5.3%
5	225	4.7%
6	148	3.1%
)	131	2.7%
(	131	2.7%
-	118	2.4%
Other values (6)	450	9.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	65999	91.8%
ASCII	5863	8.2%
Number Forms	6	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2207	3.3%
파	2206	3.3%
지	1984	3.0%
트	1928	2.9%
대	1749	2.7%
동	1628	2.5%
단	1566	2.4%
신	1519	2.3%
차	1491	2.3%
성	1289	2.0%
Other values (373)	48432	73.4%

ASCII

Value	Count	Frequency (%)
1	1219	20.8%
2	1105	18.8%
(space)	562	9.6%
3	483	8.2%
4	257	4.4%
5	225	3.8%
e	183	3.1%
6	148	2.5%
)	131	2.2%
(	131	2.2%
Other values (34)	1419	24.2%

Number Forms

Value	Count	Frequency (%)
Ⅰ	6	100.0%

아파트코드
Text

Distinct	2006
Distinct (%)	20.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	72
Unique (%)	0.7%

Sample

1st row	A13813004
2nd row	A12204004
3rd row	A13272102
4th row	A14319003
5th row	A13920707

Value	Count	Frequency (%)
a13378001	14	0.1%
a14377402	13	0.1%
a12078704	13	0.1%
a13483002	13	0.1%
a10045302	13	0.1%
a13186708	12	0.1%
a13986306	12	0.1%
a10027375	12	0.1%
a13410002	12	0.1%
a15786222	11	0.1%
Other values (1996)	9875	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18167	20.2%
1	17539	19.5%
A	9990	11.1%
3	8865	9.8%
2	8328	9.3%
5	6236	6.9%
8	5867	6.5%
7	4958	5.5%
4	3830	4.3%
6	3353	3.7%
Other values (2)	2867	3.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18167	22.7%
1	17539	21.9%
3	8865	11.1%
2	8328	10.4%
5	6236	7.8%
8	5867	7.3%
7	4958	6.2%
4	3830	4.8%
6	3353	4.2%
9	2857	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	9990	99.9%
B	10	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18167	22.7%
1	17539	21.9%
3	8865	11.1%
2	8328	10.4%
5	6236	7.8%
8	5867	7.3%
7	4958	6.2%
4	3830	4.8%
6	3353	4.2%
9	2857	3.6%

Latin

Value	Count	Frequency (%)
A	9990	99.9%
B	10	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18167	20.2%
1	17539	19.5%
A	9990	11.1%
3	8865	9.8%
2	8328	9.3%
5	6236	6.9%
8	5867	6.5%
7	4958	5.5%
4	3830	4.3%
6	3353	3.7%
Other values (2)	2867	3.2%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9375
Min length	2

Characters and Unicode

Total characters	59375
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	선급비용
2nd row	단기보증금
3rd row	기타시설운영충당부채
4th row	미처분이익잉여금
5th row	미지급금

Value	Count	Frequency (%)
관리비미수금	337	3.4%
예금	334	3.3%
선급비용	327	3.3%
비품	325	3.2%
미처분이익잉여금	318	3.2%
공동주택적립금	318	3.2%
당기순이익	310	3.1%
연차수당충당부채	305	3.0%
장기수선충당부채	303	3.0%
장기수선충당예금	303	3.0%
Other values (67)	6820	68.2%

Most occurring characters

Value	Count	Frequency (%)
금	4748	8.0%
당	3686	6.2%
수	3181	5.4%
비	3003	5.1%
충	3001	5.1%
부	2883	4.9%
채	2589	4.4%
기	2341	3.9%
선	1927	3.2%
예	1761	3.0%
Other values (97)	30255	51.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59375	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4748	8.0%
당	3686	6.2%
수	3181	5.4%
비	3003	5.1%
충	3001	5.1%
부	2883	4.9%
채	2589	4.4%
기	2341	3.9%
선	1927	3.2%
예	1761	3.0%
Other values (97)	30255	51.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59375	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4748	8.0%
당	3686	6.2%
수	3181	5.4%
비	3003	5.1%
충	3001	5.1%
부	2883	4.9%
채	2589	4.4%
기	2341	3.9%
선	1927	3.2%
예	1761	3.0%
Other values (97)	30255	51.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59375	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4748	8.0%
당	3686	6.2%
수	3181	5.4%
비	3003	5.1%
충	3001	5.1%
부	2883	4.9%
채	2589	4.4%
기	2341	3.9%
선	1927	3.2%
예	1761	3.0%
Other values (97)	30255	51.0%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202001	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202001
2nd row	202001
3rd row	202001
4th row	202001
5th row	202001

Common Values

Value	Count	Frequency (%)
202001	10000	100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values; 202001 accounts for all 10000 rows (100.0%)]

금액
Real number (ℝ)

ZEROS

Distinct	7668
Distinct (%)	76.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	75489781

Minimum	-4.7775438 × 10⁸
Maximum	1.1661407 × 10¹⁰
Zeros	2008
Zeros (%)	20.1%
Negative	386
Negative (%)	3.9%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.7775438 × 10⁸
5-th percentile	0
Q1	15768
median	3388148.5
Q3	36244005
95-th percentile	3.83988 × 10⁸
Maximum	1.1661407 × 10¹⁰
Range	1.2139161 × 10¹⁰
Interquartile range (IQR)	36228237

Descriptive statistics

Standard deviation	3.081933 × 10⁸
Coefficient of variation (CV)	4.0825831
Kurtosis	400.26829
Mean	75489781
Median Absolute Deviation (MAD)	3388148.5
Skewness	15.749134
Sum	7.5489781 × 10¹¹
Variance	9.4983113 × 10¹⁶
Monotonicity	Not monotonic
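
Several of the figures above are derived from one another. The sketch below recomputes them from the rounded values printed in this report, so small differences in the last digits are expected; the variable names are illustrative only.

```python
# Values copied from the 금액 statistics (already rounded in the report).
n = 10_000
mean = 75_489_781
std = 3.081933e8
minimum, maximum = -4.7775438e8, 1.1661407e10
q1, q3 = 15_768, 36_244_005

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum of all values
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.4f} variance={variance:.4e} sum={total:.4e}")
print(f"range={value_range:.4e} IQR={iqr}")
```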

[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value	Count	Frequency (%)
0	2008	20.1%
500000	33	0.3%
250000	31	0.3%
300000	16	0.2%
5000	14	0.1%
1000000	13	0.1%
200000	12	0.1%
484000	10	0.1%
242000	9	0.1%
30000000	9	0.1%
Other values (7658)	7845	78.5%

Minimum 10 values

Value	Count	Frequency (%)
-477754375	1	< 0.1%
-302145700	1	< 0.1%
-282000000	1	< 0.1%
-205956340	1	< 0.1%
-177053510	1	< 0.1%
-161481980	1	< 0.1%
-149282800	1	< 0.1%
-145971370	1	< 0.1%
-136095880	1	< 0.1%
-127648010	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
11661406948	1	< 0.1%
8854326575	1	< 0.1%
7909385769	1	< 0.1%
6901374795	1	< 0.1%
6106284701	1	< 0.1%
5880083757	1	< 0.1%
5653287022	1	< 0.1%
5330459540	1	< 0.1%
4909231012	1	< 0.1%
4260978071	1	< 0.1%

Interactions

[Figure: interaction plot, 금액 (x-axis) against 금액 (y-axis)]

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.374
금액	0.374	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
32806	가락상아1차	A13813004	선급비용	202001	598280
10452	북한산힐스테이트3차	A12204004	단기보증금	202001	37310250
17506	방학동부센트레빌	A13272102	기타시설운영충당부채	202001	1206774
42683	자양현대	A14319003	미처분이익잉여금	202001	0
35769	상계주공6단지	A13920707	미지급금	202001	0
45472	양평경남1차	A15010302	전신전화가입권	202001	250000
45199	문래현대5차아파트	A15009504	퇴직급여충당예금	202001	22549640
36450	중계주공7단지	A13922910	공동주택적립금	202001	10127562
7248	홍제원현대임대	A12078707	당기순이익	202001	702482
35891	상계주공10단지	A13920804	선수전기료	202001	3376390

Last rows

	아파트명	아파트코드	비용명	년월일	금액
17535	도봉서원제2	A13275302	예금	202001	131330121
24917	도곡1차아이파크	A13527007	세대배부용비품	202001	593000
37431	공릉동신	A13980411	연차수당충당부채	202001	5691720
23934	청담삼환	A13510201	가지급금	202001	67950
13747	청량리미주	A13086705	장기수선충당부채	202001	1611998025
48141	봉천두산3단지	A15178203	주차장충당부채	202001	0
38607	상계현대3차	A13983712	당기순이익	202001	7564626
21402	성내삼성	A13403101	관리비예치금	202001	192150000
4427	e편한세상마포리버파크	A10028006	미수금	202001	0
24906	도곡1차아이파크	A13527007	연차수당충당부채	202001	14075870
