gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202008"	Constant
`금액` has 2150 (21.5%) zeros	Zeros
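
The two alerts above can be checked by hand. The following is a minimal sketch using only the Python standard library; the helper name `column_alerts` and the tiny sample lists are hypothetical illustrations, not ydata-profiling's actual implementation or the real data.

```python
def column_alerts(values):
    """Return simple alert labels for one column (illustrative only)."""
    alerts = []
    # A column is constant when every row holds the same value.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Count exact zeros and report their share of all rows.
    zeros = sum(1 for v in values if v == 0)
    if zeros:
        share = 100 * zeros / len(values)
        alerts.append(f"Zeros ({zeros}, {share:.1f}%)")
    return alerts


# Hypothetical mini-columns mimicking `년월일` and `금액`.
print(column_alerts(["202008"] * 4))
print(column_alerts([0, 925780, 0, 120070]))
```

Applied to the figures in this report, the same share calculation gives 100 × 2150 / 10000 = 21.5% zeros for `금액`.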

Reproduction

Analysis started	2024-05-11 05:59:59.713021
Analysis finished	2024-05-11 06:00:00.767087
Duration	1.05 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2186
Distinct (%)	21.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.2404
Min length	2

Characters and Unicode

Total characters	72404
Distinct characters	435
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
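
As a rough illustration of how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories in the first sample row of `아파트명`. The helper name `category_counts` is hypothetical, and this is not necessarily how the report itself computes its tables.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "면목두산2.3차"  # first sample row of 아파트명
counts = category_counts(sample)
print(counts)
```

In the two-letter codes, `Lo` corresponds to "Other Letter" (the Hangul syllables), `Nd` to "Decimal Number", and `Po` to "Other Punctuation", which are the category names used in the tables of this report.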

Unique

Unique	115
Unique (%)	1.1%

Sample

1st row	면목두산2.3차
2nd row	독산현대
3rd row	신내경남아너스빌
4th row	트리마제
5th row	녹천역두산위브아파트

Value	Count	Frequency (%)
아파트	129	1.2%
래미안	27	0.3%
아이파크	19	0.2%
힐스테이트	17	0.2%
서울숲2차푸르지오임대	15	0.1%
신도림현대	14	0.1%
e편한세상신촌아파트	13	0.1%
도화현대1차아파트	12	0.1%
마포래미안푸르지오	12	0.1%
신반포	12	0.1%
Other values (2252)	10331	97.5%

Most occurring characters

Value	Count	Frequency (%)
아	2460	3.4%
파	2413	3.3%
트	2161	3.0%
지	1857	2.6%
대	1797	2.5%
동	1657	2.3%
차	1496	2.1%
신	1471	2.0%
단	1443	2.0%
성	1356	1.9%
Other values (425)	54293	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66453	91.8%
Decimal Number	3663	5.1%
Uppercase Letter	746	1.0%
Space Separator	678	0.9%
Lowercase Letter	319	0.4%
Open Punctuation	143	0.2%
Close Punctuation	143	0.2%
Dash Punctuation	129	0.2%
Other Punctuation	119	0.2%
Math Symbol	7	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2460	3.7%
파	2413	3.6%
트	2161	3.3%
지	1857	2.8%
대	1797	2.7%
동	1657	2.5%
차	1496	2.3%
신	1471	2.2%
단	1443	2.2%
성	1356	2.0%
Other values (379)	48342	72.7%

Uppercase Letter

Value	Count	Frequency (%)
S	121	16.2%
C	105	14.1%
K	99	13.3%
M	67	9.0%
D	67	9.0%
L	54	7.2%
H	48	6.4%
I	37	5.0%
E	32	4.3%
V	25	3.4%
Other values (7)	91	12.2%

Lowercase Letter

Value	Count	Frequency (%)
e	195	61.1%
l	30	9.4%
i	27	8.5%
v	18	5.6%
k	12	3.8%
s	12	3.8%
w	8	2.5%
c	6	1.9%
a	4	1.3%
g	4	1.3%

Decimal Number

Value	Count	Frequency (%)
1	1140	31.1%
2	1045	28.5%
3	490	13.4%
4	251	6.9%
5	201	5.5%
6	153	4.2%
7	128	3.5%
8	99	2.7%
9	83	2.3%
0	73	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	100	84.0%
.	19	16.0%

Space Separator

Value	Count	Frequency (%)
	678	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	143	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	143	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	129	100.0%

Math Symbol

Value	Count	Frequency (%)
~	7	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66453	91.8%
Common	4882	6.7%
Latin	1069	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2460	3.7%
파	2413	3.6%
트	2161	3.3%
지	1857	2.8%
대	1797	2.7%
동	1657	2.5%
차	1496	2.3%
신	1471	2.2%
단	1443	2.2%
성	1356	2.0%
Other values (379)	48342	72.7%

Latin

Value	Count	Frequency (%)
e	195	18.2%
S	121	11.3%
C	105	9.8%
K	99	9.3%
M	67	6.3%
D	67	6.3%
L	54	5.1%
H	48	4.5%
I	37	3.5%
E	32	3.0%
Other values (19)	244	22.8%

Common

Value	Count	Frequency (%)
1	1140	23.4%
2	1045	21.4%
	678	13.9%
3	490	10.0%
4	251	5.1%
5	201	4.1%
6	153	3.1%
(	143	2.9%
)	143	2.9%
-	129	2.6%
Other values (7)	509	10.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66453	91.8%
ASCII	5947	8.2%
Number Forms	4	< 0.1%
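
A Unicode block is a contiguous range of code points, so block membership can be sketched with a range lookup. The ranges below are a small subset taken from the Unicode block definitions; the helper `block_of` is hypothetical, and it is an assumption that the report's short label "Hangul" refers to the Hangul Syllables block.

```python
# Subset of Unicode block ranges (inclusive code point bounds).
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),             # Basic Latin
    (0x2150, 0x218F, "Number Forms"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]


def block_of(ch):
    """Return the block name for a single character, or 'Other'."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


for ch in ["아", "e", "\u2160"]:  # "\u2160" is the Roman numeral Ⅰ
    print(repr(ch), block_of(ch))
```

The three block counts in the table add up to the total character count of the column: 66453 + 5947 + 4 = 72404.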

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2460	3.7%
파	2413	3.6%
트	2161	3.3%
지	1857	2.8%
대	1797	2.7%
동	1657	2.5%
차	1496	2.3%
신	1471	2.2%
단	1443	2.2%
성	1356	2.0%
Other values (379)	48342	72.7%

ASCII

Value	Count	Frequency (%)
1	1140	19.2%
2	1045	17.6%
	678	11.4%
3	490	8.2%
4	251	4.2%
5	201	3.4%
e	195	3.3%
6	153	2.6%
(	143	2.4%
)	143	2.4%
Other values (35)	1508	25.4%

Number Forms

Value	Count	Frequency (%)
Ⅰ	4	100.0%

아파트코드
Text

Distinct	2193
Distinct (%)	21.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	115
Unique (%)	1.1%

Sample

1st row	A13188406
2nd row	A15381303
3rd row	A13113006
4th row	A10026988
5th row	A10027121

Value	Count	Frequency (%)
a10026370	13	0.1%
a12181406	12	0.1%
a12175203	12	0.1%
a15681503	12	0.1%
a13592604	11	0.1%
a13986306	11	0.1%
a15286809	11	0.1%
a13408003	11	0.1%
a15603203	11	0.1%
a13481305	11	0.1%
Other values (2183)	9885	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18309	20.3%
1	17614	19.6%
A	9984	11.1%
3	8951	9.9%
2	8255	9.2%
5	6163	6.8%
8	5717	6.4%
7	4818	5.4%
4	3816	4.2%
6	3370	3.7%
Other values (2)	3003	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18309	22.9%
1	17614	22.0%
3	8951	11.2%
2	8255	10.3%
5	6163	7.7%
8	5717	7.1%
7	4818	6.0%
4	3816	4.8%
6	3370	4.2%
9	2987	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9984	99.8%
B	16	0.2%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18309	22.9%
1	17614	22.0%
3	8951	11.2%
2	8255	10.3%
5	6163	7.7%
8	5717	7.1%
7	4818	6.0%
4	3816	4.8%
6	3370	4.2%
9	2987	3.7%

Latin

Value	Count	Frequency (%)
A	9984	99.8%
B	16	0.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18309	20.3%
1	17614	19.6%
A	9984	11.1%
3	8951	9.9%
2	8255	9.2%
5	6163	6.8%
8	5717	6.4%
7	4818	5.4%
4	3816	4.2%
6	3370	3.7%
Other values (2)	3003	3.3%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9464
Min length	2

Characters and Unicode

Total characters	59464
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	비품감가상각누계액
2nd row	비품
3rd row	가지급금
4th row	미처분이익잉여금
5th row	미수관리비예치금

Value	Count	Frequency (%)
예금	345	3.5%
선급비용	339	3.4%
미처분이익잉여금	332	3.3%
퇴직급여충당부채	332	3.3%
예수금	330	3.3%
공동주택적립금	323	3.2%
당기순이익	317	3.2%
비품	307	3.1%
수선유지비충당부채	303	3.0%
장기수선충당부채	295	2.9%
Other values (67)	6777	67.8%

Most occurring characters

Value	Count	Frequency (%)
금	4655	7.8%
당	3752	6.3%
수	3089	5.2%
충	3038	5.1%
비	3005	5.1%
부	2959	5.0%
채	2662	4.5%
기	2367	4.0%
선	1942	3.3%
예	1749	2.9%
Other values (97)	30246	50.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59464	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4655	7.8%
당	3752	6.3%
수	3089	5.2%
충	3038	5.1%
비	3005	5.1%
부	2959	5.0%
채	2662	4.5%
기	2367	4.0%
선	1942	3.3%
예	1749	2.9%
Other values (97)	30246	50.9%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59464	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3752	6.3%
수	3089	5.2%
충	3038	5.1%
비	3005	5.1%
부	2959	5.0%
채	2662	4.5%
기	2367	4.0%
선	1942	3.3%
예	1749	2.9%
Other values (97)	30246	50.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59464	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3752	6.3%
수	3089	5.2%
충	3038	5.1%
비	3005	5.1%
부	2959	5.0%
채	2662	4.5%
기	2367	4.0%
선	1942	3.3%
예	1749	2.9%
Other values (97)	30246	50.9%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202008	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202008
2nd row	202008
3rd row	202008
4th row	202008
5th row	202008

Common Values

Value	Count	Frequency (%)
202008	10000	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
202008	10000	100.0%

금액
Real number (ℝ)

ZEROS

Distinct	7517
Distinct (%)	75.2%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	72024041

Minimum	-3.8900128 × 10⁸
Maximum	1.1724908 × 10¹⁰
Zeros	2150
Zeros (%)	21.5%
Negative	334
Negative (%)	3.3%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.8900128 × 10⁸
5-th percentile	0
Q1	386
median	3566920
Q3	34730592
95-th percentile	3.5346555 × 10⁸
Maximum	1.1724908 × 10¹⁰
Range	1.2113909 × 10¹⁰
Interquartile range (IQR)	34730206

Descriptive statistics

Standard deviation	2.8428218 × 10⁸
Coefficient of variation (CV)	3.9470456
Kurtosis	371.56458
Mean	72024041
Median Absolute Deviation (MAD)	3566920
Skewness	14.023306
Sum	7.2024041 × 10¹¹
Variance	8.0816356 × 10¹⁶
Monotonicity	Not monotonic
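
Several of the figures for `금액` are related by simple arithmetic. The sketch below recomputes the derived values from the rounded numbers printed in this report, so small differences in the last digits are expected; the variable names are illustrative only.

```python
# Values copied from the report (some are rounded in the source).
n = 10000
mean = 72024041
std = 2.8428218e8
q1, q3 = 386, 34730592
minimum, maximum = -389001283, 11724907627

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum implied by the mean and row count

print(iqr)          # 34730206
print(value_range)  # 12113908910
print(cv, variance, total)
```

The results agree with the reported IQR (34730206), range (about 1.2113909 × 10¹⁰), CV (about 3.947), variance (about 8.08 × 10¹⁶), and sum (about 7.2024041 × 10¹¹).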

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2150	21.5%
500000	26	0.3%
250000	19	0.2%
300000	16	0.2%
1000000	14	0.1%
484000	14	0.1%
242000	11	0.1%
100000	11	0.1%
5000000	10	0.1%
2000000	10	0.1%
Other values (7507)	7719	77.2%

Minimum 10 values

Value	Count	Frequency (%)
-389001283	1	< 0.1%
-275911534	1	< 0.1%
-241396750	1	< 0.1%
-235006308	1	< 0.1%
-178797700	1	< 0.1%
-174573633	1	< 0.1%
-173098610	1	< 0.1%
-162096680	1	< 0.1%
-161866980	1	< 0.1%
-156648860	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
11724907627	1	< 0.1%
6016150639	1	< 0.1%
5173392238	1	< 0.1%
5064404578	1	< 0.1%
4788302087	1	< 0.1%
4725614202	1	< 0.1%
3959218905	1	< 0.1%
3939755056	1	< 0.1%
3893339909	1	< 0.1%
3834291108	1	< 0.1%

Interactions

[Plot: 금액 against 금액]

Phik (φk)

	비용명	금액
비용명	1.000	0.420
금액	0.420	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
18234	면목두산2.3차	A13188406	비품감가상각누계액	202008	-4935255
58077	독산현대	A15381303	비품	202008	925780
16005	신내경남아너스빌	A13113006	가지급금	202008	120070
3700	트리마제	A10026988	미처분이익잉여금	202008	0
3875	녹천역두산위브아파트	A10027121	미수관리비예치금	202008	1120000
9812	상암월드컵파크3단지	A12127003	상여충당부채	202008	3245699
54141	신림푸르지오	A15190705	미부과관리비	202008	255476916
52462	롯데캐슬아이비	A15088915	예수금	202008	687978
26219	마일스디오빌	A13501002	퇴직급여충당부채	202008	0
43987	월계청백3단지	A13985105	미지급금	202008	17877550

Last rows

	아파트명	아파트코드	비용명	년월일	금액
25104	고덕리엔파크1단지	A13410012	저장품	202008	77850
22208	성수2차대우	A13372101	예금	202008	98703252
43521	상계대림e-편한세상	A13983803	비품감가상각누계액	202008	-11805619
12150	북한산힐스테이트3차	A12204004	연차수당충당부채	202008	27701030
28840	청담건영아파트	A13576201	미부과관리비	202008	33454981
64180	한사랑2차삼성아파트(등촌동)	A15783907	현금	202008	109684
60943	사당우성2단지	A15681502	임대보증금	202008	1000000
29865	역삼래미안	A13592706	주차장충당예금	202008	0
66444	삼성쉐르빌1 아파트	A15807603	기타투자자산	202008	175507520
43136	상계보람	A13982604	선수난방비	202008	11085093
