gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
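The average record size follows from the total size and the row count. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; the function name is illustrative and not part of ydata-profiling:

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 488.3    # "Total size in memory"
n_observations = 10_000   # "Number of observations"

BYTES_PER_KIB = 1024      # assumption: KiB means 1024 bytes


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size of one row, in bytes."""
    return total_kib * BYTES_PER_KIB / n_rows


avg = average_record_size_bytes(total_size_kib, n_observations)
print(f"{avg:.1f} B")  # matches the 50.0 B reported above
```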

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202004"	Constant
`금액` has 2126 (21.3%) zeros	Zeros
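The two alerts come down to simple column checks. The sketch below shows one way such checks could be written. It is not ydata-profiling's own implementation, the helper names are hypothetical, and the values are only the ten "First rows" of this report rather than the full 10,000 rows:

```python
# Values copied from the "First rows" sample of this report (10 of 10,000 rows).
year_month = ["202004"] * 10
amounts = [2311490, 21728000, 0, 3750619, 0, 40204230, 446290, 0, 0, 26060270]


def is_constant(values) -> bool:
    """True when the column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_fraction(values) -> float:
    """Share of entries that are equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


print(is_constant(year_month))  # the 년월일 column in the sample
print(zero_fraction(amounts))   # the 금액 column in the sample
```

On the full dataset the same ratio is 2126 / 10000, which the report rounds to 21.3%.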

Reproduction

Analysis started	2024-05-11 06:00:26.224003
Analysis finished	2024-05-11 06:00:27.356043
Duration	1.13 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2183
Distinct (%)	21.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	24
Median length	21
Mean length	7.2174
Min length	2

Characters and Unicode

Total characters	72174
Distinct characters	431
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
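As an illustration, the general category of each character can be read with Python's standard `unicodedata` module. This is a sketch, not the report's own code: the mapping from two-letter codes to long names is hand-written here and covers only the categories listed in this report, and scripts and blocks are not available from the standard library.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes returned by
# unicodedata.category(); only codes that appear in this report are mapped.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation", "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Names copied from the "Sample" rows of the 아파트명 variable.
sample = ["왕십리텐즈힐2구역214동", "문래미원아파트", "공릉태릉우성", "방화삼성", "휘경주공2단지"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```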

Unique

Unique	107
Unique (%)	1.1%

Sample

1st row	왕십리텐즈힐2구역214동
2nd row	문래미원아파트
3rd row	공릉태릉우성
4th row	방화삼성
5th row	휘경주공2단지

Value	Count	Frequency (%)
아파트	112	1.1%
래미안	33	0.3%
신동아파밀리에	17	0.2%
아이파크	16	0.2%
북한산	16	0.2%
힐스테이트	15	0.1%
창동주공2단지	15	0.1%
sk뷰	13	0.1%
e편한세상	13	0.1%
중계성원2차	13	0.1%
Other values (2246)	10295	97.5%
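The counts in this table sum to 10558, more than the 10000 observations, and entries such as `sk뷰` appear in lowercase. Both are consistent with counting lowercased, whitespace-separated tokens rather than whole values. The sketch below works under that assumption; the exact tokenisation the report uses is not stated in the source.

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all strings."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


# Three names copied from the sample rows of this report.
names = ["DMC센트럴아이파크 관리사무소", "문래미원아파트", "창동주공2단지"]
counts = word_counts(names)
print(counts)
print(sum(counts.values()), "tokens from", len(names), "rows")
```

A name that contains a space contributes two tokens, so the token total can exceed the row count.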

Most occurring characters

Value	Count	Frequency (%)
아	2398	3.3%
파	2308	3.2%
트	2087	2.9%
지	1872	2.6%
대	1797	2.5%
동	1700	2.4%
차	1524	2.1%
단	1464	2.0%
신	1463	2.0%
성	1355	1.9%
Other values (421)	54206	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66204	91.7%
Decimal Number	3754	5.2%
Uppercase Letter	743	1.0%
Space Separator	617	0.9%
Lowercase Letter	318	0.4%
Close Punctuation	140	0.2%
Open Punctuation	140	0.2%
Dash Punctuation	138	0.2%
Other Punctuation	110	0.2%
Math Symbol	5	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2398	3.6%
파	2308	3.5%
트	2087	3.2%
지	1872	2.8%
대	1797	2.7%
동	1700	2.6%
차	1524	2.3%
단	1464	2.2%
신	1463	2.2%
성	1355	2.0%
Other values (375)	48236	72.9%

Uppercase Letter

Value	Count	Frequency (%)
S	138	18.6%
K	98	13.2%
C	81	10.9%
L	61	8.2%
D	50	6.7%
M	50	6.7%
H	49	6.6%
E	44	5.9%
I	40	5.4%
G	32	4.3%
Other values (7)	100	13.5%

Lowercase Letter

Value	Count	Frequency (%)
e	189	59.4%
l	30	9.4%
i	26	8.2%
v	19	6.0%
s	16	5.0%
k	15	4.7%
w	9	2.8%
c	6	1.9%
h	4	1.3%
a	2	0.6%

Decimal Number

Value	Count	Frequency (%)
1	1186	31.6%
2	1089	29.0%
3	481	12.8%
4	262	7.0%
5	202	5.4%
6	157	4.2%
7	113	3.0%
9	95	2.5%
8	94	2.5%
0	75	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	83	75.5%
.	27	24.5%

Space Separator

Value	Count	Frequency (%)
(space)	617	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	140	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	140	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	138	100.0%

Math Symbol

Value	Count	Frequency (%)
~	5	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66204	91.7%
Common	4904	6.8%
Latin	1066	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2398	3.6%
파	2308	3.5%
트	2087	3.2%
지	1872	2.8%
대	1797	2.7%
동	1700	2.6%
차	1524	2.3%
단	1464	2.2%
신	1463	2.2%
성	1355	2.0%
Other values (375)	48236	72.9%

Latin

Value	Count	Frequency (%)
e	189	17.7%
S	138	12.9%
K	98	9.2%
C	81	7.6%
L	61	5.7%
D	50	4.7%
M	50	4.7%
H	49	4.6%
E	44	4.1%
I	40	3.8%
Other values (19)	266	25.0%

Common

Value	Count	Frequency (%)
1	1186	24.2%
2	1089	22.2%
(space)	617	12.6%
3	481	9.8%
4	262	5.3%
5	202	4.1%
6	157	3.2%
)	140	2.9%
(	140	2.9%
-	138	2.8%
Other values (7)	492	10.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66204	91.7%
ASCII	5965	8.3%
Number Forms	5	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2398	3.6%
파	2308	3.5%
트	2087	3.2%
지	1872	2.8%
대	1797	2.7%
동	1700	2.6%
차	1524	2.3%
단	1464	2.2%
신	1463	2.2%
성	1355	2.0%
Other values (375)	48236	72.9%

ASCII

Value	Count	Frequency (%)
1	1186	19.9%
2	1089	18.3%
(space)	617	10.3%
3	481	8.1%
4	262	4.4%
5	202	3.4%
e	189	3.2%
6	157	2.6%
)	140	2.3%
(	140	2.3%
Other values (35)	1502	25.2%

Number Forms

Value	Count	Frequency (%)
Ⅰ	5	100.0%

아파트코드
Text

Distinct	2190
Distinct (%)	21.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	107
Unique (%)	1.1%

Sample

1st row	A13373302
2nd row	A15009601
3rd row	A13980009
4th row	A15722001
5th row	A13087407

Value	Count	Frequency (%)
a13204508	15	0.1%
a13986701	13	0.1%
a11081503	13	0.1%
a13676103	12	0.1%
a15210209	12	0.1%
a13377901	11	0.1%
a13770607	11	0.1%
a15883202	11	0.1%
a14272313	11	0.1%
a14320002	11	0.1%
Other values (2180)	9880	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18262	20.3%
1	17712	19.7%
A	9991	11.1%
3	8784	9.8%
2	8304	9.2%
5	6251	6.9%
8	5740	6.4%
7	4865	5.4%
4	3797	4.2%
6	3256	3.6%
Other values (2)	3038	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18262	22.8%
1	17712	22.1%
3	8784	11.0%
2	8304	10.4%
5	6251	7.8%
8	5740	7.2%
7	4865	6.1%
4	3797	4.7%
6	3256	4.1%
9	3029	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9991	99.9%
B	9	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18262	22.8%
1	17712	22.1%
3	8784	11.0%
2	8304	10.4%
5	6251	7.8%
8	5740	7.2%
7	4865	6.1%
4	3797	4.7%
6	3256	4.1%
9	3029	3.8%

Latin

Value	Count	Frequency (%)
A	9991	99.9%
B	9	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18262	20.3%
1	17712	19.7%
A	9991	11.1%
3	8784	9.8%
2	8304	9.2%
5	6251	6.9%
8	5740	6.4%
7	4865	5.4%
4	3797	4.2%
6	3256	3.6%
Other values (2)	3038	3.4%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9073
Min length	2

Characters and Unicode

Total characters	59073
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	선급비용
2nd row	관리비예치금
3rd row	미처분이익잉여금
4th row	주차장충당부채
5th row	기타의비유동부채

Value	Count	Frequency (%)
당기순이익	341	3.4%
예금	325	3.2%
비품	322	3.2%
퇴직급여충당부채	316	3.2%
선급비용	315	3.1%
현금	304	3.0%
예수금	304	3.0%
미부과관리비	303	3.0%
미처분이익잉여금	301	3.0%
관리비미수금	298	3.0%
Other values (67)	6871	68.7%

Most occurring characters

Value	Count	Frequency (%)
금	4686	7.9%
당	3730	6.3%
수	3180	5.4%
충	3032	5.1%
비	2990	5.1%
부	2927	5.0%
채	2611	4.4%
기	2370	4.0%
선	1925	3.3%
예	1741	2.9%
Other values (97)	29881	50.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59073	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4686	7.9%
당	3730	6.3%
수	3180	5.4%
충	3032	5.1%
비	2990	5.1%
부	2927	5.0%
채	2611	4.4%
기	2370	4.0%
선	1925	3.3%
예	1741	2.9%
Other values (97)	29881	50.6%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59073	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4686	7.9%
당	3730	6.3%
수	3180	5.4%
충	3032	5.1%
비	2990	5.1%
부	2927	5.0%
채	2611	4.4%
기	2370	4.0%
선	1925	3.3%
예	1741	2.9%
Other values (97)	29881	50.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59073	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4686	7.9%
당	3730	6.3%
수	3180	5.4%
충	3032	5.1%
비	2990	5.1%
부	2927	5.0%
채	2611	4.4%
기	2370	4.0%
선	1925	3.3%
예	1741	2.9%
Other values (97)	29881	50.6%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202004	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202004
2nd row	202004
3rd row	202004
4th row	202004
5th row	202004

Common Values

Value	Count	Frequency (%)
202004	10000	100.0%

Length

Histogram of lengths of the category


금액
Real number (ℝ)

ZEROS

Distinct	7528
Distinct (%)	75.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	69944404

Minimum	-8.1896599 × 10⁸
Maximum	5.1882838 × 10⁹
Zeros	2126
Zeros (%)	21.3%
Negative	326
Negative (%)	3.3%
Memory size	166.0 KiB

Quantile statistics

Minimum	-8.1896599 × 10⁸
5-th percentile	0
Q1	2987.5
median	3469435
Q3	36298634
95-th percentile	3.4628359 × 10⁸
Maximum	5.1882838 × 10⁹
Range	6.0072498 × 10⁹
Interquartile range (IQR)	36295646

Descriptive statistics

Standard deviation	2.4504074 × 10⁸
Coefficient of variation (CV)	3.5033645
Kurtosis	101.14417
Mean	69944404
Median Absolute Deviation (MAD)	3469435
Skewness	8.3687223
Sum	6.9944404 × 10¹¹
Variance	6.0044964 × 10¹⁶
Monotonicity	Not monotonic
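Several of the figures above can be derived from the others, which gives a quick consistency check. The sketch below recomputes them from the values as displayed in the report, so small rounding differences are expected. The exact minimum and maximum are taken from the extreme-value tables of this variable.

```python
# Figures copied from the 금액 summary (rounded as displayed in the report).
n = 10_000
minimum, maximum = -818_965_991, 5_188_283_800
q1, q3 = 2987.5, 36_298_634
mean, std = 69_944_404, 2.4504074e8

value_range = maximum - minimum  # report: 6.0072498 × 10⁹
iqr = q3 - q1                    # report: 36295646
cv = std / mean                  # report: 3.5033645
variance = std ** 2              # report: 6.0044964 × 10¹⁶
total = mean * n                 # report: 6.9944404 × 10¹¹

print(value_range, iqr, round(cv, 4), f"{variance:.4e}", total)
```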

Histogram with fixed size bins (bins=50)

Common Values

Value	Count	Frequency (%)
0	2126	21.3%
500000	30	0.3%
250000	22	0.2%
300000	16	0.2%
10000000	13	0.1%
30000000	13	0.1%
3000000	12	0.1%
200000	12	0.1%
484000	12	0.1%
242000	11	0.1%
Other values (7518)	7733	77.3%

Minimum 10 values

Value	Count	Frequency (%)
-818965991	1	< 0.1%
-492888411	1	< 0.1%
-292192150	1	< 0.1%
-240875890	1	< 0.1%
-239487120	1	< 0.1%
-189742270	1	< 0.1%
-166397700	1	< 0.1%
-156075187	1	< 0.1%
-141551042	1	< 0.1%
-109974210	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
5188283800	2	< 0.1%
3990701541	1	< 0.1%
3628671451	1	< 0.1%
3612590323	1	< 0.1%
3419272874	1	< 0.1%
3361695523	1	< 0.1%
3295996232	2	< 0.1%
3243018406	1	< 0.1%
3158249491	1	< 0.1%
3029920900	1	< 0.1%

Phik (φk)


	비용명	금액
비용명	1.000	0.504
금액	0.504	1.000

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
21794	왕십리텐즈힐2구역214동	A13373302	선급비용	202004	2311490
50309	문래미원아파트	A15009601	관리비예치금	202004	21728000
41325	공릉태릉우성	A13980009	미처분이익잉여금	202004	0
62334	방화삼성	A15722001	주차장충당부채	202004	3750619
15353	휘경주공2단지	A13087407	기타의비유동부채	202004	0
64200	등촌태진아름	A15784402	미부과관리비	202004	40204230
53366	관악국제산장	A15176701	현금	202004	446290
36436	마천우방	A13812004	기타공동주택관리비충당부채	202004	0
19370	창동주공2단지	A13204508	선수수익	202004	0
25001	길동우성2차	A13481305	경비비충당부채	202004	26060270

Last rows

	아파트명	아파트코드	비용명	년월일	금액
41404	공릉우방4단지	A13980012	미지급비용	202004	131797360
24440	강일리버파크9단지	A13410007	공동주택적립금	202004	61053413
32114	삼선푸르지오아파트	A13672101	미지급금	202004	0
58776	래미안상도3차	A15603006	미수금	202004	668140
55735	신개봉삼환	A15280602	기타충당부채	202004	0
1851	DMC센트럴아이파크 관리사무소	A10025976	수선유지비충당부채	202004	16307340
47362	현대성우	A14281701	미수수익	202004	9520
9844	마포래미안푸르지오	A12175203	단기보증금	202004	8076915
27375	일원목련타운	A13523005	비품	202004	41909225
23546	천호우성	A13402103	승강기유지비충당부채	202004	4500500
