gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202207"	Constant
`금액` is highly skewed (γ1 = 21.57862091)	Skewed
`금액` has 2504 (25.0%) zeros	Zeros
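
The γ1 in the skewness alert is a sample skewness coefficient. As a minimal standard-library sketch of how such a value can be computed, the function below assumes the bias-adjusted Fisher-Pearson estimator (the one pandas' `Series.skew` uses); this page does not confirm the report's exact computation. The name `sample_skewness` and the toy data are hypothetical.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (assumed estimator, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # convention chosen here for a constant column
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Hypothetical toy column: mostly zeros plus one large value, like `금액`.
toy_amounts = [0, 0, 0, 10]
print(sample_skewness(toy_amounts))  # about 2.0, a strong right skew
```

A long right tail (a few very large amounts next to many zeros) pushes the coefficient far above 0, which is the pattern the Skewed and Zeros alerts describe together.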

Reproduction

Analysis started	2024-05-11 05:57:43.199069
Analysis finished	2024-05-11 05:57:44.829082
Duration	1.63 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2259
Distinct (%)	22.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	21
Mean length	7.3617
Min length	2

Characters and Unicode

Total characters	73617
Distinct characters	436
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
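
As an illustration of these character properties, the sketch below tallies Unicode general categories for three apartment names taken from the sample rows of this report. It uses Python's standard `unicodedata` module, which returns two-letter category codes (`Lo` = Other Letter, `Nd` = Decimal Number, `Lu` = Uppercase Letter, `Zs` = Space Separator) but does not expose script or block names. The helper `category_counts` is a hypothetical name, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

# Apartment names copied from the report's sample rows.
names = ["구의현대6단지", "길음SHVILLE", "풍납 현대리버빌1차"]

totals = Counter()
for name in names:
    totals.update(category_counts(name))

for code, count in totals.most_common():
    print(code, count)
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category tables for Korean text columns.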

Unique

Unique	125
Unique (%)	1.2%
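
Distinct and Unique measure different things. The sketch below assumes that Distinct is the number of different values and Unique is the number of values that occur exactly once, a reading consistent with the constant `년월일` column (Distinct 1, Unique 0). The function name and the toy list are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) under the assumed definitions above."""
    counts = Counter(values)
    distinct = len(counts)                                # different values
    unique = sum(1 for c in counts.values() if c == 1)    # values seen once
    return distinct, unique

# Toy column built from names in the sample rows; repeats are made up.
sample = ["원효산호", "신림동부", "천호삼익", "신림동부", "천호삼익", "천호삼익"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)  # 3 1
```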

Sample

1st row	원효산호
2nd row	우장산한화꿈에그린
3rd row	신림동부
4th row	구의현대6단지
5th row	천호삼익

Value	Count	Frequency (%)
아파트	158	1.5%
래미안	44	0.4%
아이파크	21	0.2%
e편한세상	18	0.2%
경남아너스빌	17	0.2%
신반포	17	0.2%
sk뷰	16	0.1%
해모로	16	0.1%
꿈의숲	13	0.1%
방화2-2	12	0.1%
Other values (2339)	10417	96.9%

Most occurring characters

Value	Count	Frequency (%)
아	2568	3.5%
파	2462	3.3%
트	2331	3.2%
지	1842	2.5%
대	1697	2.3%
동	1667	2.3%
신	1509	2.0%
차	1453	2.0%
이	1443	2.0%
단	1434	1.9%
Other values (426)	55211	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67349	91.5%
Decimal Number	3663	5.0%
Uppercase Letter	846	1.1%
Space Separator	832	1.1%
Lowercase Letter	364	0.5%
Open Punctuation	150	0.2%
Close Punctuation	150	0.2%
Dash Punctuation	140	0.2%
Other Punctuation	120	0.2%
Letter Number	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2568	3.8%
파	2462	3.7%
트	2331	3.5%
지	1842	2.7%
대	1697	2.5%
동	1667	2.5%
신	1509	2.2%
차	1453	2.2%
이	1443	2.1%
단	1434	2.1%
Other values (381)	48943	72.7%

Uppercase Letter

Value	Count	Frequency (%)
S	139	16.4%
C	117	13.8%
K	98	11.6%
D	84	9.9%
M	84	9.9%
L	68	8.0%
H	64	7.6%
E	34	4.0%
I	34	4.0%
G	33	3.9%
Other values (7)	91	10.8%

Lowercase Letter

Value	Count	Frequency (%)
e	189	51.9%
l	45	12.4%
i	35	9.6%
v	28	7.7%
s	19	5.2%
k	17	4.7%
w	11	3.0%
h	8	2.2%
c	6	1.6%
a	3	0.8%

Decimal Number

Value	Count	Frequency (%)
2	1090	29.8%
1	1044	28.5%
3	479	13.1%
4	300	8.2%
5	234	6.4%
6	147	4.0%
7	111	3.0%
9	97	2.6%
8	83	2.3%
0	78	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	96	80.0%
.	24	20.0%

Space Separator

Value	Count	Frequency (%)
	832	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	150	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	150	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	140	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67349	91.5%
Common	5055	6.9%
Latin	1213	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2568	3.8%
파	2462	3.7%
트	2331	3.5%
지	1842	2.7%
대	1697	2.5%
동	1667	2.5%
신	1509	2.2%
차	1453	2.2%
이	1443	2.1%
단	1434	2.1%
Other values (381)	48943	72.7%

Latin

Value	Count	Frequency (%)
e	189	15.6%
S	139	11.5%
C	117	9.6%
K	98	8.1%
D	84	6.9%
M	84	6.9%
L	68	5.6%
H	64	5.3%
l	45	3.7%
i	35	2.9%
Other values (19)	290	23.9%

Common

Value	Count	Frequency (%)
2	1090	21.6%
1	1044	20.7%
	832	16.5%
3	479	9.5%
4	300	5.9%
5	234	4.6%
(	150	3.0%
)	150	3.0%
6	147	2.9%
-	140	2.8%
Other values (6)	489	9.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67349	91.5%
ASCII	6265	8.5%
Number Forms	3	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2568	3.8%
파	2462	3.7%
트	2331	3.5%
지	1842	2.7%
대	1697	2.5%
동	1667	2.5%
신	1509	2.2%
차	1453	2.2%
이	1443	2.1%
단	1434	2.1%
Other values (381)	48943	72.7%

ASCII

Value	Count	Frequency (%)
2	1090	17.4%
1	1044	16.7%
	832	13.3%
3	479	7.6%
4	300	4.8%
5	234	3.7%
e	189	3.0%
(	150	2.4%
)	150	2.4%
6	147	2.3%
Other values (34)	1650	26.3%

Number Forms

Value	Count	Frequency (%)
Ⅰ	3	100.0%

아파트코드
Text

Distinct	2265
Distinct (%)	22.7%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	125
Unique (%)	1.2%

Sample

1st row	A14085002
2nd row	A15701004
3rd row	A15101101
4th row	A14383203
5th row	A13486701

Value	Count	Frequency (%)
a13082502	12	0.1%
a13187406	12	0.1%
a15785711	12	0.1%
a13178101	12	0.1%
a12013003	12	0.1%
a15886504	12	0.1%
a15085805	11	0.1%
a13986306	11	0.1%
a13528003	11	0.1%
a12070101	11	0.1%
Other values (2255)	9884	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18495	20.5%
1	17565	19.5%
A	9996	11.1%
3	8804	9.8%
2	8303	9.2%
5	6141	6.8%
8	5623	6.2%
7	4747	5.3%
4	4077	4.5%
6	3264	3.6%
Other values (2)	2985	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18495	23.1%
1	17565	22.0%
3	8804	11.0%
2	8303	10.4%
5	6141	7.7%
8	5623	7.0%
7	4747	5.9%
4	4077	5.1%
6	3264	4.1%
9	2981	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9996	> 99.9%
B	4	< 0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18495	23.1%
1	17565	22.0%
3	8804	11.0%
2	8303	10.4%
5	6141	7.7%
8	5623	7.0%
7	4747	5.9%
4	4077	5.1%
6	3264	4.1%
9	2981	3.7%

Latin

Value	Count	Frequency (%)
A	9996	> 99.9%
B	4	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18495	20.5%
1	17565	19.5%
A	9996	11.1%
3	8804	9.8%
2	8303	9.2%
5	6141	6.8%
8	5623	6.2%
7	4747	5.3%
4	4077	4.5%
6	3264	3.6%
Other values (2)	2985	3.3%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	6.0515
Min length	2

Characters and Unicode

Total characters	60515
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	기타당좌자산
2nd row	선급비용
3rd row	가지급금
4th row	퇴직급여충당부채
5th row	가지급금

Value	Count	Frequency (%)
미처분이익잉여금	342	3.4%
퇴직급여충당부채	332	3.3%
당기순이익	317	3.2%
공동주택적립금	306	3.1%
장기수선충당부채	302	3.0%
연차수당충당부채	298	3.0%
선급비용	296	3.0%
예금	295	2.9%
가수금	293	2.9%
비품	293	2.9%
Other values (67)	6926	69.3%

Most occurring characters

Value	Count	Frequency (%)
금	4541	7.5%
당	3917	6.5%
충	3111	5.1%
수	3055	5.0%
비	3030	5.0%
부	2965	4.9%
채	2685	4.4%
기	2428	4.0%
선	1865	3.1%
예	1713	2.8%
Other values (97)	31205	51.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60515	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4541	7.5%
당	3917	6.5%
충	3111	5.1%
수	3055	5.0%
비	3030	5.0%
부	2965	4.9%
채	2685	4.4%
기	2428	4.0%
선	1865	3.1%
예	1713	2.8%
Other values (97)	31205	51.6%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60515	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4541	7.5%
당	3917	6.5%
충	3111	5.1%
수	3055	5.0%
비	3030	5.0%
부	2965	4.9%
채	2685	4.4%
기	2428	4.0%
선	1865	3.1%
예	1713	2.8%
Other values (97)	31205	51.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60515	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4541	7.5%
당	3917	6.5%
충	3111	5.1%
수	3055	5.0%
비	3030	5.0%
부	2965	4.9%
채	2685	4.4%
기	2428	4.0%
선	1865	3.1%
예	1713	2.8%
Other values (97)	31205	51.6%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202207	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202207
2nd row	202207
3rd row	202207
4th row	202207
5th row	202207

Common Values

Value	Count	Frequency (%)
202207	10000	100.0%

Length

Histogram of lengths of the category

금액
Real number (ℝ)

SKEWED ZEROS

Distinct	7174
Distinct (%)	71.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	77774383

Minimum	-4.09024 × 10⁹
Maximum	1.9396992 × 10¹⁰
Zeros	2504
Zeros (%)	25.0%
Negative	350
Negative (%)	3.5%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.09024 × 10⁹
5-th percentile	0
Q1	0
median	2766160.5
Q3	36842597
95-th percentile	3.6708453 × 10⁸
Maximum	1.9396992 × 10¹⁰
Range	2.3487232 × 10¹⁰
Interquartile range (IQR)	36842597

Descriptive statistics

Standard deviation	3.5903193 × 10⁸
Coefficient of variation (CV)	4.6163264
Kurtosis	916.08661
Mean	77774383
Median Absolute Deviation (MAD)	2766160.5
Skewness	21.578621
Sum	7.7774383 × 10¹¹
Variance	1.2890393 × 10¹⁷
Monotonicity	Not monotonic
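
Several of the figures above follow from one another, so they can be cross-checked with simple arithmetic. The sketch below recomputes the sum, variance, coefficient of variation, range, and IQR from the reported mean, standard deviation, extremes, and quartiles, and compares them with the reported values. The inputs are rounded as printed in the report, so agreement is only expected within a small relative tolerance; the variable names are illustrative.

```python
import math

# Values as printed in the report (rounded).
n = 10_000
mean = 77_774_383
std = 3.5903193e8
minimum, maximum = -4.09024e9, 1.9396992e10
q1, q3 = 0, 36_842_597

derived = {
    "sum": mean * n,               # sum = mean * number of observations
    "variance": std ** 2,          # variance = standard deviation squared
    "cv": std / mean,              # coefficient of variation
    "range": maximum - minimum,
    "iqr": q3 - q1,                # interquartile range
}
reported = {
    "sum": 7.7774383e11,
    "variance": 1.2890393e17,
    "cv": 4.6163264,
    "range": 2.3487232e10,
    "iqr": 36_842_597,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.7g} reported={reported[name]:.7g} match={ok}")
```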

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2504	25.0%
500000	32	0.3%
250000	22	0.2%
300000	19	0.2%
484000	14	0.1%
1000000	12	0.1%
10000000	12	0.1%
20000000	11	0.1%
5000000	11	0.1%
100000	11	0.1%
Other values (7164)	7352	73.5%

Minimum 10 values

Value	Count	Frequency (%)
-4090240000	1	< 0.1%
-328099290	1	< 0.1%
-321546036	1	< 0.1%
-163222500	1	< 0.1%
-93463340	1	< 0.1%
-89099256	1	< 0.1%
-85732520	1	< 0.1%
-85361160	1	< 0.1%
-83889710	1	< 0.1%
-81999544	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
19396992307	1	< 0.1%
7637948975	1	< 0.1%
6062532886	1	< 0.1%
5551857570	1	< 0.1%
5451557816	1	< 0.1%
5260505618	1	< 0.1%
5168539258	1	< 0.1%
4877444847	1	< 0.1%
4809640569	1	< 0.1%
4668714394	1	< 0.1%

Phik (φk)

	비용명	금액
비용명	1.000	0.504
금액	0.504	1.000

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
49867	원효산호	A14085002	기타당좌자산	202207	0
65080	우장산한화꿈에그린	A15701004	선급비용	202207	1694900
56404	신림동부	A15101101	가지급금	202207	544930
52567	구의현대6단지	A14383203	퇴직급여충당부채	202207	50903701
28655	천호삼익	A13486701	가지급금	202207	3046200
34221	길음SHVILLE	A13611009	상여충당부채	202207	0
53035	여의도삼부	A15001020	퇴직급여충당예금	202207	353547483
42719	풍납 현대리버빌1차	A13887405	공동주택적립금	202207	8952730
62523	대방경남아너스빌	A15602001	당기순이익	202207	7935531
71749	신정5차현대	A15886504	비품	202207	637000

Last rows

	아파트명	아파트코드	비용명	년월일	금액
35823	정릉우정에쉐르	A13677807	관리비미수금	202207	12158020
63082	노량진쌍용예가	A15605003	선수전기료	202207	0
33729	정릉산장	A13610004	주차장충당예금	202207	8072802
54387	양평삼성래미안	A15010202	당기순이익	202207	19526424
60181	신구로현대	A15283902	비품	202207	5620000
44069	중계한화꿈에그린	A13922905	상여충당부채	202207	0
47050	월계사슴2단지	A13984409	선급비용	202207	0
43772	중계3벽산	A13922103	비품	202207	4555140
15087	북한산래미안	A12275201	비품감가상각누계액	202207	-31344190
7084	래미안프레비뉴	A10027755	기타재고자산	202207	251175744
