gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202006"	Constant
`금액` has 2231 (22.3%) zeros	Zeros
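
The two alerts above amount to simple per-column checks. The following is a minimal sketch of that idea, not the profiler's actual implementation: the helper names `is_constant` and `zero_share` are hypothetical, and the five rows are copied from the report's "First rows" sample rather than the full 10,000-row dataset.

```python
# Five rows copied from the report's "First rows" sample (two columns only).
rows = [
    {"년월일": "202006", "금액": 5271755},
    {"년월일": "202006", "금액": 0},
    {"년월일": "202006", "금액": 15159980},
    {"년월일": "202006", "금액": 32546510},
    {"년월일": "202006", "금액": 42297125},
]


def is_constant(values):
    """Return True when every value in the column is identical."""
    return len(set(values)) == 1


def zero_share(values):
    """Return the fraction of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


dates = [r["년월일"] for r in rows]
amounts = [r["금액"] for r in rows]

print("년월일 constant:", is_constant(dates))
print("금액 zero share:", f"{zero_share(amounts):.1%}")
```

On the full dataset the same checks would correspond to the reported single distinct value for `년월일` and the 2231 zeros (22.3%) in `금액`.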

Reproduction

Analysis started	2024-05-11 06:00:12.604539
Analysis finished	2024-05-11 06:00:13.622948
Duration	1.02 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2183
Distinct (%)	21.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	24
Median length	21
Mean length	7.2304
Min length	2

Characters and Unicode

Total characters	72304
Distinct characters	433
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
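
As a rough illustration of how such character properties can be tallied, the sketch below uses Python's standard `unicodedata` module to count general categories over three of the sample values shown in this report. The function name `category_counts` is hypothetical, and this is not necessarily how the profiler computes its tables; the standard library also does not expose script or block properties, so only categories are covered.

```python
import unicodedata
from collections import Counter

# Three values taken from the 아파트명 sample rows in this report.
names = ["등촌태진아름", "상계주공2단지", "신내5단지대림두산"]


def category_counts(strings):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for s in strings:
        for ch in s:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(names)
for code, count in counts.most_common():
    print(code, count)
```

The two-letter codes map onto the labels used in the tables: `Lo` is "Other Letter" (Hangul syllables fall here), `Nd` is "Decimal Number", `Lu` and `Ll` are uppercase and lowercase letters.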

Unique

Unique	125
Unique (%)	1.2%

Sample

1st row	등촌태진아름
2nd row	상계주공2단지
3rd row	신내5단지대림두산
4th row	우장산한화꿈에그린
5th row	염창한마음삼성

Value	Count	Frequency (%)
아파트	115	1.1%
래미안	36	0.3%
북한산	17	0.2%
래미안밤섬리베뉴	16	0.2%
e편한세상	15	0.1%
신반포	14	0.1%
힐스테이트	13	0.1%
2단지	12	0.1%
고덕	12	0.1%
염창	12	0.1%
Other values (2246)	10277	97.5%

Most occurring characters

Value	Count	Frequency (%)
아	2352	3.3%
파	2321	3.2%
트	2076	2.9%
대	1868	2.6%
지	1829	2.5%
동	1762	2.4%
차	1561	2.2%
신	1529	2.1%
단	1433	2.0%
성	1311	1.8%
Other values (423)	54262	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66402	91.8%
Decimal Number	3752	5.2%
Uppercase Letter	686	0.9%
Space Separator	607	0.8%
Lowercase Letter	341	0.5%
Dash Punctuation	134	0.2%
Close Punctuation	133	0.2%
Open Punctuation	133	0.2%
Other Punctuation	102	0.1%
Letter Number	9	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2352	3.5%
파	2321	3.5%
트	2076	3.1%
대	1868	2.8%
지	1829	2.8%
동	1762	2.7%
차	1561	2.4%
신	1529	2.3%
단	1433	2.2%
성	1311	2.0%
Other values (377)	48360	72.8%

Uppercase Letter

Value	Count	Frequency (%)
S	128	18.7%
K	87	12.7%
C	75	10.9%
L	51	7.4%
D	49	7.1%
M	49	7.1%
H	42	6.1%
G	39	5.7%
I	37	5.4%
E	36	5.2%
Other values (7)	93	13.6%

Lowercase Letter

Value	Count	Frequency (%)
e	202	59.2%
i	29	8.5%
l	28	8.2%
v	20	5.9%
k	15	4.4%
s	15	4.4%
w	10	2.9%
c	8	2.3%
g	5	1.5%
a	5	1.5%

Decimal Number

Value	Count	Frequency (%)
1	1164	31.0%
2	1040	27.7%
3	513	13.7%
4	252	6.7%
5	216	5.8%
6	165	4.4%
7	125	3.3%
8	98	2.6%
9	91	2.4%
0	88	2.3%

Other Punctuation

Value	Count	Frequency (%)
,	79	77.5%
.	23	22.5%

Space Separator

Value	Count	Frequency (%)
(space)	607	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	134	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	133	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	133	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	9	100.0%

Math Symbol

Value	Count	Frequency (%)
~	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66402	91.8%
Common	4866	6.7%
Latin	1036	1.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2352	3.5%
파	2321	3.5%
트	2076	3.1%
대	1868	2.8%
지	1829	2.8%
동	1762	2.7%
차	1561	2.4%
신	1529	2.3%
단	1433	2.2%
성	1311	2.0%
Other values (377)	48360	72.8%

Latin

Value	Count	Frequency (%)
e	202	19.5%
S	128	12.4%
K	87	8.4%
C	75	7.2%
L	51	4.9%
D	49	4.7%
M	49	4.7%
H	42	4.1%
G	39	3.8%
I	37	3.6%
Other values (19)	277	26.7%

Common

Value	Count	Frequency (%)
1	1164	23.9%
2	1040	21.4%
(space)	607	12.5%
3	513	10.5%
4	252	5.2%
5	216	4.4%
6	165	3.4%
-	134	2.8%
)	133	2.7%
(	133	2.7%
Other values (7)	509	10.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66402	91.8%
ASCII	5893	8.2%
Number Forms	9	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2352	3.5%
파	2321	3.5%
트	2076	3.1%
대	1868	2.8%
지	1829	2.8%
동	1762	2.7%
차	1561	2.4%
신	1529	2.3%
단	1433	2.2%
성	1311	2.0%
Other values (377)	48360	72.8%

ASCII

Value	Count	Frequency (%)
1	1164	19.8%
2	1040	17.6%
(space)	607	10.3%
3	513	8.7%
4	252	4.3%
5	216	3.7%
e	202	3.4%
6	165	2.8%
-	134	2.3%
)	133	2.3%
Other values (35)	1467	24.9%

Number Forms

Value	Count	Frequency (%)
Ⅰ	9	100.0%

아파트코드
Text

Distinct	2190
Distinct (%)	21.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	125
Unique (%)	1.2%

Sample

1st row	A15784402
2nd row	A13983004
3rd row	A13184610
4th row	A15701004
5th row	A15786118

Value	Count	Frequency (%)
a12170601	12	0.1%
a13204510	12	0.1%
a13684605	12	0.1%
a13486504	12	0.1%
a13010005	12	0.1%
a13611005	12	0.1%
a13880806	11	0.1%
a15080604	11	0.1%
a15807703	11	0.1%
a13290003	11	0.1%
Other values (2180)	9884	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18251	20.3%
1	17746	19.7%
A	9983	11.1%
3	8820	9.8%
2	8349	9.3%
5	6194	6.9%
8	5785	6.4%
7	4800	5.3%
4	3828	4.3%
6	3319	3.7%
Other values (2)	2925	3.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18251	22.8%
1	17746	22.2%
3	8820	11.0%
2	8349	10.4%
5	6194	7.7%
8	5785	7.2%
7	4800	6.0%
4	3828	4.8%
6	3319	4.1%
9	2908	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	9983	99.8%
B	17	0.2%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18251	22.8%
1	17746	22.2%
3	8820	11.0%
2	8349	10.4%
5	6194	7.7%
8	5785	7.2%
7	4800	6.0%
4	3828	4.8%
6	3319	4.1%
9	2908	3.6%

Latin

Value	Count	Frequency (%)
A	9983	99.8%
B	17	0.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18251	20.3%
1	17746	19.7%
A	9983	11.1%
3	8820	9.8%
2	8349	9.3%
5	6194	6.9%
8	5785	6.4%
7	4800	5.3%
4	3828	4.3%
6	3319	3.7%
Other values (2)	2925	3.2%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	6.0154
Min length	2

Characters and Unicode

Total characters	60154
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	수선유지비충당부채
2nd row	미처분이익잉여금
3rd row	관리비미수금
4th row	미부과관리비
5th row	예금

Value	Count	Frequency (%)
장기수선충당예금	329	3.3%
당기순이익	319	3.2%
예금	317	3.2%
관리비미수금	313	3.1%
미처분이익잉여금	312	3.1%
공동주택적립금	310	3.1%
퇴직급여충당부채	304	3.0%
가수금	304	3.0%
예수금	300	3.0%
수선유지비충당부채	296	3.0%
Other values (67)	6896	69.0%

Most occurring characters

Value	Count	Frequency (%)
금	4700	7.8%
당	3857	6.4%
수	3234	5.4%
충	3160	5.3%
부	2998	5.0%
비	2997	5.0%
채	2695	4.5%
기	2425	4.0%
선	1919	3.2%
예	1820	3.0%
Other values (97)	30349	50.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60154	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4700	7.8%
당	3857	6.4%
수	3234	5.4%
충	3160	5.3%
부	2998	5.0%
비	2997	5.0%
채	2695	4.5%
기	2425	4.0%
선	1919	3.2%
예	1820	3.0%
Other values (97)	30349	50.5%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60154	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4700	7.8%
당	3857	6.4%
수	3234	5.4%
충	3160	5.3%
부	2998	5.0%
비	2997	5.0%
채	2695	4.5%
기	2425	4.0%
선	1919	3.2%
예	1820	3.0%
Other values (97)	30349	50.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60154	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4700	7.8%
당	3857	6.4%
수	3234	5.4%
충	3160	5.3%
부	2998	5.0%
비	2997	5.0%
채	2695	4.5%
기	2425	4.0%
선	1919	3.2%
예	1820	3.0%
Other values (97)	30349	50.5%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202006	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202006
2nd row	202006
3rd row	202006
4th row	202006
5th row	202006

Common Values

Value	Count	Frequency (%)
202006	10000	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
202006	10000	100.0%

금액
Real number (ℝ)

ZEROS

Distinct	7426
Distinct (%)	74.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	75773341

Minimum	-5.5916345 × 10⁸
Maximum	6.7604684 × 10⁹
Zeros	2231
Zeros (%)	22.3%
Negative	314
Negative (%)	3.1%
Memory size	166.0 KiB

Quantile statistics

Minimum	-5.5916345 × 10⁸
5-th percentile	0
Q1	0
median	3233375
Q3	34059952
95-th percentile	3.7427926 × 10⁸
Maximum	6.7604684 × 10⁹
Range	7.3196319 × 10⁹
Interquartile range (IQR)	34059952

Descriptive statistics

Standard deviation	2.9133043 × 10⁸
Coefficient of variation (CV)	3.8447616
Kurtosis	134.05564
Mean	75773341
Median Absolute Deviation (MAD)	3233375
Skewness	9.6467215
Sum	7.5773341 × 10¹¹
Variance	8.4873421 × 10¹⁶
Monotonicity	Not monotonic
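
Several of the figures above are derived from one another, which gives a quick way to sanity-check the report. The sketch below recomputes the coefficient of variation, variance, sum, range and IQR from the rounded values printed in this section; the variable names are illustrative, and small differences from the reported figures are expected because the inputs are already rounded.

```python
import math

# Values as printed in the report for 금액 (rounded to 8 significant digits).
n = 10_000
mean = 75_773_341
std = 2.9133043e8
minimum, maximum = -5.5916345e8, 6.7604684e9
q1, q3 = 0, 34_059_952

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
total = mean * n                  # sum = mean x number of observations
value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range

print(f"CV       {cv:.7f}")
print(f"Variance {variance:.7e}")
print(f"Sum      {total:.7e}")
print(f"Range    {value_range:.7e}")
print(f"IQR      {iqr}")

# Compare against the reported figures, allowing for rounding of the inputs.
assert math.isclose(cv, 3.8447616, rel_tol=1e-6)
assert math.isclose(variance, 8.4873421e16, rel_tol=1e-6)
```

A coefficient of variation near 3.8, together with a skewness of about 9.6, is consistent with a strongly right-skewed amount column in which a few very large balances dominate.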

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2231	22.3%
500000	28	0.3%
250000	27	0.3%
300000	18	0.2%
484000	14	0.1%
10000000	13	0.1%
2000000	13	0.1%
250400	12	0.1%
100000	10	0.1%
20000000	8	0.1%
Other values (7416)	7626	76.3%

Minimum 10 values

Value	Count	Frequency (%)
-559163452	1	< 0.1%
-349832798	1	< 0.1%
-273320722	1	< 0.1%
-246078175	1	< 0.1%
-237355800	1	< 0.1%
-152477584	1	< 0.1%
-144739812	1	< 0.1%
-134212500	1	< 0.1%
-118828690	1	< 0.1%
-113580500	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
6760468422	1	< 0.1%
6555730743	1	< 0.1%
5987020508	1	< 0.1%
4850215901	1	< 0.1%
4384188661	1	< 0.1%
4090240000	1	< 0.1%
3978461203	1	< 0.1%
3942659299	1	< 0.1%
3911517390	2	< 0.1%
3775379084	1	< 0.1%

Interactions: 금액 against 금액 (scatter plot not reproduced)

Phik (φk)

Heatmap
Table

	비용명	금액
비용명	1.000	0.597
금액	0.597	1.000

Count
Matrix

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
64248	등촌태진아름	A15784402	수선유지비충당부채	202006	5271755
43120	상계주공2단지	A13983004	미처분이익잉여금	202006	0
17240	신내5단지대림두산	A13184610	관리비미수금	202006	15159980
61301	우장산한화꿈에그린	A15701004	미부과관리비	202006	32546510
64729	염창한마음삼성	A15786118	예금	202006	42297125
25866	천호금호	A13486102	관리비미수금	202006	2146590
48762	광장삼성1,2차	A14381506	기타유동부채	202006	3000000
6616	명륜아남1차	A11052201	장기수선충당부채	202006	769209162
61592	등촌동성	A15703302	기타의비유동부채	202006	0
25429	명일동우성	A13482505	미지급금	202006	98450330

Last rows

	아파트명	아파트코드	비용명	년월일	금액
26765	청담래미안로이뷰	A13510009	선급비용	202006	20860620
48746	광장동금호베스트빌	A14381504	당기순이익	202006	23181185
51371	당산현대5차	A15080507	관리비예치금	202006	152320000
38449	송파꿈에그린아파트	A13876114	기타시설운영충당부채	202006	0
45719	한강타운	A14004001	선수관리비	202006	34680000
11406	성산2차현대	A12187703	가수금	202006	6891870
21467	서울숲삼부아파트	A13307101	기타충당부채	202006	1853998
26542	삼성서광	A13509006	전신전화가입권	202006	500000
66690	목동12단지	A15807706	퇴직급여충당부채	202006	173366119
37233	잠실한솔	A13819001	상여충당부채	202006	0
