gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
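The average record size appears to be the total in-memory size divided by the number of observations. A minimal arithmetic check, assuming 1 KiB = 1024 bytes and a plain division (the report does not state how it derives the figure):

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 488.3     # Total size in memory (KiB)
n_observations = 10_000    # Number of observations

# Assumption: 1 KiB = 1024 bytes; average = total bytes / rows.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, this agrees with the 50.0 B shown in the table.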

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202103"	Constant
`금액` is highly skewed (γ1 = 30.48539677)	Skewed
`금액` has 2187 (21.9%) zeros	Zeros
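The Skewed and Zeros alerts summarize two simple quantities: the sample skewness γ1 and the share of values equal to zero. The sketch below shows one way to compute both in plain Python. The report does not say which skewness estimator it uses, so the bias-corrected form here is an assumption, and the function names `skewness` and `zero_share` as well as the `toy` data are hypothetical.

```python
import math

def skewness(values, bias_corrected=True):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    if not bias_corrected:
        return g1
    # Assumption: adjusted Fisher-Pearson correction (needs n > 2).
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def zero_share(values):
    """Fraction of values that are exactly zero."""
    return sum(1 for x in values if x == 0) / len(values)

# Toy data, not taken from the dataset: mostly zeros plus one large value.
toy = [0, 0, 0, 0, 10]
print(skewness(toy, bias_corrected=False))
print(zero_share(toy))

# The Zeros alert itself is a plain ratio of the reported counts.
print(f"{2187 / 10000:.1%}")
```

A distribution with many zeros and a few very large values, as in `금액`, produces a large positive skewness in either form.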

Reproduction

Analysis started	2024-05-11 05:59:20.163278
Analysis finished	2024-05-11 05:59:21.094139
Duration	0.93 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2219
Distinct (%)	22.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	21
Median length	19
Mean length	7.2877
Min length	2

Characters and Unicode

Total characters	72877
Distinct characters	435
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
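As an illustration of those character properties, the sketch below counts Unicode general categories for one of the sample values using Python's standard `unicodedata` module. It is not the report's own implementation. The helper name `category_counts` is hypothetical, and the standard library exposes general categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general-category codes (e.g. 'Lo', 'Nd') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

# Second sample row of the 아파트명 column.
sample = "양재우성KBS(113동)"
counts = category_counts(sample)

# Two-letter codes correspond to the long names used in the report:
# Lo = Other Letter, Lu = Uppercase Letter, Nd = Decimal Number,
# Ps = Open Punctuation, Pe = Close Punctuation, Nl = Letter Number.
for code, count in counts.most_common():
    print(code, count)

# U+2160 (ROMAN NUMERAL ONE) is the "Letter Number" character in the tables.
print(unicodedata.category("\u2160"))
```

Hangul syllables fall under "Other Letter" because Hangul has no upper or lower case, which is why that category dominates the apartment names.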

Unique

Unique	121
Unique (%)	1.2%
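"Distinct" and "Unique" are easy to confuse. The sketch below assumes the usual reading: distinct counts the different values present, while unique counts the values that occur exactly once. The helper name and the toy list are hypothetical and only illustrate the difference.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy list, not the real column: two names repeat, two appear once.
toy = ["래미안", "래미안", "천호태영", "대치삼성", "대치삼성", "여의도자이"]
print(distinct_and_unique(toy))
```

Under this reading, 2219 different apartment names occur in the sample, and 121 of them appear in only one row.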

Sample

1st row	도봉서원제2
2nd row	양재우성KBS(113동)
3rd row	은평뉴타운상림마을12단지
4th row	길음뉴타운9단지제2
5th row	천호태영

Value	Count	Frequency (%)
아파트	142	1.3%
래미안	32	0.3%
아이파크	21	0.2%
e편한세상	19	0.2%
경남아너스빌	17	0.2%
북한산	16	0.2%
미아경남아너스빌	13	0.1%
서울숲2차푸르지오임대	12	0.1%
상계보람	12	0.1%
중계그린	12	0.1%
Other values (2287)	10315	97.2%

Most occurring characters

Value	Count	Frequency (%)
아	2427	3.3%
파	2396	3.3%
트	2184	3.0%
대	1841	2.5%
지	1825	2.5%
동	1685	2.3%
차	1457	2.0%
단	1430	2.0%
신	1418	1.9%
성	1329	1.8%
Other values (425)	54885	75.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66898	91.8%
Decimal Number	3630	5.0%
Uppercase Letter	782	1.1%
Space Separator	685	0.9%
Lowercase Letter	363	0.5%
Dash Punctuation	139	0.2%
Open Punctuation	129	0.2%
Close Punctuation	129	0.2%
Other Punctuation	115	0.2%
Letter Number	7	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2427	3.6%
파	2396	3.6%
트	2184	3.3%
대	1841	2.8%
지	1825	2.7%
동	1685	2.5%
차	1457	2.2%
단	1430	2.1%
신	1418	2.1%
성	1329	2.0%
Other values (380)	48906	73.1%

Uppercase Letter

Value	Count	Frequency (%)
S	119	15.2%
C	104	13.3%
K	86	11.0%
D	73	9.3%
M	73	9.3%
L	66	8.4%
H	58	7.4%
I	41	5.2%
G	38	4.9%
E	32	4.1%
Other values (7)	92	11.8%

Lowercase Letter

Value	Count	Frequency (%)
e	202	55.6%
l	32	8.8%
i	29	8.0%
k	23	6.3%
v	22	6.1%
s	20	5.5%
c	16	4.4%
w	12	3.3%
h	5	1.4%
g	1	0.3%

Decimal Number

Value	Count	Frequency (%)
1	1126	31.0%
2	1012	27.9%
3	466	12.8%
4	280	7.7%
5	214	5.9%
6	146	4.0%
7	125	3.4%
8	93	2.6%
0	91	2.5%
9	77	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	94	81.7%
.	21	18.3%

Space Separator

Value	Count	Frequency (%)
(space)	685	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	139	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	129	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	129	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66898	91.8%
Common	4827	6.6%
Latin	1152	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2427	3.6%
파	2396	3.6%
트	2184	3.3%
대	1841	2.8%
지	1825	2.7%
동	1685	2.5%
차	1457	2.2%
단	1430	2.1%
신	1418	2.1%
성	1329	2.0%
Other values (380)	48906	73.1%

Latin

Value	Count	Frequency (%)
e	202	17.5%
S	119	10.3%
C	104	9.0%
K	86	7.5%
D	73	6.3%
M	73	6.3%
L	66	5.7%
H	58	5.0%
I	41	3.6%
G	38	3.3%
Other values (19)	292	25.3%

Common

Value	Count	Frequency (%)
1	1126	23.3%
2	1012	21.0%
(space)	685	14.2%
3	466	9.7%
4	280	5.8%
5	214	4.4%
6	146	3.0%
-	139	2.9%
(	129	2.7%
)	129	2.7%
Other values (6)	501	10.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66898	91.8%
ASCII	5972	8.2%
Number Forms	7	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2427	3.6%
파	2396	3.6%
트	2184	3.3%
대	1841	2.8%
지	1825	2.7%
동	1685	2.5%
차	1457	2.2%
단	1430	2.1%
신	1418	2.1%
성	1329	2.0%
Other values (380)	48906	73.1%

ASCII

Value	Count	Frequency (%)
1	1126	18.9%
2	1012	16.9%
(space)	685	11.5%
3	466	7.8%
4	280	4.7%
5	214	3.6%
e	202	3.4%
6	146	2.4%
-	139	2.3%
(	129	2.2%
Other values (34)	1573	26.3%

Number Forms

Value	Count	Frequency (%)
Ⅰ	7	100.0%

아파트코드
Text

Distinct	2226
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	121
Unique (%)	1.2%

Sample

1st row	A13275302
2nd row	A13789201
3rd row	A12220004
4th row	A13679402
5th row	A13402002

Value	Count	Frequency (%)
a14272306	13	0.1%
a13982604	12	0.1%
a13986306	12	0.1%
a14003001	12	0.1%
a15601105	11	0.1%
a15105008	11	0.1%
a13778204	11	0.1%
a13922907	11	0.1%
a15180705	11	0.1%
a13410006	11	0.1%
Other values (2216)	9885	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18473	20.5%
1	17682	19.6%
A	9981	11.1%
3	8811	9.8%
2	8183	9.1%
5	6234	6.9%
8	5606	6.2%
7	4750	5.3%
4	3900	4.3%
6	3313	3.7%
Other values (2)	3067	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18473	23.1%
1	17682	22.1%
3	8811	11.0%
2	8183	10.2%
5	6234	7.8%
8	5606	7.0%
7	4750	5.9%
4	3900	4.9%
6	3313	4.1%
9	3048	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9981	99.8%
B	19	0.2%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18473	23.1%
1	17682	22.1%
3	8811	11.0%
2	8183	10.2%
5	6234	7.8%
8	5606	7.0%
7	4750	5.9%
4	3900	4.9%
6	3313	4.1%
9	3048	3.8%

Latin

Value	Count	Frequency (%)
A	9981	99.8%
B	19	0.2%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18473	20.5%
1	17682	19.6%
A	9981	11.1%
3	8811	9.8%
2	8183	9.1%
5	6234	6.9%
8	5606	6.2%
7	4750	5.3%
4	3900	4.3%
6	3313	3.7%
Other values (2)	3067	3.4%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9972
Min length	2

Characters and Unicode

Total characters	59972
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	공동주택적립금
2nd row	주차장충당예금
3rd row	미부과관리비
4th row	미처분이익잉여금
5th row	주차장충당부채

Value	Count	Frequency (%)
미처분이익잉여금	336	3.4%
예금	334	3.3%
퇴직급여충당부채	325	3.2%
당기순이익	321	3.2%
연차수당충당부채	316	3.2%
선급비용	310	3.1%
장기수선충당부채	306	3.1%
장기수선충당예금	300	3.0%
예수금	296	3.0%
공동주택적립금	295	2.9%
Other values (67)	6861	68.6%

Most occurring characters

Value	Count	Frequency (%)
금	4748	7.9%
당	3839	6.4%
수	3154	5.3%
충	3120	5.2%
부	2976	5.0%
비	2961	4.9%
채	2673	4.5%
기	2376	4.0%
선	1914	3.2%
예	1791	3.0%
Other values (97)	30420	50.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59972	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4748	7.9%
당	3839	6.4%
수	3154	5.3%
충	3120	5.2%
부	2976	5.0%
비	2961	4.9%
채	2673	4.5%
기	2376	4.0%
선	1914	3.2%
예	1791	3.0%
Other values (97)	30420	50.7%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59972	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4748	7.9%
당	3839	6.4%
수	3154	5.3%
충	3120	5.2%
부	2976	5.0%
비	2961	4.9%
채	2673	4.5%
기	2376	4.0%
선	1914	3.2%
예	1791	3.0%
Other values (97)	30420	50.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59972	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4748	7.9%
당	3839	6.4%
수	3154	5.3%
충	3120	5.2%
부	2976	5.0%
비	2961	4.9%
채	2673	4.5%
기	2376	4.0%
선	1914	3.2%
예	1791	3.0%
Other values (97)	30420	50.7%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202103	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202103
2nd row	202103
3rd row	202103
4th row	202103
5th row	202103

Common Values

Value	Count	Frequency (%)
202103	10000	100.0%

Length

Histogram of lengths of the category

금액
Real number (ℝ)

SKEWED ZEROS

Distinct	7517
Distinct (%)	75.2%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	73921529

Minimum	-2.4201078 × 10⁹
Maximum	2.2105239 × 10¹⁰
Zeros	2187
Zeros (%)	21.9%
Negative	343
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-2.4201078 × 10⁹
5-th percentile	0
Q1	0
median	3313210
Q3	33731948
95-th percentile	3.6332115 × 10⁸
Maximum	2.2105239 × 10¹⁰
Range	2.4525347 × 10¹⁰
Interquartile range (IQR)	33731948
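The range and the interquartile range follow directly from the other rows of this table. A small check, using the exact extremes listed in this variable's minimum and maximum value tables and assuming the standard definitions (max minus min, Q3 minus Q1):

```python
# Exact extremes from the minimum/maximum value tables of 금액.
minimum = -2_420_107_766
maximum = 22_105_239_225

# Quartiles from the quantile statistics table.
q1, q3 = 0, 33_731_948

value_range = maximum - minimum   # spread between the two extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(f"{value_range:.7e}")
print(iqr)
```

Because Q1 is 0, the IQR equals Q3, and it is roughly three orders of magnitude smaller than the full range, which reflects how heavily the extremes stretch the distribution.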

Descriptive statistics

Standard deviation	3.5569236 × 10⁸
Coefficient of variation (CV)	4.811756
Kurtosis	1616.0042
Mean	73921529
Median Absolute Deviation (MAD)	3313210
Skewness	30.485397
Sum	7.3921529 × 10¹¹
Variance	1.2651705 × 10¹⁷
Monotonicity	Not monotonic
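Several of these descriptive statistics are tied together by simple identities. The sketch below re-derives the coefficient of variation, the variance, and the sum from the reported mean, standard deviation, and row count. It assumes the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × n) and works from the rounded figures printed in the report, so agreement is only approximate.

```python
import math

# Rounded figures as printed in the report.
n = 10_000
mean = 73_921_529
std = 3.5569236e8

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance as the square of the standard deviation
total = mean * n         # sum of all values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```

A CV well above 1 indicates that the standard deviation is several times the mean, consistent with the Skewed alert for this column.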

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
0	2187	21.9%
500000	26	0.3%
250000	19	0.2%
300000	15	0.1%
1000000	14	0.1%
242000	12	0.1%
484000	12	0.1%
3000000	12	0.1%
10000000	9	0.1%
30000000	8	0.1%
Other values (7507)	7686	76.9%

Minimum 10 values

Value	Count	Frequency (%)
-2420107766	1	< 0.1%
-766340955	1	< 0.1%
-361355158	1	< 0.1%
-197034426	1	< 0.1%
-161469720	1	< 0.1%
-148144012	1	< 0.1%
-134098170	1	< 0.1%
-130932285	1	< 0.1%
-122481350	1	< 0.1%
-119511381	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
22105239225	1	< 0.1%
11634939102	1	< 0.1%
6027070234	1	< 0.1%
5954105162	1	< 0.1%
4340686033	1	< 0.1%
4275204643	1	< 0.1%
4085705462	1	< 0.1%
4061418186	1	< 0.1%
3990814123	1	< 0.1%
3756922003	1	< 0.1%

Interactions

Interaction plot: 금액 vs 금액

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.646
금액	0.646	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
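To make the idea of nullity concrete, the sketch below builds a small boolean nullity matrix and per-column missing counts in plain Python. It uses the first three rows of the sample table from this report. The helper names are hypothetical, and treating only `None` as missing is an assumption.

```python
COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]

# First three sample rows from the report.
rows = [
    ("도봉서원제2", "A13275302", "공동주택적립금", "202103", 0),
    ("양재우성KBS(113동)", "A13789201", "주차장충당예금", "202103", 44686898),
    ("은평뉴타운상림마을12단지", "A12220004", "미부과관리비", "202103", 115008480),
]

def nullity_matrix(rows):
    """True where a cell holds a value, False where it is missing (None)."""
    return [[value is not None for value in row] for row in rows]

def missing_per_column(rows, columns):
    """Number of missing (None) cells in each column."""
    return {
        col: sum(1 for row in rows if row[i] is None)
        for i, col in enumerate(columns)
    }

matrix = nullity_matrix(rows)
missing = missing_per_column(rows, COLUMNS)
print(missing)
```

A zero in `금액` counts as present, not missing, which is why the report can show 21.9% zeros and 0 missing cells at the same time.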

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
21138	도봉서원제2	A13275302	공동주택적립금	202103	0
37294	양재우성KBS(113동)	A13789201	주차장충당예금	202103	44686898
13712	은평뉴타운상림마을12단지	A12220004	미부과관리비	202103	115008480
34404	길음뉴타운9단지제2	A13679402	미처분이익잉여금	202103	1294302
25170	천호태영	A13402002	주차장충당부채	202103	0
31051	역삼개나리래미안	A13592601	미지급금	202103	0
53346	여의도자이	A15076302	비품감가상각누계액	202103	-117717769
29692	대치삼성	A13528003	기타충당부채	202103	0
36872	서초우성5차아파트	A13785705	비품	202103	0
25200	대우한강베네시티	A13402003	장기수선충당부채	202103	286877824

Last rows

	아파트명	아파트코드	비용명	년월일	금액
71139	은평뉴타운구파발9-2단지	A41279920	당기순이익	202103	8984268
52711	문래대원	A15009603	비품	202103	10132698
44450	상계대동아파트	A13981606	승강기유지비충당부채	202103	720000
33739	종암극동아파트	A13671207	선급금	202103	524540
16126	답십리대우	A13080201	상여충당부채	202103	2499260
5877	신내의료안심주택	A10027775	미수관리비예치금	202103	3144000
8971	DMC휴먼빌	A12013001	미수관리비예치금	202103	0
30889	압구정현대아파트	A13589802	선수금	202103	46925063
49219	미아현대	A14272307	당기순이익	202103	2167243
70146	신정대림	A15885303	주차장충당예금	202103	0
