gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
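
The two memory figures above are related by a simple division. The following is a minimal check, assuming 1 KiB = 1024 bytes and that the report rounds to one decimal place; the variable names are illustrative only.

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```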

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202205"	Constant
`금액` is highly skewed (γ1 = 22.11784956)	Skewed
`금액` has 2360 (23.6%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 05:57:58.198287
Analysis finished	2024-05-11 05:57:59.092075
Duration	0.89 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
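
The reported duration can be recovered from the two timestamps above. This is a small sketch using only the Python standard library; it assumes both timestamps are in the same time zone, which the report does not state.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-05-11 05:57:58.198287")
finished = datetime.fromisoformat("2024-05-11 05:57:59.092075")

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")
```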

아파트명
Text

Distinct	2228
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.3622
Min length	2

Characters and Unicode

Total characters	73622
Distinct characters	436
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
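
As an illustration of the character properties mentioned above, the sketch below counts Unicode general categories for the five sample values of `아파트명` listed in this section. It uses Python's standard `unicodedata` module, which returns two-letter codes (`Lo` for "Other Letter", `Nd` for "Decimal Number") rather than the long names shown in the report. The helper name `category_counts` is hypothetical, and this is not necessarily how ydata-profiling computes its tables. Scripts and blocks are not exposed by `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter

# First five sample values of the 아파트명 column, copied from the report.
sample_names = ["방배1차현대", "갈현한솔아파트", "용산센트럴파크", "신동아아파트", "여의도장미"]


def category_counts(values):
    """Count Unicode general categories (two-letter codes) over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


counts = category_counts(sample_names)
for code, count in counts.most_common():
    print(code, count)
```

On the full column the same idea would yield the ten categories listed under "Most occurring categories"; the five sample rows only contain letters and one digit.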

Unique

Unique	108
Unique (%)	1.1%

Sample

1st row	방배1차현대
2nd row	갈현한솔아파트
3rd row	용산센트럴파크
4th row	신동아아파트
5th row	여의도장미

Value	Count	Frequency (%)
아파트	163	1.5%
래미안	43	0.4%
e편한세상	25	0.2%
아이파크	24	0.2%
푸르지오	17	0.2%
래미안밤섬리베뉴	15	0.1%
경남아너스빌	15	0.1%
고덕	15	0.1%
은평뉴타운상림마을6단지	15	0.1%
신트리1단지	13	0.1%
Other values (2306)	10394	96.8%

Most occurring characters

Value	Count	Frequency (%)
아	2581	3.5%
파	2516	3.4%
트	2332	3.2%
지	1908	2.6%
대	1734	2.4%
동	1632	2.2%
단	1511	2.1%
차	1424	1.9%
신	1383	1.9%
이	1323	1.8%
Other values (426)	55278	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67367	91.5%
Decimal Number	3694	5.0%
Uppercase Letter	860	1.2%
Space Separator	805	1.1%
Lowercase Letter	353	0.5%
Close Punctuation	147	0.2%
Open Punctuation	147	0.2%
Dash Punctuation	145	0.2%
Other Punctuation	96	0.1%
Letter Number	8	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2581	3.8%
파	2516	3.7%
트	2332	3.5%
지	1908	2.8%
대	1734	2.6%
동	1632	2.4%
단	1511	2.2%
차	1424	2.1%
신	1383	2.1%
이	1323	2.0%
Other values (381)	49023	72.8%

Uppercase Letter

Value	Count	Frequency (%)
S	135	15.7%
C	109	12.7%
K	104	12.1%
M	78	9.1%
D	78	9.1%
L	62	7.2%
H	58	6.7%
I	46	5.3%
E	45	5.2%
V	27	3.1%
Other values (7)	118	13.7%

Lowercase Letter

Value	Count	Frequency (%)
e	190	53.8%
l	40	11.3%
i	34	9.6%
v	23	6.5%
s	18	5.1%
k	14	4.0%
h	8	2.3%
w	8	2.3%
a	7	2.0%
g	7	2.0%

Decimal Number

Value	Count	Frequency (%)
1	1157	31.3%
2	1042	28.2%
3	477	12.9%
4	266	7.2%
5	209	5.7%
6	169	4.6%
8	110	3.0%
7	98	2.7%
9	90	2.4%
0	76	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	77	80.2%
.	19	19.8%

Space Separator

Value	Count	Frequency (%)
(space)	805	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	147	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	147	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	145	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	8	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67367	91.5%
Common	5034	6.8%
Latin	1221	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2581	3.8%
파	2516	3.7%
트	2332	3.5%
지	1908	2.8%
대	1734	2.6%
동	1632	2.4%
단	1511	2.2%
차	1424	2.1%
신	1383	2.1%
이	1323	2.0%
Other values (381)	49023	72.8%

Latin

Value	Count	Frequency (%)
e	190	15.6%
S	135	11.1%
C	109	8.9%
K	104	8.5%
M	78	6.4%
D	78	6.4%
L	62	5.1%
H	58	4.8%
I	46	3.8%
E	45	3.7%
Other values (19)	316	25.9%

Common

Value	Count	Frequency (%)
1	1157	23.0%
2	1042	20.7%
(space)	805	16.0%
3	477	9.5%
4	266	5.3%
5	209	4.2%
6	169	3.4%
)	147	2.9%
(	147	2.9%
-	145	2.9%
Other values (6)	470	9.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67367	91.5%
ASCII	6247	8.5%
Number Forms	8	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2581	3.8%
파	2516	3.7%
트	2332	3.5%
지	1908	2.8%
대	1734	2.6%
동	1632	2.4%
단	1511	2.2%
차	1424	2.1%
신	1383	2.1%
이	1323	2.0%
Other values (381)	49023	72.8%

ASCII

Value	Count	Frequency (%)
1	1157	18.5%
2	1042	16.7%
(space)	805	12.9%
3	477	7.6%
4	266	4.3%
5	209	3.3%
e	190	3.0%
6	169	2.7%
)	147	2.4%
(	147	2.4%
Other values (34)	1638	26.2%

Number Forms

Value	Count	Frequency (%)
Ⅰ	8	100.0%

아파트코드
Text

Distinct	2234
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	108
Unique (%)	1.1%

Sample

1st row	A13785203
2nd row	A12281801
3rd row	A10024691
4th row	A14082601
5th row	A15001004

Value	Count	Frequency (%)
a15807002	13	0.1%
a12208204	12	0.1%
a13204406	11	0.1%
a13384403	11	0.1%
a13986701	11	0.1%
a14381516	11	0.1%
a15785711	11	0.1%
a13922111	11	0.1%
a12013202	11	0.1%
a13078701	10	0.1%
Other values (2224)	9888	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18510	20.6%
1	17579	19.5%
A	9993	11.1%
3	8833	9.8%
2	8237	9.2%
5	6006	6.7%
8	5576	6.2%
7	4709	5.2%
4	4079	4.5%
6	3373	3.7%
Other values (2)	3105	3.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18510	23.1%
1	17579	22.0%
3	8833	11.0%
2	8237	10.3%
5	6006	7.5%
8	5576	7.0%
7	4709	5.9%
4	4079	5.1%
6	3373	4.2%
9	3098	3.9%

Uppercase Letter

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18510	23.1%
1	17579	22.0%
3	8833	11.0%
2	8237	10.3%
5	6006	7.5%
8	5576	7.0%
7	4709	5.9%
4	4079	5.1%
6	3373	4.2%
9	3098	3.9%

Latin

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18510	20.6%
1	17579	19.5%
A	9993	11.1%
3	8833	9.8%
2	8237	9.2%
5	6006	6.7%
8	5576	6.2%
7	4709	5.2%
4	4079	4.5%
6	3373	3.7%
Other values (2)	3105	3.5%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9905
Min length	2

Characters and Unicode

Total characters	59905
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	장기수선충당부채
2nd row	관리비미수금
3rd row	장기수선충당예금
4th row	공동주택적립금
5th row	예수금

Value	Count	Frequency (%)
예금	331	3.3%
예수금	327	3.3%
공동주택적립금	319	3.2%
연차수당충당부채	314	3.1%
퇴직급여충당부채	310	3.1%
관리비미수금	306	3.1%
당기순이익	303	3.0%
미처분이익잉여금	302	3.0%
장기수선충당예금	301	3.0%
수선유지비충당부채	298	3.0%
Other values (67)	6889	68.9%

Most occurring characters

Value	Count	Frequency (%)
금	4728	7.9%
당	3908	6.5%
수	3176	5.3%
충	3110	5.2%
비	3001	5.0%
부	2926	4.9%
채	2638	4.4%
기	2445	4.1%
선	1877	3.1%
예	1815	3.0%
Other values (97)	30281	50.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59905	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4728	7.9%
당	3908	6.5%
수	3176	5.3%
충	3110	5.2%
비	3001	5.0%
부	2926	4.9%
채	2638	4.4%
기	2445	4.1%
선	1877	3.1%
예	1815	3.0%
Other values (97)	30281	50.5%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59905	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4728	7.9%
당	3908	6.5%
수	3176	5.3%
충	3110	5.2%
비	3001	5.0%
부	2926	4.9%
채	2638	4.4%
기	2445	4.1%
선	1877	3.1%
예	1815	3.0%
Other values (97)	30281	50.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59905	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4728	7.9%
당	3908	6.5%
수	3176	5.3%
충	3110	5.2%
비	3001	5.0%
부	2926	4.9%
채	2638	4.4%
기	2445	4.1%
선	1877	3.1%
예	1815	3.0%
Other values (97)	30281	50.5%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202205	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202205
2nd row	202205
3rd row	202205
4th row	202205
5th row	202205

Common Values

Value	Count	Frequency (%)
202205	10000	100.0%

[Figure: histogram of lengths of the category]

[Figure: Common Values (Plot)]

금액
Real number (ℝ)

SKEWED ZEROS

Distinct	7308
Distinct (%)	73.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	80679369

Minimum	-4.4797938 × 10⁸
Maximum	2.0760626 × 10¹⁰
Zeros	2360
Zeros (%)	23.6%
Negative	343
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.4797938 × 10⁸
5-th percentile	0
Q1	0
median	2974104
Q3	33994437
95-th percentile	3.616385 × 10⁸
Maximum	2.0760626 × 10¹⁰
Range	2.1208606 × 10¹⁰
Interquartile range (IQR)	33994437

Descriptive statistics

Standard deviation	3.827196 × 10⁸
Coefficient of variation (CV)	4.7437109
Kurtosis	933.27788
Mean	80679369
Median Absolute Deviation (MAD)	2974104
Skewness	22.11785
Sum	8.0679369 × 10¹¹
Variance	1.464743 × 10¹⁷
Monotonicity	Not monotonic
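
Several of the `금액` statistics above are linked by simple identities (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count, range = maximum − minimum, IQR = Q3 − Q1). The sketch below recomputes them from the values printed in this report as a consistency check. The inputs are rounded as displayed, so small discrepancies in the last digits are expected, and the dictionary keys are illustrative names, not ydata-profiling identifiers.

```python
# Values copied from the 금액 section of the report (rounded as printed).
n = 10_000
mean = 80_679_369
std = 3.827196e8
minimum = -447_979_377      # smallest entry in "Minimum 10 values"
maximum = 20_760_626_250    # largest entry in "Maximum 10 values"
q1, q3 = 0, 33_994_437
zeros = 2_360

derived = {
    "cv": std / mean,            # coefficient of variation
    "variance": std ** 2,
    "sum": mean * n,
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "zeros_pct": 100 * zeros / n,
}

for name, value in derived.items():
    print(f"{name}: {value:.7g}")
```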

[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value	Count	Frequency (%)
0	2360	23.6%
500000	28	0.3%
250000	18	0.2%
200000	15	0.1%
300000	13	0.1%
484000	13	0.1%
242000	13	0.1%
100000	12	0.1%
10000000	12	0.1%
30000000	9	0.1%
Other values (7298)	7507	75.1%

Minimum 10 values

Value	Count	Frequency (%)
-447979377	1	< 0.1%
-375250468	1	< 0.1%
-315812936	1	< 0.1%
-280663216	1	< 0.1%
-245626510	1	< 0.1%
-230922000	1	< 0.1%
-201330000	1	< 0.1%
-195595710	1	< 0.1%
-190422700	1	< 0.1%
-154762018	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
20760626250	1	< 0.1%
7751492557	1	< 0.1%
7508019841	1	< 0.1%
7477225598	1	< 0.1%
5759842614	1	< 0.1%
5004400593	1	< 0.1%
4838246619	1	< 0.1%
4835743985	1	< 0.1%
4783509589	1	< 0.1%
4671293406	1	< 0.1%

Interactions

[Figure: scatter plot of 금액 against 금액]

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.302
금액	0.302	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
37924	방배1차현대	A13785203	장기수선충당부채	202205	326036221
15118	갈현한솔아파트	A12281801	관리비미수금	202205	2030220
1704	용산센트럴파크	A10024691	장기수선충당예금	202205	321966959
49526	신동아아파트	A14082601	공동주택적립금	202205	55188745
52597	여의도장미	A15001004	예수금	202205	1000680
30339	수서까치마을	A13522007	가지급금	202205	531240
32958	돈암동일하이빌	A13603501	저장품	202205	336500
31527	역삼개나리푸르지오	A13579501	선수관리비	202205	81084000
4018	연희파크푸르지오 아파트	A10025822	당기순이익	202205	19185012
32249	압구정한양아파트제2단지	A13590204	연차수당충당부채	202205	69212958

Last rows

	아파트명	아파트코드	비용명	년월일	금액
53492	신길삼성래미안	A15005402	임대보증금	202205	0
66467	마곡수명산파크7단지	A15728005	기타충당부채	202205	5870000
28576	천호한신	A13486601	선수금	202205	28455000
9622	홍제성원아파트	A12009201	기타투자자산	202205	56512
11660	마포강변힐스테이트	A12112002	기타시설운영충당부채	202205	4242700
7432	강남한신휴플러스 8단지	A10027909	선수수도료	202205	0
31011	도곡렉슬	A13527203	현금	202205	168110177
55747	여의도화랑	A15088802	기타충당예금	202205	1811790
49863	번동금호어울림	A14206002	미처분이익잉여금	202205	0
59388	신개봉삼환	A15280602	임대보증금	202205	5600000
