gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
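
The two memory figures above are consistent with each other. The following is a minimal sketch of the arithmetic, assuming the average record size is simply the total size divided by the number of observations (the variable names are illustrative and not taken from the report):

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 488.3   # Total size in memory (KiB)
n_rows = 10_000          # Number of observations
n_cols = 5               # Number of variables
missing_cells = 0        # Missing cells

# Assumption: 1 KiB = 1024 bytes, and the average is total bytes / rows.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_rows

# Missing cells as a share of all cells (rows x columns).
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells: {missing_pct:.1f}%")
```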

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202102"	Constant
`금액` has 2267 (22.7%) zeros	Zeros
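
The two alerts above can be reproduced with simple checks. The sketch below is a simplified illustration, not the ydata-profiling implementation: `simple_alerts` is a hypothetical helper, it flags any zero at all (the real tool may apply configurable thresholds), and the toy values are a handful of rows in the style of the sample shown later in the report.

```python
def simple_alerts(name, values):
    """Return alert strings for one column (hypothetical helper)."""
    alerts = []
    n = len(values)
    if n == 0:
        return alerts

    # Constant: every row holds the same value.
    distinct = set(values)
    if len(distinct) == 1:
        value = next(iter(distinct))
        alerts.append(f'`{name}` has constant value "{value}"')

    # Zeros: count numeric entries equal to zero.
    zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
    if zeros:
        alerts.append(f"`{name}` has {zeros} ({100 * zeros / n:.1f}%) zeros")
    return alerts


# Toy data, far smaller than the 10000-row dataset.
year_month = ["202102", "202102", "202102", "202102"]
amount = [0, 3964050, 75235680, 0]

constant_alerts = simple_alerts("년월일", year_month)
zero_alerts = simple_alerts("금액", amount)
print(constant_alerts)
print(zero_alerts)

# The percentage in the report follows from its own counts.
report_zero_pct = 100 * 2267 / 10000
```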

Reproduction

Analysis started	2024-05-11 05:59:25.500873
Analysis finished	2024-05-11 05:59:26.336700
Duration	0.84 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2231
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	21
Median length	19
Mean length	7.2511
Min length	2

Characters and Unicode

Total characters	72511
Distinct characters	436
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
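
As a rough illustration of the character-property analysis described above, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample values listed for this variable, so the counts will not match the full-dataset tables. The `CATEGORY_NAMES` mapping is an assumption added here to turn two-letter category codes into the long names the report displays.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
}

# The five sample rows shown for the 아파트명 variable.
names = ["구로다솜금호", "신당약수하이츠", "한강", "래미안도곡카운티", "보라매 sk뷰"]

# Count characters per general category across all values.
category_counts = Counter()
for value in names:
    for ch in value:
        code = unicodedata.category(ch)
        category_counts[CATEGORY_NAMES.get(code, code)] += 1

total_chars = sum(len(value) for value in names)
mean_length = total_chars / len(names)

for category, count in category_counts.most_common():
    print(f"{category}\t{count}\t{100 * count / total_chars:.1f}%")
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.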

Unique

Unique	108
Unique (%)	1.1%

Sample

1st row	구로다솜금호
2nd row	신당약수하이츠
3rd row	한강
4th row	래미안도곡카운티
5th row	보라매 sk뷰

Value	Count	Frequency (%)
아파트	157	1.5%
아이파크	28	0.3%
래미안	24	0.2%
e편한세상	19	0.2%
고덕	15	0.1%
힐스테이트	15	0.1%
서울숲2차푸르지오임대	13	0.1%
경남아너스빌	13	0.1%
해모로	12	0.1%
강일리버파크6단지	12	0.1%
Other values (2299)	10329	97.1%

Most occurring characters

Value	Count	Frequency (%)
파	2464	3.4%
아	2462	3.4%
트	2219	3.1%
지	1864	2.6%
대	1838	2.5%
동	1664	2.3%
차	1509	2.1%
단	1480	2.0%
신	1473	2.0%
이	1306	1.8%
Other values (426)	54232	74.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66451	91.6%
Decimal Number	3725	5.1%
Space Separator	725	1.0%
Uppercase Letter	676	0.9%
Lowercase Letter	375	0.5%
Dash Punctuation	149	0.2%
Close Punctuation	146	0.2%
Open Punctuation	146	0.2%
Other Punctuation	114	0.2%
Letter Number	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
파	2464	3.7%
아	2462	3.7%
트	2219	3.3%
지	1864	2.8%
대	1838	2.8%
동	1664	2.5%
차	1509	2.3%
단	1480	2.2%
신	1473	2.2%
이	1306	2.0%
Other values (381)	48172	72.5%

Uppercase Letter

Value	Count	Frequency (%)
S	112	16.6%
C	91	13.5%
K	79	11.7%
D	66	9.8%
M	66	9.8%
L	47	7.0%
H	45	6.7%
I	29	4.3%
E	26	3.8%
G	24	3.6%
Other values (7)	91	13.5%

Lowercase Letter

Value	Count	Frequency (%)
e	196	52.3%
l	48	12.8%
i	37	9.9%
v	29	7.7%
s	22	5.9%
k	15	4.0%
w	8	2.1%
h	8	2.1%
g	5	1.3%
a	5	1.3%

Decimal Number

Value	Count	Frequency (%)
2	1124	30.2%
1	1071	28.8%
3	485	13.0%
4	273	7.3%
5	221	5.9%
6	152	4.1%
7	128	3.4%
8	104	2.8%
9	102	2.7%
0	65	1.7%

Other Punctuation

Value	Count	Frequency (%)
,	92	80.7%
.	22	19.3%

Space Separator

Value	Count	Frequency (%)
	725	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	149	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	146	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	146	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66451	91.6%
Common	5005	6.9%
Latin	1055	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
파	2464	3.7%
아	2462	3.7%
트	2219	3.3%
지	1864	2.8%
대	1838	2.8%
동	1664	2.5%
차	1509	2.3%
단	1480	2.2%
신	1473	2.2%
이	1306	2.0%
Other values (381)	48172	72.5%

Latin

Value	Count	Frequency (%)
e	196	18.6%
S	112	10.6%
C	91	8.6%
K	79	7.5%
D	66	6.3%
M	66	6.3%
l	48	4.5%
L	47	4.5%
H	45	4.3%
i	37	3.5%
Other values (19)	268	25.4%

Common

Value	Count	Frequency (%)
2	1124	22.5%
1	1071	21.4%
	725	14.5%
3	485	9.7%
4	273	5.5%
5	221	4.4%
6	152	3.0%
-	149	3.0%
)	146	2.9%
(	146	2.9%
Other values (6)	513	10.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66451	91.6%
ASCII	6056	8.4%
Number Forms	4	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
파	2464	3.7%
아	2462	3.7%
트	2219	3.3%
지	1864	2.8%
대	1838	2.8%
동	1664	2.5%
차	1509	2.3%
단	1480	2.2%
신	1473	2.2%
이	1306	2.0%
Other values (381)	48172	72.5%

ASCII

Value	Count	Frequency (%)
2	1124	18.6%
1	1071	17.7%
	725	12.0%
3	485	8.0%
4	273	4.5%
5	221	3.6%
e	196	3.2%
6	152	2.5%
-	149	2.5%
)	146	2.4%
Other values (34)	1514	25.0%

Number Forms

Value	Count	Frequency (%)
Ⅰ	4	100.0%

아파트코드
Text

Distinct	2237
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	109
Unique (%)	1.1%

Sample

1st row	A15283806
2nd row	A10045404
3rd row	A13790620
4th row	A13585404
5th row	A10025070

Value	Count	Frequency (%)
a13993501	12	0.1%
a14272314	12	0.1%
a13410004	12	0.1%
a13307204	11	0.1%
a13987301	11	0.1%
a15785613	11	0.1%
a15210207	11	0.1%
a13922110	11	0.1%
a15805002	11	0.1%
a15080507	11	0.1%
Other values (2227)	9887	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18348	20.4%
1	17675	19.6%
A	9986	11.1%
3	8743	9.7%
2	8229	9.1%
5	6211	6.9%
8	5728	6.4%
7	4766	5.3%
4	3935	4.4%
6	3376	3.8%
Other values (2)	3003	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18348	22.9%
1	17675	22.1%
3	8743	10.9%
2	8229	10.3%
5	6211	7.8%
8	5728	7.2%
7	4766	6.0%
4	3935	4.9%
6	3376	4.2%
9	2989	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9986	99.9%
B	14	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18348	22.9%
1	17675	22.1%
3	8743	10.9%
2	8229	10.3%
5	6211	7.8%
8	5728	7.2%
7	4766	6.0%
4	3935	4.9%
6	3376	4.2%
9	2989	3.7%

Latin

Value	Count	Frequency (%)
A	9986	99.9%
B	14	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18348	20.4%
1	17675	19.6%
A	9986	11.1%
3	8743	9.7%
2	8229	9.1%
5	6211	6.9%
8	5728	6.4%
7	4766	5.3%
4	3935	4.4%
6	3376	3.8%
Other values (2)	3003	3.3%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9648
Min length	2

Characters and Unicode

Total characters	59648
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	기타의비유동부채
2nd row	선급금
3rd row	당기순이익
4th row	기타시설운영충당부채
5th row	당기순이익

Value	Count	Frequency (%)
연차수당충당부채	330	3.3%
예금	323	3.2%
당기순이익	318	3.2%
미처분이익잉여금	316	3.2%
관리비미수금	302	3.0%
예수금	300	3.0%
선급비용	300	3.0%
장기수선충당부채	299	3.0%
가수금	295	2.9%
장기수선충당예금	294	2.9%
Other values (67)	6923	69.2%

Most occurring characters

Value	Count	Frequency (%)
금	4719	7.9%
당	3813	6.4%
수	3210	5.4%
충	3074	5.2%
비	2980	5.0%
부	2953	5.0%
채	2657	4.5%
기	2413	4.0%
선	1916	3.2%
예	1784	3.0%
Other values (97)	30129	50.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59648	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4719	7.9%
당	3813	6.4%
수	3210	5.4%
충	3074	5.2%
비	2980	5.0%
부	2953	5.0%
채	2657	4.5%
기	2413	4.0%
선	1916	3.2%
예	1784	3.0%
Other values (97)	30129	50.5%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59648	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4719	7.9%
당	3813	6.4%
수	3210	5.4%
충	3074	5.2%
비	2980	5.0%
부	2953	5.0%
채	2657	4.5%
기	2413	4.0%
선	1916	3.2%
예	1784	3.0%
Other values (97)	30129	50.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59648	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4719	7.9%
당	3813	6.4%
수	3210	5.4%
충	3074	5.2%
비	2980	5.0%
부	2953	5.0%
채	2657	4.5%
기	2413	4.0%
선	1916	3.2%
예	1784	3.0%
Other values (97)	30129	50.5%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202102	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202102
2nd row	202102
3rd row	202102
4th row	202102
5th row	202102

Common Values

Value	Count	Frequency (%)
202102	10000	100.0%

Histogram of lengths of the category (figure omitted)

금액
Real number (ℝ)

ZEROS

Distinct	7379
Distinct (%)	73.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	73533291

Minimum	-4.09024 × 10⁹
Maximum	9.022582 × 10⁹
Zeros	2267
Zeros (%)	22.7%
Negative	343
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.09024 × 10⁹
5-th percentile	0
Q1	0
median	3084943
Q3	37729620
95-th percentile	3.7505777 × 10⁸
Maximum	9.022582 × 10⁹
Range	1.3112822 × 10¹⁰
Interquartile range (IQR)	37729620

Descriptive statistics

Standard deviation	2.8217111 × 10⁸
Coefficient of variation (CV)	3.8373247
Kurtosis	262.41963
Mean	73533291
Median Absolute Deviation (MAD)	3084943
Skewness	11.882754
Sum	7.3533291 × 10¹¹
Variance	7.9620536 × 10¹⁶
Monotonicity	Not monotonic
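
Several of the figures above are derived from one another. The sketch below re-derives them from the rounded values printed in the report, assuming the standard definitions (CV = standard deviation / mean, range = maximum - minimum, IQR = Q3 - Q1, sum = mean × count, variance = standard deviation squared). Because the inputs are already rounded, the results agree with the report only to within rounding.

```python
import math

# Values copied from the 금액 statistics above.
n = 10_000
mean = 73_533_291
std = 2.8217111e8
minimum = -4.09024e9
maximum = 9.022582e9
q1, q3 = 0, 37_729_620

cv = std / mean                  # Coefficient of variation
value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
total = mean * n                 # Sum
variance = std ** 2              # Variance

# Compare with the reported figures, allowing for rounding.
print(math.isclose(cv, 3.8373247, rel_tol=1e-6))
print(math.isclose(value_range, 1.3112822e10, rel_tol=1e-9))
print(math.isclose(variance, 7.9620536e16, rel_tol=1e-6))
```

A coefficient of variation near 3.8, together with the large skewness and kurtosis, points to a heavy right tail: a few very large balances dominate the mean, which sits far above the median.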

Histogram with fixed size bins (bins=50): figure omitted

Common values

Value	Count	Frequency (%)
0	2267	22.7%
250000	26	0.3%
500000	23	0.2%
200000	14	0.1%
484000	14	0.1%
1000000	11	0.1%
30000000	11	0.1%
55000	11	0.1%
2000000	10	0.1%
10000000	10	0.1%
Other values (7369)	7603	76.0%

Minimum 10 values

Value	Count	Frequency (%)
-4090240000	1	< 0.1%
-279460094	1	< 0.1%
-269782080	1	< 0.1%
-244922944	1	< 0.1%
-166143777	1	< 0.1%
-143400530	1	< 0.1%
-138548880	1	< 0.1%
-122517705	1	< 0.1%
-97530000	1	< 0.1%
-91859147	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9022581992	1	< 0.1%
7573220043	1	< 0.1%
7495281423	1	< 0.1%
5624354910	1	< 0.1%
5528919317	1	< 0.1%
4139169162	1	< 0.1%
3791119443	1	< 0.1%
3751211249	1	< 0.1%
3416407080	1	< 0.1%
3176503778	1	< 0.1%

Interactions

Interaction plot of 금액 against 금액 (figure omitted)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.282
금액	0.282	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
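
To make the two nullity displays concrete, here is a minimal sketch that builds a per-column missing count and a row-by-column nullity matrix for a few records. The first two records are taken from the sample rows of this report. The third is hypothetical, with a missing 아파트명 added purely for illustration, since the actual dataset has no missing cells.

```python
rows = [
    {"아파트명": "한강", "아파트코드": "A13790620", "비용명": "당기순이익",
     "년월일": "202102", "금액": 75235680},
    {"아파트명": "후암미주", "아파트코드": "A14019001", "비용명": "전신전화가입권",
     "년월일": "202102", "금액": 448000},
    # Hypothetical record with a missing name, for illustration only.
    {"아파트명": None, "아파트코드": "A00000000", "비용명": "예금",
     "년월일": "202102", "금액": 0},
]

columns = list(rows[0])

# "Count" view: number of missing values per column.
missing_per_column = {
    col: sum(1 for row in rows if row[col] is None) for col in columns
}

# "Matrix" view: True where a cell is present, False where it is missing.
nullity_matrix = [[row[col] is not None for col in columns] for row in rows]

print(missing_per_column)
for line in nullity_matrix:
    print("".join("#" if present else "." for present in line))
```

Note that a value of 0 in 금액 counts as present: zeros and missing values are reported separately.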

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
58612	구로다솜금호	A15283806	기타의비유동부채	202102	0
7008	신당약수하이츠	A10045404	선급금	202102	3964050
37511	한강	A13790620	당기순이익	202102	75235680
30794	래미안도곡카운티	A13585404	기타시설운영충당부채	202102	4868442
1481	보라매 sk뷰	A10025070	당기순이익	202102	25062463
10254	토정한강삼성	A12106001	장기수선충당부채	202102	983111482
497	종암sh빌아파트	A10024603	가수금	202102	1075528
65322	마곡수명산파크2단지	A15728004	미처분이익잉여금	202102	26387799
36612	방배임광1,2차	A13785005	선급금	202102	10400
47132	대우월드마크용산	A14001101	미수관리비예치금	202102	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
47813	후암미주	A14019001	전신전화가입권	202102	448000
1064	대림 우성2차	A10024829	선수관리비	202102	24000000
55651	남현동한일유앤아이	A15108001	선수전기료	202102	181420
52058	신길건영	A15005302	미수금	202102	339680
28528	강남신동아파밀리에1단지	A13519001	미수금	202102	0
11069	월드컵아이파크1단지	A12171101	예수금	202102	2328540
13898	DMC자이1단지	A12275501	공동주택적립금	202102	45670575
45672	월계청백3단지	A13985105	기타의비유동부채	202102	0
49683	화양현대	A14313001	장기수선충당예금	202102	351511198
26100	강일리버파크6단지	A13410004	관리비미수금	202102	47325700
