gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
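The average record size can be cross-checked from the two figures above as total memory divided by the number of observations. The following is a minimal sketch, assuming that KiB means 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
total_kib = 488.3
n_observations = 10_000
n_variables = 5

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

# Total number of cells, the denominator behind "Missing cells (%)".
n_cells = n_variables * n_observations

print(f"{avg_record_bytes:.1f} B per record, {n_cells} cells")
```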

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202212"	Constant
`금액` has 2465 (24.6%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 05:57:06.749498
Analysis finished	2024-05-11 05:57:07.952218
Duration	1.2 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
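A report like this one could be regenerated roughly as follows. This is a hedged sketch, not the configuration actually used (that is in config.json): the function name `build_report`, the CSV path argument, and the choice to read the four non-numeric columns as strings are all assumptions made for illustration.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV export of the dataset and write an HTML report (sketch)."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: read the identifier-like columns as text so codes such as
    # 202212 are not parsed as numbers; 금액 is left to type inference.
    text_columns = ["아파트명", "아파트코드", "비용명", "년월일"]
    df = pd.read_csv(csv_path, dtype={name: str for name in text_columns})

    report = ProfileReport(
        df,
        title="gimi9 Pandas Profiling",
        # Dataset metadata as shown in the "Dataset" section above.
        dataset={
            "description": "파일 다운로드",
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do",
        },
    )
    report.to_file(output_path)
    return report
```

Calling `build_report("some_export.csv")` with a hypothetical file name would write `report.html` next to the script; the source file's encoding may need to be passed to `read_csv` explicitly.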

아파트명 (apartment name)
Text

Distinct	2240
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	21
Mean length	7.4082
Min length	2
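Length statistics of this kind are computed over the number of characters in each value; the mean equals total characters divided by observations (74082 / 10000 = 7.4082 for this column). The sketch below applies the same idea to the five sample rows of this variable only, so its results describe the sample, not the full column. The helper name `length_stats` is hypothetical.

```python
import statistics
import unicodedata

def length_stats(values):
    """Return max, median, mean and min character length of the given strings."""
    # Normalize to NFC so each Hangul syllable counts as one character.
    lengths = [len(unicodedata.normalize("NFC", v)) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }

# The five sample rows listed for 아파트명 in this report.
sample = ["사당4-3우성", "가락금호", "오류금강수목원", "신트리3단지", "목동파크자이아파트"]
stats = length_stats(sample)
print(stats)
```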

Characters and Unicode

Total characters	74082
Distinct characters	436
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
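As an illustration of these character properties, the general category of each code point can be read with Python's standard `unicodedata` module. The sketch below counts categories over the five sample rows of this variable only; the two-letter codes map to the names used in the tables (`Lo` is Other Letter, `Nd` is Decimal Number, `Pd` is Dash Punctuation, `Nl` is Letter Number). The helper name `category_counts` is hypothetical, and scripts and blocks are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            counts[unicodedata.category(ch)] += 1
    return counts

# The five sample rows listed for 아파트명 in this report.
sample = ["사당4-3우성", "가락금호", "오류금강수목원", "신트리3단지", "목동파크자이아파트"]
counts = category_counts(sample)
print(counts.most_common())

# The Roman numeral one (U+2160) that appears in the "Letter Number" table.
roman_one_category = unicodedata.category("\u2160")
```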

Unique

Unique	123
Unique (%)	1.2%

Sample

1st row	사당4-3우성
2nd row	가락금호
3rd row	오류금강수목원
4th row	신트리3단지
5th row	목동파크자이아파트

Value	Count	Frequency (%)
아파트	165	1.5%
래미안	44	0.4%
경남아너스빌	20	0.2%
신도림현대	17	0.2%
아이파크	17	0.2%
e편한세상	17	0.2%
푸르지오	16	0.1%
힐스테이트	14	0.1%
은평뉴타운상림마을6단지	14	0.1%
잠실엘스아파트	13	0.1%
Other values (2325)	10444	96.9%

Most occurring characters

Value	Count	Frequency (%)
파	2572	3.5%
아	2552	3.4%
트	2330	3.1%
지	1922	2.6%
대	1745	2.4%
동	1703	2.3%
단	1494	2.0%
신	1433	1.9%
차	1420	1.9%
이	1379	1.9%
Other values (426)	55532	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67865	91.6%
Decimal Number	3651	4.9%
Uppercase Letter	879	1.2%
Space Separator	874	1.2%
Lowercase Letter	284	0.4%
Open Punctuation	149	0.2%
Close Punctuation	149	0.2%
Dash Punctuation	126	0.2%
Other Punctuation	99	0.1%
Letter Number	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
파	2572	3.8%
아	2552	3.8%
트	2330	3.4%
지	1922	2.8%
대	1745	2.6%
동	1703	2.5%
단	1494	2.2%
신	1433	2.1%
차	1420	2.1%
이	1379	2.0%
Other values (381)	49315	72.7%

Uppercase Letter

Value	Count	Frequency (%)
C	137	15.6%
S	130	14.8%
K	109	12.4%
D	87	9.9%
M	87	9.9%
L	56	6.4%
I	51	5.8%
H	43	4.9%
E	42	4.8%
V	27	3.1%
Other values (7)	110	12.5%

Lowercase Letter

Value	Count	Frequency (%)
e	184	64.8%
i	21	7.4%
s	14	4.9%
k	13	4.6%
l	12	4.2%
v	11	3.9%
w	8	2.8%
a	7	2.5%
g	7	2.5%
c	4	1.4%

Decimal Number

Value	Count	Frequency (%)
1	1087	29.8%
2	1048	28.7%
3	487	13.3%
4	250	6.8%
5	204	5.6%
6	156	4.3%
7	135	3.7%
8	112	3.1%
9	94	2.6%
0	78	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	80	80.8%
.	19	19.2%

Space Separator

Value	Count	Frequency (%)
(space)	874	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	149	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	149	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	126	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67865	91.6%
Common	5048	6.8%
Latin	1169	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
파	2572	3.8%
아	2552	3.8%
트	2330	3.4%
지	1922	2.8%
대	1745	2.6%
동	1703	2.5%
단	1494	2.2%
신	1433	2.1%
차	1420	2.1%
이	1379	2.0%
Other values (381)	49315	72.7%

Latin

Value	Count	Frequency (%)
e	184	15.7%
C	137	11.7%
S	130	11.1%
K	109	9.3%
D	87	7.4%
M	87	7.4%
L	56	4.8%
I	51	4.4%
H	43	3.7%
E	42	3.6%
Other values (19)	243	20.8%

Common

Value	Count	Frequency (%)
1	1087	21.5%
2	1048	20.8%
(space)	874	17.3%
3	487	9.6%
4	250	5.0%
5	204	4.0%
6	156	3.1%
(	149	3.0%
)	149	3.0%
7	135	2.7%
Other values (6)	509	10.1%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67865	91.6%
ASCII	6211	8.4%
Number Forms	6	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
파	2572	3.8%
아	2552	3.8%
트	2330	3.4%
지	1922	2.8%
대	1745	2.6%
동	1703	2.5%
단	1494	2.2%
신	1433	2.1%
차	1420	2.1%
이	1379	2.0%
Other values (381)	49315	72.7%

ASCII

Value	Count	Frequency (%)
1	1087	17.5%
2	1048	16.9%
(space)	874	14.1%
3	487	7.8%
4	250	4.0%
5	204	3.3%
e	184	3.0%
6	156	2.5%
(	149	2.4%
)	149	2.4%
Other values (34)	1623	26.1%

Number Forms

Value	Count	Frequency (%)
Ⅰ	6	100.0%

아파트코드 (apartment code)
Text

Distinct	2245
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	124
Unique (%)	1.2%

Sample

1st row	A15681501
2nd row	A13880407
3rd row	A15210211
4th row	A15807311
5th row	A10025729

Value	Count	Frequency (%)
a13822004	13	0.1%
a15721006	12	0.1%
a15007201	12	0.1%
a15603203	12	0.1%
a13922910	11	0.1%
a12104005	11	0.1%
a12079501	11	0.1%
a12007001	11	0.1%
a13002002	11	0.1%
a10026207	11	0.1%
Other values (2235)	9885	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18438	20.5%
1	17509	19.5%
A	9989	11.1%
3	8920	9.9%
2	8289	9.2%
5	6241	6.9%
8	5590	6.2%
7	4675	5.2%
4	3984	4.4%
6	3304	3.7%
Other values (2)	3061	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18438	23.0%
1	17509	21.9%
3	8920	11.2%
2	8289	10.4%
5	6241	7.8%
8	5590	7.0%
7	4675	5.8%
4	3984	5.0%
6	3304	4.1%
9	3050	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18438	23.0%
1	17509	21.9%
3	8920	11.2%
2	8289	10.4%
5	6241	7.8%
8	5590	7.0%
7	4675	5.8%
4	3984	5.0%
6	3304	4.1%
9	3050	3.8%

Latin

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18438	20.5%
1	17509	19.5%
A	9989	11.1%
3	8920	9.9%
2	8289	9.2%
5	6241	6.9%
8	5590	6.2%
7	4675	5.2%
4	3984	4.4%
6	3304	3.7%
Other values (2)	3061	3.4%

비용명 (cost item name)
Text

Distinct	76
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9868
Min length	2

Characters and Unicode

Total characters	59868
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	비품감가상각누계액
2nd row	예수금
3rd row	가수금
4th row	경비비충당부채
5th row	미처분이익잉여금

Value	Count	Frequency (%)
예수금	333	3.3%
당기순이익	328	3.3%
관리비미수금	320	3.2%
미처분이익잉여금	315	3.1%
비품	313	3.1%
연차수당충당부채	311	3.1%
예금	310	3.1%
선급비용	307	3.1%
퇴직급여충당부채	302	3.0%
미부과관리비	298	3.0%
Other values (66)	6863	68.6%

Most occurring characters

Value	Count	Frequency (%)
금	4479	7.5%
당	3841	6.4%
수	3117	5.2%
비	3086	5.2%
충	2996	5.0%
부	2963	4.9%
채	2649	4.4%
기	2559	4.3%
선	1846	3.1%
예	1690	2.8%
Other values (97)	30642	51.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59868	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4479	7.5%
당	3841	6.4%
수	3117	5.2%
비	3086	5.2%
충	2996	5.0%
부	2963	4.9%
채	2649	4.4%
기	2559	4.3%
선	1846	3.1%
예	1690	2.8%
Other values (97)	30642	51.2%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59868	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4479	7.5%
당	3841	6.4%
수	3117	5.2%
비	3086	5.2%
충	2996	5.0%
부	2963	4.9%
채	2649	4.4%
기	2559	4.3%
선	1846	3.1%
예	1690	2.8%
Other values (97)	30642	51.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59868	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4479	7.5%
당	3841	6.4%
수	3117	5.2%
비	3086	5.2%
충	2996	5.0%
부	2963	4.9%
채	2649	4.4%
기	2559	4.3%
선	1846	3.1%
예	1690	2.8%
Other values (97)	30642	51.2%

년월일 (date: year, month, day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202212	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202212
2nd row	202212
3rd row	202212
4th row	202212
5th row	202212

Common Values

Value	Count	Frequency (%)
202212	10000	100.0%

[Figure: histogram of lengths of the category]

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7209
Distinct (%)	72.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	77109353

Minimum	-5.3920078 × 10⁸
Maximum	8.0600941 × 10⁹
Zeros	2465
Zeros (%)	24.6%
Negative	349
Negative (%)	3.5%
Memory size	166.0 KiB

Quantile statistics

Minimum	-5.3920078 × 10⁸
5-th percentile	0
Q1	0
median	3186842
Q3	39642525
95-th percentile	3.7252641 × 10⁸
Maximum	8.0600941 × 10⁹
Range	8.5992949 × 10⁹
Interquartile range (IQR)	39642525

Descriptive statistics

Standard deviation	2.8592909 × 10⁸
Coefficient of variation (CV)	3.7080986
Kurtosis	162.39727
Mean	77109353
Median Absolute Deviation (MAD)	3186842
Skewness	10.295164
Sum	7.7109353 × 10¹¹
Variance	8.1755443 × 10¹⁶
Monotonicity	Not monotonic

[Figure: histogram with fixed size bins (bins=50)]

Common values

Value	Count	Frequency (%)
0	2465	24.6%
500000	26	0.3%
300000	21	0.2%
250000	17	0.2%
484000	14	0.1%
55000	12	0.1%
242000	11	0.1%
100000	11	0.1%
2000000	10	0.1%
1000000	9	0.1%
Other values (7199)	7404	74.0%

Minimum 10 values

Value	Count	Frequency (%)
-539200781	1	< 0.1%
-389001283	1	< 0.1%
-253309086	1	< 0.1%
-207330046	1	< 0.1%
-189911240	1	< 0.1%
-178668590	1	< 0.1%
-146927715	1	< 0.1%
-144587080	1	< 0.1%
-126305200	1	< 0.1%
-109453760	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
8060094086	1	< 0.1%
5702598583	1	< 0.1%
5513147342	1	< 0.1%
5230883774	1	< 0.1%
4671473816	1	< 0.1%
4461687602	1	< 0.1%
4439795074	1	< 0.1%
4356268262	1	< 0.1%
4284746163	1	< 0.1%
4140213046	1	< 0.1%

Interactions

[Figure: interaction plot of 금액 against 금액]

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.534
금액	0.534	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
63768	사당4-3우성	A15681501	비품감가상각누계액	202212	-39280250
41348	가락금호	A13880407	예수금	202212	3659510
58370	오류금강수목원	A15210211	가수금	202212	330540
69014	신트리3단지	A15807311	경비비충당부채	202212	100634030
4082	목동파크자이아파트	A10025729	미처분이익잉여금	202212	0
2889	고덕롯데캐슬베네루체	A10025112	가지급금	202212	50
41054	송파더센트레아파트	A13876113	기타유형자산감가상각누계액	202212	-2679770
24745	옥수삼성	A13375902	예금	202212	332722866
2714	휘경 해모로 프레스티지 아파트	A10025015	퇴직급여충당예금	202212	0
21015	방학삼성래미안2단지	A13202103	선급비용	202212	16714150

Last rows

	아파트명	아파트코드	비용명	년월일	금액
55032	신길남서울	A15085805	미지급금	202212	37380790
22395	도봉파크빌2단지	A13275303	선수전기료	202212	1699800
10509	DMC휴먼빌	A12013001	가지급금	202212	0
23654	하왕금호베스트빌	A13302204	선수전기료	202212	2747390
19801	묵동신안2차	A13185502	비품	202212	3371370
22007	창동대동	A13204501	기타유동부채	202212	219000
21487	쌍문금호1차아파트	A13203408	현금	202212	59922
68063	목동한신청구	A15805002	가지급금	202212	0
27001	명일삼환아파트	A13407202	당기순이익	202212	10005491
45966	상계한신	A13983608	퇴직급여충당부채	202212	59467094
