gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
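
The average record size can be recovered from the two figures above it. A minimal sketch, assuming the report uses 1 KiB = 1024 B; the variable names are illustrative and not part of the report:

```python
# Values copied from the dataset statistics table above.
total_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # prints "50.0 B"
```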

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202104"	Constant
`금액` has 2131 (21.3%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 05:59:14.906696
Analysis finished	2024-05-11 05:59:15.689167
Duration	0.78 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2227
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	21
Median length	19
Mean length	7.3212
Min length	2

Characters and Unicode

Total characters	73212
Distinct characters	435
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
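
As an illustration of the category counts reported below, the following sketch tallies Unicode general categories with Python's standard `unicodedata` module. It is not necessarily how ydata-profiling computes its figures; the helper name `category_counts` is hypothetical, and the input is just the five sample rows of `아파트명` shown in this report. The standard library exposes general categories (for example `Lo` for Other Letter and `Nd` for Decimal Number) but not script or block properties, so only categories are covered here.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample rows of the 아파트명 column from this report.
sample = ["논현신동아", "독산동한양수자인아파트", "이촌동부센트레빌", "쌍문한양5차", "구로보광"]
counts = category_counts(sample)

# "Lo" = Other Letter (Hangul syllables), "Nd" = Decimal Number.
print(counts["Lo"], counts["Nd"])  # prints "33 1"
```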

Unique

Unique	128
Unique (%)	1.3%

Sample

1st row	논현신동아
2nd row	독산동한양수자인아파트
3rd row	이촌동부센트레빌
4th row	쌍문한양5차
5th row	구로보광

Value	Count	Frequency (%)
아파트	171	1.6%
래미안	30	0.3%
아이파크	19	0.2%
해모로	18	0.2%
은평뉴타운상림마을6단지	18	0.2%
브라운스톤	17	0.2%
e편한세상	16	0.1%
북한산	15	0.1%
고덕	13	0.1%
신대방현대	13	0.1%
Other values (2294)	10338	96.9%

Most occurring characters

Value	Count	Frequency (%)
아	2580	3.5%
파	2493	3.4%
트	2231	3.0%
지	1827	2.5%
대	1805	2.5%
동	1674	2.3%
차	1506	2.1%
신	1461	2.0%
단	1440	2.0%
이	1302	1.8%
Other values (425)	54893	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67020	91.5%
Decimal Number	3794	5.2%
Space Separator	755	1.0%
Uppercase Letter	716	1.0%
Lowercase Letter	347	0.5%
Open Punctuation	161	0.2%
Close Punctuation	161	0.2%
Dash Punctuation	128	0.2%
Other Punctuation	126	0.2%
Letter Number	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2580	3.8%
파	2493	3.7%
트	2231	3.3%
지	1827	2.7%
대	1805	2.7%
동	1674	2.5%
차	1506	2.2%
신	1461	2.2%
단	1440	2.1%
이	1302	1.9%
Other values (380)	48701	72.7%

Uppercase Letter

Value	Count	Frequency (%)
S	123	17.2%
C	87	12.2%
K	81	11.3%
D	62	8.7%
M	62	8.7%
L	56	7.8%
H	46	6.4%
I	36	5.0%
G	35	4.9%
E	26	3.6%
Other values (7)	102	14.2%

Lowercase Letter

Value	Count	Frequency (%)
e	185	53.3%
i	32	9.2%
l	32	9.2%
v	26	7.5%
s	18	5.2%
k	18	5.2%
w	11	3.2%
c	10	2.9%
g	5	1.4%
h	5	1.4%

Decimal Number

Value	Count	Frequency (%)
1	1182	31.2%
2	1090	28.7%
3	486	12.8%
4	263	6.9%
5	222	5.9%
6	169	4.5%
7	103	2.7%
9	98	2.6%
0	96	2.5%
8	85	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	96	76.2%
.	30	23.8%

Space Separator

Value	Count	Frequency (%)
	755	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	161	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	161	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	128	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67020	91.5%
Common	5125	7.0%
Latin	1067	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2580	3.8%
파	2493	3.7%
트	2231	3.3%
지	1827	2.7%
대	1805	2.7%
동	1674	2.5%
차	1506	2.2%
신	1461	2.2%
단	1440	2.1%
이	1302	1.9%
Other values (380)	48701	72.7%

Latin

Value	Count	Frequency (%)
e	185	17.3%
S	123	11.5%
C	87	8.2%
K	81	7.6%
D	62	5.8%
M	62	5.8%
L	56	5.2%
H	46	4.3%
I	36	3.4%
G	35	3.3%
Other values (19)	294	27.6%

Common

Value	Count	Frequency (%)
1	1182	23.1%
2	1090	21.3%
	755	14.7%
3	486	9.5%
4	263	5.1%
5	222	4.3%
6	169	3.3%
(	161	3.1%
)	161	3.1%
-	128	2.5%
Other values (6)	508	9.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67020	91.5%
ASCII	6188	8.5%
Number Forms	4	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2580	3.8%
파	2493	3.7%
트	2231	3.3%
지	1827	2.7%
대	1805	2.7%
동	1674	2.5%
차	1506	2.2%
신	1461	2.2%
단	1440	2.1%
이	1302	1.9%
Other values (380)	48701	72.7%

ASCII

Value	Count	Frequency (%)
1	1182	19.1%
2	1090	17.6%
	755	12.2%
3	486	7.9%
4	263	4.3%
5	222	3.6%
e	185	3.0%
6	169	2.7%
(	161	2.6%
)	161	2.6%
Other values (34)	1514	24.5%

Number Forms

Value	Count	Frequency (%)
Ⅰ	4	100.0%

아파트코드
Text

Distinct	2232
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	128
Unique (%)	1.3%

Sample

1st row	A13501004
2nd row	A15370301
3rd row	A14003004
4th row	A13286105
5th row	A15285503

Value	Count	Frequency (%)
a13606201	13	0.1%
a15601105	13	0.1%
a10045601	12	0.1%
a10027817	12	0.1%
a13522006	12	0.1%
a11081503	12	0.1%
a13583402	11	0.1%
a15785710	11	0.1%
a15205103	11	0.1%
a13184401	11	0.1%
Other values (2222)	9882	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18370	20.4%
1	17704	19.7%
A	9989	11.1%
3	8797	9.8%
2	8286	9.2%
5	6323	7.0%
8	5493	6.1%
7	4849	5.4%
4	3894	4.3%
6	3294	3.7%
Other values (2)	3001	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18370	23.0%
1	17704	22.1%
3	8797	11.0%
2	8286	10.4%
5	6323	7.9%
8	5493	6.9%
7	4849	6.1%
4	3894	4.9%
6	3294	4.1%
9	2990	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18370	23.0%
1	17704	22.1%
3	8797	11.0%
2	8286	10.4%
5	6323	7.9%
8	5493	6.9%
7	4849	6.1%
4	3894	4.9%
6	3294	4.1%
9	2990	3.7%

Latin

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18370	20.4%
1	17704	19.7%
A	9989	11.1%
3	8797	9.8%
2	8286	9.2%
5	6323	7.0%
8	5493	6.1%
7	4849	5.4%
4	3894	4.3%
6	3294	3.7%
Other values (2)	3001	3.3%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9292
Min length	2

Characters and Unicode

Total characters	59292
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	비품감가상각누계액
2nd row	저장품
3rd row	주차장충당예금
4th row	공동체활성화단체지원적립금
5th row	미처분이익잉여금

Value	Count	Frequency (%)
당기순이익	341	3.4%
미처분이익잉여금	335	3.4%
퇴직급여충당부채	326	3.3%
연차수당충당부채	323	3.2%
예금	323	3.2%
관리비미수금	313	3.1%
예수금	312	3.1%
선급비용	309	3.1%
장기수선충당부채	302	3.0%
현금	300	3.0%
Other values (67)	6816	68.2%

Most occurring characters

Value	Count	Frequency (%)
금	4676	7.9%
당	3824	6.4%
수	3146	5.3%
충	2984	5.0%
비	2934	4.9%
부	2903	4.9%
채	2619	4.4%
기	2461	4.2%
선	1871	3.2%
예	1708	2.9%
Other values (97)	30166	50.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59292	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4676	7.9%
당	3824	6.4%
수	3146	5.3%
충	2984	5.0%
비	2934	4.9%
부	2903	4.9%
채	2619	4.4%
기	2461	4.2%
선	1871	3.2%
예	1708	2.9%
Other values (97)	30166	50.9%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59292	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4676	7.9%
당	3824	6.4%
수	3146	5.3%
충	2984	5.0%
비	2934	4.9%
부	2903	4.9%
채	2619	4.4%
기	2461	4.2%
선	1871	3.2%
예	1708	2.9%
Other values (97)	30166	50.9%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59292	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4676	7.9%
당	3824	6.4%
수	3146	5.3%
충	2984	5.0%
비	2934	4.9%
부	2903	4.9%
채	2619	4.4%
기	2461	4.2%
선	1871	3.2%
예	1708	2.9%
Other values (97)	30166	50.9%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202104	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202104
2nd row	202104
3rd row	202104
4th row	202104
5th row	202104

Common Values

Value	Count	Frequency (%)
202104	10000	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
202104	10000	100.0%

금액
Real number (ℝ)

ZEROS

Distinct	7544
Distinct (%)	75.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	70793966

Minimum	-4.3086778 × 10⁸
Maximum	7.7812863 × 10⁹
Zeros	2131
Zeros (%)	21.3%
Negative	300
Negative (%)	3.0%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.3086778 × 10⁸
5-th percentile	0
Q1	6940.5
median	3037613.5
Q3	32334470
95-th percentile	3.5975952 × 10⁸
Maximum	7.7812863 × 10⁹
Range	8.2121541 × 10⁹
Interquartile range (IQR)	32327530

Descriptive statistics

Standard deviation	2.6569723 × 10⁸
Coefficient of variation (CV)	3.7531056
Kurtosis	176.01228
Mean	70793966
Median Absolute Deviation (MAD)	3037613.5
Skewness	10.410688
Sum	7.0793966 × 10¹¹
Variance	7.0595018 × 10¹⁶
Monotonicity	Not monotonic
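
Several of the figures above are derived from others in the same tables. The sketch below recomputes them from the displayed (rounded) values as a consistency check; the variable names are illustrative, and small differences in the last digit come from rounding in the report.

```python
# Values copied from the 금액 tables above, rounded as displayed.
n = 10_000
minimum = -4.3086778e8
maximum = 7.7812863e9
q1 = 6940.5
q3 = 32334470
mean = 70793966
std = 2.6569723e8

value_range = maximum - minimum  # reported range: 8.2121541e9
iqr = q3 - q1                    # reported IQR: 32327530
cv = std / mean                  # reported CV: 3.7531056
variance = std ** 2              # reported variance: 7.0595018e16
total = mean * n                 # reported sum: 7.0793966e11

for name, value in [("range", value_range), ("IQR", iqr), ("CV", cv),
                    ("variance", variance), ("sum", total)]:
    print(f"{name}: {value:.8g}")
```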

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
0	2131	21.3%
500000	21	0.2%
250000	20	0.2%
1000000	12	0.1%
20000000	11	0.1%
300000	11	0.1%
30000000	10	0.1%
2000000	9	0.1%
3000000	9	0.1%
200000	9	0.1%
Other values (7534)	7757	77.6%

Minimum 10 values

Value	Count	Frequency (%)
-430867776	1	< 0.1%
-389001283	1	< 0.1%
-288131647	1	< 0.1%
-268025520	1	< 0.1%
-190422700	1	< 0.1%
-174724570	1	< 0.1%
-155891782	1	< 0.1%
-148478412	1	< 0.1%
-117902189	1	< 0.1%
-115783434	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
7781286294	1	< 0.1%
6339079850	1	< 0.1%
5452830675	1	< 0.1%
4636264353	1	< 0.1%
4237699740	1	< 0.1%
3993049083	1	< 0.1%
3816357211	1	< 0.1%
3544988262	1	< 0.1%
3494880949	1	< 0.1%
3473469573	1	< 0.1%

Interactions

[Interaction plot: 금액 vs. 금액]

Phik (φk)

	비용명	금액
비용명	1.000	0.519
금액	0.519	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
27702	논현신동아	A13501004	비품감가상각누계액	202104	-11591720
60239	독산동한양수자인아파트	A15370301	저장품	202104	78000
47564	이촌동부센트레빌	A14003004	주차장충당예금	202104	0
21800	쌍문한양5차	A13286105	공동체활성화단체지원적립금	202104	8750800
59106	구로보광	A15285503	미처분이익잉여금	202104	0
33747	석관코오롱	A13615002	승강기유지비충당부채	202104	261470
23977	행당두산위브아파트	A13377901	현금	202104	50930
6689	위례 송파푸르지오	A10028086	퇴직급여충당부채	202104	41390216
44803	상계주공14단지	A13981903	예수금	202104	7520100
55973	봉천건영6차아파트	A15176602	상여충당부채	202104	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
31609	개포7차우성	A13594403	단기보증금	202104	5300000
18832	신내석탑	A13186503	주차장충당예금	202104	74397232
59638	신도림우성3차	A15288804	기타유동부채	202104	1120000
6337	텐즈힐1단지	A10027920	당기순이익	202104	86340770
30187	삼성동중앙하이츠빌리지	A13550701	승강기유지비충당부채	202104	5948500
58101	서울수목원현대홈타운스위트	A15271601	예수금	202104	1010220
58381	개봉거성푸르뫼2차아피트	A15280303	주차장충당부채	202104	0
64675	강서한강자이	A15720001	예수금	202104	6541535
8834	연희대우	A12011002	승강기유지비충당부채	202104	0
20791	창동대동	A13204501	기타당좌자산	202104	490000
