gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
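The dataset-level figures above can be reproduced with a few lines of plain Python. This is a minimal sketch. It assumes the table is a list of equal-length tuples, with `None` marking a missing cell, and it counts a row as a duplicate when an identical row occurs earlier. That is one common definition; the report does not state its exact rule. As an arithmetic check, 488.3 KiB × 1024 / 10000 rows ≈ 50.0 B per record, which matches the reported average.

```python
def dataset_stats(rows, total_bytes):
    """Summarise a table given as a list of equal-length tuples (None = missing)."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)
    # Every repeat of a row that already occurred counts as one duplicate.
    duplicates = n_obs - len(set(rows))
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_obs if n_obs else 0.0,
        "avg_record_bytes": total_bytes / n_obs if n_obs else 0.0,
    }


# Average record size reported above: 488.3 KiB spread over 10000 rows.
print(round(488.3 * 1024 / 10000, 1))  # prints 50.0
```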

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202111"	Constant
`금액` has 2318 (23.2%) zeros	Zeros
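Both alerts follow from simple column summaries. This is a minimal sketch, and it assumes a hypothetical `ZEROS_THRESHOLD` of 5%. ydata-profiling's actual trigger is configurable and is not given in this report. Under these assumptions, a column is flagged as constant when it has exactly one distinct value. A numeric column is flagged for zeros when the share of zero values exceeds the threshold.

```python
from collections import Counter

# Hypothetical threshold: flag a column when more than 5% of its values are zero.
ZEROS_THRESHOLD = 0.05


def column_alerts(name, values):
    """Return alert strings in the style of the report for one column."""
    alerts = []
    counts = Counter(values)
    if len(counts) == 1:
        (only_value,) = counts  # the single distinct value
        alerts.append(f'`{name}` has constant value "{only_value}"')
    # Only genuine numbers are checked for zeros (bool is excluded explicitly).
    numbers = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        n_zeros = sum(1 for v in numbers if v == 0)
        share = n_zeros / len(values)
        if share > ZEROS_THRESHOLD:
            alerts.append(f"`{name}` has {n_zeros} ({share:.1%}) zeros")
    return alerts


print(column_alerts("년월일", ["202111"] * 4))
print(column_alerts("금액", [0, 0, 3005049, 34301188]))
```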

Reproduction

Analysis started	2024-05-11 05:58:28.364310
Analysis finished	2024-05-11 05:58:29.496307
Duration	1.13 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

Variables

아파트명 (apartment name)
Text

Distinct	2151
Distinct (%)	21.5%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.3642
Min length	2
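The length figures can be recomputed directly from the values, as in the sketch below. Python's `len` counts code points, so each precomposed Hangul syllable counts as one character. Two consistency checks are worth running on the real data:

- Mean length × 10000 rows should equal the total character count. Here 7.3642 × 10000 = 73642, which matches.
- The reported median of 20 cannot coexist with a mean of 7.3642 and a minimum of 2. With half the values at length 20 or more and the rest at 2 or more, the mean would have to be roughly 11 or higher.

So the median length is worth recomputing rather than taking it at face value.

```python
import statistics


def length_summary(values):
    """Max/median/mean/min string length and the total character count."""
    lengths = [len(v) for v in values]  # len() counts Unicode code points
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


# The five sample apartment names shown in the report.
sample = ["방화대림E편한세상", "신도림대림5차e-편한세상", "서초삼풍", "관악우방", "휘경 미소지움아파트"]
print(length_summary(sample))
```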

Characters and Unicode

Total characters	73642
Distinct characters	434
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
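Python's standard `unicodedata` module exposes the general category of each code point. That is enough to reproduce the "Most occurring categories" breakdown. The sketch maps the two-letter category codes to the long names used in the report. Scripts and blocks are not available from `unicodedata`, so those tables would need a separate Unicode data source.

```python
import unicodedata
from collections import Counter

# Long names for the general categories that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["신도림대림5차e-편한세상", "휘경 미소지움아파트 (Ⅰ)"]))
```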

Unique

Unique	89
Unique (%)	0.9%
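"Distinct" and "Unique" measure different things. Distinct counts how many different values occur. Unique counts the values that occur exactly once. For 아파트명, 2151 of the 10000 rows hold distinct values (21.5%), but only 89 names appear a single time (0.9%). A short sketch with `collections.Counter`:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for a column."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)                              # different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n


print(distinct_and_unique(["서초삼풍", "서초삼풍", "관악우방", "독산계룡"]))
```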

Sample

1st row	방화대림E편한세상
2nd row	신도림대림5차e-편한세상
3rd row	서초삼풍
4th row	관악우방
5th row	휘경 미소지움아파트

Most common words (lowercased)

Value	Count	Frequency (%)
아파트	171	1.6%
래미안	41	0.4%
아이파크	24	0.2%
e편한세상	23	0.2%
sk뷰	19	0.2%
고덕	17	0.2%
신반포	16	0.1%
백련산	15	0.1%
신내동성1차2차	15	0.1%
푸르지오	15	0.1%
Other values (2223)	10401	96.7%
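The counts in this table add up to 10757 rather than 10000. Entries such as `sk뷰`, and the lowercase codes in the 아파트코드 table, suggest that it counts lowercased, whitespace-separated words rather than whole values. Under that assumption, a top-N table with an "Other values" row can be sketched as follows. The input names are illustrative only.

```python
from collections import Counter


def word_table(values, top=10):
    """Top words with count and share, plus an 'Other values (n)' remainder row.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    counts = Counter(word for v in values for word in v.lower().split())
    total = sum(counts.values())
    ranked = counts.most_common()  # ties keep first-seen order
    rows = [(w, c, round(100 * c / total, 1)) for w, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / total, 1)))
    return rows


# Illustrative inputs, not rows from the dataset.
for row in word_table(["하계2차현대 아파트", "잠실5단지 아파트", "SK뷰 아파트"], top=1):
    print(row)
```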

Most occurring characters

Value	Count	Frequency (%)
아	2539	3.4%
파	2509	3.4%
트	2279	3.1%
대	1749	2.4%
지	1741	2.4%
동	1657	2.3%
차	1506	2.0%
신	1467	2.0%
이	1375	1.9%
단	1373	1.9%
Other values (424)	55447	75.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67406	91.5%
Decimal Number	3638	4.9%
Space Separator	826	1.1%
Uppercase Letter	790	1.1%
Lowercase Letter	398	0.5%
Open Punctuation	167	0.2%
Close Punctuation	167	0.2%
Dash Punctuation	141	0.2%
Other Punctuation	106	0.1%
Letter Number	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2539	3.8%
파	2509	3.7%
트	2279	3.4%
대	1749	2.6%
지	1741	2.6%
동	1657	2.5%
차	1506	2.2%
신	1467	2.2%
이	1375	2.0%
단	1373	2.0%
Other values (379)	49211	73.0%

Uppercase Letter

Value	Count	Frequency (%)
S	139	17.6%
C	103	13.0%
K	86	10.9%
D	78	9.9%
M	78	9.9%
L	69	8.7%
H	58	7.3%
E	36	4.6%
G	36	4.6%
I	31	3.9%
Other values (7)	76	9.6%

Lowercase Letter

Value	Count	Frequency (%)
e	202	50.8%
l	40	10.1%
i	35	8.8%
k	25	6.3%
s	25	6.3%
v	24	6.0%
c	16	4.0%
h	8	2.0%
g	8	2.0%
a	8	2.0%

Decimal Number

Value	Count	Frequency (%)
1	1080	29.7%
2	1075	29.5%
3	499	13.7%
4	245	6.7%
5	197	5.4%
6	179	4.9%
7	117	3.2%
9	85	2.3%
0	83	2.3%
8	78	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	84	79.2%
.	22	20.8%

Space Separator

Value	Count	Frequency (%)
(space)	826	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	167	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	167	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	141	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67406	91.5%
Common	5045	6.9%
Latin	1191	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2539	3.8%
파	2509	3.7%
트	2279	3.4%
대	1749	2.6%
지	1741	2.6%
동	1657	2.5%
차	1506	2.2%
신	1467	2.2%
이	1375	2.0%
단	1373	2.0%
Other values (379)	49211	73.0%

Latin

Value	Count	Frequency (%)
e	202	17.0%
S	139	11.7%
C	103	8.6%
K	86	7.2%
D	78	6.5%
M	78	6.5%
L	69	5.8%
H	58	4.9%
l	40	3.4%
E	36	3.0%
Other values (19)	302	25.4%

Common

Value	Count	Frequency (%)
1	1080	21.4%
2	1075	21.3%
(space)	826	16.4%
3	499	9.9%
4	245	4.9%
5	197	3.9%
6	179	3.5%
(	167	3.3%
)	167	3.3%
-	141	2.8%
Other values (6)	469	9.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67406	91.5%
ASCII	6233	8.5%
Number Forms	3	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2539	3.8%
파	2509	3.7%
트	2279	3.4%
대	1749	2.6%
지	1741	2.6%
동	1657	2.5%
차	1506	2.2%
신	1467	2.2%
이	1375	2.0%
단	1373	2.0%
Other values (379)	49211	73.0%

ASCII

Value	Count	Frequency (%)
1	1080	17.3%
2	1075	17.2%
(space)	826	13.3%
3	499	8.0%
4	245	3.9%
e	202	3.2%
5	197	3.2%
6	179	2.9%
(	167	2.7%
)	167	2.7%
Other values (34)	1596	25.6%

Number Forms

Value	Count	Frequency (%)
Ⅰ	3	100.0%

아파트코드 (apartment code)
Text

Distinct	2156
Distinct (%)	21.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	89
Unique (%)	0.9%

Sample

1st row	A15722204
2nd row	A15288805
3rd row	A13792001
4th row	A15303203
5th row	A13077702

Most common words (lowercased)

Value	Count	Frequency (%)
a13186708	15	0.1%
a13881701	14	0.1%
a14085002	14	0.1%
a10025263	12	0.1%
a13407104	12	0.1%
a13402003	12	0.1%
a15081002	12	0.1%
a12282203	12	0.1%
a14003106	12	0.1%
a13920506	12	0.1%
Other values (2146)	9873	98.7%

Most occurring characters

Value	Count	Frequency (%)
0	18629	20.7%
1	17635	19.6%
A	10000	11.1%
3	8731	9.7%
2	8377	9.3%
5	6235	6.9%
8	5531	6.1%
7	4607	5.1%
4	4012	4.5%
6	3340	3.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18629	23.3%
1	17635	22.0%
3	8731	10.9%
2	8377	10.5%
5	6235	7.8%
8	5531	6.9%
7	4607	5.8%
4	4012	5.0%
6	3340	4.2%
9	2903	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18629	23.3%
1	17635	22.0%
3	8731	10.9%
2	8377	10.5%
5	6235	7.8%
8	5531	6.9%
7	4607	5.8%
4	4012	5.0%
6	3340	4.2%
9	2903	3.6%

Latin

Value	Count	Frequency (%)
A	10000	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18629	20.7%
1	17635	19.6%
A	10000	11.1%
3	8731	9.7%
2	8377	9.3%
5	6235	6.9%
8	5531	6.1%
7	4607	5.1%
4	4012	4.5%
6	3340	3.7%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9536
Min length	2

Characters and Unicode

Total characters	59536
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

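1st row	가수금 (suspense receipts)
2nd row	상여충당부채 (provision for bonuses)
3rd row	복리후생비충당부채 (provision for employee welfare expenses)
4th row	미처분이익잉여금 (unappropriated retained earnings)
5th row	퇴직급여충당부채 (provision for retirement benefits)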

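Value	Count	Frequency (%)
미처분이익잉여금 (unappropriated retained earnings)	328	3.3%
당기순이익 (net income for the period)	317	3.2%
관리비미수금 (management fees receivable)	312	3.1%
선급비용 (prepaid expenses)	309	3.1%
공동주택적립금 (apartment complex reserve fund)	307	3.1%
연차수당충당부채 (provision for annual leave allowances)	307	3.1%
비품 (fixtures and equipment)	305	3.0%
예금 (deposits)	299	3.0%
퇴직급여충당부채 (provision for retirement benefits)	299	3.0%
예수금 (withholdings)	298	3.0%
Other values (67)	6919	69.2%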

Most occurring characters

Value	Count	Frequency (%)
금	4565	7.7%
당	3847	6.5%
수	3121	5.2%
비	3046	5.1%
충	3035	5.1%
부	2960	5.0%
채	2656	4.5%
기	2485	4.2%
선	1890	3.2%
예	1697	2.9%
Other values (97)	30234	50.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59536	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4565	7.7%
당	3847	6.5%
수	3121	5.2%
비	3046	5.1%
충	3035	5.1%
부	2960	5.0%
채	2656	4.5%
기	2485	4.2%
선	1890	3.2%
예	1697	2.9%
Other values (97)	30234	50.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59536	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4565	7.7%
당	3847	6.5%
수	3121	5.2%
비	3046	5.1%
충	3035	5.1%
부	2960	5.0%
채	2656	4.5%
기	2485	4.2%
선	1890	3.2%
예	1697	2.9%
Other values (97)	30234	50.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59536	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4565	7.7%
당	3847	6.5%
수	3121	5.2%
비	3046	5.1%
충	3035	5.1%
부	2960	5.0%
채	2656	4.5%
기	2485	4.2%
선	1890	3.2%
예	1697	2.9%
Other values (97)	30234	50.8%

년월일 (date, stored as year and month, YYYYMM)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202111	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202111
2nd row	202111
3rd row	202111
4th row	202111
5th row	202111

Common Values

Value	Count	Frequency (%)
202111	10000	100.0%

Length

[Figure: Histogram of lengths of the category]

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7341
Distinct (%)	73.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	72390703

Minimum	-3.6943877 × 10⁸
Maximum	6.7024207 × 10⁹
Zeros	2318
Zeros (%)	23.2%
Negative	323
Negative (%)	3.2%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.6943877 × 10⁸
5th percentile	0
Q1	0
Median	3005049
Q3	34301188
95th percentile	3.4176207 × 10⁸
Maximum	6.7024207 × 10⁹
Range	7.0718594 × 10⁹
Interquartile range (IQR)	34301188
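The quantile block can be reproduced with the standard library. This sketch assumes linear interpolation between order statistics, the default in pandas and NumPy. That is what `statistics.quantiles(..., method="inclusive")` computes. The IQR is Q3 - Q1 (here 34301188 - 0), and the range is maximum - minimum (6.7024207 × 10⁹ + 3.6943877 × 10⁸ ≈ 7.0718594 × 10⁹).

```python
import statistics


def quantile_stats(values):
    """Quantile summary using linear interpolation between order statistics."""
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    vigintiles = statistics.quantiles(values, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    q1, median, q3 = quartiles
    return {
        "minimum": min(values),
        "p5": vigintiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": vigintiles[-1],
        "maximum": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Illustrative amounts: many zeros and a long right tail, like 금액.
print(quantile_stats([0, 0, 0, 10, 20, 30, 40, 100]))
```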

Descriptive statistics

Standard deviation	2.7537762 × 10⁸
Coefficient of variation (CV)	3.8040468
Kurtosis	142.33877
Mean	72390703
Median Absolute Deviation (MAD)	3005049
Skewness	9.9056155
Sum	7.2390703 × 10¹¹
Variance	7.5832836 × 10¹⁶
Monotonicity	Not monotonic
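The descriptive statistics follow standard definitions. This sketch makes two assumptions. First, it uses the sample standard deviation (n - 1 denominator). Second, it computes bias-corrected skewness and excess kurtosis the way pandas' `Series.skew` and `Series.kurt` do. Under those assumptions:

- CV is the standard deviation divided by the mean: 2.7537762 × 10⁸ / 72390703 ≈ 3.804.
- Variance is the squared standard deviation.
- MAD is the median of absolute deviations from the median.

```python
import math
import statistics


def descriptive_stats(values):
    """CV, MAD, skewness, kurtosis and monotonicity for a numeric column."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for kurtosis")
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    # Central moments with 1/n normalisation.
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("constant column: skewness and kurtosis are undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median([abs(x - median) for x in values]),
        # Bias-corrected sample skewness and excess kurtosis.
        "skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "sum": sum(values),
        "monotonic_increasing": all(a <= b for a, b in zip(values, values[1:])),
        "monotonic_decreasing": all(a >= b for a, b in zip(values, values[1:])),
    }


print(descriptive_stats([1, 2, 3, 4, 5]))
```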

[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value	Count	Frequency (%)
0	2318	23.2%
500000	36	0.4%
242000	17	0.2%
250000	16	0.2%
484000	13	0.1%
300000	13	0.1%
1000000	12	0.1%
100000	12	0.1%
200000	11	0.1%
2000000	10	0.1%
Other values (7331)	7542	75.4%
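The fixed-size-bin histogram of 금액 splits the full range into 50 equal-width bins. This sketch of that binning assumes NumPy-style half-open bins, with the last bin closed on the right. For 금액, the range of about 7.07 × 10⁹ gives bins roughly 1.41 × 10⁸ wide. Everything between Q1 = 0 and Q3 ≈ 3.43 × 10⁷, which is at least half of the observations, therefore falls into one bin. This explains why a fixed-bin histogram of such a skewed column is dominated by a single bar.

```python
def fixed_width_histogram(values, bins=50):
    """Equal-width histogram: returns (bin edges, counts per bin)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bins are [edge, next_edge); min() puts the maximum into the last bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Which of the 50 bins the zeros and Q3 of 금액 fall into (values from the report).
lo, hi = -369438768, 6702420662
width = (hi - lo) / 50
print(int((0 - lo) / width), int((34301188 - lo) / width))  # prints 2 2
```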

Minimum 10 values

Value	Count	Frequency (%)
-369438768	1	< 0.1%
-264005590	1	< 0.1%
-257913346	1	< 0.1%
-230922000	1	< 0.1%
-201330000	1	< 0.1%
-167011730	1	< 0.1%
-139259226	1	< 0.1%
-133221705	1	< 0.1%
-119527363	1	< 0.1%
-106213220	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
6702420662	1	< 0.1%
5447921597	1	< 0.1%
5230947921	1	< 0.1%
5168126591	1	< 0.1%
4921004897	1	< 0.1%
4904836096	1	< 0.1%
4873014602	1	< 0.1%
3811349718	1	< 0.1%
3733288838	1	< 0.1%
3727769325	1	< 0.1%
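The two tables list the ten smallest and ten largest values together with their counts. Here is a minimal sketch with `heapq`. It assumes the lists are built from distinct values; every count in these tables is 1, so the report itself does not show how repeated values are handled.

```python
import heapq
from collections import Counter


def extreme_values(values, k=10):
    """The k smallest and k largest distinct values, each paired with its count."""
    counts = Counter(values)
    smallest = heapq.nsmallest(k, counts)  # iterates over distinct values
    largest = heapq.nlargest(k, counts)
    return [(v, counts[v]) for v in smallest], [(v, counts[v]) for v in largest]


low, high = extreme_values([0, 0, -369438768, 6702420662, 500000, 500000], k=2)
print(low)
print(high)
```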

Interactions

[Figure: interaction plot of 금액 against 금액]

Correlations

Phik (φk)

[Figure: Phik correlation heatmap]

	비용명	금액
비용명	1.000	0.530
금액	0.530	1.000

Missing values

[Figure: Count. A simple visualization of nullity by column.]

[Figure: Matrix. The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]
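The Count and Matrix plots both summarise where values are missing. In text form, you can sketch a per-column count and a row-by-row nullity matrix as follows. Rows are tuples with `None` for a missing cell, and the `#`/`.` rendering is an arbitrary stand-in for the filled and empty cells of the plot. The `None` in the second example row is hypothetical. In this dataset every per-column count would be 0, matching the 0 missing cells in the overview.

```python
def nullity_summary(rows, columns):
    """Per-column missing counts and a text nullity matrix (# = present, . = missing)."""
    missing = {col: 0 for col in columns}
    matrix = []
    for row in rows:
        marks = []
        for col, cell in zip(columns, row):
            if cell is None:
                missing[col] += 1
                marks.append(".")
            else:
                marks.append("#")
        matrix.append("".join(marks))
    return missing, matrix


cols = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    ("서초삼풍", "A13792001", "복리후생비충당부채", "202111", 951490),
    ("관악우방", "A15303203", "미처분이익잉여금", "202111", None),  # hypothetical gap
]
missing, matrix = nullity_summary(rows, cols)
print(missing)
print("\n".join(matrix))
```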

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
63078	방화대림E편한세상	A15722204	가수금	202111	3032020
57659	신도림대림5차e-편한세상	A15288805	상여충당부채	202111	0
36901	서초삼풍	A13792001	복리후생비충당부채	202111	951490
58083	관악우방	A15303203	미처분이익잉여금	202111	0
15968	휘경 미소지움아파트	A13077702	퇴직급여충당부채	202111	31761660
58406	독산계룡	A15381402	예수금	202111	4293180
45979	하계한신	A13993503	수선유지비충당부채	202111	33137727
58250	독산주공14단지	A15375809	장기수선충당예금	202111	737748211
68687	은평뉴타운마고정11단지	A41279913	선급금	202111	1812709
62513	마곡금호어울림	A15721001	기타유동부채	202111	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
20315	창동금용	A13204201	장기수선충당부채	202111	405745967
41891	하계2차현대아파트	A13923106	선급금	202111	1833310
3462	항동하버라인8단지	A10025858	예금	202111	141547079
65419	목동3단지	A15805003	선수전기료	202111	2452315
39222	잠실5단지아파트	A13879102	승강기유지비충당부채	202111	36590440
27203	동양파라곤	A13501001	미처분이익잉여금	202111	4016006
27438	역삼아이파크	A13508009	미처분이익잉여금	202111	0
20413	창동태영데시앙	A13204205	장기수선충당예금	202111	868152202
34657	래미안 서초스위트 아파트	A13707009	선급비용	202111	1112170
66449	신정동일하이빌	A15807315	장기수선충당예금	202111	891594389
