gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
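The average record size can be re-derived from the two totals above. A quick check, assuming the report uses 1 KiB = 1024 bytes:

```python
# Values copied from the Dataset statistics table above.
total_kib = 488.3      # total size in memory, in KiB
n_rows = 10_000        # number of observations

# Assumption: 1 KiB = 1024 bytes.
avg_bytes = total_kib * 1024 / n_rows
print(f"{avg_bytes:.1f} B")
```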

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202306"	Constant
`금액` has 2347 (23.5%) zeros	Zeros
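The two alerts can be illustrated with a small sketch. This is not ydata-profiling's own implementation, and the helper names `is_constant` and `zero_share` are hypothetical. The rows are a handful of values copied from the sample rows of this report.

```python
# Toy rows copied from the "First rows" sample of this report.
rows = [
    {"년월일": "202306", "금액": 14266440},
    {"년월일": "202306", "금액": 0},
    {"년월일": "202306", "금액": 324906548},
    {"년월일": "202306", "금액": 4668732},
    {"년월일": "202306", "금액": 3566620},
]

def is_constant(rows, column):
    """True when the column holds a single distinct value."""
    return len({row[column] for row in rows}) == 1

def zero_share(rows, column):
    """Fraction of rows whose value in the column equals zero."""
    return sum(1 for row in rows if row[column] == 0) / len(rows)

print(is_constant(rows, "년월일"))
print(zero_share(rows, "금액"))

# Full-dataset figure reported above: 2347 zeros in 10000 rows.
print(f"{2347 / 10000:.1%}")
```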

Reproduction

Analysis started	2024-05-11 05:56:00.506221
Analysis finished	2024-05-11 05:56:01.618575
Duration	1.11 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
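A report of this kind can be regenerated roughly as follows. This is a minimal sketch, not the exact configuration stored in config.json. The file paths, the encoding and the function name `build_report` are assumptions, since the report does not state them.

```python
def build_report(csv_path, html_path, encoding="utf-8"):
    """Profile a CSV export and write the report as HTML.

    csv_path, html_path and encoding are hypothetical; the source file's
    actual name and encoding are not given in the report.
    """
    # Imported lazily so the function can be defined without the libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read the code and date columns as text so they are not parsed as numbers.
    df = pd.read_csv(csv_path, encoding=encoding,
                     dtype={"아파트코드": str, "년월일": str})
    report = ProfileReport(df, title="gimi9 Pandas Profiling")
    report.to_file(html_path)
    return report
```

Calling `build_report("data.csv", "report.html")` would need both pandas and ydata-profiling installed; the report above was produced with ydata-profiling v4.5.1.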

아파트명 (apartment name)
Text

Distinct	2262
Distinct (%)	22.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.4677
Min length	2
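The length figures are plain string-length statistics. The sketch below recomputes them for the five sample values of this variable and checks the reported mean against the character total. It assumes lengths are counted in code points, which is what Python's `len` does.

```python
from statistics import mean, median

# The five sample values listed for 아파트명 in this report.
names = ["양재우성", "고덕아남", "천호삼성아파트", "현대멤피스아파트", "금호어울림1차"]

# len() on a Python str counts code points, so each Hangul syllable is 1.
lengths = [len(name) for name in names]

print(max(lengths), median(lengths), mean(lengths), min(lengths))

# Full-column check from the reported totals: 74677 characters over 10000 rows.
print(74677 / 10000)
```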

Characters and Unicode

Total characters	74677
Distinct characters	433
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
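As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The names come from the sample rows of this report, plus the Roman numeral Ⅰ that the report lists under Letter Number. The standard library exposes the general category but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# Names from the sample rows, plus U+2160 (Roman numeral one).
names = ["금호어울림1차", "목동대원칸타빌2,3단지", "월곡3SH-vill", "\u2160"]

# Count the two-letter general category of every character.
categories = Counter(unicodedata.category(ch) for name in names for ch in name)
print(categories)

# Lo = Other Letter, Nd = Decimal Number, Lu/Ll = Uppercase/Lowercase Letter,
# Po = Other Punctuation, Pd = Dash Punctuation, Nl = Letter Number.
```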

Unique

Unique	131
Unique (%)	1.3%
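Distinct and Unique measure different things: Distinct is the number of different values, while Unique is understood here as the number of values that occur exactly once (2262 versus 131 for this variable). A minimal sketch of that reading, using 비용명 values from the sample rows because they contain a repeat:

```python
from collections import Counter

# 비용명 values from the first five sample rows of this report.
values = ["연차수당충당부채", "퇴직급여충당예금", "장기수선충당예금",
          "연차수당충당부채", "수선유지비충당부채"]

counts = Counter(values)
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(distinct, unique)
```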

Sample

1st row	양재우성
2nd row	고덕아남
3rd row	천호삼성아파트
4th row	현대멤피스아파트
5th row	금호어울림1차

Value	Count	Frequency (%)
아파트	156	1.4%
래미안	51	0.5%
e편한세상	25	0.2%
푸르지오	19	0.2%
sk뷰	19	0.2%
송파	16	0.1%
아이파크	16	0.1%
신반포	15	0.1%
해모로	15	0.1%
강남한신휴플러스	15	0.1%
Other values (2347)	10489	96.8%

Most occurring characters

Value	Count	Frequency (%)
파	2497	3.3%
아	2490	3.3%
트	2350	3.1%
지	1945	2.6%
대	1637	2.2%
동	1606	2.2%
단	1519	2.0%
차	1454	1.9%
신	1436	1.9%
이	1403	1.9%
Other values (423)	56340	75.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	68178	91.3%
Decimal Number	3724	5.0%
Space Separator	918	1.2%
Uppercase Letter	903	1.2%
Lowercase Letter	364	0.5%
Open Punctuation	157	0.2%
Close Punctuation	157	0.2%
Dash Punctuation	146	0.2%
Other Punctuation	125	0.2%
Letter Number	5	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
파	2497	3.7%
아	2490	3.7%
트	2350	3.4%
지	1945	2.9%
대	1637	2.4%
동	1606	2.4%
단	1519	2.2%
차	1454	2.1%
신	1436	2.1%
이	1403	2.1%
Other values (378)	49841	73.1%

Uppercase Letter

Value	Count	Frequency (%)
S	162	17.9%
C	128	14.2%
K	123	13.6%
M	86	9.5%
D	86	9.5%
L	63	7.0%
H	46	5.1%
I	46	5.1%
E	41	4.5%
V	31	3.4%
Other values (7)	91	10.1%

Lowercase Letter

Value	Count	Frequency (%)
e	196	53.8%
l	34	9.3%
i	34	9.3%
s	23	6.3%
v	19	5.2%
k	13	3.6%
h	13	3.6%
g	9	2.5%
a	9	2.5%
w	8	2.2%

Decimal Number

Value	Count	Frequency (%)
1	1172	31.5%
2	1045	28.1%
3	534	14.3%
4	241	6.5%
5	196	5.3%
6	161	4.3%
8	107	2.9%
7	105	2.8%
9	95	2.6%
0	68	1.8%

Other Punctuation

Value	Count	Frequency (%)
,	89	71.2%
.	36	28.8%

Space Separator

Value	Count	Frequency (%)
(space)	918	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	157	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	157	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	146	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	68178	91.3%
Common	5227	7.0%
Latin	1272	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
파	2497	3.7%
아	2490	3.7%
트	2350	3.4%
지	1945	2.9%
대	1637	2.4%
동	1606	2.4%
단	1519	2.2%
차	1454	2.1%
신	1436	2.1%
이	1403	2.1%
Other values (378)	49841	73.1%

Latin

Value	Count	Frequency (%)
e	196	15.4%
S	162	12.7%
C	128	10.1%
K	123	9.7%
M	86	6.8%
D	86	6.8%
L	63	5.0%
H	46	3.6%
I	46	3.6%
E	41	3.2%
Other values (19)	295	23.2%

Common

Value	Count	Frequency (%)
1	1172	22.4%
2	1045	20.0%
(space)	918	17.6%
3	534	10.2%
4	241	4.6%
5	196	3.7%
6	161	3.1%
(	157	3.0%
)	157	3.0%
-	146	2.8%
Other values (6)	500	9.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	68178	91.3%
ASCII	6494	8.7%
Number Forms	5	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
파	2497	3.7%
아	2490	3.7%
트	2350	3.4%
지	1945	2.9%
대	1637	2.4%
동	1606	2.4%
단	1519	2.2%
차	1454	2.1%
신	1436	2.1%
이	1403	2.1%
Other values (378)	49841	73.1%

ASCII

Value	Count	Frequency (%)
1	1172	18.0%
2	1045	16.1%
(space)	918	14.1%
3	534	8.2%
4	241	3.7%
5	196	3.0%
e	196	3.0%
S	162	2.5%
6	161	2.5%
(	157	2.4%
Other values (34)	1712	26.4%

Number Forms

Value	Count	Frequency (%)
Ⅰ	5	100.0%

아파트코드 (apartment code)
Text

Distinct	2266
Distinct (%)	22.7%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	132
Unique (%)	1.3%

Sample

1st row	A13789203
2nd row	A13480403
3rd row	A13402305
4th row	A13782902
5th row	A13812003

Value	Count	Frequency (%)
a15003002	12	0.1%
a13987306	12	0.1%
a13590602	12	0.1%
a15601003	11	0.1%
a14272306	11	0.1%
a13671206	11	0.1%
a13006003	11	0.1%
a13613011	11	0.1%
a13822003	11	0.1%
a13887405	11	0.1%
Other values (2256)	9887	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18450	20.5%
1	17399	19.3%
A	9989	11.1%
3	9000	10.0%
2	8260	9.2%
5	6280	7.0%
8	5557	6.2%
7	4659	5.2%
4	3976	4.4%
6	3394	3.8%
Other values (2)	3036	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18450	23.1%
1	17399	21.7%
3	9000	11.2%
2	8260	10.3%
5	6280	7.8%
8	5557	6.9%
7	4659	5.8%
4	3976	5.0%
6	3394	4.2%
9	3025	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18450	23.1%
1	17399	21.7%
3	9000	11.2%
2	8260	10.3%
5	6280	7.8%
8	5557	6.9%
7	4659	5.8%
4	3976	5.0%
6	3394	4.2%
9	3025	3.8%

Latin

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18450	20.5%
1	17399	19.3%
A	9989	11.1%
3	9000	10.0%
2	8260	9.2%
5	6280	7.0%
8	5557	6.2%
7	4659	5.2%
4	3976	4.4%
6	3394	3.8%
Other values (2)	3036	3.4%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9622
Min length	2

Characters and Unicode

Total characters	59622
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	연차수당충당부채
2nd row	퇴직급여충당예금
3rd row	장기수선충당예금
4th row	연차수당충당부채
5th row	수선유지비충당부채

Value	Count	Frequency (%)
예금	332	3.3%
연차수당충당부채	327	3.3%
관리비미수금	317	3.2%
공동주택적립금	313	3.1%
장기수선충당예금	311	3.1%
퇴직급여충당부채	305	3.0%
선급비용	303	3.0%
장기수선충당부채	302	3.0%
미처분이익잉여금	300	3.0%
가수금	299	3.0%
Other values (67)	6891	68.9%

Most occurring characters

Value	Count	Frequency (%)
금	4655	7.8%
당	3919	6.6%
수	3207	5.4%
충	3092	5.2%
비	2981	5.0%
부	2913	4.9%
채	2646	4.4%
기	2454	4.1%
선	1936	3.2%
예	1758	2.9%
Other values (97)	30061	50.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59622	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4655	7.8%
당	3919	6.6%
수	3207	5.4%
충	3092	5.2%
비	2981	5.0%
부	2913	4.9%
채	2646	4.4%
기	2454	4.1%
선	1936	3.2%
예	1758	2.9%
Other values (97)	30061	50.4%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59622	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3919	6.6%
수	3207	5.4%
충	3092	5.2%
비	2981	5.0%
부	2913	4.9%
채	2646	4.4%
기	2454	4.1%
선	1936	3.2%
예	1758	2.9%
Other values (97)	30061	50.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59622	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4655	7.8%
당	3919	6.6%
수	3207	5.4%
충	3092	5.2%
비	2981	5.0%
부	2913	4.9%
채	2646	4.4%
기	2454	4.1%
선	1936	3.2%
예	1758	2.9%
Other values (97)	30061	50.4%

년월일 (date: year, month, day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202306	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202306
2nd row	202306
3rd row	202306
4th row	202306
5th row	202306

Common Values

Value	Count	Frequency (%)
202306	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7334
Distinct (%)	73.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	84359812

Minimum	-7.3862196 × 10⁸
Maximum	1.6922472 × 10¹⁰
Zeros	2347
Zeros (%)	23.5%
Negative	336
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-7.3862196 × 10⁸
5-th percentile	0
Q1	0
median	3197775
Q3	37194302
95-th percentile	3.9086267 × 10⁸
Maximum	1.6922472 × 10¹⁰
Range	1.7661094 × 10¹⁰
Interquartile range (IQR)	37194302

Descriptive statistics

Standard deviation	3.680474 × 10⁸
Coefficient of variation (CV)	4.3628286
Kurtosis	535.35607
Mean	84359812
Median Absolute Deviation (MAD)	3197775
Skewness	16.864668
Sum	8.4359812 × 10¹¹
Variance	1.3545889 × 10¹⁷
Monotonicity	Not monotonic
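Several of the figures above follow from one another. The sketch below re-derives them from the reported values as a consistency check; the inputs are the rounded numbers printed in this report, so results agree only to the reported precision.

```python
# Figures copied from the 금액 tables (rounded as reported).
n = 10_000
mean = 84_359_812
std = 3.680474e8
minimum, maximum = -738_621_958, 16_922_472_486
q1, q3 = 0, 37_194_302

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean * n                 # sum recovered from the mean

print(cv, variance, value_range, iqr, total)
```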

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2347	23.5%
500000	27	0.3%
250000	20	0.2%
300000	14	0.1%
242000	12	0.1%
484000	11	0.1%
100000	11	0.1%
250400	10	0.1%
600000	9	0.1%
20000000	9	0.1%
Other values (7324)	7530	75.3%

Minimum 10 values

Value	Count	Frequency (%)
-738621958	1	< 0.1%
-389001283	1	< 0.1%
-289749540	1	< 0.1%
-241325140	1	< 0.1%
-214954380	1	< 0.1%
-206615440	1	< 0.1%
-205993544	1	< 0.1%
-171420990	1	< 0.1%
-169446320	1	< 0.1%
-139539990	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
16922472486	1	< 0.1%
8528159731	1	< 0.1%
7136250926	1	< 0.1%
6855010093	1	< 0.1%
5694659305	1	< 0.1%
5453286124	1	< 0.1%
5120999288	1	< 0.1%
4728350166	1	< 0.1%
4688111055	1	< 0.1%
4537885252	2	< 0.1%

Interactions

금액 vs. 금액

Correlations

Phik (φk)

Table

	비용명	금액
비용명	1.000	0.381
금액	0.381	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
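Both views can be sketched in a few lines. The example below is illustrative only: the third row and its gaps are made up, since the profiled dataset has no missing cells.

```python
# Two rows from the sample rows of this report, plus one made-up row with gaps
# (hypothetical: the profiled dataset itself has no missing cells).
rows = [
    {"아파트명": "양재우성", "비용명": "연차수당충당부채", "금액": 14266440},
    {"아파트명": "고덕아남", "비용명": "퇴직급여충당예금", "금액": 0},
    {"아파트명": "예시아파트", "비용명": None, "금액": None},
]

columns = ["아파트명", "비용명", "금액"]

# Nullity count: how many rows have a value present in each column.
present = {col: sum(row[col] is not None for row in rows) for col in columns}

# Nullity matrix: one line per row, '#' where a value is present, '.' where missing.
matrix = ["".join("#" if row[col] is not None else "." for col in columns)
          for row in rows]

print(present)
print("\n".join(matrix))
```

Note that a zero amount counts as present; only `None` is treated as missing here.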

First rows

	아파트명	아파트코드	비용명	년월일	금액
39242	양재우성	A13789203	연차수당충당부채	202306	14266440
28714	고덕아남	A13480403	퇴직급여충당예금	202306	0
27388	천호삼성아파트	A13402305	장기수선충당예금	202306	324906548
38400	현대멤피스아파트	A13782902	연차수당충당부채	202306	4668732
40366	금호어울림1차	A13812003	수선유지비충당부채	202306	3566620
69920	목동대원칸타빌2,3단지	A15805404	현금	202306	17155
44140	상계주공10단지	A13920804	예금	202306	369138111
13113	삼성래미안공덕4차	A12170601	기타시설운영충당부채	202306	0
50399	시티파크2단지	A14088201	장기수선충당부채	202306	1058992157
8613	용마산하늘채아파트	A10028033	비품	202306	28396770

Last rows

	아파트명	아파트코드	비용명	년월일	금액
8054	DMC파크뷰자이아파트	A10027817	선급금	202306	0
23620	방학삼성래미안1단지	A13285406	미수금	202306	0
17002	제기이수브라운스톤	A13006003	전신전화가입권	202306	0
52660	광장현대3단지아파트	A14381415	공동체활성화단체지원적립금	202306	0
55064	양평동보아파트	A15010501	관리비미수금	202306	1741840
36103	삼선푸르지오아파트	A13672101	미부과관리비	202306	247251628
35247	월곡3SH-vill	A13613003	기타충당예금	202306	14373666
67038	마곡서광	A15722306	선수금	202306	0
40670	오금현대아파트	A13813010	기타충당부채	202306	32100000
23775	쌍문성원	A13286106	저장품	202306	828000
