gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202210"	Constant
`금액` has 2320 (23.2%) zeros	Zeros
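
The two alerts above can be approximated with simple per-column checks. The following is a minimal sketch, not ydata-profiling's own implementation; the helper name `basic_alerts` and the `zero_threshold` default are hypothetical and chosen only for illustration.

```python
def basic_alerts(column, zero_threshold=0.05):
    """Return alert labels for one column (hypothetical helper, not the library's API)."""
    values = list(column)
    if not values:
        return []
    alerts = []
    # Constant: every observation shares a single value, as 년월일 does with 202210.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Zeros: the share of exact zeros exceeds the assumed threshold.
    zero_share = sum(1 for v in values if v == 0) / len(values)
    if zero_share > zero_threshold:
        alerts.append("Zeros")
    return alerts


print(basic_alerts(["202210"] * 5))       # ['Constant']
print(basic_alerts([0, 0, 1500, 27000]))  # ['Zeros']
```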

Reproduction

Analysis started	2024-05-11 05:57:29.000512
Analysis finished	2024-05-11 05:57:30.229668
Duration	1.23 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
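
A report of this kind is typically produced by loading the data into a pandas DataFrame and passing it to `ProfileReport`. The sketch below shows that flow under stated assumptions: the function name `build_report`, the file paths, the report title and the default encoding are all hypothetical, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path, output_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so this module still loads where the optional
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Pandas Profiling")
    report.to_file(output_path)
    return report
```

Calling `build_report("data.csv")` with a local copy of the dataset would write `report.html` next to the script; the file name here is a placeholder.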

아파트명 (apartment name)
Text

Distinct	2232
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.4089
Min length	2

Characters and Unicode

Total characters	74089
Distinct characters	435
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
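
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using three of the sample values from this column. The helper name `category_counts` is hypothetical, and the standard library exposes categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Nd') over all characters."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


sample = ["래미안서초유니빌", "상계은빛1단지", "송파파인타운6단지"]
counts = category_counts(sample)
print(counts["Lo"], counts["Nd"])  # 22 2

# 'Lo' is Other Letter (Hangul syllables), 'Nd' is Decimal Number,
# and the Roman numeral Ⅰ falls under 'Nl' (Letter Number).
print(unicodedata.category("Ⅰ"))  # Nl
```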

Unique

Unique	124
Unique (%)	1.2%
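
The Distinct and Unique figures measure different things. Assuming the usual reading of these fields (Distinct is the number of different values, Unique is the number of values that occur exactly once), the difference can be sketched as follows; the helper name `distinct_and_unique` and the toy list are illustrative only.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


names = ["래미안", "래미안", "아이파크", "고덕"]
print(distinct_and_unique(names))  # (3, 2)
```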

Sample

1st row	래미안서초유니빌
2nd row	상계은빛1단지
3rd row	송파파인타운6단지
4th row	상암월드컵파크7단지
5th row	신정푸른마을2단지

Value	Count	Frequency (%)
아파트	170	1.6%
래미안	51	0.5%
e편한세상	33	0.3%
아이파크	24	0.2%
sk뷰	23	0.2%
고덕	17	0.2%
송파	15	0.1%
꿈의숲	14	0.1%
신반포	14	0.1%
보라매	13	0.1%
Other values (2317)	10448	96.5%

Most occurring characters

Value	Count	Frequency (%)
아	2520	3.4%
파	2502	3.4%
트	2347	3.2%
지	1876	2.5%
대	1752	2.4%
동	1636	2.2%
단	1492	2.0%
차	1456	2.0%
신	1432	1.9%
이	1377	1.9%
Other values (425)	55699	75.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67800	91.5%
Decimal Number	3605	4.9%
Space Separator	907	1.2%
Uppercase Letter	902	1.2%
Lowercase Letter	367	0.5%
Open Punctuation	137	0.2%
Close Punctuation	137	0.2%
Dash Punctuation	120	0.2%
Other Punctuation	109	0.1%
Letter Number	5	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2520	3.7%
파	2502	3.7%
트	2347	3.5%
지	1876	2.8%
대	1752	2.6%
동	1636	2.4%
단	1492	2.2%
차	1456	2.1%
신	1432	2.1%
이	1377	2.0%
Other values (380)	49410	72.9%

Uppercase Letter

Value	Count	Frequency (%)
S	163	18.1%
C	113	12.5%
K	108	12.0%
D	85	9.4%
M	85	9.4%
H	64	7.1%
L	51	5.7%
I	48	5.3%
E	47	5.2%
V	38	4.2%
Other values (7)	100	11.1%

Lowercase Letter

Value	Count	Frequency (%)
e	207	56.4%
l	34	9.3%
i	33	9.0%
v	24	6.5%
k	22	6.0%
s	20	5.4%
w	15	4.1%
c	8	2.2%
h	2	0.5%
g	1	0.3%

Decimal Number

Value	Count	Frequency (%)
1	1099	30.5%
2	1031	28.6%
3	444	12.3%
4	281	7.8%
5	227	6.3%
6	142	3.9%
7	135	3.7%
8	104	2.9%
9	83	2.3%
0	59	1.6%

Other Punctuation

Value	Count	Frequency (%)
,	80	73.4%
.	29	26.6%

Space Separator

Value	Count	Frequency (%)
(space)	907	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	137	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	137	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	120	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67800	91.5%
Common	5015	6.8%
Latin	1274	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2520	3.7%
파	2502	3.7%
트	2347	3.5%
지	1876	2.8%
대	1752	2.6%
동	1636	2.4%
단	1492	2.2%
차	1456	2.1%
신	1432	2.1%
이	1377	2.0%
Other values (380)	49410	72.9%

Latin

Value	Count	Frequency (%)
e	207	16.2%
S	163	12.8%
C	113	8.9%
K	108	8.5%
D	85	6.7%
M	85	6.7%
H	64	5.0%
L	51	4.0%
I	48	3.8%
E	47	3.7%
Other values (19)	303	23.8%

Common

Value	Count	Frequency (%)
1	1099	21.9%
2	1031	20.6%
(space)	907	18.1%
3	444	8.9%
4	281	5.6%
5	227	4.5%
6	142	2.8%
(	137	2.7%
)	137	2.7%
7	135	2.7%
Other values (6)	475	9.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67800	91.5%
ASCII	6284	8.5%
Number Forms	5	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2520	3.7%
파	2502	3.7%
트	2347	3.5%
지	1876	2.8%
대	1752	2.6%
동	1636	2.4%
단	1492	2.2%
차	1456	2.1%
신	1432	2.1%
이	1377	2.0%
Other values (380)	49410	72.9%

ASCII

Value	Count	Frequency (%)
1	1099	17.5%
2	1031	16.4%
(space)	907	14.4%
3	444	7.1%
4	281	4.5%
5	227	3.6%
e	207	3.3%
S	163	2.6%
6	142	2.3%
(	137	2.2%
Other values (34)	1646	26.2%

Number Forms

Value	Count	Frequency (%)
Ⅰ	5	100.0%

아파트코드 (apartment code)
Text

Distinct	2237
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	125
Unique (%)	1.2%

Sample

1st row	A13707010
2nd row	A13983816
3rd row	A13876108
4th row	A12127005
5th row	A15886508

Value	Count	Frequency (%)
a13922114	13	0.1%
a13519001	12	0.1%
a15180705	12	0.1%
a13187302	12	0.1%
a13671207	11	0.1%
a13611006	11	0.1%
a41279909	11	0.1%
a13984005	11	0.1%
a13771601	11	0.1%
a14007002	11	0.1%
Other values (2227)	9885	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18554	20.6%
1	17522	19.5%
A	9997	11.1%
3	8690	9.7%
2	8387	9.3%
5	6228	6.9%
8	5598	6.2%
7	4738	5.3%
4	3971	4.4%
6	3323	3.7%
Other values (2)	2992	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18554	23.2%
1	17522	21.9%
3	8690	10.9%
2	8387	10.5%
5	6228	7.8%
8	5598	7.0%
7	4738	5.9%
4	3971	5.0%
6	3323	4.2%
9	2989	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9997	> 99.9%
B	3	< 0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18554	23.2%
1	17522	21.9%
3	8690	10.9%
2	8387	10.5%
5	6228	7.8%
8	5598	7.0%
7	4738	5.9%
4	3971	5.0%
6	3323	4.2%
9	2989	3.7%

Latin

Value	Count	Frequency (%)
A	9997	> 99.9%
B	3	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18554	20.6%
1	17522	19.5%
A	9997	11.1%
3	8690	9.7%
2	8387	9.3%
5	6228	6.9%
8	5598	6.2%
7	4738	5.3%
4	3971	4.4%
6	3323	3.7%
Other values (2)	2992	3.3%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	6.0013
Min length	2

Characters and Unicode

Total characters	60013
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	승강기유지비충당부채
2nd row	예금
3rd row	연차수당충당부채
4th row	기타충당부채
5th row	예수금

Value	Count	Frequency (%)
공동주택적립금	345	3.5%
예금	329	3.3%
관리비미수금	325	3.2%
장기수선충당부채	317	3.2%
선급비용	317	3.2%
미처분이익잉여금	314	3.1%
연차수당충당부채	305	3.0%
비품	303	3.0%
당기순이익	302	3.0%
퇴직급여충당부채	301	3.0%
Other values (67)	6842	68.4%

Most occurring characters

Value	Count	Frequency (%)
금	4602	7.7%
당	3887	6.5%
수	3136	5.2%
충	3076	5.1%
비	3035	5.1%
부	2945	4.9%
채	2649	4.4%
기	2526	4.2%
선	1911	3.2%
예	1761	2.9%
Other values (97)	30485	50.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60013	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4602	7.7%
당	3887	6.5%
수	3136	5.2%
충	3076	5.1%
비	3035	5.1%
부	2945	4.9%
채	2649	4.4%
기	2526	4.2%
선	1911	3.2%
예	1761	2.9%
Other values (97)	30485	50.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60013	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4602	7.7%
당	3887	6.5%
수	3136	5.2%
충	3076	5.1%
비	3035	5.1%
부	2945	4.9%
채	2649	4.4%
기	2526	4.2%
선	1911	3.2%
예	1761	2.9%
Other values (97)	30485	50.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60013	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4602	7.7%
당	3887	6.5%
수	3136	5.2%
충	3076	5.1%
비	3035	5.1%
부	2945	4.9%
채	2649	4.4%
기	2526	4.2%
선	1911	3.2%
예	1761	2.9%
Other values (97)	30485	50.8%

년월일 (date: year, month, day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202210	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202210
2nd row	202210
3rd row	202210
4th row	202210
5th row	202210

Common Values

Value	Count	Frequency (%)
202210	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7366
Distinct (%)	73.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	77653346

Minimum	-2.5519501 × 10⁹
Maximum	9.0421074 × 10⁹
Zeros	2320
Zeros (%)	23.2%
Negative	335
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-2.5519501 × 10⁹
5-th percentile	0
Q1	0
median	3563675
Q3	38970549
95-th percentile	3.869958 × 10⁸
Maximum	9.0421074 × 10⁹
Range	1.1594058 × 10¹⁰
Interquartile range (IQR)	38970549

Descriptive statistics

Standard deviation	2.9269846 × 10⁸
Coefficient of variation (CV)	3.7692961
Kurtosis	189.46206
Mean	77653346
Median Absolute Deviation (MAD)	3563675
Skewness	10.666021
Sum	7.7653346 × 10¹¹
Variance	8.5672386 × 10¹⁶
Monotonicity	Not monotonic
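
Several of the figures above are derived from one another, which makes a quick consistency check possible. The sketch below recomputes them from the values printed in this report (the exact minimum and maximum come from the extreme-value tables); because the standard deviation is taken in its rounded form, the last digits can differ slightly from the reported ones.

```python
# Values copied from the report for the 금액 column.
n = 10000
mean = 77653346
std = 2.9269846e8
minimum, maximum = -2551950146, 9042107364
q1, q3 = 0, 38970549

value_range = maximum - minimum  # 11594057510, reported as 1.1594058 × 10¹⁰
iqr = q3 - q1                    # 38970549
cv = std / mean                  # coefficient of variation, about 3.7693
variance = std ** 2              # about 8.567 × 10¹⁶
total = mean * n                 # 776533460000, reported as 7.7653346 × 10¹¹
```

A coefficient of variation well above 1, together with the large skewness and kurtosis, indicates a heavy right tail: a handful of very large amounts dominate the spread.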

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2320	23.2%
500000	27	0.3%
250000	17	0.2%
200000	16	0.2%
300000	13	0.1%
484000	12	0.1%
242000	12	0.1%
2000000	11	0.1%
20000000	10	0.1%
1000000	10	0.1%
Other values (7356)	7552	75.5%

Minimum 10 values

Value	Count	Frequency (%)
-2551950146	1	< 0.1%
-330224456	1	< 0.1%
-304675700	1	< 0.1%
-300272334	1	< 0.1%
-269170920	1	< 0.1%
-199705516	1	< 0.1%
-190422700	1	< 0.1%
-173712590	1	< 0.1%
-156589520	1	< 0.1%
-156250434	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9042107364	1	< 0.1%
6454311838	1	< 0.1%
5406157430	1	< 0.1%
5200430666	1	< 0.1%
5046139742	1	< 0.1%
4841341082	1	< 0.1%
4797118437	1	< 0.1%
4429098711	1	< 0.1%
4250990076	1	< 0.1%
3815992140	1	< 0.1%

Phik (φk)

	비용명	금액
비용명	1.000	0.346
금액	0.346	1.000

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
36469	래미안서초유니빌	A13707010	승강기유지비충당부채	202210	0
46523	상계은빛1단지	A13983816	예금	202210	290489338
41056	송파파인타운6단지	A13876108	연차수당충당부채	202210	7900740
12224	상암월드컵파크7단지	A12127005	기타충당부채	202210	157597642
71227	신정푸른마을2단지	A15886508	예수금	202210	2873450
43610	중계주공4단지	A13922406	미지급금	202210	155929886
53999	양평삼호	A15010304	기타의비유동자산	202210	250000
43647	중계대림벽산	A13922903	선수금	202210	0
22415	도봉파크빌2단지	A13275303	선급금	202210	552460
30337	일원샘터마을	A13523004	기타유동부채	202210	87719110

Last rows

	아파트명	아파트코드	비용명	년월일	금액
24907	어울림더리버아파트	A13375906	관리비예치금	202210	74682000
45123	공릉풍림아이원	A13980513	저장품	202210	148510
61312	독산한신	A15383307	선수관리비	202210	159584000
7636	강동역신동아파밀리에	A10027948	가지급금	202210	101754
7937	북한산힐스테이트7차제2 (임대)	A10028056	기타충당부채	202210	0
39867	송파동부센트레빌	A13816101	연차수당충당부채	202210	2439930
23504	창동한신	A13292002	수선유지비충당부채	202210	2848610
881	신내역 금강펜테리움 센트럴파크아파트	A10024214	비품	202210	39971500
62034	상도동원베네스트	A15603001	기타당좌자산	202210	0
3951	목동파크자이아파트	A10025729	현금	202210	1180214
