gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
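The missing-cell and duplicate-row figures above can be computed along the following lines. This is a minimal standard-library sketch, not ydata-profiling's implementation. It assumes rows are tuples, treats only `None` as missing, and counts every repeat of a row after its first occurrence as a duplicate. The profiler's exact definitions may differ.

```python
from collections import Counter


def dataset_statistics(rows: list[tuple]) -> dict:
    """Summarise a list of equal-length rows (sketch, see assumptions above)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_cells = n_rows * n_cols
    # A cell counts as missing only when it is None (assumption for this sketch).
    missing = sum(1 for row in rows for cell in row if cell is None)
    # Every occurrence of a row beyond its first one counts as a duplicate.
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }


print(dataset_statistics([("a", 1), ("a", 1), ("b", None)]))
```

The average record size is the total memory divided by the number of observations: 488.3 KiB × 1024 / 10,000 rows ≈ 50.0 B, which matches the reported value.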

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시 (Seoul Metropolitan Government)
URL	https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

`년월일` has constant value "202202"	Constant
`금액` has 666 (6.7%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 06:51:14.889237
Analysis finished	2024-05-11 06:51:17.332322
Duration	2.44 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명 (apartment name)
Text

Distinct	2230
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.3967
Min length	2

Characters and Unicode

Total characters	73967
Distinct characters	433
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
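To illustrate how such per-character properties can be tallied, the sketch below uses Python's standard `unicodedata.category`, which returns two-letter general-category codes such as `Lo` and `Nd`. The mapping to long names is written out by hand and covers only the ten categories that appear in this report; it is not taken from ydata-profiling.

```python
import unicodedata
from collections import Counter

# Hand-written long names for the general-category codes seen in this report
# (a small subset of the full Unicode table).
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pe": "Close Punctuation",
    "Ps": "Open Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["영등포 중흥S-클래스", "방배3차e편한세상"]))
```

Hangul syllables fall under `Lo` (Other Letter), which is why that category dominates the 아파트명 column.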

Unique

Unique	126
Unique (%)	1.3%

Sample

1st row	개포더샵트리에
2nd row	영등포 중흥S-클래스
3rd row	방배3차e편한세상
4th row	강남데시앙파크
5th row	월계삼창

Most common words

Value	Count	Frequency (%)
아파트	195	1.8%
래미안	36	0.3%
e편한세상	31	0.3%
아이파크	25	0.2%
북한산	21	0.2%
sk뷰	18	0.2%
고덕	18	0.2%
송파	16	0.1%
길음뉴타운	16	0.1%
경남아너스빌	15	0.1%
Other values (2312)	10527	96.4%
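The counts in this table add up to 10,918 (391 for the ten listed words plus 10,527 for the rest), more than the 10,000 rows, so they are counted per word rather than per row; the codes in the 아파트코드 word table also appear lowercased. The sketch below assumes whitespace tokenisation and lowercasing. ydata-profiling's actual tokeniser may differ (for example in punctuation or stop-word handling), so treat this as an approximation.

```python
from collections import Counter


def word_counts(values, top=10):
    """Return the `top` most common words and the total number of words.

    Words are whitespace-separated and lowercased (assumption, see above).
    Each entry is (word, count, percentage of all words rounded to 0.1).
    """
    counter = Counter(word.lower() for text in values for word in text.split())
    total = sum(counter.values())
    rows = [(word, n, round(100 * n / total, 1)) for word, n in counter.most_common(top)]
    return rows, total


rows, total = word_counts(["SK뷰 아파트", "sk뷰", "고덕 아파트"])
print(total, rows)
```

Because the denominator is the total word count, "아파트" at 195 occurrences is reported as 1.8% (195 / 10,918), not 1.95% of rows.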

Most occurring characters

Value	Count	Frequency (%)
아	2565	3.5%
파	2496	3.4%
트	2353	3.2%
지	1828	2.5%
동	1725	2.3%
대	1710	2.3%
차	1480	2.0%
이	1467	2.0%
단	1437	1.9%
신	1434	1.9%
Other values (423)	55472	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67543	91.3%
Decimal Number	3648	4.9%
Space Separator	1003	1.4%
Uppercase Letter	899	1.2%
Lowercase Letter	324	0.4%
Close Punctuation	147	0.2%
Open Punctuation	147	0.2%
Dash Punctuation	137	0.2%
Other Punctuation	111	0.2%
Letter Number	8	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2565	3.8%
파	2496	3.7%
트	2353	3.5%
지	1828	2.7%
동	1725	2.6%
대	1710	2.5%
차	1480	2.2%
이	1467	2.2%
단	1437	2.1%
신	1434	2.1%
Other values (380)	49048	72.6%

Uppercase Letter

Value	Count	Frequency (%)
S	148	16.5%
C	106	11.8%
K	104	11.6%
D	88	9.8%
M	88	9.8%
L	57	6.3%
I	54	6.0%
E	52	5.8%
H	50	5.6%
V	42	4.7%
Other values (7)	110	12.2%

Decimal Number

Value	Count	Frequency (%)
2	1118	30.6%
1	1066	29.2%
3	446	12.2%
4	255	7.0%
5	236	6.5%
7	131	3.6%
6	125	3.4%
9	108	3.0%
8	95	2.6%
0	68	1.9%

Lowercase Letter

Value	Count	Frequency (%)
e	198	61.1%
l	26	8.0%
i	23	7.1%
k	20	6.2%
s	16	4.9%
v	14	4.3%
w	12	3.7%
c	10	3.1%
h	5	1.5%

Other Punctuation

Value	Count	Frequency (%)
,	93	83.8%
.	18	16.2%

Space Separator

Value	Count	Frequency (%)
(space)	1003	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	147	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	147	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	137	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	8	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67543	91.3%
Common	5193	7.0%
Latin	1231	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2565	3.8%
파	2496	3.7%
트	2353	3.5%
지	1828	2.7%
동	1725	2.6%
대	1710	2.5%
차	1480	2.2%
이	1467	2.2%
단	1437	2.1%
신	1434	2.1%
Other values (380)	49048	72.6%

Latin

Value	Count	Frequency (%)
e	198	16.1%
S	148	12.0%
C	106	8.6%
K	104	8.4%
D	88	7.1%
M	88	7.1%
L	57	4.6%
I	54	4.4%
E	52	4.2%
H	50	4.1%
Other values (17)	286	23.2%

Common

Value	Count	Frequency (%)
2	1118	21.5%
1	1066	20.5%
(space)	1003	19.3%
3	446	8.6%
4	255	4.9%
5	236	4.5%
)	147	2.8%
(	147	2.8%
-	137	2.6%
7	131	2.5%
Other values (6)	507	9.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67543	91.3%
ASCII	6416	8.7%
Number Forms	8	< 0.1%
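Block membership depends only on the code point's position. The sketch below uses a hypothetical, partial lookup table with three standard Unicode blocks: Basic Latin (U+0000 to U+007F, shown in the report as "ASCII"), Number Forms (U+2150 to U+218F), and Hangul Syllables (U+AC00 to U+D7AF, shown as "Hangul"). The report's labels may group several Unicode blocks together, so this is only an approximation.

```python
from collections import Counter

# Hypothetical partial block table: only the ranges needed for this report.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Unicode "Basic Latin"
    (0x2150, 0x218F, "Number Forms"),  # contains Roman numerals such as U+2160
    (0xAC00, 0xD7AF, "Hangul"),        # Unicode "Hangul Syllables"
]


def block_of(ch):
    """Return the block label for one character, or "Other" if unmapped."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"


def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for text in values for ch in text)


print(block_counts(["월계삼창", "I", "Ⅰ"]))
```

Note that the Latin letter "I" and the Roman numeral "Ⅰ" look alike but fall into different blocks, which is why the report lists the eight Ⅰ characters separately under Number Forms.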

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2565	3.8%
파	2496	3.7%
트	2353	3.5%
지	1828	2.7%
동	1725	2.6%
대	1710	2.5%
차	1480	2.2%
이	1467	2.2%
단	1437	2.1%
신	1434	2.1%
Other values (380)	49048	72.6%

ASCII

Value	Count	Frequency (%)
2	1118	17.4%
1	1066	16.6%
(space)	1003	15.6%
3	446	7.0%
4	255	4.0%
5	236	3.7%
e	198	3.1%
S	148	2.3%
)	147	2.3%
(	147	2.3%
Other values (32)	1652	25.7%

Number Forms

Value	Count	Frequency (%)
Ⅰ	8	100.0%

아파트코드 (apartment code)
Text

Distinct	2236
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	126
Unique (%)	1.3%

Sample

1st row	A10023996
2nd row	A10024316
3rd row	A13783001
4th row	A13519005
5th row	A13984603

Most common words

Value	Count	Frequency (%)
a13905202	14	0.1%
a15679107	13	0.1%
a13776508	12	0.1%
a13982005	11	0.1%
a10027553	11	0.1%
a13302206	11	0.1%
a14380414	11	0.1%
a13921005	11	0.1%
a13528103	11	0.1%
a15807606	11	0.1%
Other values (2226)	9884	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18509	20.6%
1	17426	19.4%
A	9998	11.1%
3	8674	9.6%
2	8343	9.3%
5	6308	7.0%
8	5626	6.3%
7	4746	5.3%
4	4073	4.5%
6	3346	3.7%
Other values (2)	2951	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18509	23.1%
1	17426	21.8%
3	8674	10.8%
2	8343	10.4%
5	6308	7.9%
8	5626	7.0%
7	4746	5.9%
4	4073	5.1%
6	3346	4.2%
9	2949	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9998	> 99.9%
B	2	< 0.1%
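Every code is exactly 9 characters long, and the column contains 10,000 uppercase letters and 80,000 digits in total. That is consistent with (though it does not prove) each code being one letter followed by eight digits, and the samples all put the letter first. The sketch below turns that inferred format into a hypothetical validation rule; the pattern is an assumption, not a published specification of the codes.

```python
import re

# Hypothetical rule inferred from the profile: one letter (A or B) followed by
# exactly eight ASCII digits. [0-9] is used instead of \d because \d in
# Python's re module also matches non-ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def invalid_codes(codes):
    """Return the codes that do not match CODE_PATTERN in full."""
    return [code for code in codes if not CODE_PATTERN.fullmatch(code)]


print(invalid_codes(["A10023996", "B12345678", "a10023996", "A1002399"]))
# ['a10023996', 'A1002399']
```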

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18509	23.1%
1	17426	21.8%
3	8674	10.8%
2	8343	10.4%
5	6308	7.9%
8	5626	7.0%
7	4746	5.9%
4	4073	5.1%
6	3346	4.2%
9	2949	3.7%

Latin

Value	Count	Frequency (%)
A	9998	> 99.9%
B	2	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18509	20.6%
1	17426	19.4%
A	9998	11.1%
3	8674	9.6%
2	8343	9.3%
5	6308	7.0%
8	5626	6.3%
7	4746	5.3%
4	4073	4.5%
6	3346	3.7%
Other values (2)	2951	3.3%

비용명 (expense name)
Text

Distinct	87
Distinct (%)	0.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	10
Median length	9
Mean length	4.8051
Min length	2

Characters and Unicode

Total characters	48051
Distinct characters	120
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	고용보험료
2nd row	부과차익
3rd row	복리후생비
4th row	위탁관리수수료
5th row	감가상각비

Most common words

Value	Count	Frequency (%)
연체료수익	244	2.4%
통신비	243	2.4%
급여	240	2.4%
소독비	238	2.4%
도서인쇄비	237	2.4%
사무용품비	237	2.4%
제수당	236	2.4%
승강기유지비	233	2.3%
퇴직급여	232	2.3%
세대전기료	232	2.3%
Other values (77)	7628	76.3%

Most occurring characters

Value	Count	Frequency (%)
비	5385	11.2%
수	3604	7.5%
료	2199	4.6%
익	1914	4.0%
용	1641	3.4%
기	1386	2.9%
대	1123	2.3%
리	887	1.8%
보	853	1.8%
험	806	1.7%
Other values (110)	28253	58.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	48051	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
비	5385	11.2%
수	3604	7.5%
료	2199	4.6%
익	1914	4.0%
용	1641	3.4%
기	1386	2.9%
대	1123	2.3%
리	887	1.8%
보	853	1.8%
험	806	1.7%
Other values (110)	28253	58.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	48051	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
비	5385	11.2%
수	3604	7.5%
료	2199	4.6%
익	1914	4.0%
용	1641	3.4%
기	1386	2.9%
대	1123	2.3%
리	887	1.8%
보	853	1.8%
험	806	1.7%
Other values (110)	28253	58.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	48051	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
비	5385	11.2%
수	3604	7.5%
료	2199	4.6%
익	1914	4.0%
용	1641	3.4%
기	1386	2.9%
대	1123	2.3%
리	887	1.8%
보	853	1.8%
험	806	1.7%
Other values (110)	28253	58.8%

년월일 (date, stored as YYYYMM)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202202	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202202
2nd row	202202
3rd row	202202
4th row	202202
5th row	202202

Common Values

Value	Count	Frequency (%)
202202	10000	100.0%
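The column name 년월일 literally means "year-month-day", but every value has only six digits, so the values look like YYYYMM (here February 2022). The sketch below parses them under that assumption, mapping each value to the first day of its month.

```python
from datetime import date


def parse_yyyymm(value: str) -> date:
    """Parse a six-digit YYYYMM string (assumed format) into a date."""
    if len(value) != 6 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"not a YYYYMM value: {value!r}")
    year, month = int(value[:4]), int(value[4:])
    # date() itself raises ValueError for months outside 1..12.
    return date(year, month, 1)


print(parse_yyyymm("202202"))  # 2022-02-01
```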

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7361
Distinct (%)	73.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	4024405.4

Minimum	-3041160
Maximum	5.7585519 × 10⁸
Zeros	666
Zeros (%)	6.7%
Negative	8
Negative (%)	0.1%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3041160
5th percentile	0
Q1	100000
Median	351035
Q3	1500252.5
95th percentile	20389597
Maximum	5.7585519 × 10⁸
Range	5.7889635 × 10⁸
Interquartile range (IQR)	1400252.5
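Range and IQR are simple differences of the figures above: Range = Maximum − Minimum and IQR = Q3 − Q1. The check below uses values copied by hand from this report. The maximum is the exact value 575855190 from the extreme-values list rather than the rounded 5.7585519 × 10⁸.

```python
# Values copied from this report (금액 column).
minimum, maximum = -3_041_160, 575_855_190
q1, q3 = 100_000, 1_500_252.5

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(value_range)  # 578896350
print(iqr)          # 1400252.5
```

Both results match the reported Range (5.7889635 × 10⁸) and IQR (1400252.5).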

Descriptive statistics

Standard deviation	14833177
Coefficient of variation (CV)	3.6858059
Kurtosis	311.81506
Mean	4024405.4
Median Absolute Deviation (MAD)	327035
Skewness	12.787864
Sum	4.0244054 × 10¹⁰
Variance	2.2002315 × 10¹⁴
Monotonicity	Not monotonic
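Several of these statistics follow directly from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The variables below are illustrative copies of the rounded figures reported above. Recomputing from them reproduces the listed values to the precision shown.

```python
# Rounded figures copied from the 금액 statistics above.
mean = 4_024_405.4
std = 14_833_177
n = 10_000

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance = standard deviation squared
total = mean * n      # sum = mean × number of observations

print(round(cv, 4))       # 3.6858
print(f"{variance:.4e}")  # 2.2002e+14
print(f"{total:.4e}")     # 4.0244e+10
```

A CV above 3 means the standard deviation is more than three times the mean, consistent with the heavy right tail indicated by the skewness (12.8) and the maximum of about 5.76 × 10⁸.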

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	666	6.7%
200000	82	0.8%
300000	72	0.7%
100000	72	0.7%
30000	47	0.5%
150000	46	0.5%
400000	42	0.4%
50000	41	0.4%
48000	41	0.4%
500000	40	0.4%
Other values (7351)	8851	88.5%

Minimum 10 values

Value	Count	Frequency (%)
-3041160	1	< 0.1%
-2000000	1	< 0.1%
-920000	1	< 0.1%
-766350	1	< 0.1%
-323400	1	< 0.1%
-100000	1	< 0.1%
-44300	1	< 0.1%
-4557	1	< 0.1%
0	666	6.7%
1	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
575855190	1	< 0.1%
353828011	1	< 0.1%
273642060	1	< 0.1%
267969109	1	< 0.1%
225276784	1	< 0.1%
204074720	1	< 0.1%
194203956	1	< 0.1%
193937080	1	< 0.1%
193876970	1	< 0.1%
184251670	1	< 0.1%

Interactions

Plot of 금액 against 금액 (the only numeric variable)

Correlations

Phik (φk) correlation table

	비용명	금액
비용명	1.000	0.433
금액	0.433	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: a data-dense display of nullity that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
388	개포더샵트리에	A10023996	고용보험료	202202	158890
1120	영등포 중흥S-클래스	A10024316	부과차익	202202	19410
49204	방배3차e편한세상	A13783001	복리후생비	202202	0
38987	강남데시앙파크	A13519005	위탁관리수수료	202202	319757
61258	월계삼창	A13984603	감가상각비	202202	128000
94431	은평뉴타운폭포동4단지제1	A41279930	세금과공과	202202	0
68255	광장청구	A14381513	입주자대표회의운영비	202202	180290
83572	흑석한강센트레빌	A15679107	퇴직급여	202202	1912050
24790	상봉건영캐스빌	A13122001	음식물처리비	202202	684560
92229	목동현대2차	A15882006	통신비	202202	32290

Last rows

	아파트명	아파트코드	비용명	년월일	금액
11243	북한산힐스테이트7차제2 (임대)	A10028056	이자수익	202202	0
27127	도봉서광	A13201001	도서인쇄비	202202	143000
1334	공덕SK리더스뷰 1단지	A10024408	위탁관리수수료	202202	344159
37570	논현신동아	A13501004	경비비	202202	20457750
73615	삼성산주공3단지	A15101506	교통비	202202	6700
36379	고덕현대	A13480401	승강기유지비	202202	1008130
60599	상계한신	A13983608	건강보험료	202202	412190
87893	등촌임광	A15783701	장기수선비	202202	1658100
82867	사당동작삼성래미안아파트	A15609306	고용보험료	202202	213700
81753	상도경향렉스빌	A15603401	퇴직급여	202202	664310
