gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
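The average record size follows from the two rows above it: total size in memory divided by the number of observations. A minimal check in Python, using only the values reported in this table (the variable names are illustrative):

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 488.3
n_observations = 10_000

# 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Because the total is itself rounded to one decimal place, the result only agrees with the report to the displayed precision.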

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

`년월일` has constant value "202212"	Constant
`금액` has 1529 (15.3%) zeros	Zeros
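The two alerts can be read as simple column checks: a column is constant when it holds a single distinct value, and the zeros alert reports the share of entries equal to zero. The sketch below illustrates that reading on toy data; the helper names and the toy columns are hypothetical, and this is not the profiler's own implementation.

```python
def is_constant(values):
    """Return True when the column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return the fraction of entries equal to zero (expects a non-empty list)."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy columns (hypothetical data, not the real dataset).
year_month = ["202212"] * 8
amount = [0, 116100, 63690, 0, 251120, 230000, 144480, 2289990]

print(is_constant(year_month))
print(f"{zero_share(amount):.1%}")

# The share reported in the alert: 1529 zeros out of 10000 observations.
reported_zero_share = 1529 / 10_000
print(f"{reported_zero_share:.1%}")
```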

Reproduction

Analysis started	2024-05-18 02:46:42.362893
Analysis finished	2024-05-18 02:46:43.976443
Duration	1.61 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
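The duration is the difference between the two timestamps above. The sketch below checks that with the standard library, then outlines how a comparable report could be regenerated. The function `write_profile`, its arguments, and the output path are placeholders, and it assumes pandas and ydata-profiling are installed; the configuration actually used for this report is not reproduced here.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-18 02:46:42.362893", FMT)
finished = datetime.strptime("2024-05-18 02:46:43.976443", FMT)

# Elapsed wall-clock time of the analysis, in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")


def write_profile(df, out_path="report.html"):
    """Profile a pandas DataFrame and write the report as HTML.

    Hypothetical helper: `df` is any DataFrame you have already loaded,
    and the title is only an example.
    """
    # Imported lazily so the timestamp check above runs without the package.
    from ydata_profiling import ProfileReport

    profile = ProfileReport(df, title="Pandas Profiling")
    profile.to_file(out_path)
    return out_path
```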

아파트명 (apartment name)
Text

Distinct	2086
Distinct (%)	20.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.3876
Min length	2

Characters and Unicode

Total characters	73876
Distinct characters	430
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
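As an illustration of those character properties, the sketch below counts string lengths and Unicode general categories for the five values listed under this variable's Sample heading. It uses Python's standard `unicodedata` module, which returns two-letter abbreviations (`Lo` for Other Letter, `Nd` for Decimal Number, `Zs` for Space Separator). The standard library has no script or block lookup, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The five sample values reported for this variable.
names = [
    "청년주택 와이엔타워",
    "장위참누리",
    "서울역한라비발디센트럴아파트",
    "송파파인타운11단지",
    "광장현대파크빌",
]

# Length of each value in code points.
lengths = [len(name) for name in names]

# General category of every character across all values.
categories = Counter(
    unicodedata.category(ch) for name in names for ch in name
)

print(lengths)
print(categories.most_common())
```

On the full column the same counting would give the category table further down, where Other Letter dominates because most characters are Hangul syllables.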

Unique

Unique	88
Unique (%)	0.9%
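Distinct and Unique measure different things. Going by the figures reported here (2086 distinct against 88 unique), Distinct counts the different values in the column, while Unique counts the values that occur exactly once. A small sketch under that reading, on a toy list of names rather than the real column:

```python
from collections import Counter

# Toy column (hypothetical data): two names repeat, two appear once.
values = ["장위참누리", "장위참누리", "광장현대파크빌",
          "가양2단지", "가양2단지", "잠실우성4차"]

counts = Counter(values)

# Distinct: how many different values exist.
n_distinct = len(counts)

# Unique: how many values appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```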

Sample

1st row	청년주택 와이엔타워
2nd row	장위참누리
3rd row	서울역한라비발디센트럴아파트
4th row	송파파인타운11단지
5th row	광장현대파크빌

Value	Count	Frequency (%)
아파트	194	1.8%
래미안	49	0.4%
아이파크	37	0.3%
e편한세상	29	0.3%
sk뷰	24	0.2%
코오롱하늘채아파트	18	0.2%
자이	17	0.2%
래미안밤섬리베뉴	17	0.2%
북한산	17	0.2%
신길삼두	16	0.1%
Other values (2167)	10506	96.2%

Most occurring characters

Value	Count	Frequency (%)
파	2720	3.7%
아	2704	3.7%
트	2545	3.4%
대	1751	2.4%
지	1665	2.3%
동	1530	2.1%
이	1507	2.0%
차	1429	1.9%
신	1267	1.7%
성	1252	1.7%
Other values (420)	55506	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67731	91.7%
Decimal Number	3324	4.5%
Space Separator	1009	1.4%
Uppercase Letter	956	1.3%
Lowercase Letter	329	0.4%
Close Punctuation	155	0.2%
Open Punctuation	155	0.2%
Dash Punctuation	123	0.2%
Other Punctuation	84	0.1%
Letter Number	10	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
파	2720	4.0%
아	2704	4.0%
트	2545	3.8%
대	1751	2.6%
지	1665	2.5%
동	1530	2.3%
이	1507	2.2%
차	1429	2.1%
신	1267	1.9%
성	1252	1.8%
Other values (375)	49361	72.9%

Uppercase Letter

Value	Count	Frequency (%)
S	145	15.2%
C	140	14.6%
K	107	11.2%
D	103	10.8%
M	103	10.8%
L	66	6.9%
H	51	5.3%
I	49	5.1%
E	44	4.6%
G	33	3.5%
Other values (7)	115	12.0%

Lowercase Letter

Value	Count	Frequency (%)
e	178	54.1%
l	36	10.9%
i	28	8.5%
s	23	7.0%
v	23	7.0%
k	20	6.1%
w	8	2.4%
h	5	1.5%
c	4	1.2%
g	2	0.6%

Decimal Number

Value	Count	Frequency (%)
2	1000	30.1%
1	996	30.0%
3	463	13.9%
4	223	6.7%
5	175	5.3%
6	129	3.9%
7	106	3.2%
8	99	3.0%
9	81	2.4%
0	52	1.6%

Other Punctuation

Value	Count	Frequency (%)
,	63	75.0%
.	21	25.0%

Space Separator

Value	Count	Frequency (%)
	1009	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	155	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	155	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	123	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	10	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67731	91.7%
Common	4850	6.6%
Latin	1295	1.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
파	2720	4.0%
아	2704	4.0%
트	2545	3.8%
대	1751	2.6%
지	1665	2.5%
동	1530	2.3%
이	1507	2.2%
차	1429	2.1%
신	1267	1.9%
성	1252	1.8%
Other values (375)	49361	72.9%

Latin

Value	Count	Frequency (%)
e	178	13.7%
S	145	11.2%
C	140	10.8%
K	107	8.3%
D	103	8.0%
M	103	8.0%
L	66	5.1%
H	51	3.9%
I	49	3.8%
E	44	3.4%
Other values (19)	309	23.9%

Common

Value	Count	Frequency (%)
	1009	20.8%
2	1000	20.6%
1	996	20.5%
3	463	9.5%
4	223	4.6%
5	175	3.6%
)	155	3.2%
(	155	3.2%
6	129	2.7%
-	123	2.5%
Other values (6)	422	8.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67731	91.7%
ASCII	6135	8.3%
Number Forms	10	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
파	2720	4.0%
아	2704	4.0%
트	2545	3.8%
대	1751	2.6%
지	1665	2.5%
동	1530	2.3%
이	1507	2.2%
차	1429	2.1%
신	1267	1.9%
성	1252	1.8%
Other values (375)	49361	72.9%

ASCII

Value	Count	Frequency (%)
	1009	16.4%
2	1000	16.3%
1	996	16.2%
3	463	7.5%
4	223	3.6%
e	178	2.9%
5	175	2.9%
)	155	2.5%
(	155	2.5%
S	145	2.4%
Other values (34)	1636	26.7%

Number Forms

Value	Count	Frequency (%)
Ⅰ	10	100.0%

아파트코드 (apartment code)
Text

Distinct	2090
Distinct (%)	20.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	88
Unique (%)	0.9%

Sample

1st row	A10023990
2nd row	A13614302
3rd row	A10026517
4th row	A13821002
5th row	A14381516

Value	Count	Frequency (%)
a15083701	16	0.2%
a15086601	15	0.1%
a13084803	13	0.1%
a15681503	13	0.1%
a14072702	13	0.1%
a12081703	13	0.1%
a12281701	12	0.1%
a10024552	12	0.1%
a12012202	12	0.1%
a13485401	12	0.1%
Other values (2080)	9869	98.7%

Most occurring characters

Value	Count	Frequency (%)
0	19279	21.4%
1	17596	19.6%
A	10000	11.1%
3	9033	10.0%
2	8462	9.4%
5	5887	6.5%
8	5162	5.7%
7	4313	4.8%
4	4031	4.5%
6	3369	3.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	19279	24.1%
1	17596	22.0%
3	9033	11.3%
2	8462	10.6%
5	5887	7.4%
8	5162	6.5%
7	4313	5.4%
4	4031	5.0%
6	3369	4.2%
9	2868	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	10000	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	19279	24.1%
1	17596	22.0%
3	9033	11.3%
2	8462	10.6%
5	5887	7.4%
8	5162	6.5%
7	4313	5.4%
4	4031	5.0%
6	3369	4.2%
9	2868	3.6%

Latin

Value	Count	Frequency (%)
A	10000	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	19279	21.4%
1	17596	19.6%
A	10000	11.1%
3	9033	10.0%
2	8462	9.4%
5	5887	6.5%
8	5162	5.7%
7	4313	4.8%
4	4031	4.5%
6	3369	3.7%

비용명 (cost item name)
Text

Distinct	85
Distinct (%)	0.9%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	10
Median length	9
Mean length	4.9
Min length	2

Characters and Unicode

Total characters	49000
Distinct characters	120
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	검침수익
2nd row	충당부채전입이자비용
3rd row	사무용품비
4th row	주차장수익
5th row	공동가스료

Value	Count	Frequency (%)
급여	231	2.3%
퇴직급여	224	2.2%
교육비	223	2.2%
소독비	222	2.2%
연체료수익	215	2.1%
도서인쇄비	215	2.1%
보험료	214	2.1%
경비비	212	2.1%
세대전기료	211	2.1%
소모품비	211	2.1%
Other values (75)	7822	78.2%

Most occurring characters

Value	Count	Frequency (%)
비	5383	11.0%
수	3522	7.2%
료	2071	4.2%
익	2053	4.2%
용	1691	3.5%
기	1286	2.6%
대	1055	2.2%
리	860	1.8%
보	782	1.6%
험	741	1.5%
Other values (110)	29556	60.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	49000	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
비	5383	11.0%
수	3522	7.2%
료	2071	4.2%
익	2053	4.2%
용	1691	3.5%
기	1286	2.6%
대	1055	2.2%
리	860	1.8%
보	782	1.6%
험	741	1.5%
Other values (110)	29556	60.3%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	49000	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
비	5383	11.0%
수	3522	7.2%
료	2071	4.2%
익	2053	4.2%
용	1691	3.5%
기	1286	2.6%
대	1055	2.2%
리	860	1.8%
보	782	1.6%
험	741	1.5%
Other values (110)	29556	60.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	49000	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
비	5383	11.0%
수	3522	7.2%
료	2071	4.2%
익	2053	4.2%
용	1691	3.5%
기	1286	2.6%
대	1055	2.2%
리	860	1.8%
보	782	1.6%
험	741	1.5%
Other values (110)	29556	60.3%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202212	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202212
2nd row	202212
3rd row	202212
4th row	202212
5th row	202212

Common Values

Value	Count	Frequency (%)
202212	10000	100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value	Count	Frequency (%)
202212	10000	100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7032
Distinct (%)	70.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	4224578.2

Minimum	-49368050
Maximum	7.8209429 × 10⁸
Zeros	1529
Zeros (%)	15.3%
Negative	22
Negative (%)	0.2%
Memory size	166.0 KiB

Quantile statistics

Minimum	-49368050
5-th percentile	0
Q1	49495
median	280000
Q3	1388295
95-th percentile	18873855
Maximum	7.8209429 × 10⁸
Range	8.3146234 × 10⁸
Interquartile range (IQR)	1338800
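The range and interquartile range can be checked directly from the other rows of this table. A minimal sketch, with the maximum written out in full as it appears among the ten largest values:

```python
# Values copied from the quantile statistics.
minimum = -49_368_050
maximum = 782_094_290
q1 = 49_495
q3 = 1_388_295

# Range: spread between the largest and smallest value.
value_range = maximum - minimum

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(value_range, iqr)
```

The IQR is tiny next to the range, which fits the heavy right tail suggested by the skewness and kurtosis figures.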

Descriptive statistics

Standard deviation	19587429
Coefficient of variation (CV)	4.6365408
Kurtosis	410.68106
Mean	4224578.2
Median Absolute Deviation (MAD)	280000
Skewness	15.90773
Sum	4.2245782 × 10¹⁰
Variance	3.8366739 × 10¹⁴
Monotonicity	Not monotonic
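Several of these figures are linked by simple identities. Assuming the usual definitions (coefficient of variation = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations), they can be cross-checked as follows. The inputs are the rounded values printed above, so agreement is only expected to the displayed precision.

```python
import math

# Values copied from the report.
mean = 4_224_578.2
std = 19_587_429
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all values

print(f"CV = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum = {total:.7e}")

# Compare against the reported figures with a loose relative tolerance.
print(math.isclose(cv, 4.6365408, rel_tol=1e-5))
print(math.isclose(variance, 3.8366739e14, rel_tol=1e-5))
print(math.isclose(total, 4.2245782e10, rel_tol=1e-5))
```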

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	1529	15.3%
200000	80	0.8%
100000	66	0.7%
300000	52	0.5%
400000	32	0.3%
30000	31	0.3%
150000	31	0.3%
600000	31	0.3%
500000	30	0.3%
50000	27	0.3%
Other values (7022)	8091	80.9%

Minimum 10 values

Value	Count	Frequency (%)
-49368050	1	< 0.1%
-26926030	1	< 0.1%
-13540546	1	< 0.1%
-2949560	1	< 0.1%
-1410454	1	< 0.1%
-601000	1	< 0.1%
-562820	1	< 0.1%
-517590	1	< 0.1%
-331890	1	< 0.1%
-315965	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
782094290	1	< 0.1%
534926470	1	< 0.1%
447455018	1	< 0.1%
442813660	1	< 0.1%
307257130	1	< 0.1%
294496908	1	< 0.1%
285865605	1	< 0.1%
273556047	1	< 0.1%
266343970	1	< 0.1%
264398490	1	< 0.1%

Interactions

Scatter plot of 금액 against 금액 (plot not included)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.568
금액	0.568	1.000

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
1126	청년주택 와이엔타워	A10023990	검침수익	202212	116100
52201	장위참누리	A13614302	충당부채전입이자비용	202212	116284
9418	서울역한라비발디센트럴아파트	A10026517	사무용품비	202212	63690
60240	송파파인타운11단지	A13821002	주차장수익	202212	2289990
77401	광장현대파크빌	A14381516	공동가스료	202212	251120
60884	잠실우성4차	A13822902	주차장수익	202212	3488710
32391	방학한화성원	A13202306	검침수익	202212	144480
83261	신대림신동아파밀리에	A15095002	소독비	202212	230000
98997	가양2단지	A15780605	기타운영비용	202212	0
64593	상계주공10단지	A13920804	연차수당	202212	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
56438	방배신동아	A13784907	급여	202212	18998390
95332	상도쌍용	A15683901	광고료수익	202212	585000
85325	우리유앤미	A15205001	청소비	202212	3174230
93360	사당유니드	A15609001	입주자대표회의운영비	202212	250000
70205	중계주공10단지	A13986004	세대수도료	202212	6870020
59367	가락상아1차	A13813004	주차장수익	202212	630000
82561	양평신동아아파트	A15086202	소모품비	202212	634220
86469	개봉삼호아파트	A15209202	소모품비	202212	308130
49746	돈암일신건영휴먼빌	A13606003	연체료수익	202212	14700
56619	방배2차현대홈타운	A13785201	퇴직급여	202212	1440000
