gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
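The two memory figures above are related by a simple division. A minimal sketch, assuming 1 KiB = 1024 bytes and that the average is the total size divided by the number of observations (the variable names are illustrative and do not come from the report):

```python
# Values copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: the report's KiB is 1024 bytes.
total_size_bytes = total_size_kib * 1024

# Average bytes per row under that assumption.
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

The result agrees with the reported average record size once it is rounded to one decimal place.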

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202304"	Constant
`금액` has 2324 (23.2%) zeros	Zeros
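The two alerts above can be illustrated with a small standard-library check. This is only a sketch of the idea, not how ydata-profiling implements its alerts: the function name `find_alerts` and the `zero_threshold` value are hypothetical, and the ten rows are copied from the "First rows" sample of this report.

```python
from typing import Any

# Ten sample rows copied from the "First rows" table (three of the five columns).
rows: list[dict[str, Any]] = [
    {"아파트코드": "A13706303", "년월일": "202304", "금액": 0},
    {"아파트코드": "A13113004", "년월일": "202304", "금액": 137160000},
    {"아파트코드": "A15672002", "년월일": "202304", "금액": 1000000},
    {"아파트코드": "A15786424", "년월일": "202304", "금액": 0},
    {"아파트코드": "A15882201", "년월일": "202304", "금액": 37335920},
    {"아파트코드": "A15805502", "년월일": "202304", "금액": 154916890},
    {"아파트코드": "A10025823", "년월일": "202304", "금액": 531820000},
    {"아파트코드": "A13681701", "년월일": "202304", "금액": 0},
    {"아파트코드": "A13985904", "년월일": "202304", "금액": 17443339},
    {"아파트코드": "A13003406", "년월일": "202304", "금액": -5711310},
]


def find_alerts(rows: list[dict[str, Any]], zero_threshold: float = 0.05) -> list[tuple[str, str]]:
    """Flag constant columns and numeric columns with a large share of zeros.

    `zero_threshold` is a hypothetical cut-off chosen for this sketch.
    """
    if not rows:
        return []
    alerts = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        # A column with a single distinct value is constant.
        if len(set(values)) == 1:
            alerts.append((col, "Constant"))
            continue
        # Only numeric columns are checked for zeros.
        if all(isinstance(v, (int, float)) for v in values):
            zero_share = sum(v == 0 for v in values) / len(values)
            if zero_share > zero_threshold:
                alerts.append((col, "Zeros"))
    return alerts


print(find_alerts(rows))
```

On this ten-row sample the check flags `년월일` as constant and `금액` as zero-heavy, which matches the alerts reported for the full 10000 rows.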

Reproduction

Analysis started	2024-05-11 05:55:45.000521
Analysis finished	2024-05-11 05:55:46.252271
Duration	1.25 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
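The reported duration follows from the two timestamps above. A minimal sketch using only the standard library (the format string and variable names are choices made for this example):

```python
from datetime import datetime

# Timestamp layout used by the "Analysis started" / "Analysis finished" fields.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 05:55:45.000521", FMT)
finished = datetime.strptime("2024-05-11 05:55:46.252271", FMT)

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")
```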

아파트명 (apartment name)
Text

Distinct	2118
Distinct (%)	21.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	21
Mean length	7.3526
Min length	2

Characters and Unicode

Total characters	73526
Distinct characters	432
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
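As an illustration of the character properties mentioned above, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. It is not the report's own implementation. The helper name `category_counts` is hypothetical, and the input is the fifth sample row of this column. The standard library exposes general categories (for example `Lo` for "Other Letter", `Lu` for "Uppercase Letter", `Pd` for "Dash Punctuation") but not scripts or blocks, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# Fifth sample row of the 아파트명 column.
sample = "신월수명산SK-VIEW"
counts = category_counts(sample)

for code, count in counts.most_common():
    print(code, count)
```

For this sample the five Hangul syllables fall under `Lo`, the six Latin capitals under `Lu`, and the hyphen under `Pd`.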

Unique

Unique	106
Unique (%)	1.1%

Sample

1st row	방배대우효령
2nd row	신내동성3차아파트
3rd row	보라매삼성쉐르빌
4th row	염창한화꿈에그린
5th row	신월수명산SK-VIEW

Value	Count	Frequency (%)
아파트	171	1.6%
래미안	35	0.3%
e편한세상	30	0.3%
아이파크	28	0.3%
sk뷰	22	0.2%
이편한세상	19	0.2%
답십리우성그린	16	0.1%
푸르지오	16	0.1%
경남아너스빌	13	0.1%
송파	13	0.1%
Other values (2200)	10448	96.6%

Most occurring characters

Value	Count	Frequency (%)
아	2452	3.3%
파	2446	3.3%
트	2360	3.2%
지	1784	2.4%
대	1709	2.3%
동	1609	2.2%
차	1507	2.0%
신	1465	2.0%
이	1403	1.9%
단	1369	1.9%
Other values (422)	55422	75.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67453	91.7%
Decimal Number	3554	4.8%
Space Separator	892	1.2%
Uppercase Letter	855	1.2%
Lowercase Letter	270	0.4%
Close Punctuation	137	0.2%
Open Punctuation	137	0.2%
Dash Punctuation	121	0.2%
Other Punctuation	101	0.1%
Letter Number	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2452	3.6%
파	2446	3.6%
트	2360	3.5%
지	1784	2.6%
대	1709	2.5%
동	1609	2.4%
차	1507	2.2%
신	1465	2.2%
이	1403	2.1%
단	1369	2.0%
Other values (377)	49349	73.2%

Uppercase Letter

Value	Count	Frequency (%)
S	157	18.4%
K	120	14.0%
C	109	12.7%
D	72	8.4%
M	72	8.4%
H	51	6.0%
L	50	5.8%
I	46	5.4%
E	44	5.1%
V	29	3.4%
Other values (7)	105	12.3%

Lowercase Letter

Value	Count	Frequency (%)
e	177	65.6%
s	19	7.0%
i	18	6.7%
k	16	5.9%
l	12	4.4%
w	11	4.1%
v	9	3.3%
h	4	1.5%
c	2	0.7%
g	1	0.4%

Decimal Number

Value	Count	Frequency (%)
1	1055	29.7%
2	1036	29.2%
3	473	13.3%
4	264	7.4%
5	195	5.5%
6	158	4.4%
8	105	3.0%
7	99	2.8%
9	97	2.7%
0	72	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	78	77.2%
.	23	22.8%

Space Separator

Value	Count	Frequency (%)
	892	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	137	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	137	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	121	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67453	91.7%
Common	4942	6.7%
Latin	1131	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2452	3.6%
파	2446	3.6%
트	2360	3.5%
지	1784	2.6%
대	1709	2.5%
동	1609	2.4%
차	1507	2.2%
신	1465	2.2%
이	1403	2.1%
단	1369	2.0%
Other values (377)	49349	73.2%

Latin

Value	Count	Frequency (%)
e	177	15.6%
S	157	13.9%
K	120	10.6%
C	109	9.6%
D	72	6.4%
M	72	6.4%
H	51	4.5%
L	50	4.4%
I	46	4.1%
E	44	3.9%
Other values (19)	233	20.6%

Common

Value	Count	Frequency (%)
1	1055	21.3%
2	1036	21.0%
	892	18.0%
3	473	9.6%
4	264	5.3%
5	195	3.9%
6	158	3.2%
)	137	2.8%
(	137	2.8%
-	121	2.4%
Other values (6)	474	9.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67453	91.7%
ASCII	6067	8.3%
Number Forms	6	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2452	3.6%
파	2446	3.6%
트	2360	3.5%
지	1784	2.6%
대	1709	2.5%
동	1609	2.4%
차	1507	2.2%
신	1465	2.2%
이	1403	2.1%
단	1369	2.0%
Other values (377)	49349	73.2%

ASCII

Value	Count	Frequency (%)
1	1055	17.4%
2	1036	17.1%
	892	14.7%
3	473	7.8%
4	264	4.4%
5	195	3.2%
e	177	2.9%
6	158	2.6%
S	157	2.6%
)	137	2.3%
Other values (34)	1523	25.1%

Number Forms

Value	Count	Frequency (%)
Ⅰ	6	100.0%

아파트코드 (apartment code)
Text

Distinct	2122
Distinct (%)	21.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	106
Unique (%)	1.1%

Sample

1st row	A13706303
2nd row	A13113004
3rd row	A15672002
4th row	A15786424
5th row	A15882201

Value	Count	Frequency (%)
a13003404	16	0.2%
a13681701	13	0.1%
a14072901	13	0.1%
a13984004	12	0.1%
a13822003	12	0.1%
a12182901	12	0.1%
a12010203	12	0.1%
a13824001	11	0.1%
a15205405	11	0.1%
a13983712	11	0.1%
Other values (2112)	9877	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18539	20.6%
1	17548	19.5%
A	9987	11.1%
3	8854	9.8%
2	8398	9.3%
5	6148	6.8%
8	5500	6.1%
7	4572	5.1%
4	4151	4.6%
6	3372	3.7%
Other values (2)	2931	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18539	23.2%
1	17548	21.9%
3	8854	11.1%
2	8398	10.5%
5	6148	7.7%
8	5500	6.9%
7	4572	5.7%
4	4151	5.2%
6	3372	4.2%
9	2918	3.6%

Uppercase Letter

Value	Count	Frequency (%)
A	9987	99.9%
B	13	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18539	23.2%
1	17548	21.9%
3	8854	11.1%
2	8398	10.5%
5	6148	7.7%
8	5500	6.9%
7	4572	5.7%
4	4151	5.2%
6	3372	4.2%
9	2918	3.6%

Latin

Value	Count	Frequency (%)
A	9987	99.9%
B	13	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18539	20.6%
1	17548	19.5%
A	9987	11.1%
3	8854	9.8%
2	8398	9.3%
5	6148	6.8%
8	5500	6.1%
7	4572	5.1%
4	4151	4.6%
6	3372	3.7%
Other values (2)	2931	3.3%

비용명 (cost item name)
Text

Distinct	76
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9898
Min length	2

Characters and Unicode

Total characters	59898
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	퇴직급여충당부채
2nd row	선수관리비
3rd row	공동체활성화단체지원적립금
4th row	상여충당부채
5th row	퇴직급여충당부채

Value	Count	Frequency (%)
당기순이익	353	3.5%
미처분이익잉여금	340	3.4%
공동주택적립금	319	3.2%
선급비용	312	3.1%
관리비미수금	311	3.1%
연차수당충당부채	310	3.1%
퇴직급여충당부채	304	3.0%
장기수선충당부채	299	3.0%
예금	298	3.0%
예수금	298	3.0%
Other values (66)	6856	68.6%

Most occurring characters

Value	Count	Frequency (%)
금	4583	7.7%
당	3856	6.4%
수	3107	5.2%
충	3003	5.0%
비	2987	5.0%
부	2876	4.8%
채	2619	4.4%
기	2530	4.2%
선	1913	3.2%
예	1684	2.8%
Other values (97)	30740	51.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59898	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4583	7.7%
당	3856	6.4%
수	3107	5.2%
충	3003	5.0%
비	2987	5.0%
부	2876	4.8%
채	2619	4.4%
기	2530	4.2%
선	1913	3.2%
예	1684	2.8%
Other values (97)	30740	51.3%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59898	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4583	7.7%
당	3856	6.4%
수	3107	5.2%
충	3003	5.0%
비	2987	5.0%
부	2876	4.8%
채	2619	4.4%
기	2530	4.2%
선	1913	3.2%
예	1684	2.8%
Other values (97)	30740	51.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59898	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4583	7.7%
당	3856	6.4%
수	3107	5.2%
충	3003	5.0%
비	2987	5.0%
부	2876	4.8%
채	2619	4.4%
기	2530	4.2%
선	1913	3.2%
예	1684	2.8%
Other values (97)	30740	51.3%

년월일 (year, month, day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202304	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202304
2nd row	202304
3rd row	202304
4th row	202304
5th row	202304

Common Values

Value	Count	Frequency (%)
202304	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7366
Distinct (%)	73.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	80351512

Minimum	-9.1137224 × 10⁸
Maximum	7.5041691 × 10⁹
Zeros	2324
Zeros (%)	23.2%
Negative	377
Negative (%)	3.8%
Memory size	166.0 KiB

Quantile statistics

Minimum	-9.1137224 × 10⁸
5-th percentile	0
Q1	0
median	3148610
Q3	37103310
95-th percentile	3.9892145 × 10⁸
Maximum	7.5041691 × 10⁹
Range	8.4155413 × 10⁹
Interquartile range (IQR)	37103310

Descriptive statistics

Standard deviation	2.9935315 × 10⁸
Coefficient of variation (CV)	3.7255446
Kurtosis	137.26621
Mean	80351512
Median Absolute Deviation (MAD)	3148610
Skewness	9.5049123
Sum	8.0351512 × 10¹¹
Variance	8.9612306 × 10¹⁶
Monotonicity	Not monotonic
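Several of the statistics above are tied together by standard definitions, which gives a quick consistency check on the report. The sketch below assumes those textbook relationships (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). It does not claim that the profiler computes them this way, and because the displayed values are rounded, only approximate agreement is expected.

```python
import math

# Values copied from the 금액 statistics and extreme-value tables of this report.
n = 10_000
mean = 80_351_512
std = 2.9935315e8
minimum = -911_372_242
maximum = 7_504_169_091
q1 = 0
q3 = 37_103_310

value_range = maximum - minimum   # range = max - min
iqr = q3 - q1                     # interquartile range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance from the standard deviation
total = mean * n                  # sum from the (rounded) mean

print(f"range    = {value_range:.7e}")
print(f"IQR      = {iqr}")
print(f"CV       = {cv:.7f}")
print(f"variance = {variance:.7e}")
print(f"sum      = {total:.7e}")

# Compare against the reported figures, allowing for display rounding.
assert math.isclose(cv, 3.7255446, rel_tol=1e-6)
assert math.isclose(variance, 8.9612306e16, rel_tol=1e-6)
```

The coefficient of variation near 3.7, together with the large skewness and kurtosis, is consistent with a heavy right tail: a few very large balances dominate the mean while the median stays near 3.1 million.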

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2324	23.2%
250000	23	0.2%
500000	19	0.2%
300000	17	0.2%
484000	12	0.1%
242000	11	0.1%
5000000	10	0.1%
30000000	10	0.1%
1000000	10	0.1%
20000000	9	0.1%
Other values (7356)	7555	75.5%

Minimum 10 values

Value	Count	Frequency (%)
-911372242	1	< 0.1%
-498514932	1	< 0.1%
-473885767	1	< 0.1%
-389001283	1	< 0.1%
-381131445	1	< 0.1%
-306567664	1	< 0.1%
-166462990	1	< 0.1%
-157879880	1	< 0.1%
-148374920	1	< 0.1%
-113445035	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
7504169091	1	< 0.1%
7019180303	1	< 0.1%
5225770286	1	< 0.1%
4773892864	1	< 0.1%
4680055287	1	< 0.1%
4589417198	1	< 0.1%
4569925684	1	< 0.1%
4326345391	1	< 0.1%
4196283678	1	< 0.1%
3739007211	1	< 0.1%

Interactions: 금액 against 금액

Phik (φk)

	비용명	금액
비용명	1.000	0.460
금액	0.460	1.000

Missing values

Count	A simple visualization of nullity by column.
Matrix	The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
35312	방배대우효령	A13706303	퇴직급여충당부채	202304	0
18364	신내동성3차아파트	A13113004	선수관리비	202304	137160000
60521	보라매삼성쉐르빌	A15672002	공동체활성화단체지원적립금	202304	1000000
64878	염창한화꿈에그린	A15786424	상여충당부채	202304	0
67289	신월수명산SK-VIEW	A15882201	퇴직급여충당부채	202304	37335920
65600	목동트윈빌	A15805502	주차장충당부채	202304	154916890
4713	래미안개포루체하임	A10025823	관리비예치금	202304	531820000
34919	석관중앙하이츠	A13681701	기타당좌자산	202304	0
45336	중계현대2차(4동)	A13985904	당기순이익	202304	17443339
16261	답십리동아	A13003406	비품감가상각누계액	202304	-5711310

Last rows

	아파트명	아파트코드	비용명	년월일	금액
23094	도봉래미안	A13293505	선급비용	202304	14838680
61594	우장산롯데3차	A15701601	장기수선충당부채	202304	219703581
38034	거여1단지	A13811206	기타충당예금	202304	0
26395	신성둔촌미소지움1차	A13406205	기타당좌자산	202304	503000
51274	포레나 신길	A15005501	기타당좌자산	202304	615000
28679	청담대림	A13510006	공동주택적립금예금	202304	0
42185	동진신안	A13922907	가수금	202304	1000
45373	중계염광아름빌	A13985907	미부과관리비	202304	196577866
64887	염창현대1차	A15786426	비품	202304	9189900
19977	신내중앙하이츠	A13186907	기타당좌자산	202304	2197360
