gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
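
The average record size follows from the two totals above. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
# Figures copied from the "Dataset statistics" table above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"average record size: {avg_record_bytes:.1f} B")
```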

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "201905"	Constant
`금액` has 2084 (20.8%) zeros	Zeros
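
The two alerts can be reproduced in spirit with a few lines of plain Python. This is only a sketch: `column_alerts` is a hypothetical helper, and it is not ydata-profiling's actual implementation or thresholds. The input is the first ten rows of the sample shown later in this report.

```python
def column_alerts(name, values):
    """Return alert strings for one column (hypothetical helper)."""
    alerts = []
    # A column is constant when it holds exactly one distinct value.
    if len(set(values)) == 1:
        alerts.append(f"{name}: Constant")
    # Count numeric zeros and report their share of the column.
    zeros = sum(1 for v in values if v == 0)
    if zeros:
        share = 100 * zeros / len(values)
        alerts.append(f"{name}: Zeros ({zeros}, {share:.1f}%)")
    return alerts

# First ten rows of the sample shown later in this report.
date_col = ["201905"] * 10
amount_col = [9839940, 1922408794, 0, 22000, 3138653, 0, 88461466, 0, 0, 0]

print(column_alerts("년월일", date_col))
print(column_alerts("금액", amount_col))
```

On the full dataset the same share calculation gives 2084 / 10000 = 20.8% zeros for `금액`.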

Reproduction

Analysis started	2024-05-11 06:01:44.694491
Analysis finished	2024-05-11 06:01:46.227522
Duration	1.53 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
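
The reported duration is the difference between the two timestamps above. A small sketch using only the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Assumed layout of the timestamps in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 06:01:44.694491", FMT)
finished = datetime.strptime("2024-05-11 06:01:46.227522", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```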

아파트명 (apartment name)
Text

Distinct	2109
Distinct (%)	21.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.1287
Min length	2
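
Length statistics of this kind count characters per value. The sketch below computes them for the five sample rows of this variable, assuming lengths are measured in Unicode code points after NFC normalization (an assumption, not something the report states). It also checks that the column-wide mean equals the total character count reported in this section divided by the number of observations.

```python
import unicodedata
from statistics import mean, median

# The five sample rows of 아파트명 shown in this section.
names = ["염창동아3차", "한남하이츠", "목동5단지", "구로한일유엔아이", "래미안서초유니빌"]

# Assumption: length = number of code points after NFC normalization.
lengths = [len(unicodedata.normalize("NFC", n)) for n in names]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)

# Column-wide mean from the report's own totals: 71287 characters / 10000 rows.
column_mean = 71287 / 10_000
print(column_mean)
```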

Characters and Unicode

Total characters	71287
Distinct characters	430
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
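
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for three of the sample apartment names. `category_counts` and `CATEGORY_NAMES` are illustrative names, and this is not how the report itself was computed.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Nd": "Decimal Number", "Lu": "Uppercase Letter",
    "Zs": "Space Separator", "Ll": "Lowercase Letter", "Pd": "Dash Punctuation",
    "Pe": "Close Punctuation", "Ps": "Open Punctuation", "Po": "Other Punctuation",
    "Nl": "Letter Number", "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category (hypothetical helper)."""
    counts = Counter()
    for value in values:
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["염창동아3차", "목동5단지", "래미안서초유니빌"]
counts = category_counts(sample)
print(counts)
```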

Unique

Unique	104
Unique (%)	1.0%
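
"Distinct" and "Unique" measure different things. The sketch below assumes that distinct means the number of different values and unique means the number of values that occur exactly once. The short list is taken from the sample rows of this report and is only illustrative.

```python
from collections import Counter

# A few apartment names from the sample rows; one of them repeats.
names = ["공덕래미안5차", "염창동아3차", "공덕래미안5차", "한남하이츠"]

value_counts = Counter(names)

# Assumption: "distinct" = number of different values.
n_distinct = len(value_counts)
# Assumption: "unique" = number of values that occur exactly once.
n_unique = sum(1 for count in value_counts.values() if count == 1)

print(n_distinct, n_unique)
```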

Sample

1st row	염창동아3차
2nd row	한남하이츠
3rd row	목동5단지
4th row	구로한일유엔아이
5th row	래미안서초유니빌

Value	Count	Frequency (%)
아파트	101	1.0%
래미안	26	0.2%
왕십리	14	0.1%
올림픽파크한양수자인	14	0.1%
우리유앤미	13	0.1%
힐스테이트	13	0.1%
은평뉴타운상림마을6단지	13	0.1%
대치동부센트레빌	12	0.1%
송천센트레빌	12	0.1%
경남아너스빌	12	0.1%
Other values (2163)	10194	97.8%

Most occurring characters

Value	Count	Frequency (%)
아	2209	3.1%
파	2070	2.9%
트	1906	2.7%
지	1872	2.6%
대	1859	2.6%
동	1720	2.4%
차	1568	2.2%
단	1494	2.1%
신	1486	2.1%
성	1391	2.0%
Other values (420)	53712	75.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	65504	91.9%
Decimal Number	3925	5.5%
Uppercase Letter	596	0.8%
Space Separator	451	0.6%
Lowercase Letter	295	0.4%
Dash Punctuation	147	0.2%
Close Punctuation	128	0.2%
Open Punctuation	128	0.2%
Other Punctuation	107	0.2%
Letter Number	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2209	3.4%
파	2070	3.2%
트	1906	2.9%
지	1872	2.9%
대	1859	2.8%
동	1720	2.6%
차	1568	2.4%
단	1494	2.3%
신	1486	2.3%
성	1391	2.1%
Other values (374)	47929	73.2%

Uppercase Letter

Value	Count	Frequency (%)
S	109	18.3%
K	80	13.4%
L	58	9.7%
C	52	8.7%
H	47	7.9%
I	37	6.2%
E	34	5.7%
G	32	5.4%
M	28	4.7%
D	28	4.7%
Other values (7)	91	15.3%

Lowercase Letter

Value	Count	Frequency (%)
e	183	62.0%
l	32	10.8%
i	25	8.5%
v	19	6.4%
c	8	2.7%
k	7	2.4%
s	6	2.0%
w	6	2.0%
h	3	1.0%
a	3	1.0%

Decimal Number

Value	Count	Frequency (%)
2	1185	30.2%
1	1174	29.9%
3	484	12.3%
4	268	6.8%
5	220	5.6%
6	166	4.2%
7	126	3.2%
0	115	2.9%
8	101	2.6%
9	86	2.2%

Other Punctuation

Value	Count	Frequency (%)
,	88	82.2%
.	19	17.8%

Space Separator

Value	Count	Frequency (%)
	451	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	147	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	128	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	128	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	3	100.0%

Math Symbol

Value	Count	Frequency (%)
~	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	65504	91.9%
Common	4889	6.9%
Latin	894	1.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2209	3.4%
파	2070	3.2%
트	1906	2.9%
지	1872	2.9%
대	1859	2.8%
동	1720	2.6%
차	1568	2.4%
단	1494	2.3%
신	1486	2.3%
성	1391	2.1%
Other values (374)	47929	73.2%

Latin

Value	Count	Frequency (%)
e	183	20.5%
S	109	12.2%
K	80	8.9%
L	58	6.5%
C	52	5.8%
H	47	5.3%
I	37	4.1%
E	34	3.8%
l	32	3.6%
G	32	3.6%
Other values (19)	230	25.7%

Common

Value	Count	Frequency (%)
2	1185	24.2%
1	1174	24.0%
3	484	9.9%
	451	9.2%
4	268	5.5%
5	220	4.5%
6	166	3.4%
-	147	3.0%
)	128	2.6%
(	128	2.6%
Other values (7)	538	11.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	65504	91.9%
ASCII	5780	8.1%
Number Forms	3	< 0.1%
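
Block membership depends only on the code point range. A minimal sketch, assuming that "Hangul" here denotes the Hangul Syllables block and "ASCII" the Basic Latin block (`block_of` is a hypothetical helper covering just the three blocks listed above):

```python
from collections import Counter

# Code point ranges for the three blocks listed above.
BLOCKS = [
    ("ASCII", 0x0000, 0x007F),         # Basic Latin
    ("Number Forms", 0x2150, 0x218F),
    ("Hangul", 0xAC00, 0xD7AF),        # Hangul Syllables (assumed meaning)
]

def block_of(ch):
    """Return the block name for one character, or "Other" if not listed."""
    cp = ord(ch)
    for name, low, high in BLOCKS:
        if low <= cp <= high:
            return name
    return "Other"

names = ["목동5단지", "한남하이츠"]
block_counts = Counter(block_of(ch) for name in names for ch in name)
print(block_counts)

# The Roman numeral Ⅰ (U+2160) falls in the Number Forms block.
print(block_of("\u2160"))
```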

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2209	3.4%
파	2070	3.2%
트	1906	2.9%
지	1872	2.9%
대	1859	2.8%
동	1720	2.6%
차	1568	2.4%
단	1494	2.3%
신	1486	2.3%
성	1391	2.1%
Other values (374)	47929	73.2%

ASCII

Value	Count	Frequency (%)
2	1185	20.5%
1	1174	20.3%
3	484	8.4%
	451	7.8%
4	268	4.6%
5	220	3.8%
e	183	3.2%
6	166	2.9%
-	147	2.5%
)	128	2.2%
Other values (35)	1374	23.8%

Number Forms

Value	Count	Frequency (%)
Ⅰ	3	100.0%

아파트코드 (apartment code)
Text

Distinct	2116
Distinct (%)	21.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	104
Unique (%)	1.0%

Sample

1st row	A15786227
2nd row	A13375901
3rd row	A15805504
4th row	A15205104
5th row	A13707010

Value	Count	Frequency (%)
a10027354	14	0.1%
a14272313	12	0.1%
a13184401	12	0.1%
a13822004	12	0.1%
a15606007	12	0.1%
a13528103	12	0.1%
a13481305	11	0.1%
a13922114	11	0.1%
a15884703	11	0.1%
a15681106	11	0.1%
Other values (2106)	9882	98.8%
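
The values in this table are lowercase (`a10027354`), while the sample rows are uppercase (`A15786227`). This suggests the table counts lowercased, whitespace-separated tokens rather than raw values. The sketch below shows such a count. It is an assumption about the behaviour, not a confirmed description of the tool.

```python
from collections import Counter

# A few apartment codes from the sample rows; one of them repeats.
codes = ["A15786227", "A13375901", "A12170603", "A12170603"]

# Assumption: values are lowercased and split on whitespace before counting.
word_counts = Counter(
    token for code in codes for token in code.lower().split()
)

for token, count in word_counts.most_common():
    print(token, count)
```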

Most occurring characters

Value	Count	Frequency (%)
0	18184	20.2%
1	17675	19.6%
A	9988	11.1%
3	8985	10.0%
2	7940	8.8%
5	6287	7.0%
8	5756	6.4%
7	4813	5.3%
4	3878	4.3%
6	3444	3.8%
Other values (2)	3050	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18184	22.7%
1	17675	22.1%
3	8985	11.2%
2	7940	9.9%
5	6287	7.9%
8	5756	7.2%
7	4813	6.0%
4	3878	4.8%
6	3444	4.3%
9	3038	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9988	99.9%
B	12	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18184	22.7%
1	17675	22.1%
3	8985	11.2%
2	7940	9.9%
5	6287	7.9%
8	5756	7.2%
7	4813	6.0%
4	3878	4.8%
6	3444	4.3%
9	3038	3.8%

Latin

Value	Count	Frequency (%)
A	9988	99.9%
B	12	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18184	20.2%
1	17675	19.6%
A	9988	11.1%
3	8985	10.0%
2	7940	8.8%
5	6287	7.0%
8	5756	6.4%
7	4813	5.3%
4	3878	4.3%
6	3444	3.8%
Other values (2)	3050	3.4%

비용명 (expense item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.976
Min length	2

Characters and Unicode

Total characters	59760
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	연차수당충당부채
2nd row	장기수선충당예금
3rd row	기타시설운영충당부채
4th row	세대배부용비품
5th row	주차장충당부채

Value	Count	Frequency (%)
예금	339	3.4%
퇴직급여충당부채	330	3.3%
미처분이익잉여금	327	3.3%
예수금	326	3.3%
관리비미수금	323	3.2%
공동주택적립금	320	3.2%
연차수당충당부채	319	3.2%
선급비용	316	3.2%
장기수선충당예금	308	3.1%
당기순이익	299	3.0%
Other values (67)	6793	67.9%

Most occurring characters

Value	Count	Frequency (%)
금	4777	8.0%
당	3764	6.3%
수	3194	5.3%
충	3065	5.1%
비	2935	4.9%
부	2920	4.9%
채	2626	4.4%
기	2318	3.9%
선	1870	3.1%
예	1831	3.1%
Other values (97)	30460	51.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59760	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4777	8.0%
당	3764	6.3%
수	3194	5.3%
충	3065	5.1%
비	2935	4.9%
부	2920	4.9%
채	2626	4.4%
기	2318	3.9%
선	1870	3.1%
예	1831	3.1%
Other values (97)	30460	51.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59760	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4777	8.0%
당	3764	6.3%
수	3194	5.3%
충	3065	5.1%
비	2935	4.9%
부	2920	4.9%
채	2626	4.4%
기	2318	3.9%
선	1870	3.1%
예	1831	3.1%
Other values (97)	30460	51.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59760	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4777	8.0%
당	3764	6.3%
수	3194	5.3%
충	3065	5.1%
비	2935	4.9%
부	2920	4.9%
채	2626	4.4%
기	2318	3.9%
선	1870	3.1%
예	1831	3.1%
Other values (97)	30460	51.0%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

201905	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	201905
2nd row	201905
3rd row	201905
4th row	201905
5th row	201905

Common Values

Value	Count	Frequency (%)
201905	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7558
Distinct (%)	75.6%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	74584700

Minimum	-4.09024 × 10⁹
Maximum	9.6344252 × 10⁹
Zeros	2084
Zeros (%)	20.8%
Negative	321
Negative (%)	3.2%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.09024 × 10⁹
5-th percentile	0
Q1	14970
median	3483560
Q3	32630885
95-th percentile	3.5725721 × 10⁸
Maximum	9.6344252 × 10⁹
Range	1.3724665 × 10¹⁰
Interquartile range (IQR)	32615915
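
The range and interquartile range in this table are simple differences of the other rows. A short check using the exact minimum and maximum listed under the extreme values of this variable:

```python
# Exact extremes from the minimum/maximum value tables of 금액.
minimum = -4_090_240_000
maximum = 9_634_425_246

# Quartiles from the table above.
q1 = 14_970
q3 = 32_630_885

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(f"range = {value_range:.7e}")
print(f"IQR   = {iqr}")
```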

Descriptive statistics

Standard deviation	3.1935374 × 10⁸
Coefficient of variation (CV)	4.2817594
Kurtosis	259.34772
Mean	74584700
Median Absolute Deviation (MAD)	3483560
Skewness	12.614355
Sum	7.45847 × 10¹¹
Variance	1.0198681 × 10¹⁷
Monotonicity	Not monotonic
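
Several of these figures are tied together by standard definitions: the mean is the sum divided by the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A quick consistency check on the rounded values reported above:

```python
# Rounded figures copied from this report.
n_observations = 10_000
total = 7.45847e11
std = 3.1935374e8

mean = total / n_observations   # mean = sum / n
cv = std / mean                 # coefficient of variation = std / mean
variance = std ** 2             # variance = std squared

print(f"mean     = {mean:.0f}")
print(f"CV       = {cv:.4f}")
print(f"variance = {variance:.4e}")
```

Because the inputs are rounded, the results agree with the table only to the digits shown.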

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2084	20.8%
500000	29	0.3%
250000	27	0.3%
1000000	16	0.2%
20000000	14	0.1%
10000000	12	0.1%
300000	11	0.1%
200000	10	0.1%
484000	10	0.1%
5000000	10	0.1%
Other values (7548)	7777	77.8%

Minimum 10 values

Value	Count	Frequency (%)
-4090240000	1	< 0.1%
-626368591	1	< 0.1%
-262574210	1	< 0.1%
-219951880	1	< 0.1%
-161481980	1	< 0.1%
-134212500	1	< 0.1%
-132342706	1	< 0.1%
-124188940	1	< 0.1%
-122789896	1	< 0.1%
-120098530	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9634425246	1	< 0.1%
8689507436	1	< 0.1%
8398272240	1	< 0.1%
6014828955	1	< 0.1%
5785899932	1	< 0.1%
5364257365	1	< 0.1%
5189197570	1	< 0.1%
5054589351	1	< 0.1%
4739809247	1	< 0.1%
4653135158	1	< 0.1%

Interactions

Plot of 금액 against 금액.

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.658
금액	0.658	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
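
A nullity matrix marks, for every row and column, whether a value is present. The sketch below builds one in plain Python for two rows taken from the sample in this report. `nullity_matrix` is a hypothetical helper, and it treats only `None` as missing, so a 금액 of 0 still counts as present.

```python
COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]

# Two rows copied from the sample rows of this report.
rows = [
    {"아파트명": "염창동아3차", "아파트코드": "A15786227",
     "비용명": "연차수당충당부채", "년월일": "201905", "금액": 9839940},
    {"아파트명": "목동5단지", "아파트코드": "A15805504",
     "비용명": "기타시설운영충당부채", "년월일": "201905", "금액": 0},
]

def nullity_matrix(rows, columns):
    """True where a value is present, False where it is missing (None)."""
    return [[row.get(col) is not None for col in columns] for row in rows]

matrix = nullity_matrix(rows, COLUMNS)

# Per-column count of missing cells.
missing_per_column = {
    col: sum(1 for present in matrix if not present[i])
    for i, col in enumerate(COLUMNS)
}
print(missing_per_column)
```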

First rows

	아파트명	아파트코드	비용명	년월일	금액
61585	염창동아3차	A15786227	연차수당충당부채	201905	9839940
19775	한남하이츠	A13375901	장기수선충당예금	201905	1922408794
62661	목동5단지	A15805504	기타시설운영충당부채	201905	0
51492	구로한일유엔아이	A15205104	세대배부용비품	201905	22000
31313	래미안서초유니빌	A13707010	주차장충당부채	201905	3138653
44583	미아현대	A14272307	수선유지비충당부채	201905	0
7112	토정한강삼성	A12106001	미부과관리비	201905	88461466
38952	공릉화랑타운	A13980010	미수금	201905	0
36428	거여현대2차	A13881401	미처분이익잉여금	201905	0
7969	공덕래미안5차	A12170603	장기수선충당부채적립금	201905	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
26076	래미안대치하이스턴	A13528007	미부과관리비	201905	86018600
12395	래미안허브리츠	A13070301	상여충당부채	201905	0
9697	응암금호	A12201102	선급비용	201905	3753880
23383	강동현대홈타운	A13485301	미처분이익잉여금	201905	16614240
11502	래미안위브	A13003007	기타시설운영충당부채	201905	417606964
38642	상계미도	A13971501	선급금	201905	3190
7939	공덕래미안5차	A12170603	미수관리비예치금	201905	384000
39636	공릉대주파크빌	A13980706	비품	201905	6160700
45542	자양우성7차	A14319311	퇴직급여충당예금	201905	72881855
50082	삼성산주공3단지	A15101506	비품	201905	39446000
