gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
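The dataset-level figures above can be sketched with the standard library alone. This is a minimal illustration, not ydata-profiling's own (pandas-based) code: it assumes each record is a tuple with `None` marking a missing cell, counts a row as a duplicate when it is identical to an earlier row, and uses a hypothetical three-row table. The name `dataset_stats` is illustrative.

```python
from collections import Counter


def dataset_stats(rows):
    """Summarise a table given as a list of equal-length tuples (None = missing cell)."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(value is None for row in rows for value in row)
    # Every repeat of a row beyond its first occurrence counts as one duplicate
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {
        "n_var": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical toy table: the third row repeats the first, the second has one missing cell
rows = [
    ("APT-A", "A0001", "cash", "201902", 100),
    ("APT-B", "A0002", "cash", "201902", None),
    ("APT-A", "A0001", "cash", "201902", 100),
]
print(dataset_stats(rows))
```

The average record size is the total memory divided by the number of observations: 488.3 KiB × 1024 / 10,000 ≈ 50.0 B.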

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "201902"	Constant
`금액` has 2052 (20.5%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 06:02:07.657790
Analysis finished	2024-05-11 06:02:08.957185
Duration	1.3 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명 (apartment name)
Text

Distinct	2105
Distinct (%)	21.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.1509
Min length	2
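A minimal sketch of the length statistics, assuming each value is a Python string. `len` counts Unicode code points, so a precomposed Hangul syllable counts as one character, and the mean length equals total characters divided by rows (71,509 / 10,000 = 7.1509 for this column). The sketch takes a plain median of per-row lengths; that is an assumption, and the profiler's own median-length figure may be computed differently. `length_stats` is an illustrative name.

```python
import statistics


def length_stats(values):
    """Max, median, mean and min length in characters (Unicode code points)."""
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
    }


# The five sample names shown for 아파트명 in this report
names = ["여의도화랑", "거여2단지(동아효성)", "제기현대", "정릉풍림아이원", "목동현대아파트"]
print(length_stats(names))
```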

Characters and Unicode

Total characters	71509
Distinct characters	431
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
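The category names used in these tables are the long names of Unicode general categories, which Python's `unicodedata.category` returns as two-letter codes (for example `Lo` for "Other Letter"). A minimal sketch of the per-category count follows; the mapping `CATEGORY_NAMES` is hand-written for illustration and covers only the eleven categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general categories seen in this report (illustrative subset)
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation", "Pd": "Dash Punctuation",
    "Po": "Other Punctuation", "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


print(category_counts(["거여2단지(동아효성)", "DMC센트레빌"]).most_common())
```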

Unique

Unique	104
Unique (%)	1.0%
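In ydata-profiling, *Distinct* counts how many different values a column has, while *Unique* counts values that occur exactly once; read that way, 104 apartment names appear in only one row each. A minimal sketch of both counts, with the illustrative function name `distinct_and_unique` and a hypothetical list of codes:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    n = len(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": n_distinct,
        "distinct_pct": 100.0 * n_distinct / n,
        "unique": n_unique,
        "unique_pct": 100.0 * n_unique / n,
    }


# Hypothetical codes: A2 and A4 occur once, A1 and A3 twice
codes = ["A1", "A1", "A2", "A3", "A3", "A4"]
print(distinct_and_unique(codes))
```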

Sample

1st row	여의도화랑
2nd row	거여2단지(동아효성)
3rd row	제기현대
4th row	정릉풍림아이원
5th row	목동현대아파트

Most common words

Value	Count	Frequency (%)
아파트	107	1.0%
래미안	26	0.2%
입주자대표회의	22	0.2%
강변힐스테이트	14	0.1%
힐스테이트	14	0.1%
상봉건영캐스빌	13	0.1%
신도림현대	12	0.1%
역삼2차아이파크	12	0.1%
원효산호	12	0.1%
대림코오롱	11	0.1%
Other values (2161)	10211	97.7%
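The counts in the table above add up to more than the 10,000 rows (253 + 10,211 = 10,464), and "Other values (2161)" exceeds the 2,105 distinct names, so the table appears to count words rather than whole values; the 아파트코드 table later in the report likewise shows codes in lowercase. Assuming the rule is "lowercase, then split on whitespace" (an inference, not confirmed by the report), a sketch looks like this. The inputs are illustrative, and `word_counts` is a hypothetical name.

```python
from collections import Counter


def word_counts(values, top=10):
    """Top words after lowercasing and splitting on whitespace (assumed rule)."""
    counts = Counter(word for text in values for word in text.lower().split())
    total = sum(counts.values())
    # Each entry: (word, count, percentage of all words)
    return [(word, c, 100.0 * c / total) for word, c in counts.most_common(top)]


# Illustrative inputs; the code shows how uppercase identifiers end up lowercased
names = ["래미안 아파트", "힐스테이트", "강변힐스테이트 아파트", "A15704023"]
for word, count, pct in word_counts(names, top=3):
    print(word, count, f"{pct:.1f}%")
```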

Most occurring characters

Value	Count	Frequency (%)
아	2134	3.0%
파	2066	2.9%
대	1964	2.7%
지	1863	2.6%
트	1855	2.6%
동	1659	2.3%
차	1547	2.2%
신	1513	2.1%
단	1482	2.1%
성	1360	1.9%
Other values (421)	54066	75.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	65574	91.7%
Decimal Number	3916	5.5%
Uppercase Letter	666	0.9%
Space Separator	498	0.7%
Lowercase Letter	300	0.4%
Open Punctuation	146	0.2%
Close Punctuation	146	0.2%
Dash Punctuation	126	0.2%
Other Punctuation	126	0.2%
Letter Number	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2134	3.3%
파	2066	3.2%
대	1964	3.0%
지	1863	2.8%
트	1855	2.8%
동	1659	2.5%
차	1547	2.4%
신	1513	2.3%
단	1482	2.3%
성	1360	2.1%
Other values (375)	48131	73.4%

Uppercase Letter

Value	Count	Frequency (%)
S	119	17.9%
K	85	12.8%
C	66	9.9%
L	57	8.6%
H	56	8.4%
M	40	6.0%
D	40	6.0%
I	37	5.6%
E	35	5.3%
G	30	4.5%
Other values (7)	101	15.2%

Lowercase Letter

Value	Count	Frequency (%)
e	173	57.7%
l	36	12.0%
i	33	11.0%
v	21	7.0%
w	8	2.7%
a	7	2.3%
g	7	2.3%
s	6	2.0%
k	4	1.3%
h	3	1.0%

Decimal Number

Value	Count	Frequency (%)
1	1240	31.7%
2	1160	29.6%
3	530	13.5%
4	241	6.2%
5	195	5.0%
6	154	3.9%
7	111	2.8%
9	102	2.6%
0	93	2.4%
8	90	2.3%

Other Punctuation

Value	Count	Frequency (%)
,	108	85.7%
.	18	14.3%

Space Separator

Value	Count	Frequency (%)
(space)	498	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	146	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	146	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	126	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	6	100.0%

Math Symbol

Value	Count	Frequency (%)
~	5	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	65574	91.7%
Common	4963	6.9%
Latin	972	1.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2134	3.3%
파	2066	3.2%
대	1964	3.0%
지	1863	2.8%
트	1855	2.8%
동	1659	2.5%
차	1547	2.4%
신	1513	2.3%
단	1482	2.3%
성	1360	2.1%
Other values (375)	48131	73.4%

Latin

Value	Count	Frequency (%)
e	173	17.8%
S	119	12.2%
K	85	8.7%
C	66	6.8%
L	57	5.9%
H	56	5.8%
M	40	4.1%
D	40	4.1%
I	37	3.8%
l	36	3.7%
Other values (19)	263	27.1%

Common

Value	Count	Frequency (%)
1	1240	25.0%
2	1160	23.4%
3	530	10.7%
(space)	498	10.0%
4	241	4.9%
5	195	3.9%
6	154	3.1%
(	146	2.9%
)	146	2.9%
-	126	2.5%
Other values (7)	527	10.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	65574	91.7%
ASCII	5929	8.3%
Number Forms	6	< 0.1%
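Unicode blocks are fixed code-point ranges, and the standard-library `unicodedata` module does not expose them, so the sketch below hard-codes only the ranges relevant here (ASCII U+0000–U+007F, Number Forms U+2150–U+218F, and the Hangul Jamo, Hangul Compatibility Jamo and Hangul Syllables blocks). The table is an illustrative subset, and grouping all Hangul blocks under one "Hangul" label is an assumption about how the report aggregates them.

```python
from collections import Counter

# Illustrative subset of Unicode block ranges (inclusive), not a complete table
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),
    (0x1100, 0x11FF, "Hangul"),        # Hangul Jamo
    (0x2150, 0x218F, "Number Forms"),
    (0x3130, 0x318F, "Hangul"),        # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables
]


def block_of(ch):
    """Name of the block containing ch, or 'Other' if it is outside the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for text in values for ch in text)


print(block_counts(["DMC센트레빌", "Ⅰ"]))
```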

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2134	3.3%
파	2066	3.2%
대	1964	3.0%
지	1863	2.8%
트	1855	2.8%
동	1659	2.5%
차	1547	2.4%
신	1513	2.3%
단	1482	2.3%
성	1360	2.1%
Other values (375)	48131	73.4%

ASCII

Value	Count	Frequency (%)
1	1240	20.9%
2	1160	19.6%
3	530	8.9%
(space)	498	8.4%
4	241	4.1%
5	195	3.3%
e	173	2.9%
6	154	2.6%
(	146	2.5%
)	146	2.5%
Other values (35)	1446	24.4%

Number Forms

Value	Count	Frequency (%)
Ⅰ	6	100.0%

아파트코드 (apartment code)
Text

Distinct	2111
Distinct (%)	21.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	104
Unique (%)	1.0%

Sample

1st row	A15088802
2nd row	A13811202
3rd row	A13006002
4th row	A13610007
5th row	A15807211

Most common words

Value	Count	Frequency (%)
a15704023	14	0.1%
a13122001	13	0.1%
a14085002	12	0.1%
a13579503	12	0.1%
a14085501	11	0.1%
a12013202	11	0.1%
a13508011	11	0.1%
a13407104	11	0.1%
a12187501	11	0.1%
a15081105	11	0.1%
Other values (2101)	9883	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18127	20.1%
1	17619	19.6%
A	9993	11.1%
3	9228	10.3%
2	8082	9.0%
5	6177	6.9%
8	5766	6.4%
7	4781	5.3%
4	3907	4.3%
6	3268	3.6%
Other values (2)	3052	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18127	22.7%
1	17619	22.0%
3	9228	11.5%
2	8082	10.1%
5	6177	7.7%
8	5766	7.2%
7	4781	6.0%
4	3907	4.9%
6	3268	4.1%
9	3045	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18127	22.7%
1	17619	22.0%
3	9228	11.5%
2	8082	10.1%
5	6177	7.7%
8	5766	7.2%
7	4781	6.0%
4	3907	4.9%
6	3268	4.1%
9	3045	3.8%

Latin

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18127	20.1%
1	17619	19.6%
A	9993	11.1%
3	9228	10.3%
2	8082	9.0%
5	6177	6.9%
8	5766	6.4%
7	4781	5.3%
4	3907	4.3%
6	3268	3.6%
Other values (2)	3052	3.4%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9382
Min length	2

Characters and Unicode

Total characters	59382
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

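1st row	정화조관리비충당부채 (septic tank maintenance provision)
2nd row	예수금 (deposits received)
3rd row	현금 (cash)
4th row	공동주택적립금 (housing complex reserve fund)
5th row	장기수선충당부채 (long-term repair provision)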

Most common words

Value	Count	Frequency (%)
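당기순이익 (net income for the period)	339	3.4%
장기수선충당예금 (long-term repair reserve deposit)	336	3.4%
관리비미수금 (management fees receivable)	335	3.4%
비품 (equipment)	331	3.3%
예금 (bank deposits)	326	3.3%
현금 (cash)	323	3.2%
장기수선충당부채 (long-term repair provision)	316	3.2%
예수금 (deposits received)	309	3.1%
연차수당충당부채 (annual leave allowance provision)	304	3.0%
선급비용 (prepaid expenses)	301	3.0%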
Other values (67)	6780	67.8%

Most occurring characters

Value	Count	Frequency (%)
금	4738	8.0%
당	3817	6.4%
수	3177	5.4%
충	3107	5.2%
부	2905	4.9%
비	2884	4.9%
채	2617	4.4%
기	2406	4.1%
선	1901	3.2%
예	1807	3.0%
Other values (97)	30023	50.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59382	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4738	8.0%
당	3817	6.4%
수	3177	5.4%
충	3107	5.2%
부	2905	4.9%
비	2884	4.9%
채	2617	4.4%
기	2406	4.1%
선	1901	3.2%
예	1807	3.0%
Other values (97)	30023	50.6%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59382	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4738	8.0%
당	3817	6.4%
수	3177	5.4%
충	3107	5.2%
부	2905	4.9%
비	2884	4.9%
채	2617	4.4%
기	2406	4.1%
선	1901	3.2%
예	1807	3.0%
Other values (97)	30023	50.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59382	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4738	8.0%
당	3817	6.4%
수	3177	5.4%
충	3107	5.2%
부	2905	4.9%
비	2884	4.9%
채	2617	4.4%
기	2406	4.1%
선	1901	3.2%
예	1807	3.0%
Other values (97)	30023	50.6%

년월일 (year-month-day; values are YYYYMM)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	201902
2nd row	201902
3rd row	201902
4th row	201902
5th row	201902

Common Values

Value	Count	Frequency (%)
201902	10000	100.0%
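The CONSTANT flag is raised because 년월일 has a single distinct value across all 10,000 rows, so the column carries no information for analysis. A minimal sketch of that check, assuming rows are tuples aligned with a list of column names; `constant_columns` is an illustrative name, and the two rows are taken from the sample at the end of this report.

```python
def constant_columns(rows, columns):
    """Names of columns whose values are all identical (exactly one distinct value)."""
    constant = []
    for i, name in enumerate(columns):
        distinct = {row[i] for row in rows}
        if len(distinct) == 1:
            constant.append(name)
    return constant


columns = ["아파트코드", "년월일", "금액"]
rows = [("A15088802", "201902", 552000), ("A13811202", "201902", 1557480)]
print(constant_columns(rows, columns))
```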

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7602
Distinct (%)	76.0%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	77495400

Minimum	-6.0342072 × 10⁸
Maximum	8.665062 × 10⁹
Zeros	2052
Zeros (%)	20.5%
Negative	349
Negative (%)	3.5%
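The zero and negative shares are simple proportions of the 10,000 observations: 2,052 / 10,000 = 20.5% zeros and 349 / 10,000 = 3.49%, shown as 3.5%, negative. A minimal sketch that also counts infinite values; `sign_summary` is an illustrative name, and the amounts come from the sample rows at the end of this report.

```python
import math


def sign_summary(values):
    """Counts and percentages of zero, negative and infinite values."""
    n = len(values)
    zeros = sum(1 for v in values if v == 0)
    negative = sum(1 for v in values if v < 0)
    infinite = sum(1 for v in values if math.isinf(v))
    return {
        "zeros": zeros, "zeros_pct": 100.0 * zeros / n,
        "negative": negative, "negative_pct": 100.0 * negative / n,
        "infinite": infinite, "infinite_pct": 100.0 * infinite / n,
    }


# 금액 values taken from the sample rows in this report
amounts = [552000, 1557480, 173540, -1603010, -16347880, 0]
print(sign_summary(amounts))
```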
Memory size	166.0 KiB

Quantile statistics

Minimum	-6.0342072 × 10⁸
5th percentile	0
Q1	19792.5
Median	3418405
Q3	34157213
95th percentile	3.7300637 × 10⁸
Maximum	8.665062 × 10⁹
Range	9.2684827 × 10⁹
Interquartile range (IQR)	34137421
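Range is maximum minus minimum, and the interquartile range is Q3 − Q1 = 34157213 − 19792.5 = 34137420.5, shown rounded as 34137421. The fractional Q1 suggests linear interpolation between sorted values, which is the pandas default; the sketch below assumes that rule and uses `statistics.quantiles(..., method="inclusive")`, which interpolates the same way. `quantile_summary` is an illustrative name and the input is a toy list.

```python
import statistics


def quantile_summary(values):
    """Quantile statistics using linear interpolation between sorted values."""
    percentiles = statistics.quantiles(values, n=100, method="inclusive")  # 1st..99th
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "minimum": lo,
        "p5": percentiles[4],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": percentiles[94],
        "maximum": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }


print(quantile_summary([0, 10, 20, 30, 40]))
```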

Descriptive statistics

Standard deviation	2.9751704 × 10⁸
Coefficient of variation (CV)	3.8391575
Kurtosis	197.00838
Mean	77495400
Median Absolute Deviation (MAD)	3418405
Skewness	11.023574
Sum	7.74954 × 10¹¹
Variance	8.8516391 × 10¹⁶
Monotonicity	Not monotonic
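Several of these figures follow directly from one another: CV = standard deviation / mean = 2.9751704 × 10⁸ / 77,495,400 ≈ 3.839, variance = (2.9751704 × 10⁸)² ≈ 8.85 × 10¹⁶, and sum = mean × 10,000 = 7.74954 × 10¹¹. The sketch below assumes pandas-style conventions (sample variance with ddof=1, bias-corrected skewness and excess kurtosis, unscaled median absolute deviation from the median); these are assumptions, not confirmed by the report. `descriptive` is an illustrative name.

```python
import math
import statistics


def descriptive(values):
    """Descriptive statistics for a list of at least four numbers with a non-zero mean."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance (ddof=1)
    std = math.sqrt(variance)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    # Central moments for the bias-corrected skewness and excess kurtosis
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        monotonicity = "increasing"
    elif all(a >= b for a, b in pairs):
        monotonicity = "decreasing"
    else:
        monotonicity = "not monotonic"
    return {
        "mean": mean, "std": std, "variance": variance, "cv": std / mean,
        "mad": mad, "skewness": skewness, "kurtosis": kurtosis,
        "sum": sum(values), "monotonicity": monotonicity,
    }


print(descriptive([1, 2, 3, 4, 5]))
```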

Histogram with fixed-size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2052	20.5%
250000	28	0.3%
500000	23	0.2%
242000	18	0.2%
484000	17	0.2%
20000000	13	0.1%
200000	12	0.1%
30000000	11	0.1%
10000000	11	0.1%
300000	10	0.1%
Other values (7592)	7805	78.0%
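The histogram mentioned above splits the range into 50 equal-width bins. For 금액 each bin spans 9.2684827 × 10⁹ / 50 ≈ 1.85 × 10⁸, so every amount between the 5th percentile (0) and Q3 (34157213) lands in the same bin, roughly [−4.73 × 10⁷, 1.38 × 10⁸), which covers at least 70% of the rows. A minimal sketch of equal-width binning, assuming half-open bins with the last bin closed (the same convention as `numpy.histogram`); `fixed_width_histogram` is an illustrative name.

```python
def fixed_width_histogram(values, bins=50):
    """Counts per equal-width bin from min to max; the last bin includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(edges)
print(counts)
```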

Minimum 10 values

Value	Count	Frequency (%)
-603420717	1	< 0.1%
-388641505	1	< 0.1%
-302145700	1	< 0.1%
-282000000	1	< 0.1%
-269035194	1	< 0.1%
-242139904	1	< 0.1%
-168377396	1	< 0.1%
-147921070	1	< 0.1%
-134212500	1	< 0.1%
-119103910	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
8665061963	1	< 0.1%
7889746955	1	< 0.1%
6218827702	1	< 0.1%
5823064344	1	< 0.1%
4981704436	1	< 0.1%
4838592241	1	< 0.1%
4302804346	1	< 0.1%
3952505624	1	< 0.1%
3815214169	1	< 0.1%
3749981958	1	< 0.1%
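The two extreme-value tables list the ten smallest and ten largest values together with how often each occurs. A minimal sketch, assuming the tables list distinct values; `extreme_values` is an illustrative name, and the input is the 금액 column of the "First rows" sample in this report.

```python
import heapq
from collections import Counter


def extreme_values(values, k=10):
    """The k smallest and k largest distinct values, each paired with its count."""
    counts = Counter(values)
    smallest = heapq.nsmallest(k, counts)  # iterating a Counter yields its distinct values
    largest = heapq.nlargest(k, counts)
    return [(v, counts[v]) for v in smallest], [(v, counts[v]) for v in largest]


# 금액 values from the "First rows" sample in this report
first_rows = [552000, 1557480, 173540, 102263893, 1164230539,
              410832467, 30053522, 7370000, -1603010, -16347880]
low, high = extreme_values(first_rows, k=3)
print(low)
print(high)
```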

Interactions

금액 vs. 금액 (scatter plot)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.537
금액	0.537	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
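Both missing-value views reduce to checking, cell by cell, whether a value is present: the count view sums present cells per column, and the matrix view keeps the full row-by-column grid. A minimal sketch, assuming `None` marks a missing cell; the function names are illustrative, and the `None` in the second row is a hypothetical gap added for the example, since this dataset has no missing cells.

```python
def nullity_by_column(rows, columns):
    """Number of non-missing cells per column (None marks a missing cell)."""
    present = {name: 0 for name in columns}
    for row in rows:
        for name, value in zip(columns, row):
            if value is not None:
                present[name] += 1
    return present


def nullity_matrix(rows):
    """One list of booleans per row: True where the cell is present."""
    return [[value is not None for value in row] for row in rows]


columns = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    ("여의도화랑", "A15088802", "정화조관리비충당부채", "201902", 552000),
    ("거여2단지(동아효성)", None, "예수금", "201902", 1557480),  # hypothetical gap
]
print(nullity_by_column(rows, columns))
print(nullity_matrix(rows))
```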

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
49531	여의도화랑	A15088802	정화조관리비충당부채	201902	552000
33899	거여2단지(동아효성)	A13811202	예수금	201902	1557480
11604	제기현대	A13006002	현금	201902	173540
28541	정릉풍림아이원	A13610007	공동주택적립금	201902	102263893
62915	목동현대아파트	A15807211	장기수선충당부채	201902	1164230539
38836	청솔아파트8	A13980004	장기수선충당예금	201902	410832467
11029	뉴신사신성	A12289401	퇴직급여충당부채	201902	30053522
65101	은평뉴타운박석고개제12단지아파트	A41279911	미수금	201902	7370000
19147	신금호두산위브	A13309101	공동주택적립금	201902	-1603010
6235	DMC센트레빌	A12072801	비품감가상각누계액	201902	-16347880

Last rows

	아파트명	아파트코드	비용명	년월일	금액
41303	중계현대2차(4동)	A13985904	현금	201902	69948
15774	신내새한아파트	A13187406	현금	201902	365720
14660	면목늘푸른동아아파트	A13183504	선급금	201902	178380
27527	대청	A13594007	주차장충당부채	201902	108451191
22421	고덕리엔파크2단지	A13410011	장기수선충당예금	201902	131880614
1374	서초푸르지오써밋	A10026941	기타충당부채	201902	550000
34939	송파파인타운7단지	A13821004	공동체활성화단체지원적립금	201902	0
26094	대치동부센트레빌	A13528103	수선유지비충당부채	201902	121690134
33599	신반포한신2차	A13790929	현금	201902	94532
36391	거여우방	A13881601	단기보증금	201902	750000
