gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
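The counters in this table are simple aggregates. The code below is a minimal sketch that recomputes them for three of the sample rows shown at the end of the report. It assumes rows are held as plain Python tuples, with `None` marking a missing cell. `SAMPLE_ROWS` and `dataset_statistics` are illustrative names. Treating a duplicate as "a row identical to an earlier row" is also an assumption and may differ in detail from ydata-profiling. The last computation checks the average record size: 488.3 KiB × 1024 bytes / 10,000 rows ≈ 50.0 B.

```python
# Minimal sketch: recompute the "Dataset statistics" counters for a small table.
# ASSUMPTION: each row is a tuple and None marks a missing cell.
SAMPLE_ROWS = [
    ("도곡삼성래미안", "A13550502", "세대배부용비품", "202206", 1639200),
    ("강일리버파크7단지", "A13410010", "선수난방비", "202206", 0),
    ("꿈의숲코오롱하늘채아파트", "A10026571", "기타유형자산", "202206", 0),
]


def dataset_statistics(rows):
    """Return variable/observation counts, missing cells and duplicate rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # ASSUMPTION: a duplicate is any row identical to an earlier row.
    duplicates = n_obs - len(set(rows))
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Average record size = total memory / number of observations.
avg_record_bytes = 488.3 * 1024 / 10000  # about 50.0 bytes

stats = dataset_statistics(SAMPLE_ROWS)
print(stats, round(avg_record_bytes, 1))
```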

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202206"	Constant
`금액` has 2340 (23.4%) zeros	Zeros
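Both alerts can be reproduced with a few lines of code. The sketch below is illustrative only. `constant_value`, `zeros_alert` and the 10% `ZEROS_ALERT_PCT` threshold are hypothetical and are not taken from ydata-profiling's configuration. The `amounts` list holds the ten `금액` values from the "First rows" sample.

```python
# Minimal sketch of the Constant and Zeros alerts on small in-memory columns.
# ZEROS_ALERT_PCT is a hypothetical threshold, not necessarily ydata-profiling's default.
ZEROS_ALERT_PCT = 10.0


def constant_value(values):
    """Return the single value if the column is constant, else None."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else None


def zeros_alert(values, threshold_pct=ZEROS_ALERT_PCT):
    """Return (zero count, zero share in %, whether the share exceeds the threshold)."""
    zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * zeros / len(values)
    return zeros, pct, pct > threshold_pct


year_month = ["202206"] * 10
amounts = [1639200, 0, 0, 4223430, 1071120, 95700, 0, 510328385, 759010, 23401980]
print(constant_value(year_month))  # 202206
print(zeros_alert(amounts))
```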

Reproduction

Analysis started	2024-05-11 05:57:51.553742
Analysis finished	2024-05-11 05:57:52.677064
Duration	1.12 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명 (apartment name)
Text

Distinct	2249
Distinct (%)	22.5%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	21
Mean length	7.3888
Min length	2
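As a rough illustration, per-row length statistics can be computed as shown below. Summing the lengths gives Total characters. This is why Mean length × observations equals the character total reported under Characters and Unicode (7.3888 × 10,000 = 73,888). The sketch takes a plain median of per-row lengths, so it may not reproduce the report's Median length figure. `length_stats` is a hypothetical helper.

```python
import statistics


# Minimal sketch: per-row string-length statistics for a text column.
# The median here is a plain median of per-row lengths; the report's
# "Median length" may be computed differently.
def length_stats(values):
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


names = ["도곡삼성래미안", "강일리버파크7단지", "꿈의숲코오롱하늘채아파트",
         "마곡수명산파크2단지", "상도효성해링턴플레이스"]
print(length_stats(names))
```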

Characters and Unicode

Total characters	73888
Distinct characters	433
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
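Python's standard `unicodedata.category` returns these general-category codes directly. For example, it returns `Lo` for Hangul syllables and `Nl` for the Roman numeral `Ⅰ`. The sketch below produces category counts like those in the tables that follow. The long names in `CATEGORY_NAMES` follow the Unicode Standard, and `category_counts` is a hypothetical helper. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["e편한세상", "삼선1SH-VILLE", "(Ⅰ)"]))
```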

Unique

Unique	149
Unique (%)	1.5%
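Distinct and Unique measure different things. Distinct counts the number of different values (2,249 names, 22.5% of 10,000 rows). Unique counts the values that occur exactly once (149, 1.5%). Here is a small sketch using a hypothetical helper `distinct_and_unique`:

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct = number of different values; unique = values that occur exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {"distinct": distinct, "distinct_pct": 100.0 * distinct / n,
            "unique": unique, "unique_pct": 100.0 * unique / n}


names = ["잠실현대", "잠실현대", "신내6단지", "목동10단지", "목동10단지", "목동10단지", "마장세림"]
print(distinct_and_unique(names))
```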

Sample

1st row	도곡삼성래미안
2nd row	강일리버파크7단지
3rd row	꿈의숲코오롱하늘채아파트
4th row	마곡수명산파크2단지
5th row	상도효성해링턴플레이스

Words

Value	Count	Frequency (%)
아파트	176	1.6%
래미안	29	0.3%
e편한세상	26	0.2%
아이파크	23	0.2%
브라운스톤	17	0.2%
푸르지오	17	0.2%
경남아너스빌	16	0.1%
sk뷰	15	0.1%
북한산	14	0.1%
이편한세상	14	0.1%
Other values (2331)	10419	96.8%
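The words in this table appear to be lower-cased, whitespace-separated tokens. Note `sk뷰` here and the lower-case `a13986306` in the `아파트코드` table. Under that assumption, a frequency table with a trailing Other values row can be built as shown below. `word_table` is a hypothetical helper. It always appends the Other values row whenever words remain beyond the top entries.

```python
from collections import Counter


def word_table(values, top=10):
    """Top-`top` words with counts and percentages, plus an 'Other values' row."""
    # ASSUMPTION: words are lower-cased and split on whitespace.
    words = Counter(w for v in values for w in v.lower().split())
    total = sum(words.values())
    ranked = words.most_common()
    rows = [(w, c, round(100.0 * c / total, 1)) for w, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100.0 * other / total, 1)))
    return rows


for row in word_table(["래미안 아파트", "SK뷰 아파트", "래미안"], top=2):
    print(row)
```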

Most occurring characters

Value	Count	Frequency (%)
파	2522	3.4%
아	2513	3.4%
트	2327	3.1%
지	1797	2.4%
대	1768	2.4%
동	1645	2.2%
차	1528	2.1%
신	1404	1.9%
단	1402	1.9%
이	1393	1.9%
Other values (423)	55589	75.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67565	91.4%
Decimal Number	3658	5.0%
Space Separator	868	1.2%
Uppercase Letter	829	1.1%
Lowercase Letter	393	0.5%
Open Punctuation	164	0.2%
Close Punctuation	164	0.2%
Dash Punctuation	141	0.2%
Other Punctuation	103	0.1%
Letter Number	3	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
파	2522	3.7%
아	2513	3.7%
트	2327	3.4%
지	1797	2.7%
대	1768	2.6%
동	1645	2.4%
차	1528	2.3%
신	1404	2.1%
단	1402	2.1%
이	1393	2.1%
Other values (378)	49266	72.9%
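Percentages in the per-category tables are relative to the category total, not to all characters. For example, `파` makes up 2,522 / 73,888 ≈ 3.4% of all characters but 2,522 / 67,565 ≈ 3.7% of Other Letter characters. Below is a minimal sketch of the grouping. `top_chars_per_category` is a hypothetical helper, and it keys the result by the short category code (e.g. `Lo`).

```python
import unicodedata
from collections import Counter, defaultdict


def top_chars_per_category(values, top=3):
    """Most frequent characters within each Unicode general category.

    Percentages are relative to the category's own total, not to all characters.
    """
    by_category = defaultdict(Counter)
    for value in values:
        for ch in value:
            by_category[unicodedata.category(ch)][ch] += 1
    result = {}
    for category, counter in by_category.items():
        total = sum(counter.values())
        result[category] = [(ch, n, round(100.0 * n / total, 1))
                            for ch, n in counter.most_common(top)]
    return result


print(top_chars_per_category(["아파트1", "아파트2", "파크1"], top=2))
# Report figures for "파": share of all characters vs. share of Other Letter.
print(round(100 * 2522 / 73888, 1), round(100 * 2522 / 67565, 1))  # 3.4 3.7
```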

Uppercase Letter

Value	Count	Frequency (%)
S	132	15.9%
C	117	14.1%
K	105	12.7%
M	78	9.4%
D	78	9.4%
L	57	6.9%
H	51	6.2%
I	46	5.5%
E	40	4.8%
V	30	3.6%
Other values (7)	95	11.5%

Lowercase Letter

Value	Count	Frequency (%)
e	214	54.5%
l	35	8.9%
i	33	8.4%
s	28	7.1%
v	23	5.9%
k	20	5.1%
h	13	3.3%
w	13	3.3%
c	8	2.0%
a	3	0.8%

Decimal Number

Value	Count	Frequency (%)
1	1079	29.5%
2	1055	28.8%
3	503	13.8%
4	267	7.3%
5	195	5.3%
6	171	4.7%
7	129	3.5%
9	95	2.6%
8	92	2.5%
0	72	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	78	75.7%
.	25	24.3%

Space Separator

Value	Count	Frequency (%)
(space)	868	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	164	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	164	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	141	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67565	91.4%
Common	5098	6.9%
Latin	1225	1.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
파	2522	3.7%
아	2513	3.7%
트	2327	3.4%
지	1797	2.7%
대	1768	2.6%
동	1645	2.4%
차	1528	2.3%
신	1404	2.1%
단	1402	2.1%
이	1393	2.1%
Other values (378)	49266	72.9%

Latin

Value	Count	Frequency (%)
e	214	17.5%
S	132	10.8%
C	117	9.6%
K	105	8.6%
M	78	6.4%
D	78	6.4%
L	57	4.7%
H	51	4.2%
I	46	3.8%
E	40	3.3%
Other values (19)	307	25.1%

Common

Value	Count	Frequency (%)
1	1079	21.2%
2	1055	20.7%
(space)	868	17.0%
3	503	9.9%
4	267	5.2%
5	195	3.8%
6	171	3.4%
(	164	3.2%
)	164	3.2%
-	141	2.8%
Other values (6)	491	9.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67565	91.4%
ASCII	6320	8.6%
Number Forms	3	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
파	2522	3.7%
아	2513	3.7%
트	2327	3.4%
지	1797	2.7%
대	1768	2.6%
동	1645	2.4%
차	1528	2.3%
신	1404	2.1%
단	1402	2.1%
이	1393	2.1%
Other values (378)	49266	72.9%

ASCII

Value	Count	Frequency (%)
1	1079	17.1%
2	1055	16.7%
(space)	868	13.7%
3	503	8.0%
4	267	4.2%
e	214	3.4%
5	195	3.1%
6	171	2.7%
(	164	2.6%
)	164	2.6%
Other values (34)	1640	25.9%

Number Forms

Value	Count	Frequency (%)
Ⅰ	3	100.0%

아파트코드 (apartment code)
Text

Distinct	2255
Distinct (%)	22.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	151
Unique (%)	1.5%

Sample

1st row	A13550502
2nd row	A13410010
3rd row	A10026571
4th row	A15728004
5th row	A10027472

Words

Value	Count	Frequency (%)
a13986306	13	0.1%
a13204505	12	0.1%
a15679104	12	0.1%
a13789002	12	0.1%
a14003001	12	0.1%
a13820006	12	0.1%
a13776510	11	0.1%
a15009402	11	0.1%
a13905105	11	0.1%
a13811205	11	0.1%
Other values (2245)	9883	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18645	20.7%
1	17461	19.4%
A	9994	11.1%
3	8772	9.7%
2	8223	9.1%
5	6170	6.9%
8	5519	6.1%
7	4699	5.2%
4	4076	4.5%
6	3353	3.7%
Other values (2)	3088	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18645	23.3%
1	17461	21.8%
3	8772	11.0%
2	8223	10.3%
5	6170	7.7%
8	5519	6.9%
7	4699	5.9%
4	4076	5.1%
6	3353	4.2%
9	3082	3.9%

Uppercase Letter

Value	Count	Frequency (%)
A	9994	99.9%
B	6	0.1%
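The statistics are consistent with every code being one uppercase letter followed by eight digits. All codes have length 9, and the 90,000 characters split into exactly 10,000 uppercase letters (9,994 `A`, 6 `B`) and 80,000 digits. Below is a validation sketch under that inferred format. The format is an assumption, not a published specification, and `CODE_PATTERN` and `invalid_codes` are hypothetical names.

```python
import re

# ASSUMPTION: a valid apartment code is one uppercase Latin letter followed by
# eight digits, inferred from the report's length and character statistics.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")


def invalid_codes(codes):
    """Return the codes that do not match CODE_PATTERN."""
    return [code for code in codes if not CODE_PATTERN.fullmatch(code)]


print(invalid_codes(["A13550502", "A10026571", "B1234567", "a13986306"]))
```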

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18645	23.3%
1	17461	21.8%
3	8772	11.0%
2	8223	10.3%
5	6170	7.7%
8	5519	6.9%
7	4699	5.9%
4	4076	5.1%
6	3353	4.2%
9	3082	3.9%

Latin

Value	Count	Frequency (%)
A	9994	99.9%
B	6	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18645	20.7%
1	17461	19.4%
A	9994	11.1%
3	8772	9.7%
2	8223	9.1%
5	6170	6.9%
8	5519	6.1%
7	4699	5.2%
4	4076	4.5%
6	3353	3.7%
Other values (2)	3088	3.4%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9475
Min length	2

Characters and Unicode

Total characters	59475
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	세대배부용비품
2nd row	선수난방비
3rd row	기타유형자산
4th row	연차수당충당부채
5th row	소프트웨어

Words

Value	Count	Frequency (%)
예금	350	3.5%
당기순이익	326	3.3%
가수금	317	3.2%
미처분이익잉여금	316	3.2%
공동주택적립금	309	3.1%
관리비미수금	307	3.1%
선급비용	306	3.1%
수선유지비충당부채	302	3.0%
비품	298	3.0%
미부과관리비	297	3.0%
Other values (67)	6872	68.7%

Most occurring characters

Value	Count	Frequency (%)
금	4679	7.9%
당	3797	6.4%
수	3093	5.2%
비	3021	5.1%
충	2988	5.0%
부	2882	4.8%
채	2561	4.3%
기	2429	4.1%
선	1895	3.2%
예	1763	3.0%
Other values (97)	30367	51.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59475	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4679	7.9%
당	3797	6.4%
수	3093	5.2%
비	3021	5.1%
충	2988	5.0%
부	2882	4.8%
채	2561	4.3%
기	2429	4.1%
선	1895	3.2%
예	1763	3.0%
Other values (97)	30367	51.1%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59475	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4679	7.9%
당	3797	6.4%
수	3093	5.2%
비	3021	5.1%
충	2988	5.0%
부	2882	4.8%
채	2561	4.3%
기	2429	4.1%
선	1895	3.2%
예	1763	3.0%
Other values (97)	30367	51.1%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59475	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4679	7.9%
당	3797	6.4%
수	3093	5.2%
비	3021	5.1%
충	2988	5.0%
부	2882	4.8%
채	2561	4.3%
기	2429	4.1%
선	1895	3.2%
예	1763	3.0%
Other values (97)	30367	51.1%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202206	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202206
2nd row	202206
3rd row	202206
4th row	202206
5th row	202206

Common Values

Value	Count	Frequency (%)
202206	10000	100.0%
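The column name `년월일` means "year-month-day", but every value has six characters. The column therefore holds a year and month in `YYYYMM` form, so `202206` is June 2022. Here is a minimal parsing sketch using a hypothetical helper `parse_year_month`:

```python
from datetime import datetime


def parse_year_month(value):
    """Parse a six-digit YYYYMM string; the day defaults to 1."""
    return datetime.strptime(value, "%Y%m").date()


print(parse_year_month("202206"))  # 2022-06-01
```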

Length: histogram of the lengths of the category values

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7350
Distinct (%)	73.5%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	73552745

Minimum	-4.09024 × 10⁹
Maximum	9.0859515 × 10⁹
Zeros	2340
Zeros (%)	23.4%
Negative	348
Negative (%)	3.5%
Memory size	166.0 KiB

Quantile statistics

Minimum	-4.09024 × 10⁹
5-th percentile	0
Q1	0
median	3061335
Q3	34484036
95-th percentile	3.713755 × 10⁸
Maximum	9.0859515 × 10⁹
Range	1.3176191 × 10¹⁰
Interquartile range (IQR)	34484036
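The quantile figures can be checked by hand. Range = 9,085,951,481 − (−4,090,240,000) = 13,176,191,481 ≈ 1.3176191 × 10¹⁰, and IQR = Q3 − Q1 = 34,484,036 − 0. The sketch below computes the same summary for the ten `금액` values in the "First rows" sample. It assumes linear interpolation between order statistics, which is pandas' default for `Series.quantile`. `quantile_summary` is a hypothetical helper.

```python
import statistics


def quantile_summary(values):
    """Quartiles via linear interpolation, plus range and IQR."""
    # method="inclusive" interpolates linearly between order statistics, like the
    # default of pandas Series.quantile. ASSUMPTION: the report does the same.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {"min": lo, "Q1": q1, "median": median, "Q3": q3, "max": hi,
            "range": hi - lo, "IQR": q3 - q1}


# The ten 금액 values from the "First rows" sample.
amounts = [1639200, 0, 0, 4223430, 1071120, 95700, 0, 510328385, 759010, 23401980]
print(quantile_summary(amounts))

# Report check: Range = Maximum - Minimum.
print(9085951481 - (-4090240000))  # 13176191481
```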

Descriptive statistics

Standard deviation	2.9125409 × 10⁸
Coefficient of variation (CV)	3.9597991
Kurtosis	221.3177
Mean	73552745
Median Absolute Deviation (MAD)	3061335
Skewness	11.07729
Sum	7.3552745 × 10¹¹
Variance	8.4828947 × 10¹⁶
Monotonicity	Not monotonic
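Several of these figures follow from each other. CV = standard deviation / mean ≈ 2.9125409 × 10⁸ / 73,552,745 ≈ 3.9598, Variance = (standard deviation)², and Sum = mean × 10,000. The sketch below spells out these definitions. It assumes the sample standard deviation (dividing by n − 1, as pandas' `Series.std` does by default) and defines MAD as the median of absolute deviations from the median. `describe` is a hypothetical helper.

```python
import statistics


def describe(values):
    """Mean, sample standard deviation, variance, CV, MAD and sum."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    return {"mean": mean, "std": std, "variance": std ** 2,
            "cv": std / mean, "mad": mad, "sum": sum(values)}


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))

# Consistency checks against the report's rounded figures.
print(2.9125409e8 / 73552745)  # CV, about 3.9597991
print(73552745 * 10000)        # Sum, about 7.3552745 × 10^11
```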

Histogram with fixed-size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2340	23.4%
250000	28	0.3%
500000	23	0.2%
300000	17	0.2%
484000	15	0.1%
242000	15	0.1%
1000000	10	0.1%
10000000	8	0.1%
2000000	7	0.1%
100000	7	0.1%
Other values (7340)	7530	75.3%

Minimum 10 values

Value	Count	Frequency (%)
-4090240000	1	< 0.1%
-483769996	1	< 0.1%
-283117556	1	< 0.1%
-263942701	1	< 0.1%
-230922000	1	< 0.1%
-211155798	1	< 0.1%
-195908810	1	< 0.1%
-190422700	1	< 0.1%
-179333490	1	< 0.1%
-174771277	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9085951481	1	< 0.1%
7549587613	1	< 0.1%
6575495178	1	< 0.1%
5430910202	1	< 0.1%
4397584477	1	< 0.1%
4347672691	1	< 0.1%
4109028170	1	< 0.1%
4108913314	1	< 0.1%
4032492352	1	< 0.1%
3945671117	1	< 0.1%

Interactions

Scatter plot of 금액 against 금액

Correlations

Phik (φk) correlation table

	비용명	금액
비용명	1.000	0.287
금액	0.287	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: a nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
31324	도곡삼성래미안	A13550502	세대배부용비품	202206	1639200
27546	강일리버파크7단지	A13410010	선수난방비	202206	0
5106	꿈의숲코오롱하늘채아파트	A10026571	기타유형자산	202206	0
66627	마곡수명산파크2단지	A15728004	연차수당충당부채	202206	4223430
6540	상도효성해링턴플레이스	A10027472	소프트웨어	202206	1071120
13431	신수현대	A12185603	저장품	202206	95700
42514	잠실현대	A13886701	상여충당부채	202206	0
40378	가락1차현대아파트	A13820004	장기수선충당예금	202206	510328385
1622	종암sh빌아파트	A10024603	관리비미수금	202206	759010
4880	송파호반베르디움더퍼스트	A10026362	퇴직급여충당부채	202206	23401980

Last rows

	아파트명	아파트코드	비용명	년월일	금액
42180	송파파크데일1단지	A13881701	장기수선충당부채	202206	202946251
18988	신내6단지	A13176901	주차장충당예금	202206	12818669
56977	봉천은천1단지	A15106101	공동주택적립금	202206	78459615
57494	신림건영1차	A15185704	당기순이익	202206	11218179
3567	항동하버라인3단지	A10025614	장기수선충당예금	202206	81489513
70569	목동10단지	A15873701	기타공동주택관리비충당부채	202206	71572503
55636	보라매두산위브	A15086001	선수관리비	202206	32240000
23873	마장세림	A13305007	선급금	202206	73120
33071	삼선1SH-VILLE	A13604301	관리비예치금	202206	32841000
3320	항동하버라인2단지	A10025387	퇴직급여충당부채	202206	68960720
