gimi9 Pandas Profiling

Overview

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
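The headline figures above can be recomputed with plain pandas. This is a minimal sketch that assumes the data is already loaded into a DataFrame. The function name dataset_statistics is hypothetical, and the memory figure uses `memory_usage(deep=True)`, so it may not match the report's 488.3 KiB exactly.

```python
import pandas as pd


def dataset_statistics(df):
    """Recompute the headline 'Dataset statistics' figures (hypothetical helper)."""
    n_obs, n_vars = df.shape
    n_cells = n_obs * n_vars
    missing = int(df.isna().sum().sum())
    duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    # deep=True also counts the Python string objects held in object columns
    memory = int(df.memory_usage(deep=True).sum())
    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
        "memory_bytes": memory,
        "avg_record_bytes": memory / n_obs if n_obs else 0.0,
    }


# Usage with a tiny made-up frame
example = pd.DataFrame({"name": ["금호두산", "금호두산", "안암삼익"], "amount": [0, 0, None]})
print(dataset_statistics(example))
```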

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202302"	Constant
`금액` has 2419 (24.2%) zeros	Zeros
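A rough sketch of the two alert rules seen here. It assumes a column is flagged as constant when it has exactly one distinct value, and as having zeros when the share of zeros exceeds a cut-off. The 5% default for the hypothetical zeros_threshold parameter is an assumption for illustration, not ydata-profiling's documented setting.

```python
import pandas as pd


def simple_alerts(df, zeros_threshold=0.05):
    """Flag constant columns and numeric columns with many zeros.

    zeros_threshold is an assumed cut-off for this sketch, not a library default.
    """
    alerts = []
    for col in df.columns:
        s = df[col]
        if len(s) == 0:
            continue  # nothing to profile
        if s.nunique(dropna=False) == 1:
            alerts.append(f'`{col}` has constant value "{s.iloc[0]}"')
        elif pd.api.types.is_numeric_dtype(s):
            n_zeros = int((s == 0).sum())
            if n_zeros / len(s) > zeros_threshold:
                pct = 100 * n_zeros / len(s)
                alerts.append(f"`{col}` has {n_zeros} ({pct:.1f}%) zeros")
    return alerts
```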

Reproduction

Analysis started	2024-05-11 05:55:31.279310
Analysis finished	2024-05-11 05:55:32.513206
Duration	1.23 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

Variables

아파트명 (apartment name)
Text

Distinct	2260
Distinct (%)	22.6%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	28
Median length	20
Mean length	7.4382
Min length	2
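The length statistics summarise each value's character count. As a consistency check, the mean length times the number of rows gives the total character count reported for this column: 7.4382 × 10000 = 74382. A minimal pandas sketch (length_stats is a hypothetical helper):

```python
import pandas as pd


def length_stats(s):
    """Summaries of per-value character counts (hypothetical helper)."""
    lengths = s.astype(str).str.len()
    return {
        "max": int(lengths.max()),
        "median": float(lengths.median()),
        "mean": float(lengths.mean()),
        "min": int(lengths.min()),
        "total_characters": int(lengths.sum()),  # equals mean * number of rows
    }
```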

Characters and Unicode

Total characters	74382
Distinct characters	434
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
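Python's standard unicodedata module exposes the general category of each code point, which is enough to reproduce the category tables in this section. It has no script or block lookup, so those tables would need another source. A sketch, with the code-to-name mapping limited to the categories that occur in this dataset:

```python
import unicodedata
from collections import Counter

import pandas as pd

# Long names for the general-category codes that occur in this report
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(s):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for text in s.astype(str):
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the short code
    return counts


print(category_counts(pd.Series(["래미안(1단지)", "SK-Ⅰ"])))
```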

Unique

Unique	116
Unique (%)	1.2%
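"Unique" counts values that occur exactly once, which is why it (116, i.e. 116 / 10000 ≈ 1.2%) is much smaller than "Distinct" (2260). A minimal sketch of that reading of the statistic (unique_stats is a hypothetical helper):

```python
import pandas as pd


def unique_stats(s):
    """Count values that occur exactly once, and their share of all rows."""
    counts = s.value_counts(dropna=False)
    n_unique = int((counts == 1).sum())
    return n_unique, 100 * n_unique / len(s)
```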

Sample

1st row	올림픽훼밀리타운
2nd row	금호두산
3rd row	안암삼익
4th row	도곡경남
5th row	도봉삼환

Most occurring words

Value	Count	Frequency (%)
아파트	190	1.7%
래미안	42	0.4%
아이파크	34	0.3%
e편한세상	30	0.3%
sk뷰	21	0.2%
경남아너스빌	20	0.2%
고덕	15	0.1%
이편한세상	15	0.1%
래미안밤섬리베뉴	14	0.1%
백련산	14	0.1%
Other values (2345)	10469	96.4%
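The counts in this table add up to 10864, more than the 10000 rows, and the word table for 아파트코드 lists codes in lowercase (a12009304) although the samples are uppercase. This suggests values are lowercased and split on whitespace before counting. A sketch under that assumption (word_frequencies is a hypothetical helper):

```python
import pandas as pd


def word_frequencies(s, top=10):
    """Lowercase, split on whitespace, and count words (assumed tokenisation)."""
    words = s.astype(str).str.lower().str.split().explode()
    counts = words.value_counts()  # NaN from empty values is dropped here
    return pd.DataFrame({
        "count": counts,
        "frequency_pct": (100 * counts / counts.sum()).round(1),
    }).head(top)
```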

Most occurring characters

Value	Count	Frequency (%)
아	2564	3.4%
파	2553	3.4%
트	2356	3.2%
지	1898	2.6%
대	1645	2.2%
동	1574	2.1%
단	1500	2.0%
신	1467	2.0%
차	1455	2.0%
이	1438	1.9%
Other values (424)	55932	75.2%
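Character frequencies are taken over all characters of all values combined (74382 here, spaces included), so 아 at 2564 occurrences is 2564 / 74382 ≈ 3.4%. A sketch with collections.Counter (char_frequencies is a hypothetical helper):

```python
from collections import Counter

import pandas as pd


def char_frequencies(s, top=10):
    """Most common characters over all values combined, with % of all characters."""
    counts = Counter("".join(s.astype(str)))
    total = sum(counts.values())
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common(top)]
```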

Most occurring categories

Value	Count	Frequency (%)
Other Letter	67982	91.4%
Decimal Number	3721	5.0%
Space Separator	966	1.3%
Uppercase Letter	832	1.1%
Lowercase Letter	344	0.5%
Close Punctuation	149	0.2%
Open Punctuation	149	0.2%
Dash Punctuation	132	0.2%
Other Punctuation	100	0.1%
Letter Number	7	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2564	3.8%
파	2553	3.8%
트	2356	3.5%
지	1898	2.8%
대	1645	2.4%
동	1574	2.3%
단	1500	2.2%
신	1467	2.2%
차	1455	2.1%
이	1438	2.1%
Other values (379)	49532	72.9%

Uppercase Letter

Value	Count	Frequency (%)
S	135	16.2%
C	120	14.4%
K	100	12.0%
M	87	10.5%
D	87	10.5%
L	54	6.5%
H	54	6.5%
I	45	5.4%
E	37	4.4%
G	26	3.1%
Other values (7)	87	10.5%

Lowercase Letter

Value	Count	Frequency (%)
e	179	52.0%
l	32	9.3%
s	29	8.4%
i	29	8.4%
v	23	6.7%
k	21	6.1%
h	10	2.9%
w	9	2.6%
g	4	1.2%
c	4	1.2%

Decimal Number

Value	Count	Frequency (%)
1	1091	29.3%
2	1065	28.6%
3	509	13.7%
4	256	6.9%
5	226	6.1%
6	164	4.4%
7	123	3.3%
8	101	2.7%
9	98	2.6%
0	88	2.4%

Other Punctuation

Value	Count	Frequency (%)
,	78	78.0%
.	22	22.0%

Space Separator

Value	Count	Frequency (%)
(space)	966	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	149	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	149	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	132	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	7	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	67982	91.4%
Common	5217	7.0%
Latin	1183	1.6%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2564	3.8%
파	2553	3.8%
트	2356	3.5%
지	1898	2.8%
대	1645	2.4%
동	1574	2.3%
단	1500	2.2%
신	1467	2.2%
차	1455	2.1%
이	1438	2.1%
Other values (379)	49532	72.9%

Latin

Value	Count	Frequency (%)
e	179	15.1%
S	135	11.4%
C	120	10.1%
K	100	8.5%
M	87	7.4%
D	87	7.4%
L	54	4.6%
H	54	4.6%
I	45	3.8%
E	37	3.1%
Other values (19)	285	24.1%

Common

Value	Count	Frequency (%)
1	1091	20.9%
2	1065	20.4%
(space)	966	18.5%
3	509	9.8%
4	256	4.9%
5	226	4.3%
6	164	3.1%
)	149	2.9%
(	149	2.9%
-	132	2.5%
Other values (6)	510	9.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	67982	91.4%
ASCII	6393	8.6%
Number Forms	7	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2564	3.8%
파	2553	3.8%
트	2356	3.5%
지	1898	2.8%
대	1645	2.4%
동	1574	2.3%
단	1500	2.2%
신	1467	2.2%
차	1455	2.1%
이	1438	2.1%
Other values (379)	49532	72.9%

ASCII

Value	Count	Frequency (%)
1	1091	17.1%
2	1065	16.7%
(space)	966	15.1%
3	509	8.0%
4	256	4.0%
5	226	3.5%
e	179	2.8%
6	164	2.6%
)	149	2.3%
(	149	2.3%
Other values (34)	1639	25.6%

Number Forms

Value	Count	Frequency (%)
Ⅰ	7	100.0%

아파트코드 (apartment code)
Text

Distinct	2265
Distinct (%)	22.7%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9
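Every code is exactly 9 characters long, and the character tables for this column show 10000 letters (A or B) and 80000 digits, i.e. one letter and eight digits per code. A validation sketch under that inferred format; the pattern is an assumption drawn from this profile, not a documented specification of the code:

```python
import pandas as pd

# Assumed format inferred from this profile: one uppercase letter, then 8 digits
CODE_PATTERN = r"[A-Z][0-9]{8}"


def invalid_codes(s):
    """Return the values that do not match CODE_PATTERN exactly."""
    matches = s.astype(str).str.fullmatch(CODE_PATTERN)
    return s[~matches]
```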

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	117
Unique (%)	1.2%

Sample

1st row	A13820201
2nd row	A13380703
3rd row	A13607301
4th row	A13527008
5th row	A13201207

Most occurring words

Value	Count	Frequency (%)
a12009304	13	0.1%
a15885514	13	0.1%
a12119004	13	0.1%
a15884703	12	0.1%
a15205301	12	0.1%
a41279923	12	0.1%
a15005001	12	0.1%
a13676101	11	0.1%
a15106001	11	0.1%
a14277601	11	0.1%
Other values (2255)	9880	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18543	20.6%
1	17536	19.5%
A	9989	11.1%
3	8810	9.8%
2	8287	9.2%
5	6207	6.9%
8	5483	6.1%
7	4654	5.2%
4	3946	4.4%
6	3356	3.7%
Other values (2)	3189	3.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18543	23.2%
1	17536	21.9%
3	8810	11.0%
2	8287	10.4%
5	6207	7.8%
8	5483	6.9%
7	4654	5.8%
4	3946	4.9%
6	3356	4.2%
9	3178	4.0%

Uppercase Letter

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18543	23.2%
1	17536	21.9%
3	8810	11.0%
2	8287	10.4%
5	6207	7.8%
8	5483	6.9%
7	4654	5.8%
4	3946	4.9%
6	3356	4.2%
9	3178	4.0%

Latin

Value	Count	Frequency (%)
A	9989	99.9%
B	11	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18543	20.6%
1	17536	19.5%
A	9989	11.1%
3	8810	9.8%
2	8287	9.2%
5	6207	6.9%
8	5483	6.1%
7	4654	5.2%
4	3946	4.4%
6	3356	3.7%
Other values (2)	3189	3.5%

비용명 (expense item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	6.0126
Min length	2

Characters and Unicode

Total characters	60126
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	일반관리비충당부채
2nd row	기타당좌자산
3rd row	비품
4th row	저장품
5th row	기타인건비충당부채

Most occurring words

Value	Count	Frequency (%)
퇴직급여충당부채	331	3.3%
미처분이익잉여금	312	3.1%
선급비용	308	3.1%
공동주택적립금	307	3.1%
예수금	304	3.0%
당기순이익	303	3.0%
장기수선충당부채	301	3.0%
미부과관리비	301	3.0%
예금	300	3.0%
가수금	299	3.0%
Other values (67)	6934	69.3%

Most occurring characters

Value	Count	Frequency (%)
금	4610	7.7%
당	3792	6.3%
비	3073	5.1%
수	3020	5.0%
충	3018	5.0%
부	2945	4.9%
채	2627	4.4%
기	2478	4.1%
선	1861	3.1%
예	1765	2.9%
Other values (97)	30937	51.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60126	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4610	7.7%
당	3792	6.3%
비	3073	5.1%
수	3020	5.0%
충	3018	5.0%
부	2945	4.9%
채	2627	4.4%
기	2478	4.1%
선	1861	3.1%
예	1765	2.9%
Other values (97)	30937	51.5%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60126	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4610	7.7%
당	3792	6.3%
비	3073	5.1%
수	3020	5.0%
충	3018	5.0%
부	2945	4.9%
채	2627	4.4%
기	2478	4.1%
선	1861	3.1%
예	1765	2.9%
Other values (97)	30937	51.5%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60126	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4610	7.7%
당	3792	6.3%
비	3073	5.1%
수	3020	5.0%
충	3018	5.0%
부	2945	4.9%
채	2627	4.4%
기	2478	4.1%
선	1861	3.1%
예	1765	2.9%
Other values (97)	30937	51.5%

년월일 (date; literally "year-month-day")
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202302
2nd row	202302
3rd row	202302
4th row	202302
5th row	202302

Common Values

Value	Count	Frequency (%)
202302	10000	100.0%
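Although the column name 년월일 means "year-month-day", its only value, 202302, has six digits and reads as a year and month (YYYYMM). Assuming that interpretation, a sketch that parses it into a monthly period (parse_year_month is a hypothetical helper):

```python
import pandas as pd


def parse_year_month(s):
    """Parse YYYYMM strings such as "202302" into monthly periods (assumed format)."""
    return pd.to_datetime(s.astype(str), format="%Y%m").dt.to_period("M")
```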

Length

(Figure: histogram of lengths of the category)

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7274
Distinct (%)	72.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	79846222

Minimum	-3.8900128 × 10⁸
Maximum	9.1331628 × 10⁹
Zeros	2419
Zeros (%)	24.2%
Negative	366
Negative (%)	3.7%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.8900128 × 10⁸
5th percentile	0
Q1	0
median	2820777
Q3	37285740
95th percentile	3.8858736 × 10⁸
Maximum	9.1331628 × 10⁹
Range	9.5221641 × 10⁹
Interquartile range (IQR)	37285740
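The derived quantities follow from the others: IQR = Q3 − Q1 = 37285740 − 0, and Range = Maximum − Minimum = 9.1331628 × 10⁹ + 3.8900128 × 10⁸ ≈ 9.5221641 × 10⁹. A pandas sketch (quantile_stats is a hypothetical helper); it uses pandas' default linear interpolation for quantiles, which may differ slightly from the method the report uses:

```python
import pandas as pd


def quantile_stats(s):
    """Quantile summary in the report's order (pandas linear interpolation)."""
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    return {
        "minimum": s.min(),
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "maximum": s.max(),
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
    }
```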

Descriptive statistics

Standard deviation	3.2030266 × 10⁸
Coefficient of variation (CV)	4.0114943
Kurtosis	217.63251
Mean	79846222
Median Absolute Deviation (MAD)	2820777
Skewness	11.966017
Sum	7.9846222 × 10¹¹
Variance	1.025938 × 10¹⁷
Monotonicity	Not monotonic
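Several of these statistics are linked: CV = standard deviation / mean = 3.2030266 × 10⁸ / 79846222 ≈ 4.0115, and Variance = (standard deviation)² ≈ 1.0259 × 10¹⁷. MAD is the median of |x − median|; in this column it happens to equal the median itself (2820777). A sketch using pandas' built-in estimators (descriptive_stats is a hypothetical helper); whether the report uses exactly these estimators (sample standard deviation with ddof=1, pandas' bias-corrected skewness and excess kurtosis) is an assumption:

```python
import pandas as pd


def descriptive_stats(s):
    """Descriptive statistics in the style of the report, via pandas estimators."""
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1), the pandas default
    median = s.median()
    return {
        "std": std,
        "cv": std / mean,
        "variance": s.var(),
        "mad": (s - median).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),  # excess kurtosis: 0 for a normal distribution
        "sum": s.sum(),
        "monotonic": s.is_monotonic_increasing or s.is_monotonic_decreasing,
    }
```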

(Figure: histogram with fixed-size bins, bins=50)

Common values

Value	Count	Frequency (%)
0	2419	24.2%
250000	25	0.2%
500000	23	0.2%
300000	21	0.2%
1000000	14	0.1%
242000	10	0.1%
5000000	10	0.1%
20000000	10	0.1%
200000	10	0.1%
2000000	9	0.1%
Other values (7264)	7449	74.5%

Minimum 10 values

Value	Count	Frequency (%)
-389001283	1	< 0.1%
-320228510	1	< 0.1%
-270960260	1	< 0.1%
-267511299	1	< 0.1%
-265018532	1	< 0.1%
-225095971	1	< 0.1%
-203175750	1	< 0.1%
-195908810	1	< 0.1%
-182920300	1	< 0.1%
-178315838	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
9133162807	1	< 0.1%
8393858825	1	< 0.1%
7134274724	1	< 0.1%
6784507682	1	< 0.1%
5560181409	1	< 0.1%
5404984250	1	< 0.1%
5147081652	1	< 0.1%
4763166020	1	< 0.1%
4720150243	1	< 0.1%
4698630343	1	< 0.1%
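The two tables list the smallest and largest distinct values with their counts. A minimal sketch with value_counts (extreme_values is a hypothetical helper):

```python
import pandas as pd


def extreme_values(s, n=10):
    """The n smallest and n largest distinct values with their counts."""
    counts = s.value_counts()
    smallest = counts.sort_index().head(n)
    largest = counts.sort_index(ascending=False).head(n)
    return smallest, largest
```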

Interactions

(Figure: interaction plot of 금액 against 금액)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.448
금액	0.448	1.000

Missing values

Count
A simple visualization of nullity by column.

Matrix
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
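The plots in this section have a simple tabular counterpart: per-column counts of present and missing values. All five columns here have 0 missing cells. A sketch (nullity_summary is a hypothetical helper):

```python
import pandas as pd


def nullity_summary(df):
    """Per-column present/missing counts, a tabular stand-in for the nullity plots."""
    return pd.DataFrame({
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "missing_pct": 100 * df.isna().mean(),
    })
```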

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
41014	올림픽훼밀리타운	A13820201	일반관리비충당부채	202302	0
26226	금호두산	A13380703	기타당좌자산	202302	0
34106	안암삼익	A13607301	비품	202302	5806100
31402	도곡경남	A13527008	저장품	202302	200200
21424	도봉삼환	A13201207	기타인건비충당부채	202302	0
21748	방학신동아1단지	A13202312	선수수도료	202302	0
49028	상계한양	A13994302	미지급금	202302	163908150
69331	화곡대림아파트	A15788302	장기수선충당예금	202302	289800634
70529	신정동일하이빌	A15807315	비품	202302	49128600
2779	이편한세상서울대입구2차(5단지)	A10024894	수선유지비충당부채	202302	21748380

Last rows

	아파트명	아파트코드	비용명	년월일	금액
34656	길음삼부	A13611004	기타의비유동부채	202302	1231200
22051	쌍문금호1차아파트	A13203408	선수수도료	202302	0
35393	길음서희스타힐스	A13613012	단기대여금	202302	20399055
33683	삼선1SH-VILLE	A13604301	예수금	202302	743960
11166	신촌럭키	A12017001	선급금	202302	649940
68323	등촌삼성한사랑	A15783905	예금	202302	109278650
3818	백련산 sk뷰 아이파크	A10025310	저장품	202302	957000
28555	고덕현대아파트	A13478601	공동주택적립금	202302	118950404
55798	대림우성	A15081503	연차수당충당부채	202302	8949886
62728	신대방경남아너스빌	A15601103	장기수선충당예금	202302	1034530827
