gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
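The two memory figures above are related by simple arithmetic. The sketch below assumes that the average record size is the total in-memory size divided by the number of observations, and that 1 KiB is 1024 bytes; the variable names are illustrative only.

```python
# Assumption: average record size = total memory / number of observations.
KIB = 1024                    # bytes per KiB
total_kib = 488.3             # "Total size in memory" from the report
n_rows = 10_000               # "Number of observations" from the report

avg_record_bytes = total_kib * KIB / n_rows
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, this agrees with the 50.0 B reported above.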

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	서울특별시
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202010"	Constant
`금액` is highly skewed (γ1 = 35.10582981)	Skewed
`금액` has 2249 (22.5%) zeros	Zeros
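As a rough illustration of what the Skewed and Zeros alerts measure, the sketch below computes a moment-based skewness coefficient and the share of zero values. It assumes the population form g1 = m3 / m2^1.5; the profiling tool may apply a bias-adjusted estimator instead, so the result need not match γ1 above digit for digit. The function names and the toy data are hypothetical.

```python
def moment_skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    if m2 == 0:
        return float("nan")                          # undefined for constant data
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy data (not from the dataset): many zeros and one large value give a
# long right tail and therefore a positive skewness.
toy = [0, 0, 0, 10]
print(moment_skewness(toy), zero_share(toy))

# The Zeros alert itself is plain arithmetic on the reported counts.
print(f"{2249 / 10_000:.1%}")
```

A strongly positive coefficient, as reported for `금액`, means that a few very large amounts dominate the right tail of the distribution.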

Reproduction

Analysis started	2024-05-11 05:59:47.347414
Analysis finished	2024-05-11 05:59:48.456150
Duration	1.11 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명
Text

Distinct	2208
Distinct (%)	22.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.2989
Min length	2
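The length statistics can be read as character counts per value. The sketch below measures the five sample names listed later in this section, assuming that length is the number of Unicode code points (which is what Python's `len` returns for precomposed Hangul), and checks that the reported mean length equals the report's total character count divided by the number of observations.

```python
# Sample values taken from the "Sample" block of this variable.
sample_names = ["공덕삼성임대", "번동신원", "온수힐스테이트", "행당대림제2", "목동현대아이파크"]

# len() counts code points; each precomposed Hangul syllable counts as one.
lengths = [len(name) for name in sample_names]
sample_mean = sum(lengths) / len(lengths)
print(lengths, sample_mean)

# Mean length over the full column: total characters / observations.
reported_mean = 72_989 / 10_000
print(reported_mean)
```

The five-row sample mean naturally differs from the column-wide mean of 7.2989; only the second calculation reproduces the reported figure.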

Characters and Unicode

Total characters	72989
Distinct characters	437
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
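A minimal sketch of this kind of analysis, using Python's standard `unicodedata` module, is shown below. It counts the general category of each character in a string; the two-letter codes correspond to the category names used in this report (`Lo` = Other Letter, `Ll` = Lowercase Letter, `Nd` = Decimal Number, `Nl` = Letter Number, `Sm` = Math Symbol). The helper name `category_counts` is illustrative, and the standard library does not expose script or block names, so those breakdowns are not reproduced here.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories over the characters of `text`."""
    return Counter(unicodedata.category(ch) for ch in text)


# "e편한세상" appears in the value table of this variable:
# one lowercase Latin letter followed by four Hangul syllables.
counts = category_counts("e편한세상")
print(counts)

# Single characters that appear in the per-category tables of this report.
print(unicodedata.category("Ⅰ"), unicodedata.category("~"))
```

Summing such counters over every value in a column gives a per-category table of the kind shown under "Most occurring categories".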

Unique

Unique	124
Unique (%)	1.2%

Sample

1st row	공덕삼성임대
2nd row	번동신원
3rd row	온수힐스테이트
4th row	행당대림제2
5th row	목동현대아이파크

Value	Count	Frequency (%)
아파트	159	1.5%
래미안	25	0.2%
아이파크	25	0.2%
e편한세상	16	0.1%
sk뷰	16	0.1%
신반포	15	0.1%
고덕	14	0.1%
북한산	14	0.1%
2단지	13	0.1%
힐스테이트	13	0.1%
Other values (2276)	10359	97.1%

Most occurring characters

Value	Count	Frequency (%)
아	2464	3.4%
파	2374	3.3%
트	2201	3.0%
지	1846	2.5%
대	1844	2.5%
동	1689	2.3%
차	1503	2.1%
신	1491	2.0%
단	1442	2.0%
이	1348	1.8%
Other values (427)	54787	75.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66915	91.7%
Decimal Number	3712	5.1%
Space Separator	751	1.0%
Uppercase Letter	728	1.0%
Lowercase Letter	349	0.5%
Open Punctuation	143	0.2%
Close Punctuation	143	0.2%
Dash Punctuation	128	0.2%
Other Punctuation	112	0.2%
Letter Number	4	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2464	3.7%
파	2374	3.5%
트	2201	3.3%
지	1846	2.8%
대	1844	2.8%
동	1689	2.5%
차	1503	2.2%
신	1491	2.2%
단	1442	2.2%
이	1348	2.0%
Other values (381)	48713	72.8%

Uppercase Letter

Value	Count	Frequency (%)
S	120	16.5%
C	99	13.6%
K	95	13.0%
M	66	9.1%
D	66	9.1%
L	53	7.3%
H	38	5.2%
I	38	5.2%
G	37	5.1%
E	23	3.2%
Other values (7)	93	12.8%

Lowercase Letter

Value	Count	Frequency (%)
e	174	49.9%
l	42	12.0%
i	34	9.7%
s	24	6.9%
v	23	6.6%
k	15	4.3%
h	12	3.4%
w	7	2.0%
a	6	1.7%
c	6	1.7%

Decimal Number

Value	Count	Frequency (%)
1	1132	30.5%
2	1067	28.7%
3	496	13.4%
5	249	6.7%
4	237	6.4%
6	163	4.4%
7	119	3.2%
8	86	2.3%
9	84	2.3%
0	79	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	92	82.1%
.	20	17.9%

Space Separator

Value	Count	Frequency (%)
(space)	751	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	143	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	143	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	128	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	4	100.0%

Math Symbol

Value	Count	Frequency (%)
~	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66915	91.7%
Common	4993	6.8%
Latin	1081	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2464	3.7%
파	2374	3.5%
트	2201	3.3%
지	1846	2.8%
대	1844	2.8%
동	1689	2.5%
차	1503	2.2%
신	1491	2.2%
단	1442	2.2%
이	1348	2.0%
Other values (381)	48713	72.8%

Latin

Value	Count	Frequency (%)
e	174	16.1%
S	120	11.1%
C	99	9.2%
K	95	8.8%
M	66	6.1%
D	66	6.1%
L	53	4.9%
l	42	3.9%
H	38	3.5%
I	38	3.5%
Other values (19)	290	26.8%

Common

Value	Count	Frequency (%)
1	1132	22.7%
2	1067	21.4%
(space)	751	15.0%
3	496	9.9%
5	249	5.0%
4	237	4.7%
6	163	3.3%
(	143	2.9%
)	143	2.9%
-	128	2.6%
Other values (7)	484	9.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66915	91.7%
ASCII	6070	8.3%
Number Forms	4	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2464	3.7%
파	2374	3.5%
트	2201	3.3%
지	1846	2.8%
대	1844	2.8%
동	1689	2.5%
차	1503	2.2%
신	1491	2.2%
단	1442	2.2%
이	1348	2.0%
Other values (381)	48713	72.8%

ASCII

Value	Count	Frequency (%)
1	1132	18.6%
2	1067	17.6%
(space)	751	12.4%
3	496	8.2%
5	249	4.1%
4	237	3.9%
e	174	2.9%
6	163	2.7%
(	143	2.4%
)	143	2.4%
Other values (35)	1515	25.0%

Number Forms

Value	Count	Frequency (%)
Ⅰ	4	100.0%

아파트코드
Text

Distinct	2216
Distinct (%)	22.2%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

Unique

Unique	124
Unique (%)	1.2%

Sample

1st row	A12180404
2nd row	A14206306
3rd row	A15279101
4th row	A13377902
5th row	A15805102

Value	Count	Frequency (%)
a15210209	13	0.1%
a10027105	12	0.1%
a13985201	12	0.1%
a41279902	11	0.1%
a15284906	11	0.1%
a13386702	11	0.1%
a13986302	11	0.1%
a13790701	11	0.1%
a13981006	11	0.1%
a13380803	11	0.1%
Other values (2206)	9886	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18393	20.4%
1	17558	19.5%
A	9985	11.1%
3	8922	9.9%
2	8179	9.1%
5	6211	6.9%
8	5746	6.4%
7	4744	5.3%
4	3853	4.3%
6	3444	3.8%
Other values (2)	2965	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18393	23.0%
1	17558	21.9%
3	8922	11.2%
2	8179	10.2%
5	6211	7.8%
8	5746	7.2%
7	4744	5.9%
4	3853	4.8%
6	3444	4.3%
9	2950	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9985	99.9%
B	15	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18393	23.0%
1	17558	21.9%
3	8922	11.2%
2	8179	10.2%
5	6211	7.8%
8	5746	7.2%
7	4744	5.9%
4	3853	4.8%
6	3444	4.3%
9	2950	3.7%

Latin

Value	Count	Frequency (%)
A	9985	99.9%
B	15	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18393	20.4%
1	17558	19.5%
A	9985	11.1%
3	8922	9.9%
2	8179	9.1%
5	6211	6.9%
8	5746	6.4%
7	4744	5.3%
4	3853	4.3%
6	3444	3.8%
Other values (2)	2965	3.3%

비용명
Text

Distinct	76
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9971
Min length	2

Characters and Unicode

Total characters	59971
Distinct characters	106
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	예금
2nd row	미지급금
3rd row	비품
4th row	기타유동부채
5th row	선수전기료

Value	Count	Frequency (%)
퇴직급여충당부채	336	3.4%
당기순이익	332	3.3%
미처분이익잉여금	324	3.2%
비품	316	3.2%
관리비미수금	311	3.1%
예수금	308	3.1%
장기수선충당예금	308	3.1%
공동주택적립금	305	3.0%
예금	302	3.0%
장기수선충당부채	300	3.0%
Other values (66)	6858	68.6%

Most occurring characters

Value	Count	Frequency (%)
금	4635	7.7%
당	3793	6.3%
충	3099	5.2%
수	3066	5.1%
비	3025	5.0%
부	2970	5.0%
채	2681	4.5%
기	2417	4.0%
선	1868	3.1%
예	1767	2.9%
Other values (96)	30650	51.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59971	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4635	7.7%
당	3793	6.3%
충	3099	5.2%
수	3066	5.1%
비	3025	5.0%
부	2970	5.0%
채	2681	4.5%
기	2417	4.0%
선	1868	3.1%
예	1767	2.9%
Other values (96)	30650	51.1%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59971	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4635	7.7%
당	3793	6.3%
충	3099	5.2%
수	3066	5.1%
비	3025	5.0%
부	2970	5.0%
채	2681	4.5%
기	2417	4.0%
선	1868	3.1%
예	1767	2.9%
Other values (96)	30650	51.1%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59971	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4635	7.7%
당	3793	6.3%
충	3099	5.2%
수	3066	5.1%
비	3025	5.0%
부	2970	5.0%
채	2681	4.5%
기	2417	4.0%
선	1868	3.1%
예	1767	2.9%
Other values (96)	30650	51.1%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202010	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202010
2nd row	202010
3rd row	202010
4th row	202010
5th row	202010

Common Values

Value	Count	Frequency (%)
202010	10000	100.0%

Length

Histogram of lengths of the category

금액
Real number (ℝ)

SKEWED ZEROS

Distinct	7420
Distinct (%)	74.2%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	73639584

Minimum	-3.7738628 × 10⁸
Maximum	2.1615869 × 10¹⁰
Zeros	2249
Zeros (%)	22.5%
Negative	329
Negative (%)	3.3%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.7738628 × 10⁸
5-th percentile	0
Q1	0
median	3678608
Q3	38513282
95-th percentile	3.4265629 × 10⁸
Maximum	2.1615869 × 10¹⁰
Range	2.1993255 × 10¹⁰
Interquartile range (IQR)	38513282
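The Range and IQR rows follow directly from the other quantile rows. The sketch below recomputes them from the exact extreme values listed in the minimum and maximum tables of this report, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1).

```python
# Exact extremes from the "Minimum 10 values" / "Maximum 10 values" tables.
minimum = -377_386_276
maximum = 21_615_869_006

# Quartiles as reported above.
q1 = 0
q3 = 38_513_282

value_range = maximum - minimum   # spread of the whole column
iqr = q3 - q1                     # spread of the middle 50 % of values

print(f"{value_range:.7e}", iqr)
```

Because Q1 is 0, the IQR equals Q3, which is why the same number appears in both rows.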

Descriptive statistics

Standard deviation	3.9628152 × 10⁸
Coefficient of variation (CV)	5.3813656
Kurtosis	1775.2319
Mean	73639584
Median Absolute Deviation (MAD)	3678608
Skewness	35.10583
Sum	7.3639584 × 10¹¹
Variance	1.5703904 × 10¹⁷
Monotonicity	Not monotonic
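Several of the descriptive statistics above are functions of one another. The following sketch checks three of those relationships using the rounded figures from the table, assuming the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations). Small differences in the last digit come from the rounding in the report.

```python
import math

n = 10_000                 # number of observations
mean = 73_639_584          # reported mean
std = 3.9628152e8          # reported standard deviation

cv = std / mean            # coefficient of variation
variance = std ** 2        # variance
total = mean * n           # sum of all values

print(round(cv, 5), f"{variance:.7e}", f"{total:.7e}")

# Compare against the figures in the table, allowing for rounding.
print(math.isclose(cv, 5.3813656, rel_tol=1e-6))
print(math.isclose(variance, 1.5703904e17, rel_tol=1e-6))
```

A CV above 5 means the standard deviation is more than five times the mean, which is consistent with the heavy right tail flagged by the skewness alert.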

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2249	22.5%
500000	30	0.3%
250000	21	0.2%
300000	17	0.2%
1000000	15	0.1%
242000	14	0.1%
200000	14	0.1%
30000000	9	0.1%
484000	9	0.1%
3000000	8	0.1%
Other values (7410)	7614	76.1%

Minimum 10 values

Value	Count	Frequency (%)
-377386276	1	< 0.1%
-354885698	1	< 0.1%
-244322324	1	< 0.1%
-196126890	1	< 0.1%
-184997700	1	< 0.1%
-161866980	1	< 0.1%
-151802932	1	< 0.1%
-138881815	1	< 0.1%
-131194420	1	< 0.1%
-120561290	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
21615869006	1	< 0.1%
21537672006	1	< 0.1%
8961727613	1	< 0.1%
5564067547	1	< 0.1%
5264091580	1	< 0.1%
4931323810	1	< 0.1%
4523353887	1	< 0.1%
4076152578	1	< 0.1%
3868711988	1	< 0.1%
3488531881	1	< 0.1%

Phik (φk)

	비용명	금액
비용명	1.000	0.153
금액	0.153	1.000

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
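To make the idea concrete, the sketch below builds a small nullity matrix in plain Python: one row of booleans per record, true where a value is present. The first record is taken from the sample rows of this report; the second is a hypothetical record with a missing amount, added only so that the output shows a gap (the actual dataset has no missing cells). The function names and the `#`/`.` rendering are illustrative assumptions, not the tool's implementation.

```python
COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]


def nullity_matrix(rows, columns):
    """Return one list of booleans per row: True where a value is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]


def render(matrix):
    """Draw present cells as '#' and missing cells as '.'."""
    return "\n".join(
        "".join("#" if present else "." for present in line) for line in matrix
    )


rows = [
    # Real sample row from this report.
    {"아파트명": "공덕삼성임대", "아파트코드": "A12180404", "비용명": "예금",
     "년월일": "202010", "금액": 49069104},
    # Hypothetical row with a missing amount, for illustration only.
    {"아파트명": "번동신원", "아파트코드": "A14206306", "비용명": "미지급금",
     "년월일": "202010", "금액": None},
]

matrix = nullity_matrix(rows, COLUMNS)
print(render(matrix))

missing_cells = sum(not present for line in matrix for present in line)
print(missing_cells)
```

For the profiled dataset every cell would render as present, matching the "Missing cells 0" figure in the dataset statistics.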

First rows

	아파트명	아파트코드	비용명	년월일	금액
11016	공덕삼성임대	A12180404	예금	202010	49069104
47280	번동신원	A14206306	미지급금	202010	13946290
56093	온수힐스테이트	A15279101	비품	202010	39355080
23018	행당대림제2	A13377902	기타유동부채	202010	0
65736	목동현대아이파크	A15805102	선수전기료	202010	1020774
20634	방학벽산2차	A13283405	공동체활성화단체지원적립금	202010	500000
30154	역삼래미안	A13592706	기타시설운영충당부채	202010	0
7935	문화촌현대	A12009305	기타유동부채	202010	26780
28279	우성캐릭터199 아파트	A13527003	미부과관리비	202010	113162375
30298	대청	A13594007	주차장충당부채	202010	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
36207	잠원한신그린	A13790701	선급비용	202010	2863930
63101	방화동부센트레빌	A15722108	미부과관리비	202010	85268580
3620	힐스테이트 백련산4차 아파트	A10026834	전신전화가입권	202010	180000
26988	삼성롯데	A13509007	상여충당부채	202010	0
54941	신도림태영타운	A15205513	미처분이익잉여금	202010	0
33763	잠원동아	A13703027	저장품	202010	1655600
57211	구로현대상선	A15286802	공동주택적립금	202010	301133
4642	상도파크자이 아파트	A10027424	비품	202010	28722970
2011	신정이든채	A10025649	연차수당충당부채	202010	4384970
26176	강동현대홈타운	A13485301	당기순이익	202010	67694554
