gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
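
The two memory figures above can be cross-checked against the number of observations. The sketch below assumes that "KiB" means 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```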

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "202003"	Constant
`금액` has 2093 (20.9%) zeros	Zeros
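
The two alerts correspond to simple column-level checks. The following is a minimal sketch of such checks, not ydata-profiling's actual implementation: the helper names `is_constant` and `zero_fraction` are hypothetical, and the report does not state the threshold at which a "Zeros" alert fires. The data are the ten sample rows shown in this report.

```python
# (년월일, 금액) pairs from the ten "First rows" sample rows of this report.
rows = [
    ("202003", 0),
    ("202003", 0),
    ("202003", 5419802),
    ("202003", 0),
    ("202003", 0),
    ("202003", 55782045),
    ("202003", 1426219),
    ("202003", 79455000),
    ("202003", 261435643),
    ("202003", 1439410),
]


def is_constant(values):
    """Return True when a column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_fraction(values):
    """Return the share of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


dates = [d for d, _ in rows]
amounts = [a for _, a in rows]

print("년월일 constant:", is_constant(dates))
print("금액 zero share:", zero_fraction(amounts))
```

On the full 10000-row dataset the same ratio is 2093 / 10000, which matches the 20.9% reported in the alert.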

Reproduction

Analysis started	2024-05-11 06:00:32.860594
Analysis finished	2024-05-11 06:00:33.981728
Duration	1.12 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
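
The reported duration follows directly from the two timestamps. A small check using only the standard library (the format string is an assumption based on how the timestamps are printed above):

```python
from datetime import datetime

# Assumed timestamp layout, matching the values printed in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 06:00:32.860594", FMT)
finished = datetime.strptime("2024-05-11 06:00:33.981728", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```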

아파트명
Text

Distinct	2170
Distinct (%)	21.7%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	24
Median length	21
Mean length	7.226
Min length	2

Characters and Unicode

Total characters	72260
Distinct characters	432
Distinct categories	11
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
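
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below is only an approximation of what the report tabulates, applied to two 아파트명 values from the sample rows; the function name `category_counts` is hypothetical. In this scheme "Lo" is Other Letter, "Nd" is Decimal Number, and "Zs" is Space Separator.

```python
import unicodedata
from collections import Counter

# Two 아파트명 values taken from the sample rows of this report.
names = ["아크로리버뷰 신반포", "방학우성2차"]


def category_counts(texts):
    """Count Unicode general categories over all characters in texts."""
    counts = Counter()
    for text in texts:
        # Normalise to precomposed syllables so each Hangul syllable is one code point.
        for ch in unicodedata.normalize("NFC", text):
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(names)
for category, count in counts.most_common():
    print(category, count)
```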

Unique

Unique	110
Unique (%)	1.1%

Sample

1st row	아크로리버뷰 신반포
2nd row	치현마을동일스위트리버아파트
3rd row	삼각산아이원임대
4th row	방학우성2차
5th row	현대성우

Value	Count	Frequency (%)
아파트	115	1.1%
래미안	31	0.3%
힐스테이트	20	0.2%
입주자대표회의	15	0.1%
신동아파밀리에	14	0.1%
북한산	14	0.1%
고덕	13	0.1%
창동금용	13	0.1%
은평뉴타운상림마을6단지	13	0.1%
마곡수명산파크1단지	12	0.1%
Other values (2231)	10268	97.5%
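
The counts in this table sum to 10528, more than the 10000 observations, which suggests that they are counts of whitespace-separated words rather than of whole values. The sketch below shows one way such counts could arise. It assumes lowercasing and splitting on whitespace, which is a guess about the profiler's behaviour, and `word_counts` is a hypothetical helper.

```python
from collections import Counter

# Three 아파트명 values taken from the sample rows of this report.
names = [
    "아크로리버뷰 신반포",
    "용두 롯데 캐슬리치 아파트",
    "가락대림아파트",
]


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    return Counter(token for value in values for token in value.lower().split())


counts = word_counts(names)
print(counts.most_common(3))
```

Under this assumption "아파트" is counted only where it stands alone as a word, so a name such as "가락대림아파트" contributes a single token of its own.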

Most occurring characters

Value	Count	Frequency (%)
아	2230	3.1%
파	2151	3.0%
트	1949	2.7%
대	1871	2.6%
지	1851	2.6%
동	1618	2.2%
차	1538	2.1%
신	1473	2.0%
단	1449	2.0%
이	1299	1.8%
Other values (422)	54831	75.9%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66151	91.5%
Decimal Number	3843	5.3%
Uppercase Letter	747	1.0%
Space Separator	580	0.8%
Lowercase Letter	355	0.5%
Dash Punctuation	154	0.2%
Close Punctuation	143	0.2%
Open Punctuation	143	0.2%
Other Punctuation	132	0.2%
Letter Number	8	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2230	3.4%
파	2151	3.3%
트	1949	2.9%
대	1871	2.8%
지	1851	2.8%
동	1618	2.4%
차	1538	2.3%
신	1473	2.2%
단	1449	2.2%
이	1299	2.0%
Other values (376)	48722	73.7%

Uppercase Letter

Value	Count	Frequency (%)
S	110	14.7%
C	102	13.7%
K	98	13.1%
M	61	8.2%
D	61	8.2%
L	56	7.5%
I	42	5.6%
H	38	5.1%
E	32	4.3%
G	31	4.1%
Other values (7)	116	15.5%

Lowercase Letter

Value	Count	Frequency (%)
e	211	59.4%
l	34	9.6%
i	29	8.2%
v	20	5.6%
s	15	4.2%
k	13	3.7%
c	10	2.8%
w	8	2.3%
h	7	2.0%
a	4	1.1%

Decimal Number

Value	Count	Frequency (%)
1	1222	31.8%
2	1068	27.8%
3	515	13.4%
4	266	6.9%
5	218	5.7%
6	169	4.4%
7	117	3.0%
0	97	2.5%
9	91	2.4%
8	80	2.1%

Other Punctuation

Value	Count	Frequency (%)
,	107	81.1%
.	25	18.9%

Space Separator

Value	Count	Frequency (%)
(space)	580	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	154	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	143	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	143	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	8	100.0%

Math Symbol

Value	Count	Frequency (%)
~	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66151	91.5%
Common	4999	6.9%
Latin	1110	1.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2230	3.4%
파	2151	3.3%
트	1949	2.9%
대	1871	2.8%
지	1851	2.8%
동	1618	2.4%
차	1538	2.3%
신	1473	2.2%
단	1449	2.2%
이	1299	2.0%
Other values (376)	48722	73.7%

Latin

Value	Count	Frequency (%)
e	211	19.0%
S	110	9.9%
C	102	9.2%
K	98	8.8%
M	61	5.5%
D	61	5.5%
L	56	5.0%
I	42	3.8%
H	38	3.4%
l	34	3.1%
Other values (19)	297	26.8%

Common

Value	Count	Frequency (%)
1	1222	24.4%
2	1068	21.4%
(space)	580	11.6%
3	515	10.3%
4	266	5.3%
5	218	4.4%
6	169	3.4%
-	154	3.1%
)	143	2.9%
(	143	2.9%
Other values (7)	521	10.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66151	91.5%
ASCII	6101	8.4%
Number Forms	8	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2230	3.4%
파	2151	3.3%
트	1949	2.9%
대	1871	2.8%
지	1851	2.8%
동	1618	2.4%
차	1538	2.3%
신	1473	2.2%
단	1449	2.2%
이	1299	2.0%
Other values (376)	48722	73.7%

ASCII

Value	Count	Frequency (%)
1	1222	20.0%
2	1068	17.5%
(space)	580	9.5%
3	515	8.4%
4	266	4.4%
5	218	3.6%
e	211	3.5%
6	169	2.8%
-	154	2.5%
)	143	2.3%
Other values (35)	1555	25.5%

Number Forms

Value	Count	Frequency (%)
Ⅰ	8	100.0%

아파트코드
Text

Distinct	2176
Distinct (%)	21.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	111
Unique (%)	1.1%

Sample

1st row	A10026227
2nd row	A15722304
3rd row	A14210001
4th row	A13282510
5th row	A14281701

Value	Count	Frequency (%)
a13204201	13	0.1%
a15728008	12	0.1%
a13611006	11	0.1%
a41279932	11	0.1%
a13078701	11	0.1%
a13302204	11	0.1%
a13528103	11	0.1%
a13309402	11	0.1%
a13812004	11	0.1%
a12220005	11	0.1%
Other values (2166)	9887	98.9%

Most occurring characters

Value	Count	Frequency (%)
0	18300	20.3%
1	17643	19.6%
A	9991	11.1%
3	9010	10.0%
2	8265	9.2%
5	6128	6.8%
8	5658	6.3%
7	4845	5.4%
4	3811	4.2%
6	3309	3.7%
Other values (2)	3040	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18300	22.9%
1	17643	22.1%
3	9010	11.3%
2	8265	10.3%
5	6128	7.7%
8	5658	7.1%
7	4845	6.1%
4	3811	4.8%
6	3309	4.1%
9	3031	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9991	99.9%
B	9	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18300	22.9%
1	17643	22.1%
3	9010	11.3%
2	8265	10.3%
5	6128	7.7%
8	5658	7.1%
7	4845	6.1%
4	3811	4.8%
6	3309	4.1%
9	3031	3.8%

Latin

Value	Count	Frequency (%)
A	9991	99.9%
B	9	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18300	20.3%
1	17643	19.6%
A	9991	11.1%
3	9010	10.0%
2	8265	9.2%
5	6128	6.8%
8	5658	6.3%
7	4845	5.4%
4	3811	4.2%
6	3309	3.7%
Other values (2)	3040	3.4%

비용명
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	6.0008
Min length	2

Characters and Unicode

Total characters	60008
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	1
Unique (%)	< 0.1%

Sample

1st row	미처분이익잉여금
2nd row	상여충당부채
3rd row	가수금
4th row	비품
5th row	기타충당부채

Value	Count	Frequency (%)
관리비미수금	340	3.4%
미처분이익잉여금	334	3.3%
선급비용	332	3.3%
퇴직급여충당부채	326	3.3%
장기수선충당부채	308	3.1%
당기순이익	305	3.0%
예금	299	3.0%
미부과관리비	298	3.0%
연차수당충당부채	294	2.9%
예수금	293	2.9%
Other values (67)	6871	68.7%

Most occurring characters

Value	Count	Frequency (%)
금	4687	7.8%
당	3747	6.2%
수	3187	5.3%
비	3052	5.1%
충	3044	5.1%
부	2966	4.9%
채	2650	4.4%
기	2356	3.9%
선	1941	3.2%
급	1706	2.8%
Other values (97)	30672	51.1%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	60008	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4687	7.8%
당	3747	6.2%
수	3187	5.3%
비	3052	5.1%
충	3044	5.1%
부	2966	4.9%
채	2650	4.4%
기	2356	3.9%
선	1941	3.2%
급	1706	2.8%
Other values (97)	30672	51.1%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	60008	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4687	7.8%
당	3747	6.2%
수	3187	5.3%
비	3052	5.1%
충	3044	5.1%
부	2966	4.9%
채	2650	4.4%
기	2356	3.9%
선	1941	3.2%
급	1706	2.8%
Other values (97)	30672	51.1%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	60008	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4687	7.8%
당	3747	6.2%
수	3187	5.3%
비	3052	5.1%
충	3044	5.1%
부	2966	4.9%
채	2650	4.4%
기	2356	3.9%
선	1941	3.2%
급	1706	2.8%
Other values (97)	30672	51.1%

년월일
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

202003	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202003
2nd row	202003
3rd row	202003
4th row	202003
5th row	202003

Common Values

Value	Count	Frequency (%)
202003	10000	100.0%

Length

[Figure: histogram of lengths of the category]

금액
Real number (ℝ)

ZEROS

Distinct	7576
Distinct (%)	75.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	68755997

Minimum	-2.7179173 × 10⁸
Maximum	8.9543179 × 10⁹
Zeros	2093
Zeros (%)	20.9%
Negative	338
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-2.7179173 × 10⁸
5-th percentile	0
Q1	7020
median	3253005
Q3	34539869
95-th percentile	3.38867 × 10⁸
Maximum	8.9543179 × 10⁹
Range	9.2261096 × 10⁹
Interquartile range (IQR)	34532849
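
The range and the interquartile range follow from the other entries in this table. A short check, using the exact minimum and maximum listed under "Minimum 10 values" and "Maximum 10 values" in this report:

```python
# Exact extremes from the "Minimum 10 values" / "Maximum 10 values" tables.
minimum = -271_791_731
maximum = 8_954_317_866

# Quartiles from the Quantile statistics table.
q1 = 7_020
q3 = 34_539_869

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"Range: {value_range:.7e}")
print(f"IQR:   {iqr}")
```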

Descriptive statistics

Standard deviation	2.7535831 × 10⁸
Coefficient of variation (CV)	4.0048624
Kurtosis	361.92368
Mean	68755997
Median Absolute Deviation (MAD)	3253005
Skewness	14.868295
Sum	6.8755997 × 10¹¹
Variance	7.5822197 × 10¹⁶
Monotonicity	Not monotonic
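
Several of the descriptive statistics are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The check below uses the rounded figures printed above, so agreement is only expected up to rounding.

```python
import math

# Rounded figures as printed in the report.
mean = 68_755_997
std = 2.7535831e8
n = 10_000

cv = std / mean       # Coefficient of variation
variance = std ** 2   # Variance
total = mean * n      # Sum

print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum:      {total:.7e}")
```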

[Figure: histogram with fixed size bins (bins=50)]

Common Values

Value	Count	Frequency (%)
0	2093	20.9%
250000	25	0.2%
500000	22	0.2%
2000000	19	0.2%
242000	17	0.2%
3000000	14	0.1%
484000	13	0.1%
1000000	13	0.1%
300000	13	0.1%
30000000	8	0.1%
Other values (7566)	7763	77.6%

Minimum 10 values

Value	Count	Frequency (%)
-271791731	1	< 0.1%
-240552040	1	< 0.1%
-137360880	1	< 0.1%
-94069330	1	< 0.1%
-88304416	1	< 0.1%
-86645150	1	< 0.1%
-83223130	1	< 0.1%
-78679350	1	< 0.1%
-78673037	1	< 0.1%
-73095100	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
8954317866	1	< 0.1%
8880669168	1	< 0.1%
8003008457	1	< 0.1%
6503858315	1	< 0.1%
5907120472	1	< 0.1%
4789802542	1	< 0.1%
3757061363	1	< 0.1%
3467278462	1	< 0.1%
3000016676	1	< 0.1%
2849974319	1	< 0.1%

Interactions

[Figure: interaction plot of 금액 against 금액]

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.384
금액	0.384	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
2011	아크로리버뷰 신반포	A10026227	미처분이익잉여금	202003	0
62635	치현마을동일스위트리버아파트	A15722304	상여충당부채	202003	0
46711	삼각산아이원임대	A14210001	가수금	202003	5419802
19684	방학우성2차	A13282510	비품	202003	0
47291	현대성우	A14281701	기타충당부채	202003	0
66940	목동1단지	A15875101	비품	202003	55782045
63884	등촌라인	A15783806	예수금	202003	1426219
30060	돈암범양	A13606102	선수관리비	202003	79455000
19365	방학동부센트레빌	A13272102	장기수선충당부채	202003	261435643
58493	보라매코오롱하늘채	A15602002	경비비충당부채	202003	1439410

Last rows

	아파트명	아파트코드	비용명	년월일	금액
9671	마포래미안푸르지오	A12175203	수선유지비충당부채	202003	0
7746	연희성원	A12071101	비품감가상각누계액	202003	-14151650
31458	동일하이빌뉴시티	A13613011	기타유형자산	202003	2561500
38293	가락대림아파트	A13880204	당기순이익	202003	7108743
34625	방배1차현대	A13785203	장기수선충당부채	202003	382472336
61337	등촌8단지주공아파트	A15703301	미수수익	202003	2720
6072	명륜아남1차	A11052201	미수금	202003	2966090
5023	용두 롯데 캐슬리치 아파트	A10028080	장기수선충당부채	202003	160166324
27921	래미안대치하이스턴	A13528007	미수금	202003	0
47443	화양현대	A14313001	선수관리비	202003	43536000
