gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value ""	Constant
`금액` is highly skewed (γ1 = 27.55818781)	Skewed
`금액` has 2245 (22.4%) zeros	Zeros
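
The three alert types above (Constant, Skewed, Zeros) can be illustrated with a minimal standard-library sketch. This is not the profiler's own implementation. The helper names and the sample values are hypothetical, and the skewness shown is the plain moment coefficient γ1 = m3 / m2^1.5, whereas the profiler may apply a bias-corrected variant.

```python
import math

def is_constant(values):
    """True when a column holds a single distinct value (the 'Constant' alert)."""
    return len(set(values)) == 1

def zero_share(values):
    """Fraction of entries equal to zero (the 'Zeros' alert reports this as a percentage)."""
    return sum(1 for v in values if v == 0) / len(values)

def skewness(values):
    """Moment coefficient of skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return math.nan  # undefined for a constant column
    return m3 / m2 ** 1.5

# Hypothetical toy columns, not taken from the dataset.
dates = ["202105", "202105", "202105"]
amounts = [0, 0, 1, 2, 100]

print(is_constant(dates), zero_share(amounts), skewness(amounts))
```

Applied to the figures in this report, the zero share is 2245 / 10000, and a large positive skewness such as 27.56 indicates a long right tail of very large amounts.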

Reproduction

Analysis started	2024-05-11 05:59:08.595582
Analysis finished	2024-05-11 05:59:09.541305
Duration	0.95 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

아파트명 (apartment name)
Text

Distinct	2230
Distinct (%)	22.3%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	21
Median length	19
Mean length	7.2722
Min length	2

Characters and Unicode

Total characters	72722
Distinct characters	437
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
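
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` and the label mapping are illustrative assumptions; the sample string is one of the values shown in this report. Script and block properties are not exposed by the standard library and are not covered here.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "노원 센트럴푸르지오"
counts = category_counts(sample)
for code, count in counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the Korean text columns.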

Unique

Unique	144
Unique (%)	1.4%
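
The report lists Distinct and Unique separately. Assuming the usual reading (Distinct counts the different values present, Unique counts the values that occur exactly once), the difference can be sketched as follows. The function name and the toy list are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): number of different values, and number seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column: one code repeats, two codes appear once.
codes = ["A12127007", "A10025133", "A12127007", "A13885306"]
print(distinct_and_unique(codes))
```

Under this reading, a column with many repeated values (as here, where each apartment contributes several rows) has far fewer unique values than distinct ones.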

Sample

1st row	상암월드컵1단지
2nd row	노원 센트럴푸르지오
3rd row	가락삼익맨션
4th row	묵동금호어울림
5th row	신길남서울

Value	Count	Frequency (%)
아파트	143	1.3%
래미안	26	0.2%
아이파크	17	0.2%
래미안수유	16	0.2%
래미안밤섬리베뉴	14	0.1%
e편한세상	14	0.1%
신도림현대	14	0.1%
고덕	14	0.1%
신반포	13	0.1%
경남아너스빌	13	0.1%
Other values (2298)	10321	97.3%

Most occurring characters

Value	Count	Frequency (%)
아	2530	3.5%
파	2440	3.4%
트	2244	3.1%
대	1811	2.5%
지	1794	2.5%
동	1661	2.3%
차	1512	2.1%
신	1443	2.0%
단	1408	1.9%
성	1309	1.8%
Other values (427)	54570	75.0%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	66726	91.8%
Decimal Number	3684	5.1%
Uppercase Letter	724	1.0%
Space Separator	685	0.9%
Lowercase Letter	318	0.4%
Open Punctuation	156	0.2%
Close Punctuation	156	0.2%
Dash Punctuation	136	0.2%
Other Punctuation	131	0.2%
Letter Number	6	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2530	3.8%
파	2440	3.7%
트	2244	3.4%
대	1811	2.7%
지	1794	2.7%
동	1661	2.5%
차	1512	2.3%
신	1443	2.2%
단	1408	2.1%
성	1309	2.0%
Other values (382)	48574	72.8%

Uppercase Letter

Value	Count	Frequency (%)
S	133	18.4%
K	91	12.6%
C	85	11.7%
L	69	9.5%
H	55	7.6%
M	52	7.2%
D	52	7.2%
I	38	5.2%
E	34	4.7%
G	26	3.6%
Other values (7)	89	12.3%

Lowercase Letter

Value	Count	Frequency (%)
e	202	63.5%
l	26	8.2%
s	21	6.6%
i	19	6.0%
v	14	4.4%
h	12	3.8%
k	11	3.5%
c	4	1.3%
a	3	0.9%
w	3	0.9%

Decimal Number

Value	Count	Frequency (%)
1	1098	29.8%
2	1097	29.8%
3	493	13.4%
4	242	6.6%
5	210	5.7%
6	152	4.1%
7	125	3.4%
9	104	2.8%
8	89	2.4%
0	74	2.0%

Other Punctuation

Value	Count	Frequency (%)
,	107	81.7%
.	24	18.3%

Space Separator

Value	Count	Frequency (%)
(space)	685	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	156	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	156	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	136	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	6	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	66726	91.8%
Common	4948	6.8%
Latin	1048	1.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2530	3.8%
파	2440	3.7%
트	2244	3.4%
대	1811	2.7%
지	1794	2.7%
동	1661	2.5%
차	1512	2.3%
신	1443	2.2%
단	1408	2.1%
성	1309	2.0%
Other values (382)	48574	72.8%

Latin

Value	Count	Frequency (%)
e	202	19.3%
S	133	12.7%
K	91	8.7%
C	85	8.1%
L	69	6.6%
H	55	5.2%
M	52	5.0%
D	52	5.0%
I	38	3.6%
E	34	3.2%
Other values (19)	237	22.6%

Common

Value	Count	Frequency (%)
1	1098	22.2%
2	1097	22.2%
(space)	685	13.8%
3	493	10.0%
4	242	4.9%
5	210	4.2%
(	156	3.2%
)	156	3.2%
6	152	3.1%
-	136	2.7%
Other values (6)	523	10.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	66726	91.8%
ASCII	5990	8.2%
Number Forms	6	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2530	3.8%
파	2440	3.7%
트	2244	3.4%
대	1811	2.7%
지	1794	2.7%
동	1661	2.5%
차	1512	2.3%
신	1443	2.2%
단	1408	2.1%
성	1309	2.0%
Other values (382)	48574	72.8%

ASCII

Value	Count	Frequency (%)
1	1098	18.3%
2	1097	18.3%
(space)	685	11.4%
3	493	8.2%
4	242	4.0%
5	210	3.5%
e	202	3.4%
(	156	2.6%
)	156	2.6%
6	152	2.5%
Other values (34)	1499	25.0%

Number Forms

Value	Count	Frequency (%)
Ⅰ	6	100.0%

아파트코드 (apartment code)
Text

Distinct	2235
Distinct (%)	22.4%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	144
Unique (%)	1.4%

Sample

1st row	A12127007
2nd row	A10025133
3rd row	A13885306
4th row	A13114103
5th row	A15085805

Value	Count	Frequency (%)
a14207202	16	0.2%
a13986306	13	0.1%
a13707016	12	0.1%
a12119004	12	0.1%
a13282510	12	0.1%
a13704404	12	0.1%
a12287204	11	0.1%
a13410001	11	0.1%
a12012203	11	0.1%
a15277302	11	0.1%
Other values (2225)	9879	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18396	20.4%
1	17600	19.6%
A	9995	11.1%
3	8892	9.9%
2	8229	9.1%
5	6266	7.0%
8	5674	6.3%
7	4770	5.3%
4	3924	4.4%
6	3299	3.7%
Other values (2)	2955	3.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18396	23.0%
1	17600	22.0%
3	8892	11.1%
2	8229	10.3%
5	6266	7.8%
8	5674	7.1%
7	4770	6.0%
4	3924	4.9%
6	3299	4.1%
9	2950	3.7%

Uppercase Letter

Value	Count	Frequency (%)
A	9995	> 99.9%
B	5	< 0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18396	23.0%
1	17600	22.0%
3	8892	11.1%
2	8229	10.3%
5	6266	7.8%
8	5674	7.1%
7	4770	6.0%
4	3924	4.9%
6	3299	4.1%
9	2950	3.7%

Latin

Value	Count	Frequency (%)
A	9995	> 99.9%
B	5	< 0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18396	20.4%
1	17600	19.6%
A	9995	11.1%
3	8892	9.9%
2	8229	9.1%
5	6266	7.0%
8	5674	6.3%
7	4770	5.3%
4	3924	4.4%
6	3299	3.7%
Other values (2)	2955	3.3%

비용명 (cost item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	10
Mean length	5.9783
Min length	2

Characters and Unicode

Total characters	59783
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	미수금
2nd row	단기보증금
3rd row	청소비충당부채
4th row	비품감가상각누계액
5th row	장기수선충당예금

Value	Count	Frequency (%)
장기수선충당부채	326	3.3%
미처분이익잉여금	325	3.2%
예금	319	3.2%
연차수당충당부채	315	3.1%
비품	304	3.0%
예수금	298	3.0%
수선유지비충당부채	294	2.9%
퇴직급여충당부채	293	2.9%
공동주택적립금	293	2.9%
관리비미수금	290	2.9%
Other values (67)	6943	69.4%

Most occurring characters

Value	Count	Frequency (%)
금	4615	7.7%
당	3884	6.5%
수	3263	5.5%
충	3089	5.2%
비	3055	5.1%
부	3007	5.0%
채	2706	4.5%
기	2451	4.1%
선	1949	3.3%
예	1759	2.9%
Other values (97)	30005	50.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59783	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4615	7.7%
당	3884	6.5%
수	3263	5.5%
충	3089	5.2%
비	3055	5.1%
부	3007	5.0%
채	2706	4.5%
기	2451	4.1%
선	1949	3.3%
예	1759	2.9%
Other values (97)	30005	50.2%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59783	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4615	7.7%
당	3884	6.5%
수	3263	5.5%
충	3089	5.2%
비	3055	5.1%
부	3007	5.0%
채	2706	4.5%
기	2451	4.1%
선	1949	3.3%
예	1759	2.9%
Other values (97)	30005	50.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59783	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4615	7.7%
당	3884	6.5%
수	3263	5.5%
충	3089	5.2%
비	3055	5.1%
부	3007	5.0%
채	2706	4.5%
기	2451	4.1%
선	1949	3.3%
예	1759	2.9%
Other values (97)	30005	50.2%

년월일 (date)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	202105
2nd row	202105
3rd row	202105
4th row	202105
5th row	202105

Common Values

Value	Count	Frequency (%)
202105	10000	100.0%

Length

Histogram of lengths of the category

금액 (amount)
Real number (ℝ)

SKEWED ZEROS

Distinct	7405
Distinct (%)	74.1%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	75847863

Minimum	-3.7940018 × 10⁸
Maximum	2.2280004 × 10¹⁰
Zeros	2245
Zeros (%)	22.4%
Negative	309
Negative (%)	3.1%
Memory size	166.0 KiB

Quantile statistics

Minimum	-3.7940018 × 10⁸
5-th percentile	0
Q1	0
median	2991602
Q3	31624531
95-th percentile	3.550007 × 10⁸
Maximum	2.2280004 × 10¹⁰
Range	2.2659404 × 10¹⁰
Interquartile range (IQR)	31624531

Descriptive statistics

Standard deviation	3.7480733 × 10⁸
Coefficient of variation (CV)	4.9415675
Kurtosis	1343.5004
Mean	75847863
Median Absolute Deviation (MAD)	2991602
Skewness	27.558188
Sum	7.5847863 × 10¹¹
Variance	1.4048054 × 10¹⁷
Monotonicity	Not monotonic
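
Several of the figures above are related by simple arithmetic, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, variance and sum from values copied out of this report. The variable names are illustrative, and small differences are expected because the report rounds its inputs.

```python
import math

# Values copied from the report's tables for 금액.
n = 10_000
mean = 75_847_863
std = 3.7480733e8
minimum = -379_400_176
maximum = 22_280_004_067
q1, q3 = 0, 31_624_531

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n                  # Sum = mean * number of observations

print(f"range={value_range:.7e} iqr={iqr} cv={cv:.4f}")
print(f"variance={variance:.7e} sum={total:.7e}")
```

A coefficient of variation near 5 means the standard deviation is about five times the mean, consistent with the heavy right tail flagged by the Skewed alert.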

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
0	2245	22.4%
500000	28	0.3%
250000	19	0.2%
300000	16	0.2%
100000	12	0.1%
10000000	12	0.1%
200000	12	0.1%
484000	11	0.1%
242000	10	0.1%
20000000	10	0.1%
Other values (7395)	7625	76.2%

Minimum 10 values

Value	Count	Frequency (%)
-379400176	1	< 0.1%
-282715690	1	< 0.1%
-244221450	1	< 0.1%
-188835170	1	< 0.1%
-174861035	1	< 0.1%
-164419830	1	< 0.1%
-151047428	1	< 0.1%
-105100222	1	< 0.1%
-102572890	1	< 0.1%
-94109790	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
22280004067	1	< 0.1%
9030477783	2	< 0.1%
7730136534	1	< 0.1%
6187259197	1	< 0.1%
5463363925	1	< 0.1%
5243342839	1	< 0.1%
4686700289	1	< 0.1%
4406242565	1	< 0.1%
3995284043	1	< 0.1%
3830644491	1	< 0.1%

Interactions

금액 (x-axis) against 금액 (y-axis)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.175
금액	0.175	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

	아파트명	아파트코드	비용명	년월일	금액
10946	상암월드컵1단지	A12127007	미수금	202105	0
1724	노원 센트럴푸르지오	A10025133	단기보증금	202105	972000
41098	가락삼익맨션	A13885306	청소비충당부채	202105	24530650
17232	묵동금호어울림	A13114103	비품감가상각누계액	202105	-8136750
54260	신길남서울	A15085805	장기수선충당예금	202105	335359375
50419	구의현대7단지	A14320001	전신전화가입권	202105	0
29589	개포4차우성	A13527013	선급비용	202105	898130
69810	목동금호베스트빌	A15880905	선급비용	202105	32100608
53487	영등포아트자이	A15076702	선급금	202105	296150
22725	행당두산	A13307001	공동주택적립금	202105	0

Last rows

	아파트명	아파트코드	비용명	년월일	금액
51970	당산현대3차	A15004406	저장품	202105	87450
12770	대주피오레아파트	A12201001	퇴직급여충당예금	202105	0
1486	보라매 sk뷰	A10025070	비품	202105	50976560
58785	신도림쌍용플래티넘노블	A15283801	안전진단비충당부채	202105	1215680
63518	사당극동	A15681503	주차장충당예금	202105	0
62616	동작상떼빌주상복합	A15670001	장기수선충당부채	202105	981800486
49329	미아경남아너스빌	A14272306	기타충당부채	202105	71741498
12631	월드컵참누리	A12187906	주차장충당부채	202105	12659691
63133	흑석한강푸르지오	A15679108	기타유동부채	202105	48134318
29651	도곡개포한신아파트	A13527016	퇴직급여충당예금	202105	304511700
