gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	10000
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	488.3 KiB
Average record size in memory	50.0 B
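The average record size follows from the total memory footprint divided by the number of observations. A minimal check of that arithmetic, using only the two figures reported above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the Dataset statistics table.
total_kib = 488.3   # total size in memory, in KiB
n_rows = 10_000     # number of observations

# 1 KiB = 1024 bytes; dividing by the row count gives bytes per record.
avg_record_bytes = total_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 50.0 B shown in the table.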

Variable types

Text	3
Categorical	1
Numeric	1

Dataset

Description	File download
Author	Seoul Metropolitan Government (서울특별시)
URL	https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

`년월일` has constant value "201910"	Constant
`금액` has 2184 (21.8%) zeros	Zeros

Reproduction

Analysis started	2024-05-11 06:01:07.201009
Analysis finished	2024-05-11 06:01:08.273315
Duration	1.07 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
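A report of this kind can be regenerated roughly as sketched below. This is only a sketch under assumptions: `build_profile` is a hypothetical helper, the CSV path and its encoding are not given in this report and must be supplied by the reader, and the title string is a placeholder. It relies on `pandas.read_csv` and on `ProfileReport` / `to_file` from ydata-profiling.

```python
def build_profile(csv_path, html_path="report.html", encoding="utf-8"):
    """Profile a CSV file and write an HTML report (hypothetical helper)."""
    # Imported lazily so the helper can be defined even when the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # The encoding is an assumption; adjust it to match the downloaded file.
    df = pd.read_csv(csv_path, encoding=encoding)

    # The title is a placeholder; the original configuration is in config.json.
    profile = ProfileReport(df, title="Pandas Profiling")
    profile.to_file(html_path)
    return profile
```

Calling `build_profile("data.csv")` with a locally downloaded copy of the dataset (the file name here is made up) would write `report.html` next to the script.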

아파트명 (apartment name)
Text

Distinct	2110
Distinct (%)	21.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	22
Median length	20
Mean length	7.1796
Min length	2
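The length statistics are plain summaries of the character count of each value. As a rough illustration, the sketch below computes the same four summaries for the five sample rows listed for this variable, and checks that the reported mean length equals the reported total characters (71796) divided by the number of observations. The variable names are illustrative.

```python
import unicodedata
from statistics import mean, median

# The five sample rows shown for this variable.
samples = ["우장산에스케이뷰", "갈현베르빌주상복합아파트", "묵동신안2차",
           "래미안아름숲", "상계현대1차"]

# Normalise to NFC so each Hangul syllable counts as one character.
lengths = [len(unicodedata.normalize("NFC", s)) for s in samples]

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)

# Whole-column check: mean length = total characters / observations.
column_mean = 71796 / 10_000
print(column_mean)
```

The five-row summary naturally differs from the whole-column figures; only the last line reproduces a number from the report.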

Characters and Unicode

Total characters	71796
Distinct characters	430
Distinct categories	10
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
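To make the idea of character properties concrete, the sketch below tallies Unicode general categories for the five sample rows of this variable using Python's standard `unicodedata` module. It covers general categories only; the standard library does not expose the script or block properties, so those parts of the report are not reproduced here. The variable names are illustrative.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for this variable.
samples = ["우장산에스케이뷰", "갈현베르빌주상복합아파트", "묵동신안2차",
           "래미안아름숲", "상계현대1차"]

# Normalise to NFC so Hangul syllables are single code points.
text = unicodedata.normalize("NFC", "".join(samples))

# Two-letter general category codes: "Lo" = Other Letter,
# "Nd" = Decimal Number, "Lu" = Uppercase Letter, and so on.
category_counts = Counter(unicodedata.category(ch) for ch in text)
print(dict(category_counts))

# U+2160 ROMAN NUMERAL ONE falls in the "Letter Number" category ("Nl").
roman_one_category = unicodedata.category("\u2160")
print(roman_one_category)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.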

Unique

Unique	111
Unique (%)	1.1%

Sample

1st row	우장산에스케이뷰
2nd row	갈현베르빌주상복합아파트
3rd row	묵동신안2차
4th row	래미안아름숲
5th row	상계현대1차

Value	Count	Frequency (%)
아파트	114	1.1%
래미안	23	0.2%
래미안밤섬리베뉴	17	0.2%
힐스테이트	17	0.2%
신반포한신5지구(12,13,18차	13	0.1%
무학현대	12	0.1%
은평뉴타운구파발9-2단지	12	0.1%
상암월드컵파크9단지	12	0.1%
래미안라센트	12	0.1%
북한산	12	0.1%
Other values (2170)	10286	97.7%

Most occurring characters

Value	Count	Frequency (%)
아	2168	3.0%
파	2167	3.0%
트	1967	2.7%
지	1879	2.6%
대	1878	2.6%
동	1629	2.3%
차	1567	2.2%
신	1552	2.2%
단	1484	2.1%
성	1416	2.0%
Other values (420)	54089	75.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	65649	91.4%
Decimal Number	3943	5.5%
Uppercase Letter	643	0.9%
Space Separator	582	0.8%
Lowercase Letter	373	0.5%
Dash Punctuation	160	0.2%
Open Punctuation	149	0.2%
Close Punctuation	149	0.2%
Other Punctuation	139	0.2%
Letter Number	9	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
아	2168	3.3%
파	2167	3.3%
트	1967	3.0%
지	1879	2.9%
대	1878	2.9%
동	1629	2.5%
차	1567	2.4%
신	1552	2.4%
단	1484	2.3%
성	1416	2.2%
Other values (375)	47942	73.0%

Uppercase Letter

Value	Count	Frequency (%)
S	106	16.5%
K	90	14.0%
C	74	11.5%
L	49	7.6%
H	43	6.7%
M	42	6.5%
D	42	6.5%
I	39	6.1%
G	31	4.8%
E	30	4.7%
Other values (7)	97	15.1%

Lowercase Letter

Value	Count	Frequency (%)
e	193	51.7%
l	38	10.2%
i	34	9.1%
v	29	7.8%
s	22	5.9%
k	21	5.6%
w	13	3.5%
c	12	3.2%
h	7	1.9%
a	2	0.5%

Decimal Number

Value	Count	Frequency (%)
1	1242	31.5%
2	1125	28.5%
3	520	13.2%
4	270	6.8%
5	220	5.6%
6	160	4.1%
8	105	2.7%
9	104	2.6%
7	102	2.6%
0	95	2.4%

Other Punctuation

Value	Count	Frequency (%)
,	116	83.5%
.	23	16.5%

Space Separator

Value	Count	Frequency (%)
	582	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	160	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	149	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	149	100.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	9	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	65649	91.4%
Common	5122	7.1%
Latin	1025	1.4%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
아	2168	3.3%
파	2167	3.3%
트	1967	3.0%
지	1879	2.9%
대	1878	2.9%
동	1629	2.5%
차	1567	2.4%
신	1552	2.4%
단	1484	2.3%
성	1416	2.2%
Other values (375)	47942	73.0%

Latin

Value	Count	Frequency (%)
e	193	18.8%
S	106	10.3%
K	90	8.8%
C	74	7.2%
L	49	4.8%
H	43	4.2%
M	42	4.1%
D	42	4.1%
I	39	3.8%
l	38	3.7%
Other values (19)	309	30.1%

Common

Value	Count	Frequency (%)
1	1242	24.2%
2	1125	22.0%
	582	11.4%
3	520	10.2%
4	270	5.3%
5	220	4.3%
6	160	3.1%
-	160	3.1%
(	149	2.9%
)	149	2.9%
Other values (6)	545	10.6%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	65649	91.4%
ASCII	6138	8.5%
Number Forms	9	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
아	2168	3.3%
파	2167	3.3%
트	1967	3.0%
지	1879	2.9%
대	1878	2.9%
동	1629	2.5%
차	1567	2.4%
신	1552	2.4%
단	1484	2.3%
성	1416	2.2%
Other values (375)	47942	73.0%

ASCII

Value	Count	Frequency (%)
1	1242	20.2%
2	1125	18.3%
	582	9.5%
3	520	8.5%
4	270	4.4%
5	220	3.6%
e	193	3.1%
6	160	2.6%
-	160	2.6%
(	149	2.4%
Other values (34)	1517	24.7%

Number Forms

Value	Count	Frequency (%)
Ⅰ	9	100.0%

아파트코드 (apartment code)
Text

Distinct	2115
Distinct (%)	21.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	9
Median length	9
Mean length	9
Min length	9

Characters and Unicode

Total characters	90000
Distinct characters	12
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	112
Unique (%)	1.1%

Sample

1st row	A15701002
2nd row	A12271402
3rd row	A13185502
4th row	A13002002
5th row	A13983707

Value	Count	Frequency (%)
a13790726	13	0.1%
a13385802	12	0.1%
a13676702	12	0.1%
a41279920	12	0.1%
a13671209	12	0.1%
a12179504	12	0.1%
a13820006	11	0.1%
a13922110	11	0.1%
a15807705	11	0.1%
a15205513	11	0.1%
Other values (2105)	9883	98.8%

Most occurring characters

Value	Count	Frequency (%)
0	18223	20.2%
1	17694	19.7%
A	9993	11.1%
3	9033	10.0%
2	8186	9.1%
5	6186	6.9%
8	5790	6.4%
7	4809	5.3%
4	3677	4.1%
6	3377	3.8%
Other values (2)	3032	3.4%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	80000	88.9%
Uppercase Letter	10000	11.1%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
0	18223	22.8%
1	17694	22.1%
3	9033	11.3%
2	8186	10.2%
5	6186	7.7%
8	5790	7.2%
7	4809	6.0%
4	3677	4.6%
6	3377	4.2%
9	3025	3.8%

Uppercase Letter

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	80000	88.9%
Latin	10000	11.1%

Most frequent character per script

Common

Value	Count	Frequency (%)
0	18223	22.8%
1	17694	22.1%
3	9033	11.3%
2	8186	10.2%
5	6186	7.7%
8	5790	7.2%
7	4809	6.0%
4	3677	4.6%
6	3377	4.2%
9	3025	3.8%

Latin

Value	Count	Frequency (%)
A	9993	99.9%
B	7	0.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	90000	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
0	18223	20.2%
1	17694	19.7%
A	9993	11.1%
3	9033	10.0%
2	8186	9.1%
5	6186	6.9%
8	5790	6.4%
7	4809	5.3%
4	3677	4.1%
6	3377	3.8%
Other values (2)	3032	3.4%

비용명 (expense item name)
Text

Distinct	77
Distinct (%)	0.8%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

Length

Max length	13
Median length	9
Mean length	5.9704
Min length	2

Characters and Unicode

Total characters	59704
Distinct characters	107
Distinct categories	1
Distinct scripts	1
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	미수관리비예치금
2nd row	단기차입금
3rd row	미지급금
4th row	연차수당충당부채
5th row	미부과관리비

Value	Count	Frequency (%)
관리비미수금	330	3.3%
가수금	330	3.3%
예수금	319	3.2%
선급비용	317	3.2%
퇴직급여충당부채	316	3.2%
연차수당충당부채	308	3.1%
미처분이익잉여금	307	3.1%
예금	307	3.1%
현금	294	2.9%
당기순이익	294	2.9%
Other values (67)	6878	68.8%

Most occurring characters

Value	Count	Frequency (%)
금	4724	7.9%
당	3767	6.3%
수	3199	5.4%
충	3091	5.2%
비	2992	5.0%
부	2973	5.0%
채	2663	4.5%
기	2321	3.9%
선	1879	3.1%
예	1753	2.9%
Other values (97)	30342	50.8%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59704	100.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
금	4724	7.9%
당	3767	6.3%
수	3199	5.4%
충	3091	5.2%
비	2992	5.0%
부	2973	5.0%
채	2663	4.5%
기	2321	3.9%
선	1879	3.1%
예	1753	2.9%
Other values (97)	30342	50.8%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59704	100.0%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
금	4724	7.9%
당	3767	6.3%
수	3199	5.4%
충	3091	5.2%
비	2992	5.0%
부	2973	5.0%
채	2663	4.5%
기	2321	3.9%
선	1879	3.1%
예	1753	2.9%
Other values (97)	30342	50.8%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59704	100.0%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
금	4724	7.9%
당	3767	6.3%
수	3199	5.4%
충	3091	5.2%
비	2992	5.0%
부	2973	5.0%
채	2663	4.5%
기	2321	3.9%
선	1879	3.1%
예	1753	2.9%
Other values (97)	30342	50.8%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct	1
Distinct (%)	< 0.1%
Missing	0
Missing (%)	0.0%
Memory size	156.2 KiB

201910	10000

Length

Max length	6
Median length	6
Mean length	6
Min length	6

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	201910
2nd row	201910
3rd row	201910
4th row	201910
5th row	201910

Common Values

Value	Count	Frequency (%)
201910	10000	100.0%

Histogram of lengths of the category (plot omitted)

금액 (amount)
Real number (ℝ)

ZEROS

Distinct	7471
Distinct (%)	74.7%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	69850424

Minimum	-6.3136859 × 10⁸
Maximum	1.3613033 × 10¹⁰
Zeros	2184
Zeros (%)	21.8%
Negative	342
Negative (%)	3.4%
Memory size	166.0 KiB

Quantile statistics

Minimum	-6.3136859 × 10⁸
5-th percentile	0
Q1	0
median	3555366
Q3	34578065
95-th percentile	3.0648018 × 10⁸
Maximum	1.3613033 × 10¹⁰
Range	1.4244402 × 10¹⁰
Interquartile range (IQR)	34578065
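The range and interquartile range in this table are simple differences of the other entries. A short check, using the exact minimum and maximum from the extreme-values tables of this variable (the variable names are illustrative):

```python
# Exact extremes as listed in the minimum/maximum value tables.
minimum = -631_368_591
maximum = 13_613_033_099

# Quartiles from the quantile statistics table.
q1, q3 = 0, 34_578_065

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1

print(f"{value_range:.7e}")
print(iqr)
```

Because Q1 is 0, the IQR coincides with Q3.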

Descriptive statistics

Standard deviation	3.297648 × 10⁸
Coefficient of variation (CV)	4.7210135
Kurtosis	527.9957
Mean	69850424
Median Absolute Deviation (MAD)	3555366
Skewness	18.604884
Sum	6.9850424 × 10¹¹
Variance	1.0874482 × 10¹⁷
Monotonicity	Not monotonic
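Several of these descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks them against the rounded figures in the table, so agreement is only expected up to rounding (the variable names are illustrative):

```python
import math

# Rounded figures copied from the table.
mean_value = 69_850_424
std_dev = 3.297648e8
n_rows = 10_000

cv = std_dev / mean_value      # coefficient of variation
variance = std_dev ** 2        # variance = standard deviation squared
total = mean_value * n_rows    # sum = mean * number of observations

print(f"CV       = {cv:.5f}")
print(f"Variance = {variance:.5e}")
print(f"Sum      = {total:.7e}")

# Compare with the reported values, allowing for rounding.
cv_matches = math.isclose(cv, 4.7210135, rel_tol=1e-5)
variance_matches = math.isclose(variance, 1.0874482e17, rel_tol=1e-5)
```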

Histogram with fixed size bins (bins=50): plot omitted

Common Values

Value	Count	Frequency (%)
0	2184	21.8%
500000	27	0.3%
250000	21	0.2%
484000	14	0.1%
250400	13	0.1%
20000000	13	0.1%
200000	13	0.1%
300000	12	0.1%
30000000	11	0.1%
242000	10	0.1%
Other values (7461)	7682	76.8%

Minimum 10 values

Value	Count	Frequency (%)
-631368591	1	< 0.1%
-265628480	1	< 0.1%
-243231024	1	< 0.1%
-238861320	1	< 0.1%
-227163515	1	< 0.1%
-222480180	1	< 0.1%
-204350320	1	< 0.1%
-169714594	1	< 0.1%
-151940030	1	< 0.1%
-132654576	1	< 0.1%

Maximum 10 values

Value	Count	Frequency (%)
13613033099	1	< 0.1%
9822153902	1	< 0.1%
8736155155	1	< 0.1%
7943651403	1	< 0.1%
7402618872	1	< 0.1%
7251883796	1	< 0.1%
6250746161	1	< 0.1%
5327680530	1	< 0.1%
4252955297	1	< 0.1%
4203099737	1	< 0.1%

Interactions

Scatter plot of 금액 against 금액 (plot omitted)

Correlations

Phik (φk)

	비용명	금액
비용명	1.000	0.379
금액	0.379	1.000

Missing values

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	아파트명	아파트코드	비용명	년월일	금액
58305	우장산에스케이뷰	A15701002	미수관리비예치금	201910	0
11127	갈현베르빌주상복합아파트	A12271402	단기차입금	201910	0
15665	묵동신안2차	A13185502	미지급금	201910	13486073
11858	래미안아름숲	A13002002	연차수당충당부채	201910	9506690
40722	상계현대1차	A13983707	미부과관리비	201910	53455180
39988	불암현대	A13981208	가수금	201910	11291965
14665	상봉프레미어스엠코	A13122002	수선유지비충당부채	201910	27528990
57752	대방2차현대	A15681104	선수관리비	201910	45676800
26153	도곡경남	A13527008	공동주택적립금예금	201910	0
30174	종암선경아파트	A13671203	장기수선충당예금	201910	364865373

Last rows

	아파트명	아파트코드	비용명	년월일	금액
16648	도봉동아에코빌	A13201206	미부과관리비	201910	91482897
27446	도곡대림	A13586101	공동주택적립금예금	201910	18407731
28057	개포한신	A13594402	장기수선충당부채	201910	954549991
46788	영등포푸르지오	A15003002	경비비충당부채	201910	28231212
18751	쌍문삼익	A13286304	미처분이익잉여금	201910	0
60708	가양4단지	A15780705	단기차입금	201910	0
25428	강남엘에이치1단지	A13519007	기타유형자산감가상각누계액	201910	-15134194
11728	역촌센트레빌	A12289501	저장품	201910	280000
2926	위례2차아이파크아파트	A10027553	공동주택적립금	201910	4148229
48990	대림신동아	A15081606	수선유지비충당부채	201910	4031029
