Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
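The dataset-level counts above can be re-derived from raw rows. The sketch below is a minimal stdlib illustration, not ydata-profiling's own implementation. It assumes that missing cells are represented as `None` and that every repeated copy of a row beyond its first occurrence counts as one duplicate; the library's exact duplicate convention may differ. The function name `dataset_stats` is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Sequence, Tuple


def dataset_stats(rows: Sequence[Tuple[Any, ...]]) -> Dict[str, float]:
    """Summarise a table given as equal-length row tuples.

    Assumption: a missing cell is represented as None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars

    # Count None cells across every row and column.
    missing = sum(1 for row in rows for cell in row if cell is None)

    # Each extra copy of an identical row counts as one duplicate (assumed convention).
    duplicates = sum(count - 1 for count in Counter(rows).values())

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Hypothetical toy table: two identical rows and one row with a missing cell.
print(dataset_stats([("x", 1), ("x", 1), ("y", None)]))
```

For the 10,000 × 5 table profiled here, the report gives zero for both missing cells and duplicate rows.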

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202008"  [Constant]
금액 has 2150 (21.5%) zeros  [Zeros]
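Both alerts follow from simple column checks. A minimal sketch, assuming each column is a Python sequence and using a hypothetical zeros threshold of 5% (the real threshold is a ydata-profiling configuration setting and may differ). The function names are illustrative only.

```python
from typing import Optional, Sequence


def constant_alert(name: str, values: Sequence[object]) -> Optional[str]:
    """Return a Constant alert if the column holds exactly one distinct value."""
    distinct = set(values)
    if len(distinct) == 1:
        (only,) = distinct
        return f'{name} has constant value "{only}"  [Constant]'
    return None


def zeros_alert(name: str, values: Sequence[float], threshold: float = 0.05) -> Optional[str]:
    """Return a Zeros alert if the share of exact zeros exceeds `threshold` (assumed 5%)."""
    if not values:
        return None
    zeros = sum(1 for v in values if v == 0)
    share = zeros / len(values)
    if share > threshold:
        return f"{name} has {zeros} ({share:.1%}) zeros  [Zeros]"
    return None


print(constant_alert("년월일", ["202008"] * 10_000))
# Synthetic column reproducing the reported 2,150 zeros out of 10,000 rows.
print(zeros_alert("금액", [0] * 2150 + [1] * 7850))
```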

Reproduction

Analysis started: 2024-05-11 05:59:59.713021
Analysis finished: 2024-05-11 06:00:00.767087
Duration: 1.05 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2186
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2404
Min length: 2
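Length statistics are per-row character counts. For example, the mean length 7.2404 equals the 72,404 total characters divided by 10,000 rows. An ordinary median cannot reproduce the reported median length of 20: at least half of the 10,000 rows would then need to be close to 20 characters long, which would push the total well past 72,404 characters. The report's median therefore appears to be computed differently (for example, from binned length counts). The sketch below uses the plain statistical median and is an illustration under that assumption, not the library's method. The name `length_stats` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def length_stats(values: Sequence[str]) -> Dict[str, float]:
    """Character-length summary of a text column (len() counts code points)."""
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),  # ordinary median
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
        "total_characters": sum(lengths),
    }


# The five sample apartment names listed for this variable.
print(length_stats(["면목두산2.3차", "독산현대", "신내경남아너스빌", "트리마제", "녹천역두산위브아파트"]))
```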

Characters and Unicode

Total characters: 72404
Distinct characters: 435
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
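Python's standard `unicodedata` module exposes the general category of each code point, which is enough to rebuild category tables like the ones below. It does not expose script or block properties. The sketch therefore hard-codes only the three blocks that appear in this report (ASCII, Number Forms, and the Hangul ranges); a real profiler would use the full Unicode block table. This is an assumption-laden illustration, not ydata-profiling's implementation, and the helper names are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable

# Long names for the general categories that appear in this report.
CATEGORY_NAMES: Dict[str, str] = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
}


def block_of(ch: str) -> str:
    """Coarse block lookup covering only the blocks seen in this report (assumption)."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0x2150 <= cp <= 0x218F:
        return "Number Forms"
    # Hangul Syllables, Hangul Jamo, and Hangul Compatibility Jamo
    if 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "Hangul"
    return "Other"


def char_profile(values: Iterable[str]) -> Dict[str, Counter]:
    """Count characters by general category and by (coarse) block."""
    categories: Counter = Counter()
    blocks: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            categories[CATEGORY_NAMES.get(code, code)] += 1
            blocks[block_of(ch)] += 1
    return {"categories": categories, "blocks": blocks}


profile = char_profile(["상계대림e-편한세상", "한사랑2차삼성아파트(등촌동)"])
print(profile["categories"].most_common())
print(profile["blocks"].most_common())
```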

Unique

Unique: 115
Unique (%): 1.1%
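Distinct counts every different value, while Unique counts only the values that occur exactly once. Both percentages are taken over the 10,000 rows (2,186 / 10,000 ≈ 21.9%; 115 / 10,000 ≈ 1.1%). A minimal sketch using a hypothetical toy column:

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distinct_unique(values: Sequence[Hashable]) -> Dict[str, float]:
    """Distinct = number of different values; Unique = values occurring exactly once."""
    counts = Counter(values)
    n = len(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n,
        "unique": unique,
        "unique_pct": 100 * unique / n,
    }


# Hypothetical toy column: one value repeats, two occur once.
print(distinct_unique(["래미안", "래미안", "트리마제", "신반포"]))
```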

Sample

1st row: 면목두산2.3차
2nd row: 독산현대
3rd row: 신내경남아너스빌
4th row: 트리마제
5th row: 녹천역두산위브아파트

Most frequent words
Value  Count  Frequency (%)
아파트  129  1.2%
래미안  27  0.3%
아이파크  19  0.2%
힐스테이트  17  0.2%
서울숲2차푸르지오임대  15  0.1%
신도림현대  14  0.1%
e편한세상신촌아파트  13  0.1%
도화현대1차아파트  12  0.1%
마포래미안푸르지오  12  0.1%
신반포  12  0.1%
Other values (2252)  10331  97.5%
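This table counts whitespace-separated words rather than whole values. That explains why it lists 2,262 distinct entries (10 shown plus 2,252 others) although the column has only 2,186 distinct names, and why the percentages are relative to the total word count. The ten listed counts (270) plus 10,331 give 10,601 words, and 129 / 10,601 ≈ 1.2%. The sketch below assumes simple whitespace splitting with lowercasing; lowercasing is suggested by codes such as a10026370 in the next variable's table. ydata-profiling's exact tokenizer and any stop-word handling may differ, and `word_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_frequencies(values: Iterable[str], top: int = 10) -> List[Tuple[str, int, float]]:
    """Top words with their count and share (%) of all words.

    Assumption: words are lowercased and split on whitespace.
    """
    words = Counter(word for value in values for word in value.lower().split())
    total = sum(words.values())
    return [(w, c, 100 * c / total) for w, c in words.most_common(top)]


# Hypothetical toy names: "아파트" is a separate word in two of them.
print(word_frequencies(["삼성쉐르빌1 아파트", "상계보람", "월계 아파트"]))
```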

Most occurring characters

Value  Count  Frequency (%)
  2460  3.4%
  2413  3.3%
  2161  3.0%
  1857  2.6%
  1797  2.5%
  1657  2.3%
  1496  2.1%
  1471  2.0%
  1443  2.0%
  1356  1.9%
Other values (425)  54293  75.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66453  91.8%
Decimal Number  3663  5.1%
Uppercase Letter  746  1.0%
Space Separator  678  0.9%
Lowercase Letter  319  0.4%
Open Punctuation  143  0.2%
Close Punctuation  143  0.2%
Dash Punctuation  129  0.2%
Other Punctuation  119  0.2%
Math Symbol  7  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
  2460  3.7%
  2413  3.6%
  2161  3.3%
  1857  2.8%
  1797  2.7%
  1657  2.5%
  1496  2.3%
  1471  2.2%
  1443  2.2%
  1356  2.0%
Other values (379)  48342  72.7%

Uppercase Letter
Value  Count  Frequency (%)
S  121  16.2%
C  105  14.1%
K  99  13.3%
M  67  9.0%
D  67  9.0%
L  54  7.2%
H  48  6.4%
I  37  5.0%
E  32  4.3%
V  25  3.4%
Other values (7)  91  12.2%

Lowercase Letter
Value  Count  Frequency (%)
e  195  61.1%
l  30  9.4%
i  27  8.5%
v  18  5.6%
k  12  3.8%
s  12  3.8%
w  8  2.5%
c  6  1.9%
a  4  1.3%
g  4  1.3%

Decimal Number
Value  Count  Frequency (%)
1  1140  31.1%
2  1045  28.5%
3  490  13.4%
4  251  6.9%
5  201  5.5%
6  153  4.2%
7  128  3.5%
8  99  2.7%
9  83  2.3%
0  73  2.0%

Other Punctuation
Value  Count  Frequency (%)
,  100  84.0%
.  19  16.0%

Space Separator
Value  Count  Frequency (%)
(space)  678  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  143  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  143  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  129  100.0%

Math Symbol
Value  Count  Frequency (%)
~  7  100.0%

Letter Number
Value  Count  Frequency (%)
  4  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66453  91.8%
Common  4882  6.7%
Latin  1069  1.5%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
  2460  3.7%
  2413  3.6%
  2161  3.3%
  1857  2.8%
  1797  2.7%
  1657  2.5%
  1496  2.3%
  1471  2.2%
  1443  2.2%
  1356  2.0%
Other values (379)  48342  72.7%

Latin
Value  Count  Frequency (%)
e  195  18.2%
S  121  11.3%
C  105  9.8%
K  99  9.3%
M  67  6.3%
D  67  6.3%
L  54  5.1%
H  48  4.5%
I  37  3.5%
E  32  3.0%
Other values (19)  244  22.8%

Common
Value  Count  Frequency (%)
1  1140  23.4%
2  1045  21.4%
(space)  678  13.9%
3  490  10.0%
4  251  5.1%
5  201  4.1%
6  153  3.1%
(  143  2.9%
)  143  2.9%
-  129  2.6%
Other values (7)  509  10.4%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66453  91.8%
ASCII  5947  8.2%
Number Forms  4  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
  2460  3.7%
  2413  3.6%
  2161  3.3%
  1857  2.8%
  1797  2.7%
  1657  2.5%
  1496  2.3%
  1471  2.2%
  1443  2.2%
  1356  2.0%
Other values (379)  48342  72.7%

ASCII
Value  Count  Frequency (%)
1  1140  19.2%
2  1045  17.6%
(space)  678  11.4%
3  490  8.2%
4  251  4.2%
5  201  3.4%
e  195  3.3%
6  153  2.6%
(  143  2.4%
)  143  2.4%
Other values (35)  1508  25.4%

Number Forms
Value  Count  Frequency (%)
  4  100.0%

아파트코드 (apartment code)
Text

Distinct: 2193
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 115
Unique (%): 1.1%

Sample

1st row: A13188406
2nd row: A15381303
3rd row: A13113006
4th row: A10026988
5th row: A10027121

Most frequent words
Value  Count  Frequency (%)
a10026370  13  0.1%
a12181406  12  0.1%
a12175203  12  0.1%
a15681503  12  0.1%
a13592604  11  0.1%
a13986306  11  0.1%
a15286809  11  0.1%
a13408003  11  0.1%
a15603203  11  0.1%
a13481305  11  0.1%
Other values (2183)  9885  98.9%

Most occurring characters

Value  Count  Frequency (%)
0  18309  20.3%
1  17614  19.6%
A  9984  11.1%
3  8951  9.9%
2  8255  9.2%
5  6163  6.8%
8  5717  6.4%
7  4818  5.4%
4  3816  4.2%
6  3370  3.7%
Other values (2)  3003  3.3%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18309  22.9%
1  17614  22.0%
3  8951  11.2%
2  8255  10.3%
5  6163  7.7%
8  5717  7.1%
7  4818  6.0%
4  3816  4.8%
6  3370  4.2%
9  2987  3.7%

Uppercase Letter
Value  Count  Frequency (%)
A  9984  99.8%
B  16  0.2%
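The character counts point to a fixed code layout. Every value is 9 characters long, the 90,000 characters split into 80,000 digits and 10,000 uppercase letters (one letter and eight digits per row on average), and the letters are almost all A, with 16 B. Assuming each code is one letter A or B followed by eight digits, as in every sample row shown, a validation check could look like the sketch below. The pattern and the names `is_valid_apartment_code` and `prefix_counts` are hypothetical and not part of the dataset's documentation.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: one uppercase prefix letter (A or B) followed by eight ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def is_valid_apartment_code(code: str) -> bool:
    """True if the whole string matches the assumed code layout."""
    return CODE_PATTERN.fullmatch(code) is not None


def prefix_counts(codes: Iterable[str]) -> Counter:
    """Tally the leading letter of each code that matches the assumed layout."""
    return Counter(code[0] for code in codes if is_valid_apartment_code(code))


samples = ["A13188406", "A15381303", "A13113006", "A10026988", "A10027121"]
print(all(is_valid_apartment_code(c) for c in samples))  # True
print(is_valid_apartment_code("a13188406"))  # False: lowercase prefix
```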

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18309  22.9%
1  17614  22.0%
3  8951  11.2%
2  8255  10.3%
5  6163  7.7%
8  5717  7.1%
7  4818  6.0%
4  3816  4.8%
6  3370  4.2%
9  2987  3.7%

Latin
Value  Count  Frequency (%)
A  9984  99.8%
B  16  0.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18309  20.3%
1  17614  19.6%
A  9984  11.1%
3  8951  9.9%
2  8255  9.2%
5  6163  6.8%
8  5717  6.4%
7  4818  5.4%
4  3816  4.2%
6  3370  3.7%
Other values (2)  3003  3.3%

비용명 (expense item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9464
Min length: 2

Characters and Unicode

Total characters: 59464
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 비품감가상각누계액
2nd row: 비품
3rd row: 가지급금
4th row: 미처분이익잉여금
5th row: 미수관리비예치금
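
Most frequent words
Value  Count  Frequency (%)
예금 (bank deposits)  345  3.5%
선급비용 (prepaid expenses)  339  3.4%
미처분이익잉여금 (unappropriated retained earnings)  332  3.3%
퇴직급여충당부채 (provision for retirement benefits)  332  3.3%
예수금 (withholdings)  330  3.3%
공동주택적립금 (apartment complex reserve fund)  323  3.2%
당기순이익 (net income for the period)  317  3.2%
비품 (fixtures and equipment)  307  3.1%
수선유지비충당부채 (provision for repair and maintenance)  303  3.0%
장기수선충당부채 (provision for long-term repairs)  295  2.9%
Other values (67)  6777  67.8%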

Most occurring characters

Value  Count  Frequency (%)
  4655  7.8%
  3752  6.3%
  3089  5.2%
  3038  5.1%
  3005  5.1%
  2959  5.0%
  2662  4.5%
  2367  4.0%
  1942  3.3%
  1749  2.9%
Other values (97)  30246  50.9%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  59464  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
  4655  7.8%
  3752  6.3%
  3089  5.2%
  3038  5.1%
  3005  5.1%
  2959  5.0%
  2662  4.5%
  2367  4.0%
  1942  3.3%
  1749  2.9%
Other values (97)  30246  50.9%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  59464  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
  4655  7.8%
  3752  6.3%
  3089  5.2%
  3038  5.1%
  3005  5.1%
  2959  5.0%
  2662  4.5%
  2367  4.0%
  1942  3.3%
  1749  2.9%
Other values (97)  30246  50.9%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  59464  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
  4655  7.8%
  3752  6.3%
  3089  5.2%
  3038  5.1%
  3005  5.1%
  2959  5.0%
  2662  4.5%
  2367  4.0%
  1942  3.3%
  1749  2.9%
Other values (97)  30246  50.9%

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202008
2nd row: 202008
3rd row: 202008
4th row: 202008
5th row: 202008

Common Values

Value  Count  Frequency (%)
202008  10000  100.0%
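Although the column name 년월일 means "year-month-day", the constant value 202008 carries only a year and a month (August 2020). A minimal sketch for parsing such values, assuming the six-digit YYYYMM format holds for every row (the length statistics show all values are exactly 6 characters). The helper name `parse_yyyymm` is hypothetical.

```python
from datetime import date, datetime


def parse_yyyymm(value: str) -> date:
    """Parse a six-digit YYYYMM string into the first day of that month."""
    return datetime.strptime(value, "%Y%m").date()


print(parse_yyyymm("202008"))  # 2020-08-01
```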

Length

Histogram of lengths of the category


금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7517
Distinct (%): 75.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72024041
Minimum: -3.8900128 × 10^8
Maximum: 1.1724908 × 10^10
Zeros: 2150
Zeros (%): 21.5%
Negative: 334
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -3.8900128 × 10^8
5th percentile: 0
Q1: 386
Median: 3566920
Q3: 34730592
95th percentile: 3.5346555 × 10^8
Maximum: 1.1724908 × 10^10
Range: 1.2113909 × 10^10
Interquartile range (IQR): 34730206
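Range and IQR are simple differences of the quantities listed above. Re-deriving them from the exact extremes that appear later in this section (-389001283 and 11724907627) is a quick consistency check on the report:

```python
# Values copied from the 금액 (amount) statistics in this report.
minimum = -389_001_283       # exact smallest value
maximum = 11_724_907_627     # exact largest value
q1, q3 = 386, 34_730_592

value_range = maximum - minimum  # Range = max - min
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"Range: {value_range:.7e}")  # Range: 1.2113909e+10
print(f"IQR: {iqr}")                # IQR: 34730206
```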

Descriptive statistics

Standard deviation: 2.8428218 × 10^8
Coefficient of variation (CV): 3.9470456
Kurtosis: 371.56458
Mean: 72024041
Median Absolute Deviation (MAD): 3566920
Skewness: 14.023306
Sum: 7.2024041 × 10^11
Variance: 8.0816356 × 10^16
Monotonicity: Not monotonic
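These statistics follow standard definitions. CV is the standard deviation divided by the mean (2.8428218 × 10^8 / 72024041 ≈ 3.947), variance is the squared standard deviation, and MAD is the median of absolute deviations from the median. The sketch below assumes pandas-style conventions: sample standard deviation with ddof = 1, adjusted Fisher-Pearson skewness, and bias-corrected excess kurtosis. That is an assumption about how ydata-profiling computes these values, not something the report states. The name `describe` is hypothetical, and the function needs non-constant data with at least 4 values.

```python
import math
import statistics
from typing import Dict, Sequence


def describe(xs: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics under assumed pandas-style conventions (n >= 4, non-constant)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance (ddof = 1)
    std = math.sqrt(variance)
    median = statistics.median(xs)

    # Population-form central moments used by the skewness/kurtosis formulas.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3

    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean,
        "mad": statistics.median([abs(x - median) for x in xs]),
        # Adjusted Fisher-Pearson skewness (G1)
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        # Bias-corrected excess kurtosis (G2)
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "sum": math.fsum(xs),
    }


print(describe([1, 2, 3, 4, 5]))
```

On a symmetric toy sample such as 1 to 5, skewness is zero and excess kurtosis is negative; the reported values for 금액 (skewness about 14, kurtosis about 372) indicate a long right tail with a few very large amounts.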
Histogram with fixed size bins (bins=50)
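A fixed-size-bin histogram splits [minimum, maximum] into equal-width intervals. With the range above (about 1.21 × 10^10) and 50 bins, each bin is roughly 2.42 × 10^8 wide. The second bin runs from about -1.47 × 10^8 to 9.6 × 10^7, so it alone contains everything between Q1 (386) and Q3 (34730592), i.e. at least half of all values. A minimal stdlib sketch, assuming NumPy-style edges where the last bin also includes the maximum; the name `fixed_bin_counts` is hypothetical:

```python
from typing import List, Sequence


def fixed_bin_counts(xs: Sequence[float], bins: int = 50) -> List[int]:
    """Equal-width histogram counts; the last bin is closed on the right."""
    lo, hi = min(xs), max(xs)
    counts = [0] * bins
    if hi == lo:  # degenerate range: this sketch puts everything in the first bin
        counts[0] = len(xs)
        return counts
    width = (hi - lo) / bins
    for x in xs:
        # Clamp so the maximum lands in the last bin instead of one past it.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return counts


print(fixed_bin_counts([0, 1, 2, 3, 4], bins=2))  # [2, 3]
```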

Common Values
Value  Count  Frequency (%)
0  2150  21.5%
500000  26  0.3%
250000  19  0.2%
300000  16  0.2%
1000000  14  0.1%
484000  14  0.1%
242000  11  0.1%
100000  11  0.1%
5000000  10  0.1%
2000000  10  0.1%
Other values (7507)  7719  77.2%
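Each frequency table in this report lists the ten most common values and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values (here 7,517 - 10 = 7,507) and the count is the number of remaining rows. A minimal sketch of that layout; the function name is hypothetical, and ties are broken by first occurrence, which may differ from ydata-profiling:

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def value_count_table(values: Sequence[Hashable], top: int = 10) -> List[Tuple[str, int, float]]:
    """Top-`top` values plus an 'Other values (k)' row; frequencies are % of all rows."""
    counts = Counter(values)
    n = len(values)
    rows = [(str(v), c, 100 * c / n) for v, c in counts.most_common(top)]
    rest = len(counts) - len(rows)  # distinct values not shown
    if rest > 0:
        other = n - sum(c for _, c, _ in rows)  # rows covered by those values
        rows.append((f"Other values ({rest})", other, 100 * other / n))
    return rows


for value, count, pct in value_count_table([0, 0, 0, 5, 5, 7, 9, 11], top=2):
    print(f"{value}  {count}  {pct:.1f}%")
```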

Minimum 10 values
Value  Count  Frequency (%)
-389001283  1  < 0.1%
-275911534  1  < 0.1%
-241396750  1  < 0.1%
-235006308  1  < 0.1%
-178797700  1  < 0.1%
-174573633  1  < 0.1%
-173098610  1  < 0.1%
-162096680  1  < 0.1%
-161866980  1  < 0.1%
-156648860  1  < 0.1%

Maximum 10 values
Value  Count  Frequency (%)
11724907627  1  < 0.1%
6016150639  1  < 0.1%
5173392238  1  < 0.1%
5064404578  1  < 0.1%
4788302087  1  < 0.1%
4725614202  1  < 0.1%
3959218905  1  < 0.1%
3939755056  1  < 0.1%
3893339909  1  < 0.1%
3834291108  1  < 0.1%
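The minimum and maximum tables list the ten smallest and ten largest distinct values together with their counts. A minimal sketch using `heapq`, assuming repeated values are collapsed into a single row with their count; the name `extreme_values` is hypothetical:

```python
import heapq
from collections import Counter
from typing import List, Sequence, Tuple


def extreme_values(
    xs: Sequence[float], k: int = 10
) -> Tuple[List[Tuple[float, int]], List[Tuple[float, int]]]:
    """Return (k smallest, k largest) distinct values, each paired with its count."""
    counts = Counter(xs)
    smallest = heapq.nsmallest(k, counts)  # iterates over distinct values
    largest = heapq.nlargest(k, counts)
    return [(v, counts[v]) for v in smallest], [(v, counts[v]) for v in largest]


low, high = extreme_values([5, -1, 0, 7, -3, 7, 0], k=2)
print(low)   # [(-3, 1), (-1, 1)]
print(high)  # [(7, 2), (5, 1)]
```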


Correlations

        비용명  금액
비용명  1.000  0.420
금액    0.420  1.000
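The 0.420 entry relates a text column (비용명) to a numeric one (금액), so it cannot be a Pearson or Spearman coefficient on the raw values. A common approach, and the one assumed here for ydata-profiling's automatic correlation, is to discretise the numeric column into bins and compute Cramér's V on the resulting contingency table: V = sqrt(χ² / (n · (min(r, c) - 1))). The sketch computes the uncorrected statistic. The library may apply a bias correction or different binning, and the toy inputs below are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Uncorrected Cramer's V between two categorical sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected under independence
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


accounts = ["예금", "예금", "비품", "비품"]
amount_bins = ["high", "high", "low", "low"]  # hypothetical pre-binned amounts
print(cramers_v(accounts, amount_bins))  # 1.0
```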

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
Index  아파트명  아파트코드  비용명  년월일  금액
18234  면목두산2.3차  A13188406  비품감가상각누계액  202008  -4935255
58077  독산현대  A15381303  비품  202008  925780
16005  신내경남아너스빌  A13113006  가지급금  202008  120070
3700  트리마제  A10026988  미처분이익잉여금  202008  0
3875  녹천역두산위브아파트  A10027121  미수관리비예치금  202008  1120000
9812  상암월드컵파크3단지  A12127003  상여충당부채  202008  3245699
54141  신림푸르지오  A15190705  미부과관리비  202008  255476916
52462  롯데캐슬아이비  A15088915  예수금  202008  687978
26219  마일스디오빌  A13501002  퇴직급여충당부채  202008  0
43987  월계청백3단지  A13985105  미지급금  202008  17877550

Last rows
Index  아파트명  아파트코드  비용명  년월일  금액
25104  고덕리엔파크1단지  A13410012  저장품  202008  77850
22208  성수2차대우  A13372101  예금  202008  98703252
43521  상계대림e-편한세상  A13983803  비품감가상각누계액  202008  -11805619
12150  북한산힐스테이트3차  A12204004  연차수당충당부채  202008  27701030
28840  청담건영아파트  A13576201  미부과관리비  202008  33454981
64180  한사랑2차삼성아파트(등촌동)  A15783907  현금  202008  109684
60943  사당우성2단지  A15681502  임대보증금  202008  1000000
29865  역삼래미안  A13592706  주차장충당예금  202008  0
66444  삼성쉐르빌1 아파트  A15807603  기타투자자산  202008  175507520
43136  상계보람  A13982604  선수난방비  202008  11085093