Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
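
The average record size reported above is consistent with dividing the total in-memory size by the number of observations. A minimal sketch of that arithmetic, using only the figures from this section (the variable names are illustrative and do not come from the profiling tool):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"average record size: {avg_record_bytes:.1f} B")
```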

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202205" (Constant)
금액 is highly skewed (γ1 = 20.53426665) (Skewed)
금액 has 1210 (12.1%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:52:20.327292
Analysis finished: 2024-05-11 06:52:23.411089
Duration: 3.08 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
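
The reported duration can be cross-checked from the two timestamps. A small sketch using only the Python standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-05-11 06:52:20.327292")
finished = datetime.fromisoformat("2024-05-11 06:52:23.411089")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```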

Variables

아파트명 (apartment name)
Text

Distinct: 2193
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3907
Min length: 2

Characters and Unicode

Total characters: 73907
Distinct characters: 431
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
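
As an illustration of how such character properties can be read programmatically, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiling tool's own implementation. The helper name `category_counts` is hypothetical, and the two input strings are values that appear elsewhere in this report. The standard library exposes general categories (`Lo` = Other Letter, `Nd` = Decimal Number, `Ll` = Lowercase Letter) but has no script or block lookup, so only categories are shown.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Two values taken from this report's tables.
sample = ["래미안트리베라2차", "e편한세상"]
counts = category_counts(sample)
print(counts)
```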

Unique

Unique: 116
Unique (%): 1.2%
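
The report distinguishes "distinct" (the number of different values) from "unique" (values that occur exactly once). A minimal sketch of that distinction, assuming this reading of the two terms; the helper name and the toy list are illustrative, with names borrowed from the sample rows of this report:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


toy = ["개포한신", "개포한신", "도곡렉슬", "창동대우", "창동대우", "창동대우"]
result = distinct_and_unique(toy)
print(result)
```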

Sample

1st row: 상계은빛2단지
2nd row: 래미안트리베라2차
3rd row: 고척우성꿈동산
4th row: 개포한신
5th row: 개화산동부센트레빌
Value | Count | Frequency (%)
아파트 | 215 | 2.0%
래미안 | 53 | 0.5%
e편한세상 | 31 | 0.3%
아이파크 | 30 | 0.3%
신반포 | 20 | 0.2%
sk뷰 | 19 | 0.2%
북한산 | 18 | 0.2%
송파 | 17 | 0.2%
꿈의숲 | 15 | 0.1%
푸르지오 | 14 | 0.1%
Other values (2276) | 10519 | 96.1%
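
The word-frequency table above lists lowercased tokens such as "sk뷰". One way to produce counts of this kind is to lowercase each value and split it on whitespace. This is an assumption about the tokenization, not a confirmed description of the profiling tool, and the helper name `word_counts` is hypothetical. The input names are taken from the sample rows at the end of this report.

```python
from collections import Counter


def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


names = ["LH강남아이파크아파트", "일원동 수서아파트", "휘경 해모로 프레스티지 아파트"]
counts = word_counts(names)
print(counts.most_common(3))
```

Under this scheme "아파트" is counted only when it stands alone as a token, which would explain why it appears far less often in the table than it does as a substring of apartment names.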

Most occurring characters

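Value | Count | Frequency (%)
(not captured) | 2737 | 3.7%
(not captured) | 2708 | 3.7%
(not captured) | 2559 | 3.5%
(not captured) | 1746 | 2.4%
(not captured) | 1723 | 2.3%
(not captured) | 1670 | 2.3%
(not captured) | 1520 | 2.1%
(not captured) | 1510 | 2.0%
(not captured) | 1461 | 2.0%
(not captured) | 1283 | 1.7%
Other values (421) | 54990 | 74.4%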

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67594 | 91.5%
Decimal Number | 3466 | 4.7%
Space Separator | 1014 | 1.4%
Uppercase Letter | 960 | 1.3%
Lowercase Letter | 296 | 0.4%
Close Punctuation | 157 | 0.2%
Open Punctuation | 157 | 0.2%
Other Punctuation | 133 | 0.2%
Dash Punctuation | 126 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

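Other Letter
Value | Count | Frequency (%)
(not captured) | 2737 | 4.0%
(not captured) | 2708 | 4.0%
(not captured) | 2559 | 3.8%
(not captured) | 1746 | 2.6%
(not captured) | 1723 | 2.5%
(not captured) | 1670 | 2.5%
(not captured) | 1520 | 2.2%
(not captured) | 1510 | 2.2%
(not captured) | 1461 | 2.2%
(not captured) | 1283 | 1.9%
Other values (376) | 48677 | 72.0%
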
Uppercase Letter
Value | Count | Frequency (%)
S | 150 | 15.6%
C | 131 | 13.6%
K | 117 | 12.2%
D | 80 | 8.3%
M | 80 | 8.3%
L | 79 | 8.2%
H | 65 | 6.8%
E | 55 | 5.7%
I | 48 | 5.0%
A | 32 | 3.3%
Other values (7) | 123 | 12.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 181 | 61.1%
l | 25 | 8.4%
i | 22 | 7.4%
k | 15 | 5.1%
v | 14 | 4.7%
s | 12 | 4.1%
c | 8 | 2.7%
w | 7 | 2.4%
h | 4 | 1.4%
a | 4 | 1.4%

Decimal Number
Value | Count | Frequency (%)
1 | 1044 | 30.1%
2 | 1021 | 29.5%
3 | 478 | 13.8%
4 | 226 | 6.5%
5 | 194 | 5.6%
6 | 155 | 4.5%
8 | 99 | 2.9%
7 | 98 | 2.8%
9 | 86 | 2.5%
0 | 65 | 1.9%

Other Punctuation
Value | Count | Frequency (%)
, | 114 | 85.7%
. | 19 | 14.3%

Space Separator
Value | Count | Frequency (%)
(space) | 1014 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 157 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 157 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 126 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not captured) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67594 | 91.5%
Common | 5053 | 6.8%
Latin | 1260 | 1.7%

Most frequent character per script

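Hangul
Value | Count | Frequency (%)
(not captured) | 2737 | 4.0%
(not captured) | 2708 | 4.0%
(not captured) | 2559 | 3.8%
(not captured) | 1746 | 2.6%
(not captured) | 1723 | 2.5%
(not captured) | 1670 | 2.5%
(not captured) | 1520 | 2.2%
(not captured) | 1510 | 2.2%
(not captured) | 1461 | 2.2%
(not captured) | 1283 | 1.9%
Other values (376) | 48677 | 72.0%
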
Latin
Value | Count | Frequency (%)
e | 181 | 14.4%
S | 150 | 11.9%
C | 131 | 10.4%
K | 117 | 9.3%
D | 80 | 6.3%
M | 80 | 6.3%
L | 79 | 6.3%
H | 65 | 5.2%
E | 55 | 4.4%
I | 48 | 3.8%
Other values (19) | 274 | 21.7%

Common
Value | Count | Frequency (%)
1 | 1044 | 20.7%
2 | 1021 | 20.2%
(space) | 1014 | 20.1%
3 | 478 | 9.5%
4 | 226 | 4.5%
5 | 194 | 3.8%
) | 157 | 3.1%
( | 157 | 3.1%
6 | 155 | 3.1%
- | 126 | 2.5%
Other values (6) | 481 | 9.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67594 | 91.5%
ASCII | 6309 | 8.5%
Number Forms | 4 | < 0.1%

Most frequent character per block

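Hangul
Value | Count | Frequency (%)
(not captured) | 2737 | 4.0%
(not captured) | 2708 | 4.0%
(not captured) | 2559 | 3.8%
(not captured) | 1746 | 2.6%
(not captured) | 1723 | 2.5%
(not captured) | 1670 | 2.5%
(not captured) | 1520 | 2.2%
(not captured) | 1510 | 2.2%
(not captured) | 1461 | 2.2%
(not captured) | 1283 | 1.9%
Other values (376) | 48677 | 72.0%
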
ASCII
Value | Count | Frequency (%)
1 | 1044 | 16.5%
2 | 1021 | 16.2%
(space) | 1014 | 16.1%
3 | 478 | 7.6%
4 | 226 | 3.6%
5 | 194 | 3.1%
e | 181 | 2.9%
) | 157 | 2.5%
( | 157 | 2.5%
6 | 155 | 2.5%
Other values (34) | 1682 | 26.7%

Number Forms
Value | Count | Frequency (%)
(not captured) | 4 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2199
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 117
Unique (%): 1.2%

Sample

1st row: A13983815
2nd row: A14272308
3rd row: A15283702
4th row: A13594402
5th row: A15722102
Value | Count | Frequency (%)
a13402305 | 13 | 0.1%
a10027118 | 13 | 0.1%
a14272309 | 13 | 0.1%
a10026104 | 12 | 0.1%
a13885303 | 11 | 0.1%
a13813010 | 11 | 0.1%
a13672102 | 11 | 0.1%
a15606002 | 11 | 0.1%
a13707009 | 11 | 0.1%
a13920506 | 11 | 0.1%
Other values (2189) | 9883 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18801 | 20.9%
1 | 17445 | 19.4%
A | 10000 | 11.1%
3 | 8729 | 9.7%
2 | 8161 | 9.1%
5 | 6264 | 7.0%
8 | 5700 | 6.3%
7 | 4611 | 5.1%
4 | 4082 | 4.5%
6 | 3454 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18801 | 23.5%
1 | 17445 | 21.8%
3 | 8729 | 10.9%
2 | 8161 | 10.2%
5 | 6264 | 7.8%
8 | 5700 | 7.1%
7 | 4611 | 5.8%
4 | 4082 | 5.1%
6 | 3454 | 4.3%
9 | 2753 | 3.4%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18801 | 23.5%
1 | 17445 | 21.8%
3 | 8729 | 10.9%
2 | 8161 | 10.2%
5 | 6264 | 7.8%
8 | 5700 | 7.1%
7 | 4611 | 5.8%
4 | 4082 | 5.1%
6 | 3454 | 4.3%
9 | 2753 | 3.4%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18801 | 20.9%
1 | 17445 | 19.4%
A | 10000 | 11.1%
3 | 8729 | 9.7%
2 | 8161 | 9.1%
5 | 6264 | 7.0%
8 | 5700 | 6.3%
7 | 4611 | 5.1%
4 | 4082 | 4.5%
6 | 3454 | 3.8%

비용명 (expense item)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8434
Min length: 2

Characters and Unicode

Total characters: 48434
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 연차수당
2nd row: 입주자대표회의운영비
3rd row: 승강기운영비
4th row: 산재보험료
5th row: 이자수익
Value | Count | Frequency (%)
교육비 | 250 | 2.5%
소독비 | 241 | 2.4%
수선유지비 | 226 | 2.3%
보험료 | 226 | 2.3%
급여 | 225 | 2.2%
이자수익 | 222 | 2.2%
퇴직급여 | 220 | 2.2%
도서인쇄비 | 219 | 2.2%
경비비 | 218 | 2.2%
세대전기료 | 216 | 2.2%
Other values (76) | 7737 | 77.4%

Most occurring characters

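Value | Count | Frequency (%)
(not captured) | 5356 | 11.1%
(not captured) | 3615 | 7.5%
(not captured) | 2147 | 4.4%
(not captured) | 2016 | 4.2%
(not captured) | 1616 | 3.3%
(not captured) | 1357 | 2.8%
(not captured) | 1076 | 2.2%
(not captured) | 851 | 1.8%
(not captured) | 818 | 1.7%
(not captured) | 803 | 1.7%
Other values (110) | 28779 | 59.4%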

Most occurring categories

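Value | Count | Frequency (%)
Other Letter | 48434 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not captured) | 5356 | 11.1%
(not captured) | 3615 | 7.5%
(not captured) | 2147 | 4.4%
(not captured) | 2016 | 4.2%
(not captured) | 1616 | 3.3%
(not captured) | 1357 | 2.8%
(not captured) | 1076 | 2.2%
(not captured) | 851 | 1.8%
(not captured) | 818 | 1.7%
(not captured) | 803 | 1.7%
Other values (110) | 28779 | 59.4%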

Most occurring scripts

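Value | Count | Frequency (%)
Hangul | 48434 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not captured) | 5356 | 11.1%
(not captured) | 3615 | 7.5%
(not captured) | 2147 | 4.4%
(not captured) | 2016 | 4.2%
(not captured) | 1616 | 3.3%
(not captured) | 1357 | 2.8%
(not captured) | 1076 | 2.2%
(not captured) | 851 | 1.8%
(not captured) | 818 | 1.7%
(not captured) | 803 | 1.7%
Other values (110) | 28779 | 59.4%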

Most occurring blocks

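Value | Count | Frequency (%)
Hangul | 48434 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not captured) | 5356 | 11.1%
(not captured) | 3615 | 7.5%
(not captured) | 2147 | 4.4%
(not captured) | 2016 | 4.2%
(not captured) | 1616 | 3.3%
(not captured) | 1357 | 2.8%
(not captured) | 1076 | 2.2%
(not captured) | 851 | 1.8%
(not captured) | 818 | 1.7%
(not captured) | 803 | 1.7%
Other values (110) | 28779 | 59.4%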

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202205 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202205
2nd row: 202205
3rd row: 202205
4th row: 202205
5th row: 202205

Common Values

Value | Count | Frequency (%)
202205 | 10000 | 100.0%

Length

Figure: Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202205 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

SKEWED  ZEROS

Distinct: 7042
Distinct (%): 70.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3157158.7
Minimum: -1.8263583 × 10^8
Maximum: 5.8196023 × 10^8
Zeros: 1210
Zeros (%): 12.1%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1.8263583 × 10^8
5-th percentile: 0
Q1: 77169.75
median: 314110
Q3: 1500000
95-th percentile: 15739297
Maximum: 5.8196023 × 10^8
Range: 7.6459606 × 10^8
Interquartile range (IQR): 1422830.2
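
The range and interquartile range follow directly from the other quantile statistics. A short sketch of the arithmetic, using the exact minimum and maximum that appear in this report's extreme-value listings (the variable names are illustrative):

```python
# Values copied from the report for the 금액 column.
minimum = -182_635_834
maximum = 581_960_227
q1 = 77_169.75
q3 = 1_500_000

value_range = maximum - minimum  # spread between the two extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range, iqr)
```

The IQR is roughly 1.4 million while the full range is roughly 765 million, which shows how strongly a few extreme records stretch the distribution.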

Descriptive statistics

Standard deviation: 12490141
Coefficient of variation (CV): 3.9561333
Kurtosis: 788.15423
Mean: 3157158.7
Median Absolute Deviation (MAD): 314110
Skewness: 20.534267
Sum: 3.1571587 × 10^10
Variance: 1.5600361 × 10^14
Monotonicity: Not monotonic
Figure: Histogram with fixed size bins (bins=50)
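
Several of the descriptive statistics are related by simple formulas: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. The sketch below checks both against the reported figures and includes a plain-Python skewness estimator. It assumes the report's skewness is the adjusted Fisher-Pearson coefficient, which is an assumption rather than something the report states. The names `adjusted_skewness`, `reported_std`, and `reported_mean` are illustrative.

```python
import math


def adjusted_skewness(xs):
    """Adjusted Fisher-Pearson skewness (bias-corrected); needs len(xs) >= 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Figures copied from the descriptive statistics above.
reported_std = 12_490_141
reported_mean = 3_157_158.7

cv = reported_std / reported_mean
variance = reported_std ** 2

print(f"CV = {cv:.4f}, variance = {variance:.4e}")
```

A symmetric sample has skewness 0 under this estimator, while a sample with a long right tail gives a positive value, which matches the direction of the skew reported for 금액.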

Most frequent values

Value | Count | Frequency (%)
0 | 1210 | 12.1%
200000 | 101 | 1.0%
300000 | 64 | 0.6%
100000 | 59 | 0.6%
400000 | 40 | 0.4%
150000 | 34 | 0.3%
50000 | 31 | 0.3%
500000 | 28 | 0.3%
120000 | 27 | 0.3%
60000 | 24 | 0.2%
Other values (7032) | 8382 | 83.8%

Smallest values

Value | Count | Frequency (%)
-182635834 | 1 | < 0.1%
-6110000 | 1 | < 0.1%
-4000000 | 1 | < 0.1%
-600000 | 1 | < 0.1%
-554940 | 1 | < 0.1%
-480050 | 1 | < 0.1%
-354560 | 1 | < 0.1%
-806 | 1 | < 0.1%
0 | 1210 | 12.1%
1 | 1 | < 0.1%

Largest values

Value | Count | Frequency (%)
581960227 | 1 | < 0.1%
512196477 | 1 | < 0.1%
246526100 | 1 | < 0.1%
216938882 | 1 | < 0.1%
193861730 | 1 | < 0.1%
164199283 | 1 | < 0.1%
162291842 | 1 | < 0.1%
153247010 | 1 | < 0.1%
145615000 | 1 | < 0.1%
136080000 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.253
금액 | 0.253 | 1.000

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
65638 | 상계은빛2단지 | A13983815 | 연차수당 | 202205 | 1392460
71033 | 래미안트리베라2차 | A14272308 | 입주자대표회의운영비 | 202205 | 1500000
83766 | 고척우성꿈동산 | A15283702 | 승강기운영비 | 202205 | 0
46375 | 개포한신 | A13594402 | 산재보험료 | 202205 | 163160
92654 | 개화산동부센트레빌 | A15722102 | 이자수익 | 202205 | 0
10866 | 강남효성해링턴코트 | A10027558 | 지급수수료 | 202205 | 229350
46123 | 일원동 수서아파트 | A13593801 | 알뜰시장수익 | 202205 | 327710
27929 | 용마동아아파트 | A13183303 | 복리후생비 | 202205 | 112600
63905 | 공릉대아2차 | A13980604 | 위탁관리수수료 | 202205 | 199180
42349 | 강남신동아파밀리에1단지 | A13519001 | 선거관리위원회운영비 | 202205 | 50000

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
4274 | 휘경 해모로 프레스티지 아파트 | A10025015 | 경비비 | 202205 | 10635690
1481 | 디에이치반포라클라스 | A10024254 | 승강기유지비 | 202205 | 7001500
10582 | 강남더샵포레스트 | A10027446 | 음식물처리비 | 202205 | 837960
55339 | 신반포한신2차 | A13790929 | 세대전기료 | 202205 | 71859400
12786 | LH강남아이파크아파트 | A10028172 | 산재보험료 | 202205 | 26100
27812 | 용마금호타운 | A13181203 | 건강보험료 | 202205 | 482610
90365 | 사당4-3우성 | A15681501 | 연차수당 | 202205 | 979390
21668 | 신사이랜드타운 | A12208002 | 도서인쇄비 | 202205 | 139700
43977 | 도곡렉슬 | A13527203 | 보험료 | 202205 | 8663360
31126 | 창동대우 | A13204204 | 퇴직급여 | 202205 | 2194120