Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
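The missing-cell and duplicate-row figures are plain counts over the table. A minimal sketch, assuming the rows are held as a list of dicts (a hypothetical in-memory form; the report itself was built by ydata-profiling from a pandas DataFrame) and treating only `None` as missing. The helper names are illustrative.

```python
from typing import Any


def missing_cells(rows: list[dict[str, Any]], columns: list[str]) -> int:
    """Count cells that are absent or None across all rows and columns."""
    return sum(1 for row in rows for col in columns if row.get(col) is None)


def duplicate_rows(rows: list[dict[str, Any]], columns: list[str]) -> int:
    """Count rows that exactly repeat an earlier row (first occurrence not counted)."""
    seen: set[tuple[Any, ...]] = set()
    dupes = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


columns = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    {"아파트명": "대림아파트201동", "아파트코드": "A13079401", "비용명": "소모품비", "년월일": "202004", "금액": 39300},
    {"아파트명": "거여우방", "아파트코드": "A13881601", "비용명": "충당부채전입이자비용", "년월일": "202004", "금액": 0},
]
print(missing_cells(rows, columns), duplicate_rows(rows, columns))  # 0 0
```

The average record size follows the same way: 488.3 KiB × 1024 / 10000 rows ≈ 50.0 B.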

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month-day) has constant value "202004" [Constant]
금액 (amount) has 1211 (12.1%) zeros [Zeros]
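Both alerts reduce to simple per-column checks. A sketch, assuming each column is a plain Python list; the 10% zeros threshold is a hypothetical value chosen for illustration, not ydata-profiling's configured default.

```python
def is_constant(values: list) -> bool:
    """A column is constant when it holds exactly one distinct value."""
    return len(set(values)) == 1


def zeros_alert(values: list[float], threshold: float = 0.10) -> tuple[int, float, bool]:
    """Return (zero count, zero fraction, alert flag).

    `threshold` is a hypothetical cut-off used only in this sketch.
    """
    n_zeros = sum(1 for v in values if v == 0)
    fraction = n_zeros / len(values) if values else 0.0
    return n_zeros, fraction, fraction > threshold


print(is_constant(["202004"] * 5))           # True
print(zeros_alert([0, 0, 39300, 10539000]))  # (2, 0.5, True)
```

With the report's figures, 1211 zeros in 10000 rows gives a fraction of 0.1211, the 12.1% shown.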

Reproduction

Analysis started: 2024-05-11 06:57:03.666310
Analysis finished: 2024-05-11 06:57:05.533075
Duration: 1.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
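The duration is simply the difference between the two timestamps above. A quick check with the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-11 06:57:03.666310", FMT)
finished = datetime.strptime("2024-05-11 06:57:05.533075", FMT)

# timedelta.total_seconds() includes the microsecond part.
elapsed = (finished - started).total_seconds()
print(f"{elapsed:.2f} seconds")  # 1.87 seconds
```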

Variables

아파트명 (apartment name)
Text

Distinct: 2176
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.2199
Min length: 2
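The length summary is the max, median, mean and min of per-value string lengths. The reported median and mean are hard to reconcile: a median of 21 would put at least half of the 10000 values at length 21 or more, which with a minimum of 2 forces a mean of at least 11.5, yet the mean is 7.2199 (72199 characters / 10000 rows). Recomputing directly is a useful check. A sketch with the standard library, run here on the five sample names; `len()` counts code points, so each precomposed Hangul syllable counts as one character.

```python
import statistics


def length_summary(values: list[str]) -> dict[str, float]:
    """Summarise string lengths; len() counts Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.fmean(lengths),
        "min_length": min(lengths),
    }


sample = ["대림아파트201동", "거여우방", "사당우성2단지", "신길삼성", "브라운스톤동선"]
print(length_summary(sample))
```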

Characters and Unicode

Total characters: 72199
Distinct characters: 432
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
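General categories are available from Python's standard `unicodedata` module (scripts and blocks are not). A sketch that tallies characters by category and maps the short codes to the long names used in this report; the mapping covers only the categories that appear here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
    "Sm": "Math Symbol",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category (long name where known)."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["방화12단지(중앙)", "sk뷰 ~"]))
```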

Unique

Unique: 113
Unique (%): 1.1%
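"Distinct" counts different values, while "Unique" counts values that occur exactly once. A short sketch of the difference (the helper name is illustrative):

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Distinct = number of different values; unique = values occurring exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


names = ["신내", "신내", "래미안", "고덕현대", "고덕현대", "거여우방"]
print(distinct_and_unique(names))  # (4, 2)
```

For 아파트명, 2176 distinct names but only 113 unique ones means most names appear on more than one row; 113 / 10000 rows gives the 1.1%.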

Sample

1st row: 대림아파트201동
2nd row: 거여우방
3rd row: 사당우성2단지
4th row: 신길삼성
5th row: 브라운스톤동선

Most common words (lowercased)
Value | Count | Frequency (%)
아파트 | 129 | 1.2%
래미안 | 36 | 0.3%
아이파크 | 18 | 0.2%
신내 | 16 | 0.2%
신도림현대 | 15 | 0.1%
암사선사현대 | 15 | 0.1%
e편한세상 | 15 | 0.1%
sk뷰 | 14 | 0.1%
힐스테이트 | 14 | 0.1%
고덕현대 | 14 | 0.1%
Other values (2240) | 10321 | 97.3%
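This table counts whitespace-separated words after lowercasing, which is why it lists `sk뷰` although the raw names contain uppercase Latin letters, and why its rows sum to 10607 words rather than 10000 rows. A rough sketch of that tokenisation, assuming lowercase, split on whitespace, and strip surrounding ASCII punctuation; it approximates, and is not copied from, ydata-profiling's implementation.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Lowercase each value, split on whitespace, trim ASCII punctuation, count tokens."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:  # skip tokens that were pure punctuation
                counts[token] += 1
    return counts


print(word_counts(["SK뷰 아파트", "백련산 해모로 아파트", "(신내) 아파트"]).most_common(2))
```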

Most occurring characters

Value | Count | Frequency (%)
 | 2385 | 3.3%
 | 2224 | 3.1%
 | 2010 | 2.8%
 | 1886 | 2.6%
 | 1798 | 2.5%
 | 1663 | 2.3%
 | 1522 | 2.1%
 | 1513 | 2.1%
 | 1490 | 2.1%
 | 1334 | 1.8%
Other values (422) | 54374 | 75.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66236 | 91.7%
Decimal Number | 3782 | 5.2%
Uppercase Letter | 667 | 0.9%
Space Separator | 663 | 0.9%
Lowercase Letter | 302 | 0.4%
Close Punctuation | 142 | 0.2%
Open Punctuation | 142 | 0.2%
Dash Punctuation | 132 | 0.2%
Other Punctuation | 122 | 0.2%
Letter Number | 7 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2385 | 3.6%
 | 2224 | 3.4%
 | 2010 | 3.0%
 | 1886 | 2.8%
 | 1798 | 2.7%
 | 1663 | 2.5%
 | 1522 | 2.3%
 | 1513 | 2.3%
 | 1490 | 2.2%
 | 1334 | 2.0%
Other values (376) | 48411 | 73.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 118 | 17.7%
K | 102 | 15.3%
C | 95 | 14.2%
M | 49 | 7.3%
D | 49 | 7.3%
L | 46 | 6.9%
H | 39 | 5.8%
I | 34 | 5.1%
G | 30 | 4.5%
E | 25 | 3.7%
Other values (7) | 80 | 12.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 168 | 55.6%
l | 30 | 9.9%
i | 26 | 8.6%
k | 17 | 5.6%
v | 17 | 5.6%
s | 13 | 4.3%
c | 12 | 4.0%
a | 6 | 2.0%
g | 6 | 2.0%
w | 5 | 1.7%

Decimal Number
Value | Count | Frequency (%)
1 | 1140 | 30.1%
2 | 1085 | 28.7%
3 | 493 | 13.0%
4 | 271 | 7.2%
5 | 208 | 5.5%
6 | 172 | 4.5%
7 | 128 | 3.4%
9 | 106 | 2.8%
8 | 91 | 2.4%
0 | 88 | 2.3%

Other Punctuation
Value | Count | Frequency (%)
, | 100 | 82.0%
. | 22 | 18.0%

Space Separator
Value | Count | Frequency (%)
(space) | 663 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 142 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 142 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 132 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 7 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66236 | 91.7%
Common | 4987 | 6.9%
Latin | 976 | 1.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2385 | 3.6%
 | 2224 | 3.4%
 | 2010 | 3.0%
 | 1886 | 2.8%
 | 1798 | 2.7%
 | 1663 | 2.5%
 | 1522 | 2.3%
 | 1513 | 2.3%
 | 1490 | 2.2%
 | 1334 | 2.0%
Other values (376) | 48411 | 73.1%

Latin
Value | Count | Frequency (%)
e | 168 | 17.2%
S | 118 | 12.1%
K | 102 | 10.5%
C | 95 | 9.7%
M | 49 | 5.0%
D | 49 | 5.0%
L | 46 | 4.7%
H | 39 | 4.0%
I | 34 | 3.5%
l | 30 | 3.1%
Other values (19) | 246 | 25.2%

Common
Value | Count | Frequency (%)
1 | 1140 | 22.9%
2 | 1085 | 21.8%
(space) | 663 | 13.3%
3 | 493 | 9.9%
4 | 271 | 5.4%
5 | 208 | 4.2%
6 | 172 | 3.4%
) | 142 | 2.8%
( | 142 | 2.8%
- | 132 | 2.6%
Other values (7) | 539 | 10.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66236 | 91.7%
ASCII | 5956 | 8.2%
Number Forms | 7 | < 0.1%
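Python's standard library has no Unicode block lookup, so a sketch has to hard-code code-point ranges. The three ranges below are the standard Basic Latin, Number Forms and Hangul Syllables blocks; treating the report's "Hangul" label as the Hangul Syllables block alone is an assumption, and `block_of` is an illustrative helper.

```python
from collections import Counter

# Code-point ranges for the three blocks named in the report. Mapping "Hangul"
# to the Hangul Syllables block only is an assumption of this sketch.
BLOCK_RANGES = [
    (0x0000, 0x007F, "ASCII"),         # Basic Latin
    (0x2150, 0x218F, "Number Forms"),  # includes Roman numerals such as U+2161
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables
]


def block_of(ch: str) -> str:
    """Return the block label for one character, or "Other" if no range matches."""
    cp = ord(ch)
    for lo, hi, name in BLOCK_RANGES:
        if lo <= cp <= hi:
            return name
    return "Other"


print(Counter(block_of(ch) for ch in "등촌IPARK 2단지 \u2161"))
```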

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2385 | 3.6%
 | 2224 | 3.4%
 | 2010 | 3.0%
 | 1886 | 2.8%
 | 1798 | 2.7%
 | 1663 | 2.5%
 | 1522 | 2.3%
 | 1513 | 2.3%
 | 1490 | 2.2%
 | 1334 | 2.0%
Other values (376) | 48411 | 73.1%

ASCII
Value | Count | Frequency (%)
1 | 1140 | 19.1%
2 | 1085 | 18.2%
(space) | 663 | 11.1%
3 | 493 | 8.3%
4 | 271 | 4.6%
5 | 208 | 3.5%
6 | 172 | 2.9%
e | 168 | 2.8%
) | 142 | 2.4%
( | 142 | 2.4%
Other values (35) | 1472 | 24.7%

Number Forms
Value | Count | Frequency (%)
 | 7 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2183
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 113
Unique (%): 1.1%

Sample

1st row: A13079401
2nd row: A13881601
3rd row: A15681502
4th row: A15005603
5th row: A13603702

Most common words (lowercased)
Value | Count | Frequency (%)
a13405201 | 15 | 0.1%
a15601201 | 12 | 0.1%
a15805115 | 12 | 0.1%
a15608002 | 12 | 0.1%
a15086601 | 12 | 0.1%
a15386506 | 12 | 0.1%
a12185602 | 11 | 0.1%
a15701602 | 11 | 0.1%
a13589802 | 11 | 0.1%
a13790701 | 11 | 0.1%
Other values (2173) | 9881 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18443 | 20.5%
1 | 17659 | 19.6%
A | 9989 | 11.1%
3 | 8891 | 9.9%
2 | 8154 | 9.1%
5 | 6231 | 6.9%
8 | 5672 | 6.3%
7 | 4796 | 5.3%
4 | 3702 | 4.1%
6 | 3451 | 3.8%
Other values (2) | 3012 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%
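Taken together, the length statistics (every code is 9 characters), the category split (10000 uppercase letters and 80000 digits across 10000 rows) and the letter counts (A 9989, B 11) suggest a fixed format of one letter followed by eight digits. A sketch of a validity check for that inferred pattern; the pattern is read off these statistics, not taken from the dataset's documentation.

```python
import re

# Inferred format: one uppercase letter (A or B in this extract) then eight digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")


def invalid_codes(codes: list[str]) -> list[str]:
    """Return the codes that do not fully match the inferred pattern."""
    return [c for c in codes if not CODE_PATTERN.fullmatch(c)]


print(invalid_codes(["A13079401", "A15681502", "a13405201", "A1234"]))
# ['a13405201', 'A1234']
```

The lowercased `a13405201` fails, which is a reminder that the word table above shows lowercased forms rather than the stored codes.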

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18443 | 23.1%
1 | 17659 | 22.1%
3 | 8891 | 11.1%
2 | 8154 | 10.2%
5 | 6231 | 7.8%
8 | 5672 | 7.1%
7 | 4796 | 6.0%
4 | 3702 | 4.6%
6 | 3451 | 4.3%
9 | 3001 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18443 | 23.1%
1 | 17659 | 22.1%
3 | 8891 | 11.1%
2 | 8154 | 10.2%
5 | 6231 | 7.8%
8 | 5672 | 7.1%
7 | 4796 | 6.0%
4 | 3702 | 4.6%
6 | 3451 | 4.3%
9 | 3001 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9989 | 99.9%
B | 11 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18443 | 20.5%
1 | 17659 | 19.6%
A | 9989 | 11.1%
3 | 8891 | 9.9%
2 | 8154 | 9.1%
5 | 6231 | 6.9%
8 | 5672 | 6.3%
7 | 4796 | 5.3%
4 | 3702 | 4.1%
6 | 3451 | 3.8%
Other values (2) | 3012 | 3.3%

비용명 (expense item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8504
Min length: 2

Characters and Unicode

Total characters: 48504
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 소모품비
2nd row: 충당부채전입이자비용
3rd row: 수선유지비
4th row: 교육비
5th row: 경비비
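
Most common values
Value | Count | Frequency (%)
통신비 (communication costs) | 254 | 2.5%
보험료 (insurance premiums) | 241 | 2.4%
승강기유지비 (elevator maintenance) | 239 | 2.4%
급여 (salaries) | 225 | 2.2%
세대전기료 (household electricity) | 223 | 2.2%
소독비 (disinfection) | 219 | 2.2%
사무용품비 (office supplies) | 219 | 2.2%
입주자대표회의운영비 (residents' representative council operating costs) | 217 | 2.2%
청소비 (cleaning) | 215 | 2.1%
수선유지비 (repair and maintenance) | 212 | 2.1%
Other values (77) | 7736 | 77.4%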
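Each frequency table in this report lists the top values and folds the rest into an "Other values (k)" row, with percentages taken over all rows. A sketch of that layout (the function name and tuple output are illustrative):

```python
from collections import Counter


def frequency_table(values: list[str], top: int = 10) -> list[tuple[str, int, str]]:
    """Top-`top` values plus an 'Other values (k)' row; percentages are of all rows."""
    counts = Counter(values)
    total = len(values)
    rows = [(v, c, f"{100 * c / total:.1f}%") for v, c in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, f"{100 * other / total:.1f}%"))
    return rows


items = ["통신비"] * 3 + ["보험료"] * 2 + ["급여", "소독비", "청소비"]
for row in frequency_table(items, top=2):
    print(row)
```

On the full column, the ten listed counts sum to 2264, and the remaining 77 items (87 distinct minus 10) account for the other 7736 rows.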

Most occurring characters

Value | Count | Frequency (%)
 | 5424 | 11.2%
 | 3610 | 7.4%
 | 2169 | 4.5%
 | 1988 | 4.1%
 | 1783 | 3.7%
 | 1356 | 2.8%
 | 1073 | 2.2%
 | 821 | 1.7%
 | 816 | 1.7%
 | 777 | 1.6%
Other values (110) | 28687 | 59.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48504 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5424 | 11.2%
 | 3610 | 7.4%
 | 2169 | 4.5%
 | 1988 | 4.1%
 | 1783 | 3.7%
 | 1356 | 2.8%
 | 1073 | 2.2%
 | 821 | 1.7%
 | 816 | 1.7%
 | 777 | 1.6%
Other values (110) | 28687 | 59.1%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48504 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5424 | 11.2%
 | 3610 | 7.4%
 | 2169 | 4.5%
 | 1988 | 4.1%
 | 1783 | 3.7%
 | 1356 | 2.8%
 | 1073 | 2.2%
 | 821 | 1.7%
 | 816 | 1.7%
 | 777 | 1.6%
Other values (110) | 28687 | 59.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48504 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5424 | 11.2%
 | 3610 | 7.4%
 | 2169 | 4.5%
 | 1988 | 4.1%
 | 1783 | 3.7%
 | 1356 | 2.8%
 | 1073 | 2.2%
 | 821 | 1.7%
 | 816 | 1.7%
 | 777 | 1.6%
Other values (110) | 28687 | 59.1%

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202004 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202004
2nd row: 202004
3rd row: 202004
4th row: 202004
5th row: 202004

Common Values

Value | Count | Frequency (%)
202004 | 10000 | 100.0%
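Although the column name 년월일 reads "year-month-day", its single value 202004 has six digits, which fits a YYYYMM year-month (April 2020) rather than a full date. A sketch of parsing it under that assumption (`parse_year_month` is an illustrative helper):

```python
from datetime import datetime


def parse_year_month(value: str) -> tuple[int, int]:
    """Parse a six-digit YYYYMM string into (year, month); raises ValueError otherwise."""
    parsed = datetime.strptime(value, "%Y%m")
    return parsed.year, parsed.month


print(parse_year_month("202004"))  # (2020, 4)
```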

Length

Histogram of lengths of the category


금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6977
Distinct (%): 69.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3416403.4
Minimum: -36987000
Maximum: 5.0741818 × 10^8
Zeros: 1211
Zeros (%): 12.1%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -36987000
5th percentile: 0
Q1: 71500
Median: 323855
Q3: 1530425
95th percentile: 16403816
Maximum: 5.0741818 × 10^8
Range: 5.4440518 × 10^8
Interquartile range (IQR): 1458925
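Range and IQR are differences of the listed quantiles: Range = Maximum - Minimum and IQR = Q3 - Q1. A check using the reported figures, plus a quartile helper based on linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`, which matches NumPy's default); whether the report used exactly this interpolation is an assumption.

```python
import statistics

# Reported quantiles for 금액, copied from the figures above.
minimum, q1, q3, maximum = -36_987_000, 71_500, 1_530_425, 507_418_183

print(q3 - q1)            # 1458925
print(maximum - minimum)  # 544405183, shown above as 5.4440518 × 10^8


def quartiles(values: list[float]) -> tuple[float, float, float]:
    """Q1, median and Q3 using linear interpolation ('inclusive' method)."""
    lower, middle, upper = statistics.quantiles(values, n=4, method="inclusive")
    return lower, middle, upper


print(quartiles([0, 0, 30_000, 100_000, 200_000]))
```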

Descriptive statistics

Standard deviation: 13232654
Coefficient of variation (CV): 3.8732705
Kurtosis: 388.6578
Mean: 3416403.4
Median Absolute Deviation (MAD): 323455
Skewness: 15.263924
Sum: 3.4164034 × 10^10
Variance: 1.7510314 × 10^14
Monotonicity: Not monotonic
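The coefficient of variation is the standard deviation divided by the mean, and the MAD is the median of the absolute deviations from the median. A sketch that re-derives the CV from the reported figures and implements an unscaled MAD; the report does not say whether a scale factor is applied, so the unscaled form is an assumption.

```python
import statistics

# Reported values for 금액, copied from the figures above.
std, mean = 13_232_654, 3_416_403.4
print(round(std / mean, 4))  # 3.8733 (reported CV: 3.8732705)


def median_absolute_deviation(values: list[float]) -> float:
    """MAD = median(|x - median(x)|), without a normal-consistency scale factor."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)


print(median_absolute_deviation([0, 39_300, 264_000, 477_960, 10_539_000]))
```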
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 1211 | 12.1%
200000 | 89 | 0.9%
100000 | 74 | 0.7%
300000 | 71 | 0.7%
150000 | 44 | 0.4%
50000 | 40 | 0.4%
400000 | 38 | 0.4%
30000 | 35 | 0.4%
250000 | 30 | 0.3%
500000 | 29 | 0.3%
Other values (6967) | 8339 | 83.4%

Minimum 10 values
Value | Count | Frequency (%)
-36987000 | 1 | < 0.1%
-6836203 | 1 | < 0.1%
-4886510 | 1 | < 0.1%
-2326410 | 1 | < 0.1%
-1979100 | 1 | < 0.1%
-884840 | 1 | < 0.1%
-475320 | 1 | < 0.1%
-3000 | 1 | < 0.1%
-2800 | 1 | < 0.1%
-1179 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
507418183 | 1 | < 0.1%
343409020 | 1 | < 0.1%
320850020 | 1 | < 0.1%
302983304 | 1 | < 0.1%
295376647 | 1 | < 0.1%
289050222 | 1 | < 0.1%
231171686 | 1 | < 0.1%
198001930 | 1 | < 0.1%
187091750 | 1 | < 0.1%
154291170 | 1 | < 0.1%
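The zero and negative counts and the extreme-value tables can be produced from one pass over the value counts. A sketch that, like the tables above, lists distinct values with their counts (the `extremes` helper and its dict output are illustrative):

```python
from collections import Counter


def extremes(values: list[int], k: int = 10) -> dict[str, object]:
    """Zero and negative counts plus the k smallest and k largest distinct values with counts."""
    counts = Counter(values)
    ordered = sorted(counts)
    return {
        "zeros": counts[0],
        "negative": sum(c for v, c in counts.items() if v < 0),
        "min_k": [(v, counts[v]) for v in ordered[:k]],
        "max_k": [(v, counts[v]) for v in reversed(ordered[-k:])],
    }


amounts = [39_300, 0, 10_539_000, -3_000, 0, 7_713_000, -1_179]
print(extremes(amounts, k=3))
```

The report counts 12 negative amounts, so two negatives fall between -1179 and 0 and are not listed individually in the minimum table.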

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.509
금액 | 0.509 | 1.000
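This matrix pairs a text column (비용명) with a numeric one (금액). One common association measure for such a pair is Cramér's V, computed after binning the numeric column into categories. The sketch below implements that statistic with the standard library. It illustrates the measure and is not a confirmed reproduction of how the report arrived at 0.509; the binning step is left to the caller, and the toy buckets are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs: list[str], ys: list[str]) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))), without bias correction."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data: expense item vs. a pre-binned amount bucket.
items = ["경비비", "경비비", "통신비", "통신비"]
buckets = ["high", "high", "low", "low"]
print(cramers_v(items, buckets))  # 1.0
```

A value of 1.0 means the bucket is fully determined by the item; 0.0 means the two are independent in the sample.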

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
21438 | 대림아파트201동 | A13079401 | 소모품비 | 202004 | 39300
55222 | 거여우방 | A13881601 | 충당부채전입이자비용 | 202004 | 0
86229 | 사당우성2단지 | A15681502 | 수선유지비 | 202004 | 10539000
70964 | 신길삼성 | A15005603 | 교육비 | 202004 | 0
42820 | 브라운스톤동선 | A13603702 | 경비비 | 202004 | 7713000
52659 | 래미안송파파인탑 | A13817001 | 공동전기료 | 202004 | 7592196
89864 | 방화12단지(중앙) | A15777501 | 보험료 | 202004 | 477960
31004 | 성수롯데캐슬 | A13312302 | 선거관리위원회운영비 | 202004 | 0
38023 | 청담2차현대아파트 | A13510202 | 광고료수익 | 202004 | 0
5768 | 돈암코오롱하늘채아파트 | A10027227 | 도서인쇄비 | 202004 | 264000

Last rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
70080 | 영등포삼환 | A15003801 | 도서인쇄비 | 202004 | 222620
85976 | 대방2차현대 | A15681104 | 퇴직급여 | 202004 | 952310
4402 | 옥수파크힐스아파트 | A10026748 | 승강기운영비 | 202004 | 849420
87780 | 염창롯데캐슬 | A15704015 | 고용안정사업수익 | 202004 | 1720000
87116 | 등촌IPARK | A15703204 | 승강기수익 | 202004 | 1905000
361 | 백련산 해모로 아파트 | A10024947 | 경비비 | 202004 | 19333500
92523 | 목동성원 | A15805105 | 경비비 | 202004 | 4525396
36999 | 아크로힐스논현 | A13501006 | 입주자대표회의운영비 | 202004 | 789500
85136 | 동작상떼빌주상복합 | A15670001 | 도서인쇄비 | 202004 | 280000
44302 | 길음뉴타운푸르지오아파트2,3단지 | A13611007 | 차량유지비 | 202004 | 205406