Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
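
The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming 1 KiB = 1024 bytes and using only the values reported above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 10_000
missing_cells = 0
total_size_kib = 488.3

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the average agrees with the 50.0 B reported above.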

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

Constant: 년월일 has the constant value "202004"
Zeros: 금액 has 2126 (21.3%) zeros
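
The two alerts can be reproduced with simple per-column checks. The sketch below is an assumption about the kind of rule involved, not ydata-profiling's actual implementation: `column_alerts` is a hypothetical helper, it flags any column containing zeros without a threshold, and the lists are stand-in data that only reproduce the counts reported in this section.

```python
def column_alerts(values):
    """Return illustrative alert labels for one column (hypothetical helper)."""
    alerts = []
    # A column with a single distinct value is constant.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Count exact zeros and report their share of all rows.
    zeros = sum(1 for v in values if v == 0)
    if zeros:
        alerts.append(f"Zeros: {zeros} ({100 * zeros / len(values):.1f}%)")
    return alerts


# Stand-in data: only the counts match the report, not the real values.
ymd_column = ["202004"] * 10_000
amount_column = [0] * 2126 + [1] * 7874

print(column_alerts(ymd_column))
print(column_alerts(amount_column))
```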

Reproduction

Analysis started: 2024-05-11 06:00:26.224003
Analysis finished: 2024-05-11 06:00:27.356043
Duration: 1.13 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
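
The reported duration is the difference between the two timestamps. A small check using only Python's standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" section.
started = datetime.fromisoformat("2024-05-11 06:00:26.224003")
finished = datetime.fromisoformat("2024-05-11 06:00:27.356043")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```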

Variables

아파트명 (apartment name)
Text

Distinct: 2183
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 24
Median length: 21
Mean length: 7.2174
Min length: 2

Characters and Unicode

Total characters: 72174
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
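
As an illustration of such character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code: `category_counts` is a hypothetical helper, and the input is the first sample value of the 아파트명 column. The codes `Lo` and `Nd` correspond to the "Other Letter" and "Decimal Number" labels used in this report.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters per Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# First sample value of the 아파트명 column.
sample = "왕십리텐즈힐2구역214동"
counts = category_counts(sample)

# Hangul syllables are 'Lo' (Other Letter); ASCII digits are 'Nd' (Decimal Number).
for code, n in counts.most_common():
    print(code, n)
```

Script and block properties are not exposed by `unicodedata`, so this sketch covers categories only.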

Unique

Unique: 107
Unique (%): 1.1%

Sample

1st row: 왕십리텐즈힐2구역214동
2nd row: 문래미원아파트
3rd row: 공릉태릉우성
4th row: 방화삼성
5th row: 휘경주공2단지

Value | Count | Frequency (%)
아파트 | 112 | 1.1%
래미안 | 33 | 0.3%
신동아파밀리에 | 17 | 0.2%
아이파크 | 16 | 0.2%
북한산 | 16 | 0.2%
힐스테이트 | 15 | 0.1%
창동주공2단지 | 15 | 0.1%
sk뷰 | 13 | 0.1%
e편한세상 | 13 | 0.1%
중계성원2차 | 13 | 0.1%
Other values (2246) | 10295 | 97.5%

Most occurring characters

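Value | Count | Frequency (%)
(missing) | 2398 | 3.3%
(missing) | 2308 | 3.2%
(missing) | 2087 | 2.9%
(missing) | 1872 | 2.6%
(missing) | 1797 | 2.5%
(missing) | 1700 | 2.4%
(missing) | 1524 | 2.1%
(missing) | 1464 | 2.0%
(missing) | 1463 | 2.0%
(missing) | 1355 | 1.9%
Other values (421) | 54206 | 75.1%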

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66204 | 91.7%
Decimal Number | 3754 | 5.2%
Uppercase Letter | 743 | 1.0%
Space Separator | 617 | 0.9%
Lowercase Letter | 318 | 0.4%
Close Punctuation | 140 | 0.2%
Open Punctuation | 140 | 0.2%
Dash Punctuation | 138 | 0.2%
Other Punctuation | 110 | 0.2%
Math Symbol | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(missing) | 2398 | 3.6%
(missing) | 2308 | 3.5%
(missing) | 2087 | 3.2%
(missing) | 1872 | 2.8%
(missing) | 1797 | 2.7%
(missing) | 1700 | 2.6%
(missing) | 1524 | 2.3%
(missing) | 1464 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1355 | 2.0%
Other values (375) | 48236 | 72.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 138 | 18.6%
K | 98 | 13.2%
C | 81 | 10.9%
L | 61 | 8.2%
D | 50 | 6.7%
M | 50 | 6.7%
H | 49 | 6.6%
E | 44 | 5.9%
I | 40 | 5.4%
G | 32 | 4.3%
Other values (7) | 100 | 13.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 189 | 59.4%
l | 30 | 9.4%
i | 26 | 8.2%
v | 19 | 6.0%
s | 16 | 5.0%
k | 15 | 4.7%
w | 9 | 2.8%
c | 6 | 1.9%
h | 4 | 1.3%
a | 2 | 0.6%

Decimal Number
Value | Count | Frequency (%)
1 | 1186 | 31.6%
2 | 1089 | 29.0%
3 | 481 | 12.8%
4 | 262 | 7.0%
5 | 202 | 5.4%
6 | 157 | 4.2%
7 | 113 | 3.0%
9 | 95 | 2.5%
8 | 94 | 2.5%
0 | 75 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
, | 83 | 75.5%
. | 27 | 24.5%

Space Separator
Value | Count | Frequency (%)
(space) | 617 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 140 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 140 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 138 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 5 | 100.0%

Letter Number
Value | Count | Frequency (%)
(missing) | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66204 | 91.7%
Common | 4904 | 6.8%
Latin | 1066 | 1.5%

Most frequent character per script

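Hangul
Value | Count | Frequency (%)
(missing) | 2398 | 3.6%
(missing) | 2308 | 3.5%
(missing) | 2087 | 3.2%
(missing) | 1872 | 2.8%
(missing) | 1797 | 2.7%
(missing) | 1700 | 2.6%
(missing) | 1524 | 2.3%
(missing) | 1464 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1355 | 2.0%
Other values (375) | 48236 | 72.9%

Latin
Value | Count | Frequency (%)
e | 189 | 17.7%
S | 138 | 12.9%
K | 98 | 9.2%
C | 81 | 7.6%
L | 61 | 5.7%
D | 50 | 4.7%
M | 50 | 4.7%
H | 49 | 4.6%
E | 44 | 4.1%
I | 40 | 3.8%
Other values (19) | 266 | 25.0%

Common
Value | Count | Frequency (%)
1 | 1186 | 24.2%
2 | 1089 | 22.2%
(space) | 617 | 12.6%
3 | 481 | 9.8%
4 | 262 | 5.3%
5 | 202 | 4.1%
6 | 157 | 3.2%
) | 140 | 2.9%
( | 140 | 2.9%
- | 138 | 2.8%
Other values (7) | 492 | 10.0%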

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66204 | 91.7%
ASCII | 5965 | 8.3%
Number Forms | 5 | < 0.1%

Most frequent character per block

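Hangul
Value | Count | Frequency (%)
(missing) | 2398 | 3.6%
(missing) | 2308 | 3.5%
(missing) | 2087 | 3.2%
(missing) | 1872 | 2.8%
(missing) | 1797 | 2.7%
(missing) | 1700 | 2.6%
(missing) | 1524 | 2.3%
(missing) | 1464 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1355 | 2.0%
Other values (375) | 48236 | 72.9%

ASCII
Value | Count | Frequency (%)
1 | 1186 | 19.9%
2 | 1089 | 18.3%
(space) | 617 | 10.3%
3 | 481 | 8.1%
4 | 262 | 4.4%
5 | 202 | 3.4%
e | 189 | 3.2%
6 | 157 | 2.6%
) | 140 | 2.3%
( | 140 | 2.3%
Other values (35) | 1502 | 25.2%

Number Forms
Value | Count | Frequency (%)
(missing) | 5 | 100.0%
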
아파트코드 (apartment code)
Text

Distinct: 2190
Distinct (%): 21.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107
Unique (%): 1.1%

Sample

1st row: A13373302
2nd row: A15009601
3rd row: A13980009
4th row: A15722001
5th row: A13087407

Value | Count | Frequency (%)
a13204508 | 15 | 0.1%
a13986701 | 13 | 0.1%
a11081503 | 13 | 0.1%
a13676103 | 12 | 0.1%
a15210209 | 12 | 0.1%
a13377901 | 11 | 0.1%
a13770607 | 11 | 0.1%
a15883202 | 11 | 0.1%
a14272313 | 11 | 0.1%
a14320002 | 11 | 0.1%
Other values (2180) | 9880 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18262 | 20.3%
1 | 17712 | 19.7%
A | 9991 | 11.1%
3 | 8784 | 9.8%
2 | 8304 | 9.2%
5 | 6251 | 6.9%
8 | 5740 | 6.4%
7 | 4865 | 5.4%
4 | 3797 | 4.2%
6 | 3256 | 3.6%
Other values (2) | 3038 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18262 | 22.8%
1 | 17712 | 22.1%
3 | 8784 | 11.0%
2 | 8304 | 10.4%
5 | 6251 | 7.8%
8 | 5740 | 7.2%
7 | 4865 | 6.1%
4 | 3797 | 4.7%
6 | 3256 | 4.1%
9 | 3029 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9991 | 99.9%
B | 9 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18262 | 22.8%
1 | 17712 | 22.1%
3 | 8784 | 11.0%
2 | 8304 | 10.4%
5 | 6251 | 7.8%
8 | 5740 | 7.2%
7 | 4865 | 6.1%
4 | 3797 | 4.7%
6 | 3256 | 4.1%
9 | 3029 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9991 | 99.9%
B | 9 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18262 | 20.3%
1 | 17712 | 19.7%
A | 9991 | 11.1%
3 | 8784 | 9.8%
2 | 8304 | 9.2%
5 | 6251 | 6.9%
8 | 5740 | 6.4%
7 | 4865 | 5.4%
4 | 3797 | 4.2%
6 | 3256 | 3.6%
Other values (2) | 3038 | 3.4%

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9073
Min length: 2

Characters and Unicode

Total characters: 59073
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 선급비용
2nd row: 관리비예치금
3rd row: 미처분이익잉여금
4th row: 주차장충당부채
5th row: 기타의비유동부채

Value | Count | Frequency (%)
당기순이익 | 341 | 3.4%
예금 | 325 | 3.2%
비품 | 322 | 3.2%
퇴직급여충당부채 | 316 | 3.2%
선급비용 | 315 | 3.1%
현금 | 304 | 3.0%
예수금 | 304 | 3.0%
미부과관리비 | 303 | 3.0%
미처분이익잉여금 | 301 | 3.0%
관리비미수금 | 298 | 3.0%
Other values (67) | 6871 | 68.7%

Most occurring characters

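Value | Count | Frequency (%)
(missing) | 4686 | 7.9%
(missing) | 3730 | 6.3%
(missing) | 3180 | 5.4%
(missing) | 3032 | 5.1%
(missing) | 2990 | 5.1%
(missing) | 2927 | 5.0%
(missing) | 2611 | 4.4%
(missing) | 2370 | 4.0%
(missing) | 1925 | 3.3%
(missing) | 1741 | 2.9%
Other values (97) | 29881 | 50.6%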

Most occurring categories

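Value | Count | Frequency (%)
Other Letter | 59073 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(missing) | 4686 | 7.9%
(missing) | 3730 | 6.3%
(missing) | 3180 | 5.4%
(missing) | 3032 | 5.1%
(missing) | 2990 | 5.1%
(missing) | 2927 | 5.0%
(missing) | 2611 | 4.4%
(missing) | 2370 | 4.0%
(missing) | 1925 | 3.3%
(missing) | 1741 | 2.9%
Other values (97) | 29881 | 50.6%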

Most occurring scripts

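Value | Count | Frequency (%)
Hangul | 59073 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(missing) | 4686 | 7.9%
(missing) | 3730 | 6.3%
(missing) | 3180 | 5.4%
(missing) | 3032 | 5.1%
(missing) | 2990 | 5.1%
(missing) | 2927 | 5.0%
(missing) | 2611 | 4.4%
(missing) | 2370 | 4.0%
(missing) | 1925 | 3.3%
(missing) | 1741 | 2.9%
Other values (97) | 29881 | 50.6%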

Most occurring blocks

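Value | Count | Frequency (%)
Hangul | 59073 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(missing) | 4686 | 7.9%
(missing) | 3730 | 6.3%
(missing) | 3180 | 5.4%
(missing) | 3032 | 5.1%
(missing) | 2990 | 5.1%
(missing) | 2927 | 5.0%
(missing) | 2611 | 4.4%
(missing) | 2370 | 4.0%
(missing) | 1925 | 3.3%
(missing) | 1741 | 2.9%
Other values (97) | 29881 | 50.6%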

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value counts: 202004 (10000)

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202004
2nd row: 202004
3rd row: 202004
4th row: 202004
5th row: 202004

Common Values

Value | Count | Frequency (%)
202004 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202004 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7528
Distinct (%): 75.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69944404
Minimum: -8.1896599 × 10^8
Maximum: 5.1882838 × 10^9
Zeros: 2126
Zeros (%): 21.3%
Negative: 326
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -8.1896599 × 10^8
5th percentile: 0
Q1: 2987.5
Median: 3469435
Q3: 36298634
95th percentile: 3.4628359 × 10^8
Maximum: 5.1882838 × 10^9
Range: 6.0072498 × 10^9
Interquartile range (IQR): 36295646

Descriptive statistics

Standard deviation: 2.4504074 × 10^8
Coefficient of variation (CV): 3.5033645
Kurtosis: 101.14417
Mean: 69944404
Median Absolute Deviation (MAD): 3469435
Skewness: 8.3687223
Sum: 6.9944404 × 10^11
Variance: 6.0044964 × 10^16
Monotonicity: Not monotonic
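
Several of these statistics are derived from others. The sketch below checks the relationships using figures reported for 금액 in this report. The minimum and maximum are the exact extreme values listed for the variable, and the standard deviation is the rounded figure, so the results agree only to the reported precision. Variable names are illustrative.

```python
# Values copied from the report for the 금액 variable.
minimum = -818_965_991
maximum = 5_188_283_800
q1 = 2987.5
q3 = 36_298_634
mean = 69_944_404
std = 2.4504074e8
n_observations = 10_000

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n_observations     # Sum = mean * number of observations

print(f"Range:    {value_range:.7e}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum:      {total:.7e}")
```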
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 2126 | 21.3%
500000 | 30 | 0.3%
250000 | 22 | 0.2%
300000 | 16 | 0.2%
10000000 | 13 | 0.1%
30000000 | 13 | 0.1%
3000000 | 12 | 0.1%
200000 | 12 | 0.1%
484000 | 12 | 0.1%
242000 | 11 | 0.1%
Other values (7518) | 7733 | 77.3%

Minimum 10 values

Value | Count | Frequency (%)
-818965991 | 1 | < 0.1%
-492888411 | 1 | < 0.1%
-292192150 | 1 | < 0.1%
-240875890 | 1 | < 0.1%
-239487120 | 1 | < 0.1%
-189742270 | 1 | < 0.1%
-166397700 | 1 | < 0.1%
-156075187 | 1 | < 0.1%
-141551042 | 1 | < 0.1%
-109974210 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
5188283800 | 2 | < 0.1%
3990701541 | 1 | < 0.1%
3628671451 | 1 | < 0.1%
3612590323 | 1 | < 0.1%
3419272874 | 1 | < 0.1%
3361695523 | 1 | < 0.1%
3295996232 | 2 | < 0.1%
3243018406 | 1 | < 0.1%
3158249491 | 1 | < 0.1%
3029920900 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.504
금액 | 0.504 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
21794 | 왕십리텐즈힐2구역214동 | A13373302 | 선급비용 | 202004 | 2311490
50309 | 문래미원아파트 | A15009601 | 관리비예치금 | 202004 | 21728000
41325 | 공릉태릉우성 | A13980009 | 미처분이익잉여금 | 202004 | 0
62334 | 방화삼성 | A15722001 | 주차장충당부채 | 202004 | 3750619
15353 | 휘경주공2단지 | A13087407 | 기타의비유동부채 | 202004 | 0
64200 | 등촌태진아름 | A15784402 | 미부과관리비 | 202004 | 40204230
53366 | 관악국제산장 | A15176701 | 현금 | 202004 | 446290
36436 | 마천우방 | A13812004 | 기타공동주택관리비충당부채 | 202004 | 0
19370 | 창동주공2단지 | A13204508 | 선수수익 | 202004 | 0
25001 | 길동우성2차 | A13481305 | 경비비충당부채 | 202004 | 26060270

Last rows

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
41404 | 공릉우방4단지 | A13980012 | 미지급비용 | 202004 | 131797360
24440 | 강일리버파크9단지 | A13410007 | 공동주택적립금 | 202004 | 61053413
32114 | 삼선푸르지오아파트 | A13672101 | 미지급금 | 202004 | 0
58776 | 래미안상도3차 | A15603006 | 미수금 | 202004 | 668140
55735 | 신개봉삼환 | A15280602 | 기타충당부채 | 202004 | 0
1851 | DMC센트럴아이파크 관리사무소 | A10025976 | 수선유지비충당부채 | 202004 | 16307340
47362 | 현대성우 | A14281701 | 미수수익 | 202004 | 9520
9844 | 마포래미안푸르지오 | A12175203 | 단기보증금 | 202004 | 8076915
27375 | 일원목련타운 | A13523005 | 비품 | 202004 | 41909225
23546 | 천호우성 | A13402103 | 승강기유지비충당부채 | 202004 | 4500500