Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
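
The derived figures in this table follow from the raw counts. The sketch below recomputes them, assuming the average record size is simply the total memory size divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 5
n_observations = 10_000
missing_cells = 0
total_size_kib = 488.3  # "Total size in memory", in KiB

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 KiB = 1024 bytes; divide by the row count for the per-record average.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the 50.0 B shown in the table.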

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202308" [Constant]
금액 has 1453 (14.5%) zeros [Zeros]
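
Both alerts correspond to simple column-level checks. The following is a minimal sketch of such checks, not ydata-profiling's own implementation; `is_constant`, `zero_share`, and the miniature columns are hypothetical.

```python
from typing import Sequence


def is_constant(values: Sequence[object]) -> bool:
    """True when a non-empty column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values: Sequence[float]) -> float:
    """Fraction of entries that are equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Share reported in the alert: 1453 zeros among 10000 observations.
reported_share = 1453 / 10000
print(f"{reported_share:.1%}")

# Hypothetical miniature columns, for illustration only.
dates = ["202308", "202308", "202308", "202308"]
amounts = [0, 437490, 0, 578390]
print(is_constant(dates), zero_share(amounts))
```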

Reproduction

Analysis started: 2024-05-11 06:49:01.330895
Analysis finished: 2024-05-11 06:49:03.634932
Duration: 2.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
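
The reported duration is the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:49:01.330895")
finished = datetime.fromisoformat("2024-05-11 06:49:03.634932")

duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")
```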

Variables

아파트명
Text

Distinct: 2197
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.373
Min length: 2

Characters and Unicode

Total characters: 73730
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
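
As an illustration of these character properties, the sketch below tallies the Unicode general category of every character in the five sample values of this variable, using Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and the `category_counts` helper are illustrative assumptions; the standard library exposes general categories but not scripts or blocks, so only the category breakdown is shown.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not listed above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["롯데캐슬클라시아", "마포현대아파트", "동부아파트", "마곡엠밸리4단지", "신내우남푸르미아"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.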

Unique

Unique: 129
Unique (%): 1.3%

Sample

1st row: 롯데캐슬클라시아
2nd row: 마포현대아파트
3rd row: 동부아파트
4th row: 마곡엠밸리4단지
5th row: 신내우남푸르미아

Value | Count | Frequency (%)
아파트 | 215 | 2.0%
래미안 | 54 | 0.5%
아이파크 | 33 | 0.3%
e편한세상 | 25 | 0.2%
sk뷰 | 18 | 0.2%
힐스테이트 | 16 | 0.1%
고덕 | 16 | 0.1%
롯데캐슬아파트 | 15 | 0.1%
송파 | 15 | 0.1%
서울숲힐스테이트 | 14 | 0.1%
Other values (2285) | 10541 | 96.2%
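
The values in this table look like word tokens rather than whole cell values: they are lowercased, and the counts add up to more than the 10000 observations. The sketch below shows one way such a table could be produced, assuming lowercasing and whitespace splitting; that tokenization is an inference from the table, and `token_frequencies` and the input names are hypothetical.

```python
from collections import Counter


def token_frequencies(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    total = sum(counts.values())
    # Each share is relative to the total number of tokens, not rows.
    return [(token, n, n / total) for token, n in counts.most_common()]


# Hypothetical inputs, for illustration only.
names = ["마포현대 아파트", "SK뷰 아파트", "래미안 고덕", "고덕 아이파크"]
for token, n, share in token_frequencies(names)[:3]:
    print(token, n, f"{share:.1%}")

# Counts from the table: the ten listed tokens plus "Other values".
top10 = [215, 54, 33, 25, 18, 16, 16, 15, 15, 14]
total_tokens = sum(top10) + 10541
print(total_tokens, f"{215 / total_tokens:.1%}")
```

With the table's counts, the total is 10962 tokens, and 215 of 10962 rounds to the 2.0% listed for 아파트.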

Most occurring characters

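Value | Count | Frequency (%)
(not recovered) | 2666 | 3.6%
(not recovered) | 2619 | 3.6%
(not recovered) | 2539 | 3.4%
(not recovered) | 1666 | 2.3%
(not recovered) | 1652 | 2.2%
(not recovered) | 1645 | 2.2%
(not recovered) | 1473 | 2.0%
(not recovered) | 1465 | 2.0%
(not recovered) | 1438 | 2.0%
(not recovered) | 1268 | 1.7%
Other values (419) | 55299 | 75.0%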

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67527 | 91.6%
Decimal Number | 3331 | 4.5%
Space Separator | 1050 | 1.4%
Uppercase Letter | 941 | 1.3%
Lowercase Letter | 334 | 0.5%
Open Punctuation | 159 | 0.2%
Close Punctuation | 159 | 0.2%
Dash Punctuation | 118 | 0.2%
Other Punctuation | 101 | 0.1%
Letter Number | 10 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not recovered) | 2666 | 3.9%
(not recovered) | 2619 | 3.9%
(not recovered) | 2539 | 3.8%
(not recovered) | 1666 | 2.5%
(not recovered) | 1652 | 2.4%
(not recovered) | 1645 | 2.4%
(not recovered) | 1473 | 2.2%
(not recovered) | 1465 | 2.2%
(not recovered) | 1438 | 2.1%
(not recovered) | 1268 | 1.9%
Other values (374) | 49096 | 72.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 157 | 16.7%
C | 145 | 15.4%
K | 131 | 13.9%
M | 101 | 10.7%
D | 101 | 10.7%
L | 46 | 4.9%
E | 43 | 4.6%
I | 41 | 4.4%
H | 41 | 4.4%
V | 33 | 3.5%
Other values (7) | 102 | 10.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 217 | 65.0%
i | 26 | 7.8%
l | 22 | 6.6%
v | 17 | 5.1%
s | 15 | 4.5%
k | 13 | 3.9%
w | 11 | 3.3%
g | 4 | 1.2%
a | 4 | 1.2%
h | 3 | 0.9%

Decimal Number
Value | Count | Frequency (%)
2 | 1036 | 31.1%
1 | 953 | 28.6%
3 | 457 | 13.7%
4 | 234 | 7.0%
5 | 175 | 5.3%
6 | 129 | 3.9%
7 | 110 | 3.3%
8 | 104 | 3.1%
9 | 88 | 2.6%
0 | 45 | 1.4%

Other Punctuation
Value | Count | Frequency (%)
, | 81 | 80.2%
. | 20 | 19.8%

Space Separator
Value | Count | Frequency (%)
(space) | 1050 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 159 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 159 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 118 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not recovered) | 10 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67527 | 91.6%
Common | 4918 | 6.7%
Latin | 1285 | 1.7%

Most frequent character per script

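Hangul
Value | Count | Frequency (%)
(not recovered) | 2666 | 3.9%
(not recovered) | 2619 | 3.9%
(not recovered) | 2539 | 3.8%
(not recovered) | 1666 | 2.5%
(not recovered) | 1652 | 2.4%
(not recovered) | 1645 | 2.4%
(not recovered) | 1473 | 2.2%
(not recovered) | 1465 | 2.2%
(not recovered) | 1438 | 2.1%
(not recovered) | 1268 | 1.9%
Other values (374) | 49096 | 72.7%

Latin
Value | Count | Frequency (%)
e | 217 | 16.9%
S | 157 | 12.2%
C | 145 | 11.3%
K | 131 | 10.2%
M | 101 | 7.9%
D | 101 | 7.9%
L | 46 | 3.6%
E | 43 | 3.3%
I | 41 | 3.2%
H | 41 | 3.2%
Other values (19) | 262 | 20.4%

Common
Value | Count | Frequency (%)
(space) | 1050 | 21.4%
2 | 1036 | 21.1%
1 | 953 | 19.4%
3 | 457 | 9.3%
4 | 234 | 4.8%
5 | 175 | 3.6%
( | 159 | 3.2%
) | 159 | 3.2%
6 | 129 | 2.6%
- | 118 | 2.4%
Other values (6) | 448 | 9.1%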

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67527 | 91.6%
ASCII | 6193 | 8.4%
Number Forms | 10 | < 0.1%

Most frequent character per block

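Hangul
Value | Count | Frequency (%)
(not recovered) | 2666 | 3.9%
(not recovered) | 2619 | 3.9%
(not recovered) | 2539 | 3.8%
(not recovered) | 1666 | 2.5%
(not recovered) | 1652 | 2.4%
(not recovered) | 1645 | 2.4%
(not recovered) | 1473 | 2.2%
(not recovered) | 1465 | 2.2%
(not recovered) | 1438 | 2.1%
(not recovered) | 1268 | 1.9%
Other values (374) | 49096 | 72.7%

ASCII
Value | Count | Frequency (%)
(space) | 1050 | 17.0%
2 | 1036 | 16.7%
1 | 953 | 15.4%
3 | 457 | 7.4%
4 | 234 | 3.8%
e | 217 | 3.5%
5 | 175 | 2.8%
( | 159 | 2.6%
) | 159 | 2.6%
S | 157 | 2.5%
Other values (34) | 1596 | 25.8%

Number Forms
Value | Count | Frequency (%)
(not recovered) | 10 | 100.0%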

아파트코드
Text

Distinct: 2201
Distinct (%): 22.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 130
Unique (%): 1.3%

Sample

1st row: A10023926
2nd row: A12102005
3rd row: A13186401
4th row: A15721008
5th row: A13186502
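
The statistics for this variable (every value 9 characters long, 10000 occurrences of "A", 80000 decimal digits) are consistent with a fixed format of "A" followed by eight digits. The pattern below is an assumption inferred from those numbers and the sample rows, not a documented specification of the code; `CODE_PATTERN` and `is_valid_code` are hypothetical names.

```python
import re

# Assumed format: a literal "A" followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")


def is_valid_code(code: str) -> bool:
    """Check a value against the assumed apartment-code format."""
    return CODE_PATTERN.fullmatch(code) is not None


samples = ["A10023926", "A12102005", "A13186401", "A15721008", "A13186502"]
print(all(is_valid_code(code) for code in samples))
```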

Value | Count | Frequency (%)
a13378001 | 14 | 0.1%
a13186907 | 14 | 0.1%
a13707203 | 13 | 0.1%
a13790902 | 12 | 0.1%
a14277601 | 12 | 0.1%
a10025410 | 12 | 0.1%
a12017001 | 12 | 0.1%
a10026053 | 12 | 0.1%
a13981606 | 12 | 0.1%
a14003101 | 11 | 0.1%
Other values (2191) | 9876 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18886 | 21.0%
1 | 17448 | 19.4%
A | 10000 | 11.1%
3 | 8775 | 9.8%
2 | 8397 | 9.3%
5 | 6167 | 6.9%
8 | 5462 | 6.1%
7 | 4596 | 5.1%
4 | 3942 | 4.4%
6 | 3525 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18886 | 23.6%
1 | 17448 | 21.8%
3 | 8775 | 11.0%
2 | 8397 | 10.5%
5 | 6167 | 7.7%
8 | 5462 | 6.8%
7 | 4596 | 5.7%
4 | 3942 | 4.9%
6 | 3525 | 4.4%
9 | 2802 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18886 | 23.6%
1 | 17448 | 21.8%
3 | 8775 | 11.0%
2 | 8397 | 10.5%
5 | 6167 | 7.7%
8 | 5462 | 6.8%
7 | 4596 | 5.7%
4 | 3942 | 4.9%
6 | 3525 | 4.4%
9 | 2802 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18886 | 21.0%
1 | 17448 | 19.4%
A | 10000 | 11.1%
3 | 8775 | 9.8%
2 | 8397 | 9.3%
5 | 6167 | 6.9%
8 | 5462 | 6.1%
7 | 4596 | 5.1%
4 | 3942 | 4.4%
6 | 3525 | 3.9%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8086
Min length: 2

Characters and Unicode

Total characters: 48086
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 피복비
2nd row: 건강보험료
3rd row: 연차수당
4th row: 산재보험료
5th row: 음식물처리비

Value | Count | Frequency (%)
경비비 | 239 | 2.4%
수선유지비 | 234 | 2.3%
교육비 | 226 | 2.3%
청소비 | 223 | 2.2%
세대전기료 | 222 | 2.2%
통신비 | 218 | 2.2%
이자수익 | 216 | 2.2%
급여 | 214 | 2.1%
사무용품비 | 213 | 2.1%
도서인쇄비 | 211 | 2.1%
Other values (76) | 7784 | 77.8%

Most occurring characters

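Value | Count | Frequency (%)
(not recovered) | 5421 | 11.3%
(not recovered) | 3631 | 7.6%
(not recovered) | 2152 | 4.5%
(not recovered) | 1972 | 4.1%
(not recovered) | 1341 | 2.8%
(not recovered) | 1315 | 2.7%
(not recovered) | 1102 | 2.3%
(not recovered) | 917 | 1.9%
(not recovered) | 781 | 1.6%
(not recovered) | 748 | 1.6%
Other values (110) | 28706 | 59.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48086 | 100.0%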

Most frequent character per category

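Other Letter
Value | Count | Frequency (%)
(not recovered) | 5421 | 11.3%
(not recovered) | 3631 | 7.6%
(not recovered) | 2152 | 4.5%
(not recovered) | 1972 | 4.1%
(not recovered) | 1341 | 2.8%
(not recovered) | 1315 | 2.7%
(not recovered) | 1102 | 2.3%
(not recovered) | 917 | 1.9%
(not recovered) | 781 | 1.6%
(not recovered) | 748 | 1.6%
Other values (110) | 28706 | 59.7%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48086 | 100.0%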

Most frequent character per script

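Hangul
Value | Count | Frequency (%)
(not recovered) | 5421 | 11.3%
(not recovered) | 3631 | 7.6%
(not recovered) | 2152 | 4.5%
(not recovered) | 1972 | 4.1%
(not recovered) | 1341 | 2.8%
(not recovered) | 1315 | 2.7%
(not recovered) | 1102 | 2.3%
(not recovered) | 917 | 1.9%
(not recovered) | 781 | 1.6%
(not recovered) | 748 | 1.6%
Other values (110) | 28706 | 59.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48086 | 100.0%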

Most frequent character per block

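Hangul
Value | Count | Frequency (%)
(not recovered) | 5421 | 11.3%
(not recovered) | 3631 | 7.6%
(not recovered) | 2152 | 4.5%
(not recovered) | 1972 | 4.1%
(not recovered) | 1341 | 2.8%
(not recovered) | 1315 | 2.7%
(not recovered) | 1102 | 2.3%
(not recovered) | 917 | 1.9%
(not recovered) | 781 | 1.6%
(not recovered) | 748 | 1.6%
Other values (110) | 28706 | 59.7%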

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202308 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202308
2nd row: 202308
3rd row: 202308
4th row: 202308
5th row: 202308

Common Values

Value | Count | Frequency (%)
202308 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202308 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6891
Distinct (%): 68.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4132252.9
Minimum: -2328664
Maximum: 5.1588962 × 10^8
Zeros: 1453
Zeros (%): 14.5%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2328664
5-th percentile: 0
Q1: 60000
Median: 297830
Q3: 1463550
95-th percentile: 19677447
Maximum: 5.1588962 × 10^8
Range: 5.1821828 × 10^8
Interquartile range (IQR): 1403550

Descriptive statistics

Standard deviation: 16668182
Coefficient of variation (CV): 4.0336792
Kurtosis: 221.60543
Mean: 4132252.9
Median Absolute Deviation (MAD): 297830
Skewness: 12.033191
Sum: 4.1322529 × 10^10
Variance: 2.778283 × 10^14
Monotonicity: Not monotonic
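
Several of these statistics are derived from others, so the reported values can be cross-checked. The sketch below does this with the usual definitions (IQR = Q3 - Q1, range = maximum - minimum, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). The inputs are the rounded figures printed in this report, so agreement is only expected to within rounding.

```python
import math

# Values copied from the 금액 statistics.
n = 10_000
mean = 4132252.9
std = 16668182
minimum, maximum = -2328664, 515889620
q1, q3 = 60000, 1463550

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n                 # sum of all values

print(iqr, value_range)
print(f"{cv:.4f} {variance:.4e} {total:.4e}")

# Compare with the reported values, allowing for rounding in the inputs.
print(math.isclose(cv, 4.0336792, rel_tol=1e-5))
print(math.isclose(variance, 2.778283e14, rel_tol=1e-5))
print(math.isclose(total, 4.1322529e10, rel_tol=1e-5))
```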
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 1453 | 14.5%
62500 | 110 | 1.1%
200000 | 70 | 0.7%
300000 | 66 | 0.7%
35000 | 65 | 0.7%
100000 | 49 | 0.5%
150000 | 40 | 0.4%
400000 | 35 | 0.4%
220000 | 35 | 0.4%
30000 | 31 | 0.3%
Other values (6881) | 8046 | 80.5%

Value | Count | Frequency (%)
-2328664 | 1 | < 0.1%
-1257570 | 1 | < 0.1%
-375000 | 1 | < 0.1%
-175790 | 1 | < 0.1%
-28007 | 1 | < 0.1%
-12491 | 1 | < 0.1%
-272 | 1 | < 0.1%
-20 | 1 | < 0.1%
0 | 1453 | 14.5%
1 | 2 | < 0.1%

Value | Count | Frequency (%)
515889620 | 1 | < 0.1%
428097949 | 1 | < 0.1%
303473835 | 1 | < 0.1%
300793871 | 1 | < 0.1%
290649575 | 1 | < 0.1%
284756280 | 1 | < 0.1%
255498604 | 1 | < 0.1%
246890800 | 1 | < 0.1%
245615325 | 1 | < 0.1%
242061864 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.431
금액 | 0.431 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
1546 | 롯데캐슬클라시아 | A10023926 | 피복비 | 202308 | 437490
18486 | 마포현대아파트 | A12102005 | 건강보험료 | 202308 | 578390
30015 | 동부아파트 | A13186401 | 연차수당 | 202308 | 584480
92811 | 마곡엠밸리4단지 | A15721008 | 산재보험료 | 202308 | 147580
30051 | 신내우남푸르미아 | A13186502 | 음식물처리비 | 202308 | 321260
36041 | 성수금호3차 | A13311101 | 소독비 | 202308 | 140000
94971 | 등촌임광 | A15783701 | 세대전기료 | 202308 | 15449706
34619 | 창동한신 | A13292002 | 세대전기료 | 202308 | 15751567
7529 | 래미안블레스티지 | A10025675 | 도서인쇄비 | 202308 | 544500
51619 | 길음뉴타운7단지 | A13679403 | 검침수익 | 202308 | 193070

(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
96622 | 화곡푸르지오 | A15792602 | 청소비 | 202308 | 40599770
34098 | 쌍문한양2,3,4차 | A13286110 | 경비비 | 202308 | 56770290
96139 | 염창관음삼성 | A15786321 | 교통비 | 202308 | 3000
23067 | 신사한신휴플러스 | A12208103 | 장기수선비 | 202308 | 2032680
30641 | 신내10단지 | A13187306 | 청소비 | 202308 | 12117310
72597 | 한강우성아파트 | A14319010 | 광고료수익 | 202308 | 60000
42146 | 역삼아이파크 | A13508009 | 사무용품비 | 202308 | 351870
47850 | 돈암포스코더샵 | A13606002 | 보험료 | 202308 | 120380
53364 | 반포리체 | A13776301 | 복리후생비 | 202308 | 140000
30292 | 신내라이프미성 | A13186905 | 잡수익 | 202308 | 3333680