Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
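The average record size is the total in-memory size divided by the number of rows. Here is a quick arithmetic check using the figures above. It assumes 1 KiB = 1024 bytes, as the KiB unit implies. The helper name is illustrative.

```python
def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average in-memory size per row in bytes, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows

# Figures from Dataset statistics (the total is itself rounded to 0.1 KiB)
avg = average_record_size_bytes(488.3, 10_000)
print(f"{avg:.1f} B")  # 50.0 B
```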

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: 파일 다운로드 (file download)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202312" [Constant]
금액 is highly skewed (γ1 = 23.58801595) [Skewed]
금액 has 1350 (13.5%) zeros [Zeros]
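Each alert can be read as a threshold rule applied to a column summary. The sketch below is hypothetical. The helper name and both thresholds (|γ1| > 20, and zeros making up more than 10% of rows) are illustrative assumptions. They are not settings read from this report's configuration.

```python
def column_alerts(values, skewness=None, skew_threshold=20.0, zeros_threshold=0.10):
    """Return alert labels for one column (hypothetical helper, assumed thresholds)."""
    alerts = []
    n = len(values)
    if len(set(values)) == 1:  # a single distinct value
        alerts.append("Constant")
    if skewness is not None and abs(skewness) > skew_threshold:
        alerts.append("Skewed")
    # Count numeric zeros only; strings such as "202312" are ignored
    n_zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
    if n and n_zeros / n > zeros_threshold:
        alerts.append("Zeros")
    return alerts

print(column_alerts(["202312"] * 10_000))  # ['Constant']
# Stand-in column with the same 1,350 zeros in 10,000 rows, plus the reported γ1
print(column_alerts([0] * 1350 + [1] * 8650, skewness=23.58801595))  # ['Skewed', 'Zeros']
```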

Reproduction

Analysis started: 2024-05-11 06:50:34.140646
Analysis finished: 2024-05-11 06:50:36.546102
Duration: 2.41 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
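The duration is the difference between the two timestamps. It can be checked with Python's standard `datetime` module:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-05-11 06:50:34.140646")
finished = datetime.fromisoformat("2024-05-11 06:50:36.546102")

# timedelta -> float seconds (2.405456), shown rounded to two decimals
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 2.41 seconds
```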

Variables

아파트명 (apartment name)
Text

Distinct: 2148
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.4818
Min length: 2
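The mean length is the total number of characters divided by the number of values: 74818 / 10000 = 7.4818. The sketch below computes the same kind of summary for any list of strings. It is applied here only to the five sample names, so its output does not reproduce the column-wide figures.

```python
from statistics import mean, median

def length_summary(values):
    """Max, median, mean and min character length of a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The five sample values listed for this column
sample = ["자양현대3차", "구로현대", "롯데캐슬골드", "도곡경남", "마곡수명산파크6단지"]
print(length_summary(sample))

# Column-wide mean length from the report's totals
print(74818 / 10000)  # 7.4818
```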

Characters and Unicode

Total characters: 74818
Distinct characters: 432
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
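Python's standard `unicodedata` module exposes the same general-category property through two-letter codes, such as `Lo` for Other Letter. The sketch below tallies characters by category for three names taken from the Sample section. The code-to-name mapping is written out by hand and covers only the ten categories that appear in this column.

```python
import unicodedata
from collections import Counter

# Two-letter general category codes -> the names used in this report
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

names = ["래미안 프리미어 팰리스", "봉천두산1,2단지", "DMC파크뷰자이아파트"]
print(category_counts(names))
```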

Unique

Unique: 123
Unique (%): 1.2%
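In this report, Distinct counts the number of different values, and Unique counts the values that occur exactly once. Both percentages are relative to the 10000 rows: 2148 / 10000 ≈ 21.5% and 123 / 10000 ≈ 1.2%. The following minimal sketch uses `collections.Counter` on a hypothetical toy list:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (n_distinct, n_unique), where unique means 'occurs exactly once'."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

toy = ["래미안", "래미안", "자이", "아이파크"]  # hypothetical toy column
print(distinct_and_unique(toy))  # (3, 2)
```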

Sample

1st row: 자양현대3차
2nd row: 구로현대
3rd row: 롯데캐슬골드
4th row: 도곡경남
5th row: 마곡수명산파크6단지

Words

Value | Count | Frequency (%)
아파트 (apartment) | 203 | 1.9%
래미안 | 56 | 0.5%
e편한세상 | 34 | 0.3%
아이파크 | 34 | 0.3%
신반포 | 23 | 0.2%
sk뷰 | 21 | 0.2%
송파 | 21 | 0.2%
자이 | 17 | 0.2%
래미안밤섬리베뉴 | 17 | 0.2%
북한산 | 15 | 0.1%
Other values (2229) | 10515 | 96.0%
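The word table appears to count lowercased, whitespace-separated tokens. This is consistent with entries such as "sk뷰" and, for the apartment codes, "a13984603", whose raw values are uppercase. Percentages are relative to the total number of tokens, not to rows. The ten listed counts sum to 441, so 203 / (441 + 10515) ≈ 1.9%. The sketch below is written under that tokenization assumption. ydata-profiling's own rules, for example its stop-word handling, may differ.

```python
from collections import Counter

def word_frequencies(values, top=10):
    """Most common lowercased, whitespace-separated tokens (assumed tokenization).

    Returns (token, count, percent of all tokens) tuples.
    """
    counts = Counter(token for value in values for token in value.lower().split())
    total = sum(counts.values())
    return [(token, n, round(100 * n / total, 1)) for token, n in counts.most_common(top)]

# Hypothetical mini-column: two sample names plus one made-up name
names = ["래미안 프리미어 팰리스", "DMC파크뷰자이아파트", "래미안 밤섬"]
print(word_frequencies(names, top=2))  # [('래미안', 2, 33.3), ('프리미어', 1, 16.7)]

# Share reported for "아파트": 203 of 441 (listed) + 10515 (other) tokens
print(round(100 * 203 / (441 + 10515), 1))  # 1.9
```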

Most occurring characters

Value | Count | Frequency (%)
 | 2742 | 3.7%
 | 2723 | 3.6%
 | 2623 | 3.5%
 | 1822 | 2.4%
 | 1668 | 2.2%
 | 1644 | 2.2%
 | 1516 | 2.0%
 | 1412 | 1.9%
 | 1370 | 1.8%
 | 1366 | 1.8%
Other values (422) | 55932 | 74.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 68618 | 91.7%
Decimal Number | 3456 | 4.6%
Space Separator | 1054 | 1.4%
Uppercase Letter | 814 | 1.1%
Lowercase Letter | 334 | 0.4%
Open Punctuation | 141 | 0.2%
Close Punctuation | 141 | 0.2%
Dash Punctuation | 126 | 0.2%
Other Punctuation | 125 | 0.2%
Letter Number | 9 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2742 | 4.0%
 | 2723 | 4.0%
 | 2623 | 3.8%
 | 1822 | 2.7%
 | 1668 | 2.4%
 | 1644 | 2.4%
 | 1516 | 2.2%
 | 1412 | 2.1%
 | 1370 | 2.0%
 | 1366 | 2.0%
Other values (377) | 49732 | 72.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 132 | 16.2%
S | 128 | 15.7%
D | 107 | 13.1%
M | 107 | 13.1%
K | 90 | 11.1%
E | 46 | 5.7%
L | 37 | 4.5%
H | 28 | 3.4%
I | 26 | 3.2%
G | 25 | 3.1%
Other values (7) | 88 | 10.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 217 | 65.0%
l | 22 | 6.6%
i | 21 | 6.3%
s | 17 | 5.1%
k | 16 | 4.8%
v | 13 | 3.9%
c | 8 | 2.4%
h | 5 | 1.5%
g | 5 | 1.5%
a | 5 | 1.5%

Decimal Number
Value | Count | Frequency (%)
2 | 1032 | 29.9%
1 | 993 | 28.7%
3 | 495 | 14.3%
4 | 238 | 6.9%
5 | 189 | 5.5%
6 | 154 | 4.5%
7 | 111 | 3.2%
9 | 102 | 3.0%
8 | 81 | 2.3%
0 | 61 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 103 | 82.4%
. | 22 | 17.6%

Space Separator
Value | Count | Frequency (%)
(space) | 1054 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 141 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 141 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 126 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 9 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 68618 | 91.7%
Common | 5043 | 6.7%
Latin | 1157 | 1.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2742 | 4.0%
 | 2723 | 4.0%
 | 2623 | 3.8%
 | 1822 | 2.7%
 | 1668 | 2.4%
 | 1644 | 2.4%
 | 1516 | 2.2%
 | 1412 | 2.1%
 | 1370 | 2.0%
 | 1366 | 2.0%
Other values (377) | 49732 | 72.5%

Latin
Value | Count | Frequency (%)
e | 217 | 18.8%
C | 132 | 11.4%
S | 128 | 11.1%
D | 107 | 9.2%
M | 107 | 9.2%
K | 90 | 7.8%
E | 46 | 4.0%
L | 37 | 3.2%
H | 28 | 2.4%
I | 26 | 2.2%
Other values (19) | 239 | 20.7%

Common
Value | Count | Frequency (%)
(space) | 1054 | 20.9%
2 | 1032 | 20.5%
1 | 993 | 19.7%
3 | 495 | 9.8%
4 | 238 | 4.7%
5 | 189 | 3.7%
6 | 154 | 3.1%
( | 141 | 2.8%
) | 141 | 2.8%
- | 126 | 2.5%
Other values (6) | 480 | 9.5%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 68618 | 91.7%
ASCII | 6191 | 8.3%
Number Forms | 9 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2742 | 4.0%
 | 2723 | 4.0%
 | 2623 | 3.8%
 | 1822 | 2.7%
 | 1668 | 2.4%
 | 1644 | 2.4%
 | 1516 | 2.2%
 | 1412 | 2.1%
 | 1370 | 2.0%
 | 1366 | 2.0%
Other values (377) | 49732 | 72.5%

ASCII
Value | Count | Frequency (%)
(space) | 1054 | 17.0%
2 | 1032 | 16.7%
1 | 993 | 16.0%
3 | 495 | 8.0%
4 | 238 | 3.8%
e | 217 | 3.5%
5 | 189 | 3.1%
6 | 154 | 2.5%
( | 141 | 2.3%
) | 141 | 2.3%
Other values (34) | 1537 | 24.8%

Number Forms
Value | Count | Frequency (%)
 | 9 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2152
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 123
Unique (%): 1.2%

Sample

1st row: A14319204
2nd row: A15288004
3rd row: A13872502
4th row: A13527008
5th row: A15728002
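Every 아파트코드 value has length 9. The character tables in this section show the uppercase A occurring 10000 times across the 10000 values, with every other character a decimal digit. Together with the samples, this suggests an "A + 8 digits" format. That pattern is inferred from these statistics, not from a published specification. The sketch below validates it with a regular expression. The pattern and helper name are assumptions.

```python
import re

# Inferred (assumed) pattern: "A" followed by exactly eight ASCII digits
CODE_PATTERN = re.compile(r"A[0-9]{8}")

def is_apartment_code(code: str) -> bool:
    """True if the whole string matches the inferred code pattern."""
    return CODE_PATTERN.fullmatch(code) is not None

samples = ["A14319204", "A15288004", "A13872502", "A13527008", "A15728002"]
print(all(is_apartment_code(c) for c in samples))  # True
print(is_apartment_code("a13984603"))  # False: lowercased token, as in the word table
```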

Words

Value | Count | Frequency (%)
a13984603 | 13 | 0.1%
a13285404 | 13 | 0.1%
a15288002 | 13 | 0.1%
a41279920 | 12 | 0.1%
a15284906 | 12 | 0.1%
a15086601 | 12 | 0.1%
a10024725 | 11 | 0.1%
a10027346 | 11 | 0.1%
a13982604 | 11 | 0.1%
a15105303 | 11 | 0.1%
Other values (2142) | 9881 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18628 | 20.7%
1 | 17368 | 19.3%
A | 10000 | 11.1%
3 | 8676 | 9.6%
2 | 8528 | 9.5%
5 | 6259 | 7.0%
8 | 5405 | 6.0%
7 | 4571 | 5.1%
4 | 4069 | 4.5%
6 | 3456 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18628 | 23.3%
1 | 17368 | 21.7%
3 | 8676 | 10.8%
2 | 8528 | 10.7%
5 | 6259 | 7.8%
8 | 5405 | 6.8%
7 | 4571 | 5.7%
4 | 4069 | 5.1%
6 | 3456 | 4.3%
9 | 3040 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18628 | 23.3%
1 | 17368 | 21.7%
3 | 8676 | 10.8%
2 | 8528 | 10.7%
5 | 6259 | 7.8%
8 | 5405 | 6.8%
7 | 4571 | 5.7%
4 | 4069 | 5.1%
6 | 3456 | 4.3%
9 | 3040 | 3.8%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18628 | 20.7%
1 | 17368 | 19.3%
A | 10000 | 11.1%
3 | 8676 | 9.6%
2 | 8528 | 9.5%
5 | 6259 | 7.0%
8 | 5405 | 6.0%
7 | 4571 | 5.1%
4 | 4069 | 4.5%
6 | 3456 | 3.8%

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8458
Min length: 2

Characters and Unicode

Total characters: 48458
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

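1st row: 제수당 (allowances)
2nd row: 세대전기료 (household electricity charges)
3rd row: 위탁관리수수료 (management company commission)
4th row: 교육비 (education expenses)
5th row: 광고료수익 (advertising revenue)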
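
Words

Value | Count | Frequency (%)
교육비 (education expenses) | 245 | 2.5%
통신비 (communication expenses) | 239 | 2.4%
경비비 (security expenses) | 226 | 2.3%
수선유지비 (repair and maintenance) | 225 | 2.2%
승강기유지비 (elevator maintenance) | 223 | 2.2%
청소비 (cleaning) | 221 | 2.2%
소독비 (disinfection) | 220 | 2.2%
도서인쇄비 (books and printing) | 214 | 2.1%
퇴직급여 (retirement benefits) | 208 | 2.1%
사무용품비 (office supplies) | 207 | 2.1%
Other values (76) | 7772 | 77.7%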

Most occurring characters

Value | Count | Frequency (%)
 | 5471 | 11.3%
 | 3520 | 7.3%
 | 2116 | 4.4%
 | 1899 | 3.9%
 | 1361 | 2.8%
 | 1286 | 2.7%
 | 1062 | 2.2%
 | 876 | 1.8%
 | 845 | 1.7%
 | 791 | 1.6%
Other values (110) | 29231 | 60.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48458 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5471 | 11.3%
 | 3520 | 7.3%
 | 2116 | 4.4%
 | 1899 | 3.9%
 | 1361 | 2.8%
 | 1286 | 2.7%
 | 1062 | 2.2%
 | 876 | 1.8%
 | 845 | 1.7%
 | 791 | 1.6%
Other values (110) | 29231 | 60.3%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48458 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5471 | 11.3%
 | 3520 | 7.3%
 | 2116 | 4.4%
 | 1899 | 3.9%
 | 1361 | 2.8%
 | 1286 | 2.7%
 | 1062 | 2.2%
 | 876 | 1.8%
 | 845 | 1.7%
 | 791 | 1.6%
Other values (110) | 29231 | 60.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48458 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5471 | 11.3%
 | 3520 | 7.3%
 | 2116 | 4.4%
 | 1899 | 3.9%
 | 1361 | 2.8%
 | 1286 | 2.7%
 | 1062 | 2.2%
 | 876 | 1.8%
 | 845 | 1.7%
 | 791 | 1.6%
Other values (110) | 29231 | 60.3%

년월일 (date; here a year-month value)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202312: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202312
2nd row: 202312
3rd row: 202312
4th row: 202312
5th row: 202312

Common Values

Value | Count | Frequency (%)
202312 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values]
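The column name 년월일 literally means "year-month-day". However, its only value, 202312, has six characters and reads naturally as year-month (YYYYMM), that is, December 2023. This reading is an inference from the data; the report does not document the format. The sketch below parses the value under that assumption. The helper name is illustrative.

```python
from datetime import datetime

def parse_year_month(value: str) -> datetime:
    """Parse a YYYYMM string (assumed format) to the first day of that month."""
    return datetime.strptime(value, "%Y%m")

period = parse_year_month("202312")
print(period.year, period.month)  # 2023 12
```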

금액 (amount)
Real number (ℝ)

SKEWED ZEROS

Distinct: 7169
Distinct (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5086888.5
Minimum: -19176000
Maximum: 1.195329 × 10^9
Zeros: 1350
Zeros (%): 13.5%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -19176000
5th percentile: 0
Q1: 64190
Median: 330000
Q3: 1584625
95th percentile: 21950530
Maximum: 1.195329 × 10^9
Range: 1.214505 × 10^9
Interquartile range (IQR): 1520435
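Range and IQR follow directly from the statistics above: range = maximum - minimum and IQR = Q3 - Q1. The following arithmetic check takes the exact maximum, 1195328980, from the largest value listed later in this section:

```python
minimum, maximum = -19_176_000, 1_195_328_980
q1, q3 = 64_190, 1_584_625

value_range = maximum - minimum  # 1214504980, i.e. about 1.214505 × 10^9
iqr = q3 - q1                    # 1520435

print(value_range, iqr)  # 1214504980 1520435
```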

Descriptive statistics

Standard deviation: 27113709
Coefficient of variation (CV): 5.3301165
Kurtosis: 843.96062
Mean: 5086888.5
Median Absolute Deviation (MAD): 330000
Skewness: 23.588016
Sum: 5.0868885 × 10^10
Variance: 7.351532 × 10^14
Monotonicity: Not monotonic
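The dispersion figures are linked by simple formulas: CV = standard deviation / mean, and variance = standard deviation². The skewness γ1 is most likely the adjusted Fisher-Pearson coefficient G1, which is the estimator behind pandas' `Series.skew`. That ydata-profiling delegates to pandas here is an assumption. The sketch below recomputes CV and variance from the rounded figures above and implements G1.

```python
import math

# Rounded figures copied from the statistics above
std_dev = 27_113_709
mean_amount = 5_086_888.5

cv = std_dev / mean_amount  # coefficient of variation, about 5.33012
variance = std_dev ** 2     # square of the rounded std, so it matches to ~7 digits

def adjusted_skewness(xs):
    """Adjusted Fisher-Pearson coefficient G1 (needs >= 3 values, not all equal)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

print(f"{variance:.6e}")                     # 7.351532e+14
print(adjusted_skewness([1, 2, 3]))          # 0.0 (symmetric)
print(adjusted_skewness([0, 0, 0, 10]) > 0)  # True: long right tail
```

A large positive G1, like the 23.59 reported here, reflects a few very large amounts far to the right of a mass of small values and zeros.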
[Figure: Histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 1350 | 13.5%
200000 | 77 | 0.8%
300000 | 53 | 0.5%
100000 | 45 | 0.4%
50000 | 44 | 0.4%
400000 | 38 | 0.4%
250000 | 35 | 0.4%
150000 | 30 | 0.3%
110000 | 27 | 0.3%
30000 | 26 | 0.3%
Other values (7159) | 8275 | 82.8%

Minimum 10 values

Value | Count | Frequency (%)
-19176000 | 1 | < 0.1%
-6978000 | 1 | < 0.1%
-1327814 | 1 | < 0.1%
-1112310 | 1 | < 0.1%
-614770 | 1 | < 0.1%
-78000 | 1 | < 0.1%
-59970 | 1 | < 0.1%
-49800 | 1 | < 0.1%
-48000 | 1 | < 0.1%
-37000 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
1195328980 | 1 | < 0.1%
1162954639 | 1 | < 0.1%
819887410 | 1 | < 0.1%
591668690 | 1 | < 0.1%
469506502 | 1 | < 0.1%
402805910 | 1 | < 0.1%
401576030 | 1 | < 0.1%
351480460 | 1 | < 0.1%
351320680 | 1 | < 0.1%
332080441 | 1 | < 0.1%
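The two extreme-value listings are the ten smallest and ten largest amounts. In the standard library, these correspond to `heapq.nsmallest` and `heapq.nlargest`. The sketch below uses a toy list that mixes a few amounts appearing in this section. It is not the full column.

```python
import heapq

# Toy list built from a few amounts shown in this section (not the full column)
amounts = [0, 200_000, -59_970, 1_195_328_980, 330_000, -19_176_000, 64_190]

smallest = heapq.nsmallest(3, amounts)  # ascending, like the minimum listing
largest = heapq.nlargest(3, amounts)    # descending, like the maximum listing
n_negative = sum(1 for a in amounts if a < 0)

print(smallest)    # [-19176000, -59970, 0]
print(largest)     # [1195328980, 330000, 200000]
print(n_negative)  # 2
```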

Interactions

[Figure: interactions plot]

Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.441
금액 | 0.441 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]

[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
72161 | 자양현대3차 | A14319204 | 제수당 | 202312 | 1828700
84379 | 구로현대 | A15288004 | 세대전기료 | 202312 | 25898450
58151 | 롯데캐슬골드 | A13872502 | 위탁관리수수료 | 202312 | 4095905
44169 | 도곡경남 | A13527008 | 교육비 | 202312 | 0
92221 | 마곡수명산파크6단지 | A15728002 | 광고료수익 | 202312 | 295920
15453 | 효성주얼리시티아파트 | A11041001 | 도서인쇄비 | 202312 | 206000
10044 | 래미안 프리미어 팰리스 | A10026720 | 지급수수료 | 202312 | 0
65737 | 상계수락파크빌 | A13983810 | 소독비 | 202312 | 270000
44858 | 개포1차2차우성 | A13528105 | 피복비 | 202312 | 355120
82000 | 오류금호어울림 | A15210102 | 피복비 | 202312 | 0

Last rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
70313 | 번동기산그린 | A14206305 | 기타부대비 | 202312 | 17950
8551 | 용산kcc스위첸 | A10025898 | 감가상각비 | 202312 | 441500
79721 | 봉천두산1,2단지 | A15106901 | 국민연금 | 202312 | 654180
81353 | 고척한일유앤아이 | A15208204 | 제수당 | 202312 | 1691360
13257 | DMC파크뷰자이아파트 | A10027817 | 세금과공과 | 202312 | 893490
33415 | 방학명품ESA1단지 | A13285404 | 국민연금 | 202312 | 421890
73240 | 광장삼성1,2차 | A14381506 | 소독비 | 202312 | 273000
74202 | 여의도삼부 | A15001020 | 재활용품수익 | 202312 | 0
54806 | 우면동동고 | A13790003 | 공동전기료 | 202312 | 1401530
75117 | 신길삼성래미안 | A15005402 | 소모품비 | 202312 | 704480