Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202303" (Constant)
금액 has 655 (6.6%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:47:56.571574
Analysis finished: 2024-05-11 06:47:58.808777
Duration: 2.24 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
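
The reported duration can be checked against the two timestamps above. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2024-05-11 06:47:56.571574")
finished = datetime.fromisoformat("2024-05-11 06:47:58.808777")

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```

Rounded to two decimals, the difference agrees with the Duration field.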

Variables

아파트명
Text

Distinct: 2255
Distinct (%): 22.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.4462
Min length: 2

Characters and Unicode

Total characters: 74462
Distinct characters: 434
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
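
As an illustration of these character properties, the sketch below uses Python's standard `unicodedata` module to tally Unicode general categories for three apartment names that appear in this report. It is not necessarily how the profiler computes its figures. The helper name `category_counts` is an assumption made for this example, and the standard library exposes general categories but not scripts or blocks.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# Apartment names copied from the sample rows of this report.
names = [
    "송파레미니스2단지",
    "강동 리버스트 7단지 아파트",
    "한사랑2차삼성아파트(등촌동)",
]

counts = category_counts(names)

# Lo = Other Letter, Nd = Decimal Number, Zs = Space Separator,
# Ps = Open Punctuation, Pe = Close Punctuation.
for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under Other Letter (`Lo`), which is consistent with that category dominating the tables for this column.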

Unique

Unique: 153
Unique (%): 1.5%
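
In this report, Distinct counts the different values in a column, while Unique counts the values that occur exactly once (the constant column 년월일 shows Distinct 1 and Unique 0). The sketch below shows the difference on a small list; the function name `distinct_and_unique` is hypothetical, and the list reuses the five 비용명 sample rows of this report.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# The five sample rows of the 비용명 column; 청소비 appears twice.
sample = ["잡수익", "건강보험료", "청소비", "청소비", "세금과공과"]

distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```

Here 청소비 is distinct but not unique, so the two counts differ by one.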

Sample

1st row: 송파레미니스2단지
2nd row: 화곡2차보람
3rd row: 강동 리버스트 7단지 아파트
4th row: 성내삼성
5th row: 압구정현대아파트

Value  Count  Frequency (%)
아파트  215  2.0%
래미안  49  0.4%
e편한세상  34  0.3%
sk뷰  29  0.3%
아이파크  27  0.2%
고덕  20  0.2%
신반포  19  0.2%
힐스테이트  19  0.2%
해모로  18  0.2%
푸르지오  17  0.2%
Other values (2341)  10542  95.9%

Most occurring characters

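Value  Count  Frequency (%)
(not rendered)  2692  3.6%
(not rendered)  2650  3.6%
(not rendered)  2491  3.3%
(not rendered)  1830  2.5%
(not rendered)  1647  2.2%
(not rendered)  1638  2.2%
(not rendered)  1504  2.0%
(not rendered)  1459  2.0%
(not rendered)  1424  1.9%
(not rendered)  1421  1.9%
Other values (424)  55706  74.8%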

Most occurring categories

Value  Count  Frequency (%)
Other Letter  68071  91.4%
Decimal Number  3577  4.8%
Space Separator  1086  1.5%
Uppercase Letter  888  1.2%
Lowercase Letter  305  0.4%
Close Punctuation  148  0.2%
Open Punctuation  148  0.2%
Dash Punctuation  131  0.2%
Other Punctuation  103  0.1%
Letter Number  5  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not rendered)  2692  4.0%
(not rendered)  2650  3.9%
(not rendered)  2491  3.7%
(not rendered)  1830  2.7%
(not rendered)  1647  2.4%
(not rendered)  1638  2.4%
(not rendered)  1504  2.2%
(not rendered)  1459  2.1%
(not rendered)  1424  2.1%
(not rendered)  1421  2.1%
Other values (379)  49315  72.4%

Uppercase Letter
Value  Count  Frequency (%)
S  147  16.6%
C  128  14.4%
K  101  11.4%
M  90  10.1%
D  90  10.1%
L  51  5.7%
E  49  5.5%
I  46  5.2%
H  46  5.2%
V  34  3.8%
Other values (7)  106  11.9%

Lowercase Letter
Value  Count  Frequency (%)
e  185  60.7%
l  26  8.5%
i  22  7.2%
k  20  6.6%
s  19  6.2%
v  17  5.6%
w  7  2.3%
c  4  1.3%
a  2  0.7%
g  2  0.7%

Decimal Number
Value  Count  Frequency (%)
1  1071  29.9%
2  1035  28.9%
3  463  12.9%
4  251  7.0%
5  225  6.3%
6  154  4.3%
7  127  3.6%
9  99  2.8%
8  92  2.6%
0  60  1.7%

Other Punctuation
Value  Count  Frequency (%)
,  86  83.5%
.  17  16.5%

Space Separator
Value  Count  Frequency (%)
(space)  1086  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  148  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  148  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  131  100.0%

Letter Number
Value  Count  Frequency (%)
(not rendered)  5  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  68071  91.4%
Common  5193  7.0%
Latin  1198  1.6%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not rendered)  2692  4.0%
(not rendered)  2650  3.9%
(not rendered)  2491  3.7%
(not rendered)  1830  2.7%
(not rendered)  1647  2.4%
(not rendered)  1638  2.4%
(not rendered)  1504  2.2%
(not rendered)  1459  2.1%
(not rendered)  1424  2.1%
(not rendered)  1421  2.1%
Other values (379)  49315  72.4%

Latin
Value  Count  Frequency (%)
e  185  15.4%
S  147  12.3%
C  128  10.7%
K  101  8.4%
M  90  7.5%
D  90  7.5%
L  51  4.3%
E  49  4.1%
I  46  3.8%
H  46  3.8%
Other values (19)  265  22.1%

Common
Value  Count  Frequency (%)
(space)  1086  20.9%
1  1071  20.6%
2  1035  19.9%
3  463  8.9%
4  251  4.8%
5  225  4.3%
6  154  3.0%
)  148  2.8%
(  148  2.8%
-  131  2.5%
Other values (6)  481  9.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  68071  91.4%
ASCII  6386  8.6%
Number Forms  5  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(not rendered)  2692  4.0%
(not rendered)  2650  3.9%
(not rendered)  2491  3.7%
(not rendered)  1830  2.7%
(not rendered)  1647  2.4%
(not rendered)  1638  2.4%
(not rendered)  1504  2.2%
(not rendered)  1459  2.1%
(not rendered)  1424  2.1%
(not rendered)  1421  2.1%
Other values (379)  49315  72.4%

ASCII
Value  Count  Frequency (%)
(space)  1086  17.0%
1  1071  16.8%
2  1035  16.2%
3  463  7.3%
4  251  3.9%
5  225  3.5%
e  185  2.9%
6  154  2.4%
)  148  2.3%
(  148  2.3%
Other values (34)  1620  25.4%

Number Forms
Value  Count  Frequency (%)
(not rendered)  5  100.0%

아파트코드
Text

Distinct: 2260
Distinct (%): 22.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 155
Unique (%): 1.6%

Sample

1st row: A10026249
2nd row: A15770101
3rd row: A10024421
4th row: A13403101
5th row: A13589802

Value  Count  Frequency (%)
a15701602  12  0.1%
a13981405  12  0.1%
a15180705  12  0.1%
a13778204  12  0.1%
a13383003  12  0.1%
a13592605  12  0.1%
a13876108  12  0.1%
a13817101  11  0.1%
a13202103  11  0.1%
a13986004  11  0.1%
Other values (2250)  9883  98.8%

Most occurring characters

Value  Count  Frequency (%)
0  18576  20.6%
1  17351  19.3%
A  9985  11.1%
3  8919  9.9%
2  8254  9.2%
5  6224  6.9%
8  5639  6.3%
7  4595  5.1%
4  4040  4.5%
6  3357  3.7%
Other values (2)  3060  3.4%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18576  23.2%
1  17351  21.7%
3  8919  11.1%
2  8254  10.3%
5  6224  7.8%
8  5639  7.0%
7  4595  5.7%
4  4040  5.1%
6  3357  4.2%
9  3045  3.8%

Uppercase Letter
Value  Count  Frequency (%)
A  9985  99.9%
B  15  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18576  23.2%
1  17351  21.7%
3  8919  11.1%
2  8254  10.3%
5  6224  7.8%
8  5639  7.0%
7  4595  5.7%
4  4040  5.1%
6  3357  4.2%
9  3045  3.8%

Latin
Value  Count  Frequency (%)
A  9985  99.9%
B  15  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18576  20.6%
1  17351  19.3%
A  9985  11.1%
3  8919  9.9%
2  8254  9.2%
5  6224  6.9%
8  5639  6.3%
7  4595  5.1%
4  4040  4.5%
6  3357  3.7%
Other values (2)  3060  3.4%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7833
Min length: 2

Characters and Unicode

Total characters: 47833
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 잡수익
2nd row: 건강보험료
3rd row: 청소비
4th row: 청소비
5th row: 세금과공과

Value  Count  Frequency (%)
승강기유지비  251  2.5%
사무용품비  246  2.5%
교육비  244  2.4%
도서인쇄비  236  2.4%
입주자대표회의운영비  234  2.3%
연체료수익  234  2.3%
급여  233  2.3%
청소비  227  2.3%
건강보험료  226  2.3%
퇴직급여  225  2.2%
Other values (76)  7644  76.4%

Most occurring characters

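Value  Count  Frequency (%)
(not rendered)  5353  11.2%
(not rendered)  3573  7.5%
(not rendered)  2259  4.7%
(not rendered)  1903  4.0%
(not rendered)  1393  2.9%
(not rendered)  1274  2.7%
(not rendered)  1114  2.3%
(not rendered)  914  1.9%
(not rendered)  886  1.9%
(not rendered)  857  1.8%
Other values (110)  28307  59.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  47833  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
(not rendered)  5353  11.2%
(not rendered)  3573  7.5%
(not rendered)  2259  4.7%
(not rendered)  1903  4.0%
(not rendered)  1393  2.9%
(not rendered)  1274  2.7%
(not rendered)  1114  2.3%
(not rendered)  914  1.9%
(not rendered)  886  1.9%
(not rendered)  857  1.8%
Other values (110)  28307  59.2%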

Most occurring scripts

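Value  Count  Frequency (%)
Hangul  47833  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
(not rendered)  5353  11.2%
(not rendered)  3573  7.5%
(not rendered)  2259  4.7%
(not rendered)  1903  4.0%
(not rendered)  1393  2.9%
(not rendered)  1274  2.7%
(not rendered)  1114  2.3%
(not rendered)  914  1.9%
(not rendered)  886  1.9%
(not rendered)  857  1.8%
Other values (110)  28307  59.2%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  47833  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
(not rendered)  5353  11.2%
(not rendered)  3573  7.5%
(not rendered)  2259  4.7%
(not rendered)  1903  4.0%
(not rendered)  1393  2.9%
(not rendered)  1274  2.7%
(not rendered)  1114  2.3%
(not rendered)  914  1.9%
(not rendered)  886  1.9%
(not rendered)  857  1.8%
Other values (110)  28307  59.2%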

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
202303  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202303
2nd row: 202303
3rd row: 202303
4th row: 202303
5th row: 202303

Common Values

Value  Count  Frequency (%)
202303  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
202303  10000  100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7540
Distinct (%): 75.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3878402.7
Minimum: -2325450
Maximum: 6.1211066 × 10^8
Zeros: 655
Zeros (%): 6.6%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2325450
5-th percentile: 0
Q1: 97277.5
Median: 351615
Q3: 1604109.5
95-th percentile: 20632585
Maximum: 6.1211066 × 10^8
Range: 6.1443611 × 10^8
Interquartile range (IQR): 1506832
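
The Range and IQR figures follow directly from the other quantile statistics. A minimal sketch of the arithmetic, using the values printed in this report (the exact maximum, 612110660, is the largest 금액 value listed in this report; the variable names are illustrative):

```python
# Quantile statistics of 금액 as printed in this report.
minimum = -2325450
q1 = 97277.5
q3 = 1604109.5
maximum = 612110660  # exact form of 6.1211066 × 10^8

# Range is the spread between the extremes; IQR is the spread of the middle 50%.
value_range = maximum - minimum
iqr = q3 - q1

print(value_range)
print(iqr)
```

The IQR is far smaller than the range, which reflects how strongly a few very large amounts stretch the distribution.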

Descriptive statistics

Standard deviation: 13762696
Coefficient of variation (CV): 3.5485475
Kurtosis: 445.20392
Mean: 3878402.7
Median Absolute Deviation (MAD): 333100
Skewness: 14.553384
Sum: 3.8784027 × 10^10
Variance: 1.894118 × 10^14
Monotonicity: Not monotonic
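
Several of these descriptive statistics are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks them with the rounded figures printed in this report, so small rounding differences are expected; the variable names are illustrative.

```python
# Figures as printed in this report (already rounded by the profiler).
mean = 3878402.7
std = 13762696
n_observations = 10000

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
total = mean * n_observations  # sum of all values

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.6e}")
print(f"Sum      = {total:.7e}")
```

A CV well above 1 indicates that the spread of 금액 is several times its mean, in line with the high skewness and kurtosis reported above.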
Histogram with fixed size bins (bins=50)
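
Fixed-size binning splits the interval between the smallest and largest value into equally wide bins and counts the values in each. The sketch below is a plain-Python illustration, not the profiler's own implementation; `fixed_width_histogram` is a hypothetical helper, the ten amounts are the first ten 금액 values of the sample table in this report, and 5 bins are used instead of 50 to keep the output short.

```python
def fixed_width_histogram(values, bins):
    """Return (edges, counts) for equally wide bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum would fall just past the last bin, so clamp it.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# First ten 금액 values from the sample table of this report.
amounts = [700, 209990, 13447886, 16260386, 0,
           400000, 1100000, 480000, 1127170, 47525410]

edges, counts = fixed_width_histogram(amounts, bins=5)
print(counts)
```

Most values fall into the first bin, the same right-skewed shape the report's statistics describe.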

Common Values

Value  Count  Frequency (%)
0  655  6.6%
35000  110  1.1%
200000  89  0.9%
100000  67  0.7%
150000  55  0.5%
300000  54  0.5%
400000  48  0.5%
50000  38  0.4%
30000  31  0.3%
220000  28  0.3%
Other values (7530)  8825  88.2%

Minimum 10 values

Value  Count  Frequency (%)
-2325450  1  < 0.1%
-1620650  1  < 0.1%
-1029560  1  < 0.1%
-738950  1  < 0.1%
-40000  1  < 0.1%
-602  1  < 0.1%
-71  1  < 0.1%
0  655  6.6%
1  1  < 0.1%
2  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
612110660  1  < 0.1%
236217820  1  < 0.1%
220237377  1  < 0.1%
216071187  1  < 0.1%
214661050  1  < 0.1%
207239120  1  < 0.1%
204960771  1  < 0.1%
200714090  1  < 0.1%
185724820  1  < 0.1%
180963207  1  < 0.1%

Interactions


Correlations

Variable  비용명  금액
비용명  1.000  0.310
금액  0.310  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
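
To make the idea of a nullity matrix concrete, the sketch below builds one in plain Python: one row of flags per record, marking which cells are present. The helper names are hypothetical. The first two records come from the sample table of this report, while the third is an invented record with a missing 금액, included only to show a gap (the dataset itself reports 0 missing cells).

```python
COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]

rows = [
    {"아파트명": "송파레미니스2단지", "아파트코드": "A10026249",
     "비용명": "잡수익", "년월일": "202303", "금액": 700},
    {"아파트명": "화곡2차보람", "아파트코드": "A15770101",
     "비용명": "건강보험료", "년월일": "202303", "금액": 209990},
    # Hypothetical record, not from the dataset: 금액 is missing.
    {"아파트명": "예시아파트", "아파트코드": "A00000000",
     "비용명": "청소비", "년월일": "202303", "금액": None},
]

def nullity_matrix(rows, columns):
    """One list per row: True where a value is present, False where missing."""
    return [[row.get(col) is not None for col in columns] for row in rows]

def missing_per_column(rows, columns):
    """Count missing (None) cells for each column."""
    return {col: sum(row.get(col) is None for row in rows) for col in columns}

matrix = nullity_matrix(rows, COLUMNS)
missing = missing_per_column(rows, COLUMNS)

for flags in matrix:
    # '#' marks a present cell and '.' a missing one.
    print("".join("#" if present else "." for present in flags))
print(missing)
```

For the actual dataset every cell would be marked present, which is why the nullity plots carry little information here.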

Sample

Index  아파트명  아파트코드  비용명  년월일  금액
8415  송파레미니스2단지  A10026249  잡수익  202303  700
88293  화곡2차보람  A15770101  건강보험료  202303  209990
2744  강동 리버스트 7단지 아파트  A10024421  청소비  202303  13447886
36380  성내삼성  A13403101  청소비  202303  16260386
43447  압구정현대아파트  A13589802  세금과공과  202303  0
2575  상도역롯데캐슬파크엘  A10024365  검침수익  202303  400000
12465  강동역신동아파밀리에  A10027948  주차장수익  202303  1100000
35830  한신무학  A13385705  알뜰시장수익  202303  480000
52259  신반포4차  A13790828  연차수당  202303  1127170
40601  강남엘에이치1단지  A13519007  세대전기료  202303  47525410

Index  아파트명  아파트코드  비용명  년월일  금액
46980  종암2차아이파크  A13671204  교육비  202303  190000
27517  묵동신안1차  A13185507  수선유지비  202303  1141640
89469  한사랑2차삼성아파트(등촌동)  A15783907  재활용품수익  202303  135600
31293  방학삼성래미안1단지  A13285406  소독비  202303  350000
21870  DMC자이1단지  A12275501  임대료수익  202303  1679091
34313  행당한진타운  A13377703  광고료수익  202303  850000
61132  상계한신3차  A13982002  급여  202303  12734790
56555  풍납신성노바빌  A13887301  건강보험료  202303  430100
83436  한강쌍용  A15606005  주차장수익  202303  797920
82290  신대방현대  A15601105  소독비  202303  590000