Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202106" (Constant)
금액 has 1135 (11.3%) zeros (Zeros)
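
The two alerts above come down to simple column checks. The following is a minimal sketch of such checks using only the standard library; it is not the profiler's own implementation, and the helper names `is_constant` and `zero_share` as well as the toy data are assumptions made for illustration.

```python
def is_constant(values):
    """Return True when a column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return (number of zeros, share of zeros) for a numeric column."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, zeros / len(values)


# Toy columns standing in for 년월일 and 금액 (hypothetical data, not the dataset).
dates = ["202106"] * 8
amounts = [0, 300000, 0, 1237830, 62800, 4836, 0, 100000]

constant_flag = is_constant(dates)
zeros, share = zero_share(amounts)
print(constant_flag, zeros, share)
```

Applied to the figures in this report, the zero share is 1135 / 10000, which the report shows as 11.3%.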

Reproduction

Analysis started: 2024-05-11 06:54:25.752455
Analysis finished: 2024-05-11 06:54:27.487139
Duration: 1.73 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2171
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.261
Min length: 2

Characters and Unicode

Total characters: 72610
Distinct characters: 432
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
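
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is an assumption; the input string is the third sample row of this variable. The two-letter codes map to the category names used in this report: `Lo` is Other Letter, `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Nd` is Decimal Number, and `Zs` is Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "대치SK VIEW"  # third sample row of this variable
counts = category_counts(sample)
print(dict(counts))
```

Script and block lookups are not part of `unicodedata`, so this sketch covers the category breakdown only.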

Unique

Unique: 113
Unique (%): 1.1%

Sample

1st row: 거여현대2차
2nd row: 마곡엠밸리7단지
3rd row: 대치SK VIEW
4th row: 수서동익
5th row: 송파파인타운10단지

Value                  Count   Frequency (%)
아파트                 180     1.7%
래미안                 39      0.4%
e편한세상              36      0.3%
아이파크               30      0.3%
힐스테이트             20      0.2%
염창                   19      0.2%
고덕                   15      0.1%
상계미도               14      0.1%
월드컵참누리           13      0.1%
이편한세상             13      0.1%
Other values (2239)    10345   96.5%

Most occurring characters

Value                  Count   Frequency (%)
                       2604    3.6%
                       2532    3.5%
                       2378    3.3%
                       1807    2.5%
                       1703    2.3%
                       1675    2.3%
                       1513    2.1%
                       1459    2.0%
                       1415    1.9%
                       1333    1.8%
Other values (422)     54191   74.6%

Most occurring categories

Value                  Count   Frequency (%)
Other Letter           66729   91.9%
Decimal Number         3462    4.8%
Space Separator        806     1.1%
Uppercase Letter       756     1.0%
Lowercase Letter       358     0.5%
Open Punctuation       140     0.2%
Close Punctuation      140     0.2%
Dash Punctuation       119     0.2%
Other Punctuation      95      0.1%
Letter Number          5       < 0.1%

Most frequent character per category

Other Letter
Value                  Count   Frequency (%)
                       2604    3.9%
                       2532    3.8%
                       2378    3.6%
                       1807    2.7%
                       1703    2.6%
                       1675    2.5%
                       1513    2.3%
                       1459    2.2%
                       1415    2.1%
                       1333    2.0%
Other values (377)     48310   72.4%

Uppercase Letter
Value                  Count   Frequency (%)
S                      140     18.5%
C                      100     13.2%
K                      92      12.2%
M                      72      9.5%
D                      72      9.5%
L                      67      8.9%
H                      51      6.7%
G                      39      5.2%
I                      31      4.1%
E                      22      2.9%
Other values (7)       70      9.3%

Lowercase Letter
Value                  Count   Frequency (%)
e                      212     59.2%
i                      26      7.3%
l                      24      6.7%
k                      21      5.9%
s                      18      5.0%
c                      16      4.5%
v                      15      4.2%
g                      7       2.0%
a                      7       2.0%
w                      7       2.0%

Decimal Number
Value                  Count   Frequency (%)
2                      1058    30.6%
1                      1006    29.1%
3                      469     13.5%
4                      246     7.1%
5                      186     5.4%
6                      130     3.8%
7                      108     3.1%
9                      99      2.9%
8                      98      2.8%
0                      62      1.8%

Other Punctuation
Value                  Count   Frequency (%)
,                      75      78.9%
.                      20      21.1%

Space Separator
Value                  Count   Frequency (%)
(space)                806     100.0%

Open Punctuation
Value                  Count   Frequency (%)
(                      140     100.0%

Close Punctuation
Value                  Count   Frequency (%)
)                      140     100.0%

Dash Punctuation
Value                  Count   Frequency (%)
-                      119     100.0%

Letter Number
Value                  Count   Frequency (%)
                       5       100.0%

Most occurring scripts

Value                  Count   Frequency (%)
Hangul                 66729   91.9%
Common                 4762    6.6%
Latin                  1119    1.5%

Most frequent character per script

Hangul
Value                  Count   Frequency (%)
                       2604    3.9%
                       2532    3.8%
                       2378    3.6%
                       1807    2.7%
                       1703    2.6%
                       1675    2.5%
                       1513    2.3%
                       1459    2.2%
                       1415    2.1%
                       1333    2.0%
Other values (377)     48310   72.4%

Latin
Value                  Count   Frequency (%)
e                      212     18.9%
S                      140     12.5%
C                      100     8.9%
K                      92      8.2%
M                      72      6.4%
D                      72      6.4%
L                      67      6.0%
H                      51      4.6%
G                      39      3.5%
I                      31      2.8%
Other values (19)      243     21.7%

Common
Value                  Count   Frequency (%)
2                      1058    22.2%
1                      1006    21.1%
(space)                806     16.9%
3                      469     9.8%
4                      246     5.2%
5                      186     3.9%
(                      140     2.9%
)                      140     2.9%
6                      130     2.7%
-                      119     2.5%
Other values (6)       462     9.7%

Most occurring blocks

Value                  Count   Frequency (%)
Hangul                 66729   91.9%
ASCII                  5876    8.1%
Number Forms           5       < 0.1%

Most frequent character per block

Hangul
Value                  Count   Frequency (%)
                       2604    3.9%
                       2532    3.8%
                       2378    3.6%
                       1807    2.7%
                       1703    2.6%
                       1675    2.5%
                       1513    2.3%
                       1459    2.2%
                       1415    2.1%
                       1333    2.0%
Other values (377)     48310   72.4%

ASCII
Value                  Count   Frequency (%)
2                      1058    18.0%
1                      1006    17.1%
(space)                806     13.7%
3                      469     8.0%
4                      246     4.2%
e                      212     3.6%
5                      186     3.2%
(                      140     2.4%
)                      140     2.4%
S                      140     2.4%
Other values (34)      1473    25.1%

Number Forms
Value                  Count   Frequency (%)
                       5       100.0%

아파트코드
Text

Distinct: 2175
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 113
Unique (%): 1.1%

Sample

1st row: A13881401
2nd row: A15721007
3rd row: A10026924
4th row: A13588601
5th row: A13821005

Value                  Count   Frequency (%)
a13971501              14      0.1%
a15208202              13      0.1%
a12187906              13      0.1%
a10027461              12      0.1%
a13790703              12      0.1%
a12204002              12      0.1%
a13405002              12      0.1%
a14381415              12      0.1%
a15609303              12      0.1%
a10025649              11      0.1%
Other values (2165)    9877    98.8%

Most occurring characters

Value                  Count   Frequency (%)
0                      19003   21.1%
1                      17533   19.5%
A                      10000   11.1%
3                      8828    9.8%
2                      8250    9.2%
5                      6130    6.8%
8                      5452    6.1%
7                      4646    5.2%
4                      3920    4.4%
6                      3471    3.9%

Most occurring categories

Value                  Count   Frequency (%)
Decimal Number         80000   88.9%
Uppercase Letter       10000   11.1%

Most frequent character per category

Decimal Number
Value                  Count   Frequency (%)
0                      19003   23.8%
1                      17533   21.9%
3                      8828    11.0%
2                      8250    10.3%
5                      6130    7.7%
8                      5452    6.8%
7                      4646    5.8%
4                      3920    4.9%
6                      3471    4.3%
9                      2767    3.5%

Uppercase Letter
Value                  Count   Frequency (%)
A                      10000   100.0%

Most occurring scripts

Value                  Count   Frequency (%)
Common                 80000   88.9%
Latin                  10000   11.1%

Most frequent character per script

Common
Value                  Count   Frequency (%)
0                      19003   23.8%
1                      17533   21.9%
3                      8828    11.0%
2                      8250    10.3%
5                      6130    7.7%
8                      5452    6.8%
7                      4646    5.8%
4                      3920    4.9%
6                      3471    4.3%
9                      2767    3.5%

Latin
Value                  Count   Frequency (%)
A                      10000   100.0%

Most occurring blocks

Value                  Count   Frequency (%)
ASCII                  90000   100.0%

Most frequent character per block

ASCII
Value                  Count   Frequency (%)
0                      19003   21.1%
1                      17533   19.5%
A                      10000   11.1%
3                      8828    9.8%
2                      8250    9.2%
5                      6130    6.8%
8                      5452    6.1%
7                      4646    5.2%
4                      3920    4.4%
6                      3471    3.9%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9029
Min length: 2

Characters and Unicode

Total characters: 49029
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 수선유지비
2nd row: 수선유지비
3rd row: 승강기유지비
4th row: 제수당
5th row: 청소비

Value                  Count   Frequency (%)
승강기유지비           226     2.3%
도서인쇄비             226     2.3%
사무용품비             225     2.2%
세대수도료             218     2.2%
교육비                 214     2.1%
퇴직급여               213     2.1%
소독비                 210     2.1%
통신비                 208     2.1%
입주자대표회의운영비   208     2.1%
보험료                 206     2.1%
Other values (77)      7846    78.5%

Most occurring characters

Value                  Count   Frequency (%)
                       5371    11.0%
                       3607    7.4%
                       2119    4.3%
                       1997    4.1%
                       1759    3.6%
                       1294    2.6%
                       1074    2.2%
                       861     1.8%
                       816     1.7%
                       770     1.6%
Other values (110)     29361   59.9%

Most occurring categories

Value                  Count   Frequency (%)
Other Letter           49029   100.0%

Most frequent character per category

Other Letter
Value                  Count   Frequency (%)
                       5371    11.0%
                       3607    7.4%
                       2119    4.3%
                       1997    4.1%
                       1759    3.6%
                       1294    2.6%
                       1074    2.2%
                       861     1.8%
                       816     1.7%
                       770     1.6%
Other values (110)     29361   59.9%

Most occurring scripts

Value                  Count   Frequency (%)
Hangul                 49029   100.0%

Most frequent character per script

Hangul
Value                  Count   Frequency (%)
                       5371    11.0%
                       3607    7.4%
                       2119    4.3%
                       1997    4.1%
                       1759    3.6%
                       1294    2.6%
                       1074    2.2%
                       861     1.8%
                       816     1.7%
                       770     1.6%
Other values (110)     29361   59.9%

Most occurring blocks

Value                  Count   Frequency (%)
Hangul                 49029   100.0%

Most frequent character per block

Hangul
Value                  Count   Frequency (%)
                       5371    11.0%
                       3607    7.4%
                       2119    4.3%
                       1997    4.1%
                       1759    3.6%
                       1294    2.6%
                       1074    2.2%
                       861     1.8%
                       816     1.7%
                       770     1.6%
Other values (110)     29361   59.9%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202106: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202106
2nd row: 202106
3rd row: 202106
4th row: 202106
5th row: 202106

Common Values

Value                  Count   Frequency (%)
202106                 10000   100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value                  Count   Frequency (%)
202106                 10000   100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7054
Distinct (%): 70.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2883501.1
Minimum: -14500000
Maximum: 2.4602328 × 10^8
Zeros: 1135
Zeros (%): 11.3%
Negative: 11
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -14500000
5-th percentile: 0
Q1: 69392.5
Median: 303875
Q3: 1270032
95-th percentile: 15066682
Maximum: 2.4602328 × 10^8
Range: 2.6052328 × 10^8
Interquartile range (IQR): 1200639.5

Descriptive statistics

Standard deviation: 9535876.3
Coefficient of variation (CV): 3.307048
Kurtosis: 145.24296
Mean: 2883501.1
Median Absolute Deviation (MAD): 300203.5
Skewness: 9.4770557
Sum: 2.8835011 × 10^10
Variance: 9.0932937 × 10^13
Monotonicity: Not monotonic
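
Several of the figures above follow directly from the others. The short check below recomputes the range, IQR, coefficient of variation, sum, and variance from the reported minimum, maximum, quartiles, mean, and standard deviation. The inputs are copied from this report; because the mean and standard deviation are rounded as printed, the derived values agree with the report only to the displayed precision.

```python
# Values copied from the 금액 statistics in this report.
minimum = -14_500_000
maximum = 246_023_278
q1, q3 = 69_392.5, 1_270_032
mean = 2_883_501.1       # rounded as printed
std = 9_535_876.3        # rounded as printed
n = 10_000               # number of observations

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
total = mean * n                  # Sum
variance = std ** 2               # Variance

print(value_range, iqr, round(cv, 6))
```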
[Figure: Histogram with fixed size bins (bins=50)]

Value                  Count   Frequency (%)
0                      1135    11.3%
300000                 79      0.8%
200000                 79      0.8%
100000                 57      0.6%
150000                 55      0.5%
400000                 37      0.4%
30000                  35      0.4%
50000                  35      0.4%
250000                 31      0.3%
180000                 31      0.3%
Other values (7044)    8426    84.3%

Value                  Count   Frequency (%)
-14500000              1       < 0.1%
-5841000               1       < 0.1%
-2923460               1       < 0.1%
-2278000               1       < 0.1%
-1207126               1       < 0.1%
-431020                1       < 0.1%
-397500                1       < 0.1%
-305825                1       < 0.1%
-52520                 1       < 0.1%
-29730                 1       < 0.1%

Value                  Count   Frequency (%)
246023278              1       < 0.1%
221059030              1       < 0.1%
207297517              1       < 0.1%
155174020              1       < 0.1%
155064000              1       < 0.1%
148211500              1       < 0.1%
145364046              1       < 0.1%
129769550              1       < 0.1%
123970120              1       < 0.1%
113809726              1       < 0.1%

Interactions

[Figure: interactions plot]

Correlations

          비용명   금액
비용명    1.000    0.565
금액      0.565    1.000
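
The report does not state which coefficient produced the 0.565 between the text column 비용명 and the numeric column 금액. One measure often used for a categorical variable paired with a binned numeric variable is Cramér's V; whether it was used here is an assumption. The sketch below implements Cramér's V with the standard library only. The function name `cramers_v` and the toy data are hypothetical, and the numeric column is assumed to be already discretized into bins.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equally long sequences of category labels.

    Both sequences need at least two distinct labels, otherwise the
    denominator is zero.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, row_count in row_totals.items():
        for y, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy data: cost item labels against amount bins (hypothetical values).
items = ["a", "a", "b", "b"]
bins_matched = ["low", "low", "high", "high"]
bins_mixed = ["low", "high", "low", "high"]

v_perfect = cramers_v(items, bins_matched)
v_none = cramers_v(items, bins_mixed)
print(v_perfect, v_none)
```

A value near 1 means the amount bin is almost determined by the cost item, and a value near 0 means the two are unrelated.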

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly pick out patterns in data completion.]

Sample

         아파트명                  아파트코드   비용명                 년월일   금액
59051    거여현대2차               A13881401    수선유지비             202106   1237830
93277    마곡엠밸리7단지           A15721007    수선유지비             202106   20242275
7359     대치SK VIEW               A10026924    승강기유지비           202106   1056000
45031    수서동익                  A13588601    제수당                 202106   1793000
57075    송파파인타운10단지        A13821005    청소비                 202106   8148900
24481    이문쌍용                  A13082704    고용보험료             202106   250590
88895    본동한신휴플러스          A15605102    정화조관리비           202106   220000
25959    면목금호어울림            A13120704    음식물처리비           202106   288780
72864    광장11현대홈타운          A14321001    업무추진비             202106   100000
93313    마곡엠밸리4단지           A15721008    장기수선비             202106   2491500

         아파트명                  아파트코드   비용명                 년월일   금액
50619    종암삼성래미안            A13686302    위탁관리수수료         202106   753940
8239     래미안 서초에스티지       A10027221    고용안정사업수익       202106   209030
14091    DMC래미안e편한세상        A12013003    소독비                 202106   0
10823    용마산하늘채아파트        A10028033    수도광열비             202106   0
97391    목동대림2차               A15805109    수선유지비             202106   2002000
54662    잠원한신로얄              A13790706    제수당                 202106   1347500
52424    서초네이처힐3단지         A13778205    부과차익               202106   4836
73334    광장현대3단지아파트       A14381415    주차장수익             202106   5110455
36865    천호우성                  A13402103    지급수수료             202106   62800
86816    독산동한양수자인아파트    A15370301    입주자대표회의운영비   202106   414550