Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
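
The two memory figures are consistent with each other: dividing the total size by the number of observations gives the average record size. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the function names are illustrative, not part of the report):

```python
KIB = 1024  # bytes per KiB (assumed unit convention)


def average_record_size(total_kib: float, n_rows: int) -> float:
    """Average number of bytes per row for a table of `total_kib` KiB."""
    return total_kib * KIB / n_rows


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent (0.0 when `whole` is 0)."""
    return 100.0 * part / whole if whole else 0.0


avg = average_record_size(488.3, 10_000)
print(f"{avg:.1f} B")                 # average record size
print(f"{percentage(0, 50_000):.1f}%")  # missing cells out of 5 x 10000 cells
```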

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202001" (Constant)
금액 has 245 (2.5%) zeros (Zeros)
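
Both alerts come down to simple checks on a column. The sketch below shows one way such checks could be written; it is an illustration under stated assumptions, not the profiler's own code, and the toy columns only mimic the counts reported above.

```python
from typing import Sequence


def is_constant(values: Sequence) -> bool:
    """True when a non-empty column holds exactly one distinct value."""
    return len(values) > 0 and len(set(values)) == 1


def zero_share(values: Sequence[float]) -> float:
    """Percentage of entries that are equal to zero."""
    return 100.0 * sum(1 for v in values if v == 0) / len(values)


# Toy columns shaped like the report: one repeated date, 245 zeros in 10000 rows.
dates = ["202001"] * 10_000
amounts = [0] * 245 + [1] * 9_755

print(is_constant(dates))
print(f"{zero_share(amounts):.2f}%")
```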

Reproduction

Analysis started: 2024-05-11 06:57:46.878823
Analysis finished: 2024-05-11 06:57:48.976616
Duration: 2.1 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
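
A report of this kind could be regenerated roughly as follows. This is a sketch under assumptions: the CSV path is hypothetical, the string dtypes for the code-like columns are a guess, and the exact configuration used for this run lives in config.json, which is not reproduced here. `ProfileReport`, its `dataset` argument, and `to_file` are part of ydata-profiling.

```python
def build_profile(csv_path: str, html_path: str = "report.html") -> None:
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: read code-like columns as strings so they are not parsed as numbers.
    df = pd.read_csv(csv_path, dtype={"아파트코드": str, "년월일": str})

    profile = ProfileReport(
        df,
        dataset={
            "description": "파일 다운로드",
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do",
        },
    )
    profile.to_file(html_path)


# Usage (hypothetical file name):
# build_profile("apartment_costs_202001.csv")
```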

Variables

아파트명
Text

Distinct: 2065
Distinct (%): 20.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2279
Min length: 2

Characters and Unicode

Total characters: 72279
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
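
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The two-letter codes correspond to the category names used in this report (`Lo` = Other Letter, `Nd` = Decimal Number, `Zs` = Space Separator, `Lu`/`Ll` = Uppercase/Lowercase Letter, `Pd`/`Ps`/`Pe`/`Po` = Dash/Open/Close/Other Punctuation, `Nl` = Letter Number). The standard library does not expose script or block, so this sketch covers categories only; the sample strings are taken from the Sample section.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


samples = [
    "마곡수명산파크2단지",
    "역삼e-편한세상",
    "신반포 한신 25,26,27차 아파트",
    "신정이펜하우스1단지(총세대 기준)",
]

for category, count in category_counts(samples).most_common():
    print(category, count)
```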

Unique

Unique: 94
Unique (%): 0.9%
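
"Distinct" and "Unique" measure different things. Assuming the usual ydata-profiling meaning, distinct counts the different values present, while unique counts only the values that occur exactly once. A minimal sketch with hypothetical codes:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


codes = ["A1", "A1", "A2", "A3", "A3", "A4"]  # hypothetical values
print(distinct_and_unique(codes))  # (4, 2)
```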

Sample

1st row: 마곡수명산파크2단지
2nd row: 아크로힐스논현
3rd row: 벽산라이브파크2차
4th row: 이튼타워리버3차
5th row: 삼성롯데캐슬프레미어

Value | Count | Frequency (%)
아파트 | 142 | 1.3%
래미안 | 27 | 0.3%
북한산 | 21 | 0.2%
신반포 | 19 | 0.2%
아이파크 | 19 | 0.2%
고덕 | 14 | 0.1%
왕십리 | 14 | 0.1%
이문현대 | 13 | 0.1%
신도림현대 | 13 | 0.1%
2단지 | 13 | 0.1%
Other values (2126) | 10356 | 97.2%

Most occurring characters

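Value | Count | Frequency (%)
(not preserved) | 2337 | 3.2%
(not preserved) | 2259 | 3.1%
(not preserved) | 1992 | 2.8%
(not preserved) | 1772 | 2.5%
(not preserved) | 1760 | 2.4%
(not preserved) | 1652 | 2.3%
(not preserved) | 1530 | 2.1%
(not preserved) | 1504 | 2.1%
(not preserved) | 1412 | 2.0%
(not preserved) | 1322 | 1.8%
Other values (419) | 54739 | 75.7%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66152 | 91.5%
Decimal Number | 3791 | 5.2%
Space Separator | 717 | 1.0%
Uppercase Letter | 716 | 1.0%
Lowercase Letter | 351 | 0.5%
Dash Punctuation | 153 | 0.2%
Close Punctuation | 147 | 0.2%
Open Punctuation | 147 | 0.2%
Other Punctuation | 98 | 0.1%
Letter Number | 7 | < 0.1%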

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2337 | 3.5%
(not preserved) | 2259 | 3.4%
(not preserved) | 1992 | 3.0%
(not preserved) | 1772 | 2.7%
(not preserved) | 1760 | 2.7%
(not preserved) | 1652 | 2.5%
(not preserved) | 1530 | 2.3%
(not preserved) | 1504 | 2.3%
(not preserved) | 1412 | 2.1%
(not preserved) | 1322 | 2.0%
Other values (374) | 48612 | 73.5%

Uppercase Letter
Value | Count | Frequency (%)
S | 127 | 17.7%
K | 100 | 14.0%
C | 81 | 11.3%
L | 62 | 8.7%
I | 48 | 6.7%
H | 45 | 6.3%
M | 43 | 6.0%
D | 43 | 6.0%
E | 37 | 5.2%
G | 37 | 5.2%
Other values (7) | 93 | 13.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 204 | 58.1%
l | 34 | 9.7%
i | 27 | 7.7%
v | 21 | 6.0%
c | 20 | 5.7%
k | 17 | 4.8%
s | 10 | 2.8%
w | 5 | 1.4%
a | 5 | 1.4%
g | 5 | 1.4%

Decimal Number
Value | Count | Frequency (%)
1 | 1156 | 30.5%
2 | 1141 | 30.1%
3 | 484 | 12.8%
4 | 260 | 6.9%
5 | 217 | 5.7%
6 | 173 | 4.6%
9 | 102 | 2.7%
0 | 88 | 2.3%
8 | 86 | 2.3%
7 | 84 | 2.2%

Other Punctuation
Value | Count | Frequency (%)
, | 81 | 82.7%
. | 17 | 17.3%

Space Separator
Value | Count | Frequency (%)
(space) | 717 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 153 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 147 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 147 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66152 | 91.5%
Common | 5053 | 7.0%
Latin | 1074 | 1.5%

Most frequent character per script

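Hangul
Value | Count | Frequency (%)
(not preserved) | 2337 | 3.5%
(not preserved) | 2259 | 3.4%
(not preserved) | 1992 | 3.0%
(not preserved) | 1772 | 2.7%
(not preserved) | 1760 | 2.7%
(not preserved) | 1652 | 2.5%
(not preserved) | 1530 | 2.3%
(not preserved) | 1504 | 2.3%
(not preserved) | 1412 | 2.1%
(not preserved) | 1322 | 2.0%
Other values (374) | 48612 | 73.5%

Latin
Value | Count | Frequency (%)
e | 204 | 19.0%
S | 127 | 11.8%
K | 100 | 9.3%
C | 81 | 7.5%
L | 62 | 5.8%
I | 48 | 4.5%
H | 45 | 4.2%
M | 43 | 4.0%
D | 43 | 4.0%
E | 37 | 3.4%
Other values (19) | 284 | 26.4%

Common
Value | Count | Frequency (%)
1 | 1156 | 22.9%
2 | 1141 | 22.6%
(space) | 717 | 14.2%
3 | 484 | 9.6%
4 | 260 | 5.1%
5 | 217 | 4.3%
6 | 173 | 3.4%
- | 153 | 3.0%
) | 147 | 2.9%
( | 147 | 2.9%
Other values (6) | 458 | 9.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66152 | 91.5%
ASCII | 6120 | 8.5%
Number Forms | 7 | < 0.1%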

Most frequent character per block

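Hangul
Value | Count | Frequency (%)
(not preserved) | 2337 | 3.5%
(not preserved) | 2259 | 3.4%
(not preserved) | 1992 | 3.0%
(not preserved) | 1772 | 2.7%
(not preserved) | 1760 | 2.7%
(not preserved) | 1652 | 2.5%
(not preserved) | 1530 | 2.3%
(not preserved) | 1504 | 2.3%
(not preserved) | 1412 | 2.1%
(not preserved) | 1322 | 2.0%
Other values (374) | 48612 | 73.5%

ASCII
Value | Count | Frequency (%)
1 | 1156 | 18.9%
2 | 1141 | 18.6%
(space) | 717 | 11.7%
3 | 484 | 7.9%
4 | 260 | 4.2%
5 | 217 | 3.5%
e | 204 | 3.3%
6 | 173 | 2.8%
- | 153 | 2.5%
) | 147 | 2.4%
Other values (34) | 1468 | 24.0%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 7 | 100.0%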

아파트코드
Text

Distinct: 2071
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 94
Unique (%): 0.9%

Sample

1st row: A15728004
2nd row: A13501006
3rd row: A14272310
4th row: A14319306
5th row: A13509010

Value | Count | Frequency (%)
a13082703 | 13 | 0.1%
a13187702 | 13 | 0.1%
a10027744 | 12 | 0.1%
a13006003 | 12 | 0.1%
a10027424 | 11 | 0.1%
a13885306 | 11 | 0.1%
a15370103 | 11 | 0.1%
a11054101 | 11 | 0.1%
a13283405 | 11 | 0.1%
a15805302 | 11 | 0.1%
Other values (2061) | 9884 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18350 | 20.4%
1 | 17467 | 19.4%
A | 9981 | 11.1%
3 | 8902 | 9.9%
2 | 8157 | 9.1%
5 | 6310 | 7.0%
8 | 5784 | 6.4%
7 | 4821 | 5.4%
4 | 3792 | 4.2%
6 | 3461 | 3.8%
Other values (2) | 2975 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18350 | 22.9%
1 | 17467 | 21.8%
3 | 8902 | 11.1%
2 | 8157 | 10.2%
5 | 6310 | 7.9%
8 | 5784 | 7.2%
7 | 4821 | 6.0%
4 | 3792 | 4.7%
6 | 3461 | 4.3%
9 | 2956 | 3.7%

Uppercase Letter
Value | Count | Frequency (%)
A | 9981 | 99.8%
B | 19 | 0.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18350 | 22.9%
1 | 17467 | 21.8%
3 | 8902 | 11.1%
2 | 8157 | 10.2%
5 | 6310 | 7.9%
8 | 5784 | 7.2%
7 | 4821 | 6.0%
4 | 3792 | 4.7%
6 | 3461 | 4.3%
9 | 2956 | 3.7%

Latin
Value | Count | Frequency (%)
A | 9981 | 99.8%
B | 19 | 0.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18350 | 20.4%
1 | 17467 | 19.4%
A | 9981 | 11.1%
3 | 8902 | 9.9%
2 | 8157 | 9.1%
5 | 6310 | 7.0%
8 | 5784 | 6.4%
7 | 4821 | 5.4%
4 | 3792 | 4.2%
6 | 3461 | 3.8%
Other values (2) | 2975 | 3.3%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7861
Min length: 2

Characters and Unicode

Total characters: 47861
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 기타부대비
2nd row: 장기수선비
3rd row: 지급수수료
4th row: 승강기유지비
5th row: 경비비

Value | Count | Frequency (%)
급여 | 257 | 2.6%
소독비 | 246 | 2.5%
세대전기료 | 245 | 2.5%
연체료수익 | 236 | 2.4%
장기수선비 | 236 | 2.4%
사무용품비 | 233 | 2.3%
보험료 | 233 | 2.3%
퇴직급여 | 232 | 2.3%
경비비 | 229 | 2.3%
도서인쇄비 | 229 | 2.3%
Other values (77) | 7624 | 76.2%

Most occurring characters

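Value | Count | Frequency (%)
(not preserved) | 5395 | 11.3%
(not preserved) | 3490 | 7.3%
(not preserved) | 2212 | 4.6%
(not preserved) | 1809 | 3.8%
(not preserved) | 1611 | 3.4%
(not preserved) | 1361 | 2.8%
(not preserved) | 1121 | 2.3%
(not preserved) | 921 | 1.9%
(not preserved) | 863 | 1.8%
(not preserved) | 815 | 1.7%
Other values (110) | 28263 | 59.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 47861 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 5395 | 11.3%
(not preserved) | 3490 | 7.3%
(not preserved) | 2212 | 4.6%
(not preserved) | 1809 | 3.8%
(not preserved) | 1611 | 3.4%
(not preserved) | 1361 | 2.8%
(not preserved) | 1121 | 2.3%
(not preserved) | 921 | 1.9%
(not preserved) | 863 | 1.8%
(not preserved) | 815 | 1.7%
Other values (110) | 28263 | 59.1%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 47861 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 5395 | 11.3%
(not preserved) | 3490 | 7.3%
(not preserved) | 2212 | 4.6%
(not preserved) | 1809 | 3.8%
(not preserved) | 1611 | 3.4%
(not preserved) | 1361 | 2.8%
(not preserved) | 1121 | 2.3%
(not preserved) | 921 | 1.9%
(not preserved) | 863 | 1.8%
(not preserved) | 815 | 1.7%
Other values (110) | 28263 | 59.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 47861 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(not preserved) | 5395 | 11.3%
(not preserved) | 3490 | 7.3%
(not preserved) | 2212 | 4.6%
(not preserved) | 1809 | 3.8%
(not preserved) | 1611 | 3.4%
(not preserved) | 1361 | 2.8%
(not preserved) | 1121 | 2.3%
(not preserved) | 921 | 1.9%
(not preserved) | 863 | 1.8%
(not preserved) | 815 | 1.7%
Other values (110) | 28263 | 59.1%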

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202001
2nd row: 202001
3rd row: 202001
4th row: 202001
5th row: 202001

Common Values

Value | Count | Frequency (%)
202001 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202001 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7639
Distinct (%): 76.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3967345.7
Minimum: -882000
Maximum: 5.9248784 × 10^8
Zeros: 245
Zeros (%): 2.5%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -882000
5th percentile: 3196.5
Q1: 120400
Median: 417145
Q3: 1771392.5
95th percentile: 19010691
Maximum: 5.9248784 × 10^8
Range: 5.9336984 × 10^8
Interquartile range (IQR): 1650992.5
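
The range and IQR follow directly from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. A short check using the figures above (the exact maximum, 592487840, is the largest value listed for this variable; the variable names are illustrative):

```python
minimum = -882_000.0
maximum = 592_487_840.0
q1 = 120_400.0
q3 = 1_771_392.5

value_range = maximum - minimum  # spread between the extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(f"Range: {value_range:.7e}")
print(f"IQR:   {iqr}")
```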

Descriptive statistics

Standard deviation: 15023734
Coefficient of variation (CV): 3.7868476
Kurtosis: 389.44934
Mean: 3967345.7
Median Absolute Deviation (MAD): 369145
Skewness: 14.755736
Sum: 3.9673457 × 10^10
Variance: 2.2571257 × 10^14
Monotonicity: Not monotonic
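
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below recomputes them from the rounded figures in this report, so small differences in the last digits are expected.

```python
import math

n = 10_000           # number of observations
mean = 3_967_345.7
std = 15_023_734.0

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation
total = mean * n      # sum of all values

print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum:      {total:.7e}")
```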
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value | Count | Frequency (%)
0 | 245 | 2.5%
200000 | 100 | 1.0%
100000 | 70 | 0.7%
300000 | 68 | 0.7%
150000 | 42 | 0.4%
400000 | 40 | 0.4%
120000 | 40 | 0.4%
48000 | 39 | 0.4%
500000 | 39 | 0.4%
110000 | 38 | 0.4%
Other values (7629) | 9279 | 92.8%

Minimum 10 values

Value | Count | Frequency (%)
-882000 | 1 | < 0.1%
-654570 | 1 | < 0.1%
-550000 | 1 | < 0.1%
-212880 | 1 | < 0.1%
-119320 | 1 | < 0.1%
-3409 | 1 | < 0.1%
-19 | 1 | < 0.1%
-2 | 1 | < 0.1%
0 | 245 | 2.5%
1 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
592487840 | 1 | < 0.1%
459715300 | 1 | < 0.1%
321080550 | 1 | < 0.1%
242998650 | 1 | < 0.1%
232399016 | 1 | < 0.1%
226800000 | 1 | < 0.1%
213927130 | 1 | < 0.1%
213359799 | 1 | < 0.1%
207305380 | 1 | < 0.1%
192341368 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.416
금액 | 0.416 | 1.000
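
The report does not state which association measure produced 0.416. One measure often used for a categorical and a numeric column is Cramér's V computed after binning the numeric values; whether this report used it is an assumption. The sketch below implements the plain (uncorrected) Cramér's V with hypothetical rows and a hypothetical bin edge, so it will not reproduce the figure above.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def to_bins(values, edges):
    """Map each number to the index of the interval it falls in."""
    return [bisect_right(edges, v) for v in values]


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical rows: the cost item fully determines the amount bin.
cost_items = ["급여", "급여", "소독비", "소독비"]
amounts = [5_000_000, 4_000_000, 100_000, 200_000]
print(cramers_v(cost_items, to_bins(amounts, edges=[1_000_000])))  # 1.0
```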

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
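
Nullity by column is simply a per-column count of missing cells. A minimal sketch, assuming rows are dictionaries and that `None` or an empty string marks a missing cell; the second row is hypothetical, since this dataset has no missing cells:

```python
def missing_per_column(rows, columns):
    """Count cells that are None or an empty string, per column name."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }


rows = [
    {"아파트명": "경남아너스빌", "금액": 125555},
    {"아파트명": "", "금액": None},  # hypothetical incomplete row
    {"아파트명": "노량진우성", "금액": 0},  # zero is a value, not a gap
]
print(missing_per_column(rows, ["아파트명", "금액"]))
```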

Sample

Row index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
77012 | 마곡수명산파크2단지 | A15728004 | 기타부대비 | 202001 | 254780
31632 | 아크로힐스논현 | A13501006 | 장기수선비 | 202001 | 4925280
57537 | 벽산라이브파크2차 | A14272310 | 지급수수료 | 202001 | 900
58537 | 이튼타워리버3차 | A14319306 | 승강기유지비 | 202001 | 1337790
32145 | 삼성롯데캐슬프레미어 | A13509010 | 경비비 | 202001 | 52649330
80365 | 목동롯데캐슬위너 | A15805303 | 통신비 | 202001 | 125167
48986 | 진로유통조합대림 | A13922002 | 수선유지비 | 202001 | 566430
27318 | 서울숲2차푸르지오 | A13378102 | 청소비 | 202001 | 10682320
81992 | 신정이펜하우스1단지(총세대 기준) | A15870701 | 광고료수익 | 202001 | 280000
61139 | 신대림한솔솔파크 | A15007002 | 안전진단실시비 | 202001 | 62500

Row index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
35807 | 역삼e-편한세상 | A13592605 | 재활용품비용 | 202001 | 624000
33176 | 엘에이치강남브리즈힐 | A13520004 | 식대 | 202001 | 198000
80624 | 경남아너스빌 | A15807001 | 검침수익 | 202001 | 125555
38407 | 동일하이빌뉴시티 | A13613011 | 소모품비 | 202001 | 210000
43594 | 신반포 한신 25,26,27차 아파트 | A13790716 | 산재보험료 | 202001 | 119840
63673 | 신길우성2차 | A15086007 | 세대전기료 | 202001 | 19914770
72367 | 노량진우성 | A15605002 | 음식물처리비 | 202001 | 2996670
13331 | 망원2차대림 | A12182401 | 연체료수익 | 202001 | 63470
69452 | 구로중앙하이츠아파트 | A15285804 | 광고료수익 | 202001 | 108700
39084 | 삼선푸르지오아파트 | A13672101 | 재활용품비용 | 202001 | 251000