Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
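
The average record size can be cross-checked from the two figures above: total memory divided by the number of observations. A minimal check, using only the reported values and assuming 1 KiB = 1024 B:

```python
# Values copied from the dataset statistics above.
total_kib = 488.3
n_observations = 10_000

# Convert KiB to bytes (1 KiB = 1024 B) and divide by the row count.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```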

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202008" (Constant)
금액 has 1504 (15.0%) zeros (Zeros)
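
Both alerts can be approximated with simple column checks. The sketch below is an illustration only: `column_alerts` is a hypothetical helper, and the 10% zero threshold is an assumption, not a value stated in this report.

```python
def column_alerts(values, zero_threshold=0.10):
    """Return simple alert labels for one column (hypothetical helper)."""
    alerts = []
    # A column with exactly one distinct value is constant.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Only check zeros when every value is numeric.
    numeric = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(values):
        zero_share = sum(1 for v in numeric if v == 0) / len(numeric)
        # zero_threshold is an assumed cut-off for raising the alert.
        if zero_share > zero_threshold:
            alerts.append("Zeros")
    return alerts

print(column_alerts(["202008"] * 5))
print(column_alerts([0, 0, 5, 7, 9, 11, 13, 15, 17, 19]))
```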

Reproduction

Analysis started: 2024-05-11 06:56:07.419923
Analysis finished: 2024-05-11 06:56:09.188898
Duration: 1.77 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
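
The reported duration is the difference between the two timestamps above. A short check with the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:56:07.419923")
finished = datetime.fromisoformat("2024-05-11 06:56:09.188898")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```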

Variables

아파트명
Text

Distinct: 2112
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1447
Min length: 2
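
The length statistics are computed over the character count of each value; the mean length equals total characters divided by observations (71447 / 10000 = 7.1447). A minimal sketch over the five sample values shown in this report, assuming lengths are counted in code points after NFC normalization:

```python
import unicodedata
from statistics import mean, median

# Five sample values from this report, not the full column.
names = ["문래현대3차", "엠브이아파트", "응암푸르지오", "방화월드메르디앙", "래미안하이리버"]

# Normalize so each Hangul syllable counts as one character.
lengths = [len(unicodedata.normalize("NFC", s)) for s in names]

summary = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(summary)
```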

Characters and Unicode

Total characters: 71447
Distinct characters: 425
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
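
As an illustration of these character properties, the sketch below counts general categories with Python's standard `unicodedata` module. The category-name mapping and the three input strings (taken from values that appear in this report) are chosen for the example; scripts and blocks are not exposed by `unicodedata`, so only categories are covered.

```python
import unicodedata
from collections import Counter

# General-category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        # Normalize so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(["봉천두산1,2단지", "잠원훼미리 아파트", "e편한세상"])
print(counts.most_common())
```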

Unique

Unique: 105
Unique (%): 1.1%
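
"Distinct" and "Unique" measure different things. Assuming the usual ydata-profiling meaning, distinct is the number of different values, while unique is the number of values that occur exactly once. A minimal sketch with a hypothetical helper, `distinct_and_unique`:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): different values vs. values seen exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy input: one code repeats, two codes occur once.
codes = ["A15009502", "A13780201", "A15009502", "A12201103"]
print(distinct_and_unique(codes))
```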

Sample

1st row: 문래현대3차
2nd row: 엠브이아파트
3rd row: 응암푸르지오
4th row: 방화월드메르디앙
5th row: 래미안하이리버

Value | Count | Frequency (%)
아파트 | 163 | 1.5%
래미안 | 33 | 0.3%
아이파크 | 21 | 0.2%
북한산 | 16 | 0.2%
관리사무소 | 16 | 0.2%
고덕 | 16 | 0.2%
e편한세상 | 15 | 0.1%
가양대림경동 | 14 | 0.1%
신길우성2차 | 13 | 0.1%
고덕현대 | 13 | 0.1%
Other values (2176) | 10332 | 97.0%

Most occurring characters

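Value | Count | Frequency (%)
(not preserved) | 2478 | 3.5%
(not preserved) | 2416 | 3.4%
(not preserved) | 2161 | 3.0%
(not preserved) | 1815 | 2.5%
(not preserved) | 1621 | 2.3%
(not preserved) | 1600 | 2.2%
(not preserved) | 1539 | 2.2%
(not preserved) | 1471 | 2.1%
(not preserved) | 1383 | 1.9%
(not preserved) | 1351 | 1.9%
Other values (415) | 53612 | 75.0%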

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65551 | 91.7%
Decimal Number | 3483 | 4.9%
Uppercase Letter | 780 | 1.1%
Space Separator | 712 | 1.0%
Lowercase Letter | 338 | 0.5%
Close Punctuation | 154 | 0.2%
Open Punctuation | 154 | 0.2%
Other Punctuation | 141 | 0.2%
Dash Punctuation | 121 | 0.2%
Math Symbol | 11 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 2478 | 3.8%
(not preserved) | 2416 | 3.7%
(not preserved) | 2161 | 3.3%
(not preserved) | 1815 | 2.8%
(not preserved) | 1621 | 2.5%
(not preserved) | 1600 | 2.4%
(not preserved) | 1539 | 2.3%
(not preserved) | 1471 | 2.2%
(not preserved) | 1383 | 2.1%
(not preserved) | 1351 | 2.1%
Other values (369) | 47716 | 72.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 132 | 16.9%
C | 112 | 14.4%
K | 108 | 13.8%
D | 73 | 9.4%
M | 73 | 9.4%
L | 48 | 6.2%
H | 48 | 6.2%
I | 35 | 4.5%
G | 26 | 3.3%
E | 22 | 2.8%
Other values (7) | 103 | 13.2%

Lowercase Letter
Value | Count | Frequency (%)
e | 181 | 53.6%
l | 44 | 13.0%
i | 36 | 10.7%
v | 27 | 8.0%
s | 13 | 3.8%
k | 12 | 3.6%
w | 10 | 3.0%
c | 4 | 1.2%
a | 4 | 1.2%
g | 4 | 1.2%

Decimal Number
Value | Count | Frequency (%)
1 | 1040 | 29.9%
2 | 1029 | 29.5%
3 | 494 | 14.2%
4 | 250 | 7.2%
5 | 171 | 4.9%
6 | 149 | 4.3%
7 | 109 | 3.1%
8 | 107 | 3.1%
9 | 78 | 2.2%
0 | 56 | 1.6%

Other Punctuation
Value | Count | Frequency (%)
, | 121 | 85.8%
. | 20 | 14.2%

Space Separator
Value | Count | Frequency (%)
(space) | 712 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 154 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 154 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 121 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 11 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not preserved) | 2 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65551 | 91.7%
Common | 4776 | 6.7%
Latin | 1120 | 1.6%

Most frequent character per script

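Hangul
Value | Count | Frequency (%)
(not preserved) | 2478 | 3.8%
(not preserved) | 2416 | 3.7%
(not preserved) | 2161 | 3.3%
(not preserved) | 1815 | 2.8%
(not preserved) | 1621 | 2.5%
(not preserved) | 1600 | 2.4%
(not preserved) | 1539 | 2.3%
(not preserved) | 1471 | 2.2%
(not preserved) | 1383 | 2.1%
(not preserved) | 1351 | 2.1%
Other values (369) | 47716 | 72.8%

Latin
Value | Count | Frequency (%)
e | 181 | 16.2%
S | 132 | 11.8%
C | 112 | 10.0%
K | 108 | 9.6%
D | 73 | 6.5%
M | 73 | 6.5%
L | 48 | 4.3%
H | 48 | 4.3%
l | 44 | 3.9%
i | 36 | 3.2%
Other values (19) | 265 | 23.7%

Common
Value | Count | Frequency (%)
1 | 1040 | 21.8%
2 | 1029 | 21.5%
(space) | 712 | 14.9%
3 | 494 | 10.3%
4 | 250 | 5.2%
5 | 171 | 3.6%
) | 154 | 3.2%
( | 154 | 3.2%
6 | 149 | 3.1%
- | 121 | 2.5%
Other values (7) | 502 | 10.5%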

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65551 | 91.7%
ASCII | 5894 | 8.2%
Number Forms | 2 | < 0.1%

Most frequent character per block

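Hangul
Value | Count | Frequency (%)
(not preserved) | 2478 | 3.8%
(not preserved) | 2416 | 3.7%
(not preserved) | 2161 | 3.3%
(not preserved) | 1815 | 2.8%
(not preserved) | 1621 | 2.5%
(not preserved) | 1600 | 2.4%
(not preserved) | 1539 | 2.3%
(not preserved) | 1471 | 2.2%
(not preserved) | 1383 | 2.1%
(not preserved) | 1351 | 2.1%
Other values (369) | 47716 | 72.8%

ASCII
Value | Count | Frequency (%)
1 | 1040 | 17.6%
2 | 1029 | 17.5%
(space) | 712 | 12.1%
3 | 494 | 8.4%
4 | 250 | 4.2%
e | 181 | 3.1%
5 | 171 | 2.9%
) | 154 | 2.6%
( | 154 | 2.6%
6 | 149 | 2.5%
Other values (35) | 1560 | 26.5%

Number Forms
Value | Count | Frequency (%)
(not preserved) | 2 | 100.0%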

아파트코드
Text

Distinct: 2119
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 105
Unique (%): 1.1%

Sample

1st row: A15009502
2nd row: A13780201
3rd row: A12201103
4th row: A15773501
5th row: A13380302

Value | Count | Frequency (%)
a15780703 | 14 | 0.1%
a12179004 | 13 | 0.1%
a15086007 | 13 | 0.1%
a13592604 | 12 | 0.1%
a13006002 | 12 | 0.1%
a12284501 | 11 | 0.1%
a13283405 | 11 | 0.1%
a12170601 | 11 | 0.1%
a13790703 | 11 | 0.1%
a13790730 | 11 | 0.1%
Other values (2109) | 9881 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18880 | 21.0%
1 | 17540 | 19.5%
A | 10000 | 11.1%
3 | 9007 | 10.0%
2 | 8214 | 9.1%
5 | 6199 | 6.9%
8 | 5506 | 6.1%
7 | 4670 | 5.2%
4 | 3755 | 4.2%
6 | 3483 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18880 | 23.6%
1 | 17540 | 21.9%
3 | 9007 | 11.3%
2 | 8214 | 10.3%
5 | 6199 | 7.7%
8 | 5506 | 6.9%
7 | 4670 | 5.8%
4 | 3755 | 4.7%
6 | 3483 | 4.4%
9 | 2746 | 3.4%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18880 | 23.6%
1 | 17540 | 21.9%
3 | 9007 | 11.3%
2 | 8214 | 10.3%
5 | 6199 | 7.7%
8 | 5506 | 6.9%
7 | 4670 | 5.8%
4 | 3755 | 4.7%
6 | 3483 | 4.4%
9 | 2746 | 3.4%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18880 | 21.0%
1 | 17540 | 19.5%
A | 10000 | 11.1%
3 | 9007 | 10.0%
2 | 8214 | 9.1%
5 | 6199 | 6.9%
8 | 5506 | 6.1%
7 | 4670 | 5.2%
4 | 3755 | 4.2%
6 | 3483 | 3.9%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8553
Min length: 2

Characters and Unicode

Total characters: 48553
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 퇴직급여
2nd row: 경비비
3rd row: 보험료
4th row: 청소비
5th row: 주차장수익

Value | Count | Frequency (%)
입주자대표회의운영비 | 229 | 2.3%
국민연금 | 221 | 2.2%
청소비 | 221 | 2.2%
사무용품비 | 221 | 2.2%
도서인쇄비 | 219 | 2.2%
이자수익 | 218 | 2.2%
통신비 | 218 | 2.2%
경비비 | 215 | 2.1%
급여 | 214 | 2.1%
제수당 | 207 | 2.1%
Other values (77) | 7817 | 78.2%

Most occurring characters

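Value | Count | Frequency (%)
(not preserved) | 5433 | 11.2%
(not preserved) | 3529 | 7.3%
(not preserved) | 2044 | 4.2%
(not preserved) | 1901 | 3.9%
(not preserved) | 1666 | 3.4%
(not preserved) | 1257 | 2.6%
(not preserved) | 1025 | 2.1%
(not preserved) | 802 | 1.7%
(not preserved) | 783 | 1.6%
(not preserved) | 765 | 1.6%
Other values (110) | 29348 | 60.4%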

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48553 | 100.0%

Most frequent character per category

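Other Letter
Value | Count | Frequency (%)
(not preserved) | 5433 | 11.2%
(not preserved) | 3529 | 7.3%
(not preserved) | 2044 | 4.2%
(not preserved) | 1901 | 3.9%
(not preserved) | 1666 | 3.4%
(not preserved) | 1257 | 2.6%
(not preserved) | 1025 | 2.1%
(not preserved) | 802 | 1.7%
(not preserved) | 783 | 1.6%
(not preserved) | 765 | 1.6%
Other values (110) | 29348 | 60.4%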

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48553 | 100.0%

Most frequent character per script

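Hangul
Value | Count | Frequency (%)
(not preserved) | 5433 | 11.2%
(not preserved) | 3529 | 7.3%
(not preserved) | 2044 | 4.2%
(not preserved) | 1901 | 3.9%
(not preserved) | 1666 | 3.4%
(not preserved) | 1257 | 2.6%
(not preserved) | 1025 | 2.1%
(not preserved) | 802 | 1.7%
(not preserved) | 783 | 1.6%
(not preserved) | 765 | 1.6%
Other values (110) | 29348 | 60.4%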

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48553 | 100.0%

Most frequent character per block

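Hangul
Value | Count | Frequency (%)
(not preserved) | 5433 | 11.2%
(not preserved) | 3529 | 7.3%
(not preserved) | 2044 | 4.2%
(not preserved) | 1901 | 3.9%
(not preserved) | 1666 | 3.4%
(not preserved) | 1257 | 2.6%
(not preserved) | 1025 | 2.1%
(not preserved) | 802 | 1.7%
(not preserved) | 783 | 1.6%
(not preserved) | 765 | 1.6%
Other values (110) | 29348 | 60.4%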

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202008 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202008
2nd row: 202008
3rd row: 202008
4th row: 202008
5th row: 202008

Common Values

Value | Count | Frequency (%)
202008 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202008 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6745
Distinct (%): 67.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2990488.4
Minimum: -2285555
Maximum: 2.6313896 × 10^8
Zeros: 1504
Zeros (%): 15.0%
Negative: 17
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2285555
5th percentile: 0
Q1: 54370
Median: 297750
Q3: 1291320
95th percentile: 15073074
Maximum: 2.6313896 × 10^8
Range: 2.6542452 × 10^8
Interquartile range (IQR): 1236950
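
The range and IQR follow directly from the other quantile statistics: range = maximum - minimum and IQR = Q3 - Q1. A short check using the values reported for 금액 (the exact maximum, 263138960, is taken from the largest-values table in this report):

```python
# Values copied from the 금액 statistics in this report.
minimum = -2_285_555
maximum = 263_138_960  # shown rounded as 2.6313896 × 10^8
q1, q3 = 54_370, 1_291_320

value_range = maximum - minimum  # spread of the full data
iqr = q3 - q1                    # spread of the middle 50%

print(value_range, iqr)
```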

Descriptive statistics

Standard deviation: 10821443
Coefficient of variation (CV): 3.6186208
Kurtosis: 167.24411
Mean: 2990488.4
Median Absolute Deviation (MAD): 297750
Skewness: 10.543401
Sum: 2.9904884 × 10^10
Variance: 1.1710364 × 10^14
Monotonicity: Not monotonic
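
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The check below uses the rounded values reported here, so results agree only up to rounding:

```python
# Rounded values copied from this report.
n = 10_000
mean = 2_990_488.4
std = 10_821_443

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all values

print(f"CV={cv:.4f}  variance={variance:.4e}  sum={total:.4e}")
```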
Histogram with fixed size bins (bins=50)
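
With bins=50 over the reported range, each bin is about 5.3 million wide (2.6542452 × 10^8 / 50), so the first bin alone extends past Q3 (1291320) and holds at least three quarters of the observations. A minimal sketch of fixed-width binning, with a hypothetical helper `fixed_width_histogram` and toy input rather than the real column:

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts, width

counts, width = fixed_width_histogram(list(range(10)), bins=5)
print(counts, width)
```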
Common values

Value | Count | Frequency (%)
0 | 1504 | 15.0%
62500 | 92 | 0.9%
200000 | 86 | 0.9%
300000 | 69 | 0.7%
100000 | 47 | 0.5%
400000 | 43 | 0.4%
50000 | 36 | 0.4%
150000 | 34 | 0.3%
500000 | 32 | 0.3%
600000 | 31 | 0.3%
Other values (6735) | 8026 | 80.3%

Minimum 10 values

Value | Count | Frequency (%)
-2285555 | 1 | < 0.1%
-1984600 | 1 | < 0.1%
-1542360 | 1 | < 0.1%
-1533360 | 1 | < 0.1%
-1000000 | 1 | < 0.1%
-945420 | 1 | < 0.1%
-715880 | 1 | < 0.1%
-172000 | 1 | < 0.1%
-130490 | 1 | < 0.1%
-27940 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
263138960 | 1 | < 0.1%
246441156 | 1 | < 0.1%
228130089 | 1 | < 0.1%
220544962 | 1 | < 0.1%
213847930 | 1 | < 0.1%
197728136 | 1 | < 0.1%
186968662 | 1 | < 0.1%
167732680 | 1 | < 0.1%
158727664 | 1 | < 0.1%
156945830 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.433
금액 | 0.433 | 1.000
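
The report does not state which coefficient produced 0.433. One common choice for a categorical/numeric pair is to bin the numeric column and compute Cramér's V on the resulting contingency table. The sketch below assumes that approach and uses toy data; `cramers_v` is a hypothetical helper without bias correction, not code taken from this report.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Cramér's V (no bias correction) for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With a single row or column category the statistic is undefined; return 0.
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data: the cost label fully determines the (already binned) amount.
labels = ["a", "a", "b", "b"]
amount_bins = ["low", "low", "high", "high"]
print(cramers_v(labels, amount_bins))
```

A value of 1 means one variable fully determines the other; 0 means no association in the table.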

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
75726 | 문래현대3차 | A15009502 | 퇴직급여 | 202008 | 1050000
52115 | 엠브이아파트 | A13780201 | 경비비 | 202008 | 6503750
18985 | 응암푸르지오 | A12201103 | 보험료 | 202008 | 507740
94881 | 방화월드메르디앙 | A15773501 | 청소비 | 202008 | 4959980
34785 | 래미안하이리버 | A13380302 | 주차장수익 | 202008 | 3705000
22397 | 휘경동일스위트리버 | A13009206 | 기타운영비용 | 202008 | 1190000
8439 | 인왕산2차아이파크아파트 | A10027708 | 교육비 | 202008 | 0
69541 | 한강로우림필유 | A14074501 | 교육비 | 202008 | 0
85394 | 구로월드 | A15286602 | 기타운영수익 | 202008 | 402000
54012 | 잠원훼미리 아파트 | A13790612 | 장기수선비 | 202008 | 3139200

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
93279 | 벽산 | A15721005 | 소독비 | 202008 | 450000
34614 | 서울숲2차푸르지오 | A13378102 | 입주자대표회의운영비 | 202008 | 1085900
38245 | 고덕현대 | A13478601 | 급여 | 202008 | 19271070
22928 | 래미안엘파인 | A13075402 | 승강기수익 | 202008 | 650000
95793 | 강서센트레빌4차 | A15781201 | 세대전기료 | 202008 | 18170720
56141 | 가락극동 | A13816202 | 공동주택지원금수익 | 202008 | 0
17324 | 망원1차대림 | A12182101 | 경비비 | 202008 | 7192530
42933 | 래미안대치하이스턴 | A13528007 | 고용보험료 | 202008 | 147640
80538 | 봉천두산1,2단지 | A15106901 | 고용보험료 | 202008 | 296980
1219 | 신촌그랑자이아파트 | A10025003 | 선거관리위원회운영비 | 202008 | 929280