Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
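
The average record size follows from the total memory size and the number of observations. A minimal arithmetic check, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not part of the report):

```python
KIB = 1024  # bytes per kibibyte

total_kib = 488.3        # "Total size in memory" from the report
n_observations = 10_000  # "Number of observations" from the report

# Average bytes per row = total bytes / number of rows.
avg_record_bytes = total_kib * KIB / n_observations
print(f"{avg_record_bytes:.1f} B")
```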

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month-day) has constant value "201912" (Constant)
금액 (amount) has 1426 (14.3%) zeros (Zeros)
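
The two alerts correspond to simple column-level checks. The sketch below shows one way such checks could be written in plain Python. The helper names `constant_value` and `zero_share` and the toy lists are assumptions for illustration, not the profiler's own code or the real data.

```python
def constant_value(values):
    """Return the single value if every entry is identical, else None."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else None


def zero_share(values):
    """Fraction of entries equal to zero (expects a non-empty list)."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy stand-ins for the two flagged columns (not the real dataset).
dates = ["201912"] * 5
amounts = [1200000, 2419000, 0, 187500, 0]

print(constant_value(dates))         # the one value shared by every row
print(f"{zero_share(amounts):.1%}")  # share of zeros in the toy list
```

Applied to the full column, the same ratio is 1426 / 10000, which gives the 14.3% reported in the alert.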

Reproduction

Analysis started: 2024-05-11 06:58:01.782218
Analysis finished: 2024-05-11 06:58:04.645818
Duration: 2.86 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2068
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1952
Min length: 2
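
As a rough illustration of how length statistics of this kind are derived, the sketch below computes them for the five sample values listed in this variable's Sample block. It does not use the full column, so its numbers differ from the report's. The variable names are illustrative.

```python
from statistics import mean, median

# First five values of the column, copied from the report's Sample list.
sample = ["전농삼성", "역삼개나리푸르지오", "신내건영2차아파트", "광장힐스테이트", "목동14단지"]

lengths = [len(s) for s in sample]  # len() counts Unicode code points
stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(stats)
```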

Characters and Unicode

Total characters: 71952
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
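
A minimal sketch of reading one such property, the general category, with Python's standard `unicodedata` module. It runs on the five sample values of this variable rather than the full column, and the `CATEGORY_NAMES` mapping is an illustrative subset chosen to match the category names used in this report.

```python
import unicodedata
from collections import Counter

# First five values of the column, copied from the report's Sample list.
sample = ["전농삼성", "역삼개나리푸르지오", "신내건영2차아파트", "광장힐스테이트", "목동14단지"]

# Two-letter general-category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
}

# Count characters per category; unmapped codes fall back to the raw code.
counts = Counter(
    CATEGORY_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
    for value in sample
    for ch in value
)
print(counts.most_common())
```

Hangul syllables fall under Other Letter and ASCII digits under Decimal Number, which is why those two categories lead the tables for this variable.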

Unique

Unique: 94
Unique (%): 0.9%

Sample

1st row: 전농삼성
2nd row: 역삼개나리푸르지오
3rd row: 신내건영2차아파트
4th row: 광장힐스테이트
5th row: 목동14단지

Value | Count | Frequency (%)
아파트 | 111 | 1.1%
힐스테이트 | 22 | 0.2%
래미안 | 19 | 0.2%
장미3차 | 16 | 0.2%
신도림현대 | 15 | 0.1%
상도삼호 | 14 | 0.1%
북한산 | 14 | 0.1%
이촌강촌 | 14 | 0.1%
신트리1단지 | 13 | 0.1%
극동2차아파트입주자대표회의 | 13 | 0.1%
Other values (2129) | 10291 | 97.6%

Most occurring characters

Value | Count | Frequency (%)
(n/a) | 2285 | 3.2%
(n/a) | 2201 | 3.1%
(n/a) | 2057 | 2.9%
(n/a) | 1776 | 2.5%
(n/a) | 1733 | 2.4%
(n/a) | 1689 | 2.3%
(n/a) | 1612 | 2.2%
(n/a) | 1508 | 2.1%
(n/a) | 1397 | 1.9%
(n/a) | 1339 | 1.9%
Other values (421) | 54355 | 75.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65783 | 91.4%
Decimal Number | 3853 | 5.4%
Uppercase Letter | 769 | 1.1%
Space Separator | 585 | 0.8%
Lowercase Letter | 349 | 0.5%
Open Punctuation | 162 | 0.2%
Close Punctuation | 162 | 0.2%
Other Punctuation | 141 | 0.2%
Dash Punctuation | 139 | 0.2%
Math Symbol | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(n/a) | 2285 | 3.5%
(n/a) | 2201 | 3.3%
(n/a) | 2057 | 3.1%
(n/a) | 1776 | 2.7%
(n/a) | 1733 | 2.6%
(n/a) | 1689 | 2.6%
(n/a) | 1612 | 2.5%
(n/a) | 1508 | 2.3%
(n/a) | 1397 | 2.1%
(n/a) | 1339 | 2.0%
Other values (375) | 48186 | 73.2%

Uppercase Letter
Value | Count | Frequency (%)
S | 131 | 17.0%
K | 97 | 12.6%
C | 90 | 11.7%
L | 63 | 8.2%
D | 59 | 7.7%
M | 59 | 7.7%
H | 45 | 5.9%
E | 44 | 5.7%
I | 41 | 5.3%
G | 34 | 4.4%
Other values (7) | 106 | 13.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 177 | 50.7%
l | 40 | 11.5%
i | 36 | 10.3%
v | 24 | 6.9%
c | 20 | 5.7%
k | 17 | 4.9%
s | 9 | 2.6%
g | 8 | 2.3%
a | 8 | 2.3%
w | 8 | 2.3%

Decimal Number
Value | Count | Frequency (%)
1 | 1189 | 30.9%
2 | 1104 | 28.7%
3 | 520 | 13.5%
4 | 249 | 6.5%
5 | 209 | 5.4%
6 | 172 | 4.5%
8 | 113 | 2.9%
7 | 112 | 2.9%
9 | 103 | 2.7%
0 | 82 | 2.1%

Other Punctuation
Value | Count | Frequency (%)
, | 113 | 80.1%
. | 28 | 19.9%

Space Separator
Value | Count | Frequency (%)
(space) | 585 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 162 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 162 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 139 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 5 | 100.0%

Letter Number
Value | Count | Frequency (%)
(n/a) | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65783 | 91.4%
Common | 5047 | 7.0%
Latin | 1122 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(n/a) | 2285 | 3.5%
(n/a) | 2201 | 3.3%
(n/a) | 2057 | 3.1%
(n/a) | 1776 | 2.7%
(n/a) | 1733 | 2.6%
(n/a) | 1689 | 2.6%
(n/a) | 1612 | 2.5%
(n/a) | 1508 | 2.3%
(n/a) | 1397 | 2.1%
(n/a) | 1339 | 2.0%
Other values (375) | 48186 | 73.2%

Latin
Value | Count | Frequency (%)
e | 177 | 15.8%
S | 131 | 11.7%
K | 97 | 8.6%
C | 90 | 8.0%
L | 63 | 5.6%
D | 59 | 5.3%
M | 59 | 5.3%
H | 45 | 4.0%
E | 44 | 3.9%
I | 41 | 3.7%
Other values (19) | 316 | 28.2%

Common
Value | Count | Frequency (%)
1 | 1189 | 23.6%
2 | 1104 | 21.9%
(space) | 585 | 11.6%
3 | 520 | 10.3%
4 | 249 | 4.9%
5 | 209 | 4.1%
6 | 172 | 3.4%
( | 162 | 3.2%
) | 162 | 3.2%
- | 139 | 2.8%
Other values (7) | 556 | 11.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65783 | 91.4%
ASCII | 6165 | 8.6%
Number Forms | 4 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(n/a) | 2285 | 3.5%
(n/a) | 2201 | 3.3%
(n/a) | 2057 | 3.1%
(n/a) | 1776 | 2.7%
(n/a) | 1733 | 2.6%
(n/a) | 1689 | 2.6%
(n/a) | 1612 | 2.5%
(n/a) | 1508 | 2.3%
(n/a) | 1397 | 2.1%
(n/a) | 1339 | 2.0%
Other values (375) | 48186 | 73.2%

ASCII
Value | Count | Frequency (%)
1 | 1189 | 19.3%
2 | 1104 | 17.9%
(space) | 585 | 9.5%
3 | 520 | 8.4%
4 | 249 | 4.0%
5 | 209 | 3.4%
e | 177 | 2.9%
6 | 172 | 2.8%
( | 162 | 2.6%
) | 162 | 2.6%
Other values (35) | 1636 | 26.5%

Number Forms
Value | Count | Frequency (%)
(n/a) | 4 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2073
Distinct (%): 20.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 94
Unique (%): 0.9%

Sample

1st row: A13085301
2nd row: A13579501
3rd row: A13185607
4th row: A14375301
5th row: A15807606

Value | Count | Frequency (%)
a13872504 | 16 | 0.2%
a15678102 | 14 | 0.1%
a14003106 | 14 | 0.1%
a15083701 | 13 | 0.1%
a13010003 | 13 | 0.1%
a14380414 | 13 | 0.1%
a15807002 | 13 | 0.1%
a13527011 | 12 | 0.1%
a12201301 | 12 | 0.1%
a10045001 | 12 | 0.1%
Other values (2063) | 9868 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 18477 | 20.5%
1 | 17543 | 19.5%
A | 10000 | 11.1%
3 | 8751 | 9.7%
2 | 8049 | 8.9%
5 | 6434 | 7.1%
8 | 5822 | 6.5%
7 | 4870 | 5.4%
4 | 3687 | 4.1%
6 | 3499 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18477 | 23.1%
1 | 17543 | 21.9%
3 | 8751 | 10.9%
2 | 8049 | 10.1%
5 | 6434 | 8.0%
8 | 5822 | 7.3%
7 | 4870 | 6.1%
4 | 3687 | 4.6%
6 | 3499 | 4.4%
9 | 2868 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18477 | 23.1%
1 | 17543 | 21.9%
3 | 8751 | 10.9%
2 | 8049 | 10.1%
5 | 6434 | 8.0%
8 | 5822 | 7.3%
7 | 4870 | 6.1%
4 | 3687 | 4.6%
6 | 3499 | 4.4%
9 | 2868 | 3.6%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18477 | 20.5%
1 | 17543 | 19.5%
A | 10000 | 11.1%
3 | 8751 | 9.7%
2 | 8049 | 8.9%
5 | 6434 | 7.1%
8 | 5822 | 6.5%
7 | 4870 | 5.4%
4 | 3687 | 4.1%
6 | 3499 | 3.9%

비용명 (expense name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8806
Min length: 2

Characters and Unicode

Total characters: 48806
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 임대료수익 (rental income)
2nd row: 잡비용 (miscellaneous expenses)
3rd row: 회계감사비 (audit fees)
4th row: 수선유지비 (repair and maintenance expenses)
5th row: 소모품비 (consumables expenses)

Value | Count | Frequency (%)
급여 (salaries) | 222 | 2.2%
교육비 (training expenses) | 221 | 2.2%
장기수선비 (long-term repair expenses) | 218 | 2.2%
수선유지비 (repair and maintenance expenses) | 215 | 2.1%
보험료 (insurance premiums) | 213 | 2.1%
도서인쇄비 (books and printing expenses) | 210 | 2.1%
승강기유지비 (elevator maintenance expenses) | 210 | 2.1%
소독비 (disinfection expenses) | 209 | 2.1%
사무용품비 (office supplies expenses) | 208 | 2.1%
청소비 (cleaning expenses) | 203 | 2.0%
Other values (77) | 7871 | 78.7%

Most occurring characters

Value | Count | Frequency (%)
(n/a) | 5456 | 11.2%
(n/a) | 3571 | 7.3%
(n/a) | 2051 | 4.2%
(n/a) | 1977 | 4.1%
(n/a) | 1755 | 3.6%
(n/a) | 1307 | 2.7%
(n/a) | 1006 | 2.1%
(n/a) | 815 | 1.7%
(n/a) | 808 | 1.7%
(n/a) | 780 | 1.6%
Other values (110) | 29280 | 60.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48806 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(n/a) | 5456 | 11.2%
(n/a) | 3571 | 7.3%
(n/a) | 2051 | 4.2%
(n/a) | 1977 | 4.1%
(n/a) | 1755 | 3.6%
(n/a) | 1307 | 2.7%
(n/a) | 1006 | 2.1%
(n/a) | 815 | 1.7%
(n/a) | 808 | 1.7%
(n/a) | 780 | 1.6%
Other values (110) | 29280 | 60.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48806 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(n/a) | 5456 | 11.2%
(n/a) | 3571 | 7.3%
(n/a) | 2051 | 4.2%
(n/a) | 1977 | 4.1%
(n/a) | 1755 | 3.6%
(n/a) | 1307 | 2.7%
(n/a) | 1006 | 2.1%
(n/a) | 815 | 1.7%
(n/a) | 808 | 1.7%
(n/a) | 780 | 1.6%
Other values (110) | 29280 | 60.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48806 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(n/a) | 5456 | 11.2%
(n/a) | 3571 | 7.3%
(n/a) | 2051 | 4.2%
(n/a) | 1977 | 4.1%
(n/a) | 1755 | 3.6%
(n/a) | 1307 | 2.7%
(n/a) | 1006 | 2.1%
(n/a) | 815 | 1.7%
(n/a) | 808 | 1.7%
(n/a) | 780 | 1.6%
Other values (110) | 29280 | 60.0%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201912
2nd row: 201912
3rd row: 201912
4th row: 201912
5th row: 201912

Common Values

Value | Count | Frequency (%)
201912 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201912 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6892
Distinct (%): 68.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3371301.5
Minimum: -8119350
Maximum: 3.865093 × 10^8
Zeros: 1426
Zeros (%): 14.3%
Negative: 19
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -8119350
5-th percentile: 0
Q1: 59965
Median: 300000
Q3: 1388875
95-th percentile: 16590569
Maximum: 3.865093 × 10^8
Range: 3.9462865 × 10^8
Interquartile range (IQR): 1328910
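
The range and IQR can be reproduced from the other figures in this block. A short arithmetic check, using the exact maximum (386509300) listed among the largest values later in the report. The variable names are illustrative.

```python
minimum, maximum = -8_119_350, 386_509_300  # extremes from the report
q1, q3 = 59_965, 1_388_875                  # quartiles from the report

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1
print(value_range, iqr)
```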

Descriptive statistics

Standard deviation: 12103694
Coefficient of variation (CV): 3.5902142
Kurtosis: 204.81442
Mean: 3371301.5
Median Absolute Deviation (MAD): 300000
Skewness: 10.955221
Sum: 3.3713015 × 10^10
Variance: 1.4649942 × 10^14
Monotonicity: Not monotonic
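
Several of these figures are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks them using the rounded values displayed in the report, so agreement is only approximate. The variable names are illustrative.

```python
import math

n = 10_000          # number of observations
mean = 3_371_301.5  # mean as displayed in the report
std = 12_103_694    # standard deviation, rounded as displayed

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum of all values

print(f"CV={cv:.7f}  variance={variance:.7e}  sum={total:.7e}")
```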
Histogram with fixed size bins (bins=50)
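
A minimal sketch of fixed-size (equal-width) binning with 50 bins over the reported minimum and maximum. It assumes the common convention that the last bin also includes the maximum. The helper `bin_index` is hypothetical and not taken from the profiler.

```python
BINS = 50
minimum, maximum = -8_119_350, 386_509_300  # extremes from the report

# Every bin covers the same share of the value range.
width = (maximum - minimum) / BINS


def bin_index(value):
    """Index of the equal-width bin holding value; the maximum goes in the last bin."""
    idx = int((value - minimum) // width)
    return min(idx, BINS - 1)


print(width)            # width of one bin
print(bin_index(0))     # bin that holds the zero values
print(bin_index(300000))  # bin that holds the median
```

With a bin width of roughly 7.9 million and a Q3 of about 1.4 million, most observations fall into the same one or two bins, which is consistent with the strong right skew reported for this variable.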
Common values

Value | Count | Frequency (%)
0 | 1426 | 14.3%
200000 | 88 | 0.9%
300000 | 68 | 0.7%
150000 | 54 | 0.5%
100000 | 54 | 0.5%
400000 | 42 | 0.4%
500000 | 32 | 0.3%
350000 | 23 | 0.2%
220000 | 23 | 0.2%
450000 | 21 | 0.2%
Other values (6882) | 8169 | 81.7%

Minimum 10 values

Value | Count | Frequency (%)
-8119350 | 1 | < 0.1%
-2343150 | 1 | < 0.1%
-1855020 | 1 | < 0.1%
-1199600 | 1 | < 0.1%
-854640 | 1 | < 0.1%
-786000 | 1 | < 0.1%
-703710 | 1 | < 0.1%
-700000 | 1 | < 0.1%
-681528 | 1 | < 0.1%
-312500 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
386509300 | 1 | < 0.1%
273862098 | 1 | < 0.1%
235573430 | 1 | < 0.1%
220791300 | 1 | < 0.1%
219831620 | 1 | < 0.1%
218731760 | 1 | < 0.1%
206397300 | 1 | < 0.1%
173615820 | 1 | < 0.1%
158527760 | 1 | < 0.1%
151255323 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.527
금액 | 0.527 | 1.000
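
The report does not say how the 0.527 association between 비용명 and 금액 was computed. One measure often used for a categorical and a numeric column, after the numeric column has been discretized into bins, is Cramér's V. The sketch below is a plain, uncorrected version run on toy labels. It is an assumption for illustration, not the report's confirmed method, and the function name `cramers_v` is hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts per (x, y) pair
    row = Counter(xs)             # marginal counts of x
    col = Counter(ys)             # marginal counts of y

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: expense labels against an already-binned amount ("lo"/"hi").
perfect = cramers_v(["a", "a", "b", "b"], ["lo", "lo", "hi", "hi"])
independent = cramers_v(["a", "a", "b", "b"], ["lo", "hi", "lo", "hi"])
print(perfect, independent)
```

The value ranges from 0 (no association) to 1 (one variable fully determines the other), so 0.527 would indicate a moderate association under this measure.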

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
22195 | 전농삼성 | A13085301 | 임대료수익 | 201912 | 1200000
41272 | 역삼개나리푸르지오 | A13579501 | 잡비용 | 201912 | 2419000
25048 | 신내건영2차아파트 | A13185607 | 회계감사비 | 201912 | 187500
70064 | 광장힐스테이트 | A14375301 | 수선유지비 | 201912 | 3722460
96486 | 목동14단지 | A15807606 | 소모품비 | 201912 | 288540
37768 | 삼성현대 | A13509001 | 건강보험료 | 201912 | 264250
55799 | 가락현대6차 | A13880201 | 건강보험료 | 201912 | 418080
64170 | 중계우성3차 | A13986201 | 선거관리위원회운영비 | 201912 | 0
9486 | 서대문천연뜨란채아파트 | A12004001 | 급여 | 201912 | 20962830
42356 | 수서동익 | A13588601 | 보험료 | 201912 | 280480

Last rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
9581 | 독립문삼호 | A12007001 | 기타사용료 | 201912 | 4173000
17701 | 신사현대2차 | A12208105 | 건강보험료 | 201912 | 314510
2695 | 롯데캐슬노블레스 | A10026180 | 임대료수익 | 201912 | 0
47899 | 정릉중앙하이츠빌1단지 | A13685104 | 급여 | 201912 | 11756110
14631 | 상암휴먼시아1단지 | A12179502 | 세금과공과 | 201912 | 0
57905 | 우림루미아트1.2단지 | A13920104 | 입주자대표회의운영비 | 201912 | 500000
88046 | 대방2차현대 | A15681104 | 급여 | 201912 | 8223600
6913 | 마포한강푸르지오 | A10027902 | 도서인쇄비 | 201912 | 205000
35110 | 강일리버파크2단지 | A13410003 | 검침수익 | 201912 | 190060
12537 | 마포도화우성 | A12104007 | 검침수익 | 201912 | 2480660