Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
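
The average record size can be cross-checked from the total size and the row count. A minimal sketch, assuming 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3      # total size in memory, in KiB
n_observations = 10_000     # number of rows
n_variables = 5             # number of columns
missing_cells = 0

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(f"{avg_record_bytes:.1f} B per record")
print(f"{missing_pct:.1f}% missing cells")
```

Rounded to one decimal place, the computed record size agrees with the reported 50.0 B.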

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202102" (Constant)
금액 has 665 (6.7%) zeros (Zeros)
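
Both alerts follow from simple counts. The sketch below recomputes them from figures reported in this profile; the helper name `zero_share` and the constant check are illustrative assumptions, not the profiler's actual alert logic or thresholds:

```python
def zero_share(n_zeros: int, n_rows: int) -> float:
    """Return the percentage of rows whose value is exactly zero."""
    return 100 * n_zeros / n_rows

# Figures taken from the report: 665 zeros among 10000 observations of 금액.
zeros_pct = zero_share(665, 10_000)

# A column is constant when it holds a single distinct value (년월일 has 1).
n_distinct_date = 1
is_constant = n_distinct_date == 1

print(f"zeros: {zeros_pct:.1f}%, constant: {is_constant}")
```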

Reproduction

Analysis started: 2024-05-11 06:55:06.390273
Analysis finished: 2024-05-11 06:55:08.113120
Duration: 1.72 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
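
The reported duration is the difference between the two timestamps. A short check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-05-11 06:55:06.390273")
finished = datetime.fromisoformat("2024-05-11 06:55:08.113120")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```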

Variables

아파트명 (apartment name)
Text

Distinct: 2234
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.3417
Min length: 2

Characters and Unicode

Total characters: 73417
Distinct characters: 436
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
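
As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `char_profile` is hypothetical, and the profiler's own implementation may differ; in the standard's abbreviations, `Lo` is "Other Letter" and `Nd` is "Decimal Number".

```python
import unicodedata
from collections import Counter

def char_profile(values):
    """Count Unicode general categories over every character in `values`."""
    categories = Counter()
    for value in values:
        for ch in value:
            categories[unicodedata.category(ch)] += 1
    return categories

# First two sample rows of this variable.
sample = ["상암휴먼시아2단지아파트", "동화히스토리"]
profile = char_profile(sample)
print(profile)
```

Hangul syllables fall under `Lo`, which is why "Other Letter" dominates the category counts for this variable.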

Unique

Unique: 143
Unique (%): 1.4%
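
The gap between Distinct (2234) and Unique (143) is easier to read with a small example. The sketch assumes that "distinct" counts the different values present and "unique" counts the values that occur exactly once; the helper name and the toy list are hypothetical:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy input, not taken from the dataset.
toy = ["래미안", "래미안", "해모로", "sk뷰", "sk뷰", "면목"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```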

Sample

1st row: 상암휴먼시아2단지아파트
2nd row: 동화히스토리
3rd row: 방화동부센트레빌
4th row: 공릉대아2차
5th row: 가양중앙하이츠

Value                 Count   Frequency (%)
아파트                152     1.4%
래미안                34      0.3%
아이파크              22      0.2%
힐스테이트            21      0.2%
e편한세상             16      0.1%
래미안밤섬리베뉴      15      0.1%
풍림아이원플러스      13      0.1%
해모로                13      0.1%
면목                  13      0.1%
sk뷰                  13      0.1%
Other values (2301)   10362   97.1%

Most occurring characters

Value                Count   Frequency (%)
(not shown)          2593    3.5%
(not shown)          2420    3.3%
(not shown)          2282    3.1%
(not shown)          1852    2.5%
(not shown)          1800    2.5%
(not shown)          1693    2.3%
(not shown)          1474    2.0%
(not shown)          1454    2.0%
(not shown)          1441    2.0%
(not shown)          1401    1.9%
Other values (426)   55007   74.9%

Most occurring categories

Value               Count   Frequency (%)
Other Letter        67315   91.7%
Decimal Number      3657    5.0%
Uppercase Letter    790     1.1%
Space Separator     745     1.0%
Lowercase Letter    319     0.4%
Close Punctuation   161     0.2%
Open Punctuation    161     0.2%
Dash Punctuation    141     0.2%
Other Punctuation   120     0.2%
Letter Number       8       < 0.1%

Most frequent character per category

Other Letter

Value                Count   Frequency (%)
(not shown)          2593    3.9%
(not shown)          2420    3.6%
(not shown)          2282    3.4%
(not shown)          1852    2.8%
(not shown)          1800    2.7%
(not shown)          1693    2.5%
(not shown)          1474    2.2%
(not shown)          1454    2.2%
(not shown)          1441    2.1%
(not shown)          1401    2.1%
Other values (381)   48905   72.7%

Uppercase Letter

Value              Count   Frequency (%)
S                  118     14.9%
C                  107     13.5%
K                  86      10.9%
M                  78      9.9%
D                  78      9.9%
L                  74      9.4%
H                  55      7.0%
I                  37      4.7%
E                  37      4.7%
G                  34      4.3%
Other values (7)   86      10.9%

Lowercase Letter

Value   Count   Frequency (%)
e       188     58.9%
l       30      9.4%
i       28      8.8%
v       20      6.3%
s       16      5.0%
k       15      4.7%
w       8       2.5%
g       5       1.6%
a       5       1.6%
h       2       0.6%

Decimal Number

Value   Count   Frequency (%)
1       1130    30.9%
2       1057    28.9%
3       471     12.9%
4       269     7.4%
5       224     6.1%
6       151     4.1%
8       98      2.7%
9       95      2.6%
0       81      2.2%
7       81      2.2%

Other Punctuation

Value   Count   Frequency (%)
,       91      75.8%
.       29      24.2%

Space Separator

Value     Count   Frequency (%)
(space)   745     100.0%

Close Punctuation

Value   Count   Frequency (%)
)       161     100.0%

Open Punctuation

Value   Count   Frequency (%)
(       161     100.0%

Dash Punctuation

Value   Count   Frequency (%)
-       141     100.0%

Letter Number

Value         Count   Frequency (%)
(not shown)   8       100.0%

Most occurring scripts

Value    Count   Frequency (%)
Hangul   67315   91.7%
Common   4985    6.8%
Latin    1117    1.5%

Most frequent character per script

Hangul

Value                Count   Frequency (%)
(not shown)          2593    3.9%
(not shown)          2420    3.6%
(not shown)          2282    3.4%
(not shown)          1852    2.8%
(not shown)          1800    2.7%
(not shown)          1693    2.5%
(not shown)          1474    2.2%
(not shown)          1454    2.2%
(not shown)          1441    2.1%
(not shown)          1401    2.1%
Other values (381)   48905   72.7%

Latin

Value               Count   Frequency (%)
e                   188     16.8%
S                   118     10.6%
C                   107     9.6%
K                   86      7.7%
M                   78      7.0%
D                   78      7.0%
L                   74      6.6%
H                   55      4.9%
I                   37      3.3%
E                   37      3.3%
Other values (19)   259     23.2%

Common

Value              Count   Frequency (%)
1                  1130    22.7%
2                  1057    21.2%
(space)            745     14.9%
3                  471     9.4%
4                  269     5.4%
5                  224     4.5%
)                  161     3.2%
(                  161     3.2%
6                  151     3.0%
-                  141     2.8%
Other values (6)   475     9.5%

Most occurring blocks

Value          Count   Frequency (%)
Hangul         67315   91.7%
ASCII          6094    8.3%
Number Forms   8       < 0.1%

Most frequent character per block

Hangul

Value                Count   Frequency (%)
(not shown)          2593    3.9%
(not shown)          2420    3.6%
(not shown)          2282    3.4%
(not shown)          1852    2.8%
(not shown)          1800    2.7%
(not shown)          1693    2.5%
(not shown)          1474    2.2%
(not shown)          1454    2.2%
(not shown)          1441    2.1%
(not shown)          1401    2.1%
Other values (381)   48905   72.7%

ASCII

Value               Count   Frequency (%)
1                   1130    18.5%
2                   1057    17.3%
(space)             745     12.2%
3                   471     7.7%
4                   269     4.4%
5                   224     3.7%
e                   188     3.1%
)                   161     2.6%
(                   161     2.6%
6                   151     2.5%
Other values (34)   1537    25.2%

Number Forms

Value         Count   Frequency (%)
(not shown)   8       100.0%

아파트코드 (apartment code)
Text

Distinct: 2241
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 143
Unique (%): 1.4%

Sample

1st row: A12179501
2nd row: A13572601
3rd row: A15722108
4th row: A13980604
5th row: A15780701

Value                 Count   Frequency (%)
a13707203             13      0.1%
a14003106             12      0.1%
a13684306             12      0.1%
a15681110             12      0.1%
a13385301             12      0.1%
a13405201             11      0.1%
a13872504             11      0.1%
a10025850             11      0.1%
a15279101             11      0.1%
a13379001             11      0.1%
Other values (2231)   9884    98.8%

Most occurring characters

Value              Count   Frequency (%)
0                  18557   20.6%
1                  17524   19.5%
A                  9988    11.1%
3                  8707    9.7%
2                  8287    9.2%
5                  6218    6.9%
8                  5650    6.3%
7                  4834    5.4%
4                  3968    4.4%
6                  3353    3.7%
Other values (2)   2914    3.2%

Most occurring categories

Value              Count   Frequency (%)
Decimal Number     80000   88.9%
Uppercase Letter   10000   11.1%
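
The category counts above (80000 digits and 10000 uppercase letters across 10000 values of length 9) are consistent with codes made of one uppercase letter followed by eight digits, as in the sample rows. A hedged sketch of a format check under that assumption; the pattern and the helper name `is_valid_code` are illustrative, and the dataset's own documentation remains the authority on the real format:

```python
import re

# Assumption: a code is one uppercase ASCII letter followed by eight digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")

def is_valid_code(code: str) -> bool:
    """Return True when `code` matches the assumed 9-character layout."""
    return CODE_PATTERN.fullmatch(code) is not None

# Sample rows of this variable.
samples = ["A12179501", "A13572601", "A15722108", "A13980604", "A15780701"]
print(all(is_valid_code(c) for c in samples))
```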

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
0       18557   23.2%
1       17524   21.9%
3       8707    10.9%
2       8287    10.4%
5       6218    7.8%
8       5650    7.1%
7       4834    6.0%
4       3968    5.0%
6       3353    4.2%
9       2902    3.6%

Uppercase Letter

Value   Count   Frequency (%)
A       9988    99.9%
B       12      0.1%

Most occurring scripts

Value    Count   Frequency (%)
Common   80000   88.9%
Latin    10000   11.1%

Most frequent character per script

Common

Value   Count   Frequency (%)
0       18557   23.2%
1       17524   21.9%
3       8707    10.9%
2       8287    10.4%
5       6218    7.8%
8       5650    7.1%
7       4834    6.0%
4       3968    5.0%
6       3353    4.2%
9       2902    3.6%

Latin

Value   Count   Frequency (%)
A       9988    99.9%
B       12      0.1%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   90000   100.0%

Most frequent character per block

ASCII

Value              Count   Frequency (%)
0                  18557   20.6%
1                  17524   19.5%
A                  9988    11.1%
3                  8707    9.7%
2                  8287    9.2%
5                  6218    6.9%
8                  5650    6.3%
7                  4834    5.4%
4                  3968    4.4%
6                  3353    3.7%
Other values (2)   2914    3.2%

비용명 (expense item)
Text

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7843
Min length: 2

Characters and Unicode

Total characters: 47843
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 부과차익
2nd row: 지급수수료
3rd row: 소독비
4th row: 이자수익
5th row: 기타운영비용

Value               Count   Frequency (%)
경비비              258     2.6%
통신비              253     2.5%
연체료수익          248     2.5%
사무용품비          243     2.4%
소독비              239     2.4%
청소비              234     2.3%
소모품비            232     2.3%
급여                227     2.3%
승강기유지비        226     2.3%
도서인쇄비          226     2.3%
Other values (75)   7614    76.1%

Most occurring characters

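Value                Count   Frequency (%)
(not shown)          5395    11.3%
(not shown)          3604    7.5%
(not shown)          2268    4.7%
(not shown)          1916    4.0%
(not shown)          1659    3.5%
(not shown)          1342    2.8%
(not shown)          1051    2.2%
(not shown)          887     1.9%
(not shown)          848     1.8%
(not shown)          837     1.7%
Other values (110)   28036   58.6%

Most occurring categories

Value          Count   Frequency (%)
Other Letter   47843   100.0%

Most frequent character per category

Other Letter

Value                Count   Frequency (%)
(not shown)          5395    11.3%
(not shown)          3604    7.5%
(not shown)          2268    4.7%
(not shown)          1916    4.0%
(not shown)          1659    3.5%
(not shown)          1342    2.8%
(not shown)          1051    2.2%
(not shown)          887     1.9%
(not shown)          848     1.8%
(not shown)          837     1.7%
Other values (110)   28036   58.6%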

Most occurring scripts

Value    Count   Frequency (%)
Hangul   47843   100.0%

Most frequent character per script

Hangul

Value                Count   Frequency (%)
(not shown)          5395    11.3%
(not shown)          3604    7.5%
(not shown)          2268    4.7%
(not shown)          1916    4.0%
(not shown)          1659    3.5%
(not shown)          1342    2.8%
(not shown)          1051    2.2%
(not shown)          887     1.9%
(not shown)          848     1.8%
(not shown)          837     1.7%
Other values (110)   28036   58.6%

Most occurring blocks

Value    Count   Frequency (%)
Hangul   47843   100.0%

Most frequent character per block

Hangul

Value                Count   Frequency (%)
(not shown)          5395    11.3%
(not shown)          3604    7.5%
(not shown)          2268    4.7%
(not shown)          1916    4.0%
(not shown)          1659    3.5%
(not shown)          1342    2.8%
(not shown)          1051    2.2%
(not shown)          887     1.9%
(not shown)          848     1.8%
(not shown)          837     1.7%
Other values (110)   28036   58.6%

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value    Count
202102   10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202102
2nd row: 202102
3rd row: 202102
4th row: 202102
5th row: 202102

Common Values

Value    Count   Frequency (%)
202102   10000   100.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value    Count   Frequency (%)
202102   10000   100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7312
Distinct (%): 73.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3729033.5
Minimum: -2655090
Maximum: 6.1202052 × 10^8
Zeros: 665
Zeros (%): 6.7%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2655090
5-th percentile: 0
Q1: 100000
median: 348480
Q3: 1558248
95-th percentile: 18128390
Maximum: 6.1202052 × 10^8
Range: 6.1467561 × 10^8
Interquartile range (IQR): 1458248
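
The range and the interquartile range are differences of the quantiles listed above. A short check, using the exact maximum 612020516 that the report lists among the largest values of this variable (variable names are illustrative):

```python
# Quantile figures copied from the report.
minimum = -2_655_090
q1 = 100_000
q3 = 1_558_248
maximum = 612_020_516

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1

print(value_range, iqr)
```

The range of 614675606 matches the reported 6.1467561 × 10^8 after rounding, and the IQR matches exactly.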

Descriptive statistics

Standard deviation: 13945014
Coefficient of variation (CV): 3.7395786
Kurtosis: 487.1395
Mean: 3729033.5
Median Absolute Deviation (MAD): 326110
Skewness: 15.740329
Sum: 3.7290335 × 10^10
Variance: 1.9446341 × 10^14
Monotonicity: Not monotonic
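
Several of these statistics are tied together by simple identities: the sum is the mean times the number of observations, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A sketch that re-derives them from the rounded figures in the report (so small rounding differences are expected; variable names are illustrative):

```python
# Figures copied from the report.
n = 10_000
mean = 3_729_033.5
std = 13_945_014

total = mean * n       # Sum
cv = std / mean        # Coefficient of variation
variance = std ** 2    # Variance

print(f"sum={total:.7e}, cv={cv:.7f}, variance={variance:.7e}")
```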
Histogram with fixed size bins (bins=50)
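
With 50 fixed-size bins over the full range, each bin is about 12.3 million wide, so the 95-th percentile (18128390) already falls inside the second bin and the plot is dominated by its first bars. A minimal sketch of equal-width binning; `fixed_width_histogram` is a hypothetical helper, not the plotting code used by the report:

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes into one bin
        else:
            # The maximum sits on the right edge, so clamp it into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width

# Bin width implied by the report's minimum and maximum.
bin_width = (612_020_516 - (-2_655_090)) / 50

# Toy data to show the binning itself.
toy = [0, 0, 1, 2, 5, 9, 10]
counts, width = fixed_width_histogram(toy, bins=5)
print(bin_width, counts, width)
```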

Common values

Value                 Count   Frequency (%)
0                     665     6.7%
200000                104     1.0%
100000                86      0.9%
300000                80      0.8%
150000                53      0.5%
48000                 52      0.5%
50000                 44      0.4%
400000                38      0.4%
250000                37      0.4%
30000                 36      0.4%
Other values (7302)   8805    88.0%

Minimum 10 values

Value      Count   Frequency (%)
-2655090   1       < 0.1%
-500000    1       < 0.1%
-482880    1       < 0.1%
-360930    1       < 0.1%
-13064     1       < 0.1%
-390       1       < 0.1%
-360       1       < 0.1%
0          665     6.7%
4          2       < 0.1%
6          1       < 0.1%

Maximum 10 values

Value       Count   Frequency (%)
612020516   1       < 0.1%
417669005   1       < 0.1%
266701030   1       < 0.1%
231403790   1       < 0.1%
195931570   1       < 0.1%
195235521   1       < 0.1%
178883300   1       < 0.1%
166855729   1       < 0.1%
147228005   1       < 0.1%
142955880   1       < 0.1%

Interactions


Correlations

          비용명   금액
비용명    1.000    0.299
금액      0.299    1.000

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
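
Nullity by column amounts to counting the non-missing entries in each column. A minimal sketch over two rows from the sample table; the helper name `non_null_counts` is hypothetical, and it assumes missing entries are represented as `None`:

```python
def non_null_counts(rows, columns):
    """Count non-missing (not None) entries per column."""
    return {col: sum(row.get(col) is not None for row in rows) for col in columns}

columns = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    {"아파트명": "상암휴먼시아2단지아파트", "아파트코드": "A12179501",
     "비용명": "부과차익", "년월일": "202102", "금액": 1403},
    {"아파트명": "동화히스토리", "아파트코드": "A13572601",
     "비용명": "지급수수료", "년월일": "202102", "금액": 0},
]

counts = non_null_counts(rows, columns)
print(counts)
```

A 금액 of 0 counts as present, not missing, which is why this dataset reports 665 zeros alongside 0 missing cells.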

Sample

Index   아파트명                  아파트코드   비용명        년월일   금액
15941   상암휴먼시아2단지아파트   A12179501    부과차익      202102   1403
40435   동화히스토리              A13572601    지급수수료    202102   0
86683   방화동부센트레빌          A15722108    소독비        202102   390000
58883   공릉대아2차               A13980604    이자수익      202102   971582
87994   가양중앙하이츠            A15780701    기타운영비용  202102   400000
15390   래미안밤섬리베뉴 2        A12170702    기타부대비    202102   296700
24446   면목신성미소지움          A13181201    공동가스료    202102   103810
86187   마곡엠밸리7단지           A15721007    교육비        202102   38000
42073   개포한신                  A13594402    공동가스료    202102   670
1927    신촌그랑자이아파트        A10025003    교육비        202102   48000

Index   아파트명                     아파트코드   비용명          년월일   금액
7852    강남더샵포레스트             A10027446    위탁관리수수료  202102   2253614
89628   염창현대1차                  A15786426    세대전기료      202102   23514700
22805   전농삼성                     A13085301    연체료수익      202102   13970
9405    대림쌍용플래티넘S            A10027935    세대전기료      202102   7388220
94205   은평뉴타운마고정11단지       A41279913    경비비          202102   7681286
78904   신도림현대                   A15286210    보험료          202102   449240
45837   정릉2차 e편한세상            A13677101    임대료수익      202102   755000
72904   롯데캐슬엠파이어             A15088614    피복비          202102   195250
82875   사당 대림아파트 관리사무소   A15609005    공동난방비      202102   96542830
16769   SH성산아파트                 A12185003    수선유지비      202102   4130107