Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
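
The two memory figures are consistent with each other: dividing the total size by the number of observations gives the average record size. A minimal check, assuming the report counts 1 KiB as 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
total_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")  # 50.0 B per record
```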

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202009" (Constant)
금액 has 1329 (13.3%) zeros (Zeros)
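
The two alerts above can be reproduced in spirit with a few lines of Python. This is only a sketch: `column_alerts` is a hypothetical helper, and the 10% zero threshold is an assumption chosen for illustration, not necessarily the setting the profiling tool uses.

```python
from collections import Counter

def column_alerts(values, zero_threshold=0.10):
    """Return alert labels for one column (hypothetical helper, not the tool's API)."""
    alerts = []
    counts = Counter(values)
    # A column with exactly one distinct value is constant.
    if len(counts) == 1:
        alerts.append("Constant")
    # Flag numeric columns whose share of zeros exceeds the assumed threshold.
    zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
    if values and zeros / len(values) > zero_threshold:
        alerts.append("Zeros")
    return alerts

# Toy columns shaped like the two flagged variables.
print(column_alerts(["202009"] * 5))         # ['Constant']
print(column_alerts([0, 120000, 0, 76320]))  # ['Zeros']
```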

Reproduction

Analysis started: 2024-05-11 06:55:57.879466
Analysis finished: 2024-05-11 06:55:59.521063
Duration: 1.64 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2104
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.215
Min length: 2

Characters and Unicode

Total characters: 72150
Distinct characters: 428
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
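
As an illustration of these character properties, the general category of each code point can be looked up with Python's standard `unicodedata` module. This is a sketch and not necessarily how the profiling tool computes its tables; `category_counts` is a hypothetical helper. The two-letter codes map onto the names used in this report (`Lo` = Other Letter, `Ll` = Lowercase Letter, `Lu` = Uppercase Letter, `Nd` = Decimal Number, `Pd` = Dash Punctuation, `Zs` = Space Separator).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (two-letter code)."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample values of this variable: six Hangul syllables,
# one lowercase Latin letter and one hyphen.
counts = category_counts("천호e-편한세상")
print(counts)
```

The standard library exposes the general category but not the script or block of a character, so the script and block tables would need a separate Unicode data source.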

Unique

Unique: 97
Unique (%): 1.0%
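
The "Distinct" and "Unique" figures measure different things. Assuming the usual meaning in this kind of report (distinct = number of different values, unique = values that occur exactly once), the difference can be sketched as follows; `distinct_and_unique` is a hypothetical helper and the sample list is made up from names that appear in the frequency table.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

sample = ["래미안", "래미안", "아이파크", "신반포", "신반포", "힐스테이트"]
print(distinct_and_unique(sample))  # (4, 2)
```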

Sample

1st row: 중계금호
2nd row: 용산파크자이
3rd row: 천호e-편한세상
4th row: 방화12단지(중앙)
5th row: 자양현대

Value                        Count    Frequency (%)
아파트                        165      1.5%
래미안                        43       0.4%
아이파크                      24       0.2%
신반포                        22       0.2%
힐스테이트                    20       0.2%
신내                          18       0.2%
팰리스                        14       0.1%
e편한세상                     14       0.1%
고덕현대                      14       0.1%
신반포한신5지구(12,13,18차    14       0.1%
Other values (2172)          10408    96.8%

Most occurring characters

Value                 Count    Frequency (%)
(lost)                2571     3.6%
(lost)                2447     3.4%
(lost)                2290     3.2%
(lost)                1812     2.5%
(lost)                1654     2.3%
(lost)                1611     2.2%
(lost)                1505     2.1%
(lost)                1388     1.9%
(lost)                1346     1.9%
(lost)                1288     1.8%
Other values (418)    54238    75.2%

Most occurring categories

Value                Count    Frequency (%)
Other Letter         66137    91.7%
Decimal Number       3437     4.8%
Space Separator      847      1.2%
Uppercase Letter     682      0.9%
Lowercase Letter     416      0.6%
Open Punctuation     163      0.2%
Close Punctuation    163      0.2%
Dash Punctuation     151      0.2%
Other Punctuation    148      0.2%
Math Symbol          4        < 0.1%

Most frequent character per category

Other Letter
Value                 Count    Frequency (%)
(lost)                2571     3.9%
(lost)                2447     3.7%
(lost)                2290     3.5%
(lost)                1812     2.7%
(lost)                1654     2.5%
(lost)                1611     2.4%
(lost)                1505     2.3%
(lost)                1388     2.1%
(lost)                1346     2.0%
(lost)                1288     1.9%
Other values (372)    48225    72.9%
Uppercase Letter
Value               Count    Frequency (%)
C                   110      16.1%
S                   103      15.1%
D                   83       12.2%
M                   83       12.2%
K                   72       10.6%
L                   43       6.3%
H                   42       6.2%
G                   29       4.3%
I                   27       4.0%
E                   23       3.4%
Other values (7)    67       9.8%
Lowercase Letter
Value    Count    Frequency (%)
e        201      48.3%
l        48       11.5%
i        41       9.9%
v        29       7.0%
k        25       6.0%
s        23       5.5%
c        18       4.3%
w        10       2.4%
h        7        1.7%
a        7        1.7%
Decimal Number
Value    Count    Frequency (%)
2        1041     30.3%
1        1014     29.5%
3        451      13.1%
4        269      7.8%
5        195      5.7%
6        150      4.4%
7        89       2.6%
8        89       2.6%
9        83       2.4%
0        56       1.6%
Other Punctuation
Value    Count    Frequency (%)
,        127      85.8%
.        21       14.2%
Space Separator
Value      Count    Frequency (%)
(space)    847      100.0%
Open Punctuation
Value    Count    Frequency (%)
(        163      100.0%
Close Punctuation
Value    Count    Frequency (%)
)        163      100.0%
Dash Punctuation
Value    Count    Frequency (%)
-        151      100.0%
Math Symbol
Value    Count    Frequency (%)
~        4        100.0%
Letter Number
Value     Count    Frequency (%)
(lost)    2        100.0%

Most occurring scripts

Value     Count    Frequency (%)
Hangul    66137    91.7%
Common    4913     6.8%
Latin     1100     1.5%

Most frequent character per script

Hangul
Value                 Count    Frequency (%)
(lost)                2571     3.9%
(lost)                2447     3.7%
(lost)                2290     3.5%
(lost)                1812     2.7%
(lost)                1654     2.5%
(lost)                1611     2.4%
(lost)                1505     2.3%
(lost)                1388     2.1%
(lost)                1346     2.0%
(lost)                1288     1.9%
Other values (372)    48225    72.9%
Latin
Value                Count    Frequency (%)
e                    201      18.3%
C                    110      10.0%
S                    103      9.4%
D                    83       7.5%
M                    83       7.5%
K                    72       6.5%
l                    48       4.4%
L                    43       3.9%
H                    42       3.8%
i                    41       3.7%
Other values (19)    274      24.9%
Common
Value               Count    Frequency (%)
2                   1041     21.2%
1                   1014     20.6%
(space)             847      17.2%
3                   451      9.2%
4                   269      5.5%
5                   195      4.0%
(                   163      3.3%
)                   163      3.3%
-                   151      3.1%
6                   150      3.1%
Other values (7)    469      9.5%

Most occurring blocks

Value           Count    Frequency (%)
Hangul          66137    91.7%
ASCII           6011     8.3%
Number Forms    2        < 0.1%

Most frequent character per block

Hangul
Value                 Count    Frequency (%)
(lost)                2571     3.9%
(lost)                2447     3.7%
(lost)                2290     3.5%
(lost)                1812     2.7%
(lost)                1654     2.5%
(lost)                1611     2.4%
(lost)                1505     2.3%
(lost)                1388     2.1%
(lost)                1346     2.0%
(lost)                1288     1.9%
Other values (372)    48225    72.9%
ASCII
Value                Count    Frequency (%)
2                    1041     17.3%
1                    1014     16.9%
(space)              847      14.1%
3                    451      7.5%
4                    269      4.5%
e                    201      3.3%
5                    195      3.2%
(                    163      2.7%
)                    163      2.7%
-                    151      2.5%
Other values (35)    1516     25.2%
Number Forms
Value     Count    Frequency (%)
(lost)    2        100.0%

아파트코드 (apartment code)
Text

Distinct: 2111
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 97
Unique (%): 1.0%

Sample

1st row: A13922904
2nd row: A14075201
3rd row: A13402202
4th row: A15777501
5th row: A14319003

Value                  Count    Frequency (%)
a13790726              14       0.1%
a13872504              13       0.1%
a10025245              13       0.1%
a15209203              12       0.1%
a15105008              12       0.1%
a15606002              12       0.1%
a14006001              12       0.1%
a15703301              11       0.1%
a15284101              11       0.1%
a13703027              11       0.1%
Other values (2101)    9879     98.8%

Most occurring characters

Value    Count    Frequency (%)
0        18890    21.0%
1        17584    19.5%
A        10000    11.1%
3        8936     9.9%
2        8246     9.2%
5        6178     6.9%
8        5519     6.1%
7        4550     5.1%
4        3794     4.2%
6        3499     3.9%

Most occurring categories

Value               Count    Frequency (%)
Decimal Number      80000    88.9%
Uppercase Letter    10000    11.1%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0        18890    23.6%
1        17584    22.0%
3        8936     11.2%
2        8246     10.3%
5        6178     7.7%
8        5519     6.9%
7        4550     5.7%
4        3794     4.7%
6        3499     4.4%
9        2804     3.5%
Uppercase Letter
Value    Count    Frequency (%)
A        10000    100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    80000    88.9%
Latin     10000    11.1%

Most frequent character per script

Common
Value    Count    Frequency (%)
0        18890    23.6%
1        17584    22.0%
3        8936     11.2%
2        8246     10.3%
5        6178     7.7%
8        5519     6.9%
7        4550     5.7%
4        3794     4.7%
6        3499     4.4%
9        2804     3.5%
Latin
Value    Count    Frequency (%)
A        10000    100.0%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    90000    100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0        18890    21.0%
1        17584    19.5%
A        10000    11.1%
3        8936     9.9%
2        8246     9.2%
5        6178     6.9%
8        5519     6.1%
7        4550     5.1%
4        3794     4.2%
6        3499     3.9%

비용명 (expense item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9218
Min length: 2

Characters and Unicode

Total characters: 49218
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 공동수도료
2nd row: 입주자대표회의운영비
3rd row: 잡수익
4th row: 퇴직급여
5th row: 제수당

Value                   Count    Frequency (%)
경비비                   220      2.2%
소독비                   219      2.2%
사무용품비               218      2.2%
수선유지비               210      2.1%
통신비                   210      2.1%
도서인쇄비               208      2.1%
연체료수익               208      2.1%
입주자대표회의운영비     206      2.1%
승강기유지비             205      2.1%
이자수익                 205      2.1%
Other values (77)       7891     78.9%

Most occurring characters

Value                 Count    Frequency (%)
(lost)                5453     11.1%
(lost)                3615     7.3%
(lost)                2076     4.2%
(lost)                2070     4.2%
(lost)                1787     3.6%
(lost)                1301     2.6%
(lost)                1030     2.1%
(lost)                872      1.8%
(lost)                804      1.6%
(lost)                760      1.5%
Other values (110)    29450    59.8%

Most occurring categories

Value           Count    Frequency (%)
Other Letter    49218    100.0%

Most frequent character per category

Other Letter
Value                 Count    Frequency (%)
(lost)                5453     11.1%
(lost)                3615     7.3%
(lost)                2076     4.2%
(lost)                2070     4.2%
(lost)                1787     3.6%
(lost)                1301     2.6%
(lost)                1030     2.1%
(lost)                872      1.8%
(lost)                804      1.6%
(lost)                760      1.5%
Other values (110)    29450    59.8%

Most occurring scripts

Value     Count    Frequency (%)
Hangul    49218    100.0%

Most frequent character per script

Hangul
Value                 Count    Frequency (%)
(lost)                5453     11.1%
(lost)                3615     7.3%
(lost)                2076     4.2%
(lost)                2070     4.2%
(lost)                1787     3.6%
(lost)                1301     2.6%
(lost)                1030     2.1%
(lost)                872      1.8%
(lost)                804      1.6%
(lost)                760      1.5%
Other values (110)    29450    59.8%

Most occurring blocks

Value     Count    Frequency (%)
Hangul    49218    100.0%

Most frequent character per block

Hangul
Value                 Count    Frequency (%)
(lost)                5453     11.1%
(lost)                3615     7.3%
(lost)                2076     4.2%
(lost)                2070     4.2%
(lost)                1787     3.6%
(lost)                1301     2.6%
(lost)                1030     2.1%
(lost)                872      1.8%
(lost)                804      1.6%
(lost)                760      1.5%
Other values (110)    29450    59.8%

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value     Count
202009    10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202009
2nd row: 202009
3rd row: 202009
4th row: 202009
5th row: 202009

Common Values

Value     Count    Frequency (%)
202009    10000    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count    Frequency (%)
202009    10000    100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6787
Distinct (%): 67.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2820248.3
Minimum: -1.5238409 × 10^8
Maximum: 2.6898382 × 10^8
Zeros: 1329
Zeros (%): 13.3%
Negative: 15
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1.5238409 × 10^8
5-th percentile: 0
Q1: 53724
Median: 300000
Q3: 1252961.5
95-th percentile: 14005270
Maximum: 2.6898382 × 10^8
Range: 4.2136791 × 10^8
Interquartile range (IQR): 1199237.5
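
The range and the IQR follow directly from the other quantile figures. A minimal check, using the exact extremes listed in the extreme-value tables of this variable (variable names are illustrative):

```python
# Exact extremes and quartiles as reported for this variable.
minimum = -152_384_092
maximum = 268_983_820
q1 = 53_724
q3 = 1_252_961.5

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(value_range)  # 421367912
print(iqr)          # 1199237.5
```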

Descriptive statistics

Standard deviation: 10150224
Coefficient of variation (CV): 3.5990533
Kurtosis: 163.72918
Mean: 2820248.3
Median Absolute Deviation (MAD): 300000
Skewness: 9.4887116
Sum: 2.8202483 × 10^10
Variance: 1.0302704 × 10^14
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count    Frequency (%)
0                      1329     13.3%
100000                 83       0.8%
200000                 81       0.8%
38000                  77       0.8%
300000                 60       0.6%
150000                 42       0.4%
116000                 40       0.4%
250000                 36       0.4%
110000                 29       0.3%
400000                 29       0.3%
Other values (6777)    8194     81.9%

Value         Count    Frequency (%)
-152384092    1        < 0.1%
-25368749     1        < 0.1%
-11718200     1        < 0.1%
-8230383      1        < 0.1%
-4886510      1        < 0.1%
-3389300      1        < 0.1%
-683500       1        < 0.1%
-500000       1        < 0.1%
-450000       1        < 0.1%
-275057       1        < 0.1%

Value        Count    Frequency (%)
268983820    1        < 0.1%
251681830    1        < 0.1%
217931740    1        < 0.1%
174712945    1        < 0.1%
164233260    1        < 0.1%
160034000    1        < 0.1%
158103628    1        < 0.1%
131735740    1        < 0.1%
122075710    1        < 0.1%
119486150    1        < 0.1%

Interactions


Correlations

          비용명    금액
비용명    1.000     0.517
금액      0.517     1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)  아파트명                  아파트코드   비용명                 년월일   금액
62299    중계금호                  A13922904   공동수도료             202009   76320
70223    용산파크자이              A14075201   입주자대표회의운영비   202009   1827390
36912    천호e-편한세상            A13402202   잡수익                 202009   14800
95831    방화12단지(중앙)          A15777501   퇴직급여               202009   1439220
72549    자양현대                  A14319003   제수당                 202009   1068680
9612     강남 한신휴플러스 6단지   A10027912   광고료수익             202009   120000
80142    신림2차푸르지오           A15101503   승강기유지비           202009   1158850
36145    성수우방2차               A13383301   세금과공과             202009   0
16731    메세나폴리스              A12174601   보험료                 202009   1578750
89214    상도sh-ville              A15603004   세금과공과             202009   12470

(index)  아파트명                  아파트코드   비용명                 년월일   금액
14649    홍은현대                  A12084504   회계감사비             202009   142300
50786    롯데캐슬갤럭시2차         A13703019   연체료수익             202009   6050
22234    답십리동아                A13003406   정화조관리비           202009   1170790
86898    신도림대림7차e-편한세상   A15288807   회계감사비             202009   82500
92353    상도쌍용                  A15683901   세대전기료             202009   19907574
80646    보라매삼성                A15105004   공동수도료             202009   974490
30878    방학동부센트레빌          A13272102   퇴직급여               202009   726964
88987    대방경남아너스빌          A15602001   경비비                 202009   4770600
78971    신길우성3차아파트         A15086004   자치활동비             202009   100000
65382    상계주공12단지            A13982202   급여                   202009   38068010