Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
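
The two memory figures are consistent with each other: dividing the total size by the number of observations gives the average record size. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 B; the function name is illustrative and not part of any library.

```python
def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, given a total size in KiB (1 KiB = 1024 B)."""
    return total_kib * 1024 / n_rows


# Figures taken from the dataset statistics in this report.
avg = average_record_size_bytes(488.3, 10_000)
print(f"{avg:.1f} B")  # 50.0 B
```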

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month) has constant value "201909" (Constant)
금액 (amount) has 1301 (13.0%) zeros (Zeros)
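
Alerts like these are derived from per-column summaries. The sketch below shows one plausible way to derive them, under two assumed rules: a column is constant when it has exactly one distinct value, and a numeric column is flagged when its share of zeros exceeds a threshold. The function name and the 10% threshold are hypothetical and are not taken from the profiling tool.

```python
from typing import Iterable, List


def simple_alerts(name: str, values: Iterable, zero_share_threshold: float = 0.10) -> List[str]:
    """Return alert labels for one column (illustrative rules, not the tool's own)."""
    data = list(values)
    alerts = []
    # Constant: every row holds the same value.
    if len(set(data)) == 1:
        alerts.append(f"{name}: Constant")
    # Zeros: only checked when every value is numeric.
    numeric = [v for v in data if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(data):
        share = sum(1 for v in numeric if v == 0) / len(numeric)
        if share > zero_share_threshold:
            alerts.append(f"{name}: Zeros ({share:.1%})")
    return alerts


print(simple_alerts("년월일", ["201909"] * 5))
print(simple_alerts("금액", [0, 0, 38000, 200000, 0, 150000, 500, 1960, 880000, 219000]))
```

With the threshold as a parameter, the same check can be tightened or relaxed per dataset.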

Reproduction

Analysis started: 2024-05-11 06:58:43.466666
Analysis finished: 2024-05-11 06:58:46.000534
Duration: 2.53 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2105
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2097
Min length: 2

Characters and Unicode

Total characters: 72097
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
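
As a rough illustration of the category counts, Python's standard `unicodedata` module exposes the general category of each code point. In the sketch below, `Lo` corresponds to the report's "Other Letter" and `Nd` to "Decimal Number". The module does not expose script or block properties, so only categories are shown. The helper name is illustrative.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample apartment names from this report.
counts = category_counts("신도림대림3차")
print(counts)  # Counter({'Lo': 6, 'Nd': 1})
```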

Unique

Unique: 107
Unique (%): 1.1%
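
The report lists far more distinct values than unique ones. The sketch below shows the difference, assuming "distinct" counts the different values present and "unique" counts the values that occur exactly once. The function name and the short example list are illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


names = ["래미안", "래미안", "아이파크", "신반포", "신반포", "힐스테이트"]
print(distinct_and_unique(names))  # (4, 2)
```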

Sample

1st row: 상계마들대림
2nd row: 쌍용예가클래식
3rd row: 송파꿈에그린아파트
4th row: 롯데캐슬천지인
5th row: 신도림대림3차
Value | Count | Frequency (%)
아파트 | 116 | 1.1%
래미안 | 26 | 0.2%
힐스테이트 | 19 | 0.2%
관리사무소 | 16 | 0.2%
아이파크 | 15 | 0.1%
신반포 | 14 | 0.1%
중림삼성사이버빌리지 | 14 | 0.1%
신동아파밀리에 | 13 | 0.1%
신길우성2차 | 13 | 0.1%
월드컵참누리 | 13 | 0.1%
Other values (2162) | 10281 | 97.5%

Most occurring characters

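Value | Count | Frequency (%)
[glyph lost] | 2303 | 3.2%
[glyph lost] | 2252 | 3.1%
[glyph lost] | 1986 | 2.8%
[glyph lost] | 1875 | 2.6%
[glyph lost] | 1859 | 2.6%
[glyph lost] | 1662 | 2.3%
[glyph lost] | 1600 | 2.2%
[glyph lost] | 1538 | 2.1%
[glyph lost] | 1440 | 2.0%
[glyph lost] | 1342 | 1.9%
Other values (421) | 54240 | 75.2%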

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66129 | 91.7%
Decimal Number | 3839 | 5.3%
Uppercase Letter | 655 | 0.9%
Space Separator | 596 | 0.8%
Lowercase Letter | 313 | 0.4%
Open Punctuation | 142 | 0.2%
Close Punctuation | 142 | 0.2%
Dash Punctuation | 137 | 0.2%
Other Punctuation | 131 | 0.2%
Letter Number | 7 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[glyph lost] | 2303 | 3.5%
[glyph lost] | 2252 | 3.4%
[glyph lost] | 1986 | 3.0%
[glyph lost] | 1875 | 2.8%
[glyph lost] | 1859 | 2.8%
[glyph lost] | 1662 | 2.5%
[glyph lost] | 1600 | 2.4%
[glyph lost] | 1538 | 2.3%
[glyph lost] | 1440 | 2.2%
[glyph lost] | 1342 | 2.0%
Other values (375) | 48272 | 73.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 140 | 21.4%
K | 89 | 13.6%
C | 72 | 11.0%
H | 53 | 8.1%
L | 45 | 6.9%
M | 42 | 6.4%
D | 42 | 6.4%
E | 38 | 5.8%
I | 26 | 4.0%
G | 25 | 3.8%
Other values (7) | 83 | 12.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 161 | 51.4%
l | 40 | 12.8%
i | 31 | 9.9%
v | 24 | 7.7%
k | 15 | 4.8%
c | 12 | 3.8%
s | 12 | 3.8%
w | 7 | 2.2%
a | 4 | 1.3%
g | 4 | 1.3%

Decimal Number
Value | Count | Frequency (%)
2 | 1154 | 30.1%
1 | 1103 | 28.7%
3 | 553 | 14.4%
4 | 261 | 6.8%
5 | 185 | 4.8%
6 | 178 | 4.6%
7 | 120 | 3.1%
9 | 116 | 3.0%
8 | 93 | 2.4%
0 | 76 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
, | 115 | 87.8%
. | 16 | 12.2%

Space Separator
Value | Count | Frequency (%)
(space) | 596 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 142 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 142 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 137 | 100.0%

Letter Number
Value | Count | Frequency (%)
[glyph lost] | 7 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66129 | 91.7%
Common | 4993 | 6.9%
Latin | 975 | 1.4%

Most frequent character per script

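Hangul
Value | Count | Frequency (%)
[glyph lost] | 2303 | 3.5%
[glyph lost] | 2252 | 3.4%
[glyph lost] | 1986 | 3.0%
[glyph lost] | 1875 | 2.8%
[glyph lost] | 1859 | 2.8%
[glyph lost] | 1662 | 2.5%
[glyph lost] | 1600 | 2.4%
[glyph lost] | 1538 | 2.3%
[glyph lost] | 1440 | 2.2%
[glyph lost] | 1342 | 2.0%
Other values (375) | 48272 | 73.0%

Latin
Value | Count | Frequency (%)
e | 161 | 16.5%
S | 140 | 14.4%
K | 89 | 9.1%
C | 72 | 7.4%
H | 53 | 5.4%
L | 45 | 4.6%
M | 42 | 4.3%
D | 42 | 4.3%
l | 40 | 4.1%
E | 38 | 3.9%
Other values (19) | 253 | 25.9%

Common
Value | Count | Frequency (%)
2 | 1154 | 23.1%
1 | 1103 | 22.1%
(space) | 596 | 11.9%
3 | 553 | 11.1%
4 | 261 | 5.2%
5 | 185 | 3.7%
6 | 178 | 3.6%
( | 142 | 2.8%
) | 142 | 2.8%
- | 137 | 2.7%
Other values (7) | 542 | 10.9%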

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66129 | 91.7%
ASCII | 5961 | 8.3%
Number Forms | 7 | < 0.1%

Most frequent character per block

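Hangul
Value | Count | Frequency (%)
[glyph lost] | 2303 | 3.5%
[glyph lost] | 2252 | 3.4%
[glyph lost] | 1986 | 3.0%
[glyph lost] | 1875 | 2.8%
[glyph lost] | 1859 | 2.8%
[glyph lost] | 1662 | 2.5%
[glyph lost] | 1600 | 2.4%
[glyph lost] | 1538 | 2.3%
[glyph lost] | 1440 | 2.2%
[glyph lost] | 1342 | 2.0%
Other values (375) | 48272 | 73.0%

ASCII
Value | Count | Frequency (%)
2 | 1154 | 19.4%
1 | 1103 | 18.5%
(space) | 596 | 10.0%
3 | 553 | 9.3%
4 | 261 | 4.4%
5 | 185 | 3.1%
6 | 178 | 3.0%
e | 161 | 2.7%
( | 142 | 2.4%
) | 142 | 2.4%
Other values (35) | 1486 | 24.9%

Number Forms
Value | Count | Frequency (%)
[glyph lost] | 7 | 100.0%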

아파트코드 (apartment code)
Text

Distinct: 2110
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 109
Unique (%): 1.1%

Sample

1st row: A13982702
2nd row: A13782601
3rd row: A13876114
4th row: A11087601
5th row: A15288802
Value | Count | Frequency (%)
a10085903 | 14 | 0.1%
a15086007 | 13 | 0.1%
a12187906 | 13 | 0.1%
a15089513 | 12 | 0.1%
a13880105 | 12 | 0.1%
a13982002 | 12 | 0.1%
a12007001 | 12 | 0.1%
a15086601 | 12 | 0.1%
a13082501 | 12 | 0.1%
a15603205 | 11 | 0.1%
Other values (2100) | 9877 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18415 | 20.5%
1 | 17604 | 19.6%
A | 9997 | 11.1%
3 | 8765 | 9.7%
2 | 8146 | 9.1%
5 | 6325 | 7.0%
8 | 5822 | 6.5%
7 | 4815 | 5.3%
4 | 3712 | 4.1%
6 | 3458 | 3.8%
Other values (2) | 2941 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18415 | 23.0%
1 | 17604 | 22.0%
3 | 8765 | 11.0%
2 | 8146 | 10.2%
5 | 6325 | 7.9%
8 | 5822 | 7.3%
7 | 4815 | 6.0%
4 | 3712 | 4.6%
6 | 3458 | 4.3%
9 | 2938 | 3.7%

Uppercase Letter
Value | Count | Frequency (%)
A | 9997 | > 99.9%
B | 3 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18415 | 23.0%
1 | 17604 | 22.0%
3 | 8765 | 11.0%
2 | 8146 | 10.2%
5 | 6325 | 7.9%
8 | 5822 | 7.3%
7 | 4815 | 6.0%
4 | 3712 | 4.6%
6 | 3458 | 4.3%
9 | 2938 | 3.7%

Latin
Value | Count | Frequency (%)
A | 9997 | > 99.9%
B | 3 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18415 | 20.5%
1 | 17604 | 19.6%
A | 9997 | 11.1%
3 | 8765 | 9.7%
2 | 8146 | 9.1%
5 | 6325 | 7.0%
8 | 5822 | 6.5%
7 | 4815 | 5.3%
4 | 3712 | 4.1%
6 | 3458 | 3.8%
Other values (2) | 2941 | 3.3%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8994
Min length: 2

Characters and Unicode

Total characters: 48994
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 도서인쇄비
2nd row: 국민연금
3rd row: 세대전기료
4th row: 입주자대표회의운영비
5th row: 복리후생비
Value | Count | Frequency (%)
사무용품비 | 235 | 2.4%
급여 | 226 | 2.3%
교육비 | 217 | 2.2%
청소비 | 216 | 2.2%
승강기유지비 | 215 | 2.1%
도서인쇄비 | 214 | 2.1%
잡수익 | 213 | 2.1%
경비비 | 212 | 2.1%
통신비 | 208 | 2.1%
건강보험료 | 207 | 2.1%
Other values (77) | 7837 | 78.4%

Most occurring characters

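Value | Count | Frequency (%)
[glyph lost] | 5459 | 11.1%
[glyph lost] | 3506 | 7.2%
[glyph lost] | 2049 | 4.2%
[glyph lost] | 1985 | 4.1%
[glyph lost] | 1761 | 3.6%
[glyph lost] | 1290 | 2.6%
[glyph lost] | 1046 | 2.1%
[glyph lost] | 811 | 1.7%
[glyph lost] | 783 | 1.6%
[glyph lost] | 766 | 1.6%
Other values (110) | 29538 | 60.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48994 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[glyph lost] | 5459 | 11.1%
[glyph lost] | 3506 | 7.2%
[glyph lost] | 2049 | 4.2%
[glyph lost] | 1985 | 4.1%
[glyph lost] | 1761 | 3.6%
[glyph lost] | 1290 | 2.6%
[glyph lost] | 1046 | 2.1%
[glyph lost] | 811 | 1.7%
[glyph lost] | 783 | 1.6%
[glyph lost] | 766 | 1.6%
Other values (110) | 29538 | 60.3%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48994 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[glyph lost] | 5459 | 11.1%
[glyph lost] | 3506 | 7.2%
[glyph lost] | 2049 | 4.2%
[glyph lost] | 1985 | 4.1%
[glyph lost] | 1761 | 3.6%
[glyph lost] | 1290 | 2.6%
[glyph lost] | 1046 | 2.1%
[glyph lost] | 811 | 1.7%
[glyph lost] | 783 | 1.6%
[glyph lost] | 766 | 1.6%
Other values (110) | 29538 | 60.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48994 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[glyph lost] | 5459 | 11.1%
[glyph lost] | 3506 | 7.2%
[glyph lost] | 2049 | 4.2%
[glyph lost] | 1985 | 4.1%
[glyph lost] | 1761 | 3.6%
[glyph lost] | 1290 | 2.6%
[glyph lost] | 1046 | 2.1%
[glyph lost] | 811 | 1.7%
[glyph lost] | 783 | 1.6%
[glyph lost] | 766 | 1.6%
Other values (110) | 29538 | 60.3%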

년월일 (year-month)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
201909: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201909
2nd row: 201909
3rd row: 201909
4th row: 201909
5th row: 201909

Common Values

Value | Count | Frequency (%)
201909 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201909 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6804
Distinct (%): 68.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2846839.8
Minimum: -2341030
Maximum: 3.0639364 × 10^8
Zeros: 1301
Zeros (%): 13.0%
Negative: 10
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2341030
5-th percentile: 0
Q1: 50627.5
Median: 300000
Q3: 1315475
95-th percentile: 14019538
Maximum: 3.0639364 × 10^8
Range: 3.0873467 × 10^8
Interquartile range (IQR): 1264847.5
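
The range and interquartile range follow directly from the other quantile statistics: range = maximum − minimum and IQR = Q3 − Q1. A small sketch that reproduces both figures, using the exact maximum (306393640) from the extreme values listed later in this report. The function name is illustrative.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range and interquartile range from four quantile statistics."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


stats = spread(minimum=-2341030, q1=50627.5, q3=1315475, maximum=306393640)
print(stats)  # {'range': 308734670, 'iqr': 1264847.5}
```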

Descriptive statistics

Standard deviation: 9960178.5
Coefficient of variation (CV): 3.498679
Kurtosis: 228.60814
Mean: 2846839.8
Median Absolute Deviation (MAD): 300000
Skewness: 11.585139
Sum: 2.8468398 × 10^10
Variance: 9.9205156 × 10^13
Monotonicity: Not monotonic
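
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks the reported values against those identities, with the inputs rounded as printed in the report. The function name is illustrative.

```python
import math


def derived_stats(mean: float, std: float, n: int) -> dict:
    """Coefficient of variation, variance, and sum derived from mean, std, and n."""
    return {"cv": std / mean, "variance": std ** 2, "sum": mean * n}


d = derived_stats(mean=2846839.8, std=9960178.5, n=10_000)

# Compare against the figures printed in the report, allowing for rounding.
print(math.isclose(d["cv"], 3.498679, rel_tol=1e-5))
print(math.isclose(d["variance"], 9.9205156e13, rel_tol=1e-6))
print(math.isclose(d["sum"], 2.8468398e10, rel_tol=1e-6))
```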
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1301 | 13.0%
38000 | 116 | 1.2%
200000 | 96 | 1.0%
100000 | 56 | 0.6%
300000 | 55 | 0.5%
150000 | 52 | 0.5%
500000 | 41 | 0.4%
250000 | 40 | 0.4%
50000 | 39 | 0.4%
400000 | 32 | 0.3%
Other values (6794) | 8172 | 81.7%

Minimum 10 values
Value | Count | Frequency (%)
-2341030 | 1 | < 0.1%
-562240 | 1 | < 0.1%
-533540 | 1 | < 0.1%
-446100 | 1 | < 0.1%
-360960 | 1 | < 0.1%
-150000 | 1 | < 0.1%
-7840 | 1 | < 0.1%
-3000 | 1 | < 0.1%
-990 | 1 | < 0.1%
-5 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
306393640 | 1 | < 0.1%
269113434 | 1 | < 0.1%
229207385 | 1 | < 0.1%
205069310 | 1 | < 0.1%
195199255 | 1 | < 0.1%
162612000 | 1 | < 0.1%
147483720 | 1 | < 0.1%
131728816 | 1 | < 0.1%
128820450 | 1 | < 0.1%
122617270 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.379
금액 | 0.379 | 1.000
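
The matrix does not name the association measure behind the 0.379 figure. One common choice for a categorical column paired with a binned numeric column is Cramér's V. Whether this report used it is an assumption. The sketch below is a plain, uncorrected Cramér's V in standard-library Python, with hypothetical example data.

```python
import math
from collections import Counter


def cramers_v(xs, ys) -> float:
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical data: cost item fully determines the amount bin.
items = ["급여"] * 10 + ["통신비"] * 10
bins = ["high"] * 10 + ["low"] * 10
print(cramers_v(items, bins))  # 1.0
```

A value of 0 means no association and 1 means one column fully determines the other, so 0.379 would indicate a moderate association under this reading.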

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
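
The counts behind a nullity view can be sketched without any plotting: tally the missing entries per column and divide by the total number of cells. The example assumes records are dictionaries and that a missing value is represented by `None`. The function name and the second record's missing amount are illustrative, since the profiled dataset itself reports 0 missing cells.

```python
def missing_cells(rows, columns):
    """Return (missing count per column, total missing, share of all cells)."""
    per_column = {col: sum(1 for r in rows if r.get(col) is None) for col in columns}
    total = sum(per_column.values())
    share = total / (len(rows) * len(columns)) if rows and columns else 0.0
    return per_column, total, share


columns = ["아파트명", "비용명", "금액"]
rows = [
    {"아파트명": "상계마들대림", "비용명": "도서인쇄비", "금액": 464480},
    {"아파트명": "쌍용예가클래식", "비용명": "국민연금", "금액": None},  # hypothetical gap
]
print(missing_cells(rows, columns))
```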

Sample

First rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
61863 | 상계마들대림 | A13982702 | 도서인쇄비 | 201909 | 464480
49356 | 쌍용예가클래식 | A13782601 | 국민연금 | 201909 | 395860
55146 | 송파꿈에그린아파트 | A13876114 | 세대전기료 | 201909 | 72627720
8754 | 롯데캐슬천지인 | A11087601 | 입주자대표회의운영비 | 201909 | 660000
82768 | 신도림대림3차 | A15288802 | 복리후생비 | 201909 | 140000
78300 | 신림푸르지오 | A15190705 | 감가상각비 | 201909 | 496973
63422 | 중계주공10단지 | A13986004 | 세대전기료 | 201909 | 1346380
1407 | e편한세상화랑대아파트 | A10025855 | 장기수선비 | 201909 | 4235160
48173 | 풍림아이원플러스 | A13707203 | 통신비 | 201909 | 18369
84083 | 독산신도브래뉴 | A15382301 | 급여 | 201909 | 8351100

Last rows
 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
20048 | 장안현대홈타운 | A13010006 | 제수당 | 201909 | 3599350
31167 | 래미안옥수리버젠 | A13375907 | 세금과공과 | 201909 | 0
1095 | 이편한세상 상도 노빌리티 | A10025768 | 소모품비 | 201909 | 0
93351 | 방화6단지 | A15785612 | 잡수익 | 201909 | 500
47972 | 방배대우효령 | A13706303 | 이자수익 | 201909 | 1960
93125 | 마곡현대아파트 | A15784601 | 고용안정사업수익 | 201909 | 880000
54071 | 잠실동트리지움 | A13822002 | 교육비 | 201909 | 219000
27239 | 창동현대 | A13204503 | 충당부채전입이자비용 | 201909 | 0
48382 | 서초더샵포레 | A13718001 | 제수당 | 201909 | 3417030
4878 | 힐스테이트 송파위례아파트 | A10027461 | 연차수당 | 201909 | 1461900