Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
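
The average record size is the total memory footprint divided by the number of observations. A minimal check of that arithmetic, assuming 1 KiB = 1024 B; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics.
total_size_kib = 488.3
n_observations = 10_000

# Convert KiB to bytes (assumption: 1 KiB = 1024 B), then divide by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 50.0 B shown in the report.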

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "201902" (Constant)
금액 has 809 (8.1%) zeros (Zeros)
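
The two alerts correspond to simple column checks: a column with a single distinct value, and the share of zero entries. A minimal sketch of such checks follows. It is not ydata-profiling's implementation, and the helper names and the toy columns are hypothetical:

```python
def is_constant(values):
    """True when a non-empty column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values):
    """Fraction of entries equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical miniature columns, not the real data.
year_month = ["201902"] * 5
amount = [5491200, 2860, 0, 162620, 1824000]

print(is_constant(year_month))
print(f"{zero_share(amount):.1%}")

# The report's own figure: 809 zeros out of 10000 observations.
reported_zero_share = 809 / 10_000
print(f"{reported_zero_share:.1%}")
```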

Reproduction

Analysis started: 2024-05-11 07:00:26.880668
Analysis finished: 2024-05-11 07:00:29.557990
Duration: 2.68 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
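
The reported duration is the difference between the two timestamps. A short check using only the standard library; the format string is an assumption about how the timestamps are laid out:

```python
from datetime import datetime

# Assumed timestamp layout: date, time, microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 07:00:26.880668", FMT)
finished = datetime.strptime("2024-05-11 07:00:29.557990", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```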

Variables

아파트명
Text

Distinct: 2104
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1105
Min length: 2

Characters and Unicode

Total characters: 71105
Distinct characters: 432
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
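
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lo` for "Other Letter", `Nd` for "Decimal Number", `Po` for "Other Punctuation"). The sketch below counts categories for one sample value of this variable. The helper name is hypothetical, and the standard library does not expose script or block properties, so only categories are shown:

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "잠실우성1,2,3차"  # a sample value of this variable
counts = category_counts(sample)
print(counts)
```

Summing such counts over every value of a column gives per-category totals of the kind listed under "Most occurring categories".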

Unique

Unique: 103
Unique (%): 1.0%
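
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of that distinction on a hypothetical toy column (the helper name is illustrative):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy column, not the real data.
toy = ["금호대우", "금호대우", "명일한양", "등촌서광", "등촌서광", "목동1단지"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)
```

In the toy column four different names occur and two of them appear only once, which mirrors how this variable can have 2104 distinct values but only 103 unique ones.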

Sample

1st row: 잠실우성1,2,3차
2nd row: 상암월드컵8단지
3rd row: 압구정미성1차
4th row: 등촌서광
5th row: 명일한양

Value    Count    Frequency (%)
아파트    129    1.2%
래미안    25    0.2%
입주자대표회의    16    0.2%
힐스테이트    14    0.1%
북한산    14    0.1%
신동아파밀리에    14    0.1%
자양현대    14    0.1%
서초포레스타3단지    13    0.1%
서초참누리에코리치    13    0.1%
금호대우    12    0.1%
Other values (2157)    10216    97.5%
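
The counts in this table sum to more than the 10000 observations, which suggests they are counts of individual words rather than of whole values. A minimal sketch of such a word-frequency table, assuming values are lower-cased and split on whitespace (an assumption about the tokenisation, not a statement of what ydata-profiling does); the helper name and the toy values are hypothetical:

```python
from collections import Counter


def word_frequencies(values):
    """Lower-case each value, split on whitespace, and count the tokens."""
    tokens = [tok for v in values for tok in v.lower().split()]
    counts = Counter(tokens)
    total = len(tokens)
    # Each row: (word, count, share of all tokens), most frequent first.
    return [(w, c, c / total) for w, c in counts.most_common()]


# Hypothetical toy values, not the real data.
toy = ["래미안 아파트", "북한산 래미안", "금호대우", "자양현대 아파트", "아파트"]
rows = word_frequencies(toy)
for word, count, share in rows:
    print(word, count, f"{share:.1%}")
```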

Most occurring characters

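Value    Count    Frequency (%)
(not recovered)    2241    3.2%
(not recovered)    2159    3.0%
(not recovered)    1953    2.7%
(not recovered)    1911    2.7%
(not recovered)    1777    2.5%
(not recovered)    1677    2.4%
(not recovered)    1564    2.2%
(not recovered)    1551    2.2%
(not recovered)    1425    2.0%
(not recovered)    1367    1.9%
Other values (422)    53480    75.2%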

Most occurring categories

Value    Count    Frequency (%)
Other Letter    65316    91.9%
Decimal Number    3838    5.4%
Uppercase Letter    609    0.9%
Space Separator    519    0.7%
Lowercase Letter    287    0.4%
Open Punctuation    142    0.2%
Close Punctuation    142    0.2%
Other Punctuation    134    0.2%
Dash Punctuation    110    0.2%
Letter Number    5    < 0.1%

Most frequent character per category

Other Letter
Value    Count    Frequency (%)
(not recovered)    2241    3.4%
(not recovered)    2159    3.3%
(not recovered)    1953    3.0%
(not recovered)    1911    2.9%
(not recovered)    1777    2.7%
(not recovered)    1677    2.6%
(not recovered)    1564    2.4%
(not recovered)    1551    2.4%
(not recovered)    1425    2.2%
(not recovered)    1367    2.1%
Other values (376)    47691    73.0%

Uppercase Letter
Value    Count    Frequency (%)
S    110    18.1%
K    81    13.3%
C    54    8.9%
L    52    8.5%
H    48    7.9%
I    35    5.7%
M    34    5.6%
D    34    5.6%
G    33    5.4%
E    33    5.4%
Other values (7)    95    15.6%

Lowercase Letter
Value    Count    Frequency (%)
e    161    56.1%
l    36    12.5%
i    30    10.5%
v    24    8.4%
w    10    3.5%
k    8    2.8%
s    8    2.8%
c    4    1.4%
a    2    0.7%
g    2    0.7%

Decimal Number
Value    Count    Frequency (%)
1    1146    29.9%
2    1123    29.3%
3    559    14.6%
4    282    7.3%
5    183    4.8%
6    156    4.1%
7    104    2.7%
9    100    2.6%
8    98    2.6%
0    87    2.3%

Other Punctuation
Value    Count    Frequency (%)
,    115    85.8%
.    19    14.2%

Space Separator
Value    Count    Frequency (%)
(space)    519    100.0%

Open Punctuation
Value    Count    Frequency (%)
(    142    100.0%

Close Punctuation
Value    Count    Frequency (%)
)    142    100.0%

Dash Punctuation
Value    Count    Frequency (%)
-    110    100.0%

Letter Number
Value    Count    Frequency (%)
(not recovered)    5    100.0%

Math Symbol
Value    Count    Frequency (%)
~    3    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul    65316    91.9%
Common    4888    6.9%
Latin    901    1.3%

Most frequent character per script

Hangul
Value    Count    Frequency (%)
(not recovered)    2241    3.4%
(not recovered)    2159    3.3%
(not recovered)    1953    3.0%
(not recovered)    1911    2.9%
(not recovered)    1777    2.7%
(not recovered)    1677    2.6%
(not recovered)    1564    2.4%
(not recovered)    1551    2.4%
(not recovered)    1425    2.2%
(not recovered)    1367    2.1%
Other values (376)    47691    73.0%

Latin
Value    Count    Frequency (%)
e    161    17.9%
S    110    12.2%
K    81    9.0%
C    54    6.0%
L    52    5.8%
H    48    5.3%
l    36    4.0%
I    35    3.9%
M    34    3.8%
D    34    3.8%
Other values (19)    256    28.4%

Common
Value    Count    Frequency (%)
1    1146    23.4%
2    1123    23.0%
3    559    11.4%
(space)    519    10.6%
4    282    5.8%
5    183    3.7%
6    156    3.2%
(    142    2.9%
)    142    2.9%
,    115    2.4%
Other values (7)    521    10.7%

Most occurring blocks

Value    Count    Frequency (%)
Hangul    65316    91.9%
ASCII    5784    8.1%
Number Forms    5    < 0.1%

Most frequent character per block

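Hangul
Value    Count    Frequency (%)
(not recovered)    2241    3.4%
(not recovered)    2159    3.3%
(not recovered)    1953    3.0%
(not recovered)    1911    2.9%
(not recovered)    1777    2.7%
(not recovered)    1677    2.6%
(not recovered)    1564    2.4%
(not recovered)    1551    2.4%
(not recovered)    1425    2.2%
(not recovered)    1367    2.1%
Other values (376)    47691    73.0%

ASCII
Value    Count    Frequency (%)
1    1146    19.8%
2    1123    19.4%
3    559    9.7%
(space)    519    9.0%
4    282    4.9%
5    183    3.2%
e    161    2.8%
6    156    2.7%
(    142    2.5%
)    142    2.5%
Other values (35)    1371    23.7%

Number Forms
Value    Count    Frequency (%)
(not recovered)    5    100.0%
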
아파트코드
Text

Distinct: 2110
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 103
Unique (%): 1.0%

Sample

1st row: A13822702
2nd row: A12127008
3rd row: A13511001
4th row: A15784008
5th row: A13482603

Value    Count    Frequency (%)
a14319003    14    0.1%
a13778211    13    0.1%
a13716003    13    0.1%
a13790703    12    0.1%
a13285305    12    0.1%
a13309404    12    0.1%
a13286107    12    0.1%
a15875102    11    0.1%
a15807208    11    0.1%
a10078901    11    0.1%
Other values (2100)    9879    98.8%

Most occurring characters

Value    Count    Frequency (%)
0    18326    20.4%
1    17778    19.8%
A    9993    11.1%
3    8952    9.9%
2    7997    8.9%
5    6081    6.8%
8    5757    6.4%
7    4886    5.4%
4    3805    4.2%
6    3451    3.8%
Other values (2)    2974    3.3%

Most occurring categories

Value    Count    Frequency (%)
Decimal Number    80000    88.9%
Uppercase Letter    10000    11.1%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0    18326    22.9%
1    17778    22.2%
3    8952    11.2%
2    7997    10.0%
5    6081    7.6%
8    5757    7.2%
7    4886    6.1%
4    3805    4.8%
6    3451    4.3%
9    2967    3.7%

Uppercase Letter
Value    Count    Frequency (%)
A    9993    99.9%
B    7    0.1%

Most occurring scripts

Value    Count    Frequency (%)
Common    80000    88.9%
Latin    10000    11.1%

Most frequent character per script

Common
Value    Count    Frequency (%)
0    18326    22.9%
1    17778    22.2%
3    8952    11.2%
2    7997    10.0%
5    6081    7.6%
8    5757    7.2%
7    4886    6.1%
4    3805    4.8%
6    3451    4.3%
9    2967    3.7%

Latin
Value    Count    Frequency (%)
A    9993    99.9%
B    7    0.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    90000    100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0    18326    20.4%
1    17778    19.8%
A    9993    11.1%
3    8952    9.9%
2    7997    8.9%
5    6081    6.8%
8    5757    6.4%
7    4886    5.4%
4    3805    4.2%
6    3451    3.8%
Other values (2)    2974    3.3%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7723
Min length: 2

Characters and Unicode

Total characters: 47723
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 승강기유지비
2nd row: 세금과공과
3rd row: 공동수도료
4th row: 산재보험료
5th row: 음식물처리비

Value    Count    Frequency (%)
수선유지비    244    2.4%
통신비    239    2.4%
승강기유지비    237    2.4%
잡수익    237    2.4%
경비비    235    2.4%
사무용품비    234    2.3%
입주자대표회의운영비    230    2.3%
소독비    227    2.3%
도서인쇄비    225    2.2%
소모품비    224    2.2%
Other values (76)    7668    76.7%

Most occurring characters

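Value    Count    Frequency (%)
(not recovered)    5476    11.5%
(not recovered)    3535    7.4%
(not recovered)    2213    4.6%
(not recovered)    1815    3.8%
(not recovered)    1632    3.4%
(not recovered)    1360    2.8%
(not recovered)    1131    2.4%
(not recovered)    880    1.8%
(not recovered)    845    1.8%
(not recovered)    820    1.7%
Other values (110)    28016    58.7%

Most occurring categories

Value    Count    Frequency (%)
Other Letter    47723    100.0%

Most frequent character per category

Other Letter
Value    Count    Frequency (%)
(not recovered)    5476    11.5%
(not recovered)    3535    7.4%
(not recovered)    2213    4.6%
(not recovered)    1815    3.8%
(not recovered)    1632    3.4%
(not recovered)    1360    2.8%
(not recovered)    1131    2.4%
(not recovered)    880    1.8%
(not recovered)    845    1.8%
(not recovered)    820    1.7%
Other values (110)    28016    58.7%

Most occurring scripts

Value    Count    Frequency (%)
Hangul    47723    100.0%

Most frequent character per script

Hangul
Value    Count    Frequency (%)
(not recovered)    5476    11.5%
(not recovered)    3535    7.4%
(not recovered)    2213    4.6%
(not recovered)    1815    3.8%
(not recovered)    1632    3.4%
(not recovered)    1360    2.8%
(not recovered)    1131    2.4%
(not recovered)    880    1.8%
(not recovered)    845    1.8%
(not recovered)    820    1.7%
Other values (110)    28016    58.7%

Most occurring blocks

Value    Count    Frequency (%)
Hangul    47723    100.0%

Most frequent character per block

Hangul
Value    Count    Frequency (%)
(not recovered)    5476    11.5%
(not recovered)    3535    7.4%
(not recovered)    2213    4.6%
(not recovered)    1815    3.8%
(not recovered)    1632    3.4%
(not recovered)    1360    2.8%
(not recovered)    1131    2.4%
(not recovered)    880    1.8%
(not recovered)    845    1.8%
(not recovered)    820    1.7%
Other values (110)    28016    58.7%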

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value    Count
201902    10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201902
2nd row: 201902
3rd row: 201902
4th row: 201902
5th row: 201902

Common Values

Value    Count    Frequency (%)
201902    10000    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count    Frequency (%)
201902    10000    100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7167
Distinct (%): 71.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3379751.5
Minimum: -22801593
Maximum: 4.1374388 × 10^8
Zeros: 809
Zeros (%): 8.1%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -22801593
5-th percentile: 0
Q1: 93302.5
Median: 323245
Q3: 1483027.5
95-th percentile: 16653808
Maximum: 4.1374388 × 10^8
Range: 4.3654547 × 10^8
Interquartile range (IQR): 1389725
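
The range and the interquartile range follow directly from the other quantile figures: range = maximum - minimum and IQR = Q3 - Q1. A short check of that arithmetic; the exact maximum (413743878) is taken from the largest value listed among this variable's extreme values, and the variable names are illustrative:

```python
# Figures copied from the report.
q1, q3 = 93302.5, 1483027.5
minimum = -22801593
maximum = 413743878  # exact maximum, as listed among the largest values

iqr = q3 - q1
value_range = maximum - minimum

print(iqr)
print(f"{value_range:.7e}")
```

Both results agree with the reported IQR of 1389725 and the reported range of about 4.3654547 × 10^8.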

Descriptive statistics

Standard deviation: 12463725
Coefficient of variation (CV): 3.6877637
Kurtosis: 243.96765
Mean: 3379751.5
Median Absolute Deviation (MAD): 306358
Skewness: 12.238068
Sum: 3.3797515 × 10^10
Variance: 1.5534443 × 10^14
Monotonicity: Not monotonic
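
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. A minimal consistency check under those identities; because the report shows rounded figures, the comparisons use a tolerance, and the variable names are illustrative:

```python
import math

# Rounded figures copied from the report.
mean = 3379751.5
std = 12463725
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")

# Compare with the reported values, allowing for rounding in the inputs.
print(math.isclose(cv, 3.6877637, rel_tol=1e-6))
print(math.isclose(variance, 1.5534443e14, rel_tol=1e-6))
print(math.isclose(total, 3.3797515e10, rel_tol=1e-9))
```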
Histogram with fixed size bins (bins=50)

Common values

Value    Count    Frequency (%)
0    809    8.1%
200000    88    0.9%
100000    80    0.8%
300000    69    0.7%
150000    56    0.6%
50000    48    0.5%
48000    42    0.4%
38000    38    0.4%
60000    36    0.4%
120000    35    0.4%
Other values (7157)    8699    87.0%

Minimum 10 values

Value    Count    Frequency (%)
-22801593    1    < 0.1%
-9205150    1    < 0.1%
-1362220    1    < 0.1%
-822570    1    < 0.1%
-748230    1    < 0.1%
-240000    1    < 0.1%
-2566    1    < 0.1%
-1230    1    < 0.1%
0    809    8.1%
1    2    < 0.1%

Maximum 10 values

Value    Count    Frequency (%)
413743878    1    < 0.1%
296422958    1    < 0.1%
274895170    1    < 0.1%
232765297    1    < 0.1%
207245470    1    < 0.1%
204036270    1    < 0.1%
196694230    1    < 0.1%
196279850    1    < 0.1%
184458900    1    < 0.1%
178320458    1    < 0.1%

Interactions


Correlations

          비용명    금액
비용명    1.000    0.523
금액    0.523    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(index)    아파트명    아파트코드    비용명    년월일    금액
47948    잠실우성1,2,3차    A13822702    승강기유지비    201902    5491200
10638    상암월드컵8단지    A12127008    세금과공과    201902    2860
33350    압구정미성1차    A13511001    공동수도료    201902    0
83052    등촌서광    A15784008    산재보험료    201902    162620
31136    명일한양    A13482603    음식물처리비    201902    1824000
35605    개포1차2차우성    A13528105    재활용품수익    201902    535800
31510    둔촌역청구아파트    A13484501    입주자대표회의운영비    201902    255560
1003    래미안미드카운티    A10026232    위탁관리수수료    201902    1134350
71677    오류금강수목원    A15210211    검침수익    201902    266600
86563    목동1단지    A15875101    선거관리위원회운영비    201902    1977300

Last rows

(index)    아파트명    아파트코드    비용명    년월일    금액
16081    답십리우성그린    A13003404    교통비    201902    2600
17918    이문e-편한세상    A13082805    장기수선비    201902    18493420
78963    신대방우성2차    A15685201    경비비    201902    5611155
26359    금호대우    A13309404    보험료    201902    1212810
51573    중계한화꿈에그린더퍼스트    A13922003    건강보험료    201902    404750
87175    양천벽산블루밍    A15883201    공동주택지원금수익    201902    76160
15464    신성수정    A12289402    입주자대표회의운영비    201902    429880
69593    봉천벽산타운2차    A15180701    퇴직급여    201902    391880
77958    남성두산위브트레지움    A15677501    광고료수익    201902    393500
65031    문래현대6차아파트    A15009605    기타사용료    201902    150000