Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
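The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes (the variable names are illustrative and not taken from the report):

```python
# Figures copied from the Dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: the report uses binary units, 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result agrees with the reported average record size.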

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202203" (Constant)
금액 has 649 (6.5%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:52:47.501782
Analysis finished: 2024-05-11 06:52:49.710644
Duration: 2.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
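The reported duration is the difference between the two timestamps. A short sketch of that check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2024-05-11 06:52:47.501782")
finished = datetime.fromisoformat("2024-05-11 06:52:49.710644")

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()

print(f"{duration_seconds:.2f} seconds")
```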

Variables

아파트명 (apartment name)
Text

Distinct: 2232
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3695
Min length: 2

Characters and Unicode

Total characters: 73695
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
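As an illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard unicodedata module. It is not the profiler's own implementation; the input strings are the five sample values listed for this variable, and the mapping from short category codes to long names covers only the two codes that occur in them.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable.
samples = ["고척벽산블루밍", "당산푸르지오", "오류금강수목원", "사당롯데캐슬골든포레", "당산효성1차"]

# Long names for the short general-category codes seen in these samples.
CATEGORY_NAMES = {"Lo": "Other Letter", "Nd": "Decimal Number"}

category_counts = Counter()
for text in samples:
    # Normalize to NFC so each Hangul syllable is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        category_counts[unicodedata.category(ch)] += 1

for code, count in category_counts.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
```

Scripts and blocks are not exposed by unicodedata, so this sketch covers categories only.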

Unique

Unique: 142
Unique (%): 1.4%
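Distinct and Unique measure different things: Distinct is the number of different values, while Unique is the number of values that occur exactly once. A minimal sketch of the distinction follows; the function name is illustrative, and the example values are the 비용명 entries of the first ten sample rows of this report plus a constant column.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# 비용명 values of the first ten sample rows; 주차장수익 appears twice.
expense_names = [
    "연체료수익", "승강기유지비", "잡수익", "연차수당", "장기수선비",
    "주차장수익", "주차장수익", "세금과공과", "퇴직급여", "세대전기료",
]
# A constant column, like 년월일 in this dataset.
constant_column = ["202203"] * 5

print(distinct_and_unique(expense_names))    # (9, 8)
print(distinct_and_unique(constant_column))  # (1, 0)
```

A constant column therefore has one distinct value and no unique values, which matches the figures reported for 년월일.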

Sample

1st row: 고척벽산블루밍
2nd row: 당산푸르지오
3rd row: 오류금강수목원
4th row: 사당롯데캐슬골든포레
5th row: 당산효성1차

Value | Count | Frequency (%)
아파트 | 171 | 1.6%
래미안 | 42 | 0.4%
e편한세상 | 37 | 0.3%
아이파크 | 21 | 0.2%
해모로 | 16 | 0.1%
sk뷰 | 15 | 0.1%
마포 | 15 | 0.1%
푸르지오 | 14 | 0.1%
보라매 | 13 | 0.1%
래미안밤섬리베뉴 | 13 | 0.1%
Other values (2311) | 10481 | 96.7%

Most occurring characters

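Value | Count | Frequency (%)
(glyph missing) | 2587 | 3.5%
(glyph missing) | 2498 | 3.4%
(glyph missing) | 2327 | 3.2%
(glyph missing) | 1829 | 2.5%
(glyph missing) | 1827 | 2.5%
(glyph missing) | 1657 | 2.2%
(glyph missing) | 1468 | 2.0%
(glyph missing) | 1432 | 1.9%
(glyph missing) | 1424 | 1.9%
(glyph missing) | 1361 | 1.8%
Other values (425) | 55285 | 75.0%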

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67437 | 91.5%
Decimal Number | 3705 | 5.0%
Space Separator | 926 | 1.3%
Uppercase Letter | 733 | 1.0%
Lowercase Letter | 306 | 0.4%
Close Punctuation | 161 | 0.2%
Open Punctuation | 161 | 0.2%
Dash Punctuation | 135 | 0.2%
Other Punctuation | 125 | 0.2%
Letter Number | 6 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(glyph missing) | 2587 | 3.8%
(glyph missing) | 2498 | 3.7%
(glyph missing) | 2327 | 3.5%
(glyph missing) | 1829 | 2.7%
(glyph missing) | 1827 | 2.7%
(glyph missing) | 1657 | 2.5%
(glyph missing) | 1468 | 2.2%
(glyph missing) | 1432 | 2.1%
(glyph missing) | 1424 | 2.1%
(glyph missing) | 1361 | 2.0%
Other values (380) | 49027 | 72.7%

Uppercase Letter
Value | Count | Frequency (%)
S | 118 | 16.1%
C | 108 | 14.7%
K | 79 | 10.8%
M | 74 | 10.1%
D | 74 | 10.1%
L | 54 | 7.4%
H | 49 | 6.7%
E | 40 | 5.5%
I | 26 | 3.5%
G | 26 | 3.5%
Other values (7) | 85 | 11.6%

Lowercase Letter
Value | Count | Frequency (%)
e | 180 | 58.8%
l | 34 | 11.1%
i | 24 | 7.8%
v | 20 | 6.5%
s | 17 | 5.6%
k | 15 | 4.9%
w | 7 | 2.3%
h | 5 | 1.6%
c | 2 | 0.7%
g | 1 | 0.3%

Decimal Number
Value | Count | Frequency (%)
1 | 1162 | 31.4%
2 | 1045 | 28.2%
3 | 489 | 13.2%
4 | 264 | 7.1%
5 | 230 | 6.2%
6 | 164 | 4.4%
7 | 115 | 3.1%
9 | 97 | 2.6%
0 | 70 | 1.9%
8 | 69 | 1.9%

Other Punctuation
Value | Count | Frequency (%)
, | 105 | 84.0%
. | 20 | 16.0%

Space Separator
Value | Count | Frequency (%)
(space) | 926 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 161 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 161 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 135 | 100.0%

Letter Number
Value | Count | Frequency (%)
(glyph missing) | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67437 | 91.5%
Common | 5213 | 7.1%
Latin | 1045 | 1.4%

Most frequent character per script

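Hangul
Value | Count | Frequency (%)
(glyph missing) | 2587 | 3.8%
(glyph missing) | 2498 | 3.7%
(glyph missing) | 2327 | 3.5%
(glyph missing) | 1829 | 2.7%
(glyph missing) | 1827 | 2.7%
(glyph missing) | 1657 | 2.5%
(glyph missing) | 1468 | 2.2%
(glyph missing) | 1432 | 2.1%
(glyph missing) | 1424 | 2.1%
(glyph missing) | 1361 | 2.0%
Other values (380) | 49027 | 72.7%

Latin
Value | Count | Frequency (%)
e | 180 | 17.2%
S | 118 | 11.3%
C | 108 | 10.3%
K | 79 | 7.6%
M | 74 | 7.1%
D | 74 | 7.1%
L | 54 | 5.2%
H | 49 | 4.7%
E | 40 | 3.8%
l | 34 | 3.3%
Other values (19) | 235 | 22.5%

Common
Value | Count | Frequency (%)
1 | 1162 | 22.3%
2 | 1045 | 20.0%
(space) | 926 | 17.8%
3 | 489 | 9.4%
4 | 264 | 5.1%
5 | 230 | 4.4%
6 | 164 | 3.1%
) | 161 | 3.1%
( | 161 | 3.1%
- | 135 | 2.6%
Other values (6) | 476 | 9.1%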

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67437 | 91.5%
ASCII | 6252 | 8.5%
Number Forms | 6 | < 0.1%

Most frequent character per block

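Hangul
Value | Count | Frequency (%)
(glyph missing) | 2587 | 3.8%
(glyph missing) | 2498 | 3.7%
(glyph missing) | 2327 | 3.5%
(glyph missing) | 1829 | 2.7%
(glyph missing) | 1827 | 2.7%
(glyph missing) | 1657 | 2.5%
(glyph missing) | 1468 | 2.2%
(glyph missing) | 1432 | 2.1%
(glyph missing) | 1424 | 2.1%
(glyph missing) | 1361 | 2.0%
Other values (380) | 49027 | 72.7%

ASCII
Value | Count | Frequency (%)
1 | 1162 | 18.6%
2 | 1045 | 16.7%
(space) | 926 | 14.8%
3 | 489 | 7.8%
4 | 264 | 4.2%
5 | 230 | 3.7%
e | 180 | 2.9%
6 | 164 | 2.6%
) | 161 | 2.6%
( | 161 | 2.6%
Other values (34) | 1470 | 23.5%

Number Forms
Value | Count | Frequency (%)
(glyph missing) | 6 | 100.0%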

아파트코드 (apartment code)
Text

Distinct: 2238
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
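The figures for this variable (every value has length 9, and only uppercase letters and decimal digits occur) together with its sample rows suggest a fixed format of one uppercase letter followed by eight digits. The report does not state this format; it is an inference, and the pattern and function name below are hypothetical. A small validation sketch under that assumption:

```python
import re

# Assumed format (inferred, not stated by the report):
# one uppercase ASCII letter followed by exactly eight digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{8}")

def is_valid_code(value: str) -> bool:
    """Return True if the whole string matches the assumed code format."""
    return CODE_PATTERN.fullmatch(value) is not None

# The five sample values listed for this variable.
samples = ["A15283711", "A15003802", "A15210211", "A10025013", "A15004506"]

print(all(is_valid_code(code) for code in samples))
```

Using fullmatch rather than match rejects values with trailing characters, which keeps the length-9 constraint explicit.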

Unique

Unique: 142
Unique (%): 1.4%

Sample

1st row: A15283711
2nd row: A15003802
3rd row: A15210211
4th row: A10025013
5th row: A15004506

Value | Count | Frequency (%)
a10027553 | 12 | 0.1%
a13610003 | 12 | 0.1%
a14319303 | 12 | 0.1%
a13822004 | 12 | 0.1%
a15807210 | 11 | 0.1%
a15106901 | 11 | 0.1%
a13203401 | 11 | 0.1%
a13585404 | 11 | 0.1%
a10024814 | 11 | 0.1%
a13822702 | 11 | 0.1%
Other values (2228) | 9886 | 98.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 18665 | 20.7%
1 | 17366 | 19.3%
A | 9996 | 11.1%
3 | 8786 | 9.8%
2 | 8354 | 9.3%
5 | 6228 | 6.9%
8 | 5524 | 6.1%
7 | 4676 | 5.2%
4 | 3996 | 4.4%
6 | 3398 | 3.8%
Other values (2) | 3011 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18665 | 23.3%
1 | 17366 | 21.7%
3 | 8786 | 11.0%
2 | 8354 | 10.4%
5 | 6228 | 7.8%
8 | 5524 | 6.9%
7 | 4676 | 5.8%
4 | 3996 | 5.0%
6 | 3398 | 4.2%
9 | 3007 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18665 | 23.3%
1 | 17366 | 21.7%
3 | 8786 | 11.0%
2 | 8354 | 10.4%
5 | 6228 | 7.8%
8 | 5524 | 6.9%
7 | 4676 | 5.8%
4 | 3996 | 5.0%
6 | 3398 | 4.2%
9 | 3007 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18665 | 20.7%
1 | 17366 | 19.3%
A | 9996 | 11.1%
3 | 8786 | 9.8%
2 | 8354 | 9.3%
5 | 6228 | 6.9%
8 | 5524 | 6.1%
7 | 4676 | 5.2%
4 | 3996 | 4.4%
6 | 3398 | 3.8%
Other values (2) | 3011 | 3.3%

비용명 (expense item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8051
Min length: 2

Characters and Unicode

Total characters: 48051
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

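1st row: 연체료수익 (late fee income)
2nd row: 승강기유지비 (elevator maintenance expense)
3rd row: 잡수익 (miscellaneous income)
4th row: 연차수당 (annual leave allowance)
5th row: 장기수선비 (long-term repair expense)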
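
Value | Count | Frequency (%)
세대전기료 (household electricity charge) | 251 | 2.5%
경비비 (security expense) | 247 | 2.5%
사무용품비 (office supplies expense) | 245 | 2.5%
급여 (salaries) | 238 | 2.4%
이자수익 (interest income) | 233 | 2.3%
연체료수익 (late fee income) | 231 | 2.3%
소독비 (disinfection expense) | 229 | 2.3%
입주자대표회의운영비 (residents' representative council operating expense) | 227 | 2.3%
청소비 (cleaning expense) | 225 | 2.2%
보험료 (insurance premium) | 223 | 2.2%
Other values (76) | 7651 | 76.5%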

Most occurring characters

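Value | Count | Frequency (%)
(glyph missing) | 5301 | 11.0%
(glyph missing) | 3650 | 7.6%
(glyph missing) | 2255 | 4.7%
(glyph missing) | 1971 | 4.1%
(glyph missing) | 1709 | 3.6%
(glyph missing) | 1342 | 2.8%
(glyph missing) | 1111 | 2.3%
(glyph missing) | 880 | 1.8%
(glyph missing) | 841 | 1.8%
(glyph missing) | 796 | 1.7%
Other values (110) | 28195 | 58.7%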

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48051 | 100.0%

Most frequent character per category

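Other Letter
Value | Count | Frequency (%)
(glyph missing) | 5301 | 11.0%
(glyph missing) | 3650 | 7.6%
(glyph missing) | 2255 | 4.7%
(glyph missing) | 1971 | 4.1%
(glyph missing) | 1709 | 3.6%
(glyph missing) | 1342 | 2.8%
(glyph missing) | 1111 | 2.3%
(glyph missing) | 880 | 1.8%
(glyph missing) | 841 | 1.8%
(glyph missing) | 796 | 1.7%
Other values (110) | 28195 | 58.7%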

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48051 | 100.0%

Most frequent character per script

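Hangul
Value | Count | Frequency (%)
(glyph missing) | 5301 | 11.0%
(glyph missing) | 3650 | 7.6%
(glyph missing) | 2255 | 4.7%
(glyph missing) | 1971 | 4.1%
(glyph missing) | 1709 | 3.6%
(glyph missing) | 1342 | 2.8%
(glyph missing) | 1111 | 2.3%
(glyph missing) | 880 | 1.8%
(glyph missing) | 841 | 1.8%
(glyph missing) | 796 | 1.7%
Other values (110) | 28195 | 58.7%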

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48051 | 100.0%

Most frequent character per block

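Hangul
Value | Count | Frequency (%)
(glyph missing) | 5301 | 11.0%
(glyph missing) | 3650 | 7.6%
(glyph missing) | 2255 | 4.7%
(glyph missing) | 1971 | 4.1%
(glyph missing) | 1709 | 3.6%
(glyph missing) | 1342 | 2.8%
(glyph missing) | 1111 | 2.3%
(glyph missing) | 880 | 1.8%
(glyph missing) | 841 | 1.8%
(glyph missing) | 796 | 1.7%
Other values (110) | 28195 | 58.7%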

년월일 (date, stored as year and month)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202203 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202203
2nd row: 202203
3rd row: 202203
4th row: 202203
5th row: 202203

Common Values

Value | Count | Frequency (%)
202203 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202203 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7506
Distinct (%): 75.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3927467.3
Minimum: -4334120
Maximum: 7.9351297 × 10^8
Zeros: 649
Zeros (%): 6.5%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4334120
5-th percentile: 0
Q1: 93935
Median: 340000
Q3: 1514390
95-th percentile: 18403134
Maximum: 7.9351297 × 10^8
Range: 7.9784709 × 10^8
Interquartile range (IQR): 1420455

Descriptive statistics

Standard deviation: 17071254
Coefficient of variation (CV): 4.3466319
Kurtosis: 648.59222
Mean: 3927467.3
Median Absolute Deviation (MAD): 319875
Skewness: 19.652757
Sum: 3.9274673 × 10^10
Variance: 2.9142773 × 10^14
Monotonicity: Not monotonic
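Several of these figures are derived from others in the report, so they can be cross-checked with simple arithmetic. The sketch below recomputes the IQR, range, coefficient of variation, variance and sum from the reported values; the variable names are illustrative, and small differences are expected because the inputs are already rounded.

```python
# Figures copied from the quantile and descriptive statistics of 금액.
q1, q3 = 93_935, 1_514_390
minimum, maximum = -4_334_120, 793_512_970
mean, std = 3_927_467.3, 17_071_254
n_observations = 10_000

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n_observations    # sum recovered from the mean

print(iqr, value_range)
print(f"{cv:.4f} {variance:.4e} {total:.4e}")
```

A coefficient of variation above 4, together with the large skewness and kurtosis, indicates that a few very large amounts dominate the spread.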
Histogram with fixed size bins (bins=50)
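For readers unfamiliar with fixed-size binning, the sketch below shows one way to assign values to 50 equal-width bins. It is an illustration only, not the plotting code used to produce the report; the function name is hypothetical, and the input is the 금액 column of the first ten sample rows of this report rather than the full dataset.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # Degenerate case: all values are identical, so use the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum is placed in the last bin instead of a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# 금액 values of the first ten sample rows.
amounts = [39580, 858000, 206000, 1660670, 6878340,
           8463637, 1850000, 180000, 398190, 29179018]

counts = fixed_width_histogram(amounts, bins=50)
print(counts[0], counts[-1], sum(counts))
```

With heavily skewed data such as this, most values land in the first few bins and the largest value sits alone in the last one.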

Value | Count | Frequency (%)
0 | 649 | 6.5%
23000 | 88 | 0.9%
200000 | 77 | 0.8%
100000 | 68 | 0.7%
300000 | 56 | 0.6%
150000 | 41 | 0.4%
400000 | 35 | 0.4%
50000 | 33 | 0.3%
250000 | 30 | 0.3%
180000 | 28 | 0.3%
Other values (7496) | 8895 | 88.9%

Value | Count | Frequency (%)
-4334120 | 1 | < 0.1%
-3394270 | 1 | < 0.1%
-700000 | 1 | < 0.1%
-140000 | 1 | < 0.1%
-85000 | 1 | < 0.1%
-26561 | 1 | < 0.1%
-3605 | 1 | < 0.1%
0 | 649 | 6.5%
3 | 1 | < 0.1%
6 | 1 | < 0.1%

Value | Count | Frequency (%)
793512970 | 1 | < 0.1%
495383990 | 1 | < 0.1%
415458580 | 1 | < 0.1%
380496027 | 1 | < 0.1%
365615832 | 1 | < 0.1%
356864110 | 1 | < 0.1%
280362004 | 1 | < 0.1%
274812480 | 1 | < 0.1%
191237780 | 1 | < 0.1%
188220812 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.303
금액 | 0.303 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
81345 | 고척벽산블루밍 | A15283711 | 연체료수익 | 202203 | 39580
72192 | 당산푸르지오 | A15003802 | 승강기유지비 | 202203 | 858000
80200 | 오류금강수목원 | A15210211 | 잡수익 | 202203 | 206000
3745 | 사당롯데캐슬골든포레 | A10025013 | 연차수당 | 202203 | 1660670
72659 | 당산효성1차 | A15004506 | 장기수선비 | 202203 | 6878340
77622 | 봉천두산1,2단지 | A15106901 | 주차장수익 | 202203 | 8463637
60742 | 중계4단지목화 | A13972603 | 주차장수익 | 202203 | 1850000
87854 | 신대방우성2차 | A15685201 | 세금과공과 | 202203 | 180000
27116 | 면목삼호 | A13184401 | 퇴직급여 | 202203 | 398190
84704 | 대방1차e편한세상 | A15602007 | 세대전기료 | 202203 | 29179018

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
40026 | 삼성롯데캐슬프레미어 | A13509010 | 제수당 | 202203 | 2442290
23776 | 전농동아임대 | A13071301 | 복리후생비 | 202203 | 542230
9602 | 마곡엠밸리11단지 | A10027371 | 퇴직급여 | 202203 | 1344300
20813 | 신사라이프씨티 | A12208104 | 세금과공과 | 202203 | 230850
62815 | 상계양우아파트 | A13982103 | 이자수익 | 202203 | 23495
38338 | 둔촌현대4차 | A13481802 | 검침수익 | 202203 | 97720
14274 | 홍제유원하나 | A12009304 | 청소비 | 202203 | 7365970
83202 | 시흥성지 | A15303103 | 산재보험료 | 202203 | 91440
57851 | 가락삼익맨숀 | A13885306 | 광고료수익 | 202203 | 1450000
89676 | 마곡푸르지오 | A15722004 | 감가상각비 | 202203 | 78750