Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
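
The two memory figures are related by simple division: the average record size is the total in-memory size divided by the number of observations. A minimal Python check of that arithmetic, using only the values reported above (the variable names are illustrative and not part of the report):

```python
# Values copied from the dataset statistics above.
total_size_kib = 488.3      # "Total size in memory", in KiB
n_observations = 10_000     # "Number of observations"

# 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024

# Average bytes per row; this should agree with the reported figure after rounding.
avg_record_size = total_size_bytes / n_observations
print(f"{avg_record_size:.1f} B")
```

The result rounds to the 50.0 B shown in the report, which suggests the reported total already includes per-row overhead for all five columns.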

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202109" (Constant)
금액 is highly skewed (γ1 = 20.47511623) (Skewed)
금액 has 1310 (13.1%) zeros (Zeros)
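
As a rough illustration of how alerts of this kind can be derived from per-column summaries, the sketch below applies three simple rules to the figures reported here. The function name `derive_alerts`, the dictionary keys, and the skewness threshold of 20 are assumptions made for this example; they are not taken from the ydata-profiling source.

```python
def derive_alerts(name, stats, skew_threshold=20.0):
    """Return alert labels for one column (hypothetical helper, not a library API)."""
    alerts = []
    # A column with a single distinct value carries no information.
    if stats.get("distinct") == 1:
        alerts.append(f"{name}: Constant")
    # Flag strongly asymmetric distributions (threshold is an assumption).
    skew = stats.get("skewness")
    if skew is not None and abs(skew) > skew_threshold:
        alerts.append(f"{name}: Skewed")
    # Report the share of exact zeros, if any.
    zeros = stats.get("zeros", 0)
    if zeros > 0:
        share = 100 * zeros / stats["count"]
        alerts.append(f"{name}: Zeros ({share:.1f}%)")
    return alerts


# Summary figures copied from this report.
columns = {
    "년월일": {"count": 10000, "distinct": 1},
    "금액": {"count": 10000, "distinct": 6908, "skewness": 20.47511623, "zeros": 1310},
}

for name, stats in columns.items():
    for alert in derive_alerts(name, stats):
        print(alert)
```

With these inputs the sketch yields one Constant alert for 년월일 and Skewed and Zeros alerts for 금액, matching the three alerts listed above.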

Reproduction

Analysis started: 2024-05-11 06:53:54.820371
Analysis finished: 2024-05-11 06:53:56.446043
Duration: 1.63 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2116
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.2907
Min length: 2

Characters and Unicode

Total characters: 72907
Distinct characters: 424
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
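
For example, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for the five sample values shown for this variable in the Sample section; it illustrates the idea only and is not how the report itself was computed. Script and block lookups are omitted because the standard library does not provide them.

```python
import unicodedata
from collections import Counter

# The five sample values listed for this variable in the report.
samples = [
    "고덕리엔파크2단지",
    "암사삼성광나루",
    "래미안 웰스트림",
    "래미안포레",
    "DMC롯데캐슬더퍼스트",
]

# Tally the two-letter general category of every character,
# e.g. "Lo" = Other Letter, "Lu" = Uppercase Letter,
# "Nd" = Decimal Number, "Zs" = Space Separator.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for category, count in category_counts.most_common():
    print(category, count)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables for this variable.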

Unique

Unique: 123
Unique (%): 1.2%
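
The report separates distinct values (how many different values occur) from unique values (how many values occur exactly once). A small sketch of that distinction using `collections.Counter`; the list `values` is made up for illustration and is not drawn from the dataset:

```python
from collections import Counter

# Illustrative data only.
values = ["래미안", "래미안", "아이파크", "고덕", "고덕", "고덕", "신반포"]

counts = Counter(values)

# Distinct: number of different values present.
n_distinct = len(counts)

# Unique: number of values that appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(n_distinct, n_unique)
```

Under this reading, a column can have many distinct values but few unique ones when most values repeat, as is the case here.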

Sample

1st row: 고덕리엔파크2단지
2nd row: 암사삼성광나루
3rd row: 래미안 웰스트림
4th row: 래미안포레
5th row: DMC롯데캐슬더퍼스트

Value | Count | Frequency (%)
아파트 | 167 | 1.5%
래미안 | 39 | 0.4%
아이파크 | 27 | 0.3%
e편한세상 | 25 | 0.2%
신반포 | 18 | 0.2%
센트럴 | 16 | 0.1%
북한산 | 15 | 0.1%
강남한신휴플러스 | 14 | 0.1%
sk뷰 | 14 | 0.1%
고덕 | 14 | 0.1%
Other values (2191) | 10448 | 96.8%

Most occurring characters

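Value | Count | Frequency (%)
[missing] | 2715 | 3.7%
[missing] | 2571 | 3.5%
[missing] | 2416 | 3.3%
[missing] | 1800 | 2.5%
[missing] | 1634 | 2.2%
[missing] | 1624 | 2.2%
[missing] | 1442 | 2.0%
[missing] | 1422 | 2.0%
[missing] | 1393 | 1.9%
[missing] | 1247 | 1.7%
Other values (414) | 54643 | 74.9%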

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66946 | 91.8%
Decimal Number | 3370 | 4.6%
Space Separator | 915 | 1.3%
Uppercase Letter | 808 | 1.1%
Lowercase Letter | 333 | 0.5%
Open Punctuation | 137 | 0.2%
Close Punctuation | 137 | 0.2%
Other Punctuation | 132 | 0.2%
Dash Punctuation | 126 | 0.2%
Letter Number | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[missing] | 2715 | 4.1%
[missing] | 2571 | 3.8%
[missing] | 2416 | 3.6%
[missing] | 1800 | 2.7%
[missing] | 1634 | 2.4%
[missing] | 1624 | 2.4%
[missing] | 1442 | 2.2%
[missing] | 1422 | 2.1%
[missing] | 1393 | 2.1%
[missing] | 1247 | 1.9%
Other values (369) | 48682 | 72.7%

Uppercase Letter
Value | Count | Frequency (%)
C | 138 | 17.1%
S | 118 | 14.6%
K | 97 | 12.0%
M | 89 | 11.0%
D | 89 | 11.0%
L | 57 | 7.1%
H | 38 | 4.7%
I | 38 | 4.7%
G | 37 | 4.6%
E | 29 | 3.6%
Other values (7) | 78 | 9.7%

Lowercase Letter
Value | Count | Frequency (%)
e | 197 | 59.2%
l | 30 | 9.0%
i | 26 | 7.8%
s | 20 | 6.0%
v | 19 | 5.7%
k | 18 | 5.4%
w | 7 | 2.1%
c | 4 | 1.2%
h | 4 | 1.2%
g | 4 | 1.2%

Decimal Number
Value | Count | Frequency (%)
2 | 1040 | 30.9%
1 | 972 | 28.8%
3 | 446 | 13.2%
4 | 233 | 6.9%
5 | 202 | 6.0%
6 | 144 | 4.3%
7 | 112 | 3.3%
8 | 81 | 2.4%
9 | 79 | 2.3%
0 | 61 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 102 | 77.3%
. | 30 | 22.7%

Space Separator
Value | Count | Frequency (%)
(space) | 915 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 137 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 137 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 126 | 100.0%

Letter Number
Value | Count | Frequency (%)
[missing] | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66946 | 91.8%
Common | 4817 | 6.6%
Latin | 1144 | 1.6%

Most frequent character per script

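Hangul
Value | Count | Frequency (%)
[missing] | 2715 | 4.1%
[missing] | 2571 | 3.8%
[missing] | 2416 | 3.6%
[missing] | 1800 | 2.7%
[missing] | 1634 | 2.4%
[missing] | 1624 | 2.4%
[missing] | 1442 | 2.2%
[missing] | 1422 | 2.1%
[missing] | 1393 | 2.1%
[missing] | 1247 | 1.9%
Other values (369) | 48682 | 72.7%

Latin
Value | Count | Frequency (%)
e | 197 | 17.2%
C | 138 | 12.1%
S | 118 | 10.3%
K | 97 | 8.5%
M | 89 | 7.8%
D | 89 | 7.8%
L | 57 | 5.0%
H | 38 | 3.3%
I | 38 | 3.3%
G | 37 | 3.2%
Other values (19) | 246 | 21.5%

Common
Value | Count | Frequency (%)
2 | 1040 | 21.6%
1 | 972 | 20.2%
(space) | 915 | 19.0%
3 | 446 | 9.3%
4 | 233 | 4.8%
5 | 202 | 4.2%
6 | 144 | 3.0%
( | 137 | 2.8%
) | 137 | 2.8%
- | 126 | 2.6%
Other values (6) | 465 | 9.7%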

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66946 | 91.8%
ASCII | 5958 | 8.2%
Number Forms | 3 | < 0.1%

Most frequent character per block

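Hangul
Value | Count | Frequency (%)
[missing] | 2715 | 4.1%
[missing] | 2571 | 3.8%
[missing] | 2416 | 3.6%
[missing] | 1800 | 2.7%
[missing] | 1634 | 2.4%
[missing] | 1624 | 2.4%
[missing] | 1442 | 2.2%
[missing] | 1422 | 2.1%
[missing] | 1393 | 2.1%
[missing] | 1247 | 1.9%
Other values (369) | 48682 | 72.7%

ASCII
Value | Count | Frequency (%)
2 | 1040 | 17.5%
1 | 972 | 16.3%
(space) | 915 | 15.4%
3 | 446 | 7.5%
4 | 233 | 3.9%
5 | 202 | 3.4%
e | 197 | 3.3%
6 | 144 | 2.4%
C | 138 | 2.3%
( | 137 | 2.3%
Other values (34) | 1534 | 25.7%

Number Forms
Value | Count | Frequency (%)
[missing] | 3 | 100.0%
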
아파트코드
Text

Distinct: 2120
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 123
Unique (%): 1.2%

Sample

1st row: A13410011
2nd row: A13405002
3rd row: A10027714
4th row: A13520001
5th row: A10024828

Value | Count | Frequency (%)
a15381402 | 13 | 0.1%
a13471501 | 13 | 0.1%
a10024719 | 12 | 0.1%
a15780703 | 12 | 0.1%
a13307001 | 12 | 0.1%
a13887301 | 12 | 0.1%
a15080507 | 12 | 0.1%
a15720101 | 12 | 0.1%
a15209002 | 12 | 0.1%
a12187906 | 12 | 0.1%
Other values (2110) | 9878 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18894 | 21.0%
1 | 17530 | 19.5%
A | 10000 | 11.1%
3 | 8967 | 10.0%
2 | 8314 | 9.2%
5 | 6165 | 6.9%
8 | 5375 | 6.0%
7 | 4578 | 5.1%
4 | 4022 | 4.5%
6 | 3325 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18894 | 23.6%
1 | 17530 | 21.9%
3 | 8967 | 11.2%
2 | 8314 | 10.4%
5 | 6165 | 7.7%
8 | 5375 | 6.7%
7 | 4578 | 5.7%
4 | 4022 | 5.0%
6 | 3325 | 4.2%
9 | 2830 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18894 | 23.6%
1 | 17530 | 21.9%
3 | 8967 | 11.2%
2 | 8314 | 10.4%
5 | 6165 | 7.7%
8 | 5375 | 6.7%
7 | 4578 | 5.7%
4 | 4022 | 5.0%
6 | 3325 | 4.2%
9 | 2830 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18894 | 21.0%
1 | 17530 | 19.5%
A | 10000 | 11.1%
3 | 8967 | 10.0%
2 | 8314 | 9.2%
5 | 6165 | 6.9%
8 | 5375 | 6.0%
7 | 4578 | 5.1%
4 | 4022 | 4.5%
6 | 3325 | 3.7%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8789
Min length: 2

Characters and Unicode

Total characters: 48789
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 잡수익
2nd row: 급여
3rd row: 검침수익
4th row: 제수당
5th row: 복리후생비

Value | Count | Frequency (%)
청소비 | 242 | 2.4%
보험료 | 239 | 2.4%
수선유지비 | 234 | 2.3%
세대전기료 | 229 | 2.3%
도서인쇄비 | 228 | 2.3%
통신비 | 224 | 2.2%
퇴직급여 | 220 | 2.2%
사무용품비 | 219 | 2.2%
교육비 | 212 | 2.1%
제수당 | 206 | 2.1%
Other values (76) | 7747 | 77.5%

Most occurring characters

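Value | Count | Frequency (%)
[missing] | 5358 | 11.0%
[missing] | 3552 | 7.3%
[missing] | 2096 | 4.3%
[missing] | 2014 | 4.1%
[missing] | 1645 | 3.4%
[missing] | 1301 | 2.7%
[missing] | 1072 | 2.2%
[missing] | 820 | 1.7%
[missing] | 818 | 1.7%
[missing] | 764 | 1.6%
Other values (110) | 29349 | 60.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48789 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[missing] | 5358 | 11.0%
[missing] | 3552 | 7.3%
[missing] | 2096 | 4.3%
[missing] | 2014 | 4.1%
[missing] | 1645 | 3.4%
[missing] | 1301 | 2.7%
[missing] | 1072 | 2.2%
[missing] | 820 | 1.7%
[missing] | 818 | 1.7%
[missing] | 764 | 1.6%
Other values (110) | 29349 | 60.2%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48789 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[missing] | 5358 | 11.0%
[missing] | 3552 | 7.3%
[missing] | 2096 | 4.3%
[missing] | 2014 | 4.1%
[missing] | 1645 | 3.4%
[missing] | 1301 | 2.7%
[missing] | 1072 | 2.2%
[missing] | 820 | 1.7%
[missing] | 818 | 1.7%
[missing] | 764 | 1.6%
Other values (110) | 29349 | 60.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48789 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[missing] | 5358 | 11.0%
[missing] | 3552 | 7.3%
[missing] | 2096 | 4.3%
[missing] | 2014 | 4.1%
[missing] | 1645 | 3.4%
[missing] | 1301 | 2.7%
[missing] | 1072 | 2.2%
[missing] | 820 | 1.7%
[missing] | 818 | 1.7%
[missing] | 764 | 1.6%
Other values (110) | 29349 | 60.2%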

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202109 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202109
2nd row: 202109
3rd row: 202109
4th row: 202109
5th row: 202109

Common Values

Value | Count | Frequency (%)
202109 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202109 | 10000 | 100.0%

금액
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 6908
Distinct (%): 69.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3117217.3
Minimum: -7496475
Maximum: 6.0894896 × 10^8
Zeros: 1310
Zeros (%): 13.1%
Negative: 10
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -7496475
5-th percentile: 0
Q1: 51667.5
Median: 300000
Q3: 1354005
95-th percentile: 15009725
Maximum: 6.0894896 × 10^8
Range: 6.1644544 × 10^8
Interquartile range (IQR): 1302337.5
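
Two of these figures follow directly from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A short check of that arithmetic, where the exact maximum (608948963) is taken from the list of largest values later in this section and the variable names are illustrative:

```python
# Quantile statistics for 금액, copied from the report.
minimum = -7_496_475
maximum = 608_948_963   # exact value from the list of largest values
q1 = 51_667.5
q3 = 1_354_005

# Range spans the full extent of the data.
value_range = maximum - minimum

# Interquartile range covers the middle 50% of observations.
iqr = q3 - q1

print(value_range)
print(iqr)
```

The range is roughly 470 times the IQR, which is consistent with the heavy right tail flagged by the Skewed alert.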

Descriptive statistics

Standard deviation: 13230692
Coefficient of variation (CV): 4.2443919
Kurtosis: 696.67748
Mean: 3117217.3
Median Absolute Deviation (MAD): 300000
Skewness: 20.475116
Sum: 3.1172173 × 10^10
Variance: 1.750512 × 10^14
Monotonicity: Not monotonic
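
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, and the sum is the mean times the number of observations. The sketch below checks those identities against the rounded figures printed in the report, so small discrepancies in the last digits are expected:

```python
import math

# Figures copied from the report (rounded as printed).
n = 10_000
mean = 3_117_217.3
std = 13_230_692

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the square of the standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")

# Compare with the reported values, allowing for rounding in the inputs.
assert math.isclose(cv, 4.2443919, rel_tol=1e-6)
assert math.isclose(variance, 1.750512e14, rel_tol=1e-6)
assert math.isclose(total, 3.1172173e10, rel_tol=1e-6)
```

A CV above 4 means the standard deviation is more than four times the mean, another sign that a few very large amounts dominate the distribution.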
Histogram with fixed size bins (bins=50)
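
A minimal sketch of fixed-size binning, assuming equal-width bins spanning the minimum to the maximum. The helper `fixed_width_histogram` is hypothetical and the input is only the ten 금액 values from the first sample table in this report, not the full column:

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum sits on the right edge, so clamp it into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# 금액 values from the first ten sample rows shown in this report.
sample_amounts = [556780, 13319080, 359050, 2114970, 1294700,
                  877, 0, 763600, 165000, 27012]

counts = fixed_width_histogram(sample_amounts, bins=50)
print(counts[:10], counts[-1])
```

Even on this tiny sample, most values crowd into the first few bins while a single large amount occupies the last one, which is the pattern a highly skewed variable produces with equal-width bins.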

Value | Count | Frequency (%)
0 | 1310 | 13.1%
23000 | 83 | 0.8%
100000 | 67 | 0.7%
300000 | 65 | 0.7%
200000 | 61 | 0.6%
400000 | 38 | 0.4%
30000 | 36 | 0.4%
150000 | 35 | 0.4%
600000 | 31 | 0.3%
50000 | 30 | 0.3%
Other values (6898) | 8244 | 82.4%

Value | Count | Frequency (%)
-7496475 | 1 | < 0.1%
-1418720 | 1 | < 0.1%
-1291250 | 1 | < 0.1%
-854050 | 1 | < 0.1%
-738493 | 1 | < 0.1%
-720827 | 1 | < 0.1%
-282460 | 1 | < 0.1%
-644 | 1 | < 0.1%
-216 | 1 | < 0.1%
-4 | 1 | < 0.1%

Value | Count | Frequency (%)
608948963 | 1 | < 0.1%
466696704 | 1 | < 0.1%
332839538 | 1 | < 0.1%
301742206 | 1 | < 0.1%
241122429 | 1 | < 0.1%
216974823 | 1 | < 0.1%
210755870 | 1 | < 0.1%
209773233 | 1 | < 0.1%
178201600 | 1 | < 0.1%
151549674 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.226
금액 | 0.226 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
39452 | 고덕리엔파크2단지 | A13410011 | 잡수익 | 202109 | 556780
38230 | 암사삼성광나루 | A13405002 | 급여 | 202109 | 13319080
10358 | 래미안 웰스트림 | A10027714 | 검침수익 | 202109 | 359050
43026 | 래미안포레 | A13520001 | 제수당 | 202109 | 2114970
2260 | DMC롯데캐슬더퍼스트 | A10024828 | 복리후생비 | 202109 | 1294700
34021 | 왕십리풍림아이원 | A13302206 | 부과차익 | 202109 | 877
37954 | 강변그대가리버뷰 | A13402204 | 피복비 | 202109 | 0
20486 | 녹번역센트레빌 | A12201005 | 입주자대표회의운영비 | 202109 | 763600
37162 | 송정건영 | A13383702 | 도서인쇄비 | 202109 | 165000
31990 | 도봉파크빌2단지 | A13275303 | 이자수익 | 202109 | 27012

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
51437 | 돈암동부센트레빌 | A13681303 | 수선유지비 | 202109 | 5458670
95523 | 개화산동부센트레빌 | A15722102 | 청소비 | 202109 | 2259280
59946 | 가락금호 | A13880407 | 급여 | 202109 | 18672910
60742 | 한양아파트 | A13885102 | 보험료 | 202109 | 526650
39724 | 암사한강현대 | A13471501 | 정화조관리비 | 202109 | 385183
74315 | 구의강변우성 | A14320302 | 고용안정사업비용 | 202109 | 420000
73254 | 해모로 | A14286108 | 피복비 | 202109 | 0
45309 | 개나리SKVIEW | A13579506 | 도서인쇄비 | 202109 | 143000
28112 | 신내4단지 | A13184609 | 피복비 | 202109 | 62700
42175 | 청담자이 | A13510007 | 회계감사비 | 202109 | 133870