Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
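
The average record size follows from the total size and the number of observations. A minimal sketch of that arithmetic, assuming KiB means 1024 bytes and that the reported total is rounded to one decimal place:

```python
# Values copied from the "Dataset statistics" figures above.
total_size_kib = 488.3
n_observations = 10_000

BYTES_PER_KIB = 1024  # assumption: KiB is the binary unit (1024 bytes)

total_bytes = total_size_kib * BYTES_PER_KIB
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal place, the result matches the reported 50.0 B.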

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202203" [Constant]
금액 is highly skewed (γ1 = 25.94465494) [Skewed]
금액 has 2356 (23.6%) zeros [Zeros]
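
The Skewed and Zeros alerts can be illustrated with a short sketch. It uses the population moment coefficient of skewness, which is an assumption: the report does not say which estimator produced γ1, and a bias-corrected estimator would give slightly different numbers. The toy data and the function names are hypothetical.

```python
def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical toy data: mostly small values plus one large outlier.
toy = [0, 0, 0, 1, 50]
print(f"skewness   = {skewness(toy):.3f}")
print(f"zero share = {zero_share(toy):.1%}")

# The zero share reported in the alert: 2356 zeros out of 10000 observations.
reported_zero_share = 2356 / 10_000
print(f"{reported_zero_share:.1%}")
```

A single large positive outlier is enough to push the skewness well above zero, which is the pattern the alert flags for 금액.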

Reproduction

Analysis started: 2024-05-11 05:58:12.982950
Analysis finished: 2024-05-11 05:58:14.209721
Duration: 1.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
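
The duration is the difference between the two timestamps. A small sketch of that check using only the standard library (the format string is an assumption about how the timestamps are laid out):

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 05:58:12.982950", FMT)
finished = datetime.strptime("2024-05-11 05:58:14.209721", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```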

Variables

아파트명
Text

Distinct: 2233
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.32
Min length: 2

Characters and Unicode

Total characters: 73200
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
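
As a rough illustration of these character properties, Python's standard `unicodedata` module exposes the General Category of a code point. This is only a sketch: the report does not say how it derives its categories, scripts, and blocks, and `unicodedata` does not expose script or block at all. The Roman numeral is a hypothetical stand-in for a Letter Number character.

```python
import unicodedata

# Sample characters of the kinds counted in this report.
# "Ⅱ" (U+2161) is a hypothetical example of a Letter Number.
samples = ["아", "S", "e", "1", " ", "(", ")", "-", ",", "Ⅱ"]

for ch in samples:
    # category() returns the two-letter General Category code,
    # e.g. "Lo" = Other Letter, "Nd" = Decimal Number.
    print(repr(ch), unicodedata.category(ch), unicodedata.name(ch))

categories = {ch: unicodedata.category(ch) for ch in samples}
```

The two-letter codes correspond to the category names used in the tables: `Lo` Other Letter, `Lu` Uppercase Letter, `Ll` Lowercase Letter, `Nd` Decimal Number, `Zs` Space Separator, `Ps`/`Pe` Open/Close Punctuation, `Pd` Dash Punctuation, `Po` Other Punctuation, `Nl` Letter Number.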

Unique

Unique: 136
Unique (%): 1.4%

Sample

1st row: 하월곡동신
2nd row: 홍제성원아파트
3rd row: 힐스테이트서초젠트리스
4th row: 월계6-2초안
5th row: 푸른마을아파트

| Value | Count | Frequency (%) |
| 아파트 | 158 | 1.5% |
| 래미안 | 32 | 0.3% |
| 아이파크 | 22 | 0.2% |
| 디에이치 | 19 | 0.2% |
| sk뷰 | 17 | 0.2% |
| 경남아너스빌 | 16 | 0.1% |
| e편한세상 | 16 | 0.1% |
| 해모로 | 14 | 0.1% |
| 고덕 | 14 | 0.1% |
| 도화현대1차아파트 | 14 | 0.1% |
| Other values (2313) | 10396 | 97.0% |
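
The counts in this table add up to 10718 (322 for the listed values plus 10396 for the rest), more than the 10000 observations, which suggests that words rather than whole values are being counted. A minimal sketch of such a tally, assuming lowercase folding and whitespace tokenization (both are assumptions inferred from the table, and the apartment names below are hypothetical):

```python
from collections import Counter

# Hypothetical apartment names, for illustration only.
names = ["한강 아파트", "래미안 한강", "SK뷰 아파트", "아파트"]

# Lowercase each value and split it on whitespace into words.
tokens = [tok for name in names for tok in name.lower().split()]
counts = Counter(tokens)
total = sum(counts.values())

# Print the most frequent words with their share of all words.
for value, count in counts.most_common(3):
    print(f"{value}\t{count}\t{count / total:.1%}")
```

Because the denominator is the number of words, the percentages are shares of all tokens, not of rows.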

Most occurring characters

| Value | Count | Frequency (%) |
|  | 2510 | 3.4% |
|  | 2422 | 3.3% |
|  | 2230 | 3.0% |
|  | 1873 | 2.6% |
|  | 1750 | 2.4% |
|  | 1736 | 2.4% |
|  | 1498 | 2.0% |
|  | 1470 | 2.0% |
|  | 1450 | 2.0% |
|  | 1354 | 1.8% |
| Other values (425) | 54907 | 75.0% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 67053 | 91.6% |
| Decimal Number | 3655 | 5.0% |
| Uppercase Letter | 835 | 1.1% |
| Space Separator | 792 | 1.1% |
| Lowercase Letter | 354 | 0.5% |
| Open Punctuation | 130 | 0.2% |
| Close Punctuation | 130 | 0.2% |
| Dash Punctuation | 130 | 0.2% |
| Other Punctuation | 114 | 0.2% |
| Letter Number | 7 | < 0.1% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 2510 | 3.7% |
|  | 2422 | 3.6% |
|  | 2230 | 3.3% |
|  | 1873 | 2.8% |
|  | 1750 | 2.6% |
|  | 1736 | 2.6% |
|  | 1498 | 2.2% |
|  | 1470 | 2.2% |
|  | 1450 | 2.2% |
|  | 1354 | 2.0% |
| Other values (380) | 48760 | 72.7% |

Uppercase Letter
| Value | Count | Frequency (%) |
| S | 140 | 16.8% |
| C | 116 | 13.9% |
| K | 103 | 12.3% |
| M | 82 | 9.8% |
| D | 82 | 9.8% |
| L | 53 | 6.3% |
| I | 44 | 5.3% |
| E | 43 | 5.1% |
| H | 40 | 4.8% |
| G | 30 | 3.6% |
| Other values (7) | 102 | 12.2% |

Lowercase Letter
| Value | Count | Frequency (%) |
| e | 199 | 56.2% |
| l | 39 | 11.0% |
| i | 28 | 7.9% |
| v | 21 | 5.9% |
| s | 18 | 5.1% |
| k | 16 | 4.5% |
| h | 11 | 3.1% |
| w | 8 | 2.3% |
| c | 8 | 2.3% |
| g | 3 | 0.8% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1112 | 30.4% |
| 2 | 1039 | 28.4% |
| 3 | 483 | 13.2% |
| 4 | 262 | 7.2% |
| 5 | 236 | 6.5% |
| 6 | 158 | 4.3% |
| 7 | 115 | 3.1% |
| 9 | 88 | 2.4% |
| 0 | 83 | 2.3% |
| 8 | 79 | 2.2% |

Other Punctuation
| Value | Count | Frequency (%) |
| , | 101 | 88.6% |
| . | 13 | 11.4% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 792 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 130 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 130 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 130 | 100.0% |

Letter Number
| Value | Count | Frequency (%) |
|  | 7 | 100.0% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 67053 | 91.6% |
| Common | 4951 | 6.8% |
| Latin | 1196 | 1.6% |

Most frequent character per script

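Hangul
| Value | Count | Frequency (%) |
|  | 2510 | 3.7% |
|  | 2422 | 3.6% |
|  | 2230 | 3.3% |
|  | 1873 | 2.8% |
|  | 1750 | 2.6% |
|  | 1736 | 2.6% |
|  | 1498 | 2.2% |
|  | 1470 | 2.2% |
|  | 1450 | 2.2% |
|  | 1354 | 2.0% |
| Other values (380) | 48760 | 72.7% |

Latin
| Value | Count | Frequency (%) |
| e | 199 | 16.6% |
| S | 140 | 11.7% |
| C | 116 | 9.7% |
| K | 103 | 8.6% |
| M | 82 | 6.9% |
| D | 82 | 6.9% |
| L | 53 | 4.4% |
| I | 44 | 3.7% |
| E | 43 | 3.6% |
| H | 40 | 3.3% |
| Other values (19) | 294 | 24.6% |

Common
| Value | Count | Frequency (%) |
| 1 | 1112 | 22.5% |
| 2 | 1039 | 21.0% |
| (space) | 792 | 16.0% |
| 3 | 483 | 9.8% |
| 4 | 262 | 5.3% |
| 5 | 236 | 4.8% |
| 6 | 158 | 3.2% |
| ( | 130 | 2.6% |
| ) | 130 | 2.6% |
| - | 130 | 2.6% |
| Other values (6) | 479 | 9.7% |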

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 67053 | 91.6% |
| ASCII | 6140 | 8.4% |
| Number Forms | 7 | < 0.1% |

Most frequent character per block

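Hangul
| Value | Count | Frequency (%) |
|  | 2510 | 3.7% |
|  | 2422 | 3.6% |
|  | 2230 | 3.3% |
|  | 1873 | 2.8% |
|  | 1750 | 2.6% |
|  | 1736 | 2.6% |
|  | 1498 | 2.2% |
|  | 1470 | 2.2% |
|  | 1450 | 2.2% |
|  | 1354 | 2.0% |
| Other values (380) | 48760 | 72.7% |

ASCII
| Value | Count | Frequency (%) |
| 1 | 1112 | 18.1% |
| 2 | 1039 | 16.9% |
| (space) | 792 | 12.9% |
| 3 | 483 | 7.9% |
| 4 | 262 | 4.3% |
| 5 | 236 | 3.8% |
| e | 199 | 3.2% |
| 6 | 158 | 2.6% |
| S | 140 | 2.3% |
| ( | 130 | 2.1% |
| Other values (34) | 1589 | 25.9% |

Number Forms
| Value | Count | Frequency (%) |
|  | 7 | 100.0% |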

아파트코드
Text

Distinct: 2239
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 138
Unique (%): 1.4%

Sample

1st row: A13613005
2nd row: A12009201
3rd row: A10028046
4th row: A13905208
5th row: A13594203

| Value | Count | Frequency (%) |
| a12181406 | 14 | 0.1% |
| a12013003 | 12 | 0.1% |
| a13080401 | 12 | 0.1% |
| a13471501 | 12 | 0.1% |
| a13204301 | 11 | 0.1% |
| a13822002 | 11 | 0.1% |
| a10024216 | 11 | 0.1% |
| a15210211 | 11 | 0.1% |
| a13611007 | 11 | 0.1% |
| a13302204 | 11 | 0.1% |
| Other values (2229) | 9884 | 98.8% |

Most occurring characters

| Value | Count | Frequency (%) |
| 0 | 18454 | 20.5% |
| 1 | 17649 | 19.6% |
| A | 9996 | 11.1% |
| 3 | 8863 | 9.8% |
| 2 | 8300 | 9.2% |
| 5 | 6144 | 6.8% |
| 8 | 5568 | 6.2% |
| 7 | 4596 | 5.1% |
| 4 | 4072 | 4.5% |
| 6 | 3345 | 3.7% |
| Other values (2) | 3013 | 3.3% |

Most occurring categories

| Value | Count | Frequency (%) |
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

Decimal Number
| Value | Count | Frequency (%) |
| 0 | 18454 | 23.1% |
| 1 | 17649 | 22.1% |
| 3 | 8863 | 11.1% |
| 2 | 8300 | 10.4% |
| 5 | 6144 | 7.7% |
| 8 | 5568 | 7.0% |
| 7 | 4596 | 5.7% |
| 4 | 4072 | 5.1% |
| 6 | 3345 | 4.2% |
| 9 | 3009 | 3.8% |

Uppercase Letter
| Value | Count | Frequency (%) |
| A | 9996 | > 99.9% |
| B | 4 | < 0.1% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

Common
| Value | Count | Frequency (%) |
| 0 | 18454 | 23.1% |
| 1 | 17649 | 22.1% |
| 3 | 8863 | 11.1% |
| 2 | 8300 | 10.4% |
| 5 | 6144 | 7.7% |
| 8 | 5568 | 7.0% |
| 7 | 4596 | 5.7% |
| 4 | 4072 | 5.1% |
| 6 | 3345 | 4.2% |
| 9 | 3009 | 3.8% |

Latin
| Value | Count | Frequency (%) |
| A | 9996 | > 99.9% |
| B | 4 | < 0.1% |

Most occurring blocks

| Value | Count | Frequency (%) |
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| 0 | 18454 | 20.5% |
| 1 | 17649 | 19.6% |
| A | 9996 | 11.1% |
| 3 | 8863 | 9.8% |
| 2 | 8300 | 9.2% |
| 5 | 6144 | 6.8% |
| 8 | 5568 | 6.2% |
| 7 | 4596 | 5.1% |
| 4 | 4072 | 4.5% |
| 6 | 3345 | 3.7% |
| Other values (2) | 3013 | 3.3% |

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 10
Mean length: 5.9689
Min length: 2

Characters and Unicode

Total characters: 59689
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 연차수당충당부채
2nd row: 장기수선충당예금
3rd row: 미수수익
4th row: 장기수선충당부채
5th row: 가수금

| Value | Count | Frequency (%) |
| 관리비미수금 | 330 | 3.3% |
| 퇴직급여충당부채 | 319 | 3.2% |
| 예금 | 315 | 3.1% |
| 장기수선충당부채 | 311 | 3.1% |
| 연차수당충당부채 | 308 | 3.1% |
| 당기순이익 | 308 | 3.1% |
| 장기수선충당예금 | 305 | 3.0% |
| 공동주택적립금 | 302 | 3.0% |
| 선급비용 | 300 | 3.0% |
| 미처분이익잉여금 | 297 | 3.0% |
| Other values (67) | 6905 | 69.0% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 4640 | 7.8% |
|  | 3869 | 6.5% |
|  | 3098 | 5.2% |
|  | 3044 | 5.1% |
|  | 2953 | 4.9% |
|  | 2950 | 4.9% |
|  | 2660 | 4.5% |
|  | 2556 | 4.3% |
|  | 1876 | 3.1% |
|  | 1689 | 2.8% |
| Other values (97) | 30354 | 50.9% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 59689 | 100.0% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 4640 | 7.8% |
|  | 3869 | 6.5% |
|  | 3098 | 5.2% |
|  | 3044 | 5.1% |
|  | 2953 | 4.9% |
|  | 2950 | 4.9% |
|  | 2660 | 4.5% |
|  | 2556 | 4.3% |
|  | 1876 | 3.1% |
|  | 1689 | 2.8% |
| Other values (97) | 30354 | 50.9% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 59689 | 100.0% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 4640 | 7.8% |
|  | 3869 | 6.5% |
|  | 3098 | 5.2% |
|  | 3044 | 5.1% |
|  | 2953 | 4.9% |
|  | 2950 | 4.9% |
|  | 2660 | 4.5% |
|  | 2556 | 4.3% |
|  | 1876 | 3.1% |
|  | 1689 | 2.8% |
| Other values (97) | 30354 | 50.9% |

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 59689 | 100.0% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 4640 | 7.8% |
|  | 3869 | 6.5% |
|  | 3098 | 5.2% |
|  | 3044 | 5.1% |
|  | 2953 | 4.9% |
|  | 2950 | 4.9% |
|  | 2660 | 4.5% |
|  | 2556 | 4.3% |
|  | 1876 | 3.1% |
|  | 1689 | 2.8% |
| Other values (97) | 30354 | 50.9% |

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

202203: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202203
2nd row: 202203
3rd row: 202203
4th row: 202203
5th row: 202203

Common Values

| Value | Count | Frequency (%) |
| 202203 | 10000 | 100.0% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| 202203 | 10000 | 100.0% |

금액
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 7330
Distinct (%): 73.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79936787
Minimum: -6.0442056 × 10^8
Maximum: 2.2863029 × 10^10
Zeros: 2356
Zeros (%): 23.6%
Negative: 329
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -6.0442056 × 10^8
5-th percentile: 0
Q1: 0
Median: 2815730
Q3: 31867436
95-th percentile: 3.765935 × 10^8
Maximum: 2.2863029 × 10^10
Range: 2.346745 × 10^10
Interquartile range (IQR): 31867436

Descriptive statistics

Standard deviation: 3.9812528 × 10^8
Coefficient of variation (CV): 4.9805014
Kurtosis: 1191.0794
Mean: 79936787
Median Absolute Deviation (MAD): 2815730
Skewness: 25.944655
Sum: 7.9936787 × 10^11
Variance: 1.5850374 × 10^17
Monotonicity: Not monotonic
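
Several of these figures are derived from one another, so they can be cross-checked. A small sketch of those relationships, using only the numbers reported above (the variable names are illustrative):

```python
# Figures copied from the 금액 statistics above.
n = 10_000
mean = 79_936_787
std = 3.9812528e8
minimum = -6.0442056e8
maximum = 2.2863029e10
q1, q3 = 0, 31_867_436

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum of all values
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
print(f"Range    = {value_range:.6e}")
print(f"IQR      = {iqr}")
```

Each derived value agrees with the reported one up to the rounding of the inputs.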
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 2356 | 23.6% |
| 500000 | 26 | 0.3% |
| 250000 | 17 | 0.2% |
| 484000 | 13 | 0.1% |
| 300000 | 13 | 0.1% |
| 242000 | 12 | 0.1% |
| 30000000 | 11 | 0.1% |
| 200000 | 11 | 0.1% |
| 3000000 | 10 | 0.1% |
| 1000000 | 10 | 0.1% |
| Other values (7320) | 7521 | 75.2% |

Minimum 10 values
| Value | Count | Frequency (%) |
| -604420565 | 1 | < 0.1% |
| -302175540 | 1 | < 0.1% |
| -279779260 | 1 | < 0.1% |
| -271719290 | 1 | < 0.1% |
| -230922000 | 1 | < 0.1% |
| -197026520 | 1 | < 0.1% |
| -136451812 | 1 | < 0.1% |
| -123413690 | 1 | < 0.1% |
| -120813335 | 1 | < 0.1% |
| -116368256 | 1 | < 0.1% |

Maximum 10 values
| Value | Count | Frequency (%) |
| 22863029101 | 1 | < 0.1% |
| 10667455557 | 1 | < 0.1% |
| 7821550005 | 1 | < 0.1% |
| 7491393593 | 1 | < 0.1% |
| 7257482358 | 1 | < 0.1% |
| 6545163308 | 1 | < 0.1% |
| 5724597640 | 1 | < 0.1% |
| 5255507426 | 1 | < 0.1% |
| 5133279263 | 1 | < 0.1% |
| 4962590333 | 1 | < 0.1% |

Interactions


Correlations

|  | 비용명 | 금액 |
| 비용명 | 1.000 | 0.232 |
| 금액 | 0.232 | 1.000 |

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 34489 | 하월곡동신 | A13613005 | 연차수당충당부채 | 202203 | 6236980 |
| 9527 | 홍제성원아파트 | A12009201 | 장기수선충당예금 | 202203 | 150172096 |
| 7662 | 힐스테이트서초젠트리스 | A10028046 | 미수수익 | 202203 | 659930 |
| 42943 | 월계6-2초안 | A13905208 | 장기수선충당부채 | 202203 | 380190529 |
| 32650 | 푸른마을아파트 | A13594203 | 가수금 | 202203 | 4070600 |
| 61250 | 시흥베르빌 | A15303102 | 수선유지비충당부채 | 202203 | 3416960 |
| 54394 | 양평현대2차 | A15010305 | 관리비예치금 | 202203 | 38688000 |
| 42585 | 현대리버빌2차아파트 | A13887403 | 경비비충당부채 | 202203 | 8761155 |
| 25931 | 옥수중앙하이츠 | A13383801 | 예수금 | 202203 | 1418800 |
| 46635 | 상계대림e-편한세상 | A13983803 | 가수금 | 202203 | 918700 |

|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 42254 | 방이코오롱 | A13883602 | 예수금 | 202203 | 2785027 |
| 28794 | 아크로힐스논현 | A13501006 | 주차장충당부채 | 202203 | 0 |
| 66836 | 마곡수명산파크5단지 | A15728007 | 미처분이익잉여금 | 202203 | 32401122 |
| 36394 | 반포삼호가든맨션5차 | A13704101 | 일반관리비충당부채 | 202203 | 0 |
| 23240 | 창동동아 | A13290003 | 미지급금 | 202203 | 164008310 |
| 65412 | 등촌주공2단지 | A15703304 | 가수금 | 202203 | 1075640 |
| 59158 | 천왕이펜하우스1단지 | A15213006 | 전신전화가입권 | 202203 | 0 |
| 61156 | 서울가든빌라 | A15289508 | 미처분이익잉여금 | 202203 | 0 |
| 39860 | 오금삼성 | A13813003 | 관리비예치금 | 202203 | 19212000 |
| 33307 | 돈암삼성임대 | A13606106 | 기타충당부채 | 202203 | 0 |