Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
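
The average record size can be checked from the two figures above it. The following is a minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and that the average is simply the total memory footprint divided by the number of observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```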

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has a constant value ("202201" in all 10000 rows) [Constant]
금액 is highly skewed (γ1 = 28.94624167) [Skewed]
금액 has 2263 (22.6%) zeros [Zeros]
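
The two 금액 alerts rest on simple column statistics. The sketch below shows one way such figures could be computed; it uses the population skewness formula g1 = m3 / m2^1.5 without a small-sample correction, which is an assumption and may differ slightly from the estimator the profiling tool applies. The helper names and the toy data are hypothetical.

```python
def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Share of zeros reported in the alert: 2263 of 10000 rows.
print(f"{2263 / 10000:.1%}")  # 22.6%

# Hypothetical toy data: mostly small amounts plus one very large one,
# which produces a long right tail and therefore positive skewness.
toy = [0, 0, 1, 2, 3, 1000]
print(zero_share(toy), skewness(toy))
```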

Reproduction

Analysis started: 2024-05-11 05:56:44.764186
Analysis finished: 2024-05-11 05:56:46.080335
Duration: 1.32 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2210
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 19
Mean length: 7.3798
Min length: 2

Characters and Unicode

Total characters: 73798
Distinct characters: 437
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
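
As an illustration of those character properties, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The two-letter codes it returns correspond to the category names used in this report (`Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator). The function name is hypothetical, and the sketch covers categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


# A sample value taken from the rows listed for this variable.
sample = "금호어울림1차"
print(category_counts(sample))
```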

Unique

Unique: 113
Unique (%): 1.1%
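
The report distinguishes "Distinct" (how many different values occur) from "Unique" (how many values occur exactly once); the constant 년월일 column, with 1 distinct and 0 unique values, shows the difference. A minimal sketch of both counts, with a hypothetical helper name and a hypothetical toy column:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


rows = ["현금", "현금", "예금", "비품"]  # hypothetical column
print(distinct_and_unique(rows))  # (3, 2)
```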

Sample

1st row: 청담대림
2nd row: 상봉듀오트리스
3rd row: 금호어울림1차
4th row: 강남한신휴플러스 8단지
5th row: 롯데캐슬

Value | Count | Frequency (%)
아파트 | 166 | 1.5%
래미안 | 27 | 0.3%
e편한세상 | 25 | 0.2%
푸르지오 | 20 | 0.2%
은평뉴타운상림마을6단지 | 19 | 0.2%
아이파크 | 19 | 0.2%
경남아너스빌 | 18 | 0.2%
해모로 | 17 | 0.2%
래미안밤섬리베뉴 | 15 | 0.1%
북한산 | 15 | 0.1%
Other values (2292) | 10417 | 96.8%

Most occurring characters

Value | Count | Frequency (%)
 | 2585 | 3.5%
 | 2482 | 3.4%
 | 2326 | 3.2%
 | 1866 | 2.5%
 | 1772 | 2.4%
 | 1685 | 2.3%
 | 1559 | 2.1%
 | 1476 | 2.0%
 | 1450 | 2.0%
 | 1413 | 1.9%
Other values (427) | 55184 | 74.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67504 | 91.5%
Decimal Number | 3789 | 5.1%
Space Separator | 830 | 1.1%
Uppercase Letter | 774 | 1.0%
Lowercase Letter | 348 | 0.5%
Close Punctuation | 147 | 0.2%
Open Punctuation | 147 | 0.2%
Dash Punctuation | 142 | 0.2%
Other Punctuation | 113 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2585 | 3.8%
 | 2482 | 3.7%
 | 2326 | 3.4%
 | 1866 | 2.8%
 | 1772 | 2.6%
 | 1685 | 2.5%
 | 1559 | 2.3%
 | 1476 | 2.2%
 | 1450 | 2.1%
 | 1413 | 2.1%
Other values (382) | 48890 | 72.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 125 | 16.1%
C | 114 | 14.7%
K | 92 | 11.9%
D | 72 | 9.3%
M | 72 | 9.3%
L | 68 | 8.8%
H | 60 | 7.8%
E | 35 | 4.5%
I | 32 | 4.1%
A | 22 | 2.8%
Other values (7) | 82 | 10.6%

Lowercase Letter
Value | Count | Frequency (%)
e | 196 | 56.3%
i | 30 | 8.6%
l | 30 | 8.6%
v | 20 | 5.7%
c | 18 | 5.2%
k | 16 | 4.6%
s | 12 | 3.4%
w | 9 | 2.6%
g | 6 | 1.7%
a | 6 | 1.7%

Decimal Number
Value | Count | Frequency (%)
1 | 1165 | 30.7%
2 | 1076 | 28.4%
3 | 504 | 13.3%
4 | 267 | 7.0%
5 | 222 | 5.9%
6 | 180 | 4.8%
7 | 118 | 3.1%
9 | 96 | 2.5%
8 | 92 | 2.4%
0 | 69 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 88 | 77.9%
. | 25 | 22.1%

Space Separator
Value | Count | Frequency (%)
(space) | 830 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 147 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 147 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 142 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67504 | 91.5%
Common | 5168 | 7.0%
Latin | 1126 | 1.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2585 | 3.8%
 | 2482 | 3.7%
 | 2326 | 3.4%
 | 1866 | 2.8%
 | 1772 | 2.6%
 | 1685 | 2.5%
 | 1559 | 2.3%
 | 1476 | 2.2%
 | 1450 | 2.1%
 | 1413 | 2.1%
Other values (382) | 48890 | 72.4%

Latin
Value | Count | Frequency (%)
e | 196 | 17.4%
S | 125 | 11.1%
C | 114 | 10.1%
K | 92 | 8.2%
D | 72 | 6.4%
M | 72 | 6.4%
L | 68 | 6.0%
H | 60 | 5.3%
E | 35 | 3.1%
I | 32 | 2.8%
Other values (19) | 260 | 23.1%

Common
Value | Count | Frequency (%)
1 | 1165 | 22.5%
2 | 1076 | 20.8%
(space) | 830 | 16.1%
3 | 504 | 9.8%
4 | 267 | 5.2%
5 | 222 | 4.3%
6 | 180 | 3.5%
) | 147 | 2.8%
( | 147 | 2.8%
- | 142 | 2.7%
Other values (6) | 488 | 9.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67504 | 91.5%
ASCII | 6290 | 8.5%
Number Forms | 4 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2585 | 3.8%
 | 2482 | 3.7%
 | 2326 | 3.4%
 | 1866 | 2.8%
 | 1772 | 2.6%
 | 1685 | 2.5%
 | 1559 | 2.3%
 | 1476 | 2.2%
 | 1450 | 2.1%
 | 1413 | 2.1%
Other values (382) | 48890 | 72.4%

ASCII
Value | Count | Frequency (%)
1 | 1165 | 18.5%
2 | 1076 | 17.1%
(space) | 830 | 13.2%
3 | 504 | 8.0%
4 | 267 | 4.2%
5 | 222 | 3.5%
e | 196 | 3.1%
6 | 180 | 2.9%
) | 147 | 2.3%
( | 147 | 2.3%
Other values (34) | 1556 | 24.7%

Number Forms
Value | Count | Frequency (%)
 | 4 | 100.0%

아파트코드
Text

Distinct: 2217
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 114
Unique (%): 1.1%

Sample

1st row: A13510006
2nd row: A10027670
3rd row: A13812003
4th row: A10027909
5th row: A15807205

Value | Count | Frequency (%)
a12010202 | 14 | 0.1%
a13983811 | 13 | 0.1%
a12185303 | 12 | 0.1%
a15807601 | 12 | 0.1%
a15209305 | 12 | 0.1%
a12220001 | 11 | 0.1%
a13776301 | 11 | 0.1%
a15602001 | 11 | 0.1%
a12085303 | 11 | 0.1%
a11077101 | 11 | 0.1%
Other values (2207) | 9882 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18333 | 20.4%
1 | 17569 | 19.5%
A | 9996 | 11.1%
3 | 8761 | 9.7%
2 | 8275 | 9.2%
5 | 6215 | 6.9%
8 | 5717 | 6.4%
7 | 4730 | 5.3%
4 | 4059 | 4.5%
6 | 3338 | 3.7%
Other values (2) | 3007 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18333 | 22.9%
1 | 17569 | 22.0%
3 | 8761 | 11.0%
2 | 8275 | 10.3%
5 | 6215 | 7.8%
8 | 5717 | 7.1%
7 | 4730 | 5.9%
4 | 4059 | 5.1%
6 | 3338 | 4.2%
9 | 3003 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18333 | 22.9%
1 | 17569 | 22.0%
3 | 8761 | 11.0%
2 | 8275 | 10.3%
5 | 6215 | 7.8%
8 | 5717 | 7.1%
7 | 4730 | 5.9%
4 | 4059 | 5.1%
6 | 3338 | 4.2%
9 | 3003 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9996 | > 99.9%
B | 4 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18333 | 20.4%
1 | 17569 | 19.5%
A | 9996 | 11.1%
3 | 8761 | 9.7%
2 | 8275 | 9.2%
5 | 6215 | 6.9%
8 | 5717 | 6.4%
7 | 4730 | 5.3%
4 | 4059 | 4.5%
6 | 3338 | 3.7%
Other values (2) | 3007 | 3.3%

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 10
Mean length: 6.0264
Min length: 2

Characters and Unicode

Total characters: 60264
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 주차장충당부채
2nd row: 관리비예치금
3rd row: 현금
4th row: 현금
5th row: 수선유지비충당부채

Value | Count | Frequency (%)
퇴직급여충당부채 | 333 | 3.3%
미처분이익잉여금 | 320 | 3.2%
장기수선충당부채 | 317 | 3.2%
예수금 | 313 | 3.1%
연차수당충당부채 | 307 | 3.1%
예금 | 306 | 3.1%
비품 | 305 | 3.0%
선급비용 | 295 | 2.9%
공동주택적립금 | 295 | 2.9%
당기순이익 | 292 | 2.9%
Other values (67) | 6917 | 69.2%

Most occurring characters

Value | Count | Frequency (%)
 | 4558 | 7.6%
 | 3959 | 6.6%
 | 3148 | 5.2%
 | 3145 | 5.2%
 | 3059 | 5.1%
 | 3035 | 5.0%
 | 2737 | 4.5%
 | 2457 | 4.1%
 | 1875 | 3.1%
 | 1773 | 2.9%
Other values (97) | 30518 | 50.6%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 60264 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 4558 | 7.6%
 | 3959 | 6.6%
 | 3148 | 5.2%
 | 3145 | 5.2%
 | 3059 | 5.1%
 | 3035 | 5.0%
 | 2737 | 4.5%
 | 2457 | 4.1%
 | 1875 | 3.1%
 | 1773 | 2.9%
Other values (97) | 30518 | 50.6%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 60264 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 4558 | 7.6%
 | 3959 | 6.6%
 | 3148 | 5.2%
 | 3145 | 5.2%
 | 3059 | 5.1%
 | 3035 | 5.0%
 | 2737 | 4.5%
 | 2457 | 4.1%
 | 1875 | 3.1%
 | 1773 | 2.9%
Other values (97) | 30518 | 50.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 60264 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 4558 | 7.6%
 | 3959 | 6.6%
 | 3148 | 5.2%
 | 3145 | 5.2%
 | 3059 | 5.1%
 | 3035 | 5.0%
 | 2737 | 4.5%
 | 2457 | 4.1%
 | 1875 | 3.1%
 | 1773 | 2.9%
Other values (97) | 30518 | 50.6%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202201 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202201
2nd row: 202201
3rd row: 202201
4th row: 202201
5th row: 202201

Common Values

Value | Count | Frequency (%)
202201 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202201 | 10000 | 100.0%

금액
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 7419
Distinct (%): 74.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79401132
Minimum: -2.8881651 × 10^8
Maximum: 2.2674316 × 10^10
Zeros: 2263
Zeros (%): 22.6%
Negative: 366
Negative (%): 3.7%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2.8881651 × 10^8
5-th percentile: 0
Q1: 0
Median: 3222414
Q3: 39620065
95-th percentile: 3.751509 × 10^8
Maximum: 2.2674316 × 10^10
Range: 2.2963133 × 10^10
Interquartile range (IQR): 39620065
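
The range and interquartile range are derived from the other quantiles: range = maximum - minimum and IQR = Q3 - Q1. A short sketch of that arithmetic, using the exact extreme values listed for 금액 later in this report (the variable names are illustrative):

```python
# Exact extremes as listed among the smallest and largest 금액 values.
minimum = -288_816_510
maximum = 22_674_316_027
q1, q3 = 0, 39_620_065

value_range = maximum - minimum  # maximum minus a negative minimum
iqr = q3 - q1

print(f"{value_range:.7e}", iqr)
```

Because Q1 is 0, the IQR equals Q3 for this variable.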

Descriptive statistics

Standard deviation: 3.7010666 × 10^8
Coefficient of variation (CV): 4.6612265
Kurtosis: 1508.8189
Mean: 79401132
Median Absolute Deviation (MAD): 3222414
Skewness: 28.946242
Sum: 7.9401132 × 10^11
Variance: 1.3697894 × 10^17
Monotonicity: Not monotonic
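
Several of these figures are related to one another: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum is the mean times the number of observations. A minimal consistency check, assuming the rounded values printed above (so the results agree only to the displayed precision):

```python
# Rounded figures copied from the report.
std = 3.7010666e8
mean = 79_401_132
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all 금액 values, up to rounding of the mean

print(f"CV={cv:.7f}  variance={variance:.7e}  sum={total:.7e}")
```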
Histogram with fixed size bins (bins=50)
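
For reference, fixed-size binning splits the span between the minimum and maximum into equally wide intervals. The sketch below is one plain-Python way to do that; it is an illustration under that assumption, not the plotting code used by the report, and the function name is hypothetical. With the range reported for 금액 and 50 bins, each bin would be roughly 4.6 × 10^8 wide, so most observations fall into the first few bins.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equally wide intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values are identical: put everything into the first bin.
        counts[0] = len(values)
        return counts, width
    for v in values:
        # The maximum lands on the right edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width


counts, width = fixed_width_histogram([0, 1, 2, 3, 4, 10], bins=5)
print(counts, width)  # [2, 2, 1, 0, 1] 2.0
```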

Value | Count | Frequency (%)
0 | 2263 | 22.6%
500000 | 24 | 0.2%
300000 | 20 | 0.2%
250000 | 18 | 0.2%
242000 | 16 | 0.2%
484000 | 10 | 0.1%
55000 | 10 | 0.1%
2000000 | 9 | 0.1%
30000000 | 9 | 0.1%
200000 | 9 | 0.1%
Other values (7409) | 7612 | 76.1%

Value | Count | Frequency (%)
-288816510 | 1 | < 0.1%
-261437990 | 1 | < 0.1%
-258613552 | 1 | < 0.1%
-247223594 | 1 | < 0.1%
-195908810 | 1 | < 0.1%
-190422700 | 1 | < 0.1%
-139043880 | 1 | < 0.1%
-130798052 | 1 | < 0.1%
-113645208 | 1 | < 0.1%
-104221160 | 1 | < 0.1%

Value | Count | Frequency (%)
22674316027 | 1 | < 0.1%
11665415223 | 1 | < 0.1%
5141986238 | 1 | < 0.1%
4618510216 | 1 | < 0.1%
4187121542 | 1 | < 0.1%
4090240000 | 1 | < 0.1%
4036757193 | 1 | < 0.1%
3868317160 | 1 | < 0.1%
3811974595 | 1 | < 0.1%
3626773009 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.171
금액 | 0.171 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
28866 | 청담대림 | A13510006 | 주차장충당부채 | 202201 | 0
6433 | 상봉듀오트리스 | A10027670 | 관리비예치금 | 202201 | 123093600
38818 | 금호어울림1차 | A13812003 | 현금 | 202201 | 6070
7043 | 강남한신휴플러스 8단지 | A10027909 | 현금 | 202201 | 256031
68128 | 롯데캐슬 | A15807205 | 수선유지비충당부채 | 202201 | 0
14048 | 갈현현대아파트 | A12205004 | 현금 | 202201 | 377840
69419 | 목동현대1차 | A15882008 | 현금 | 202201 | 137600
67150 | 화곡중앙하이츠 | A15788203 | 선수관리비 | 202201 | 115940000
4349 | 래미안로이파크아파트 | A10026299 | 저장품 | 202201 | 667500
6380 | 상도2차 두산위브트레지움 아파트 | A10027633 | 선급비용 | 202201 | 33648950

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
50364 | 자양더샵스타시티 | A14319012 | 예금 | 202201 | 1777651521
1608 | 로데오현대아파트 | A10024814 | 연차수당충당부채 | 202201 | 3749660
47567 | 대우월드마크용산 | A14001101 | 예금 | 202201 | 109593920
6792 | DMC파크뷰자이아파트 | A10027817 | 기타유동부채 | 202201 | 43237126
29653 | 수서삼성 | A13522004 | 세대배부용비품 | 202201 | 1027000
62306 | 보라매삼성쉐르빌 | A15672002 | 당기순이익 | 202201 | 1272612
45674 | 수락산벨리체아파트 | A13983811 | 기타충당부채 | 202201 | 28180973
14329 | 은평뉴타운상림마을6단지 제1아파트(8단지 푸르지오) | A12220001 | 예금 | 202201 | 138607389
53065 | 양평동삼천리아파트 | A15010303 | 수선유지비충당부채 | 202201 | 1203400
5789 | 파크하비오푸르지오아파트 | A10027346 | 임대보증금 | 202201 | 50000000