Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
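
The derived figures in this block follow from simple arithmetic on the raw counts. The sketch below assumes that 1 KiB is 1024 bytes and that the missing-cell percentage is taken over variables times observations; the function names are illustrative and do not come from the profiling tool.

```python
KIB = 1024  # bytes per KiB (assumed binary unit)

def average_record_size(total_kib, n_rows):
    """Average bytes per row, given total memory in KiB."""
    return total_kib * KIB / n_rows

def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

n_variables, n_observations = 5, 10000
avg_bytes = average_record_size(488.3, n_observations)
missing_pct = percentage(0, n_variables * n_observations)
print(f"{avg_bytes:.1f} B per record, {missing_pct:.1f}% missing cells")
```

With the numbers reported above, this gives roughly 50 bytes per record and 0.0% missing cells.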

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202103" (Constant)
금액 is highly skewed (γ1 = 30.48539677) (Skewed)
금액 has 2187 (21.9%) zeros (Zeros)
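
The skewness and zero-share alerts can be reproduced on any list of amounts. The sketch below assumes that γ1 is the bias-adjusted sample skewness (the form pandas computes); the report itself does not state which estimator it uses. The `amounts` list holds the ten 금액 values from the first sample table of this report, so its results will not match the full-dataset figures.

```python
import math

def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson skewness (assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def zero_share(values):
    """Fraction of values that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

# Amounts of the ten rows shown in the first sample table of this report.
amounts = [0, 44686898, 115008480, 1294302, 0, 0, -117717769, 0, 0, 286877824]
print(f"skewness={sample_skewness(amounts):.3f}, zeros={zero_share(amounts):.1%}")
```

A positive result indicates a long right tail, which is consistent with the few very large balances listed among the maximum values of 금액.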

Reproduction

Analysis started: 2024-05-11 05:59:20.163278
Analysis finished: 2024-05-11 05:59:21.094139
Duration: 0.93 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2219
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2877
Min length: 2

Characters and Unicode

Total characters: 72877
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
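
As a rough illustration of how such per-category counts can be derived, the sketch below uses Python's standard `unicodedata` module to tally general categories for the five sample values of this variable. The mapping of category codes to long names covers only the categories that appear in this report. Script and block breakdowns are not available from `unicodedata` and are left out.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Po": "Other Punctuation",
}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = ["도봉서원제2", "양재우성KBS(113동)", "은평뉴타운상림마을12단지",
          "길음뉴타운9단지제2", "천호태영"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the tables for this variable.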

Unique

Unique: 121
Unique (%): 1.2%

Sample

1st row: 도봉서원제2
2nd row: 양재우성KBS(113동)
3rd row: 은평뉴타운상림마을12단지
4th row: 길음뉴타운9단지제2
5th row: 천호태영

| Value | Count | Frequency (%) |
| 아파트 | 142 | 1.3% |
| 래미안 | 32 | 0.3% |
| 아이파크 | 21 | 0.2% |
| e편한세상 | 19 | 0.2% |
| 경남아너스빌 | 17 | 0.2% |
| 북한산 | 16 | 0.2% |
| 미아경남아너스빌 | 13 | 0.1% |
| 서울숲2차푸르지오임대 | 12 | 0.1% |
| 상계보람 | 12 | 0.1% |
| 중계그린 | 12 | 0.1% |
| Other values (2287) | 10315 | 97.2% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 2427 | 3.3% |
|  | 2396 | 3.3% |
|  | 2184 | 3.0% |
|  | 1841 | 2.5% |
|  | 1825 | 2.5% |
|  | 1685 | 2.3% |
|  | 1457 | 2.0% |
|  | 1430 | 2.0% |
|  | 1418 | 1.9% |
|  | 1329 | 1.8% |
| Other values (425) | 54885 | 75.3% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 66898 | 91.8% |
| Decimal Number | 3630 | 5.0% |
| Uppercase Letter | 782 | 1.1% |
| Space Separator | 685 | 0.9% |
| Lowercase Letter | 363 | 0.5% |
| Dash Punctuation | 139 | 0.2% |
| Open Punctuation | 129 | 0.2% |
| Close Punctuation | 129 | 0.2% |
| Other Punctuation | 115 | 0.2% |
| Letter Number | 7 | < 0.1% |

Most frequent character per category

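Other Letter
| Value | Count | Frequency (%) |
|  | 2427 | 3.6% |
|  | 2396 | 3.6% |
|  | 2184 | 3.3% |
|  | 1841 | 2.8% |
|  | 1825 | 2.7% |
|  | 1685 | 2.5% |
|  | 1457 | 2.2% |
|  | 1430 | 2.1% |
|  | 1418 | 2.1% |
|  | 1329 | 2.0% |
| Other values (380) | 48906 | 73.1% |

Uppercase Letter
| Value | Count | Frequency (%) |
| S | 119 | 15.2% |
| C | 104 | 13.3% |
| K | 86 | 11.0% |
| D | 73 | 9.3% |
| M | 73 | 9.3% |
| L | 66 | 8.4% |
| H | 58 | 7.4% |
| I | 41 | 5.2% |
| G | 38 | 4.9% |
| E | 32 | 4.1% |
| Other values (7) | 92 | 11.8% |

Lowercase Letter
| Value | Count | Frequency (%) |
| e | 202 | 55.6% |
| l | 32 | 8.8% |
| i | 29 | 8.0% |
| k | 23 | 6.3% |
| v | 22 | 6.1% |
| s | 20 | 5.5% |
| c | 16 | 4.4% |
| w | 12 | 3.3% |
| h | 5 | 1.4% |
| g | 1 | 0.3% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1126 | 31.0% |
| 2 | 1012 | 27.9% |
| 3 | 466 | 12.8% |
| 4 | 280 | 7.7% |
| 5 | 214 | 5.9% |
| 6 | 146 | 4.0% |
| 7 | 125 | 3.4% |
| 8 | 93 | 2.6% |
| 0 | 91 | 2.5% |
| 9 | 77 | 2.1% |

Other Punctuation
| Value | Count | Frequency (%) |
| , | 94 | 81.7% |
| . | 21 | 18.3% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 685 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 139 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 129 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 129 | 100.0% |

Letter Number
| Value | Count | Frequency (%) |
|  | 7 | 100.0% |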

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 66898 | 91.8% |
| Common | 4827 | 6.6% |
| Latin | 1152 | 1.6% |

Most frequent character per script

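Hangul
| Value | Count | Frequency (%) |
|  | 2427 | 3.6% |
|  | 2396 | 3.6% |
|  | 2184 | 3.3% |
|  | 1841 | 2.8% |
|  | 1825 | 2.7% |
|  | 1685 | 2.5% |
|  | 1457 | 2.2% |
|  | 1430 | 2.1% |
|  | 1418 | 2.1% |
|  | 1329 | 2.0% |
| Other values (380) | 48906 | 73.1% |

Latin
| Value | Count | Frequency (%) |
| e | 202 | 17.5% |
| S | 119 | 10.3% |
| C | 104 | 9.0% |
| K | 86 | 7.5% |
| D | 73 | 6.3% |
| M | 73 | 6.3% |
| L | 66 | 5.7% |
| H | 58 | 5.0% |
| I | 41 | 3.6% |
| G | 38 | 3.3% |
| Other values (19) | 292 | 25.3% |

Common
| Value | Count | Frequency (%) |
| 1 | 1126 | 23.3% |
| 2 | 1012 | 21.0% |
| (space) | 685 | 14.2% |
| 3 | 466 | 9.7% |
| 4 | 280 | 5.8% |
| 5 | 214 | 4.4% |
| 6 | 146 | 3.0% |
| - | 139 | 2.9% |
| ( | 129 | 2.7% |
| ) | 129 | 2.7% |
| Other values (6) | 501 | 10.4% |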

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 66898 | 91.8% |
| ASCII | 5972 | 8.2% |
| Number Forms | 7 | < 0.1% |

Most frequent character per block

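Hangul
| Value | Count | Frequency (%) |
|  | 2427 | 3.6% |
|  | 2396 | 3.6% |
|  | 2184 | 3.3% |
|  | 1841 | 2.8% |
|  | 1825 | 2.7% |
|  | 1685 | 2.5% |
|  | 1457 | 2.2% |
|  | 1430 | 2.1% |
|  | 1418 | 2.1% |
|  | 1329 | 2.0% |
| Other values (380) | 48906 | 73.1% |

ASCII
| Value | Count | Frequency (%) |
| 1 | 1126 | 18.9% |
| 2 | 1012 | 16.9% |
| (space) | 685 | 11.5% |
| 3 | 466 | 7.8% |
| 4 | 280 | 4.7% |
| 5 | 214 | 3.6% |
| e | 202 | 3.4% |
| 6 | 146 | 2.4% |
| - | 139 | 2.3% |
| ( | 129 | 2.2% |
| Other values (34) | 1573 | 26.3% |

Number Forms
| Value | Count | Frequency (%) |
|  | 7 | 100.0% |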

아파트코드 (apartment code)
Text

Distinct: 2226
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 121
Unique (%): 1.2%

Sample

1st row: A13275302
2nd row: A13789201
3rd row: A12220004
4th row: A13679402
5th row: A13402002

| Value | Count | Frequency (%) |
| a14272306 | 13 | 0.1% |
| a13982604 | 12 | 0.1% |
| a13986306 | 12 | 0.1% |
| a14003001 | 12 | 0.1% |
| a15601105 | 11 | 0.1% |
| a15105008 | 11 | 0.1% |
| a13778204 | 11 | 0.1% |
| a13922907 | 11 | 0.1% |
| a15180705 | 11 | 0.1% |
| a13410006 | 11 | 0.1% |
| Other values (2216) | 9885 | 98.9% |

Most occurring characters

| Value | Count | Frequency (%) |
| 0 | 18473 | 20.5% |
| 1 | 17682 | 19.6% |
| A | 9981 | 11.1% |
| 3 | 8811 | 9.8% |
| 2 | 8183 | 9.1% |
| 5 | 6234 | 6.9% |
| 8 | 5606 | 6.2% |
| 7 | 4750 | 5.3% |
| 4 | 3900 | 4.3% |
| 6 | 3313 | 3.7% |
| Other values (2) | 3067 | 3.4% |

Most occurring categories

| Value | Count | Frequency (%) |
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

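Decimal Number
| Value | Count | Frequency (%) |
| 0 | 18473 | 23.1% |
| 1 | 17682 | 22.1% |
| 3 | 8811 | 11.0% |
| 2 | 8183 | 10.2% |
| 5 | 6234 | 7.8% |
| 8 | 5606 | 7.0% |
| 7 | 4750 | 5.9% |
| 4 | 3900 | 4.9% |
| 6 | 3313 | 4.1% |
| 9 | 3048 | 3.8% |

Uppercase Letter
| Value | Count | Frequency (%) |
| A | 9981 | 99.8% |
| B | 19 | 0.2% |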

Most occurring scripts

| Value | Count | Frequency (%) |
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

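Common
| Value | Count | Frequency (%) |
| 0 | 18473 | 23.1% |
| 1 | 17682 | 22.1% |
| 3 | 8811 | 11.0% |
| 2 | 8183 | 10.2% |
| 5 | 6234 | 7.8% |
| 8 | 5606 | 7.0% |
| 7 | 4750 | 5.9% |
| 4 | 3900 | 4.9% |
| 6 | 3313 | 4.1% |
| 9 | 3048 | 3.8% |

Latin
| Value | Count | Frequency (%) |
| A | 9981 | 99.8% |
| B | 19 | 0.2% |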

Most occurring blocks

| Value | Count | Frequency (%) |
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| 0 | 18473 | 20.5% |
| 1 | 17682 | 19.6% |
| A | 9981 | 11.1% |
| 3 | 8811 | 9.8% |
| 2 | 8183 | 9.1% |
| 5 | 6234 | 6.9% |
| 8 | 5606 | 6.2% |
| 7 | 4750 | 5.3% |
| 4 | 3900 | 4.3% |
| 6 | 3313 | 3.7% |
| Other values (2) | 3067 | 3.4% |

비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9972
Min length: 2

Characters and Unicode

Total characters: 59972
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 공동주택적립금
2nd row: 주차장충당예금
3rd row: 미부과관리비
4th row: 미처분이익잉여금
5th row: 주차장충당부채

| Value | Count | Frequency (%) |
| 미처분이익잉여금 | 336 | 3.4% |
| 예금 | 334 | 3.3% |
| 퇴직급여충당부채 | 325 | 3.2% |
| 당기순이익 | 321 | 3.2% |
| 연차수당충당부채 | 316 | 3.2% |
| 선급비용 | 310 | 3.1% |
| 장기수선충당부채 | 306 | 3.1% |
| 장기수선충당예금 | 300 | 3.0% |
| 예수금 | 296 | 3.0% |
| 공동주택적립금 | 295 | 2.9% |
| Other values (67) | 6861 | 68.6% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 4748 | 7.9% |
|  | 3839 | 6.4% |
|  | 3154 | 5.3% |
|  | 3120 | 5.2% |
|  | 2976 | 5.0% |
|  | 2961 | 4.9% |
|  | 2673 | 4.5% |
|  | 2376 | 4.0% |
|  | 1914 | 3.2% |
|  | 1791 | 3.0% |
| Other values (97) | 30420 | 50.7% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 59972 | 100.0% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 4748 | 7.9% |
|  | 3839 | 6.4% |
|  | 3154 | 5.3% |
|  | 3120 | 5.2% |
|  | 2976 | 5.0% |
|  | 2961 | 4.9% |
|  | 2673 | 4.5% |
|  | 2376 | 4.0% |
|  | 1914 | 3.2% |
|  | 1791 | 3.0% |
| Other values (97) | 30420 | 50.7% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 59972 | 100.0% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 4748 | 7.9% |
|  | 3839 | 6.4% |
|  | 3154 | 5.3% |
|  | 3120 | 5.2% |
|  | 2976 | 5.0% |
|  | 2961 | 4.9% |
|  | 2673 | 4.5% |
|  | 2376 | 4.0% |
|  | 1914 | 3.2% |
|  | 1791 | 3.0% |
| Other values (97) | 30420 | 50.7% |

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 59972 | 100.0% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 4748 | 7.9% |
|  | 3839 | 6.4% |
|  | 3154 | 5.3% |
|  | 3120 | 5.2% |
|  | 2976 | 5.0% |
|  | 2961 | 4.9% |
|  | 2673 | 4.5% |
|  | 2376 | 4.0% |
|  | 1914 | 3.2% |
|  | 1791 | 3.0% |
| Other values (97) | 30420 | 50.7% |

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202103
2nd row: 202103
3rd row: 202103
4th row: 202103
5th row: 202103

Common Values

| Value | Count | Frequency (%) |
| 202103 | 10000 | 100.0% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| 202103 | 10000 | 100.0% |

금액 (amount)
Real number (ℝ)

SKEWED  ZEROS

Distinct: 7517
Distinct (%): 75.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73921529
Minimum: -2.4201078 × 10^9
Maximum: 2.2105239 × 10^10
Zeros: 2187
Zeros (%): 21.9%
Negative: 343
Negative (%): 3.4%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2.4201078 × 10^9
5-th percentile: 0
Q1: 0
Median: 3313210
Q3: 33731948
95-th percentile: 3.6332115 × 10^8
Maximum: 2.2105239 × 10^10
Range: 2.4525347 × 10^10
Interquartile range (IQR): 33731948
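
For orientation, the sketch below computes the same kind of quantile summary with Python's standard library. It assumes linear interpolation between order statistics (`method="inclusive"`), which matches the pandas default; the report does not say which method it uses. The input is the ten 금액 values from the first sample table of this report, so the results differ from the full-dataset figures above.

```python
import statistics

def quartile_summary(values):
    """Min, quartiles, max, range and IQR with linear interpolation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = min(values), max(values)
    return {
        "min": low, "Q1": q1, "median": median, "Q3": q3, "max": high,
        "range": high - low,
        "IQR": q3 - q1,
    }

# Amounts of the ten rows shown in the first sample table of this report.
amounts = [0, 44686898, 115008480, 1294302, 0, 0, -117717769, 0, 0, 286877824]
summary = quartile_summary(amounts)
for name, value in summary.items():
    print(name, value)
```

Because Q1 is 0 in the full dataset as well, the IQR there equals Q3, which is why both are reported as 33731948.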

Descriptive statistics

Standard deviation: 3.5569236 × 10^8
Coefficient of variation (CV): 4.811756
Kurtosis: 1616.0042
Mean: 73921529
Median Absolute Deviation (MAD): 3313210
Skewness: 30.485397
Sum: 7.3921529 × 10^11
Variance: 1.2651705 × 10^17
Monotonicity: Not monotonic
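
Several of these figures are tied together by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, sum is the mean times the number of observations, and range is maximum minus minimum. The sketch below checks the reported values against those identities; the dictionary keys are illustrative names, and the tolerance allows for the rounding in the report.

```python
import math

# Values as printed in this report (rounded by the report itself).
reported = {
    "n": 10000,
    "mean": 73921529,
    "std": 3.5569236e8,
    "variance": 1.2651705e17,
    "cv": 4.811756,
    "sum": 7.3921529e11,
    "minimum": -2.4201078e9,
    "maximum": 2.2105239e10,
    "range": 2.4525347e10,
}

def derived(r):
    """Recompute the dependent statistics from the independent ones."""
    return {
        "cv": r["std"] / r["mean"],
        "variance": r["std"] ** 2,
        "sum": r["mean"] * r["n"],
        "range": r["maximum"] - r["minimum"],
    }

checks = {
    name: math.isclose(value, reported[name], rel_tol=1e-6)
    for name, value in derived(reported).items()
}
print(checks)
```

All four identities hold within rounding, so the table is internally consistent.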
Histogram with fixed size bins (bins=50)

Common Values

| Value | Count | Frequency (%) |
| 0 | 2187 | 21.9% |
| 500000 | 26 | 0.3% |
| 250000 | 19 | 0.2% |
| 300000 | 15 | 0.1% |
| 1000000 | 14 | 0.1% |
| 242000 | 12 | 0.1% |
| 484000 | 12 | 0.1% |
| 3000000 | 12 | 0.1% |
| 10000000 | 9 | 0.1% |
| 30000000 | 8 | 0.1% |
| Other values (7507) | 7686 | 76.9% |

Minimum 10 values

| Value | Count | Frequency (%) |
| -2420107766 | 1 | < 0.1% |
| -766340955 | 1 | < 0.1% |
| -361355158 | 1 | < 0.1% |
| -197034426 | 1 | < 0.1% |
| -161469720 | 1 | < 0.1% |
| -148144012 | 1 | < 0.1% |
| -134098170 | 1 | < 0.1% |
| -130932285 | 1 | < 0.1% |
| -122481350 | 1 | < 0.1% |
| -119511381 | 1 | < 0.1% |

Maximum 10 values

| Value | Count | Frequency (%) |
| 22105239225 | 1 | < 0.1% |
| 11634939102 | 1 | < 0.1% |
| 6027070234 | 1 | < 0.1% |
| 5954105162 | 1 | < 0.1% |
| 4340686033 | 1 | < 0.1% |
| 4275204643 | 1 | < 0.1% |
| 4085705462 | 1 | < 0.1% |
| 4061418186 | 1 | < 0.1% |
| 3990814123 | 1 | < 0.1% |
| 3756922003 | 1 | < 0.1% |

Interactions


Correlations

|  | 비용명 | 금액 |
| 비용명 | 1.000 | 0.646 |
| 금액 | 0.646 | 1.000 |
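
The report does not state which coefficient this matrix shows. ydata-profiling's documentation describes an automatic mode that uses Cramér's V when a categorical column is involved, with numeric columns discretized first. Assuming that applies here, the sketch below computes a plain (not bias-corrected) Cramér's V; the three-way `bin_amount` rule is a hypothetical stand-in for whatever binning the tool applies, so this will not reproduce 0.646.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Plain Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("inputs must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

def bin_amount(amount):
    """Hypothetical discretization of 금액 into three sign-based bins."""
    if amount < 0:
        return "negative"
    return "zero" if amount == 0 else "positive"

# Two rows per account name, taken from the sample tables of this report.
names = ["비품", "비품", "당기순이익", "당기순이익"]
amounts = [0, 10132698, 8984268, 2167243]
print(cramers_v(names, [bin_amount(a) for a in amounts]))
```

The coefficient ranges from 0 (no association) to 1 (each category maps to exactly one bin).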

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
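
The data behind both displays is a boolean presence grid. The sketch below builds one from a list of row dictionaries, treating `None` and the empty string as missing (an assumption; the report does not define its missing-value rule). The third row is hypothetical and exists only to show a gap, since the profiled dataset has no missing cells.

```python
def nullity_matrix(rows, columns):
    """One list of booleans per row: True where the cell holds a value."""
    return [[row.get(col) not in (None, "") for col in columns] for row in rows]

def missing_per_column(rows, columns):
    """Count missing cells in each column."""
    matrix = nullity_matrix(rows, columns)
    return {
        col: sum(1 for flags in matrix if not flags[i])
        for i, col in enumerate(columns)
    }

columns = ["아파트명", "금액"]
rows = [
    {"아파트명": "도봉서원제2", "금액": 0},         # from the sample table
    {"아파트명": "천호태영", "금액": 0},            # from the sample table
    {"아파트명": "example", "금액": None},          # hypothetical row with a gap
]
print(missing_per_column(rows, columns))
```

Note that a zero amount counts as present, which is why 금액 can have 21.9% zeros and still 0 missing cells.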

Sample

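| Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 21138 | 도봉서원제2 | A13275302 | 공동주택적립금 | 202103 | 0 |
| 37294 | 양재우성KBS(113동) | A13789201 | 주차장충당예금 | 202103 | 44686898 |
| 13712 | 은평뉴타운상림마을12단지 | A12220004 | 미부과관리비 | 202103 | 115008480 |
| 34404 | 길음뉴타운9단지제2 | A13679402 | 미처분이익잉여금 | 202103 | 1294302 |
| 25170 | 천호태영 | A13402002 | 주차장충당부채 | 202103 | 0 |
| 31051 | 역삼개나리래미안 | A13592601 | 미지급금 | 202103 | 0 |
| 53346 | 여의도자이 | A15076302 | 비품감가상각누계액 | 202103 | -117717769 |
| 29692 | 대치삼성 | A13528003 | 기타충당부채 | 202103 | 0 |
| 36872 | 서초우성5차아파트 | A13785705 | 비품 | 202103 | 0 |
| 25200 | 대우한강베네시티 | A13402003 | 장기수선충당부채 | 202103 | 286877824 |

| Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 71139 | 은평뉴타운구파발9-2단지 | A41279920 | 당기순이익 | 202103 | 8984268 |
| 52711 | 문래대원 | A15009603 | 비품 | 202103 | 10132698 |
| 44450 | 상계대동아파트 | A13981606 | 승강기유지비충당부채 | 202103 | 720000 |
| 33739 | 종암극동아파트 | A13671207 | 선급금 | 202103 | 524540 |
| 16126 | 답십리대우 | A13080201 | 상여충당부채 | 202103 | 2499260 |
| 5877 | 신내의료안심주택 | A10027775 | 미수관리비예치금 | 202103 | 3144000 |
| 8971 | DMC휴먼빌 | A12013001 | 미수관리비예치금 | 202103 | 0 |
| 30889 | 압구정현대아파트 | A13589802 | 선수금 | 202103 | 46925063 |
| 49219 | 미아현대 | A14272307 | 당기순이익 | 202103 | 2167243 |
| 70146 | 신정대림 | A15885303 | 주차장충당예금 | 202103 | 0 |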