Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
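
The average record size is the total memory divided by the number of observations. A minimal Python check, using only the two figures printed in this table:

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 488.3    # total size in memory, in KiB
n_observations = 10_000   # number of observations (rows)

# 1 KiB is 1024 bytes, so convert before dividing by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```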

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202007" (Constant)
금액 has 1434 (14.3%) zeros (Zeros)
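
Both alerts can be checked by hand. The sketch below is not the profiler's own code: `is_constant` and `zero_share` are hypothetical helpers, and the input is only the first ten rows of the Sample section at the end of this report, so the zero share it prints differs from the full-dataset figure of 14.3% (1434 of 10000).

```python
# (년월일, 금액) pairs copied from the first ten rows of the Sample section.
sample_rows = [
    ("202007", 332520), ("202007", 134545), ("202007", 212110),
    ("202007", 4722324), ("202007", 2727630), ("202007", 5913570),
    ("202007", 718000), ("202007", 18122160), ("202007", 0),
    ("202007", 340060),
]


def is_constant(values):
    """Return True when the column holds exactly one distinct value."""
    return len(set(values)) == 1


def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


dates = [date for date, _ in sample_rows]
amounts = [amount for _, amount in sample_rows]

print(is_constant(dates))            # True
print(f"{zero_share(amounts):.1%}")  # 10.0%
```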

Reproduction

Analysis started: 2024-05-11 06:56:19.912483
Analysis finished: 2024-05-11 06:56:22.973519
Duration: 3.06 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
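
A report of this kind is typically produced with a few lines of Python. The sketch below assumes the data is available as a local CSV file; `profile_csv`, its arguments and the default encoding are placeholders, since the report records neither the file name nor the encoding of the original download. Only `ProfileReport`, its `title` and `dataset` arguments, and `to_file` come from ydata-profiling itself.

```python
def profile_csv(csv_path, html_path, encoding="utf-8"):
    """Read a CSV file and write a ydata-profiling HTML report.

    csv_path, html_path and encoding are placeholders chosen for this
    sketch; they are not taken from the report.
    """
    # Imported inside the function so that this module still loads where
    # the third-party packages (pandas, ydata-profiling) are missing.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(
        df,
        title="Profiling Report",
        # Dataset metadata shown in the "Dataset" block of the overview.
        dataset={
            "url": "https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do",
        },
    )
    report.to_file(html_path)
    return report
```

Calling `profile_csv("data.csv", "report.html")` with a real file path would write the HTML report next to the script.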

Variables

아파트명 (apartment name)
Text

Distinct: 2144
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1963
Min length: 2

Characters and Unicode

Total characters: 71963
Distinct characters: 430
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
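
In Python, the general category of a code point is available through the standard `unicodedata` module. The sketch below counts categories for one apartment name taken from the Sample section at the end of this report. The helper name `category_counts` is hypothetical, and this is not necessarily how the profiler builds its tables.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of text by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("천호e-편한세상")

# Lo = Other Letter, Ll = Lowercase Letter, Pd = Dash Punctuation
print(dict(counts))  # {'Lo': 6, 'Ll': 1, 'Pd': 1}
```

The two-letter codes map onto the category names used in the tables of this report, for example `Nd` for Decimal Number and `Zs` for Space Separator.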

Unique

Unique: 90
Unique (%): 0.9%

Sample

1st row: 창전삼성임대
2nd row: 마곡우림필유아파트
3rd row: 문정3차푸르지오
4th row: 독립문파크빌
5th row: 건영3차아파트

Value   Count   Frequency (%)
아파트   156   1.5%
래미안   33   0.3%
아이파크   21   0.2%
신내   18   0.2%
2단지   16   0.1%
신반포   15   0.1%
해모로   15   0.1%
북한산   13   0.1%
고덕   13   0.1%
월드컵참누리   13   0.1%
Other values (2209)   10367   97.1%
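
The Frequency (%) column divides each count by the sum of the Count column, not by the 10000 observations: the counts add up to 10680. The check below uses only the numbers in the table above.

```python
# Counts copied from the value table for this variable.
top_counts = {
    "아파트": 156, "래미안": 33, "아이파크": 21, "신내": 18, "2단지": 16,
    "신반포": 15, "해모로": 15, "북한산": 13, "고덕": 13, "월드컵참누리": 13,
}
other_count = 10_367  # the "Other values (2209)" row

total = sum(top_counts.values()) + other_count

print(total)                                  # 10680
print(f"{top_counts['아파트'] / total:.1%}")  # 1.5%
print(f"{other_count / total:.1%}")           # 97.1%
```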

Most occurring characters

Value   Count   Frequency (%)
(glyph lost)   2461   3.4%
(glyph lost)   2322   3.2%
(glyph lost)   2164   3.0%
(glyph lost)   1896   2.6%
(glyph lost)   1742   2.4%
(glyph lost)   1654   2.3%
(glyph lost)   1550   2.2%
(glyph lost)   1468   2.0%
(glyph lost)   1359   1.9%
(glyph lost)   1303   1.8%
Other values (420)   54044   75.1%

Most occurring categories

Value   Count   Frequency (%)
Other Letter   65819   91.5%
Decimal Number   3649   5.1%
Uppercase Letter   787   1.1%
Space Separator   746   1.0%
Lowercase Letter   348   0.5%
Close Punctuation   169   0.2%
Open Punctuation   169   0.2%
Other Punctuation   139   0.2%
Dash Punctuation   128   0.2%
Letter Number   7   < 0.1%

Most frequent character per category

Other Letter
Value   Count   Frequency (%)
(glyph lost)   2461   3.7%
(glyph lost)   2322   3.5%
(glyph lost)   2164   3.3%
(glyph lost)   1896   2.9%
(glyph lost)   1742   2.6%
(glyph lost)   1654   2.5%
(glyph lost)   1550   2.4%
(glyph lost)   1468   2.2%
(glyph lost)   1359   2.1%
(glyph lost)   1303   2.0%
Other values (374)   47900   72.8%

Uppercase Letter
Value   Count   Frequency (%)
S   116   14.7%
C   116   14.7%
K   87   11.1%
D   71   9.0%
M   71   9.0%
L   60   7.6%
H   48   6.1%
I   44   5.6%
E   40   5.1%
G   31   3.9%
Other values (7)   103   13.1%

Lowercase Letter
Value   Count   Frequency (%)
e   184   52.9%
l   38   10.9%
i   34   9.8%
v   22   6.3%
k   16   4.6%
s   15   4.3%
c   12   3.4%
w   8   2.3%
g   7   2.0%
a   7   2.0%

Decimal Number
Value   Count   Frequency (%)
1   1092   29.9%
2   1076   29.5%
3   534   14.6%
4   268   7.3%
5   195   5.3%
6   151   4.1%
8   90   2.5%
7   88   2.4%
0   86   2.4%
9   69   1.9%

Other Punctuation
Value   Count   Frequency (%)
,   113   81.3%
.   26   18.7%

Space Separator
Value   Count   Frequency (%)
(space)   746   100.0%

Close Punctuation
Value   Count   Frequency (%)
)   169   100.0%

Open Punctuation
Value   Count   Frequency (%)
(   169   100.0%

Dash Punctuation
Value   Count   Frequency (%)
-   128   100.0%

Letter Number
Value   Count   Frequency (%)
(glyph lost)   7   100.0%

Math Symbol
Value   Count   Frequency (%)
~   2   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Hangul   65819   91.5%
Common   5002   7.0%
Latin   1142   1.6%

Most frequent character per script

Hangul
Value   Count   Frequency (%)
(glyph lost)   2461   3.7%
(glyph lost)   2322   3.5%
(glyph lost)   2164   3.3%
(glyph lost)   1896   2.9%
(glyph lost)   1742   2.6%
(glyph lost)   1654   2.5%
(glyph lost)   1550   2.4%
(glyph lost)   1468   2.2%
(glyph lost)   1359   2.1%
(glyph lost)   1303   2.0%
Other values (374)   47900   72.8%

Latin
Value   Count   Frequency (%)
e   184   16.1%
S   116   10.2%
C   116   10.2%
K   87   7.6%
D   71   6.2%
M   71   6.2%
L   60   5.3%
H   48   4.2%
I   44   3.9%
E   40   3.5%
Other values (19)   305   26.7%

Common
Value   Count   Frequency (%)
1   1092   21.8%
2   1076   21.5%
(space)   746   14.9%
3   534   10.7%
4   268   5.4%
5   195   3.9%
)   169   3.4%
(   169   3.4%
6   151   3.0%
-   128   2.6%
Other values (7)   474   9.5%

Most occurring blocks

Value   Count   Frequency (%)
Hangul   65819   91.5%
ASCII   6137   8.5%
Number Forms   7   < 0.1%

Most frequent character per block

Hangul
Value   Count   Frequency (%)
(glyph lost)   2461   3.7%
(glyph lost)   2322   3.5%
(glyph lost)   2164   3.3%
(glyph lost)   1896   2.9%
(glyph lost)   1742   2.6%
(glyph lost)   1654   2.5%
(glyph lost)   1550   2.4%
(glyph lost)   1468   2.2%
(glyph lost)   1359   2.1%
(glyph lost)   1303   2.0%
Other values (374)   47900   72.8%

ASCII
Value   Count   Frequency (%)
1   1092   17.8%
2   1076   17.5%
(space)   746   12.2%
3   534   8.7%
4   268   4.4%
5   195   3.2%
e   184   3.0%
)   169   2.8%
(   169   2.8%
6   151   2.5%
Other values (35)   1553   25.3%

Number Forms
Value   Count   Frequency (%)
(glyph lost)   7   100.0%

아파트코드 (apartment code)
Text

Distinct: 2150
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 92
Unique (%): 0.9%

Sample

1st row: A12177802
2nd row: A15722106
3rd row: A13820001
4th row: A12008001
5th row: A15101903

Value   Count   Frequency (%)
a12187906   13   0.1%
a14072701   12   0.1%
a10025372   12   0.1%
a15086007   12   0.1%
a13986004   12   0.1%
a13707203   12   0.1%
a13376906   11   0.1%
a13510103   11   0.1%
a13519006   11   0.1%
a13379001   11   0.1%
Other values (2140)   9883   98.8%

Most occurring characters

Value   Count   Frequency (%)
0   18680   20.8%
1   17667   19.6%
A   10000   11.1%
3   8972   10.0%
2   8197   9.1%
5   6107   6.8%
8   5686   6.3%
7   4640   5.2%
4   3754   4.2%
6   3476   3.9%

Most occurring categories

Value   Count   Frequency (%)
Decimal Number   80000   88.9%
Uppercase Letter   10000   11.1%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
0   18680   23.4%
1   17667   22.1%
3   8972   11.2%
2   8197   10.2%
5   6107   7.6%
8   5686   7.1%
7   4640   5.8%
4   3754   4.7%
6   3476   4.3%
9   2821   3.5%

Uppercase Letter
Value   Count   Frequency (%)
A   10000   100.0%
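
Every value has length 9, and the character counts (10000 occurrences of `A`, 80000 digits) are what a fixed layout of one `A` followed by eight digits would produce. The counts alone do not prove that layout, so treat the pattern below as an assumption that the five sample rows happen to satisfy. `looks_like_apartment_code` is a hypothetical helper.

```python
import re

# Assumed layout: a literal "A" followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")


def looks_like_apartment_code(value):
    """Return True when value matches the assumed 9-character layout."""
    return CODE_PATTERN.fullmatch(value) is not None


# The five sample rows shown for this variable.
samples = ["A12177802", "A15722106", "A13820001", "A12008001", "A15101903"]

print(all(looks_like_apartment_code(s) for s in samples))  # True
print(looks_like_apartment_code("a12187906"))              # False
```

The value table lists the codes in lower case (`a12187906`), while the sample rows and the character counts show an upper-case `A`, so a check like this should run on the raw column rather than on the table entries.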

Most occurring scripts

Value   Count   Frequency (%)
Common   80000   88.9%
Latin   10000   11.1%

Most frequent character per script

Common
Value   Count   Frequency (%)
0   18680   23.4%
1   17667   22.1%
3   8972   11.2%
2   8197   10.2%
5   6107   7.6%
8   5686   7.1%
7   4640   5.8%
4   3754   4.7%
6   3476   4.3%
9   2821   3.5%

Latin
Value   Count   Frequency (%)
A   10000   100.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   90000   100.0%

Most frequent character per block

ASCII
Value   Count   Frequency (%)
0   18680   20.8%
1   17667   19.6%
A   10000   11.1%
3   8972   10.0%
2   8197   9.1%
5   6107   6.8%
8   5686   6.3%
7   4640   5.2%
4   3754   4.2%
6   3476   3.9%

비용명 (cost item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8468
Min length: 2

Characters and Unicode

Total characters: 48468
Distinct characters: 118
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 음식물처리비
2nd row: 재활용품수익
3rd row: 잡비용
4th row: 경비비
5th row: 공동주택지원금비용

Value   Count   Frequency (%)
급여   236   2.4%
세대전기료   221   2.2%
소모품비   219   2.2%
잡수익   219   2.2%
경비비   218   2.2%
소독비   218   2.2%
청소비   214   2.1%
입주자대표회의운영비   214   2.1%
승강기유지비   213   2.1%
퇴직급여   213   2.1%
Other values (76)   7815   78.1%

Most occurring characters

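Value   Count   Frequency (%)
(glyph lost)   5429   11.2%
(glyph lost)   3511   7.2%
(glyph lost)   2090   4.3%
(glyph lost)   1948   4.0%
(glyph lost)   1687   3.5%
(glyph lost)   1297   2.7%
(glyph lost)   1086   2.2%
(glyph lost)   824   1.7%
(glyph lost)   791   1.6%
(glyph lost)   780   1.6%
Other values (108)   29025   59.9%

Most occurring categories

Value   Count   Frequency (%)
Other Letter   48468   100.0%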

Most frequent character per category

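Other Letter
Value   Count   Frequency (%)
(glyph lost)   5429   11.2%
(glyph lost)   3511   7.2%
(glyph lost)   2090   4.3%
(glyph lost)   1948   4.0%
(glyph lost)   1687   3.5%
(glyph lost)   1297   2.7%
(glyph lost)   1086   2.2%
(glyph lost)   824   1.7%
(glyph lost)   791   1.6%
(glyph lost)   780   1.6%
Other values (108)   29025   59.9%

Most occurring scripts

Value   Count   Frequency (%)
Hangul   48468   100.0%

Most frequent character per script

Hangul
Value   Count   Frequency (%)
(glyph lost)   5429   11.2%
(glyph lost)   3511   7.2%
(glyph lost)   2090   4.3%
(glyph lost)   1948   4.0%
(glyph lost)   1687   3.5%
(glyph lost)   1297   2.7%
(glyph lost)   1086   2.2%
(glyph lost)   824   1.7%
(glyph lost)   791   1.6%
(glyph lost)   780   1.6%
Other values (108)   29025   59.9%

Most occurring blocks

Value   Count   Frequency (%)
Hangul   48468   100.0%

Most frequent character per block

Hangul
Value   Count   Frequency (%)
(glyph lost)   5429   11.2%
(glyph lost)   3511   7.2%
(glyph lost)   2090   4.3%
(glyph lost)   1948   4.0%
(glyph lost)   1687   3.5%
(glyph lost)   1297   2.7%
(glyph lost)   1086   2.2%
(glyph lost)   824   1.7%
(glyph lost)   791   1.6%
(glyph lost)   780   1.6%
Other values (108)   29025   59.9%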

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202007   10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202007
2nd row: 202007
3rd row: 202007
4th row: 202007
5th row: 202007

Common Values

Value   Count   Frequency (%)
202007   10000   100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value   Count   Frequency (%)
202007   10000   100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6873
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2913441.8
Minimum: -9850000
Maximum: 2.505836 × 10^8
Zeros: 1434
Zeros (%): 14.3%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -9850000
5-th percentile: 0
Q1: 58677.5
Median: 305205
Q3: 1354892.5
95-th percentile: 14660075
Maximum: 2.505836 × 10^8
Range: 2.604336 × 10^8
Interquartile range (IQR): 1296215
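
The range and the interquartile range follow directly from the other entries in this table. A minimal check in Python; the exact maximum, 250583596, is taken from the list of largest values further down, because the table rounds it to seven significant digits.

```python
minimum = -9_850_000
maximum = 250_583_596   # exact value; the table rounds it to 2.505836 × 10^8
q1 = 58_677.5
q3 = 1_354_892.5

value_range = maximum - minimum   # range = maximum - minimum
iqr = q3 - q1                     # interquartile range = Q3 - Q1

print(value_range)  # 260433596
print(iqr)          # 1296215.0
```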

Descriptive statistics

Standard deviation: 9760286.2
Coefficient of variation (CV): 3.350088
Kurtosis: 152.00522
Mean: 2913441.8
Median Absolute Deviation (MAD): 305205
Skewness: 9.5665842
Sum: 2.9134418 × 10^10
Variance: 9.5263186 × 10^13
Monotonicity: Not monotonic
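
Three of these figures can be rederived from the mean, the standard deviation and the 10000 observations: the coefficient of variation is the standard deviation divided by the mean, the sum is the mean times the number of observations, and the variance is the square of the standard deviation. The sketch below works from the rounded values printed here, so the variance agrees with the reported one only to about seven significant digits.

```python
mean = 2_913_441.8
std = 9_760_286.2
n = 10_000

cv = std / mean        # coefficient of variation
total = mean * n       # sum of all values
variance = std ** 2    # variance from the rounded standard deviation

print(f"{cv:.6f}")        # 3.350088
print(f"{total:.7e}")     # 2.9134418e+10
print(f"{variance:.4e}")  # 9.5263e+13
```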
Histogram with fixed size bins (bins=50)

Common values

Value   Count   Frequency (%)
0   1434   14.3%
200000   86   0.9%
100000   62   0.6%
300000   59   0.6%
150000   40   0.4%
400000   40   0.4%
600000   33   0.3%
50000   31   0.3%
500000   28   0.3%
250000   27   0.3%
Other values (6863)   8160   81.6%

Minimum 10 values

Value   Count   Frequency (%)
-9850000   1   < 0.1%
-4485279   1   < 0.1%
-2974100   1   < 0.1%
-1700000   1   < 0.1%
-1094500   1   < 0.1%
-1052000   1   < 0.1%
-691430   1   < 0.1%
-192780   1   < 0.1%
-168300   1   < 0.1%
-41940   1   < 0.1%

Maximum 10 values

Value   Count   Frequency (%)
250583596   1   < 0.1%
240188086   1   < 0.1%
232192970   1   < 0.1%
160034000   1   < 0.1%
140157000   1   < 0.1%
137090960   1   < 0.1%
131416770   1   < 0.1%
124751303   1   < 0.1%
123148580   1   < 0.1%
122805809   1   < 0.1%

Interactions


Correlations

         비용명   금액
비용명   1.000   0.495
금액     0.495   1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

(index)   아파트명   아파트코드   비용명   년월일   금액
16090   창전삼성임대   A12177802   음식물처리비   202007   332520
92975   마곡우림필유아파트   A15722106   재활용품수익   202007   134545
55641   문정3차푸르지오   A13820001   잡비용   202007   212110
11563   독립문파크빌   A12008001   경비비   202007   4722324
78877   건영3차아파트   A15101903   공동주택지원금비용   202007   2727630
47208   하월곡동신   A13613005   수선유지비   202007   5913570
27615   도봉삼환   A13201207   입주자대표회의운영비   202007   718000
972   신촌숲 아이파크 아파트   A10024974   세대수도료   202007   18122160
42515   도곡한신   A13550403   회계감사비   202007   0
93500   마곡수명산파크7단지   A15728005   국민연금   202007   340060

(index)   아파트명   아파트코드   비용명   년월일   금액
47694   석관코오롱   A13615002   교육비   202007   0
64486   두산아파트   A13983713   선거관리위원회운영비   202007   0
24082   신내동성7차   A13113001   소독비   202007   227100
35740   천호e-편한세상   A13402202   퇴직급여   202007   196900
78183   여의도시범아파트   A15089421   부과차익   202007   874
68153   용산CJ나인파크   A14010003   지급수수료   202007   329030
82820   신도림디큐브시티   A15277302   연체료수익   202007   106092
54740   마천금호   A13812002   퇴직급여   202007   545000
71725   구의강변우성   A14320302   사무용품비   202007   0
27120   신내9단지   A13187305   연체료수익   202007   564460