Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
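
The two memory figures are consistent with each other: dividing the total size by the number of observations gives the average record size. A minimal check, assuming 1 KiB = 1024 bytes (the variable names are illustrative):

```python
# Figures taken from the dataset statistics of this report.
total_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```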

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202011" (Constant)
금액 has 1661 (16.6%) zeros (Zeros)
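
The two alerts correspond to simple column-level checks. The sketch below is not ydata-profiling's implementation; `constant_value` and `zero_share` are hypothetical helpers, and the data is limited to the first five sample rows shown in this report.

```python
def constant_value(values):
    """Return the single value if every entry is identical, else None."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else None


def zero_share(values):
    """Return (count, fraction) of entries equal to zero."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, zeros / len(values)


# Illustrative data: 년월일 and 금액 from the first five sample rows.
dates = ["202011"] * 5
amounts = [315360, 137060, 886020, 0, 1500182]

print(constant_value(dates))   # the constant value, if any
print(zero_share(amounts))     # (number of zeros, share of zeros)
```

On the full dataset the same checks would report the constant 202011 and 1661 zeros out of 10000 rows (16.6%).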

Reproduction

Analysis started: 2024-05-11 06:55:36.808985
Analysis finished: 2024-05-11 06:55:38.639555
Duration: 1.83 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
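
A report of this kind can be regenerated with ydata-profiling. The following is a sketch under stated assumptions, not the exact command used here: the CSV path, the output file name, the title, and the idea that the 10000 observations are a random sample of a larger file are all assumptions. Only the dataset metadata is taken from this report.

```python
# Dataset metadata as shown in the "Dataset" block of this report
# (the description is the original Korean string, "file download").
DATASET_META = {
    "description": "파일 다운로드",
    "author": "서울특별시",
    "url": "https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do",
}


def build_report(csv_path, out_html="report.html", sample_size=10_000, seed=0):
    """Profile (a sample of) a CSV file and write an HTML report."""
    # Imported lazily so this module loads without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    # Assumption: the report was built from a random sample of the source file.
    if len(df) > sample_size:
        df = df.sample(n=sample_size, random_state=seed)

    profile = ProfileReport(df, title="Profiling report", dataset=DATASET_META)
    profile.to_file(out_html)
    return out_html
```

Calling `build_report("data.csv")` with a hypothetical file name would write `report.html` to the working directory.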

Variables

아파트명
Text

Distinct: 2100
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2269
Min length: 2

Characters and Unicode

Total characters: 72269
Distinct characters: 429
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
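
Python's standard `unicodedata` module exposes the general category of each code point, which is enough to reproduce a category breakdown like the ones in this report. This is a minimal sketch: `category_counts` is a hypothetical helper, the name mapping covers only the categories that appear in this report, and the input is the five 아파트명 sample values rather than the full column.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Sm": "Math Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


names = ["신개봉삼환", "신당푸르지오", "상계주공14단지", "현대강변", "대치우성1차아파트"]
counts = category_counts(names)
print(counts.most_common())
```

Hangul syllables fall under "Other Letter", which is why that category dominates the apartment-name column.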

Unique

Unique: 96
Unique (%): 1.0%
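
"Distinct" and "Unique" measure different things: as understood here, Distinct counts the different values that occur, while Unique counts the values that occur exactly once. A small sketch of that reading, using the five 비용명 sample values from this report (`distinct_and_unique` is a hypothetical helper):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


cost_items = ["잡수익", "고용보험료", "소모품비", "승강기수익", "잡수익"]
print(distinct_and_unique(cost_items))
```

Here 잡수익 appears twice, so it counts toward Distinct but not toward Unique.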

Sample

1st row: 신개봉삼환
2nd row: 신당푸르지오
3rd row: 상계주공14단지
4th row: 현대강변
5th row: 대치우성1차아파트

Value | Count | Frequency (%)
아파트 | 165 | 1.5%
래미안 | 28 | 0.3%
e편한세상 | 21 | 0.2%
북한산 | 20 | 0.2%
아이파크 | 20 | 0.2%
고덕 | 17 | 0.2%
힐스테이트 | 17 | 0.2%
고척대우 | 16 | 0.1%
sk뷰 | 15 | 0.1%
신내 | 14 | 0.1%
Other values (2169) | 10384 | 96.9%

Most occurring characters

Value | Count | Frequency (%)
 | 2542 | 3.5%
 | 2478 | 3.4%
 | 2300 | 3.2%
 | 1888 | 2.6%
 | 1630 | 2.3%
 | 1566 | 2.2%
 | 1559 | 2.2%
 | 1414 | 2.0%
 | 1404 | 1.9%
 | 1272 | 1.8%
Other values (419) | 54216 | 75.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66369 | 91.8%
Decimal Number | 3411 | 4.7%
Uppercase Letter | 812 | 1.1%
Space Separator | 785 | 1.1%
Lowercase Letter | 361 | 0.5%
Open Punctuation | 138 | 0.2%
Close Punctuation | 138 | 0.2%
Other Punctuation | 124 | 0.2%
Dash Punctuation | 122 | 0.2%
Letter Number | 6 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2542 | 3.8%
 | 2478 | 3.7%
 | 2300 | 3.5%
 | 1888 | 2.8%
 | 1630 | 2.5%
 | 1566 | 2.4%
 | 1559 | 2.3%
 | 1414 | 2.1%
 | 1404 | 2.1%
 | 1272 | 1.9%
Other values (373) | 48316 | 72.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 129 | 15.9%
C | 114 | 14.0%
K | 98 | 12.1%
D | 90 | 11.1%
M | 90 | 11.1%
L | 51 | 6.3%
H | 47 | 5.8%
I | 41 | 5.0%
G | 38 | 4.7%
E | 24 | 3.0%
Other values (7) | 90 | 11.1%

Lowercase Letter
Value | Count | Frequency (%)
e | 192 | 53.2%
l | 44 | 12.2%
i | 29 | 8.0%
v | 25 | 6.9%
k | 21 | 5.8%
c | 20 | 5.5%
s | 16 | 4.4%
h | 5 | 1.4%
w | 5 | 1.4%
a | 2 | 0.6%

Decimal Number
Value | Count | Frequency (%)
2 | 1065 | 31.2%
1 | 1017 | 29.8%
3 | 463 | 13.6%
4 | 220 | 6.4%
5 | 178 | 5.2%
6 | 144 | 4.2%
7 | 99 | 2.9%
9 | 86 | 2.5%
8 | 85 | 2.5%
0 | 54 | 1.6%

Other Punctuation
Value | Count | Frequency (%)
, | 99 | 79.8%
. | 25 | 20.2%

Space Separator
Value | Count | Frequency (%)
 | 785 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 138 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 138 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 122 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 6 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66369 | 91.8%
Common | 4721 | 6.5%
Latin | 1179 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2542 | 3.8%
 | 2478 | 3.7%
 | 2300 | 3.5%
 | 1888 | 2.8%
 | 1630 | 2.5%
 | 1566 | 2.4%
 | 1559 | 2.3%
 | 1414 | 2.1%
 | 1404 | 2.1%
 | 1272 | 1.9%
Other values (373) | 48316 | 72.8%

Latin
Value | Count | Frequency (%)
e | 192 | 16.3%
S | 129 | 10.9%
C | 114 | 9.7%
K | 98 | 8.3%
D | 90 | 7.6%
M | 90 | 7.6%
L | 51 | 4.3%
H | 47 | 4.0%
l | 44 | 3.7%
I | 41 | 3.5%
Other values (19) | 283 | 24.0%

Common
Value | Count | Frequency (%)
2 | 1065 | 22.6%
1 | 1017 | 21.5%
 | 785 | 16.6%
3 | 463 | 9.8%
4 | 220 | 4.7%
5 | 178 | 3.8%
6 | 144 | 3.1%
( | 138 | 2.9%
) | 138 | 2.9%
- | 122 | 2.6%
Other values (7) | 451 | 9.6%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66369 | 91.8%
ASCII | 5894 | 8.2%
Number Forms | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2542 | 3.8%
 | 2478 | 3.7%
 | 2300 | 3.5%
 | 1888 | 2.8%
 | 1630 | 2.5%
 | 1566 | 2.4%
 | 1559 | 2.3%
 | 1414 | 2.1%
 | 1404 | 2.1%
 | 1272 | 1.9%
Other values (373) | 48316 | 72.8%

ASCII
Value | Count | Frequency (%)
2 | 1065 | 18.1%
1 | 1017 | 17.3%
 | 785 | 13.3%
3 | 463 | 7.9%
4 | 220 | 3.7%
e | 192 | 3.3%
5 | 178 | 3.0%
6 | 144 | 2.4%
( | 138 | 2.3%
) | 138 | 2.3%
Other values (35) | 1554 | 26.4%

Number Forms
Value | Count | Frequency (%)
 | 6 | 100.0%

아파트코드
Text

Distinct: 2106
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 98
Unique (%): 1.0%

Sample

1st row: A15280602
2nd row: A10045001
3rd row: A13981903
4th row: A14319201
5th row: A13583403

Value | Count | Frequency (%)
a15279404 | 16 | 0.2%
a12010001 | 12 | 0.1%
a10027817 | 12 | 0.1%
a15106101 | 12 | 0.1%
a12013003 | 12 | 0.1%
a15721005 | 12 | 0.1%
a13086701 | 12 | 0.1%
a13591402 | 11 | 0.1%
a13003202 | 11 | 0.1%
a13527203 | 11 | 0.1%
Other values (2096) | 9879 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18733 | 20.8%
1 | 17706 | 19.7%
A | 10000 | 11.1%
3 | 8953 | 9.9%
2 | 8292 | 9.2%
5 | 6161 | 6.8%
8 | 5491 | 6.1%
7 | 4662 | 5.2%
4 | 3741 | 4.2%
6 | 3475 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18733 | 23.4%
1 | 17706 | 22.1%
3 | 8953 | 11.2%
2 | 8292 | 10.4%
5 | 6161 | 7.7%
8 | 5491 | 6.9%
7 | 4662 | 5.8%
4 | 3741 | 4.7%
6 | 3475 | 4.3%
9 | 2786 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18733 | 23.4%
1 | 17706 | 22.1%
3 | 8953 | 11.2%
2 | 8292 | 10.4%
5 | 6161 | 7.7%
8 | 5491 | 6.9%
7 | 4662 | 5.8%
4 | 3741 | 4.7%
6 | 3475 | 4.3%
9 | 2786 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18733 | 20.8%
1 | 17706 | 19.7%
A | 10000 | 11.1%
3 | 8953 | 9.9%
2 | 8292 | 9.2%
5 | 6161 | 6.8%
8 | 5491 | 6.1%
7 | 4662 | 5.2%
4 | 3741 | 4.2%
6 | 3475 | 3.9%

비용명
Text

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.886
Min length: 2

Characters and Unicode

Total characters: 48860
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 잡수익
2nd row: 고용보험료
3rd row: 소모품비
4th row: 승강기수익
5th row: 잡수익

Value | Count | Frequency (%)
통신비 | 229 | 2.3%
이자수익 | 227 | 2.3%
입주자대표회의운영비 | 224 | 2.2%
급여 | 223 | 2.2%
보험료 | 211 | 2.1%
교육비 | 209 | 2.1%
소독비 | 206 | 2.1%
세대전기료 | 202 | 2.0%
연체료수익 | 202 | 2.0%
산재보험료 | 201 | 2.0%
Other values (75) | 7866 | 78.7%

Most occurring characters

Value | Count | Frequency (%)
 | 5345 | 10.9%
 | 3535 | 7.2%
 | 2104 | 4.3%
 | 2024 | 4.1%
 | 1720 | 3.5%
 | 1213 | 2.5%
 | 1044 | 2.1%
 | 828 | 1.7%
 | 795 | 1.6%
 | 756 | 1.5%
Other values (110) | 29496 | 60.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48860 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5345 | 10.9%
 | 3535 | 7.2%
 | 2104 | 4.3%
 | 2024 | 4.1%
 | 1720 | 3.5%
 | 1213 | 2.5%
 | 1044 | 2.1%
 | 828 | 1.7%
 | 795 | 1.6%
 | 756 | 1.5%
Other values (110) | 29496 | 60.4%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48860 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5345 | 10.9%
 | 3535 | 7.2%
 | 2104 | 4.3%
 | 2024 | 4.1%
 | 1720 | 3.5%
 | 1213 | 2.5%
 | 1044 | 2.1%
 | 828 | 1.7%
 | 795 | 1.6%
 | 756 | 1.5%
Other values (110) | 29496 | 60.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48860 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5345 | 10.9%
 | 3535 | 7.2%
 | 2104 | 4.3%
 | 2024 | 4.1%
 | 1720 | 3.5%
 | 1213 | 2.5%
 | 1044 | 2.1%
 | 828 | 1.7%
 | 795 | 1.6%
 | 756 | 1.5%
Other values (110) | 29496 | 60.4%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202011 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202011
2nd row: 202011
3rd row: 202011
4th row: 202011
5th row: 202011

Common Values

Value | Count | Frequency (%)
202011 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202011 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6606
Distinct (%): 66.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3110485.7
Minimum: -4273300
Maximum: 5.6064406 × 10^8
Zeros: 1661
Zeros (%): 16.6%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4273300
5-th percentile: 0
Q1: 45221
Median: 279460
Q3: 1220132.5
95-th percentile: 15299156
Maximum: 5.6064406 × 10^8
Range: 5.6491736 × 10^8
Interquartile range (IQR): 1174911.5
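
The range and the interquartile range follow directly from the other quantile statistics. A minimal check, using the exact minimum and maximum that appear in this report's extreme-value tables (the variable names are illustrative):

```python
# Quartiles from the quantile statistics; extremes from the value tables.
q1, q3 = 45221, 1220132.5
minimum, maximum = -4273300, 560644058

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # full range of 금액

print(iqr)
print(f"{value_range:.7e}")
```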

Descriptive statistics

Standard deviation: 12664716
Coefficient of variation (CV): 4.0716201
Kurtosis: 619.31299
Mean: 3110485.7
Median Absolute Deviation (MAD): 279460
Skewness: 18.477294
Sum: 3.1104857 × 10^10
Variance: 1.6039504 × 10^14
Monotonicity: Not monotonic
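
Several of the descriptive statistics are derived from the mean, the standard deviation, and the number of observations. The sketch below recomputes them from the rounded figures printed in this report, so the results agree only up to rounding; the variable names are illustrative.

```python
import math

# Rounded figures as printed in this report.
mean = 3110485.7
std = 12664716
n = 10_000

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the squared standard deviation
total = mean * n      # sum of all values

print(f"CV ~ {cv:.5f}")
print(f"Variance ~ {variance:.4e}")
print(f"Sum ~ {total:.4e}")
```

A coefficient of variation above 4, together with the large skewness and kurtosis, points to a heavily right-skewed distribution dominated by a few very large amounts.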
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
0 | 1661 | 16.6%
200000 | 77 | 0.8%
300000 | 77 | 0.8%
100000 | 61 | 0.6%
150000 | 46 | 0.5%
30000 | 35 | 0.4%
600000 | 35 | 0.4%
400000 | 34 | 0.3%
50000 | 34 | 0.3%
250000 | 31 | 0.3%
Other values (6596) | 7909 | 79.1%

Value | Count | Frequency (%)
-4273300 | 1 | < 0.1%
-520040 | 1 | < 0.1%
-280620 | 1 | < 0.1%
-271330 | 1 | < 0.1%
-250454 | 1 | < 0.1%
-229350 | 1 | < 0.1%
-40000 | 1 | < 0.1%
-35750 | 1 | < 0.1%
0 | 1661 | 16.6%
1 | 1 | < 0.1%

Value | Count | Frequency (%)
560644058 | 1 | < 0.1%
475548020 | 1 | < 0.1%
247984940 | 1 | < 0.1%
203537810 | 1 | < 0.1%
190039492 | 1 | < 0.1%
167626810 | 1 | < 0.1%
162659600 | 1 | < 0.1%
159488006 | 1 | < 0.1%
159208010 | 1 | < 0.1%
147102409 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.278
금액 | 0.278 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
85771 | 신개봉삼환 | A15280602 | 잡수익 | 202011 | 315360
10698 | 신당푸르지오 | A10045001 | 고용보험료 | 202011 | 137060
65859 | 상계주공14단지 | A13981903 | 소모품비 | 202011 | 886020
73816 | 현대강변 | A14319201 | 승강기수익 | 202011 | 0
45085 | 대치우성1차아파트 | A13583403 | 잡수익 | 202011 | 1500182
6325 | 용산푸르지오써밋 | A10026759 | 승강기유지비 | 202011 | 1782000
65997 | 상계신동아 | A13982003 | 청소비 | 202011 | 4373800
9229 | 위례아이파크아파트 | A10027744 | 건강보험료 | 202011 | 811450
91459 | 사당삼호그린 | A15609002 | 고용안정사업비용 | 202011 | 380000
65134 | 공릉대아2차 | A13980604 | 부과차익 | 202011 | 1323

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
16652 | 마포동원베네스트 | A12170401 | 이자수익 | 202011 | 0
92266 | 사당신동아4단지 | A15677204 | 장기수선비 | 202011 | 13864680
87690 | 신도림현대 | A15288803 | 부과차손 | 202011 | 0
78826 | 한강아파트 | A15080501 | 통신비 | 202011 | 43270
42782 | 수서까치마을 | A13522007 | 청소비 | 202011 | 14628630
17196 | 서강한화오벨리스크스위트 | A12177801 | 지급수수료 | 202011 | 27600
62614 | 중계양지대림2차 | A13922110 | 산재보험료 | 202011 | 162730
78037 | 양평삼호 | A15010304 | 재활용품수익 | 202011 | 213000
80954 | 신대림신동아파밀리에 | A15095002 | 세대수도료 | 202011 | 3503760
3639 | 신정이든채 | A10025649 | 고용보험료 | 202011 | 107320