Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
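As a quick sanity check, the average record size follows from the total memory figure divided by the number of observations. The sketch below only redoes that arithmetic with the values reported above; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 488.3      # total size in memory, in KiB
n_rows = 10_000        # number of observations
n_cols = 5             # number of variables

total_bytes = total_kib * 1024            # 1 KiB = 1024 bytes
avg_record_bytes = total_bytes / n_rows   # bytes per row
total_cells = n_rows * n_cols             # cells checked for missing values

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"total cells: {total_cells}")
```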

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202305" (Constant)
금액 has 1147 (11.5%) zeros (Zeros)
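The two alerts above can be reproduced in spirit with a small per-column check. This is a minimal sketch, not the profiler's own logic: the function name `column_alerts` and the 10% zero-share threshold are assumptions made for illustration.

```python
def column_alerts(values, zero_share_threshold=0.10):
    """Return alert labels for one column.

    The threshold is a hypothetical value chosen for this sketch.
    """
    alerts = []
    # A column with exactly one distinct value is constant.
    if len(set(values)) == 1:
        alerts.append("Constant")
    # Only purely numeric columns are checked for zeros.
    numeric = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(values):
        zero_share = sum(1 for v in numeric if v == 0) / len(numeric)
        if zero_share > zero_share_threshold:
            alerts.append("Zeros")
    return alerts


print(column_alerts(["202305"] * 5))
print(column_alerts([0, 0, 5, 7, 9, 1, 2, 3, 4, 6]))
```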

Reproduction

Analysis started: 2024-05-11 06:48:27.258833
Analysis finished: 2024-05-11 06:48:29.035160
Duration: 1.78 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
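A report like this one is typically produced with a few lines of Python. The following is a sketch under assumptions: the CSV path, the output path, the report title, and the choice to read 아파트코드 and 년월일 as strings are all hypothetical, and the configuration actually used is the one in config.json.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so that defining the function does not require
    # pandas or ydata-profiling to be installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: keep code-like columns as strings rather than numbers.
    df = pd.read_csv(csv_path, dtype={"아파트코드": str, "년월일": str})

    report = ProfileReport(df, title="Overview")  # title is hypothetical
    report.to_file(html_path)
    return html_path
```

Calling `build_report("data.csv")` with a real file path would write `report.html` next to the script.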

Variables

아파트명 (apartment name)
Text

Distinct: 2253
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.4617
Min length: 2

Characters and Unicode

Total characters: 74617
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
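The category labels used in this report correspond to Unicode general categories, which Python exposes through the standard `unicodedata` module. The sketch below counts categories for one sample value; the helper name `category_counts` is illustrative. The two-letter codes map to the report's labels as Lo = Other Letter, Nd = Decimal Number, Po = Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count Unicode general categories of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample values of the 아파트명 column.
counts = category_counts("신도림우성1,2차")
for code, n in counts.most_common():
    print(code, n)
```

Script and block properties are not available in `unicodedata`, so reproducing those tables would need an additional data source.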

Unique

Unique: 123
Unique (%): 1.2%
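Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A minimal sketch of that distinction, using a hypothetical helper `distinct_and_unique` and toy data:

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for count in freq.values() if count == 1)
    return distinct, unique


toy = ["청소비", "청소비", "급여", "통신비"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)

# Percentages are taken relative to the number of observations.
print(f"{2253 / 10000:.1%}", f"{123 / 10000:.1%}")
```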

Sample

1st row: 성내현대
2nd row: 신도림우성1,2차
3rd row: 이촌한강맨션
4th row: 고척대우
5th row: 신내중앙하이츠

Value  Count  Frequency (%)
아파트  197  1.8%
래미안  47  0.4%
e편한세상  34  0.3%
아이파크  28  0.3%
sk뷰  26  0.2%
이편한세상  21  0.2%
송파  17  0.2%
길음뉴타운  16  0.1%
경남아너스빌  15  0.1%
고덕  15  0.1%
Other values (2338)  10479  96.2%

Most occurring characters

Value  Count  Frequency (%)
[lost]  2672  3.6%
[lost]  2563  3.4%
[lost]  2501  3.4%
[lost]  1834  2.5%
[lost]  1651  2.2%
[lost]  1644  2.2%
[lost]  1495  2.0%
[lost]  1471  2.0%
[lost]  1426  1.9%
[lost]  1417  1.9%
Other values (425)  55943  75.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  68367  91.6%
Decimal Number  3595  4.8%
Space Separator  979  1.3%
Uppercase Letter  800  1.1%
Lowercase Letter  341  0.5%
Open Punctuation  156  0.2%
Close Punctuation  156  0.2%
Dash Punctuation  118  0.2%
Other Punctuation  101  0.1%
Letter Number  4  < 0.1%

Most frequent character per category

Other Letter

Value  Count  Frequency (%)
[lost]  2672  3.9%
[lost]  2563  3.7%
[lost]  2501  3.7%
[lost]  1834  2.7%
[lost]  1651  2.4%
[lost]  1644  2.4%
[lost]  1495  2.2%
[lost]  1471  2.2%
[lost]  1426  2.1%
[lost]  1417  2.1%
Other values (380)  49693  72.7%

Uppercase Letter

Value  Count  Frequency (%)
C  127  15.9%
S  125  15.6%
D  89  11.1%
K  89  11.1%
M  89  11.1%
L  56  7.0%
H  42  5.2%
I  39  4.9%
G  33  4.1%
E  30  3.8%
Other values (7)  81  10.1%

Lowercase Letter

Value  Count  Frequency (%)
e  204  59.8%
i  22  6.5%
l  22  6.5%
s  21  6.2%
k  21  6.2%
v  14  4.1%
c  14  4.1%
h  7  2.1%
w  6  1.8%
g  5  1.5%

Decimal Number

Value  Count  Frequency (%)
1  1108  30.8%
2  1041  29.0%
3  454  12.6%
4  258  7.2%
5  219  6.1%
6  169  4.7%
9  94  2.6%
8  93  2.6%
7  92  2.6%
0  67  1.9%

Other Punctuation

Value  Count  Frequency (%)
,  75  74.3%
.  26  25.7%

Space Separator

Value  Count  Frequency (%)
(space)  979  100.0%

Open Punctuation

Value  Count  Frequency (%)
(  156  100.0%

Close Punctuation

Value  Count  Frequency (%)
)  156  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-  118  100.0%

Letter Number

Value  Count  Frequency (%)
[lost]  4  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  68367  91.6%
Common  5105  6.8%
Latin  1145  1.5%

Most frequent character per script

Hangul

Value  Count  Frequency (%)
[lost]  2672  3.9%
[lost]  2563  3.7%
[lost]  2501  3.7%
[lost]  1834  2.7%
[lost]  1651  2.4%
[lost]  1644  2.4%
[lost]  1495  2.2%
[lost]  1471  2.2%
[lost]  1426  2.1%
[lost]  1417  2.1%
Other values (380)  49693  72.7%

Latin

Value  Count  Frequency (%)
e  204  17.8%
C  127  11.1%
S  125  10.9%
D  89  7.8%
K  89  7.8%
M  89  7.8%
L  56  4.9%
H  42  3.7%
I  39  3.4%
G  33  2.9%
Other values (19)  252  22.0%

Common

Value  Count  Frequency (%)
1  1108  21.7%
2  1041  20.4%
(space)  979  19.2%
3  454  8.9%
4  258  5.1%
5  219  4.3%
6  169  3.3%
(  156  3.1%
)  156  3.1%
-  118  2.3%
Other values (6)  447  8.8%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  68367  91.6%
ASCII  6246  8.4%
Number Forms  4  < 0.1%

Most frequent character per block

Hangul

Value  Count  Frequency (%)
[lost]  2672  3.9%
[lost]  2563  3.7%
[lost]  2501  3.7%
[lost]  1834  2.7%
[lost]  1651  2.4%
[lost]  1644  2.4%
[lost]  1495  2.2%
[lost]  1471  2.2%
[lost]  1426  2.1%
[lost]  1417  2.1%
Other values (380)  49693  72.7%

ASCII

Value  Count  Frequency (%)
1  1108  17.7%
2  1041  16.7%
(space)  979  15.7%
3  454  7.3%
4  258  4.1%
5  219  3.5%
e  204  3.3%
6  169  2.7%
(  156  2.5%
)  156  2.5%
Other values (34)  1502  24.0%

Number Forms

Value  Count  Frequency (%)
[lost]  4  100.0%

아파트코드 (apartment code)
Text

Distinct: 2257
Distinct (%): 22.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 1.2%

Sample

1st row: A13484003
2nd row: A15288806
3rd row: A14003008
4th row: A15279404
5th row: A13186907

Value  Count  Frequency (%)
a15701003  13  0.1%
a13805002  13  0.1%
a14272309  12  0.1%
a13381902  12  0.1%
a10025850  12  0.1%
a15086006  12  0.1%
a10024473  11  0.1%
a10027716  11  0.1%
a13613006  11  0.1%
a13986009  11  0.1%
Other values (2247)  9882  98.8%

Most occurring characters

ValueCountFrequency (%)
0 18629
20.7%
1 17262
19.2%
A 9992
11.1%
3 8729
9.7%
2 8372
9.3%
5 6215
 
6.9%
8 5463
 
6.1%
7 4744
 
5.3%
4 4030
 
4.5%
6 3502
 
3.9%
Other values (2) 3062
 
3.4%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0  18629  23.3%
1  17262  21.6%
3  8729  10.9%
2  8372  10.5%
5  6215  7.8%
8  5463  6.8%
7  4744  5.9%
4  4030  5.0%
6  3502  4.4%
9  3054  3.8%

Uppercase Letter

Value  Count  Frequency (%)
A  9992  99.9%
B  8  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common

Value  Count  Frequency (%)
0  18629  23.3%
1  17262  21.6%
3  8729  10.9%
2  8372  10.5%
5  6215  7.8%
8  5463  6.8%
7  4744  5.9%
4  4030  5.0%
6  3502  4.4%
9  3054  3.8%

Latin

Value  Count  Frequency (%)
A  9992  99.9%
B  8  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII

Value  Count  Frequency (%)
0  18629  20.7%
1  17262  19.2%
A  9992  11.1%
3  8729  9.7%
2  8372  9.3%
5  6215  6.9%
8  5463  6.1%
7  4744  5.3%
4  4030  4.5%
6  3502  3.9%
Other values (2)  3062  3.4%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7694
Min length: 2

Characters and Unicode

Total characters: 47694
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

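1st row: 세대수도료 (household water charge)
2nd row: 고용보험료 (employment insurance premium)
3rd row: 세대전기료 (household electricity charge)
4th row: 잡비용 (miscellaneous expenses)
5th row: 선거관리위원회운영비 (election committee operating expenses)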
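
Value  Count  Frequency (%)
청소비 (cleaning costs)  263  2.6%
도서인쇄비 (books and printing costs)  239  2.4%
통신비 (communication costs)  237  2.4%
사무용품비 (office supplies costs)  232  2.3%
경비비 (security costs)  231  2.3%
급여 (salaries)  231  2.3%
교육비 (training costs)  228  2.3%
승강기유지비 (elevator maintenance costs)  217  2.2%
수선유지비 (repair and maintenance costs)  215  2.1%
소독비 (disinfection costs)  214  2.1%
Other values (77)  7693  76.9%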

Most occurring characters

Value  Count  Frequency (%)
[lost]  5544  11.6%
[lost]  3445  7.2%
[lost]  2135  4.5%
[lost]  1813  3.8%
[lost]  1379  2.9%
[lost]  1328  2.8%
[lost]  1085  2.3%
[lost]  883  1.9%
[lost]  843  1.8%
[lost]  799  1.7%
Other values (110)  28440  59.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  47694  100.0%

Most frequent character per category

Other Letter

Value  Count  Frequency (%)
[lost]  5544  11.6%
[lost]  3445  7.2%
[lost]  2135  4.5%
[lost]  1813  3.8%
[lost]  1379  2.9%
[lost]  1328  2.8%
[lost]  1085  2.3%
[lost]  883  1.9%
[lost]  843  1.8%
[lost]  799  1.7%
Other values (110)  28440  59.6%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  47694  100.0%

Most frequent character per script

Hangul

Value  Count  Frequency (%)
[lost]  5544  11.6%
[lost]  3445  7.2%
[lost]  2135  4.5%
[lost]  1813  3.8%
[lost]  1379  2.9%
[lost]  1328  2.8%
[lost]  1085  2.3%
[lost]  883  1.9%
[lost]  843  1.8%
[lost]  799  1.7%
Other values (110)  28440  59.6%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  47694  100.0%

Most frequent character per block

Hangul

Value  Count  Frequency (%)
[lost]  5544  11.6%
[lost]  3445  7.2%
[lost]  2135  4.5%
[lost]  1813  3.8%
[lost]  1379  2.9%
[lost]  1328  2.8%
[lost]  1085  2.3%
[lost]  883  1.9%
[lost]  843  1.8%
[lost]  799  1.7%
Other values (110)  28440  59.6%

년월일 (date; values are in YYYYMM form)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202305
2nd row: 202305
3rd row: 202305
4th row: 202305
5th row: 202305

Common Values

Value  Count  Frequency (%)
202305  10000  100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot; 202305 accounts for 10000 rows (100.0%)]

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7195
Distinct (%): 72.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3620207
Minimum: -26542800
Maximum: 2.9259998 × 10^8
Zeros: 1147
Zeros (%): 11.5%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -26542800
5-th percentile: 0
Q1: 80395
Median: 337270
Q3: 1587073.2
95-th percentile: 18157272
Maximum: 2.9259998 × 10^8
Range: 3.1914278 × 10^8
Interquartile range (IQR): 1506678.2
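The range and the interquartile range are simple differences of the figures listed above. The sketch below recomputes them; the exact maximum 292599982 is taken from the list of largest values in this report, and the variable names are illustrative.

```python
# Quantile figures copied from the report.
minimum = -26_542_800
maximum = 292_599_982   # shown rounded as 2.9259998 × 10^8
q1 = 80_395
q3 = 1_587_073.2

value_range = maximum - minimum   # spread between the extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print(f"range: {value_range}")
print(f"IQR: {iqr:.1f}")
```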

Descriptive statistics

Standard deviation: 12433195
Coefficient of variation (CV): 3.434388
Kurtosis: 151.31742
Mean: 3620207
Median Absolute Deviation (MAD): 335618
Skewness: 9.9427495
Sum: 3.620207 × 10^10
Variance: 1.5458435 × 10^14
Monotonicity: Not monotonic
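Several of these figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks those relations using the rounded values printed in the report, so small rounding differences are expected.

```python
# Rounded figures copied from the report.
mean = 3_620_207
std = 12_433_195
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the square of the standard deviation
total = mean * n       # sum of all values

print(f"CV: {cv:.4f}")
print(f"variance: {variance:.4e}")
print(f"sum: {total:.6e}")
```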
[Figure: Histogram with fixed size bins (bins=50)]
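A fixed-size-bin histogram splits the interval from the minimum to the maximum into equally wide bins and counts the values in each. The sketch below is one plain way to do that; the function name `fixed_width_histogram` is hypothetical and the plotting library's own binning may treat edges differently.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equally wide intervals between min and max."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum would land one past the end, so clamp it to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


print(fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5))
```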

Common Values

Value  Count  Frequency (%)
0  1147  11.5%
200000  92  0.9%
300000  62  0.6%
100000  55  0.5%
150000  43  0.4%
400000  42  0.4%
500000  40  0.4%
30000  34  0.3%
50000  33  0.3%
250000  27  0.3%
Other values (7185)  8425  84.2%

Minimum 10 values

Value  Count  Frequency (%)
-26542800  1  < 0.1%
-9000540  1  < 0.1%
-3961528  1  < 0.1%
-1075520  1  < 0.1%
-137290  1  < 0.1%
-23000  1  < 0.1%
-12700  1  < 0.1%
-259  1  < 0.1%
0  1147  11.5%
1  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
292599982  1  < 0.1%
277574436  1  < 0.1%
276955733  1  < 0.1%
247582908  1  < 0.1%
221663007  1  < 0.1%
213704770  1  < 0.1%
182875390  1  < 0.1%
178284700  1  < 0.1%
177965494  1  < 0.1%
174367700  1  < 0.1%

Interactions

[Figure: interactions plot]

Correlations

        비용명  금액
비용명  1.000  0.640
금액  0.640  1.000
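The report does not state here which association measure produced 0.640. For a pair made of a categorical and a numeric column, one common choice is Cramér's V computed after the numeric column has been binned. The sketch below shows the uncorrected statistic on data that is already discrete; the function name and the toy data are assumptions, and the result is not expected to reproduce the report's value.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: cost item versus an already binned amount.
items = ["급여", "급여", "청소비", "청소비"]
amount_bins = ["high", "high", "low", "low"]
print(cramers_v(items, amount_bins))
```

A value of 1 indicates that one column fully determines the other, and 0 indicates no association in the table.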

Missing values

[Figure: nullity by column] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
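Counting missing entries per column is the computation behind a nullity chart. A minimal sketch, assuming rows are dictionaries and that a missing cell is represented by `None` or an absent key (both are assumptions made for illustration):

```python
def nullity_by_column(rows, columns):
    """Return the number of missing cells for each column."""
    return {
        col: sum(1 for row in rows if row.get(col) is None)
        for col in columns
    }


# Toy rows; the second one has a missing 금액 on purpose.
rows = [
    {"아파트코드": "A13484003", "금액": 4850820},
    {"아파트코드": "A15288806", "금액": None},
]
print(nullity_by_column(rows, ["아파트코드", "금액"]))
```

For the dataset profiled here every count would be 0, since the report lists no missing cells.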

Sample

(index)  아파트명  아파트코드  비용명  년월일  금액
39761  성내현대  A13484003  세대수도료  202305  4850820
83087  신도림우성1,2차  A15288806  고용보험료  202305  122870
66803  이촌한강맨션  A14003008  세대전기료  202305  38436150
81214  고척대우  A15279404  잡비용  202305  480000
29234  신내중앙하이츠  A13186907  선거관리위원회운영비  202305  0
53111  서초현대4차  A13788201  장기수선비  202305  7011799
93601  화곡대림아파트  A15788302  부과차익  202305  2552
74491  양평거성파스텔  A15010306  세대전기료  202305  26614756
92495  방화3차우림필유  A15785002  검침수익  202305  80410
21645  북한산힐스테이트3차  A12204004  복리후생비  202305  400000

(index)  아파트명  아파트코드  비용명  년월일  금액
94580  경남아너스빌  A15807001  주차장수익  202305  2038770
33221  창동주공18단지  A13290105  잡비용  202305  550000
47177  정릉힐스테이트  A13610103  청소비  202305  6324670
56181  잠실리센츠  A13822003  감가상각비  202305  584150
87830  대방2차현대  A15681104  입주자대표회의운영비  202305  795000
96596  목동현대1차  A15882008  연체료수익  202305  14330
49865  돈암힐스테이트  A13679902  선거관리위원회운영비  202305  0
79271  구로두산위브  A15205305  이자수익  202305  0
30053  대상타운현대  A13202002  고용보험료  202305  393970
75727  문래국화  A15083601  감가상각비  202305  407000