Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
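
The average record size can be checked against the total size and the observation count. The sketch below is a minimal illustration, assuming that KiB means 1024 bytes and that the average is the total size divided by the number of observations; the variable names are illustrative and do not come from the report.

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

BYTES_PER_KIB = 1024  # assumption: KiB is the binary unit (1024 bytes)

total_size_bytes = total_size_kib * BYTES_PER_KIB
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

Under these assumptions the result rounds to the 50.0 B shown in the report.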

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "202107" [Constant]
금액 (amount) has 1407 (14.1%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-11 06:54:15.286510
Analysis finished: 2024-05-11 06:54:17.221779
Duration: 1.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
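
The reported duration is the difference between the two timestamps. A minimal sketch, using only the standard library and the timestamps listed in this section:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section above.
started = datetime.fromisoformat("2024-05-11 06:54:15.286510")
finished = datetime.fromisoformat("2024-05-11 06:54:17.221779")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```

The difference is 1.935269 s, which rounds to the 1.94 seconds shown.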

Variables

아파트명 (apartment name)
Text

Distinct: 2140
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2782
Min length: 2

Characters and Unicode

Total characters: 72782
Distinct characters: 428
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
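
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories for the five sample values listed for this variable. It relies on Python's standard `unicodedata` module, which exposes the general category (for example `Lo` for Other Letter, `Nd` for Decimal Number, `Zs` for Space Separator) but not scripts or blocks, so only categories are covered. The helper name `category_counts` is hypothetical and this is not necessarily how the profiling tool computes its figures.

```python
import unicodedata
from collections import Counter

# Sample values copied from the 1st to 5th rows of this variable.
samples = [
    "둔촌역청구아파트",
    "논현동현",
    "상도동중앙하이츠빌아파트",
    "휘경 미소지움아파트",
    "래미안남가좌2차",
]


def category_counts(values):
    """Count characters per Unicode general category (e.g. 'Lo', 'Nd')."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(samples)
for code, n in counts.most_common():
    print(code, n)
```

Hangul syllables fall under `Lo`, which is consistent with Other Letter dominating the category table for this variable.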

Unique

Unique: 102
Unique (%): 1.0%
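
The report lists both a Distinct and a Unique count. The sketch below shows one common reading of the two terms, assuming that "distinct" means the number of different values and "unique" means the number of values that occur exactly once. The toy column is hypothetical and is not taken from the dataset.

```python
from collections import Counter

# Hypothetical toy column, not taken from the dataset.
values = ["A", "B", "B", "C", "C", "C", "D"]

freq = Counter(values)
n_distinct = len(freq)                              # different values present
n_unique = sum(1 for c in freq.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)
print(f"{n_distinct / len(values):.1%}", f"{n_unique / len(values):.1%}")
```

Under that reading, a column with many repeated values has far fewer unique values than distinct ones, which matches the pattern in the figures above.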

Sample

1st row: 둔촌역청구아파트
2nd row: 논현동현
3rd row: 상도동중앙하이츠빌아파트
4th row: 휘경 미소지움아파트
5th row: 래미안남가좌2차

Value | Count | Frequency (%)
아파트 | 186 | 1.7%
래미안 | 31 | 0.3%
아이파크 | 29 | 0.3%
e편한세상 | 20 | 0.2%
sk뷰 | 18 | 0.2%
고덕 | 17 | 0.2%
북한산 | 16 | 0.1%
신반포 | 16 | 0.1%
힐스테이트 | 14 | 0.1%
꿈의숲 | 14 | 0.1%
Other values (2208) | 10409 | 96.6%

Most occurring characters

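Value | Count | Frequency (%)
[missing] | 2621 | 3.6%
[missing] | 2481 | 3.4%
[missing] | 2340 | 3.2%
[missing] | 1859 | 2.6%
[missing] | 1710 | 2.3%
[missing] | 1700 | 2.3%
[missing] | 1493 | 2.1%
[missing] | 1401 | 1.9%
[missing] | 1368 | 1.9%
[missing] | 1302 | 1.8%
Other values (418) | 54507 | 74.9%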

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66677 | 91.6%
Decimal Number | 3559 | 4.9%
Space Separator | 855 | 1.2%
Uppercase Letter | 774 | 1.1%
Lowercase Letter | 295 | 0.4%
Open Punctuation | 177 | 0.2%
Close Punctuation | 177 | 0.2%
Other Punctuation | 139 | 0.2%
Dash Punctuation | 122 | 0.2%
Letter Number | 7 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[missing] | 2621 | 3.9%
[missing] | 2481 | 3.7%
[missing] | 2340 | 3.5%
[missing] | 1859 | 2.8%
[missing] | 1710 | 2.6%
[missing] | 1700 | 2.5%
[missing] | 1493 | 2.2%
[missing] | 1401 | 2.1%
[missing] | 1368 | 2.1%
[missing] | 1302 | 2.0%
Other values (373) | 48402 | 72.6%

Uppercase Letter
Value | Count | Frequency (%)
S | 123 | 15.9%
C | 121 | 15.6%
M | 95 | 12.3%
D | 95 | 12.3%
K | 84 | 10.9%
L | 57 | 7.4%
H | 42 | 5.4%
E | 29 | 3.7%
G | 28 | 3.6%
I | 24 | 3.1%
Other values (7) | 76 | 9.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 198 | 67.1%
s | 19 | 6.4%
i | 17 | 5.8%
k | 17 | 5.8%
l | 14 | 4.7%
v | 11 | 3.7%
w | 9 | 3.1%
c | 4 | 1.4%
h | 4 | 1.4%
g | 1 | 0.3%

Decimal Number
Value | Count | Frequency (%)
2 | 1090 | 30.6%
1 | 1037 | 29.1%
3 | 478 | 13.4%
4 | 263 | 7.4%
5 | 207 | 5.8%
6 | 149 | 4.2%
8 | 99 | 2.8%
7 | 90 | 2.5%
9 | 82 | 2.3%
0 | 64 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 108 | 77.7%
. | 31 | 22.3%

Space Separator
Value | Count | Frequency (%)
(space) | 855 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 177 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 177 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 122 | 100.0%

Letter Number
Value | Count | Frequency (%)
[missing] | 7 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66677 | 91.6%
Common | 5029 | 6.9%
Latin | 1076 | 1.5%

Most frequent character per script

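Hangul
Value | Count | Frequency (%)
[missing] | 2621 | 3.9%
[missing] | 2481 | 3.7%
[missing] | 2340 | 3.5%
[missing] | 1859 | 2.8%
[missing] | 1710 | 2.6%
[missing] | 1700 | 2.5%
[missing] | 1493 | 2.2%
[missing] | 1401 | 2.1%
[missing] | 1368 | 2.1%
[missing] | 1302 | 2.0%
Other values (373) | 48402 | 72.6%

Latin
Value | Count | Frequency (%)
e | 198 | 18.4%
S | 123 | 11.4%
C | 121 | 11.2%
M | 95 | 8.8%
D | 95 | 8.8%
K | 84 | 7.8%
L | 57 | 5.3%
H | 42 | 3.9%
E | 29 | 2.7%
G | 28 | 2.6%
Other values (19) | 204 | 19.0%

Common
Value | Count | Frequency (%)
2 | 1090 | 21.7%
1 | 1037 | 20.6%
(space) | 855 | 17.0%
3 | 478 | 9.5%
4 | 263 | 5.2%
5 | 207 | 4.1%
( | 177 | 3.5%
) | 177 | 3.5%
6 | 149 | 3.0%
- | 122 | 2.4%
Other values (6) | 474 | 9.4%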

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66677 | 91.6%
ASCII | 6098 | 8.4%
Number Forms | 7 | < 0.1%

Most frequent character per block

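Hangul
Value | Count | Frequency (%)
[missing] | 2621 | 3.9%
[missing] | 2481 | 3.7%
[missing] | 2340 | 3.5%
[missing] | 1859 | 2.8%
[missing] | 1710 | 2.6%
[missing] | 1700 | 2.5%
[missing] | 1493 | 2.2%
[missing] | 1401 | 2.1%
[missing] | 1368 | 2.1%
[missing] | 1302 | 2.0%
Other values (373) | 48402 | 72.6%

ASCII
Value | Count | Frequency (%)
2 | 1090 | 17.9%
1 | 1037 | 17.0%
(space) | 855 | 14.0%
3 | 478 | 7.8%
4 | 263 | 4.3%
5 | 207 | 3.4%
e | 198 | 3.2%
( | 177 | 2.9%
) | 177 | 2.9%
6 | 149 | 2.4%
Other values (34) | 1467 | 24.1%

Number Forms
Value | Count | Frequency (%)
[missing] | 7 | 100.0%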

아파트코드 (apartment code)
Text

Distinct: 2145
Distinct (%): 21.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 103
Unique (%): 1.0%

Sample

1st row: A13484501
2nd row: A13582002
3rd row: A15683402
4th row: A13077702
5th row: A12012101

Value | Count | Frequency (%)
a14383205 | 14 | 0.1%
a13790620 | 12 | 0.1%
a10024927 | 12 | 0.1%
a14319012 | 12 | 0.1%
a13606101 | 12 | 0.1%
a13613011 | 12 | 0.1%
a13985402 | 12 | 0.1%
a13981901 | 12 | 0.1%
a13994501 | 12 | 0.1%
a15085404 | 11 | 0.1%
Other values (2135) | 9879 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18784 | 20.9%
1 | 17609 | 19.6%
A | 10000 | 11.1%
3 | 8869 | 9.9%
2 | 8245 | 9.2%
5 | 6104 | 6.8%
8 | 5414 | 6.0%
7 | 4689 | 5.2%
4 | 3962 | 4.4%
6 | 3502 | 3.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18784 | 23.5%
1 | 17609 | 22.0%
3 | 8869 | 11.1%
2 | 8245 | 10.3%
5 | 6104 | 7.6%
8 | 5414 | 6.8%
7 | 4689 | 5.9%
4 | 3962 | 5.0%
6 | 3502 | 4.4%
9 | 2822 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18784 | 23.5%
1 | 17609 | 22.0%
3 | 8869 | 11.1%
2 | 8245 | 10.3%
5 | 6104 | 7.6%
8 | 5414 | 6.8%
7 | 4689 | 5.9%
4 | 3962 | 5.0%
6 | 3502 | 4.4%
9 | 2822 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18784 | 20.9%
1 | 17609 | 19.6%
A | 10000 | 11.1%
3 | 8869 | 9.9%
2 | 8245 | 9.2%
5 | 6104 | 6.8%
8 | 5414 | 6.0%
7 | 4689 | 5.2%
4 | 3962 | 4.4%
6 | 3502 | 3.9%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8503
Min length: 2

Characters and Unicode

Total characters: 48503
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 세금과공과
2nd row: 고용안정사업비용
3rd row: 회계감사비
4th row: 잡수익
5th row: 고용보험료

Value | Count | Frequency (%)
보험료 | 242 | 2.4%
청소비 | 240 | 2.4%
소독비 | 226 | 2.3%
교육비 | 225 | 2.2%
급여 | 223 | 2.2%
이자수익 | 219 | 2.2%
경비비 | 210 | 2.1%
잡수익 | 209 | 2.1%
통신비 | 207 | 2.1%
수선유지비 | 205 | 2.1%
Other values (77) | 7794 | 77.9%

Most occurring characters

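Value | Count | Frequency (%)
[missing] | 5386 | 11.1%
[missing] | 3612 | 7.4%
[missing] | 2122 | 4.4%
[missing] | 2037 | 4.2%
[missing] | 1681 | 3.5%
[missing] | 1291 | 2.7%
[missing] | 1044 | 2.2%
[missing] | 858 | 1.8%
[missing] | 840 | 1.7%
[missing] | 790 | 1.6%
Other values (110) | 28842 | 59.5%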

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48503 | 100.0%

Most frequent character per category

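Other Letter
Value | Count | Frequency (%)
[missing] | 5386 | 11.1%
[missing] | 3612 | 7.4%
[missing] | 2122 | 4.4%
[missing] | 2037 | 4.2%
[missing] | 1681 | 3.5%
[missing] | 1291 | 2.7%
[missing] | 1044 | 2.2%
[missing] | 858 | 1.8%
[missing] | 840 | 1.7%
[missing] | 790 | 1.6%
Other values (110) | 28842 | 59.5%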

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48503 | 100.0%

Most frequent character per script

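Hangul
Value | Count | Frequency (%)
[missing] | 5386 | 11.1%
[missing] | 3612 | 7.4%
[missing] | 2122 | 4.4%
[missing] | 2037 | 4.2%
[missing] | 1681 | 3.5%
[missing] | 1291 | 2.7%
[missing] | 1044 | 2.2%
[missing] | 858 | 1.8%
[missing] | 840 | 1.7%
[missing] | 790 | 1.6%
Other values (110) | 28842 | 59.5%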

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48503 | 100.0%

Most frequent character per block

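Hangul
Value | Count | Frequency (%)
[missing] | 5386 | 11.1%
[missing] | 3612 | 7.4%
[missing] | 2122 | 4.4%
[missing] | 2037 | 4.2%
[missing] | 1681 | 3.5%
[missing] | 1291 | 2.7%
[missing] | 1044 | 2.2%
[missing] | 858 | 1.8%
[missing] | 840 | 1.7%
[missing] | 790 | 1.6%
Other values (110) | 28842 | 59.5%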

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202107 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202107
2nd row: 202107
3rd row: 202107
4th row: 202107
5th row: 202107

Common Values

Value | Count | Frequency (%)
202107 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202107 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 6754
Distinct (%): 67.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3312674.8
Minimum: -20864685
Maximum: 2.4530275 × 10^8
Zeros: 1407
Zeros (%): 14.1%
Negative: 16
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -20864685
5-th percentile: 0
Q1: 58250
Median: 300000
Q3: 1340195
95-th percentile: 15650178
Maximum: 2.4530275 × 10^8
Range: 2.6616743 × 10^8
Interquartile range (IQR): 1281945
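
The range and interquartile range follow directly from the other quantile figures. A minimal sketch of that arithmetic, using the exact maximum (245302748) that the report lists among the largest values of this variable:

```python
# Quantile figures copied from the report.
minimum = -20_864_685
q1 = 58_250
q3 = 1_340_195
maximum = 245_302_748  # exact value listed among the largest values

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(f"Range: {value_range:.7e}")
print(f"IQR:   {iqr}")
```

Both results agree with the Range and IQR shown above.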

Descriptive statistics

Standard deviation: 12170443
Coefficient of variation (CV): 3.6739022
Kurtosis: 114.77281
Mean: 3312674.8
Median Absolute Deviation (MAD): 300000
Skewness: 9.1375607
Sum: 3.3126748 × 10^10
Variance: 1.4811969 × 10^14
Monotonicity: Not monotonic
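
Several of these figures are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the square of the standard deviation, and the sum is the mean times the number of observations. The sketch below recomputes them from the rounded values printed in the report, so small differences in the last digits are expected.

```python
import math

# Rounded figures copied from the report.
mean = 3_312_674.8
std = 12_170_443
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all values

print(f"CV       ≈ {cv:.4f}")
print(f"Variance ≈ {variance:.4e}")
print(f"Sum      ≈ {total:.4e}")

# Agreement with the published values, up to rounding of the inputs.
print(math.isclose(cv, 3.6739022, rel_tol=1e-6))
```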
Histogram with fixed size bins (bins=50)
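
To see what 50 fixed-size bins mean for a distribution this skewed, the sketch below computes the bin width and the bin index of a few values. It assumes equal-width bins spanning the minimum to the maximum, which is a common default but is not confirmed by the report; `bin_index` is a hypothetical helper.

```python
def bin_index(x, lo, hi, bins=50):
    """Return the 0-based fixed-width bin that x falls into on [lo, hi]."""
    if not lo <= x <= hi:
        raise ValueError("x lies outside the histogram range")
    width = (hi - lo) / bins
    # The maximum belongs to the last bin rather than opening a new one.
    return min(int((x - lo) // width), bins - 1)


lo, hi = -20_864_685, 245_302_748  # minimum and maximum from the report
width = (hi - lo) / 50
print(f"bin width ≈ {width:,.0f}")
print(bin_index(0, lo, hi), bin_index(300_000, lo, hi), bin_index(hi, lo, hi))
```

Under these assumptions each bin is roughly 5.3 million wide, so zero and the median (300000) land in the same bin, and most of the mass sits in a handful of bins near the left edge.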

Value | Count | Frequency (%)
0 | 1407 | 14.1%
200000 | 96 | 1.0%
300000 | 64 | 0.6%
100000 | 59 | 0.6%
400000 | 41 | 0.4%
500000 | 40 | 0.4%
30000 | 37 | 0.4%
150000 | 36 | 0.4%
250000 | 35 | 0.4%
110000 | 30 | 0.3%
Other values (6744) | 8155 | 81.5%

Minimum 10 values
Value | Count | Frequency (%)
-20864685 | 1 | < 0.1%
-10027284 | 1 | < 0.1%
-3500000 | 1 | < 0.1%
-2174380 | 1 | < 0.1%
-804091 | 1 | < 0.1%
-742954 | 1 | < 0.1%
-263210 | 1 | < 0.1%
-160010 | 1 | < 0.1%
-155450 | 1 | < 0.1%
-94480 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
245302748 | 1 | < 0.1%
228270539 | 1 | < 0.1%
220124639 | 1 | < 0.1%
196233834 | 1 | < 0.1%
194600410 | 1 | < 0.1%
192129830 | 1 | < 0.1%
189507450 | 1 | < 0.1%
184986623 | 1 | < 0.1%
183577613 | 1 | < 0.1%
180907947 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.648
금액 | 0.648 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
39867 | 둔촌역청구아파트 | A13484501 | 세금과공과 | 202107 | 5355
44706 | 논현동현 | A13582002 | 고용안정사업비용 | 202107 | 900000
91603 | 상도동중앙하이츠빌아파트 | A15683402 | 회계감사비 | 202107 | 82500
24095 | 휘경 미소지움아파트 | A13077702 | 잡수익 | 202107 | 32500
14106 | 래미안남가좌2차 | A12012101 | 고용보험료 | 202107 | 122470
32067 | 방학4단지신동아 | A13285507 | 음식물처리비 | 202107 | 632400
64425 | 상계성림(미라보) | A13980903 | 경비비 | 202107 | 9634880
38304 | 강일리버파크8단지 | A13410002 | 교육비 | 202107 | 45000
85781 | 구로현대연예인 | A15286807 | 승강기유지비 | 202107 | 1210000
48904 | 길음서희스타힐스 | A13613012 | 고용안정사업비용 | 202107 | 236380

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
86182 | 신도림우성3차 | A15288804 | 검침비용 | 202107 | 122120
11545 | 신당남산타운임대 | A10045301 | 도서인쇄비 | 202107 | 550000
65652 | 상계한신 | A13983608 | 검침비용 | 202107 | 488310
78972 | 양평성원 | A15086603 | 사무용품비 | 202107 | 110000
94623 | 마곡수명산파크1단지 | A15728008 | 기타부대비 | 202107 | 945530
8990 | 상도파크자이 아파트 | A10027424 | 통신비 | 202107 | 173370
46363 | 동소문동송산 | A13603401 | 정화조관리비 | 202107 | 321230
87590 | 시흥삼익 | A15383702 | 소독비 | 202107 | 590000
16565 | 망원휴먼빌 | A12123001 | 재활용품수익 | 202107 | 40690
36314 | 성수동아 | A13382106 | 임대료수익 | 202107 | 0