Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
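
The average record size follows from the two figures above it: total size in memory divided by the number of observations. A minimal check, assuming the report's KiB means 1024 bytes:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```

The result rounds to the 50.0 B shown in the report.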

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (year-month) has constant value "202302" [Constant]
금액 (amount) is highly skewed (γ1 = 20.10608385) [Skewed]
금액 (amount) has 603 (6.0%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-11 06:47:44.079817
Analysis finished: 2024-05-11 06:47:45.828188
Duration: 1.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
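
A report like this one could be regenerated along the following lines. This is only a sketch: the CSV path is hypothetical, the 10000-row random sample and the text-first column handling are assumptions suggested by the report rather than stated in it, and the source file may need an explicit `encoding` argument. Only `ProfileReport(...)` and `to_file(...)` are taken from the ydata-profiling API.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV export and write an HTML report (sketch)."""
    # Imported lazily so this module loads even without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read every column as text so codes such as 년월일 keep their original
    # form, then convert the amount column (금액) to numbers.
    df = pd.read_csv(csv_path, dtype=str)
    df["금액"] = pd.to_numeric(df["금액"])

    # Assumption: the report profiled a 10000-row random sample.
    sample = df.sample(n=min(10_000, len(df)), random_state=0)

    report = ProfileReport(sample, title="Profiling report")
    report.to_file(output_path)
    return output_path
```

Calling `build_report("expenses.csv")` (a hypothetical file name) would write `report.html` next to the script.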

Variables

아파트명 (apartment name)
Text

Distinct: 2237
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.3892
Min length: 2

Characters and Unicode

Total characters: 73892
Distinct characters: 432
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
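
As an illustration of the idea, the general category of each character can be looked up with Python's standard `unicodedata` module and tallied. The sketch below uses the five sample values of this variable; the mapping from category codes to the names used in this report is written out by hand, and scripts and blocks are not covered because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Sample values of the 아파트명 variable, copied from this report.
values = ["휘경동양1.2차", "우장산한화꿈에그린", "정릉스카이쌍용",
          "이문대우1차", "하계극동건영벽산"]

# Readable names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
}

def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for s in strings:
        for ch in s:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(values)
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name}: {n} ({n / total:.1%})")
```

Dividing the total character count by the number of values gives the mean length in the same way the report does (73892 / 10000 = 7.3892 for the full variable).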

Unique

Unique: 138
Unique (%): 1.4%

Sample

1st row: 휘경동양1.2차
2nd row: 우장산한화꿈에그린
3rd row: 정릉스카이쌍용
4th row: 이문대우1차
5th row: 하계극동건영벽산

| Value | Count | Frequency (%) |
|---|---|---|
| 아파트 | 198 | 1.8% |
| 래미안 | 57 | 0.5% |
| 아이파크 | 25 | 0.2% |
| e편한세상 | 22 | 0.2% |
| 이편한세상 | 20 | 0.2% |
| sk뷰 | 19 | 0.2% |
| 해모로 | 16 | 0.1% |
| 마포 | 15 | 0.1% |
| 센트럴 | 13 | 0.1% |
| 휘경 | 13 | 0.1% |
| Other values (2323) | 10512 | 96.4% |

Most occurring characters

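| Value | Count | Frequency (%) |
|---|---|---|
|  | 2590 | 3.5% |
|  | 2550 | 3.5% |
|  | 2437 | 3.3% |
|  | 1840 | 2.5% |
|  | 1686 | 2.3% |
|  | 1579 | 2.1% |
|  | 1467 | 2.0% |
|  | 1453 | 2.0% |
|  | 1424 | 1.9% |
|  | 1407 | 1.9% |
| Other values (422) | 55459 | 75.1% |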

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 67625 | 91.5% |
| Decimal Number | 3613 | 4.9% |
| Space Separator | 1010 | 1.4% |
| Uppercase Letter | 849 | 1.1% |
| Lowercase Letter | 287 | 0.4% |
| Dash Punctuation | 135 | 0.2% |
| Open Punctuation | 132 | 0.2% |
| Close Punctuation | 132 | 0.2% |
| Other Punctuation | 106 | 0.1% |
| Letter Number | 3 | < 0.1% |

Most frequent character per category

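Other Letter

| Value | Count | Frequency (%) |
|---|---|---|
|  | 2590 | 3.8% |
|  | 2550 | 3.8% |
|  | 2437 | 3.6% |
|  | 1840 | 2.7% |
|  | 1686 | 2.5% |
|  | 1579 | 2.3% |
|  | 1467 | 2.2% |
|  | 1453 | 2.1% |
|  | 1424 | 2.1% |
|  | 1407 | 2.1% |
| Other values (377) | 49192 | 72.7% |

Uppercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| S | 131 | 15.4% |
| C | 116 | 13.7% |
| K | 101 | 11.9% |
| D | 82 | 9.7% |
| M | 82 | 9.7% |
| L | 57 | 6.7% |
| I | 50 | 5.9% |
| H | 45 | 5.3% |
| E | 40 | 4.7% |
| G | 30 | 3.5% |
| Other values (7) | 115 | 13.5% |

Lowercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| e | 169 | 58.9% |
| l | 26 | 9.1% |
| i | 23 | 8.0% |
| s | 17 | 5.9% |
| v | 15 | 5.2% |
| k | 13 | 4.5% |
| w | 9 | 3.1% |
| h | 7 | 2.4% |
| c | 6 | 2.1% |
| g | 1 | 0.3% |

Decimal Number

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1069 | 29.6% |
| 1 | 1069 | 29.6% |
| 3 | 483 | 13.4% |
| 4 | 257 | 7.1% |
| 5 | 198 | 5.5% |
| 6 | 156 | 4.3% |
| 7 | 104 | 2.9% |
| 9 | 103 | 2.9% |
| 8 | 99 | 2.7% |
| 0 | 75 | 2.1% |

Other Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| , | 86 | 81.1% |
| . | 20 | 18.9% |

Space Separator

| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1010 | 100.0% |

Dash Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| - | 135 | 100.0% |

Open Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| ( | 132 | 100.0% |

Close Punctuation

| Value | Count | Frequency (%) |
|---|---|---|
| ) | 132 | 100.0% |

Letter Number

| Value | Count | Frequency (%) |
|---|---|---|
|  | 3 | 100.0% |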

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 67625 | 91.5% |
| Common | 5128 | 6.9% |
| Latin | 1139 | 1.5% |

Most frequent character per script

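Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 2590 | 3.8% |
|  | 2550 | 3.8% |
|  | 2437 | 3.6% |
|  | 1840 | 2.7% |
|  | 1686 | 2.5% |
|  | 1579 | 2.3% |
|  | 1467 | 2.2% |
|  | 1453 | 2.1% |
|  | 1424 | 2.1% |
|  | 1407 | 2.1% |
| Other values (377) | 49192 | 72.7% |

Latin

| Value | Count | Frequency (%) |
|---|---|---|
| e | 169 | 14.8% |
| S | 131 | 11.5% |
| C | 116 | 10.2% |
| K | 101 | 8.9% |
| D | 82 | 7.2% |
| M | 82 | 7.2% |
| L | 57 | 5.0% |
| I | 50 | 4.4% |
| H | 45 | 4.0% |
| E | 40 | 3.5% |
| Other values (19) | 266 | 23.4% |

Common

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1069 | 20.8% |
| 1 | 1069 | 20.8% |
| (space) | 1010 | 19.7% |
| 3 | 483 | 9.4% |
| 4 | 257 | 5.0% |
| 5 | 198 | 3.9% |
| 6 | 156 | 3.0% |
| - | 135 | 2.6% |
| ( | 132 | 2.6% |
| ) | 132 | 2.6% |
| Other values (6) | 487 | 9.5% |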

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 67625 | 91.5% |
| ASCII | 6264 | 8.5% |
| Number Forms | 3 | < 0.1% |

Most frequent character per block

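Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 2590 | 3.8% |
|  | 2550 | 3.8% |
|  | 2437 | 3.6% |
|  | 1840 | 2.7% |
|  | 1686 | 2.5% |
|  | 1579 | 2.3% |
|  | 1467 | 2.2% |
|  | 1453 | 2.1% |
|  | 1424 | 2.1% |
|  | 1407 | 2.1% |
| Other values (377) | 49192 | 72.7% |

ASCII

| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1069 | 17.1% |
| 1 | 1069 | 17.1% |
| (space) | 1010 | 16.1% |
| 3 | 483 | 7.7% |
| 4 | 257 | 4.1% |
| 5 | 198 | 3.2% |
| e | 169 | 2.7% |
| 6 | 156 | 2.5% |
| - | 135 | 2.2% |
| ( | 132 | 2.1% |
| Other values (34) | 1586 | 25.3% |

Number Forms

| Value | Count | Frequency (%) |
|---|---|---|
|  | 3 | 100.0% |
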
아파트코드 (apartment code)
Text

Distinct: 2242
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 139
Unique (%): 1.4%

Sample

1st row: A13009001
2nd row: A15701004
3rd row: A13676504
4th row: A13082702
5th row: A13987306

| Value | Count | Frequency (%) |
|---|---|---|
| a15086601 | 13 | 0.1% |
| a12119006 | 13 | 0.1% |
| a13885102 | 12 | 0.1% |
| a13790703 | 12 | 0.1% |
| a13920506 | 12 | 0.1% |
| a15009602 | 12 | 0.1% |
| a15678103 | 12 | 0.1% |
| a15086006 | 11 | 0.1% |
| a13508012 | 11 | 0.1% |
| a13285406 | 11 | 0.1% |
| Other values (2232) | 9881 | 98.8% |

Most occurring characters

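| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18684 | 20.8% |
| 1 | 17351 | 19.3% |
| A | 9990 | 11.1% |
| 3 | 8855 | 9.8% |
| 2 | 8357 | 9.3% |
| 5 | 6217 | 6.9% |
| 8 | 5518 | 6.1% |
| 7 | 4578 | 5.1% |
| 4 | 4107 | 4.6% |
| 6 | 3410 | 3.8% |
| Other values (2) | 2933 | 3.3% |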

Most occurring categories

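| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

Decimal Number

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18684 | 23.4% |
| 1 | 17351 | 21.7% |
| 3 | 8855 | 11.1% |
| 2 | 8357 | 10.4% |
| 5 | 6217 | 7.8% |
| 8 | 5518 | 6.9% |
| 7 | 4578 | 5.7% |
| 4 | 4107 | 5.1% |
| 6 | 3410 | 4.3% |
| 9 | 2923 | 3.7% |

Uppercase Letter

| Value | Count | Frequency (%) |
|---|---|---|
| A | 9990 | 99.9% |
| B | 10 | 0.1% |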

Most occurring scripts

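| Value | Count | Frequency (%) |
|---|---|---|
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

Common

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18684 | 23.4% |
| 1 | 17351 | 21.7% |
| 3 | 8855 | 11.1% |
| 2 | 8357 | 10.4% |
| 5 | 6217 | 7.8% |
| 8 | 5518 | 6.9% |
| 7 | 4578 | 5.7% |
| 4 | 4107 | 5.1% |
| 6 | 3410 | 4.3% |
| 9 | 2923 | 3.7% |

Latin

| Value | Count | Frequency (%) |
|---|---|---|
| A | 9990 | 99.9% |
| B | 10 | 0.1% |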

Most occurring blocks

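| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII

| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18684 | 20.8% |
| 1 | 17351 | 19.3% |
| A | 9990 | 11.1% |
| 3 | 8855 | 9.8% |
| 2 | 8357 | 9.3% |
| 5 | 6217 | 6.9% |
| 8 | 5518 | 6.1% |
| 7 | 4578 | 5.1% |
| 4 | 4107 | 4.6% |
| 6 | 3410 | 3.8% |
| Other values (2) | 2933 | 3.3% |
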
비용명 (expense item)
Text

Distinct: 84
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7308
Min length: 2

Characters and Unicode

Total characters: 47308
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

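1st row: 보험료 (insurance premium)
2nd row: 승강기유지비 (elevator maintenance fee)
3rd row: 업무추진비 (business promotion expense)
4th row: 정화조관리비 (septic tank maintenance fee)
5th row: 고용안정사업수익 (employment stabilization program income)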
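
| Value | Count | Frequency (%) |
|---|---|---|
| 승강기유지비 (elevator maintenance fee) | 266 | 2.7% |
| 청소비 (cleaning fee) | 258 | 2.6% |
| 사무용품비 (office supplies expense) | 254 | 2.5% |
| 세대전기료 (household electricity charge) | 249 | 2.5% |
| 보험료 (insurance premium) | 249 | 2.5% |
| 퇴직급여 (retirement benefits) | 241 | 2.4% |
| 경비비 (security expense) | 240 | 2.4% |
| 위탁관리수수료 (outsourced management fee) | 239 | 2.4% |
| 복리후생비 (employee welfare expense) | 233 | 2.3% |
| 통신비 (communications expense) | 232 | 2.3% |
| Other values (74) | 7539 | 75.4% |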

Most occurring characters

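| Value | Count | Frequency (%) |
|---|---|---|
|  | 5411 | 11.4% |
|  | 3477 | 7.3% |
|  | 2318 | 4.9% |
|  | 1756 | 3.7% |
|  | 1380 | 2.9% |
|  | 1297 | 2.7% |
|  | 1137 | 2.4% |
|  | 981 | 2.1% |
|  | 930 | 2.0% |
|  | 884 | 1.9% |
| Other values (110) | 27737 | 58.6% |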

Most occurring categories

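| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 47308 | 100.0% |

Most frequent character per category

Other Letter

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5411 | 11.4% |
|  | 3477 | 7.3% |
|  | 2318 | 4.9% |
|  | 1756 | 3.7% |
|  | 1380 | 2.9% |
|  | 1297 | 2.7% |
|  | 1137 | 2.4% |
|  | 981 | 2.1% |
|  | 930 | 2.0% |
|  | 884 | 1.9% |
| Other values (110) | 27737 | 58.6% |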

Most occurring scripts

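| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 47308 | 100.0% |

Most frequent character per script

Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5411 | 11.4% |
|  | 3477 | 7.3% |
|  | 2318 | 4.9% |
|  | 1756 | 3.7% |
|  | 1380 | 2.9% |
|  | 1297 | 2.7% |
|  | 1137 | 2.4% |
|  | 981 | 2.1% |
|  | 930 | 2.0% |
|  | 884 | 1.9% |
| Other values (110) | 27737 | 58.6% |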

Most occurring blocks

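| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 47308 | 100.0% |

Most frequent character per block

Hangul

| Value | Count | Frequency (%) |
|---|---|---|
|  | 5411 | 11.4% |
|  | 3477 | 7.3% |
|  | 2318 | 4.9% |
|  | 1756 | 3.7% |
|  | 1380 | 2.9% |
|  | 1297 | 2.7% |
|  | 1137 | 2.4% |
|  | 981 | 2.1% |
|  | 930 | 2.0% |
|  | 884 | 1.9% |
| Other values (110) | 27737 | 58.6% |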

년월일 (year-month)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202302: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202302
2nd row: 202302
3rd row: 202302
4th row: 202302
5th row: 202302

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 202302 | 10000 | 100.0% |

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: bar chart of common values; 202302 accounts for all 10000 rows (100.0%)]

금액 (amount)
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 7506
Distinct (%): 75.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4498065.3
Minimum: -1641320
Maximum: 8.1540053 × 10⁸
Zeros: 603
Zeros (%): 6.0%
Negative: 7
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1641320
5-th percentile: 0
Q1: 112880
Median: 388000
Q3: 1685415
95-th percentile: 21549663
Maximum: 8.1540053 × 10⁸
Range: 8.1704185 × 10⁸
Interquartile range (IQR): 1572535
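
The range and the interquartile range can be recomputed from the quantiles listed here. A short check, using only figures from this report (the maximum is taken in its unrounded form, 815400530, from the largest value recorded for this variable):

```python
# Quantile statistics copied from the report.
minimum = -1_641_320
q1 = 112_880
q3 = 1_685_415
maximum = 815_400_530  # shown in the report as 8.1540053 × 10^8

# Interquartile range: spread of the middle 50% of the values.
iqr = q3 - q1
# Range: distance between the smallest and largest value.
value_range = maximum - minimum

print(iqr)          # 1572535
print(value_range)  # 817041850
```

The range is more than 500 times the IQR, which is consistent with the skewness alert for this variable.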

Descriptive statistics

Standard deviation: 19118615
Coefficient of variation (CV): 4.2504085
Kurtosis: 696.6956
Mean: 4498065.3
Median Absolute Deviation (MAD): 358000
Skewness: 20.106084
Sum: 4.4980653 × 10¹⁰
Variance: 3.6552144 × 10¹⁴
Monotonicity: Not monotonic
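
Several of these statistics are related by simple formulas: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below checks both against the reported figures and adds a small skewness function. The function implements the adjusted Fisher-Pearson coefficient, which is what pandas' `Series.skew` computes; that the report used exactly this estimator is an assumption, and the two toy lists are hypothetical data rather than values from the dataset.

```python
import math

def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no asymmetry
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Figures copied from the descriptive statistics above.
std = 19_118_615
mean = 4_498_065.3

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance as the squared standard deviation

print(round(cv, 4))
print(f"{variance:.4e}")

# Hypothetical toy data: one symmetric list, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [0, 0, 1, 1, 2, 100]
print(sample_skewness(symmetric))
print(sample_skewness(right_tailed) > 0)
```

A large positive value, such as the 20.1 reported here, indicates a long right tail: a few very large amounts far above the median.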
Histogram with fixed size bins (bins=50)
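
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 603 | 6.0% |
| 200000 | 104 | 1.0% |
| 100000 | 87 | 0.9% |
| 300000 | 70 | 0.7% |
| 150000 | 47 | 0.5% |
| 78000 | 41 | 0.4% |
| 50000 | 39 | 0.4% |
| 110000 | 31 | 0.3% |
| 500000 | 27 | 0.3% |
| 250000 | 27 | 0.3% |
| Other values (7496) | 8924 | 89.2% |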
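
Minimum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| -1641320 | 1 | < 0.1% |
| -549170 | 1 | < 0.1% |
| -276920 | 1 | < 0.1% |
| -269110 | 1 | < 0.1% |
| -240000 | 1 | < 0.1% |
| -173830 | 1 | < 0.1% |
| -104 | 1 | < 0.1% |
| 0 | 603 | 6.0% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |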
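
Maximum 10 values

| Value | Count | Frequency (%) |
|---|---|---|
| 815400530 | 1 | < 0.1% |
| 811006690 | 1 | < 0.1% |
| 375066224 | 1 | < 0.1% |
| 320390497 | 1 | < 0.1% |
| 306598670 | 1 | < 0.1% |
| 299683120 | 1 | < 0.1% |
| 279442580 | 1 | < 0.1% |
| 238206490 | 1 | < 0.1% |
| 224800800 | 1 | < 0.1% |
| 214550060 | 1 | < 0.1% |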

Interactions


Correlations

|  | 비용명 | 금액 |
|---|---|---|
| 비용명 | 1.000 | 0.389 |
| 금액 | 0.389 | 1.000 |

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

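|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 22527 | 휘경동양1.2차 | A13009001 | 보험료 | 202302 | 748730 |
| 82777 | 우장산한화꿈에그린 | A15701004 | 승강기유지비 | 202302 | 726000 |
| 45942 | 정릉스카이쌍용 | A13676504 | 업무추진비 | 202302 | 300000 |
| 23732 | 이문대우1차 | A13082702 | 정화조관리비 | 202302 | 516800 |
| 61569 | 하계극동건영벽산 | A13987306 | 고용안정사업수익 | 202302 | 0 |
| 46069 | 정릉1차e-편한세상 | A13676703 | 세대수도료 | 202302 | 15196910 |
| 16787 | 서강GS | A12114001 | 통신비 | 202302 | 226380 |
| 61376 | 하계1차청구아파트 | A13987205 | 회계감사비 | 202302 | 90750 |
| 83093 | 등촌IPARK | A15703204 | 연차수당 | 202302 | 1772070 |
| 12755 | 신당약수하이츠 | A10045404 | 사무용품비 | 202302 | 184400 |

|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 5724 | 항동하버라인2단지 | A10025387 | 소독비 | 202302 | 283500 |
| 92029 | 은평뉴타운우물골6단지 | A41279917 | 충당부채전입이자비용 | 202302 | 0 |
| 77593 | 신도림현대 | A15288803 | 입주자대표회의운영비 | 202302 | 250000 |
| 38905 | 강남신동아파밀리에2단지 | A13519002 | 음식물처리비 | 202302 | 403410 |
| 7466 | 목동롯데캐슬 마에스트로 | A10026023 | 제수당 | 202302 | 2408020 |
| 39378 | 수서삼익 | A13522003 | 음식물처리비 | 202302 | 576900 |
| 63826 | 시티파크2단지 | A14088201 | 세대난방비 | 202302 | 24014800 |
| 9094 | 서초푸르지오써밋 | A10026941 | 급여 | 202302 | 34527150 |
| 71603 | 양평성원 | A15086603 | 국민연금 | 202302 | 237460 |
| 21001 | DMC자이1단지 | A12275501 | 건강보험료 | 202302 | 431120 |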