Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202106" (Constant)
금액 has 2242 (22.4%) zeros (Zeros)
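
The two alerts above can be re-derived from raw column values. The following is a minimal sketch, not the ydata-profiling implementation: the helper name `column_alerts` and the 10% zero threshold are assumptions made for illustration, and the toy columns only mimic the counts reported here.

```python
from collections import Counter


def column_alerts(values, zero_threshold=0.10):
    """Return alert strings for one column (hypothetical helper, not a library API)."""
    alerts = []
    n = len(values)
    counts = Counter(values)
    # A column with exactly one distinct value is flagged as constant.
    if len(counts) == 1:
        (only_value,) = counts
        alerts.append(f'constant value "{only_value}"')
    # Count numeric zeros; strings such as "202106" are never counted.
    zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
    # The threshold is an assumption chosen for this sketch.
    if n and zeros / n > zero_threshold:
        alerts.append(f"{zeros} ({zeros / n:.1%}) zeros")
    return alerts


# Toy columns shaped like the report: a constant 년월일 and a 금액 with 2242 zeros in 10000 rows.
year_month = ["202106"] * 10000
amount = [0] * 2242 + list(range(1, 7759))

print(column_alerts(year_month))
print(column_alerts(amount))
```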

Reproduction

Analysis started: 2024-05-11 05:59:01.913752
Analysis finished: 2024-05-11 05:59:03.169952
Duration: 1.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
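
A report of this kind could be regenerated along the following lines. This is a sketch under stated assumptions: the function name `build_profile`, the CSV path, the encoding, the report title, and the choice to read 아파트코드 and 년월일 as strings are all illustrative and are not taken from the original run or its config.json.

```python
def build_profile(csv_path, html_path="profile.html", encoding="utf-8"):
    """Profile a CSV export with ydata-profiling and write an HTML report."""
    # Imported lazily so this module loads even without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: code-like columns are read as strings so they are not parsed as numbers.
    df = pd.read_csv(
        csv_path,
        encoding=encoding,
        dtype={"아파트코드": str, "년월일": str},
    )
    # The title is a placeholder chosen for this sketch.
    report = ProfileReport(df, title="Profile of OA-15820")
    report.to_file(html_path)
    return html_path
```

Calling `build_profile("data.csv")` with a local copy of the dataset (the file name is hypothetical) would write `profile.html` next to the script.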

Variables

아파트명 (apartment name)
Text

Distinct: 2224
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2941
Min length: 2

Characters and Unicode

Total characters: 72941
Distinct characters: 437
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
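
As an illustration of these character properties, the sketch below counts the Unicode general category of each character in one sample value using Python's standard `unicodedata` module. It is not how the report itself was computed; the helper name `category_counts` is an assumption, and the standard library exposes categories only (scripts and blocks would need another source).

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "천왕이펜하우스1단지"
counts = category_counts(sample)
# 'Lo' is Other Letter (Hangul syllables fall here); 'Nd' is Decimal Number.
print(dict(counts))
```

The two-letter codes map to the category names used in the tables: `Lo` Other Letter, `Nd` Decimal Number, `Lu` Uppercase Letter, `Ll` Lowercase Letter, `Zs` Space Separator, `Ps` and `Pe` Open and Close Punctuation, `Pd` Dash Punctuation, `Po` Other Punctuation, `Nl` Letter Number.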

Unique

Unique: 136
Unique (%): 1.4%

Sample

1st row: 명수대현대
2nd row: 천왕이펜하우스1단지
3rd row: 청계한신휴플러스
4th row: 신길삼환
5th row: 개포7차우성
Value | Count | Frequency (%)
아파트 | 139 | 1.3%
래미안 | 35 | 0.3%
e편한세상 | 20 | 0.2%
경남아너스빌 | 15 | 0.1%
마포 | 15 | 0.1%
아이파크 | 15 | 0.1%
신반포 | 15 | 0.1%
은평뉴타운상림마을6단지 | 15 | 0.1%
고덕 | 13 | 0.1%
sk뷰 | 13 | 0.1%
Other values (2291) | 10316 | 97.2%

Most occurring characters

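Value | Count | Frequency (%)
(missing) | 2404 | 3.3%
(missing) | 2361 | 3.2%
(missing) | 2240 | 3.1%
(missing) | 1878 | 2.6%
(missing) | 1811 | 2.5%
(missing) | 1692 | 2.3%
(missing) | 1500 | 2.1%
(missing) | 1468 | 2.0%
(missing) | 1463 | 2.0%
(missing) | 1287 | 1.8%
Other values (427) | 54837 | 75.2%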

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66866 | 91.7%
Decimal Number | 3719 | 5.1%
Uppercase Letter | 747 | 1.0%
Space Separator | 681 | 0.9%
Lowercase Letter | 355 | 0.5%
Close Punctuation | 155 | 0.2%
Open Punctuation | 155 | 0.2%
Dash Punctuation | 144 | 0.2%
Other Punctuation | 113 | 0.2%
Letter Number | 6 | < 0.1%

Most frequent character per category

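Other Letter
Value | Count | Frequency (%)
(missing) | 2404 | 3.6%
(missing) | 2361 | 3.5%
(missing) | 2240 | 3.3%
(missing) | 1878 | 2.8%
(missing) | 1811 | 2.7%
(missing) | 1692 | 2.5%
(missing) | 1500 | 2.2%
(missing) | 1468 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1287 | 1.9%
Other values (382) | 48762 | 72.9%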
Uppercase Letter
Value | Count | Frequency (%)
S | 125 | 16.7%
C | 102 | 13.7%
K | 92 | 12.3%
D | 68 | 9.1%
M | 68 | 9.1%
L | 54 | 7.2%
H | 48 | 6.4%
G | 33 | 4.4%
E | 32 | 4.3%
I | 30 | 4.0%
Other values (7) | 95 | 12.7%
Lowercase Letter
Value | Count | Frequency (%)
e | 201 | 56.6%
l | 30 | 8.5%
i | 27 | 7.6%
k | 21 | 5.9%
s | 20 | 5.6%
v | 18 | 5.1%
c | 16 | 4.5%
w | 9 | 2.5%
h | 7 | 2.0%
a | 3 | 0.8%
Decimal Number
Value | Count | Frequency (%)
1 | 1106 | 29.7%
2 | 1094 | 29.4%
3 | 475 | 12.8%
4 | 284 | 7.6%
5 | 210 | 5.6%
6 | 148 | 4.0%
7 | 132 | 3.5%
8 | 98 | 2.6%
9 | 94 | 2.5%
0 | 78 | 2.1%
Other Punctuation
Value | Count | Frequency (%)
, | 90 | 79.6%
. | 23 | 20.4%
Space Separator
Value | Count | Frequency (%)
(space) | 681 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 155 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 155 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 144 | 100.0%
Letter Number
Value | Count | Frequency (%)
(missing) | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66866 | 91.7%
Common | 4967 | 6.8%
Latin | 1108 | 1.5%

Most frequent character per script

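Hangul
Value | Count | Frequency (%)
(missing) | 2404 | 3.6%
(missing) | 2361 | 3.5%
(missing) | 2240 | 3.3%
(missing) | 1878 | 2.8%
(missing) | 1811 | 2.7%
(missing) | 1692 | 2.5%
(missing) | 1500 | 2.2%
(missing) | 1468 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1287 | 1.9%
Other values (382) | 48762 | 72.9%
Latin
Value | Count | Frequency (%)
e | 201 | 18.1%
S | 125 | 11.3%
C | 102 | 9.2%
K | 92 | 8.3%
D | 68 | 6.1%
M | 68 | 6.1%
L | 54 | 4.9%
H | 48 | 4.3%
G | 33 | 3.0%
E | 32 | 2.9%
Other values (19) | 285 | 25.7%
Common
Value | Count | Frequency (%)
1 | 1106 | 22.3%
2 | 1094 | 22.0%
(space) | 681 | 13.7%
3 | 475 | 9.6%
4 | 284 | 5.7%
5 | 210 | 4.2%
) | 155 | 3.1%
( | 155 | 3.1%
6 | 148 | 3.0%
- | 144 | 2.9%
Other values (6) | 515 | 10.4%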

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66866 | 91.7%
ASCII | 6069 | 8.3%
Number Forms | 6 | < 0.1%

Most frequent character per block

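Hangul
Value | Count | Frequency (%)
(missing) | 2404 | 3.6%
(missing) | 2361 | 3.5%
(missing) | 2240 | 3.3%
(missing) | 1878 | 2.8%
(missing) | 1811 | 2.7%
(missing) | 1692 | 2.5%
(missing) | 1500 | 2.2%
(missing) | 1468 | 2.2%
(missing) | 1463 | 2.2%
(missing) | 1287 | 1.9%
Other values (382) | 48762 | 72.9%
ASCII
Value | Count | Frequency (%)
1 | 1106 | 18.2%
2 | 1094 | 18.0%
(space) | 681 | 11.2%
3 | 475 | 7.8%
4 | 284 | 4.7%
5 | 210 | 3.5%
e | 201 | 3.3%
) | 155 | 2.6%
( | 155 | 2.6%
6 | 148 | 2.4%
Other values (34) | 1560 | 25.7%
Number Forms
Value | Count | Frequency (%)
(missing) | 6 | 100.0%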
아파트코드 (apartment code)
Text

Distinct: 2229
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 137
Unique (%): 1.4%

Sample

1st row: A15679105
2nd row: A15213006
3rd row: A13003005
4th row: A15005705
5th row: A13594403
Value | Count | Frequency (%)
a13592601 | 12 | 0.1%
a15721007 | 12 | 0.1%
a13920207 | 11 | 0.1%
a12201003 | 11 | 0.1%
a13821005 | 11 | 0.1%
a15784008 | 11 | 0.1%
a14006001 | 11 | 0.1%
a15805002 | 11 | 0.1%
a12185504 | 10 | 0.1%
a41279903 | 10 | 0.1%
Other values (2219) | 9890 | 98.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 18305 | 20.3%
1 | 17612 | 19.6%
A | 9994 | 11.1%
3 | 8776 | 9.8%
2 | 8392 | 9.3%
5 | 6157 | 6.8%
8 | 5717 | 6.4%
7 | 4806 | 5.3%
4 | 3930 | 4.4%
6 | 3311 | 3.7%
Other values (2) | 3000 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18305 | 22.9%
1 | 17612 | 22.0%
3 | 8776 | 11.0%
2 | 8392 | 10.5%
5 | 6157 | 7.7%
8 | 5717 | 7.1%
7 | 4806 | 6.0%
4 | 3930 | 4.9%
6 | 3311 | 4.1%
9 | 2994 | 3.7%
Uppercase Letter
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18305 | 22.9%
1 | 17612 | 22.0%
3 | 8776 | 11.0%
2 | 8392 | 10.5%
5 | 6157 | 7.7%
8 | 5717 | 7.1%
7 | 4806 | 6.0%
4 | 3930 | 4.9%
6 | 3311 | 4.1%
9 | 2994 | 3.7%
Latin
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18305 | 20.3%
1 | 17612 | 19.6%
A | 9994 | 11.1%
3 | 8776 | 9.8%
2 | 8392 | 9.3%
5 | 6157 | 6.8%
8 | 5717 | 6.4%
7 | 4806 | 5.3%
4 | 3930 | 4.4%
6 | 3311 | 3.7%
Other values (2) | 3000 | 3.3%
비용명 (cost item name)
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9775
Min length: 2

Characters and Unicode

Total characters: 59775
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 선수금
2nd row: 수선유지비충당부채
3rd row: 선급금
4th row: 기타인건비충당부채
5th row: 기타의비유동부채
Value | Count | Frequency (%)
예금 | 325 | 3.2%
공동주택적립금 | 324 | 3.2%
당기순이익 | 321 | 3.2%
예수금 | 314 | 3.1%
미처분이익잉여금 | 308 | 3.1%
선급비용 | 307 | 3.1%
비품감가상각누계액 | 300 | 3.0%
비품 | 294 | 2.9%
현금 | 289 | 2.9%
미부과관리비 | 288 | 2.9%
Other values (67) | 6930 | 69.3%

Most occurring characters

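Value | Count | Frequency (%)
(missing) | 4648 | 7.8%
(missing) | 3762 | 6.3%
(missing) | 3076 | 5.1%
(missing) | 3030 | 5.1%
(missing) | 2998 | 5.0%
(missing) | 2868 | 4.8%
(missing) | 2570 | 4.3%
(missing) | 2438 | 4.1%
(missing) | 1867 | 3.1%
(missing) | 1772 | 3.0%
Other values (97) | 30746 | 51.4%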

Most occurring categories

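Value | Count | Frequency (%)
Other Letter | 59775 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(missing) | 4648 | 7.8%
(missing) | 3762 | 6.3%
(missing) | 3076 | 5.1%
(missing) | 3030 | 5.1%
(missing) | 2998 | 5.0%
(missing) | 2868 | 4.8%
(missing) | 2570 | 4.3%
(missing) | 2438 | 4.1%
(missing) | 1867 | 3.1%
(missing) | 1772 | 3.0%
Other values (97) | 30746 | 51.4%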

Most occurring scripts

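Value | Count | Frequency (%)
Hangul | 59775 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(missing) | 4648 | 7.8%
(missing) | 3762 | 6.3%
(missing) | 3076 | 5.1%
(missing) | 3030 | 5.1%
(missing) | 2998 | 5.0%
(missing) | 2868 | 4.8%
(missing) | 2570 | 4.3%
(missing) | 2438 | 4.1%
(missing) | 1867 | 3.1%
(missing) | 1772 | 3.0%
Other values (97) | 30746 | 51.4%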

Most occurring blocks

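Value | Count | Frequency (%)
Hangul | 59775 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(missing) | 4648 | 7.8%
(missing) | 3762 | 6.3%
(missing) | 3076 | 5.1%
(missing) | 3030 | 5.1%
(missing) | 2998 | 5.0%
(missing) | 2868 | 4.8%
(missing) | 2570 | 4.3%
(missing) | 2438 | 4.1%
(missing) | 1867 | 3.1%
(missing) | 1772 | 3.0%
Other values (97) | 30746 | 51.4%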

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202106 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202106
2nd row: 202106
3rd row: 202106
4th row: 202106
5th row: 202106

Common Values

Value | Count | Frequency (%)
202106 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202106 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7423
Distinct (%): 74.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70900741
Minimum: -1.9470423 × 10^8
Maximum: 7.7686708 × 10^9
Zeros: 2242
Zeros (%): 22.4%
Negative: 360
Negative (%): 3.6%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1.9470423 × 10^8
5-th percentile: 0
Q1: 0
Median: 3042395
Q3: 33948912
95-th percentile: 3.4010499 × 10^8
Maximum: 7.7686708 × 10^9
Range: 7.9633751 × 10^9
Interquartile range (IQR): 33948912

Descriptive statistics

Standard deviation: 2.8033231 × 10^8
Coefficient of variation (CV): 3.95387
Kurtosis: 218.59471
Mean: 70900741
Median Absolute Deviation (MAD): 3042395
Skewness: 11.87336
Sum: 7.0900741 × 10^11
Variance: 7.8586203 × 10^16
Monotonicity: Not monotonic
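
Several of these figures are derived from the others, so they can be cross-checked with plain arithmetic. The sketch below uses only numbers printed in this report (the mean and standard deviation are the rounded displayed values, and the exact minimum and maximum come from the extreme-value listings); the variable names are chosen for illustration.

```python
import math

n = 10000                # number of observations
mean = 70900741          # displayed mean (rounded)
std = 2.8033231e8        # displayed standard deviation (rounded)
minimum = -194704230     # smallest value listed for 금액
maximum = 7768670842     # largest value listed for 금액
q1, q3 = 0, 33948912     # first and third quartiles

cv = std / mean                  # coefficient of variation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
variance = std ** 2              # variance is the squared standard deviation
total = mean * n                 # sum is mean times observation count

print(f"CV={cv:.5f} range={value_range} IQR={iqr} variance={variance:.7e} sum={total:.7e}")
```

Each result agrees with the reported statistic up to display rounding.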
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 2242 | 22.4%
500000 | 26 | 0.3%
250000 | 25 | 0.2%
242000 | 14 | 0.1%
300000 | 13 | 0.1%
2000000 | 13 | 0.1%
200000 | 12 | 0.1%
1000000 | 11 | 0.1%
100000 | 10 | 0.1%
10000000 | 9 | 0.1%
Other values (7413) | 7625 | 76.2%

Minimum 10 values

Value | Count | Frequency (%)
-194704230 | 1 | < 0.1%
-182775015 | 1 | < 0.1%
-175231185 | 1 | < 0.1%
-153861997 | 1 | < 0.1%
-152040753 | 1 | < 0.1%
-149241812 | 1 | < 0.1%
-128999000 | 1 | < 0.1%
-126083581 | 1 | < 0.1%
-113580500 | 1 | < 0.1%
-109090670 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
7768670842 | 1 | < 0.1%
7740016860 | 1 | < 0.1%
6223659276 | 1 | < 0.1%
5490318343 | 1 | < 0.1%
5241202454 | 1 | < 0.1%
5201529526 | 1 | < 0.1%
5030565723 | 1 | < 0.1%
4471555406 | 1 | < 0.1%
4114027730 | 1 | < 0.1%
3967016615 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.431
금액 | 0.431 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

(row index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
62859 | 명수대현대 | A15679105 | 선수금 | 202106 | 0
57855 | 천왕이펜하우스1단지 | A15213006 | 수선유지비충당부채 | 202106 | 3200180
14692 | 청계한신휴플러스 | A13003005 | 선급금 | 202106 | 609320
52303 | 신길삼환 | A15005705 | 기타인건비충당부채 | 202106 | 0
31464 | 개포7차우성 | A13594403 | 기타의비유동부채 | 202106 | 101473720
15604 | 래미안장안2차 | A13010005 | 가수금 | 202106 | 6801871
55945 | 봉천벽산타운2차 | A15180701 | 비품 | 202106 | 9407920
70031 | 신정뉴타운롯데캐슬 | A15883402 | 미수금 | 202106 | 11260
15307 | 휘경주공1단지 | A13009002 | 승강기유지비충당부채 | 202106 | 0
38248 | 거여5단지 | A13811205 | 기타유형자산 | 202106 | 26365200

Last rows

(row index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
44740 | 상계성원 | A13982101 | 미지급금 | 202106 | 13758230
70465 | 신정학마을1단지 | A15886512 | 가지급금 | 202106 | 1177680
57701 | 오류삼천리 | A15210210 | 예수금 | 202106 | 3794120
66415 | 등촌태영 | A15783902 | 주차장충당예금 | 202106 | 188903121
1700 | 신내글로리움아파트 | A10025137 | 저장품 | 202106 | 120000
67489 | 화곡유림노르웨이숲 | A15791001 | 선수수도료 | 202106 | 141650
55173 | 신림동부센트레빌 | A15102202 | 현금 | 202106 | 359110
59791 | 신도림대림1,2차 | A15288814 | 장기수선충당부채 | 202106 | 1631072258
44703 | 상계한신2차 | A13982005 | 장기수선충당부채 | 202106 | 625585928
1655 | DMC에코자이 | A10025130 | 장기수선충당예금 | 202106 | 50656594