Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
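
The average record size follows from the total memory size divided by the number of observations. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000
n_variables = 5
missing_cells = 0

# Assumption: KiB means 1024 bytes.
total_bytes = total_size_kib * 1024
avg_record_bytes = total_bytes / n_observations

# The missing-cell share is measured against every cell (rows x columns).
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

print(f"{avg_record_bytes:.1f} B per record")
print(f"{missing_pct:.1f}% missing cells")
```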

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202212" (Constant)
금액 has 1529 (15.3%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-18 02:46:42.362893
Analysis finished: 2024-05-18 02:46:43.976443
Duration: 1.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
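
The reported duration can be checked against the two timestamps. A small sketch using only the standard library; the format string is an assumption about how the timestamps are written:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2024-05-18 02:46:42.362893", FMT)
finished = datetime.strptime("2024-05-18 02:46:43.976443", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```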

Variables

아파트명 (apartment name)
Text

Distinct: 2086
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3876
Min length: 2
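
To illustrate how length statistics of this kind are derived, the sketch below computes them for the five sample rows listed for this variable. It assumes that length means the number of Unicode code points after NFC normalization, which may differ from how the profiler counts.

```python
import statistics
import unicodedata

# The five sample rows listed for this variable in the report.
samples = [
    "청년주택 와이엔타워",
    "장위참누리",
    "서울역한라비발디센트럴아파트",
    "송파파인타운11단지",
    "광장현대파크빌",
]

# Assumption: length = number of code points in NFC form.
lengths = [len(unicodedata.normalize("NFC", s)) for s in samples]

summary = {
    "max": max(lengths),
    "median": statistics.median(lengths),
    "mean": statistics.mean(lengths),
    "min": min(lengths),
}
print(summary)

# The report's mean length equals total characters / observations.
report_mean = 73876 / 10000
print(report_mean)
```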

Characters and Unicode

Total characters: 73876
Distinct characters: 430
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
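
As a rough illustration of the category counts reported below, Python's standard `unicodedata` module exposes the general category of each code point. The helper name `category_counts` is hypothetical; scripts and blocks are not available from `unicodedata`, so only categories are shown.

```python
import unicodedata
from collections import Counter

def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd', 'Zs')."""
    normalized = unicodedata.normalize("NFC", text)
    return Counter(unicodedata.category(ch) for ch in normalized)

# Two of the sample rows from this report.
print(category_counts("송파파인타운11단지"))
print(category_counts("청년주택 와이엔타워"))
```

Here `Lo` corresponds to Other Letter, `Nd` to Decimal Number, and `Zs` to Space Separator.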

Unique

Unique: 88
Unique (%): 0.9%
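
Distinct and Unique measure different things. The sketch below assumes that "distinct" counts different values and "unique" counts values that occur exactly once; the toy column is hypothetical and not taken from the dataset.

```python
from collections import Counter

# Hypothetical toy column, not taken from the dataset.
values = ["A", "B", "B", "C", "C", "C", "D"]
counts = Counter(values)

n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)

# The percentages in the report are relative to the 10000 observations.
distinct_pct = round(100 * 2086 / 10000, 1)
unique_pct = round(100 * 88 / 10000, 1)
```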

Sample

1st row: 청년주택 와이엔타워
2nd row: 장위참누리
3rd row: 서울역한라비발디센트럴아파트
4th row: 송파파인타운11단지
5th row: 광장현대파크빌

Value | Count | Frequency (%)
아파트 | 194 | 1.8%
래미안 | 49 | 0.4%
아이파크 | 37 | 0.3%
e편한세상 | 29 | 0.3%
sk뷰 | 24 | 0.2%
코오롱하늘채아파트 | 18 | 0.2%
자이 | 17 | 0.2%
래미안밤섬리베뉴 | 17 | 0.2%
북한산 | 17 | 0.2%
신길삼두 | 16 | 0.1%
Other values (2167) | 10506 | 96.2%

Most occurring characters

Value | Count | Frequency (%)
[?] | 2720 | 3.7%
[?] | 2704 | 3.7%
[?] | 2545 | 3.4%
[?] | 1751 | 2.4%
[?] | 1665 | 2.3%
[?] | 1530 | 2.1%
[?] | 1507 | 2.0%
[?] | 1429 | 1.9%
[?] | 1267 | 1.7%
[?] | 1252 | 1.7%
Other values (420) | 55506 | 75.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67731 | 91.7%
Decimal Number | 3324 | 4.5%
Space Separator | 1009 | 1.4%
Uppercase Letter | 956 | 1.3%
Lowercase Letter | 329 | 0.4%
Close Punctuation | 155 | 0.2%
Open Punctuation | 155 | 0.2%
Dash Punctuation | 123 | 0.2%
Other Punctuation | 84 | 0.1%
Letter Number | 10 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 2720 | 4.0%
[?] | 2704 | 4.0%
[?] | 2545 | 3.8%
[?] | 1751 | 2.6%
[?] | 1665 | 2.5%
[?] | 1530 | 2.3%
[?] | 1507 | 2.2%
[?] | 1429 | 2.1%
[?] | 1267 | 1.9%
[?] | 1252 | 1.8%
Other values (375) | 49361 | 72.9%

Uppercase Letter
Value | Count | Frequency (%)
S | 145 | 15.2%
C | 140 | 14.6%
K | 107 | 11.2%
D | 103 | 10.8%
M | 103 | 10.8%
L | 66 | 6.9%
H | 51 | 5.3%
I | 49 | 5.1%
E | 44 | 4.6%
G | 33 | 3.5%
Other values (7) | 115 | 12.0%

Lowercase Letter
Value | Count | Frequency (%)
e | 178 | 54.1%
l | 36 | 10.9%
i | 28 | 8.5%
s | 23 | 7.0%
v | 23 | 7.0%
k | 20 | 6.1%
w | 8 | 2.4%
h | 5 | 1.5%
c | 4 | 1.2%
g | 2 | 0.6%

Decimal Number
Value | Count | Frequency (%)
2 | 1000 | 30.1%
1 | 996 | 30.0%
3 | 463 | 13.9%
4 | 223 | 6.7%
5 | 175 | 5.3%
6 | 129 | 3.9%
7 | 106 | 3.2%
8 | 99 | 3.0%
9 | 81 | 2.4%
0 | 52 | 1.6%

Other Punctuation
Value | Count | Frequency (%)
, | 63 | 75.0%
. | 21 | 25.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1009 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 155 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 155 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 123 | 100.0%

Letter Number
Value | Count | Frequency (%)
[?] | 10 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67731 | 91.7%
Common | 4850 | 6.6%
Latin | 1295 | 1.8%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 2720 | 4.0%
[?] | 2704 | 4.0%
[?] | 2545 | 3.8%
[?] | 1751 | 2.6%
[?] | 1665 | 2.5%
[?] | 1530 | 2.3%
[?] | 1507 | 2.2%
[?] | 1429 | 2.1%
[?] | 1267 | 1.9%
[?] | 1252 | 1.8%
Other values (375) | 49361 | 72.9%

Latin
Value | Count | Frequency (%)
e | 178 | 13.7%
S | 145 | 11.2%
C | 140 | 10.8%
K | 107 | 8.3%
D | 103 | 8.0%
M | 103 | 8.0%
L | 66 | 5.1%
H | 51 | 3.9%
I | 49 | 3.8%
E | 44 | 3.4%
Other values (19) | 309 | 23.9%

Common
Value | Count | Frequency (%)
(space) | 1009 | 20.8%
2 | 1000 | 20.6%
1 | 996 | 20.5%
3 | 463 | 9.5%
4 | 223 | 4.6%
5 | 175 | 3.6%
) | 155 | 3.2%
( | 155 | 3.2%
6 | 129 | 2.7%
- | 123 | 2.5%
Other values (6) | 422 | 8.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67731 | 91.7%
ASCII | 6135 | 8.3%
Number Forms | 10 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 2720 | 4.0%
[?] | 2704 | 4.0%
[?] | 2545 | 3.8%
[?] | 1751 | 2.6%
[?] | 1665 | 2.5%
[?] | 1530 | 2.3%
[?] | 1507 | 2.2%
[?] | 1429 | 2.1%
[?] | 1267 | 1.9%
[?] | 1252 | 1.8%
Other values (375) | 49361 | 72.9%

ASCII
Value | Count | Frequency (%)
(space) | 1009 | 16.4%
2 | 1000 | 16.3%
1 | 996 | 16.2%
3 | 463 | 7.5%
4 | 223 | 3.6%
e | 178 | 2.9%
5 | 175 | 2.9%
) | 155 | 2.5%
( | 155 | 2.5%
S | 145 | 2.4%
Other values (34) | 1636 | 26.7%

Number Forms
Value | Count | Frequency (%)
[?] | 10 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2090
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 88
Unique (%): 0.9%

Sample

1st row: A10023990
2nd row: A13614302
3rd row: A10026517
4th row: A13821002
5th row: A14381516

Value | Count | Frequency (%)
a15083701 | 16 | 0.2%
a15086601 | 15 | 0.1%
a13084803 | 13 | 0.1%
a15681503 | 13 | 0.1%
a14072702 | 13 | 0.1%
a12081703 | 13 | 0.1%
a12281701 | 12 | 0.1%
a10024552 | 12 | 0.1%
a12012202 | 12 | 0.1%
a13485401 | 12 | 0.1%
Other values (2080) | 9869 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 19279 | 21.4%
1 | 17596 | 19.6%
A | 10000 | 11.1%
3 | 9033 | 10.0%
2 | 8462 | 9.4%
5 | 5887 | 6.5%
8 | 5162 | 5.7%
7 | 4313 | 4.8%
4 | 4031 | 4.5%
6 | 3369 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 19279 | 24.1%
1 | 17596 | 22.0%
3 | 9033 | 11.3%
2 | 8462 | 10.6%
5 | 5887 | 7.4%
8 | 5162 | 6.5%
7 | 4313 | 5.4%
4 | 4031 | 5.0%
6 | 3369 | 4.2%
9 | 2868 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 19279 | 24.1%
1 | 17596 | 22.0%
3 | 9033 | 11.3%
2 | 8462 | 10.6%
5 | 5887 | 7.4%
8 | 5162 | 6.5%
7 | 4313 | 5.4%
4 | 4031 | 5.0%
6 | 3369 | 4.2%
9 | 2868 | 3.6%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 19279 | 21.4%
1 | 17596 | 19.6%
A | 10000 | 11.1%
3 | 9033 | 10.0%
2 | 8462 | 9.4%
5 | 5887 | 6.5%
8 | 5162 | 5.7%
7 | 4313 | 4.8%
4 | 4031 | 4.5%
6 | 3369 | 3.7%

비용명 (expense name)
Text

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.9
Min length: 2

Characters and Unicode

Total characters: 49000
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 검침수익
2nd row: 충당부채전입이자비용
3rd row: 사무용품비
4th row: 주차장수익
5th row: 공동가스료

Value | Count | Frequency (%)
급여 | 231 | 2.3%
퇴직급여 | 224 | 2.2%
교육비 | 223 | 2.2%
소독비 | 222 | 2.2%
연체료수익 | 215 | 2.1%
도서인쇄비 | 215 | 2.1%
보험료 | 214 | 2.1%
경비비 | 212 | 2.1%
세대전기료 | 211 | 2.1%
소모품비 | 211 | 2.1%
Other values (75) | 7822 | 78.2%

Most occurring characters

Value | Count | Frequency (%)
[?] | 5383 | 11.0%
[?] | 3522 | 7.2%
[?] | 2071 | 4.2%
[?] | 2053 | 4.2%
[?] | 1691 | 3.5%
[?] | 1286 | 2.6%
[?] | 1055 | 2.2%
[?] | 860 | 1.8%
[?] | 782 | 1.6%
[?] | 741 | 1.5%
Other values (110) | 29556 | 60.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 49000 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[?] | 5383 | 11.0%
[?] | 3522 | 7.2%
[?] | 2071 | 4.2%
[?] | 2053 | 4.2%
[?] | 1691 | 3.5%
[?] | 1286 | 2.6%
[?] | 1055 | 2.2%
[?] | 860 | 1.8%
[?] | 782 | 1.6%
[?] | 741 | 1.5%
Other values (110) | 29556 | 60.3%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 49000 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[?] | 5383 | 11.0%
[?] | 3522 | 7.2%
[?] | 2071 | 4.2%
[?] | 2053 | 4.2%
[?] | 1691 | 3.5%
[?] | 1286 | 2.6%
[?] | 1055 | 2.2%
[?] | 860 | 1.8%
[?] | 782 | 1.6%
[?] | 741 | 1.5%
Other values (110) | 29556 | 60.3%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 49000 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[?] | 5383 | 11.0%
[?] | 3522 | 7.2%
[?] | 2071 | 4.2%
[?] | 2053 | 4.2%
[?] | 1691 | 3.5%
[?] | 1286 | 2.6%
[?] | 1055 | 2.2%
[?] | 860 | 1.8%
[?] | 782 | 1.6%
[?] | 741 | 1.5%
Other values (110) | 29556 | 60.3%

년월일 (year-month-day)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202212 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202212
2nd row: 202212
3rd row: 202212
4th row: 202212
5th row: 202212

Common Values

Value | Count | Frequency (%)
202212 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202212 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7032
Distinct (%): 70.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4224578.2
Minimum: -49368050
Maximum: 7.8209429 × 10^8
Zeros: 1529
Zeros (%): 15.3%
Negative: 22
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -49368050
5-th percentile: 0
Q1: 49495
Median: 280000
Q3: 1388295
95-th percentile: 18873855
Maximum: 7.8209429 × 10^8
Range: 8.3146234 × 10^8
Interquartile range (IQR): 1338800

Descriptive statistics

Standard deviation: 19587429
Coefficient of variation (CV): 4.6365408
Kurtosis: 410.68106
Mean: 4224578.2
Median Absolute Deviation (MAD): 280000
Skewness: 15.90773
Sum: 4.2245782 × 10^10
Variance: 3.8366739 × 10^14
Monotonicity: Not monotonic
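
Several of these statistics are tied together by simple identities, which makes them easy to cross-check. The sketch below recomputes the derived figures from the rounded values printed in the report, so small rounding differences are expected; the variable names are illustrative.

```python
import math

# Values copied from the report (already rounded there).
n = 10_000
mean = 4224578.2
std = 19587429
minimum, maximum = -49368050, 782094290
q1, q3 = 49495, 1388295
zeros, negatives = 1529, 22

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
total = mean * n                  # sum of all values
value_range = maximum - minimum   # range
iqr = q3 - q1                     # interquartile range
zeros_pct = 100 * zeros / n
negative_pct = 100 * negatives / n

print(f"CV={cv:.4f} variance={variance:.4e} sum={total:.4e}")
print(f"range={value_range} IQR={iqr}")
print(f"zeros={zeros_pct:.1f}% negative={negative_pct:.1f}%")
```

A CV above 4 together with a skewness near 16 points to a heavily right-skewed distribution in which a few very large amounts dominate the mean.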
Histogram with fixed size bins (bins=50)
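
To see why a 50-bin histogram of this variable is dominated by a few bars, the sketch below maps values to equal-width bins. It assumes the bins span the reported minimum to maximum with equal widths; the exact binning used for the plot is not stated, and `bin_index` is a hypothetical helper.

```python
def bin_index(value, lo, hi, bins=50):
    """Return the 0-based index of the equal-width bin that contains value."""
    width = (hi - lo) / bins
    idx = int((value - lo) // width)
    # The maximum sits on the right edge; fold it into the last bin.
    return min(max(idx, 0), bins - 1)

# Minimum and maximum of 금액 as reported above.
LO, HI = -49368050, 782094290

for label, v in [("zero", 0), ("median", 280000), ("Q3", 1388295),
                 ("95th percentile", 18873855), ("maximum", HI)]:
    print(label, bin_index(v, LO, HI))
```

Under these assumptions, zero and the median fall into the same bin, and the 95th percentile lands within the first five bins.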

Value | Count | Frequency (%)
0 | 1529 | 15.3%
200000 | 80 | 0.8%
100000 | 66 | 0.7%
300000 | 52 | 0.5%
400000 | 32 | 0.3%
30000 | 31 | 0.3%
150000 | 31 | 0.3%
600000 | 31 | 0.3%
500000 | 30 | 0.3%
50000 | 27 | 0.3%
Other values (7022) | 8091 | 80.9%

Minimum 10 values
Value | Count | Frequency (%)
-49368050 | 1 | < 0.1%
-26926030 | 1 | < 0.1%
-13540546 | 1 | < 0.1%
-2949560 | 1 | < 0.1%
-1410454 | 1 | < 0.1%
-601000 | 1 | < 0.1%
-562820 | 1 | < 0.1%
-517590 | 1 | < 0.1%
-331890 | 1 | < 0.1%
-315965 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
782094290 | 1 | < 0.1%
534926470 | 1 | < 0.1%
447455018 | 1 | < 0.1%
442813660 | 1 | < 0.1%
307257130 | 1 | < 0.1%
294496908 | 1 | < 0.1%
285865605 | 1 | < 0.1%
273556047 | 1 | < 0.1%
266343970 | 1 | < 0.1%
264398490 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.568
금액 | 0.568 | 1.000
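
The report does not state which coefficient produced the 0.568 between 비용명 and 금액. For a text or categorical column paired with a numeric one, a common choice is Cramér's V computed on a contingency table after binning the numeric column. The sketch below shows that measure on hypothetical tables, purely as an illustration and not as the report's method.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row and column total is positive and that the table
    has at least two rows and two columns.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp

    k = min(len(row_tot), len(col_tot)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables: perfect, partial, and no association.
print(cramers_v([[10, 0], [0, 10]]))
print(cramers_v([[30, 10], [10, 30]]))
print(cramers_v([[5, 5], [5, 5]]))
```

Values range from 0 (no association) to 1 (one variable fully determines the other), so a figure such as 0.568 would indicate a moderate association under this measure.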

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
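
The per-column nullity that these plots summarize can be sketched as a simple count. The example below uses three rows taken from the report's sample table and assumes that a cell is missing only when it is `None` or an empty string; a value of 0 counts as present.

```python
# Column names from the report; rows copied from its sample table.
columns = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]
rows = [
    ("청년주택 와이엔타워", "A10023990", "검침수익", "202212", 116100),
    ("장위참누리", "A13614302", "충당부채전입이자비용", "202212", 116284),
    ("가양2단지", "A15780605", "기타운영비용", "202212", 0),
]

def is_missing(value):
    """Assumption: only None and the empty string count as missing."""
    return value is None or value == ""

missing_per_column = {
    name: sum(is_missing(row[i]) for row in rows)
    for i, name in enumerate(columns)
}
print(missing_per_column)
```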

Sample

First rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
1126 | 청년주택 와이엔타워 | A10023990 | 검침수익 | 202212 | 116100
52201 | 장위참누리 | A13614302 | 충당부채전입이자비용 | 202212 | 116284
9418 | 서울역한라비발디센트럴아파트 | A10026517 | 사무용품비 | 202212 | 63690
60240 | 송파파인타운11단지 | A13821002 | 주차장수익 | 202212 | 2289990
77401 | 광장현대파크빌 | A14381516 | 공동가스료 | 202212 | 251120
60884 | 잠실우성4차 | A13822902 | 주차장수익 | 202212 | 3488710
32391 | 방학한화성원 | A13202306 | 검침수익 | 202212 | 144480
83261 | 신대림신동아파밀리에 | A15095002 | 소독비 | 202212 | 230000
98997 | 가양2단지 | A15780605 | 기타운영비용 | 202212 | 0
64593 | 상계주공10단지 | A13920804 | 연차수당 | 202212 | 0

Last rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
56438 | 방배신동아 | A13784907 | 급여 | 202212 | 18998390
95332 | 상도쌍용 | A15683901 | 광고료수익 | 202212 | 585000
85325 | 우리유앤미 | A15205001 | 청소비 | 202212 | 3174230
93360 | 사당유니드 | A15609001 | 입주자대표회의운영비 | 202212 | 250000
70205 | 중계주공10단지 | A13986004 | 세대수도료 | 202212 | 6870020
59367 | 가락상아1차 | A13813004 | 주차장수익 | 202212 | 630000
82561 | 양평신동아아파트 | A15086202 | 소모품비 | 202212 | 634220
86469 | 개봉삼호아파트 | A15209202 | 소모품비 | 202212 | 308130
49746 | 돈암일신건영휴먼빌 | A13606003 | 연체료수익 | 202212 | 14700
56619 | 방배2차현대홈타운 | A13785201 | 퇴직급여 | 202212 | 1440000