Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
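
As a quick consistency check, the average record size follows from the total size in memory and the number of observations. The sketch below assumes 1 KiB = 1024 bytes; the variable names are illustrative and do not come from the report.

```python
# Values copied from the dataset statistics above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")  # 50.0 B
```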

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202311" (Constant)
금액 has 1610 (16.1%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:49:49.250965
Analysis finished: 2024-05-11 06:49:51.431239
Duration: 2.18 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2158
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.4079
Min length: 2

Characters and Unicode

Total characters: 74079
Distinct characters: 431
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
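
The category counts in this report can be approximated with Python's standard `unicodedata` module, which exposes the general category of each code point (`Lo` is Other Letter, `Nd` is Decimal Number, `Zs` is Space Separator, `Ll` is Lowercase Letter). This is a minimal sketch, not the profiler's own implementation; the helper name `category_counts` is hypothetical, and the input is two of the sample values shown in this report.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts


# Two apartment names taken from the sample rows of this report.
sample = ["e편한세상 염창", "송파한양2차"]
counts = category_counts(sample)
print(counts["Lo"], counts["Ll"], counts["Zs"], counts["Nd"])  # 11 1 1 1
```

Hangul syllables fall under `Lo`, which is why Other Letter dominates the category table for this variable.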

Unique

Unique: 96
Unique (%): 1.0%

Sample

1st row: e편한세상 염창
2nd row: 당산동1차효성아파트
3rd row: 송파한양2차
4th row: 한강타운아파트
5th row: 오류푸르지오

Value | Count | Frequency (%)
아파트 | 211 | 1.9%
래미안 | 50 | 0.5%
아이파크 | 31 | 0.3%
e편한세상 | 29 | 0.3%
신반포 | 20 | 0.2%
sk뷰 | 18 | 0.2%
푸르지오 | 17 | 0.2%
힐스테이트 | 16 | 0.1%
북한산 | 15 | 0.1%
신내역 | 15 | 0.1%
Other values (2241) | 10553 | 96.2%

Most occurring characters

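Value | Count | Frequency (%)
[not preserved] | 2734 | 3.7%
[not preserved] | 2644 | 3.6%
[not preserved] | 2588 | 3.5%
[not preserved] | 1742 | 2.4%
[not preserved] | 1740 | 2.3%
[not preserved] | 1624 | 2.2%
[not preserved] | 1491 | 2.0%
[not preserved] | 1484 | 2.0%
[not preserved] | 1323 | 1.8%
[not preserved] | 1319 | 1.8%
Other values (421) | 55390 | 74.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67951 | 91.7%
Decimal Number | 3471 | 4.7%
Space Separator | 1066 | 1.4%
Uppercase Letter | 703 | 0.9%
Lowercase Letter | 347 | 0.5%
Open Punctuation | 157 | 0.2%
Close Punctuation | 157 | 0.2%
Other Punctuation | 116 | 0.2%
Dash Punctuation | 107 | 0.1%
Letter Number | 4 | < 0.1%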

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not preserved] | 2734 | 4.0%
[not preserved] | 2644 | 3.9%
[not preserved] | 2588 | 3.8%
[not preserved] | 1742 | 2.6%
[not preserved] | 1740 | 2.6%
[not preserved] | 1624 | 2.4%
[not preserved] | 1491 | 2.2%
[not preserved] | 1484 | 2.2%
[not preserved] | 1323 | 1.9%
[not preserved] | 1319 | 1.9%
Other values (376) | 49262 | 72.5%

Uppercase Letter
Value | Count | Frequency (%)
C | 122 | 17.4%
S | 118 | 16.8%
K | 85 | 12.1%
M | 80 | 11.4%
D | 80 | 11.4%
H | 35 | 5.0%
E | 34 | 4.8%
L | 33 | 4.7%
I | 24 | 3.4%
G | 18 | 2.6%
Other values (7) | 74 | 10.5%

Lowercase Letter
Value | Count | Frequency (%)
e | 190 | 54.8%
s | 27 | 7.8%
k | 26 | 7.5%
l | 26 | 7.5%
i | 25 | 7.2%
v | 19 | 5.5%
c | 12 | 3.5%
w | 9 | 2.6%
h | 7 | 2.0%
a | 3 | 0.9%

Decimal Number
Value | Count | Frequency (%)
1 | 1040 | 30.0%
2 | 1032 | 29.7%
3 | 441 | 12.7%
4 | 257 | 7.4%
5 | 185 | 5.3%
6 | 146 | 4.2%
8 | 109 | 3.1%
7 | 105 | 3.0%
9 | 94 | 2.7%
0 | 62 | 1.8%

Other Punctuation
Value | Count | Frequency (%)
, | 92 | 79.3%
. | 24 | 20.7%

Space Separator
Value | Count | Frequency (%)
(space) | 1066 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 157 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 157 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 107 | 100.0%

Letter Number
Value | Count | Frequency (%)
[not preserved] | 4 | 100.0%

Most occurring scripts

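Value | Count | Frequency (%)
Hangul | 67951 | 91.7%
Common | 5074 | 6.8%
Latin | 1054 | 1.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not preserved] | 2734 | 4.0%
[not preserved] | 2644 | 3.9%
[not preserved] | 2588 | 3.8%
[not preserved] | 1742 | 2.6%
[not preserved] | 1740 | 2.6%
[not preserved] | 1624 | 2.4%
[not preserved] | 1491 | 2.2%
[not preserved] | 1484 | 2.2%
[not preserved] | 1323 | 1.9%
[not preserved] | 1319 | 1.9%
Other values (376) | 49262 | 72.5%

Latin
Value | Count | Frequency (%)
e | 190 | 18.0%
C | 122 | 11.6%
S | 118 | 11.2%
K | 85 | 8.1%
M | 80 | 7.6%
D | 80 | 7.6%
H | 35 | 3.3%
E | 34 | 3.2%
L | 33 | 3.1%
s | 27 | 2.6%
Other values (19) | 250 | 23.7%

Common
Value | Count | Frequency (%)
(space) | 1066 | 21.0%
1 | 1040 | 20.5%
2 | 1032 | 20.3%
3 | 441 | 8.7%
4 | 257 | 5.1%
5 | 185 | 3.6%
( | 157 | 3.1%
) | 157 | 3.1%
6 | 146 | 2.9%
8 | 109 | 2.1%
Other values (6) | 484 | 9.5%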

Most occurring blocks

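Value | Count | Frequency (%)
Hangul | 67951 | 91.7%
ASCII | 6124 | 8.3%
Number Forms | 4 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[not preserved] | 2734 | 4.0%
[not preserved] | 2644 | 3.9%
[not preserved] | 2588 | 3.8%
[not preserved] | 1742 | 2.6%
[not preserved] | 1740 | 2.6%
[not preserved] | 1624 | 2.4%
[not preserved] | 1491 | 2.2%
[not preserved] | 1484 | 2.2%
[not preserved] | 1323 | 1.9%
[not preserved] | 1319 | 1.9%
Other values (376) | 49262 | 72.5%

ASCII
Value | Count | Frequency (%)
(space) | 1066 | 17.4%
1 | 1040 | 17.0%
2 | 1032 | 16.9%
3 | 441 | 7.2%
4 | 257 | 4.2%
e | 190 | 3.1%
5 | 185 | 3.0%
( | 157 | 2.6%
) | 157 | 2.6%
6 | 146 | 2.4%
Other values (34) | 1453 | 23.7%

Number Forms
Value | Count | Frequency (%)
[not preserved] | 4 | 100.0%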

아파트코드 (apartment code)
Text

Distinct: 2162
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 97
Unique (%): 1.0%

Sample

1st row: A10025600
2nd row: A15004506
3rd row: A13885304
4th row: A15780604
5th row: A15210209

Value | Count | Frequency (%)
a14272314 | 13 | 0.1%
a15009402 | 13 | 0.1%
a15805302 | 13 | 0.1%
a10025387 | 13 | 0.1%
a15210209 | 13 | 0.1%
a13486703 | 12 | 0.1%
a13606003 | 12 | 0.1%
a15602007 | 12 | 0.1%
a15722104 | 11 | 0.1%
a13922901 | 11 | 0.1%
Other values (2152) | 9877 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18788 | 20.9%
1 | 17392 | 19.3%
A | 10000 | 11.1%
3 | 8841 | 9.8%
2 | 8414 | 9.3%
5 | 6305 | 7.0%
8 | 5498 | 6.1%
7 | 4526 | 5.0%
4 | 3914 | 4.3%
6 | 3598 | 4.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18788 | 23.5%
1 | 17392 | 21.7%
3 | 8841 | 11.1%
2 | 8414 | 10.5%
5 | 6305 | 7.9%
8 | 5498 | 6.9%
7 | 4526 | 5.7%
4 | 3914 | 4.9%
6 | 3598 | 4.5%
9 | 2724 | 3.4%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18788 | 23.5%
1 | 17392 | 21.7%
3 | 8841 | 11.1%
2 | 8414 | 10.5%
5 | 6305 | 7.9%
8 | 5498 | 6.9%
7 | 4526 | 5.7%
4 | 3914 | 4.9%
6 | 3598 | 4.5%
9 | 2724 | 3.4%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18788 | 20.9%
1 | 17392 | 19.3%
A | 10000 | 11.1%
3 | 8841 | 9.8%
2 | 8414 | 9.3%
5 | 6305 | 7.0%
8 | 5498 | 6.1%
7 | 4526 | 5.0%
4 | 3914 | 4.3%
6 | 3598 | 4.0%

비용명 (cost item name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8444
Min length: 2

Characters and Unicode

Total characters: 48444
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: 국민연금
2nd row: 검침수익
3rd row: 복리후생비
4th row: 음식물처리비
5th row: 세대수도료

Value | Count | Frequency (%)
퇴직급여 | 233 | 2.3%
수선유지비 | 232 | 2.3%
통신비 | 220 | 2.2%
세대전기료 | 218 | 2.2%
승강기유지비 | 218 | 2.2%
산재보험료 | 215 | 2.1%
청소비 | 214 | 2.1%
도서인쇄비 | 212 | 2.1%
사무용품비 | 212 | 2.1%
소독비 | 209 | 2.1%
Other values (77) | 7817 | 78.2%

Most occurring characters

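Value | Count | Frequency (%)
[not preserved] | 5356 | 11.1%
[not preserved] | 3530 | 7.3%
[not preserved] | 2164 | 4.5%
[not preserved] | 1925 | 4.0%
[not preserved] | 1383 | 2.9%
[not preserved] | 1336 | 2.8%
[not preserved] | 1092 | 2.3%
[not preserved] | 886 | 1.8%
[not preserved] | 862 | 1.8%
[not preserved] | 815 | 1.7%
Other values (110) | 29095 | 60.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48444 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[not preserved] | 5356 | 11.1%
[not preserved] | 3530 | 7.3%
[not preserved] | 2164 | 4.5%
[not preserved] | 1925 | 4.0%
[not preserved] | 1383 | 2.9%
[not preserved] | 1336 | 2.8%
[not preserved] | 1092 | 2.3%
[not preserved] | 886 | 1.8%
[not preserved] | 862 | 1.8%
[not preserved] | 815 | 1.7%
Other values (110) | 29095 | 60.1%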

Most occurring scripts

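Value | Count | Frequency (%)
Hangul | 48444 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
[not preserved] | 5356 | 11.1%
[not preserved] | 3530 | 7.3%
[not preserved] | 2164 | 4.5%
[not preserved] | 1925 | 4.0%
[not preserved] | 1383 | 2.9%
[not preserved] | 1336 | 2.8%
[not preserved] | 1092 | 2.3%
[not preserved] | 886 | 1.8%
[not preserved] | 862 | 1.8%
[not preserved] | 815 | 1.7%
Other values (110) | 29095 | 60.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48444 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
[not preserved] | 5356 | 11.1%
[not preserved] | 3530 | 7.3%
[not preserved] | 2164 | 4.5%
[not preserved] | 1925 | 4.0%
[not preserved] | 1383 | 2.9%
[not preserved] | 1336 | 2.8%
[not preserved] | 1092 | 2.3%
[not preserved] | 886 | 1.8%
[not preserved] | 862 | 1.8%
[not preserved] | 815 | 1.7%
Other values (110) | 29095 | 60.1%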

년월일 (date, YYYYMM)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202311: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202311
2nd row: 202311
3rd row: 202311
4th row: 202311
5th row: 202311

Common Values

Value | Count | Frequency (%)
202311 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
202311 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6881
Distinct (%): 68.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4160977.9
Minimum: -900000
Maximum: 3.9555692 × 10^8
Zeros: 1610
Zeros (%): 16.1%
Negative: 6
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -900000
5-th percentile: 0
Q1: 53187.5
Median: 302250
Q3: 1500767.5
95-th percentile: 20748695
Maximum: 3.9555692 × 10^8
Range: 3.9645692 × 10^8
Interquartile range (IQR): 1447580
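
The range and interquartile range are derived from the other quantile statistics: range is maximum minus minimum, and IQR is Q3 minus Q1. A small sketch that reproduces both figures; the variable names are illustrative, and the exact maximum (395556924) is taken from the list of largest values elsewhere in this report.

```python
# Values copied from the quantile statistics of 금액.
minimum = -900_000
maximum = 395_556_924  # exact value behind the rounded 3.9555692 × 10^8
q1 = 53_187.5
q3 = 1_500_767.5

value_range = maximum - minimum
iqr = q3 - q1

print(value_range)  # 396456924
print(iqr)          # 1447580.0
```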

Descriptive statistics

Standard deviation: 15650890
Coefficient of variation (CV): 3.761349
Kurtosis: 164.10257
Mean: 4160977.9
Median Absolute Deviation (MAD): 302250
Skewness: 10.518958
Sum: 4.1609779 × 10^10
Variance: 2.4495036 × 10^14
Monotonicity: Not monotonic
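
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, and the sum is the mean times the number of observations. The sketch below checks the reported figures against each other. Because the report rounds its inputs, agreement is only expected up to rounding; the variable names are illustrative.

```python
# Values copied from the descriptive statistics of 금액.
mean = 4_160_977.9
std = 15_650_890
n = 10_000  # number of observations, from the dataset statistics

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance as the square of the standard deviation
total = mean * n      # sum of all values

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```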
[Figure: Histogram with fixed size bins (bins=50)]
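
Fixed-size binning splits the interval between the minimum and the maximum into equally wide bins and counts the values that fall into each. A minimal pure-Python sketch follows. The function name `fixed_width_histogram` and the convention of placing the maximum in the last bin are assumptions for illustration, not the plotting library's implementation. The input is a handful of 금액 values from the sample rows of this report, with 5 bins instead of 50 to keep the output short.

```python
def fixed_width_histogram(values, bins=50):
    """Return (edges, counts) for `bins` equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Degenerate case: every value is identical, so use a single bin.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The maximum lands exactly on the last edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


amounts = [0, 0, 255750, 206400, 246091, 1593820, 7288750]
edges, counts = fixed_width_histogram(amounts, bins=5)
print(counts)  # [5, 1, 0, 0, 1]
```

With heavily right-skewed data such as this variable (skewness above 10), most observations fall into the first bin, which is the pattern the small example shows.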

Common values

Value | Count | Frequency (%)
0 | 1610 | 16.1%
200000 | 64 | 0.6%
300000 | 54 | 0.5%
100000 | 44 | 0.4%
400000 | 42 | 0.4%
150000 | 40 | 0.4%
30000 | 32 | 0.3%
250000 | 32 | 0.3%
600000 | 31 | 0.3%
500000 | 28 | 0.3%
Other values (6871) | 8023 | 80.2%

Minimum 10 values

Value | Count | Frequency (%)
-900000 | 1 | < 0.1%
-708140 | 1 | < 0.1%
-237930 | 1 | < 0.1%
-35820 | 1 | < 0.1%
-1169 | 1 | < 0.1%
-603 | 1 | < 0.1%
0 | 1610 | 16.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
395556924 | 1 | < 0.1%
354799008 | 1 | < 0.1%
322414270 | 1 | < 0.1%
302398685 | 1 | < 0.1%
293929600 | 1 | < 0.1%
278272701 | 1 | < 0.1%
254065410 | 1 | < 0.1%
245627208 | 1 | < 0.1%
215705432 | 1 | < 0.1%
207323610 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]
Variable | 비용명 | 금액
비용명 | 1.000 | 0.416
금액 | 0.416 | 1.000
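
The report does not say which association measure produced the 0.416 between 비용명 and 금액. One common choice for pairing a categorical variable with a binned numeric variable is Cramér's V, so the sketch below assumes that measure. It is the uncorrected textbook formula, written for illustration, and may differ from what the profiler actually computes. The function name `cramers_v` and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing (row, col) pairs count as 0
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Toy data: the label fully determines the bin, so the association is perfect.
labels = ["a", "a", "b", "b"]
bins = ["low", "low", "high", "high"]
print(cramers_v(labels, bins))  # 1.0
```

A value of 0 means the two variables are independent in the contingency table and 1 means one fully determines the other, so 0.416 would indicate a moderate association under this assumption.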

Missing values

[Figure: nullity bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
7266 | e편한세상 염창 | A10025600 | 국민연금 | 202311 | 255750
75721 | 당산동1차효성아파트 | A15004506 | 검침수익 | 202311 | 206400
60397 | 송파한양2차 | A13885304 | 복리후생비 | 202311 | 246091
94694 | 한강타운아파트 | A15780604 | 음식물처리비 | 202311 | 1593820
83295 | 오류푸르지오 | A15210209 | 세대수도료 | 202311 | 7288750
96548 | 염창한화꿈에그린 | A15786424 | 승강기수익 | 202311 | 325000
37565 | 래미안하이리버 | A13380302 | 소모품비 | 202311 | 1991054
20021 | 공덕래미안5차 | A12170603 | 산재보험료 | 202311 | 227330
68876 | 공릉현대성우 | A13994501 | 교통비 | 202311 | 0
68109 | 하계현대우성 | A13987303 | 회계감사비 | 202311 | 0

Last rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
17379 | 디엠씨현대 | A12013101 | 지급수수료 | 202311 | 1757050
89759 | 사당동작삼성래미안아파트 | A15609306 | 연체료수익 | 202311 | 252440
98612 | 목동13단지 | A15807605 | 알뜰시장수익 | 202311 | 4565100
61738 | 상계금호타운 | A13920501 | 부과차익 | 202311 | 855
2973 | 디에이치포레센트아파트 | A10024258 | 주차장수익 | 202311 | 551670
28000 | 신내동성4차 | A13113003 | 잡수익 | 202311 | 650
4225 | 위례포레샤인18단지 | A10024577 | 세대난방비 | 202311 | 50225340
54875 | 서초우성5차아파트 | A13785705 | 광고료수익 | 202311 | 80000
44921 | 도곡대림아크로빌 | A13527014 | 세대수도료 | 202311 | 8988400
22481 | 백련산힐스테이트1차 | A12201003 | 소독비 | 202311 | 584000