Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202209" (Constant)
금액 has 2426 (24.3%) zeros (Zeros)
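
The two alerts above can be reproduced with very little logic. The following is a minimal sketch, not ydata-profiling's own implementation: the helper names `constant_columns` and `zero_share` are hypothetical, and the five input rows are copied from the Sample section of this report rather than from the full 10000-row dataset.

```python
from typing import Dict, List

# (년월일, 금액) pairs copied from the first rows of the report's Sample section.
rows = [
    ("202209", 7607980),
    ("202209", 43575320),
    ("202209", 86090),
    ("202209", 0),
    ("202209", 0),
]


def constant_columns(columns: Dict[str, List]) -> List[str]:
    """Return the names of columns that hold exactly one distinct value."""
    return [name for name, values in columns.items() if len(set(values)) == 1]


def zero_share(values: List[float]) -> float:
    """Return the fraction of entries that are equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


columns = {
    "년월일": [r[0] for r in rows],
    "금액": [r[1] for r in rows],
}

constant = constant_columns(columns)   # columns that would raise a Constant alert
share = zero_share(columns["금액"])     # share that would feed a Zeros alert
print(constant, f"{share:.1%}")
```

On the full dataset the same zero share is 2426 / 10000, which is the 24.3% reported in the alert.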

Reproduction

Analysis started: 2024-05-11 05:57:14.015750
Analysis finished: 2024-05-11 05:57:15.296448
Duration: 1.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2242
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.3815
Min length: 2

Characters and Unicode

Total characters: 73815
Distinct characters: 437
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
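
As an illustration of how such per-character properties can be tallied, here is a minimal sketch using Python's standard `unicodedata` module. It counts general categories only (the standard library does not expose scripts or blocks), and it runs on three sample values from this report rather than on the full column. The function name `category_counts` is a hypothetical helper, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# Three apartment names taken from the Sample rows of this variable.
samples = ["갈현1단지e-편한세상", "중계주공5단지", "도곡한신"]


def category_counts(texts):
    """Count Unicode general categories over every character in texts."""
    counts = Counter()
    for text in texts:
        for ch in text:
            # unicodedata.category returns a two-letter code, e.g. "Lo" or "Nd".
            counts[unicodedata.category(ch)] += 1
    return counts


counts = category_counts(samples)
for code, n in counts.most_common():
    print(code, n)
```

The two-letter codes correspond to the category names used in the tables of this report: `Lo` is Other Letter (Hangul syllables), `Nd` is Decimal Number, `Ll` is Lowercase Letter, and `Pd` is Dash Punctuation.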

Unique

Unique: 145
Unique (%): 1.5%

Sample

1st row: 갈현1단지e-편한세상
2nd row: 중계주공5단지
3rd row: 도곡한신
4th row: 강남효성해링턴코트
5th row: 동아효성아파트(거여2단지)

| Value | Count | Frequency (%) |
| 아파트 | 169 | 1.6% |
| 래미안 | 33 | 0.3% |
| e편한세상 | 30 | 0.3% |
| 아이파크 | 26 | 0.2% |
| sk뷰 | 18 | 0.2% |
| 고덕 | 18 | 0.2% |
| 푸르지오 | 16 | 0.1% |
| 해모로 | 15 | 0.1% |
| 경남아너스빌 | 14 | 0.1% |
| 구로두산위브 | 13 | 0.1% |
| Other values (2326) | 10424 | 96.7% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 2471 | 3.3% |
|  | 2464 | 3.3% |
|  | 2325 | 3.1% |
|  | 1900 | 2.6% |
|  | 1735 | 2.4% |
|  | 1629 | 2.2% |
|  | 1459 | 2.0% |
|  | 1418 | 1.9% |
|  | 1402 | 1.9% |
|  | 1392 | 1.9% |
| Other values (427) | 55620 | 75.4% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 67740 | 91.8% |
| Decimal Number | 3596 | 4.9% |
| Space Separator | 862 | 1.2% |
| Uppercase Letter | 798 | 1.1% |
| Lowercase Letter | 275 | 0.4% |
| Close Punctuation | 149 | 0.2% |
| Open Punctuation | 149 | 0.2% |
| Dash Punctuation | 126 | 0.2% |
| Other Punctuation | 115 | 0.2% |
| Letter Number | 5 | < 0.1% |

Most frequent character per category

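Other Letter
| Value | Count | Frequency (%) |
|  | 2471 | 3.6% |
|  | 2464 | 3.6% |
|  | 2325 | 3.4% |
|  | 1900 | 2.8% |
|  | 1735 | 2.6% |
|  | 1629 | 2.4% |
|  | 1459 | 2.2% |
|  | 1418 | 2.1% |
|  | 1402 | 2.1% |
|  | 1392 | 2.1% |
| Other values (382) | 49545 | 73.1% |

Uppercase Letter
| Value | Count | Frequency (%) |
| S | 137 | 17.2% |
| K | 106 | 13.3% |
| C | 106 | 13.3% |
| M | 77 | 9.6% |
| D | 77 | 9.6% |
| L | 54 | 6.8% |
| H | 46 | 5.8% |
| I | 39 | 4.9% |
| E | 35 | 4.4% |
| G | 30 | 3.8% |
| Other values (7) | 91 | 11.4% |

Lowercase Letter
| Value | Count | Frequency (%) |
| e | 182 | 66.2% |
| l | 22 | 8.0% |
| i | 15 | 5.5% |
| s | 14 | 5.1% |
| k | 13 | 4.7% |
| v | 12 | 4.4% |
| c | 6 | 2.2% |
| h | 4 | 1.5% |
| a | 3 | 1.1% |
| g | 3 | 1.1% |

Decimal Number
| Value | Count | Frequency (%) |
| 1 | 1081 | 30.1% |
| 2 | 1005 | 27.9% |
| 3 | 503 | 14.0% |
| 4 | 278 | 7.7% |
| 5 | 202 | 5.6% |
| 6 | 169 | 4.7% |
| 7 | 119 | 3.3% |
| 8 | 92 | 2.6% |
| 9 | 83 | 2.3% |
| 0 | 64 | 1.8% |

Other Punctuation
| Value | Count | Frequency (%) |
| , | 83 | 72.2% |
| . | 32 | 27.8% |

Space Separator
| Value | Count | Frequency (%) |
| (space) | 862 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
| ) | 149 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
| ( | 149 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
| - | 126 | 100.0% |

Letter Number
| Value | Count | Frequency (%) |
|  | 5 | 100.0% |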

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 67740 | 91.8% |
| Common | 4997 | 6.8% |
| Latin | 1078 | 1.5% |

Most frequent character per script

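Hangul
| Value | Count | Frequency (%) |
|  | 2471 | 3.6% |
|  | 2464 | 3.6% |
|  | 2325 | 3.4% |
|  | 1900 | 2.8% |
|  | 1735 | 2.6% |
|  | 1629 | 2.4% |
|  | 1459 | 2.2% |
|  | 1418 | 2.1% |
|  | 1402 | 2.1% |
|  | 1392 | 2.1% |
| Other values (382) | 49545 | 73.1% |

Latin
| Value | Count | Frequency (%) |
| e | 182 | 16.9% |
| S | 137 | 12.7% |
| K | 106 | 9.8% |
| C | 106 | 9.8% |
| M | 77 | 7.1% |
| D | 77 | 7.1% |
| L | 54 | 5.0% |
| H | 46 | 4.3% |
| I | 39 | 3.6% |
| E | 35 | 3.2% |
| Other values (19) | 219 | 20.3% |

Common
| Value | Count | Frequency (%) |
| 1 | 1081 | 21.6% |
| 2 | 1005 | 20.1% |
| (space) | 862 | 17.3% |
| 3 | 503 | 10.1% |
| 4 | 278 | 5.6% |
| 5 | 202 | 4.0% |
| 6 | 169 | 3.4% |
| ) | 149 | 3.0% |
| ( | 149 | 3.0% |
| - | 126 | 2.5% |
| Other values (6) | 473 | 9.5% |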

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 67740 | 91.8% |
| ASCII | 6070 | 8.2% |
| Number Forms | 5 | < 0.1% |

Most frequent character per block

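Hangul
| Value | Count | Frequency (%) |
|  | 2471 | 3.6% |
|  | 2464 | 3.6% |
|  | 2325 | 3.4% |
|  | 1900 | 2.8% |
|  | 1735 | 2.6% |
|  | 1629 | 2.4% |
|  | 1459 | 2.2% |
|  | 1418 | 2.1% |
|  | 1402 | 2.1% |
|  | 1392 | 2.1% |
| Other values (382) | 49545 | 73.1% |

ASCII
| Value | Count | Frequency (%) |
| 1 | 1081 | 17.8% |
| 2 | 1005 | 16.6% |
| (space) | 862 | 14.2% |
| 3 | 503 | 8.3% |
| 4 | 278 | 4.6% |
| 5 | 202 | 3.3% |
| e | 182 | 3.0% |
| 6 | 169 | 2.8% |
| ) | 149 | 2.5% |
| ( | 149 | 2.5% |
| Other values (34) | 1490 | 24.5% |

Number Forms
| Value | Count | Frequency (%) |
|  | 5 | 100.0% |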

아파트코드
Text

Distinct: 2247
Distinct (%): 22.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 147
Unique (%): 1.5%

Sample

1st row: A12205003
2nd row: A13922114
3rd row: A13550403
4th row: A10027558
5th row: A13811202

| Value | Count | Frequency (%) |
| a13203102 | 13 | 0.1% |
| a15205305 | 13 | 0.1% |
| a10027817 | 12 | 0.1% |
| a13981903 | 12 | 0.1% |
| a41279905 | 12 | 0.1% |
| a13114106 | 11 | 0.1% |
| a13201001 | 11 | 0.1% |
| a13201207 | 11 | 0.1% |
| a14003106 | 11 | 0.1% |
| a13187702 | 11 | 0.1% |
| Other values (2237) | 9883 | 98.8% |

Most occurring characters

| Value | Count | Frequency (%) |
| 0 | 18525 | 20.6% |
| 1 | 17584 | 19.5% |
| A | 9996 | 11.1% |
| 3 | 8853 | 9.8% |
| 2 | 8355 | 9.3% |
| 5 | 6088 | 6.8% |
| 8 | 5492 | 6.1% |
| 7 | 4711 | 5.2% |
| 4 | 4083 | 4.5% |
| 6 | 3216 | 3.6% |
| Other values (2) | 3097 | 3.4% |

Most occurring categories

| Value | Count | Frequency (%) |
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

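Decimal Number
| Value | Count | Frequency (%) |
| 0 | 18525 | 23.2% |
| 1 | 17584 | 22.0% |
| 3 | 8853 | 11.1% |
| 2 | 8355 | 10.4% |
| 5 | 6088 | 7.6% |
| 8 | 5492 | 6.9% |
| 7 | 4711 | 5.9% |
| 4 | 4083 | 5.1% |
| 6 | 3216 | 4.0% |
| 9 | 3093 | 3.9% |

Uppercase Letter
| Value | Count | Frequency (%) |
| A | 9996 | > 99.9% |
| B | 4 | < 0.1% |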

Most occurring scripts

| Value | Count | Frequency (%) |
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

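Common
| Value | Count | Frequency (%) |
| 0 | 18525 | 23.2% |
| 1 | 17584 | 22.0% |
| 3 | 8853 | 11.1% |
| 2 | 8355 | 10.4% |
| 5 | 6088 | 7.6% |
| 8 | 5492 | 6.9% |
| 7 | 4711 | 5.9% |
| 4 | 4083 | 5.1% |
| 6 | 3216 | 4.0% |
| 9 | 3093 | 3.9% |

Latin
| Value | Count | Frequency (%) |
| A | 9996 | > 99.9% |
| B | 4 | < 0.1% |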

Most occurring blocks

| Value | Count | Frequency (%) |
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
| 0 | 18525 | 20.6% |
| 1 | 17584 | 19.5% |
| A | 9996 | 11.1% |
| 3 | 8853 | 9.8% |
| 2 | 8355 | 9.3% |
| 5 | 6088 | 6.8% |
| 8 | 5492 | 6.1% |
| 7 | 4711 | 5.2% |
| 4 | 4083 | 4.5% |
| 6 | 3216 | 3.6% |
| Other values (2) | 3097 | 3.4% |

비용명
Text

Distinct: 77
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9531
Min length: 2

Characters and Unicode

Total characters: 59531
Distinct characters: 107
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 미지급금
2nd row: 관리비미수금
3rd row: 현금
4th row: 저장품
5th row: 승강기유지비충당부채

| Value | Count | Frequency (%) |
| 당기순이익 | 324 | 3.2% |
| 미처분이익잉여금 | 321 | 3.2% |
| 선급비용 | 319 | 3.2% |
| 예금 | 313 | 3.1% |
| 예수금 | 311 | 3.1% |
| 비품 | 308 | 3.1% |
| 관리비미수금 | 307 | 3.1% |
| 퇴직급여충당부채 | 304 | 3.0% |
| 공동주택적립금 | 301 | 3.0% |
| 연차수당충당부채 | 300 | 3.0% |
| Other values (67) | 6892 | 68.9% |

Most occurring characters

| Value | Count | Frequency (%) |
|  | 4547 | 7.6% |
|  | 3783 | 6.4% |
|  | 3032 | 5.1% |
|  | 3008 | 5.1% |
|  | 2945 | 4.9% |
|  | 2871 | 4.8% |
|  | 2582 | 4.3% |
|  | 2456 | 4.1% |
|  | 1876 | 3.2% |
|  | 1724 | 2.9% |
| Other values (97) | 30707 | 51.6% |

Most occurring categories

| Value | Count | Frequency (%) |
| Other Letter | 59531 | 100.0% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|  | 4547 | 7.6% |
|  | 3783 | 6.4% |
|  | 3032 | 5.1% |
|  | 3008 | 5.1% |
|  | 2945 | 4.9% |
|  | 2871 | 4.8% |
|  | 2582 | 4.3% |
|  | 2456 | 4.1% |
|  | 1876 | 3.2% |
|  | 1724 | 2.9% |
| Other values (97) | 30707 | 51.6% |

Most occurring scripts

| Value | Count | Frequency (%) |
| Hangul | 59531 | 100.0% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|  | 4547 | 7.6% |
|  | 3783 | 6.4% |
|  | 3032 | 5.1% |
|  | 3008 | 5.1% |
|  | 2945 | 4.9% |
|  | 2871 | 4.8% |
|  | 2582 | 4.3% |
|  | 2456 | 4.1% |
|  | 1876 | 3.2% |
|  | 1724 | 2.9% |
| Other values (97) | 30707 | 51.6% |

Most occurring blocks

| Value | Count | Frequency (%) |
| Hangul | 59531 | 100.0% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|  | 4547 | 7.6% |
|  | 3783 | 6.4% |
|  | 3032 | 5.1% |
|  | 3008 | 5.1% |
|  | 2945 | 4.9% |
|  | 2871 | 4.8% |
|  | 2582 | 4.3% |
|  | 2456 | 4.1% |
|  | 1876 | 3.2% |
|  | 1724 | 2.9% |
| Other values (97) | 30707 | 51.6% |

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202209: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202209
2nd row: 202209
3rd row: 202209
4th row: 202209
5th row: 202209

Common Values

| Value | Count | Frequency (%) |
| 202209 | 10000 | 100.0% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
| 202209 | 10000 | 100.0% |

금액
Real number (ℝ)

ZEROS 

Distinct: 7242
Distinct (%): 72.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74866336
Minimum: -4.5199953 × 10^8
Maximum: 9.6357165 × 10^9
Zeros: 2426
Zeros (%): 24.3%
Negative: 357
Negative (%): 3.6%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4.5199953 × 10^8
5-th percentile: 0
Q1: 0
Median: 2887230
Q3: 35817586
95-th percentile: 3.614765 × 10^8
Maximum: 9.6357165 × 10^9
Range: 1.0087716 × 10^10
Interquartile range (IQR): 35817586
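
The range and IQR follow directly from the other quantile statistics. The short check below is a sketch under the usual definitions (range = maximum minus minimum, IQR = Q3 minus Q1). It uses the exact extreme values listed in this report's tables of smallest and largest 금액 values, not the rounded scientific-notation figures.

```python
# Exact extremes as listed in the report's smallest/largest value tables.
minimum = -451_999_527
maximum = 9_635_716_489

# Quartiles as reported in the quantile statistics.
q1 = 0
q3 = 35_817_586

value_range = maximum - minimum  # spread between the two extremes
iqr = q3 - q1                    # spread of the middle half of the data

print(f"range = {value_range:.7e}")
print(f"IQR   = {iqr}")
```

Because Q1 is 0, the IQR equals Q3 here. That is consistent with zeros making up 24.3% of the column and negative values a further 3.6%.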

Descriptive statistics

Standard deviation: 3.020187 × 10^8
Coefficient of variation (CV): 4.0341055
Kurtosis: 216.31019
Mean: 74866336
Median Absolute Deviation (MAD): 2887230
Skewness: 11.650235
Sum: 7.4866336 × 10^11
Variance: 9.1215293 × 10^16
Monotonicity: Not monotonic
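
Several of these figures can be cross-checked against each other. The sketch below assumes the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations) and uses the rounded values printed in this report, so the results agree only to the displayed precision.

```python
import math

n = 10_000           # number of observations (from the Overview)
mean = 74_866_336    # reported mean, as displayed (rounded)
std = 3.020187e8     # reported standard deviation (rounded)

total = mean * n       # should match the reported Sum
cv = std / mean        # should match the reported coefficient of variation
variance = std ** 2    # should match the reported Variance

print(f"sum      = {total:.7e}")
print(f"CV       = {cv:.5f}")
print(f"variance = {variance:.5e}")
print(math.isclose(variance, 9.1215293e16, rel_tol=1e-6))
```

A CV of about 4 means the standard deviation is roughly four times the mean, which fits the heavy right tail indicated by the skewness and kurtosis.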
Histogram with fixed size bins (bins=50)

| Value | Count | Frequency (%) |
| 0 | 2426 | 24.3% |
| 250000 | 30 | 0.3% |
| 500000 | 29 | 0.3% |
| 300000 | 15 | 0.1% |
| 484000 | 13 | 0.1% |
| 200000 | 12 | 0.1% |
| 1000000 | 11 | 0.1% |
| 242000 | 11 | 0.1% |
| 3000000 | 9 | 0.1% |
| 250400 | 8 | 0.1% |
| Other values (7232) | 7436 | 74.4% |

| Value | Count | Frequency (%) |
| -451999527 | 1 | < 0.1% |
| -379948708 | 1 | < 0.1% |
| -372747190 | 1 | < 0.1% |
| -327335316 | 1 | < 0.1% |
| -277035390 | 1 | < 0.1% |
| -251105541 | 1 | < 0.1% |
| -212149238 | 1 | < 0.1% |
| -204009106 | 1 | < 0.1% |
| -134975569 | 1 | < 0.1% |
| -123816990 | 1 | < 0.1% |

| Value | Count | Frequency (%) |
| 9635716489 | 1 | < 0.1% |
| 6953129100 | 1 | < 0.1% |
| 6565058974 | 1 | < 0.1% |
| 5005823712 | 1 | < 0.1% |
| 4951086682 | 1 | < 0.1% |
| 4757952883 | 1 | < 0.1% |
| 4605943091 | 1 | < 0.1% |
| 4525008373 | 1 | < 0.1% |
| 4479709584 | 1 | < 0.1% |
| 4458505980 | 1 | < 0.1% |

Interactions


Correlations

|  | 비용명 | 금액 |
| 비용명 | 1.000 | 0.476 |
| 금액 | 0.476 | 1.000 |

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 14548 | 갈현1단지e-편한세상 | A12205003 | 미지급금 | 202209 | 7607980 |
| 43635 | 중계주공5단지 | A13922114 | 관리비미수금 | 202209 | 43575320 |
| 31273 | 도곡한신 | A13550403 | 현금 | 202209 | 86090 |
| 6677 | 강남효성해링턴코트 | A10027558 | 저장품 | 202209 | 0 |
| 39374 | 동아효성아파트(거여2단지) | A13811202 | 승강기유지비충당부채 | 202209 | 0 |
| 41855 | 거여우방 | A13881601 | 당기순이익 | 202209 | 13903773 |
| 27270 | 강일리버파크8단지 | A13410002 | 관리비예치금 | 202209 | 93868000 |
| 15347 | 경향파크아파트 | A12282201 | 주차장충당부채 | 202209 | 0 |
| 51215 | 광진트라팰리스 | A14319305 | 공동주택적립금 | 202209 | 33583430 |
| 11525 | 마포삼성 | A12104005 | 단기보증금 | 202209 | 23100000 |

|  | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
| 67081 | 한사랑2차삼성아파트(등촌동) | A15783907 | 주차장충당예금 | 202209 | 2530283 |
| 66747 | 가양5단지 | A15780806 | 공동주택적립금 | 202209 | 0 |
| 21239 | 쌍문극동 | A13203102 | 당기순이익 | 202209 | 21844898 |
| 54515 | 당산현대5차 | A15080507 | 장기수선충당예금 | 202209 | 1638516684 |
| 65489 | 마곡신안네트빌(1단지) | A15722003 | 가지급금 | 202209 | 90000 |
| 29985 | 강남데시앙파크 | A13519005 | 퇴직급여충당부채 | 202209 | 40435120 |
| 30620 | 개포대치2단지 | A13524009 | 공동주택적립금 | 202209 | 340119649 |
| 63379 | 상도현대 | A15678101 | 임대보증금 | 202209 | 3500000 |
| 53592 | 문래두산위브 | A15009505 | 공동주택적립금 | 202209 | 8472800 |
| 11778 | 서강GS | A12114001 | 주차장충당예금 | 202209 | 0 |