Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 4
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
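
The derived figures above follow from the raw counts. The sketch below recomputes them; the function name `dataset_summary` is hypothetical, and it assumes 1 KiB = 1024 bytes and that the missing-cell percentage is taken over all cells (variables × observations).

```python
def dataset_summary(n_vars, n_obs, missing_cells, total_bytes):
    """Recompute the derived overview figures from the raw counts."""
    total_cells = n_vars * n_obs
    return {
        # Share of missing cells over every cell in the table, in percent.
        "missing_pct": 100.0 * missing_cells / total_cells,
        # Average in-memory size of one record, in bytes.
        "avg_record_bytes": total_bytes / n_obs,
    }


summary = dataset_summary(
    n_vars=5, n_obs=10_000, missing_cells=4, total_bytes=488.3 * 1024
)
print(f"{summary['missing_pct']:.3f}%")       # reported as "< 0.1%"
print(f"{summary['avg_record_bytes']:.1f} B")
```

Four missing cells out of 50000 is 0.008%, which is why the report only shows "< 0.1%".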

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202208" (Constant)
금액 has 1636 (16.4%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:51:39.588925
Analysis finished: 2024-05-11 06:51:41.341694
Duration: 1.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
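
A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the configuration actually used here: the function name `build_profile`, the CSV path, and the output file name are hypothetical, and the source file may need an explicit `encoding` argument in `read_csv`. `ProfileReport` and `to_file` are the entry points ydata-profiling provides.

```python
def build_profile(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report.

    Requires pandas and ydata-profiling; both are imported lazily so that
    defining the function does not fail where they are not installed.
    """
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # hypothetical input file
    report = ProfileReport(df, title="Profiling report")
    report.to_file(html_path)
    return html_path
```

Calling `build_profile("data.csv")` would write `report.html` next to the script, assuming a file named `data.csv` exists.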

Variables

아파트명
Text

Distinct: 2129
Distinct (%): 21.3%
Missing: 4
Missing (%): < 0.1%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 21
Mean length: 7.3686475
Min length: 2

Characters and Unicode

Total characters: 73657
Distinct characters: 429
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
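
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the three input strings are apartment names taken from this report. The module returns two-letter codes: `Lo` corresponds to "Other Letter", `Nd` to "Decimal Number", and `Po` to "Other Punctuation". Scripts and blocks are not exposed by `unicodedata` and would need another source.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


sample = ["삼성동힐스테이트2단지", "상계현대2차", "방배임광1,2차"]
counts = category_counts(sample)
print(counts)  # Hangul syllables fall under "Lo", digits under "Nd"
```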

Unique

Unique: 99
Unique (%): 1.0%

Sample

1st row: 삼성동힐스테이트2단지
2nd row: 상계현대2차
3rd row: 정릉중앙하이츠
4th row: 창동주공19단지
5th row: 광진트라팰리스

Value | Count | Frequency (%)
아파트 | 198 | 1.8%
e편한세상 | 39 | 0.4%
래미안 | 36 | 0.3%
아이파크 | 24 | 0.2%
신반포 | 20 | 0.2%
북한산 | 18 | 0.2%
sk뷰 | 18 | 0.2%
푸르지오 | 17 | 0.2%
고덕 | 16 | 0.1%
송파 | 16 | 0.1%
Other values (2209) | 10505 | 96.3%
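
The counts in this table sum to 10907, more than the 9996 non-missing names, and the values appear lowercased ("sk뷰"), which suggests the table counts whitespace-separated, lowercased words rather than whole values. The sketch below reproduces that kind of tally under this assumption; `word_frequencies` is a hypothetical helper, and of the inputs only "일원동 수서아파트" comes from the report, the others are made up for illustration.

```python
from collections import Counter


def word_frequencies(values, top=10):
    """Return (word, count, share) for the most common lowercase tokens."""
    words = Counter()
    for value in values:
        if value is None:  # skip missing entries
            continue
        words.update(value.lower().split())
    total = sum(words.values())
    return [(w, c, c / total) for w, c in words.most_common(top)]


# Illustrative input; the second and third names are hypothetical.
names = ["일원동 수서아파트", "SK뷰 아파트", "수서아파트", None]
result = word_frequencies(names)
for word, count, share in result:
    print(word, count, f"{share:.1%}")
```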

Most occurring characters

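Value | Count | Frequency (%)
(not captured) | 2722 | 3.7%
(not captured) | 2622 | 3.6%
(not captured) | 2476 | 3.4%
(not captured) | 1734 | 2.4%
(not captured) | 1669 | 2.3%
(not captured) | 1511 | 2.1%
(not captured) | 1500 | 2.0%
(not captured) | 1422 | 1.9%
(not captured) | 1329 | 1.8%
(not captured) | 1237 | 1.7%
Other values (419) | 55435 | 75.3%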

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67496 | 91.6%
Decimal Number | 3325 | 4.5%
Space Separator | 1014 | 1.4%
Uppercase Letter | 881 | 1.2%
Lowercase Letter | 383 | 0.5%
Close Punctuation | 153 | 0.2%
Open Punctuation | 153 | 0.2%
Dash Punctuation | 136 | 0.2%
Other Punctuation | 108 | 0.1%
Letter Number | 8 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not captured) | 2722 | 4.0%
(not captured) | 2622 | 3.9%
(not captured) | 2476 | 3.7%
(not captured) | 1734 | 2.6%
(not captured) | 1669 | 2.5%
(not captured) | 1511 | 2.2%
(not captured) | 1500 | 2.2%
(not captured) | 1422 | 2.1%
(not captured) | 1329 | 2.0%
(not captured) | 1237 | 1.8%
Other values (374) | 49274 | 73.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 140 | 15.9%
C | 112 | 12.7%
K | 105 | 11.9%
M | 77 | 8.7%
D | 77 | 8.7%
L | 68 | 7.7%
H | 62 | 7.0%
I | 46 | 5.2%
E | 42 | 4.8%
G | 30 | 3.4%
Other values (7) | 122 | 13.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 219 | 57.2%
l | 37 | 9.7%
i | 33 | 8.6%
v | 25 | 6.5%
k | 20 | 5.2%
s | 19 | 5.0%
w | 12 | 3.1%
c | 8 | 2.1%
h | 4 | 1.0%
a | 3 | 0.8%

Decimal Number
Value | Count | Frequency (%)
2 | 998 | 30.0%
1 | 969 | 29.1%
3 | 466 | 14.0%
4 | 237 | 7.1%
5 | 176 | 5.3%
6 | 134 | 4.0%
7 | 118 | 3.5%
8 | 93 | 2.8%
9 | 78 | 2.3%
0 | 56 | 1.7%

Other Punctuation
Value | Count | Frequency (%)
, | 89 | 82.4%
. | 19 | 17.6%

Space Separator
Value | Count | Frequency (%)
(space) | 1014 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 153 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 153 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 136 | 100.0%

Letter Number
Value | Count | Frequency (%)
(not captured) | 8 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 67496 | 91.6%
Common | 4889 | 6.6%
Latin | 1272 | 1.7%

Most frequent character per script

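Hangul
Value | Count | Frequency (%)
(not captured) | 2722 | 4.0%
(not captured) | 2622 | 3.9%
(not captured) | 2476 | 3.7%
(not captured) | 1734 | 2.6%
(not captured) | 1669 | 2.5%
(not captured) | 1511 | 2.2%
(not captured) | 1500 | 2.2%
(not captured) | 1422 | 2.1%
(not captured) | 1329 | 2.0%
(not captured) | 1237 | 1.8%
Other values (374) | 49274 | 73.0%

Latin
Value | Count | Frequency (%)
e | 219 | 17.2%
S | 140 | 11.0%
C | 112 | 8.8%
K | 105 | 8.3%
M | 77 | 6.1%
D | 77 | 6.1%
L | 68 | 5.3%
H | 62 | 4.9%
I | 46 | 3.6%
E | 42 | 3.3%
Other values (19) | 324 | 25.5%

Common
Value | Count | Frequency (%)
(space) | 1014 | 20.7%
2 | 998 | 20.4%
1 | 969 | 19.8%
3 | 466 | 9.5%
4 | 237 | 4.8%
5 | 176 | 3.6%
) | 153 | 3.1%
( | 153 | 3.1%
- | 136 | 2.8%
6 | 134 | 2.7%
Other values (6) | 453 | 9.3%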

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 67496 | 91.6%
ASCII | 6153 | 8.4%
Number Forms | 8 | < 0.1%

Most frequent character per block

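Hangul
Value | Count | Frequency (%)
(not captured) | 2722 | 4.0%
(not captured) | 2622 | 3.9%
(not captured) | 2476 | 3.7%
(not captured) | 1734 | 2.6%
(not captured) | 1669 | 2.5%
(not captured) | 1511 | 2.2%
(not captured) | 1500 | 2.2%
(not captured) | 1422 | 2.1%
(not captured) | 1329 | 2.0%
(not captured) | 1237 | 1.8%
Other values (374) | 49274 | 73.0%

ASCII
Value | Count | Frequency (%)
(space) | 1014 | 16.5%
2 | 998 | 16.2%
1 | 969 | 15.7%
3 | 466 | 7.6%
4 | 237 | 3.9%
e | 219 | 3.6%
5 | 176 | 2.9%
) | 153 | 2.5%
( | 153 | 2.5%
S | 140 | 2.3%
Other values (34) | 1628 | 26.5%

Number Forms
Value | Count | Frequency (%)
(not captured) | 8 | 100.0%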

아파트코드
Text

Distinct: 2134
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 99
Unique (%): 1.0%

Sample

1st row: A13570501
2nd row: A13983709
3rd row: A13684701
4th row: A13290107
5th row: A14319305

Value | Count | Frequency (%)
a13204104 | 16 | 0.2%
a13593801 | 13 | 0.1%
a13611005 | 12 | 0.1%
a13384303 | 12 | 0.1%
a13276415 | 12 | 0.1%
a15780602 | 11 | 0.1%
a14003101 | 11 | 0.1%
a13981006 | 11 | 0.1%
a13009003 | 11 | 0.1%
a13570501 | 11 | 0.1%
Other values (2124) | 9880 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 19073 | 21.2%
1 | 17417 | 19.4%
A | 10000 | 11.1%
3 | 8980 | 10.0%
2 | 8513 | 9.5%
5 | 5961 | 6.6%
8 | 5247 | 5.8%
7 | 4523 | 5.0%
4 | 4057 | 4.5%
6 | 3361 | 3.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 19073 | 23.8%
1 | 17417 | 21.8%
3 | 8980 | 11.2%
2 | 8513 | 10.6%
5 | 5961 | 7.5%
8 | 5247 | 6.6%
7 | 4523 | 5.7%
4 | 4057 | 5.1%
6 | 3361 | 4.2%
9 | 2868 | 3.6%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 19073 | 23.8%
1 | 17417 | 21.8%
3 | 8980 | 11.2%
2 | 8513 | 10.6%
5 | 5961 | 7.5%
8 | 5247 | 6.6%
7 | 4523 | 5.7%
4 | 4057 | 5.1%
6 | 3361 | 4.2%
9 | 2868 | 3.6%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 19073 | 21.2%
1 | 17417 | 19.4%
A | 10000 | 11.1%
3 | 8980 | 10.0%
2 | 8513 | 9.5%
5 | 5961 | 6.6%
8 | 5247 | 5.8%
7 | 4523 | 5.0%
4 | 4057 | 4.5%
6 | 3361 | 3.7%

비용명
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8457
Min length: 2

Characters and Unicode

Total characters: 48457
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 사무용품비
2nd row: 잡수익
3rd row: 교통비
4th row: 고용안정사업비용
5th row: 교육비

Value | Count | Frequency (%)
급여 | 240 | 2.4%
경비비 | 222 | 2.2%
통신비 | 222 | 2.2%
퇴직급여 | 220 | 2.2%
교육비 | 210 | 2.1%
수선유지비 | 210 | 2.1%
세대전기료 | 207 | 2.1%
소독비 | 207 | 2.1%
잡수익 | 207 | 2.1%
도서인쇄비 | 206 | 2.1%
Other values (76) | 7849 | 78.5%

Most occurring characters

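Value | Count | Frequency (%)
(not captured) | 5307 | 11.0%
(not captured) | 3627 | 7.5%
(not captured) | 2109 | 4.4%
(not captured) | 2081 | 4.3%
(not captured) | 1600 | 3.3%
(not captured) | 1290 | 2.7%
(not captured) | 1031 | 2.1%
(not captured) | 838 | 1.7%
(not captured) | 777 | 1.6%
(not captured) | 732 | 1.5%
Other values (110) | 29065 | 60.0%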

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48457 | 100.0%

Most frequent character per category

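Other Letter
Value | Count | Frequency (%)
(not captured) | 5307 | 11.0%
(not captured) | 3627 | 7.5%
(not captured) | 2109 | 4.4%
(not captured) | 2081 | 4.3%
(not captured) | 1600 | 3.3%
(not captured) | 1290 | 2.7%
(not captured) | 1031 | 2.1%
(not captured) | 838 | 1.7%
(not captured) | 777 | 1.6%
(not captured) | 732 | 1.5%
Other values (110) | 29065 | 60.0%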

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48457 | 100.0%

Most frequent character per script

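Hangul
Value | Count | Frequency (%)
(not captured) | 5307 | 11.0%
(not captured) | 3627 | 7.5%
(not captured) | 2109 | 4.4%
(not captured) | 2081 | 4.3%
(not captured) | 1600 | 3.3%
(not captured) | 1290 | 2.7%
(not captured) | 1031 | 2.1%
(not captured) | 838 | 1.7%
(not captured) | 777 | 1.6%
(not captured) | 732 | 1.5%
Other values (110) | 29065 | 60.0%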

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48457 | 100.0%

Most frequent character per block

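Hangul
Value | Count | Frequency (%)
(not captured) | 5307 | 11.0%
(not captured) | 3627 | 7.5%
(not captured) | 2109 | 4.4%
(not captured) | 2081 | 4.3%
(not captured) | 1600 | 3.3%
(not captured) | 1290 | 2.7%
(not captured) | 1031 | 2.1%
(not captured) | 838 | 1.7%
(not captured) | 777 | 1.6%
(not captured) | 732 | 1.5%
Other values (110) | 29065 | 60.0%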

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
202208 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202208
2nd row: 202208
3rd row: 202208
4th row: 202208
5th row: 202208

Common Values

Value | Count | Frequency (%)
202208 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
202208 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6730
Distinct (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3570848.1
Minimum: -32526190
Maximum: 3.9829037 × 10^8
Zeros: 1636
Zeros (%): 16.4%
Negative: 13
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -32526190
5-th percentile: 0
Q1: 41645
median: 264940
Q3: 1311227.5
95-th percentile: 16698577
Maximum: 3.9829037 × 10^8
Range: 4.3081656 × 10^8
Interquartile range (IQR): 1269582.5
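
The range and interquartile range are simple differences of the quantiles listed above. The sketch below rechecks them, using the exact minimum and maximum that appear in this variable's extreme-value tables; the variable names are illustrative.

```python
# Quantiles and extremes as reported for 금액.
q1, q3 = 41645.0, 1311227.5
minimum, maximum = -32526190.0, 398290368.0

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # full range

print(iqr)
print(f"{value_range:.7e}")
```

The range is roughly 340 times the IQR, which is consistent with a small number of very large amounts stretching the distribution.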

Descriptive statistics

Standard deviation: 14939061
Coefficient of variation (CV): 4.1836172
Kurtosis: 239.3391
Mean: 3570848.1
Median Absolute Deviation (MAD): 264940
Skewness: 12.819706
Sum: 3.5708481 × 10^10
Variance: 2.2317556 × 10^14
Monotonicity: Not monotonic
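
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below checks them against the rounded values printed in the report, so small differences in the last digits are expected; the variable names are illustrative.

```python
# Rounded figures as printed in the report.
mean = 3570848.1
std = 14939061.0
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance from the standard deviation
total = mean * n       # sum from the mean and the observation count

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```

A mean of about 3.57 million against a median of 264940, together with a skewness near 12.8, points to a strongly right-skewed distribution.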
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 1636 | 16.4%
62500 | 91 | 0.9%
200000 | 77 | 0.8%
300000 | 58 | 0.6%
150000 | 49 | 0.5%
23000 | 48 | 0.5%
100000 | 44 | 0.4%
400000 | 37 | 0.4%
50000 | 35 | 0.4%
250000 | 30 | 0.3%
Other values (6720) | 7895 | 79.0%

Smallest values
Value | Count | Frequency (%)
-32526190 | 1 | < 0.1%
-3014970 | 1 | < 0.1%
-1575000 | 1 | < 0.1%
-1495140 | 1 | < 0.1%
-1102700 | 1 | < 0.1%
-576000 | 1 | < 0.1%
-450000 | 1 | < 0.1%
-164000 | 1 | < 0.1%
-156880 | 1 | < 0.1%
-100000 | 1 | < 0.1%

Largest values
Value | Count | Frequency (%)
398290368 | 1 | < 0.1%
397223016 | 1 | < 0.1%
382100490 | 1 | < 0.1%
314974422 | 1 | < 0.1%
302170630 | 1 | < 0.1%
294100685 | 1 | < 0.1%
261232360 | 1 | < 0.1%
220344710 | 1 | < 0.1%
217956555 | 1 | < 0.1%
211760820 | 1 | < 0.1%

Interactions


Correlations

(variable) | 비용명 | 금액
비용명 | 1.000 | 0.606
금액 | 0.606 | 1.000
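
The report does not state which measure produced the 0.606 between the expense name and the amount. Since one variable is textual and the other numeric, a categorical association measure such as Cramér's V (with the amounts binned first) is one plausible candidate; that is an assumption, not something the report confirms. The sketch below computes the uncorrected Cramér's V for two label sequences; the function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Toy data: expense labels against binned amounts (all hypothetical).
perfect = cramers_v(["a", "a", "b", "b"], ["lo", "lo", "hi", "hi"])
independent = cramers_v(["a", "a", "b", "b"], ["lo", "hi", "lo", "hi"])
print(perfect, independent)
```

A value of 1 means the label fully determines the bin, and 0 means the two are independent in the sample.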

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
46493 | 삼성동힐스테이트2단지 | A13570501 | 사무용품비 | 202208 | 122100
68138 | 상계현대2차 | A13983709 | 잡수익 | 202208 | 4180
53149 | 정릉중앙하이츠 | A13684701 | 교통비 | 202208 | 12000
34878 | 창동주공19단지 | A13290107 | 고용안정사업비용 | 202208 | 0
75178 | 광진트라팰리스 | A14319305 | 교육비 | 202208 | 0
24654 | 답십리두산 | A13003201 | 연체료수익 | 202208 | 94060
71876 | 후암미주 | A14019001 | 퇴직급여 | 202208 | 894970
52308 | 정릉대우 | A13676702 | 기타운영비용 | 202208 | 1069250
19290 | 공덕2삼성임대 | A12170602 | 주차장수익 | 202208 | 1680000
61996 | 방이코오롱 | A13883602 | 지급수수료 | 202208 | 0

Last rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
81507 | 양평경남2차아너스빌 | A15086601 | 이자수익 | 202208 | 2768
48010 | 일원동 수서아파트 | A13593801 | 국민연금 | 202208 | 947970
98084 | 강서센트레빌4차 | A15781201 | 검침수익 | 202208 | 89410
32307 | 창동주공4단지 | A13204104 | 기타운영비용 | 202208 | 0
32185 | 창동신도브래뉴 | A13204002 | 부과차익 | 202208 | 310
34039 | 방학명품ESA1단지 | A13285404 | 정화조관리비 | 202208 | 230000
93551 | 흑석한강센트레빌2차 | A15679109 | 공동주택지원금수익 | 202208 | 0
55688 | 방배임광1,2차 | A13785005 | 알뜰시장수익 | 202208 | 520000
88592 | 신도림현대 | A15288803 | 국민연금 | 202208 | 224710
93600 | 대방우정 | A15681103 | 교육비 | 202208 | 99000