Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15820/S/1/datasetView.do

Alerts

년월일 has constant value "202010" (Constant)
금액 is highly skewed (γ1 = 35.10582981) (Skewed)
금액 has 2249 (22.5%) zeros (Zeros)
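
The skewness and zeros alerts can be illustrated with a small standard-library sketch. It computes the population moment coefficient of skewness (m3 / m2^1.5) and the share of zero entries. This is an assumption about the definition: the profiling tool may apply a bias-corrected sample estimator, so its γ1 can differ slightly. The function names and toy values are hypothetical and not taken from the dataset.

```python
from statistics import fmean

def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for x in values if x == 0) / len(values)

# Hypothetical toy amounts: many zeros and one large outlier.
toy = [0, 0, 0, 1, 2, 100]
print(skewness(toy))    # positive: the long tail is on the right
print(zero_share(toy))  # 0.5
```

A single very large value is enough to push the coefficient well above zero, which is the pattern the alert flags for 금액.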

Reproduction

Analysis started: 2024-05-11 05:59:47.347414
Analysis finished: 2024-05-11 05:59:48.456150
Duration: 1.11 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2208
Distinct (%): 22.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.2989
Min length: 2

Characters and Unicode

Total characters: 72989
Distinct characters: 437
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
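
As a rough illustration of how such character properties can be read, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical. The standard library reports two-letter category codes (for example `Lo` for Other Letter, `Nd` for Decimal Number, `Zs` for Space Separator) and does not expose script or block properties, so those parts of the report are not covered here.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)

# One apartment name that appears in the report's sample rows.
counts = category_counts("우성캐릭터199 아파트")
print(counts["Lo"])  # 8 Hangul syllables (Other Letter)
print(counts["Nd"])  # 3 digits (Decimal Number)
print(counts["Zs"])  # 1 space (Space Separator)
```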

Unique

Unique: 124
Unique (%): 1.2%
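
The report lists both a distinct count and a unique count. A minimal sketch of the difference, assuming "distinct" means the number of different values and "unique" means the number of values that occur exactly once; the helper name and the toy list are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Hypothetical toy column, not the real data.
names = ["래미안", "래미안", "아이파크", "대청"]
print(distinct_and_unique(names))  # (3, 2)
```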

Sample

1st row: 공덕삼성임대
2nd row: 번동신원
3rd row: 온수힐스테이트
4th row: 행당대림제2
5th row: 목동현대아이파크
Value | Count | Frequency (%)
아파트 | 159 | 1.5%
래미안 | 25 | 0.2%
아이파크 | 25 | 0.2%
e편한세상 | 16 | 0.1%
sk뷰 | 16 | 0.1%
신반포 | 15 | 0.1%
고덕 | 14 | 0.1%
북한산 | 14 | 0.1%
2단지 | 13 | 0.1%
힐스테이트 | 13 | 0.1%
Other values (2276) | 10359 | 97.1%

Most occurring characters

ValueCountFrequency (%)
2464
 
3.4%
2374
 
3.3%
2201
 
3.0%
1846
 
2.5%
1844
 
2.5%
1689
 
2.3%
1503
 
2.1%
1491
 
2.0%
1442
 
2.0%
1348
 
1.8%
Other values (427) 54787
75.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66915 | 91.7%
Decimal Number | 3712 | 5.1%
Space Separator | 751 | 1.0%
Uppercase Letter | 728 | 1.0%
Lowercase Letter | 349 | 0.5%
Open Punctuation | 143 | 0.2%
Close Punctuation | 143 | 0.2%
Dash Punctuation | 128 | 0.2%
Other Punctuation | 112 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
2464
 
3.7%
2374
 
3.5%
2201
 
3.3%
1846
 
2.8%
1844
 
2.8%
1689
 
2.5%
1503
 
2.2%
1491
 
2.2%
1442
 
2.2%
1348
 
2.0%
Other values (381) 48713
72.8%
Uppercase Letter
ValueCountFrequency (%)
S 120
16.5%
C 99
13.6%
K 95
13.0%
M 66
9.1%
D 66
9.1%
L 53
7.3%
H 38
 
5.2%
I 38
 
5.2%
G 37
 
5.1%
E 23
 
3.2%
Other values (7) 93
12.8%
Lowercase Letter
ValueCountFrequency (%)
e 174
49.9%
l 42
 
12.0%
i 34
 
9.7%
s 24
 
6.9%
v 23
 
6.6%
k 15
 
4.3%
h 12
 
3.4%
w 7
 
2.0%
a 6
 
1.7%
c 6
 
1.7%
Decimal Number
ValueCountFrequency (%)
1 1132
30.5%
2 1067
28.7%
3 496
13.4%
5 249
 
6.7%
4 237
 
6.4%
6 163
 
4.4%
7 119
 
3.2%
8 86
 
2.3%
9 84
 
2.3%
0 79
 
2.1%
Other Punctuation
ValueCountFrequency (%)
, 92
82.1%
. 20
 
17.9%
Space Separator
ValueCountFrequency (%)
751
100.0%
Open Punctuation
ValueCountFrequency (%)
( 143
100.0%
Close Punctuation
ValueCountFrequency (%)
) 143
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 128
100.0%
Letter Number
ValueCountFrequency (%)
4
100.0%
Math Symbol
ValueCountFrequency (%)
~ 4
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66915 | 91.7%
Common | 4993 | 6.8%
Latin | 1081 | 1.5%

Most frequent character per script

Hangul
ValueCountFrequency (%)
2464
 
3.7%
2374
 
3.5%
2201
 
3.3%
1846
 
2.8%
1844
 
2.8%
1689
 
2.5%
1503
 
2.2%
1491
 
2.2%
1442
 
2.2%
1348
 
2.0%
Other values (381) 48713
72.8%
Latin
ValueCountFrequency (%)
e 174
16.1%
S 120
11.1%
C 99
 
9.2%
K 95
 
8.8%
M 66
 
6.1%
D 66
 
6.1%
L 53
 
4.9%
l 42
 
3.9%
H 38
 
3.5%
I 38
 
3.5%
Other values (19) 290
26.8%
Common
ValueCountFrequency (%)
1 1132
22.7%
2 1067
21.4%
751
15.0%
3 496
9.9%
5 249
 
5.0%
4 237
 
4.7%
6 163
 
3.3%
( 143
 
2.9%
) 143
 
2.9%
- 128
 
2.6%
Other values (7) 484
9.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66915 | 91.7%
ASCII | 6070 | 8.3%
Number Forms | 4 | < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
2464
 
3.7%
2374
 
3.5%
2201
 
3.3%
1846
 
2.8%
1844
 
2.8%
1689
 
2.5%
1503
 
2.2%
1491
 
2.2%
1442
 
2.2%
1348
 
2.0%
Other values (381) 48713
72.8%
ASCII
ValueCountFrequency (%)
1 1132
18.6%
2 1067
17.6%
751
12.4%
3 496
 
8.2%
5 249
 
4.1%
4 237
 
3.9%
e 174
 
2.9%
6 163
 
2.7%
( 143
 
2.4%
) 143
 
2.4%
Other values (35) 1515
25.0%
Number Forms
ValueCountFrequency (%)
4
100.0%

아파트코드
Text

Distinct: 2216
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 1.2%

Sample

1st row: A12180404
2nd row: A14206306
3rd row: A15279101
4th row: A13377902
5th row: A15805102
Value | Count | Frequency (%)
a15210209 | 13 | 0.1%
a10027105 | 12 | 0.1%
a13985201 | 12 | 0.1%
a41279902 | 11 | 0.1%
a15284906 | 11 | 0.1%
a13386702 | 11 | 0.1%
a13986302 | 11 | 0.1%
a13790701 | 11 | 0.1%
a13981006 | 11 | 0.1%
a13380803 | 11 | 0.1%
Other values (2206) | 9886 | 98.9%

Most occurring characters

ValueCountFrequency (%)
0 18393
20.4%
1 17558
19.5%
A 9985
11.1%
3 8922
9.9%
2 8179
9.1%
5 6211
 
6.9%
8 5746
 
6.4%
7 4744
 
5.3%
4 3853
 
4.3%
6 3444
 
3.8%
Other values (2) 2965
 
3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 18393
23.0%
1 17558
21.9%
3 8922
11.2%
2 8179
10.2%
5 6211
 
7.8%
8 5746
 
7.2%
7 4744
 
5.9%
4 3853
 
4.8%
6 3444
 
4.3%
9 2950
 
3.7%
Uppercase Letter
ValueCountFrequency (%)
A 9985
99.9%
B 15
 
0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
ValueCountFrequency (%)
0 18393
23.0%
1 17558
21.9%
3 8922
11.2%
2 8179
10.2%
5 6211
 
7.8%
8 5746
 
7.2%
7 4744
 
5.9%
4 3853
 
4.8%
6 3444
 
4.3%
9 2950
 
3.7%
Latin
ValueCountFrequency (%)
A 9985
99.9%
B 15
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 18393
20.4%
1 17558
19.5%
A 9985
11.1%
3 8922
9.9%
2 8179
9.1%
5 6211
 
6.9%
8 5746
 
6.4%
7 4744
 
5.3%
4 3853
 
4.3%
6 3444
 
3.8%
Other values (2) 2965
 
3.3%

비용명
Text

Distinct: 76
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 9
Mean length: 5.9971
Min length: 2

Characters and Unicode

Total characters: 59971
Distinct characters: 106
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 예금
2nd row: 미지급금
3rd row: 비품
4th row: 기타유동부채
5th row: 선수전기료
Value | Count | Frequency (%)
퇴직급여충당부채 | 336 | 3.4%
당기순이익 | 332 | 3.3%
미처분이익잉여금 | 324 | 3.2%
비품 | 316 | 3.2%
관리비미수금 | 311 | 3.1%
예수금 | 308 | 3.1%
장기수선충당예금 | 308 | 3.1%
공동주택적립금 | 305 | 3.0%
예금 | 302 | 3.0%
장기수선충당부채 | 300 | 3.0%
Other values (66) | 6858 | 68.6%

Most occurring characters

ValueCountFrequency (%)
4635
 
7.7%
3793
 
6.3%
3099
 
5.2%
3066
 
5.1%
3025
 
5.0%
2970
 
5.0%
2681
 
4.5%
2417
 
4.0%
1868
 
3.1%
1767
 
2.9%
Other values (96) 30650
51.1%

Most occurring categories

ValueCountFrequency (%)
Other Letter 59971
100.0%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
4635
 
7.7%
3793
 
6.3%
3099
 
5.2%
3066
 
5.1%
3025
 
5.0%
2970
 
5.0%
2681
 
4.5%
2417
 
4.0%
1868
 
3.1%
1767
 
2.9%
Other values (96) 30650
51.1%

Most occurring scripts

ValueCountFrequency (%)
Hangul 59971
100.0%

Most frequent character per script

Hangul
ValueCountFrequency (%)
4635
 
7.7%
3793
 
6.3%
3099
 
5.2%
3066
 
5.1%
3025
 
5.0%
2970
 
5.0%
2681
 
4.5%
2417
 
4.0%
1868
 
3.1%
1767
 
2.9%
Other values (96) 30650
51.1%

Most occurring blocks

ValueCountFrequency (%)
Hangul 59971
100.0%

Most frequent character per block

Hangul
ValueCountFrequency (%)
4635
 
7.7%
3793
 
6.3%
3099
 
5.2%
3066
 
5.1%
3025
 
5.0%
2970
 
5.0%
2681
 
4.5%
2417
 
4.0%
1868
 
3.1%
1767
 
2.9%
Other values (96) 30650
51.1%

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202010
10000 

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202010
2nd row: 202010
3rd row: 202010
4th row: 202010
5th row: 202010

Common Values

Value | Count | Frequency (%)
202010 | 10000 | 100.0%

Length

Histogram of lengths of the category


금액
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 7420
Distinct (%): 74.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73639584
Minimum: -3.7738628 × 10^8
Maximum: 2.1615869 × 10^10
Zeros: 2249
Zeros (%): 22.5%
Negative: 329
Negative (%): 3.3%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -3.7738628 × 10^8
5-th percentile: 0
Q1: 0
Median: 3678608
Q3: 38513282
95-th percentile: 3.4265629 × 10^8
Maximum: 2.1615869 × 10^10
Range: 2.1993255 × 10^10
Interquartile range (IQR): 38513282
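
The range and interquartile range follow directly from the other quantiles. The short check below uses the exact minimum and maximum that the report lists among its extreme values for 금액; the variable names are illustrative only.

```python
# Exact extremes as listed in the report's smallest and largest values for 금액.
minimum = -377_386_276
maximum = 21_615_869_006
q1 = 0
q3 = 38_513_282

value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # interquartile range = Q3 - Q1

print(value_range)  # 21993255282, i.e. about 2.1993255 × 10^10
print(iqr)          # 38513282
```

Because Q1 is 0, the IQR equals Q3.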

Descriptive statistics

Standard deviation: 3.9628152 × 10^8
Coefficient of variation (CV): 5.3813656
Kurtosis: 1775.2319
Mean: 73639584
Median Absolute Deviation (MAD): 3678608
Skewness: 35.10583
Sum: 7.3639584 × 10^11
Variance: 1.5703904 × 10^17
Monotonicity: Not monotonic
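
Several of these descriptive statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. The sketch below recomputes them from the rounded figures printed in the report, so small rounding differences are expected; the variable names are illustrative.

```python
import math

n = 10_000          # number of observations
mean = 73_639_584   # reported mean (rounded)
std = 3.9628152e8   # reported standard deviation (rounded)

cv = std / mean     # coefficient of variation
variance = std ** 2
total = mean * n

# Compare against the reported values, allowing for rounding in the inputs.
print(math.isclose(cv, 5.3813656, rel_tol=1e-6))
print(math.isclose(variance, 1.5703904e17, rel_tol=1e-6))
print(math.isclose(total, 7.3639584e11, rel_tol=1e-6))
```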
Histogram with fixed size bins (bins=50)
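
A minimal sketch of fixed-size binning, assuming the bins are equal-width intervals spanning the minimum to the maximum. The function name and toy data are hypothetical and this is not the profiling tool's own code.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        idx = min(int((x - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    return counts

# Hypothetical toy data with one large value.
toy = [0, 0, 1, 2, 3, 10]
print(fixed_width_histogram(toy, bins=5))  # [3, 2, 0, 0, 1]
```

Under the same assumption, the reported range of about 2.2 × 10^10 split into 50 bins gives bins roughly 4.4 × 10^8 wide, so every value up to Q3 lands in the first bin and the histogram is dominated by a single bar.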
Common values
Value | Count | Frequency (%)
0 | 2249 | 22.5%
500000 | 30 | 0.3%
250000 | 21 | 0.2%
300000 | 17 | 0.2%
1000000 | 15 | 0.1%
242000 | 14 | 0.1%
200000 | 14 | 0.1%
30000000 | 9 | 0.1%
484000 | 9 | 0.1%
3000000 | 8 | 0.1%
Other values (7410) | 7614 | 76.1%
Minimum 10 values
Value | Count | Frequency (%)
-377386276 | 1 | < 0.1%
-354885698 | 1 | < 0.1%
-244322324 | 1 | < 0.1%
-196126890 | 1 | < 0.1%
-184997700 | 1 | < 0.1%
-161866980 | 1 | < 0.1%
-151802932 | 1 | < 0.1%
-138881815 | 1 | < 0.1%
-131194420 | 1 | < 0.1%
-120561290 | 1 | < 0.1%
Maximum 10 values
Value | Count | Frequency (%)
21615869006 | 1 | < 0.1%
21537672006 | 1 | < 0.1%
8961727613 | 1 | < 0.1%
5564067547 | 1 | < 0.1%
5264091580 | 1 | < 0.1%
4931323810 | 1 | < 0.1%
4523353887 | 1 | < 0.1%
4076152578 | 1 | < 0.1%
3868711988 | 1 | < 0.1%
3488531881 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.153
금액 | 0.153 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
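
Nullity by column amounts to counting missing entries per column. A minimal sketch over a list of dict records, treating `None` as missing; the helper name and the toy records are hypothetical, and the profiled dataset itself reports zero missing cells.

```python
def null_counts(rows, columns):
    """Count None (missing) entries per column in a list of dict records."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

# Hypothetical toy records; the second one has a missing amount.
rows = [
    {"아파트명": "단지A", "금액": 0},
    {"아파트명": "단지B", "금액": None},
]
print(null_counts(rows, ["아파트명", "금액"]))  # {'아파트명': 0, '금액': 1}
```

Note that a zero amount is a present value, not a missing one, which is why the report tracks zeros and missing cells separately.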

Sample

First rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
11016 | 공덕삼성임대 | A12180404 | 예금 | 202010 | 49069104
47280 | 번동신원 | A14206306 | 미지급금 | 202010 | 13946290
56093 | 온수힐스테이트 | A15279101 | 비품 | 202010 | 39355080
23018 | 행당대림제2 | A13377902 | 기타유동부채 | 202010 | 0
65736 | 목동현대아이파크 | A15805102 | 선수전기료 | 202010 | 1020774
20634 | 방학벽산2차 | A13283405 | 공동체활성화단체지원적립금 | 202010 | 500000
30154 | 역삼래미안 | A13592706 | 기타시설운영충당부채 | 202010 | 0
7935 | 문화촌현대 | A12009305 | 기타유동부채 | 202010 | 26780
28279 | 우성캐릭터199 아파트 | A13527003 | 미부과관리비 | 202010 | 113162375
30298 | 대청 | A13594007 | 주차장충당부채 | 202010 | 0
Last rows
(index) | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
36207 | 잠원한신그린 | A13790701 | 선급비용 | 202010 | 2863930
63101 | 방화동부센트레빌 | A15722108 | 미부과관리비 | 202010 | 85268580
3620 | 힐스테이트 백련산4차 아파트 | A10026834 | 전신전화가입권 | 202010 | 180000
26988 | 삼성롯데 | A13509007 | 상여충당부채 | 202010 | 0
54941 | 신도림태영타운 | A15205513 | 미처분이익잉여금 | 202010 | 0
33763 | 잠원동아 | A13703027 | 저장품 | 202010 | 1655600
57211 | 구로현대상선 | A15286802 | 공동주택적립금 | 202010 | 301133
4642 | 상도파크자이 아파트 | A10027424 | 비품 | 202010 | 28722970
2011 | 신정이든채 | A10025649 | 연차수당충당부채 | 202010 | 4384970
26176 | 강동현대홈타운 | A13485301 | 당기순이익 | 202010 | 67694554