Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
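
The average record size and the missing-cell percentage follow from the other figures in this block. A minimal sketch of that arithmetic, assuming 1 KiB = 1024 bytes and one-decimal rounding; the variable names are illustrative and not part of the report:

```python
# Figures copied from the dataset statistics above.
KIB = 1024                     # assumption: binary kilobyte
total_size_bytes = 488.3 * KIB
n_observations = 10_000
n_variables = 5
missing_cells = 0

# Average record size = total memory / number of rows.
avg_record_bytes = total_size_bytes / n_observations

# Missing cells (%) = missing cells / (rows * columns).
missing_pct = 100 * missing_cells / (n_observations * n_variables)

print(f"Average record size: {avg_record_bytes:.1f} B")
print(f"Missing cells: {missing_pct:.1f}%")
```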

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "201905" (Constant)
금액 has 1182 (11.8%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:59:42.530047
Analysis finished: 2024-05-11 06:59:44.986565
Duration: 2.46 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
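
The reported duration can be checked against the two timestamps. A small sketch using only the standard library, assuming the timestamps are in the format shown above:

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-05-11 06:59:42.530047", FMT)
finished = datetime.strptime("2024-05-11 06:59:44.986565", FMT)

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```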

Variables

아파트명
Text

Distinct: 2092
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1268
Min length: 2

Characters and Unicode

Total characters: 71268
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
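
As an illustration of the category counts reported for text variables, the sketch below tallies Unicode general categories for the five sample values of this variable using Python's `unicodedata` module. This is not the profiler's own code; the `CATEGORY_NAMES` mapping and the `category_counts` helper are assumptions made for the example.

```python
import unicodedata
from collections import Counter

# Readable names for a few two-letter general-category codes.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nl": "Letter Number",
}

# The five sample values listed for this variable in the report.
samples = [
    "상봉프레미어스엠코",
    "서강쌍용예가",
    "청담현대3차",
    "가락대림아파트",
    "신내건영2차아파트",
]

def category_counts(values):
    """Count characters per Unicode general category (hypothetical helper)."""
    counts = Counter()
    for value in values:
        # Normalize so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(samples)
print(counts.most_common())
```

Hangul syllables fall under Other Letter and ASCII digits under Decimal Number, which matches the two largest categories in the tables for this variable. A Roman numeral such as U+2161 is classified as Letter Number; whether the seven Letter Number characters in this column are Roman numerals is an assumption, since the characters themselves are not shown.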

Unique

Unique: 82
Unique (%): 0.8%
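
The report distinguishes Distinct (how many different values occur) from Unique (how many values occur exactly once). A minimal sketch of that distinction on a toy list; the list is made up for illustration and is not taken from the dataset:

```python
from collections import Counter

# Toy data, not from the dataset: two names repeat, two appear once.
values = ["래미안", "래미안", "힐스테이트", "신반포", "신반포", "고덕현대"]

counts = Counter(values)

# Distinct: number of different values.
n_distinct = len(counts)

# Unique: number of values that appear exactly once.
n_unique = sum(1 for c in counts.values() if c == 1)

print(f"Distinct: {n_distinct} ({100 * n_distinct / len(values):.1f}%)")
print(f"Unique: {n_unique} ({100 * n_unique / len(values):.1f}%)")
```

Applied to this variable, 2092 distinct names occur in 10000 rows, but only 82 of them appear in a single row.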

Sample

1st row: 상봉프레미어스엠코
2nd row: 서강쌍용예가
3rd row: 청담현대3차
4th row: 가락대림아파트
5th row: 신내건영2차아파트

Value    Count    Frequency (%)
아파트    119    1.1%
래미안    22    0.2%
입주자대표회의    15    0.1%
경남아너스빌    15    0.1%
코오롱하늘채아파트    14    0.1%
힐스테이트    14    0.1%
신반포    12    0.1%
고덕현대    12    0.1%
신내5단지대림두산    12    0.1%
은평뉴타운상림마을6단지    12    0.1%
Other values (2146)    10257    97.6%

Most occurring characters

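Value    Count    Frequency (%)
(not preserved)    2224    3.1%
(not preserved)    2107    3.0%
(not preserved)    1886    2.6%
(not preserved)    1877    2.6%
(not preserved)    1791    2.5%
(not preserved)    1742    2.4%
(not preserved)    1548    2.2%
(not preserved)    1503    2.1%
(not preserved)    1411    2.0%
(not preserved)    1361    1.9%
Other values (421)    53818    75.5%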

Most occurring categories

Value    Count    Frequency (%)
Other Letter    65367    91.7%
Decimal Number    3812    5.3%
Uppercase Letter    693    1.0%
Space Separator    548    0.8%
Lowercase Letter    313    0.4%
Other Punctuation    143    0.2%
Dash Punctuation    130    0.2%
Open Punctuation    125    0.2%
Close Punctuation    125    0.2%
Letter Number    7    < 0.1%

Most frequent character per category

Other Letter
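Value    Count    Frequency (%)
(not preserved)    2224    3.4%
(not preserved)    2107    3.2%
(not preserved)    1886    2.9%
(not preserved)    1877    2.9%
(not preserved)    1791    2.7%
(not preserved)    1742    2.7%
(not preserved)    1548    2.4%
(not preserved)    1503    2.3%
(not preserved)    1411    2.2%
(not preserved)    1361    2.1%
Other values (375)    47917    73.3%
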
Uppercase Letter
Value    Count    Frequency (%)
S    121    17.5%
K    85    12.3%
C    74    10.7%
L    52    7.5%
D    49    7.1%
M    49    7.1%
G    43    6.2%
H    42    6.1%
I    41    5.9%
E    31    4.5%
Other values (7)    106    15.3%

Lowercase Letter
Value    Count    Frequency (%)
e    179    57.2%
l    34    10.9%
i    31    9.9%
v    23    7.3%
w    12    3.8%
k    10    3.2%
s    9    2.9%
c    8    2.6%
h    3    1.0%
a    2    0.6%

Decimal Number
Value    Count    Frequency (%)
1    1208    31.7%
2    1106    29.0%
3    478    12.5%
4    261    6.8%
5    236    6.2%
6    153    4.0%
9    98    2.6%
7    97    2.5%
8    89    2.3%
0    86    2.3%

Other Punctuation
Value    Count    Frequency (%)
,    132    92.3%
.    11    7.7%

Space Separator
Value    Count    Frequency (%)
(space)    548    100.0%

Dash Punctuation
Value    Count    Frequency (%)
-    130    100.0%

Open Punctuation
Value    Count    Frequency (%)
(    125    100.0%

Close Punctuation
Value    Count    Frequency (%)
)    125    100.0%

Letter Number
Value    Count    Frequency (%)
(not preserved)    7    100.0%

Math Symbol
Value    Count    Frequency (%)
~    5    100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul    65367    91.7%
Common    4888    6.9%
Latin    1013    1.4%

Most frequent character per script

Hangul
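Value    Count    Frequency (%)
(not preserved)    2224    3.4%
(not preserved)    2107    3.2%
(not preserved)    1886    2.9%
(not preserved)    1877    2.9%
(not preserved)    1791    2.7%
(not preserved)    1742    2.7%
(not preserved)    1548    2.4%
(not preserved)    1503    2.3%
(not preserved)    1411    2.2%
(not preserved)    1361    2.1%
Other values (375)    47917    73.3%
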
Latin
Value    Count    Frequency (%)
e    179    17.7%
S    121    11.9%
K    85    8.4%
C    74    7.3%
L    52    5.1%
D    49    4.8%
M    49    4.8%
G    43    4.2%
H    42    4.1%
I    41    4.0%
Other values (19)    278    27.4%

Common
Value    Count    Frequency (%)
1    1208    24.7%
2    1106    22.6%
(space)    548    11.2%
3    478    9.8%
4    261    5.3%
5    236    4.8%
6    153    3.1%
,    132    2.7%
-    130    2.7%
(    125    2.6%
Other values (7)    511    10.5%

Most occurring blocks

Value    Count    Frequency (%)
Hangul    65367    91.7%
ASCII    5894    8.3%
Number Forms    7    < 0.1%

Most frequent character per block

Hangul
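Value    Count    Frequency (%)
(not preserved)    2224    3.4%
(not preserved)    2107    3.2%
(not preserved)    1886    2.9%
(not preserved)    1877    2.9%
(not preserved)    1791    2.7%
(not preserved)    1742    2.7%
(not preserved)    1548    2.4%
(not preserved)    1503    2.3%
(not preserved)    1411    2.2%
(not preserved)    1361    2.1%
Other values (375)    47917    73.3%
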
ASCII
Value    Count    Frequency (%)
1    1208    20.5%
2    1106    18.8%
(space)    548    9.3%
3    478    8.1%
4    261    4.4%
5    236    4.0%
e    179    3.0%
6    153    2.6%
,    132    2.2%
-    130    2.2%
Other values (35)    1463    24.8%

Number Forms
Value    Count    Frequency (%)
(not preserved)    7    100.0%

아파트코드
Text

Distinct: 2099
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 83
Unique (%): 0.8%

Sample

1st row: A13122002
2nd row: A12119006
3rd row: A13510102
4th row: A13880204
5th row: A13185607

Value    Count    Frequency (%)
a13184610    12    0.1%
a12013003    12    0.1%
a15679109    11    0.1%
a15886507    11    0.1%
a13380104    11    0.1%
a15770801    11    0.1%
a13285503    11    0.1%
a15010502    11    0.1%
a13204510    11    0.1%
a15284603    11    0.1%
Other values (2089)    9888    98.9%

Most occurring characters

Value    Count    Frequency (%)
0    18280    20.3%
1    17698    19.7%
A    9995    11.1%
3    8957    10.0%
2    8074    9.0%
5    6172    6.9%
8    5790    6.4%
7    4813    5.3%
4    3861    4.3%
6    3382    3.8%
Other values (2)    2978    3.3%

Most occurring categories

Value    Count    Frequency (%)
Decimal Number    80000    88.9%
Uppercase Letter    10000    11.1%

Most frequent character per category

Decimal Number
Value    Count    Frequency (%)
0    18280    22.9%
1    17698    22.1%
3    8957    11.2%
2    8074    10.1%
5    6172    7.7%
8    5790    7.2%
7    4813    6.0%
4    3861    4.8%
6    3382    4.2%
9    2973    3.7%

Uppercase Letter
Value    Count    Frequency (%)
A    9995    > 99.9%
B    5    < 0.1%

Most occurring scripts

Value    Count    Frequency (%)
Common    80000    88.9%
Latin    10000    11.1%

Most frequent character per script

Common
Value    Count    Frequency (%)
0    18280    22.9%
1    17698    22.1%
3    8957    11.2%
2    8074    10.1%
5    6172    7.7%
8    5790    7.2%
7    4813    6.0%
4    3861    4.8%
6    3382    4.2%
9    2973    3.7%

Latin
Value    Count    Frequency (%)
A    9995    > 99.9%
B    5    < 0.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    90000    100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
0    18280    20.3%
1    17698    19.7%
A    9995    11.1%
3    8957    10.0%
2    8074    9.0%
5    6172    6.9%
8    5790    6.4%
7    4813    5.3%
4    3861    4.3%
6    3382    3.8%
Other values (2)    2978    3.3%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8465
Min length: 2

Characters and Unicode

Total characters: 48465
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

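Unique: 0
Unique (%): 0.0%

Sample

1st row: 잡수익 (miscellaneous income)
2nd row: 수선유지비 (repair and maintenance expense)
3rd row: 입주자대표회의운영비 (residents' representative council operating expense)
4th row: 사무용품비 (office supplies expense)
5th row: 상여 (bonus)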
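
Value    Count    Frequency (%)
소독비 (disinfection fee)    229    2.3%
사무용품비 (office supplies expense)    229    2.3%
퇴직급여 (retirement benefits)    228    2.3%
청소비 (cleaning fee)    224    2.2%
이자수익 (interest income)    221    2.2%
소모품비 (consumables expense)    221    2.2%
세대전기료 (household electricity charge)    218    2.2%
수선유지비 (repair and maintenance expense)    216    2.2%
잡수익 (miscellaneous income)    215    2.1%
교육비 (training expense)    215    2.1%
Other values (77)    7784    77.8%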

Most occurring characters

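Value    Count    Frequency (%)
(not preserved)    5416    11.2%
(not preserved)    3478    7.2%
(not preserved)    2060    4.3%
(not preserved)    2003    4.1%
(not preserved)    1750    3.6%
(not preserved)    1279    2.6%
(not preserved)    1065    2.2%
(not preserved)    816    1.7%
(not preserved)    802    1.7%
(not preserved)    773    1.6%
Other values (110)    29023    59.9%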

Most occurring categories

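Value    Count    Frequency (%)
Other Letter    48465    100.0%

Most frequent character per category

Other Letter
Value    Count    Frequency (%)
(not preserved)    5416    11.2%
(not preserved)    3478    7.2%
(not preserved)    2060    4.3%
(not preserved)    2003    4.1%
(not preserved)    1750    3.6%
(not preserved)    1279    2.6%
(not preserved)    1065    2.2%
(not preserved)    816    1.7%
(not preserved)    802    1.7%
(not preserved)    773    1.6%
Other values (110)    29023    59.9%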

Most occurring scripts

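Value    Count    Frequency (%)
Hangul    48465    100.0%

Most frequent character per script

Hangul
Value    Count    Frequency (%)
(not preserved)    5416    11.2%
(not preserved)    3478    7.2%
(not preserved)    2060    4.3%
(not preserved)    2003    4.1%
(not preserved)    1750    3.6%
(not preserved)    1279    2.6%
(not preserved)    1065    2.2%
(not preserved)    816    1.7%
(not preserved)    802    1.7%
(not preserved)    773    1.6%
Other values (110)    29023    59.9%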

Most occurring blocks

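Value    Count    Frequency (%)
Hangul    48465    100.0%

Most frequent character per block

Hangul
Value    Count    Frequency (%)
(not preserved)    5416    11.2%
(not preserved)    3478    7.2%
(not preserved)    2060    4.3%
(not preserved)    2003    4.1%
(not preserved)    1750    3.6%
(not preserved)    1279    2.6%
(not preserved)    1065    2.2%
(not preserved)    816    1.7%
(not preserved)    802    1.7%
(not preserved)    773    1.6%
Other values (110)    29023    59.9%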

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value    Count
201905    10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201905
2nd row: 201905
3rd row: 201905
4th row: 201905
5th row: 201905

Common Values

Value    Count    Frequency (%)
201905    10000    100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value    Count    Frequency (%)
201905    10000    100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 6997
Distinct (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2957268.1
Minimum: -5559860
Maximum: 4.1858527 × 10^8
Zeros: 1182
Zeros (%): 11.8%
Negative: 14
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -5559860
5-th percentile: 0
Q1: 78275
Median: 320000
Q3: 1407547.5
95-th percentile: 14015565
Maximum: 4.1858527 × 10^8
Range: 4.2414513 × 10^8
Interquartile range (IQR): 1329272.5

Descriptive statistics

Standard deviation: 11323770
Coefficient of variation (CV): 3.8291319
Kurtosis: 369.84126
Mean: 2957268.1
Median Absolute Deviation (MAD): 319260
Skewness: 14.891137
Sum: 2.9572681 × 10^10
Variance: 1.2822776 × 10^14
Monotonicity: Not monotonic
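
Several of the figures above are derived from the others. The sketch below recomputes them from the reported minimum, maximum, quartiles, mean, and standard deviation; the inputs are the rounded values printed in the report, so results agree only to the displayed precision. Variable names are illustrative.

```python
# Inputs copied from the report (rounded as displayed).
n = 10_000
mean = 2957268.1
std = 11323770.0
minimum, maximum = -5559860.0, 418585270.0
q1, q3 = 78275.0, 1407547.5

value_range = maximum - minimum   # Range = max - min
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
total = mean * n                  # Sum = mean * number of observations

print(f"Range:    {value_range:.7e}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.7e}")
print(f"Sum:      {total:.7e}")
```

A CV near 3.8 together with skewness near 14.9 indicates a heavily right-skewed amount distribution, consistent with a median of 320000 against a mean of about 2.96 million.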
Histogram with fixed size bins (bins=50)
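
To see what 50 fixed-size bins imply for this variable, the sketch below assumes equal-width bins spanning the reported minimum and maximum (the usual behaviour of a fixed-bin histogram, though the profiler's exact binning is not shown here). The `bin_index` helper is hypothetical.

```python
# Bounds and bin count taken from the report.
minimum, maximum, bins = -5559860, 418585270, 50

# Width of each equal-size bin over [minimum, maximum].
width = (maximum - minimum) / bins

def bin_index(x):
    """Return the 0-based bin that x falls into (hypothetical helper)."""
    if x == maximum:
        # The last bin is closed on the right so the maximum is included.
        return bins - 1
    return int((x - minimum) // width)

print(f"Bin width: {width}")
for label, value in [("zero", 0), ("median", 320000),
                     ("Q3", 1407547.5), ("95th percentile", 14015565),
                     ("maximum", maximum)]:
    print(f"{label}: bin {bin_index(value)}")
```

Under this assumption Q3 still lands in the first bin, so at least three quarters of the observations share a single bar and the remaining bins mostly show the long right tail.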

Common values
Value    Count    Frequency (%)
0    1182    11.8%
200000    97    1.0%
100000    66    0.7%
300000    57    0.6%
150000    53    0.5%
250000    40    0.4%
500000    39    0.4%
30000    33    0.3%
400000    32    0.3%
50000    31    0.3%
Other values (6987)    8370    83.7%

Minimum 10 values
Value    Count    Frequency (%)
-5559860    1    < 0.1%
-2039487    1    < 0.1%
-1298000    1    < 0.1%
-384260    1    < 0.1%
-48520    1    < 0.1%
-43670    2    < 0.1%
-21030    1    < 0.1%
-1035    1    < 0.1%
-70    1    < 0.1%
-30    1    < 0.1%

Maximum 10 values
Value    Count    Frequency (%)
418585270    1    < 0.1%
343049240    1    < 0.1%
279724884    1    < 0.1%
249849105    1    < 0.1%
199340387    1    < 0.1%
175540305    1    < 0.1%
168550584    1    < 0.1%
167452960    1    < 0.1%
162487455    1    < 0.1%
161978290    1    < 0.1%

Interactions


Correlations

          비용명    금액
비용명    1.000    0.285
금액      0.285    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index)    아파트명    아파트코드    비용명    년월일    금액
21359    상봉프레미어스엠코    A13122002    잡수익    201905    500
11381    서강쌍용예가    A12119006    수선유지비    201905    1874400
36088    청담현대3차    A13510102    입주자대표회의운영비    201905    250000
53088    가락대림아파트    A13880204    사무용품비    201905    20300
22973    신내건영2차아파트    A13185607    상여    201905    0
34378    성내코오롱    A13484102    입주자대표회의운영비    201905    200000
95319    은평뉴타운박석고개1단지    A41279910    기타운영수익    201905    2089000
27453    창동주공19단지    A13290107    소모품비    201905    428950
9169    DMC아이파크    A12013002    통신비    201905    55180
50681    오금대림    A13813008    퇴직급여    201905    1991900

(index)    아파트명    아파트코드    비용명    년월일    금액
37372    수서가람    A13523003    세대난방비    201905    7042560
56066    중계주공5단지    A13922114    수도광열비    201905    4960
64863    번동솔그린    A14206307    소독비    201905    230000
25265    창동건영캐스빌    A13204203    임대료수익    201905    0
45546    반포자이    A13704104    피복비    201905    0
15642    신사한신휴플러스    A12208103    세대수도료    201905    5458710
24714    북한산코오롱하늘채    A13203002    장기수선비    201905    3773970
52451    장미3차    A13872504    잡비용    201905    0
36630    세곡리엔파크3단지    A13519003    연차수당    201905    644200
33555    둔촌현대3차    A13470504    국민연금    201905    116160