Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
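
The figures above can be recomputed from the raw table. The following is a minimal sketch, assuming the data is held as a list of dicts with `None` marking a missing cell. The function name `dataset_statistics` is hypothetical, and the duplicate rule (every repeat of an earlier identical row counts once) is an assumption about how the report counts duplicates.

```python
from collections import Counter

COLUMNS = ["아파트명", "아파트코드", "비용명", "년월일", "금액"]

def dataset_statistics(rows, columns):
    """Summarise a table held as a list of dicts; None marks a missing cell."""
    n_rows = len(rows)
    n_cells = n_rows * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # Assumption: every repeat of an earlier, identical row is one duplicate.
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())
    return {
        "variables": len(columns),
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }

# Two rows taken from the report's sample table.
rows = [
    {"아파트명": "서초더샵포레", "아파트코드": "A13718001", "비용명": "부과차손",
     "년월일": "201904", "금액": 10693},
    {"아파트명": "성수현대", "아파트코드": "A13382502", "비용명": "임대료수익",
     "년월일": "201904", "금액": 0},
]
stats = dataset_statistics(rows, COLUMNS)

# Average record size is total memory divided by the number of observations.
average_record_bytes = 488.3 * 1024 / 10000
print(stats, round(average_record_bytes, 1))
```

The last line reproduces the reported average record size from the reported total (488.3 KiB over 10000 observations).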

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "201904" [Constant]
금액 (amount) has 1123 (11.2%) zeros [Zeros]
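
Both alerts are simple column checks. The sketch below shows one way to reproduce them. The helper names `constant_columns` and `zero_share` are hypothetical and are not part of the profiling library. The four rows are taken from the report's sample table.

```python
def constant_columns(rows, columns):
    """Return the columns whose values are all identical."""
    return [col for col in columns if len({row[col] for row in rows}) == 1]

def zero_share(values):
    """Return (count, percentage) of exact zeros in a numeric sequence."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, 100.0 * zeros / len(values)

rows = [
    {"년월일": "201904", "금액": 10693},
    {"년월일": "201904", "금액": 0},
    {"년월일": "201904", "금액": 395600},
    {"년월일": "201904", "금액": 2730300},
]
print(constant_columns(rows, ["년월일", "금액"]))
print(zero_share([row["금액"] for row in rows]))

# The reported share follows directly from the counts: 1123 of 10000 rows.
reported_zero_pct = round(100.0 * 1123 / 10000, 1)
```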

Reproduction

Analysis started: 2024-05-11 06:59:59.088401
Analysis finished: 2024-05-11 07:00:01.028654
Duration: 1.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2098
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1213
Min length: 2
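
Length statistics are computed over the character count of each value. The sketch below is an illustration on the five sample names from this report, not a reproduction of the full column. `length_statistics` is a hypothetical helper, and normalising to NFC before counting is an assumption.

```python
import statistics
import unicodedata

def length_statistics(values):
    """Character-length summary of a list of strings."""
    # Assumption: count code points after NFC normalisation, so each
    # precomposed Hangul syllable counts as one character.
    lengths = [len(unicodedata.normalize("NFC", v)) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }

names = ["서초더샵포레", "성수현대", "은평뉴타운구파발10단지1관리",
         "신내동성7차", "마포동원베네스트"]
stats = length_statistics(names)
print(stats)

# The reported mean is the reported total character count over the row count.
reported_mean = 71213 / 10000
```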

Characters and Unicode

Total characters: 71213
Distinct characters: 428
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
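
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one value. The long names in `CATEGORY_NAMES` cover only the eleven categories listed in this report, and `category_counts` is a hypothetical helper. Script and block lookups are not available in the standard library and are left out.

```python
import unicodedata
from collections import Counter

# General-category codes that appear in this report, with their long names.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "Zs": "Space Separator",
    "Po": "Other Punctuation", "Ps": "Open Punctuation",
    "Pe": "Close Punctuation", "Pd": "Dash Punctuation", "Sm": "Math Symbol",
}

def category_counts(text):
    """Count characters per Unicode general category."""
    text = unicodedata.normalize("NFC", text)
    codes = Counter(unicodedata.category(ch) for ch in text)
    # Unknown codes fall back to the two-letter code itself.
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

counts = category_counts("상계성림(미라보)")
print(counts)
```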

Unique

Unique: 115
Unique (%): 1.1%
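
"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. A minimal sketch of the difference, using a short made-up list of names that appear elsewhere in this report (the list itself is illustrative, not real data):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Illustrative input only.
names = ["래미안", "래미안", "성수현대", "왕십리", "왕십리", "신내"]
print(distinct_and_unique(names))
```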

Sample

1st row: 서초더샵포레
2nd row: 성수현대
3rd row: 은평뉴타운구파발10단지1관리
4th row: 신내동성7차
5th row: 마포동원베네스트

| Value | Count | Frequency (%) |
|---|---|---|
| 아파트 | 122 | 1.2% |
| 래미안 | 21 | 0.2% |
| 코오롱하늘채아파트 | 19 | 0.2% |
| 신동아파밀리에 | 18 | 0.2% |
| 왕십리 | 17 | 0.2% |
| 은평뉴타운상림마을6단지 | 16 | 0.2% |
| 신길우성2차 | 15 | 0.1% |
| 신내 | 14 | 0.1% |
| 입주자대표회의 | 13 | 0.1% |
| 우리유앤미 | 13 | 0.1% |
| Other values (2154) | 10245 | 97.5% |

Most occurring characters

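| Value | Count | Frequency (%) |
|---|---|---|
| | 2247 | 3.2% |
| | 2159 | 3.0% |
| | 1975 | 2.8% |
| | 1902 | 2.7% |
| | 1808 | 2.5% |
| | 1621 | 2.3% |
| | 1558 | 2.2% |
| | 1539 | 2.2% |
| | 1459 | 2.0% |
| | 1368 | 1.9% |
| Other values (418) | 53577 | 75.2% |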

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 65452 | 91.9% |
| Decimal Number | 3868 | 5.4% |
| Uppercase Letter | 598 | 0.8% |
| Space Separator | 556 | 0.8% |
| Lowercase Letter | 251 | 0.4% |
| Other Punctuation | 126 | 0.2% |
| Open Punctuation | 121 | 0.2% |
| Close Punctuation | 121 | 0.2% |
| Dash Punctuation | 109 | 0.2% |
| Letter Number | 6 | < 0.1% |

Most frequent character per category

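Other Letter
| Value | Count | Frequency (%) |
|---|---|---|
| | 2247 | 3.4% |
| | 2159 | 3.3% |
| | 1975 | 3.0% |
| | 1902 | 2.9% |
| | 1808 | 2.8% |
| | 1621 | 2.5% |
| | 1558 | 2.4% |
| | 1539 | 2.4% |
| | 1459 | 2.2% |
| | 1368 | 2.1% |
| Other values (372) | 47816 | 73.1% |

Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| S | 108 | 18.1% |
| K | 90 | 15.1% |
| C | 77 | 12.9% |
| L | 48 | 8.0% |
| E | 38 | 6.4% |
| M | 37 | 6.2% |
| D | 37 | 6.2% |
| G | 33 | 5.5% |
| I | 31 | 5.2% |
| H | 27 | 4.5% |
| Other values (7) | 72 | 12.0% |

Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 157 | 62.5% |
| i | 24 | 9.6% |
| l | 22 | 8.8% |
| v | 16 | 6.4% |
| w | 11 | 4.4% |
| s | 7 | 2.8% |
| k | 6 | 2.4% |
| c | 2 | 0.8% |
| a | 2 | 0.8% |
| g | 2 | 0.8% |

Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1180 | 30.5% |
| 2 | 1107 | 28.6% |
| 3 | 484 | 12.5% |
| 4 | 294 | 7.6% |
| 5 | 214 | 5.5% |
| 6 | 183 | 4.7% |
| 9 | 121 | 3.1% |
| 7 | 104 | 2.7% |
| 8 | 94 | 2.4% |
| 0 | 87 | 2.2% |

Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| , | 111 | 88.1% |
| . | 15 | 11.9% |

Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 556 | 100.0% |

Open Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ( | 121 | 100.0% |

Close Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ) | 121 | 100.0% |

Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 109 | 100.0% |

Letter Number
| Value | Count | Frequency (%) |
|---|---|---|
| | 6 | 100.0% |

Math Symbol
| Value | Count | Frequency (%) |
|---|---|---|
| ~ | 5 | 100.0% |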

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 65452 | 91.9% |
| Common | 4906 | 6.9% |
| Latin | 855 | 1.2% |

Most frequent character per script

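Hangul
| Value | Count | Frequency (%) |
|---|---|---|
| | 2247 | 3.4% |
| | 2159 | 3.3% |
| | 1975 | 3.0% |
| | 1902 | 2.9% |
| | 1808 | 2.8% |
| | 1621 | 2.5% |
| | 1558 | 2.4% |
| | 1539 | 2.4% |
| | 1459 | 2.2% |
| | 1368 | 2.1% |
| Other values (372) | 47816 | 73.1% |

Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 157 | 18.4% |
| S | 108 | 12.6% |
| K | 90 | 10.5% |
| C | 77 | 9.0% |
| L | 48 | 5.6% |
| E | 38 | 4.4% |
| M | 37 | 4.3% |
| D | 37 | 4.3% |
| G | 33 | 3.9% |
| I | 31 | 3.6% |
| Other values (19) | 199 | 23.3% |

Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1180 | 24.1% |
| 2 | 1107 | 22.6% |
| (space) | 556 | 11.3% |
| 3 | 484 | 9.9% |
| 4 | 294 | 6.0% |
| 5 | 214 | 4.4% |
| 6 | 183 | 3.7% |
| 9 | 121 | 2.5% |
| ( | 121 | 2.5% |
| ) | 121 | 2.5% |
| Other values (7) | 525 | 10.7% |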

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 65452 | 91.9% |
| ASCII | 5755 | 8.1% |
| Number Forms | 6 | < 0.1% |

Most frequent character per block

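Hangul
| Value | Count | Frequency (%) |
|---|---|---|
| | 2247 | 3.4% |
| | 2159 | 3.3% |
| | 1975 | 3.0% |
| | 1902 | 2.9% |
| | 1808 | 2.8% |
| | 1621 | 2.5% |
| | 1558 | 2.4% |
| | 1539 | 2.4% |
| | 1459 | 2.2% |
| | 1368 | 2.1% |
| Other values (372) | 47816 | 73.1% |

ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1180 | 20.5% |
| 2 | 1107 | 19.2% |
| (space) | 556 | 9.7% |
| 3 | 484 | 8.4% |
| 4 | 294 | 5.1% |
| 5 | 214 | 3.7% |
| 6 | 183 | 3.2% |
| e | 157 | 2.7% |
| 9 | 121 | 2.1% |
| ( | 121 | 2.1% |
| Other values (35) | 1338 | 23.2% |

Number Forms
| Value | Count | Frequency (%) |
|---|---|---|
| | 6 | 100.0% |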

아파트코드 (apartment code)
Text

Distinct: 2103
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 115
Unique (%): 1.1%

Sample

1st row: A13718001
2nd row: A13382502
3rd row: A41279927
4th row: A13113001
5th row: A12170401
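
Every value in this column is nine characters long, and the sample rows all consist of one uppercase letter followed by eight digits. A format check along those lines could look like the sketch below. The pattern is an assumption inferred from the samples and the character counts in this report (only `A` and `B` occur as letters), not a documented specification of the code format.

```python
import re

# Assumed format: one leading A or B followed by exactly eight ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")

def is_valid_code(code):
    """Return True when the whole string matches the assumed code format."""
    return CODE_PATTERN.fullmatch(code) is not None

for sample in ["A13718001", "A41279927", "a15086007", "A1371800"]:
    print(sample, is_valid_code(sample))
```

The lowercase form `a15086007` fails the check. That spelling appears in the report's word-frequency table for this column, while the sample rows are uppercase.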

| Value | Count | Frequency (%) |
|---|---|---|
| a15086007 | 15 | 0.1% |
| a14086001 | 12 | 0.1% |
| a12174601 | 12 | 0.1% |
| a13983815 | 12 | 0.1% |
| a15601202 | 12 | 0.1% |
| a15083701 | 12 | 0.1% |
| a12201301 | 12 | 0.1% |
| a12127003 | 12 | 0.1% |
| a15205405 | 11 | 0.1% |
| a15807706 | 11 | 0.1% |
| Other values (2093) | 9879 | 98.8% |

Most occurring characters

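| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18297 | 20.3% |
| 1 | 17632 | 19.6% |
| A | 9994 | 11.1% |
| 3 | 8803 | 9.8% |
| 2 | 8069 | 9.0% |
| 5 | 6311 | 7.0% |
| 8 | 5775 | 6.4% |
| 7 | 4867 | 5.4% |
| 4 | 3699 | 4.1% |
| 6 | 3553 | 3.9% |
| Other values (2) | 3000 | 3.3% |

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 80000 | 88.9% |
| Uppercase Letter | 10000 | 11.1% |

Most frequent character per category

Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18297 | 22.9% |
| 1 | 17632 | 22.0% |
| 3 | 8803 | 11.0% |
| 2 | 8069 | 10.1% |
| 5 | 6311 | 7.9% |
| 8 | 5775 | 7.2% |
| 7 | 4867 | 6.1% |
| 4 | 3699 | 4.6% |
| 6 | 3553 | 4.4% |
| 9 | 2994 | 3.7% |

Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| A | 9994 | 99.9% |
| B | 6 | 0.1% |

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Common | 80000 | 88.9% |
| Latin | 10000 | 11.1% |

Most frequent character per script

Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18297 | 22.9% |
| 1 | 17632 | 22.0% |
| 3 | 8803 | 11.0% |
| 2 | 8069 | 10.1% |
| 5 | 6311 | 7.9% |
| 8 | 5775 | 7.2% |
| 7 | 4867 | 6.1% |
| 4 | 3699 | 4.6% |
| 6 | 3553 | 4.4% |
| 9 | 2994 | 3.7% |

Latin
| Value | Count | Frequency (%) |
|---|---|---|
| A | 9994 | 99.9% |
| B | 6 | 0.1% |

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 90000 | 100.0% |

Most frequent character per block

ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18297 | 20.3% |
| 1 | 17632 | 19.6% |
| A | 9994 | 11.1% |
| 3 | 8803 | 9.8% |
| 2 | 8069 | 9.0% |
| 5 | 6311 | 7.0% |
| 8 | 5775 | 6.4% |
| 7 | 4867 | 5.4% |
| 4 | 3699 | 4.1% |
| 6 | 3553 | 3.9% |
| Other values (2) | 3000 | 3.3% |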

비용명 (expense item name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8446
Min length: 2

Characters and Unicode

Total characters: 48446
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 부과차손
2nd row: 임대료수익
3rd row: 연차수당
4th row: 장기수선비
5th row: 고용안정사업비용

| Value | Count | Frequency (%) |
|---|---|---|
| 교육비 | 231 | 2.3% |
| 통신비 | 230 | 2.3% |
| 세대전기료 | 227 | 2.3% |
| 퇴직급여 | 223 | 2.2% |
| 소독비 | 221 | 2.2% |
| 장기수선비 | 213 | 2.1% |
| 청소비 | 211 | 2.1% |
| 소모품비 | 210 | 2.1% |
| 경비비 | 209 | 2.1% |
| 이자수익 | 207 | 2.1% |
| Other values (76) | 7818 | 78.2% |

Most occurring characters

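| Value | Count | Frequency (%) |
|---|---|---|
| | 5431 | 11.2% |
| | 3617 | 7.5% |
| | 2117 | 4.4% |
| | 1967 | 4.1% |
| | 1703 | 3.5% |
| | 1361 | 2.8% |
| | 1050 | 2.2% |
| | 851 | 1.8% |
| | 798 | 1.6% |
| | 757 | 1.6% |
| Other values (110) | 28794 | 59.4% |

Most occurring categories

| Value | Count | Frequency (%) |
|---|---|---|
| Other Letter | 48446 | 100.0% |

Most frequent character per category

Other Letter
| Value | Count | Frequency (%) |
|---|---|---|
| | 5431 | 11.2% |
| | 3617 | 7.5% |
| | 2117 | 4.4% |
| | 1967 | 4.1% |
| | 1703 | 3.5% |
| | 1361 | 2.8% |
| | 1050 | 2.2% |
| | 851 | 1.8% |
| | 798 | 1.6% |
| | 757 | 1.6% |
| Other values (110) | 28794 | 59.4% |

Most occurring scripts

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 48446 | 100.0% |

Most frequent character per script

Hangul
| Value | Count | Frequency (%) |
|---|---|---|
| | 5431 | 11.2% |
| | 3617 | 7.5% |
| | 2117 | 4.4% |
| | 1967 | 4.1% |
| | 1703 | 3.5% |
| | 1361 | 2.8% |
| | 1050 | 2.2% |
| | 851 | 1.8% |
| | 798 | 1.6% |
| | 757 | 1.6% |
| Other values (110) | 28794 | 59.4% |

Most occurring blocks

| Value | Count | Frequency (%) |
|---|---|---|
| Hangul | 48446 | 100.0% |

Most frequent character per block

Hangul
| Value | Count | Frequency (%) |
|---|---|---|
| | 5431 | 11.2% |
| | 3617 | 7.5% |
| | 2117 | 4.4% |
| | 1967 | 4.1% |
| | 1703 | 3.5% |
| | 1361 | 2.8% |
| | 1050 | 2.2% |
| | 851 | 1.8% |
| | 798 | 1.6% |
| | 757 | 1.6% |
| Other values (110) | 28794 | 59.4% |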

년월일 (date)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201904
2nd row: 201904
3rd row: 201904
4th row: 201904
5th row: 201904
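
The six-digit value looks like a year followed by a month. Assuming that `YYYYMM` reading is right (the report itself does not state the format), the column can be parsed with the standard library. `parse_year_month` is a hypothetical helper name.

```python
from datetime import datetime

def parse_year_month(value):
    """Parse a 'YYYYMM' string; the day defaults to the first of the month."""
    return datetime.strptime(value, "%Y%m")

period = parse_year_month("201904")
print(period.year, period.month)
```

Because the column holds a single constant value, parsing it carries no analytical signal on its own; it mainly matters when this file is combined with extracts for other months.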

Common Values

| Value | Count | Frequency (%) |
|---|---|---|
| 201904 | 10000 | 100.0% |

Length

Histogram of lengths of the category

Common Values (Plot)

| Value | Count | Frequency (%) |
|---|---|---|
| 201904 | 10000 | 100.0% |

금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7059
Distinct (%): 70.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3259841
Minimum: -2341030
Maximum: 4.2105318 × 10^8
Zeros: 1123
Zeros (%): 11.2%
Negative: 13
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2341030
5-th percentile: 0
Q1: 85652.5
median: 329580
Q3: 1518840
95-th percentile: 15611872
Maximum: 4.2105318 × 10^8
Range: 4.2339421 × 10^8
Interquartile range (IQR): 1433187.5
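
The two derived rows follow directly from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A short check using the values in this report (the exact maximum, 421053180, is taken from the largest value listed for this variable):

```python
minimum = -2341030
maximum = 421053180
q1 = 85652.5
q3 = 1518840

value_range = maximum - minimum   # maximum minus minimum
iqr = q3 - q1                     # spread of the middle 50% of values

print(value_range, iqr)
```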

Descriptive statistics

Standard deviation: 12142479
Coefficient of variation (CV): 3.7248685
Kurtosis: 277.52887
Mean: 3259841
Median Absolute Deviation (MAD): 328054
Skewness: 13.0065
Sum: 3.259841 × 10^10
Variance: 1.474398 × 10^14
Monotonicity: Not monotonic
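
Several of these statistics are tied together by simple identities: CV is the standard deviation divided by the mean, the sum is the mean times the number of observations, and the variance is the standard deviation squared. The sketch below checks the reported figures against each other. Small differences are expected because the report shows rounded values.

```python
n = 10000
mean = 3259841
std = 12142479

cv = std / mean        # coefficient of variation
total = mean * n       # sum of all values
variance = std ** 2    # variance from the (rounded) standard deviation

print(round(cv, 7), total, variance)
```

A CV well above 1, together with the large skewness, indicates a heavily right-skewed column in which a few very large amounts dominate the mean.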
Histogram with fixed size bins (bins=50)
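
A fixed-size-bin histogram splits the span between the minimum and the maximum into equally wide intervals. The sketch below is one plain-Python way to do that; `fixed_width_histogram` is a hypothetical helper, and the exact binning rule used by the plotting library behind the report is an assumption. The ten amounts come from the report's sample table.

```python
def fixed_width_histogram(values, bins=50):
    """Return (bin width, counts per bin) for equally wide bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lands on the last edge, so clamp it into the final bin.
        idx = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    return width, counts

amounts = [10693, 0, 395600, 2730300, 1370000,
           76802410, 110380, 527910, 554400, 210000]
width, counts = fixed_width_histogram(amounts, bins=5)
print(width, counts)

# With the report's minimum and maximum, 50 bins are each this wide:
report_bin_width = (421053180 - (-2341030)) / 50
```

With such a skewed column, nearly all observations fall into the first bin, which is why a log scale or clipping is often used when plotting amounts like these.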
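
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1123 | 11.2% |
| 200000 | 99 | 1.0% |
| 100000 | 69 | 0.7% |
| 150000 | 60 | 0.6% |
| 300000 | 51 | 0.5% |
| 500000 | 45 | 0.4% |
| 30000 | 44 | 0.4% |
| 110000 | 31 | 0.3% |
| 250000 | 30 | 0.3% |
| 50000 | 30 | 0.3% |
| Other values (7049) | 8418 | 84.2% |

Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -2341030 | 1 | < 0.1% |
| -1725000 | 1 | < 0.1% |
| -1059910 | 1 | < 0.1% |
| -1041315 | 1 | < 0.1% |
| -624300 | 1 | < 0.1% |
| -164980 | 1 | < 0.1% |
| -95450 | 1 | < 0.1% |
| -58224 | 1 | < 0.1% |
| -38000 | 1 | < 0.1% |
| -7030 | 1 | < 0.1% |

Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 421053180 | 1 | < 0.1% |
| 278984274 | 1 | < 0.1% |
| 258817380 | 1 | < 0.1% |
| 247531574 | 1 | < 0.1% |
| 243841075 | 1 | < 0.1% |
| 243441075 | 1 | < 0.1% |
| 226143722 | 1 | < 0.1% |
| 207617660 | 1 | < 0.1% |
| 162487455 | 1 | < 0.1% |
| 154683140 | 1 | < 0.1% |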

Interactions

[Figure: interactions plot]

Correlations

| | 비용명 | 금액 |
|---|---|---|
| 비용명 | 1.000 | 0.333 |
| 금액 | 0.333 | 1.000 |
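
The report does not state which coefficient produced the 0.333 between 비용명 and 금액. One common association measure for pairs that involve a categorical variable is Cramér's V, computed from the chi-squared statistic of the contingency table. The sketch below implements the plain, uncorrected version as an illustration only. Treating it as the measure behind this table is an assumption, and a numeric column such as 금액 would first have to be discretised into bins.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    chi2 = 0.0
    for x, rx in row_totals.items():
        for y, cy in col_totals.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    # With a single category on either side the measure is undefined; return 0.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Illustrative inputs: perfectly associated, then independent.
print(cramers_v(["a", "a", "b", "b"], ["low", "low", "high", "high"]))
print(cramers_v(["a", "a", "b", "b"], ["low", "high", "low", "high"]))
```

The value ranges from 0 (no association) to 1 (each category determines the other), so 0.333 would indicate a moderate association under this reading.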

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
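
Both views are derived from the same per-cell test of whether a value is present. A minimal sketch, assuming rows are dicts and `None` marks a missing cell; the helper names and the third, incomplete row are hypothetical and exist only to show a non-zero count.

```python
def nullity_by_column(rows, columns):
    """Count missing cells per column."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

def nullity_matrix(rows, columns):
    """One row of booleans per record: True where a value is present."""
    return [[row.get(col) is not None for col in columns] for row in rows]

columns = ["아파트명", "금액"]
rows = [
    {"아파트명": "서초더샵포레", "금액": 10693},
    {"아파트명": "성수현대", "금액": 0},
    {"아파트명": "행당한진타운", "금액": None},  # hypothetical gap for illustration
]
print(nullity_by_column(rows, columns))
print(nullity_matrix(rows, columns))
```

Note that a zero amount counts as present. In this dataset every column reports 0 missing cells, so both plots would be completely filled.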

Sample

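| | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 45528 | 서초더샵포레 | A13718001 | 부과차손 | 201904 | 10693 |
| 30290 | 성수현대 | A13382502 | 임대료수익 | 201904 | 0 |
| 94505 | 은평뉴타운구파발10단지1관리 | A41279927 | 연차수당 | 201904 | 395600 |
| 20242 | 신내동성7차 | A13113001 | 장기수선비 | 201904 | 2730300 |
| 11733 | 마포동원베네스트 | A12170401 | 고용안정사업비용 | 201904 | 1370000 |
| 29253 | 행당한진타운 | A13377703 | 세대전기료 | 201904 | 76802410 |
| 84647 | 등촌주공2단지 | A15703304 | 감가상각비 | 201904 | 110380 |
| 92241 | 신정푸른마을1단지임대 | A15879501 | 잡비용 | 201904 | 527910 |
| 77200 | 고척동아한신 | A15283706 | 승강기유지비 | 201904 | 554400 |
| 13866 | 래미안용강아파트 | A12187602 | 재활용품비용 | 201904 | 210000 |

| | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액 |
|---|---|---|---|---|---|
| 18523 | 전농동아임대 | A13071301 | 교육비 | 201904 | 32000 |
| 3073 | 금천롯데캐슬골드파크1차아파트 | A10027188 | 감가상각비 | 201904 | 678620 |
| 67132 | 자양한양 | A14387605 | 청소비 | 201904 | 9513500 |
| 23257 | 신내새한아파트 | A13187406 | 퇴직급여 | 201904 | 1543560 |
| 39732 | 역삼래미안 | A13592706 | 이자수익 | 201904 | 0 |
| 86954 | 등촌우성101동 | A15772901 | 제수당 | 201904 | 1465000 |
| 31028 | 강변건영 | A13392307 | 고용안정사업수익 | 201904 | 1421930 |
| 57548 | 상계성림(미라보) | A13980903 | 소독비 | 201904 | 165000 |
| 43514 | 삼선푸르지오아파트 | A13672101 | 위탁관리수수료 | 201904 | 696100 |
| 24050 | 방학삼익세라믹 | A13202308 | 재활용품수익 | 201904 | 216364 |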