Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
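
The average record size can be cross-checked from the other figures in this table: it is the total memory divided by the number of observations. A minimal sketch using only the values reported above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 488.3        # total size in memory, KiB
n_rows = 10_000          # number of observations
n_cols = 5               # number of variables
missing_cells = 0

# 1 KiB = 1024 bytes
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_rows

# Share of missing cells over all cells (rows x columns)
missing_pct = 100 * missing_cells / (n_rows * n_cols)

print(f"{avg_record_bytes:.1f} B per record")
print(f"{missing_pct:.1f}% missing cells")
```

Rounded to one decimal place, the result agrees with the 50.0 B shown in the table.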

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 (date) has constant value "201903" (Constant)
금액 (amount) has 701 (7.0%) zeros (Zeros)
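
The two alerts above can be reproduced with simple counting. The following is a minimal sketch of that idea, not ydata-profiling's own implementation (its thresholds and alert types are configurable); `column_alerts` is a hypothetical helper introduced only for illustration:

```python
from collections import Counter

def column_alerts(values):
    """Return simple alert strings for one column (hypothetical helper)."""
    alerts = []
    counts = Counter(values)
    # A column with a single distinct value carries no information.
    if len(counts) == 1:
        alerts.append("Constant")
    # Count numeric zeros and report their share of all rows.
    zeros = sum(1 for v in values if isinstance(v, (int, float)) and v == 0)
    if zeros > 0:
        alerts.append(f"Zeros: {zeros} ({100 * zeros / len(values):.1f}%)")
    return alerts

print(column_alerts(["201903"] * 4))
print(column_alerts([0, 5, 0, 7]))
```

Applied to the full 금액 column, the same share calculation gives 701 / 10000 = 7.0%, as reported.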

Reproduction

Analysis started: 2024-05-11 07:00:10.512689
Analysis finished: 2024-05-11 07:00:13.083028
Duration: 2.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
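
The reported duration is the difference between the two timestamps. A short sketch that recomputes it from the values above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2024-05-11 07:00:10.512689")
finished = datetime.fromisoformat("2024-05-11 07:00:13.083028")

# Elapsed wall-clock time in seconds
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```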

Variables

아파트명 (apartment name)
Text

Distinct: 2103
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1466
Min length: 2

Characters and Unicode

Total characters: 71466
Distinct characters: 429
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
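
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point (for example `Lo` for Other Letter, `Nd` for Decimal Number, `Zs` for Space Separator). A minimal sketch; `category_counts` is an illustrative helper, and the standard library does not provide script or block lookups:

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample values from this variable: nine Hangul syllables and one digit.
counts = category_counts("상암월드컵파크7단지")
print(counts)
```

Hangul syllables fall under Other Letter, which is why that category dominates the tables for this variable.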

Unique

Unique: 107
Unique (%): 1.1%
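
Distinct and Unique measure different things: Distinct is the number of different values, while Unique, as it is understood here, is the number of values that occur exactly once. A minimal sketch of that distinction (`distinct_and_unique` is a hypothetical helper, not a library function):

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# "a" repeats, so it is distinct but not unique.
print(distinct_and_unique(["a", "a", "b", "c"]))
```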

Sample

1st row: 천호우성
2nd row: 마포펜트라우스
3rd row: 상암월드컵파크7단지
4th row: 목동6단지
5th row: 마포강변힐스테이트
Value | Count | Frequency (%)
아파트 | 117 | 1.1%
래미안 | 27 | 0.3%
코오롱하늘채아파트 | 18 | 0.2%
신동아파밀리에 | 17 | 0.2%
힐스테이트 | 15 | 0.1%
입주자대표회의 | 14 | 0.1%
서초포레스타2단지아파트 | 14 | 0.1%
번동기산그린 | 14 | 0.1%
브라운스톤 | 12 | 0.1%
묵동브라운스톤태릉 | 12 | 0.1%
Other values (2159) | 10270 | 97.5%

Most occurring characters

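Value | Count | Frequency (%)
[missing] | 2207 | 3.1%
[missing] | 2157 | 3.0%
[missing] | 1935 | 2.7%
[missing] | 1907 | 2.7%
[missing] | 1796 | 2.5%
[missing] | 1698 | 2.4%
[missing] | 1581 | 2.2%
[missing] | 1525 | 2.1%
[missing] | 1440 | 2.0%
[missing] | 1387 | 1.9%
Other values (419) | 53833 | 75.3%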

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65533 | 91.7%
Decimal Number | 3878 | 5.4%
Uppercase Letter | 638 | 0.9%
Space Separator | 575 | 0.8%
Lowercase Letter | 347 | 0.5%
Dash Punctuation | 132 | 0.2%
Other Punctuation | 120 | 0.2%
Open Punctuation | 117 | 0.2%
Close Punctuation | 117 | 0.2%
Math Symbol | 5 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
[missing] | 2207 | 3.4%
[missing] | 2157 | 3.3%
[missing] | 1935 | 3.0%
[missing] | 1907 | 2.9%
[missing] | 1796 | 2.7%
[missing] | 1698 | 2.6%
[missing] | 1581 | 2.4%
[missing] | 1525 | 2.3%
[missing] | 1440 | 2.2%
[missing] | 1387 | 2.1%
Other values (373) | 47900 | 73.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 101 | 15.8%
C | 90 | 14.1%
K | 79 | 12.4%
M | 50 | 7.8%
D | 50 | 7.8%
L | 46 | 7.2%
H | 43 | 6.7%
I | 35 | 5.5%
G | 33 | 5.2%
A | 29 | 4.5%
Other values (7) | 82 | 12.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 160 | 46.1%
l | 52 | 15.0%
i | 43 | 12.4%
v | 34 | 9.8%
s | 15 | 4.3%
w | 12 | 3.5%
k | 10 | 2.9%
h | 7 | 2.0%
a | 5 | 1.4%
g | 5 | 1.4%

Decimal Number
Value | Count | Frequency (%)
1 | 1194 | 30.8%
2 | 1152 | 29.7%
3 | 479 | 12.4%
4 | 262 | 6.8%
5 | 229 | 5.9%
6 | 157 | 4.0%
7 | 109 | 2.8%
9 | 104 | 2.7%
8 | 97 | 2.5%
0 | 95 | 2.4%

Other Punctuation
Value | Count | Frequency (%)
, | 102 | 85.0%
. | 18 | 15.0%

Space Separator
Value | Count | Frequency (%)
(space) | 575 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 132 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 117 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 117 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 5 | 100.0%

Letter Number
Value | Count | Frequency (%)
[missing] | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65533 | 91.7%
Common | 4944 | 6.9%
Latin | 989 | 1.4%

Most frequent character per script

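Hangul
Value | Count | Frequency (%)
[missing] | 2207 | 3.4%
[missing] | 2157 | 3.3%
[missing] | 1935 | 3.0%
[missing] | 1907 | 2.9%
[missing] | 1796 | 2.7%
[missing] | 1698 | 2.6%
[missing] | 1581 | 2.4%
[missing] | 1525 | 2.3%
[missing] | 1440 | 2.2%
[missing] | 1387 | 2.1%
Other values (373) | 47900 | 73.1%

Latin
Value | Count | Frequency (%)
e | 160 | 16.2%
S | 101 | 10.2%
C | 90 | 9.1%
K | 79 | 8.0%
l | 52 | 5.3%
M | 50 | 5.1%
D | 50 | 5.1%
L | 46 | 4.7%
i | 43 | 4.3%
H | 43 | 4.3%
Other values (19) | 275 | 27.8%

Common
Value | Count | Frequency (%)
1 | 1194 | 24.2%
2 | 1152 | 23.3%
(space) | 575 | 11.6%
3 | 479 | 9.7%
4 | 262 | 5.3%
5 | 229 | 4.6%
6 | 157 | 3.2%
- | 132 | 2.7%
( | 117 | 2.4%
) | 117 | 2.4%
Other values (7) | 530 | 10.7%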

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65533 | 91.7%
ASCII | 5929 | 8.3%
Number Forms | 4 | < 0.1%

Most frequent character per block

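Hangul
Value | Count | Frequency (%)
[missing] | 2207 | 3.4%
[missing] | 2157 | 3.3%
[missing] | 1935 | 3.0%
[missing] | 1907 | 2.9%
[missing] | 1796 | 2.7%
[missing] | 1698 | 2.6%
[missing] | 1581 | 2.4%
[missing] | 1525 | 2.3%
[missing] | 1440 | 2.2%
[missing] | 1387 | 2.1%
Other values (373) | 47900 | 73.1%

ASCII
Value | Count | Frequency (%)
1 | 1194 | 20.1%
2 | 1152 | 19.4%
(space) | 575 | 9.7%
3 | 479 | 8.1%
4 | 262 | 4.4%
5 | 229 | 3.9%
e | 160 | 2.7%
6 | 157 | 2.6%
- | 132 | 2.2%
( | 117 | 2.0%
Other values (35) | 1472 | 24.8%

Number Forms
Value | Count | Frequency (%)
[missing] | 4 | 100.0%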

아파트코드 (apartment code)
Text

Distinct: 2109
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107
Unique (%): 1.1%

Sample

1st row: A13402103
2nd row: A12179004
3rd row: A12127005
4th row: A15875103
5th row: A12112002
Value | Count | Frequency (%)
a14206305 | 14 | 0.1%
a10028021 | 14 | 0.1%
a13185508 | 12 | 0.1%
a13790703 | 12 | 0.1%
a13611005 | 11 | 0.1%
a15703304 | 11 | 0.1%
a15009402 | 11 | 0.1%
a12281701 | 11 | 0.1%
a13872504 | 11 | 0.1%
a13613005 | 11 | 0.1%
Other values (2099) | 9882 | 98.8%
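
The codes in this table are lower case (a14206305) while the sample rows are upper case (A13402103), which suggests the table counts lowercased, whitespace-separated tokens rather than raw cell values. The sketch below shows that kind of tokenisation; it is an assumption about how the table was built, and `word_counts` is a hypothetical helper rather than the profiler's own function:

```python
from collections import Counter

def word_counts(values):
    """Count lowercased, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts

# Illustrative input built from values that appear in this report.
demo = word_counts(["A13402103", "A13402103", "우성캐릭터199 아파트"])
print(demo.most_common(1))
```

Under this scheme a value containing a space contributes more than one token, which would also explain why the word table for 아파트명 sums to more than 10000 entries.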

Most occurring characters

Value | Count | Frequency (%)
0 | 18412 | 20.5%
1 | 17479 | 19.4%
A | 9994 | 11.1%
3 | 9038 | 10.0%
2 | 7921 | 8.8%
5 | 6193 | 6.9%
8 | 5849 | 6.5%
7 | 4888 | 5.4%
4 | 3877 | 4.3%
6 | 3351 | 3.7%
Other values (2) | 2998 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18412 | 23.0%
1 | 17479 | 21.8%
3 | 9038 | 11.3%
2 | 7921 | 9.9%
5 | 6193 | 7.7%
8 | 5849 | 7.3%
7 | 4888 | 6.1%
4 | 3877 | 4.8%
6 | 3351 | 4.2%
9 | 2992 | 3.7%

Uppercase Letter
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18412 | 23.0%
1 | 17479 | 21.8%
3 | 9038 | 11.3%
2 | 7921 | 9.9%
5 | 6193 | 7.7%
8 | 5849 | 7.3%
7 | 4888 | 6.1%
4 | 3877 | 4.8%
6 | 3351 | 4.2%
9 | 2992 | 3.7%

Latin
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18412 | 20.5%
1 | 17479 | 19.4%
A | 9994 | 11.1%
3 | 9038 | 10.0%
2 | 7921 | 8.8%
5 | 6193 | 6.9%
8 | 5849 | 6.5%
7 | 4888 | 5.4%
4 | 3877 | 4.3%
6 | 3351 | 3.7%
Other values (2) | 2998 | 3.3%

비용명 (expense name)
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.809
Min length: 2

Characters and Unicode

Total characters: 48090
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

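1st row: 소독비 (disinfection fee)
2nd row: 기타부대비 (other incidental expenses)
3rd row: 수도광열비 (water, lighting and heating expenses)
4th row: 광고료수익 (advertising income)
5th row: 급여 (salaries)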
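Value | Count | Frequency (%)
급여 (salaries) | 248 | 2.5%
통신비 (communication expenses) | 235 | 2.4%
경비비 (security service fee) | 231 | 2.3%
세대전기료 (household electricity charges) | 225 | 2.2%
사무용품비 (office supplies) | 224 | 2.2%
청소비 (cleaning fee) | 224 | 2.2%
연체료수익 (late fee income) | 223 | 2.2%
소독비 (disinfection fee) | 223 | 2.2%
승강기유지비 (elevator maintenance fee) | 222 | 2.2%
교육비 (training expenses) | 220 | 2.2%
Other values (77) | 7725 | 77.2%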

Most occurring characters

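Value | Count | Frequency (%)
[missing] | 5371 | 11.2%
[missing] | 3674 | 7.6%
[missing] | 2186 | 4.5%
[missing] | 2012 | 4.2%
[missing] | 1703 | 3.5%
[missing] | 1326 | 2.8%
[missing] | 1102 | 2.3%
[missing] | 827 | 1.7%
[missing] | 795 | 1.7%
[missing] | 775 | 1.6%
Other values (110) | 28319 | 58.9%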

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48090 | 100.0%

Most frequent character per category

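Other Letter
Value | Count | Frequency (%)
[missing] | 5371 | 11.2%
[missing] | 3674 | 7.6%
[missing] | 2186 | 4.5%
[missing] | 2012 | 4.2%
[missing] | 1703 | 3.5%
[missing] | 1326 | 2.8%
[missing] | 1102 | 2.3%
[missing] | 827 | 1.7%
[missing] | 795 | 1.7%
[missing] | 775 | 1.6%
Other values (110) | 28319 | 58.9%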

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48090 | 100.0%

Most frequent character per script

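Hangul
Value | Count | Frequency (%)
[missing] | 5371 | 11.2%
[missing] | 3674 | 7.6%
[missing] | 2186 | 4.5%
[missing] | 2012 | 4.2%
[missing] | 1703 | 3.5%
[missing] | 1326 | 2.8%
[missing] | 1102 | 2.3%
[missing] | 827 | 1.7%
[missing] | 795 | 1.7%
[missing] | 775 | 1.6%
Other values (110) | 28319 | 58.9%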

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48090 | 100.0%

Most frequent character per block

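Hangul
Value | Count | Frequency (%)
[missing] | 5371 | 11.2%
[missing] | 3674 | 7.6%
[missing] | 2186 | 4.5%
[missing] | 2012 | 4.2%
[missing] | 1703 | 3.5%
[missing] | 1326 | 2.8%
[missing] | 1102 | 2.3%
[missing] | 827 | 1.7%
[missing] | 795 | 1.7%
[missing] | 775 | 1.6%
Other values (110) | 28319 | 58.9%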

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
201903: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201903
2nd row: 201903
3rd row: 201903
4th row: 201903
5th row: 201903

Common Values

Value | Count | Frequency (%)
201903 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201903 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7371
Distinct (%): 73.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3227028.7
Minimum: -4463302
Maximum: 3.2755867 × 10^8
Zeros: 701
Zeros (%): 7.0%
Negative: 12
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -4463302
5-th percentile: 0
Q1: 96355
Median: 357470
Q3: 1650000
95-th percentile: 15559099
Maximum: 3.2755867 × 10^8
Range: 3.3202197 × 10^8
Interquartile range (IQR): 1553645
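
The range and interquartile range follow directly from the other quantiles: range = maximum − minimum and IQR = Q3 − Q1. A short sketch that recomputes both from the reported values (the exact maximum, 327558671, is taken from the list of largest values later in this report):

```python
# Quantiles copied from the report.
minimum = -4_463_302
q1 = 96_355
q3 = 1_650_000
maximum = 327_558_671  # exact value behind "3.2755867 × 10^8"

# Spread of the middle 50% of values
iqr = q3 - q1
# Full spread of the data
value_range = maximum - minimum

print(iqr)
print(f"{value_range:.7e}")
```

The IQR is tiny compared with the range, which is consistent with the heavy right skew reported for this variable.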

Descriptive statistics

Standard deviation: 10464269
Coefficient of variation (CV): 3.2426947
Kurtosis: 181.03887
Mean: 3227028.7
Median Absolute Deviation (MAD): 340370
Skewness: 10.126246
Sum: 3.2270287 × 10^10
Variance: 1.0950093 × 10^14
Monotonicity: Not monotonic
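
Several of these statistics are tied together by simple identities: CV = standard deviation / mean, variance = standard deviation squared, and sum = mean × number of observations. A minimal sketch that checks the reported figures against each other, within the rounding of the displayed values:

```python
# Values copied from the report (already rounded for display).
mean = 3_227_028.7
std = 10_464_269
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
total = mean * n       # sum of all values

print(f"CV       = {cv:.5f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```

A CV above 3 means the standard deviation is more than three times the mean, another sign of the long right tail in 금액.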
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 701 | 7.0%
200000 | 119 | 1.2%
38000 | 78 | 0.8%
300000 | 69 | 0.7%
100000 | 56 | 0.6%
150000 | 45 | 0.4%
500000 | 37 | 0.4%
400000 | 35 | 0.4%
250000 | 35 | 0.4%
110000 | 33 | 0.3%
Other values (7361) | 8792 | 87.9%

Minimum 10 values
Value | Count | Frequency (%)
-4463302 | 1 | < 0.1%
-2268660 | 1 | < 0.1%
-1725000 | 1 | < 0.1%
-640910 | 1 | < 0.1%
-563592 | 1 | < 0.1%
-85670 | 1 | < 0.1%
-55000 | 1 | < 0.1%
-50000 | 1 | < 0.1%
-6556 | 1 | < 0.1%
-5214 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
327558671 | 1 | < 0.1%
226970256 | 1 | < 0.1%
221971936 | 1 | < 0.1%
193147800 | 1 | < 0.1%
160316550 | 1 | < 0.1%
143811500 | 1 | < 0.1%
133713850 | 1 | < 0.1%
133320920 | 1 | < 0.1%
132999760 | 1 | < 0.1%
125007175 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.394
금액 | 0.394 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

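First rows
Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (expense name) | 년월일 (date) | 금액 (amount)
30711 | 천호우성 | A13402103 | 소독비 | 201903 | 350800
12214 | 마포펜트라우스 | A12179004 | 기타부대비 | 201903 | 37800
11332 | 상암월드컵파크7단지 | A12127005 | 수도광열비 | 201903 | 0
91070 | 목동6단지 | A15875103 | 광고료수익 | 201903 | 600000
10691 | 마포강변힐스테이트 | A12112002 | 급여 | 201903 | 13674430
21895 | 묵동세방 | A13185501 | 고용안정사업비용 | 201903 | 500000
10049 | 홍제태영으뜸 | A12086001 | 광고료수익 | 201903 | 20000
80101 | 상도엠코타운애스톤파크 | A15603008 | 퇴직급여 | 201903 | 2040000
81525 | 사당제일 | A15609501 | 잡수익 | 201903 | 55000
30726 | 천호우성 | A13402103 | 교통비 | 201903 | 64540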
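
Last rows
Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (expense name) | 년월일 (date) | 금액 (amount)
59111 | 중계현대6차 | A13985406 | 기타운영수익 | 201903 | 0
14683 | 북한산힐스테이트3차 | A12204004 | 고용안정사업수익 | 201903 | 2821930
9375 | 연희자이엘라 | A12071104 | 소모품비 | 201903 | 33000
36715 | 우성캐릭터199 아파트 | A13527003 | 복리후생비 | 201903 | 359500
20479 | 면목금호어울림 | A13120704 | 건강보험료 | 201903 | 148970
70676 | 신길우성2차 | A15086007 | 기타인건비 | 201903 | 0
27325 | 마장세림 | A13305007 | 충당부채전입이자비용 | 201903 | 0
68522 | 문래현대2차 | A15009607 | 도서인쇄비 | 201903 | 127000
67490 | 당산삼성래미안 | A15004507 | 사무용품비 | 201903 | 857860
36756 | 도곡삼성 | A13527004 | 잡비용 | 201903 | 350000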