Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
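As a quick consistency check, the average record size follows from dividing the total in-memory size by the number of observations. The sketch below redoes that arithmetic from the figures listed above; the variable names are illustrative, and it assumes 1 KiB = 1024 bytes.

```python
# Figures copied from the dataset statistics listed above.
total_size_kib = 488.3
n_observations = 10_000

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B")
```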

Variable types

Text: 2
Categorical: 2
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15822/S/1/datasetView.do

Alerts

년월일 has constant value "201905" (Constant)
금액 has 1094 (10.9%) zeros (Zeros)
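Both alerts correspond to simple per-column checks: a column is constant when it holds a single distinct value, and the zeros share is the fraction of entries equal to 0. The following is a minimal sketch of both checks, using the first ten rows of the Sample section at the end of this report as input. The helper names are hypothetical, and this is not ydata-profiling's own implementation.

```python
# First ten sample rows of this report as (년월일, 금액) pairs.
rows = [
    ("201905", 150000), ("201905", 0), ("201905", 723300),
    ("201905", 4156800), ("201905", 0), ("201905", 121800),
    ("201905", 6500), ("201905", 0), ("201905", 121340),
    ("201905", 3296410),
]

def is_constant(values):
    """True when the column holds exactly one distinct value."""
    return len(set(values)) == 1

def zero_share(values):
    """Fraction of entries equal to zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)

dates = [d for d, _ in rows]
amounts = [a for _, a in rows]

print(is_constant(dates))            # constant check on 년월일
print(f"{zero_share(amounts):.1%}")  # zero share within the ten rows only
print(f"{1094 / 10000:.1%}")         # zero share from the report's full counts
```

The ten-row subset is only an illustration; the 10.9% in the alert comes from the 1094 zeros counted across all 10000 observations.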

Reproduction

Analysis started: 2024-05-11 05:47:25.311478
Analysis finished: 2024-05-11 05:47:26.286602
Duration: 0.98 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
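A report of this kind can be regenerated with ydata-profiling. The sketch below assumes the dataset has already been downloaded from the URL above to a local CSV file. The file names, the `cp949` default encoding, the string dtype for 년월일, and the report title are all assumptions, not values taken from this report.

```python
def build_report(csv_path, out_path="report.html", encoding="cp949"):
    """Profile a CSV file and write an HTML report (sketch, see assumptions)."""
    # Imported lazily so the function can be defined without the packages installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Assumption: the CSV is cp949-encoded; pass encoding="utf-8" if it is not.
    # Assumption: reading 년월일 as text keeps values such as 201905 categorical.
    df = pd.read_csv(csv_path, encoding=encoding, dtype={"년월일": str})

    profile = ProfileReport(
        df,
        title="Apartment cost profile",  # hypothetical title
        dataset={
            "description": "파일 다운로드",
            "author": "서울특별시",
            "url": "https://data.seoul.go.kr/dataList/OA-15822/S/1/datasetView.do",
        },
    )
    profile.to_file(out_path)
    return profile
```

Calling `build_report("costs.csv")` with a hypothetical local file name would write `report.html` next to the script.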

Variables

아파트명 (apartment name)
Text

Distinct: 2098
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 20
Mean length: 7.1492
Min length: 2

Characters and Unicode

Total characters: 71492
Distinct characters: 429
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
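These character properties can be inspected with Python's standard `unicodedata` module. The sketch below tallies the general category of each character in one of the sample values of this variable. The helper name `category_counts` is illustrative, and the profiling tool may compute its counts differently.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters per Unicode general category (e.g. 'Lo', 'Nd')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "은평뉴타운박석고개제12단지아파트"
counts = category_counts(sample)

# 'Lo' = Other Letter (Hangul syllables), 'Nd' = Decimal Number.
print(counts["Lo"], counts["Nd"])
```

The two-letter codes map to the category names used in this report: `Lu` is Uppercase Letter, `Ps` is Open Punctuation, `Sm` is Math Symbol, and so on.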

Unique

Unique: 81
Unique (%): 0.8%

Sample

1st row: 묵동신안3차
2nd row: 등촌우성102동
3rd row: 은평뉴타운박석고개제12단지아파트
4th row: 신도림대림3차
5th row: 염창롯데캐슬
Value | Count | Frequency (%)
아파트 | 94 | 0.9%
래미안 | 31 | 0.3%
송파 | 13 | 0.1%
잠원신화 | 13 | 0.1%
광장힐스테이트 | 12 | 0.1%
신길우성2차 | 12 | 0.1%
응암경남 | 12 | 0.1%
dmc자이1단지 | 12 | 0.1%
암사선사현대 | 12 | 0.1%
신내 | 12 | 0.1%
Other values (2152) | 10282 | 97.9%

Most occurring characters

Value | Count | Frequency (%)
 | 2243 | 3.1%
 | 2131 | 3.0%
 | 1883 | 2.6%
 | 1842 | 2.6%
 | 1838 | 2.6%
 | 1730 | 2.4%
 | 1538 | 2.2%
 | 1529 | 2.1%
 | 1474 | 2.1%
 | 1353 | 1.9%
Other values (419) | 53931 | 75.4%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65497 | 91.6%
Decimal Number | 3876 | 5.4%
Uppercase Letter | 697 | 1.0%
Space Separator | 553 | 0.8%
Lowercase Letter | 308 | 0.4%
Open Punctuation | 148 | 0.2%
Close Punctuation | 148 | 0.2%
Dash Punctuation | 133 | 0.2%
Other Punctuation | 120 | 0.2%
Math Symbol | 6 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2243 | 3.4%
 | 2131 | 3.3%
 | 1883 | 2.9%
 | 1842 | 2.8%
 | 1838 | 2.8%
 | 1730 | 2.6%
 | 1538 | 2.3%
 | 1529 | 2.3%
 | 1474 | 2.3%
 | 1353 | 2.1%
Other values (373) | 47936 | 73.2%

Uppercase Letter
Value | Count | Frequency (%)
S | 122 | 17.5%
K | 88 | 12.6%
C | 80 | 11.5%
L | 64 | 9.2%
H | 53 | 7.6%
M | 44 | 6.3%
D | 44 | 6.3%
G | 42 | 6.0%
I | 38 | 5.5%
E | 33 | 4.7%
Other values (7) | 89 | 12.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 171 | 55.5%
l | 40 | 13.0%
i | 30 | 9.7%
v | 22 | 7.1%
s | 10 | 3.2%
c | 8 | 2.6%
h | 8 | 2.6%
w | 7 | 2.3%
k | 6 | 1.9%
a | 3 | 1.0%

Decimal Number
Value | Count | Frequency (%)
1 | 1221 | 31.5%
2 | 1129 | 29.1%
3 | 499 | 12.9%
4 | 262 | 6.8%
5 | 195 | 5.0%
6 | 154 | 4.0%
7 | 113 | 2.9%
8 | 107 | 2.8%
0 | 98 | 2.5%
9 | 98 | 2.5%

Other Punctuation
Value | Count | Frequency (%)
, | 99 | 82.5%
. | 21 | 17.5%

Space Separator
Value | Count | Frequency (%)
(space) | 553 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 148 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 148 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 133 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 6 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 6 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65497 | 91.6%
Common | 4984 | 7.0%
Latin | 1011 | 1.4%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2243 | 3.4%
 | 2131 | 3.3%
 | 1883 | 2.9%
 | 1842 | 2.8%
 | 1838 | 2.8%
 | 1730 | 2.6%
 | 1538 | 2.3%
 | 1529 | 2.3%
 | 1474 | 2.3%
 | 1353 | 2.1%
Other values (373) | 47936 | 73.2%

Latin
Value | Count | Frequency (%)
e | 171 | 16.9%
S | 122 | 12.1%
K | 88 | 8.7%
C | 80 | 7.9%
L | 64 | 6.3%
H | 53 | 5.2%
M | 44 | 4.4%
D | 44 | 4.4%
G | 42 | 4.2%
l | 40 | 4.0%
Other values (19) | 263 | 26.0%

Common
Value | Count | Frequency (%)
1 | 1221 | 24.5%
2 | 1129 | 22.7%
(space) | 553 | 11.1%
3 | 499 | 10.0%
4 | 262 | 5.3%
5 | 195 | 3.9%
6 | 154 | 3.1%
( | 148 | 3.0%
) | 148 | 3.0%
- | 133 | 2.7%
Other values (7) | 542 | 10.9%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65497 | 91.6%
ASCII | 5989 | 8.4%
Number Forms | 6 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2243 | 3.4%
 | 2131 | 3.3%
 | 1883 | 2.9%
 | 1842 | 2.8%
 | 1838 | 2.8%
 | 1730 | 2.6%
 | 1538 | 2.3%
 | 1529 | 2.3%
 | 1474 | 2.3%
 | 1353 | 2.1%
Other values (373) | 47936 | 73.2%

ASCII
Value | Count | Frequency (%)
1 | 1221 | 20.4%
2 | 1129 | 18.9%
(space) | 553 | 9.2%
3 | 499 | 8.3%
4 | 262 | 4.4%
5 | 195 | 3.3%
e | 171 | 2.9%
6 | 154 | 2.6%
( | 148 | 2.5%
) | 148 | 2.5%
Other values (35) | 1509 | 25.2%

Number Forms
Value | Count | Frequency (%)
 | 6 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2105
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 82
Unique (%): 0.8%

Sample

1st row: A13114106
2nd row: A15772902
3rd row: A41279911
4th row: A15288802
5th row: A15704015
Value | Count | Frequency (%)
a13790703 | 13 | 0.1%
a15086007 | 12 | 0.1%
a14005001 | 12 | 0.1%
a12201301 | 12 | 0.1%
a15010306 | 12 | 0.1%
a13920205 | 12 | 0.1%
a14375301 | 12 | 0.1%
a13405201 | 12 | 0.1%
a12275501 | 12 | 0.1%
a15083701 | 12 | 0.1%
Other values (2095) | 9879 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18197 | 20.2%
1 | 17560 | 19.5%
A | 9988 | 11.1%
3 | 8883 | 9.9%
2 | 8178 | 9.1%
5 | 6270 | 7.0%
8 | 5784 | 6.4%
7 | 4839 | 5.4%
4 | 3835 | 4.3%
6 | 3422 | 3.8%
Other values (2) | 3044 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18197 | 22.7%
1 | 17560 | 21.9%
3 | 8883 | 11.1%
2 | 8178 | 10.2%
5 | 6270 | 7.8%
8 | 5784 | 7.2%
7 | 4839 | 6.0%
4 | 3835 | 4.8%
6 | 3422 | 4.3%
9 | 3032 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9988 | 99.9%
B | 12 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18197 | 22.7%
1 | 17560 | 21.9%
3 | 8883 | 11.1%
2 | 8178 | 10.2%
5 | 6270 | 7.8%
8 | 5784 | 7.2%
7 | 4839 | 6.0%
4 | 3835 | 4.8%
6 | 3422 | 4.3%
9 | 3032 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9988 | 99.9%
B | 12 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18197 | 20.2%
1 | 17560 | 19.5%
A | 9988 | 11.1%
3 | 8883 | 9.9%
2 | 8178 | 9.1%
5 | 6270 | 7.0%
8 | 5784 | 6.4%
7 | 4839 | 5.4%
4 | 3835 | 4.3%
6 | 3422 | 3.8%
Other values (2) | 3044 | 3.4%

비용명 (expense item name)
Categorical

Distinct: 44
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
통신비 | 470
교육비 | 456
급여 | 455
제수당 | 441
퇴직급여 | 440
Other values (39) | 7738

Length

Max length: 7
Median length: 5
Mean length: 4.2933
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 업무추진비
2nd row: 소모품비
3rd row: 식대
4th row: 세대수도료
5th row: 지급수수료

Common Values

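Value | Count | Frequency (%)
통신비 (communication expenses) | 470 | 4.7%
교육비 (education expenses) | 456 | 4.6%
급여 (salaries) | 455 | 4.5%
제수당 (allowances) | 441 | 4.4%
퇴직급여 (retirement benefits) | 440 | 4.4%
사무용품비 (office supplies) | 438 | 4.4%
세대전기료 (household electricity charges) | 436 | 4.4%
산재보험료 (industrial accident insurance premiums) | 419 | 4.2%
복리후생비 (employee welfare expenses) | 418 | 4.2%
소모품비 (consumables) | 408 | 4.1%
Other values (34) | 5619 | 56.2%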

Length

Histogram of lengths of the category

년월일 (year-month)
Categorical

CONSTANT

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
201905 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201905
2nd row: 201905
3rd row: 201905
4th row: 201905
5th row: 201905

Common Values

Value | Count | Frequency (%)
201905 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)


금액 (amount)
Real number (ℝ)

ZEROS

Distinct: 7378
Distinct (%): 73.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2790094.2
Minimum: -1538510
Maximum: 1.7469711 × 10^8
Zeros: 1094
Zeros (%): 10.9%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -1538510
5-th percentile: 0
Q1: 66790
Median: 227430
Q3: 1053542.5
95-th percentile: 15021018
Maximum: 1.7469711 × 10^8
Range: 1.7623562 × 10^8
Interquartile range (IQR): 986752.5
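Range and interquartile range are derived from the other quantile statistics: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. The following sketch redoes that arithmetic with the values listed above; the variable names are illustrative, and the maximum is written out in full as it appears among the largest values of this variable.

```python
minimum = -1538510
q1 = 66790
q3 = 1053542.5
maximum = 174697110  # 1.7469711 × 10^8 written out in full

value_range = maximum - minimum  # spread between the two extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range)
print(iqr)
```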

Descriptive statistics

Standard deviation: 8620419.1
Coefficient of variation (CV): 3.0896516
Kurtosis: 100.65088
Mean: 2790094.2
Median Absolute Deviation (MAD): 221055
Skewness: 7.9549078
Sum: 2.7900942 × 10^10
Variance: 7.4311625 × 10^13
Monotonicity: Not monotonic
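Several of these statistics are linked by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks these against the reported values. Because the inputs are rounded as printed in the report, agreement is only expected to several significant digits; the variable names are illustrative.

```python
import math

# Rounded values as printed in this report.
mean = 2790094.2
std = 8620419.1
n = 10_000

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as squared standard deviation
total = mean * n       # sum of all values

print(f"{cv:.4f}")
print(f"{variance:.4e}")
print(f"{total:.4e}")
```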
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1094 | 10.9%
200000 | 126 | 1.3%
300000 | 59 | 0.6%
100000 | 59 | 0.6%
110000 | 44 | 0.4%
30000 | 39 | 0.4%
150000 | 28 | 0.3%
400000 | 25 | 0.2%
600000 | 25 | 0.2%
500000 | 23 | 0.2%
Other values (7368) | 8478 | 84.8%

Value | Count | Frequency (%)
-1538510 | 1 | < 0.1%
-852400 | 1 | < 0.1%
-229350 | 1 | < 0.1%
-79270 | 1 | < 0.1%
-49720 | 1 | < 0.1%
-43670 | 2 | < 0.1%
-5000 | 1 | < 0.1%
0 | 1094 | 10.9%
380 | 1 | < 0.1%
400 | 1 | < 0.1%

Value | Count | Frequency (%)
174697110 | 1 | < 0.1%
168550584 | 1 | < 0.1%
161978290 | 1 | < 0.1%
161745530 | 1 | < 0.1%
139898460 | 1 | < 0.1%
136750370 | 1 | < 0.1%
132069176 | 1 | < 0.1%
131251920 | 1 | < 0.1%
131159600 | 1 | < 0.1%
113444460 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.470
금액 | 0.470 | 1.000

 | 금액 | 비용명
금액 | 1.000 | 0.181
비용명 | 0.181 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
9743 | 묵동신안3차 | A13114106 | 업무추진비 | 201905 | 150000
41601 | 등촌우성102동 | A15772902 | 소모품비 | 201905 | 0
44969 | 은평뉴타운박석고개제12단지아파트 | A41279911 | 식대 | 201905 | 723300
37621 | 신도림대림3차 | A15288802 | 세대수도료 | 201905 | 4156800
40728 | 염창롯데캐슬 | A15704015 | 지급수수료 | 201905 | 0
10755 | 신내우남푸르미아 | A13186502 | 감가상각비 | 201905 | 121800
27481 | 불암현대 | A13981208 | 교통비 | 201905 | 6500
41111 | 방화한진로즈힐 | A15722005 | 사무용품비 | 201905 | 0
4733 | 디엠씨한양 | A12081703 | 통신비 | 201905 | 121340
37463 | 구로현대연예인 | A15286807 | 제수당 | 201905 | 3296410

 | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
26164 | 진로유통조합대림 | A13922002 | 소모품비 | 201905 | 12500
24308 | 잠실아시아선수촌 | A13822701 | 기타부대비 | 201905 | 1020370
13640 | 옥수하이츠제2 | A13375904 | 세대수도료 | 201905 | 541410
38338 | 현대공무원 | A15384101 | 도서인쇄비 | 201905 | 77000
44365 | 양천롯데캐슬 | A15883202 | 사무용품비 | 201905 | 39810
38586 | 신대방신동아 | A15601201 | 통신비 | 201905 | 35770
10604 | 묵동브라운스톤태릉 | A13185508 | 사무용품비 | 201905 | 44000
21324 | 방배아크로리버 | A13706001 | 사무용품비 | 201905 | 64290
36515 | 오류금강수목원 | A15210211 | 급여 | 201905 | 16606530
21110 | 정릉푸른마을동아 | A13684605 | 기타사용료 | 201905 | 0