Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
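
The average record size follows from the total memory size and the number of observations. A minimal sketch of that arithmetic, using only the figures from the table above (the variable names are illustrative):

```python
# Figures copied from the "Dataset statistics" table.
total_size_kib = 488.3      # Total size in memory
n_observations = 10_000     # Number of observations

# 1 KiB = 1024 bytes, so convert before dividing by the row count.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```

Rounded to one decimal place this reproduces the reported 50.0 B.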

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202103" (Constant)
금액 has 677 (6.8%) zeros (Zeros)
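
Both alerts can be checked from counts reported elsewhere in this report: 년월일 has 1 distinct value, and 금액 has 677 zeros in 10000 rows. The sketch below uses hypothetical helper names and is not the profiler's own implementation:

```python
def zero_share(n_zeros: int, n_rows: int) -> float:
    """Percentage of rows whose value is exactly zero."""
    return 100.0 * n_zeros / n_rows


def is_constant(n_distinct: int) -> bool:
    """A column is constant when it holds a single distinct value."""
    return n_distinct == 1


# Counts taken from this report: 금액 zeros and 년월일 distinct values.
zeros_pct = zero_share(677, 10_000)
constant_flag = is_constant(1)

print(f"{zeros_pct:.1f}%", constant_flag)
```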

Reproduction

Analysis started: 2024-05-11 06:54:56.324000
Analysis finished: 2024-05-11 06:54:58.065544
Duration: 1.74 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2229
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 19
Mean length: 7.2994
Min length: 2

Characters and Unicode

Total characters: 72994
Distinct characters: 435
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
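
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for the five sample values of this variable. The mapping from two-letter codes to the long names used in this report is written by hand, and the function name is hypothetical. Scripts and blocks are not available in the standard library, so only categories are shown.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        # Normalize to NFC so each Hangul syllable is a single code point.
        for ch in unicodedata.normalize("NFC", value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = [
    "마곡엠밸리14단지",
    "한강타운",
    "은평뉴타운제각말5단지제1",
    "양재우성",
    "여의대우트럼프월드2차",
]
counts = category_counts(sample)
print(counts.most_common())
```

For these five names, Hangul syllables fall under Other Letter and the digits under Decimal Number, the two largest categories reported for this variable.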

Unique

Unique: 162
Unique (%): 1.6%
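
The report lists "Distinct" and "Unique" separately. The sketch below assumes that "Distinct" counts the different values in the column and "Unique" counts the values that occur exactly once. The function name and the toy column are hypothetical and not taken from the dataset.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique) counts for a column of values."""
    counts = Counter(values)
    distinct = len(counts)                              # different values
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return distinct, unique


# Hypothetical toy column: "B" and "D" appear once, "A" and "C" twice.
toy = ["A", "A", "B", "C", "C", "D"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)

# Percentages for this variable, from the counts reported in this section.
n_rows = 10_000
distinct_pct = 100 * 2229 / n_rows
unique_pct = 100 * 162 / n_rows
print(f"{distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```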

Sample

1st row: 마곡엠밸리14단지
2nd row: 한강타운
3rd row: 은평뉴타운제각말5단지제1
4th row: 양재우성
5th row: 여의대우트럼프월드2차

Value  Count  Frequency (%)
아파트  163  1.5%
아이파크  26  0.2%
e편한세상  23  0.2%
래미안  22  0.2%
해모로  20  0.2%
브라운스톤  16  0.1%
래미안밤섬리베뉴  16  0.1%
이편한세상  15  0.1%
휘경  15  0.1%
보라매  15  0.1%
Other values (2295)  10361  96.9%

Most occurring characters

Value  Count  Frequency (%)
  2433  3.3%
  2367  3.2%
  2228  3.1%
  1840  2.5%
  1793  2.5%
  1667  2.3%
  1541  2.1%
  1475  2.0%
  1406  1.9%
  1323  1.8%
Other values (425)  54921  75.2%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  66840  91.6%
Decimal Number  3648  5.0%
Uppercase Letter  801  1.1%
Space Separator  787  1.1%
Lowercase Letter  340  0.5%
Open Punctuation  161  0.2%
Close Punctuation  161  0.2%
Dash Punctuation  143  0.2%
Other Punctuation  102  0.1%
Letter Number  11  < 0.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
  2433  3.6%
  2367  3.5%
  2228  3.3%
  1840  2.8%
  1793  2.7%
  1667  2.5%
  1541  2.3%
  1475  2.2%
  1406  2.1%
  1323  2.0%
Other values (380)  48767  73.0%

Uppercase Letter
Value  Count  Frequency (%)
S  129  16.1%
C  108  13.5%
K  90  11.2%
D  82  10.2%
M  82  10.2%
L  67  8.4%
H  52  6.5%
I  41  5.1%
G  31  3.9%
E  28  3.5%
Other values (7)  91  11.4%

Lowercase Letter
Value  Count  Frequency (%)
e  198  58.2%
l  30  8.8%
i  29  8.5%
v  19  5.6%
s  19  5.6%
k  15  4.4%
w  8  2.4%
h  6  1.8%
g  6  1.8%
a  6  1.8%

Decimal Number
Value  Count  Frequency (%)
1  1069  29.3%
2  1058  29.0%
3  514  14.1%
4  284  7.8%
5  208  5.7%
6  168  4.6%
7  101  2.8%
9  99  2.7%
8  77  2.1%
0  70  1.9%

Other Punctuation
Value  Count  Frequency (%)
,  80  78.4%
.  22  21.6%

Space Separator
Value  Count  Frequency (%)
(space)  787  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  161  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  161  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  143  100.0%

Letter Number
Value  Count  Frequency (%)
  11  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  66840  91.6%
Common  5002  6.9%
Latin  1152  1.6%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
  2433  3.6%
  2367  3.5%
  2228  3.3%
  1840  2.8%
  1793  2.7%
  1667  2.5%
  1541  2.3%
  1475  2.2%
  1406  2.1%
  1323  2.0%
Other values (380)  48767  73.0%

Latin
Value  Count  Frequency (%)
e  198  17.2%
S  129  11.2%
C  108  9.4%
K  90  7.8%
D  82  7.1%
M  82  7.1%
L  67  5.8%
H  52  4.5%
I  41  3.6%
G  31  2.7%
Other values (19)  272  23.6%

Common
Value  Count  Frequency (%)
1  1069  21.4%
2  1058  21.2%
(space)  787  15.7%
3  514  10.3%
4  284  5.7%
5  208  4.2%
6  168  3.4%
(  161  3.2%
)  161  3.2%
-  143  2.9%
Other values (6)  449  9.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  66840  91.6%
ASCII  6143  8.4%
Number Forms  11  < 0.1%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
  2433  3.6%
  2367  3.5%
  2228  3.3%
  1840  2.8%
  1793  2.7%
  1667  2.5%
  1541  2.3%
  1475  2.2%
  1406  2.1%
  1323  2.0%
Other values (380)  48767  73.0%

ASCII
Value  Count  Frequency (%)
1  1069  17.4%
2  1058  17.2%
(space)  787  12.8%
3  514  8.4%
4  284  4.6%
5  208  3.4%
e  198  3.2%
6  168  2.7%
(  161  2.6%
)  161  2.6%
Other values (34)  1535  25.0%

Number Forms
Value  Count  Frequency (%)
  11  100.0%

아파트코드 (apartment code)
Text

Distinct: 2236
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9
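
Every length statistic here is 9, which points to a fixed-width code. A small sketch, using the five sample codes of this variable and assuming all 10000 rows follow the same pattern of one letter followed by eight digits, shows how that shape determines the character totals:

```python
import unicodedata
from collections import Counter

# Sample values of this variable, as listed in the report.
codes = ["A15721010", "A14004001", "A41279923", "A13789203", "A15089307"]

lengths = [len(code) for code in codes]

# General categories within one code: "Lu" = uppercase letter, "Nd" = decimal digit.
per_code = Counter(unicodedata.category(ch) for ch in codes[0])

# Assumption: every one of the 10000 observations has this same shape.
n_rows = 10_000
total_chars = n_rows * lengths[0]
uppercase_total = n_rows * per_code["Lu"]
decimal_total = n_rows * per_code["Nd"]

print(set(lengths), dict(per_code), total_chars, uppercase_total, decimal_total)
```

Under that assumption the totals come out to 90000 characters, 10000 uppercase letters and 80000 decimal digits, which matches the counts reported for this variable.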

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 162
Unique (%): 1.6%

Sample

1st row: A15721010
2nd row: A14004001
3rd row: A41279923
4th row: A13789203
5th row: A15089307

Value  Count  Frequency (%)
a15086601  14  0.1%
a10025115  12  0.1%
a12071101  12  0.1%
a15807606  12  0.1%
a13980010  11  0.1%
a10028177  11  0.1%
a13984004  11  0.1%
a13983709  11  0.1%
a15208003  11  0.1%
a13813010  11  0.1%
Other values (2226)  9884  98.8%

Most occurring characters

ValueCountFrequency (%)
0 18565
20.6%
1 17543
19.5%
A 9993
11.1%
3 8697
9.7%
2 8327
9.3%
5 6351
 
7.1%
8 5641
 
6.3%
7 4805
 
5.3%
4 3833
 
4.3%
6 3320
 
3.7%
Other values (2) 2925
 
3.2%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  80000  88.9%
Uppercase Letter  10000  11.1%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0  18565  23.2%
1  17543  21.9%
3  8697  10.9%
2  8327  10.4%
5  6351  7.9%
8  5641  7.1%
7  4805  6.0%
4  3833  4.8%
6  3320  4.2%
9  2918  3.6%

Uppercase Letter
Value  Count  Frequency (%)
A  9993  99.9%
B  7  0.1%

Most occurring scripts

Value  Count  Frequency (%)
Common  80000  88.9%
Latin  10000  11.1%

Most frequent character per script

Common
Value  Count  Frequency (%)
0  18565  23.2%
1  17543  21.9%
3  8697  10.9%
2  8327  10.4%
5  6351  7.9%
8  5641  7.1%
7  4805  6.0%
4  3833  4.8%
6  3320  4.2%
9  2918  3.6%

Latin
Value  Count  Frequency (%)
A  9993  99.9%
B  7  0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  90000  100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0  18565  20.6%
1  17543  19.5%
A  9993  11.1%
3  8697  9.7%
2  8327  9.3%
5  6351  7.1%
8  5641  6.3%
7  4805  5.3%
4  3833  4.3%
6  3320  3.7%
Other values (2)  2925  3.2%

비용명 (expense item name)
Text

Distinct: 85
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8472
Min length: 2

Characters and Unicode

Total characters: 48472
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

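Unique: 1
Unique (%): < 0.1%

Sample

1st row: 산재보험료 (industrial accident insurance premium)
2nd row: 수선유지비 (repair and maintenance expense)
3rd row: 입주자대표회의운영비 (residents' representative council operating expense)
4th row: 경비비 (security expense)
5th row: 기타운영수익 (other operating income)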
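
Value  Count  Frequency (%)
수선유지비 (repair and maintenance expense)  240  2.4%
경비비 (security expense)  239  2.4%
통신비 (communications expense)  231  2.3%
급여 (salaries)  229  2.3%
세대전기료 (household electricity charges)  224  2.2%
복리후생비 (employee welfare expense)  224  2.2%
사무용품비 (office supplies expense)  223  2.2%
국민연금 (national pension)  218  2.2%
퇴직급여 (retirement benefits)  218  2.2%
보험료 (insurance premiums)  217  2.2%
Other values (75)  7737  77.4%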

Most occurring characters

Value  Count  Frequency (%)
  5469  11.3%
  3513  7.2%
  2133  4.4%
  1902  3.9%
  1695  3.5%
  1341  2.8%
  1092  2.3%
  898  1.9%
  816  1.7%
  774  1.6%
Other values (110)  28839  59.5%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  48472  100.0%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
  5469  11.3%
  3513  7.2%
  2133  4.4%
  1902  3.9%
  1695  3.5%
  1341  2.8%
  1092  2.3%
  898  1.9%
  816  1.7%
  774  1.6%
Other values (110)  28839  59.5%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  48472  100.0%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
  5469  11.3%
  3513  7.2%
  2133  4.4%
  1902  3.9%
  1695  3.5%
  1341  2.8%
  1092  2.3%
  898  1.9%
  816  1.7%
  774  1.6%
Other values (110)  28839  59.5%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  48472  100.0%

Most frequent character per block

Hangul
Value  Count  Frequency (%)
  5469  11.3%
  3513  7.2%
  2133  4.4%
  1902  3.9%
  1695  3.5%
  1341  2.8%
  1092  2.3%
  898  1.9%
  816  1.7%
  774  1.6%
Other values (110)  28839  59.5%

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
202103  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202103
2nd row: 202103
3rd row: 202103
4th row: 202103
5th row: 202103

Common Values

Value  Count  Frequency (%)
202103  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
202103  10000  100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7347
Distinct (%): 73.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3413179
Minimum: -950040
Maximum: 5.7529123 × 10^8
Zeros: 677
Zeros (%): 6.8%
Negative: 11
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -950040
5-th percentile: 0
Q1: 98030
median: 349360
Q3: 1484647.5
95-th percentile: 16723622
Maximum: 5.7529123 × 10^8
Range: 5.7624127 × 10^8
Interquartile range (IQR): 1386617.5
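
The range and interquartile range are differences of the quantiles listed above. A short check of that arithmetic, using the exact maximum 575291230 that appears among the largest values of this variable (the variable names are illustrative):

```python
# Quantiles of 금액 as reported in this section.
minimum = -950_040
q1 = 98_030.0
q3 = 1_484_647.5
maximum = 575_291_230

iqr = q3 - q1                    # spread of the middle 50% of values
value_range = maximum - minimum  # full spread, including the negative minimum

print(iqr, value_range)
```

The results, 1386617.5 and 576241270, agree with the reported IQR and with the range of 5.7624127 × 10^8.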

Descriptive statistics

Standard deviation: 12654210
Coefficient of variation (CV): 3.7074557
Kurtosis: 519.56763
Mean: 3413179
Median Absolute Deviation (MAD): 327691
Skewness: 16.329837
Sum: 3.413179 × 10^10
Variance: 1.6012903 × 10^14
Monotonicity: Not monotonic
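
Several of these statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks them against the rounded figures in this report, so small discrepancies in the last digits are expected:

```python
# Rounded figures for 금액 as reported in this section.
mean = 3_413_179
std = 12_654_210
n_rows = 10_000

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n_rows    # sum of all values

print(f"CV = {cv:.4f}, variance = {variance:.4e}, sum = {total:.4e}")
```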
Histogram with fixed size bins (bins=50)
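
With 50 equal-width bins spanning the minimum to the maximum, each bin covers about 11.5 million. The sketch below assumes plain equal-width binning with the maximum placed in the last bin. The function name is hypothetical and this is not necessarily how the profiler draws its plot.

```python
def bin_index(value, minimum, maximum, bins=50):
    """Index of the equal-width bin that contains value."""
    width = (maximum - minimum) / bins
    idx = int((value - minimum) // width)
    return min(idx, bins - 1)  # the maximum itself belongs to the last bin


minimum, maximum = -950_040, 575_291_230
bin_width = (maximum - minimum) / 50

median_bin = bin_index(349_360, minimum, maximum)
q3_bin = bin_index(1_484_647.5, minimum, maximum)
p95_bin = bin_index(16_723_622, minimum, maximum)
max_bin = bin_index(maximum, minimum, maximum)

print(bin_width, median_bin, q3_bin, p95_bin, max_bin)
```

Under this binning the median and Q3 both land in the first bin and the 95th percentile in the second, so nearly all of the mass sits at the far left of the histogram. That is consistent with the high skewness reported for this variable.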

Value  Count  Frequency (%)
0  677  6.8%
200000  108  1.1%
100000  64  0.6%
300000  62  0.6%
400000  55  0.5%
23000  47  0.5%
150000  45  0.4%
250000  38  0.4%
50000  36  0.4%
500000  33  0.3%
Other values (7337)  8835  88.3%

Value  Count  Frequency (%)
-950040  1  < 0.1%
-764700  1  < 0.1%
-480000  1  < 0.1%
-429604  1  < 0.1%
-370224  1  < 0.1%
-250000  1  < 0.1%
-130000  1  < 0.1%
-52800  1  < 0.1%
-41660  1  < 0.1%
-4218  1  < 0.1%

Value  Count  Frequency (%)
575291230  1  < 0.1%
313254980  1  < 0.1%
262036160  1  < 0.1%
239281500  1  < 0.1%
205596307  1  < 0.1%
189227631  1  < 0.1%
188844150  1  < 0.1%
168364732  1  < 0.1%
158958400  1  < 0.1%
136061790  1  < 0.1%

Interactions


Correlations

  비용명  금액
비용명  1.000  0.293
금액  0.293  1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

  아파트명  아파트코드  비용명  년월일  금액
89526  마곡엠밸리14단지  A15721010  산재보험료  202103  284790
65854  한강타운  A14004001  수선유지비  202103  2123700
98133  은평뉴타운제각말5단지제1  A41279923  입주자대표회의운영비  202103  532200
51817  양재우성  A13789203  경비비  202103  29507470
75753  여의대우트럼프월드2차  A15089307  기타운영수익  202103  740000
72837  문래두산위브  A15009505  수선유지비  202103  4706610
95096  목동11단지  A15807705  제수당  202103  4059020
92650  염창태영송화  A15786314  세대수도료  202103  4619970
80726  신개봉삼환  A15280602  건강보험료  202103  104220
25730  면목현대  A13184208  통신비  202103  90840

  아파트명  아파트코드  비용명  년월일  금액
54738  잠실리센츠  A13822003  검침수익  202103  2392090
98458  은평뉴타운기자촌11단지  A41279932  피복비  202103  50600
97104  신정푸른마을2단지  A15886508  교통비  202103  40000
82862  신도림동아1차  A15288813  잡수익  202103  346013
5041  금천롯데캐슬골드파크3차아파트  A10025946  교육비  202103  23000
94157  신트리1단지  A15807002  국민연금  202103  438560
77370  봉천건영6차아파트  A15176602  기타운영수익  202103  79120
53737  오금현대아파트  A13813010  도서인쇄비  202103  439000
33419  행당대림제2  A13377902  소모품비  202103  87490
82040  구로현대상선  A15286802  교육비  202103  15000