Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
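The dataset-level counts above can be checked with a few lines of standard-library Python. This is a minimal sketch, not ydata-profiling's implementation. It makes three assumptions: each record is a tuple of the five column values, `None` marks a missing cell, and every repeat of an earlier row counts as one duplicate. `dataset_stats` is a hypothetical helper name.

```python
from collections import Counter

def dataset_stats(records):
    """Count missing cells and duplicate rows in a list of equal-length tuples."""
    n_rows = len(records)
    n_cols = len(records[0]) if records else 0
    n_cells = n_rows * n_cols
    # Assumption for this sketch: a cell is missing when it is None.
    missing = sum(1 for row in records for value in row if value is None)
    # Every occurrence of a row after its first one counts as a duplicate.
    counts = Counter(records)
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }

# Three records copied from the sample rows of this dataset.
rows = [
    ("서빙고금호베스트빌", "A14024001", "수도광열비", "201904", 168240),
    ("신내5단지대림두산", "A13184610", "국민연금", "201904", 1215880),
    ("동부(돌타운)아파트", "A15210103", "교육비", "201904", 0),
]
print(dataset_stats(rows))
```

The average record size follows from the total: 488.3 KiB × 1024 B/KiB ÷ 10000 rows ≈ 50.0 B.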

Variable types

Text: 2
Categorical: 2
Numeric: 1

Dataset

Description: File download (파일 다운로드)
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15822/S/1/datasetView.do

Alerts

년월일 (year-month-day) has constant value "201904" [Constant]
금액 (amount) has 1028 (10.3%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-11 05:47:31.848465
Analysis finished: 2024-05-11 05:47:32.905863
Duration: 1.06 seconds
Software version: ydata-profiling v4.5.1

Variables

아파트명 (apartment name)
Text

Distinct: 2113
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length22
Median length20
Mean length7.1724
Min length2

Characters and Unicode

Total characters: 71724
Distinct characters: 432
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
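Python's standard `unicodedata` module exposes the General_Category property that the category tables below are built from. The sketch below is minimal. It maps the two-letter codes to the long names this report uses, and its sample strings are apartment names taken from this dataset. Whether ydata-profiling derives its counts in exactly this way is an assumption, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter

# Long names for the General_Category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Nl": "Letter Number",
    "Sm": "Math Symbol",
}

def category_counts(values):
    """Count characters per Unicode General_Category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts(["동부(돌타운)아파트", "개나리SKVIEW", "풍납 현대리버빌1차"]))
```

Hangul syllables fall under Other Letter (Lo), which is why that category dominates the apartment-name variable.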

Unique

Unique: 107
Unique (%): 1.1%
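In this report "Distinct" counts different values, while "Unique" counts values that occur exactly once. Both percentages are taken over the 10000 observations (2113 / 10000 ≈ 21.1%, 107 / 10000 ≈ 1.1%). A minimal sketch of the two counts with `collections.Counter`; `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) for a list of values.

    Distinct: number of different values.
    Unique: number of values that occur exactly once.
    """
    n = len(values)
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, 100.0 * distinct / n, unique, 100.0 * unique / n

print(distinct_and_unique(["래미안", "래미안", "힐스테이트", "월드컵참누리"]))
```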

Sample

1st row: 서빙고금호베스트빌
2nd row: 신내5단지대림두산
3rd row: 동부(돌타운)아파트
4th row: 광장현대8단지
5th row: 창전현대홈타운

Most common words

Value | Count | Frequency (%)
아파트 (apartment) | 112 | 1.1%
래미안 | 20 | 0.2%
여의도진주 | 17 | 0.2%
고덕현대 | 16 | 0.2%
신도림현대 | 16 | 0.2%
신동아파밀리에 | 14 | 0.1%
은평뉴타운상림마을6단지 | 13 | 0.1%
힐스테이트 | 13 | 0.1%
입주자대표회의 (residents' representative council) | 13 | 0.1%
월드컵참누리 | 13 | 0.1%
Other values (2169) | 10263 | 97.6%
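The word table splits each value into whitespace-separated tokens, which is why its total (10510 tokens) exceeds the 10000 observations. The apartment-code word table further down lists lower-cased codes (a15089513), so tokens also appear to be lower-cased. A minimal sketch under those assumptions; ydata-profiling's own tokenizer may also strip punctuation or stop words, and `word_counts` is a hypothetical helper.

```python
from collections import Counter

def word_counts(values, top=10):
    """Count lower-cased, whitespace-separated tokens across all values.

    Returns the `top` most common (word, count, percent) rows followed by an
    "Other values (k)" row that sums the remaining k distinct words.
    """
    counts = Counter(word for text in values for word in text.lower().split())
    total = sum(counts.values())
    ranked = counts.most_common()  # all words, most frequent first
    rows = [(w, c, 100.0 * c / total) for w, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / total))
    return rows

for row in word_counts(["래미안 아파트", "힐스테이트 아파트", "A15089513"], top=1):
    print(row)
```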

Most occurring characters

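Value | Count | Frequency (%)
(Hangul character) | 2265 | 3.2%
(Hangul character) | 2168 | 3.0%
(Hangul character) | 1949 | 2.7%
(Hangul character) | 1931 | 2.7%
(Hangul character) | 1852 | 2.6%
(Hangul character) | 1701 | 2.4%
(Hangul character) | 1592 | 2.2%
(Hangul character) | 1522 | 2.1%
(Hangul character) | 1516 | 2.1%
(Hangul character) | 1355 | 1.9%
Other values (422) | 53873 | 75.1%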

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 65695 | 91.6%
Decimal Number | 3994 | 5.6%
Uppercase Letter | 600 | 0.8%
Space Separator | 553 | 0.8%
Lowercase Letter | 320 | 0.4%
Dash Punctuation | 148 | 0.2%
Close Punctuation | 140 | 0.2%
Open Punctuation | 140 | 0.2%
Other Punctuation | 127 | 0.2%
Letter Number | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(Hangul character) | 2265 | 3.4%
(Hangul character) | 2168 | 3.3%
(Hangul character) | 1949 | 3.0%
(Hangul character) | 1931 | 2.9%
(Hangul character) | 1852 | 2.8%
(Hangul character) | 1701 | 2.6%
(Hangul character) | 1592 | 2.4%
(Hangul character) | 1522 | 2.3%
(Hangul character) | 1516 | 2.3%
(Hangul character) | 1355 | 2.1%
Other values (376) | 47844 | 72.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 117 | 19.5%
K | 81 | 13.5%
C | 62 | 10.3%
H | 48 | 8.0%
L | 42 | 7.0%
D | 37 | 6.2%
M | 37 | 6.2%
I | 33 | 5.5%
E | 30 | 5.0%
G | 30 | 5.0%
Other values (7) | 83 | 13.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 193 | 60.3%
l | 36 | 11.2%
i | 32 | 10.0%
v | 23 | 7.2%
w | 11 | 3.4%
s | 8 | 2.5%
k | 6 | 1.9%
a | 3 | 0.9%
h | 3 | 0.9%
g | 3 | 0.9%

Decimal Number
Value | Count | Frequency (%)
1 | 1231 | 30.8%
2 | 1184 | 29.6%
3 | 525 | 13.1%
4 | 271 | 6.8%
5 | 205 | 5.1%
6 | 181 | 4.5%
7 | 109 | 2.7%
9 | 105 | 2.6%
8 | 101 | 2.5%
0 | 82 | 2.1%

Other Punctuation
Value | Count | Frequency (%)
, | 108 | 85.0%
. | 19 | 15.0%

Space Separator
Value | Count | Frequency (%)
(space) | 553 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 148 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 140 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 140 | 100.0%

Letter Number
Value | Count | Frequency (%)
(Roman numeral) | 4 | 100.0%

Math Symbol
Value | Count | Frequency (%)
~ | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 65695 | 91.6%
Common | 5105 | 7.1%
Latin | 924 | 1.3%

Most frequent character per script

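Hangul
Value | Count | Frequency (%)
(Hangul character) | 2265 | 3.4%
(Hangul character) | 2168 | 3.3%
(Hangul character) | 1949 | 3.0%
(Hangul character) | 1931 | 2.9%
(Hangul character) | 1852 | 2.8%
(Hangul character) | 1701 | 2.6%
(Hangul character) | 1592 | 2.4%
(Hangul character) | 1522 | 2.3%
(Hangul character) | 1516 | 2.3%
(Hangul character) | 1355 | 2.1%
Other values (376) | 47844 | 72.8%

Latin
Value | Count | Frequency (%)
e | 193 | 20.9%
S | 117 | 12.7%
K | 81 | 8.8%
C | 62 | 6.7%
H | 48 | 5.2%
L | 42 | 4.5%
D | 37 | 4.0%
M | 37 | 4.0%
l | 36 | 3.9%
I | 33 | 3.6%
Other values (19) | 238 | 25.8%

Common
Value | Count | Frequency (%)
1 | 1231 | 24.1%
2 | 1184 | 23.2%
(space) | 553 | 10.8%
3 | 525 | 10.3%
4 | 271 | 5.3%
5 | 205 | 4.0%
6 | 181 | 3.5%
- | 148 | 2.9%
) | 140 | 2.7%
( | 140 | 2.7%
Other values (7) | 527 | 10.3%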

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 65695 | 91.6%
ASCII | 6025 | 8.4%
Number Forms | 4 | < 0.1%
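Unicode blocks are fixed code-point ranges, so a character's block can be read off its code point. The sketch below covers only the blocks seen in this report. Its ranges come from the Unicode Blocks.txt data file. Two parts are assumptions: folding the three Hangul blocks into one "Hangul" label (to match the report) and the hypothetical `block_of` helper.

```python
from collections import Counter

# Code-point ranges from the Unicode Blocks.txt data file, for the blocks seen
# in this report. Folding the Hangul blocks into one label is an assumption.
BLOCKS = [
    (0x0000, 0x007F, "ASCII"),         # Basic Latin
    (0x1100, 0x11FF, "Hangul"),        # Hangul Jamo
    (0x2150, 0x218F, "Number Forms"),  # includes Roman numerals such as Ⅱ
    (0x3130, 0x318F, "Hangul"),        # Hangul Compatibility Jamo
    (0xAC00, 0xD7AF, "Hangul"),        # Hangul Syllables
]

def block_of(ch):
    """Return the block label for a single character, or 'Other'."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

def block_counts(values):
    """Count characters per block across all strings."""
    return Counter(block_of(ch) for text in values for ch in text)

print(block_counts(["개나리SKVIEW", "동부(돌타운)아파트"]))
```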

Most frequent character per block

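Hangul
Value | Count | Frequency (%)
(Hangul character) | 2265 | 3.4%
(Hangul character) | 2168 | 3.3%
(Hangul character) | 1949 | 3.0%
(Hangul character) | 1931 | 2.9%
(Hangul character) | 1852 | 2.8%
(Hangul character) | 1701 | 2.6%
(Hangul character) | 1592 | 2.4%
(Hangul character) | 1522 | 2.3%
(Hangul character) | 1516 | 2.3%
(Hangul character) | 1355 | 2.1%
Other values (376) | 47844 | 72.8%

ASCII
Value | Count | Frequency (%)
1 | 1231 | 20.4%
2 | 1184 | 19.7%
(space) | 553 | 9.2%
3 | 525 | 8.7%
4 | 271 | 4.5%
5 | 205 | 3.4%
e | 193 | 3.2%
6 | 181 | 3.0%
- | 148 | 2.5%
) | 140 | 2.3%
Other values (35) | 1394 | 23.1%

Number Forms
Value | Count | Frequency (%)
(Roman numeral) | 4 | 100.0%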

아파트코드 (apartment code)
Text

Distinct: 2119
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 107
Unique (%): 1.1%

Sample

1st row: A14024001
2nd row: A13184610
3rd row: A15210103
4th row: A14381510
5th row: A12188202

Most common words

Value | Count | Frequency (%)
a15089513 | 17 | 0.2%
a12187906 | 13 | 0.1%
a13203302 | 13 | 0.1%
a12114001 | 12 | 0.1%
a15721001 | 12 | 0.1%
a12009102 | 12 | 0.1%
a15780703 | 12 | 0.1%
a15102902 | 12 | 0.1%
a13290809 | 11 | 0.1%
a13922903 | 11 | 0.1%
Other values (2109) | 9875 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18299 | 20.3%
1 | 17713 | 19.7%
A | 9992 | 11.1%
3 | 8897 | 9.9%
2 | 7940 | 8.8%
5 | 6211 | 6.9%
8 | 5862 | 6.5%
7 | 4776 | 5.3%
4 | 3897 | 4.3%
6 | 3352 | 3.7%
Other values (2) | 3061 | 3.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18299 | 22.9%
1 | 17713 | 22.1%
3 | 8897 | 11.1%
2 | 7940 | 9.9%
5 | 6211 | 7.8%
8 | 5862 | 7.3%
7 | 4776 | 6.0%
4 | 3897 | 4.9%
6 | 3352 | 4.2%
9 | 3053 | 3.8%

Uppercase Letter
Value | Count | Frequency (%)
A | 9992 | 99.9%
B | 8 | 0.1%
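Every apartment code is exactly nine characters long. Letters account for exactly 10000 of the 90000 characters (A 9992, B 8), and the rest are digits. That suggests a simple format check: one letter A or B followed by eight ASCII digits. The rule is inferred from this profile, not from a published specification, and `is_valid_code` is a hypothetical helper.

```python
import re

# Inferred (hypothetical) rule: A or B, then exactly eight ASCII digits.
# [0-9] is used instead of \d, which would also accept non-ASCII digits.
CODE_PATTERN = re.compile(r"[AB][0-9]{8}")

def is_valid_code(code):
    """True if `code` matches the inferred apartment-code format in full."""
    return CODE_PATTERN.fullmatch(code) is not None

for code in ["A14024001", "A13184610", "a14024001", "A1402400"]:
    print(code, is_valid_code(code))
```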

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18299 | 22.9%
1 | 17713 | 22.1%
3 | 8897 | 11.1%
2 | 7940 | 9.9%
5 | 6211 | 7.8%
8 | 5862 | 7.3%
7 | 4776 | 6.0%
4 | 3897 | 4.9%
6 | 3352 | 4.2%
9 | 3053 | 3.8%

Latin
Value | Count | Frequency (%)
A | 9992 | 99.9%
B | 8 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18299 | 20.3%
1 | 17713 | 19.7%
A | 9992 | 11.1%
3 | 8897 | 9.9%
2 | 7940 | 8.8%
5 | 6211 | 6.9%
8 | 5862 | 6.5%
7 | 4776 | 5.3%
4 | 3897 | 4.3%
6 | 3352 | 3.7%
Other values (2) | 3061 | 3.4%

비용명 (cost item)
Categorical

Distinct: 44
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count
교육비 | 464
도서인쇄비 | 456
급여 | 455
사무용품비 | 447
통신비 | 438
Other values (39) | 7740

Length

Max length: 7
Median length: 5
Mean length: 4.3145
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

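1st row: 수도광열비 (utilities: water, lighting and heating)
2nd row: 국민연금 (national pension)
3rd row: 교육비 (education and training)
4th row: 수도광열비 (utilities: water, lighting and heating)
5th row: 기타사용료 (other usage fees)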

Common Values

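Value | Count | Frequency (%)
교육비 (education and training) | 464 | 4.6%
도서인쇄비 (books and printing) | 456 | 4.6%
급여 (salaries) | 455 | 4.5%
사무용품비 (office supplies) | 447 | 4.5%
통신비 (communication) | 438 | 4.4%
세대전기료 (household electricity) | 435 | 4.3%
세대수도료 (household water) | 429 | 4.3%
퇴직급여 (retirement benefits) | 425 | 4.2%
산재보험료 (industrial accident insurance) | 414 | 4.1%
제수당 (allowances) | 413 | 4.1%
Other values (34) | 5624 | 56.2%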

Length

[Figure: Histogram of lengths of the category]

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
Value | Count
201904 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201904
2nd row: 201904
3rd row: 201904
4th row: 201904
5th row: 201904

Common Values

Value | Count | Frequency (%)
201904 | 10000 | 100.0%
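The column name 년월일 means year-month-day, but the single value it holds, 201904, is a six-character year-month string (April 2019). Below is a minimal sketch of parsing such values with the standard library. Mapping each month to its first day is an illustrative assumption, and `parse_year_month` is a hypothetical helper.

```python
from datetime import datetime

def parse_year_month(value):
    """Parse a 'YYYYMM' string into a date on the first day of that month.

    Using day 1 is an illustrative assumption; the source only gives year and month.
    """
    return datetime.strptime(value, "%Y%m").date()

print(parse_year_month("201904"))
```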

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7390
Distinct (%): 73.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3349629.6
Minimum: -2627960
Maximum: 3.3375535 × 10⁸
Zeros: 1028
Zeros (%): 10.3%
Negative: 9
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -2627960
5th percentile: 0
Q1: 70157.5
Median: 250435
Q3: 1113225
95th percentile: 17781610
Maximum: 3.3375535 × 10⁸
Range: 3.3638331 × 10⁸
Interquartile range (IQR): 1043067.5
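The interquartile range and the range follow directly from the values above:

- IQR = Q3 - Q1 = 1113225 - 70157.5 = 1043067.5
- range = maximum - minimum = 333755350 - (-2627960) = 336383310 ≈ 3.3638331 × 10⁸

The sketch below computes the full quantile summary with `statistics.quantiles`. It interpolates linearly between order statistics (`method="inclusive"`, the same rule as NumPy's default percentile method). Whether ydata-profiling interpolates the same way is an assumption, and `quantile_stats` is a hypothetical helper.

```python
import statistics

def quantile_stats(values):
    """Recompute the 'Quantile statistics' block for a list of numbers.

    method="inclusive" interpolates linearly between order statistics, the same
    rule as NumPy's default percentile method (an assumed match to the report).
    """
    percentiles = statistics.quantiles(values, n=100, method="inclusive")  # 1st..99th
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "minimum": lo,
        "5th percentile": percentiles[4],
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "95th percentile": percentiles[94],
        "maximum": hi,
        "range": hi - lo,
        "IQR": q3 - q1,
    }

print(quantile_stats([0, 0, 10000, 30000, 200000, 250435, 1113225]))
```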

Descriptive statistics

Standard deviation: 10902978
Coefficient of variation (CV): 3.2549803
Kurtosis: 157.24517
Mean: 3349629.6
Median Absolute Deviation (MAD): 240590
Skewness: 9.3131543
Sum: 3.3496296 × 10¹⁰
Variance: 1.1887494 × 10¹⁴
Monotonicity: Not monotonic
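Several of these statistics are simple functions of each other:

- CV = standard deviation / mean = 10902978 / 3349629.6 ≈ 3.255
- variance = standard deviation² ≈ 1.1887 × 10¹⁴

The sketch below recomputes the block with the standard library. It assumes pandas-style conventions, which may not match exactly how the report's figures were produced:

- sample (n - 1) standard deviation
- MAD as the median of absolute deviations from the median
- adjusted Fisher-Pearson skewness
- bias-corrected excess kurtosis

`descriptive_stats` is a hypothetical helper.

```python
import math
import statistics

def descriptive_stats(values):
    """Recompute the 'Descriptive statistics' block for a list of numbers.

    Assumed conventions (pandas-style): sample (n - 1) standard deviation,
    MAD = median of |x - median|, adjusted Fisher-Pearson skewness and
    bias-corrected excess kurtosis. Requires at least 4 non-constant values
    and a non-zero mean.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    # Population central moments, used by the skewness and kurtosis formulas.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return {
        "mean": mean,
        "standard deviation": std,
        "variance": std ** 2,
        "CV": std / mean,
        "MAD": mad,
        "skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "sum": math.fsum(values),
    }

print(descriptive_stats([1, 2, 3, 4, 5]))
```

For the values 1 to 5 the sketch gives skewness 0 and kurtosis -1.2. The large positive skewness and kurtosis reported for 금액 reflect a long right tail: a few amounts in the hundreds of millions sit next to a median of 250435.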
[Figure: Histogram with fixed size bins (bins=50)]
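A fixed-size-bin histogram splits [minimum, maximum] into equal-width intervals. With 50 bins over a range of 336383310, each bin here is about 6.73 million wide. The first bin therefore reaches past Q3 (1113225) and holds at least three quarters of all amounts. The sketch below follows the numpy.histogram convention: half-open bins, with the last bin closed. Whether ydata-profiling bins exactly this way is an assumption, and `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max].

    Intervals are half-open [edge_i, edge_i+1) except the last, which also
    includes the maximum (up to floating-point rounding at the edges).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # If all values are equal the width is 0; put everything in bin 0.
        idx = 0 if width == 0 else min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

edges, counts = fixed_width_histogram([-2627960, 0, 1113225, 17781610, 333755350])
print(edges[1] - edges[0], counts[:4], counts[-1])
```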

Common values

Value | Count | Frequency (%)
0 | 1028 | 10.3%
200000 | 121 | 1.2%
300000 | 57 | 0.6%
110000 | 45 | 0.4%
100000 | 43 | 0.4%
150000 | 38 | 0.4%
30000 | 34 | 0.3%
10000 | 27 | 0.3%
165000 | 25 | 0.2%
400000 | 24 | 0.2%
Other values (7380) | 8558 | 85.6%

Minimum 10 values

Value | Count | Frequency (%)
-2627960 | 1 | < 0.1%
-1292560 | 1 | < 0.1%
-757062 | 1 | < 0.1%
-700000 | 1 | < 0.1%
-535200 | 1 | < 0.1%
-350000 | 1 | < 0.1%
-201100 | 1 | < 0.1%
-186247 | 1 | < 0.1%
-172000 | 1 | < 0.1%
0 | 1028 | 10.3%

Maximum 10 values

Value | Count | Frequency (%)
333755350 | 1 | < 0.1%
205216900 | 1 | < 0.1%
199690970 | 1 | < 0.1%
189988400 | 1 | < 0.1%
189299140 | 1 | < 0.1%
171402010 | 1 | < 0.1%
158456903 | 1 | < 0.1%
143793110 | 1 | < 0.1%
136235501 | 1 | < 0.1%
133920000 | 1 | < 0.1%
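The two extreme-value tables list the smallest and largest distinct amounts together with how often each occurs. That is why 0 (1028 occurrences) closes the minimum list after the nine negative amounts. A minimal sketch with `collections.Counter`; `extreme_values` is a hypothetical helper.

```python
from collections import Counter

def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values with their counts."""
    counts = Counter(values)
    ordered = sorted(counts)  # distinct values, ascending
    smallest = [(v, counts[v]) for v in ordered[:k]]
    largest = [(v, counts[v]) for v in reversed(ordered[-k:])]
    return smallest, largest

low, high = extreme_values([-172000, 0, 0, 0, 200000, 333755350], k=3)
print(low)
print(high)
```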

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.407
금액 | 0.407 | 1.000

Variable | 금액 | 비용명
금액 | 1.000 | 0.165
비용명 | 0.165 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

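First rows

Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (year-month-day) | 금액 (amount)
29510 | 서빙고금호베스트빌 | A14024001 | 수도광열비 | 201904 | 168240
10284 | 신내5단지대림두산 | A13184610 | 국민연금 | 201904 | 1215880
35817 | 동부(돌타운)아파트 | A15210103 | 교육비 | 201904 | 0
31478 | 광장현대8단지 | A14381510 | 수도광열비 | 201904 | 190310
6639 | 창전현대홈타운 | A12188202 | 기타사용료 | 201904 | 941000
4662 | 연희한양아파트 | A12081703 | 제수당 | 201904 | 1829110
29837 | 현대한강 | A14085501 | 건강보험료 | 201904 | 585820
30248 | 우이대우 | A14209001 | 교육비 | 201904 | 0
25186 | 풍납 현대리버빌1차 | A13887405 | 복리후생비 | 201904 | 700000
44587 | 은평뉴타운폭포동4단지제2 | A41279924 | 건강보험료 | 201904 | 559340

Last rows

Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (cost item) | 년월일 (year-month-day) | 금액 (amount)
18010 | 개나리SKVIEW | A13579506 | 제수당 | 201904 | 40020
13876 | 마장중앙하이츠 | A13381601 | 교통비 | 201904 | 1750
14309 | 무학현대 | A13385802 | 교육비 | 201904 | 0
31298 | 광장극동1차 | A14380409 | 공동수도료 | 201904 | 1309110
37371 | 관악벽산타운5단지 | A15303205 | 퇴직급여 | 201904 | 3080000
19342 | 종암우림카이저팰리스 | A13609001 | 세대전기료 | 201904 | 7088620
20717 | 돈암동부센트레빌 | A13681303 | 국민연금 | 201904 | 466760
35685 | 개봉삼환 | A15209205 | 회계감사비 | 201904 | 132000
13928 | 사근중앙하이츠 | A13381701 | 고용보험료 | 201904 | 64830
32186 | 당산2차효성타운 | A15004503 | 세대수도료 | 201904 | 9309810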