Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
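The dataset-level counts above can be reproduced with a short script. The sketch below is an illustration, not ydata-profiling's implementation: it assumes rows are Python tuples, treats `None` as a missing cell, and uses a hypothetical helper name, `dataset_statistics`. It also confirms that the reported average record size is the total size divided by the number of observations.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical row type: one tuple per observation, None marks a missing cell.
Row = Tuple[Optional[object], ...]


def dataset_statistics(rows: Sequence[Row]) -> dict:
    """Count variables, observations, missing cells and duplicate rows."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Average record size = total size / observations:
# 488.3 KiB * 1024 B/KiB / 10000 rows = 50.0 B after rounding.
avg_record_bytes = 488.3 * 1024 / 10_000

# Illustrative rows taken from the sample table; the third deliberately
# repeats the first to exercise duplicate detection.
rows = [
    ("장안현대힐스테이트", "A13010004", "산재보험료", "202108", 232060),
    ("공릉2단지라이프", "A13980510", "국민연금", "202108", 340370),
    ("장안현대힐스테이트", "A13010004", "산재보험료", "202108", 232060),
]
stats = dataset_statistics(rows)
```

With these three illustrative rows, `stats` reports 5 variables, 0 missing cells and 1 duplicate row.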

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202108" [Constant]
금액 has 1543 (15.4%) zeros [Zeros]
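Both alerts follow from simple checks on a column's values. The sketch below is illustrative only: the function name `alerts_for` and the 10% zero-share threshold `p_zeros` are assumptions, not ydata-profiling's documented defaults.

```python
from typing import List, Sequence


def alerts_for(name: str, values: Sequence[object], p_zeros: float = 0.10) -> List[str]:
    """Return Constant/Zeros alert strings for one column (hypothetical rules)."""
    alerts = []
    if values and len(set(values)) == 1:
        alerts.append(f'{name} has constant value "{values[0]}" [Constant]')

    # Only genuine numbers take part in the zeros check (bool is excluded).
    numeric = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        zeros = sum(1 for v in numeric if v == 0)
        share = zeros / len(values)
        # p_zeros is an assumed threshold for illustration.
        if share > p_zeros:
            alerts.append(f"{name} has {zeros} ({share:.1%}) zeros [Zeros]")
    return alerts
```

For example, a column of 1543 zeros among 10000 values yields `'금액 has 1543 (15.4%) zeros [Zeros]'`, matching the wording above.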

Reproduction

Analysis started: 2024-05-11 06:54:04.952227
Analysis finished: 2024-05-11 06:54:06.713105
Duration: 1.76 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2121
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.2512
Min length: 2
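The length summary can be sketched as below (a hypothetical helper built on Python's `statistics` module, not the report's own code). Mean length times the number of observations equals Total characters (7.2512 × 10000 = 72512). The reported Median length of 20 cannot be the ordinary median of per-row lengths: if the upper half of rows had length 20 or more and the rest at least 2, the mean would be at least 11, not 7.2512. The report presumably derives that figure differently; this sketch uses the ordinary median.

```python
import statistics
from typing import Dict, Iterable


def length_summary(values: Iterable[str]) -> Dict[str, float]:
    """Max/median/mean/min of string lengths, counted in Unicode code points."""
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.mean(lengths),
        "min_length": min(lengths),
        "total_characters": sum(lengths),
    }


summary = length_summary(["장안현대힐스테이트", "응암금호", "벽산라이브파크"])
```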

Characters and Unicode

Total characters: 72512
Distinct characters: 427
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
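The category, script and block counts can be approximated with Python's standard `unicodedata` module. `unicodedata.category` returns the general category directly, but the standard library exposes neither the script nor the block property, so `script_of` and `block_of` below are rough, hypothetical stand-ins (name-prefix and code-point-range heuristics) that only cover the scripts and blocks seen in this report.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Tuple

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Nd": "Decimal Number", "Zs": "Space Separator",
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Pd": "Dash Punctuation", "Nl": "Letter Number",
}


def script_of(ch: str) -> str:
    """Heuristic script guess from the character name (not the Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if "LATIN" in name or name.startswith("ROMAN NUMERAL"):
        return "Latin"
    return "Common"


def block_of(ch: str) -> str:
    """Heuristic block lookup by code-point range, limited to this report's blocks."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0x2150 <= cp <= 0x218F:
        return "Number Forms"
    if 0xAC00 <= cp <= 0xD7AF or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
        return "Hangul"  # Hangul Syllables, Jamo and Compatibility Jamo
    return "Other"


def profile_characters(values: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Tally every character by general category, script and block."""
    categories, scripts, blocks = Counter(), Counter(), Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            categories[CATEGORY_NAMES.get(code, code)] += 1
            scripts[script_of(ch)] += 1
            blocks[block_of(ch)] += 1
    return categories, scripts, blocks


cats, scripts, blocks = profile_characters(["구로우성(1동)", "e편한세상"])
```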

Unique

Unique: 109
Unique (%): 1.1%
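Unique here counts values that occur exactly once, which differs from Distinct (the number of different values); 109 of 10000 rows gives the reported 1.1%. A minimal sketch with `collections.Counter` (the helper name `unique_summary` is hypothetical):

```python
from collections import Counter
from typing import Sequence, Tuple


def unique_summary(values: Sequence[object]) -> Tuple[int, float]:
    """Number and percentage of values that occur exactly once."""
    counts = Counter(values)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_unique, 100 * n_unique / len(values)
```

For `["급여", "급여", "교육비", "보험료"]` this returns `(2, 50.0)`, while the distinct count would be 3.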

Sample

1st row: 장안현대힐스테이트
2nd row: 공릉2단지라이프
3rd row: 남가좌현대아파트
4th row: 벽산라이브파크
5th row: 응암금호

Value | Count | Frequency (%)
아파트 (apartment) | 174 | 1.6%
래미안 | 33 | 0.3%
아이파크 | 28 | 0.3%
e편한세상 | 21 | 0.2%
송파 | 18 | 0.2%
힐스테이트 | 17 | 0.2%
북한산 | 16 | 0.1%
이편한세상 | 15 | 0.1%
신도림현대 | 15 | 0.1%
독립문극동 | 14 | 0.1%
Other values (2193) | 10407 | 96.7%
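This table appears to count lowercased, whitespace-separated words rather than whole names: its counts sum to 10778 (371 + 10407), more than the 10000 rows, and in the 아파트코드 table below the codes appear lowercased. The sketch assumes simple lowercasing and whitespace splitting; ydata-profiling's actual tokenizer may differ, and `word_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def word_counts(values: Iterable[str], top: int = 10):
    """Top-`top` lowercased words plus an 'Other values' summary row."""
    counts = Counter(word for v in values for word in v.lower().split())
    total = sum(counts.values())
    ranked = counts.most_common()  # ties keep first-seen order
    rows: List[Tuple[str, int, float]] = [
        (word, n, 100 * n / total) for word, n in ranked[:top]
    ]
    rest = ranked[top:]
    other_count = sum(n for _, n in rest)
    other = (len(rest), other_count, 100 * other_count / total if total else 0.0)
    return rows, other


rows, other = word_counts(["래미안 아파트", "e편한세상 아파트", "개포2차 현대아파트"], top=2)
```

Here `아파트` is counted twice even though it never appears as a whole value, which is the same effect that lets the table above exceed the row count.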

Most occurring characters

Value | Count | Frequency (%)
 | 2567 | 3.5%
 | 2463 | 3.4%
 | 2293 | 3.2%
 | 1866 | 2.6%
 | 1620 | 2.2%
 | 1618 | 2.2%
 | 1425 | 2.0%
 | 1420 | 2.0%
 | 1403 | 1.9%
 | 1307 | 1.8%
Other values (417) | 54530 | 75.2%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 66559 | 91.8%
Decimal Number | 3341 | 4.6%
Space Separator | 858 | 1.2%
Uppercase Letter | 784 | 1.1%
Lowercase Letter | 370 | 0.5%
Open Punctuation | 171 | 0.2%
Close Punctuation | 171 | 0.2%
Other Punctuation | 136 | 0.2%
Dash Punctuation | 119 | 0.2%
Letter Number | 3 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 2567 | 3.9%
 | 2463 | 3.7%
 | 2293 | 3.4%
 | 1866 | 2.8%
 | 1620 | 2.4%
 | 1618 | 2.4%
 | 1425 | 2.1%
 | 1420 | 2.1%
 | 1403 | 2.1%
 | 1307 | 2.0%
Other values (372) | 48577 | 73.0%

Uppercase Letter
Value | Count | Frequency (%)
S | 116 | 14.8%
C | 115 | 14.7%
K | 90 | 11.5%
D | 77 | 9.8%
M | 77 | 9.8%
L | 61 | 7.8%
H | 55 | 7.0%
I | 37 | 4.7%
G | 30 | 3.8%
E | 27 | 3.4%
Other values (7) | 99 | 12.6%

Lowercase Letter
Value | Count | Frequency (%)
e | 200 | 54.1%
l | 38 | 10.3%
i | 34 | 9.2%
v | 25 | 6.8%
k | 21 | 5.7%
s | 19 | 5.1%
w | 13 | 3.5%
c | 12 | 3.2%
h | 4 | 1.1%
g | 2 | 0.5%

Decimal Number
Value | Count | Frequency (%)
1 | 1019 | 30.5%
2 | 1004 | 30.1%
3 | 444 | 13.3%
4 | 215 | 6.4%
5 | 188 | 5.6%
6 | 142 | 4.3%
7 | 99 | 3.0%
8 | 90 | 2.7%
9 | 73 | 2.2%
0 | 67 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
, | 108 | 79.4%
. | 28 | 20.6%

Space Separator
Value | Count | Frequency (%)
(space) | 858 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 171 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 171 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 119 | 100.0%

Letter Number
Value | Count | Frequency (%)
 | 3 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 66559 | 91.8%
Common | 4796 | 6.6%
Latin | 1157 | 1.6%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 2567 | 3.9%
 | 2463 | 3.7%
 | 2293 | 3.4%
 | 1866 | 2.8%
 | 1620 | 2.4%
 | 1618 | 2.4%
 | 1425 | 2.1%
 | 1420 | 2.1%
 | 1403 | 2.1%
 | 1307 | 2.0%
Other values (372) | 48577 | 73.0%

Latin
Value | Count | Frequency (%)
e | 200 | 17.3%
S | 116 | 10.0%
C | 115 | 9.9%
K | 90 | 7.8%
D | 77 | 6.7%
M | 77 | 6.7%
L | 61 | 5.3%
H | 55 | 4.8%
l | 38 | 3.3%
I | 37 | 3.2%
Other values (19) | 291 | 25.2%

Common
Value | Count | Frequency (%)
1 | 1019 | 21.2%
2 | 1004 | 20.9%
(space) | 858 | 17.9%
3 | 444 | 9.3%
4 | 215 | 4.5%
5 | 188 | 3.9%
( | 171 | 3.6%
) | 171 | 3.6%
6 | 142 | 3.0%
- | 119 | 2.5%
Other values (6) | 465 | 9.7%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 66559 | 91.8%
ASCII | 5950 | 8.2%
Number Forms | 3 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 2567 | 3.9%
 | 2463 | 3.7%
 | 2293 | 3.4%
 | 1866 | 2.8%
 | 1620 | 2.4%
 | 1618 | 2.4%
 | 1425 | 2.1%
 | 1420 | 2.1%
 | 1403 | 2.1%
 | 1307 | 2.0%
Other values (372) | 48577 | 73.0%

ASCII
Value | Count | Frequency (%)
1 | 1019 | 17.1%
2 | 1004 | 16.9%
(space) | 858 | 14.4%
3 | 444 | 7.5%
4 | 215 | 3.6%
e | 200 | 3.4%
5 | 188 | 3.2%
( | 171 | 2.9%
) | 171 | 2.9%
6 | 142 | 2.4%
Other values (34) | 1538 | 25.8%

Number Forms
Value | Count | Frequency (%)
 | 3 | 100.0%

아파트코드 (apartment code)
Text

Distinct: 2126
Distinct (%): 21.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
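Every code has length 9, and the 90000 characters split into exactly 10000 `A`s (one per row) and 80000 digits, which is consistent with codes of the form `A` followed by eight digits, as in the sample rows. A validation sketch using `re`; the pattern is inferred from this profile (an assumption), not taken from a published specification, and `check_codes` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Pattern inferred from this profile (assumption): "A" + 8 decimal digits.
CODE_PATTERN = re.compile(r"A[0-9]{8}")


def check_codes(codes: Iterable[str]) -> Tuple[List[str], Counter]:
    """Return codes that break the pattern and a per-character tally."""
    codes = list(codes)
    bad = [c for c in codes if not CODE_PATTERN.fullmatch(c)]
    chars = Counter("".join(codes))
    return bad, chars


# The third code is a made-up malformed example (only 7 digits).
bad, chars = check_codes(["A13010004", "A12012203", "A1301000"])
```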

Unique

Unique: 109
Unique (%): 1.1%

Sample

1st row: A13010004
2nd row: A13980510
3rd row: A12012203
4th row: A14272305
5th row: A12201102

Value | Count | Frequency (%)
a12008003 | 14 | 0.1%
a13981405 | 13 | 0.1%
a13813002 | 13 | 0.1%
a13501006 | 12 | 0.1%
a10027632 | 12 | 0.1%
a13880806 | 12 | 0.1%
a13287801 | 12 | 0.1%
a13985107 | 11 | 0.1%
a15083601 | 11 | 0.1%
a10025770 | 11 | 0.1%
Other values (2116) | 9879 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18938 | 21.0%
1 | 17503 | 19.4%
A | 10000 | 11.1%
3 | 8948 | 9.9%
2 | 8351 | 9.3%
5 | 6210 | 6.9%
8 | 5309 | 5.9%
7 | 4646 | 5.2%
4 | 3891 | 4.3%
6 | 3401 | 3.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18938 | 23.7%
1 | 17503 | 21.9%
3 | 8948 | 11.2%
2 | 8351 | 10.4%
5 | 6210 | 7.8%
8 | 5309 | 6.6%
7 | 4646 | 5.8%
4 | 3891 | 4.9%
6 | 3401 | 4.3%
9 | 2803 | 3.5%

Uppercase Letter
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18938 | 23.7%
1 | 17503 | 21.9%
3 | 8948 | 11.2%
2 | 8351 | 10.4%
5 | 6210 | 7.8%
8 | 5309 | 6.6%
7 | 4646 | 5.8%
4 | 3891 | 4.9%
6 | 3401 | 4.3%
9 | 2803 | 3.5%

Latin
Value | Count | Frequency (%)
A | 10000 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18938 | 21.0%
1 | 17503 | 19.4%
A | 10000 | 11.1%
3 | 8948 | 9.9%
2 | 8351 | 9.3%
5 | 6210 | 6.9%
8 | 5309 | 5.9%
7 | 4646 | 5.2%
4 | 3891 | 4.3%
6 | 3401 | 3.8%

비용명 (cost item)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8695
Min length: 2

Characters and Unicode

Total characters: 48695
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

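1st row: 산재보험료 (industrial accident insurance premium)
2nd row: 국민연금 (national pension)
3rd row: 기타운영수익 (other operating income)
4th row: 지급수수료 (fees paid)
5th row: 교육비 (training expenses)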
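
Value | Count | Frequency (%)
급여 (salaries) | 225 | 2.2%
이자수익 (interest income) | 223 | 2.2%
승강기유지비 (elevator maintenance) | 220 | 2.2%
제수당 (allowances) | 220 | 2.2%
경비비 (security costs) | 219 | 2.2%
수선유지비 (repair and maintenance) | 215 | 2.1%
산재보험료 (industrial accident insurance premiums) | 213 | 2.1%
세대전기료 (household electricity charges) | 213 | 2.1%
보험료 (insurance premiums) | 213 | 2.1%
퇴직급여 (retirement benefits) | 213 | 2.1%
Other values (76) | 7826 | 78.3%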

Most occurring characters

Value | Count | Frequency (%)
 | 5337 | 11.0%
 | 3559 | 7.3%
 | 2072 | 4.3%
 | 2040 | 4.2%
 | 1670 | 3.4%
 | 1326 | 2.7%
 | 1045 | 2.1%
 | 816 | 1.7%
 | 812 | 1.7%
 | 773 | 1.6%
Other values (110) | 29245 | 60.1%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48695 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
 | 5337 | 11.0%
 | 3559 | 7.3%
 | 2072 | 4.3%
 | 2040 | 4.2%
 | 1670 | 3.4%
 | 1326 | 2.7%
 | 1045 | 2.1%
 | 816 | 1.7%
 | 812 | 1.7%
 | 773 | 1.6%
Other values (110) | 29245 | 60.1%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48695 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
 | 5337 | 11.0%
 | 3559 | 7.3%
 | 2072 | 4.3%
 | 2040 | 4.2%
 | 1670 | 3.4%
 | 1326 | 2.7%
 | 1045 | 2.1%
 | 816 | 1.7%
 | 812 | 1.7%
 | 773 | 1.6%
Other values (110) | 29245 | 60.1%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48695 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
 | 5337 | 11.0%
 | 3559 | 7.3%
 | 2072 | 4.3%
 | 2040 | 4.2%
 | 1670 | 3.4%
 | 1326 | 2.7%
 | 1045 | 2.1%
 | 816 | 1.7%
 | 812 | 1.7%
 | 773 | 1.6%
Other values (110) | 29245 | 60.1%

년월일 (year-month-day)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202108
2nd row: 202108
3rd row: 202108
4th row: 202108
5th row: 202108

Common Values

Value | Count | Frequency (%)
202108 | 10000 | 100.0%
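Although the column name 년월일 means "year-month-day", its single value 202108 has six digits and reads most naturally as YYYYMM, that is, August 2021. That reading is an interpretation, not something the source states. A small sketch that parses the value under that assumption with `datetime.strptime` (the helper name is hypothetical):

```python
from datetime import datetime
from typing import Tuple, Union


def parse_yyyymm(value: Union[str, int]) -> Tuple[int, int]:
    """Parse a value in the assumed YYYYMM format into (year, month)."""
    parsed = datetime.strptime(str(value), "%Y%m")
    return parsed.year, parsed.month


year, month = parse_yyyymm(202108)
```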

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202108 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 6648
Distinct (%): 66.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3347634.5
Minimum: -10690910
Maximum: 4.0309941 × 10^8
Zeros: 1543
Zeros (%): 15.4%
Negative: 17
Negative (%): 0.2%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -10690910
5th percentile: 0
Q1: 55015
Median: 290000
Q3: 1318037.5
95th percentile: 15843758
Maximum: 4.0309941 × 10^8
Range: 4.1379032 × 10^8
Interquartile range (IQR): 1263022.5
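The quantile rows can be reproduced with `statistics.quantiles`, whose `method="inclusive"` option interpolates linearly between order statistics (the same convention as the pandas/NumPy default); that the report used exactly this convention is an assumption. Range and IQR are plain differences: 403099407 - (-10690910) = 413790317 ≈ 4.1379032 × 10^8, and 1318037.5 - 55015 = 1263022.5. The helper name `quantile_summary` is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def quantile_summary(xs: Sequence[float]) -> Dict[str, float]:
    """Quantile statistics using linear interpolation (method='inclusive')."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    cuts = statistics.quantiles(xs, n=20, method="inclusive")  # 5% steps
    lo, hi = min(xs), max(xs)
    return {
        "min": lo, "p5": cuts[0], "q1": q1, "median": median,
        "q3": q3, "p95": cuts[-1], "max": hi,
        "range": hi - lo, "iqr": q3 - q1,
    }


# Cross-check of the reported Range and IQR using the report's own numbers.
REPORT_RANGE = 403_099_407 - (-10_690_910)   # 413,790,317
REPORT_IQR = 1_318_037.5 - 55_015            # 1,263,022.5
```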

Descriptive statistics

Standard deviation: 12969932
Coefficient of variation (CV): 3.8743574
Kurtosis: 238.64925
Mean: 3347634.5
Median absolute deviation (MAD): 290000
Skewness: 12.02382
Sum: 3.3476345 × 10^10
Variance: 1.6821914 × 10^14
Monotonicity: Not monotonic
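Several of these rows are derived from others. CV is the standard deviation divided by the mean (12969932 / 3347634.5 ≈ 3.8744), the variance is the squared standard deviation (12969932² ≈ 1.6822 × 10^14), the sum is the mean times the 10000 observations (3.3476345 × 10^10), and MAD is the median of absolute deviations from the median. The sketch assumes the sample standard deviation (n - 1 denominator), which is pandas' default. Skewness and kurtosis are omitted because their bias corrections would be a further assumption. `descriptive` is a hypothetical helper.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive(xs: Sequence[float]) -> Dict[str, float]:
    """Mean, sample variance/std, CV, MAD and sum for a numeric column."""
    mean = statistics.mean(xs)
    variance = statistics.variance(xs)          # sample variance (n - 1)
    std = math.sqrt(variance)
    median = statistics.median(xs)
    mad = statistics.median([abs(x - median) for x in xs])
    return {
        "mean": mean, "std": std, "variance": variance,
        "cv": std / mean if mean else math.nan,
        "mad": mad, "sum": sum(xs),
    }


# Cross-check with the reported figures: CV = std / mean.
REPORT_CV = 12_969_932 / 3_347_634.5   # ≈ 3.8744
```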
[Figure: Histogram with fixed size bins (bins=50)]
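"Fixed size bins" means 50 equal-width intervals between the minimum and maximum. With this column's range, each bin is 413790317 / 50 ≈ 8275806 wide, so every value up to Q3 (1318037.5), which covers at least three quarters of the rows, falls into the first two bins; the histogram is therefore dominated by its leftmost bars. A pure-Python sketch of equal-width binning follows (hypothetical helper; the last bin is treated as closed so the maximum is counted, a common convention).

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(xs: Sequence[float], bins: int = 50) -> Tuple[List[float], List[int]]:
    """Equal-width histogram; the last bin is closed so the maximum is counted."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        # All values share bin 0 when the column is constant (width == 0).
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


# Bin width implied by the reported minimum and maximum of 금액.
BIN_WIDTH = (403_099_407 - (-10_690_910)) / 50   # ≈ 8,275,806.34
```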

Common values
Value | Count | Frequency (%)
0 | 1543 | 15.4%
62500 | 101 | 1.0%
200000 | 98 | 1.0%
100000 | 58 | 0.6%
300000 | 58 | 0.6%
400000 | 43 | 0.4%
250000 | 39 | 0.4%
150000 | 36 | 0.4%
600000 | 35 | 0.4%
500000 | 33 | 0.3%
Other values (6638) | 7956 | 79.6%

Minimum 10 values
Value | Count | Frequency (%)
-10690910 | 1 | < 0.1%
-4000000 | 1 | < 0.1%
-3559280 | 1 | < 0.1%
-973103 | 1 | < 0.1%
-899496 | 1 | < 0.1%
-891990 | 1 | < 0.1%
-500000 | 1 | < 0.1%
-331500 | 1 | < 0.1%
-293101 | 1 | < 0.1%
-210000 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
403099407 | 1 | < 0.1%
384887510 | 1 | < 0.1%
262944193 | 1 | < 0.1%
217920012 | 1 | < 0.1%
204625840 | 1 | < 0.1%
194219160 | 1 | < 0.1%
185065449 | 1 | < 0.1%
181708170 | 1 | < 0.1%
179142345 | 1 | < 0.1%
178249310 | 1 | < 0.1%

Interactions


Correlations

Variable | 비용명 | 금액
비용명 | 1.000 | 0.443
금액 | 0.443 | 1.000
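The report does not say which coefficient produced 0.443 between the categorical 비용명 (cost item) and the numeric 금액 (amount). As an illustration only, the sketch below computes the correlation ratio η, one standard measure of association between a categorical and a numeric variable. It is a swapped-in technique and need not reproduce 0.443. η is the square root of the between-group sum of squares divided by the total sum of squares; it ranges from 0 (all group means equal) to 1 (the category explains all variation). `correlation_ratio` is a hypothetical helper.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def correlation_ratio(categories: Sequence[str], values: Sequence[float]) -> float:
    """Correlation ratio eta between a categorical and a numeric sequence."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    grand_mean = statistics.fmean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant numeric column carries no association
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total)


# Toy amounts for illustration, not taken from the dataset.
eta = correlation_ratio(["급여", "급여", "교육비", "교육비"], [300.0, 300.0, 0.0, 0.0])
```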

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display for visually picking out patterns in data completion.]

Sample

First rows
Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
24288 | 장안현대힐스테이트 | A13010004 | 산재보험료 | 202108 | 232060
64776 | 공릉2단지라이프 | A13980510 | 국민연금 | 202108 | 340370
14702 | 남가좌현대아파트 | A12012203 | 기타운영수익 | 202108 | 3371862
72329 | 벽산라이브파크 | A14272305 | 지급수수료 | 202108 | 6000
20589 | 응암금호 | A12201102 | 교육비 | 202108 | 0
89677 | 상도더샵2차 | A15603009 | 국민연금 | 202108 | 80860
34318 | 행당대림 | A13307204 | 교통비 | 202108 | 35300
55186 | 우면코오롱 | A13790002 | 장기수선비 | 202108 | 5784000
82723 | 구로우성(1동) | A15205103 | 소독비 | 202108 | 190000
92845 | 보라매파크빌 | A15685503 | 주차장수익 | 202108 | 2184000

Last rows
Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
95334 | 마곡수명산파크6단지 | A15728002 | 교통비 | 202108 | 0
194 | 디에이치반포라클라스 | A10024254 | 부과차익 | 202108 | 0
20513 | 녹번역센트레빌 | A12201005 | 고용안정사업수익 | 202108 | 120000
76040 | 당산2차삼성 | A15004405 | 고용보험료 | 202108 | 91640
11049 | 강남한신휴플러스 6단지 | A10027912 | 교통비 | 202108 | 20000
11129 | 대림쌍용플래티넘S | A10027935 | 소독비 | 202108 | 108000
9619 | 역삼자이아파트 | A10027474 | 교육비 | 202108 | 0
42153 | 청담2차현대아파트 | A13510202 | 보험료 | 202108 | 420130
33995 | 마장신성미소지움 | A13305003 | 국민연금 | 202108 | 418310
43523 | 개포2차 현대아파트 | A13524006 | 정화조관리비 | 202108 | 529182