Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B
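
As a quick check on the two memory figures above, the average record size should be the total size divided by the number of observations. A minimal Python sketch, with variable names chosen here for illustration and values copied from the statistics above:

```python
KIB = 1024  # bytes per kibibyte

total_kib = 488.3   # "Total size in memory" from the report
n_rows = 10_000     # "Number of observations" from the report

avg_bytes = total_kib * KIB / n_rows
print(f"{avg_bytes:.1f} B")  # 50.0 B
```

The result agrees with the reported 50.0 B once rounded to one decimal place.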

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "202202" (Constant)
금액 has 666 (6.7%) zeros (Zeros)

Reproduction

Analysis started: 2024-05-11 06:51:14.889237
Analysis finished: 2024-05-11 06:51:17.332322
Duration: 2.44 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명
Text

Distinct: 2230
Distinct (%): 22.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 20
Mean length: 7.3967
Min length: 2

Characters and Unicode

Total characters: 73967
Distinct characters: 433
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
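
To make the category names used in the tables below concrete, here is a small sketch that uses Python's standard `unicodedata` module to classify the characters of two sample rows. It is not the profiler's own code, only an illustration of the same Unicode property. The short codes it prints map to the long names in this report: Lo is Other Letter, Nd is Decimal Number, Zs is Space Separator, Lu is Uppercase Letter, Ll is Lowercase Letter, Pd is Dash Punctuation, and Nl is Letter Number.

```python
import unicodedata
from collections import Counter

# Two of the sample rows listed for the apartment-name variable.
names = ["영등포 중흥S-클래스", "방배3차e편한세상"]

# Normalize to NFC so each Hangul syllable is a single code point.
categories = Counter(
    unicodedata.category(ch)
    for name in names
    for ch in unicodedata.normalize("NFC", name)
)
print(categories)

# Illustrative only: U+2161 (ROMAN NUMERAL TWO) is one character classified
# as a Letter Number. The report does not show which characters it counted.
print(unicodedata.category("\u2161"))  # Nl
```

For these two names the Hangul syllables dominate, which is consistent with Other Letter being by far the largest category in the report.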

Unique

Unique: 126
Unique (%): 1.3%

Sample

1st row: 개포더샵트리에
2nd row: 영등포 중흥S-클래스
3rd row: 방배3차e편한세상
4th row: 강남데시앙파크
5th row: 월계삼창

Value | Count | Frequency (%)
아파트 | 195 | 1.8%
래미안 | 36 | 0.3%
e편한세상 | 31 | 0.3%
아이파크 | 25 | 0.2%
북한산 | 21 | 0.2%
sk뷰 | 18 | 0.2%
고덕 | 18 | 0.2%
송파 | 16 | 0.1%
길음뉴타운 | 16 | 0.1%
경남아너스빌 | 15 | 0.1%
Other values (2312) | 10527 | 96.4%

Most occurring characters

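Value | Count | Frequency (%)
(character lost) | 2565 | 3.5%
(character lost) | 2496 | 3.4%
(character lost) | 2353 | 3.2%
(character lost) | 1828 | 2.5%
(character lost) | 1725 | 2.3%
(character lost) | 1710 | 2.3%
(character lost) | 1480 | 2.0%
(character lost) | 1467 | 2.0%
(character lost) | 1437 | 1.9%
(character lost) | 1434 | 1.9%
Other values (423) | 55472 | 75.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 67543 | 91.3%
Decimal Number | 3648 | 4.9%
Space Separator | 1003 | 1.4%
Uppercase Letter | 899 | 1.2%
Lowercase Letter | 324 | 0.4%
Close Punctuation | 147 | 0.2%
Open Punctuation | 147 | 0.2%
Dash Punctuation | 137 | 0.2%
Other Punctuation | 111 | 0.2%
Letter Number | 8 | < 0.1%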

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(character lost) | 2565 | 3.8%
(character lost) | 2496 | 3.7%
(character lost) | 2353 | 3.5%
(character lost) | 1828 | 2.7%
(character lost) | 1725 | 2.6%
(character lost) | 1710 | 2.5%
(character lost) | 1480 | 2.2%
(character lost) | 1467 | 2.2%
(character lost) | 1437 | 2.1%
(character lost) | 1434 | 2.1%
Other values (380) | 49048 | 72.6%

Uppercase Letter
Value | Count | Frequency (%)
S | 148 | 16.5%
C | 106 | 11.8%
K | 104 | 11.6%
D | 88 | 9.8%
M | 88 | 9.8%
L | 57 | 6.3%
I | 54 | 6.0%
E | 52 | 5.8%
H | 50 | 5.6%
V | 42 | 4.7%
Other values (7) | 110 | 12.2%

Decimal Number
Value | Count | Frequency (%)
2 | 1118 | 30.6%
1 | 1066 | 29.2%
3 | 446 | 12.2%
4 | 255 | 7.0%
5 | 236 | 6.5%
7 | 131 | 3.6%
6 | 125 | 3.4%
9 | 108 | 3.0%
8 | 95 | 2.6%
0 | 68 | 1.9%

Lowercase Letter
Value | Count | Frequency (%)
e | 198 | 61.1%
l | 26 | 8.0%
i | 23 | 7.1%
k | 20 | 6.2%
s | 16 | 4.9%
v | 14 | 4.3%
w | 12 | 3.7%
c | 10 | 3.1%
h | 5 | 1.5%

Other Punctuation
Value | Count | Frequency (%)
, | 93 | 83.8%
. | 18 | 16.2%

Space Separator
Value | Count | Frequency (%)
(space) | 1003 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 147 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 147 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 137 | 100.0%

Letter Number
Value | Count | Frequency (%)
(character lost) | 8 | 100.0%

Most occurring scripts

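Value | Count | Frequency (%)
Hangul | 67543 | 91.3%
Common | 5193 | 7.0%
Latin | 1231 | 1.7%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(character lost) | 2565 | 3.8%
(character lost) | 2496 | 3.7%
(character lost) | 2353 | 3.5%
(character lost) | 1828 | 2.7%
(character lost) | 1725 | 2.6%
(character lost) | 1710 | 2.5%
(character lost) | 1480 | 2.2%
(character lost) | 1467 | 2.2%
(character lost) | 1437 | 2.1%
(character lost) | 1434 | 2.1%
Other values (380) | 49048 | 72.6%

Latin
Value | Count | Frequency (%)
e | 198 | 16.1%
S | 148 | 12.0%
C | 106 | 8.6%
K | 104 | 8.4%
D | 88 | 7.1%
M | 88 | 7.1%
L | 57 | 4.6%
I | 54 | 4.4%
E | 52 | 4.2%
H | 50 | 4.1%
Other values (17) | 286 | 23.2%

Common
Value | Count | Frequency (%)
2 | 1118 | 21.5%
1 | 1066 | 20.5%
(space) | 1003 | 19.3%
3 | 446 | 8.6%
4 | 255 | 4.9%
5 | 236 | 4.5%
) | 147 | 2.8%
( | 147 | 2.8%
- | 137 | 2.6%
7 | 131 | 2.5%
Other values (6) | 507 | 9.8%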

Most occurring blocks

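Value | Count | Frequency (%)
Hangul | 67543 | 91.3%
ASCII | 6416 | 8.7%
Number Forms | 8 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(character lost) | 2565 | 3.8%
(character lost) | 2496 | 3.7%
(character lost) | 2353 | 3.5%
(character lost) | 1828 | 2.7%
(character lost) | 1725 | 2.6%
(character lost) | 1710 | 2.5%
(character lost) | 1480 | 2.2%
(character lost) | 1467 | 2.2%
(character lost) | 1437 | 2.1%
(character lost) | 1434 | 2.1%
Other values (380) | 49048 | 72.6%

ASCII
Value | Count | Frequency (%)
2 | 1118 | 17.4%
1 | 1066 | 16.6%
(space) | 1003 | 15.6%
3 | 446 | 7.0%
4 | 255 | 4.0%
5 | 236 | 3.7%
e | 198 | 3.1%
S | 148 | 2.3%
) | 147 | 2.3%
( | 147 | 2.3%
Other values (32) | 1652 | 25.7%

Number Forms
Value | Count | Frequency (%)
(character lost) | 8 | 100.0%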

아파트코드
Text

Distinct: 2236
Distinct (%): 22.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 126
Unique (%): 1.3%

Sample

1st row: A10023996
2nd row: A10024316
3rd row: A13783001
4th row: A13519005
5th row: A13984603

Value | Count | Frequency (%)
a13905202 | 14 | 0.1%
a15679107 | 13 | 0.1%
a13776508 | 12 | 0.1%
a13982005 | 11 | 0.1%
a10027553 | 11 | 0.1%
a13302206 | 11 | 0.1%
a14380414 | 11 | 0.1%
a13921005 | 11 | 0.1%
a13528103 | 11 | 0.1%
a15807606 | 11 | 0.1%
Other values (2226) | 9884 | 98.8%

Most occurring characters

Value | Count | Frequency (%)
0 | 18509 | 20.6%
1 | 17426 | 19.4%
A | 9998 | 11.1%
3 | 8674 | 9.6%
2 | 8343 | 9.3%
5 | 6308 | 7.0%
8 | 5626 | 6.3%
7 | 4746 | 5.3%
4 | 4073 | 4.5%
6 | 3346 | 3.7%
Other values (2) | 2951 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18509 | 23.1%
1 | 17426 | 21.8%
3 | 8674 | 10.8%
2 | 8343 | 10.4%
5 | 6308 | 7.9%
8 | 5626 | 7.0%
7 | 4746 | 5.9%
4 | 4073 | 5.1%
6 | 3346 | 4.2%
9 | 2949 | 3.7%

Uppercase Letter
Value | Count | Frequency (%)
A | 9998 | > 99.9%
B | 2 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18509 | 23.1%
1 | 17426 | 21.8%
3 | 8674 | 10.8%
2 | 8343 | 10.4%
5 | 6308 | 7.9%
8 | 5626 | 7.0%
7 | 4746 | 5.9%
4 | 4073 | 5.1%
6 | 3346 | 4.2%
9 | 2949 | 3.7%

Latin
Value | Count | Frequency (%)
A | 9998 | > 99.9%
B | 2 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18509 | 20.6%
1 | 17426 | 19.4%
A | 9998 | 11.1%
3 | 8674 | 9.6%
2 | 8343 | 9.3%
5 | 6308 | 7.0%
8 | 5626 | 6.3%
7 | 4746 | 5.3%
4 | 4073 | 4.5%
6 | 3346 | 3.7%
Other values (2) | 2951 | 3.3%

비용명
Text

Distinct: 87
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.8051
Min length: 2

Characters and Unicode

Total characters: 48051
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 고용보험료
2nd row: 부과차익
3rd row: 복리후생비
4th row: 위탁관리수수료
5th row: 감가상각비

Value | Count | Frequency (%)
연체료수익 | 244 | 2.4%
통신비 | 243 | 2.4%
급여 | 240 | 2.4%
소독비 | 238 | 2.4%
도서인쇄비 | 237 | 2.4%
사무용품비 | 237 | 2.4%
제수당 | 236 | 2.4%
승강기유지비 | 233 | 2.3%
퇴직급여 | 232 | 2.3%
세대전기료 | 232 | 2.3%
Other values (77) | 7628 | 76.3%

Most occurring characters

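Value | Count | Frequency (%)
(character lost) | 5385 | 11.2%
(character lost) | 3604 | 7.5%
(character lost) | 2199 | 4.6%
(character lost) | 1914 | 4.0%
(character lost) | 1641 | 3.4%
(character lost) | 1386 | 2.9%
(character lost) | 1123 | 2.3%
(character lost) | 887 | 1.8%
(character lost) | 853 | 1.8%
(character lost) | 806 | 1.7%
Other values (110) | 28253 | 58.8%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 48051 | 100.0%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(character lost) | 5385 | 11.2%
(character lost) | 3604 | 7.5%
(character lost) | 2199 | 4.6%
(character lost) | 1914 | 4.0%
(character lost) | 1641 | 3.4%
(character lost) | 1386 | 2.9%
(character lost) | 1123 | 2.3%
(character lost) | 887 | 1.8%
(character lost) | 853 | 1.8%
(character lost) | 806 | 1.7%
Other values (110) | 28253 | 58.8%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 48051 | 100.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(character lost) | 5385 | 11.2%
(character lost) | 3604 | 7.5%
(character lost) | 2199 | 4.6%
(character lost) | 1914 | 4.0%
(character lost) | 1641 | 3.4%
(character lost) | 1386 | 2.9%
(character lost) | 1123 | 2.3%
(character lost) | 887 | 1.8%
(character lost) | 853 | 1.8%
(character lost) | 806 | 1.7%
Other values (110) | 28253 | 58.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 48051 | 100.0%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(character lost) | 5385 | 11.2%
(character lost) | 3604 | 7.5%
(character lost) | 2199 | 4.6%
(character lost) | 1914 | 4.0%
(character lost) | 1641 | 3.4%
(character lost) | 1386 | 2.9%
(character lost) | 1123 | 2.3%
(character lost) | 887 | 1.8%
(character lost) | 853 | 1.8%
(character lost) | 806 | 1.7%
Other values (110) | 28253 | 58.8%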

년월일
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
202202: 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 202202
2nd row: 202202
3rd row: 202202
4th row: 202202
5th row: 202202

Common Values

Value | Count | Frequency (%)
202202 | 10000 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
202202 | 10000 | 100.0%

금액
Real number (ℝ)

ZEROS 

Distinct: 7361
Distinct (%): 73.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4024405.4
Minimum: -3041160
Maximum: 5.7585519 × 10^8
Zeros: 666
Zeros (%): 6.7%
Negative: 8
Negative (%): 0.1%
Memory size: 166.0 KiB
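
The percentage rows above are the corresponding counts divided by the 10000 observations and rounded to one decimal place. A small Python sketch of that arithmetic, with illustrative variable names:

```python
n = 10_000       # number of observations
distinct = 7361
zeros = 666
negative = 8

print(f"{distinct / n:.1%}")  # 73.6%
print(f"{zeros / n:.1%}")     # 6.7%
print(f"{negative / n:.1%}")  # 0.1%
```

Note that the 8 negative values are 0.08% of the rows, which the report rounds up to 0.1%.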

Quantile statistics

Minimum: -3041160
5-th percentile: 0
Q1: 100000
median: 351035
Q3: 1500252.5
95-th percentile: 20389597
Maximum: 5.7585519 × 10^8
Range: 5.7889635 × 10^8
Interquartile range (IQR): 1400252.5
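
Range and IQR are derived from the other quantile figures: range is maximum minus minimum, and IQR is Q3 minus Q1. A short Python check, using the exact maximum (575855190) that appears among the largest values later in this report; the variable names are illustrative:

```python
minimum = -3_041_160
maximum = 575_855_190
q1 = 100_000
q3 = 1_500_252.5

value_range = maximum - minimum
iqr = q3 - q1

print(value_range)  # 578896350
print(iqr)          # 1400252.5
```

Both results match the reported values (5.7889635 × 10^8 and 1400252.5).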

Descriptive statistics

Standard deviation: 14833177
Coefficient of variation (CV): 3.6858059
Kurtosis: 311.81506
Mean: 4024405.4
Median Absolute Deviation (MAD): 327035
Skewness: 12.787864
Sum: 4.0244054 × 10^10
Variance: 2.2002315 × 10^14
Monotonicity: Not monotonic
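
Several of these figures are linked by simple identities: CV is the standard deviation divided by the mean, variance is the standard deviation squared, and the sum is the mean times the number of observations. The sketch below recomputes them from the rounded values printed above, so small differences in the last digits are expected; the variable names are illustrative.

```python
mean = 4_024_405.4
std = 14_833_177
n = 10_000  # number of observations

cv = std / mean       # coefficient of variation
variance = std ** 2   # square of the standard deviation
total = mean * n      # sum of all values

print(f"{cv:.4f}")
print(f"{variance:.4e}")
print(f"{total:.7e}")
```

A CV well above 1, together with the large skewness and kurtosis, indicates a heavily right-skewed distribution.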
Histogram with fixed size bins (bins=50)
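
For readers unfamiliar with fixed-size binning, the sketch below shows one common way to do it in plain Python: split the span from minimum to maximum into equal-width intervals and count the values in each. This is an illustration, not the plotting code behind the report. The function name is hypothetical, and the input is the ten 금액 values from the first table in the Sample section, binned into 5 intervals instead of 50 to keep the output short.

```python
def fixed_width_counts(values, bins=50):
    """Count values in `bins` equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


amounts = [158890, 19410, 0, 319757, 128000, 0, 180290, 1912050, 684560, 32290]
print(fixed_width_counts(amounts, bins=5))  # [8, 1, 0, 0, 1]
```

With strongly skewed data like this, most values pile into the first bin, which is why a histogram on a linear axis shows little detail for this variable.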

Common values

Value | Count | Frequency (%)
0 | 666 | 6.7%
200000 | 82 | 0.8%
300000 | 72 | 0.7%
100000 | 72 | 0.7%
30000 | 47 | 0.5%
150000 | 46 | 0.5%
400000 | 42 | 0.4%
50000 | 41 | 0.4%
48000 | 41 | 0.4%
500000 | 40 | 0.4%
Other values (7351) | 8851 | 88.5%

Minimum 10 values

Value | Count | Frequency (%)
-3041160 | 1 | < 0.1%
-2000000 | 1 | < 0.1%
-920000 | 1 | < 0.1%
-766350 | 1 | < 0.1%
-323400 | 1 | < 0.1%
-100000 | 1 | < 0.1%
-44300 | 1 | < 0.1%
-4557 | 1 | < 0.1%
0 | 666 | 6.7%
1 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
575855190 | 1 | < 0.1%
353828011 | 1 | < 0.1%
273642060 | 1 | < 0.1%
267969109 | 1 | < 0.1%
225276784 | 1 | < 0.1%
204074720 | 1 | < 0.1%
194203956 | 1 | < 0.1%
193937080 | 1 | < 0.1%
193876970 | 1 | < 0.1%
184251670 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 비용명 | 금액
비용명 | 1.000 | 0.433
금액 | 0.433 | 1.000
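
The report does not say which coefficient produced the 0.433. Because 비용명 is not numeric, one plausible reading, assumed here rather than confirmed by the report, is an association measure such as Cramér's V computed after the numeric column is discretized into bins. The sketch below implements the plain, uncorrected Cramér's V on toy labels that are made up for illustration. The function name is hypothetical, and the profiler may apply a bias correction or a different binning.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Made-up toy data: an expense label and a binned amount ("lo" / "hi").
labels = ["a", "a", "b", "b"]
print(cramers_v(labels, ["lo", "lo", "hi", "hi"]))  # 1.0 (perfect association)
print(cramers_v(labels, ["lo", "hi", "lo", "hi"]))  # 0.0 (no association)
```

On this scale a value of 0.433 would indicate a moderate association between the expense item and the size of the amount.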

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
388 | 개포더샵트리에 | A10023996 | 고용보험료 | 202202 | 158890
1120 | 영등포 중흥S-클래스 | A10024316 | 부과차익 | 202202 | 19410
49204 | 방배3차e편한세상 | A13783001 | 복리후생비 | 202202 | 0
38987 | 강남데시앙파크 | A13519005 | 위탁관리수수료 | 202202 | 319757
61258 | 월계삼창 | A13984603 | 감가상각비 | 202202 | 128000
94431 | 은평뉴타운폭포동4단지제1 | A41279930 | 세금과공과 | 202202 | 0
68255 | 광장청구 | A14381513 | 입주자대표회의운영비 | 202202 | 180290
83572 | 흑석한강센트레빌 | A15679107 | 퇴직급여 | 202202 | 1912050
24790 | 상봉건영캐스빌 | A13122001 | 음식물처리비 | 202202 | 684560
92229 | 목동현대2차 | A15882006 | 통신비 | 202202 | 32290

Last rows

Index | 아파트명 | 아파트코드 | 비용명 | 년월일 | 금액
11243 | 북한산힐스테이트7차제2 (임대) | A10028056 | 이자수익 | 202202 | 0
27127 | 도봉서광 | A13201001 | 도서인쇄비 | 202202 | 143000
1334 | 공덕SK리더스뷰 1단지 | A10024408 | 위탁관리수수료 | 202202 | 344159
37570 | 논현신동아 | A13501004 | 경비비 | 202202 | 20457750
73615 | 삼성산주공3단지 | A15101506 | 교통비 | 202202 | 6700
36379 | 고덕현대 | A13480401 | 승강기유지비 | 202202 | 1008130
60599 | 상계한신 | A13983608 | 건강보험료 | 202202 | 412190
87893 | 등촌임광 | A15783701 | 장기수선비 | 202202 | 1658100
82867 | 사당동작삼성래미안아파트 | A15609306 | 고용보험료 | 202202 | 213700
81753 | 상도경향렉스빌 | A15603401 | 퇴직급여 | 202202 | 664310