Overview

Dataset statistics

Number of variables: 5
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 488.3 KiB
Average record size in memory: 50.0 B

Variable types

Text: 3
Categorical: 1
Numeric: 1

Dataset

Description: File download
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15821/S/1/datasetView.do

Alerts

년월일 has constant value "201901" (Constant)
금액 has 426 (4.3%) zeros (Zeros)
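
The two alerts above amount to simple per-column checks. The following is a minimal sketch of such checks, not ydata-profiling's own implementation. The helper names `is_constant` and `zero_share` are hypothetical, and the five rows are copied from the sample table at the end of this report.

```python
# Five (년월일, 금액) pairs copied from the report's sample rows.
rows = [
    ("201901", 1045),
    ("201901", 0),
    ("201901", 32120003),
    ("201901", 250000),
    ("201901", 80000),
]

def is_constant(values):
    """Return True when every value in the column is identical."""
    return len(set(values)) == 1

def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

year_month = [r[0] for r in rows]
amount = [r[1] for r in rows]

print("년월일 constant:", is_constant(year_month))
print("금액 zero share:", zero_share(amount))
```

On the full dataset the same ratio is 426 / 10000, which the report rounds to 4.3%.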

Reproduction

Analysis started: 2024-05-11 07:00:41.889976
Analysis finished: 2024-05-11 07:00:43.781710
Duration: 1.89 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

아파트명 (apartment name)
Text

Distinct: 2093
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 22
Median length: 19
Mean length: 7.0493
Min length: 2

Characters and Unicode

Total characters: 70493
Distinct characters: 431
Distinct categories: 11
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
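
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for three of the sample apartment names. The function name `category_counts` and the partial `LONG_NAMES` mapping are assumptions made for this example, and this is not necessarily how the report itself computes its tables.

```python
import unicodedata
from collections import Counter

# Partial mapping from two-letter category codes to the long names
# shown in the report (only the codes needed for this example).
LONG_NAMES = {"Lo": "Other Letter", "Nd": "Decimal Number", "Ll": "Lowercase Letter"}

def category_counts(texts):
    """Count Unicode general categories over all characters in texts."""
    counts = Counter()
    for text in texts:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

names = ["서초3차e편한세상", "수서삼익", "번동주공4단지"]
counts = category_counts(names)
for code, n in counts.most_common():
    print(LONG_NAMES.get(code, code), n)
```

Hangul syllables fall under "Other Letter" (`Lo`), which is why that category dominates the tables for this variable.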

Unique

Unique: 105
Unique (%): 1.1%
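
The report appears to use "Distinct" for the number of different values and "Unique" for the number of values that occur exactly once. A minimal sketch of that distinction, using a hypothetical helper `distinct_and_unique` and a made-up four-item list:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Made-up example: one name repeated, two names appearing once.
example = ["수서삼익", "수서삼익", "잠원신화", "서강GS"]
print(distinct_and_unique(example))
```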

Sample

1st row: 서초3차e편한세상
2nd row: 수서삼익
3rd row: 번동주공4단지
4th row: 래미안신당하이베르
5th row: 송파파인타운9단지

Value | Count | Frequency (%)
아파트 | 93 | 0.9%
래미안 | 32 | 0.3%
입주자대표회의 | 21 | 0.2%
서강gs | 15 | 0.1%
수서삼익 | 14 | 0.1%
봉천은천1단지 | 13 | 0.1%
잠원신화 | 13 | 0.1%
성산시영아파트 | 13 | 0.1%
보문파크뷰자이아파트 | 13 | 0.1%
포레스트힐시티 | 12 | 0.1%
Other values (2146) | 10196 | 97.7%

Most occurring characters

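Value | Count | Frequency (%)
(not preserved) | 2091 | 3.0%
(not preserved) | 1967 | 2.8%
(not preserved) | 1965 | 2.8%
(not preserved) | 1847 | 2.6%
(not preserved) | 1749 | 2.5%
(not preserved) | 1721 | 2.4%
(not preserved) | 1598 | 2.3%
(not preserved) | 1545 | 2.2%
(not preserved) | 1427 | 2.0%
(not preserved) | 1371 | 1.9%
Other values (421) | 53212 | 75.5%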

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 64775 | 91.9%
Decimal Number | 3715 | 5.3%
Uppercase Letter | 683 | 1.0%
Space Separator | 469 | 0.7%
Lowercase Letter | 326 | 0.5%
Open Punctuation | 146 | 0.2%
Close Punctuation | 146 | 0.2%
Dash Punctuation | 123 | 0.2%
Other Punctuation | 104 | 0.1%
Math Symbol | 5 | < 0.1%

Most frequent character per category

Other Letter
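Value | Count | Frequency (%)
(not preserved) | 2091 | 3.2%
(not preserved) | 1967 | 3.0%
(not preserved) | 1965 | 3.0%
(not preserved) | 1847 | 2.9%
(not preserved) | 1749 | 2.7%
(not preserved) | 1721 | 2.7%
(not preserved) | 1598 | 2.5%
(not preserved) | 1545 | 2.4%
(not preserved) | 1427 | 2.2%
(not preserved) | 1371 | 2.1%
Other values (375) | 47494 | 73.3%
Uppercase Letter
Value | Count | Frequency (%)
S | 141 | 20.6%
K | 104 | 15.2%
C | 65 | 9.5%
L | 47 | 6.9%
G | 43 | 6.3%
H | 40 | 5.9%
E | 39 | 5.7%
I | 36 | 5.3%
D | 32 | 4.7%
M | 32 | 4.7%
Other values (7) | 104 | 15.2%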
Lowercase Letter
Value | Count | Frequency (%)
e | 178 | 54.6%
l | 38 | 11.7%
i | 34 | 10.4%
v | 24 | 7.4%
s | 12 | 3.7%
w | 11 | 3.4%
k | 8 | 2.5%
h | 7 | 2.1%
c | 6 | 1.8%
g | 4 | 1.2%
Decimal Number
Value | Count | Frequency (%)
1 | 1160 | 31.2%
2 | 1072 | 28.9%
3 | 488 | 13.1%
4 | 253 | 6.8%
5 | 202 | 5.4%
6 | 173 | 4.7%
7 | 103 | 2.8%
9 | 102 | 2.7%
8 | 86 | 2.3%
0 | 76 | 2.0%
Other Punctuation
Value | Count | Frequency (%)
, | 88 | 84.6%
. | 16 | 15.4%
Space Separator
Value | Count | Frequency (%)
(space) | 469 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 146 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 146 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 123 | 100.0%
Math Symbol
Value | Count | Frequency (%)
~ | 5 | 100.0%
Letter Number
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 64775 | 91.9%
Common | 4708 | 6.7%
Latin | 1010 | 1.4%

Most frequent character per script

Hangul
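Value | Count | Frequency (%)
(not preserved) | 2091 | 3.2%
(not preserved) | 1967 | 3.0%
(not preserved) | 1965 | 3.0%
(not preserved) | 1847 | 2.9%
(not preserved) | 1749 | 2.7%
(not preserved) | 1721 | 2.7%
(not preserved) | 1598 | 2.5%
(not preserved) | 1545 | 2.4%
(not preserved) | 1427 | 2.2%
(not preserved) | 1371 | 2.1%
Other values (375) | 47494 | 73.3%
Latin
Value | Count | Frequency (%)
e | 178 | 17.6%
S | 141 | 14.0%
K | 104 | 10.3%
C | 65 | 6.4%
L | 47 | 4.7%
G | 43 | 4.3%
H | 40 | 4.0%
E | 39 | 3.9%
l | 38 | 3.8%
I | 36 | 3.6%
Other values (19) | 279 | 27.6%
Common
Value | Count | Frequency (%)
1 | 1160 | 24.6%
2 | 1072 | 22.8%
3 | 488 | 10.4%
(space) | 469 | 10.0%
4 | 253 | 5.4%
5 | 202 | 4.3%
6 | 173 | 3.7%
( | 146 | 3.1%
) | 146 | 3.1%
- | 123 | 2.6%
Other values (7) | 476 | 10.1%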

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 64775 | 91.9%
ASCII | 5717 | 8.1%
Number Forms | 1 | < 0.1%

Most frequent character per block

Hangul
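Value | Count | Frequency (%)
(not preserved) | 2091 | 3.2%
(not preserved) | 1967 | 3.0%
(not preserved) | 1965 | 3.0%
(not preserved) | 1847 | 2.9%
(not preserved) | 1749 | 2.7%
(not preserved) | 1721 | 2.7%
(not preserved) | 1598 | 2.5%
(not preserved) | 1545 | 2.4%
(not preserved) | 1427 | 2.2%
(not preserved) | 1371 | 2.1%
Other values (375) | 47494 | 73.3%
ASCII
Value | Count | Frequency (%)
1 | 1160 | 20.3%
2 | 1072 | 18.8%
3 | 488 | 8.5%
(space) | 469 | 8.2%
4 | 253 | 4.4%
5 | 202 | 3.5%
e | 178 | 3.1%
6 | 173 | 3.0%
( | 146 | 2.6%
) | 146 | 2.6%
Other values (35) | 1430 | 25.0%
Number Forms
Value | Count | Frequency (%)
(not preserved) | 1 | 100.0%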

아파트코드 (apartment code)
Text

Distinct: 2099
Distinct (%): 21.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 90000
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 106
Unique (%): 1.1%

Sample

1st row: A13786803
2nd row: A13522003
3rd row: A14206202
4th row: A10078901
5th row: A13821007

Value | Count | Frequency (%)
a12114001 | 15 | 0.1%
a13522003 | 14 | 0.1%
a15106101 | 13 | 0.1%
a12185004 | 13 | 0.1%
a13790703 | 13 | 0.1%
a10027189 | 13 | 0.1%
a13887405 | 12 | 0.1%
a12208102 | 12 | 0.1%
a13877501 | 12 | 0.1%
a15208202 | 12 | 0.1%
Other values (2089) | 9871 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 18359 | 20.4%
1 | 17760 | 19.7%
A | 9994 | 11.1%
3 | 9012 | 10.0%
2 | 7907 | 8.8%
5 | 6175 | 6.9%
8 | 5827 | 6.5%
7 | 4798 | 5.3%
4 | 3758 | 4.2%
6 | 3432 | 3.8%
Other values (2) | 2978 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 80000 | 88.9%
Uppercase Letter | 10000 | 11.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 18359 | 22.9%
1 | 17760 | 22.2%
3 | 9012 | 11.3%
2 | 7907 | 9.9%
5 | 6175 | 7.7%
8 | 5827 | 7.3%
7 | 4798 | 6.0%
4 | 3758 | 4.7%
6 | 3432 | 4.3%
9 | 2972 | 3.7%
Uppercase Letter
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 80000 | 88.9%
Latin | 10000 | 11.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 18359 | 22.9%
1 | 17760 | 22.2%
3 | 9012 | 11.3%
2 | 7907 | 9.9%
5 | 6175 | 7.7%
8 | 5827 | 7.3%
7 | 4798 | 6.0%
4 | 3758 | 4.7%
6 | 3432 | 4.3%
9 | 2972 | 3.7%
Latin
Value | Count | Frequency (%)
A | 9994 | 99.9%
B | 6 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 18359 | 20.4%
1 | 17760 | 19.7%
A | 9994 | 11.1%
3 | 9012 | 10.0%
2 | 7907 | 8.8%
5 | 6175 | 6.9%
8 | 5827 | 6.5%
7 | 4798 | 5.3%
4 | 3758 | 4.2%
6 | 3432 | 3.8%
Other values (2) | 2978 | 3.3%

비용명 (expense name)
Text

Distinct: 86
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 9
Mean length: 4.7322
Min length: 2

Characters and Unicode

Total characters: 47322
Distinct characters: 120
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 부과차손
2nd row: 도서인쇄비
3rd row: 세대전기료
4th row: 업무추진비
5th row: 도서인쇄비

Value | Count | Frequency (%)
경비비 | 262 | 2.6%
청소비 | 251 | 2.5%
세대전기료 | 249 | 2.5%
승강기유지비 | 247 | 2.5%
소독비 | 247 | 2.5%
제수당 | 242 | 2.4%
수선유지비 | 238 | 2.4%
급여 | 238 | 2.4%
통신비 | 233 | 2.3%
연체료수익 | 230 | 2.3%
Other values (76) | 7563 | 75.6%

Most occurring characters

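Value | Count | Frequency (%)
(not preserved) | 5567 | 11.8%
(not preserved) | 3538 | 7.5%
(not preserved) | 2237 | 4.7%
(not preserved) | 1665 | 3.5%
(not preserved) | 1559 | 3.3%
(not preserved) | 1377 | 2.9%
(not preserved) | 1129 | 2.4%
(not preserved) | 919 | 1.9%
(not preserved) | 834 | 1.8%
(not preserved) | 785 | 1.7%
Other values (110) | 27712 | 58.6%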

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 47322 | 100.0%

Most frequent character per category

Other Letter
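Value | Count | Frequency (%)
(not preserved) | 5567 | 11.8%
(not preserved) | 3538 | 7.5%
(not preserved) | 2237 | 4.7%
(not preserved) | 1665 | 3.5%
(not preserved) | 1559 | 3.3%
(not preserved) | 1377 | 2.9%
(not preserved) | 1129 | 2.4%
(not preserved) | 919 | 1.9%
(not preserved) | 834 | 1.8%
(not preserved) | 785 | 1.7%
Other values (110) | 27712 | 58.6%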

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 47322 | 100.0%

Most frequent character per script

Hangul
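Value | Count | Frequency (%)
(not preserved) | 5567 | 11.8%
(not preserved) | 3538 | 7.5%
(not preserved) | 2237 | 4.7%
(not preserved) | 1665 | 3.5%
(not preserved) | 1559 | 3.3%
(not preserved) | 1377 | 2.9%
(not preserved) | 1129 | 2.4%
(not preserved) | 919 | 1.9%
(not preserved) | 834 | 1.8%
(not preserved) | 785 | 1.7%
Other values (110) | 27712 | 58.6%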

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 47322 | 100.0%

Most frequent character per block

Hangul
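Value | Count | Frequency (%)
(not preserved) | 5567 | 11.8%
(not preserved) | 3538 | 7.5%
(not preserved) | 2237 | 4.7%
(not preserved) | 1665 | 3.5%
(not preserved) | 1559 | 3.3%
(not preserved) | 1377 | 2.9%
(not preserved) | 1129 | 2.4%
(not preserved) | 919 | 1.9%
(not preserved) | 834 | 1.8%
(not preserved) | 785 | 1.7%
Other values (110) | 27712 | 58.6%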

년월일 (date)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
201901 | 10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 201901
2nd row: 201901
3rd row: 201901
4th row: 201901
5th row: 201901

Common Values

Value | Count | Frequency (%)
201901 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
201901 | 10000 | 100.0%

금액 (amount)
Real number (ℝ)

ZEROS 

Distinct: 7594
Distinct (%): 75.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4058669.2
Minimum: -10594720
Maximum: 6.3761019 × 10^8
Zeros: 426
Zeros (%): 4.3%
Negative: 6
Negative (%): 0.1%
Memory size: 166.0 KiB

Quantile statistics

Minimum: -10594720
5-th percentile: 559.1
Q1: 112273.25
Median: 390445
Q3: 1745775
95-th percentile: 18306260
Maximum: 6.3761019 × 10^8
Range: 6.4820491 × 10^8
Interquartile range (IQR): 1633501.8
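
The range and the interquartile range follow directly from the other quantile rows. A quick arithmetic check using the values reported here (the exact maximum, 637610193, is taken from the extreme values listed later in this section; the variable names are illustrative only):

```python
# Values as reported in the quantile statistics for 금액.
minimum = -10594720
maximum = 637610193      # exact value behind 6.3761019 × 10^8
q1 = 112273.25
q3 = 1745775.0

value_range = maximum - minimum   # spread between the two extremes
iqr = q3 - q1                     # spread of the middle 50% of values

print("Range:", value_range)
print("IQR:", iqr)
```

The IQR is more than two orders of magnitude smaller than the range, which is consistent with the heavy right tail indicated by the skewness and kurtosis figures.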

Descriptive statistics

Standard deviation: 16499701
Coefficient of variation (CV): 4.0652984
Kurtosis: 392.15579
Mean: 4058669.2
Median Absolute Deviation (MAD): 352065
Skewness: 15.313611
Sum: 4.0586692 × 10^10
Variance: 2.7224015 × 10^14
Monotonicity: Not monotonic
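
Several of these figures are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below re-derives them from the rounded values printed in this report, so small differences in the last digits are expected.

```python
import math

n = 10000               # number of observations
mean = 4058669.2        # reported mean (rounded)
std = 16499701.0        # reported standard deviation (rounded)

cv = std / mean         # coefficient of variation
variance = std ** 2     # variance is the squared standard deviation
total = mean * n        # sum of all values

print(f"CV       = {cv:.7f}")
print(f"Variance = {variance:.7e}")
print(f"Sum      = {total:.7e}")
```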
Histogram with fixed size bins (bins=50)
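
For reference, fixed size (equal-width) binning can be sketched as follows. This is a plain-Python illustration under the assumption that the maximum value is counted in the last bin; it is not the exact routine used to draw the report's plot, and `fixed_width_histogram` is a hypothetical name.

```python
def fixed_width_histogram(values, bins=50):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts, width
    for v in values:
        # The maximum lands exactly on the right edge, so clamp it
        # into the last bin instead of indexing one past the end.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width

counts, width = fixed_width_histogram(list(range(11)), bins=5)
print(counts, width)
```

With a distribution as skewed as 금액, nearly all observations would fall into the first few of the 50 bins, since each bin spans roughly 13 million units of the 6.48 × 10^8 range.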
Common values
Value | Count | Frequency (%)
0 | 426 | 4.3%
200000 | 107 | 1.1%
100000 | 75 | 0.8%
300000 | 67 | 0.7%
78000 | 52 | 0.5%
150000 | 40 | 0.4%
110000 | 39 | 0.4%
400000 | 38 | 0.4%
250000 | 33 | 0.3%
500000 | 31 | 0.3%
Other values (7584) | 9092 | 90.9%

Minimum 10 values
Value | Count | Frequency (%)
-10594720 | 1 | < 0.1%
-2913440 | 1 | < 0.1%
-700000 | 1 | < 0.1%
-612000 | 1 | < 0.1%
-13900 | 1 | < 0.1%
-7100 | 1 | < 0.1%
0 | 426 | 4.3%
1 | 3 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
637610193 | 1 | < 0.1%
480668350 | 1 | < 0.1%
423388692 | 1 | < 0.1%
343506510 | 1 | < 0.1%
280325400 | 1 | < 0.1%
245169650 | 1 | < 0.1%
229951550 | 1 | < 0.1%
213553380 | 1 | < 0.1%
209035870 | 1 | < 0.1%
205956800 | 1 | < 0.1%

Interactions


Correlations

 | 비용명 | 금액
비용명 | 1.000 | 0.408
금액 | 0.408 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
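
The quantity behind both nullity plots is the count of missing entries per column. A minimal sketch, assuming rows are held as dictionaries and missing values are represented as `None`; the helper `missing_per_column` is hypothetical, and the second row's missing amount is invented for illustration (the profiled dataset itself has no missing cells).

```python
def missing_per_column(rows, columns):
    """Count None entries per column in a list of dict rows."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}

rows = [
    {"아파트코드": "A13786803", "금액": 1045},
    {"아파트코드": "A13522003", "금액": None},  # invented gap, for illustration only
]

result = missing_per_column(rows, ["아파트코드", "금액"])
print(result)
```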

Sample

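First rows
Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (expense name) | 년월일 (date) | 금액 (amount)
42816 | 서초3차e편한세상 | A13786803 | 부과차손 | 201901 | 1045
32884 | 수서삼익 | A13522003 | 도서인쇄비 | 201901 | 0
57660 | 번동주공4단지 | A14206202 | 세대전기료 | 201901 | 32120003
5586 | 래미안신당하이베르 | A10078901 | 업무추진비 | 201901 | 250000
45967 | 송파파인타운9단지 | A13821007 | 도서인쇄비 | 201901 | 80000
14033 | 갈현베르빌주상복합아파트 | A12271402 | 급여 | 201901 | 5040000
51912 | 공릉풍림아이원 | A13980513 | 음식물처리비 | 201901 | 2695120
64279 | 문래국화 | A15083601 | 경비비 | 201901 | 7402340
10208 | 마포동원베네스트 | A12170401 | 고용안정사업비용 | 201901 | 0
14763 | 뉴신사신성 | A12289401 | 위탁관리수수료 | 201901 | 220000

Last rows
Index | 아파트명 (apartment name) | 아파트코드 (apartment code) | 비용명 (expense name) | 년월일 (date) | 금액 (amount)
58194 | 한일유앤아이 | A14272303 | 공동수도료 | 201901 | 296650
64571 | 신길남서울 | A15085805 | 퇴직급여 | 201901 | 1061540
38123 | 래미안길음1차 | A13611103 | 광고료수익 | 201901 | 450000
75460 | 대방우정 | A15681103 | 수선유지비 | 201901 | 1463290
33811 | 도곡개포한신아파트 | A13527016 | 알뜰시장수익 | 201901 | 500000
11457 | 도화현대홈타운 | A12181404 | 회계감사비 | 201901 | -2913440
79156 | 가양한강타운 | A15780604 | 도서인쇄비 | 201901 | 606330
539 | 마포자이3차아파트 | A10026036 | 경비비 | 201901 | 22483090
63953 | 대림우성 | A15081503 | 퇴직급여 | 201901 | 0
62239 | 신길삼환 | A15005705 | 고용안정사업비용 | 201901 | 160330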