Overview

Dataset statistics

Number of variables: 7
Number of observations: 10000
Missing cells: 4603
Missing cells (%): 6.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 654.3 KiB
Average record size in memory: 67.0 B
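
The derived figures above can be checked from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the average record size is total memory divided by the number of rows; the variable names are illustrative only.

```python
# Figures copied from the dataset statistics above.
n_variables = 7
n_observations = 10_000
missing_cells = 4_603
total_memory_kib = 654.3

# Share of missing cells over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row (1 KiB = 1024 bytes).
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")        # 6.6%
print(f"{avg_record_bytes:.1f} B")  # 67.0 B
```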

Variable types

Text: 3
Categorical: 2
Numeric: 2

Dataset

Description: 관리_오수정화시설_pk,관리_건축물대장_pk,대표_여부,형식_코드,기타_형식,용량_인용,용량_루베
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15657/S/1/datasetView.do

Alerts

대표_여부 is highly imbalanced (94.8%) [Imbalance]
형식_코드 is highly imbalanced (61.6%) [Imbalance]
기타_형식 has 4603 (46.0%) missing values [Missing]
용량_인용 is highly skewed (γ1 = 58.23216834) [Skewed]
용량_루베 is highly skewed (γ1 = 80.13116203) [Skewed]
관리_오수정화시설_pk has unique values [Unique]
용량_인용 has 4846 (48.5%) zeros [Zeros]
용량_루베 has 9572 (95.7%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-18 03:32:32.630131
Analysis finished: 2024-05-18 03:32:36.232088
Duration: 3.6 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
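
The reported duration is simply the difference between the two timestamps. A small sketch using only the standard library, with the timestamps copied from the lines above:

```python
from datetime import datetime

# Timestamps as printed in the Reproduction section.
started = datetime.fromisoformat("2024-05-18 03:32:32.630131")
finished = datetime.fromisoformat("2024-05-18 03:32:36.232088")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.1f} seconds")  # 3.6 seconds
```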

Variables

관리_오수정화시설_pk
Text

UNIQUE

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 11
Mean length: 11.303
Min length: 7

Characters and Unicode

Total characters: 113030
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
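
As an illustration of these character properties, the sketch below classifies each character of the five sample values listed for this variable with Python's standard `unicodedata` module. It is not the report's own implementation; it only shows where labels such as Decimal Number (`Nd`) and Dash Punctuation (`Pd`) come from.

```python
import unicodedata
from collections import Counter

# The five sample values shown for this variable.
samples = [
    "11200-10937",
    "11110-20570",
    "11140-16902",
    "11200-17739",
    "11170-14112",
]

# Two-letter general category per character:
# "Nd" = Decimal Number, "Pd" = Dash Punctuation.
categories = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
print(categories)  # Counter({'Nd': 50, 'Pd': 5})
```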

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11200-10937
2nd row: 11110-20570
3rd row: 11140-16902
4th row: 11200-17739
5th row: 11170-14112

Value | Count | Frequency (%)
11200-10937 | 1 | < 0.1%
11200-13133 | 1 | < 0.1%
11200-23879 | 1 | < 0.1%
11200-5599 | 1 | < 0.1%
11200-16381 | 1 | < 0.1%
11170-15878 | 1 | < 0.1%
11200-19639 | 1 | < 0.1%
11110-14084 | 1 | < 0.1%
11170-22744 | 1 | < 0.1%
11170-18868 | 1 | < 0.1%
Other values (9990) | 9990 | 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 40466 | 35.8%
0 | 19263 | 17.0%
- | 10000 | 8.8%
2 | 9737 | 8.6%
7 | 6354 | 5.6%
4 | 5655 | 5.0%
5 | 5489 | 4.9%
6 | 4146 | 3.7%
8 | 4130 | 3.7%
3 | 4106 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 103030 | 91.2%
Dash Punctuation | 10000 | 8.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 40466 | 39.3%
0 | 19263 | 18.7%
2 | 9737 | 9.5%
7 | 6354 | 6.2%
4 | 5655 | 5.5%
5 | 5489 | 5.3%
6 | 4146 | 4.0%
8 | 4130 | 4.0%
3 | 4106 | 4.0%
9 | 3684 | 3.6%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 113030 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 40466 | 35.8%
0 | 19263 | 17.0%
- | 10000 | 8.8%
2 | 9737 | 8.6%
7 | 6354 | 5.6%
4 | 5655 | 5.0%
5 | 5489 | 4.9%
6 | 4146 | 3.7%
8 | 4130 | 3.7%
3 | 4106 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 113030 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 40466 | 35.8%
0 | 19263 | 17.0%
- | 10000 | 8.8%
2 | 9737 | 8.6%
7 | 6354 | 5.6%
4 | 5655 | 5.0%
5 | 5489 | 4.9%
6 | 4146 | 3.7%
8 | 4130 | 3.7%
3 | 4106 | 3.6%

관리_건축물대장_pk
Text

Distinct: 9989
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 11
Mean length: 11.3016
Min length: 8

Characters and Unicode

Total characters: 113016
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9978
Unique (%): 99.8%
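
Distinct counts every different value once, whereas Unique counts only the values that occur exactly once. The sketch below shows the difference on a small hypothetical list (those values are made up for illustration) and then applies the same reasoning to the figures reported for this column.

```python
from collections import Counter

# Hypothetical values, not taken from the dataset.
values = ["a-1", "a-2", "a-2", "a-3", "a-3", "a-4"]
counts = Counter(values)

distinct = len(counts)                              # different values
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
print(distinct, unique)  # 4 2

# Figures reported for this column.
reported_distinct, reported_unique, n_rows = 9989, 9978, 10_000
repeated_values = reported_distinct - reported_unique  # values seen more than once
rows_in_repeats = n_rows - reported_unique              # rows those values occupy
print(repeated_values, rows_in_repeats)  # 11 22
```

Eleven repeated keys covering 22 rows means each repeated key appears twice, which agrees with the count of 2 shown for the most frequent values of this column.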

Sample

1st row: 11200-12511
2nd row: 11110-21679
3rd row: 11140-18519
4th row: 11200-19434
5th row: 11170-14538

Value | Count | Frequency (%)
11170-10765 | 2 | < 0.1%
11110-15981 | 2 | < 0.1%
11110-20482 | 2 | < 0.1%
11110-18913 | 2 | < 0.1%
11200-4355 | 2 | < 0.1%
11200-100199933 | 2 | < 0.1%
11170-2938 | 2 | < 0.1%
11110-6099 | 2 | < 0.1%
11215-100204653 | 2 | < 0.1%
11200-100200218 | 2 | < 0.1%
Other values (9979) | 9980 | 99.8%

Most occurring characters

Value | Count | Frequency (%)
1 | 40295 | 35.7%
0 | 17928 | 15.9%
2 | 10660 | 9.4%
- | 10000 | 8.8%
7 | 6675 | 5.9%
4 | 5716 | 5.1%
5 | 5436 | 4.8%
9 | 4238 | 3.7%
3 | 4136 | 3.7%
8 | 4014 | 3.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 103016 | 91.2%
Dash Punctuation | 10000 | 8.8%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 40295 | 39.1%
0 | 17928 | 17.4%
2 | 10660 | 10.3%
7 | 6675 | 6.5%
4 | 5716 | 5.5%
5 | 5436 | 5.3%
9 | 4238 | 4.1%
3 | 4136 | 4.0%
8 | 4014 | 3.9%
6 | 3918 | 3.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 113016 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
1 | 40295 | 35.7%
0 | 17928 | 15.9%
2 | 10660 | 9.4%
- | 10000 | 8.8%
7 | 6675 | 5.9%
4 | 5716 | 5.1%
5 | 5436 | 4.8%
9 | 4238 | 3.7%
3 | 4136 | 3.7%
8 | 4014 | 3.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 113016 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
1 | 40295 | 35.7%
0 | 17928 | 15.9%
2 | 10660 | 9.4%
- | 10000 | 8.8%
7 | 6675 | 5.9%
4 | 5716 | 5.1%
5 | 5436 | 4.8%
9 | 4238 | 3.7%
3 | 4136 | 3.7%
8 | 4014 | 3.6%

대표_여부
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
1 | 9903
0 | 91
<NA> | 6

Length

Max length: 4
Median length: 1
Mean length: 1.0018
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
1 | 9903 | 99.0%
0 | 91 | 0.9%
<NA> | 6 | 0.1%
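
The 94.8% imbalance reported for this variable in the Alerts section is consistent with one minus the Shannon entropy of the class shares, normalised by the maximum entropy for three classes. The sketch below works under that assumption; the function name `imbalance_score` is illustrative and is not presented as the library's API.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    if len(counts) < 2:
        return 0.0  # a single class has no entropy to normalise
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in shares)
    max_entropy = math.log2(len(counts))
    return 1 - entropy / max_entropy

# Counts from the Common Values table: "1", "0", "<NA>".
score = imbalance_score([9903, 91, 6])
print(f"{score:.1%}")  # 94.8%
```

A perfectly balanced variable scores 0 under this definition, and the score approaches 1 as a single class dominates.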

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
1 | 9903 | 99.0%
0 | 91 | 0.9%
na | 6 | 0.1%

형식_코드
Categorical

IMBALANCE 

Distinct: 28
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value | Count
<NA> | 4429
부패탱크방법 | 4230
살수여상방법(정화조) | 326
접촉폭기방법 | 267
기타오수처리시설 | 253
Other values (23) | 495

Length

Max length: 11
Median length: 9
Mean length: 5.3038
Min length: 3

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: 부패탱크방법
2nd row: 부패탱크방법
3rd row: 접촉폭기방법
4th row: 부패탱크방법
5th row: 부패탱크방법

Common Values

Value | Count | Frequency (%)
<NA> | 4429 | 44.3%
부패탱크방법 | 4230 | 42.3%
살수여상방법(정화조) | 326 | 3.3%
접촉폭기방법 | 267 | 2.7%
기타오수처리시설 | 253 | 2.5%
임호프탱크방법 | 147 | 1.5%
임호프방식 | 107 | 1.1%
201 | 85 | 0.9%
기타단독정화조 | 41 | 0.4%
살수형부패탱크방법 | 22 | 0.2%
Other values (18) | 93 | 0.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
na | 4429 | 44.3%
부패탱크방법 | 4230 | 42.3%
살수여상방법(정화조 | 326 | 3.3%
접촉폭기방법 | 267 | 2.7%
기타오수처리시설 | 253 | 2.5%
임호프탱크방법 | 147 | 1.5%
임호프방식 | 107 | 1.1%
201 | 85 | 0.9%
기타단독정화조 | 41 | 0.4%
살수형부패탱크방법 | 22 | 0.2%
Other values (18) | 93 | 0.9%

기타_형식
Text

MISSING 

Distinct: 498
Distinct (%): 9.2%
Missing: 4603
Missing (%): 46.0%
Memory size: 156.2 KiB

Length

Max length: 36
Median length: 6
Mean length: 6.4956457
Min length: 2

Characters and Unicode

Total characters: 35057
Distinct characters: 163
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 317
Unique (%): 5.9%
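
The percentages and the mean length for this variable appear to be computed over the non-missing values rather than over all 10000 rows. A short check of that assumption using the figures reported above:

```python
n_rows = 10_000
missing = 4_603
present = n_rows - missing  # 5397 non-missing values

distinct, unique = 498, 317
total_characters = 35_057

distinct_pct = 100 * distinct / present
unique_pct = 100 * unique / present
mean_length = total_characters / present

print(f"{distinct_pct:.1f}%")  # 9.2%
print(f"{unique_pct:.1f}%")    # 5.9%
print(f"{mean_length:.4f}")    # 6.4956
```

Dividing by all 10000 rows instead would give 5.0% distinct and a mean length of about 3.5, neither of which matches the report.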

Sample

1st row: 철근콘크리트각형
2nd row: 부패탱크방법
3rd row: 접촉폭기식
4th row: F.R.P
5th row: 콘크리트 부패탱크식

Value | Count | Frequency (%)
부패탱크방법 | 2019 | 32.8%
콘크리트각형 | 531 | 8.6%
콘크리트 | 372 | 6.0%
부패탱크식 | 363 | 5.9%
살수여과상식 | 276 | 4.5%
에프.알.피 | 244 | 4.0%
f.r.p | 172 | 2.8%
frp | 160 | 2.6%
임호프탱크방법 | 155 | 2.5%
pe | 128 | 2.1%
Other values (395) | 1731 | 28.1%

Most occurring characters

Value | Count | Frequency (%)
(character lost) | 3996 | 11.4%
(character lost) | 3006 | 8.6%
(character lost) | 2892 | 8.2%
(character lost) | 2867 | 8.2%
(character lost) | 2453 | 7.0%
(character lost) | 2402 | 6.9%
(character lost) | 1161 | 3.3%
. | 1112 | 3.2%
(character lost) | 1013 | 2.9%
(character lost) | 999 | 2.8%
Other values (153) | 13156 | 37.5%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 30243 | 86.3%
Uppercase Letter | 2246 | 6.4%
Other Punctuation | 1212 | 3.5%
Space Separator | 757 | 2.2%
Close Punctuation | 165 | 0.5%
Open Punctuation | 164 | 0.5%
Decimal Number | 158 | 0.5%
Dash Punctuation | 57 | 0.2%
Lowercase Letter | 45 | 0.1%
Math Symbol | 6 | < 0.1%
Other values (2) | 4 | < 0.1%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(character lost) | 3996 | 13.2%
(character lost) | 3006 | 9.9%
(character lost) | 2892 | 9.6%
(character lost) | 2867 | 9.5%
(character lost) | 2453 | 8.1%
(character lost) | 2402 | 7.9%
(character lost) | 1161 | 3.8%
(character lost) | 1013 | 3.3%
(character lost) | 999 | 3.3%
(character lost) | 992 | 3.3%
Other values (103) | 8462 | 28.0%

Uppercase Letter
Value | Count | Frequency (%)
P | 756 | 33.7%
R | 516 | 23.0%
F | 514 | 22.9%
E | 243 | 10.8%
C | 148 | 6.6%
O | 31 | 1.4%
N | 29 | 1.3%
V | 4 | 0.2%
B | 2 | 0.1%
I | 1 | < 0.1%
Other values (2) | 2 | 0.1%

Lowercase Letter
Value | Count | Frequency (%)
c | 12 | 26.7%
p | 8 | 17.8%
r | 7 | 15.6%
f | 5 | 11.1%
o | 3 | 6.7%
e | 2 | 4.4%
n | 2 | 4.4%
v | 2 | 4.4%
b | 1 | 2.2%
i | 1 | 2.2%
Other values (2) | 2 | 4.4%

Decimal Number
Value | Count | Frequency (%)
3 | 92 | 58.2%
5 | 25 | 15.8%
1 | 15 | 9.5%
0 | 13 | 8.2%
2 | 6 | 3.8%
4 | 3 | 1.9%
6 | 2 | 1.3%
8 | 1 | 0.6%
7 | 1 | 0.6%

Other Punctuation
Value | Count | Frequency (%)
. | 1112 | 91.7%
, | 73 | 6.0%
' | 10 | 0.8%
/ | 9 | 0.7%
: | 8 | 0.7%

Math Symbol
Value | Count | Frequency (%)
+ | 4 | 66.7%
< | 1 | 16.7%
> | 1 | 16.7%

Open Punctuation
Value | Count | Frequency (%)
( | 163 | 99.4%
[ | 1 | 0.6%

Close Punctuation
Value | Count | Frequency (%)
) | 162 | 98.2%
] | 3 | 1.8%

Other Symbol
Value | Count | Frequency (%)
(character lost) | 2 | 66.7%
(character lost) | 1 | 33.3%

Space Separator
Value | Count | Frequency (%)
(space) | 757 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 57 | 100.0%

Modifier Symbol
Value | Count | Frequency (%)
` | 1 | 100.0%

Most occurring scripts

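Value | Count | Frequency (%)
Hangul | 30243 | 86.3%
Common | 2523 | 7.2%
Latin | 2291 | 6.5%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(character lost) | 3996 | 13.2%
(character lost) | 3006 | 9.9%
(character lost) | 2892 | 9.6%
(character lost) | 2867 | 9.5%
(character lost) | 2453 | 8.1%
(character lost) | 2402 | 7.9%
(character lost) | 1161 | 3.8%
(character lost) | 1013 | 3.3%
(character lost) | 999 | 3.3%
(character lost) | 992 | 3.3%
Other values (103) | 8462 | 28.0%

Common
Value | Count | Frequency (%)
. | 1112 | 44.1%
(space) | 757 | 30.0%
( | 163 | 6.5%
) | 162 | 6.4%
3 | 92 | 3.6%
, | 73 | 2.9%
- | 57 | 2.3%
5 | 25 | 1.0%
1 | 15 | 0.6%
0 | 13 | 0.5%
Other values (16) | 54 | 2.1%

Latin
Value | Count | Frequency (%)
P | 756 | 33.0%
R | 516 | 22.5%
F | 514 | 22.4%
E | 243 | 10.6%
C | 148 | 6.5%
O | 31 | 1.4%
N | 29 | 1.3%
c | 12 | 0.5%
p | 8 | 0.3%
r | 7 | 0.3%
Other values (14) | 27 | 1.2%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 30243 | 86.3%
ASCII | 4811 | 13.7%
CJK Compat | 3 | < 0.1%

Most frequent character per block

Hangul
Value | Count | Frequency (%)
(character lost) | 3996 | 13.2%
(character lost) | 3006 | 9.9%
(character lost) | 2892 | 9.6%
(character lost) | 2867 | 9.5%
(character lost) | 2453 | 8.1%
(character lost) | 2402 | 7.9%
(character lost) | 1161 | 3.8%
(character lost) | 1013 | 3.3%
(character lost) | 999 | 3.3%
(character lost) | 992 | 3.3%
Other values (103) | 8462 | 28.0%

ASCII
Value | Count | Frequency (%)
. | 1112 | 23.1%
(space) | 757 | 15.7%
P | 756 | 15.7%
R | 516 | 10.7%
F | 514 | 10.7%
E | 243 | 5.1%
( | 163 | 3.4%
) | 162 | 3.4%
C | 148 | 3.1%
3 | 92 | 1.9%
Other values (38) | 348 | 7.2%

CJK Compat
Value | Count | Frequency (%)
(character lost) | 2 | 66.7%
(character lost) | 1 | 33.3%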

용량_인용
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 183
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.092514
Minimum: 0
Maximum: 51200
Zeros: 4846
Zeros (%): 48.5%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 5
Q3: 30
95-th percentile: 150
Maximum: 51200
Range: 51200
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 626.26218
Coefficient of variation (CV): 10.085953
Kurtosis: 4503.1528
Mean: 62.092514
Median Absolute Deviation (MAD): 5
Skewness: 58.232168
Sum: 620925.14
Variance: 392204.32
Monotonicity: Not monotonic
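
Several of these statistics are determined by the others. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations); small differences in the last digit come from rounding in the printed inputs.

```python
n = 10_000
mean = 62.092514
std = 626.26218
q1, q3 = 0, 30
minimum, maximum = 0, 51200

cv = std / mean                  # coefficient of variation
variance = std ** 2              # squared standard deviation
total = mean * n                 # sum of all values
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.3f} variance={variance:.2f} sum={total:.2f}")
print(f"IQR={iqr} range={value_range}")
```

A CV near 10 says the spread is roughly ten times the mean, which fits a column where almost half the values are zero and a handful reach into the thousands.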
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.0 | 4846 | 48.5%
10.0 | 663 | 6.6%
20.0 | 600 | 6.0%
30.0 | 575 | 5.8%
15.0 | 508 | 5.1%
40.0 | 419 | 4.2%
50.0 | 349 | 3.5%
25.0 | 313 | 3.1%
60.0 | 175 | 1.8%
5.0 | 166 | 1.7%
Other values (173) | 1386 | 13.9%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 4846 | 48.5%
1.0 | 1 | < 0.1%
1.8 | 1 | < 0.1%
2.0 | 4 | < 0.1%
2.2 | 1 | < 0.1%
2.5 | 2 | < 0.1%
2.6 | 1 | < 0.1%
3.0 | 20 | 0.2%
3.1 | 1 | < 0.1%
4.0 | 4 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
51200.0 | 1 | < 0.1%
14000.0 | 1 | < 0.1%
10200.0 | 1 | < 0.1%
9400.0 | 1 | < 0.1%
9180.0 | 2 | < 0.1%
9000.0 | 1 | < 0.1%
7300.0 | 1 | < 0.1%
6300.0 | 1 | < 0.1%
6100.0 | 1 | < 0.1%
5800.0 | 1 | < 0.1%

용량_루베
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 174
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6575482
Minimum: 0
Maximum: 24500
Zeros: 9572
Zeros (%): 95.7%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 24500
Range: 24500
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 272.43292
Coefficient of variation (CV): 58.492776
Kurtosis: 6841.611
Mean: 4.6575482
Median Absolute Deviation (MAD): 0
Skewness: 80.131162
Sum: 46575.482
Variance: 74219.698
Monotonicity: Not monotonic
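
Every quantile up to the 95th percentile is zero because 95.7% of the rows are zero, even though the mean is above 4. The sketch below illustrates this with a placeholder column: the number of zeros comes from the report, while the non-zero entries are stand-ins set to 1.0, since only their position in the sort order matters here. The percentile method (`inclusive`, linear interpolation) is an assumption and may differ from the one the report uses.

```python
from statistics import quantiles

n_rows, n_zeros = 10_000, 9_572

# Placeholder column: reported number of zeros plus stand-in non-zero values.
column = [0.0] * n_zeros + [1.0] * (n_rows - n_zeros)

# 99 cut points; index k - 1 holds the k-th percentile.
percentiles = quantiles(column, n=100, method="inclusive")
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]
p95, p96 = percentiles[94], percentiles[95]

print(p25, p50, p75, p95)  # all zero, so the IQR is zero as well
print(p96 > 0)             # the first percentile that reaches a non-zero value
```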
Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0.0 | 9572 | 95.7%
1.55 | 25 | 0.2%
2.05 | 23 | 0.2%
3.0 | 22 | 0.2%
4.0 | 19 | 0.2%
2.55 | 18 | 0.2%
2.0 | 18 | 0.2%
1.25 | 16 | 0.2%
3.05 | 10 | 0.1%
6.0 | 10 | 0.1%
Other values (164) | 267 | 2.7%

Minimum 10 values

Value | Count | Frequency (%)
0.0 | 9572 | 95.7%
0.39 | 1 | < 0.1%
0.75 | 6 | 0.1%
1.05 | 1 | < 0.1%
1.25 | 16 | 0.2%
1.35 | 1 | < 0.1%
1.5 | 3 | < 0.1%
1.526 | 2 | < 0.1%
1.53 | 1 | < 0.1%
1.55 | 25 | 0.2%

Maximum 10 values

Value | Count | Frequency (%)
24500.0 | 1 | < 0.1%
11363.0 | 1 | < 0.1%
3200.0 | 1 | < 0.1%
950.0 | 1 | < 0.1%
731.0 | 2 | < 0.1%
450.0 | 1 | < 0.1%
389.0 | 1 | < 0.1%
350.0 | 1 | < 0.1%
281.0 | 1 | < 0.1%
230.0 | 1 | < 0.1%

Interactions

[Figure: interaction plots]

Correlations

Variable | 대표_여부 | 형식_코드 | 용량_인용 | 용량_루베
대표_여부 | 1.000 | 0.103 | 0.000 | 0.000
형식_코드 | 0.103 | 1.000 | 0.000 | 0.000
용량_인용 | 0.000 | 0.000 | 1.000 | 0.000
용량_루베 | 0.000 | 0.000 | 0.000 | 1.000

Variable | 형식_코드 | 대표_여부
형식_코드 | 1.000 | 0.089
대표_여부 | 0.089 | 1.000

Variable | 용량_인용 | 용량_루베 | 대표_여부 | 형식_코드
용량_인용 | 1.000 | 0.150 | 0.000 | 0.000
용량_루베 | 0.150 | 1.000 | 0.000 | 0.000
대표_여부 | 0.000 | 0.000 | 1.000 | 0.089
형식_코드 | 0.000 | 0.000 | 0.089 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | 관리_오수정화시설_pk | 관리_건축물대장_pk | 대표_여부 | 형식_코드 | 기타_형식 | 용량_인용 | 용량_루베
71314 | 11200-10937 | 11200-12511 | 1 | 부패탱크방법 | 철근콘크리트각형 | 80.0 | 0.0
12567 | 11110-20570 | 11110-21679 | 1 | 부패탱크방법 | 부패탱크방법 | 50.0 | 0.0
32956 | 11140-16902 | 11140-18519 | 1 | 접촉폭기방법 | 접촉폭기식 | 5.0 | 0.0
75385 | 11200-17739 | 11200-19434 | 1 | 부패탱크방법 | F.R.P | 20.0 | 0.0
49396 | 11170-14112 | 11170-14538 | 1 | 부패탱크방법 | 콘크리트 부패탱크식 | 50.0 | 0.0
8688 | 11110-1653 | 11110-2707 | 1 | <NA> | <NA> | 0.0 | 0.0
17466 | 11110-25673 | 11110-26790 | 1 | 부패탱크방법 | <NA> | 40.0 | 0.0
54828 | 11170-19755 | 11170-20193 | 1 | <NA> | <NA> | 0.0 | 0.0
38518 | 11140-24775 | 11140-175 | 0 | 부패탱크방법 | <NA> | 0.0 | 0.0
46265 | 11170-1081 | 11170-1482 | 1 | 부패탱크방법 | FRP 부패탱크식 | 10.0 | 0.0

(index) | 관리_오수정화시설_pk | 관리_건축물대장_pk | 대표_여부 | 형식_코드 | 기타_형식 | 용량_인용 | 용량_루베
83312 | 11200-4901 | 11200-6218 | 1 | 부패탱크방법 | <NA> | 0.0 | 0.0
6636 | 11110-14398 | 11110-15489 | 1 | <NA> | <NA> | 0.0 | 0.0
57529 | 11170-226 | 11170-429 | 1 | <NA> | <NA> | 0.0 | 0.0
22017 | 11110-4351 | 11110-5436 | 1 | 부패탱크방법 | <NA> | 15.0 | 0.0
56514 | 11170-2150 | 11170-2554 | 1 | <NA> | <NA> | 0.0 | 0.0
37025 | 11140-21529 | 11140-23211 | 1 | 부패탱크방법 | 부패탱크방법 | 12.0 | 0.0
47546 | 11170-12155 | 11170-12578 | 1 | 부패탱크방법 | PE부패탱크식 | 3.0 | 1.25
14019 | 11110-22071 | 11110-23184 | 1 | <NA> | <NA> | 0.0 | 0.0
12120 | 11110-20104 | 11110-21206 | 1 | <NA> | <NA> | 0.0 | 0.0
87699 | 11215-100009169 | 11215-100184767 | 1 | 부패탱크방법 | 부패탱크방법 | 30.0 | 0.0