Overview

Dataset statistics

Number of variables: 5
Number of observations: 519
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 21.4 KiB
Average record size in memory: 42.3 B

Variable types

Numeric: 2
Text: 2
Boolean: 1

Dataset

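Description: Data on the list of water quality test items of the Waterworks Headquarters (상수도사업본부), providing information such as test item name, item definition, item source, and whether the item is a major pollutant.
URL: https://www.data.go.kr/data/15118749/fileData.do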

Alerts

사용여부 is highly imbalanced (86.3%) [Imbalance]
검사항목일련번호 has unique values [Unique]
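
The 86.3% in the imbalance alert is consistent with a score of one minus the normalized Shannon entropy of the class frequencies. The report does not state its formula, so that definition is an assumption here, and `imbalance_score` is a hypothetical helper name. The sketch applies it to the 509 True / 10 False split reported for 사용여부.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    Assumed definition: 0 means perfectly balanced classes,
    values near 1 mean a single class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Class counts reported for 사용여부: 509 True, 10 False
score = imbalance_score([509, 10])
print(f"{score:.1%}")
```

Under this definition, an evenly split column scores 0, so the score reads as distance from balance rather than as the share of the majority class (which is 98.1% here).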

Reproduction

Analysis started: 2023-12-12 02:09:46.786960
Analysis finished: 2023-12-12 02:09:47.758118
Duration: 0.97 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

검사항목일련번호
Real number (ℝ)

UNIQUE 

Distinct: 519
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 278.5896
Minimum: 1
Maximum: 550
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 32.8
Q1: 140.5
Median: 280
Q3: 414.5
95-th percentile: 523.2
Maximum: 550
Range: 549
Interquartile range (IQR): 274
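
The range and IQR follow directly from the quantiles listed above: range = maximum - minimum and IQR = Q3 - Q1. The sketch below checks both and shows a linear-interpolation percentile on a toy list. The interpolation method the report uses is not stated, so linear interpolation (the NumPy and pandas default) is an assumption, and `percentile` is a hypothetical helper.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbours (q in [0, 100])."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100      # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


# Figures reported for 검사항목일련번호
minimum, q1, q3, maximum = 1, 140.5, 414.5, 550
print("range:", maximum - minimum)
print("IQR:", q3 - q1)

# Toy input only: a short list of IDs, not the full 519-row column
toy = [1, 2, 3, 4, 6, 9, 10, 12, 13, 14]
print("toy Q1:", percentile(toy, 25), "toy Q3:", percentile(toy, 75))
```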

Descriptive statistics

Standard deviation: 158.05894
Coefficient of variation (CV): 0.56735407
Kurtosis: -1.1952264
Mean: 278.5896
Median Absolute Deviation (MAD): 137
Skewness: -0.011380216
Sum: 144588
Variance: 24982.629
Monotonicity: Not monotonic
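
Several of these figures are tied together by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below recomputes them from the reported sum, observation count, and standard deviation as a consistency check. It uses only numbers printed in this report and does not recompute anything from the raw data.

```python
# Values copied from the report for 검사항목일련번호
n = 519            # number of observations
total = 144588     # Sum
std = 158.05894    # Standard deviation

mean = total / n           # should agree with the reported Mean
cv = std / mean            # coefficient of variation
variance = std ** 2        # should agree with the reported Variance

print(f"mean={mean:.4f} cv={cv:.6f} variance={variance:.2f}")
```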
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
201 | 1 | 0.2%
398 | 1 | 0.2%
106 | 1 | 0.2%
105 | 1 | 0.2%
412 | 1 | 0.2%
451 | 1 | 0.2%
104 | 1 | 0.2%
385 | 1 | 0.2%
332 | 1 | 0.2%
103 | 1 | 0.2%
Other values (509) | 509 | 98.1%

Value | Count | Frequency (%)
1 | 1 | 0.2%
2 | 1 | 0.2%
3 | 1 | 0.2%
4 | 1 | 0.2%
6 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
12 | 1 | 0.2%
13 | 1 | 0.2%
14 | 1 | 0.2%

Value | Count | Frequency (%)
550 | 1 | 0.2%
549 | 1 | 0.2%
548 | 1 | 0.2%
547 | 1 | 0.2%
546 | 1 | 0.2%
545 | 1 | 0.2%
544 | 1 | 0.2%
543 | 1 | 0.2%
542 | 1 | 0.2%
541 | 1 | 0.2%

검사항목명
Text

Distinct: 508
Distinct (%): 97.9%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Length

Max length: 23
Median length: 17
Mean length: 6.5491329
Min length: 1

Characters and Unicode

Total characters: 3399
Distinct characters: 336
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 5
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 499
Unique (%): 96.1%

Sample

1st row: 1,1-디클로로에탄
2nd row: 1,1-디클로로프로펜
3rd row: 1,1,1-트리클로로아세톤
4th row: 1,1,1,2-테트라클로로에탄
5th row: 1,1,2-트리클로로에탄
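
The category counts in this section come from the Unicode general category of each character. As a rough illustration, the sketch below tallies categories for the five sample rows with Python's standard `unicodedata` module. The mapping from two-letter codes to the long names used in this report is written out by hand and `category_counts` is a hypothetical helper. Scripts and blocks are not exposed by `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in the report
CATEGORY_NAMES = {
    "Lo": "Other Letter", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number", "No": "Other Number",
    "Pd": "Dash Punctuation", "Po": "Other Punctuation",
    "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Zs": "Space Separator", "Sm": "Math Symbol",
}


def category_counts(strings):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in strings:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows of 검사항목명 shown above
samples = [
    "1,1-디클로로에탄",
    "1,1-디클로로프로펜",
    "1,1,1-트리클로로아세톤",
    "1,1,1,2-테트라클로로에탄",
    "1,1,2-트리클로로에탄",
]
for name, count in category_counts(samples).most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter" because Hangul has no upper or lower case, which is why that category dominates the counts for this column.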
Value | Count | Frequency (%)
pacsⅱ | 3 | 0.6%
불활성화비 | 3 | 0.6%
지아디아 | 3 | 0.6%
대장균 | 3 | 0.6%
탁도 | 3 | 0.6%
아조벤젠 | 2 | 0.4%
잔류염소 | 2 | 0.4%
과산화수소 | 2 | 0.4%
브로모클로로아이요드메탄 | 2 | 0.4%
크립토스포리디움 | 2 | 0.4%
Other values (502) | 505 | 95.3%

Most occurring characters

ValueCountFrequency (%)
217
 
6.4%
- 100
 
2.9%
93
 
2.7%
81
 
2.4%
74
 
2.2%
70
 
2.1%
68
 
2.0%
) 62
 
1.8%
( 62
 
1.8%
1 60
 
1.8%
Other values (326) 2512
73.9%

Most occurring categories

ValueCountFrequency (%)
Other Letter 2498
73.5%
Uppercase Letter 226
 
6.6%
Lowercase Letter 203
 
6.0%
Decimal Number 156
 
4.6%
Dash Punctuation 100
 
2.9%
Other Punctuation 73
 
2.1%
Close Punctuation 62
 
1.8%
Open Punctuation 62
 
1.8%
Space Separator 13
 
0.4%
Letter Number 3
 
0.1%
Other values (2) 3
 
0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
217
 
8.7%
93
 
3.7%
81
 
3.2%
74
 
3.0%
70
 
2.8%
68
 
2.7%
56
 
2.2%
48
 
1.9%
46
 
1.8%
39
 
1.6%
Other values (263) 1706
68.3%
Lowercase Letter
ValueCountFrequency (%)
n 34
16.7%
e 22
10.8%
a 22
10.8%
c 16
 
7.9%
r 13
 
6.4%
o 12
 
5.9%
h 10
 
4.9%
l 9
 
4.4%
i 9
 
4.4%
s 8
 
3.9%
Other values (13) 48
23.6%
Uppercase Letter
ValueCountFrequency (%)
C 32
14.2%
A 31
13.7%
P 24
10.6%
S 23
10.2%
B 20
8.8%
T 17
7.5%
D 15
6.6%
N 12
 
5.3%
M 11
 
4.9%
I 7
 
3.1%
Other values (10) 34
15.0%
Decimal Number
ValueCountFrequency (%)
1 60
38.5%
2 39
25.0%
4 26
16.7%
3 17
 
10.9%
6 7
 
4.5%
0 4
 
2.6%
5 2
 
1.3%
7 1
 
0.6%
Other Punctuation
ValueCountFrequency (%)
, 45
61.6%
. 26
35.6%
/ 1
 
1.4%
% 1
 
1.4%
Other Number
ValueCountFrequency (%)
1
50.0%
1
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 100
100.0%
Close Punctuation
ValueCountFrequency (%)
) 62
100.0%
Open Punctuation
ValueCountFrequency (%)
( 62
100.0%
Space Separator
ValueCountFrequency (%)
13
100.0%
Letter Number
ValueCountFrequency (%)
3
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 2498
73.5%
Common 470
 
13.8%
Latin 431
 
12.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
217
 
8.7%
93
 
3.7%
81
 
3.2%
74
 
3.0%
70
 
2.8%
68
 
2.7%
56
 
2.2%
48
 
1.9%
46
 
1.8%
39
 
1.6%
Other values (263) 1706
68.3%
Latin
ValueCountFrequency (%)
n 34
 
7.9%
C 32
 
7.4%
A 31
 
7.2%
P 24
 
5.6%
S 23
 
5.3%
e 22
 
5.1%
a 22
 
5.1%
B 20
 
4.6%
T 17
 
3.9%
c 16
 
3.7%
Other values (33) 190
44.1%
Common
ValueCountFrequency (%)
- 100
21.3%
) 62
13.2%
( 62
13.2%
1 60
12.8%
, 45
9.6%
2 39
 
8.3%
. 26
 
5.5%
4 26
 
5.5%
3 17
 
3.6%
13
 
2.8%
Other values (10) 20
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
Hangul 2498
73.5%
ASCII 895
 
26.3%
Number Forms 3
 
0.1%
None 2
 
0.1%
Letterlike Symbols 1
 
< 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
217
 
8.7%
93
 
3.7%
81
 
3.2%
74
 
3.0%
70
 
2.8%
68
 
2.7%
56
 
2.2%
48
 
1.9%
46
 
1.8%
39
 
1.6%
Other values (263) 1706
68.3%
ASCII
ValueCountFrequency (%)
- 100
 
11.2%
) 62
 
6.9%
( 62
 
6.9%
1 60
 
6.7%
, 45
 
5.0%
2 39
 
4.4%
n 34
 
3.8%
C 32
 
3.6%
A 31
 
3.5%
. 26
 
2.9%
Other values (49) 404
45.1%
Number Forms
ValueCountFrequency (%)
3
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1
100.0%
None
ValueCountFrequency (%)
1
50.0%
1
50.0%

사용여부
Boolean

IMBALANCE 

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 651.0 B
Value | Count | Frequency (%)
True | 509 | 98.1%
False | 10 | 1.9%

수정자일련번호
Real number (ℝ)

Distinct: 13
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1015.6474
Minimum: 0
Maximum: 9627
Zeros: 1
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 4.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 9332
Maximum: 9627
Range: 9627
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2600.7568
Coefficient of variation (CV): 2.5606887
Kurtosis: 5.6751341
Mean: 1015.6474
Median Absolute Deviation (MAD): 0
Skewness: 2.6468953
Sum: 527121
Variance: 6763935.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
Value | Count | Frequency (%)
1 | 426 | 82.1%
2510 | 32 | 6.2%
9627 | 20 | 3.9%
9332 | 16 | 3.1%
4518 | 11 | 2.1%
4534 | 4 | 0.8%
9294 | 3 | 0.6%
748 | 2 | 0.4%
716 | 1 | 0.2%
0 | 1 | 0.2%
Other values (3) | 3 | 0.6%

Value | Count | Frequency (%)
0 | 1 | 0.2%
1 | 426 | 82.1%
548 | 1 | 0.2%
716 | 1 | 0.2%
748 | 2 | 0.4%
1539 | 1 | 0.2%
2510 | 32 | 6.2%
4508 | 1 | 0.2%
4518 | 11 | 2.1%
4534 | 4 | 0.8%

Value | Count | Frequency (%)
9627 | 20 | 3.9%
9332 | 16 | 3.1%
9294 | 3 | 0.6%
4534 | 4 | 0.8%
4518 | 11 | 2.1%
4508 | 1 | 0.2%
2510 | 32 | 6.2%
1539 | 1 | 0.2%
748 | 2 | 0.4%
716 | 1 | 0.2%

수정시간
Text

Distinct: 144
Distinct (%): 27.7%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Length

Max length: 22
Median length: 21
Mean length: 21.198459
Min length: 21

Characters and Unicode

Total characters: 11002
Distinct characters: 16
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 131
Unique (%): 25.2%

Sample

1st row: 2011-07-20 오후 4:06:59
2nd row: 2011-05-30 오후 5:44:11
3rd row: 2011-07-20 오후 4:06:59
4th row: 2011-07-20 오후 4:06:59
5th row: 2011-07-20 오후 4:06:59
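
These values are timestamps on a 12-hour clock with the Korean markers 오전 (AM) and 오후 (PM), which is likely why the column is profiled as Text rather than as a date. A minimal sketch of converting them to `datetime` objects follows. It assumes every value has the shape `YYYY-MM-DD 오전|오후 H:MM:SS` seen in the sample rows, and `parse_korean_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def parse_korean_timestamp(text):
    """Parse 'YYYY-MM-DD 오전|오후 H:MM:SS' (12-hour clock) into a datetime."""
    date_part, meridiem, time_part = text.split()
    if meridiem not in ("오전", "오후"):
        raise ValueError(f"unexpected AM/PM marker: {meridiem!r}")
    parsed = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    # Map the 12-hour value onto 0-23: 12 AM becomes 0, 12 PM stays 12
    hour = parsed.hour % 12 + (12 if meridiem == "오후" else 0)
    return parsed.replace(hour=hour)


print(parse_korean_timestamp("2011-07-20 오후 4:06:59"))
print(parse_korean_timestamp("2011-05-30 오후 5:44:11"))
```

The AM/PM marker is handled by hand instead of through the `%p` format code because `%p` depends on the active locale.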
Value | Count | Frequency (%)
오후 | 415 | 26.7%
2011-07-20 | 178 | 11.4%
4:06:59 | 177 | 11.4%
2011-05-30 | 136 | 8.7%
5:44:11 | 136 | 8.7%
오전 | 104 | 6.7%
2011-06-23 | 23 | 1.5%
10:48:28 | 23 | 1.5%
2018-02-13 | 21 | 1.3%
2011-08-04 | 16 | 1.0%
Other values (184) | 328 | 21.1%

Most occurring characters

ValueCountFrequency (%)
0 1716
15.6%
1 1604
14.6%
- 1038
9.4%
1038
9.4%
: 1038
9.4%
2 1016
9.2%
4 603
 
5.5%
5 590
 
5.4%
519
 
4.7%
415
 
3.8%
Other values (6) 1425
13.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 6850
62.3%
Dash Punctuation 1038
 
9.4%
Space Separator 1038
 
9.4%
Other Punctuation 1038
 
9.4%
Other Letter 1038
 
9.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 1716
25.1%
1 1604
23.4%
2 1016
14.8%
4 603
 
8.8%
5 590
 
8.6%
3 334
 
4.9%
9 309
 
4.5%
6 275
 
4.0%
7 244
 
3.6%
8 159
 
2.3%
Other Letter
ValueCountFrequency (%)
519
50.0%
415
40.0%
104
 
10.0%
Dash Punctuation
ValueCountFrequency (%)
- 1038
100.0%
Space Separator
ValueCountFrequency (%)
1038
100.0%
Other Punctuation
ValueCountFrequency (%)
: 1038
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 9964
90.6%
Hangul 1038
 
9.4%

Most frequent character per script

Common
ValueCountFrequency (%)
0 1716
17.2%
1 1604
16.1%
- 1038
10.4%
1038
10.4%
: 1038
10.4%
2 1016
10.2%
4 603
 
6.1%
5 590
 
5.9%
3 334
 
3.4%
9 309
 
3.1%
Other values (3) 678
 
6.8%
Hangul
ValueCountFrequency (%)
519
50.0%
415
40.0%
104
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 9964
90.6%
Hangul 1038
 
9.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 1716
17.2%
1 1604
16.1%
- 1038
10.4%
1038
10.4%
: 1038
10.4%
2 1016
10.2%
4 603
 
6.1%
5 590
 
5.9%
3 334
 
3.4%
9 309
 
3.1%
Other values (3) 678
 
6.8%
Hangul
ValueCountFrequency (%)
519
50.0%
415
40.0%
104
 
10.0%

Interactions

[Figure: interaction plots, 4 panels]

Correlations

Variable | 검사항목일련번호 | 사용여부 | 수정자일련번호
검사항목일련번호 | 1.000 | 0.451 | 0.774
사용여부 | 0.451 | 1.000 | 0.228
수정자일련번호 | 0.774 | 0.228 | 1.000

Variable | 검사항목일련번호 | 수정자일련번호 | 사용여부
검사항목일련번호 | 1.000 | 0.491 | 0.344
수정자일련번호 | 0.491 | 1.000 | 0.277
사용여부 | 0.344 | 0.277 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | 검사항목일련번호 | 검사항목명 | 사용여부 | 수정자일련번호 | 수정시간
0 | 201 | 1,1-디클로로에탄 | Y | 1 | 2011-07-20 오후 4:06:59
1 | 315 | 1,1-디클로로프로펜 | Y | 1 | 2011-05-30 오후 5:44:11
2 | 206 | 1,1,1-트리클로로아세톤 | Y | 1 | 2011-07-20 오후 4:06:59
3 | 208 | 1,1,1,2-테트라클로로에탄 | Y | 1 | 2011-07-20 오후 4:06:59
4 | 207 | 1,1,2-트리클로로에탄 | Y | 1 | 2011-07-20 오후 4:06:59
5 | 313 | 1,1,2,2-테트라클로로에탄 | Y | 1 | 2011-05-30 오후 5:44:11
6 | 319 | 1,2-디클로로프로판 | Y | 1 | 2011-05-30 오후 5:44:11
7 | 269 | 1,2,3-트리클로로벤젠 | Y | 1 | 2011-05-30 오후 5:44:11
8 | 320 | 1,2,3-트리클로로프로판 | Y | 1 | 2011-05-30 오후 5:44:11
9 | 314 | 1,2,4-트리메틸벤젠 | Y | 1 | 2011-05-30 오후 5:44:11

Index | 검사항목일련번호 | 검사항목명 | 사용여부 | 수정자일련번호 | 수정시간
509 | 541 | MCPA | Y | 9332 | 2022-03-06 오후 2:41:10
510 | 542 | 아세나프텐 | Y | 9332 | 2022-05-10 오후 5:25:49
511 | 543 | 미세플라스틱 | Y | 9627 | 2023-01-02 오후 1:58:11
512 | 544 | DCAcAm | Y | 9627 | 2023-01-02 오후 2:29:46
513 | 545 | BCAcAm | Y | 9627 | 2023-01-02 오후 2:30:23
514 | 546 | TCAcAm | Y | 9627 | 2023-01-02 오후 2:30:50
515 | 547 | DBAcAm | Y | 9627 | 2023-01-02 오후 2:31:11
516 | 548 | BDCAcAm | Y | 9627 | 2023-01-02 오후 2:31:35
517 | 549 | DBCAcAm | Y | 9627 | 2023-01-02 오후 2:31:55
518 | 550 | TBAcAm | Y | 9627 | 2023-01-02 오후 2:32:15