Overview

Dataset statistics

Number of variables: 5
Number of observations: 54
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.2 KiB
Average record size in memory: 42.4 B

Variable types

DateTime: 1
Categorical: 3
Text: 1

Alerts

측정일 has constant value "2020-10-05" (Constant)
측정지점 is highly overall correlated with 시설명 and 1 other field (High correlation)
시설명 is highly overall correlated with 측정지점 and 1 other field (High correlation)
수치 is highly overall correlated with 시설명 and 1 other field (High correlation)
측정지점 is highly imbalanced (86.7%) (Imbalance)
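
The 86.7% imbalance figure for 측정지점 can be reproduced from the value counts given in this report (정수지: 53, 침전지: 1). The sketch below assumes the score is one minus the Shannon entropy of the class distribution, normalised by its maximum (log2 of the number of classes). That assumption matches the reported figure, but the function name `imbalance_score` is illustrative and not taken from ydata-profiling.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max for a list of class counts (0 means perfectly balanced)."""
    n_classes = len(counts)
    if n_classes < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    # Shannon entropy in bits; empty classes contribute nothing.
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Assumption: normalise by the maximum entropy log2(n_classes).
    return 1.0 - entropy / math.log2(n_classes)

# Value counts for 측정지점: 정수지 = 53, 침전지 = 1.
score = imbalance_score([53, 1])
print(f"{score:.1%}")
```

A score near 1 means almost every row falls into a single class, which is why the variable is flagged.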

Reproduction

Analysis started: 2023-12-10 10:18:57.494107
Analysis finished: 2023-12-10 10:18:58.067814
Duration: 0.57 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

측정일 (measurement date)
Date

CONSTANT 

Distinct: 1
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
Minimum: 2020-10-05 00:00:00
Maximum: 2020-10-05 00:00:00
Histogram with fixed size bins (bins=1)

시설명 (facility name)
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
곤명정수장: 35
금당정수장: 18
고령정수장: 1

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Unique

Unique: 1
Unique (%): 1.9%

Sample

1st row: 금당정수장
2nd row: 금당정수장
3rd row: 곤명정수장
4th row: 금당정수장
5th row: 금당정수장

Common Values

Value        Count  Frequency (%)
곤명정수장   35     64.8%
금당정수장   18     33.3%
고령정수장   1      1.9%
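
The Distinct and Unique figures for 시설명 follow directly from the counts in the Common Values table: Distinct counts the different values, while Unique counts the values that occur exactly once. A minimal sketch, with the counts copied from that table and variable names chosen purely for illustration:

```python
from collections import Counter

# Value counts for 시설명, copied from the Common Values table.
counts = Counter({"곤명정수장": 35, "금당정수장": 18, "고령정수장": 1})

n_rows = sum(counts.values())                          # total observations
n_distinct = len(counts)                               # number of different values
n_unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

print(f"Distinct: {n_distinct} ({n_distinct / n_rows:.1%})")
print(f"Unique: {n_unique} ({n_unique / n_rows:.1%})")
```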

Length

Histogram of lengths of the category

측정지점 (measurement point)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 2
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
정수지: 53
침전지: 1

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 1
Unique (%): 1.9%

Sample

1st row: 정수지
2nd row: 정수지
3rd row: 정수지
4th row: 정수지
5th row: 정수지

Common Values

Value    Count  Frequency (%)
정수지   53     98.1%
침전지   1      1.9%

Length

Histogram of lengths of the category

검사항목 (test item)
Text

Distinct: 43
Distinct (%): 79.6%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Length

Max length: 12
Median length: 10
Mean length: 4.6666667
Min length: 1

Characters and Unicode

Total characters: 252
Distinct characters: 103
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
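
As an illustration of those character properties, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. It uses only the five sample values listed for this variable, so the totals cover that sample rather than the full column, and the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# The five sample values of the text variable shown in this report.
samples = ["브롬산염", "셀레늄", "파라티온", "브로모디클로로메탄", "세제(음이온계면활성제)"]

# unicodedata.category returns a two-letter general category code,
# e.g. "Lo" (Other Letter), "Ps" (Open Punctuation), "Pe" (Close Punctuation).
category_counts = Counter(
    unicodedata.category(ch) for text in samples for ch in text
)

for code, count in category_counts.most_common():
    print(code, count)
```

Hangul syllables fall under "Lo" (Other Letter), which is why that category dominates the tables in this section.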

Unique

Unique: 32
Unique (%): 59.3%

Sample

1st row: 브롬산염
2nd row: 셀레늄
3rd row: 파라티온
4th row: 브로모디클로로메탄
5th row: 세제(음이온계면활성제)

Value                   Count  Frequency (%)
사염화탄소              2      3.7%
과망간산칼륨소비량      2      3.7%
색도                    2      3.7%
셀레늄                  2      3.7%
경도                    2      3.7%
대장균                  2      3.7%
벤젠                    2      3.7%
ph                      2      3.7%
브로모디클로로메탄      2      3.7%
세제(음이온계면활성제   2      3.7%
Other values (33)       34     63.0%

Most occurring characters

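Value              Count  Frequency (%)
[missing]          20     7.9%
[missing]          8      3.2%
[missing]          8      3.2%
[missing]          6      2.4%
[missing]          6      2.4%
[missing]          6      2.4%
[missing]          6      2.4%
[missing]          5      2.0%
[missing]          5      2.0%
[missing]          5      2.0%
Other values (93)  177    70.2%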

Most occurring categories

Value              Count  Frequency (%)
Other Letter       231    91.7%
Lowercase Letter   5      2.0%
Decimal Number     4      1.6%
Dash Punctuation   3      1.2%
Uppercase Letter   3      1.2%
Other Punctuation  2      0.8%
Open Punctuation   2      0.8%
Close Punctuation  2      0.8%

Most frequent character per category

Other Letter
Value              Count  Frequency (%)
[missing]          20     8.7%
[missing]          8      3.5%
[missing]          8      3.5%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          5      2.2%
[missing]          5      2.2%
[missing]          5      2.2%
Other values (81)  156    67.5%

Lowercase Letter
Value  Count  Frequency (%)
p      2      40.0%
h      1      20.0%
l      1      20.0%
a      1      20.0%

Decimal Number
Value  Count  Frequency (%)
1      3      75.0%
4      1      25.0%

Uppercase Letter
Value  Count  Frequency (%)
H      2      66.7%
C      1      33.3%

Dash Punctuation
Value  Count  Frequency (%)
-      3      100.0%

Other Punctuation
Value  Count  Frequency (%)
.      2      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      2      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  231    91.7%
Common  13     5.2%
Latin   8      3.2%

Most frequent character per script

Hangul
Value              Count  Frequency (%)
[missing]          20     8.7%
[missing]          8      3.5%
[missing]          8      3.5%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          5      2.2%
[missing]          5      2.2%
[missing]          5      2.2%
Other values (81)  156    67.5%

Common
Value  Count  Frequency (%)
-      3      23.1%
1      3      23.1%
.      2      15.4%
(      2      15.4%
)      2      15.4%
4      1      7.7%

Latin
Value  Count  Frequency (%)
p      2      25.0%
H      2      25.0%
C      1      12.5%
h      1      12.5%
l      1      12.5%
a      1      12.5%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  231    91.7%
ASCII   21     8.3%

Most frequent character per block

Hangul
Value              Count  Frequency (%)
[missing]          20     8.7%
[missing]          8      3.5%
[missing]          8      3.5%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          6      2.6%
[missing]          5      2.2%
[missing]          5      2.2%
[missing]          5      2.2%
Other values (81)  156    67.5%

ASCII
Value             Count  Frequency (%)
-                 3      14.3%
1                 3      14.3%
.                 2      9.5%
(                 2      9.5%
)                 2      9.5%
p                 2      9.5%
H                 2      9.5%
C                 1      4.8%
h                 1      4.8%
l                 1      4.8%
Other values (2)  2      9.5%

수치 (measured value; 불검출 means "not detected")
Categorical

HIGH CORRELATION 

Distinct: 20
Distinct (%): 37.0%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
불검출: 35
0.0016: 1
0.007: 1
1.0: 1
1.9: 1
Other values (15): 15

Length

Max length: 6
Median length: 3
Mean length: 3.2037037
Min length: 1

Unique

Unique: 19
Unique (%): 35.2%

Sample

1st row: 0.0016
2nd row: 불검출
3rd row: 불검출
4th row: 0.007
5th row: 불검출

Common Values

Value              Count  Frequency (%)
불검출             35     64.8%
0.0016             1      1.9%
0.007              1      1.9%
1.0                1      1.9%
1.9                1      1.9%
96                 1      1.9%
4                  1      1.9%
7                  1      1.9%
0.003              1      1.9%
0.006              1      1.9%
Other values (10)  10     18.5%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
불검출             35     64.8%
0.0016             1      1.9%
0.7                1      1.9%
0.60               1      1.9%
7.4                1      1.9%
39                 1      1.9%
0.0023             1      1.9%
6.7                1      1.9%
0.012              1      1.9%
0.010              1      1.9%
Other values (10)  10     18.5%

Correlations

            시설명   측정지점  검사항목  수치
시설명      1.000    1.000     0.000     0.856
측정지점    1.000    1.000     1.000     1.000
검사항목    0.000    1.000     1.000     0.678
수치        0.856    1.000     0.678     1.000

            측정지점  수치    시설명
측정지점    1.000     0.809   0.990
수치        0.809     1.000   0.567
시설명      0.990     0.567   1.000

            시설명   측정지점  수치
시설명      1.000    0.990     0.567
측정지점    0.990    1.000     0.809
수치        0.567    0.809     1.000
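
The report does not name the association measure behind these matrices. Cramér's V is one common choice for pairs of categorical variables, and the sketch below computes its plain, uncorrected form for 시설명 against 측정지점. The contingency table is derived from the value counts in this report: the single 침전지 row belongs to 고령정수장, so the other 53 rows are all 정수지. The helper name `cramers_v` is illustrative, and any bias correction applied by the profiling tool is not modelled here.

```python
import math

def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Rows: 곤명정수장, 금당정수장, 고령정수장; columns: 정수지, 침전지.
table = [[35, 0], [18, 0], [0, 1]]
v = cramers_v(table)
print(f"{v:.3f}")
```

Because 측정지점 is fully determined by 시설명 in this data, the uncorrected statistic reaches its maximum of 1, in line with the 1.000 entry in the first matrix.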

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

    측정일      시설명      측정지점  검사항목                수치
0   2020-10-05  금당정수장  정수지    브롬산염                0.0016
1   2020-10-05  금당정수장  정수지    셀레늄                  불검출
2   2020-10-05  곤명정수장  정수지    파라티온                불검출
3   2020-10-05  금당정수장  정수지    브로모디클로로메탄      0.007
4   2020-10-05  금당정수장  정수지    세제(음이온계면활성제)  불검출
5   2020-10-05  곤명정수장  정수지    세제(음이온계면활성제)  불검출
6   2020-10-05  곤명정수장  정수지    질산성질소              1.0
7   2020-10-05  곤명정수장  정수지    트리클로로에틸렌        불검출
8   2020-10-05  금당정수장  정수지    과망간산칼륨소비량      1.9
9   2020-10-05  금당정수장  정수지    불소                    불검출

    측정일      시설명      측정지점  검사항목                수치
44  2020-10-05  금당정수장  정수지    디클로로아세토니트릴    0.0023
45  2020-10-05  금당정수장  정수지    벤젠                    불검출
46  2020-10-05  곤명정수장  정수지    경도                    39
47  2020-10-05  곤명정수장  정수지    셀레늄                  불검출
48  2020-10-05  곤명정수장  정수지    pH                      7.4
49  2020-10-05  곤명정수장  정수지    1.4-다이옥산            불검출
50  2020-10-05  고령정수장  침전지    Chl-a                   0.60
51  2020-10-05  곤명정수장  정수지    과망간산칼륨소비량      0.7
52  2020-10-05  곤명정수장  정수지    [missing]               불검출
53  2020-10-05  곤명정수장  정수지    냄새                    없음