Overview

Dataset statistics

Number of variables: 7
Number of observations: 300
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 16.8 KiB
Average record size in memory: 57.4 B

Variable types

Numeric: 1
Text: 1
Categorical: 5

Dataset

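Description: Status of the 300 items that K-water (Korea Water Resources Corporation, 한국수자원공사) is monitoring in the raw and treated water of its water purification plants. Fields provided: 번호 (number), 항목 (item), 단위 (unit), 물질구분 (substance category), 대상시료 (target sample), 검사주기 (inspection frequency), 구분 (classification), etc.
URL: https://www.data.go.kr/data/15065464/fileData.do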

Alerts

번호 is highly overall correlated with 물질구분 and 3 other fields [High correlation]
물질구분 is highly overall correlated with 번호 and 1 other field [High correlation]
대상시료 is highly overall correlated with 번호 and 3 other fields [High correlation]
검사주기 is highly overall correlated with 번호 and 2 other fields [High correlation]
구분 is highly overall correlated with 번호 and 2 other fields [High correlation]
단위 is highly imbalanced (75.2%) [Imbalance]
번호 has unique values [Unique]

Reproduction

Analysis started: 2023-12-12 12:32:37.232073
Analysis finished: 2023-12-12 12:32:38.000400
Duration: 0.77 seconds
Software version: ydata-profiling v4.5.1
Configuration file: config.json

Variables

번호 (number)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 300
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 150.5
Minimum: 1
Maximum: 300
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 15.95
Q1: 75.75
Median: 150.5
Q3: 225.25
95th percentile: 285.05
Maximum: 300
Range: 299
Interquartile range (IQR): 149.5
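The quantiles above are consistent with linear interpolation between order statistics. Here is a minimal sketch that assumes 번호 holds exactly the integers 1 to 300. The minimum, maximum, 300 distinct values and a sum of 45150 all fit that assumption. Under it, Python's standard `statistics.quantiles` with `method="inclusive"` should reproduce the figures:

```python
import statistics

# Assumption: 번호 holds exactly the integers 1..300 (one row per item).
numbers = list(range(1, 301))

# method="inclusive" treats the minimum and maximum as the 0th and 100th
# percentiles and interpolates linearly between neighbouring values.
vigintiles = statistics.quantiles(numbers, n=20, method="inclusive")
quartiles = statistics.quantiles(numbers, n=4, method="inclusive")

p5, p95 = vigintiles[0], vigintiles[-1]  # 5th and 95th percentiles
q1, median, q3 = quartiles                 # Q1, median, Q3
iqr = q3 - q1                              # interquartile range

print(f"5th={p5:.2f} Q1={q1} median={median} Q3={q3} 95th={p95:.2f} IQR={iqr}")
```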

Descriptive statistics

Standard deviation: 86.746758
Coefficient of variation (CV): 0.57639042
Kurtosis: -1.2
Mean: 150.5
Median Absolute Deviation (MAD): 75
Skewness: 0
Sum: 45150
Variance: 7525
Monotonicity: Strictly increasing
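The descriptive statistics can be checked the same way. This sketch again assumes 번호 is 1 to 300. It uses the sample (n − 1) variance. For skewness and kurtosis it uses the bias-corrected estimators that pandas' `Series.skew` and `Series.kurt` compute; that the report uses these same estimators is an assumption.

```python
import statistics

numbers = list(range(1, 301))  # assumption: 번호 = 1..300
n = len(numbers)

mean = statistics.fmean(numbers)
variance = statistics.variance(numbers)  # sample variance, n - 1 denominator
std = statistics.stdev(numbers)
cv = std / mean                           # coefficient of variation

# Median absolute deviation (unscaled): median of |x - median(x)|.
med = statistics.median(numbers)
mad = statistics.median(abs(x - med) for x in numbers)

# Central moments with an n denominator.
m2 = sum((x - mean) ** 2 for x in numbers) / n
m3 = sum((x - mean) ** 3 for x in numbers) / n
m4 = sum((x - mean) ** 4 for x in numbers) / n

# Bias-corrected sample skewness (G1) and excess kurtosis (G2).
g1 = m3 / m2 ** 1.5
g2 = m4 / m2 ** 2 - 3
skewness = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(f"sum={sum(numbers)} var={variance} std={std:.6f} CV={cv:.8f}")
print(f"MAD={mad} skew={skewness:.3f} kurtosis={kurtosis:.1f}")
```

For a gap-free uniform sequence the skewness is exactly zero, and the excess kurtosis is close to the −1.2 of a continuous uniform distribution.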
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
1 | 1 | 0.3%
208 | 1 | 0.3%
206 | 1 | 0.3%
205 | 1 | 0.3%
204 | 1 | 0.3%
203 | 1 | 0.3%
202 | 1 | 0.3%
201 | 1 | 0.3%
200 | 1 | 0.3%
199 | 1 | 0.3%
Other values (290) | 290 | 96.7%

Minimum 10 values

Value | Count | Frequency (%)
1 | 1 | 0.3%
2 | 1 | 0.3%
3 | 1 | 0.3%
4 | 1 | 0.3%
5 | 1 | 0.3%
6 | 1 | 0.3%
7 | 1 | 0.3%
8 | 1 | 0.3%
9 | 1 | 0.3%
10 | 1 | 0.3%

Maximum 10 values

Value | Count | Frequency (%)
300 | 1 | 0.3%
299 | 1 | 0.3%
298 | 1 | 0.3%
297 | 1 | 0.3%
296 | 1 | 0.3%
295 | 1 | 0.3%
294 | 1 | 0.3%
293 | 1 | 0.3%
292 | 1 | 0.3%
291 | 1 | 0.3%

항목 (item)
Text

Distinct: 299
Distinct (%): 99.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 43
Median length: 34
Mean length: 14.123333
Min length: 2

Characters and Unicode

Total characters: 4237
Distinct characters: 66
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
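Here is a minimal sketch of the category breakdown. Python's standard `unicodedata.category` returns the two-letter general category of a code point. The mapping to the long names used in this report is written out by hand. The standard library does not expose script or block properties, so those two breakdowns are not covered. The example uses the first sample value of 항목:

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Nl": "Letter Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Sm": "Math Symbol",
    "So": "Other Symbol",
}


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category, using long names."""
    return Counter(
        CATEGORY_NAMES.get(code, code)
        for code in (unicodedata.category(ch) for ch in text)
    )


counts = category_counts("Total Colony Counts(35℃)")
for name, count in counts.most_common():
    print(name, count)
```

Summing these per-value counters over all 300 rows would give a table shaped like "Most occurring categories" below. The ℃ sign (U+2103) falls under Other Symbol.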

Unique

Unique: 298
Unique (%): 99.3%

Sample

1st row: Total Colony Counts(35℃)
2nd row: Total coliforms
3rd row: Fecal coliforms
4th row: Escherichia coli
5th row: Lead

Words

Value | Count | Frequency (%)
acid | 10 | 2.6%
total | 5 | 1.3%
aminocarb | 2 | 0.5%
fecal | 2 | 0.5%
vinyl | 2 | 0.5%
colony | 2 | 0.5%
chloride | 2 | 0.5%
sulfonate | 2 | 0.5%
potassium | 2 | 0.5%
sodium | 2 | 0.5%
Other values (346) | 353 | 91.9%
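The word table can be approximated as follows. The tokenization here is an assumption: lower-case, split on whitespace, then strip leading and trailing ASCII punctuation. It is chosen because it matches tokens such as `k-water` and `수질기준항목(지하수수원` in the 구분 word table, but it may not be the report's exact rule. Applied to the five sample rows above:

```python
import string
from collections import Counter


def word_counts(values):
    """Assumed tokenizer: lower-case, split on whitespace, strip edge punctuation."""
    counter = Counter()
    for value in values:
        for token in value.lower().split():
            counter[token.strip(string.punctuation)] += 1
    return counter


first_rows = [
    "Total Colony Counts(35℃)",
    "Total coliforms",
    "Fecal coliforms",
    "Escherichia coli",
    "Lead",
]
counts = word_counts(first_rows)
print(counts.most_common(3))
```

Only the trailing `)` is stripped from `Counts(35℃)`. `℃` is not ASCII punctuation, so the token stays `counts(35℃`.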

Most occurring characters

Value | Count | Frequency (%)
o | 427 | 10.1%
e | 361 | 8.5%
i | 312 | 7.4%
r | 269 | 6.3%
n | 265 | 6.3%
a | 256 | 6.0%
l | 238 | 5.6%
c | 180 | 4.2%
t | 179 | 4.2%
h | 171 | 4.0%
Other values (56) | 1579 | 37.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3401 | 80.3%
Uppercase Letter | 423 | 10.0%
Decimal Number | 100 | 2.4%
Space Separator | 99 | 2.3%
Dash Punctuation | 72 | 1.7%
Close Punctuation | 46 | 1.1%
Open Punctuation | 46 | 1.1%
Other Punctuation | 45 | 1.1%
Other Symbol | 2 | < 0.1%
Letter Number | 2 | < 0.1%

Most frequent character per category

Uppercase Letter

Value | Count | Frequency (%)
C | 51 | 12.1%
D | 50 | 11.8%
T | 42 | 9.9%
P | 40 | 9.5%
M | 31 | 7.3%
A | 28 | 6.6%
B | 28 | 6.6%
N | 27 | 6.4%
S | 22 | 5.2%
F | 21 | 5.0%
Other values (14) | 83 | 19.6%

Lowercase Letter

Value | Count | Frequency (%)
o | 427 | 12.6%
e | 361 | 10.6%
i | 312 | 9.2%
r | 269 | 7.9%
n | 265 | 7.8%
a | 256 | 7.5%
l | 238 | 7.0%
c | 180 | 5.3%
t | 179 | 5.3%
h | 171 | 5.0%
Other values (13) | 743 | 21.8%

Decimal Number

Value | Count | Frequency (%)
1 | 38 | 38.0%
2 | 28 | 28.0%
4 | 15 | 15.0%
3 | 10 | 10.0%
6 | 4 | 4.0%
5 | 3 | 3.0%
0 | 1 | 1.0%
7 | 1 | 1.0%

Close Punctuation

Value | Count | Frequency (%)
) | 45 | 97.8%
] | 1 | 2.2%

Open Punctuation

Value | Count | Frequency (%)
( | 45 | 97.8%
[ | 1 | 2.2%

Letter Number

Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

Space Separator

Value | Count | Frequency (%)
(space) | 99 | 100.0%

Dash Punctuation

Value | Count | Frequency (%)
- | 72 | 100.0%

Other Punctuation

Value | Count | Frequency (%)
, | 45 | 100.0%

Other Symbol

Value | Count | Frequency (%)
℃ | 2 | 100.0%

Math Symbol

Value | Count | Frequency (%)
+ | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3826 | 90.3%
Common | 411 | 9.7%

Most frequent character per script

Latin

Value | Count | Frequency (%)
o | 427 | 11.2%
e | 361 | 9.4%
i | 312 | 8.2%
r | 269 | 7.0%
n | 265 | 6.9%
a | 256 | 6.7%
l | 238 | 6.2%
c | 180 | 4.7%
t | 179 | 4.7%
h | 171 | 4.5%
Other values (39) | 1168 | 30.5%

Common

Value | Count | Frequency (%)
(space) | 99 | 24.1%
- | 72 | 17.5%
) | 45 | 10.9%
( | 45 | 10.9%
, | 45 | 10.9%
1 | 38 | 9.2%
2 | 28 | 6.8%
4 | 15 | 3.6%
3 | 10 | 2.4%
6 | 4 | 1.0%
Other values (7) | 10 | 2.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4233 | 99.9%
Letterlike Symbols | 2 | < 0.1%
Number Forms | 2 | < 0.1%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
o | 427 | 10.1%
e | 361 | 8.5%
i | 312 | 7.4%
r | 269 | 6.4%
n | 265 | 6.3%
a | 256 | 6.0%
l | 238 | 5.6%
c | 180 | 4.3%
t | 179 | 4.2%
h | 171 | 4.0%
Other values (53) | 1575 | 37.2%

Letterlike Symbols

Value | Count | Frequency (%)
℃ | 2 | 100.0%

Number Forms

Value | Count | Frequency (%)
 | 1 | 50.0%
 | 1 | 50.0%

단위 (unit)
Categorical

IMBALANCE

Distinct: 18
Distinct (%): 6.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

mg/L | 261
Bq/L | 7
㎍/L | 7
/250 mL | 4
- | 3
Other values (13) | 18

Length

Max length: 8
Median length: 4
Mean length: 4.08
Min length: 1

Unique

Unique: 10
Unique (%): 3.3%

Sample

1st row: CFU/mL
2nd row: /100mL
3rd row: /100mL
4th row: /100mL
5th row: mg/L

Common Values

Value | Count | Frequency (%)
mg/L | 261 | 87.0%
Bq/L | 7 | 2.3%
㎍/L | 7 | 2.3%
/250 mL | 4 | 1.3%
- | 3 | 1.0%
/100 mL | 3 | 1.0%
/100mL | 3 | 1.0%
CFU/mL | 2 | 0.7%
/L | 1 | 0.3%
/50 mL | 1 | 0.3%
Other values (8) | 8 | 2.7%
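The 75.2% imbalance alert for 단위 can be reproduced from the value counts above. This assumes the score is one minus the Shannon entropy of the value distribution divided by the log of the number of distinct values. On that scale, 0 means perfectly balanced and values near 1 mean one value dominates. The eight "Other values" sum to 8, so each of them must occur exactly once.

```python
import math


def imbalance(counts):
    """1 - H / ln(k): Shannon entropy of the counts, normalised by ln(k)."""
    total = sum(counts)
    k = len(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# 단위 value counts from the table above; the 8 "other" units occur once each.
unit_counts = [261, 7, 7, 4, 3, 3, 3, 2, 1, 1] + [1] * 8
score = imbalance(unit_counts)
print(f"{score:.1%}")  # prints 75.2%
```

The normalisation by ln(k) keeps the score between 0 and 1 whatever the number of distinct values. That makes columns with different cardinalities comparable.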

Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
mg/l | 261 | 84.5%
ml | 8 | 2.6%
bq/l | 7 | 2.3%
㎍/l | 7 | 2.3%
250 | 4 | 1.3%
 | 4 | 1.3%
100 | 3 | 1.0%
100ml | 3 | 1.0%
l | 2 | 0.6%
cfu/ml | 2 | 0.6%
Other values (8) | 8 | 2.6%

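물질구분 (substance category)
Categorical

HIGH CORRELATION

Distinct: 16
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

농약 (pesticides) | 102
유해영향 유기물질 (harmful organic substances) | 49
소독부산물 (disinfection by-products) | 34
중금속 및 무기물 (heavy metals and inorganic substances) | 32
미생물 (microorganisms) | 19
Other values (11) | 64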

Length

Max length: 9
Median length: 8
Mean length: 4.8466667
Min length: 2

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 미생물
2nd row: 미생물
3rd row: 미생물
4th row: 미생물
5th row: 중금속 및 무기물

Common Values

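Value | Count | Frequency (%)
농약 (pesticides) | 102 | 34.0%
유해영향 유기물질 (harmful organic substances) | 49 | 16.3%
소독부산물 (disinfection by-products) | 34 | 11.3%
중금속 및 무기물 (heavy metals and inorganic substances) | 32 | 10.7%
미생물 (microorganisms) | 19 | 6.3%
의약물질 (pharmaceuticals) | 17 | 5.7%
심미적 영향물질 (substances affecting aesthetic quality) | 10 | 3.3%
방사성물질 (radioactive substances) | 8 | 2.7%
조류독소 (algal toxins) | 8 | 2.7%
이온 (ions) | 7 | 2.3%
Other values (6) | 14 | 4.7%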

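Length

[Figure: Histogram of lengths of the category]

Words

Value | Count | Frequency (%)
농약 (pesticides) | 102 | 24.1%
유해영향 (harmful effect) | 49 | 11.6%
유기물질 (organic substances) | 49 | 11.6%
소독부산물 (disinfection by-products) | 34 | 8.0%
중금속 (heavy metals) | 32 | 7.6%
및 (and) | 32 | 7.6%
무기물 (inorganic substances) | 32 | 7.6%
미생물 (microorganisms) | 19 | 4.5%
의약물질 (pharmaceuticals) | 17 | 4.0%
영향물질 (affecting substances) | 10 | 2.4%
Other values (10) | 47 | 11.1%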

대상시료 (target sample)
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

원수, 정수 (raw water, treated water) | 189
정수 (treated water) | 111

Length

Max length: 6
Median length: 6
Mean length: 4.52
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정수
2nd row: 정수
3rd row: 정수
4th row: 정수
5th row: 정수

Common Values

Value | Count | Frequency (%)
원수, 정수 (raw water, treated water) | 189 | 63.0%
정수 (treated water) | 111 | 37.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value | Count | Frequency (%)
정수 (treated water) | 300 | 61.3%
원수 (raw water) | 189 | 38.7%
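The word table for 대상시료 counts each part of the multi-valued entry 원수, 정수 separately. That is why 정수 totals 300 (189 + 111) and the percentages are taken over 489 tokens. A minimal sketch of that split, using the value counts from the Common Values table and splitting on the comma:

```python
from collections import Counter

# 대상시료 value counts from the Common Values table above.
target_sample_counts = {"원수, 정수": 189, "정수": 111}

# Split multi-valued entries on the comma and credit each part with the row count.
token_counts = Counter()
for value, count in target_sample_counts.items():
    for token in value.split(","):
        token_counts[token.strip()] += count

total = sum(token_counts.values())
shares = {token: round(100 * c / total, 1) for token, c in token_counts.items()}
print(token_counts, total, shares)
```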

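검사주기 (inspection frequency)
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

년 1회 (once a year) | 222
월 1회 (once a month) | 62
분기 1회 (once a quarter) | 15
반기 1회 (once every six months) | 1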

Length

Max length: 5
Median length: 4
Mean length: 4.0533333
Min length: 4

Unique

Unique: 1
Unique (%): 0.3%

Sample

1st row: 월 1회
2nd row: 월 1회
3rd row: 월 1회
4th row: 월 1회
5th row: 월 1회

Common Values

Value | Count | Frequency (%)
년 1회 (once a year) | 222 | 74.0%
월 1회 (once a month) | 62 | 20.7%
분기 1회 (once a quarter) | 15 | 5.0%
반기 1회 (once every six months) | 1 | 0.3%
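To put the 검사주기 labels on a common scale, the sketch below converts the value counts into item-inspections per year. It counts each item once and ignores whether the item is sampled in raw water, treated water, or both. The result is a rough illustrative figure, not a count of laboratory analyses.

```python
# Inspections per year implied by each 검사주기 label
# (monthly, quarterly, semiannual, annual).
PER_YEAR = {"월 1회": 12, "분기 1회": 4, "반기 1회": 2, "년 1회": 1}

# Value counts from the Common Values table for 검사주기.
frequency_counts = {"년 1회": 222, "월 1회": 62, "분기 1회": 15, "반기 1회": 1}

items = sum(frequency_counts.values())
item_inspections_per_year = sum(
    PER_YEAR[label] * count for label, count in frequency_counts.items()
)
print(items, item_inspections_per_year)  # prints 300 1028
```

The 62 monthly items alone account for 744 of the 1028 item-inspections.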

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value | Count | Frequency (%)
1회 (once) | 300 | 50.0%
년 (year) | 222 | 37.0%
월 (month) | 62 | 10.3%
분기 (quarter) | 15 | 2.5%
반기 (half-year) | 1 | 0.2%

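구분 (classification)
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

K-water 자체관리항목 (K-water in-house management items) | 208
먹는물 수질기준항목 (drinking water quality standard items) | 60
먹는물 수질감시항목 (drinking water quality monitoring items) | 30
먹는물 수질기준항목(지하수수원) (drinking water quality standard items, groundwater source) | 1
먹는물 수질감시항목(지하수수원) (drinking water quality monitoring items, groundwater source) | 1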

Length

Max length: 17
Median length: 14
Mean length: 12.82
Min length: 10

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: 먹는물 수질기준항목
2nd row: 먹는물 수질기준항목
3rd row: 먹는물 수질기준항목
4th row: 먹는물 수질기준항목
5th row: 먹는물 수질기준항목

Common Values

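Value | Count | Frequency (%)
K-water 자체관리항목 (K-water in-house management items) | 208 | 69.3%
먹는물 수질기준항목 (drinking water quality standard items) | 60 | 20.0%
먹는물 수질감시항목 (drinking water quality monitoring items) | 30 | 10.0%
먹는물 수질기준항목(지하수수원) (standard items, groundwater source) | 1 | 0.3%
먹는물 수질감시항목(지하수수원) (monitoring items, groundwater source) | 1 | 0.3%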

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Words

Value | Count | Frequency (%)
k-water | 208 | 34.7%
자체관리항목 (in-house management items) | 208 | 34.7%
먹는물 (drinking water) | 92 | 15.3%
수질기준항목 (water quality standard items) | 60 | 10.0%
수질감시항목 (water quality monitoring items) | 30 | 5.0%
수질기준항목(지하수수원 (standard items, groundwater source) | 1 | 0.2%
수질감시항목(지하수수원 (monitoring items, groundwater source) | 1 | 0.2%

Interactions


Correlations

 | 번호 | 단위 | 물질구분 | 대상시료 | 검사주기 | 구분
번호 | 1.000 | 0.580 | 0.855 | 0.969 | 0.833 | 0.950
단위 | 0.580 | 1.000 | 0.815 | 0.340 | 0.425 | 0.338
물질구분 | 0.855 | 0.815 | 1.000 | 0.842 | 0.664 | 0.633
대상시료 | 0.969 | 0.340 | 0.842 | 1.000 | 0.896 | 0.676
검사주기 | 0.833 | 0.425 | 0.664 | 0.896 | 1.000 | 0.899
구분 | 0.950 | 0.338 | 0.633 | 0.676 | 0.899 | 1.000

 | 물질구분 | 구분 | 검사주기 | 대상시료 | 단위
물질구분 | 1.000 | 0.379 | 0.362 | 0.678 | 0.417
구분 | 0.379 | 1.000 | 0.899 | 0.804 | 0.175
검사주기 | 0.362 | 0.899 | 1.000 | 0.706 | 0.239
대상시료 | 0.678 | 0.804 | 0.706 | 1.000 | 0.261
단위 | 0.417 | 0.175 | 0.239 | 0.261 | 1.000

 | 번호 | 단위 | 물질구분 | 대상시료 | 검사주기 | 구분
번호 | 1.000 | 0.261 | 0.551 | 0.836 | 0.664 | 0.689
단위 | 0.261 | 1.000 | 0.417 | 0.261 | 0.239 | 0.175
물질구분 | 0.551 | 0.417 | 1.000 | 0.678 | 0.362 | 0.379
대상시료 | 0.836 | 0.261 | 0.678 | 1.000 | 0.706 | 0.804
검사주기 | 0.664 | 0.239 | 0.362 | 0.706 | 1.000 | 0.899
구분 | 0.689 | 0.175 | 0.379 | 0.804 | 0.899 | 1.000
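The report does not state which association coefficient each matrix above uses. As an illustration only, one common measure for a pair of categorical columns is Cramér's V, √(χ² / (n · (min(r, c) − 1))). Here χ² is the chi-squared statistic of the r × c contingency table. The sketch computes the uncorrected form in plain Python. ydata-profiling may apply a bias correction or a different coefficient, and the column values below are hypothetical toy data. The sketch is not expected to reproduce the numbers above.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy columns: frequency is fully determined by classification.
classification = ["standard", "standard", "in-house", "in-house"]
frequency = ["monthly", "monthly", "yearly", "yearly"]
print(cramers_v(classification, frequency))
```

A value of 1 means one column fully determines the other, and 0 means the observed table equals the table expected under independence.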

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

First rows

Index | 번호 | 항목 | 단위 | 물질구분 | 대상시료 | 검사주기 | 구분
0 | 1 | Total Colony Counts(35℃) | CFU/mL | 미생물 | 정수 | 월 1회 | 먹는물 수질기준항목
1 | 2 | Total coliforms | /100mL | 미생물 | 정수 | 월 1회 | 먹는물 수질기준항목
2 | 3 | Fecal coliforms | /100mL | 미생물 | 정수 | 월 1회 | 먹는물 수질기준항목
3 | 4 | Escherichia coli | /100mL | 미생물 | 정수 | 월 1회 | 먹는물 수질기준항목
4 | 5 | Lead | mg/L | 중금속 및 무기물 | 정수 | 월 1회 | 먹는물 수질기준항목
5 | 6 | Fluoride | mg/L | 이온 | 정수 | 월 1회 | 먹는물 수질기준항목
6 | 7 | Arsenic | mg/L | 중금속 및 무기물 | 정수 | 월 1회 | 먹는물 수질기준항목
7 | 8 | Selenium | mg/L | 중금속 및 무기물 | 정수 | 월 1회 | 먹는물 수질기준항목
8 | 9 | Mercury | mg/L | 중금속 및 무기물 | 정수 | 월 1회 | 먹는물 수질기준항목
9 | 10 | Cyanide | mg/L | 중금속 및 무기물 | 정수 | 월 1회 | 먹는물 수질기준항목

Last rows

Index | 번호 | 항목 | 단위 | 물질구분 | 대상시료 | 검사주기 | 구분
290 | 291 | N-Nitroso-di-n-propylamine (NDPA) | mg/L | 니트로스아민 | 정수 | 년 1회 | K-water 자체관리항목
291 | 292 | N-Nitrosodiphenylamine (NDPHA) | mg/L | 니트로스아민 | 정수 | 년 1회 | K-water 자체관리항목
292 | 293 | Sodium perfluoro-a-decanesulfonate (PFDS) | mg/L | 과불화화합물 | 원수, 정수 | 년 1회 | K-water 자체관리항목
293 | 294 | Perfluorononanoic acid (PFNA) | mg/L | 과불화화합물 | 원수, 정수 | 년 1회 | K-water 자체관리항목
294 | 295 | Perfluorohexanoic acid (PFHxA) | mg/L | 과불화화합물 | 원수, 정수 | 년 1회 | K-water 자체관리항목
295 | 296 | Bisphenol-A | mg/L | 알킬페놀 | 원수, 정수 | 년 1회 | K-water 자체관리항목
296 | 297 | n-Octylphenol | mg/L | 알킬페놀 | 원수, 정수 | 년 1회 | K-water 자체관리항목
297 | 298 | Nonylphenol | mg/L | 알킬페놀 | 원수, 정수 | 년 1회 | K-water 자체관리항목
298 | 299 | n-Pentylphenol | mg/L | 알킬페놀 | 원수, 정수 | 년 1회 | K-water 자체관리항목
299 | 300 | Total Organic Carbon (TOC) | mg/L | 기타 | 원수, 정수 | 년 1회 | K-water 자체관리항목