Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 779
Duplicate rows (%): 7.8%
Total size in memory: 556.6 KiB
Average record size in memory: 57.0 B
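
The two derived figures above follow directly from the raw counts. A small arithmetic check, using only the numbers reported here (the variable names are illustrative):

```python
# Values copied from the dataset statistics above.
n_rows = 10_000
n_duplicate_rows = 779
total_size_kib = 556.6

# Share of duplicate rows, as a percentage.
duplicate_pct = 100 * n_duplicate_rows / n_rows

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```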

Variable types

Categorical: 5
Numeric: 1

Dataset

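Description: Hydrological flood variable information for each forecast basin monitored by the Han River Flood Control Office for flood forecasting. It provides the forecast hydrological model type, variable name, default variable value, and related fields.
Author: 환경부 한강홍수통제소 (Ministry of Environment, Han River Flood Control Office)
URL: https://www.data.go.kr/data/15085910/fileData.do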

Alerts

Dataset has 779 (7.8%) duplicate rows [Duplicates]
변수명 is highly overall correlated with 변수기본값 and 1 other field [High correlation]
모형종류(FK) is highly overall correlated with 변수명 [High correlation]
변수기본값 is highly overall correlated with 변수명 [High correlation]
적용시작일자 is highly overall correlated with 적용종료일자 [High correlation]
적용종료일자 is highly overall correlated with 적용시작일자 [High correlation]
변수기본값 has 133 (1.3%) zeros [Zeros]

Reproduction

Analysis started: 2023-12-12 03:07:47.648262
Analysis finished: 2023-12-12 03:07:48.792079
Duration: 1.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

예보유역명 (forecast basin name)
Categorical

Distinct: 13
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

낙동강: 3833
한강: 2899
금강: 1110
섬진강: 579
영산강: 531
Other values (8): 1048

Length

Max length: 7
Median length: 3
Mean length: 2.7307
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 만경 · 동진
2nd row: 금강
3rd row: 낙동강
4th row: 한강
5th row: 안성천

Common Values

Value | Count | Frequency (%)
낙동강 | 3833 | 38.3%
한강 | 2899 | 29.0%
금강 | 1110 | 11.1%
섬진강 | 579 | 5.8%
영산강 | 531 | 5.3%
만경 · 동진 | 285 | 2.9%
삽교천 | 218 | 2.2%
<NA> | 141 | 1.4%
형산강 | 120 | 1.2%
안성천 | 119 | 1.2%
Other values (3) | 165 | 1.7%

Length

[Figure: Histogram of lengths of the category]
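
Word frequencies (values lowercased and split on whitespace; percentages are relative to the total word count, not the row count)

Value | Count | Frequency (%)
낙동강 | 3833 | 36.3%
한강 | 2899 | 27.4%
금강 | 1110 | 10.5%
섬진강 | 579 | 5.5%
영산강 | 531 | 5.0%
만경 | 285 | 2.7%
· | 285 | 2.7%
동진 | 285 | 2.7%
삽교천 | 218 | 2.1%
na | 141 | 1.3%
Other values (5) | 404 | 3.8%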
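
The percentages in the table directly above are relative to the number of words rather than rows: each 만경 · 동진 row contributes three tokens, so the denominator grows from 10000 to 10570 and 낙동강 drops from 38.3% to 36.3%. A minimal sketch of such a tokenization, assuming values are lowercased, split on whitespace, and stripped of leading and trailing ASCII punctuation (an assumption about how the report was produced, not something it states; `word_frequencies` is a hypothetical helper):

```python
import string
from collections import Counter

def word_frequencies(value_counts):
    """Count lowercase, whitespace-separated tokens weighted by row count."""
    words = Counter()
    for value, count in value_counts.items():
        for token in value.lower().split():
            # Strip ASCII punctuation from both ends, so "<na>" becomes "na".
            token = token.strip(string.punctuation)
            if token:
                words[token] += count
    return words

# A subset of the value counts reported for this variable.
counts = {"낙동강": 3833, "만경 · 동진": 285, "<NA>": 141}
words = word_frequencies(counts)
print(words["만경"], words["·"], words["동진"], words["na"])
```

The middle dot survives as its own token because it is not ASCII punctuation, which matches the separate "·" row in the table.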

적용시작일자 (effective start date)
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

2000-01-01: 4235
2021-01-01: 4155
2018-01-01: 1212
2017-06-01: 213
2011-01-01: 141

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-01-01
2nd row: 2000-01-01
3rd row: 2018-01-01
4th row: 2021-01-01
5th row: 2000-01-01

Common Values

Value | Count | Frequency (%)
2000-01-01 | 4235 | 42.4%
2021-01-01 | 4155 | 41.5%
2018-01-01 | 1212 | 12.1%
2017-06-01 | 213 | 2.1%
2011-01-01 | 141 | 1.4%
2013-01-01 | 44 | 0.4%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the most common values]

적용종료일자 (effective end date)
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

9999-12-31: 4450
2020-12-31: 4090
2017-12-31: 1149
2017-05-31: 253
2012-12-31: 58

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9999-12-31
2nd row: 2020-12-31
3rd row: 2020-12-31
4th row: 9999-12-31
5th row: 9999-12-31

Common Values

Value | Count | Frequency (%)
9999-12-31 | 4450 | 44.5%
2020-12-31 | 4090 | 40.9%
2017-12-31 | 1149 | 11.5%
2017-05-31 | 253 | 2.5%
2012-12-31 | 58 | 0.6%
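
The end date 9999-12-31 accounts for 44.5% of rows and is most plausibly a sentinel meaning "still in effect" rather than a real date, although the report itself does not say so. A minimal sketch, under that assumption, of using the start and end date columns as a validity period (`is_active` and `is_open_ended` are hypothetical helper names):

```python
from datetime import date

# Assumption: 9999-12-31 marks an open-ended validity period.
OPEN_END = date(9999, 12, 31)

def is_active(start: str, end: str, on: date) -> bool:
    """Return True if `on` falls within the inclusive [start, end] period."""
    return date.fromisoformat(start) <= on <= date.fromisoformat(end)

def is_open_ended(end: str) -> bool:
    """Return True if the end date equals the assumed sentinel."""
    return date.fromisoformat(end) == OPEN_END

check_day = date(2023, 12, 12)
print(is_active("2021-01-01", "9999-12-31", check_day))
print(is_active("2000-01-01", "2020-12-31", check_day))
```

Keeping the columns as categorical strings, as the profile does, hides this interval structure; parsing them makes range queries straightforward.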

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the most common values]

모형종류(FK) (model type, foreign key)
Categorical

HIGH CORRELATION 

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

침투율모형: 2204
저류함수법: 2102
Green-Ampt: 1393
포화우량: 1048
Muskingum: 931
Other values (4): 2322

Length

Max length: 10
Median length: 9 (as reported; the value counts listed under Common Values imply a row-wise median of 5)
Mean length: 6.0499
Min length: 4
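
Because all nine values of this variable are listed under Common Values, the length statistics can be recomputed from the counts alone. The sketch below weights each value's character length by its row count. It reproduces the reported maximum, minimum, and mean, but gives a row-wise median of 5 rather than the reported 9, so the reported median should be treated with caution.

```python
import statistics

# Value counts as reported under Common Values for this variable.
value_counts = {
    "침투율모형": 2204, "저류함수법": 2102, "Green-Ampt": 1393,
    "포화우량": 1048, "Muskingum": 931, "SCS 유출지수": 795,
    "<NA>": 758, "단위도법": 508, "강우패턴": 261,
}

# Expand to one length per row so that standard statistics apply.
lengths = []
for value, count in value_counts.items():
    lengths.extend([len(value)] * count)

print("Max length:", max(lengths))
print("Median length:", statistics.median(lengths))
print("Mean length:", statistics.mean(lengths))
print("Min length:", min(lengths))
```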

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 저류함수법
2nd row: 단위도법
3rd row: 강우패턴
4th row: 저류함수법
5th row: <NA>

Common Values

Value | Count | Frequency (%)
침투율모형 | 2204 | 22.0%
저류함수법 | 2102 | 21.0%
Green-Ampt | 1393 | 13.9%
포화우량 | 1048 | 10.5%
Muskingum | 931 | 9.3%
SCS 유출지수 | 795 | 8.0%
<NA> | 758 | 7.6%
단위도법 | 508 | 5.1%
강우패턴 | 261 | 2.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the most common values]
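
Word frequencies (values lowercased and split on whitespace; percentages are relative to the total word count, not the row count)

Value | Count | Frequency (%)
침투율모형 | 2204 | 20.4%
저류함수법 | 2102 | 19.5%
green-ampt | 1393 | 12.9%
포화우량 | 1048 | 9.7%
muskingum | 931 | 8.6%
scs | 795 | 7.4%
유출지수 | 795 | 7.4%
na | 758 | 7.0%
단위도법 | 508 | 4.7%
강우패턴 | 261 | 2.4%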

변수명 (variable name)
Categorical

HIGH CORRELATION 

Distinct: 35
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

α: 526
Km: 473
TlChn: 467
KChn: 462
X: 458
Other values (30): 7614

Length

Max length: 6
Median length: 5
Mean length: 2.8956
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: TlChn
2nd row: Tp
3rd row: 강우패턴
4th row: PChn
5th row: α

Common Values

Value | Count | Frequency (%)
α | 526 | 5.3%
Km | 473 | 4.7%
TlChn | 467 | 4.7%
KChn | 462 | 4.6%
X | 458 | 4.6%
PChn | 447 | 4.5%
ω | 275 | 2.8%
CN | 274 | 2.7%
f1 | 271 | 2.7%
Tp | 269 | 2.7%
Other values (25) | 6078 | 60.8%

Length

[Figure: Histogram of lengths of the category]

Word frequencies (values lowercased)

Value | Count | Frequency (%)
α | 526 | 5.3%
km | 473 | 4.7%
tlchn | 467 | 4.7%
kchn | 462 | 4.6%
x | 458 | 4.6%
pchn | 447 | 4.5%
ω | 275 | 2.8%
cn | 274 | 2.7%
f1 | 271 | 2.7%
tp | 269 | 2.7%
Other values (25) | 6078 | 60.8%

변수기본값 (variable default value)
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 2835
Distinct (%): 28.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.55786
Minimum: 0
Maximum: 490.8
Zeros: 133
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.07
Q1: 0.38
Median: 1
Q3: 10.77525
95th percentile: 263.825
Maximum: 490.8
Range: 490.8
Interquartile range (IQR): 10.39525
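
Q3 = 10.77525 has more decimal places than any value listed in this report, which suggests the quantiles are linearly interpolated between neighbouring order statistics (the default in pandas and NumPy; an assumption here). A minimal sketch of that interpolation and of the IQR, on toy data rather than the dataset itself:

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data, for 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

# Toy data, not taken from the dataset.
data = sorted([0.0, 0.2, 1.0, 2.5, 10.0, 100.0])
q1 = quantile(data, 0.25)
q3 = quantile(data, 0.75)
print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```

The reported IQR is consistent with the reported quartiles: 10.77525 - 0.38 = 10.39525.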

Descriptive statistics

Standard deviation: 82.223505
Coefficient of variation (CV): 2.2491334
Kurtosis: 7.0630638
Mean: 36.55786
Median Absolute Deviation (MAD): 0.937
Skewness: 2.7542745
Sum: 365578.6
Variance: 6760.7047
Monotonicity: Not monotonic
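
The descriptive statistics are mutually consistent, which can be checked from the reported values alone: the variance is the square of the standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the sum is the mean times the number of observations.

```python
import math

# Values copied from the report.
n = 10_000
mean = 36.55786
std = 82.223505
variance = 6760.7047
cv = 2.2491334
total = 365578.6

# Each identity holds up to the rounding of the printed figures.
variance_ok = math.isclose(std ** 2, variance, rel_tol=1e-6)
cv_ok = math.isclose(std / mean, cv, rel_tol=1e-6)
sum_ok = math.isclose(mean * n, total, rel_tol=1e-9)
print(variance_ok, cv_ok, sum_ok)
```

A mean of about 36.6 against a median of 1, together with a skewness of about 2.75, indicates a strongly right-skewed distribution.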
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
1.0 | 1300 | 13.0%
0.2 | 722 | 7.2%
0.6 | 296 | 3.0%
0.1 | 293 | 2.9%
2.6 | 261 | 2.6%
0.25 | 258 | 2.6%
2.0 | 250 | 2.5%
100.0 | 234 | 2.3%
2.5 | 215 | 2.1%
0.0 | 133 | 1.3%
Other values (2825) | 6038 | 60.4%

Smallest values

Value | Count | Frequency (%)
0.0 | 133 | 1.3%
0.001 | 2 | < 0.1%
0.003 | 2 | < 0.1%
0.0038 | 1 | < 0.1%
0.004 | 5 | 0.1%
0.005 | 1 | < 0.1%
0.006 | 3 | < 0.1%
0.008 | 2 | < 0.1%
0.009 | 3 | < 0.1%
0.01 | 32 | 0.3%

Largest values

Value | Count | Frequency (%)
490.8 | 1 | < 0.1%
468.0 | 1 | < 0.1%
456.8 | 1 | < 0.1%
435.1 | 1 | < 0.1%
422.7 | 1 | < 0.1%
421.9 | 2 | < 0.1%
413.1 | 1 | < 0.1%
411.6 | 1 | < 0.1%
408.1 | 2 | < 0.1%
406.1 | 1 | < 0.1%

Interactions

[Figure: interactions plot]

Correlations

[Figure: correlation heatmap]

Variable | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK) | 변수명 | 변수기본값
예보유역명 | 1.000 | 0.493 | 0.503 | 0.070 | 0.074 | 0.112
적용시작일자 | 0.493 | 1.000 | 0.680 | 0.199 | 0.287 | 0.082
적용종료일자 | 0.503 | 0.680 | 1.000 | 0.220 | 0.348 | 0.091
모형종류(FK) | 0.070 | 0.199 | 0.220 | 1.000 | 1.000 | 0.492
변수명 | 0.074 | 0.287 | 0.348 | 1.000 | 1.000 | 0.889
변수기본값 | 0.112 | 0.082 | 0.091 | 0.492 | 0.889 | 1.000

[Figure: correlation heatmap]

Variable | 변수명 | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK)
변수명 | 1.000 | 0.024 | 0.128 | 0.158 | 0.999
예보유역명 | 0.024 | 1.000 | 0.299 | 0.307 | 0.030
적용시작일자 | 0.128 | 0.299 | 1.000 | 0.542 | 0.111
적용종료일자 | 0.158 | 0.307 | 0.542 | 1.000 | 0.136
모형종류(FK) | 0.999 | 0.030 | 0.111 | 0.136 | 1.000

[Figure: correlation heatmap]

Variable | 변수기본값 | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK) | 변수명
변수기본값 | 1.000 | 0.047 | 0.043 | 0.038 | 0.261 | 0.559
예보유역명 | 0.047 | 1.000 | 0.299 | 0.307 | 0.030 | 0.024
적용시작일자 | 0.043 | 0.299 | 1.000 | 0.542 | 0.111 | 0.128
적용종료일자 | 0.038 | 0.307 | 0.542 | 1.000 | 0.136 | 0.158
모형종류(FK) | 0.261 | 0.030 | 0.111 | 0.136 | 1.000 | 0.999
변수명 | 0.559 | 0.024 | 0.128 | 0.158 | 0.999 | 1.000
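
The second matrix covers only the five categorical variables. A common association measure for pairs of categorical variables is Cramér's V, and the near-1 value for 변수명 versus 모형종류(FK) is what one would expect if each variable name belongs to a single model type. Which coefficient the report actually used is not recoverable from the text, so the following is only a sketch of uncorrected Cramér's V in plain Python, run on toy data:

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of categories."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Toy data (not from the dataset): each variable maps to exactly one model.
variables = ["X", "Km", "X", "CN", "Km", "CN"]
models = ["Muskingum", "Muskingum", "Muskingum", "SCS", "Muskingum", "SCS"]
v_dependent = cramers_v(variables, models)
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(v_dependent, v_independent)
```

A value near 1 means one variable almost fully determines the other, which is why 변수명 and 모형종류(FK) carry largely redundant information.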

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

Index | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK) | 변수명 | 변수기본값
53492 | 만경 · 동진 | 2021-01-01 | 9999-12-31 | 저류함수법 | TlChn | 0.26
45229 | 금강 | 2000-01-01 | 2020-12-31 | 단위도법 | Tp | 4.18
29762 | 낙동강 | 2018-01-01 | 2020-12-31 | 강우패턴 | 강우패턴 | 1.0
12753 | 한강 | 2021-01-01 | 9999-12-31 | 저류함수법 | PChn | 0.562
18303 | 안성천 | 2000-01-01 | 9999-12-31 | <NA> | α | 1.0
18291 | 안성천 | 2000-01-01 | 9999-12-31 | <NA> | α | 1.0
25905 | 낙동강 | 2000-01-01 | 2017-12-31 | <NA> | α | 1.0
2764 | 한강 | 2000-01-01 | 2020-12-31 | 강우패턴 | 강우패턴 | 1.0
41186 | 낙동강 | 2021-01-01 | 9999-12-31 | 포화우량 | Rsa | 130.0
40513 | 낙동강 | 2021-01-01 | 9999-12-31 | 침투율모형 | rLZ | 0.25

Index | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK) | 변수명 | 변수기본값
41799 | 낙동강 | 2021-01-01 | 9999-12-31 | 포화우량 | AMC | 2.6
33269 | 낙동강 | 2018-01-01 | 2020-12-31 | 침투율모형 | rUZ | 0.2
24055 | 낙동강 | 2000-01-01 | 2017-12-31 | 침투율모형 | βf | 1.0
31126 | 낙동강 | 2018-01-01 | 2020-12-31 | 저류함수법 | Kbas | 34.937
48893 | 금강 | 2021-01-01 | 9999-12-31 | 저류함수법 | PChn | 0.632
27035 | 낙동강 | 2017-06-01 | 2017-12-31 | 저류함수법 | PChn | 0.638
33344 | 낙동강 | 2018-01-01 | 2020-12-31 | 침투율모형 | rUZ | 0.2
38987 | 낙동강 | 2021-01-01 | 9999-12-31 | 저류함수법 | TlChn | 0.13
19531 | 낙동강 | 2000-01-01 | 2017-12-31 | Green-Ampt | rwater | 1.188
14896 | 한강 | 2021-01-01 | 9999-12-31 | 침투율모형 | rUZ | 0.2

Duplicate rows

Most frequently occurring

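Index | 예보유역명 | 적용시작일자 | 적용종료일자 | 모형종류(FK) | 변수명 | 변수기본값 | # duplicates
741 | 한강 | 2021-01-01 | 9999-12-31 | <NA> | α | 1.0 | 89
621 | 한강 | 2000-01-01 | 2020-12-31 | <NA> | α | 1.0 | 75
266 | 낙동강 | 2018-01-01 | 2020-12-31 | <NA> | α | 1.0 | 67
150 | 낙동강 | 2000-01-01 | 2017-12-31 | <NA> | α | 1.0 | 66
354 | 낙동강 | 2021-01-01 | 9999-12-31 | <NA> | α | 1.0 | 59
73 | 낙동강 | 2000-01-01 | 2017-05-31 | Muskingum | X | 0.2 | 55
659 | 한강 | 2021-01-01 | 9999-12-31 | Muskingum | X | 0.2 | 55
540 | 한강 | 2000-01-01 | 2020-12-31 | Muskingum | X | 0.2 | 51
620 | 한강 | 2000-01-01 | 2020-12-31 | 포화우량 | fsa | 1.0 | 49
668 | 한강 | 2021-01-01 | 9999-12-31 | SCS 유출지수 | ω | 0.1 | 48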
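
The row labels in this table (up to 741) look like positions in a list of duplicated row patterns, which suggests that the 779 reported in the overview counts distinct patterns occurring more than once rather than surplus rows. The report does not state which definition it uses. A minimal sketch that computes both, using only the standard library (`duplicate_summary` is a hypothetical helper and the rows are toy data):

```python
from collections import Counter

def duplicate_summary(rows):
    """Summarize exact duplicates among rows given as tuples."""
    counts = Counter(rows)
    repeated = {row: c for row, c in counts.items() if c > 1}
    return {
        # Distinct row patterns that occur more than once.
        "duplicated_patterns": len(repeated),
        # Rows that would be dropped when keeping one copy of each pattern.
        "surplus_rows": sum(c - 1 for c in repeated.values()),
        # Duplicated patterns with their counts, most frequent first.
        "most_common": sorted(repeated.items(), key=lambda kv: -kv[1]),
    }

# Toy rows (not from the dataset).
rows = [
    ("한강", "2021-01-01", "9999-12-31", "<NA>", "α", 1.0),
    ("한강", "2021-01-01", "9999-12-31", "<NA>", "α", 1.0),
    ("한강", "2021-01-01", "9999-12-31", "<NA>", "α", 1.0),
    ("금강", "2000-01-01", "2020-12-31", "단위도법", "Tp", 4.18),
]
summary = duplicate_summary(rows)
print(summary["duplicated_patterns"], summary["surplus_rows"])
```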