Overview

Dataset statistics

Number of variables: 6
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1988
Duplicate rows (%): 19.9%
Total size in memory: 566.4 KiB
Average record size in memory: 58.0 B

Variable types

Text: 1
Categorical: 3
Numeric: 2

Dataset

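Description: Data for securing baseline rainfall information for non-point source pollution reduction facilities. It provides weather information per address coordinate, obtained through the Korea Meteorological Administration API.
URL: https://www.data.go.kr/data/15070134/fileData.do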

Alerts

예보시간 (forecast time) has constant value "오전 12시" (12 AM): Constant
구분코드 (category code) has constant value "PCP": Constant
Dataset has 1988 (19.9%) duplicate rows: Duplicates
예측_값 (predicted value) is highly imbalanced (98.8%): Imbalance
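
The 98.8% imbalance figure is consistent with a score defined as one minus the normalized Shannon entropy of the class counts. The report does not state its formula, so that definition is an assumption here, and `imbalance_score` is a hypothetical helper name. The sketch applies it to the two 예측_값 counts listed in this report (9989 and 11).

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of class counts.

    Assumed definition: 0.0 means perfectly balanced classes,
    values near 1.0 mean one class dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        # A single class has no entropy to normalize; returning 0.0
        # is a convention chosen for this sketch only.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)


# Counts of the two 예측_값 categories reported in this document.
score = imbalance_score([9989, 11])
print(f"{score:.1%}")
```

Under this assumption the score rounds to the 98.8% shown in the alert, which suggests (but does not prove) that the profiler uses the same definition.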

Reproduction

Analysis started: 2023-12-12 09:07:35.187027
Analysis finished: 2023-12-12 09:07:36.415491
Duration: 1.23 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

예보일자 (forecast date)
Text

Distinct: 70
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 13
Median length: 13
Mean length: 13
Min length: 13

Characters and Unicode

Total characters: 130000
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
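
As a rough illustration of these character properties, Python's standard `unicodedata` module reports the general category of each code point. This is a minimal sketch using the first sample value of this column. It is not necessarily how the profiler computes its statistics.

```python
import unicodedata
from collections import Counter

# First sample value of the date column, copied from this report.
sample = "2022년 01월 18일"

# Map every character to its Unicode general category
# (Nd = Decimal Number, Lo = Other Letter, Zs = Space Separator).
categories = Counter(unicodedata.category(ch) for ch in sample)

for category, count in categories.most_common():
    print(category, count)
```

A single 13-character value holds 8 decimal digits, 3 Hangul letters, and 2 spaces. Multiplied by the 10000 rows, this matches the 80000 / 30000 / 20000 category totals reported for this column.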

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022년 01월 18일 (2022-01-18)
2nd row: 2022년 02월 12일 (2022-02-12)
3rd row: 2022년 12월 31일 (2022-12-31)
4th row: 2022년 03월 03일 (2022-03-03)
5th row: 2022년 01월 18일 (2022-01-18)

Value              Count  Frequency (%)
2022년             10000  33.3%
01월               4397   14.7%
02월               3959   13.2%
12월               895    3.0%
03월               749    2.5%
27일               481    1.6%
04일               474    1.6%
05일               456    1.5%
02일               429    1.4%
28일               428    1.4%
Other values (26)  7732   25.8%

Most occurring characters

Value             Count  Frequency (%)
2                 39170  30.1%
0                 23450  18.0%
(space)           20000  15.4%
년                10000  7.7%
월                10000  7.7%
일                10000  7.7%
1                 9146   7.0%
3                 2308   1.8%
7                 1071   0.8%
4                 1056   0.8%
Other values (4)  3799   2.9%

Most occurring categories

Value            Count  Frequency (%)
Decimal Number   80000  61.5%
Other Letter     30000  23.1%
Space Separator  20000  15.4%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      39170  49.0%
0      23450  29.3%
1      9146   11.4%
3      2308   2.9%
7      1071   1.3%
4      1056   1.3%
5      989    1.2%
8      981    1.2%
6      932    1.2%
9      897    1.1%

Other Letter
Value  Count  Frequency (%)
년     10000  33.3%
월     10000  33.3%
일     10000  33.3%

Space Separator
Value    Count  Frequency (%)
(space)  20000  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  100000  76.9%
Hangul  30000   23.1%

Most frequent character per script

Common
Value    Count  Frequency (%)
2        39170  39.2%
0        23450  23.4%
(space)  20000  20.0%
1        9146   9.1%
3        2308   2.3%
7        1071   1.1%
4        1056   1.1%
5        989    1.0%
8        981    1.0%
6        932    0.9%

Hangul
Value  Count  Frequency (%)
년     10000  33.3%
월     10000  33.3%
일     10000  33.3%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   100000  76.9%
Hangul  30000   23.1%

Most frequent character per block

ASCII
Value    Count  Frequency (%)
2        39170  39.2%
0        23450  23.4%
(space)  20000  20.0%
1        9146   9.1%
3        2308   2.3%
7        1071   1.1%
4        1056   1.1%
5        989    1.0%
8        981    1.0%
6        932    0.9%

Hangul
Value  Count  Frequency (%)
년     10000  33.3%
월     10000  33.3%
일     10000  33.3%

예보시간 (forecast time)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value              Count
오전 12시 (12 AM)  10000

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 오전 12시
2nd row: 오전 12시
3rd row: 오전 12시
4th row: 오전 12시
5th row: 오전 12시

Common Values

Value      Count  Frequency (%)
오전 12시  10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
오전   10000  50.0%
12시   10000  50.0%

구분코드 (category code)
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value  Count
PCP    10000

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PCP
2nd row: PCP
3rd row: PCP
4th row: PCP
5th row: PCP

Common Values

Value  Count  Frequency (%)
PCP    10000  100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
pcp    10000  100.0%

예측_값 (predicted value)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value                        Count
강수없음 (no precipitation)  9989
1.0mm                        11

Length

Max length: 5
Median length: 4
Mean length: 4.0011
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강수없음
2nd row: 강수없음
3rd row: 강수없음
4th row: 강수없음
5th row: 강수없음

Common Values

Value     Count  Frequency (%)
강수없음  9989   99.9%
1.0mm     11     0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count  Frequency (%)
강수없음  9989   99.9%
1.0mm     11     0.1%

X좌표 (X coordinate)
Real number (ℝ)

Distinct: 62
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.8427
Minimum: 21
Maximum: 126
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 21
5th percentile: 48
Q1: 58
Median: 64
Q3: 84
95th percentile: 98
Maximum: 126
Range: 105
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 16.614649
Coefficient of variation (CV): 0.23788669
Kurtosis: -0.18884527
Mean: 69.8427
Median Absolute Deviation (MAD): 9
Skewness: 0.44226178
Sum: 698427
Variance: 276.04656
Monotonicity: Not monotonic
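
Several of these figures follow from others by their standard definitions: CV is the standard deviation divided by the mean, variance is the standard deviation squared, IQR is Q3 minus Q1, and range is maximum minus minimum. The sketch below recomputes them from the values reported for X좌표 as a consistency check. The variable names are illustrative only.

```python
# Values copied from the X좌표 statistics in this report.
n_observations = 10000
mean = 69.8427
std = 16.614649
q1, q3 = 58, 84
minimum, maximum = 21, 126

# Derived statistics, using their standard definitions.
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
total = mean * n_observations  # sum of all values

print(f"CV={cv:.6f} variance={variance:.4f} IQR={iqr} "
      f"range={value_range} sum={total:.0f}")
```

The recomputed values agree with the reported CV, variance, IQR, range, and sum to within rounding of the printed inputs.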
Histogram with fixed size bins (bins=50)

Common values
Value              Count  Frequency (%)
58                 509    5.1%
61                 477    4.8%
62                 473    4.7%
59                 443    4.4%
63                 400    4.0%
60                 382    3.8%
55                 374    3.7%
56                 331    3.3%
97                 264    2.6%
66                 242    2.4%
Other values (52)  6105   61.1%

Minimum 10 values
Value  Count  Frequency (%)
21     57     0.6%
33     51     0.5%
43     44     0.4%
44     45     0.4%
45     51     0.5%
46     85     0.9%
47     47     0.5%
48     127    1.3%
49     81     0.8%
50     97     1.0%

Maximum 10 values
Value  Count  Frequency (%)
126    53     0.5%
104    56     0.6%
102    105    1.1%
101    36     0.4%
100    55     0.5%
99     113    1.1%
98     127    1.3%
97     264    2.6%
96     228    2.3%
95     32     0.3%

Y좌표 (Y coordinate)
Real number (ℝ)

Distinct: 76
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.3103
Minimum: 32
Maximum: 146
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 32
5th percentile: 66
Q1: 79
Median: 103
Q3: 123
95th percentile: 132
Maximum: 146
Range: 114
Interquartile range (IQR): 44

Descriptive statistics

Standard deviation: 23.59526
Coefficient of variation (CV): 0.2329009
Kurtosis: -0.85575802
Mean: 101.3103
Median Absolute Deviation (MAD): 21
Skewness: -0.30192766
Sum: 1013103
Variance: 556.73629
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value              Count  Frequency (%)
125                444    4.4%
126                401    4.0%
75                 395    4.0%
123                365    3.6%
120                311    3.1%
128                308    3.1%
99                 295    2.9%
77                 276    2.8%
73                 253    2.5%
74                 242    2.4%
Other values (66)  6710   67.1%

Minimum 10 values
Value  Count  Frequency (%)
32     45     0.4%
35     44     0.4%
50     43     0.4%
53     47     0.5%
55     45     0.4%
59     49     0.5%
60     45     0.4%
62     42     0.4%
64     51     0.5%
65     84     0.8%

Maximum 10 values
Value  Count  Frequency (%)
146    47     0.5%
141    82     0.8%
140    53     0.5%
138    97     1.0%
136    81     0.8%
134    94     0.9%
132    183    1.8%
130    113    1.1%
129    94     0.9%
128    308    3.1%

Interactions

[Figures: four interaction plots]

Correlations

          예보일자  예측_값  X좌표   Y좌표
예보일자  1.000     0.182    0.000   0.000
예측_값   0.182     1.000    0.118   0.046
X좌표     0.000     0.118    1.000   0.563
Y좌표     0.000     0.046    0.563   1.000

          X좌표    Y좌표    예측_값
X좌표     1.000    -0.150   0.117
Y좌표     -0.150   1.000    0.035
예측_값   0.117    0.035    1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows
(index)  예보일자         예보시간   구분코드  예측_값   X좌표  Y좌표
20521    2022년 01월 18일  오전 12시  PCP       강수없음  58     73
56621    2022년 02월 12일  오전 12시  PCP       강수없음  78     106
92991    2022년 12월 31일  오전 12시  PCP       강수없음  54     126
82054    2022년 03월 03일  오전 12시  PCP       강수없음  84     103
20339    2022년 01월 18일  오전 12시  PCP       강수없음  60     120
38080    2022년 01월 29일  오전 12시  PCP       강수없음  62     108
34015    2022년 01월 26일  오전 12시  PCP       강수없음  60     104
45217    2022년 02월 03일  오전 12시  PCP       강수없음  63     123
17060    2022년 01월 13일  오전 12시  PCP       강수없음  57     128
92663    2022년 12월 31일  오전 12시  PCP       강수없음  59     77

Last rows
(index)  예보일자         예보시간   구분코드  예측_값   X좌표  Y좌표
85740    2022년 12월 26일  오전 12시  PCP       강수없음  76     67
22955    2022년 01월 21일  오전 12시  PCP       강수없음  61     65
14194    2022년 01월 12일  오전 12시  PCP       강수없음  97     88
34090    2022년 01월 26일  오전 12시  PCP       강수없음  57     126
83452    2022년 03월 04일  오전 12시  PCP       강수없음  83     68
56132    2022년 02월 11일  오전 12시  PCP       강수없음  59     74
46366    2022년 02월 03일  오전 12시  PCP       강수없음  75     87
44122    2022년 02월 05일  오전 12시  PCP       강수없음  83     104
74603    2022년 02월 26일  오전 12시  PCP       강수없음  72     95
87946    2022년 12월 30일  오전 12시  PCP       강수없음  72     83
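
The date, time, and value columns are stored as Korean-formatted text, so they usually need converting before numeric or time-series work. The following is a minimal sketch of one way to do that with the standard library. The helper names are hypothetical, and treating 강수없음 ("no precipitation") as 0.0 mm is an assumption, not something the dataset documentation states.

```python
from datetime import datetime


def parse_forecast_datetime(date_text, time_text):
    """Combine values such as '2022년 01월 18일' and '오전 12시' into a datetime."""
    day = datetime.strptime(date_text, "%Y년 %m월 %d일")
    meridiem, hour_text = time_text.split()
    hour = int(hour_text.rstrip("시")) % 12  # 12 o'clock wraps to 0
    if meridiem == "오후":                   # 오후 = PM, 오전 = AM
        hour += 12
    return day.replace(hour=hour)


def parse_precipitation_mm(value_text):
    """Convert a 예측_값 string to millimetres (assumes 강수없음 means 0.0)."""
    if value_text == "강수없음":
        return 0.0
    if value_text.endswith("mm"):
        return float(value_text[:-2])
    raise ValueError(f"unexpected precipitation value: {value_text!r}")


# Example using values that appear in the sample rows of this report.
when = parse_forecast_datetime("2022년 01월 18일", "오전 12시")
amount = parse_precipitation_mm("1.0mm")
print(when.isoformat(), amount)
```

Raising on unknown values rather than silently returning a default makes it easier to spot categories that do not appear in this 10000-row sample.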

Duplicate rows

Most frequently occurring

(index)  예보일자         예보시간   구분코드  예측_값   X좌표  Y좌표  # duplicates
1373     2022년 02월 18일  오전 12시  PCP       강수없음  82     121    5
1480     2022년 02월 24일  오전 12시  PCP       강수없음  54     125    5
1578     2022년 02월 27일  오전 12시  PCP       강수없음  44     55     5
1934     2022년 12월 30일  오전 12시  PCP       강수없음  56     89     5
61       2022년 01월 03일  오전 12시  PCP       강수없음  72     83     4
207      2022년 01월 08일  오전 12시  PCP       강수없음  43     95     4
246      2022년 01월 09일  오전 12시  PCP       강수없음  78     71     4
311      2022년 01월 11일  오전 12시  PCP       강수없음  99     114    4
418      2022년 01월 17일  오전 12시  PCP       강수없음  100    103    4
444      2022년 01월 18일  오전 12시  PCP       강수없음  92     94     4