Overview

Dataset statistics

Number of variables: 4
Number of observations: 10000
Missing cells: 3
Missing cells (%): < 0.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 410.2 KiB
Average record size in memory: 42.0 B

Variable types

Numeric: 2
Categorical: 2

Dataset

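Description: 조사년월일 (survey date), 측정소이름 (station name), 측정지역 (measurement district), 일일강수량 (daily precipitation)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-22140/S/1/datasetView.do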

Alerts

Dataset has 1 (< 0.1%) duplicate row [Duplicates]
측정지역 is highly correlated overall with 측정소이름 [High correlation]
측정소이름 is highly correlated overall with 측정지역 [High correlation]
일일강수량 has 7679 (76.8%) zeros [Zeros]

Reproduction

Analysis started: 2024-05-18 06:01:27.732336
Analysis finished: 2024-05-18 06:01:30.224727
Duration: 2.49 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

조사년월일
Real number (ℝ)

Distinct: 3712
Distinct (%): 37.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20174066
Minimum: 20120102
Maximum: 20240229
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 20120102
5-th percentile: 20120706
Q1: 20140712
Median: 20170216
Q3: 20210217
95-th percentile: 20230722
Maximum: 20240229
Range: 120127
Interquartile range (IQR): 69505.25

Descriptive statistics

Standard deviation: 36685.873
Coefficient of variation (CV): 0.001818467
Kurtosis: -1.3367571
Mean: 20174066
Median Absolute Deviation (MAD): 30493
Skewness: 0.10390207
Sum: 2.0174066 × 10^11
Variance: 1.3458533 × 10^9
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values

Value                  Count   Frequency (%)
20121025               10      0.1%
20160513               9       0.1%
20140613               8       0.1%
20141218               8       0.1%
20150618               8       0.1%
20210210               8       0.1%
20130414               8       0.1%
20151201               8       0.1%
20190612               8       0.1%
20240205               8       0.1%
Other values (3702)    9917    99.2%

Smallest 10 values

Value       Count   Frequency (%)
20120102    3       < 0.1%
20120103    3       < 0.1%
20120104    2       < 0.1%
20120105    1       < 0.1%
20120107    7       0.1%
20120108    3       < 0.1%
20120109    2       < 0.1%
20120110    4       < 0.1%
20120111    2       < 0.1%
20120112    2       < 0.1%

Largest 10 values

Value       Count   Frequency (%)
20240229    2       < 0.1%
20240228    1       < 0.1%
20240227    2       < 0.1%
20240226    2       < 0.1%
20240224    3       < 0.1%
20240222    7       0.1%
20240221    3       < 0.1%
20240220    2       < 0.1%
20240219    2       < 0.1%
20240218    6       0.1%
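
The values of 조사년월일 look like calendar dates packed into YYYYMMDD integers (minimum 20120102, maximum 20240229), so figures such as the mean, range, and IQR above describe the integer encoding rather than elapsed time. The following is a minimal sketch, assuming that encoding holds for every row; the helper name `parse_yyyymmdd` is hypothetical and not part of the report.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20120102 into a datetime.date.

    Assumes the integer is a YYYYMMDD-encoded date; strptime raises
    ValueError for anything that is not a valid calendar day.
    """
    return datetime.strptime(str(value), "%Y%m%d").date()


# Minimum and maximum as reported for 조사년월일.
first_day = parse_yyyymmdd(20120102)
last_day = parse_yyyymmdd(20240229)

# Elapsed calendar days between the two, unlike the integer "Range" of 120127.
span_days = (last_day - first_day).days
```

Parsing the column before profiling would let a profiler treat it as a date variable instead of a real number.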

측정소이름
Categorical

HIGH CORRELATION 

Distinct: 27
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value                Count
성북                 456
중구                 445
은평                 431
광진                 429
영등포               428
Other values (22)    7811

Length

Max length: 3
Median length: 2
Mean length: 2.1246
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 중랑
2nd row: 양천
3rd row: 관악
4th row: 구로
5th row: 동작

Common Values

Value                Count   Frequency (%)
성북                 456     4.6%
중구                 445     4.5%
은평                 431     4.3%
광진                 429     4.3%
영등포               428     4.3%
강북                 427     4.3%
송파                 425     4.2%
노원                 419     4.2%
용산                 417     4.2%
동대문               414     4.1%
Other values (17)    5709    57.1%

Length

[Figure: Histogram of lengths of the category]

Value                Count   Frequency (%)
성북                 456     4.6%
중구                 445     4.5%
강북                 436     4.4%
은평                 431     4.3%
광진                 429     4.3%
영등포               428     4.3%
송파                 425     4.2%
노원                 419     4.2%
용산                 417     4.2%
동대문               414     4.1%
Other values (16)    5700    57.0%

측정지역
Categorical

HIGH CORRELATION 

Distinct: 27
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Value                Count
성북구               456
강북구               436
은평구               431
광진구               429
영등포구             428
Other values (22)    7820

Length

Max length: 4
Median length: 3
Mean length: 3.1049
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 중랑구
2nd row: 양천구
3rd row: 관악구
4th row: 구로구
5th row: 동작구

Common Values

Value                Count   Frequency (%)
성북구               456     4.6%
강북구               436     4.4%
은평구               431     4.3%
광진구               429     4.3%
영등포구             428     4.3%
송파구               425     4.2%
노원구               419     4.2%
용산구               417     4.2%
구로구               414     4.1%
동대문구             414     4.1%
Other values (17)    5731    57.3%

Length

[Figure: Histogram of lengths of the category]

Value                Count   Frequency (%)
성북구               456     4.6%
강북구               436     4.4%
은평구               431     4.3%
광진구               429     4.3%
영등포구             428     4.3%
송파구               425     4.2%
노원구               419     4.2%
용산구               417     4.2%
구로구               414     4.1%
동대문구             414     4.1%
Other values (17)    5731    57.3%

일일강수량
Real number (ℝ)

ZEROS 

Distinct: 196
Distinct (%): 2.0%
Missing: 3
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.3918175
Minimum: 0
Maximum: 354.5
Zeros: 7679
Zeros (%): 76.8%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 20
Maximum: 354.5
Range: 354.5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 13.557769
Coefficient of variation (CV): 3.9971988
Kurtosis: 120.46054
Mean: 3.3918175
Median Absolute Deviation (MAD): 0
Skewness: 8.6217209
Sum: 33908
Variance: 183.8131
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]
Most frequent values

Value                 Count   Frequency (%)
0.0                   7679    76.8%
0.5                   296     3.0%
1.0                   180     1.8%
1.5                   152     1.5%
2.0                   120     1.2%
2.5                   92      0.9%
3.0                   87      0.9%
3.5                   74      0.7%
4.5                   64      0.6%
4.0                   60      0.6%
Other values (186)    1193    11.9%

Smallest 10 values

Value    Count   Frequency (%)
0.0      7679    76.8%
0.5      296     3.0%
1.0      180     1.8%
1.5      152     1.5%
2.0      120     1.2%
2.5      92      0.9%
3.0      87      0.9%
3.5      74      0.7%
4.0      60      0.6%
4.5      64      0.6%

Largest 10 values

Value    Count   Frequency (%)
354.5    1       < 0.1%
305.5    1       < 0.1%
208.5    1       < 0.1%
200.5    1       < 0.1%
198.0    1       < 0.1%
190.5    1       < 0.1%
182.5    1       < 0.1%
166.5    1       < 0.1%
164.5    1       < 0.1%
163.0    1       < 0.1%
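
The summary figures for 일일강수량 can be cross-checked with a few lines of arithmetic. The sketch below copies the reported counts, sum, and standard deviation from this section and recomputes the mean, the coefficient of variation, and the share of zeros. It assumes the mean is taken over non-missing rows only; the variable names are illustrative.

```python
# Figures copied from the report for 일일강수량.
n_rows = 10000          # Number of observations
n_missing = 3           # Missing
n_zeros = 7679          # Zeros
total = 33908           # Sum (as displayed, possibly rounded)
std_dev = 13.557769     # Standard deviation

# Assumption: the mean uses only the non-missing observations.
n_valid = n_rows - n_missing
mean = total / n_valid

# Coefficient of variation is the standard deviation divided by the mean.
cv = std_dev / mean

# Share of zeros is reported against all rows, as a percentage.
zero_pct = round(100 * n_zeros / n_rows, 1)

# Rows with a recorded, non-zero amount.
n_nonzero = n_valid - n_zeros
```

With more than three quarters of the rows at zero, the median, Q3, and IQR are all 0, which is why the mean and the upper percentiles carry most of the information for this variable.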

Interactions


Correlations

              조사년월일   측정소이름   측정지역   일일강수량
조사년월일    1.000        0.247        0.289      0.068
측정소이름    0.247        1.000        1.000      0.000
측정지역      0.289        1.000        1.000      0.027
일일강수량    0.068        0.000        0.027      1.000

              측정지역   측정소이름
측정지역      1.000      0.981
측정소이름    0.981      1.000

              조사년월일   일일강수량   측정소이름   측정지역
조사년월일    1.000        0.045        0.091        0.108
일일강수량    0.045        1.000        0.000        0.011
측정소이름    0.091        0.000        1.000        0.981
측정지역      0.108        0.011        0.981        1.000
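
The report does not state which measure produced each matrix. One common association measure for two categorical columns is Cramér's V, which is derived from the chi-squared statistic of their contingency table and ranges from 0 (no association) to 1 (one column determines the other). The sketch below is an uncorrected, pure-Python version offered under that assumption; `cramers_v` is a hypothetical helper, and the toy inputs are not the dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count for each (x, y) pair
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, x_count in row_totals.items():
        for y, y_count in col_totals.items():
            expected = x_count * y_count / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    if k <= 0:
        return 0.0  # a constant column carries no association
    return sqrt(chi2 / (n * k))


# Toy data: each station maps to exactly one district, so V is 1.
stations = ["중랑", "양천", "관악", "중랑", "양천", "관악"]
districts = ["중랑구", "양천구", "관악구", "중랑구", "양천구", "관악구"]
v_perfect = cramers_v(stations, districts)

# Toy data: the two columns are independent, so V is 0.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

A value just below 1, such as the 0.981 reported for 측정소이름 and 측정지역, would mean that the station name almost always, but not always, determines the district.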

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.]
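
Nullity by column is simply the number of missing entries in each column. The sketch below counts them for a small list of records, assuming missing values are represented as `None`. The rows are illustrative, and the blank precipitation value in the last one is made up for the example.

```python
def missing_per_column(rows):
    """Count None entries per column for a list of dict records."""
    columns = list(rows[0].keys())
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


rows = [
    {"조사년월일": 20161102, "측정소이름": "중랑", "측정지역": "중랑구", "일일강수량": 0.0},
    {"조사년월일": 20231014, "측정소이름": "중구", "측정지역": "중구", "일일강수량": 3.0},
    # Hypothetical row: the precipitation value is blanked for illustration.
    {"조사년월일": 20180701, "측정소이름": "은평", "측정지역": "은평구", "일일강수량": None},
]

nullity = missing_per_column(rows)
```

In the profiled dataset the only column with gaps is 일일강수량, with 3 missing cells out of 10000 rows.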

Sample

(index)   조사년월일   측정소이름   측정지역     일일강수량
49752     20161102     중랑         중랑구       0.0
51733     20160814     양천         양천구       0.0
61138     20150731     관악         관악구       0.0
70648     20140624     구로         구로구       0.0
89195     20120627     동작         동작구       0.0
80402     20130604     강남         강남구       0.0
20332     20210722     강서         강서구       0.0
24388     20210113     강서         강서구       0.0
19856     20210814     송파         송파구       0.0
48827     20161212     서대문       서대문구     0.0

(index)   조사년월일   측정소이름   측정지역     일일강수량
22532     20210408     관악         관악구       0.0
64183     20150313     동대문       동대문구     0.0
34464     20191004     영등포       영등포구     0.0
83315     20130212     중랑         중랑구       0.0
45570     20180225     강북         강북구       0.0
46393     20180116     강남         강남구       0.0
2942      20231014     중구         중구         3.0
43017     20180701     은평         은평구       69.0
43439     20180611     성북         성북구       0.0
11487     20220908     관악         관악구       0.0

Duplicate rows

Most frequently occurring

(index)   조사년월일   측정소이름   측정지역   일일강수량   # duplicates
0         20190614     송파         송파구     0.0          2
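
The "# duplicates" column counts how many times the full row occurs, so a value of 2 corresponds to the single duplicate row reported in the overview. A minimal sketch of that counting, using tuples as rows and a small illustrative list rather than the full dataset:

```python
from collections import Counter

# Illustrative rows: (조사년월일, 측정소이름, 측정지역, 일일강수량).
rows = [
    (20190614, "송파", "송파구", 0.0),
    (20231014, "중구", "중구", 3.0),
    (20190614, "송파", "송파구", 0.0),  # exact repeat of the first row
]

# Count identical rows, then keep those that occur more than once.
occurrences = Counter(rows)
duplicated = {row: count for row, count in occurrences.items() if count > 1}

# Rows that would be dropped if only the first occurrence were kept.
n_redundant = sum(count - 1 for count in duplicated.values())
```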