Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 10000 |
Missing cells | 4 |
Missing cells (%) | < 0.1% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 693.4 KiB |
Average record size in memory | 71.0 B |
Variable types
Numeric | 5 |
---|---|
Categorical | 2 |
Dataset
Description | File download |
---|---|
Author | Seoul Metropolitan Government (서울특별시) |
URL | https://data.seoul.go.kr/dataList/OA-15526/S/1/datasetView.do |
지자체 기준초과 구분 is highly overall correlated with 평균값 and 1 other fields | High correlation |
국가 기준초과 구분 is highly overall correlated with 평균값 and 1 other fields | High correlation |
측정항목 is highly overall correlated with 평균값 | High correlation |
평균값 is highly overall correlated with 측정항목 and 2 other fields | High correlation |
국가 기준초과 구분 is highly imbalanced (82.0%) | Imbalance |
지자체 기준초과 구분 is highly imbalanced (82.0%) | Imbalance |
측정기 상태 has 9845 (98.5%) zeros | Zeros |
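The 82.0% imbalance reported for both exceedance flags can be reproduced from the class counts listed later in the report (9728 rows with value 0, 272 rows with value 1). The sketch below assumes the score is one minus the Shannon entropy of the class shares, normalized by the logarithm of the number of classes. That definition is an assumption, chosen because it agrees with the reported figure, and the function name `imbalance_score` is illustrative.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy of the class shares.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    (Assumed definition; the function name is illustrative.)
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1 - entropy / log2(k)

# Class counts for 국가 기준초과 구분 (0 -> 9728 rows, 1 -> 272 rows).
score = imbalance_score([9728, 272])
print(f"{score:.1%}")
```

With a 97.3% / 2.7% split the entropy is about 0.18 bits, so the score comes out close to 0.82.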
Reproduction
Analysis started | 2024-07-27 00:34:42.084712 |
---|---|
Analysis finished | 2024-07-27 00:34:55.564879 |
Duration | 13.48 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
측정일시 (measurement date and time)
Real number (ℝ)
Distinct | 468 |
---|---|
Distinct (%) | 4.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.016011 × 10⁹ |
Minimum | 2.0160101 × 10⁹ |
---|---|
Maximum | 2.016012 × 10⁹ |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 2.0160101 × 10⁹ |
---|---|
5-th percentile | 2.0160101 × 10⁹ |
Q1 | 2.0160105 × 10⁹ |
median | 2.016011 × 10⁹ |
Q3 | 2.0160115 × 10⁹ |
95-th percentile | 2.0160119 × 10⁹ |
Maximum | 2.016012 × 10⁹ |
Range | 1911 |
Interquartile range (IQR) | 998 |
Descriptive statistics
Standard deviation | 566.52846 |
---|---|
Coefficient of variation (CV) | 2.8101456 × 10⁻⁷ |
Kurtosis | -1.2108739 |
Mean | 2.016011 × 10⁹ |
Median Absolute Deviation (MAD) | 498.5 |
Skewness | 0.0049462973 |
Sum | 2.016011 × 10¹³ |
Variance | 320954.5 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
2016010221 | 33 | 0.3% |
2016010521 | 31 | 0.3% |
2016011413 | 31 | 0.3% |
2016011417 | 31 | 0.3% |
2016011806 | 31 | 0.3% |
2016010315 | 31 | 0.3% |
2016011309 | 31 | 0.3% |
2016010507 | 31 | 0.3% |
2016011905 | 31 | 0.3% |
2016010816 | 30 | 0.3% |
Other values (458) | 9689 |
Value | Count | Frequency (%) |
2016010100 | 30 | |
2016010101 | 28 | |
2016010102 | 20 | |
2016010103 | 11 | 0.1% |
2016010104 | 24 | |
2016010105 | 18 | |
2016010106 | 15 | |
2016010107 | 27 | |
2016010108 | 23 | |
2016010109 | 15 |
Value | Count | Frequency (%) |
2016012011 | 21 | |
2016012010 | 26 | |
2016012009 | 22 | |
2016012008 | 20 | |
2016012007 | 22 | |
2016012006 | 26 | |
2016012005 | 20 | |
2016012004 | 20 | |
2016012003 | 19 | |
2016012002 | 19 |
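The values of 측정일시 look like hourly timestamps packed into integers in `YYYYMMDDHH` form, so the numeric summaries above (mean, range, variance) describe the integer encoding rather than elapsed time. A minimal sketch of decoding them, assuming that layout holds for every row; the helper name `parse_hour_stamp` is illustrative:

```python
from datetime import datetime

def parse_hour_stamp(value):
    """Convert an integer such as 2016010221 (assumed YYYYMMDDHH) to a datetime."""
    return datetime.strptime(str(value), "%Y%m%d%H")

first = parse_hour_stamp(2016010100)  # smallest value listed in the report
last = parse_hour_stamp(2016012011)   # largest value listed in the report

# Plain integer subtraction reproduces the report's "Range" of 1911.
integer_range = 2016012011 - 2016010100

# The elapsed time between the two stamps is 19 days and 11 hours.
elapsed_hours = int((last - first).total_seconds() // 3600)

print(integer_range, elapsed_hours)
```

The elapsed span of 467 hours covers 468 hourly slots, which is consistent with the 468 distinct values reported for this column.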
측정소 코드 (monitoring station code)
Real number (ℝ)
Distinct | 25 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 113.0696 |
Minimum | 101 |
---|---|
Maximum | 125 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 101 |
---|---|
5-th percentile | 102 |
Q1 | 107 |
median | 113 |
Q3 | 119 |
95-th percentile | 124 |
Maximum | 125 |
Range | 24 |
Interquartile range (IQR) | 12 |
Descriptive statistics
Standard deviation | 7.1947851 |
---|---|
Coefficient of variation (CV) | 0.063631472 |
Kurtosis | -1.1980857 |
Mean | 113.0696 |
Median Absolute Deviation (MAD) | 6 |
Skewness | -0.028542672 |
Sum | 1130696 |
Variance | 51.764932 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=25)
Value | Count | Frequency (%) |
121 | 444 | 4.4% |
105 | 441 | 4.4% |
112 | 428 | 4.3% |
116 | 427 | 4.3% |
113 | 425 | 4.2% |
119 | 409 | 4.1% |
115 | 407 | 4.1% |
102 | 406 | 4.1% |
110 | 402 | 4.0% |
118 | 402 | 4.0% |
Other values (15) | 5809 |
Value | Count | Frequency (%) |
101 | 402 | |
102 | 406 | |
103 | 377 | |
104 | 401 | |
105 | 441 | |
106 | 375 | |
107 | 369 | |
108 | 342 | |
109 | 383 | |
110 | 402 |
Value | Count | Frequency (%) |
125 | 382 | |
124 | 395 | |
123 | 401 | |
122 | 400 | |
121 | 444 | |
120 | 401 | |
119 | 409 | |
118 | 402 | |
117 | 390 | |
116 | 427 |
측정항목 (measured item code)
Real number (ℝ)
HIGH CORRELATION
Distinct | 6 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5.3679 |
Minimum | 1 |
---|---|
Maximum | 9 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 3 |
median | 6 |
Q3 | 8 |
95-th percentile | 9 |
Maximum | 9 |
Range | 8 |
Interquartile range (IQR) | 5 |
Descriptive statistics
Standard deviation | 2.746107 |
---|---|
Coefficient of variation (CV) | 0.51157939 |
Kurtosis | -1.2024288 |
Mean | 5.3679 |
Median Absolute Deviation (MAD) | 3 |
Skewness | -0.2145918 |
Sum | 53679 |
Variance | 7.5411037 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%) |
9 | 1717 | |
3 | 1674 | |
5 | 1670 | |
8 | 1665 | |
6 | 1652 | |
1 | 1622 |
Value | Count | Frequency (%) |
1 | 1622 | |
3 | 1674 | |
5 | 1670 | |
6 | 1652 | |
8 | 1665 | |
9 | 1717 |
Value | Count | Frequency (%) |
9 | 1717 | |
8 | 1665 | |
6 | 1652 | |
5 | 1670 | |
3 | 1674 | |
1 | 1622 |
평균값 (average value)
Real number (ℝ)
HIGH CORRELATION
Distinct | 435 |
---|---|
Distinct (%) | 4.4% |
Missing | 4 |
Missing (%) | < 0.1% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 13.114523 |
Minimum | -221 |
---|---|
Maximum | 217 |
Zeros | 44 |
Zeros (%) | 0.4% |
Negative | 6 |
Negative (%) | 0.1% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | -221 |
---|---|
5-th percentile | 0.004 |
Q1 | 0.011 |
median | 0.18 |
Q3 | 22 |
95-th percentile | 62 |
Maximum | 217 |
Range | 438 |
Interquartile range (IQR) | 21.989 |
Descriptive statistics
Standard deviation | 23.415935 |
---|---|
Coefficient of variation (CV) | 1.7854965 |
Kurtosis | 8.2980509 |
Mean | 13.114523 |
Median Absolute Deviation (MAD) | 0.178 |
Skewness | 2.0809158 |
Sum | 131092.77 |
Variance | 548.30601 |
Monotonicity | Not monotonic |
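Several of the 평균값 figures above are derived from one another, which gives a quick way to sanity-check the table. The sketch below copies the reported values and recomputes the sum, coefficient of variation, variance, IQR, and range; the variable names are illustrative. Because the inputs are rounded, the results agree with the report only to within rounding error.

```python
# Values copied from the 평균값 tables in this report.
n_rows, n_missing = 10000, 4
mean, std = 13.114523, 23.415935
q1, q3 = 0.011, 22
minimum, maximum = -221, 217

n_valid = n_rows - n_missing   # 9996 non-missing observations
total = mean * n_valid         # compare with "Sum" (131092.77)
cv = std / mean                # compare with "Coefficient of variation (CV)"
variance = std ** 2            # compare with "Variance"
iqr = q3 - q1                  # compare with "Interquartile range (IQR)"
value_range = maximum - minimum  # compare with "Range"

print(round(total, 2), round(cv, 4), round(variance, 2), round(iqr, 3), value_range)
```

The sum matches only when the mean is multiplied by the 9996 non-missing rows rather than all 10000, which confirms that the four missing cells are excluded from the descriptive statistics.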
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0.006 | 499 | 5.0% |
0.005 | 499 | 5.0% |
0.007 | 368 | 3.7% |
0.004 | 236 | 2.4% |
0.008 | 207 | 2.1% |
0.003 | 169 | 1.7% |
0.002 | 169 | 1.7% |
0.009 | 102 | 1.0% |
0.022 | 96 | 1.0% |
0.02 | 93 | 0.9% |
Other values (425) | 7558 |
Value | Count | Frequency (%) |
-221.0 | 1 | < 0.1% |
-190.0 | 1 | < 0.1% |
-67.0 | 1 | < 0.1% |
-30.0 | 1 | < 0.1% |
-16.0 | 1 | < 0.1% |
-12.0 | 1 | < 0.1% |
0.0 | 44 | 0.4% |
0.001 | 62 | 0.6% |
0.002 | 169 | 1.7% |
0.003 | 169 | 1.7% |
Value | Count | Frequency (%) |
217.0 | 1 | < 0.1% |
198.0 | 1 | < 0.1% |
193.0 | 1 | < 0.1% |
184.0 | 1 | < 0.1% |
179.0 | 1 | < 0.1% |
177.0 | 1 | < 0.1% |
172.0 | 1 | < 0.1% |
171.0 | 2 | < 0.1% |
164.0 | 1 | < 0.1% |
158.0 | 1 | < 0.1% |
측정기 상태 (instrument status)
Real number (ℝ)
ZEROS
Distinct | 6 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.0528 |
Minimum | 0 |
---|---|
Maximum | 9 |
Zeros | 9845 |
Zeros (%) | 98.5% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0 |
Maximum | 9 |
Range | 9 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 0.58944627 |
---|---|
Coefficient of variation (CV) | 11.163755 |
Kurtosis | 202.95223 |
Mean | 0.0528 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 13.923146 |
Sum | 528 |
Variance | 0.3474469 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%) |
0 | 9845 | 98.5% |
1 | 77 | 0.8% |
9 | 37 | 0.4% |
2 | 27 | 0.3% |
4 | 12 | 0.1% |
8 | 2 | < 0.1% |
Value | Count | Frequency (%) |
0 | 9845 | 98.5% |
1 | 77 | 0.8% |
2 | 27 | 0.3% |
4 | 12 | 0.1% |
8 | 2 | < 0.1% |
9 | 37 | 0.4% |
Value | Count | Frequency (%) |
9 | 37 | 0.4% |
8 | 2 | < 0.1% |
4 | 12 | 0.1% |
2 | 27 | 0.3% |
1 | 77 | 0.8% |
0 | 9845 | 98.5% |
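Because 측정기 상태 takes only six distinct values, its summary statistics can be rebuilt directly from the frequency table. The sketch below recomputes the observation count, sum, mean, and variance; it uses the sample variance (dividing by n - 1), which is the convention that agrees with the reported figure. The variable names are illustrative.

```python
# Frequency table for 측정기 상태 (value -> count), copied from the report.
freq = {0: 9845, 1: 77, 2: 27, 4: 12, 8: 2, 9: 37}

n = sum(freq.values())                                    # number of observations
total = sum(value * count for value, count in freq.items())  # compare with "Sum"
mean = total / n                                          # compare with "Mean"

# Sample variance: squared deviations from the mean, divided by n - 1.
squared_dev = sum(count * (value - mean) ** 2 for value, count in freq.items())
variance = squared_dev / (n - 1)                          # compare with "Variance"

zero_share = freq[0] / n                                  # share of zero readings

print(n, total, mean, round(variance, 7), zero_share)
```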
국가 기준초과 구분 (national standard exceedance flag)
Categorical
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
0 | 9728 |
---|---|
1 | 272 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 9728 | 97.3% |
1 | 272 | 2.7% |
Length
Histogram of lengths of the category
지자체 기준초과 구분 (local government standard exceedance flag)
Categorical
HIGH CORRELATION
IMBALANCE
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
0 | 9728 |
---|---|
1 | 272 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 0 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 9728 | 97.3% |
1 | 272 | 2.7% |
Length
Histogram of lengths of the category
Variable | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
측정일시 | 1.000 | 0.000 | 0.000 | 0.246 | 0.121 | 0.357 | 0.357 |
측정소 코드 | 0.000 | 1.000 | 0.000 | 0.046 | 0.112 | 0.058 | 0.058 |
측정항목 | 0.000 | 0.000 | 1.000 | 0.440 | 0.129 | 0.378 | 0.378 |
평균값 | 0.246 | 0.046 | 0.440 | 1.000 | 0.231 | 0.759 | 0.759 |
측정기 상태 | 0.121 | 0.112 | 0.129 | 0.231 | 1.000 | 0.195 | 0.195 |
국가 기준초과 구분 | 0.357 | 0.058 | 0.378 | 0.759 | 0.195 | 1.000 | 1.000 |
지자체 기준초과 구분 | 0.357 | 0.058 | 0.378 | 0.759 | 0.195 | 1.000 | 1.000 |
Variable | 지자체 기준초과 구분 | 국가 기준초과 구분 |
---|---|---|
지자체 기준초과 구분 | 1.000 | 0.998 |
국가 기준초과 구분 | 0.998 | 1.000 |
Variable | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
측정일시 | 1.000 | -0.002 | -0.004 | -0.022 | -0.037 | 0.272 | 0.272 |
측정소 코드 | -0.002 | 1.000 | -0.008 | -0.008 | 0.015 | 0.044 | 0.044 |
측정항목 | -0.004 | -0.008 | 1.000 | 0.723 | 0.043 | 0.272 | 0.272 |
평균값 | -0.022 | -0.008 | 0.723 | 1.000 | -0.043 | 0.584 | 0.584 |
측정기 상태 | -0.037 | 0.015 | 0.043 | -0.043 | 1.000 | 0.140 | 0.140 |
국가 기준초과 구분 | 0.272 | 0.044 | 0.272 | 0.584 | 0.140 | 1.000 | 0.998 |
지자체 기준초과 구분 | 0.272 | 0.044 | 0.272 | 0.584 | 0.140 | 0.998 | 1.000 |
A simple visualization of nullity (missing values) by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
Row index | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
14478 | 2016010500 | 114 | 1 | 0.006 | 0 | 0 | 0 |
5622 | 2016010213 | 113 | 1 | 0.006 | 0 | 0 | 0 |
27969 | 2016010818 | 112 | 6 | 0.023 | 0 | 0 | 0 |
25039 | 2016010722 | 124 | 3 | 0.023 | 0 | 0 | 0 |
15521 | 2016010507 | 112 | 9 | 21.0 | 0 | 0 | 0 |
51674 | 2016011508 | 113 | 5 | 1.11 | 0 | 0 | 0 |
61431 | 2016011801 | 114 | 6 | 0.001 | 0 | 0 | 0 |
23056 | 2016010709 | 118 | 8 | 38.0 | 0 | 0 | 0 |
44474 | 2016011308 | 113 | 5 | 1.0 | 0 | 0 | 0 |
29440 | 2016010904 | 107 | 8 | 40.0 | 0 | 0 | 0 |
Row index | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
12155 | 2016010409 | 101 | 9 | 77.0 | 0 | 1 | 1 |
31213 | 2016010916 | 103 | 3 | 0.025 | 0 | 0 | 0 |
1394 | 2016010109 | 108 | 5 | 0.97 | 0 | 0 | 0 |
14697 | 2016010501 | 125 | 6 | 0.004 | 0 | 0 | 0 |
62856 | 2016011811 | 102 | 1 | 0.007 | 0 | 0 | 0 |
66662 | 2016011912 | 111 | 5 | 0.36 | 0 | 0 | 0 |
920 | 2016010106 | 104 | 5 | 1.03 | 0 | 0 | 0 |
67707 | 2016011919 | 110 | 6 | 0.022 | 0 | 0 | 0 |
53322 | 2016011519 | 113 | 1 | 0.006 | 0 | 0 | 0 |
45639 | 2016011316 | 107 | 6 | 0.006 | 0 | 0 | 0 |