Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 10000 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 693.4 KiB |
Average record size in memory | 71.0 B |
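The average record size appears to be the total memory footprint divided by the number of observations. A minimal check using only the two figures reported above, assuming 1 KiB = 1024 bytes:

```python
# Figures copied from the dataset statistics above.
total_size_kib = 693.4
n_observations = 10_000

# Convert KiB to bytes (1 KiB = 1024 B), then divide by the row count.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```

Rounded to one decimal, this agrees with the reported 71.0 B.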
Variable types
Numeric | 5 |
---|---|
Categorical | 2 |
Dataset
Description | File download |
---|---|
Author | Seoul Metropolitan Government (서울특별시) |
URL | https://data.seoul.go.kr/dataList/OA-15526/S/1/datasetView.do |
지자체 기준초과 구분 is highly overall correlated with 국가 기준초과 구분 | High correlation |
국가 기준초과 구분 is highly overall correlated with 지자체 기준초과 구분 | High correlation |
측정항목 is highly overall correlated with 평균값 | High correlation |
평균값 is highly overall correlated with 측정항목 | High correlation |
국가 기준초과 구분 is highly imbalanced (70.6%) | Imbalance |
지자체 기준초과 구분 is highly imbalanced (70.6%) | Imbalance |
평균값 is highly skewed (γ1 = -22.21268622) | Skewed |
평균값 has 222 (2.2%) zeros | Zeros |
측정기 상태 has 9626 (96.3%) zeros | Zeros |
Reproduction
Analysis started | 2024-07-27 00:24:27.097043 |
---|---|
Analysis finished | 2024-07-27 00:24:39.703367 |
Duration | 12.61 seconds |
Software version | ydata-profiling v4.5.1 |
Download configuration | config.json |
측정일시
Real number (ℝ)
Distinct | 470 |
---|---|
Distinct (%) | 4.7% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.018011 × 10⁹ |
Minimum | 2.0180101 × 10⁹ |
---|---|
Maximum | 2.018012 × 10⁹ |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 2.0180101 × 10⁹ |
---|---|
5-th percentile | 2.0180102 × 10⁹ |
Q1 | 2.0180105 × 10⁹ |
median | 2.018011 × 10⁹ |
Q3 | 2.0180115 × 10⁹ |
95-th percentile | 2.0180119 × 10⁹ |
Maximum | 2.018012 × 10⁹ |
Range | 1913 |
Interquartile range (IQR) | 994.25 |
Descriptive statistics
Standard deviation | 562.16649 |
---|---|
Coefficient of variation (CV) | 2.7857453 × 10⁻⁷ |
Kurtosis | -1.1991596 |
Mean | 2.018011 × 10⁹ |
Median Absolute Deviation (MAD) | 497 |
Skewness | -0.0018689884 |
Sum | 2.018011 × 10¹³ |
Variance | 316031.16 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
2018011301 | 38 | 0.4% |
2018010410 | 34 | 0.3% |
2018010704 | 33 | 0.3% |
2018010708 | 33 | 0.3% |
2018011715 | 33 | 0.3% |
2018011806 | 32 | 0.3% |
2018011303 | 31 | 0.3% |
2018010403 | 31 | 0.3% |
2018011815 | 31 | 0.3% |
2018010311 | 30 | 0.3% |
Other values (460) | 9674 |
Value | Count | Frequency (%) |
2018010100 | 19 | |
2018010101 | 19 | |
2018010102 | 27 | |
2018010103 | 17 | |
2018010104 | 22 | |
2018010105 | 20 | |
2018010106 | 12 | |
2018010107 | 24 | |
2018010108 | 22 | |
2018010109 | 18 |
Value | Count | Frequency (%) |
2018012013 | 13 | |
2018012012 | 23 | |
2018012011 | 22 | |
2018012010 | 23 | |
2018012009 | 19 | |
2018012008 | 23 | |
2018012007 | 14 | |
2018012006 | 20 | |
2018012005 | 18 | |
2018012004 | 17 |
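측정일시 (roughly "measurement date-time") is profiled as a real number, but values such as 2018011301 look like hour stamps packed as YYYYMMDDHH. That reading is an assumption drawn from the values, not something the report states. Under it, the reported range of 1913 is a difference between packed integers rather than a duration. The sketch below converts the minimum and maximum listed above; `parse_stamp` is a hypothetical helper name.

```python
from datetime import datetime

def parse_stamp(value: int) -> datetime:
    """Parse an integer such as 2018011301, ASSUMED to be YYYYMMDDHH."""
    return datetime.strptime(str(value), "%Y%m%d%H")

first = parse_stamp(2018010100)  # smallest value listed in the report
last = parse_stamp(2018012013)   # largest value listed in the report

# Plain integer difference, which is what the "Range" row reports.
numeric_range = 2018012013 - 2018010100

# Elapsed time once the values are read as hour stamps.
elapsed_hours = int((last - first).total_seconds() // 3600)
hourly_slots = elapsed_hours + 1  # inclusive count of hourly stamps

print(numeric_range, elapsed_hours, hourly_slots)
```

Under this assumption the span is 469 hours, that is 470 hourly stamps, which agrees with the 470 distinct values reported. Converting the column to a datetime type before profiling would likely give more meaningful summary statistics than the mean and standard deviation shown here.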
측정소 코드
Real number (ℝ)
Distinct | 25 |
---|---|
Distinct (%) | 0.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 112.9199 |
Minimum | 101 |
---|---|
Maximum | 125 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 101 |
---|---|
5-th percentile | 102 |
Q1 | 107 |
median | 113 |
Q3 | 119 |
95-th percentile | 124 |
Maximum | 125 |
Range | 24 |
Interquartile range (IQR) | 12 |
Descriptive statistics
Standard deviation | 7.2346332 |
---|---|
Coefficient of variation (CV) | 0.064068718 |
Kurtosis | -1.2279988 |
Mean | 112.9199 |
Median Absolute Deviation (MAD) | 6 |
Skewness | -6.2770396 × 10⁻⁵ |
Sum | 1129199 |
Variance | 52.339918 |
Monotonicity | Not monotonic |
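Several rows in these tables can be derived from others, which makes for a quick sanity check. The sketch below recomputes range, IQR, coefficient of variation and variance for 측정소 코드 (roughly "station code") from the reported mean, standard deviation, quartiles and extremes. It uses the usual textbook definitions, which appear to match the report.

```python
# Values copied from the 측정소 코드 tables above.
mean, std = 112.9199, 7.2346332
q1, q3 = 107, 119
minimum, maximum = 101, 125

value_range = maximum - minimum  # "Range" row
iqr = q3 - q1                    # "Interquartile range (IQR)" row
cv = std / mean                  # "Coefficient of variation (CV)" row
variance = std ** 2              # "Variance" row

print(value_range, iqr, round(cv, 6), round(variance, 4))
```

Since the column holds station identifiers rather than measurements, these statistics describe how rows are spread across stations, not any physical quantity.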
Histogram with fixed size bins (bins=25)
Value | Count | Frequency (%) |
102 | 470 | 4.7% |
120 | 453 | 4.5% |
106 | 442 | 4.4% |
104 | 432 | 4.3% |
117 | 426 | 4.3% |
114 | 422 | 4.2% |
124 | 415 | 4.2% |
119 | 410 | 4.1% |
107 | 406 | 4.1% |
109 | 402 | 4.0% |
Other values (15) | 5722 |
Value | Count | Frequency (%) |
101 | 382 | |
102 | 470 | |
103 | 387 | |
104 | 432 | |
105 | 371 | |
106 | 442 | |
107 | 406 | |
108 | 390 | |
109 | 402 | |
110 | 381 |
Value | Count | Frequency (%) |
125 | 360 | |
124 | 415 | |
123 | 375 | |
122 | 398 | |
121 | 401 | |
120 | 453 | |
119 | 410 | |
118 | 397 | |
117 | 426 | |
116 | 359 |
측정항목
Real number (ℝ)
Alerts: High correlation
Distinct | 6 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5.3203 |
Minimum | 1 |
---|---|
Maximum | 9 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 3 |
median | 5 |
Q3 | 8 |
95-th percentile | 9 |
Maximum | 9 |
Range | 8 |
Interquartile range (IQR) | 5 |
Descriptive statistics
Standard deviation | 2.7548987 |
---|---|
Coefficient of variation (CV) | 0.5178089 |
Kurtosis | -1.2163683 |
Mean | 5.3203 |
Median Absolute Deviation (MAD) | 3 |
Skewness | -0.19620904 |
Sum | 53203 |
Variance | 7.5894669 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%) |
5 | 1685 | |
1 | 1683 | |
3 | 1675 | |
9 | 1672 | |
8 | 1656 | |
6 | 1629 |
Value | Count | Frequency (%) |
1 | 1683 | |
3 | 1675 | |
5 | 1685 | |
6 | 1629 | |
8 | 1656 | |
9 | 1672 |
Value | Count | Frequency (%) |
9 | 1672 | |
8 | 1656 | |
6 | 1629 | |
5 | 1685 | |
3 | 1675 | |
1 | 1683 |
평균값
Real number (ℝ)
Alerts: High correlation, Skewed, Zeros
Distinct | 265 |
---|---|
Distinct (%) | 2.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -2.9925596 |
Minimum | -9999 |
---|---|
Maximum | 3514 |
Zeros | 222 |
Zeros (%) | 2.2% |
Negative | 24 |
Negative (%) | 0.2% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | -9999 |
---|---|
5-th percentile | 0.002 |
Q1 | 0.008 |
median | 0.067 |
Q3 | 25 |
95-th percentile | 76 |
Maximum | 3514 |
Range | 13513 |
Interquartile range (IQR) | 24.992 |
Descriptive statistics
Standard deviation | 439.80235 |
---|---|
Coefficient of variation (CV) | -146.96528 |
Kurtosis | 505.18592 |
Mean | -2.9925596 |
Median Absolute Deviation (MAD) | 0.067 |
Skewness | -22.212686 |
Sum | -29925.596 |
Variance | 193426.11 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
0.005 | 474 | 4.7% |
0.006 | 441 | 4.4% |
0.004 | 412 | 4.1% |
0.007 | 297 | 3.0% |
0.5 | 279 | 2.8% |
0.002 | 275 | 2.8% |
0.003 | 247 | 2.5% |
0.6 | 232 | 2.3% |
0.4 | 226 | 2.3% |
0.0 | 222 | 2.2% |
Other values (255) | 6895 |
Value | Count | Frequency (%) |
-9999.0 | 19 | 0.2% |
-433.0 | 1 | < 0.1% |
-349.0 | 1 | < 0.1% |
-50.0 | 1 | < 0.1% |
-1.0 | 1 | < 0.1% |
-0.2 | 1 | < 0.1% |
0.0 | 222 | 2.2% |
0.001 | 49 | 0.5% |
0.002 | 275 | 2.8% |
0.003 | 247 | 2.5% |
Value | Count | Frequency (%) |
3514.0 | 1 | < 0.1% |
3487.0 | 1 | < 0.1% |
161.0 | 1 | < 0.1% |
160.0 | 1 | < 0.1% |
157.0 | 1 | < 0.1% |
155.0 | 1 | < 0.1% |
154.0 | 1 | < 0.1% |
149.0 | 2 | < 0.1% |
147.0 | 1 | < 0.1% |
146.0 | 2 | < 0.1% |
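The value -9999 occurs 19 times in 평균값 (roughly "average value") and sits far below every other value, so it may be a placeholder for a missing measurement. The report does not say so, and this is an assumption. If it holds, the mean without those rows can be recovered from the reported sum and counts alone, as sketched here.

```python
# Figures copied from the 평균값 tables above.
n_rows = 10_000
reported_sum = -29925.596
sentinel = -9999.0   # ASSUMPTION: treated as a "no measurement" code
sentinel_count = 19

# Remove the sentinel's contribution from both the sum and the row count.
clean_sum = reported_sum - sentinel * sentinel_count
clean_rows = n_rows - sentinel_count
clean_mean = clean_sum / clean_rows

print(f"mean with sentinel:    {reported_sum / n_rows:.4f}")
print(f"mean without sentinel: {clean_mean:.4f}")
```

Under this assumption the mean moves from about -2.99 to about 16.04, which suggests that the negative mean, the extreme skewness and the large kurtosis are driven mostly by those 19 rows. The remaining negative values (-433, -349, -50, -1, -0.2) are left in place and would need a separate look.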
측정기 상태
Real number (ℝ)
Alerts: Zeros
Distinct | 6 |
---|---|
Distinct (%) | 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.2425 |
Minimum | 0 |
---|---|
Maximum | 9 |
Zeros | 9626 |
Zeros (%) | 96.3% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 166.0 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 0 |
95-th percentile | 0 |
Maximum | 9 |
Range | 9 |
Interquartile range (IQR) | 0 |
Descriptive statistics
Standard deviation | 1.3556096 |
---|---|
Coefficient of variation (CV) | 5.5901429 |
Kurtosis | 30.393485 |
Mean | 0.2425 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 5.6514277 |
Sum | 2425 |
Variance | 1.8376775 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Value | Count | Frequency (%) |
0 | 9626 | 96.3% |
8 | 214 | 2.1% |
1 | 68 | 0.7% |
9 | 59 | 0.6% |
4 | 24 | 0.2% |
2 | 9 | 0.1% |
Value | Count | Frequency (%) |
0 | 9626 | 96.3% |
1 | 68 | 0.7% |
2 | 9 | 0.1% |
4 | 24 | 0.2% |
8 | 214 | 2.1% |
9 | 59 | 0.6% |
Value | Count | Frequency (%) |
9 | 59 | 0.6% |
8 | 214 | 2.1% |
4 | 24 | 0.2% |
2 | 9 | 0.1% |
1 | 68 | 0.7% |
0 | 9626 | 96.3% |
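Because 측정기 상태 (roughly "instrument status") takes only six values, its summary statistics can be rebuilt from the frequency table alone. The sketch below recomputes the sum, mean, variance and share of zeros. It uses the sample variance (dividing by n - 1), which is the convention that reproduces the reported figure.

```python
# Value -> count, copied from the 측정기 상태 frequency table above.
counts = {0: 9626, 1: 68, 2: 9, 4: 24, 8: 214, 9: 59}

n = sum(counts.values())
total = sum(value * count for value, count in counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(
    count * (value - mean) ** 2 for value, count in counts.items()
) / (n - 1)

zero_share = counts[0] / n

print(n, total, mean, round(variance, 7), f"{zero_share:.1%}")
```

The results agree with the reported sum (2425), mean (0.2425), variance (1.8376775) and zero share (96.3%). The column reads more like a status code than a quantity, so the frequency table is probably more informative than the mean or skewness.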
국가 기준초과 구분
Categorical
Alerts: High correlation, Imbalance
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
0 | 9481 |
---|---|
1 | 519 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 1 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 9481 | 94.8% |
1 | 519 | 5.2% |
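The 70.6% imbalance figure in the alerts can be reproduced from these two counts with an entropy-based score: one minus the Shannon entropy of the class shares, normalised by the maximum entropy for that number of classes. Whether ydata-profiling computes it in exactly this way is an assumption, and `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the base-2 Shannon entropy of the class shares."""
    n = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Counts for classes 0 and 1, copied from the Common Values table above.
score = imbalance_score([9481, 519])
print(f"{score:.1%}")
```

A perfectly balanced column scores 0 and a column dominated by one class approaches 1, so a 94.8% / 5.2% split lands at roughly 0.71.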
Length
Histogram of lengths of the category
지자체 기준초과 구분
Categorical
Alerts: High correlation, Imbalance
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 156.2 KiB |
0 | 9481 |
---|---|
1 | 519 |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 0 |
---|---|
2nd row | 0 |
3rd row | 1 |
4th row | 0 |
5th row | 0 |
Common Values
Value | Count | Frequency (%) |
0 | 9481 | 94.8% |
1 | 519 | 5.2% |
Length
Histogram of lengths of the category
Variable | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
측정일시 | 1.000 | 0.000 | 0.000 | 0.040 | 0.142 | 0.456 | 0.456 |
측정소 코드 | 0.000 | 1.000 | 0.000 | 0.050 | 0.431 | 0.003 | 0.003 |
측정항목 | 0.000 | 0.000 | 1.000 | 0.031 | 0.108 | 0.514 | 0.514 |
평균값 | 0.040 | 0.050 | 0.031 | 1.000 | 0.253 | 0.068 | 0.068 |
측정기 상태 | 0.142 | 0.431 | 0.108 | 0.253 | 1.000 | 0.044 | 0.044 |
국가 기준초과 구분 | 0.456 | 0.003 | 0.514 | 0.068 | 0.044 | 1.000 | 1.000 |
지자체 기준초과 구분 | 0.456 | 0.003 | 0.514 | 0.068 | 0.044 | 1.000 | 1.000 |
Variable | 지자체 기준초과 구분 | 국가 기준초과 구분 |
---|---|---|
지자체 기준초과 구분 | 1.000 | 0.999 |
국가 기준초과 구분 | 0.999 | 1.000 |
Variable | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
측정일시 | 1.000 | 0.004 | -0.002 | 0.090 | -0.084 | 0.351 | 0.351 |
측정소 코드 | 0.004 | 1.000 | -0.008 | 0.034 | -0.152 | 0.002 | 0.002 |
측정항목 | -0.002 | -0.008 | 1.000 | 0.685 | 0.009 | 0.372 | 0.372 |
평균값 | 0.090 | 0.034 | 0.685 | 1.000 | -0.212 | 0.060 | 0.060 |
측정기 상태 | -0.084 | -0.152 | 0.009 | -0.212 | 1.000 | 0.031 | 0.031 |
국가 기준초과 구분 | 0.351 | 0.002 | 0.372 | 0.060 | 0.031 | 1.000 | 0.999 |
지자체 기준초과 구분 | 0.351 | 0.002 | 0.372 | 0.060 | 0.031 | 0.999 | 1.000 |
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
Index | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
8680 | 2018010309 | 122 | 8 | 51.0 | 0 | 0 | 0 |
12441 | 2018010410 | 124 | 6 | 0.014 | 0 | 0 | 0 |
57713 | 2018011700 | 119 | 9 | 102.0 | 0 | 1 | 1 |
33711 | 2018011008 | 119 | 6 | 0.009 | 0 | 0 | 0 |
17181 | 2018010518 | 114 | 6 | 0.026 | 0 | 0 | 0 |
15552 | 2018010507 | 118 | 1 | 0.006 | 0 | 0 | 0 |
53966 | 2018011523 | 120 | 5 | 1.2 | 0 | 0 | 0 |
43104 | 2018011223 | 110 | 1 | 0.006 | 0 | 0 | 0 |
1704 | 2018010111 | 110 | 1 | 0.007 | 0 | 0 | 0 |
30599 | 2018010911 | 125 | 9 | 22.0 | 0 | 0 | 0 |
Index | 측정일시 | 측정소 코드 | 측정항목 | 평균값 | 측정기 상태 | 국가 기준초과 구분 | 지자체 기준초과 구분 |
---|---|---|---|---|---|---|---|
31274 | 2018010916 | 113 | 5 | 0.6 | 0 | 0 | 0 |
59450 | 2018011712 | 109 | 5 | 0.7 | 0 | 0 | 0 |
3095 | 2018010120 | 116 | 9 | 17.0 | 0 | 0 | 0 |
56097 | 2018011613 | 125 | 6 | 0.014 | 0 | 0 | 0 |
27102 | 2018010812 | 118 | 1 | 0.007 | 0 | 0 | 0 |
35389 | 2018011019 | 124 | 3 | 0.014 | 0 | 0 | 0 |
37369 | 2018011109 | 104 | 3 | 0.025 | 0 | 0 | 0 |
4215 | 2018010204 | 103 | 6 | 0.002 | 0 | 0 | 0 |
39774 | 2018011201 | 105 | 1 | 0.001 | 8 | 0 | 0 |
62858 | 2018011811 | 102 | 5 | 0.9 | 0 | 0 | 0 |
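The sample rows hint at why 측정항목 (roughly "measurement item") and 평균값 are flagged as highly correlated: the magnitude of the value seems to depend on the item code. The sketch below groups the twenty sample rows above by item code and prints the value range for each. It only illustrates the pattern, since twenty rows are far too few to establish it.

```python
from collections import defaultdict

# (측정항목, 평균값) pairs copied from the twenty sample rows above.
sample = [
    (8, 51.0), (6, 0.014), (9, 102.0), (6, 0.009), (6, 0.026),
    (1, 0.006), (5, 1.2), (1, 0.006), (1, 0.007), (9, 22.0),
    (5, 0.6), (5, 0.7), (9, 17.0), (6, 0.014), (1, 0.007),
    (3, 0.014), (3, 0.025), (6, 0.002), (1, 0.001), (5, 0.9),
]

# Collect the values observed for each item code.
by_item = defaultdict(list)
for item, value in sample:
    by_item[item].append(value)

# Smallest and largest value per item code, in code order.
ranges = {item: (min(vals), max(vals)) for item, vals in sorted(by_item.items())}
for item, (low, high) in ranges.items():
    print(f"item {item}: {low} to {high} ({len(by_item[item])} rows)")
```

In this sample, codes 1, 3 and 6 stay in the thousandths or hundredths, code 5 is near 1, and codes 8 and 9 are in the tens or above. If the codes correspond to different pollutants with different units, which is an assumption here, then profiling 평균값 separately per item would likely be more informative than the pooled statistics in this report.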