Overview

Dataset statistics

Number of variables: 8
Number of observations: 126
Missing cells: 6
Missing cells (%): 0.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.5 KiB
Average record size in memory: 69.0 B
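
The percentages in this block follow directly from the counts. As a quick sanity check, the sketch below recomputes them from the figures shown here; the variable names are illustrative, and because the memory size is already rounded to 8.5 KiB, the per-record size is only approximate.

```python
# Figures copied from the dataset statistics above.
n_variables = 8
n_observations = 126
missing_cells = 6
total_size_kib = 8.5  # already rounded in the report

total_cells = n_variables * n_observations                  # every cell in the table
missing_pct = 100 * missing_cells / total_cells             # share of empty cells
avg_record_bytes = total_size_kib * 1024 / n_observations   # approximate bytes per row

print(f"{total_cells} cells, {missing_pct:.1f}% missing, about {avg_record_bytes:.0f} B per record")
```

The six missing cells also match the per-variable counts further down: 4 in 소재지우편번호, 1 in WGS84위도 and 1 in WGS84경도.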

Variable types

Categorical: 3
Text: 1
Numeric: 4

Alerts

소재지우편번호 is highly overall correlated with WGS84위도 and 1 other field (High correlation)
WGS84위도 is highly overall correlated with 소재지우편번호 and 1 other field (High correlation)
WGS84경도 is highly overall correlated with 시군명 (High correlation)
시군명 is highly overall correlated with 소재지우편번호 and 2 other fields (High correlation)
측정망명 is highly imbalanced (60.5%) (Imbalance)
측정항목내역 is highly imbalanced (88.2%) (Imbalance)
소재지우편번호 has 4 (3.2%) missing values (Missing)
측정소명 has unique values (Unique)
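
The two imbalance percentages are consistent with a score of one minus the normalized Shannon entropy of the class counts. The sketch below assumes that definition (the report does not state its formula) and uses the class counts listed further down for 측정망명 (111, 11, 4) and 측정항목내역 (124, 2). `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    Assumed definition: the report does not spell out its formula.
    """
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))

# Class counts copied from the variable sections of this report.
network_score = imbalance_score([111, 11, 4])  # 측정망명
items_score = imbalance_score([124, 2])        # 측정항목내역

print(f"{network_score:.1%}, {items_score:.1%}")  # 60.5%, 88.2%
```

Under this assumption both values agree with the alert text.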

Reproduction

Analysis started: 2023-12-10 21:41:29.386832
Analysis finished: 2023-12-10 21:41:31.173173
Duration: 1.79 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

시군명
Categorical

HIGH CORRELATION 

Distinct: 32
Distinct (%): 25.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count
화성시  8
성남시  8
안산시  8
수원시  8
용인시  7
Other values (27)  87

Length

Max length: 4
Median length: 3
Mean length: 3.0793651
Min length: 2

Unique

Unique: 2
Unique (%): 1.6%

Sample

1st row: 가평군
2nd row: 가평군
3rd row: 고양시
4th row: 고양시
5th row: 고양시

Common Values

Value  Count  Frequency (%)
화성시  8  6.3%
성남시  8  6.3%
안산시  8  6.3%
수원시  8  6.3%
용인시  7  5.6%
시흥시  7  5.6%
남양주시  7  5.6%
평택시  6  4.8%
부천시  5  4.0%
고양시  5  4.0%
Other values (22)  57  45.2%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
화성시  8  6.3%
수원시  8  6.3%
성남시  8  6.3%
안산시  8  6.3%
용인시  7  5.6%
시흥시  7  5.6%
남양주시  7  5.6%
평택시  6  4.8%
고양시  5  4.0%
김포시  5  4.0%
Other values (22)  57  45.2%

측정소명
Text

UNIQUE 

Distinct: 126
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Length

Max length: 10
Median length: 3
Mean length: 3.4920635
Min length: 2

Characters and Unicode

Total characters: 440
Distinct characters: 138
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 126
Unique (%): 100.0%

Sample

1st row: 설악면
2nd row: 가평
3rd row: 식사동
4th row: 행신동
5th row: 주엽동

Value  Count  Frequency (%)
설악면  1  0.8%
용문면  1  0.8%
수지  1  0.8%
중부대로(구갈동  1  0.8%
이동읍  1  0.8%
백암면  1  0.8%
모현읍  1  0.8%
금암로(신장동  1  0.8%
오산동  1  0.8%
전곡  1  0.8%
Other values (116)  116  92.1%

Most occurring characters

ValueCountFrequency (%)
78
 
17.7%
20
 
4.5%
11
 
2.5%
11
 
2.5%
) 11
 
2.5%
11
 
2.5%
( 11
 
2.5%
10
 
2.3%
9
 
2.0%
8
 
1.8%
Other values (128) 260
59.1%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  407  92.5%
Close Punctuation  11  2.5%
Open Punctuation  11  2.5%
Decimal Number  8  1.8%
Uppercase Letter  3  0.7%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
78
 
19.2%
20
 
4.9%
11
 
2.7%
11
 
2.7%
11
 
2.7%
10
 
2.5%
9
 
2.2%
8
 
2.0%
8
 
2.0%
7
 
1.7%
Other values (119) 234
57.5%
Decimal Number
Value  Count  Frequency (%)
1  3  37.5%
2  2  25.0%
3  2  25.0%
8  1  12.5%
Uppercase Letter
Value  Count  Frequency (%)
M  1  33.3%
Z  1  33.3%
D  1  33.3%
Close Punctuation
Value  Count  Frequency (%)
)  11  100.0%
Open Punctuation
Value  Count  Frequency (%)
(  11  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  407  92.5%
Common  30  6.8%
Latin  3  0.7%

Most frequent character per script

Hangul
ValueCountFrequency (%)
78
 
19.2%
20
 
4.9%
11
 
2.7%
11
 
2.7%
11
 
2.7%
10
 
2.5%
9
 
2.2%
8
 
2.0%
8
 
2.0%
7
 
1.7%
Other values (119) 234
57.5%
Common
Value  Count  Frequency (%)
)  11  36.7%
(  11  36.7%
1  3  10.0%
2  2  6.7%
3  2  6.7%
8  1  3.3%
Latin
Value  Count  Frequency (%)
M  1  33.3%
Z  1  33.3%
D  1  33.3%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  407  92.5%
ASCII  33  7.5%

Most frequent character per block

Hangul
ValueCountFrequency (%)
78
 
19.2%
20
 
4.9%
11
 
2.7%
11
 
2.7%
11
 
2.7%
10
 
2.5%
9
 
2.2%
8
 
2.0%
8
 
2.0%
7
 
1.7%
Other values (119) 234
57.5%
ASCII
Value  Count  Frequency (%)
)  11  33.3%
(  11  33.3%
1  3  9.1%
2  2  6.1%
3  2  6.1%
M  1  3.0%
Z  1  3.0%
D  1  3.0%
8  1  3.0%

소재지우편번호
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 122
Distinct (%): 100.0%
Missing: 4
Missing (%): 3.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 14608.131
Minimum: 10024
Maximum: 44785
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 10024
5-th percentile: 10390.4
Q1: 12187.75
Median: 14442.5
Q3: 16665
95-th percentile: 18313.3
Maximum: 44785
Range: 34761
Interquartile range (IQR): 4477.25

Descriptive statistics

Standard deviation: 3775.0162
Coefficient of variation (CV): 0.25841884
Kurtosis: 32.798918
Mean: 14608.131
Median Absolute Deviation (MAD): 2253
Skewness: 4.2015766
Sum: 1782192
Variance: 14250748
Monotonicity: Not monotonic
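
Several of these statistics are simple functions of the others, which allows a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, variance and non-missing count from the figures reported for 소재지우편번호. It uses only numbers shown in this section, not the raw data, and the tolerances allow for the report's rounding.

```python
import math

# Values reported for 소재지우편번호 in this section.
minimum, maximum = 10024, 44785
q1, q3 = 12187.75, 16665
mean, std, total = 14608.131, 3775.0162, 1782192

value_range = maximum - minimum   # reported range: 34761
iqr = q3 - q1                     # reported IQR: 4477.25
cv = std / mean                   # reported CV: 0.25841884
variance = std ** 2               # reported variance: 14250748
n_present = round(total / mean)   # sum / mean gives the non-missing count

assert value_range == 34761
assert iqr == 4477.25
assert math.isclose(cv, 0.25841884, rel_tol=1e-6)
assert math.isclose(variance, 14250748, rel_tol=1e-6)
assert n_present == 126 - 4       # 126 observations, 4 missing
```
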
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
11653  1  0.8%
16075  1  0.8%
17049  1  0.8%
16963  1  0.8%
16832  1  0.8%
17136  1  0.8%
17178  1  0.8%
17036  1  0.8%
18109  1  0.8%
18131  1  0.8%
Other values (112)  112  88.9%
(Missing)  4  3.2%

Minimum 10 values

Value  Count  Frequency (%)
10024  1  0.8%
10062  1  0.8%
10079  1  0.8%
10108  1  0.8%
10123  1  0.8%
10317  1  0.8%
10385  1  0.8%
10493  1  0.8%
10567  1  0.8%
10800  1  0.8%

Maximum 10 values

Value  Count  Frequency (%)
44785  1  0.8%
18592  1  0.8%
18563  1  0.8%
18555  1  0.8%
18483  1  0.8%
18443  1  0.8%
18316  1  0.8%
18262  1  0.8%
18242  1  0.8%
18131  1  0.8%
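
The largest value, 44785, sits far above the 95th percentile (18313.3), which is consistent with the high skewness and kurtosis reported for this variable. One common way to flag such a value is Tukey's 1.5 × IQR fence. The sketch below applies it to the reported quartiles; the fence rule is an illustrative choice here, not something the report itself applies.

```python
q1, q3 = 12187.75, 16665  # quartiles reported for 소재지우편번호
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The ten largest values listed in this section.
largest = [44785, 18592, 18563, 18555, 18483, 18443, 18316, 18262, 18242, 18131]
flagged = [v for v in largest if v > upper_fence]

print(upper_fence, flagged)  # 23380.875 [44785]
```

Only the top value exceeds the fence; whether it is a data-entry error or a genuinely distant location cannot be decided from the report alone.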

WGS84위도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 125
Distinct (%): 100.0%
Missing: 1
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.419962
Minimum: 35.496252
Maximum: 38.129572
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 35.496252
5-th percentile: 37.027175
Q1: 37.275269
Median: 37.380257
Q3: 37.619306
95-th percentile: 37.895736
Maximum: 38.129572
Range: 2.6333196
Interquartile range (IQR): 0.344037

Descriptive statistics

Standard deviation: 0.30710963
Coefficient of variation (CV): 0.008207107
Kurtosis: 11.494274
Mean: 37.419962
Median Absolute Deviation (MAD): 0.15878721
Skewness: -1.5905395
Sum: 4677.4953
Variance: 0.094316326
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
37.6761934621  1  0.8%
37.2986748663  1  0.8%
37.2343602006  1  0.8%
37.2804510231  1  0.8%
37.3279697009  1  0.8%
37.2752648209  1  0.8%
37.1411233252  1  0.8%
37.1636573698  1  0.8%
37.3294550112  1  0.8%
37.1709788878  1  0.8%
Other values (115)  115  91.3%

Minimum 10 values

Value  Count  Frequency (%)
35.4962524156  1  0.8%
36.9746382929  1  0.8%
36.9857568173  1  0.8%
36.9910624271  1  0.8%
37.0010143881  1  0.8%
37.0083010751  1  0.8%
37.0206736777  1  0.8%
37.0531793306  1  0.8%
37.074674853  1  0.8%
37.0816629899  1  0.8%

Maximum 10 values

Value  Count  Frequency (%)
38.1295719846  1  0.8%
38.1237556519  1  0.8%
38.0964663782  1  0.8%
38.0279865448  1  0.8%
37.961273728  1  0.8%
37.9176330163  1  0.8%
37.9063047665  1  0.8%
37.8534602094  1  0.8%
37.8311224444  1  0.8%
37.8308927006  1  0.8%

WGS84경도
Real number (ℝ)

HIGH CORRELATION 

Distinct: 125
Distinct (%): 100.0%
Missing: 1
Missing (%): 0.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 127.05035
Minimum: 126.55287
Maximum: 129.3405
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 126.55287
5-th percentile: 126.71832
Q1: 126.83303
Median: 127.03015
Q3: 127.16439
95-th percentile: 127.53778
Maximum: 129.3405
Range: 2.7876311
Interquartile range (IQR): 0.33136354

Descriptive statistics

Standard deviation: 0.31583603
Coefficient of variation (CV): 0.0024859124
Kurtosis: 21.268112
Mean: 127.05035
Median Absolute Deviation (MAD): 0.16906327
Skewness: 3.2660303
Sum: 15881.294
Variance: 0.099752398
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value  Count  Frequency (%)
127.4946928236  1  0.8%
127.6292949206  1  0.8%
127.2012931829  1  0.8%
127.114702591  1  0.8%
127.0950115914  1  0.8%
127.1136372415  1  0.8%
127.1961991903  1  0.8%
127.3746861091  1  0.8%
127.2423886045  1  0.8%
127.0515715149  1  0.8%
Other values (115)  115  91.3%

Minimum 10 values

Value  Count  Frequency (%)
126.5528707862  1  0.8%
126.5852863736  1  0.8%
126.6296887577  1  0.8%
126.6750801941  1  0.8%
126.7086689523  1  0.8%
126.7106059965  1  0.8%
126.7167475232  1  0.8%
126.7246017904  1  0.8%
126.7311872482  1  0.8%
126.733801719  1  0.8%

Maximum 10 values

Value  Count  Frequency (%)
129.340501846  1  0.8%
127.6296142576  1  0.8%
127.6292949206  1  0.8%
127.5962160807  1  0.8%
127.5844464327  1  0.8%
127.5471834699  1  0.8%
127.5448724152  1  0.8%
127.5094206698  1  0.8%
127.4946928236  1  0.8%
127.4867072092  1  0.8%

설치년도
Real number (ℝ)

Distinct: 33
Distinct (%): 26.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2007.2381
Minimum: 1986
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.2 KiB

Quantile statistics

Minimum: 1986
5-th percentile: 1987.75
Q1: 2000
Median: 2005.5
Q3: 2019
95-th percentile: 2020
Maximum: 2022
Range: 36
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 10.367973
Coefficient of variation (CV): 0.0051652929
Kurtosis: -1.0197883
Mean: 2007.2381
Median Absolute Deviation (MAD): 7.5
Skewness: -0.21005915
Sum: 252912
Variance: 107.49486
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=33)

Value  Count  Frequency (%)
2020  17  13.5%
2019  13  10.3%
2004  8  6.3%
2001  8  6.3%
1998  7  5.6%
2018  7  5.6%
2002  5  4.0%
2000  5  4.0%
2003  5  4.0%
1999  4  3.2%
Other values (23)  47  37.3%

Minimum 10 values

Value  Count  Frequency (%)
1986  4  3.2%
1987  3  2.4%
1990  1  0.8%
1991  1  0.8%
1992  2  1.6%
1993  2  1.6%
1994  1  0.8%
1995  2  1.6%
1996  1  0.8%
1997  1  0.8%

Maximum 10 values

Value  Count  Frequency (%)
2022  2  1.6%
2021  1  0.8%
2020  17  13.5%
2019  13  10.3%
2018  7  5.6%
2017  3  2.4%
2015  1  0.8%
2013  1  0.8%
2012  2  1.6%
2011  2  1.6%

측정망명
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count
도시대기  111
도로변대기  11
교외대기  4

Length

Max length: 5
Median length: 4
Mean length: 4.0873016
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 도시대기
2nd row: 도시대기
3rd row: 도시대기
4th row: 도시대기
5th row: 도시대기

Common Values

Value  Count  Frequency (%)
도시대기  111  88.1%
도로변대기  11  8.7%
교외대기  4  3.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
도시대기  111  88.1%
도로변대기  11  8.7%
교외대기  4  3.2%

측정항목내역
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

Value  Count
SO2, CO, O3, NO2, PM10, PM2.5  124
SO2, CO, NO2, PM10, PM2.5  2

Length

Max length: 29
Median length: 29
Mean length: 28.936508
Min length: 25

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: SO2, CO, O3, NO2, PM10, PM2.5
2nd row: SO2, CO, O3, NO2, PM10, PM2.5
3rd row: SO2, CO, O3, NO2, PM10, PM2.5
4th row: SO2, CO, O3, NO2, PM10, PM2.5
5th row: SO2, CO, O3, NO2, PM10, PM2.5

Common Values

Value  Count  Frequency (%)
SO2, CO, O3, NO2, PM10, PM2.5  124  98.4%
SO2, CO, NO2, PM10, PM2.5  2  1.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
so2  126  16.7%
co  126  16.7%
no2  126  16.7%
pm10  126  16.7%
pm2.5  126  16.7%
o3  124  16.4%
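
The last table counts individual measured items rather than whole category values. A minimal sketch of that tokenization, assuming the values are split on commas and lower-cased (the report does not document its tokenizer), reproduces the counts from the two category frequencies listed under Common Values.

```python
from collections import Counter

# Category values and their row counts, as listed under Common Values.
categories = {
    "SO2, CO, O3, NO2, PM10, PM2.5": 124,
    "SO2, CO, NO2, PM10, PM2.5": 2,
}

item_counts = Counter()
for value, rows in categories.items():
    for item in value.split(","):
        # Each row contributes one occurrence of every item it lists.
        item_counts[item.strip().lower()] += rows

total = sum(item_counts.values())  # 124 * 6 + 2 * 5 = 754 item occurrences
for item, count in item_counts.most_common():
    print(f"{item}  {count}  {count / total:.1%}")
```

Only `o3` falls short of 126, because the two stations in the second category do not measure ozone.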

Interactions

[16 interaction scatter plots; images not preserved in this text version]

Correlations

                시군명  소재지우편번호  WGS84위도  WGS84경도  설치년도  측정망명  측정항목내역
시군명          1.000   0.999           0.986      0.976      0.040     0.000     0.000
소재지우편번호  0.999   1.000           0.914      0.690      0.336     0.000     0.000
WGS84위도       0.986   0.914           1.000      0.630      0.040     0.662     0.000
WGS84경도       0.976   0.690           0.630      1.000      0.000     0.000     0.000
설치년도        0.040   0.336           0.040      0.000      1.000     0.000     0.416
측정망명        0.000   0.000           0.662      0.000      0.000     1.000     0.242
측정항목내역    0.000   0.000           0.000      0.000      0.416     0.242     1.000
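
The High correlation alerts at the top of the report can be traced back to this matrix. The sketch below lists, for each variable, the other variables whose coefficient exceeds a cutoff. The cutoff of 0.9 is an assumption chosen for illustration: the report does not state its threshold, but on this matrix any cutoff between 0.690 and 0.914 reproduces the four alerts listed. `highly_correlated` is a hypothetical helper name.

```python
names = ["시군명", "소재지우편번호", "WGS84위도", "WGS84경도",
         "설치년도", "측정망명", "측정항목내역"]

# Coefficients copied from the matrix above, row by row.
matrix = [
    [1.000, 0.999, 0.986, 0.976, 0.040, 0.000, 0.000],
    [0.999, 1.000, 0.914, 0.690, 0.336, 0.000, 0.000],
    [0.986, 0.914, 1.000, 0.630, 0.040, 0.662, 0.000],
    [0.976, 0.690, 0.630, 1.000, 0.000, 0.000, 0.000],
    [0.040, 0.336, 0.040, 0.000, 1.000, 0.000, 0.416],
    [0.000, 0.000, 0.662, 0.000, 0.000, 1.000, 0.242],
    [0.000, 0.000, 0.000, 0.000, 0.416, 0.242, 1.000],
]

CUTOFF = 0.9  # assumed for illustration; not taken from the report

def highly_correlated(names, matrix, cutoff):
    """Map each variable to the other variables whose |coefficient| exceeds the cutoff."""
    result = {}
    for i, name in enumerate(names):
        partners = [names[j] for j, r in enumerate(matrix[i])
                    if j != i and abs(r) > cutoff]
        if partners:
            result[name] = partners
    return result

flagged = highly_correlated(names, matrix, CUTOFF)
for name, partners in flagged.items():
    print(name, "->", ", ".join(partners))
```

With this cutoff, 시군명 is paired with three variables, 소재지우편번호 and WGS84위도 with two each, and WGS84경도 with 시군명 only, matching the alert text.
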

              시군명  측정망명  측정항목내역
시군명        1.000   0.000     0.000
측정망명      0.000   1.000     0.392
측정항목내역  0.000   0.392     1.000

                소재지우편번호  WGS84위도  WGS84경도  설치년도  시군명  측정망명  측정항목내역
소재지우편번호  1.000           -0.923     0.055      -0.033    0.846   0.000     0.000
WGS84위도       -0.923          1.000      -0.096     0.001     0.815   0.346     0.000
WGS84경도       0.055           -0.096     1.000      0.217     0.785   0.000     0.000
설치년도        -0.033          0.001      0.217      1.000     0.091   0.000     0.312
시군명          0.846           0.815      0.785      0.091     1.000   0.000     0.000
측정망명        0.000           0.346      0.000      0.000     0.000   1.000     0.392
측정항목내역    0.000           0.000      0.000      0.312     0.000   0.392     1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows

     시군명  측정소명  소재지우편번호  WGS84위도  WGS84경도  설치년도  측정망명  측정항목내역
0    가평군  설악면  12467  37.676193  127.494693  2020  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
1    가평군  가평  12417  37.831122  127.509421  2010  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
2    고양시  식사동  10317  37.6854  126.813611  2002  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
3    고양시  행신동  10493  37.625182  126.841997  1998  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
4    고양시  주엽동  10385  37.66846  126.756506  2018  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
5    고양시  신원동  10567  37.666386  126.886364  2015  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
6    고양시  백마로(마두역)  <NA>  37.654409  126.775491  2004  도로변대기  SO2, CO, O3, NO2, PM10, PM2.5
7    과천시  과천동  13815  37.44869  127.002385  2000  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
8    과천시  별양동  13834  37.424023  126.994981  1991  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
9    광명시  소하동  14320  37.445457  126.887992  1998  도시대기  SO2, CO, O3, NO2, PM10, PM2.5

Last rows

     시군명  측정소명  소재지우편번호  WGS84위도  WGS84경도  설치년도  측정망명  측정항목내역
116  하남시  신장동  12951  37.539044  127.215533  2001  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
117  하남시  미사  12909  37.567164  127.186064  2020  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
118  화성시  봉담읍  18316  37.219412  126.949054  2020  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
119  화성시  남양읍  18262  37.211759  126.823786  2003  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
120  화성시  서신면  18555  37.16576  126.708669  2020  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
121  화성시  새솔동  18242  37.2812  126.818777  2020  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
122  화성시  우정읍  18563  37.089814  126.815385  2018  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
123  화성시  향남읍  18592  37.132649  126.920254  2004  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
124  화성시  청계동  18483  37.196555  127.119377  2018  도시대기  SO2, CO, O3, NO2, PM10, PM2.5
125  화성시  동탄  18443  37.196944  127.072309  2008  도시대기  SO2, CO, O3, NO2, PM10, PM2.5