Overview

Dataset statistics

Number of variables: 5
Number of observations: 1662
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 69.9 KiB
Average record size in memory: 43.1 B
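
The figures above are mutually consistent, which a few lines of arithmetic can confirm. This is a minimal sketch, assuming (the report does not say so) that 1 KiB means 1024 bytes and that the row count is the product of the 277 distinct stations and the 6 measurement categories reported further down in the Variables section. The variable names are illustrative.

```python
# Cross-check the overview figures; all inputs are copied from the report.
n_observations = 1662
total_memory_kib = 69.9
n_stations = 277      # distinct 역코드 values
n_categories = 6      # distinct 구분 values

# Assumption: the report uses binary units (1 KiB = 1024 bytes).
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(n_stations * n_categories)   # 1662
print(round(avg_record_bytes, 1))  # 43.1
```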

Variable types

Numeric: 3
Text: 1
Categorical: 1

Dataset

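Description: Station measurement data from the Seoul Metro (서울교통공사) real-time air quality monitoring system, ranked per station for each air quality measurement item. Reference date: June 9, 2022.
URL: https://www.data.go.kr/data/15118428/fileData.do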

Alerts

호선 is highly overall correlated with 역코드 (High correlation)
역코드 is highly overall correlated with 호선 (High correlation)

Reproduction

Analysis started: 2023-12-12 23:15:41.754497
Analysis finished: 2023-12-12 23:15:43.033436
Duration: 1.28 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선
Real number (ℝ)

HIGH CORRELATION 

Distinct: 8
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.6642599
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 3
Median: 5
Q3: 6
95-th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.025523
Coefficient of variation (CV): 0.43426461
Kurtosis: -1.2108085
Mean: 4.6642599
Median Absolute Deviation (MAD): 2
Skewness: -0.09946163
Sum: 7752
Variance: 4.1027436
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=8)]
Common values
Value  Count  Frequency (%)
5      306    18.4%
7      306    18.4%
2      300    18.1%
6      228    13.7%
3      204    12.3%
4      156    9.4%
8      102    6.1%
1      60     3.6%

Extreme values (minimum)
Value  Count  Frequency (%)
1      60     3.6%
2      300    18.1%
3      204    12.3%
4      156    9.4%
5      306    18.4%
6      228    13.7%
7      306    18.4%
8      102    6.1%

Extreme values (maximum)
Value  Count  Frequency (%)
8      102    6.1%
7      306    18.4%
6      228    13.7%
5      306    18.4%
4      156    9.4%
3      204    12.3%
2      300    18.1%
1      60     3.6%
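
The summary statistics for 호선 can be recomputed from its frequency table alone. The sketch below uses only the Python standard library and assumes that the report's variance uses the sample (n - 1) denominator and that its quantiles use linear interpolation between order statistics; neither is stated in the report, but both reproduce the printed figures.

```python
import statistics

# Value -> count, copied from the frequency table for 호선.
counts = {1: 60, 2: 300, 3: 204, 4: 156, 5: 306, 6: 228, 7: 306, 8: 102}

# Expand the table back into one sorted value per observation.
values = [v for v, c in sorted(counts.items()) for _ in range(c)]

mean = statistics.fmean(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean

# "inclusive" interpolates linearly between order statistics.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
mad = statistics.median(abs(v - median) for v in values)

print(len(values), sum(values))
print(round(mean, 7), round(variance, 7), round(std, 6), round(cv, 6))
print(q1, median, q3, q3 - q1, mad)
```

Skewness and kurtosis are left out because the standard library has no direct function for them.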

역코드
Real number (ℝ)

HIGH CORRELATION 

Distinct: 277
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1630.1877
Minimum: 150
Maximum: 2827
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.7 KiB

Quantile statistics

Minimum: 150
5-th percentile: 204.05
Q1: 318
Median: 2529
Q3: 2647
95-th percentile: 2813.95
Maximum: 2827
Range: 2677
Interquartile range (IQR): 2329

Descriptive statistics

Standard deviation: 1175.4841
Coefficient of variation (CV): 0.72107284
Kurtosis: -1.9062458
Mean: 1630.1877
Median Absolute Deviation (MAD): 231
Skewness: -0.26445666
Sum: 2709372
Variance: 1381762.9
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value               Count  Frequency (%)
150                 6      0.4%
2623                6      0.4%
2629                6      0.4%
2628                6      0.4%
2627                6      0.4%
2626                6      0.4%
2625                6      0.4%
2624                6      0.4%
2622                6      0.4%
2614                6      0.4%
Other values (267)  1602   96.4%

Extreme values (minimum)
Value  Count  Frequency (%)
150    6      0.4%
151    6      0.4%
152    6      0.4%
153    6      0.4%
154    6      0.4%
155    6      0.4%
156    6      0.4%
157    6      0.4%
158    6      0.4%
159    6      0.4%

Extreme values (maximum)
Value  Count  Frequency (%)
2827   6      0.4%
2826   6      0.4%
2825   6      0.4%
2824   6      0.4%
2823   6      0.4%
2822   6      0.4%
2821   6      0.4%
2820   6      0.4%
2819   6      0.4%
2818   6      0.4%

역명
Text

Distinct: 277
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Memory size: 13.1 KiB

Length

Max length: 12
Median length: 5
Mean length: 5.9711191
Min length: 5

Characters and Unicode

Total characters: 9924
Distinct characters: 217
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울(1)
2nd row: 시청(1)
3rd row: 종각(1)
4th row: 종로3가(1)
5th row: 종로5가(1)
Value               Count  Frequency (%)
서울(1              6      0.4%
연신내(6            6      0.4%
삼각지(6            6      0.4%
효창공원앞(6        6      0.4%
공덕(6              6      0.4%
대흥(6              6      0.4%
광흥창(6            6      0.4%
상수(6              6      0.4%
망원(6              6      0.4%
애오개(5            6      0.4%
Other values (267)  1602   96.4%
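
Each 역명 value in the sample carries the line number in trailing parentheses, for example 종로3가(1). A minimal sketch of splitting the two parts follows. The helper name `split_station_name` is hypothetical, and the pattern assumes that every value ends with a single-digit line number in parentheses, which holds for the rows shown in this report but is not guaranteed for the full file. The pattern anchors on the final parenthesised group because the character counts below show more parentheses (1674) than rows (1662), so some names contain an extra pair.

```python
import re

# Assumption: every 역명 value ends with a single-digit line number in parentheses.
_NAME_RE = re.compile(r"^(?P<name>.+)\((?P<line>\d)\)$")


def split_station_name(raw: str) -> tuple[str, int]:
    """Split '종로3가(1)' into ('종로3가', 1); raise ValueError on other formats."""
    match = _NAME_RE.match(raw)
    if match is None:
        raise ValueError(f"unexpected station name format: {raw!r}")
    return match.group("name"), int(match.group("line"))


# Values taken from the sample rows of this report.
for raw in ["서울(1)", "종로3가(1)", "모란(8)"]:
    print(split_station_name(raw))
```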

Most occurring characters

Value               Count  Frequency (%)
(                   1674   16.9%
)                   1674   16.9%
5                   312    3.1%
7                   306    3.1%
2                   300    3.0%
3                   234    2.4%
6                   228    2.3%
[?]                 198    2.0%
[?]                 180    1.8%
4                   168    1.7%
Other values (207)  4650   46.9%

Most occurring categories

Value              Count  Frequency (%)
Other Letter       4866   49.0%
Decimal Number     1710   17.2%
Open Punctuation   1674   16.9%
Close Punctuation  1674   16.9%

Most frequent character per category

Other Letter
Value               Count  Frequency (%)
[?]                 198    4.1%
[?]                 180    3.7%
[?]                 150    3.1%
[?]                 150    3.1%
[?]                 108    2.2%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
Other values (197)  3630   74.6%

Decimal Number
Value  Count  Frequency (%)
5      312    18.2%
7      306    17.9%
2      300    17.5%
3      234    13.7%
6      228    13.3%
4      168    9.8%
8      102    6.0%
1      60     3.5%

Open Punctuation
Value  Count  Frequency (%)
(      1674   100.0%

Close Punctuation
Value  Count  Frequency (%)
)      1674   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  5058   51.0%
Hangul  4866   49.0%

Most frequent character per script

Hangul
Value               Count  Frequency (%)
[?]                 198    4.1%
[?]                 180    3.7%
[?]                 150    3.1%
[?]                 150    3.1%
[?]                 108    2.2%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
Other values (197)  3630   74.6%

Common
Value  Count  Frequency (%)
(      1674   33.1%
)      1674   33.1%
5      312    6.2%
7      306    6.0%
2      300    5.9%
3      234    4.6%
6      228    4.5%
4      168    3.3%
8      102    2.0%
1      60     1.2%

Most occurring blocks

Value   Count  Frequency (%)
ASCII   5058   51.0%
Hangul  4866   49.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(      1674   33.1%
)      1674   33.1%
5      312    6.2%
7      306    6.0%
2      300    5.9%
3      234    4.6%
6      228    4.5%
4      168    3.3%
8      102    2.0%
1      60     1.2%

Hangul
Value               Count  Frequency (%)
[?]                 198    4.1%
[?]                 180    3.7%
[?]                 150    3.1%
[?]                 150    3.1%
[?]                 108    2.2%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
[?]                 90     1.8%
Other values (197)  3630   74.6%

구분
Categorical

Distinct: 6
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 13.1 KiB
PM2.5: 277
PM10: 277
CO2: 277
TEMPERATURE: 277
HUMIDITY: 277

Length

Max length: 11
Median length: 6.5
Mean length: 7
Min length: 3
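
These length statistics follow directly from the six category labels. The short list above shows only five of them; the sixth, AIRPRESSURE, appears under Common Values. Because every category occurs the same number of times (277), statistics over the six labels equal statistics over all 1662 rows. A minimal standard-library sketch:

```python
import statistics

# The six 구분 categories listed under Common Values.
categories = ["PM2.5", "PM10", "CO2", "TEMPERATURE", "HUMIDITY", "AIRPRESSURE"]
lengths = [len(c) for c in categories]

# Each category occurs 277 times, so per-row statistics equal per-category ones.
print(max(lengths), statistics.median(lengths), statistics.fmean(lengths), min(lengths))
# prints: 11 6.5 7.0 3
```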

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PM2.5
2nd row: PM2.5
3rd row: PM2.5
4th row: PM2.5
5th row: PM2.5

Common Values

Value        Count  Frequency (%)
PM2.5        277    16.7%
PM10         277    16.7%
CO2          277    16.7%
TEMPERATURE  277    16.7%
HUMIDITY     277    16.7%
AIRPRESSURE  277    16.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Value        Count  Frequency (%)
pm2.5        277    16.7%
pm10         277    16.7%
co2          277    16.7%
temperature  277    16.7%
humidity     277    16.7%
airpressure  277    16.7%


(unnamed column)
Real number (ℝ)

Distinct: 277
Distinct (%): 16.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139
Minimum: 1
Maximum: 277
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 14.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 14.05
Q1: 70
Median: 139
Q3: 208
95-th percentile: 263.95
Maximum: 277
Range: 276
Interquartile range (IQR): 138

Descriptive statistics

Standard deviation: 79.986558
Coefficient of variation (CV): 0.57544286
Kurtosis: -1.2000305
Mean: 139
Median Absolute Deviation (MAD): 69
Skewness: 0
Sum: 231018
Variance: 6397.8495
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value               Count  Frequency (%)
112                 6      0.4%
214                 6      0.4%
72                  6      0.4%
249                 6      0.4%
221                 6      0.4%
184                 6      0.4%
227                 6      0.4%
71                  6      0.4%
68                  6      0.4%
100                 6      0.4%
Other values (267)  1602   96.4%

Extreme values (minimum)
Value  Count  Frequency (%)
1      6      0.4%
2      6      0.4%
3      6      0.4%
4      6      0.4%
5      6      0.4%
6      6      0.4%
7      6      0.4%
8      6      0.4%
9      6      0.4%
10     6      0.4%

Extreme values (maximum)
Value  Count  Frequency (%)
277    6      0.4%
276    6      0.4%
275    6      0.4%
274    6      0.4%
273    6      0.4%
272    6      0.4%
271    6      0.4%
270    6      0.4%
269    6      0.4%
268    6      0.4%
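
The statistics for this column are what one would expect if it holds every integer from 1 to 277 exactly six times, once per 구분 category. That reading fits the frequency tables and the dataset description (a per-item ranking of stations), but it is an inference rather than something the report states. Under that assumption, the sum, mean, and variance can be checked with the standard library:

```python
import statistics

# Assumption: every integer 1..277 appears 6 times (once per 구분 category).
k = 277
values = [rank for rank in range(1, k + 1) for _ in range(6)]

n = len(values)
total = sum(values)
mean = statistics.fmean(values)

# Population variance of 1..k is (k**2 - 1) / 12; rescale to the sample (n - 1) form.
closed_form_variance = (k**2 - 1) / 12 * n / (n - 1)
sample_variance = statistics.variance(values)

print(n, total, mean)
print(round(sample_variance, 4), round(closed_form_variance, 4))
```

A perfectly uniform set of ranks is also symmetric about its mean, which agrees with the reported skewness of 0.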

Interactions

[Figures: nine pairwise interaction plots]

Correlations

[Figure: correlation heatmap]
           호선    역코드  구분   (unnamed)
호선       1.000  0.997  0.000  0.152
역코드     0.997  1.000  0.000  0.174
구분       0.000  0.000  1.000  0.000
(unnamed)  0.152  0.174  0.000  1.000

[Figure: correlation heatmap]
           호선     역코드   (unnamed)  구분
호선       1.000   0.988   -0.022     0.000
역코드     0.988   1.000   -0.027     0.000
(unnamed)  -0.022  -0.027  1.000      0.000
구분       0.000   0.000   0.000      1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

First rows
Index  호선  역코드  역명        구분    (unnamed)
0      1     150     서울(1)     PM2.5   112
1      1     151     시청(1)     PM2.5   183
2      1     152     종각(1)     PM2.5   265
3      1     153     종로3가(1)  PM2.5   244
4      1     154     종로5가(1)  PM2.5   246
5      1     155     동대문(1)   PM2.5   276
6      1     156     신설동(1)   PM2.5   218
7      1     157     제기동(1)   PM2.5   222
8      1     158     청량리(1)   PM2.5   203
9      1     159     동묘앞(1)   PM2.5   48

Last rows
Index  호선  역코드  역명             구분         (unnamed)
1652   8     2818    가락시장(8)      AIRPRESSURE  83
1653   8     2819    문정(8)          AIRPRESSURE  84
1654   8     2820    장지(8)          AIRPRESSURE  182
1655   8     2821    복정(8)          AIRPRESSURE  169
1656   8     2822    산성(8)          AIRPRESSURE  8
1657   8     2823    남한산성입구(8)  AIRPRESSURE  9
1658   8     2824    단대오거리(8)    AIRPRESSURE  17
1659   8     2825    신흥(8)          AIRPRESSURE  6
1660   8     2826    수진(8)          AIRPRESSURE  32
1661   8     2827    모란(8)          AIRPRESSURE  135