Overview

Dataset statistics

Number of variables: 8
Number of observations: 303
Missing cells: 206
Missing cells (%): 8.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.4 KiB
Average record size in memory: 65.4 B
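
The missing-cell percentage can be checked from the counts above. The snippet below is a quick sketch that assumes the percentage is the number of missing cells divided by variables times observations; that assumption reproduces the reported figure.

```python
# Figures taken from the dataset statistics above.
n_variables = 8
n_observations = 303
missing_cells = 206

total_cells = n_variables * n_observations       # every cell in the table
missing_pct = 100 * missing_cells / total_cells  # share of cells that are missing

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```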

Variable types

Categorical: 5
Text: 3

Dataset

Description: File download
Author: Seoul Metro (서울교통공사)
URL: https://data.seoul.go.kr/dataList/OA-2732/F/1/datasetView.do

Alerts

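측정 지점 (measurement point) is highly overall correlated with 측 정 항 목.4 (measurement item .4): High correlation
측 정 항 목.4 (measurement item .4) is highly overall correlated with 측정 지점 (measurement point): High correlation
시 설 명 (역사명) (facility name / station name) has 206 (68.0%) missing values: Missing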
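
The 68.0% missing rate is consistent with the station name being written only on the first row of each station: in the Sample section of this report the name sits on the 평 균 (average) row, while the 승강장 and 대합실 rows show <NA>. Under that assumption, a forward fill would restore the names. The helper name `forward_fill` and the inline list are illustrative and not part of the report.

```python
from typing import Optional

def forward_fill(values: list[Optional[str]]) -> list[Optional[str]]:
    """Replace each None with the most recent non-None value before it."""
    filled: list[Optional[str]] = []
    last: Optional[str] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Station-name column for the first data rows in the Sample section
# (None stands in for <NA>).
stations = ["서울역", None, None, "시 청", None, None, "종 각", None]
filled_stations = forward_fill(stations)
print(filled_stations)
```

Leading missing values stay missing, because there is no earlier name to copy from.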

Reproduction

Analysis started: 2024-04-29 22:00:15.739409
Analysis finished: 2024-04-29 22:00:16.330962
Duration: 0.59 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

호선 (subway line)
Categorical

Distinct: 5
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
2 | 113
3 | 90
4 | 68
1 | 30
<NA> | 2

Length

Max length: 4
Median length: 1
Mean length: 1.019802
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value | Count | Frequency (%)
2 | 113 | 37.3%
3 | 90 | 29.7%
4 | 68 | 22.4%
1 | 30 | 9.9%
<NA> | 2 | 0.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Value | Count | Frequency (%)
2 | 113 | 37.3%
3 | 90 | 29.7%
4 | 68 | 22.4%
1 | 30 | 9.9%
na | 2 | 0.7%

시 설 명 (역사명) (facility name / station name)
Text

Distinct: 88
Distinct (%): 90.7%
Missing: 206
Missing (%): 68.0%
Memory size: 2.5 KiB

Length

Max length: 7
Median length: 5
Mean length: 4.3608247
Min length: 3

Characters and Unicode

Total characters: 423
Distinct characters: 120
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
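
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories over the five sample station names from this variable; it assumes the spaces inside the names are plain ASCII spaces, and the helper name `category_counts` is made up for this example.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)

# The five sample station names listed for this variable.
names = ["서울역", "시 청", "종 각", "종로3가", "종로5가"]
counts = category_counts(names)

# Lo = Other Letter, Zs = Space Separator, Nd = Decimal Number
for code, count in counts.most_common():
    print(code, count)
```

The two-letter codes map onto the category names used in the tables of this report: `Lo` is Other Letter, `Zs` is Space Separator, `Nd` is Decimal Number and `Cc` is Control.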

Unique

Unique: 79
Unique (%): 81.4%
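
The distinct and unique percentages for this variable appear to be taken over non-missing values rather than over all 303 rows. The check below is a sketch under that assumption, with "distinct" read as the number of different values and "unique" as the number of values that occur exactly once.

```python
# Counts reported for this variable and for the dataset.
n_observations = 303
n_missing = 206
n_distinct = 88
n_unique = 79

n_present = n_observations - n_missing    # rows that carry a station name
distinct_pct = 100 * n_distinct / n_present
unique_pct = 100 * n_unique / n_present

print(f"{n_present} present, {distinct_pct:.1f}% distinct, {unique_pct:.1f}% unique")
```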

Sample

1st row: 서울역
2nd row: 시 청
3rd row: 종 각
4th row: 종로3가
5th row: 종로5가
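
Value | Count | Frequency (%)
(not preserved) | 5 | 3.4%
(not preserved) | 5 | 3.4%
동대문 | 4 | 2.7%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 3 | 2.0%
(not preserved) | 2 | 1.4%
(not preserved) | 2 | 1.4%
(not preserved) | 2 | 1.4%
Other values (99) | 115 | 78.2%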

Most occurring characters

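Value | Count | Frequency (%)
(space) | 135 | 31.9%
(not preserved) | 17 | 4.0%
(not preserved) | 13 | 3.1%
(not preserved) | 13 | 3.1%
(not preserved) | 11 | 2.6%
(not preserved) | 10 | 2.4%
(not preserved) | 7 | 1.7%
(not preserved) | 7 | 1.7%
(not preserved) | 6 | 1.4%
(not preserved) | 6 | 1.4%
Other values (110) | 198 | 46.8%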

Most occurring categories

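Value | Count | Frequency (%)
Other Letter | 279 | 66.0%
Space Separator | 135 | 31.9%
Decimal Number | 6 | 1.4%
Control | 3 | 0.7%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
(not preserved) | 17 | 6.1%
(not preserved) | 13 | 4.7%
(not preserved) | 13 | 4.7%
(not preserved) | 11 | 3.9%
(not preserved) | 10 | 3.6%
(not preserved) | 7 | 2.5%
(not preserved) | 7 | 2.5%
(not preserved) | 6 | 2.2%
(not preserved) | 6 | 2.2%
(not preserved) | 5 | 1.8%
Other values (105) | 184 | 65.9%

Decimal Number
Value | Count | Frequency (%)
3 | 4 | 66.7%
5 | 1 | 16.7%
4 | 1 | 16.7%

Space Separator
Value | Count | Frequency (%)
(space) | 135 | 100.0%

Control
Value | Count | Frequency (%)
(control character) | 3 | 100.0%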

Most occurring scripts

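Value | Count | Frequency (%)
Hangul | 279 | 66.0%
Common | 144 | 34.0%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
(not preserved) | 17 | 6.1%
(not preserved) | 13 | 4.7%
(not preserved) | 13 | 4.7%
(not preserved) | 11 | 3.9%
(not preserved) | 10 | 3.6%
(not preserved) | 7 | 2.5%
(not preserved) | 7 | 2.5%
(not preserved) | 6 | 2.2%
(not preserved) | 6 | 2.2%
(not preserved) | 5 | 1.8%
Other values (105) | 184 | 65.9%

Common
Value | Count | Frequency (%)
(space) | 135 | 93.8%
3 | 4 | 2.8%
(control character) | 3 | 2.1%
5 | 1 | 0.7%
4 | 1 | 0.7%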

Most occurring blocks

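Value | Count | Frequency (%)
Hangul | 279 | 66.0%
ASCII | 144 | 34.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 135 | 93.8%
3 | 4 | 2.8%
(control character) | 3 | 2.1%
5 | 1 | 0.7%
4 | 1 | 0.7%

Hangul
Value | Count | Frequency (%)
(not preserved) | 17 | 6.1%
(not preserved) | 13 | 4.7%
(not preserved) | 13 | 4.7%
(not preserved) | 11 | 3.9%
(not preserved) | 10 | 3.6%
(not preserved) | 7 | 2.5%
(not preserved) | 7 | 2.5%
(not preserved) | 6 | 2.2%
(not preserved) | 6 | 2.2%
(not preserved) | 5 | 1.8%
Other values (105) | 184 | 65.9%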

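측정 지점 (measurement point)
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
평 균 (average) | 97
승강장 (platform) | 97
대합실 (concourse) | 97
환승통로 (transfer passage) | 10
<NA> | 1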

Length

Max length: 6
Median length: 3
Mean length: 3.3663366
Min length: 3

Unique

Unique: 2
Unique (%): 0.7%

Sample

1st row: <NA>
2nd row: 공기질 기준 (air quality standard)
3rd row: 평 균 (average)
4th row: 승강장 (platform)
5th row: 대합실 (concourse)

Common Values

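Value | Count | Frequency (%)
평 균 (average) | 97 | 32.0%
승강장 (platform) | 97 | 32.0%
대합실 (concourse) | 97 | 32.0%
환승통로 (transfer passage) | 10 | 3.3%
<NA> | 1 | 0.3%
공기질 기준 (air quality standard) | 1 | 0.3%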

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]

Value | Count | Frequency (%)
평 | 97 | 24.2%
균 | 97 | 24.2%
승강장 | 97 | 24.2%
대합실 | 97 | 24.2%
환승통로 | 10 | 2.5%
na | 1 | 0.2%
공기질 | 1 | 0.2%
기준 | 1 | 0.2%

측 정 항 목 (measurement item)
Text

Distinct: 232
Distinct (%): 76.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 8
Median length: 4
Mean length: 4.1584158
Min length: 2

Characters and Unicode

Total characters: 1260
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 177
Unique (%): 58.4%

Sample

1st row: PM10
2nd row: 150㎍/㎥이하 (150 ㎍/㎥ or less)
3rd row: 91.3
4th row: 105.5
5th row: 77

Value | Count | Frequency (%)
88.8 | 4 | 1.3%
86.9 | 4 | 1.3%
93.7 | 4 | 1.3%
81 | 3 | 1.0%
94.1 | 3 | 1.0%
94 | 3 | 1.0%
101.3 | 3 | 1.0%
95 | 3 | 1.0%
105.1 | 3 | 1.0%
88.4 | 3 | 1.0%
Other values (222) | 270 | 89.1%

Most occurring characters

Value | Count | Frequency (%)
. | 273 | 21.7%
1 | 194 | 15.4%
8 | 147 | 11.7%
9 | 134 | 10.6%
7 | 86 | 6.8%
0 | 80 | 6.3%
3 | 70 | 5.6%
2 | 69 | 5.5%
5 | 68 | 5.4%
4 | 67 | 5.3%
Other values (8) | 72 | 5.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 980 | 77.8%
Other Punctuation | 274 | 21.7%
Other Symbol | 2 | 0.2%
Other Letter | 2 | 0.2%
Uppercase Letter | 2 | 0.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 194 | 19.8%
8 | 147 | 15.0%
9 | 134 | 13.7%
7 | 86 | 8.8%
0 | 80 | 8.2%
3 | 70 | 7.1%
2 | 69 | 7.0%
5 | 68 | 6.9%
4 | 67 | 6.8%
6 | 65 | 6.6%

Other Punctuation
Value | Count | Frequency (%)
. | 273 | 99.6%
/ | 1 | 0.4%

Other Symbol
Value | Count | Frequency (%)
㎍ | 1 | 50.0%
㎥ | 1 | 50.0%

Other Letter
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

Uppercase Letter
Value | Count | Frequency (%)
P | 1 | 50.0%
M | 1 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1256 | 99.7%
Hangul | 2 | 0.2%
Latin | 2 | 0.2%

Most frequent character per script

Common
Value | Count | Frequency (%)
. | 273 | 21.7%
1 | 194 | 15.4%
8 | 147 | 11.7%
9 | 134 | 10.7%
7 | 86 | 6.8%
0 | 80 | 6.4%
3 | 70 | 5.6%
2 | 69 | 5.5%
5 | 68 | 5.4%
4 | 67 | 5.3%
Other values (4) | 68 | 5.4%

Hangul
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

Latin
Value | Count | Frequency (%)
P | 1 | 50.0%
M | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1256 | 99.7%
CJK Compat | 2 | 0.2%
Hangul | 2 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 273 | 21.7%
1 | 194 | 15.4%
8 | 147 | 11.7%
9 | 134 | 10.7%
7 | 86 | 6.8%
0 | 80 | 6.4%
3 | 70 | 5.6%
2 | 69 | 5.5%
5 | 68 | 5.4%
4 | 67 | 5.3%
Other values (4) | 68 | 5.4%

CJK Compat
Value | Count | Frequency (%)
㎍ | 1 | 50.0%
㎥ | 1 | 50.0%

Hangul
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

측 정 항 목.1 (measurement item .1)
Text

Distinct: 217
Distinct (%): 71.6%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Length

Max length: 10
Median length: 3
Mean length: 3.3861386
Min length: 3

Characters and Unicode

Total characters: 1026
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 156
Unique (%): 51.5%

Sample

1st row: CO2
2nd row: 1,000ppm이하 (1,000 ppm or less)
3rd row: 522.5
4th row: 579
5th row: 466

Value | Count | Frequency (%)
507 | 5 | 1.7%
468 | 4 | 1.3%
488 | 4 | 1.3%
482 | 4 | 1.3%
587 | 4 | 1.3%
484 | 3 | 1.0%
534 | 3 | 1.0%
603 | 3 | 1.0%
570 | 3 | 1.0%
464 | 3 | 1.0%
Other values (207) | 267 | 88.1%

Most occurring characters

Value | Count | Frequency (%)
5 | 218 | 21.2%
4 | 186 | 18.1%
6 | 106 | 10.3%
7 | 71 | 6.9%
0 | 71 | 6.9%
8 | 69 | 6.7%
3 | 65 | 6.3%
9 | 61 | 5.9%
1 | 59 | 5.8%
2 | 57 | 5.6%
Other values (8) | 63 | 6.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 963 | 93.9%
Other Punctuation | 56 | 5.5%
Lowercase Letter | 3 | 0.3%
Other Letter | 2 | 0.2%
Uppercase Letter | 2 | 0.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
5 | 218 | 22.6%
4 | 186 | 19.3%
6 | 106 | 11.0%
7 | 71 | 7.4%
0 | 71 | 7.4%
8 | 69 | 7.2%
3 | 65 | 6.7%
9 | 61 | 6.3%
1 | 59 | 6.1%
2 | 57 | 5.9%

Other Punctuation
Value | Count | Frequency (%)
. | 55 | 98.2%
, | 1 | 1.8%

Lowercase Letter
Value | Count | Frequency (%)
p | 2 | 66.7%
m | 1 | 33.3%

Other Letter
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

Uppercase Letter
Value | Count | Frequency (%)
C | 1 | 50.0%
O | 1 | 50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1019 | 99.3%
Latin | 5 | 0.5%
Hangul | 2 | 0.2%

Most frequent character per script

Common
Value | Count | Frequency (%)
5 | 218 | 21.4%
4 | 186 | 18.3%
6 | 106 | 10.4%
7 | 71 | 7.0%
0 | 71 | 7.0%
8 | 69 | 6.8%
3 | 65 | 6.4%
9 | 61 | 6.0%
1 | 59 | 5.8%
2 | 57 | 5.6%
Other values (2) | 56 | 5.5%

Latin
Value | Count | Frequency (%)
p | 2 | 40.0%
m | 1 | 20.0%
C | 1 | 20.0%
O | 1 | 20.0%

Hangul
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1024 | 99.8%
Hangul | 2 | 0.2%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
5 | 218 | 21.3%
4 | 186 | 18.2%
6 | 106 | 10.4%
7 | 71 | 6.9%
0 | 71 | 6.9%
8 | 69 | 6.7%
3 | 65 | 6.3%
9 | 61 | 6.0%
1 | 59 | 5.8%
2 | 57 | 5.6%
Other values (6) | 61 | 6.0%

Hangul
Value | Count | Frequency (%)
이 | 1 | 50.0%
하 | 1 | 50.0%

측 정 항 목.2 (measurement item .2)
Categorical

Distinct: 44
Distinct (%): 14.5%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
13 | 51
13.1 | 45
13.2 | 35
12.9 | 22
13.3 | 18
Other values (39) | 132

Length

Max length: 8
Median length: 4
Mean length: 3.6237624
Min length: 2

Unique

Unique: 20
Unique (%): 6.6%

Sample

1st row: HCHO
2nd row: 100㎍/㎥이하 (100 ㎍/㎥ or less)
3rd row: 19.2
4th row: 12.7
5th row: 25.6

Common Values

Value | Count | Frequency (%)
13 | 51 | 16.8%
13.1 | 45 | 14.9%
13.2 | 35 | 11.6%
12.9 | 22 | 7.3%
13.3 | 18 | 5.9%
13.5 | 18 | 5.9%
13.4 | 17 | 5.6%
12.8 | 13 | 4.3%
25.9 | 9 | 3.0%
13.7 | 8 | 2.6%
Other values (34) | 67 | 22.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
13 | 51 | 16.8%
13.1 | 45 | 14.9%
13.2 | 35 | 11.6%
12.9 | 22 | 7.3%
13.3 | 18 | 5.9%
13.5 | 18 | 5.9%
13.4 | 17 | 5.6%
12.8 | 13 | 4.3%
25.9 | 9 | 3.0%
26 | 8 | 2.6%
Other values (34) | 67 | 22.1%

측 정 항 목.3 (measurement item .3)
Categorical

Distinct: 19
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0.9 | 47
1 | 41
0.8 | 37
1.1 | 32
0.2 | 27
Other values (14) | 119

Length

Max length: 7
Median length: 3
Mean length: 2.7392739
Min length: 1

Unique

Unique: 4
Unique (%): 1.3%

Sample

1st row: CO
2nd row: 10ppm이하 (10 ppm or less)
3rd row: 0.8
4th row: 0.7
5th row: 0.8

Common Values

Value | Count | Frequency (%)
0.9 | 47 | 15.5%
1 | 41 | 13.5%
0.8 | 37 | 12.2%
1.1 | 32 | 10.6%
0.2 | 27 | 8.9%
0.7 | 23 | 7.6%
1.2 | 22 | 7.3%
1.3 | 14 | 4.6%
0.3 | 13 | 4.3%
0.6 | 12 | 4.0%
Other values (9) | 35 | 11.6%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
0.9 | 47 | 15.5%
1 | 41 | 13.5%
0.8 | 37 | 12.2%
1.1 | 32 | 10.6%
0.2 | 27 | 8.9%
0.7 | 23 | 7.6%
1.2 | 22 | 7.3%
1.3 | 14 | 4.6%
0.3 | 13 | 4.3%
0.6 | 12 | 4.0%
Other values (9) | 35 | 11.6%

측 정 항 목.4 (measurement item .4)
Categorical

HIGH CORRELATION

Distinct: 21
Distinct (%): 6.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB

Value | Count
0.0008 | 109
0.0004 | 43
0.0013 | 29
0.0012 | 21
0.0006 | 17
Other values (16) | 84

Length

Max length: 10
Median length: 6
Mean length: 5.970297
Min length: 2

Unique

Unique: 4
Unique (%): 1.3%

Sample

1st row: 석면 (asbestos)
2nd row: 0.01개/cc이하 (0.01 items/cc or less)
3rd row: 0.0007
4th row: 0.0009
5th row: 0.0004

Common Values

Value | Count | Frequency (%)
0.0008 | 109 | 36.0%
0.0004 | 43 | 14.2%
0.0013 | 29 | 9.6%
0.0012 | 21 | 6.9%
0.0006 | 17 | 5.6%
0.0009 | 17 | 5.6%
0.0017 | 15 | 5.0%
0.0011 | 12 | 4.0%
0.001 | 9 | 3.0%
0.0007 | 6 | 2.0%
Other values (11) | 25 | 8.3%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
0.0008 | 109 | 36.0%
0.0004 | 43 | 14.2%
0.0013 | 29 | 9.6%
0.0012 | 21 | 6.9%
0.0006 | 17 | 5.6%
0.0009 | 17 | 5.6%
0.0017 | 15 | 5.0%
0.0011 | 12 | 4.0%
0.001 | 9 | 3.0%
0.0007 | 6 | 2.0%
Other values (11) | 25 | 8.3%

Correlations

 | 호선 | 시 설 명 (역사명) | 측정 지점 | 측 정 항 목.2 | 측 정 항 목.3 | 측 정 항 목.4
호선 | 1.000 | 0.000 | 0.000 | 0.752 | 0.527 | 0.414
시 설 명 (역사명) | 0.000 | 1.000 | NaN | 0.000 | 0.979 | 0.727
측정 지점 | 0.000 | NaN | 1.000 | 0.786 | 0.744 | 0.918
측 정 항 목.2 | 0.752 | 0.000 | 0.786 | 1.000 | 0.805 | 0.857
측 정 항 목.3 | 0.527 | 0.979 | 0.744 | 0.805 | 1.000 | 0.786
측 정 항 목.4 | 0.414 | 0.727 | 0.918 | 0.857 | 0.786 | 1.000

 | 측정 지점 | 호선 | 측 정 항 목.2 | 측 정 항 목.3 | 측 정 항 목.4
측정 지점 | 1.000 | 0.000 | 0.472 | 0.487 | 0.639
호선 | 0.000 | 1.000 | 0.454 | 0.313 | 0.232
측 정 항 목.2 | 0.472 | 0.454 | 1.000 | 0.319 | 0.367
측 정 항 목.3 | 0.487 | 0.313 | 0.319 | 1.000 | 0.352
측 정 항 목.4 | 0.639 | 0.232 | 0.367 | 0.352 | 1.000

 | 호선 | 측정 지점 | 측 정 항 목.2 | 측 정 항 목.3 | 측 정 항 목.4
호선 | 1.000 | 0.000 | 0.454 | 0.313 | 0.232
측정 지점 | 0.000 | 1.000 | 0.472 | 0.487 | 0.639
측 정 항 목.2 | 0.454 | 0.472 | 1.000 | 0.319 | 0.367
측 정 항 목.3 | 0.313 | 0.487 | 0.319 | 1.000 | 0.352
측 정 항 목.4 | 0.232 | 0.639 | 0.367 | 0.352 | 1.000
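
The labels that say which association measure each matrix uses did not survive in this text. For pairs of categorical variables, one widely used measure is Cramér's V, which rescales the chi-square statistic of the contingency table to the range 0 to 1. The sketch below uses the uncorrected formula on toy data; it is an assumption, not something the report confirms, that any matrix above was computed this way.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    pair_counts = Counter(zip(xs, ys))
    # Pearson chi-square over the full contingency table, empty cells included.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = pair_counts.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else float("nan")

# Toy data, not taken from the dataset.
perfect = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(perfect, independent)
```

A value near 1 means one variable almost determines the other; a value near 0 means the two look independent. With many distinct values and few rows, as in the measurement columns here, the uncorrected statistic tends to be inflated, so high values should be read with care.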

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion.]

Sample

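Index | 호선 | 시 설 명 (역사명) | 측정 지점 | 측 정 항 목 | 측 정 항 목.1 | 측 정 항 목.2 | 측 정 항 목.3 | 측 정 항 목.4
0 | <NA> | <NA> | <NA> | PM10 | CO2 | HCHO | CO | 석면
1 | <NA> | <NA> | 공기질 기준 | 150㎍/㎥이하 | 1,000ppm이하 | 100㎍/㎥이하 | 10ppm이하 | 0.01개/cc이하
2 | 1 | 서울역 | 평 균 | 91.3 | 522.5 | 19.2 | 0.8 | 0.0007
3 | 1 | <NA> | 승강장 | 105.5 | 579 | 12.7 | 0.7 | 0.0009
4 | 1 | <NA> | 대합실 | 77 | 466 | 25.6 | 0.8 | 0.0004
5 | 1 | 시 청 | 평 균 | 92.9 | 578 | 19.5 | 0.9 | 0.0011
6 | 1 | <NA> | 승강장 | 108.8 | 591 | 13 | 0.6 | 0.0013
7 | 1 | <NA> | 대합실 | 76.9 | 565 | 26 | 1.2 | 0.0008
8 | 1 | 종 각 | 평 균 | 109.8 | 500 | 12.8 | 0.7 | 0.0015
9 | 1 | <NA> | 승강장 | 123.2 | 516 | 12.8 | 0.6 | 0.0021

Index | 호선 | 시 설 명 (역사명) | 측정 지점 | 측 정 항 목 | 측 정 항 목.1 | 측 정 항 목.2 | 측 정 항 목.3 | 측 정 항 목.4
293 | 4 | 총신대 입구 | 평 균 | 83.9 | 541.5 | 13.2 | 1.1 | 0.0006
294 | 4 | <NA> | 승강장 | 98.4 | 536 | 13.2 | 1.1 | 0.0008
295 | 4 | <NA> | 대합실 | 69.3 | 547 | 13.2 | 1 | 0.0004
296 | 4 | 사 당 | 평 균 | 88.4 | 502.7 | 17.2 | 0.4 | 0.0009
297 | 4 | <NA> | 승강장 | 116.4 | 516 | 12.9 | 0.3 | 0.0014
298 | 4 | <NA> | 대합실 | 72.8 | 431 | 25.8 | 0.4 | 0.0008
299 | 4 | <NA> | 환승통로 | 76 | 561 | 12.9 | 0.5 | 0.0006
300 | 4 | 남태령 | 평 균 | 107.7 | 397 | 19.6 | 1 | 0.0006
301 | 4 | <NA> | 승강장 | 122.8 | 421 | 26.1 | 1 | 0.0008
302 | 4 | <NA> | 대합실 | 92.6 | 373 | 13 | 0.9 | 0.0004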
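
Rows 0 and 1 hold the pollutant names and the air-quality limits rather than measurements, which is presumably why the measurement columns are profiled as Text or Categorical instead of numeric. A hedged sketch of separating those rows from the readings follows; `to_float` is a hypothetical helper, and the list is copied from the first ten rows of the 측 정 항 목 (PM10) column above.

```python
def to_float(cell):
    """Return the cell as a float, or None when it is not a plain number."""
    try:
        return float(cell.replace(",", ""))
    except ValueError:
        return None

# First ten rows of the PM10 column as shown in the sample.
pm10_raw = ["PM10", "150㎍/㎥이하", "91.3", "105.5", "77",
            "92.9", "108.8", "76.9", "109.8", "123.2"]

pm10 = [to_float(cell) for cell in pm10_raw]
measurements = [value for value in pm10 if value is not None]

print(len(pm10_raw) - len(measurements), "non-numeric rows")
print("max reading:", max(measurements))
```

Once the two header-like rows are set aside, the remaining values parse cleanly as numbers, so re-profiling after this step would likely report the measurement columns as numeric.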