Overview

Dataset statistics

Number of variables: 7
Number of observations: 54
Missing cells: 4
Missing cells (%): 1.1%
Duplicate rows: 1
Duplicate rows (%): 1.9%
Total size in memory: 3.1 KiB
Average record size in memory: 59.4 B
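
The percentages above follow from the raw counts. A minimal Python sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and the duplicate percentage over all rows; the helper `pct` is a name introduced here for illustration:

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 7
n_observations = 54
missing_cells = 4
duplicate_rows = 1


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_cells = n_variables * n_observations          # every cell in the table
missing_pct = pct(missing_cells, total_cells)       # share of empty cells
duplicate_pct = pct(duplicate_rows, n_observations) # share of duplicated rows

print(total_cells, missing_pct, duplicate_pct)
```

Under these assumptions the sketch reproduces the reported 1.1% and 1.9%.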

Variable types

Categorical: 4
Text: 2
Numeric: 1

Dataset

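Description: 자치구 (district), 구분 (category or line), 위치_역명 (location or station name), 주소_역본선구분 (address, or station / main-line indicator), 일평균발생량(톤/일) (daily average volume generated, tons/day), 시설구분 (facility type), 현재이용가능 (currently usable)
Author: 서울특별시 (Seoul Metropolitan Government)
URL: https://data.seoul.go.kr/dataList/OA-15609/S/1/datasetView.do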

Alerts

Dataset has 1 (1.9%) duplicate rows [Duplicates]
현재이용가능 is highly overall correlated with 일평균발생량(톤/일) and 3 other fields [High correlation]
구분 is highly overall correlated with 시설구분 and 1 other fields [High correlation]
시설구분 is highly overall correlated with 구분 and 1 other fields [High correlation]
자치구 is highly overall correlated with 현재이용가능 [High correlation]
일평균발생량(톤/일) is highly overall correlated with 현재이용가능 [High correlation]
시설구분 is highly imbalanced (53.4%) [Imbalance]
주소_역본선구분 has 1 (1.9%) missing values [Missing]
일평균발생량(톤/일) has 3 (5.6%) missing values [Missing]
일평균발생량(톤/일) has 4 (7.4%) zeros [Zeros]

Reproduction

Analysis started: 2024-03-13 13:52:16.146264
Analysis finished: 2024-03-13 13:52:17.284518
Duration: 1.14 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

자치구
Categorical

HIGH CORRELATION 

Distinct: 20
Distinct (%): 37.0%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
용산구
구로구
동작구
영등포구
종로구
Other values (15)
33 

Length

Max length: 4
Median length: 3
Mean length: 3.0555556
Min length: 2

Unique

Unique: 5
Unique (%): 9.3%

Sample

1st row: 관악구
2nd row: 강남구
3rd row: 강남구
4th row: 송파구
5th row: 송파구

Common Values

Value | Count | Frequency (%)
용산구 | 5 | 9.3%
구로구 | 4 | 7.4%
동작구 | 4 | 7.4%
영등포구 | 4 | 7.4%
종로구 | 4 | 7.4%
송파구 | 4 | 7.4%
성북구 | 4 | 7.4%
금천구 | 3 | 5.6%
관악구 | 3 | 5.6%
마포구 | 3 | 5.6%
Other values (10) | 16 | 29.6%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
용산구 | 5 | 9.3%
동작구 | 4 | 7.4%
영등포구 | 4 | 7.4%
종로구 | 4 | 7.4%
송파구 | 4 | 7.4%
성북구 | 4 | 7.4%
구로구 | 4 | 7.4%
마포구 | 3 | 5.6%
중구 | 3 | 5.6%
관악구 | 3 | 5.6%
Other values (10) | 16 | 29.6%

구분
Categorical

HIGH CORRELATION 

Distinct: 12
Distinct (%): 22.2%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
7호선
13 
5호선
6호선
전력구
3호선
Other values (7)
13 

Length

Max length: 6
Median length: 3
Mean length: 3.1111111
Min length: 3

Unique

Unique: 3
Unique (%): 5.6%

Sample

1st row: 전력구
2nd row: 3호선
3rd row: 3호선
4th row: 5호선
5th row: 5호선

Common Values

Value | Count | Frequency (%)
7호선 | 13 | 24.1%
5호선 | 9 | 16.7%
6호선 | 9 | 16.7%
전력구 | 6 | 11.1%
3호선 | 4 | 7.4%
공항철도 | 3 | 5.6%
건축물 | 3 | 5.6%
4호선 | 2 | 3.7%
2호선 | 2 | 3.7%
8호선 | 1 | 1.9%
Other values (2) | 2 | 3.7%
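
The Distinct and Unique figures reported for 구분 can be recomputed from these counts. The sketch below assumes that Distinct is the number of different values and Unique is the number of values that occur exactly once. The names `other_1` and `other_2` are placeholders for the two categories folded into "Other values (2)"; since two values share a total count of 2, each must occur once.

```python
from collections import Counter

# Counts copied from the Common Values table for 구분.
# "other_1" and "other_2" are placeholder names (not in the report).
counts = Counter({
    "7호선": 13, "5호선": 9, "6호선": 9, "전력구": 6, "3호선": 4,
    "공항철도": 3, "건축물": 3, "4호선": 2, "2호선": 2, "8호선": 1,
    "other_1": 1, "other_2": 1,
})

n_rows = sum(counts.values())                        # total observations
distinct = len(counts)                               # different values
unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once

distinct_pct = round(100 * distinct / n_rows, 1)
unique_pct = round(100 * unique / n_rows, 1)

print(n_rows, distinct, distinct_pct, unique, unique_pct)
```

With these inputs the sketch gives 12 distinct values (22.2%) and 3 unique values (5.6%) over 54 rows, which agrees with the report.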

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
7호선 | 13 | 24.1%
5호선 | 9 | 16.7%
6호선 | 9 | 16.7%
전력구 | 6 | 11.1%
3호선 | 4 | 7.4%
공항철도 | 3 | 5.6%
건축물 | 3 | 5.6%
4호선 | 2 | 3.7%
2호선 | 2 | 3.7%
8호선 | 1 | 1.9%
Other values (2) | 2 | 3.7%

위치_역명
Text

Distinct: 51
Distinct (%): 94.4%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B

Length

Max length: 22
Median length: 12
Mean length: 5.8333333
Min length: 2

Characters and Unicode

Total characters: 315
Distinct characters: 122
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 48
Unique (%): 88.9%

Sample

1st row: 신길전력구(신림정화단 앞, 당곡사거리))
2nd row: 매봉
3rd row: 가락시장
4th row: 오금본선(3K600)
5th row: 방이
Value | Count | Frequency (%)
대림 | 2 | 3.3%
pit | 2 | 3.3%
서울역 | 2 | 3.3%
상도 | 2 | 3.3%
신대방삼거리 | 1 | 1.7%
영오전력구 | 1 | 1.7%
재활용집하장 | 1 | 1.7%
연신내 | 1 | 1.7%
역촌 | 1 | 1.7%
동아일보(김병관 | 1 | 1.7%
Other values (46) | 46 | 76.7%

Most occurring characters

ValueCountFrequency (%)
) 14
 
4.4%
( 13
 
4.1%
K 9
 
2.9%
8
 
2.5%
8
 
2.5%
8
 
2.5%
0 8
 
2.5%
7
 
2.2%
7
 
2.2%
7
 
2.2%
Other values (112) 226
71.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 220
69.8%
Decimal Number 44
 
14.0%
Uppercase Letter 16
 
5.1%
Close Punctuation 14
 
4.4%
Open Punctuation 13
 
4.1%
Space Separator 6
 
1.9%
Other Punctuation 1
 
0.3%
Dash Punctuation 1
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
8
 
3.6%
8
 
3.6%
8
 
3.6%
7
 
3.2%
7
 
3.2%
7
 
3.2%
7
 
3.2%
5
 
2.3%
5
 
2.3%
5
 
2.3%
Other values (94) 153
69.5%
Decimal Number
ValueCountFrequency (%)
0 8
18.2%
2 6
13.6%
6 5
11.4%
1 5
11.4%
7 4
9.1%
4 4
9.1%
8 4
9.1%
5 4
9.1%
3 4
9.1%
Uppercase Letter
ValueCountFrequency (%)
K 9
56.2%
T 3
 
18.8%
I 2
 
12.5%
P 2
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 14
100.0%
Open Punctuation
ValueCountFrequency (%)
( 13
100.0%
Space Separator
ValueCountFrequency (%)
6
100.0%
Other Punctuation
ValueCountFrequency (%)
, 1
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 220
69.8%
Common 79
 
25.1%
Latin 16
 
5.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
8
 
3.6%
8
 
3.6%
8
 
3.6%
7
 
3.2%
7
 
3.2%
7
 
3.2%
7
 
3.2%
5
 
2.3%
5
 
2.3%
5
 
2.3%
Other values (94) 153
69.5%
Common
ValueCountFrequency (%)
) 14
17.7%
( 13
16.5%
0 8
10.1%
6
7.6%
2 6
7.6%
6 5
 
6.3%
1 5
 
6.3%
7 4
 
5.1%
4 4
 
5.1%
8 4
 
5.1%
Other values (4) 10
12.7%
Latin
ValueCountFrequency (%)
K 9
56.2%
T 3
 
18.8%
I 2
 
12.5%
P 2
 
12.5%

Most occurring blocks

ValueCountFrequency (%)
Hangul 220
69.8%
ASCII 95
30.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
) 14
14.7%
( 13
13.7%
K 9
9.5%
0 8
 
8.4%
6
 
6.3%
2 6
 
6.3%
6 5
 
5.3%
1 5
 
5.3%
7 4
 
4.2%
4 4
 
4.2%
Other values (8) 21
22.1%
Hangul
ValueCountFrequency (%)
8
 
3.6%
8
 
3.6%
8
 
3.6%
7
 
3.2%
7
 
3.2%
7
 
3.2%
7
 
3.2%
5
 
2.3%
5
 
2.3%
5
 
2.3%
Other values (94) 153
69.5%

주소_역본선구분
Text

MISSING

Distinct: 29
Distinct (%): 54.7%
Missing: 1
Missing (%): 1.9%
Memory size: 564.0 B

Length

Max length: 49
Median length: 32
Mean length: 11.018868
Min length: 2

Characters and Unicode

Total characters: 584
Distinct characters: 102
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 26
Unique (%): 49.1%

Sample

1st row: 신길환기구#2(관악구 신림동1467-7)
2nd row: 역사(도곡동 464-1(도곡동))
3rd row: 역사
4th row: 본선
5th row: 역사
Value | Count | Frequency (%)
역사 | 19 | 17.8%
본선 | 7 | 6.5%
지하 | 3 | 2.8%
사이 | 2 | 1.9%
서대문구 | 2 | 1.9%
역사(상도로49길 | 2 | 1.9%
20(상도동 | 2 | 1.9%
56-39 | 1 | 0.9%
마곡지구8단지앞 | 1 | 0.9%
252-6 | 1 | 0.9%
Other values (67) | 67 | 62.6%

Most occurring characters

ValueCountFrequency (%)
55
 
9.4%
( 36
 
6.2%
) 36
 
6.2%
33
 
5.7%
33
 
5.7%
27
 
4.6%
1 21
 
3.6%
2 19
 
3.3%
18
 
3.1%
16
 
2.7%
Other values (92) 290
49.7%

Most occurring categories

ValueCountFrequency (%)
Other Letter 326
55.8%
Decimal Number 113
 
19.3%
Space Separator 55
 
9.4%
Open Punctuation 36
 
6.2%
Close Punctuation 36
 
6.2%
Dash Punctuation 11
 
1.9%
Other Punctuation 5
 
0.9%
Uppercase Letter 2
 
0.3%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
33
 
10.1%
33
 
10.1%
27
 
8.3%
18
 
5.5%
16
 
4.9%
12
 
3.7%
11
 
3.4%
9
 
2.8%
8
 
2.5%
8
 
2.5%
Other values (74) 151
46.3%
Decimal Number
ValueCountFrequency (%)
1 21
18.6%
2 19
16.8%
5 15
13.3%
4 12
10.6%
9 10
8.8%
6 10
8.8%
7 10
8.8%
3 6
 
5.3%
8 6
 
5.3%
0 4
 
3.5%
Other Punctuation
ValueCountFrequency (%)
# 3
60.0%
/ 1
 
20.0%
, 1
 
20.0%
Space Separator
ValueCountFrequency (%)
55
100.0%
Open Punctuation
ValueCountFrequency (%)
( 36
100.0%
Close Punctuation
ValueCountFrequency (%)
) 36
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11
100.0%
Uppercase Letter
ValueCountFrequency (%)
S 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Hangul 326
55.8%
Common 256
43.8%
Latin 2
 
0.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
33
 
10.1%
33
 
10.1%
27
 
8.3%
18
 
5.5%
16
 
4.9%
12
 
3.7%
11
 
3.4%
9
 
2.8%
8
 
2.5%
8
 
2.5%
Other values (74) 151
46.3%
Common
ValueCountFrequency (%)
55
21.5%
( 36
14.1%
) 36
14.1%
1 21
 
8.2%
2 19
 
7.4%
5 15
 
5.9%
4 12
 
4.7%
- 11
 
4.3%
9 10
 
3.9%
6 10
 
3.9%
Other values (7) 31
12.1%
Latin
ValueCountFrequency (%)
S 2
100.0%

Most occurring blocks

ValueCountFrequency (%)
Hangul 326
55.8%
ASCII 258
44.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
55
21.3%
( 36
14.0%
) 36
14.0%
1 21
 
8.1%
2 19
 
7.4%
5 15
 
5.8%
4 12
 
4.7%
- 11
 
4.3%
9 10
 
3.9%
6 10
 
3.9%
Other values (8) 33
12.8%
Hangul
ValueCountFrequency (%)
33
 
10.1%
33
 
10.1%
27
 
8.3%
18
 
5.5%
16
 
4.9%
12
 
3.7%
11
 
3.4%
9
 
2.8%
8
 
2.5%
8
 
2.5%
Other values (74) 151
46.3%

일평균발생량(톤/일)
Real number (ℝ)

HIGH CORRELATION  MISSING  ZEROS 

Distinct: 46
Distinct (%): 90.2%
Missing: 3
Missing (%): 5.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 534.49784
Minimum: 0
Maximum: 5786.14
Zeros: 4
Zeros (%): 7.4%
Negative: 0
Negative (%): 0.0%
Memory size: 618.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 108.5
Median: 239.5
Q3: 596.55
95-th percentile: 1620.3
Maximum: 5786.14
Range: 5786.14
Interquartile range (IQR): 488.05

Descriptive statistics

Standard deviation: 893.27534
Coefficient of variation (CV): 1.6712422
Kurtosis: 24.131341
Mean: 534.49784
Median Absolute Deviation (MAD): 210.37
Skewness: 4.3912261
Sum: 27259.39
Variance: 797940.84
Monotonicity: Not monotonic
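
Several of these statistics are derived from one another, so they can be cross-checked. The sketch below is a consistency check on the reported figures, not a recomputation from the raw data. It assumes the mean is taken over non-missing rows only, and the comparison suggests that Distinct (%) uses the non-missing count as its denominator while Zeros (%) uses all rows.

```python
import math

# Values copied from the statistics reported for 일평균발생량(톤/일).
n_rows, n_missing, n_zeros, n_distinct = 54, 3, 4, 46
total, mean, std = 27259.39, 534.49784, 893.27534
q1, q3 = 108.5, 596.55

n_valid = n_rows - n_missing       # rows that actually hold a number
mean_check = total / n_valid       # mean = sum / non-missing count
cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance = squared standard deviation
iqr = q3 - q1                      # interquartile range

distinct_pct = round(100 * n_distinct / n_valid, 1)  # denominator: non-missing
zeros_pct = round(100 * n_zeros / n_rows, 1)         # denominator: all rows

print(n_valid, round(mean_check, 5), round(cv, 7), round(iqr, 2))
print(distinct_pct, zeros_pct)
```

A coefficient of variation above 1, together with the skewness of about 4.39, indicates a strongly right-skewed distribution dominated by a few large values such as the maximum of 5786.14.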
Histogram with fixed size bins (bins=46)
Value | Count | Frequency (%)
0.0 | 4 | 7.4%
239.5 | 2 | 3.7%
468.0 | 2 | 3.7%
160.3 | 1 | 1.9%
269.37 | 1 | 1.9%
112.0 | 1 | 1.9%
380.73 | 1 | 1.9%
41.3 | 1 | 1.9%
685.1 | 1 | 1.9%
460.7 | 1 | 1.9%
Other values (36) | 36 | 66.7%
(Missing) | 3 | 5.6%

Value | Count | Frequency (%)
0.0 | 4 | 7.4%
0.3 | 1 | 1.9%
1.43 | 1 | 1.9%
20.0 | 1 | 1.9%
31.93 | 1 | 1.9%
41.3 | 1 | 1.9%
43.2 | 1 | 1.9%
104.0 | 1 | 1.9%
104.9 | 1 | 1.9%
108.0 | 1 | 1.9%

Value | Count | Frequency (%)
5786.14 | 1 | 1.9%
2029.77 | 1 | 1.9%
1623.6 | 1 | 1.9%
1617.0 | 1 | 1.9%
1608.4 | 1 | 1.9%
1340.0 | 1 | 1.9%
959.47 | 1 | 1.9%
946.0 | 1 | 1.9%
874.67 | 1 | 1.9%
813.71 | 1 | 1.9%

시설구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
지하철
44 
전력구
기타
 
3
통신구
 
1

Length

Max length: 3
Median length: 3
Mean length: 2.9444444
Min length: 2

Unique

Unique: 1
Unique (%): 1.9%

Sample

1st row: 전력구
2nd row: 지하철
3rd row: 지하철
4th row: 지하철
5th row: 지하철

Common Values

Value | Count | Frequency (%)
지하철 | 44 | 81.5%
전력구 | 6 | 11.1%
기타 | 3 | 5.6%
통신구 | 1 | 1.9%
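
The 53.4% imbalance reported for 시설구분 can be reproduced from these four counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the maximum possible entropy for that number of classes. That formula is an assumption that matches the reported figure, not something the report itself states.

```python
import math

# Counts copied from the Common Values table for 시설구분.
counts = {"지하철": 44, "전력구": 6, "기타": 3, "통신구": 1}

total = sum(counts.values())
probs = [c / total for c in counts.values()]

# Shannon entropy in bits, and its maximum for this number of classes.
entropy_bits = -sum(p * math.log2(p) for p in probs)
max_entropy = math.log2(len(counts))

# 0 means perfectly balanced classes, 1 means a single class holds every row.
imbalance = 1 - entropy_bits / max_entropy

print(f"{imbalance:.1%}")
```

A score this high reflects that one class, 지하철, accounts for 44 of the 54 rows.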

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
지하철 | 44 | 81.5%
전력구 | 6 | 11.1%
기타 | 3 | 5.6%
통신구 | 1 | 1.9%

현재이용가능
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 564.0 B
이용가능
29 
<NA>
25 

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: 이용가능
3rd row: 이용가능
4th row: <NA>
5th row: <NA>

Common Values

Value | Count | Frequency (%)
이용가능 | 29 | 53.7%
<NA> | 25 | 46.3%
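
The profile reports Missing 0 and Distinct 2 for 현재이용가능, and every value has length 4. This suggests that `<NA>` is stored as a literal four-character string and counted as a category, not as a true missing value. If that reading is right, the 25 `<NA>` rows are not included in the dataset's 4 missing cells. A minimal sketch of normalising such placeholders before profiling; `NA_TOKENS` and `normalise_missing` are hypothetical names introduced here:

```python
# Placeholder spellings treated as missing (an assumption for this sketch).
NA_TOKENS = {"<NA>"}


def normalise_missing(values):
    """Return a copy of values with placeholder strings replaced by None."""
    return [None if v is None or v.strip() in NA_TOKENS else v for v in values]


# First five sample rows of 현재이용가능 as shown in the report.
sample = ["<NA>", "이용가능", "이용가능", "<NA>", "<NA>"]

cleaned = normalise_missing(sample)
n_missing = sum(v is None for v in cleaned)

print(cleaned, n_missing)
```

After this step the column would hold a single real category, so its correlation and missing-value alerts would likely change.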

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
이용가능 | 29 | 53.7%
<NA> | 25 | 46.3%

Interactions


Correlations

Variable | 자치구 | 구분 | 위치_역명 | 주소_역본선구분 | 일평균발생량(톤/일) | 시설구분
자치구 | 1.000 | 0.694 | 0.995 | 0.550 | 0.118 | 0.744
구분 | 0.694 | 1.000 | 0.953 | 0.544 | 0.648 | 1.000
위치_역명 | 0.995 | 0.953 | 1.000 | 1.000 | 1.000 | 1.000
주소_역본선구분 | 0.550 | 0.544 | 1.000 | 1.000 | 0.000 | 1.000
일평균발생량(톤/일) | 0.118 | 0.648 | 1.000 | 0.000 | 1.000 | 0.000
시설구분 | 0.744 | 1.000 | 1.000 | 1.000 | 0.000 | 1.000

Variable | 현재이용가능 | 구분 | 시설구분 | 자치구
현재이용가능 | 1.000 | 1.000 | 1.000 | 1.000
구분 | 1.000 | 1.000 | 0.917 | 0.272
시설구분 | 1.000 | 0.917 | 1.000 | 0.359
자치구 | 1.000 | 0.272 | 0.359 | 1.000

Variable | 일평균발생량(톤/일) | 자치구 | 구분 | 시설구분 | 현재이용가능
일평균발생량(톤/일) | 1.000 | 0.000 | 0.391 | 0.000 | 1.000
자치구 | 0.000 | 1.000 | 0.272 | 0.359 | 1.000
구분 | 0.391 | 0.272 | 1.000 | 0.917 | 1.000
시설구분 | 0.000 | 0.359 | 0.917 | 1.000 | 1.000
현재이용가능 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

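Index | 자치구 | 구분 | 위치_역명 | 주소_역본선구분 | 일평균발생량(톤/일) | 시설구분 | 현재이용가능
0 | 관악구 | 전력구 | 신길전력구(신림정화단 앞, 당곡사거리)) | 신길환기구#2(관악구 신림동1467-7) | 345.0 | 전력구 | <NA>
1 | 강남구 | 3호선 | 매봉 | 역사(도곡동 464-1(도곡동)) | 233.0 | 지하철 | 이용가능
2 | 강남구 | 3호선 | 가락시장 | 역사 | 417.3 | 지하철 | 이용가능
3 | 송파구 | 5호선 | 오금본선(3K600) | 본선 | 348.57 | 지하철 | <NA>
4 | 송파구 | 5호선 | 방이 | 역사 | 0.0 | 지하철 | <NA>
5 | 송파구 | 8호선 | 몽촌토성본선(4K074) | 본선 | 449.87 | 지하철 | <NA>
6 | 송파구 | 9호선2단계 | 종합운동장역 | 역사 | 5786.14 | 지하철 | 이용가능
7 | 강동구 | 5호선 | 굽은다리 | 역사( 강동구 양재대로 1572 (명일동)) | 165.17 | 지하철 | 이용가능
8 | 종로구 | 3호선 | 독립문 | 역사(통일로 지하 247(현저동)) | 109.0 | 지하철 | <NA>
9 | 종로구 | 4호선 | 동대문 | 역사 | 104.9 | 지하철 | <NA>

Index | 자치구 | 구분 | 위치_역명 | 주소_역본선구분 | 일평균발생량(톤/일) | 시설구분 | 현재이용가능
44 | 금천구 | 전력구 | 시독전력구 | 시흥S/S내 집수정 | 108.0 | 전력구 | <NA>
45 | 금천구 | 전력구 | 시흥변전소 | 금천구 시흥동 562-2(561-1) | 43.2 | 전력구 | 이용가능
46 | 영등포구 | 5호선 | 영등포시장 | 역사 | 1617.0 | 지하철 | 이용가능
47 | 영등포구 | 5호선 | 여의도본선(18K827) | 본선 | 1340.0 | 지하철 | 이용가능
48 | 영등포구 | 7호선 | 대림 | 역사 | 468.0 | 지하철 | 이용가능
49 | 영등포구 | 전력구 | 영오전력구 | 영오환기구#1 (영등포구 영중로119) | 124.0 | 전력구 | 이용가능
50 | 동작구 | 7호선 | 상도 | 역사(상도로49길 20(상도동)) | 239.5 | 지하철 | <NA>
51 | 동작구 | 7호선 | 신대방삼거리 | 역사(상도로 지하 76(대방동)) | 689.9 | 지하철 | 이용가능
52 | 관악구 | 2호선 | 사당 | 역사(외선) | 0.3 | 지하철 | <NA>
53 | 관악구 | 전력구 | 신림전력구(인헌초교앞) | 신림환기구#5(관악구 낙성대로15길 56-39) (관악구 남부순환로1942(낙성대동)) | 104.0 | 전력구 | <NA>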

Duplicate rows

Most frequently occurring

Index | 자치구 | 구분 | 위치_역명 | 주소_역본선구분 | 일평균발생량(톤/일) | 시설구분 | 현재이용가능 | # duplicates
0 | 동작구 | 7호선 | 상도 | 역사(상도로49길 20(상도동)) | 239.5 | 지하철 | <NA> | 2
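
The duplicate shown here can be found by counting identical rows. The sketch below uses three rows taken from the sample plus a second copy of the 상도 row. The position of that second copy in the real dataset is not shown in the report, so its placement here is an assumption.

```python
from collections import Counter

# Rows 48, 50 and 51 from the sample; the 상도 row is listed twice
# because the report says it occurs twice (second position assumed).
rows = [
    ("영등포구", "7호선", "대림", "역사", 468.0, "지하철", "이용가능"),
    ("동작구", "7호선", "상도", "역사(상도로49길 20(상도동))", 239.5, "지하철", "<NA>"),
    ("동작구", "7호선", "상도", "역사(상도로49길 20(상도동))", 239.5, "지하철", "<NA>"),
    ("동작구", "7호선", "신대방삼거리", "역사(상도로 지하 76(대방동))", 689.9, "지하철", "이용가능"),
]

occurrences = Counter(rows)  # tuples are hashable, so whole rows can be counted
duplicates = {row: n for row, n in occurrences.items() if n > 1}

# Rows beyond the first copy of each repeated record.
extra_rows = sum(n - 1 for n in duplicates.values())

for row, n in duplicates.items():
    print(row[2], n)
print(extra_rows)
```

The "# duplicates" column counts occurrences (2), while the overview counts the surplus rows (1), which is why the two numbers differ.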