Overview

Dataset statistics

Number of variables: 5
Number of observations: 2140
Missing cells: 12
Missing cells (%): 0.1%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 83.7 KiB
Average record size in memory: 40.1 B
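
The two derived figures can be checked from the raw counts. The sketch below assumes the missing-cell percentage is taken over all cells (variables × observations) and that the rounded 83.7 KiB total is precise enough to recover the per-record size; the variable names are illustrative.

```python
# Raw figures copied from the dataset statistics above.
n_variables = 5
n_observations = 2140
missing_cells = 12
total_size_kib = 83.7  # already rounded in the report

# Share of missing cells over the whole table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average bytes per record (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")        # 0.1%
print(f"{avg_record_bytes:.1f} B")  # 40.1 B
```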

Variable types

Categorical: 2
Text: 1
Unsupported: 2

Dataset

Description: File download
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-12251/S/1/datasetView.do

Alerts

Dataset has 1 (< 0.1%) duplicate row (Duplicates)
Unnamed: 3 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
Unnamed: 4 is an unsupported type; check whether it needs cleaning or further analysis (Unsupported)
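
One plausible cause of the two "unsupported" alerts is that the file's preamble and header rows are mixed in with the numeric data, as the Sample section suggests. The sketch below shows one way such rows could be filtered out before profiling. It is an assumption-laden illustration: the `Ride` field names and the `parse_row` helper are hypothetical, and the two data rows assume the fused digits in the Sample section split into boarding and alighting counts of similar size.

```python
from typing import NamedTuple, Optional


class Ride(NamedTuple):
    # Hypothetical field names; the source file only has "Unnamed: N" headers.
    date: str
    line: str
    station: str
    boardings: int
    alightings: int


def parse_row(row: list) -> Optional[Ride]:
    """Return a Ride for a data row, or None for preamble and header rows."""
    if len(row) != 5:
        return None
    date, line, station, on, off = row
    # Data rows start with an 8-digit date such as 20160919.
    if not (isinstance(date, str) and len(date) == 8 and date.isdigit()):
        return None
    try:
        return Ride(date, line, station, int(on), int(off))
    except (TypeError, ValueError):
        return None


# Rows modelled on the first rows of the Sample section (None = missing).
raw_rows = [
    ["추출내용 : 지하철 일별 호선별 무임승하차인원(서울시 관할 운송기관)", None, None, None, None],
    [None, None, None, None, None],
    ["운행일자", "호선", " ", "총승차", "총하차"],
    [None, None, None, None, None],
    [None, "호선", None, None, None],
    ["20160919", "1호선", "서울역", "8137", "7388"],  # assumed digit split
    ["20160919", "1호선", "시청", "3382", "3350"],    # assumed digit split
]

rides = [r for r in map(parse_row, raw_rows) if r is not None]
print(len(rides), rides[0].boardings)
```

With the preamble removed and the last two columns cast to integers, a profiler would have a chance to treat them as numeric rather than rejecting them.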

Reproduction

Analysis started: 2024-05-11 02:10:34.810804
Analysis finished: 2024-05-11 02:10:36.171626
Duration: 1.36 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

추출기한 : 2016.9.19~9.25
Categorical

Distinct: 10
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 16.8 KiB

20160919 | 305
20160920 | 305
20160921 | 305
20160922 | 305
20160923 | 305
Other values (5) | 615

Length

Max length: 38
Median length: 8
Mean length: 8.0065421
Min length: 4

Unique

Unique: 2
Unique (%): 0.1%

Sample

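1st row: 추출내용 : 지하철 일별 호선별 무임승하차인원(서울시 관할 운송기관) (English: "Extract contents: daily free-ride boardings and alightings by subway line, operators under Seoul jurisdiction")
2nd row: <NA>
3rd row: 운행일자 (English: "operating date")
4th row: <NA>
5th row: <NA>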

Common Values

Value | Count | Frequency (%)
20160919 | 305 | 14.3%
20160920 | 305 | 14.3%
20160921 | 305 | 14.3%
20160922 | 305 | 14.3%
20160923 | 305 | 14.3%
20160924 | 305 | 14.3%
20160925 | 305 | 14.3%
<NA> | 3 | 0.1%
추출내용 : 지하철 일별 호선별 무임승하차인원(서울시 관할 운송기관) | 1 | < 0.1%
운행일자 | 1 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
20160919 | 305 | 14.2%
20160921 | 305 | 14.2%
20160922 | 305 | 14.2%
20160923 | 305 | 14.2%
20160924 | 305 | 14.2%
20160925 | 305 | 14.2%
20160920 | 305 | 14.2%
na | 3 | 0.1%
호선별 | 1 | < 0.1%
운송기관 | 1 | < 0.1%
Other values (7) | 7 | 0.3%

Unnamed: 1
Categorical

Distinct: 12
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 16.8 KiB

5호선 | 357
7호선 | 357
2호선 | 350
6호선 | 259
3호선 | 231
Other values (7) | 586

Length

Max length: 6
Median length: 3
Mean length: 3.0495327
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: 호선
4th row: <NA>
5th row: 호선

Common Values

Value | Count | Frequency (%)
5호선 | 357 | 16.7%
7호선 | 357 | 16.7%
2호선 | 350 | 16.4%
6호선 | 259 | 12.1%
3호선 | 231 | 10.8%
4호선 | 182 | 8.5%
9호선 | 175 | 8.2%
8호선 | 119 | 5.6%
1호선 | 70 | 3.3%
9호선2단계 | 35 | 1.6%
Other values (2) | 5 | 0.2%

Length

Histogram of lengths of the category

Unnamed: 2
Text

Distinct: 267
Distinct (%): 12.5%
Missing: 4
Missing (%): 0.2%
Memory size: 16.8 KiB

Length

Max length: 16
Median length: 14
Mean length: 4.2181648
Min length: 1

Characters and Unicode

Total characters: 9010
Distinct characters: 242
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
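
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point as a two-letter code (`Lo` for Other Letter, `Nd` for Decimal Number, `Ps`/`Pe` for Open/Close Punctuation, `Po` for Other Punctuation). The sketch below counts categories over a few station names taken from this column's sample; it is not necessarily how the profiling tool computes its figures, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Station names from the Sample rows of this column.
sample = ["서울역", "시청", "종각", "종로3가"]
counts = category_counts(sample)
print(counts["Lo"], counts["Nd"])  # Hangul syllables vs. the digit in 종로3가
```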

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: (blank)
2nd row: 서울역
3rd row: 시청
4th row: 종각
5th row: 종로3가

Value | Count | Frequency (%)
고속터미널 | 21 | 1.0%
종로3가 | 21 | 1.0%
동대문역사문화공원 | 21 | 1.0%
여의도 | 14 | 0.7%
대림(구로구청 | 14 | 0.7%
청구 | 14 | 0.7%
동작(현충원 | 14 | 0.7%
합정 | 14 | 0.7%
사당 | 14 | 0.7%
공덕 | 14 | 0.7%
Other values (257) | 1975 | 92.5%

Most occurring characters

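Value | Count | Frequency (%)
) | 420 | 4.7%
( | 420 | 4.7%
(character missing) | 364 | 4.0%
(character missing) | 350 | 3.9%
(character missing) | 266 | 3.0%
(character missing) | 217 | 2.4%
(character missing) | 203 | 2.3%
(character missing) | 161 | 1.8%
(character missing) | 147 | 1.6%
(character missing) | 140 | 1.6%
Other values (232) | 6322 | 70.2%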

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 8093 | 89.8%
Close Punctuation | 420 | 4.7%
Open Punctuation | 420 | 4.7%
Decimal Number | 56 | 0.6%
Other Punctuation | 21 | 0.2%

Most frequent character per category

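Other Letter
Value | Count | Frequency (%)
(character missing) | 364 | 4.5%
(character missing) | 350 | 4.3%
(character missing) | 266 | 3.3%
(character missing) | 217 | 2.7%
(character missing) | 203 | 2.5%
(character missing) | 161 | 2.0%
(character missing) | 147 | 1.8%
(character missing) | 140 | 1.7%
(character missing) | 140 | 1.7%
(character missing) | 119 | 1.5%
Other values (226) | 5986 | 74.0%

Decimal Number
Value | Count | Frequency (%)
3 | 35 | 62.5%
4 | 14 | 25.0%
5 | 7 | 12.5%

Close Punctuation
Value | Count | Frequency (%)
) | 420 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 420 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 21 | 100.0%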

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 8093 | 89.8%
Common | 917 | 10.2%

Most frequent character per script

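Hangul
Value | Count | Frequency (%)
(character missing) | 364 | 4.5%
(character missing) | 350 | 4.3%
(character missing) | 266 | 3.3%
(character missing) | 217 | 2.7%
(character missing) | 203 | 2.5%
(character missing) | 161 | 2.0%
(character missing) | 147 | 1.8%
(character missing) | 140 | 1.7%
(character missing) | 140 | 1.7%
(character missing) | 119 | 1.5%
Other values (226) | 5986 | 74.0%

Common
Value | Count | Frequency (%)
) | 420 | 45.8%
( | 420 | 45.8%
3 | 35 | 3.8%
. | 21 | 2.3%
4 | 14 | 1.5%
5 | 7 | 0.8%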

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 8093 | 89.8%
ASCII | 917 | 10.2%

Most frequent character per block

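ASCII
Value | Count | Frequency (%)
) | 420 | 45.8%
( | 420 | 45.8%
3 | 35 | 3.8%
. | 21 | 2.3%
4 | 14 | 1.5%
5 | 7 | 0.8%

Hangul
Value | Count | Frequency (%)
(character missing) | 364 | 4.5%
(character missing) | 350 | 4.3%
(character missing) | 266 | 3.3%
(character missing) | 217 | 2.7%
(character missing) | 203 | 2.5%
(character missing) | 161 | 2.0%
(character missing) | 147 | 1.8%
(character missing) | 140 | 1.7%
(character missing) | 140 | 1.7%
(character missing) | 119 | 1.5%
Other values (226) | 5986 | 74.0%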

Unnamed: 3
Unsupported

REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 0.2%
Memory size: 16.8 KiB

Unnamed: 4
Unsupported

REJECTED  UNSUPPORTED 

Missing: 4
Missing (%): 0.2%
Memory size: 16.8 KiB

Correlations

(variable) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1
추출기한 : 2016.9.19~9.25 | 1.000 | 0.644
Unnamed: 1 | 0.644 | 1.000

(variable) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1
추출기한 : 2016.9.19~9.25 | 1.000 | 0.372
Unnamed: 1 | 0.372 | 1.000

(variable) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1
추출기한 : 2016.9.19~9.25 | 1.000 | 0.372
Unnamed: 1 | 0.372 | 1.000

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
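
A minimal sketch of that idea, assuming nullity correlation is the Pearson correlation between per-column missing-value indicators (the report does not state its exact formula, and `nullity_correlation` is a hypothetical helper). The two example columns follow the missing pattern that the Sample section shows for the first seven rows of Unnamed: 3 and Unnamed: 4, with placeholder values standing in for the non-missing cells.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is never or always missing
    return cov / sqrt(var_a * var_b)


# Rows 0, 1, 3 and 4 are missing in both columns; "x" marks a present value.
unnamed_3 = [None, None, "x", None, None, "x", "x"]
unnamed_4 = [None, None, "x", None, None, "x", "x"]
always_present = ["x"] * 7

print(nullity_correlation(unnamed_3, unnamed_4))       # close to 1.0
print(nullity_correlation(unnamed_3, always_present))  # None
```

Columns that are missing in exactly the same rows correlate at about 1, while a column with no missing values has no defined nullity correlation.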

Sample

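First rows

(index) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
0 | 추출내용 : 지하철 일별 호선별 무임승하차인원(서울시 관할 운송기관) | <NA> | <NA> | NaN | NaN
1 | <NA> | <NA> | <NA> | NaN | NaN
2 | 운행일자 | 호선 | (blank) | 총승차 | 총하차
3 | <NA> | <NA> | <NA> | NaN | NaN
4 | <NA> | 호선 | <NA> | NaN | NaN
5 | 20160919 | 1호선 | 서울역 | 8137 | 7388
6 | 20160919 | 1호선 | 시청 | 3382 | 3350
7 | 20160919 | 1호선 | 종각 | 5742 | 5599
8 | 20160919 | 1호선 | 종로3가 | 13270 | 12247
9 | 20160919 | 1호선 | 종로5가 | 10731 | 10386

Last rows

(index) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4
2130 | 20160925 | 9호선 | 구반포 | 425 | 447
2131 | 20160925 | 9호선 | 신반포 | 624 | 594
2132 | 20160925 | 9호선 | 고속터미널 | 1217 | 1323
2133 | 20160925 | 9호선 | 사평 | 286 | 263
2134 | 20160925 | 9호선 | 신논현 | 927 | 884
2135 | 20160925 | 9호선2단계 | 언주 | 427 | 397
2136 | 20160925 | 9호선2단계 | 선정릉 | 280 | 300
2137 | 20160925 | 9호선2단계 | 삼성중앙 | 286 | 314
2138 | 20160925 | 9호선2단계 | 봉은사 | 806 | 686
2139 | 20160925 | 9호선2단계 | 종합운동장 | 329 | 247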

Duplicate rows

Most frequently occurring

(index) | 추출기한 : 2016.9.19~9.25 | Unnamed: 1 | Unnamed: 2 | # duplicates
0 | <NA> | <NA> | <NA> | 2
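
The overview reports 1 duplicate row while this table shows a count of 2. One reading consistent with both numbers is that the table counts every occurrence of the repeated row and the overview counts only the extra copies. The sketch below illustrates that reading on the first three columns of the first six sample rows; it is an assumption about the report's counting, not a statement of its implementation, and `duplicate_groups` is a hypothetical helper.

```python
from collections import Counter


def duplicate_groups(rows):
    """Map each repeated row to the number of times it occurs."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# First three columns of rows 0-5 from the Sample section.
rows = [
    ("추출내용 : 지하철 일별 호선별 무임승하차인원(서울시 관할 운송기관)", "<NA>", "<NA>"),
    ("<NA>", "<NA>", "<NA>"),
    ("운행일자", "호선", " "),
    ("<NA>", "<NA>", "<NA>"),
    ("<NA>", "호선", "<NA>"),
    ("20160919", "1호선", "서울역"),
]

groups = duplicate_groups(rows)
extra_copies = sum(n - 1 for n in groups.values())
print(groups)        # the all-<NA> row occurs twice
print(extra_copies)  # 1
```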