gimi9 Pandas Profiling

Dataset statistics

Number of variables	15
Number of observations	27
Missing cells	113
Missing cells (%)	27.9%
Duplicate rows	3
Duplicate rows (%)	11.1%
Total size in memory	3.3 KiB
Average record size in memory	124.9 B
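The dataset-level figures follow from simple counts. With 15 variables and 27 observations there are 405 cells, so 113 missing cells is 27.9%, and 3 duplicate rows out of 27 is 11.1%. The pandas sketch below computes the same kind of figures for any DataFrame.

Two details are assumptions rather than documented ydata-profiling behaviour:
- Duplicates are counted with `DataFrame.duplicated()`, which counts rows that repeat an earlier row.
- Memory is the shallow `memory_usage(deep=False)` total, and the average record size is that total divided by the number of rows.

The toy frame is made up for illustration.

```python
import pandas as pd

def dataset_statistics(df: pd.DataFrame) -> dict:
    """Dataset-level figures similar to the 'Dataset statistics' table (sketch)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    n_missing = int(df.isna().sum().sum())            # missing cells across all columns
    n_duplicates = int(df.duplicated().sum())         # assumption: rows repeating an earlier row
    memory = int(df.memory_usage(deep=False).sum())   # assumption: shallow size in bytes, index included
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": n_missing,
        "missing_cells_pct": 100 * n_missing / n_cells,
        "duplicate_rows": n_duplicates,
        "duplicate_rows_pct": 100 * n_duplicates / n_rows,
        "memory_bytes": memory,
        "avg_record_bytes": memory / n_rows,
    }

# Toy frame with made-up values: 4 rows, 2 columns, one duplicate row, 2 missing cells.
toy = pd.DataFrame({"a": ["x", "x", None, "y"], "b": [1.0, 1.0, None, 2.0]})
stats = dataset_statistics(toy)
print(stats["missing_cells"], stats["duplicate_rows"])  # 2 1
```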

Variable types

Text	3
Categorical	1
Unsupported	11

Dataset

Description	File download
Author	Seoul Metro (서울교통공사)
URL	https://data.seoul.go.kr/dataList/OA-13231/F/1/datasetView.do

Alerts

Dataset has 3 (11.1%) duplicate rows	Duplicates
`시 설 명` [facility name] has 8 (29.6%) missing values	Missing
`Unnamed: 1` has 21 (77.8%) missing values	Missing
`Unnamed: 2` has 24 (88.9%) missing values	Missing
`계` [total] has 7 (25.9%) missing values	Missing
`1~4호선` [Lines 1-4] has 6 (22.2%) missing values	Missing
`Unnamed: 6` has 6 (22.2%) missing values	Missing
`Unnamed: 7` has 6 (22.2%) missing values	Missing
`Unnamed: 8` has 5 (18.5%) missing values	Missing
`Unnamed: 9` has 6 (22.2%) missing values	Missing
`5~8호선` [Lines 5-8] has 6 (22.2%) missing values	Missing
`Unnamed: 11` has 6 (22.2%) missing values	Missing
`Unnamed: 12` has 6 (22.2%) missing values	Missing
`Unnamed: 13` has 2 (7.4%) missing values	Missing
`Unnamed: 14` has 4 (14.8%) missing values	Missing
`계` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`1~4호선` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 6` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 7` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 8` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 9` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`5~8호선` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 11` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 12` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 13` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
`Unnamed: 14` has an unsupported type; check whether it needs cleaning or further analysis	Unsupported
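When pandas reads a file, it names a blank header cell `Unnamed: <position>`. The eleven `Unnamed: N` columns therefore suggest that the source sheet used a multi-row header with merged cells. Reading both header rows with `header=[0, 1]` is one way to keep that structure.

The report does not say why ydata-profiling marks most columns Unsupported. One plausible cause, assumed here, is columns that mix numbers with text markers. The sketch below uses a hypothetical three-line CSV with invented numbers and coerces such a column with `pd.to_numeric(..., errors="coerce")`.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt (invented numbers): the blank second header cell mimics
# a merged spreadsheet header, and "-" marks a cell with no number.
raw = io.StringIO(
    "시 설 명,,계\n"
    "궤도연장,km,12.5\n"
    "곡선연장,km,-\n"
)
df = pd.read_csv(raw)
print(list(df.columns))  # ['시 설 명', 'Unnamed: 1', '계']

# '계' mixes numbers with "-", so pandas reads it as text (object dtype).
# Coercing turns "-" into NaN and leaves a float column that a profiler can summarise.
df["계"] = pd.to_numeric(df["계"], errors="coerce")
print(df["계"].tolist())  # [12.5, nan]
```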

Reproduction

Analysis started	2024-04-29 16:47:17.075716
Analysis finished	2024-04-29 16:47:19.007162
Duration	1.93 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

시 설 명 [facility name]
Text

MISSING

Distinct	18
Distinct (%)	94.7%
Missing	8
Missing (%)	29.6%
Memory size	348.0 B

Length

Max length	8
Median length	6
Mean length	4
Min length	2

Characters and Unicode

Total characters	76
Distinct characters	46
Distinct categories	7
Distinct scripts	3
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	17
Unique (%)	89.5%

Sample

1st row	궤도연장 [track length]
2nd row	(본선/측선) [main line/siding]
3rd row	곡선연장 [curve length]
4th row	(R<1200)
5th row	최소 [minimum]

Value	Count	Frequency (%)
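본선/측선 [main line/siding]	2	10.5%
콘크리트도상 [concrete track bed]	1	5.3%
b2s	1	5.3%
신축이음매 [expansion joint]	1	5.3%
차량기지 [depot]	1	5.3%
장비유치선 [equipment stabling track]	1	5.3%
구간 [section]	1	5.3%
장치 [device]	1	5.3%
체결 [fastening]	1	5.3%
방진 [vibration isolation]	1	5.3%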
Other values (8)	8	42.1%

Most occurring characters

Value	Count	Frequency (%)
선	7	9.2%
기	6	7.9%
장	4	5.3%
)	3	3.9%
도	3	3.9%
(	3	3.9%
2	2	2.6%
연	2	2.6%
유	2	2.6%
본	2	2.6%
Other values (36)	42	55.3%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	59	77.6%
Decimal Number	5	6.6%
Close Punctuation	3	3.9%
Open Punctuation	3	3.9%
Uppercase Letter	3	3.9%
Other Punctuation	2	2.6%
Math Symbol	1	1.3%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
선	7	11.9%
기	6	10.2%
장	4	6.8%
도	3	5.1%
연	2	3.4%
유	2	3.4%
본	2	3.4%
최	2	3.4%
치	2	3.4%
곡	2	3.4%
Other values (26)	27	45.8%

Decimal Number

Value	Count	Frequency (%)
2	2	40.0%
0	2	40.0%
1	1	20.0%

Uppercase Letter

Value	Count	Frequency (%)
B	1	33.3%
R	1	33.3%
S	1	33.3%

Close Punctuation

Value	Count	Frequency (%)
)	3	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	3	100.0%

Other Punctuation

Value	Count	Frequency (%)
/	2	100.0%

Math Symbol

Value	Count	Frequency (%)
<	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	59	77.6%
Common	14	18.4%
Latin	3	3.9%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
선	7	11.9%
기	6	10.2%
장	4	6.8%
도	3	5.1%
연	2	3.4%
유	2	3.4%
본	2	3.4%
최	2	3.4%
치	2	3.4%
곡	2	3.4%
Other values (26)	27	45.8%

Common

Value	Count	Frequency (%)
)	3	21.4%
(	3	21.4%
2	2	14.3%
0	2	14.3%
/	2	14.3%
1	1	7.1%
<	1	7.1%

Latin

Value	Count	Frequency (%)
B	1	33.3%
R	1	33.3%
S	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	59	77.6%
ASCII	17	22.4%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
선	7	11.9%
기	6	10.2%
장	4	6.8%
도	3	5.1%
연	2	3.4%
유	2	3.4%
본	2	3.4%
최	2	3.4%
치	2	3.4%
곡	2	3.4%
Other values (26)	27	45.8%

ASCII

Value	Count	Frequency (%)
)	3	17.6%
(	3	17.6%
2	2	11.8%
0	2	11.8%
/	2	11.8%
B	1	5.9%
1	1	5.9%
<	1	5.9%
R	1	5.9%
S	1	5.9%

Unnamed: 1
Text

MISSING

Distinct	4
Distinct (%)	66.7%
Missing	21
Missing (%)	77.8%
Memory size	348.0 B

Length

Max length	8
Median length	7
Mean length	5.5
Min length	3
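The length statistics are computed over the non-missing values only. For `Unnamed: 1`, all six values can be recovered from the sample rows and the character counts: 구 and 간 each occur twice, so "구 간" appears twice.

The sketch below applies the standard-library `statistics` functions to those six values. It reproduces the reported max (8), mean (5.5) and min (3). The plain median of the six lengths is 6, however, not the reported 7, so the report presumably computes its median length by a different rule.

```python
from statistics import mean, median

def length_stats(values):
    """Max/median/mean/min length, in characters, of the non-missing strings."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": median(lengths),  # plain median; may differ from the report's rule
        "mean": mean(lengths),
        "min": min(lengths),
    }

# The six non-missing values of `Unnamed: 1`, recovered from the report.
unnamed_1 = ["곡선반경 (R)", "연장 (m)", "구 간", "기울기 (‰)", "연장 (m)", "구 간"]
print(length_stats(unnamed_1))
```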

Characters and Unicode

Total characters	33
Distinct characters	16
Distinct categories	7
Distinct scripts	3
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	2
Unique (%)	33.3%
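"Distinct" counts the different values, while "Unique" counts the values that occur exactly once. Both percentages are taken over the non-missing values, which number 6 here. The sketch below reuses the six recovered `Unnamed: 1` values and reproduces 4 distinct (66.7%) and 2 unique (33.3%).

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, distinct %, unique, unique %) over the non-missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)                               # number of different values
    unique = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    return distinct, 100 * distinct / n, unique, 100 * unique / n

unnamed_1 = ["곡선반경 (R)", "연장 (m)", "구 간", "기울기 (‰)", "연장 (m)", "구 간"]
print(distinct_and_unique(unnamed_1))
```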

Sample

1st row	곡선반경 (R) [curve radius (R)]
2nd row	연장 (m) [length (m)]
3rd row	구 간 [section]
4th row	기울기 (‰) [gradient (‰)]
5th row	연장 (m) [length (m)]

Value	Count	Frequency (%)
연장 [length]	2	16.7%
m	2	16.7%
구	2	16.7%
간	2	16.7%
곡선반경 [curve radius]	1	8.3%
r	1	8.3%
기울기 [gradient]	1	8.3%
‰	1	8.3%

Most occurring characters

Value	Count	Frequency (%)
(space)	6	18.2%
(	4	12.1%
)	4	12.1%
연	2	6.1%
장	2	6.1%
m	2	6.1%
구	2	6.1%
간	2	6.1%
기	2	6.1%
곡	1	3.0%
Other values (6)	6	18.2%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	15	45.5%
Space Separator	6	18.2%
Open Punctuation	4	12.1%
Close Punctuation	4	12.1%
Lowercase Letter	2	6.1%
Uppercase Letter	1	3.0%
Other Punctuation	1	3.0%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
연	2	13.3%
장	2	13.3%
구	2	13.3%
간	2	13.3%
기	2	13.3%
곡	1	6.7%
선	1	6.7%
반	1	6.7%
경	1	6.7%
울	1	6.7%

Space Separator

Value	Count	Frequency (%)
(space)	6	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	4	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	4	100.0%

Lowercase Letter

Value	Count	Frequency (%)
m	2	100.0%

Uppercase Letter

Value	Count	Frequency (%)
R	1	100.0%

Other Punctuation

Value	Count	Frequency (%)
‰	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	15	45.5%
Hangul	15	45.5%
Latin	3	9.1%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
연	2	13.3%
장	2	13.3%
구	2	13.3%
간	2	13.3%
기	2	13.3%
곡	1	6.7%
선	1	6.7%
반	1	6.7%
경	1	6.7%
울	1	6.7%

Common

Value	Count	Frequency (%)
(space)	6	40.0%
(	4	26.7%
)	4	26.7%
‰	1	6.7%

Latin

Value	Count	Frequency (%)
m	2	66.7%
R	1	33.3%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	17	51.5%
Hangul	15	45.5%
Punctuation	1	3.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
(space)	6	35.3%
(	4	23.5%
)	4	23.5%
m	2	11.8%
R	1	5.9%

Hangul

Value	Count	Frequency (%)
연	2	13.3%
장	2	13.3%
구	2	13.3%
간	2	13.3%
기	2	13.3%
곡	1	6.7%
선	1	6.7%
반	1	6.7%
경	1	6.7%
울	1	6.7%

Punctuation

Value	Count	Frequency (%)
‰	1	100.0%

Unnamed: 2
Text

MISSING

Distinct	3
Distinct (%)	100.0%
Missing	24
Missing (%)	88.9%
Memory size	348.0 B

Length

Max length	5
Median length	5
Mean length	5
Min length	5

Characters and Unicode

Total characters	15
Distinct characters	10
Distinct categories	5
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	3
Unique (%)	100.0%

Sample

1st row	Alt-Ⅰ
2nd row	Alt-Ⅱ
3rd row	DFF14

Value	Count	Frequency (%)
alt-ⅰ	1	33.3%
alt-ⅱ	1	33.3%
dff14	1	33.3%

Most occurring characters

Value	Count	Frequency (%)
A	2	13.3%
l	2	13.3%
t	2	13.3%
-	2	13.3%
F	2	13.3%
Ⅰ	1	6.7%
Ⅱ	1	6.7%
D	1	6.7%
1	1	6.7%
4	1	6.7%
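The character table counts every character across the non-missing values. `Unnamed: 2` has only three non-missing values, all shown in its sample, so a `collections.Counter` sketch can reproduce its table exactly. That includes the order of tied counts, because `Counter.most_common` keeps first-seen order among equal counts.

```python
from collections import Counter

def char_frequencies(values):
    """(character, count, % of all characters) rows, most frequent first."""
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    # most_common() sorts by count; ties keep the order in which characters were first seen.
    return [(ch, n, round(100 * n / total, 1)) for ch, n in counts.most_common()]

unnamed_2 = ["Alt-Ⅰ", "Alt-Ⅱ", "DFF14"]  # the three non-missing values of `Unnamed: 2`
for row in char_frequencies(unnamed_2):
    print(*row, sep="\t")
```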

Most occurring categories

Value	Count	Frequency (%)
Uppercase Letter	5	33.3%
Lowercase Letter	4	26.7%
Dash Punctuation	2	13.3%
Letter Number	2	13.3%
Decimal Number	2	13.3%
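The category table groups characters by their Unicode general category. `unicodedata.category` returns short codes such as `Lu` or `Nl`. The mapping from codes to the long names used in the report is written out by hand below and covers only the categories that appear in this report. Applied to the three `Unnamed: 2` values, it gives the same counts as the table above.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report (hand-written subset).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter", "Ll": "Lowercase Letter", "Lo": "Other Letter",
    "Nd": "Decimal Number", "Nl": "Letter Number",
    "Pd": "Dash Punctuation", "Ps": "Open Punctuation", "Pe": "Close Punctuation",
    "Po": "Other Punctuation", "Sm": "Math Symbol", "Zs": "Space Separator",
}

def category_counts(values):
    """Count characters by Unicode general category, using long names where known."""
    codes = Counter(unicodedata.category(ch) for v in values for ch in v)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}

print(category_counts(["Alt-Ⅰ", "Alt-Ⅱ", "DFF14"]))
```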

Most frequent character per category

Uppercase Letter

Value	Count	Frequency (%)
A	2	40.0%
F	2	40.0%
D	1	20.0%

Lowercase Letter

Value	Count	Frequency (%)
l	2	50.0%
t	2	50.0%

Letter Number

Value	Count	Frequency (%)
Ⅰ	1	50.0%
Ⅱ	1	50.0%

Decimal Number

Value	Count	Frequency (%)
1	1	50.0%
4	1	50.0%

Dash Punctuation

Value	Count	Frequency (%)
-	2	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Latin	11	73.3%
Common	4	26.7%
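Python's standard library does not expose the Unicode Script property, so the sketch below uses a rough heuristic based on character names from `unicodedata.name`:
- Names starting with HANGUL map to Hangul.
- Names starting with LATIN or ROMAN NUMERAL map to Latin.
- Everything else is treated as Common.

This is an approximation, not the Script property itself. The third-party `regex` package, which supports script properties such as `\p{Script=Hangul}`, would be a more faithful choice. For the `Unnamed: 2` values the heuristic matches the reported 11 Latin and 4 Common characters. Roman numerals such as Ⅰ and Ⅱ belong to the Latin script, which is why they are counted there.

```python
import unicodedata
from collections import Counter

def script_of(ch):
    """Rough script guess from the character name (heuristic, not the Script property)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN") or name.startswith("ROMAN NUMERAL"):
        return "Latin"
    return "Common"  # digits, punctuation, spaces, symbols such as the per mille sign

def script_counts(values):
    """Count characters per (approximate) script."""
    return Counter(script_of(ch) for v in values for ch in v)

print(script_counts(["Alt-Ⅰ", "Alt-Ⅱ", "DFF14"]))
```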

Most frequent character per script

Latin

Value	Count	Frequency (%)
A	2	18.2%
l	2	18.2%
t	2	18.2%
F	2	18.2%
Ⅰ	1	9.1%
Ⅱ	1	9.1%
D	1	9.1%

Common

Value	Count	Frequency (%)
-	2	50.0%
1	1	25.0%
4	1	25.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	13	86.7%
Number Forms	2	13.3%
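Blocks are fixed code-point ranges, so a small lookup table is enough for the characters in this report. The ranges below are the official Unicode ranges for the blocks listed, and any other code point falls into a catch-all "Other" bucket in this sketch. The report's labels (ASCII, Hangul, Number Forms, Punctuation) appear to correspond to Basic Latin, Hangul Syllables, Number Forms and General Punctuation, though that mapping is an inference.

```python
from collections import Counter

# A few Unicode blocks as (first code point, last code point, block name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2150, 0x218F, "Number Forms"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]

def block_of(ch):
    """Return the name of the listed block containing ch, or 'Other'."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "Other"

def block_counts(values):
    """Count characters per block."""
    return Counter(block_of(ch) for v in values for ch in v)

print(block_counts(["Alt-Ⅰ", "Alt-Ⅱ", "DFF14"]))
```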

Most frequent character per block

ASCII

Value	Count	Frequency (%)
A	2	15.4%
l	2	15.4%
t	2	15.4%
-	2	15.4%
F	2	15.4%
D	1	7.7%
1	1	7.7%
4	1	7.7%

Number Forms

Value	Count	Frequency (%)
Ⅰ	1	50.0%
Ⅱ	1	50.0%

단위 [unit]
Categorical

Distinct	9
Distinct (%)	33.3%
Missing	0
Missing (%)	0.0%
Memory size	348.0 B

<NA>	9
km	4
m	3
개 [pieces]	3
개소 [locations]	3
Other values (4)	5

Length

Max length	4
Median length	2
Mean length	2.2592593
Min length	1

Unique

Unique	3
Unique (%)	11.1%

Sample

1st row	<NA>
2nd row	km
3rd row	<NA>
4th row	km
5th row	<NA>

Common Values

Value	Count	Frequency (%)
<NA>	9	33.3%
km	4	14.8%
m	3	11.1%
개 [pieces]	3	11.1%
개소 [locations]	3	11.1%
-	2	7.4%
‰	1	3.7%
틀 [sets]	1	3.7%
대 [machines]	1	3.7%
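The categorical figures for `단위` follow from its value counts:
- 9 distinct values out of 27 is 33.3%.
- The 3 values seen once (‰, 틀, 대) give 11.1%.
- The mean length is 61 characters over 27 values.

The report lists 0 missing values and a maximum length of 4 for this column. That suggests `<NA>` was counted as the four-character text rather than as a missing marker. The sketch assumes this and rebuilds the 27 values from the table.

```python
from collections import Counter
from statistics import mean, median

# Rebuild the 27 values of 단위 from the Common Values table. "<NA>" is kept as
# plain text, an assumption based on the report showing 0 missing values.
table = [("<NA>", 9), ("km", 4), ("m", 3), ("개", 3), ("개소", 3),
         ("-", 2), ("‰", 1), ("틀", 1), ("대", 1)]
values = [value for value, n in table for _ in range(n)]

counts = Counter(values)
n_values = len(values)
distinct = len(counts)                                # different values
unique = sum(1 for c in counts.values() if c == 1)    # values seen exactly once
lengths = [len(v) for v in values]

print(n_values, distinct, unique)                     # 27 9 3
print(max(lengths), median(lengths), round(mean(lengths), 7), min(lengths))  # 4 2 2.2592593 1
```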

Length

Histogram of lengths of the category