gimi9 Pandas Profiling

Dataset statistics

Number of variables	6
Number of observations	37
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	1.9 KiB
Average record size in memory	52.6 B

Variable types

Categorical	1
Numeric	1
Text	4

Dataset

Description	File download
Author	서울교통공사
URL	https://data.seoul.go.kr/dataList/OA-13317/F/1/datasetView.do

Alerts

`호선` is highly overall correlated with `회사`	High correlation
`회사` is highly overall correlated with `호선`	High correlation
`역` has unique values	Unique

Reproduction

Analysis started	2023-12-11 06:13:14.485878
Analysis finished	2023-12-11 06:13:15.010629
Duration	0.52 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
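
As a quick check on the Reproduction block, the reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2023-12-11 06:13:14.485878")
finished = datetime.fromisoformat("2023-12-11 06:13:15.010629")

# Elapsed wall-clock time in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```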

회사
Categorical

HIGH CORRELATION

Distinct	3
Distinct (%)	8.1%
Missing	0
Missing (%)	0.0%
Memory size	428.0 B

서울메트로	20
도시철도공사	14
메트로9호선	3

Length

Max length	6
Median length	5
Mean length	5.4594595
Min length	5
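
The length statistics above can be rebuilt from the three category counts listed for 회사. The sketch below assumes lengths are counted in Unicode code points on precomposed Hangul, which is how Python's `len` treats the strings as typed here; the names `counts` and `length_stats` are hypothetical.

```python
from statistics import mean, median

# Value counts for 회사, taken from the report's frequency table.
counts = {"서울메트로": 20, "도시철도공사": 14, "메트로9호선": 3}

# One string length per observation (37 in total).
lengths = [len(name) for name, n in counts.items() for _ in range(n)]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 7),
    "min": min(lengths),
}
print(length_stats)
```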

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	서울메트로
2nd row	서울메트로
3rd row	서울메트로
4th row	서울메트로
5th row	서울메트로

Common Values

Value	Count	Frequency (%)
서울메트로	20	54.1%
도시철도공사	14	37.8%
메트로9호선	3	8.1%
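
The Frequency (%) column is each count divided by the 37 observations, rounded to one decimal place. A small sketch of that arithmetic follows; `frequency_pct` and `distinct_pct` are illustrative names, not report fields.

```python
# Value counts for 회사, from the Common Values table above.
counts = {"서울메트로": 20, "도시철도공사": 14, "메트로9호선": 3}
total = sum(counts.values())  # 37 observations

# Share of each category, in percent, rounded as in the report.
frequency_pct = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Distinct (%) is the number of distinct values over the row count.
distinct_pct = round(100 * len(counts) / total, 1)
print(frequency_pct, distinct_pct)
```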

Length

Histogram of lengths of the category


호선
Real number (ℝ)

HIGH CORRELATION

Distinct	9
Distinct (%)	24.3%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	4.4594595

Minimum	1
Maximum	9
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	465.0 B

Quantile statistics

Minimum	1
5-th percentile	1.8
Q1	2
median	4
Q3	6
95-th percentile	9
Maximum	9
Range	8
Interquartile range (IQR)	4
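
The quantile figures above are consistent with linear interpolation between order statistics, which is the pandas default; that the report uses this method is an assumption here. The sketch rebuilds the 37 values of 호선 from the frequency table in this section and applies a hand-written `percentile` helper (a hypothetical name).

```python
import math

# Value counts for 호선, from the report's frequency table.
counts = {1: 2, 2: 9, 3: 5, 4: 4, 5: 5, 6: 3, 7: 4, 8: 2, 9: 3}
values = sorted(v for v, n in counts.items() for _ in range(n))

def percentile(sorted_values, p):
    """Percentile p (0-100) with linear interpolation between neighbours."""
    position = (len(sorted_values) - 1) * p / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

q1, q3 = percentile(values, 25), percentile(values, 75)
iqr = q3 - q1
print(percentile(values, 5), q1, percentile(values, 50), q3, iqr)
```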

Descriptive statistics

Standard deviation	2.4220583
Coefficient of variation (CV)	0.54312822
Kurtosis	-0.96699417
Mean	4.4594595
Median Absolute Deviation (MAD)	2
Skewness	0.44291989
Sum	165
Variance	5.8663664
Monotonicity	Increasing
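
Several of the descriptive statistics can be cross-checked from the frequency table alone. The sketch below assumes the report uses the sample variance (divisor n - 1), which matches the figures shown; skewness and kurtosis are left out, and the `summary` dictionary is an illustrative name.

```python
from statistics import mean, median, stdev, variance

# Value counts for 호선, from the report's frequency table.
counts = {1: 2, 2: 9, 3: 5, 4: 4, 5: 5, 6: 3, 7: 4, 8: 2, 9: 3}
values = [v for v, n in counts.items() for _ in range(n)]

center = median(values)
summary = {
    "sum": sum(values),
    "mean": mean(values),
    "variance": variance(values),          # sample variance, divisor n - 1
    "std": stdev(values),                  # square root of the sample variance
    "cv": stdev(values) / mean(values),    # coefficient of variation
    "mad": median(abs(v - center) for v in values),  # median absolute deviation
}
for name, value in summary.items():
    print(f"{name}: {value:.7f}")
```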

Histogram with fixed size bins (bins=9)

Value	Count	Frequency (%)
2	9	24.3%
3	5	13.5%
5	5	13.5%
4	4	10.8%
7	4	10.8%
6	3	8.1%
9	3	8.1%
1	2	5.4%
8	2	5.4%

Minimum 10 values

Value	Count	Frequency (%)
1	2	5.4%
2	9	24.3%
3	5	13.5%
4	4	10.8%
5	5	13.5%
6	3	8.1%
7	4	10.8%
8	2	5.4%
9	3	8.1%

Maximum 10 values

Value	Count	Frequency (%)
9	3	8.1%
8	2	5.4%
7	4	10.8%
6	3	8.1%
5	5	13.5%
4	4	10.8%
3	5	13.5%
2	9	24.3%
1	2	5.4%

역
Text

UNIQUE

Distinct	37
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	428.0 B

Length

Max length	12
Median length	10
Mean length	6.9459459
Min length	2

Characters and Unicode

Total characters	257
Distinct characters	89
Distinct categories	5
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
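
To illustrate how such character properties can be read, the sketch below uses Python's standard `unicodedata` module to count general categories in one of the sample values. Whether the profiler computes its category tables this way is an assumption; `CATEGORY_NAMES` and `category_counts` are hypothetical names, and the mapping covers only the categories that appear in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Sm": "Math Symbol",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(text):
    """Count characters in text by Unicode general category."""
    counts = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in counts.items()}

sample = "신설동(2)↔종합운동장"  # 3rd sample row of 역
print(category_counts(sample))
```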

Unique

Unique	37
Unique (%)	100.0%

Sample

1st row	서울↔청량리
2nd row	동묘앞
3rd row	신설동(2)↔종합운동장
4th row	종합운동장↔교대(2)
5th row	을지입구↔성수

Value	Count	Frequency (%)
서울↔청량리	1	2.7%
사당↔남태령(시계	1	2.7%
방화↔까치산	1	2.7%
강동↔마천	1	2.7%
까치산↔여의도	1	2.7%
여의도↔왕십리	1	2.7%
봉화산↔상월곡	1	2.7%
응암↔상월곡	1	2.7%
이태원↔약수	1	2.7%
장암↔건대입구	1	2.7%
Other values (27)	27	73.0%

Most occurring characters

Value	Count	Frequency (%)
↔	34	13.2%
구	13	5.1%
신	8	3.1%
입	8	3.1%
대	8	3.1%
동	7	2.7%
수	6	2.3%
서	5	1.9%
상	5	1.9%
리	5	1.9%
Other values (79)	158	61.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	212	82.5%
Math Symbol	34	13.2%
Close Punctuation	4	1.6%
Open Punctuation	4	1.6%
Decimal Number	3	1.2%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
구	13	6.1%
신	8	3.8%
입	8	3.8%
대	8	3.8%
동	7	3.3%
수	6	2.8%
서	5	2.4%
상	5	2.4%
리	5	2.4%
양	4	1.9%
Other values (75)	143	67.5%

Math Symbol

Value	Count	Frequency (%)
↔	34	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	4	100.0%

Open Punctuation

Value	Count	Frequency (%)
(	4	100.0%

Decimal Number

Value	Count	Frequency (%)
2	3	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	212	82.5%
Common	45	17.5%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
구	13	6.1%
신	8	3.8%
입	8	3.8%
대	8	3.8%
동	7	3.3%
수	6	2.8%
서	5	2.4%
상	5	2.4%
리	5	2.4%
양	4	1.9%
Other values (75)	143	67.5%

Common

Value	Count	Frequency (%)
↔	34	75.6%
)	4	8.9%
(	4	8.9%
2	3	6.7%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	212	82.5%
Arrows	34	13.2%
ASCII	11	4.3%

Most frequent character per block

Arrows

Value	Count	Frequency (%)
↔	34	100.0%

Hangul

Value	Count	Frequency (%)
구	13	6.1%
신	8	3.8%
입	8	3.8%
대	8	3.8%
동	7	3.3%
수	6	2.8%
서	5	2.4%
상	5	2.4%
리	5	2.4%
양	4	1.9%
Other values (75)	143	67.5%

ASCII

Value	Count	Frequency (%)
)	4	36.4%
(	4	36.4%
2	3	27.3%

역 수
Text

Distinct	19
Distinct (%)	51.4%
Missing	0
Missing (%)	0.0%
Memory size	428.0 B

Length

Max length	2
Median length	1
Mean length	1.3243243
Min length	1

Characters and Unicode

Total characters	49
Distinct characters	11
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	10
Unique (%)	27.0%

Sample

1st row	9
2nd row	1
3rd row	11
4th row	5
5th row	9

Value	Count	Frequency (%)
1	7	18.9%
9	4	10.8%
5	3	8.1%
7	3	8.1%
16	2	5.4%
4	2	5.4%
14	2	5.4%
8	2	5.4%
13	2	5.4%
10	1	2.7%
Other values (9)	9	24.3%

Most occurring characters

Value	Count	Frequency (%)
1	18	36.7%
9	5	10.2%
4	5	10.2%
5	4	8.2%
7	3	6.1%
6	3	6.1%
8	3	6.1%
3	3	6.1%
2	3	6.1%
-	1	2.0%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	48	98.0%
Dash Punctuation	1	2.0%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	18	37.5%
9	5	10.4%
4	5	10.4%
5	4	8.3%
7	3	6.2%
6	3	6.2%
8	3	6.2%
3	3	6.2%
2	3	6.2%
0	1	2.1%

Dash Punctuation

Value	Count	Frequency (%)
-	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	49	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	18	36.7%
9	5	10.2%
4	5	10.2%
5	4	8.2%
7	3	6.1%
6	3	6.1%
8	3	6.1%
3	3	6.1%
2	3	6.1%
-	1	2.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	49	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	18	36.7%
9	5	10.2%
4	5	10.2%
5	4	8.2%
7	3	6.1%
6	3	6.1%
8	3	6.1%
3	3	6.1%
2	3	6.1%
-	1	2.0%

연 장(km)
Text

Distinct	33
Distinct (%)	89.2%
Missing	0
Missing (%)	0.0%
Memory size	428.0 B

Length

Max length	4
Median length	3
Mean length	2.8648649
Min length	1

Characters and Unicode

Total characters	106
Distinct characters	12
Distinct categories	3
Distinct scripts	1
Distinct blocks	1

Unique

Unique	31
Unique (%)	83.8%

Sample

1st row	7.8
2nd row	-
3rd row	14.3
4th row	5.5
5th row	7.9
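
The sample shows why this column is typed as Text: some cells hold a bare "-" instead of a number. A minimal sketch of converting the cells to floats follows, assuming "-" means that no length was recorded (the report does not say so); `parse_km` is a hypothetical helper.

```python
from typing import Optional

def parse_km(raw: str) -> Optional[float]:
    """Convert a 연 장(km) cell to float; a bare "-" is treated as missing."""
    text = raw.strip()
    if text == "-":
        return None
    return float(text)

# The five sample rows listed above.
samples = ["7.8", "-", "14.3", "5.5", "7.9"]
parsed = [parse_km(s) for s in samples]
print(parsed)
```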

Value	Count	Frequency (%)
-	4	10.8%
7.9	2	5.4%
19	1	2.7%
14.4	1	2.7%
8.9	1	2.7%
7.1	1	2.7%
14	1	2.7%
4.2	1	2.7%
9.2	1	2.7%
7.8	1	2.7%
Other values (23)	23	62.2%

Most occurring characters

Value	Count	Frequency (%)
.	28	26.4%
1	18	17.0%
7	10	9.4%
2	9	8.5%
9	8	7.5%
4	8	7.5%
8	7	6.6%
5	5	4.7%
-	4	3.8%
3	4	3.8%
Other values (2)	5	4.7%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	74	69.8%
Other Punctuation	28	26.4%
Dash Punctuation	4	3.8%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	18	24.3%
7	10	13.5%
2	9	12.2%
9	8	10.8%
4	8	10.8%
8	7	9.5%
5	5	6.8%
3	4	5.4%
6	3	4.1%
0	2	2.7%

Other Punctuation

Value	Count	Frequency (%)
.	28	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	4	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	106	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
.	28	26.4%
1	18	17.0%
7	10	9.4%
2	9	8.5%
9	8	7.5%
4	8	7.5%
8	7	6.6%
5	5	4.7%
-	4	3.8%
3	4	3.8%
Other values (2)	5	4.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	106	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
.	28	26.4%
1	18	17.0%
7	10	9.4%
2	9	8.5%
9	8	7.5%
4	8	7.5%
8	7	6.6%
5	5	4.7%
-	4	3.8%
3	4	3.8%
Other values (2)	5	4.7%

개통일
Text

Distinct	35
Distinct (%)	94.6%
Missing	0
Missing (%)	0.0%
Memory size	428.0 B

Length

Max length	9
Median length	8
Mean length	8.3513514
Min length	7

Characters and Unicode

Total characters	309
Distinct characters	12
Distinct categories	2
Distinct scripts	1
Distinct blocks	1

Unique

Unique	33
Unique (%)	89.2%

Sample

1st row	'74.8.15
2nd row	'05.12.21
3rd row	'80.10.31
4th row	'82.12.23
5th row	'83.9.16
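
The sample values are dates written with an apostrophe and a two-digit year, which is presumably why the column is profiled as Text. The sketch below parses them with `datetime.strptime`; it relies on Python's `%y` convention (69 to 99 map to 1969 to 1999, 00 to 68 to 2000 to 2068), which is assumed to suit this data because the earliest sample is '74. `parse_opening_date` is a hypothetical helper.

```python
from datetime import date, datetime

def parse_opening_date(raw: str) -> date:
    """Parse a 개통일 value such as '74.8.15 into a date."""
    # Drop the leading apostrophe, then read two-digit year, month, day.
    return datetime.strptime(raw.strip().lstrip("'"), "%y.%m.%d").date()

# The five sample rows listed above.
samples = ["'74.8.15", "'05.12.21", "'80.10.31", "'82.12.23", "'83.9.16"]
for raw in samples:
    print(raw, "->", parse_opening_date(raw).isoformat())
```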

Value	Count	Frequency (%)
'96.3.20	2	5.4%
'85.10.18	2	5.4%
'95.11.15	1	2.7%
'96.8.12	1	2.7%
'96.12.30	1	2.7%
'00.8.7	1	2.7%
'00.12.15	1	2.7%
'01.3.9	1	2.7%
'96.10.11	1	2.7%
'96.3.30	1	2.7%
Other values (25)	25	67.6%

Most occurring characters

Value	Count	Frequency (%)
.	74	23.9%
1	39	12.6%
'	37	12.0%
0	33	10.7%
2	32	10.4%
9	21	6.8%
8	17	5.5%
3	16	5.2%
5	14	4.5%
6	9	2.9%
Other values (2)	17	5.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	198	64.1%
Other Punctuation	111	35.9%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	39	19.7%
0	33	16.7%
2	32	16.2%
9	21	10.6%
8	17	8.6%
3	16	8.1%
5	14	7.1%
6	9	4.5%
4	9	4.5%
7	8	4.0%

Other Punctuation

Value	Count	Frequency (%)
.	74	66.7%
'	37	33.3%

Most occurring scripts

Value	Count	Frequency (%)
Common	309	100.0%

Most frequent character per script

Common

Value	Count	Frequency (%)
.	74	23.9%
1	39	12.6%
'	37	12.0%
0	33	10.7%
2	32	10.4%
9	21	6.8%
8	17	5.5%
3	16	5.2%
5	14	4.5%
6	9	2.9%
Other values (2)	17	5.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	309	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
.	74	23.9%
1	39	12.6%
'	37	12.0%
0	33	10.7%
2	32	10.4%
9	21	6.8%
8	17	5.5%
3	16	5.2%
5	14	4.5%
6	9	2.9%
Other values (2)	17	5.5%

Interactions plot: 호선 (x-axis) vs. 호선 (y-axis)

Phik (φk)

	회사	호선	역	역 수	연 장(km)	개통일
회사	1.000	1.000	1.000	0.624	0.000	0.928
호선	1.000	1.000	1.000	0.000	0.000	0.915
역	1.000	1.000	1.000	1.000	1.000	1.000
역 수	0.624	0.000	1.000	1.000	0.983	0.898
연 장(km)	0.000	0.000	1.000	0.983	1.000	0.933
개통일	0.928	0.915	1.000	0.898	0.933	1.000

Auto

	호선	회사
호선	1.000	0.907
회사	0.907	1.000

Count: a simple visualization of nullity by column.

Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	회사	호선	역	역 수	연 장(km)	개통일
0	서울메트로	1	서울↔청량리	9	7.8	'74.8.15
1	서울메트로	1	동묘앞	1	-	'05.12.21
2	서울메트로	2	신설동(2)↔종합운동장	11	14.3	'80.10.31
3	서울메트로	2	종합운동장↔교대(2)	5	5.5	'82.12.23
4	서울메트로	2	을지입구↔성수	9	7.9	'83.9.16
5	서울메트로	2	교대(2)↔서울대입구	5	6.7	'83.12.17
6	서울메트로	2	서울대입구↔을지입구	16	19.8	'84.5.22
7	서울메트로	2	신도림↔양천구청	2	2.7	'92.5.22
8	서울메트로	2	양천구청↔신정네거리	1	1.9	'96.2.29
9	서울메트로	2	신정네거리↔까치산	-	1.4	'96.3.20

Last rows

	회사	호선	역	역 수	연 장(km)	개통일
27	도시철도공사	6	이태원↔약수	4	-	'01.3.9
28	도시철도공사	7	장암↔건대입구	19	19	'96.10.11
29	도시철도공사	7	온수↔신풍	8	9.2	'00.2.29
30	도시철도공사	7	온수↔부평구청	9	10.2	'12.10.27
31	도시철도공사	7	건대입구↔신풍	15	18.7	'00.8.1
32	도시철도공사	8	잠실↔모란	13	13.1	'96.11.23
33	도시철도공사	8	암사↔잠실	4	4.6	'99.07.02
34	메트로9호선	9	개화↔신논현	24	27	'09.07.24
35	메트로9호선	9	마곡나루역	1	-	'14.05.24
36	메트로9호선	9	신논현↔종합운동장	5	4.7	'15.3.28
