gimi9 Pandas Profiling

Dataset statistics

Number of variables	4
Number of observations	784
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	24.6 KiB
Average record size in memory	32.2 B

Variable types

Text	3
Categorical	1

Dataset

Description	전철역코드 (station code), 전철역명 (station name), 호선 (line), 외부코드 (external code)
Author	서울교통공사 (Seoul Metro)
URL	https://data.seoul.go.kr/dataList/OA-121/S/1/datasetView.do

Alerts

`전철역코드` has unique values	Unique
`외부코드` has unique values	Unique

Reproduction

Analysis started	2024-05-18 04:45:40.986060
Analysis finished	2024-05-18 04:45:41.848125
Duration	0.86 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json
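
A report like this one could be regenerated along the following lines. This is only a sketch: the function name `build_report` and both file paths are hypothetical, and the actual settings used for this run are in the `config.json` mentioned above, which is not reproduced here. Reading every column as text is an assumption made so that codes such as `0008` keep their leading zeros.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV of station records and write an HTML report.

    csv_path and html_path are hypothetical; the source does not name them.
    """
    # Imported lazily so this module loads even where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    # Read all columns as strings so "0008" is not turned into the integer 8.
    df = pd.read_csv(csv_path, dtype=str)

    report = ProfileReport(
        df,
        title="gimi9 Pandas Profiling",
        # Dataset metadata copied from the "Dataset" section of this report.
        dataset={
            "description": "전철역코드,전철역명,호선,외부코드",
            "url": "https://data.seoul.go.kr/dataList/OA-121/S/1/datasetView.do",
        },
    )
    report.to_file(html_path)
    return report
```

Calling `build_report("stations.csv")` with a local copy of the dataset would write `report.html` next to the script.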

전철역코드
Text

UNIQUE

Distinct	784
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	6.3 KiB

Length

Max length	4
Median length	4
Mean length	4
Min length	4

Characters and Unicode

Total characters	3136
Distinct characters	11
Distinct categories	2
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
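
As a rough illustration of how such per-character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is made up for this example, and the input is just the five sample rows of 외부코드 shown in this report. The module returns two-letter codes: `Nd` is Decimal Number, `Lu` is Uppercase Letter, `Lo` is Other Letter, `Po` is Other Punctuation, and `Pd` is Dash Punctuation. Scripts and blocks are not exposed by `unicodedata`, so they are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# Sample rows of 외부코드 taken from this report.
external_codes = ["X108", "X109", "X111", "228", "328"]
print(category_counts(external_codes))
```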

Unique

Unique	784
Unique (%)	100.0%

Sample

1st row	0008
2nd row	0009
3rd row	0011
4th row	0228
5th row	0318

Value	Count	Frequency (%)
0008	1	0.1%
1268	1	0.1%
1902	1	0.1%
1329	1	0.1%
1501	1	0.1%
1801	1	0.1%
1831	1	0.1%
1850	1	0.1%
1856	1	0.1%
1866	1	0.1%
Other values (774)	774	98.7%

Most occurring characters

Value	Count	Frequency (%)
1	654	20.9%
2	521	16.6%
0	399	12.7%
4	329	10.5%
3	310	9.9%
5	235	7.5%
7	204	6.5%
8	197	6.3%
6	165	5.3%
9	113	3.6%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	3127	99.7%
Uppercase Letter	9	0.3%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	654	20.9%
2	521	16.7%
0	399	12.8%
4	329	10.5%
3	310	9.9%
5	235	7.5%
7	204	6.5%
8	197	6.3%
6	165	5.3%
9	113	3.6%

Uppercase Letter

Value	Count	Frequency (%)
C	9	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	3127	99.7%
Latin	9	0.3%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	654	20.9%
2	521	16.7%
0	399	12.8%
4	329	10.5%
3	310	9.9%
5	235	7.5%
7	204	6.5%
8	197	6.3%
6	165	5.3%
9	113	3.6%

Latin

Value	Count	Frequency (%)
C	9	100.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	3136	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	654	20.9%
2	521	16.6%
0	399	12.7%
4	329	10.5%
3	310	9.9%
5	235	7.5%
7	204	6.5%
8	197	6.3%
6	165	5.3%
9	113	3.6%

전철역명
Text

Distinct	645
Distinct (%)	82.3%
Missing	0
Missing (%)	0.0%
Memory size	6.3 KiB

Length

Max length	9
Median length	2
Mean length	2.8596939
Min length	2
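
The length statistics above can be sketched as follows. The mean reported here equals total characters divided by observations (2242 / 784 ≈ 2.8596939). The helper name `length_summary` is hypothetical, the input is only the five sample rows of 전철역명, and lengths are counted in code points, which assumes the names are stored as precomposed Hangul syllables.

```python
from statistics import mean, median


def length_summary(values):
    """Return max, median, mean and min string length (in code points)."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# Sample rows of 전철역명 taken from this report.
names = ["수서", "성남", "동탄", "서울대입구", "안국"]
print(length_summary(names))
```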

Characters and Unicode

Total characters	2242
Distinct characters	306
Distinct categories	3
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	528
Unique (%)	67.3%
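
The gap between Distinct (645) and Unique (528) for this column is consistent with "distinct" counting every different value and "unique" counting only values that occur exactly once. The sketch below works under that assumption; the function name and the small input list are illustrative and not taken from the dataset.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Illustrative input: one station name shared by three lines, two names used once.
names = ["수서", "수서", "수서", "성남", "동탄"]
print(distinct_and_unique(names))  # (3, 2)
```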

Sample

1st row	수서
2nd row	성남
3rd row	동탄
4th row	서울대입구
5th row	안국

Value	Count	Frequency (%)
김포공항	5	0.6%
서울역	4	0.5%
청량리	4	0.5%
공덕	4	0.5%
왕십리	4	0.5%
회기	3	0.4%
수서	3	0.4%
디지털미디어시티	3	0.4%
상봉	3	0.4%
고속터미널	3	0.4%
Other values (635)	748	95.4%

Most occurring characters

Value	Count	Frequency (%)
대	66	2.9%
산	60	2.7%
구	53	2.4%
신	51	2.3%
동	49	2.2%
천	48	2.1%
정	44	2.0%
청	41	1.8%
원	40	1.8%
지	33	1.5%
Other values (296)	1757	78.4%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	2226	99.3%
Decimal Number	13	0.6%
Other Punctuation	3	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
대	66	3.0%
산	60	2.7%
구	53	2.4%
신	51	2.3%
동	49	2.2%
천	48	2.2%
정	44	2.0%
청	41	1.8%
원	40	1.8%
지	33	1.5%
Other values (288)	1741	78.2%

Decimal Number

Value	Count	Frequency (%)
3	5	38.5%
4	3	23.1%
1	2	15.4%
9	1	7.7%
2	1	7.7%
5	1	7.7%

Other Punctuation

Value	Count	Frequency (%)
.	2	66.7%
?	1	33.3%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	2226	99.3%
Common	16	0.7%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
대	66	3.0%
산	60	2.7%
구	53	2.4%
신	51	2.3%
동	49	2.2%
천	48	2.2%
정	44	2.0%
청	41	1.8%
원	40	1.8%
지	33	1.5%
Other values (288)	1741	78.2%

Common

Value	Count	Frequency (%)
3	5	31.2%
4	3	18.8%
.	2	12.5%
1	2	12.5%
9	1	6.2%
?	1	6.2%
2	1	6.2%
5	1	6.2%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	2226	99.3%
ASCII	16	0.7%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
대	66	3.0%
산	60	2.7%
구	53	2.4%
신	51	2.3%
동	49	2.2%
천	48	2.2%
정	44	2.0%
청	41	1.8%
원	40	1.8%
지	33	1.5%
Other values (288)	1741	78.2%

ASCII

Value	Count	Frequency (%)
3	5	31.2%
4	3	18.8%
.	2	12.5%
1	2	12.5%
9	1	6.2%
?	1	6.2%
2	1	6.2%
5	1	6.2%

호선
Categorical

Distinct	24
Distinct (%)	3.1%
Missing	0
Missing (%)	0.0%
Memory size	6.3 KiB

01호선	102
수인분당선	63
경의선	57
05호선	56
07호선	53
Other values (19)	453

Length

Max length	7
Median length	4
Mean length	4.0522959
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	GTX-A
2nd row	GTX-A
3rd row	GTX-A
4th row	02호선
5th row	03호선

Common Values

Value	Count	Frequency (%)
01호선	102	13.0%
수인분당선	63	8.0%
경의선	57	7.3%
05호선	56	7.1%
07호선	53	6.8%
02호선	51	6.5%
04호선	51	6.5%
03호선	44	5.6%
06호선	39	5.0%
09호선	38	4.8%
Other values (14)	230	29.3%

Length

Histogram of lengths of the category


외부코드
Text

UNIQUE

Distinct	784
Distinct (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	6.3 KiB

Length

Max length	6
Median length	3
Mean length	3.4145408
Min length	2

Characters and Unicode

Total characters	2677
Distinct characters	20
Distinct categories	3
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	784
Unique (%)	100.0%

Sample

1st row	X108
2nd row	X109
3rd row	X111
4th row	228
5th row	328

Value	Count	Frequency (%)
x108	1	0.1%
k318	1	0.1%
114	1	0.1%
p140	1	0.1%
k409	1	0.1%
143	1	0.1%
k252	1	0.1%
k214	1	0.1%
k229	1	0.1%
k238	1	0.1%
Other values (774)	774	98.7%

Most occurring characters

Value	Count	Frequency (%)
1	513	19.2%
2	392	14.6%
3	282	10.5%
4	251	9.4%
5	203	7.6%
0	155	5.8%
6	149	5.6%
7	141	5.3%
9	134	5.0%
K	130	4.9%
Other values (10)	327	12.2%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	2311	86.3%
Uppercase Letter	353	13.2%
Dash Punctuation	13	0.5%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
1	513	22.2%
2	392	17.0%
3	282	12.2%
4	251	10.9%
5	203	8.8%
0	155	6.7%
6	149	6.4%
7	141	6.1%
9	134	5.8%
8	91	3.9%

Uppercase Letter

Value	Count	Frequency (%)
K	130	36.8%
P	71	20.1%
I	57	16.1%
S	32	9.1%
D	16	4.5%
U	15	4.2%
Y	15	4.2%
A	14	4.0%
X	3	0.8%

Dash Punctuation

Value	Count	Frequency (%)
-	13	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	2324	86.8%
Latin	353	13.2%

Most frequent character per script

Common

Value	Count	Frequency (%)
1	513	22.1%
2	392	16.9%
3	282	12.1%
4	251	10.8%
5	203	8.7%
0	155	6.7%
6	149	6.4%
7	141	6.1%
9	134	5.8%
8	91	3.9%

Latin

Value	Count	Frequency (%)
K	130	36.8%
P	71	20.1%
I	57	16.1%
S	32	9.1%
D	16	4.5%
U	15	4.2%
Y	15	4.2%
A	14	4.0%
X	3	0.8%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	2677	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
1	513	19.2%
2	392	14.6%
3	282	10.5%
4	251	9.4%
5	203	7.6%
0	155	5.8%
6	149	5.6%
7	141	5.3%
9	134	5.0%
K	130	4.9%
Other values (10)	327	12.2%

Missing values

Count: A simple visualization of nullity by column.

Matrix: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
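
The idea behind the two views can be sketched in plain Python: a grid of booleans marks which cells hold a value, and column sums give the per-column count. The function names are hypothetical, and the second input row contains artificial gaps purely for illustration, since this dataset has no missing cells.

```python
def nullity_matrix(rows):
    """Return one list of booleans per row: True where a cell holds a value."""
    return [[cell is not None and cell != "" for cell in row] for row in rows]


def present_per_column(matrix):
    """Count the filled cells in each column."""
    return [sum(column) for column in zip(*matrix)]


def render(matrix):
    """Draw filled cells as '#' and missing cells as '.'."""
    return "\n".join("".join("#" if ok else "." for ok in row) for row in matrix)


rows = [
    ("0008", "수서", "GTX-A", "X108"),  # first row of this report
    ("0009", None, "GTX-A", ""),        # gaps added artificially for illustration
]
matrix = nullity_matrix(rows)
print(render(matrix))
print(present_per_column(matrix))
```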

First rows

	전철역코드	전철역명	호선	외부코드
0	0008	수서	GTX-A	X108
1	0009	성남	GTX-A	X109
2	0011	동탄	GTX-A	X111
3	0228	서울대입구	02호선	228
4	0318	안국	03호선	328
5	0321	충무로	03호선	331
6	1209	도심	경의선	K127
7	1307	회기	경춘선	P118
8	1308	중랑	경춘선	P119
9	1404	탕정	01호선	P173

Last rows

	전철역코드	전철역명	호선	외부코드
774	4920	양촌	김포도시철도	690
775	4921	구래	김포도시철도	691
776	4922	마산	김포도시철도	692
777	4923	장기	김포도시철도	693
778	4924	운양	김포도시철도	694
779	4925	걸포북변	김포도시철도	695
780	4926	사우	김포도시철도	696
781	1271	능곡	경의선	K321
782	1326	강촌	경춘선	P137
783	0300	대곡	경의선	K322
