gimi9 Pandas Profiling

Dataset statistics

Number of variables	6
Number of observations	9764
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	321
Duplicate rows (%)	3.3%
Total size in memory	467.4 KiB
Average record size in memory	49.0 B
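
The two derived figures above can be checked from the raw counts. The following is a minimal sketch, with the values copied from this table and variable names chosen here for illustration. It assumes the percentage is duplicate rows divided by observations, and that 1 KiB is 1024 bytes.

```python
# Values copied from the "Dataset statistics" table (names are illustrative).
n_observations = 9764
duplicate_rows = 321
total_size_kib = 467.4

# Share of duplicate rows, as a percentage of all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes (1 KiB = 1024 B).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both results round to the figures shown in the table (3.3% and 49.0 B).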

Variable types

Categorical	2
Text	3
Numeric	1

Dataset

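Description	Station-area (역세권) information for each Seoul Metro station. The data includes the line (호선), external code (외부코드), subway station code (전철역코드), exit number (출구번호), and station-area name (역세권명).
Author	서울교통공사 (Seoul Metro)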
URL	https://www.data.go.kr/data/15044230/fileData.do

Alerts

Dataset has 321 (3.3%) duplicate rows	Duplicates
`전철역코드` is highly overall correlated with `호선`	High correlation
`호선` is highly overall correlated with `전철역코드`	High correlation

Reproduction

Analysis started	2023-12-12 17:09:30.270381
Analysis finished	2023-12-12 17:09:31.302973
Duration	1.03 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

호선
Categorical

HIGH CORRELATION

Distinct	18
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	76.4 KiB

2	1877
1	1394
3	1248
5	936
4	761
Other values (13)	3548

Length

Max length	2
Median length	1
Mean length	1.0101393
Min length	1

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	2
2nd row	1
3rd row	1
4th row	2
5th row	2

Common Values

Value	Count	Frequency (%)
2	1877	19.2%
1	1394	14.3%
3	1248	12.8%
5	936	9.6%
4	761	7.8%
7	759	7.8%
I	576	5.9%
6	509	5.2%
B	403	4.1%
K	321	3.3%
Other values (8)	980	10.0%

Length

Histogram of lengths of the category

Value	Count	Frequency (%)
2	1877	19.2%
1	1394	14.3%
3	1248	12.8%
5	936	9.6%
4	761	7.8%
7	759	7.8%
i	576	5.9%
6	509	5.2%
b	403	4.1%
k	321	3.3%
Other values (8)	980	10.0%

외부코드
Text

Distinct	532
Distinct (%)	5.4%
Missing	0
Missing (%)	0.0%
Memory size	76.4 KiB

Length

Max length	6
Median length	3
Mean length	3.2473372
Min length	2

Characters and Unicode

Total characters	31707
Distinct characters	19
Distinct categories	4
Distinct scripts	2
Distinct blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
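
As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is a sketch only: the sample characters are taken from the character tables in this report, the middle dot is assumed to be U+00B7, and the report's own tooling may classify characters differently (for example, its block names).

```python
import unicodedata

# Sample characters from the character tables in this report.
# Escapes are used where the glyph is easy to confuse:
# \u00b7 = middle dot (assumed), \u321c = parenthesized Hangul symbol, \u223c = tilde operator.
samples = ["2", "K", "k", "-", "대", "(", ")", "\u00b7", "\u321c", "~", "\u223c", " "]

# Map each character to its two-letter Unicode general category code,
# e.g. Nd = Decimal Number, Lu = Uppercase Letter, Lo = Other Letter.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X}  {cat}  {unicodedata.name(ch)}")
```

The two-letter codes correspond to the category names used in the tables, such as `Pd` for Dash Punctuation and `Ll` for Lowercase Letter.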

Unique

Unique	6
Unique (%)	0.1%

Sample

1st row	202
2nd row	133
3rd row	133
4th row	221
5th row	225

Value	Count	Frequency (%)
132	158	1.6%
328	103	1.1%
327	103	1.1%
226	79	0.8%
126	74	0.8%
203	70	0.7%
133	69	0.7%
i114	68	0.7%
233	63	0.6%
218	62	0.6%
Other values (522)	8915	91.3%

Most occurring characters

Value	Count	Frequency (%)
2	6372	20.1%
1	5860	18.5%
3	4435	14.0%
4	3008	9.5%
5	2444	7.7%
7	1770	5.6%
6	1560	4.9%
0	1510	4.8%
8	1266	4.0%
9	1177	3.7%
Other values (9)	2305	7.3%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	29402	92.7%
Uppercase Letter	2089	6.6%
Dash Punctuation	154	0.5%
Lowercase Letter	62	0.2%

Most frequent character per category

Decimal Number

Value	Count	Frequency (%)
2	6372	21.7%
1	5860	19.9%
3	4435	15.1%
4	3008	10.2%
5	2444	8.3%
7	1770	6.0%
6	1560	5.3%
0	1510	5.1%
8	1266	4.3%
9	1177	4.0%

Uppercase Letter

Value	Count	Frequency (%)
K	757	36.2%
I	576	27.6%
P	537	25.7%
Y	115	5.5%
U	67	3.2%
D	21	1.0%
A	16	0.8%

Dash Punctuation

Value	Count	Frequency (%)
-	154	100.0%

Lowercase Letter

Value	Count	Frequency (%)
k	62	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	29556	93.2%
Latin	2151	6.8%

Most frequent character per script

Common

Value	Count	Frequency (%)
2	6372	21.6%
1	5860	19.8%
3	4435	15.0%
4	3008	10.2%
5	2444	8.3%
7	1770	6.0%
6	1560	5.3%
0	1510	5.1%
8	1266	4.3%
9	1177	4.0%

Latin

Value	Count	Frequency (%)
K	757	35.2%
I	576	26.8%
P	537	25.0%
Y	115	5.3%
U	67	3.1%
k	62	2.9%
D	21	1.0%
A	16	0.7%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	31707	100.0%

Most frequent character per block

ASCII

Value	Count	Frequency (%)
2	6372	20.1%
1	5860	18.5%
3	4435	14.0%
4	3008	9.5%
5	2444	7.7%
7	1770	5.6%
6	1560	4.9%
0	1510	4.8%
8	1266	4.0%
9	1177	3.7%
Other values (9)	2305	7.3%

전철역코드
Real number (ℝ)

HIGH CORRELATION

Distinct	532
Distinct (%)	5.4%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	1511.7837

Minimum	150
Maximum	4615
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	85.9 KiB

Quantile statistics

Minimum	150
5-th percentile	157
Q1	249
median	1328
Q3	2622
95-th percentile	4103
Maximum	4615
Range	4465
Interquartile range (IQR)	2373

Descriptive statistics

Standard deviation	1254.6623
Coefficient of variation (CV)	0.82992183
Kurtosis	-0.96777667
Mean	1511.7837
Median Absolute Deviation (MAD)	1110
Skewness	0.4486409
Sum	14761056
Variance	1574177.5
Monotonicity	Not monotonic
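
Several of the statistics above are related by simple arithmetic. The sketch below recomputes them from the rounded values printed in this report, so small differences in the last digits are expected. The variable names are illustrative.

```python
# Values copied from the report for 전철역코드 (rounded as printed).
n = 9764
mean = 1511.7837
std = 1254.6623
minimum, maximum = 150, 4615
q1, q3 = 249, 2622

value_range = maximum - minimum   # report lists 4465
iqr = q3 - q1                     # report lists 2373
cv = std / mean                   # report lists 0.82992183
variance = std ** 2               # report lists 1574177.5
total = mean * n                  # report lists 14761056

print(value_range, iqr)
print(f"CV = {cv:.6f}, variance = {variance:.1f}, sum = {total:.0f}")
```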

Histogram with fixed size bins (bins=50)

Value	Count	Frequency (%)
151	158	1.6%
318	103	1.1%
317	103	1.1%
226	79	0.8%
156	74	0.8%
203	70	0.7%
150	69	0.7%
3114	68	0.7%
233	63	0.6%
218	62	0.6%
Other values (522)	8915	91.3%

Minimum 10 values

Value	Count	Frequency (%)
150	69	0.7%
151	158	1.6%
152	33	0.3%
153	37	0.4%
154	54	0.6%
155	61	0.6%
156	74	0.8%
157	46	0.5%
158	27	0.3%
159	59	0.6%

Maximum 10 values

Value	Count	Frequency (%)
4615	1	< 0.1%
4614	11	0.1%
4613	12	0.1%
4612	9	0.1%
4611	2	< 0.1%
4610	4	< 0.1%
4609	2	< 0.1%
4608	4	< 0.1%
4606	3	< 0.1%
4605	5	0.1%

전철역명
Text

Distinct	530
Distinct (%)	5.4%
Missing	0
Missing (%)	0.0%
Memory size	76.4 KiB

Length

Max length	9
Median length	2
Mean length	2.8467841
Min length	2

Characters and Unicode

Total characters	27796
Distinct characters	288
Distinct categories	5
Distinct scripts	2
Distinct blocks	3

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	6
Unique (%)	0.1%

Sample

1st row	을지로입구
2nd row	서울
3rd row	서울
4th row	역삼
5th row	방배

Value	Count	Frequency (%)
시청	158	1.6%
안국	103	1.1%
경복궁	103	1.1%
사당	79	0.8%
신설동	74	0.8%
을지로3가	70	0.7%
서울	69	0.7%
계산	68	0.7%
대림	63	0.6%
종합운동장	62	0.6%
Other values (520)	8915	91.3%

Most occurring characters

Value	Count	Frequency (%)
대	1025	3.7%
동	858	3.1%
구	855	3.1%
신	733	2.6%
산	594	2.1%
청	564	2.0%
천	466	1.7%
수	465	1.7%
문	459	1.7%
입	455	1.6%
Other values (278)	21322	76.7%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	27441	98.7%
Decimal Number	193	0.7%
Open Punctuation	68	0.2%
Close Punctuation	68	0.2%
Other Punctuation	26	0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
대	1025	3.7%
동	858	3.1%
구	855	3.1%
신	733	2.7%
산	594	2.2%
청	564	2.1%
천	466	1.7%
수	465	1.7%
문	459	1.7%
입	455	1.7%
Other values (272)	20967	76.4%

Decimal Number

Value	Count	Frequency (%)
3	107	55.4%
5	54	28.0%
4	32	16.6%

Open Punctuation

Value	Count	Frequency (%)
(	68	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	68	100.0%

Other Punctuation

Value	Count	Frequency (%)
·	26	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	27441	98.7%
Common	355	1.3%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
대	1025	3.7%
동	858	3.1%
구	855	3.1%
신	733	2.7%
산	594	2.2%
청	564	2.1%
천	466	1.7%
수	465	1.7%
문	459	1.7%
입	455	1.7%
Other values (272)	20967	76.4%

Common

Value	Count	Frequency (%)
3	107	30.1%
(	68	19.2%
)	68	19.2%
5	54	15.2%
4	32	9.0%
·	26	7.3%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	27441	98.7%
ASCII	329	1.2%
None	26	0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
대	1025	3.7%
동	858	3.1%
구	855	3.1%
신	733	2.7%
산	594	2.2%
청	564	2.1%
천	466	1.7%
수	465	1.7%
문	459	1.7%
입	455	1.7%
Other values (272)	20967	76.4%

ASCII

Value	Count	Frequency (%)
3	107	32.5%
(	68	20.7%
)	68	20.7%
5	54	16.4%
4	32	9.7%

None

Value	Count	Frequency (%)
·	26	100.0%

출구번호
Categorical

Distinct	20
Distinct (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	76.4 KiB

1	2477
2	1918
3	1505
4	1257
5	720
Other values (15)	1887

Length

Max length	7
Median length	1
Mean length	1.0409668
Min length	1

Unique

Unique	2
Unique (%)	< 0.1%

Sample

1st row	3
2nd row	13
3rd row	13
4th row	7
5th row	1

Common Values

Value	Count	Frequency (%)
1	2477	25.4%
2	1918	19.6%
3	1505	15.4%
4	1257	12.9%
5	720	7.4%
6	658	6.7%
7	423	4.3%
8	340	3.5%
9	136	1.4%
10	121	1.2%
Other values (10)	209	2.1%

Length

Histogram of lengths of the category

Value	Count	Frequency (%)
1	2477	25.3%
2	1918	19.6%
3	1505	15.4%
4	1257	12.9%
5	720	7.4%
6	658	6.7%
7	423	4.3%
8	340	3.5%
9	136	1.4%
10	121	1.2%
Other values (11)	223	2.3%

역세권명
Text

Distinct	7788
Distinct (%)	79.8%
Missing	0
Missing (%)	0.0%
Memory size	76.4 KiB

Length

Max length	37
Median length	26
Mean length	6.7970094
Min length	2

Characters and Unicode

Total characters	66366
Distinct characters	644
Distinct categories	11
Distinct scripts	3
Distinct blocks	5

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	6520
Unique (%)	66.8%

Sample

1st row	광교
2nd row	대일학원
3rd row	삼광초등학교
4th row	GS타워
5th row	KT 서초지사

Value	Count	Frequency (%)
방면	80	0.7%
아파트	35	0.3%
기업은행	34	0.3%
주민센터	34	0.3%
국민은행	33	0.3%
현대아파트	30	0.3%
우리은행	23	0.2%
우체국	23	0.2%
국민건강보험공단	20	0.2%
우성아파트	19	0.2%
Other values (7987)	10429	96.9%

Most occurring characters

Value	Count	Frequency (%)
동	2185	3.3%
교	2180	3.3%
학	1910	2.9%
0	1249	1.9%
소	1223	1.8%
파	1196	1.8%
등	1191	1.8%
사	1083	1.6%
아	1042	1.6%
대	1022	1.5%
Other values (634)	52085	78.5%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	58608	88.3%
Decimal Number	3701	5.6%
Space Separator	996	1.5%
Open Punctuation	920	1.4%
Close Punctuation	915	1.4%
Other Punctuation	685	1.0%
Uppercase Letter	494	0.7%
Dash Punctuation	17	< 0.1%
Other Symbol	15	< 0.1%
Math Symbol	10	< 0.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
동	2185	3.7%
교	2180	3.7%
학	1910	3.3%
소	1223	2.1%
파	1196	2.0%
등	1191	2.0%
사	1083	1.8%
아	1042	1.8%
대	1022	1.7%
서	1009	1.7%
Other values (585)	44567	76.0%

Uppercase Letter

Value	Count	Frequency (%)
A	70	14.2%
K	60	12.1%
S	58	11.7%
T	54	10.9%
C	38	7.7%
G	37	7.5%
L	23	4.7%
B	23	4.7%
P	22	4.5%
M	17	3.4%
Other values (13)	92	18.6%

Decimal Number

Value	Count	Frequency (%)
0	1249	33.7%
1	623	16.8%
2	564	15.2%
3	340	9.2%
5	337	9.1%
4	222	6.0%
6	150	4.1%
7	94	2.5%
8	65	1.8%
9	57	1.5%

Other Punctuation

Value	Count	Frequency (%)
/	471	68.8%
.	138	20.1%
·	73	10.7%
@	3	0.4%

Lowercase Letter

Value	Count	Frequency (%)
o	2	40.0%
l	2	40.0%
d	1	20.0%

Open Punctuation

Value	Count	Frequency (%)
(	919	99.9%
[	1	0.1%

Close Punctuation

Value	Count	Frequency (%)
)	914	99.9%
]	1	0.1%

Math Symbol

Value	Count	Frequency (%)
~	6	60.0%
∼	4	40.0%

Space Separator

Value	Count	Frequency (%)
	996	100.0%

Dash Punctuation

Value	Count	Frequency (%)
-	17	100.0%

Other Symbol

Value	Count	Frequency (%)
㈜	15	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	58623	88.3%
Common	7244	10.9%
Latin	499	0.8%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
동	2185	3.7%
교	2180	3.7%
학	1910	3.3%
소	1223	2.1%
파	1196	2.0%
등	1191	2.0%
사	1083	1.8%
아	1042	1.8%
대	1022	1.7%
서	1009	1.7%
Other values (586)	44582	76.0%

Latin

Value	Count	Frequency (%)
A	70	14.0%
K	60	12.0%
S	58	11.6%
T	54	10.8%
C	38	7.6%
G	37	7.4%
L	23	4.6%
B	23	4.6%
P	22	4.4%
M	17	3.4%
Other values (16)	97	19.4%

Common

Value	Count	Frequency (%)
0	1249	17.2%
	996	13.7%
(	919	12.7%
)	914	12.6%
1	623	8.6%
2	564	7.8%
/	471	6.5%
3	340	4.7%
5	337	4.7%
4	222	3.1%
Other values (12)	609	8.4%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	58603	88.3%
ASCII	7666	11.6%
None	88	0.1%
Compat Jamo	5	< 0.1%
Math Operators	4	< 0.1%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
동	2185	3.7%
교	2180	3.7%
학	1910	3.3%
소	1223	2.1%
파	1196	2.0%
등	1191	2.0%
사	1083	1.8%
아	1042	1.8%
대	1022	1.7%
서	1009	1.7%
Other values (584)	44562	76.0%

ASCII

Value	Count	Frequency (%)
0	1249	16.3%
	996	13.0%
(	919	12.0%
)	914	11.9%
1	623	8.1%
2	564	7.4%
/	471	6.1%
3	340	4.4%
5	337	4.4%
4	222	2.9%
Other values (36)	1031	13.4%

None

Value	Count	Frequency (%)
·	73	83.0%
㈜	15	17.0%

Compat Jamo

Value	Count	Frequency (%)
ㆍ	5	100.0%

Math Operators

Value	Count	Frequency (%)
∼	4	100.0%

Interactions

[Interaction plot: 전철역코드 vs. 전철역코드]

Correlations

	호선	전철역코드	출구번호
호선	1.000	0.989	0.303
전철역코드	0.989	1.000	0.315
출구번호	0.303	0.315	1.000

	출구번호	호선
출구번호	1.000	0.095
호선	0.095	1.000

	전철역코드	호선	출구번호
전철역코드	1.000	0.828	0.131
호선	0.828	1.000	0.095
출구번호	0.131	0.095	1.000
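
This report does not state which association measure each table uses. For two categorical columns, one common choice is Cramér's V, which is derived from the chi-squared statistic of the contingency table. The following is a minimal sketch of the uncorrected form on hypothetical toy data. The function name and the sample lists are illustrative, and this is not necessarily how the values above were computed.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row = Counter(xs)              # marginal counts of x
    col = Counter(ys)              # marginal counts of y

    # Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy data, not taken from the dataset.
lines = ["1", "1", "2", "2"]
exits_dependent = ["a", "a", "b", "b"]     # fully determined by the line
exits_independent = ["a", "b", "a", "b"]   # unrelated to the line

v_dependent = cramers_v(lines, exits_dependent)
v_independent = cramers_v(lines, exits_independent)
print(v_dependent, v_independent)
```

A value near 1 indicates that one column almost determines the other, and a value near 0 indicates little association.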

Missing values

A simple visualization of nullity by column.

Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

	호선	외부코드	전철역코드	전철역명	출구번호	역세권명
0	2	202	202	을지로입구	3	광교
1	1	133	150	서울	13	대일학원
2	1	133	150	서울	13	삼광초등학교
3	2	221	221	역삼	7	GS타워
4	2	225	225	방배	1	KT 서초지사
5	2	225	225	방배	1	남부순환로
6	2	225	225	방배	1	삼익
7	2	225	225	방배	1	서울강남지방노동사무소
8	2	225	225	방배	1	서초여성회관
9	2	225	225	방배	1	서울남부보훈지청

Last rows

	호선	외부코드	전철역코드	전철역명	출구번호	역세권명
9754	U	U123	4613	어룡	2	신한은행
9755	U	U123	4613	어룡	2	충의중학교
9756	U	U123	4613	어룡	2	오동초등학교
9757	U	U123	4613	어룡	2	송현고등학교
9758	U	U123	4613	어룡	2	송산2동주민센터
9759	U	U123	4613	어룡	2	의정부용현초등학교
9760	U	U123	4613	어룡	2	송산푸르지오아파트
9761	U	U123	4613	어룡	2	송산주공2.5.7단지아파트
9762	U	U123	4613	어룡	2	근제근린공원
9763	U	U124	4614	송산	1	클나무지역아동센터

Duplicate rows

Most frequently occurring

	호선	외부코드	전철역코드	전철역명	출구번호	역세권명	# duplicates
27	1	132	151	시청	3	대한성공회	3
216	3	327	317	경복궁	5	경복궁	3
0	1	124	158	청량리	3	미주아파트	2
1	1	124	158	청량리	6	동부청과시장	2
2	1	124	158	청량리	6	성바오로병원	2
3	1	126	156	신설동	1	대광초등학교	2
4	1	126	156	신설동	10	동대문우체국	2
5	1	126	156	신설동	4	동대문등기소	2
6	1	126	156	신설동	5	국민연금동대문중랑지사	2
7	1	126	156	신설동	7	동대문등기소	2
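
A per-group duplicate count like the "# duplicates" column can be sketched with `collections.Counter`, treating each row as a tuple. The rows below are a small hypothetical sample shaped like the table above, not the full dataset, and the names are illustrative.

```python
from collections import Counter

# Hypothetical toy rows: (호선, 외부코드, 전철역코드, 전철역명, 출구번호, 역세권명).
rows = [
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("1", "132", 151, "시청", "3", "대한성공회"),
    ("3", "327", 317, "경복궁", "5", "경복궁"),
    ("3", "327", 317, "경복궁", "5", "경복궁"),
    ("3", "327", 317, "경복궁", "5", "경복궁"),
    ("2", "202", 202, "을지로입구", "3", "광교"),
]

# Count identical rows, then keep only those that occur more than once.
counts = Counter(rows)
duplicate_groups = {row: c for row, c in counts.items() if c > 1}

# List the most frequent duplicate groups first.
for row, c in sorted(duplicate_groups.items(), key=lambda item: -item[1]):
    print(row, c)
```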
