gimi9 Pandas Profiling

Dataset statistics

Number of variables	5
Number of observations	34
Missing cells	15
Missing cells (%)	8.8%
Duplicate rows	1
Duplicate rows (%)	2.9%
Total size in memory	1.5 KiB
Average record size in memory	45.9 B
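
The summary percentages follow directly from the counts in this report. The sketch below recomputes them from the per-variable missing counts listed in each variable's section; the helper name `pct` is illustrative and not part of any library.

```python
# Figures copied from the report; `pct` is a hypothetical helper.
n_variables = 5
n_observations = 34

# Per-column missing counts as listed in each variable's section.
missing_per_column = {
    "철도운영기관명": 0,
    "선명": 0,
    "역명": 5,
    "경도": 5,
    "위도": 5,
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


missing_cells = sum(missing_per_column.values())
total_cells = n_variables * n_observations

print(missing_cells)                    # 15
print(pct(missing_cells, total_cells))  # 8.8
print(pct(1, n_observations))           # 2.9 (duplicate rows)
```

The two categorical columns contribute no missing cells here because their `<NA>` entries are reported as an ordinary category rather than as missing values.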

Variable types

Categorical	2
Text	1
Numeric	2

Dataset

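Description	Station location data for the line operated by 서울시메트로9호선 (Seoul Metro Line 9), with fields for railway operator name (철도운영기관명), line name (선명), station name (역명), longitude (경도), and latitude (위도).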
Author	국가철도공단 (Korea National Railway)
URL	https://www.data.go.kr/data/15041335/fileData.do

Alerts

Dataset has 1 (2.9%) duplicate rows	Duplicates
`철도운영기관명` is highly overall correlated with `경도` and 2 other fields	High correlation
`선명` is highly overall correlated with `경도` and 2 other fields	High correlation
`경도` is highly overall correlated with `위도` and 2 other fields	High correlation
`위도` is highly overall correlated with `경도` and 2 other fields	High correlation
`역명` has 5 (14.7%) missing values	Missing
`경도` has 5 (14.7%) missing values	Missing
`위도` has 5 (14.7%) missing values	Missing

Reproduction

Analysis started	2023-12-12 05:00:19.083365
Analysis finished	2023-12-12 05:00:20.518275
Duration	1.43 seconds
Software version	ydata-profiling v4.5.1
Download configuration	config.json

철도운영기관명
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	5.9%
Missing	0
Missing (%)	0.0%
Memory size	404.0 B

서울9호선	29
<NA>	5

Length

Max length	5
Median length	5
Mean length	4.8529412
Min length	4
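
The length statistics above are consistent with the five `<NA>` entries being measured as the literal four-character string `<NA>`. A minimal check, assuming only the value counts shown in this section:

```python
# Value counts as shown in the report for 철도운영기관명.
value_counts = {"서울9호선": 29, "<NA>": 5}

total = sum(value_counts.values())
lengths = {value: len(value) for value in value_counts}

# Mean length weighted by how often each value occurs.
mean_length = sum(len(value) * n for value, n in value_counts.items()) / total

print(min(lengths.values()), max(lengths.values()))  # 4 5
print(round(mean_length, 7))                         # 4.8529412
```

The same arithmetic with `9호선` (length 3) in place of `서울9호선` gives 107 / 34, which rounds to the 3.1470588 reported for 선명.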

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	서울9호선
2nd row	서울9호선
3rd row	서울9호선
4th row	서울9호선
5th row	서울9호선

Common Values

Value	Count	Frequency (%)
서울9호선	29	85.3%
<NA>	5	14.7%

Length

Histogram of lengths of the category

선명
Categorical

HIGH CORRELATION

Distinct	2
Distinct (%)	5.9%
Missing	0
Missing (%)	0.0%
Memory size	404.0 B

9호선	29
<NA>	5

Length

Max length	4
Median length	3
Mean length	3.1470588
Min length	3

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	9호선
2nd row	9호선
3rd row	9호선
4th row	9호선
5th row	9호선

Common Values

Value	Count	Frequency (%)
9호선	29	85.3%
<NA>	5	14.7%

Length

Histogram of lengths of the category

역명
Text

MISSING

Distinct	29
Distinct (%)	100.0%
Missing	5
Missing (%)	14.7%
Memory size	404.0 B

Length

Max length	9
Median length	5
Mean length	3.137931
Min length	2

Characters and Unicode

Total characters	91
Distinct characters	69
Distinct categories	3
Distinct scripts	2
Distinct blocks	2

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
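
As a rough illustration of how such character properties can be read, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The station names are the first five from this report's sample; the parenthesised string is made up purely to show the punctuation categories and does not come from the dataset. The category codes `Lo`, `Ps`, and `Pe` correspond to Other Letter, Open Punctuation, and Close Punctuation.

```python
import unicodedata
from collections import Counter

# First five station names from the report's sample.
names = ["개화", "김포공항", "공항시장", "신방화", "마곡나루"]

# Hypothetical string, included only to show the punctuation categories.
hypothetical = "역(예)"


def category_counts(strings):
    """Count Unicode general categories over every character in `strings`."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)


counts = category_counts(names + [hypothetical])
print(counts["Lo"], counts["Ps"], counts["Pe"])  # 19 1 1
```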

Unique

Unique	29
Unique (%)	100.0%

Sample

1st row	개화
2nd row	김포공항
3rd row	공항시장
4th row	신방화
5th row	마곡나루

Value	Count	Frequency (%)
김포공항	1	3.4%
노량진	1	3.4%
봉은사	1	3.4%
삼성중앙	1	3.4%
선정릉	1	3.4%
언주	1	3.4%
신논현	1	3.4%
사평	1	3.4%
고속터미널	1	3.4%
신반포	1	3.4%
Other values (19)	19	65.5%

Most occurring characters

Value	Count	Frequency (%)
신	4	4.4%
사	3	3.3%
포	3	3.3%
항	2	2.2%
반	2	2.2%
선	2	2.2%
미	2	2.2%
도	2	2.2%
당	2	2.2%
구	2	2.2%
Other values (59)	67	73.6%

Most occurring categories

Value	Count	Frequency (%)
Other Letter	89	97.8%
Open Punctuation	1	1.1%
Close Punctuation	1	1.1%

Most frequent character per category

Other Letter

Value	Count	Frequency (%)
신	4	4.5%
사	3	3.4%
포	3	3.4%
항	2	2.2%
반	2	2.2%
선	2	2.2%
미	2	2.2%
도	2	2.2%
당	2	2.2%
구	2	2.2%
Other values (57)	65	73.0%

Open Punctuation

Value	Count	Frequency (%)
(	1	100.0%

Close Punctuation

Value	Count	Frequency (%)
)	1	100.0%

Most occurring scripts

Value	Count	Frequency (%)
Hangul	89	97.8%
Common	2	2.2%

Most frequent character per script

Hangul

Value	Count	Frequency (%)
신	4	4.5%
사	3	3.4%
포	3	3.4%
항	2	2.2%
반	2	2.2%
선	2	2.2%
미	2	2.2%
도	2	2.2%
당	2	2.2%
구	2	2.2%
Other values (57)	65	73.0%

Common

Value	Count	Frequency (%)
(	1	50.0%
)	1	50.0%

Most occurring blocks

Value	Count	Frequency (%)
Hangul	89	97.8%
ASCII	2	2.2%

Most frequent character per block

Hangul

Value	Count	Frequency (%)
신	4	4.5%
사	3	3.4%
포	3	3.4%
항	2	2.2%
반	2	2.2%
선	2	2.2%
미	2	2.2%
도	2	2.2%
당	2	2.2%
구	2	2.2%
Other values (57)	65	73.0%

ASCII

Value	Count	Frequency (%)
(	1	50.0%
)	1	50.0%

경도
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	29
Distinct (%)	100.0%
Missing	5
Missing (%)	14.7%
Infinite	0
Infinite (%)	0.0%
Mean	126.92627

Minimum	126.79815
Maximum	127.06024
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	438.0 B

Quantile statistics

Minimum	126.79815
5-th percentile	126.80491
Q1	126.86194
median	126.92419
Q3	126.99592
95-th percentile	127.04941
Maximum	127.06024
Range	0.262092
Interquartile range (IQR)	0.133986

Descriptive statistics

Standard deviation	0.082446978
Coefficient of variation (CV)	0.0006495659
Kurtosis	-1.2455454
Mean	126.92627
Median Absolute Deviation (MAD)	0.069735
Skewness	0.047027533
Sum	3680.8619
Variance	0.0067975042
Monotonicity	Strictly increasing
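
Several of the figures above are derived from one another. The sketch below recomputes range, IQR, coefficient of variation, and variance from values printed in this section; the unrounded minimum, maximum, Q1, and Q3 are taken from the extreme-value tables for 경도.

```python
import math

# Values copied from the 경도 section of the report.
minimum, maximum = 126.798153, 127.060245  # smallest and largest listed values
q1, q3 = 126.861939, 126.995925            # 8th smallest and 8th largest values
mean = 126.92627
std = 0.082446978

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(round(value_range, 6))  # 0.262092
print(round(iqr, 6))          # 0.133986
print(math.isclose(cv, 0.0006495659, abs_tol=1e-8))       # True
print(math.isclose(variance, 0.0067975042, abs_tol=1e-8))  # True
```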

Histogram with fixed size bins (bins=29)

Common Values

Value	Count	Frequency (%)
126.801058	1	2.9%
127.060245	1	2.9%
127.053282	1	2.9%
127.043593	1	2.9%
127.033868	1	2.9%
127.02506	1	2.9%
127.015259	1	2.9%
127.004943	1	2.9%
126.995925	1	2.9%
126.987332	1	2.9%
Other values (19)	19	55.9%
(Missing)	5	14.7%

Minimum 10 values

Value	Count	Frequency (%)
126.798153	1	2.9%
126.801058	1	2.9%
126.810678	1	2.9%
126.816601	1	2.9%
126.829497	1	2.9%
126.841333	1	2.9%
126.854456	1	2.9%
126.861939	1	2.9%
126.865689	1	2.9%
126.874916	1	2.9%

Maximum 10 values

Value	Count	Frequency (%)
127.060245	1	2.9%
127.053282	1	2.9%
127.043593	1	2.9%
127.033868	1	2.9%
127.02506	1	2.9%
127.015259	1	2.9%
127.004943	1	2.9%
126.995925	1	2.9%
126.987332	1	2.9%
126.979306	1	2.9%

위도
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct	28
Distinct (%)	96.6%
Missing	5
Missing (%)	14.7%
Infinite	0
Infinite (%)	0.0%
Mean	37.531407

Minimum	37.501364
Maximum	37.578608
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	438.0 B

Quantile statistics

Minimum	37.501364
5-th percentile	37.503149
Q1	37.50877
median	37.521624
Q3	37.557402
95-th percentile	37.568041
Maximum	37.578608
Range	0.077244
Interquartile range (IQR)	0.048632

Descriptive statistics

Standard deviation	0.025601819
Coefficient of variation (CV)	0.00068214387
Kurtosis	-1.4355069
Mean	37.531407
Median Absolute Deviation (MAD)	0.017418
Skewness	0.41091336
Sum	1088.4108
Variance	0.00065545314
Monotonicity	Not monotonic
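
The monotonicity labels for 경도 and 위도 can be illustrated on the first ten rows of the sample. This is a sketch on a subset only, so it does not by itself establish the result for the full column. The function name `monotonicity` is hypothetical, and the labels other than the two appearing in this report are assumed wording.

```python
def monotonicity(values):
    """Classify a sequence by comparing each element with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


# First ten rows of the sample, in file order.
longitude = [126.798153, 126.801058, 126.810678, 126.816601, 126.829497,
             126.841333, 126.854456, 126.861939, 126.865689, 126.874916]
latitude = [37.578608, 37.562434, 37.563726, 37.567532, 37.567336,
            37.568381, 37.561391, 37.557402, 37.550632, 37.546936]

print(monotonicity(longitude))  # Strictly increasing
print(monotonicity(latitude))   # Not monotonic
```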

Histogram with fixed size bins (bins=28)

Common Values

Value	Count	Frequency (%)
37.514219	2	5.9%
37.517274	1	2.9%
37.513011	1	2.9%
37.51098	1	2.9%
37.507287	1	2.9%
37.504598	1	2.9%
37.504206	1	2.9%
37.50481	1	2.9%
37.503415	1	2.9%
37.501364	1	2.9%
Other values (18)	18	52.9%
(Missing)	5	14.7%

Minimum 10 values

Value	Count	Frequency (%)
37.501364	1	2.9%
37.502971	1	2.9%
37.503415	1	2.9%
37.504206	1	2.9%
37.504598	1	2.9%
37.50481	1	2.9%
37.507287	1	2.9%
37.50877	1	2.9%
37.51098	1	2.9%
37.512887	1	2.9%

Maximum 10 values

Value	Count	Frequency (%)
37.578608	1	2.9%
37.568381	1	2.9%
37.567532	1	2.9%
37.567336	1	2.9%
37.563726	1	2.9%
37.562434	1	2.9%
37.561391	1	2.9%
37.557402	1	2.9%
37.550632	1	2.9%
37.546936	1	2.9%

Interactions: scatter plots of 경도 and 위도.

Correlations

	역명	경도	위도
역명	1.000	1.000	1.000
경도	1.000	1.000	0.921
위도	1.000	0.921	1.000

	철도운영기관명	선명
철도운영기관명	1.000	1.000
선명	1.000	1.000

	경도	위도	철도운영기관명	선명
경도	1.000	-0.885	1.000	1.000
위도	-0.885	1.000	1.000	1.000
철도운영기관명	1.000	1.000	1.000	1.000
선명	1.000	1.000	1.000	1.000
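
The tables above do not say which coefficient each one uses. As one possible illustration of a numeric-to-numeric measure, the sketch below computes a Spearman rank correlation between 경도 and 위도 on the 15 complete rows shown in this report's first and last rows. It is a subset, so the result is not expected to reproduce any figure in the tables, and the simple ranking used here assumes there are no tied values (true for these 15 rows).

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Rows 0-9 and 24-28 from the sample tables.
longitude = [126.798153, 126.801058, 126.810678, 126.816601, 126.829497,
             126.841333, 126.854456, 126.861939, 126.865689, 126.874916,
             127.02506, 127.033868, 127.043593, 127.053282, 127.060245]
latitude = [37.578608, 37.562434, 37.563726, 37.567532, 37.567336,
            37.568381, 37.561391, 37.557402, 37.550632, 37.546936,
            37.504598, 37.507287, 37.51098, 37.513011, 37.514219]

rho = spearman(longitude, latitude)
print(rho < 0)  # True: latitude tends to fall as longitude rises
```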

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
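
Nullity correlation can be sketched as an ordinary correlation between per-column missing-value indicators. The example below uses the last ten rows shown in this report for the three columns flagged as missing, with `None` standing in for `<NA>`; the helper `pearson` is written out by hand and is not the report's own implementation.

```python
import math

# Last ten rows of the sample (역명, 경도, 위도); None stands in for <NA>.
rows = [
    ("신논현", 127.02506, 37.504598),
    ("언주", 127.033868, 37.507287),
    ("선정릉", 127.043593, 37.51098),
    ("삼성중앙", 127.053282, 37.513011),
    ("봉은사", 127.060245, 37.514219),
] + [(None, None, None)] * 5
columns = ["역명", "경도", "위도"]

# Nullity indicator per column: 1 where the cell is missing, else 0.
nullity = {
    name: [1 if row[i] is None else 0 for row in rows]
    for i, name in enumerate(columns)
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


missing_counts = {name: sum(flags) for name, flags in nullity.items()}
print(missing_counts)  # {'역명': 5, '경도': 5, '위도': 5}
print(round(pearson(nullity["경도"], nullity["위도"]), 3))  # 1.0
```

A value of 1.0 means the two columns are missing in exactly the same rows, which matches the five fully empty rows at the end of the sample.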

First rows

	철도운영기관명	선명	역명	경도	위도
0	서울9호선	9호선	개화	126.798153	37.578608
1	서울9호선	9호선	김포공항	126.801058	37.562434
2	서울9호선	9호선	공항시장	126.810678	37.563726
3	서울9호선	9호선	신방화	126.816601	37.567532
4	서울9호선	9호선	마곡나루	126.829497	37.567336
5	서울9호선	9호선	양천향교	126.841333	37.568381
6	서울9호선	9호선	가양	126.854456	37.561391
7	서울9호선	9호선	증미	126.861939	37.557402
8	서울9호선	9호선	등촌	126.865689	37.550632
9	서울9호선	9호선	염창	126.874916	37.546936

Last rows

	철도운영기관명	선명	역명	경도	위도
24	서울9호선	9호선	신논현	127.02506	37.504598
25	서울9호선	9호선	언주	127.033868	37.507287
26	서울9호선	9호선	선정릉	127.043593	37.51098
27	서울9호선	9호선	삼성중앙	127.053282	37.513011
28	서울9호선	9호선	봉은사	127.060245	37.514219
29	<NA>	<NA>	<NA>	<NA>	<NA>
30	<NA>	<NA>	<NA>	<NA>	<NA>
31	<NA>	<NA>	<NA>	<NA>	<NA>
32	<NA>	<NA>	<NA>	<NA>	<NA>
33	<NA>	<NA>	<NA>	<NA>	<NA>

Most frequently occurring

	철도운영기관명	선명	역명	경도	위도	# duplicates
0	<NA>	<NA>	<NA>	<NA>	<NA>	5
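
The summary lists 1 duplicate row (2.9%), while the table above shows a count of 5 for the all-`<NA>` row. One reading, which is an assumption about how the tool counts and is not stated in the report, is that the summary counts distinct duplicated rows rather than their occurrences. The sketch below groups the last ten sample rows to show both numbers.

```python
from collections import Counter

NA = "<NA>"

# Last ten rows of the sample, with every cell kept as displayed text.
last_rows = [
    ("서울9호선", "9호선", "신논현", "127.02506", "37.504598"),
    ("서울9호선", "9호선", "언주", "127.033868", "37.507287"),
    ("서울9호선", "9호선", "선정릉", "127.043593", "37.51098"),
    ("서울9호선", "9호선", "삼성중앙", "127.053282", "37.513011"),
    ("서울9호선", "9호선", "봉은사", "127.060245", "37.514219"),
] + [(NA, NA, NA, NA, NA)] * 5

# Count identical rows and keep only those that occur more than once.
counts = Counter(last_rows)
duplicated = {row: n for row, n in counts.items() if n > 1}

print(len(duplicated))            # 1 distinct duplicated row
print(list(duplicated.values()))  # [5] occurrences of that row
```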
