Overview

Dataset statistics

Number of variables: 5
Number of observations: 34
Missing cells: 15
Missing cells (%): 8.8%
Duplicate rows: 1
Duplicate rows (%): 2.9%
Total size in memory: 1.5 KiB
Average record size in memory: 45.9 B
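
The percentages and sizes above follow from the raw counts. The sketch below is a quick check, assuming the report divides missing cells by rows × columns and duplicate rows by the row count; the variable names are illustrative and not part of the report.

```python
# Counts as reported in the dataset statistics.
n_variables = 5
n_observations = 34
missing_cells = 15
duplicate_rows = 1
avg_record_bytes = 45.9

# Assumption: percentages are taken over all cells and all rows respectively.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Total memory is the average record size times the number of records.
total_kib = avg_record_bytes * n_observations / 1024

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_kib:.1f} KiB")
```

Rounded to one decimal place, these agree with the reported 8.8%, 2.9%, and 1.5 KiB.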

Variable types

Categorical: 2
Text: 1
Numeric: 2

Dataset

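Description: Station location data for the line operated by Seoul Metro Line 9 (서울시메트로9호선). It contains the railway operator name (철도운영기관명), line name (선명), station name (역명), longitude (경도), and latitude (위도).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15041335/fileData.do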

Alerts

Dataset has 1 (2.9%) duplicate rows (Duplicates)
철도운영기관명 is highly overall correlated with 경도 and 2 other fields (High correlation)
선명 is highly overall correlated with 경도 and 2 other fields (High correlation)
경도 is highly overall correlated with 위도 and 2 other fields (High correlation)
위도 is highly overall correlated with 경도 and 2 other fields (High correlation)
역명 has 5 (14.7%) missing values (Missing)
경도 has 5 (14.7%) missing values (Missing)
위도 has 5 (14.7%) missing values (Missing)

Reproduction

Analysis started: 2023-12-12 05:00:19.083365
Analysis finished: 2023-12-12 05:00:20.518275
Duration: 1.43 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
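
The duration is the difference between the two timestamps. A minimal sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-12-12 05:00:19.083365")
finished = datetime.fromisoformat("2023-12-12 05:00:20.518275")

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```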

Variables

철도운영기관명
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 404.0 B

Length

Max length: 5
Median length: 5
Mean length: 4.8529412
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 서울9호선
2nd row: 서울9호선
3rd row: 서울9호선
4th row: 서울9호선
5th row: 서울9호선

Common Values

Value | Count | Frequency (%)
서울9호선 | 29 | 85.3%
<NA> | 5 | 14.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
서울9호선 | 29 | 85.3%
na | 5 | 14.7%

선명
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Memory size: 404.0 B

Length

Max length: 4
Median length: 3
Mean length: 3.1470588
Min length: 3
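
The length statistics for 철도운영기관명 and 선명 are consistent with the five empty entries being counted as the four-character string `<NA>` rather than as missing values, which would also explain why both columns report 0 missing. The sketch below checks that reading; it is an inference from the reported numbers, not something the report states, and `length_stats` is an illustrative helper.

```python
from statistics import mean, median

def length_stats(label, n_label=29, missing_text="<NA>", n_missing=5):
    """Length statistics when missing entries are counted as a literal string."""
    lengths = [len(label)] * n_label + [len(missing_text)] * n_missing
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": round(mean(lengths), 7),
        "min": min(lengths),
    }

operator_stats = length_stats("서울9호선")  # 철도운영기관명
line_stats = length_stats("9호선")          # 선명

print(operator_stats)
print(line_stats)
```

Under this assumption the means come out as 165/34 and 107/34, which match the reported 4.8529412 and 3.1470588.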

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 9호선
2nd row: 9호선
3rd row: 9호선
4th row: 9호선
5th row: 9호선

Common Values

Value | Count | Frequency (%)
9호선 | 29 | 85.3%
<NA> | 5 | 14.7%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]
Value | Count | Frequency (%)
9호선 | 29 | 85.3%
na | 5 | 14.7%

역명
Text

MISSING 

Distinct: 29
Distinct (%): 100.0%
Missing: 5
Missing (%): 14.7%
Memory size: 404.0 B

Length

Max length: 9
Median length: 5
Mean length: 3.137931
Min length: 2

Characters and Unicode

Total characters: 91
Distinct characters: 69
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 29
Unique (%): 100.0%

Sample

1st row: 개화
2nd row: 김포공항
3rd row: 공항시장
4th row: 신방화
5th row: 마곡나루

Value | Count | Frequency (%)
김포공항 | 1 | 3.4%
노량진 | 1 | 3.4%
봉은사 | 1 | 3.4%
삼성중앙 | 1 | 3.4%
선정릉 | 1 | 3.4%
언주 | 1 | 3.4%
신논현 | 1 | 3.4%
사평 | 1 | 3.4%
고속터미널 | 1 | 3.4%
신반포 | 1 | 3.4%
Other values (19) | 19 | 65.5%

Most occurring characters

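Value | Count | Frequency (%)
[unreadable] | 4 | 4.4%
[unreadable] | 3 | 3.3%
[unreadable] | 3 | 3.3%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
Other values (59) | 67 | 73.6%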

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 89 | 97.8%
Open Punctuation | 1 | 1.1%
Close Punctuation | 1 | 1.1%

Most frequent character per category

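Other Letter
Value | Count | Frequency (%)
[unreadable] | 4 | 4.5%
[unreadable] | 3 | 3.4%
[unreadable] | 3 | 3.4%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
Other values (57) | 65 | 73.0%

Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%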

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 89 | 97.8%
Common | 2 | 2.2%

Most frequent character per script

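Hangul
Value | Count | Frequency (%)
[unreadable] | 4 | 4.5%
[unreadable] | 3 | 3.4%
[unreadable] | 3 | 3.4%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
Other values (57) | 65 | 73.0%

Common
Value | Count | Frequency (%)
( | 1 | 50.0%
) | 1 | 50.0%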

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 89 | 97.8%
ASCII | 2 | 2.2%

Most frequent character per block

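Hangul
Value | Count | Frequency (%)
[unreadable] | 4 | 4.5%
[unreadable] | 3 | 3.4%
[unreadable] | 3 | 3.4%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
[unreadable] | 2 | 2.2%
Other values (57) | 65 | 73.0%

ASCII
Value | Count | Frequency (%)
( | 1 | 50.0%
) | 1 | 50.0%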

경도
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 29
Distinct (%): 100.0%
Missing: 5
Missing (%): 14.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 126.92627
Minimum: 126.79815
Maximum: 127.06024
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 438.0 B

Quantile statistics

Minimum: 126.79815
5-th percentile: 126.80491
Q1: 126.86194
Median: 126.92419
Q3: 126.99592
95-th percentile: 127.04941
Maximum: 127.06024
Range: 0.262092
Interquartile range (IQR): 0.133986
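
The range and IQR are differences of the quantiles above, and most of the quantiles can be recovered from the ten smallest and ten largest 경도 values that this report lists for the variable. The sketch below assumes linear interpolation between order statistics (the default in NumPy and pandas; the report does not state the method), and `value_at` and `quantile` are illustrative helpers.

```python
# Ten smallest and ten largest 경도 values, copied from this report.
smallest = [126.798153, 126.801058, 126.810678, 126.816601, 126.829497,
            126.841333, 126.854456, 126.861939, 126.865689, 126.874916]
largest = [127.060245, 127.053282, 127.043593, 127.033868, 127.02506,
           127.015259, 127.004943, 126.995925, 126.987332, 126.979306]
n = 29  # non-missing observations

def value_at(i):
    """Return the i-th order statistic (0-based) if the report lists it."""
    if i < len(smallest):
        return smallest[i]
    if n - 1 - i < len(largest):
        return largest[n - 1 - i]
    raise IndexError("order statistic not listed in the report")

def quantile(q):
    """Linear-interpolation quantile at position (n - 1) * q (assumed method)."""
    pos = (n - 1) * q
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return value_at(lower)
    return value_at(lower) + frac * (value_at(lower + 1) - value_at(lower))

q1, q3 = quantile(0.25), quantile(0.75)
value_range = largest[0] - smallest[0]
iqr = q3 - q1

print(round(quantile(0.05), 5), round(quantile(0.95), 5))
print(round(value_range, 6), round(iqr, 6))
```

With 29 values, Q1 and Q3 fall exactly on the 8th smallest and 8th largest values, so no interpolation is needed for the IQR. The median cannot be recomputed this way because the 15th value is not among those listed.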

Descriptive statistics

Standard deviation: 0.082446978
Coefficient of variation (CV): 0.0006495659
Kurtosis: -1.2455454
Mean: 126.92627
Median Absolute Deviation (MAD): 0.069735
Skewness: 0.047027533
Sum: 3680.8619
Variance: 0.0067975042
Monotonicity: Strictly increasing
[Figure: Histogram with fixed size bins (bins=29)]
Value | Count | Frequency (%)
126.801058 | 1 | 2.9%
127.060245 | 1 | 2.9%
127.053282 | 1 | 2.9%
127.043593 | 1 | 2.9%
127.033868 | 1 | 2.9%
127.02506 | 1 | 2.9%
127.015259 | 1 | 2.9%
127.004943 | 1 | 2.9%
126.995925 | 1 | 2.9%
126.987332 | 1 | 2.9%
Other values (19) | 19 | 55.9%
(Missing) | 5 | 14.7%

Value | Count | Frequency (%)
126.798153 | 1 | 2.9%
126.801058 | 1 | 2.9%
126.810678 | 1 | 2.9%
126.816601 | 1 | 2.9%
126.829497 | 1 | 2.9%
126.841333 | 1 | 2.9%
126.854456 | 1 | 2.9%
126.861939 | 1 | 2.9%
126.865689 | 1 | 2.9%
126.874916 | 1 | 2.9%

Value | Count | Frequency (%)
127.060245 | 1 | 2.9%
127.053282 | 1 | 2.9%
127.043593 | 1 | 2.9%
127.033868 | 1 | 2.9%
127.02506 | 1 | 2.9%
127.015259 | 1 | 2.9%
127.004943 | 1 | 2.9%
126.995925 | 1 | 2.9%
126.987332 | 1 | 2.9%
126.979306 | 1 | 2.9%

위도
Real number (ℝ)

HIGH CORRELATION, MISSING

Distinct: 28
Distinct (%): 96.6%
Missing: 5
Missing (%): 14.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.531407
Minimum: 37.501364
Maximum: 37.578608
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 438.0 B

Quantile statistics

Minimum: 37.501364
5-th percentile: 37.503149
Q1: 37.50877
Median: 37.521624
Q3: 37.557402
95-th percentile: 37.568041
Maximum: 37.578608
Range: 0.077244
Interquartile range (IQR): 0.048632

Descriptive statistics

Standard deviation: 0.025601819
Coefficient of variation (CV): 0.00068214387
Kurtosis: -1.4355069
Mean: 37.531407
Median Absolute Deviation (MAD): 0.017418
Skewness: 0.41091336
Sum: 1088.4108
Variance: 0.00065545314
Monotonicity: Not monotonic
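
Several of the descriptive statistics are derived from one another. As a consistency check on the 위도 figures, the sketch below recomputes the coefficient of variation, variance, and sum from the reported standard deviation, mean, and non-missing count (34 observations minus 5 missing). Small differences are expected because the inputs are already rounded.

```python
# Values copied from the 위도 statistics above.
std = 0.025601819
mean = 37.531407
n = 34 - 5  # observations minus missing values

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum of all non-missing values

print(f"CV: {cv:.8g}")
print(f"Variance: {variance:.8g}")
print(f"Sum: {total:.4f}")
```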
[Figure: Histogram with fixed size bins (bins=28)]
Value | Count | Frequency (%)
37.514219 | 2 | 5.9%
37.517274 | 1 | 2.9%
37.513011 | 1 | 2.9%
37.51098 | 1 | 2.9%
37.507287 | 1 | 2.9%
37.504598 | 1 | 2.9%
37.504206 | 1 | 2.9%
37.50481 | 1 | 2.9%
37.503415 | 1 | 2.9%
37.501364 | 1 | 2.9%
Other values (18) | 18 | 52.9%
(Missing) | 5 | 14.7%

Value | Count | Frequency (%)
37.501364 | 1 | 2.9%
37.502971 | 1 | 2.9%
37.503415 | 1 | 2.9%
37.504206 | 1 | 2.9%
37.504598 | 1 | 2.9%
37.50481 | 1 | 2.9%
37.507287 | 1 | 2.9%
37.50877 | 1 | 2.9%
37.51098 | 1 | 2.9%
37.512887 | 1 | 2.9%

Value | Count | Frequency (%)
37.578608 | 1 | 2.9%
37.568381 | 1 | 2.9%
37.567532 | 1 | 2.9%
37.567336 | 1 | 2.9%
37.563726 | 1 | 2.9%
37.562434 | 1 | 2.9%
37.561391 | 1 | 2.9%
37.557402 | 1 | 2.9%
37.550632 | 1 | 2.9%
37.546936 | 1 | 2.9%

Interactions

[Figures: 4 interaction plots]

Correlations

[Figure: correlation heatmap]
Variable | 역명 | 경도 | 위도
역명 | 1.000 | 1.000 | 1.000
경도 | 1.000 | 1.000 | 0.921
위도 | 1.000 | 0.921 | 1.000

[Figure: correlation heatmap]
Variable | 철도운영기관명 | 선명
철도운영기관명 | 1.000 | 1.000
선명 | 1.000 | 1.000

[Figure: correlation heatmap]
Variable | 경도 | 위도 | 철도운영기관명 | 선명
경도 | 1.000 | -0.885 | 1.000 | 1.000
위도 | -0.885 | 1.000 | 1.000 | 1.000
철도운영기관명 | 1.000 | 1.000 | 1.000 | 1.000
선명 | 1.000 | 1.000 | 1.000 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]
[Figure: The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.]

Sample

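Index | 철도운영기관명 | 선명 | 역명 | 경도 | 위도
0 | 서울9호선 | 9호선 | 개화 | 126.798153 | 37.578608
1 | 서울9호선 | 9호선 | 김포공항 | 126.801058 | 37.562434
2 | 서울9호선 | 9호선 | 공항시장 | 126.810678 | 37.563726
3 | 서울9호선 | 9호선 | 신방화 | 126.816601 | 37.567532
4 | 서울9호선 | 9호선 | 마곡나루 | 126.829497 | 37.567336
5 | 서울9호선 | 9호선 | 양천향교 | 126.841333 | 37.568381
6 | 서울9호선 | 9호선 | 가양 | 126.854456 | 37.561391
7 | 서울9호선 | 9호선 | 증미 | 126.861939 | 37.557402
8 | 서울9호선 | 9호선 | 등촌 | 126.865689 | 37.550632
9 | 서울9호선 | 9호선 | 염창 | 126.874916 | 37.546936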
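
Index | 철도운영기관명 | 선명 | 역명 | 경도 | 위도
24 | 서울9호선 | 9호선 | 신논현 | 127.02506 | 37.504598
25 | 서울9호선 | 9호선 | 언주 | 127.033868 | 37.507287
26 | 서울9호선 | 9호선 | 선정릉 | 127.043593 | 37.51098
27 | 서울9호선 | 9호선 | 삼성중앙 | 127.053282 | 37.513011
28 | 서울9호선 | 9호선 | 봉은사 | 127.060245 | 37.514219
29 | <NA> | <NA> | <NA> | <NA> | <NA>
30 | <NA> | <NA> | <NA> | <NA> | <NA>
31 | <NA> | <NA> | <NA> | <NA> | <NA>
32 | <NA> | <NA> | <NA> | <NA> | <NA>
33 | <NA> | <NA> | <NA> | <NA> | <NA>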

Duplicate rows

Most frequently occurring

Index | 철도운영기관명 | 선명 | 역명 | 경도 | 위도 | # duplicates
0 | <NA> | <NA> | <NA> | <NA> | <NA> | 5
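
The table shows one row pattern occurring 5 times, while the overview reports 1 duplicate row (2.9%). Both figures fit a reading in which the report counts distinct duplicated patterns and divides by the total row count. The sketch below illustrates that reading with the standard library. The 29 station rows are hypothetical placeholders (`station_0` ... `station_28` with dummy coordinates), used only because 역명 is reported as fully distinct; they are not real data.

```python
from collections import Counter

# Hypothetical stand-ins for the 29 distinct station rows (not real data).
station_rows = [
    ("서울9호선", "9호선", f"station_{i}", float(i), float(i))
    for i in range(29)
]

# Five identical empty rows. Assumption: the first two columns hold the literal
# string "<NA>" (they report 0 missing) and the other three are truly missing.
empty_rows = [("<NA>", "<NA>", None, None, None)] * 5

rows = station_rows + empty_rows
counts = Counter(rows)

# Keep only the row patterns that occur more than once.
duplicated = {row: count for row, count in counts.items() if count > 1}
n_duplicate_patterns = len(duplicated)
duplicate_pct = 100 * n_duplicate_patterns / len(rows)

print(n_duplicate_patterns, list(duplicated.values()), f"{duplicate_pct:.1f}%")
```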