Overview

Dataset statistics

Number of variables: 6
Number of observations: 56
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.8 KiB
Average record size in memory: 50.3 B

Variable types

Categorical: 6

Dataset

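Description: Provides station-name information, including the station name in Korean (역명(한글)), English (역명(영문)), romanized form (역명(로마자)), Japanese (역명(일본어)), Simplified Chinese (역명(중국어간체)), and Traditional Chinese (역명(중국어번체)).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15064043/fileData.do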

Alerts

역명 has a high cardinality: 56 distinct values
역명(영문) has a high cardinality: 56 distinct values
역명(로마자) has a high cardinality: 56 distinct values
역명(일본어) has a high cardinality: 56 distinct values
역명(중국어 간체) has a high cardinality: 56 distinct values
역명(중국어 번체) has a high cardinality: 56 distinct values
역명(중국어 간체) is highly correlated with 역명 and 4 other fields
역명 is highly correlated with 역명(중국어 간체) and 4 other fields
역명(로마자) is highly correlated with 역명(중국어 간체) and 4 other fields
역명(중국어 번체) is highly correlated with 역명(중국어 간체) and 4 other fields
역명(일본어) is highly correlated with 역명(중국어 간체) and 4 other fields
역명(영문) is highly correlated with 역명(중국어 간체) and 4 other fields
역명 has unique values
역명(영문) has unique values
역명(로마자) has unique values
역명(일본어) has unique values
역명(중국어 간체) has unique values
역명(중국어 번체) has unique values
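
The cardinality and uniqueness alerts above follow from simple counts. As a minimal sketch (not pandas-profiling's own code), the snippet below flags a column as unique when every value is distinct and as high-cardinality when the distinct count exceeds a threshold. The threshold of 50 is an assumption meant to mirror the library's default, and the helper name `column_alerts` and the placeholder values are illustrative only.

```python
def column_alerts(values, cardinality_threshold=50):
    """Return alert labels for one categorical column (hypothetical helper)."""
    n = len(values)
    distinct = len(set(values))
    alerts = []
    # Assumption: "high cardinality" means more distinct values than the threshold.
    if distinct > cardinality_threshold:
        alerts.append("High cardinality")
    # "Unique" means no value repeats.
    if n > 0 and distinct == n:
        alerts.append("Unique")
    return alerts


# Stand-in column: 56 distinct placeholder names, mirroring the report's counts.
stations = [f"station_{i}" for i in range(56)]
print(column_alerts(stations))  # ['High cardinality', 'Unique']
```

With 56 distinct values in 56 rows, both alerts fire for every column, which matches the list above.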

Reproduction

Analysis started: 2023-02-18 08:47:57.244795
Analysis finished: 2023-02-18 08:47:58.020980
Duration: 0.78 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
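
A report like this one could probably be regenerated along the following lines. This is a sketch under assumptions: the CSV filename `stations.csv`, the `cp949` encoding, and the report title are hypothetical (the report states none of them), and the `dataset` metadata keys are assumed to be the ones pandas-profiling v3.x uses to fill the Dataset block.

```python
# Metadata as shown in the report's "Dataset" block (values copied from the report).
DATASET_META = {
    "description": "역명(한글),역명(영문),역명(로마자),역명(일본어),역명(중국어간체),역명(중국어번체) 등의 정보를 제공",
    "author": "국가철도공단",
    "url": "https://www.data.go.kr/data/15064043/fileData.do",
}


def build_report(csv_path="stations.csv", out_path="report.html", encoding="cp949"):
    """Profile the CSV and write an HTML report (path and encoding are assumptions)."""
    # Imported lazily so the metadata above can be inspected without the libraries.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="Station names", dataset=DATASET_META)
    report.to_file(out_path)
    return report
```

Calling `build_report()` requires pandas and pandas-profiling to be installed and the CSV to be downloaded from the URL above.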

Variables

역명
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

방화: 1
개화산: 1
영등포구청: 1
김포공항: 1
송정: 1
Other values (51): 51

Length

Max length: 13
Median length: 11
Mean length: 4.357142857
Min length: 2
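
The length statistics summarise the character count of each station name. A minimal sketch of the calculation, using only the five sample rows listed in this report (so the numbers differ from the full 56-row figures); the function name is illustrative:

```python
from statistics import mean, median


def length_stats(values):
    """Character-length summary for a list of strings."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


# The five sample rows of 역명 shown in the report.
sample = ["방화", "개화산", "김포공항", "송정", "마곡"]
stats = length_stats(sample)
print(stats)
```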

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: 방화
2nd row: 개화산
3rd row: 김포공항
4th row: 송정
5th row: 마곡

Common Values

Value | Count | Frequency (%)
방화 | 1 | 1.8%
개화산 | 1 | 1.8%
영등포구청 | 1 | 1.8%
김포공항 | 1 | 1.8%
송정 | 1 | 1.8%
마곡 | 1 | 1.8%
발산 | 1 | 1.8%
우장산 | 1 | 1.8%
화곡 | 1 | 1.8%
까치산 | 1 | 1.8%
Other values (46) | 46 | 82.1%
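
Each frequency in the table above is the value's count divided by the 56 observations. A short sketch of that arithmetic (the function name is illustrative):

```python
def frequency_pct(count, total):
    """Share of `total` taken by `count`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


print(frequency_pct(1, 56))   # 1.8
print(frequency_pct(46, 56))  # 82.1
```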

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
방화 | 1 | 1.8%
개화산 | 1 | 1.8%
명일 | 1 | 1.8%
왕십리 | 1 | 1.8%
마장 | 1 | 1.8%
답십리 | 1 | 1.8%
장한평 | 1 | 1.8%
군자(능동 | 1 | 1.8%
아차산(어린이대공원후문 | 1 | 1.8%
광나루(장신대 | 1 | 1.8%
Other values (46) | 46 | 82.1%

역명(영문)
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

Banghwa: 1
Gaehwasan: 1
Yeongdeungpo-gu Office: 1
Gimpo Int'l Airport: 1
Songjeong: 1
Other values (51): 51

Length

Max length: 55
Median length: 40
Mean length: 14.875
Min length: 4

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: Banghwa
2nd row: Gaehwasan
3rd row: Gimpo Int'l Airport
4th row: Songjeong
5th row: Magok

Common Values

Value | Count | Frequency (%)
Banghwa | 1 | 1.8%
Gaehwasan | 1 | 1.8%
Yeongdeungpo-gu Office | 1 | 1.8%
Gimpo Int'l Airport | 1 | 1.8%
Songjeong | 1 | 1.8%
Magok | 1 | 1.8%
Balsan | 1 | 1.8%
Ujangsan | 1 | 1.8%
Hwagok | 1 | 1.8%
Kkachisan | 1 | 1.8%
Other values (46) | 46 | 82.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
park | 4 | 3.8%
hanam | 3 | 2.9%
(blank) | 3 | 2.9%
gangdong | 2 | 1.9%
univ | 2 | 1.9%
center | 2 | 1.9%
mokdong | 2 | 1.9%
cheonho | 1 | 1.0%
seminary | 1 | 1.0%
theological | 1 | 1.0%
Other values (84) | 84 | 80.0%
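
The lowercase entries in the table above (park, hanam, univ) look like word-level counts rather than whole station names. As a rough sketch of how such a table could be built, assuming whitespace tokenisation, lowercasing, and stripping of surrounding punctuation (pandas-profiling's exact rules may differ, for example in how it treats empty tokens), using three English names from this report:

```python
import string
from collections import Counter


def word_counts(names):
    """Count lowercase words across a list of names (illustrative helper)."""
    counts = Counter()
    for name in names:
        for token in name.lower().split():
            word = token.strip(string.punctuation)
            if word:  # assumption: drop tokens that were only punctuation
                counts[word] += 1
    return counts


names = [
    "Hanam Pungsan",
    "Hanam Geomdansan",
    "Olympic Park (Korea National Sport Univ.)",
]
counts = word_counts(names)
print(counts["hanam"], counts["park"], counts["univ"])  # 2 1 1
```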

역명(로마자)
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

Banghwa: 1
Gaehwasan: 1
Yeongdeungpo-gu Office: 1
Gimpo Int'l Airport: 1
Songjeong: 1
Other values (51): 51

Length

Max length: 36
Median length: 28
Mean length: 13.30357143
Min length: 4

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: Banghwa
2nd row: Gaehwasan
3rd row: Gimpo Int'l Airport
4th row: Songjeong
5th row: Magok

Common Values

Value | Count | Frequency (%)
Banghwa | 1 | 1.8%
Gaehwasan | 1 | 1.8%
Yeongdeungpo-gu Office | 1 | 1.8%
Gimpo Int'l Airport | 1 | 1.8%
Songjeong | 1 | 1.8%
Magok | 1 | 1.8%
Balsan | 1 | 1.8%
Ujangsan | 1 | 1.8%
Hwagok | 1 | 1.8%
Kkachisan | 1 | 1.8%
Other values (46) | 46 | 82.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
hanam | 3 | 4.1%
park | 2 | 2.7%
banghwa | 1 | 1.4%
gil-dong | 1 | 1.4%
cheonho(pungnaptoseong | 1 | 1.4%
sem | 1 | 1.4%
(blank) | 1 | 1.4%
college | 1 | 1.4%
gwangnaru(presby | 1 | 1.4%
achasan(eorinidaegongwonhumun | 1 | 1.4%
Other values (60) | 60 | 82.2%

역명(일본어)
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

パンファ: 1
ケファサン: 1
ヨンドンポグチョン: 1
キンポゴンハン: 1
ソンジョン: 1
Other values (51): 51

Length

Max length: 30
Median length: 18
Mean length: 6
Min length: 2

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: パンファ
2nd row: ケファサン
3rd row: キンポゴンハン
4th row: ソンジョン
5th row: マゴク

Common Values

Value | Count | Frequency (%)
パンファ | 1 | 1.8%
ケファサン | 1 | 1.8%
ヨンドンポグチョン | 1 | 1.8%
キンポゴンハン | 1 | 1.8%
ソンジョン | 1 | 1.8%
マゴク | 1 | 1.8%
パルサン | 1 | 1.8%
ウジャンサン | 1 | 1.8%
ファゴク | 1 | 1.8%
カチサン | 1 | 1.8%
Other values (46) | 46 | 82.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
パンファ | 1 | 1.8%
ケファサン | 1 | 1.8%
ミョンイル | 1 | 1.8%
ワンシムニ | 1 | 1.8%
マジャン | 1 | 1.8%
タプシムニ | 1 | 1.8%
チャンハンピョン | 1 | 1.8%
クンジャ | 1 | 1.8%
アチャサン | 1 | 1.8%
クァンナル | 1 | 1.8%
Other values (46) | 46 | 82.1%

역명(중국어 간체)
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

傍花: 1
开花山: 1
永登浦区厅: 1
金浦机场: 1
松亭: 1
Other values (51): 51

Length

Max length: 14
Median length: 11
Mean length: 3.732142857
Min length: 2

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: 傍花
2nd row: 开花山
3rd row: 金浦机场
4th row: 松亭
5th row: 麻谷

Common Values

Value | Count | Frequency (%)
傍花 | 1 | 1.8%
开花山 | 1 | 1.8%
永登浦区厅 | 1 | 1.8%
金浦机场 | 1 | 1.8%
松亭 | 1 | 1.8%
麻谷 | 1 | 1.8%
钵山 | 1 | 1.8%
雨裝山 | 1 | 1.8%
禾谷 | 1 | 1.8%
喜鹊山 | 1 | 1.8%
Other values (46) | 46 | 82.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
傍花 | 1 | 1.8%
开花山 | 1 | 1.8%
明逸 | 1 | 1.8%
往十里 | 1 | 1.8%
马场 | 1 | 1.8%
踏十里 | 1 | 1.8%
长汉坪 | 1 | 1.8%
君子(陵洞 | 1 | 1.8%
峨嵯山 | 1 | 1.8%
广渡口(长神大学 | 1 | 1.8%
Other values (46) | 46 | 82.1%

역명(중국어 번체)
Categorical

HIGH CARDINALITY
HIGH CORRELATION
UNIQUE

Distinct: 56
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 576.0 B

傍花: 1
開花山: 1
永登浦區廳: 1
金浦空港: 1
松亭: 1
Other values (51): 51

Length

Max length: 13
Median length: 11
Mean length: 4.357142857
Min length: 2

Unique

Unique: 56
Unique (%): 100.0%

Sample

1st row: 傍花
2nd row: 開花山
3rd row: 金浦空港
4th row: 松亭
5th row: 麻谷

Common Values

Value | Count | Frequency (%)
傍花 | 1 | 1.8%
開花山 | 1 | 1.8%
永登浦區廳 | 1 | 1.8%
金浦空港 | 1 | 1.8%
松亭 | 1 | 1.8%
麻谷 | 1 | 1.8%
鉢山 | 1 | 1.8%
雨裝山 | 1 | 1.8%
禾谷 | 1 | 1.8%
까치山 | 1 | 1.8%
Other values (46) | 46 | 82.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
傍花 | 1 | 1.8%
開花山 | 1 | 1.8%
明逸 | 1 | 1.8%
往十里 | 1 | 1.8%
馬場 | 1 | 1.8%
踏十里 | 1 | 1.8%
長漢坪 | 1 | 1.8%
君子(陵洞 | 1 | 1.8%
峨嵯山(어린이大公園後門 | 1 | 1.8%
광나루(長神大 | 1 | 1.8%
Other values (46) | 46 | 82.1%

Correlations


Phik (φk)

Phik (φk) is a correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
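
To make the bias correction concrete, here is a minimal pure-Python sketch of the bias-corrected Cramér's V as Bergsma (2013) defines it. It is not pandas-profiling's implementation; the function name and the toy inputs are illustrative, and returning NaN for a degenerate table is a choice made for this sketch.

```python
from collections import Counter
from math import isnan, nan, sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length categorical sequences."""
    n = len(x)
    if n < 2:
        return nan
    joint = Counter(zip(x, y))
    row, col = Counter(x), Counter(y)
    r, k = len(row), len(col)
    # Pearson chi-squared over the full contingency table, empty cells included.
    chi2 = 0.0
    for a, row_total in row.items():
        for b, col_total in col.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bergsma's corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return nan  # degenerate table: the coefficient is undefined
    return sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(list("aaabbb"), list("pppqqq"))
independent = cramers_v_corrected(list("aabb"), list("pqpq"))
all_unique = cramers_v_corrected(list(range(56)), list(range(56)))
print(perfect, independent, all_unique)
```

The last call mirrors the shape of this dataset: when every value in both columns is unique, the corrected table dimensions collapse to 1, the denominator is zero, and the sketch returns NaN.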

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

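Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
0 | 방화 | Banghwa | Banghwa | パンファ | 傍花 | 傍花
1 | 개화산 | Gaehwasan | Gaehwasan | ケファサン | 开花山 | 開花山
2 | 김포공항 | Gimpo Int'l Airport | Gimpo Int'l Airport | キンポゴンハン | 金浦机场 | 金浦空港
3 | 송정 | Songjeong | Songjeong | ソンジョン | 松亭 | 松亭
4 | 마곡 | Magok | Magok | マゴク | 麻谷 | 麻谷
5 | 발산 | Balsan | Balsan | パルサン | 钵山 | 鉢山
6 | 우장산 | Ujangsan | Ujangsan | ウジャンサン | 雨裝山 | 雨裝山
7 | 화곡 | Hwagok | Hwagok | ファゴク | 禾谷 | 禾谷
8 | 까치산 | Kkachisan | Kkachisan | カチサン | 喜鹊山 | 까치山
9 | 신정(은행정) | Sinjeong (Eunhaengjeong) | Sinjeong(Eunhaengjeong) | シンジョン | 新亭 | 新亭(銀杏亭)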

Last rows

Index | 역명 | 역명(영문) | 역명(로마자) | 역명(일본어) | 역명(중국어 간체) | 역명(중국어 번체)
46 | 하남풍산 | Hanam Pungsan | Hanam Pungsan | ハナムプンサン(河南豊山) | 河南丰山 | 河南豊山
47 | 하남시청(덕풍·신장) | Hanam City Hall(Deokpung·Sinjang) | Hanam Sicheong(Deokpung·Sinjang) | ハナムシチョン-ドクプン·シンジャん(河南市庁-德豊·新長) | 河南市庁(德丰·新长) | 河南市廳(德豊·新長)
48 | 하남검단산 | Hanam Geomdansan | Hanam Geomdansan | ハナムゴムダンサン(河南黔丹山) | 河南黔丹山 | 河南黔丹山
49 | 둔촌동 | Dunchondong | Dunchon-dong | トゥンチョンドン | 遁村洞 | 遁村洞
50 | 올림픽공원(한국체대) | Olympic Park (Korea National Sport Univ.) | Olympic park(Hangukchedae) | オリンピック·コンウォン | 奥林匹克公园(韩国体育大学) | 올림픽公園(韓國體大)
51 | 방이 | Bangi | Bang | パンイ | 芳荑 | 芳荑
52 | 오금 | Ogeum | Ogeum | オグム | 梧琴 | 梧琴
53 | 개롱 | Gaerong | Gaerong | ケロン | 开笼 | 開籠
54 | 거여 | Geoyeo | Geoyeo | コヨ | 巨余 | 巨余
55 | 마천 | Macheon | Macheon | マチョン | 马川 | 馬川