Overview

Dataset statistics

Number of variables: 5
Number of observations: 66
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.7 KiB
Average record size in memory: 42.0 B
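The memory figures are arithmetically consistent with each other. The check below rests on an assumption we inferred ourselves; the report does not state it. Each cell of the five object columns is taken to be an 8-byte pointer. The rest of every column's 660.0 B is taken to be a fixed index overhead, and all columns share that index. The 132-byte overhead is backed out from the report's own numbers, not from pandas documentation.

```python
N_ROWS, N_COLS = 66, 5
POINTER_BYTES = 8   # assumption: one 8-byte object pointer per cell
COLUMN_BYTES = 660  # "Memory size" reported for every variable

# Back out the fixed per-column overhead (assumed to be the shared index).
index_bytes = COLUMN_BYTES - N_ROWS * POINTER_BYTES

# All five columns share one index, so it is counted once for the frame.
total_bytes = N_COLS * N_ROWS * POINTER_BYTES + index_bytes

print(index_bytes)                     # 132
print(total_bytes)                     # 2772
print(round(total_bytes / 1024, 1))    # 2.7  -> "Total size in memory" (KiB)
print(round(total_bytes / N_ROWS, 1))  # 42.0 -> "Average record size in memory" (B)
```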

Variable types

Categorical: 2
Text: 3

Dataset

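Description: Railway operator name (철도운영기관명), line name (선명), station name (역명), lot-number address (지번주소), and road-name address (도로명주소) for the nationwide high-speed rail stations managed by the Korea Railroad Corporation (한국철도공사).
Author: 국가철도공단 (Korea National Railway)
URL: https://www.data.go.kr/data/15096786/fileData.do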

Alerts

High correlation: 철도운영기관명 is highly overall correlated with 선명
High correlation: 선명 is highly overall correlated with 철도운영기관명
Imbalance: 철도운영기관명 is highly imbalanced (73.3%)
Unique: 역명 has unique values
Unique: 지번주소 has unique values
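The 73.3% imbalance score can be reproduced with a normalised-entropy measure. The sketch below assumes the metric is 1 - H / ln(k), where H is the Shannon entropy of the class frequencies and k is the number of classes. We have not checked this against the ydata-profiling source, but it reproduces the reported value. The function name `imbalance` is ours.

```python
import math

def imbalance(counts: list[int]) -> float:
    """1 minus Shannon entropy normalised by ln(number of classes).

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(len(counts))

# 철도운영기관명: 한국철도공사 = 63 rows, 주식회사 에스알 = 3 rows
score = imbalance([63, 3])
print(f"{score:.1%}")  # 73.3%
```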

Reproduction

Analysis started: 2023-12-12 20:13:25.278736
Analysis finished: 2023-12-12 20:13:25.843099
Duration: 0.56 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

철도운영기관명 (railway operator name)
Categorical

Alerts: HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B
한국철도공사: 63
주식회사 에스알: 3

Length

Max length: 8
Median length: 6
Mean length: 6.0909091
Min length: 6
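The length statistics follow directly from the two category values and their counts. "한국철도공사" has 6 characters, and "주식회사 에스알" has 8 characters including the space. The sketch below expands the counts back into one length per row; the report appears to print means to 8 significant digits.

```python
import statistics

# Category counts from the "Common Values" table of this variable.
counts = {"한국철도공사": 63, "주식회사 에스알": 3}

# One length per row; len() counts characters, including the space.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

print(max(lengths), statistics.median(lengths), min(lengths))  # 8 6.0 6
print(f"{statistics.mean(lengths):.8g}")                       # 6.0909091
```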

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 한국철도공사
2nd row: 한국철도공사
3rd row: 한국철도공사
4th row: 한국철도공사
5th row: 한국철도공사

Common Values

Value | Count | Frequency (%)
한국철도공사 | 63 | 95.5%
주식회사 에스알 | 3 | 4.5%
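The table above, and the Distinct figures, are plain frequency counts over the 66 rows. The sketch below rebuilds the rows from those counts and reproduces the percentages with `collections.Counter`.

```python
from collections import Counter

# One entry per row, reconstructed from the counts in the table above.
rows = ["한국철도공사"] * 63 + ["주식회사 에스알"] * 3

counts = Counter(rows)
n = len(rows)
for value, count in counts.most_common():
    print(f"{value} | {count} | {count / n:.1%}")
# 한국철도공사 | 63 | 95.5%
# 주식회사 에스알 | 3 | 4.5%

distinct = len(counts)
print(distinct, f"{distinct / n:.1%}")  # 2 3.0%
```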

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Most common words

Value | Count | Frequency (%)
한국철도공사 | 63 | 91.3%
주식회사 | 3 | 4.3%
에스알 | 3 | 4.3%
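These counts are word frequencies, not category frequencies. "주식회사 에스알" contributes two words, so the denominator is 69 words rather than 66 rows. The sketch below assumes simple whitespace splitting; the report does not show its exact tokenizer.

```python
from collections import Counter

rows = ["한국철도공사"] * 63 + ["주식회사 에스알"] * 3

# Split every value on whitespace and count the individual words.
words = Counter(word for value in rows for word in value.split())
total = sum(words.values())  # 69 words in total

for word, count in words.most_common():
    print(f"{word} | {count} | {count / total:.1%}")
# 한국철도공사 | 63 | 91.3%
# 주식회사 | 3 | 4.3%
# 에스알 | 3 | 4.3%
```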

선명 (line name)
Categorical

Alerts: HIGH CORRELATION

Distinct: 15
Distinct (%): 22.7%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B
경부선: 14
호남선: 8
중앙선: 8
전라선: 7
강릉선: 7
Other values (10): 22

Length

Max length: 7
Median length: 3
Mean length: 3.3333333
Min length: 3

Unique

Unique: 4
Unique (%): 6.1%
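"Distinct" counts every different value. "Unique" counts only the values that occur exactly once. The five "Other values" of 선명 are not named in the report, so the sketch below works from their totals only: 5 values covering 6 rows. Every value has at least one row, so exactly one of them has two rows and the other four are singletons. That shortcut holds only because the surplus here is a single row.

```python
# Counts of the ten most common values of 선명 (from "Common Values").
top_counts = [14, 8, 8, 7, 7, 5, 3, 3, 3, 2]
other_values, other_rows = 5, 6  # "Other values (5)" covering 6 rows
n_rows = 66

assert sum(top_counts) + other_rows == n_rows

distinct = len(top_counts) + other_values  # 15
# One surplus row among five values: four of them occur exactly once.
singletons_in_other = other_values - (other_rows - other_values)
unique = sum(1 for c in top_counts if c == 1) + singletons_in_other

print(distinct, f"{distinct / n_rows:.1%}")  # 15 22.7%
print(unique, f"{unique / n_rows:.1%}")      # 4 6.1%
```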

Sample

1st row: 경부선
2nd row: 경의선
3rd row: 경부선
4th row: 경부고속
5th row: 경부선

Common Values

Value | Count | Frequency (%)
경부선 | 14 | 21.2%
호남선 | 8 | 12.1%
중앙선 | 8 | 12.1%
전라선 | 7 | 10.6%
강릉선 | 7 | 10.6%
경전선 | 5 | 7.6%
영동선 | 3 | 4.5%
중부내륙선 | 3 | 4.5%
수서평택고속선 | 3 | 4.5%
경부고속 | 2 | 3.0%
Other values (5) | 6 | 9.1%

Length

[Figure: histogram of lengths of the category]

역명 (station name)
Text

Alerts: UNIQUE

Distinct: 66
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 7
Median length: 2
Mean length: 2.4545455
Min length: 2
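For each text variable, the mean length equals the "Total characters" figure divided by the 66 rows. The sketch below uses the totals reported for 역명, 지번주소 and 도로명주소. It formats to 8 significant digits, which appears to match the report's formatting.

```python
# "Total characters" reported for each text variable; every variable has 66 rows.
TOTAL_CHARACTERS = {"역명": 162, "지번주소": 1311, "도로명주소": 1215}
N_ROWS = 66

# Mean length = total characters / rows, shown to 8 significant digits.
mean_lengths = {
    column: f"{total / N_ROWS:.8g}" for column, total in TOTAL_CHARACTERS.items()
}
print(mean_lengths)
# {'역명': '2.4545455', '지번주소': '19.863636', '도로명주소': '18.409091'}
```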

Characters and Unicode

Total characters: 162
Distinct characters: 82
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
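The character categories in this report are Unicode general categories, and Python's `unicodedata.category` returns the same two-letter codes. The mapping from those codes to the long names used here ("Other Letter", "Open Punctuation", and so on) is our own lookup table. The helper `category_counts` is hypothetical. The sample string is row 0 of 도로명주소, because it exercises more categories than a station name does.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Row 0 of 도로명주소, taken from the Sample section.
print(category_counts(["서울특별시 용산구 한강대로 405(동자동)"]))
```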

Unique

Unique: 66
Unique (%): 100.0%

Sample

1st row: 서울
2nd row: 행신
3rd row: 영등포
4th row: 광명
5th row: 수원

Most common words

Value | Count | Frequency (%)
서울 | 1 | 1.5%
정동진 | 1 | 1.5%
지제 | 1 | 1.5%
구례구 | 1 | 1.5%
순천 | 1 | 1.5%
여천 | 1 | 1.5%
여수엑스포 | 1 | 1.5%
청량리 | 1 | 1.5%
상봉 | 1 | 1.5%
양평 | 1 | 1.5%
Other values (56) | 56 | 84.8%

Most occurring characters

Value | Count | Frequency (%)
| 10 | 6.2%
| 8 | 4.9%
| 7 | 4.3%
| 6 | 3.7%
| 6 | 3.7%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
Other values (72) | 101 | 62.3%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 160 | 98.8%
Open Punctuation | 1 | 0.6%
Close Punctuation | 1 | 0.6%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 10 | 6.2%
| 8 | 5.0%
| 7 | 4.4%
| 6 | 3.8%
| 6 | 3.8%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
Other values (70) | 99 | 61.9%
Open Punctuation
Value | Count | Frequency (%)
( | 1 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 160 | 98.8%
Common | 2 | 1.2%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 10 | 6.2%
| 8 | 5.0%
| 7 | 4.4%
| 6 | 3.8%
| 6 | 3.8%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
Other values (70) | 99 | 61.9%
Common
Value | Count | Frequency (%)
( | 1 | 50.0%
) | 1 | 50.0%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 160 | 98.8%
ASCII | 2 | 1.2%
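Python's standard library exposes general categories but not Unicode block names. The sketch below therefore approximates the two blocks seen in this report with code-point ranges. We assume the report's "ASCII" corresponds to Basic Latin (U+0000..U+007F) and its "Hangul" to Hangul Syllables (U+AC00..U+D7AF). The helper `block_of` is hypothetical.

```python
from collections import Counter

def block_of(ch: str) -> str:
    """Approximate Unicode block for the characters seen in this report."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"   # Basic Latin block, U+0000..U+007F
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables block
    return "Other"       # anything else (not observed in this data)

names = ["서울", "행신", "영등포", "광명", "수원"]  # first five rows of 역명
blocks = Counter(block_of(ch) for name in names for ch in name)
print(blocks)         # Counter({'Hangul': 11})
print(block_of("("))  # ASCII
```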

Most frequent character per block

Hangul
Value | Count | Frequency (%)
| 10 | 6.2%
| 8 | 5.0%
| 7 | 4.4%
| 6 | 3.8%
| 6 | 3.8%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 5 | 3.1%
| 4 | 2.5%
Other values (70) | 99 | 61.9%
ASCII
Value | Count | Frequency (%)
( | 1 | 50.0%
) | 1 | 50.0%

지번주소 (lot-number address)
Text

Alerts: UNIQUE

Distinct: 66
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 27
Median length: 23
Mean length: 19.863636
Min length: 14

Characters and Unicode

Total characters: 1311
Distinct characters: 149
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 66
Unique (%): 100.0%

Sample

1st row: 서울특별시 용산구 동자동 43-205
2nd row: 경기도 고양시 덕양구 행신동 812
3rd row: 서울특별시 영등포구 영등포동 618-496
4th row: 경기도 광명시 일직동 276-1
5th row: 경기도 수원시 팔달구 매산로1가 18

Most common words

Value | Count | Frequency (%)
강원도 | 11 | 3.7%
경기도 | 8 | 2.7%
전라남도 | 7 | 2.3%
서울특별시 | 6 | 2.0%
경상남도 | 6 | 2.0%
경상북도 | 6 | 2.0%
충청북도 | 5 | 1.7%
충청남도 | 4 | 1.3%
전라북도 | 4 | 1.3%
원주시 | 3 | 1.0%
Other values (218) | 238 | 79.9%
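The word counts for 지번주소 add up to 298: 60 in the ten listed words plus 238 in "Other values". That equals the 232 space characters plus one word per row (232 + 66). The identity holds when every address uses single spaces with no leading or trailing whitespace, which is our assumption about the data. The sketch below checks it on the five sample rows, again assuming plain whitespace splitting.

```python
from collections import Counter

# First five 지번주소 values as shown in the Sample section.
addresses = [
    "서울특별시 용산구 동자동 43-205",
    "경기도 고양시 덕양구 행신동 812",
    "서울특별시 영등포구 영등포동 618-496",
    "경기도 광명시 일직동 276-1",
    "경기도 수원시 팔달구 매산로1가 18",
]

words = Counter(w for a in addresses for w in a.split())
spaces = sum(a.count(" ") for a in addresses)

# With single separators and no outer whitespace, words = spaces + rows.
assert sum(words.values()) == spaces + len(addresses)
# The same identity on the full data set, using the report's figures.
assert 232 + 66 == sum([11, 8, 7, 6, 6, 6, 5, 4, 4, 3]) + 238

print(words.most_common(2))  # [('경기도', 3), ('서울특별시', 2)]
```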

Most occurring characters

Value | Count | Frequency (%)
(space) | 232 | 17.7%
1 | 64 | 4.9%
| 58 | 4.4%
| 51 | 3.9%
| 50 | 3.8%
- | 50 | 3.8%
3 | 35 | 2.7%
2 | 26 | 2.0%
6 | 26 | 2.0%
0 | 26 | 2.0%
Other values (139) | 693 | 52.9%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 757 | 57.7%
Decimal Number | 272 | 20.7%
Space Separator | 232 | 17.7%
Dash Punctuation | 50 | 3.8%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 58 | 7.7%
| 51 | 6.7%
| 50 | 6.6%
| 25 | 3.3%
| 24 | 3.2%
| 23 | 3.0%
| 21 | 2.8%
| 21 | 2.8%
| 20 | 2.6%
| 18 | 2.4%
Other values (127) | 446 | 58.9%
Decimal Number
Value | Count | Frequency (%)
1 | 64 | 23.5%
3 | 35 | 12.9%
2 | 26 | 9.6%
6 | 26 | 9.6%
0 | 26 | 9.6%
4 | 23 | 8.5%
7 | 22 | 8.1%
9 | 20 | 7.4%
8 | 16 | 5.9%
5 | 14 | 5.1%
Space Separator
Value | Count | Frequency (%)
(space) | 232 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 50 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 757 | 57.7%
Common | 554 | 42.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 58 | 7.7%
| 51 | 6.7%
| 50 | 6.6%
| 25 | 3.3%
| 24 | 3.2%
| 23 | 3.0%
| 21 | 2.8%
| 21 | 2.8%
| 20 | 2.6%
| 18 | 2.4%
Other values (127) | 446 | 58.9%
Common
Value | Count | Frequency (%)
(space) | 232 | 41.9%
1 | 64 | 11.6%
- | 50 | 9.0%
3 | 35 | 6.3%
2 | 26 | 4.7%
6 | 26 | 4.7%
0 | 26 | 4.7%
4 | 23 | 4.2%
7 | 22 | 4.0%
9 | 20 | 3.6%
Other values (2) | 30 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 757 | 57.7%
ASCII | 554 | 42.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 232 | 41.9%
1 | 64 | 11.6%
- | 50 | 9.0%
3 | 35 | 6.3%
2 | 26 | 4.7%
6 | 26 | 4.7%
0 | 26 | 4.7%
4 | 23 | 4.2%
7 | 22 | 4.0%
9 | 20 | 3.6%
Other values (2) | 30 | 5.4%
Hangul
Value | Count | Frequency (%)
| 58 | 7.7%
| 51 | 6.7%
| 50 | 6.6%
| 25 | 3.3%
| 24 | 3.2%
| 23 | 3.0%
| 21 | 2.8%
| 21 | 2.8%
| 20 | 2.6%
| 18 | 2.4%
Other values (127) | 446 | 58.9%

도로명주소 (road-name address)
Text

Distinct: 65
Distinct (%): 98.5%
Missing: 0
Missing (%): 0.0%
Memory size: 660.0 B

Length

Max length: 26
Median length: 22
Mean length: 18.409091
Min length: 1

Characters and Unicode

Total characters: 1215
Distinct characters: 149
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 64
Unique (%): 97.0%

Sample

1st row: 서울특별시 용산구 한강대로 405(동자동)
2nd row: 경기도 고양시 덕양구 소원로 102
3rd row: 서울특별시 영등포구 경인로 846
4th row: 경기도 광명시 광명역로 21(일직동)
5th row: 경기도 수원시 팔달구 덕영대로 924

Most common words

Value | Count | Frequency (%)
강원도 | 11 | 3.8%
경기도 | 7 | 2.4%
전라남도 | 7 | 2.4%
경상북도 | 6 | 2.1%
서울특별시 | 6 | 2.1%
경상남도 | 6 | 2.1%
충청북도 | 5 | 1.7%
충청남도 | 4 | 1.4%
전라북도 | 4 | 1.4%
원주시 | 3 | 1.0%
Other values (207) | 229 | 79.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 226 | 18.6%
| 58 | 4.8%
| 57 | 4.7%
| 51 | 4.2%
1 | 41 | 3.4%
2 | 29 | 2.4%
| 25 | 2.1%
| 25 | 2.1%
| 24 | 2.0%
| 23 | 1.9%
Other values (139) | 656 | 54.0%

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 786 | 64.7%
Space Separator | 226 | 18.6%
Decimal Number | 180 | 14.8%
Close Punctuation | 9 | 0.7%
Open Punctuation | 9 | 0.7%
Dash Punctuation | 5 | 0.4%

Most frequent character per category

Other Letter
Value | Count | Frequency (%)
| 58 | 7.4%
| 57 | 7.3%
| 51 | 6.5%
| 25 | 3.2%
| 25 | 3.2%
| 24 | 3.1%
| 23 | 2.9%
| 23 | 2.9%
| 19 | 2.4%
| 18 | 2.3%
Other values (125) | 463 | 58.9%
Decimal Number
Value | Count | Frequency (%)
1 | 41 | 22.8%
2 | 29 | 16.1%
0 | 22 | 12.2%
5 | 18 | 10.0%
6 | 14 | 7.8%
3 | 14 | 7.8%
9 | 13 | 7.2%
8 | 12 | 6.7%
7 | 11 | 6.1%
4 | 6 | 3.3%
Space Separator
Value | Count | Frequency (%)
(space) | 226 | 100.0%
Close Punctuation
Value | Count | Frequency (%)
) | 9 | 100.0%
Open Punctuation
Value | Count | Frequency (%)
( | 9 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 5 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 786 | 64.7%
Common | 429 | 35.3%

Most frequent character per script

Hangul
Value | Count | Frequency (%)
| 58 | 7.4%
| 57 | 7.3%
| 51 | 6.5%
| 25 | 3.2%
| 25 | 3.2%
| 24 | 3.1%
| 23 | 2.9%
| 23 | 2.9%
| 19 | 2.4%
| 18 | 2.3%
Other values (125) | 463 | 58.9%
Common
Value | Count | Frequency (%)
(space) | 226 | 52.7%
1 | 41 | 9.6%
2 | 29 | 6.8%
0 | 22 | 5.1%
5 | 18 | 4.2%
6 | 14 | 3.3%
3 | 14 | 3.3%
9 | 13 | 3.0%
8 | 12 | 2.8%
7 | 11 | 2.6%
Other values (4) | 29 | 6.8%

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 786 | 64.7%
ASCII | 429 | 35.3%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 226 | 52.7%
1 | 41 | 9.6%
2 | 29 | 6.8%
0 | 22 | 5.1%
5 | 18 | 4.2%
6 | 14 | 3.3%
3 | 14 | 3.3%
9 | 13 | 3.0%
8 | 12 | 2.8%
7 | 11 | 2.6%
Other values (4) | 29 | 6.8%
Hangul
Value | Count | Frequency (%)
| 58 | 7.4%
| 57 | 7.3%
| 51 | 6.5%
| 25 | 3.2%
| 25 | 3.2%
| 24 | 3.1%
| 23 | 2.9%
| 23 | 2.9%
| 19 | 2.4%
| 18 | 2.3%
Other values (125) | 463 | 58.9%

Correlations

| 철도운영기관명 | 선명 | 역명 | 지번주소 | 도로명주소
철도운영기관명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
선명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
역명 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
지번주소 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
도로명주소 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000

| 철도운영기관명 | 선명
철도운영기관명 | 1.000 | 0.893
선명 | 0.893 | 1.000
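The 0.893 between 철도운영기관명 and 선명 is consistent with a bias-corrected Cramér's V (Bergsma's correction). The uncorrected V would be 1.0 here, because every 주식회사 에스알 row is on 수서평택고속선 and every other line belongs to 한국철도공사. We have not confirmed which estimator ydata-profiling used; the sketch below shows only that this one reproduces the reported figure.

The contingency table is partly inferred. The ten largest line counts come from Common Values. The split (2, 1, 1, 1, 1) of the five "Other values" is deduced from their total of 6 rows, and it does not affect the result.

```python
import math

def cramers_v_corrected(table: list[list[int]]) -> float:
    """Cramér's V with Bergsma's bias correction for an r x k table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    r, k = len(row_tot), len(col_tot)

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias-corrected phi^2 and effective table dimensions.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2 / min(r_corr - 1, k_corr - 1))

# Columns are the 15 values of 선명; index 8 is 수서평택고속선.
korail = [14, 8, 8, 7, 7, 5, 3, 3, 0, 2, 2, 1, 1, 1, 1]  # 한국철도공사 (63 rows)
sr = [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0]      # 주식회사 에스알 (3 rows)

v = cramers_v_corrected([korail, sr])
print(round(v, 3))  # 0.893
```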

Missing values

[Figure: a simple visualization of nullity by column.]
[Figure: nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]
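The nullity counts treat only genuinely null cells as missing. In the Sample, 도로명주소 looks blank for rows 59 and 60. Yet the report lists 0 missing cells and a minimum length of 1, so those cells presumably hold a short non-null string. The sketch below makes that distinction explicit. The helper `missing_cells`, the " " stand-in value, and the null row are all hypothetical.

```python
COLUMNS = ["철도운영기관명", "선명", "역명", "지번주소", "도로명주소"]

def missing_cells(records: list[dict], columns: list[str]) -> dict[str, int]:
    """Count cells that are absent or None; any string, even blank, is present."""
    return {col: sum(1 for rec in records if rec.get(col) is None) for col in columns}

# Row 59 of the Sample; " " is a hypothetical stand-in for its blank-looking address.
row_59 = {
    "철도운영기관명": "한국철도공사",
    "선명": "중부내륙선",
    "역명": "가남",
    "지번주소": "경기도 여주시 가남읍 태평리 516",
    "도로명주소": " ",
}
# Hypothetical variant (not in the data) whose road-name address is truly null.
null_variant = {**row_59, "도로명주소": None}

print(missing_cells([row_59], COLUMNS))                              # all zeros
print(missing_cells([row_59, null_variant], COLUMNS)["도로명주소"])  # 1
```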

Sample

First rows

| 철도운영기관명 | 선명 | 역명 | 지번주소 | 도로명주소
0 | 한국철도공사 | 경부선 | 서울 | 서울특별시 용산구 동자동 43-205 | 서울특별시 용산구 한강대로 405(동자동)
1 | 한국철도공사 | 경의선 | 행신 | 경기도 고양시 덕양구 행신동 812 | 경기도 고양시 덕양구 소원로 102
2 | 한국철도공사 | 경부선 | 영등포 | 서울특별시 영등포구 영등포동 618-496 | 서울특별시 영등포구 경인로 846
3 | 한국철도공사 | 경부고속 | 광명 | 경기도 광명시 일직동 276-1 | 경기도 광명시 광명역로 21(일직동)
4 | 한국철도공사 | 경부선 | 수원 | 경기도 수원시 팔달구 매산로1가 18 | 경기도 수원시 팔달구 덕영대로 924
5 | 한국철도공사 | 경부고속 | 천안아산 | 충청남도 아산시 배방읍 장재리 364-4 | 충청남도 아산시 배방읍 희망로 100
6 | 한국철도공사 | 충북선 | 오송 | 충청북도 청주시 흥덕구 오송읍 봉산리 370-31 | 충청북도 청주시 흥덕구 오송읍 오송가락로 123
7 | 한국철도공사 | 경부선 | 대전 | 대전광역시 동구 중동 317 | 대전광역시 동구 중앙로 218
8 | 한국철도공사 | 경부선 | 김천구미 | 경상북도 김천시 남면 옥산리 787-1 | 경상북도 김천시 남면 혁신1로 51
9 | 한국철도공사 | 경부선 | 서대구 | 대구광역시 서구 이현동 232-1 | 대구광역시 서구 와룡로 527

Last rows

| 철도운영기관명 | 선명 | 역명 | 지번주소 | 도로명주소
56 | 한국철도공사 | 중앙선 | 영주 | 경상북도 영주시 휴천동 257 | 경상북도 영주시 선비로 64
57 | 한국철도공사 | 중앙선 | 안동 | 경상북도 안동시 송현동 646-1 | 경상북도 안동시 경동로 122-16
58 | 한국철도공사 | 강릉선 | 부발 | 경기도 이천시 부발읍 아미리 505-7 | 경기도 이천시 부발읍 신아로 87
59 | 한국철도공사 | 중부내륙선 | 가남 | 경기도 여주시 가남읍 태평리 516 |
60 | 한국철도공사 | 중부내륙선 | 감곡장호원 | 충북 음성군 감곡면 왕장리 312-2 |
61 | 한국철도공사 | 중부내륙선 | 앙성온천 | 충청북도 충주시 앙성면 돈산리 317 | 충청북도 충주시 앙성면 가곡로 1390-22
62 | 한국철도공사 | 충북선 | 충주 | 충청북도 충주시 봉방동 409 | 충청북도 충주시 충원대로 539
63 | 주식회사 에스알 | 수서평택고속선 | 수서 | 서울특별시 강남구 수서동 214-3 | 서울특별시 강남구 밤고개로 99
64 | 주식회사 에스알 | 수서평택고속선 | 지제 | 경기도 평택시 지제동 202-6 | 경기도 평택시 지제로 21
65 | 주식회사 에스알 | 수서평택고속선 | 동탄 | 경기도 화성시 오산동 967-164 | 경기도 화성시 동탄역로 지하 151
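The "Duplicate rows: 0" figure amounts to counting rows that exactly repeat an earlier row. The same sample rows also show why 선명 and 철도운영기관명 are so strongly associated: here each line maps to a single operator. The sketch below uses the last three Sample rows as tuples. The helper `duplicate_row_count` is hypothetical and only mirrors the report's definition as we understand it.

```python
from collections import Counter

# Last three Sample rows: (operator, line, station, lot address, road address).
rows = [
    ("주식회사 에스알", "수서평택고속선", "수서", "서울특별시 강남구 수서동 214-3", "서울특별시 강남구 밤고개로 99"),
    ("주식회사 에스알", "수서평택고속선", "지제", "경기도 평택시 지제동 202-6", "경기도 평택시 지제로 21"),
    ("주식회사 에스알", "수서평택고속선", "동탄", "경기도 화성시 오산동 967-164", "경기도 화성시 동탄역로 지하 151"),
]

def duplicate_row_count(rows: list[tuple]) -> int:
    """Number of rows that exactly repeat an earlier row."""
    return sum(count - 1 for count in Counter(rows).values())

# Collect the set of operators seen for each line.
operators_by_line: dict[str, set[str]] = {}
for operator, line, *_ in rows:
    operators_by_line.setdefault(line, set()).add(operator)

print(duplicate_row_count(rows))  # 0
print(operators_by_line)          # {'수서평택고속선': {'주식회사 에스알'}}
```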