Overview

Dataset statistics

Number of variables: 4
Number of observations: 78
Missing cells: 1
Missing cells (%): 0.3%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.6 KiB
Average record size in memory: 34.7 B
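
The missing-cell percentage can be checked from the counts above. This is a minimal sketch of the arithmetic, assuming the percentage is missing cells divided by variables times observations; the variable names are illustrative and not taken from the profiling tool.

```python
# Figures copied from the Dataset statistics block.
n_variables = 4
n_observations = 78
missing_cells = 1

total_cells = n_variables * n_observations        # every cell in the table
missing_pct = 100 * missing_cells / total_cells   # share of cells that are missing

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```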

Variable types

Categorical: 2
Text: 1
Numeric: 1

Dataset

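Description: Data containing management and status information for assets managed by Korea Railroad Corporation (한국철도공사), including electric power facilities, telecommunications facilities, electric-railway power, general power, and signal control.
Author: Korea Railroad Corporation (한국철도공사)
URL: https://www.data.go.kr/data/3071238/fileData.do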

Alerts

High correlation: 수량 (quantity) is highly overall correlated with 단위 (unit)
High correlation: 단위 (unit) is highly overall correlated with 수량 (quantity)
Imbalance: 단위 (unit) is highly imbalanced (70.9%)
Missing: 수량 (quantity) has 1 (1.3%) missing values
Zeros: 수량 (quantity) has 19 (24.4%) zeros
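
The 70.9% imbalance figure for 단위 is consistent with a normalized-entropy score, 1 - H / log(k), applied to that variable's category counts. The sketch below assumes that formula; the report itself does not state how the score is computed, and `imbalance_score` is an illustrative helper rather than a function of the profiling library.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no balance to measure
    # Shannon entropy of the class proportions (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Category counts for 단위 as listed in its Common Values table.
unit_counts = [69, 4, 2, 1, 1, 1]
score = imbalance_score(unit_counts)
print(f"{score:.4f}")
```

With these counts the score comes out close to 0.709, which agrees with the alert to within rounding.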

Reproduction

Analysis started: 2023-12-12 03:26:50.786393
Analysis finished: 2023-12-12 03:26:51.455796
Duration: 0.67 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
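
The reported duration follows directly from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
started = datetime.fromisoformat("2023-12-12 03:26:50.786393")
finished = datetime.fromisoformat("2023-12-12 03:26:51.455796")

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```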

Variables

시설물명 (facility name)
Categorical

Distinct: 17
Distinct (%): 21.8%
Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Value | Count
역무자동화 | 13
전화기 | 8
열차무선 | 7
기타 | 7
반송단국장치 | 6
Other values (12) | 37

Length

Max length: 8
Median length: 5
Mean length: 4.2307692
Min length: 2

Unique

Unique: 1
Unique (%): 1.3%

Sample

1st row: 전주
2nd row: 전주
3rd row: 전주
4th row: 전주
5th row: 전선

Common Values

Value | Count | Frequency (%)
역무자동화 | 13 | 16.7%
전화기 | 8 | 10.3%
열차무선 | 7 | 9.0%
기타 | 7 | 9.0%
반송단국장치 | 6 | 7.7%
전선 | 5 | 6.4%
전신기 | 4 | 5.1%
전기시계설비 | 4 | 5.1%
여객자동안내장치 | 4 | 5.1%
전주 | 4 | 5.1%
Other values (7) | 16 | 20.5%
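
Each frequency in the Common Values table is the count divided by the 78 observations. A short sketch that recomputes the percentages from the listed counts (the dictionary is typed in from the table, not loaded from the dataset):

```python
# Counts copied from the Common Values table for 시설물명.
counts = {
    "역무자동화": 13, "전화기": 8, "열차무선": 7, "기타": 7, "반송단국장치": 6,
    "전선": 5, "전신기": 4, "전기시계설비": 4, "여객자동안내장치": 4, "전주": 4,
    "Other values (7)": 16,
}

n_rows = sum(counts.values())  # should equal the number of observations
frequency = {value: round(100 * c / n_rows, 1) for value, c in counts.items()}

for value, pct in frequency.items():
    print(f"{value}: {pct}%")
```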

Length

[Figure: Histogram of lengths of the category]

시설물세부명 (facility detail name)
Text

Distinct: 76
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Length

Max length: 24
Median length: 17
Mean length: 7.1794872
Min length: 2

Characters and Unicode

Total characters: 560
Distinct characters: 144
Distinct categories: 7
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
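
As an illustration of those character properties, the sketch below counts Unicode general categories with the standard-library `unicodedata` module. The three strings are taken from the report's sample rows. `rough_script` is a hypothetical helper that guesses the script from the character name, because the standard library exposes no script property; it is only an approximation of what the report calls scripts.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Lo', 'Lu') over all characters."""
    return Counter(unicodedata.category(ch) for text in values for ch in text)

def rough_script(ch):
    """Approximate the script from the character name (assumption, not a real script lookup)."""
    name = unicodedata.name(ch, "")
    if name.startswith("HANGUL"):
        return "Hangul"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"

samples = ["목주", "케이블Cable(m)_시외", "TT교환기"]
categories = category_counts(samples)
scripts = Counter(rough_script(ch) for text in samples for ch in text)

print(categories)
print(scripts)
```

Here `Lo` corresponds to Other Letter, `Lu` and `Ll` to Uppercase and Lowercase Letter, `Pc` to Connector Punctuation, and `Ps` and `Pe` to Open and Close Punctuation.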

Unique

Unique: 74
Unique (%): 94.9%

Sample

1st row: 목주
2nd row: 철주
3rd row: 콘크리트주
4th row: 철탑
5th row: 나전선

Value | Count | Frequency (%)
공전식 | 2 | 2.6%
표시기tdi | 2 | 2.6%
trs_이동국 | 1 | 1.3%
목주 | 1 | 1.3%
trs_기지국 | 1 | 1.3%
교통카드system_무인정산기 | 1 | 1.3%
교통카드system_자동발매기pom | 1 | 1.3%
교통카드system_자동발권기(1회권발권기 | 1 | 1.3%
열차자동방호장치 | 1 | 1.3%
trs_휴대용 | 1 | 1.3%
Other values (66) | 66 | 84.6%

Most occurring characters

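Value | Count | Frequency (%)
(not extracted) | 26 | 4.6%
_ | 23 | 4.1%
S | 22 | 3.9%
e | 17 | 3.0%
m | 16 | 2.9%
t | 15 | 2.7%
(not extracted) | 14 | 2.5%
(not extracted) | 14 | 2.5%
(not extracted) | 14 | 2.5%
s | 13 | 2.3%
Other values (134) | 386 | 68.9%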

Most occurring categories

Value | Count | Frequency (%)
Other Letter | 336 | 60.0%
Lowercase Letter | 92 | 16.4%
Uppercase Letter | 90 | 16.1%
Connector Punctuation | 23 | 4.1%
Decimal Number | 11 | 2.0%
Close Punctuation | 4 | 0.7%
Open Punctuation | 4 | 0.7%

Most frequent character per category

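Other Letter
Value | Count | Frequency (%)
(not extracted) | 26 | 7.7%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 13 | 3.9%
(not extracted) | 11 | 3.3%
(not extracted) | 11 | 3.3%
(not extracted) | 10 | 3.0%
(not extracted) | 8 | 2.4%
(not extracted) | 8 | 2.4%
Other values (91) | 207 | 61.6%

Uppercase Letter
Value | Count | Frequency (%)
S | 22 | 24.4%
T | 10 | 11.1%
C | 8 | 8.9%
M | 7 | 7.8%
A | 6 | 6.7%
F | 5 | 5.6%
R | 5 | 5.6%
H | 4 | 4.4%
D | 4 | 4.4%
P | 3 | 3.3%
Other values (8) | 16 | 17.8%

Lowercase Letter
Value | Count | Frequency (%)
e | 17 | 18.5%
m | 16 | 17.4%
t | 15 | 16.3%
s | 13 | 14.1%
y | 13 | 14.1%
a | 3 | 3.3%
b | 2 | 2.2%
h | 2 | 2.2%
r | 2 | 2.2%
l | 2 | 2.2%
Other values (7) | 7 | 7.6%

Decimal Number
Value | Count | Frequency (%)
0 | 4 | 36.4%
1 | 3 | 27.3%
6 | 2 | 18.2%
8 | 1 | 9.1%
4 | 1 | 9.1%

Connector Punctuation
Value | Count | Frequency (%)
_ | 23 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 4 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 4 | 100.0%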

Most occurring scripts

Value | Count | Frequency (%)
Hangul | 336 | 60.0%
Latin | 182 | 32.5%
Common | 42 | 7.5%

Most frequent character per script

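Hangul
Value | Count | Frequency (%)
(not extracted) | 26 | 7.7%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 13 | 3.9%
(not extracted) | 11 | 3.3%
(not extracted) | 11 | 3.3%
(not extracted) | 10 | 3.0%
(not extracted) | 8 | 2.4%
(not extracted) | 8 | 2.4%
Other values (91) | 207 | 61.6%

Latin
Value | Count | Frequency (%)
S | 22 | 12.1%
e | 17 | 9.3%
m | 16 | 8.8%
t | 15 | 8.2%
s | 13 | 7.1%
y | 13 | 7.1%
T | 10 | 5.5%
C | 8 | 4.4%
M | 7 | 3.8%
A | 6 | 3.3%
Other values (25) | 55 | 30.2%

Common
Value | Count | Frequency (%)
_ | 23 | 54.8%
0 | 4 | 9.5%
) | 4 | 9.5%
( | 4 | 9.5%
1 | 3 | 7.1%
6 | 2 | 4.8%
8 | 1 | 2.4%
4 | 1 | 2.4%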

Most occurring blocks

Value | Count | Frequency (%)
Hangul | 336 | 60.0%
ASCII | 224 | 40.0%

Most frequent character per block

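Hangul
Value | Count | Frequency (%)
(not extracted) | 26 | 7.7%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 14 | 4.2%
(not extracted) | 13 | 3.9%
(not extracted) | 11 | 3.3%
(not extracted) | 11 | 3.3%
(not extracted) | 10 | 3.0%
(not extracted) | 8 | 2.4%
(not extracted) | 8 | 2.4%
Other values (91) | 207 | 61.6%

ASCII
Value | Count | Frequency (%)
_ | 23 | 10.3%
S | 22 | 9.8%
e | 17 | 7.6%
m | 16 | 7.1%
t | 15 | 6.7%
s | 13 | 5.8%
y | 13 | 5.8%
T | 10 | 4.5%
C | 8 | 3.6%
M | 7 | 3.1%
Other values (33) | 80 | 35.7%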

단위 (unit)
Categorical

Alerts: High correlation, Imbalance

Distinct: 6
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Memory size: 756.0 B

Value | Count
(not extracted) | 69
(not extracted) | 4
외장m | 2
m | 1
Suburbs | 1

Length

Max length: 7
Median length: 1
Mean length: 1.1666667
Min length: 1

Unique

Unique: 3
Unique (%): 3.8%

Sample

1st row: (not extracted)
2nd row: (not extracted)
3rd row: (not extracted)
4th row: (not extracted)
5th row: m

Common Values

Value | Count | Frequency (%)
(not extracted) | 69 | 88.5%
(not extracted) | 4 | 5.1%
외장m | 2 | 2.6%
m | 1 | 1.3%
Suburbs | 1 | 1.3%
City | 1 | 1.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common Values plot]

수량 (quantity)
Real number (ℝ)

Alerts: High correlation, Missing, Zeros

Distinct: 55
Distinct (%): 71.4%
Missing: 1
Missing (%): 1.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 179091.36
Minimum: 0
Maximum: 5999102
Zeros: 19
Zeros (%): 24.4%
Negative: 0
Negative (%): 0.0%
Memory size: 834.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 262
Q3: 1953
95-th percentile: 206496
Maximum: 5999102
Range: 5999102
Interquartile range (IQR): 1952

Descriptive statistics

Standard deviation: 890961.36
Coefficient of variation (CV): 4.9748985
Kurtosis: 32.207512
Mean: 179091.36
Median Absolute Deviation (MAD): 262
Skewness: 5.6102539
Sum: 13790035
Variance: 7.9381214 × 10^11
Monotonicity: Not monotonic
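
Several of these statistics can be cross-checked against each other. The sketch below assumes the usual definitions (mean = sum / non-missing count, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1) and uses only figures printed in this report; the variable names are illustrative.

```python
# Figures copied from the 수량 statistics in this report.
std = 890961.36
mean = 179091.36
total = 13790035
n_observations = 78
n_missing = 1
q1, q3 = 1, 1953

n_valid = n_observations - n_missing   # the missing value is excluded from the mean
mean_from_sum = total / n_valid
cv = std / mean                        # coefficient of variation
variance = std ** 2
iqr = q3 - q1

print(f"mean={mean_from_sum:.2f}  CV={cv:.4f}  variance={variance:.4e}  IQR={iqr}")
```

A CV near 5 together with the large skewness reflects the four cable-length rows in the millions next to counts that are mostly below a few thousand.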

[Figure: Histogram with fixed size bins (bins=50)]

Value | Count | Frequency (%)
0 | 19 | 24.4%
4 | 3 | 3.8%
3 | 2 | 2.6%
5 | 2 | 2.6%
419 | 1 | 1.3%
13857 | 1 | 1.3%
284 | 1 | 1.3%
775 | 1 | 1.3%
1581 | 1 | 1.3%
1874 | 1 | 1.3%
Other values (45) | 45 | 57.7%

Value | Count | Frequency (%)
0 | 19 | 24.4%
1 | 1 | 1.3%
2 | 1 | 1.3%
3 | 2 | 2.6%
4 | 3 | 3.8%
5 | 2 | 2.6%
6 | 1 | 1.3%
16 | 1 | 1.3%
22 | 1 | 1.3%
57 | 1 | 1.3%

Value | Count | Frequency (%)
5999102 | 1 | 1.3%
4661310 | 1 | 1.3%
2044253 | 1 | 1.3%
949672 | 1 | 1.3%
20702 | 1 | 1.3%
18986 | 1 | 1.3%
13857 | 1 | 1.3%
13172 | 1 | 1.3%
11530 | 1 | 1.3%
9081 | 1 | 1.3%

Interactions

[Figure: Interactions plot]

Correlations

 | 시설물명 | 시설물세부명 | 단위 | 수량
시설물명 | 1.000 | 0.463 | 0.766 | 0.000
시설물세부명 | 0.463 | 1.000 | 1.000 | 1.000
단위 | 0.766 | 1.000 | 1.000 | 0.908
수량 | 0.000 | 1.000 | 0.908 | 1.000

 | 시설물명 | 단위
시설물명 | 1.000 | 0.453
단위 | 0.453 | 1.000

 | 수량 | 시설물명 | 단위
수량 | 1.000 | 0.000 | 0.850
시설물명 | 0.000 | 1.000 | 0.453
단위 | 0.850 | 0.453 | 1.000

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.]

Sample

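Index | 시설물명 | 시설물세부명 | 단위 | 수량
0 | 전주 | 목주 | (not extracted) | 0
1 | 전주 | 철주 | (not extracted) | 553
2 | 전주 | 콘크리트주 | (not extracted) | 0
3 | 전주 | 철탑 | (not extracted) | 57
4 | 전선 | 나전선 | m | 0
5 | 전선 | 케이블Cable(m)_시외 | Suburbs | 4661310
6 | 전선 | 케이블Cable(m)_시내 | City | 2044253
7 | 전선 | 광케이블 | 외장m | 5999102
8 | 전선 | LCX | 외장m | 949672
9 | 전신기 | TT교환기 | (not extracted) | 0

Index | 시설물명 | 시설물세부명 | 단위 | 수량
68 | 여객자동안내장치 | 운용자모니터 | (not extracted) | 104
69 | 여객자동안내장치 | 보수자모니터 | (not extracted) | 72
70 | 여객자동안내장치 | 표시기TDI | (not extracted) | 2103
71 | 열차행선안내장치 | 주제어장치HSE | (not extracted) | 3
72 | 열차행선안내장치 | 국부역장치LSE | (not extracted) | 271
73 | 열차행선안내장치 | 표시기TDI | (not extracted) | 2884
74 | 전기시계설비 | 모시계MMC | (not extracted) | 3
75 | 전기시계설비 | 부모시계SMC | (not extracted) | 16
76 | 전기시계설비 | 중계시계RMC | (not extracted) | 5
77 | 전기시계설비 | 자시계SC | (not extracted) | 730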