Overview

Dataset statistics

Number of variables: 3
Number of observations: 319
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.9 KiB
Average record size in memory: 25.4 B

Variable types

Numeric: 1
Text: 1
Categorical: 1

Dataset

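Description: Code data for public transportation in the Korea Railroad Corporation (KORAIL) wide-area information system (KOTRIS). R denotes rail, B denotes bus, and T denotes taxi.
URL: https://www.data.go.kr/data/15121099/fileData.do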

Alerts

교통수단코드 is highly overall correlated with 구분R철도B버스T택시 [High correlation]
구분R철도B버스T택시 is highly overall correlated with 교통수단코드 [High correlation]
구분R철도B버스T택시 is highly imbalanced (71.4%) [Imbalance]
교통수단코드 has unique values [Unique]
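
The 71.4% imbalance figure is consistent with one minus the normalized Shannon entropy of the class counts that the report gives for 구분R철도B버스T택시 (B 294, R 20, T 5). The sketch below assumes that definition; the function name `imbalance_score` is illustrative and does not come from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), with H the Shannon entropy of the class shares.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    This definition is an assumption, chosen because it matches the reported 71.4%.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2 or total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)

# Class counts for B, R and T as listed in the report.
score = imbalance_score([294, 20, 5])
print(f"{score:.1%}")
```

Under this definition a two-class column split 50/50 scores 0, so the score reads as "distance from a uniform distribution" rather than as the share of the majority class (which is 92.2% here).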

Reproduction

Analysis started: 2023-12-12 22:15:00.136055
Analysis finished: 2023-12-12 22:15:00.515456
Duration: 0.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
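
A report of this kind can usually be regenerated with a few lines of ydata-profiling. The following is a minimal sketch, not the exact invocation used here: the file path, the `cp949` encoding, the report title, and the output name are all assumptions, since the report records none of them.

```python
def build_profile(csv_path, encoding="cp949", out_path="report.html"):
    """Profile a CSV file and write the result as an HTML report.

    csv_path, encoding and out_path are illustrative assumptions; the
    original report does not state the source file name or its encoding.
    """
    # Imported lazily so this module loads even without the third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path, encoding=encoding)
    report = ProfileReport(df, title="KOTRIS transport mode codes")
    report.to_file(out_path)
    return report
```

Calling `build_profile("transport_codes.csv")` (a hypothetical file name) would write `report.html` next to the script. If the CSV is UTF-8 rather than CP949, pass `encoding="utf-8"`.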

Variables

교통수단코드 (transport mode code)
Real number (ℝ)

HIGH CORRELATION  UNIQUE 

Distinct: 319
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 585.32602
Minimum: 100
Maximum: 999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.9 KiB

Quantile statistics

Minimum: 100
5th percentile: 133.9
Q1: 488.5
Median: 620
Q3: 703.5
95th percentile: 823
Maximum: 999
Range: 899
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 195.45129
Coefficient of variation (CV): 0.33391868
Kurtosis: 0.52712588
Mean: 585.32602
Median Absolute Deviation (MAD): 106
Skewness: -0.65193117
Sum: 186719
Variance: 38201.208
Monotonicity: Strictly increasing
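
Several of the figures above are derived from others, so they can be cross-checked with simple arithmetic. The sketch below recomputes the mean, range, IQR, coefficient of variation, and variance from the reported values; the variable names are illustrative.

```python
# Values copied from the report for 교통수단코드.
n = 319                      # number of observations
total = 186719               # Sum
minimum, maximum = 100, 999
q1, q3 = 488.5, 703.5
std = 195.45129              # Standard deviation

mean = total / n                 # mean = sum / n
value_range = maximum - minimum  # range = max - min
iqr = q3 - q1                    # IQR = Q3 - Q1
cv = std / mean                  # CV = standard deviation / mean
variance = std ** 2              # variance = standard deviation squared

print(round(mean, 5), value_range, iqr, round(cv, 5), round(variance, 1))
```

Each recomputed value agrees with the report to within rounding of the printed inputs.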
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
100                     1  0.3%
101                     1  0.3%
679                     1  0.3%
678                     1  0.3%
677                     1  0.3%
676                     1  0.3%
675                     1  0.3%
674                     1  0.3%
673                     1  0.3%
672                     1  0.3%
Other values (309)    309  96.9%

Minimum 10 values

Value  Count  Frequency (%)
100        1  0.3%
101        1  0.3%
102        1  0.3%
103        1  0.3%
104        1  0.3%
105        1  0.3%
106        1  0.3%
110        1  0.3%
115        1  0.3%
120        1  0.3%

Maximum 10 values

Value  Count  Frequency (%)
999        1  0.3%
998        1  0.3%
997        1  0.3%
996        1  0.3%
995        1  0.3%
994        1  0.3%
993        1  0.3%
992        1  0.3%
991        1  0.3%
990        1  0.3%

내용 (description)
Text

Distinct: 297
Distinct (%): 93.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

Length

Max length: 20
Median length: 17
Mean length: 7.2100313
Min length: 2

Characters and Unicode

Total characters: 2300
Distinct characters: 160
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
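
As an illustration of how such character properties can be read, the sketch below counts characters by Unicode general category with Python's standard `unicodedata` module. The helper name `category_counts` is illustrative, and the two input strings are values that appear in the report's sample rows. The two-letter codes map to the category names used in this report: `Lo` is Other Letter, `Nd` is Decimal Number, `Ps` is Open Punctuation, and `Pe` is Close Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count characters by Unicode general category code (e.g. 'Lo', 'Nd')."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts

# Two values taken from the report's sample rows.
sample = ["마을버스(105)", "지선버스(120)"]
counts = category_counts(sample)
print(counts)
```

For these two strings the Hangul syllables fall under `Lo`, the digits under `Nd`, and the parentheses under `Ps` and `Pe`.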

Unique

Unique: 294
Unique (%): 92.2%
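
The report gives two different counts for this column, Distinct (297) and Unique (294). A common reading, assumed here rather than confirmed by the report, is that "distinct" counts different values while "unique" counts values that occur exactly once. The sketch below shows that distinction on a small hypothetical list; the helper name and the toy data are illustrative only.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) under the assumed definitions.

    distinct: number of different values
    unique:   number of values that occur exactly once
    """
    freq = Counter(values)
    distinct = len(freq)
    unique = sum(1 for count in freq.values() if count == 1)
    return distinct, unique

# Hypothetical toy data; the labels appear in the report, the counts do not.
toy = ["예비", "예비", "공항철도", "버스", "공항버스"]
print(distinct_and_unique(toy))
```

Under that reading, 297 distinct and 294 unique values would mean that three values repeat, together covering the remaining 319 - 294 = 25 rows.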

Sample

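1st row: 버스 (bus)
2nd row: 도시형버스 (city bus)
3rd row: 일반좌석버스 (regular seated bus)
4th row: 고급좌석버스(심야좌석포함) (premium seated bus, including late-night seated)
5th row: 구순환버스 (district circular bus)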
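Value               English gloss                  Count  Frequency (%)
예비                reserve                           19  5.9%
공항예비            airport reserve                    4  1.2%
공항철도            airport railroad                   3  0.9%
용인경전철          Yongin light rail                  2  0.6%
대전급행버스        Daejeon express bus                1  0.3%
천안좌석            Cheonan seated                     1  0.3%
천안시외            Cheonan intercity                  1  0.3%
천안마을            Cheonan village                    1  0.3%
천안호서대버스      Cheonan Hoseo University bus       1  0.3%
아산일반            Asan regular                       1  0.3%
Other values (288)                                   288  89.4%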

Most occurring characters

Value               Count  Frequency (%)
)                     111  4.8%
(                     111  4.8%
0                      94  4.1%
[missing]              91  4.0%
[missing]              91  4.0%
[missing]              82  3.6%
[missing]              82  3.6%
[missing]              76  3.3%
[missing]              65  2.8%
1                      62  2.7%
Other values (150)   1435  62.4%

Most occurring categories

Value                  Count  Frequency (%)
Other Letter            1616  70.3%
Decimal Number           355  15.4%
Close Punctuation        111  4.8%
Open Punctuation         111  4.8%
Dash Punctuation          44  1.9%
Space Separator           35  1.5%
Other Punctuation         16  0.7%
Connector Punctuation      9  0.4%
Uppercase Letter           2  0.1%
Math Symbol                1  < 0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
[missing]              91  5.6%
[missing]              91  5.6%
[missing]              82  5.1%
[missing]              82  5.1%
[missing]              76  4.7%
[missing]              65  4.0%
[missing]              51  3.2%
[missing]              46  2.8%
[missing]              45  2.8%
[missing]              42  2.6%
Other values (129)    945  58.5%

Decimal Number

Value  Count  Frequency (%)
0         94  26.5%
1         62  17.5%
5         40  11.3%
4         38  10.7%
3         33  9.3%
6         32  9.0%
2         22  6.2%
8         16  4.5%
7          9  2.5%
9          9  2.5%

Other Punctuation

Value  Count  Frequency (%)
/         14  87.5%
,          1  6.2%
:          1  6.2%

Uppercase Letter

Value  Count  Frequency (%)
A          1  50.0%
B          1  50.0%

Close Punctuation

Value  Count  Frequency (%)
)        111  100.0%

Open Punctuation

Value  Count  Frequency (%)
(        111  100.0%

Dash Punctuation

Value  Count  Frequency (%)
-         44  100.0%

Space Separator

Value    Count  Frequency (%)
(space)     35  100.0%

Connector Punctuation

Value  Count  Frequency (%)
_          9  100.0%

Math Symbol

Value  Count  Frequency (%)
~          1  100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul   1616  70.3%
Common    682  29.7%
Latin       2  0.1%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
[missing]              91  5.6%
[missing]              91  5.6%
[missing]              82  5.1%
[missing]              82  5.1%
[missing]              76  4.7%
[missing]              65  4.0%
[missing]              51  3.2%
[missing]              46  2.8%
[missing]              45  2.8%
[missing]              42  2.6%
Other values (129)    945  58.5%

Common

Value             Count  Frequency (%)
)                   111  16.3%
(                   111  16.3%
0                    94  13.8%
1                    62  9.1%
-                    44  6.5%
5                    40  5.9%
4                    38  5.6%
(space)              35  5.1%
3                    33  4.8%
6                    32  4.7%
Other values (9)     82  12.0%

Latin

Value  Count  Frequency (%)
A          1  50.0%
B          1  50.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul   1616  70.3%
ASCII     684  29.7%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
)                    111  16.2%
(                    111  16.2%
0                     94  13.7%
1                     62  9.1%
-                     44  6.4%
5                     40  5.8%
4                     38  5.6%
(space)               35  5.1%
3                     33  4.8%
6                     32  4.7%
Other values (11)     84  12.3%

Hangul

Value               Count  Frequency (%)
[missing]              91  5.6%
[missing]              91  5.6%
[missing]              82  5.1%
[missing]              82  5.1%
[missing]              76  4.7%
[missing]              65  4.0%
[missing]              51  3.2%
[missing]              46  2.8%
[missing]              45  2.8%
[missing]              42  2.6%
Other values (129)    945  58.5%

구분R철도B버스T택시 (category: R = rail, B = bus, T = taxi)
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 KiB

B: 294
R: 20
T: 5

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B
2nd row: B
3rd row: B
4th row: B
5th row: B

Common Values

Value  Count  Frequency (%)
B        294  92.2%
R         20  6.3%
T          5  1.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
b        294  92.2%
r         20  6.3%
t          5  1.6%

Interactions

[Figure: interaction plot]

Correlations

[Figure: correlation heatmap]

                      교통수단코드  구분R철도B버스T택시
교통수단코드                 1.000              0.953
구분R철도B버스T택시          0.953              1.000

[Figure: correlation heatmap]

                      교통수단코드  구분R철도B버스T택시
교통수단코드                 1.000              0.936
구분R철도B버스T택시          0.936              1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

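First rows

Index  교통수단코드  내용                        English gloss                                      구분R철도B버스T택시
0      100           버스                        bus                                                B
1      101           도시형버스                  city bus                                           B
2      102           일반좌석버스                regular seated bus                                 B
3      103           고급좌석버스(심야좌석포함)  premium seated bus (including late-night seated)   B
4      104           구순환버스                  district circular bus                              B
5      105           마을버스(105)               village bus (105)                                  B
6      106           공항버스                    airport bus                                        B
7      110           주간선버스                  main trunk bus                                     B
8      115           간선버스                    trunk bus                                          B
9      120           지선버스(120)               branch bus (120)                                   B

Last rows

Index  교통수단코드  내용                English gloss                  구분R철도B버스T택시
309    990           제주시외버스        Jeju intercity bus             B
310    991           제주공항리무진      Jeju airport limousine         B
311    992           제주읍면순환버스    Jeju township circular bus     B
312    993           제주시외예비        Jeju intercity reserve         B
313    994           제주시내일반        Jeju city regular              B
314    995           제주시내좌석        Jeju city seated               B
315    996           제주시내예비        Jeju city reserve              B
316    997           서귀포시내          Seogwipo city                  B
317    998           서귀포좌석          Seogwipo seated                B
318    999           서귀포예비          Seogwipo reserve               B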
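
Because 교통수단코드 is unique per row, the table works naturally as a lookup keyed by code. The sketch below is one possible in-memory representation, not something prescribed by the dataset: the `TransportMode` class, the `KIND_LABELS` mapping, and the `describe` helper are hypothetical names, and only three rows from the sample are included.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TransportMode:
    code: int   # 교통수단코드
    name: str   # 내용
    kind: str   # 구분R철도B버스T택시: "R", "B" or "T"

# Meaning of the category letters, as stated in the dataset description.
KIND_LABELS = {"R": "rail", "B": "bus", "T": "taxi"}

# A few rows copied from the sample tables.
ROWS = [
    TransportMode(100, "버스", "B"),
    TransportMode(105, "마을버스(105)", "B"),
    TransportMode(999, "서귀포예비", "B"),
]

# Codes are unique, so a plain dict keyed by code is sufficient.
BY_CODE = {mode.code: mode for mode in ROWS}

def describe(code):
    """Return a readable label for a code, or None if the code is unknown."""
    mode = BY_CODE.get(code)
    if mode is None:
        return None
    return f"{mode.code}: {mode.name} ({KIND_LABELS[mode.kind]})"

print(describe(105))
```

Loading all 319 rows from the source file into `ROWS` would give a complete lookup; unknown codes return `None` instead of raising.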