Overview

Dataset statistics

Number of variables: 4
Number of observations: 504
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 16.4 KiB
Average record size in memory: 33.3 B

Variable types

Categorical: 2
Text: 2

Dataset

Description: Sample data
Author: Seoul Metropolitan Government (smart card company)
URL: https://bigdata.seoul.go.kr/data/selectSampleData.do?sample_data_seq=13

Alerts

Code category name (GBN_NM) is highly overall correlated with Code category (GBN_CD) [High correlation]
Code category (GBN_CD) is highly overall correlated with Code category name (GBN_NM) [High correlation]
Code category (GBN_CD) is highly imbalanced (62.1%) [Imbalance]
Code category name (GBN_NM) is highly imbalanced (62.1%) [Imbalance]
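
The 62.1% imbalance figure can be reproduced from the value counts reported for the column (424, 42, 23, 13, 2). The following is a minimal sketch, assuming the score is one minus the Shannon entropy of the class distribution divided by log2 of the number of classes. The function name `imbalance_score` is illustrative and does not come from the report.

```python
from math import log2

def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform distribution, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k <= 1:
        # Degenerate case (assumption): a single class has nothing to be imbalanced against.
        return 0.0
    # Shannon entropy, in bits, of the observed class frequencies.
    entropy = -sum((c / total) * log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / log2(k)

# Value counts reported for GBN_CD (GBN_NM has the same counts).
counts = [424, 42, 23, 13, 2]
score = imbalance_score(counts)
print(f"{score:.1%}")  # prints 62.1%
```

Under this definition a perfectly balanced column scores 0, so a value above 60% reflects the dominance of the single category with 424 of 504 rows.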

Reproduction

Analysis started: 2024-01-14 06:50:15.371379
Analysis finished: 2024-01-14 06:50:15.725309
Duration: 0.35 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Code category (GBN_CD)
Categorical

Alerts: High correlation, Imbalance

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
3      424    84.1%
1      42     8.3%
2      23     4.6%
4      13     2.6%
5      2      0.4%
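
Each Frequency (%) entry is the count divided by the 504 observations. The short sketch below rebuilds the table with the standard library. The list `values` is a stand-in constructed from the reported counts, not the original file, and it assumes the codes are stored as strings.

```python
from collections import Counter

# Stand-in data rebuilt from the reported counts (assumption: codes are strings).
values = ["3"] * 424 + ["1"] * 42 + ["2"] * 23 + ["4"] * 13 + ["5"] * 2

counts = Counter(values)
total = sum(counts.values())

# most_common() sorts by descending count, which matches the report's ordering.
table = [(value, n, f"{100 * n / total:.1f}%") for value, n in counts.most_common()]
for row in table:
    print(*row, sep="\t")
```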

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the value counts listed under Common Values]

Code category name (GBN_NM)
Categorical

Alerts: High correlation, Imbalance

Distinct: 5
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 KiB

Length

Max length: 10
Median length: 7
Mean length: 6.9980159
Min length: 6
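
These length figures follow from the five category names and their counts. The sketch below recomputes them with the standard library, assuming that length means the number of Unicode code points in the original Korean strings.

```python
from statistics import mean, median

# Category names and counts as reported under Common Values for GBN_NM.
name_counts = {
    "교통카드발행사": 424,
    "교통수단코드": 42,
    "사용자구분코드": 23,
    "1회권사용자구분코드": 13,
    "1회권발행사ID": 2,
}

# One length per observation (504 in total).
lengths = [len(name) for name, n in name_counts.items() for _ in range(n)]

stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": round(mean(lengths), 7),
    "min": min(lengths),
}
print(stats)
```

The same recomputation can serve as a cross-check on the length statistics reported for the other columns.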

Unique

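Unique: 0
Unique (%): 0.0%

Sample

1st row: 교통수단코드 [transport mode code]
2nd row: 교통수단코드 [transport mode code]
3rd row: 교통수단코드 [transport mode code]
4th row: 교통수단코드 [transport mode code]
5th row: 교통수단코드 [transport mode code]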

Common Values

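Value                                                       Count  Frequency (%)
교통카드발행사 [transit card issuer]                          424    84.1%
교통수단코드 [transport mode code]                            42     8.3%
사용자구분코드 [user type code]                               23     4.6%
1회권사용자구분코드 [single-journey ticket user type code]    13     2.6%
1회권발행사ID [single-journey ticket issuer ID]               2      0.4%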

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the value counts listed under Common Values]

Code value (CODE)
Text

Distinct: 497
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 KiB

Length

Max length: 7
Median length: 7
Mean length: 6.3095238
Min length: 2

Characters and Unicode

Total characters: 3180
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
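
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below tallies categories for two code values taken from the sample rows. It is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Nd', 'Lu') over all characters."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)

# Two values from the CODE column: one all digits, one with a leading letter.
counts = category_counts(["105", "C900001"])
print(counts)  # 'Nd' = Decimal Number, 'Lu' = Uppercase Letter
```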

Unique

Unique: 490
Unique (%): 97.2%
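
Distinct and Unique measure different things here: 497 different values appear in the column, of which 490 occur exactly once. The sketch below shows the distinction. Its values are synthetic stand-ins chosen to reproduce the reported totals (490 values seen once plus 7 values seen twice); they are not the real codes.

```python
from collections import Counter

# Synthetic stand-in: 490 values seen once, 7 values seen twice (504 rows).
values = [f"once-{i}" for i in range(490)] + [f"twice-{i}" for i in range(7)] * 2

counts = Counter(values)
distinct = len(counts)                               # number of different values
unique = sum(1 for n in counts.values() if n == 1)   # values seen exactly once

print(len(values), distinct, unique)
print(f"{100 * distinct / len(values):.1f}% distinct, "
      f"{100 * unique / len(values):.1f}% unique")
```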

Sample

1st row: 105
2nd row: 115
3rd row: 120
4th row: 121
5th row: 130

Words

Value               Count  Frequency (%)
c900001             2      0.4%
22                  2      0.4%
23                  2      0.4%
24                  2      0.4%
13                  2      0.4%
c900008             2      0.4%
21                  2      0.4%
3119013             1      0.2%
3119002             1      0.2%
3119003             1      0.2%
Other values (487)  487    96.6%

Most occurring characters

Value             Count  Frequency (%)
0                 1113   35.0%
1                 599    18.8%
3                 400    12.6%
2                 333    10.5%
9                 158    5.0%
4                 156    4.9%
5                 154    4.8%
7                 108    3.4%
6                 85     2.7%
8                 68     2.1%
Other values (3)  6      0.2%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    3174   99.8%
Uppercase Letter  6      0.2%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
0      1113   35.1%
1      599    18.9%
3      400    12.6%
2      333    10.5%
9      158    5.0%
4      156    4.9%
5      154    4.9%
7      108    3.4%
6      85     2.7%
8      68     2.1%

Uppercase Letter

Value  Count  Frequency (%)
C      4      66.7%
S      1      16.7%
L      1      16.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  3174   99.8%
Latin   6      0.2%

Most frequent character per script

Common

Value  Count  Frequency (%)
0      1113   35.1%
1      599    18.9%
3      400    12.6%
2      333    10.5%
9      158    5.0%
4      156    4.9%
5      154    4.9%
7      108    3.4%
6      85     2.7%
8      68     2.1%

Latin

Value  Count  Frequency (%)
C      4      66.7%
S      1      16.7%
L      1      16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3180   100.0%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
0                 1113   35.0%
1                 599    18.8%
3                 400    12.6%
2                 333    10.5%
9                 158    5.0%
4                 156    4.9%
5                 154    4.8%
7                 108    3.4%
6                 85     2.7%
8                 68     2.1%
Other values (3)  6      0.2%

Code name (CODE_NM)
Text

Distinct: 182
Distinct (%): 36.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.1 KiB

Length

Max length: 24
Median length: 18
Mean length: 5.8293651
Min length: 2

Characters and Unicode

Total characters: 2938
Distinct characters: 193
Distinct categories: 10
Distinct scripts: 3
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
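
For a column that mixes Hangul and ASCII, a rough per-block tally can be obtained from code point ranges. The sketch below assumes only two ranges, Basic Latin (U+0000 to U+007F) and Hangul Syllables (U+AC00 to U+D7AF), and labels everything else "Other". The profiler's own block table is likely more complete, and the function name `block_of` is illustrative.

```python
from collections import Counter

def block_of(ch):
    """Classify a character into a coarse Unicode block by code point range."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"
    return "Other"

# One value from the sample rows: four Hangul syllables followed by "(120)".
blocks = Counter(block_of(ch) for ch in "지선버스(120)")
print(blocks)
```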

Unique

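Unique: 158
Unique (%): 31.3%

Sample

1st row: 마을버스(105) [village bus (105)]
2nd row: 간선버스 [trunk bus]
3rd row: 지선버스(120) [branch bus (120)]
4th row: 지선버스(121) [branch bus (121)]
5th row: 광역버스(130) [wide-area bus (130)]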

Words

Value                                         Count  Frequency (%)
티머니 [T-money]                               99     16.7%
레일플러스 [Rail Plus]                         70     11.8%
이비 [eB]                                      43     7.3%
한페이시스 [Hanpay Sys]                        38     6.4%
카드 [card]                                    29     4.9%
조합 [association]                             28     4.7%
선불 [prepaid]                                 27     4.6%
경기도버스조합 [Gyeonggi-do Bus Association]   18     3.0%
일반 [general]                                 10     1.7%
인천시버스조합 [Incheon Bus Association]       8      1.3%
Other values (173)                            223    37.6%

Most occurring characters

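Value               Count  Frequency (%)
(Hangul letter)     168    5.7%
(Hangul letter)     158    5.4%
(Hangul letter)     121    4.1%
(Hangul letter)     104    3.5%
(Hangul letter)     103    3.5%
(Hangul letter)     102    3.5%
(Hangul letter)     102    3.5%
(Hangul letter)     98     3.3%
(space)             89     3.0%
(Hangul letter)     72     2.5%
Other values (183)  1821   62.0%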

Most occurring categories

Value                  Count  Frequency (%)
Other Letter           2378   80.9%
Lowercase Letter       131    4.5%
Uppercase Letter       95     3.2%
Space Separator        89     3.0%
Decimal Number         86     2.9%
Connector Punctuation  64     2.2%
Close Punctuation      40     1.4%
Open Punctuation       40     1.4%
Dash Punctuation       14     0.5%
Other Punctuation      1      < 0.1%

Most frequent character per category

Other Letter

Value               Count  Frequency (%)
(Hangul letter)     168    7.1%
(Hangul letter)     158    6.6%
(Hangul letter)     121    5.1%
(Hangul letter)     104    4.4%
(Hangul letter)     103    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     98     4.1%
(Hangul letter)     72     3.0%
(Hangul letter)     70     2.9%
Other values (131)  1280   53.8%

Uppercase Letter

Value             Count  Frequency (%)
P                 12     12.6%
C                 11     11.6%
B                 9      9.5%
S                 9      9.5%
K                 7      7.4%
M                 6      6.3%
O                 6      6.3%
T                 6      6.3%
A                 5      5.3%
Q                 5      5.3%
Other values (9)  19     20.0%

Lowercase Letter

Value             Count  Frequency (%)
a                 39     29.8%
n                 27     20.6%
y                 17     13.0%
b                 16     12.2%
d                 6      4.6%
p                 4      3.1%
e                 4      3.1%
o                 4      3.1%
m                 3      2.3%
i                 2      1.5%
Other values (7)  9      6.9%

Decimal Number

Value  Count  Frequency (%)
1      32     37.2%
5      13     15.1%
0      8      9.3%
8      7      8.1%
4      7      8.1%
3      7      8.1%
2      6      7.0%
9      3      3.5%
7      2      2.3%
6      1      1.2%

Space Separator

Value    Count  Frequency (%)
(space)  89     100.0%

Connector Punctuation

Value  Count  Frequency (%)
_      64     100.0%

Close Punctuation

Value  Count  Frequency (%)
)      40     100.0%

Open Punctuation

Value  Count  Frequency (%)
(      40     100.0%

Dash Punctuation

Value  Count  Frequency (%)
-      14     100.0%

Other Punctuation

Value  Count  Frequency (%)
.      1      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  2378   80.9%
Common  334    11.4%
Latin   226    7.7%

Most frequent character per script

Hangul

Value               Count  Frequency (%)
(Hangul letter)     168    7.1%
(Hangul letter)     158    6.6%
(Hangul letter)     121    5.1%
(Hangul letter)     104    4.4%
(Hangul letter)     103    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     98     4.1%
(Hangul letter)     72     3.0%
(Hangul letter)     70     2.9%
Other values (131)  1280   53.8%

Latin

Value              Count  Frequency (%)
a                  39     17.3%
n                  27     11.9%
y                  17     7.5%
b                  16     7.1%
P                  12     5.3%
C                  11     4.9%
B                  9      4.0%
S                  9      4.0%
K                  7      3.1%
M                  6      2.7%
Other values (26)  73     32.3%

Common

Value             Count  Frequency (%)
(space)           89     26.6%
_                 64     19.2%
)                 40     12.0%
(                 40     12.0%
1                 32     9.6%
-                 14     4.2%
5                 13     3.9%
0                 8      2.4%
8                 7      2.1%
4                 7      2.1%
Other values (6)  20     6.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  2378   80.9%
ASCII   560    19.1%

Most frequent character per block

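Hangul

Value               Count  Frequency (%)
(Hangul letter)     168    7.1%
(Hangul letter)     158    6.6%
(Hangul letter)     121    5.1%
(Hangul letter)     104    4.4%
(Hangul letter)     103    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     102    4.3%
(Hangul letter)     98     4.1%
(Hangul letter)     72     3.0%
(Hangul letter)     70     2.9%
Other values (131)  1280   53.8%

ASCII

Value              Count  Frequency (%)
(space)            89     15.9%
_                  64     11.4%
)                  40     7.1%
(                  40     7.1%
a                  39     7.0%
1                  32     5.7%
n                  27     4.8%
y                  17     3.0%
b                  16     2.9%
-                  14     2.5%
Other values (42)  182    32.5%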

Correlations

[Figure: correlation heatmap]

                             Code category (GBN_CD)  Code category name (GBN_NM)
Code category (GBN_CD)       1.000                   1.000
Code category name (GBN_NM)  1.000                   1.000
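
A coefficient of 1.000 is what one would expect when every code maps to exactly one name. The report text does not say which measure was used. The sketch below assumes uncorrected Cramér's V, a common choice for two categorical columns, and rebuilds both columns from the reported counts. The English labels are shorthand for the five category names, and `cramers_v` is an illustrative helper, not the profiler's function.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# (code, shorthand name, count) triples rebuilt from the Common Values tables.
mapping = [
    ("3", "card issuer", 424),
    ("1", "transport mode", 42),
    ("2", "user type", 23),
    ("4", "single-ticket user type", 13),
    ("5", "single-ticket issuer", 2),
]
codes = [code for code, _, n in mapping for _ in range(n)]
names = [name for _, name, n in mapping for _ in range(n)]

v = cramers_v(codes, names)
print(f"{v:.3f}")
```

Because the two columns carry the same information, one of them could be dropped for modelling purposes without loss.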

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display which lets you quickly visually pick out patterns in data completion.]
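
The nullity plots summarize how many cells are missing in each column. The sketch below is a minimal version of that count, using two rows from the sample. It treats `None` or an empty string as missing, which is an assumption about how missing values would be encoded, and it uses the short identifiers (GBN_CD and so on) as dictionary keys for brevity.

```python
# Two rows taken from the sample tables (first and last row of the dataset).
rows = [
    {"GBN_CD": "1", "GBN_NM": "교통수단코드", "CODE": "105", "CODE_NM": "마을버스(105)"},
    {"GBN_CD": "5", "GBN_NM": "1회권발행사ID", "CODE": "C900008", "CODE_NM": "티머니"},
]

def missing_by_column(rows):
    """Return {column: number of missing cells}, preserving column order."""
    columns = list(rows[0]) if rows else []
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

missing = missing_by_column(rows)
print(missing)
```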

Sample

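First rows

Index  Code category (GBN_CD)  Code category name (GBN_NM)          Code value (CODE)  Code name (CODE_NM)
0      1                       교통수단코드 [transport mode code]     105                마을버스(105) [village bus (105)]
1      1                       교통수단코드 [transport mode code]     115                간선버스 [trunk bus]
2      1                       교통수단코드 [transport mode code]     120                지선버스(120) [branch bus (120)]
3      1                       교통수단코드 [transport mode code]     121                지선버스(121) [branch bus (121)]
4      1                       교통수단코드 [transport mode code]     130                광역버스(130) [wide-area bus (130)]
5      1                       교통수단코드 [transport mode code]     131                심야버스(131) [late-night bus (131)]
6      1                       교통수단코드 [transport mode code]     140                순환버스 [circular bus]
7      1                       교통수단코드 [transport mode code]     142                도심순환(142) [downtown circular (142)]
8      1                       교통수단코드 [transport mode code]     201                서울메트로 [Seoul Metro]
9      1                       교통수단코드 [transport mode code]     202                한국철도공사 [Korea Railroad Corporation]

Last rows

Index  Code category (GBN_CD)  Code category name (GBN_NM)                                 Code value (CODE)  Code name (CODE_NM)
494    4                       1회권사용자구분코드 [single-journey ticket user type code]   23                 장애 [disabled]
495    4                       1회권사용자구분코드 [single-journey ticket user type code]   24                 동반무임 [free-ride companion]
496    4                       1회권사용자구분코드 [single-journey ticket user type code]   41                 영어 일반 [English, general]
497    4                       1회권사용자구분코드 [single-journey ticket user type code]   42                 일어 일반 [Japanese, general]
498    4                       1회권사용자구분코드 [single-journey ticket user type code]   43                 중국어 일반 [Chinese, general]
499    4                       1회권사용자구분코드 [single-journey ticket user type code]   44                 영어 어린이 [English, child]
500    4                       1회권사용자구분코드 [single-journey ticket user type code]   45                 일어 어린이 [Japanese, child]
501    4                       1회권사용자구분코드 [single-journey ticket user type code]   46                 중국어 어린이 [Chinese, child]
502    5                       1회권발행사ID [single-journey ticket issuer ID]              C900001            코레일1회권 [Korail single-journey ticket]
503    5                       1회권발행사ID [single-journey ticket issuer ID]              C900008            티머니 [T-money]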