Overview

Dataset statistics

Number of variables: 9
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 810.5 KiB
Average record size in memory: 83.0 B
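
As a quick sanity check on the two memory figures above, the average record size should equal the total size divided by the number of observations. The sketch below only redoes that arithmetic with the reported numbers; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
total_size_kib = 810.5
n_observations = 10_000

# 1 KiB = 1024 bytes, so convert before dividing by the row count.
bytes_per_record = total_size_kib * 1024 / n_observations

print(f"{bytes_per_record:.1f} B per record")  # 83.0 B per record
```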

Variable types

Text: 1
Categorical: 5
Numeric: 3

Dataset

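Description: Data on unprocessed transactions, i.e. fares that were not paid at the fare gates, covering March 2006 through September 2023. The unprocessed status is reported monthly by station and card type.
Author: 대전교통공사 (Daejeon Transportation Corporation)
URL: https://www.data.go.kr/data/15122861/fileData.do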

Alerts

High correlation: 카드사구분 is highly overall correlated with 선후불구분
High correlation: 선후불구분 is highly overall correlated with 카드사구분
Imbalance: 선후불구분 is highly imbalanced (52.0%)
Skewed: 거래금액 is highly skewed (γ1 = 23.71684396)
Skewed: 선불잔액 is highly skewed (γ1 = 21.96614883)
Zeros: 거래금액 has 3223 (32.2%) zeros
Zeros: 선불잔액 has 8938 (89.4%) zeros
Zeros: 후불잔액 has 1341 (13.4%) zeros
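
The report does not say how the 52.0% imbalance figure for 선후불구분 is derived. One definition that is consistent with it is one minus the normalized Shannon entropy of the class counts (8965 and 1035, as listed for that variable). The sketch below assumes that definition; `imbalance_score` is a hypothetical helper, not a function from the profiling library.

```python
import math

def imbalance_score(counts):
    """Return 1 - (Shannon entropy / maximum entropy) for class counts.

    Assumed definition, chosen because it matches the reported alert value.
    Requires at least two classes; 0 means perfectly balanced, values near 1
    mean one class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts))
    return 1 - entropy / max_entropy

# Class counts of 선후불구분 as listed in the report.
score = imbalance_score([8965, 1035])
print(f"{score:.1%}")  # 52.0%
```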

Reproduction

Analysis started: 2023-12-12 03:57:18.034114
Analysis finished: 2023-12-12 03:57:21.416125
Duration: 3.38 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

정산일자 (settlement date)
Text

Distinct: 211
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 60000
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jul-09
2nd row: Jan-14
3rd row: Jun-08
4th row: Apr-06
5th row: Mar-17
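
The sample values follow a `Mon-YY` pattern, which is why the column is profiled as text rather than as a date. A minimal sketch of converting such labels into year and month with the standard library is shown below. It assumes English month abbreviations (Python's default locale) and relies on `%y` mapping two-digit years 00 to 68 onto 2000 to 2068, which covers the 2006 to 2023 range of this dataset. `parse_settlement_month` is a hypothetical helper name.

```python
from datetime import datetime

def parse_settlement_month(label):
    """Parse a 'Mon-YY' label such as 'Jul-09' into a (year, month) tuple."""
    parsed = datetime.strptime(label, "%b-%y")
    return parsed.year, parsed.month

# The five sample rows shown in the report.
samples = ["Jul-09", "Jan-14", "Jun-08", "Apr-06", "Mar-17"]
months = [parse_settlement_month(s) for s in samples]
print(months)
# [(2009, 7), (2014, 1), (2008, 6), (2006, 4), (2017, 3)]
```

Sorting on the resulting tuples gives chronological order, which sorting the raw strings does not.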
Value | Count | Frequency (%)
nov-18 | 887 | 8.9%
feb-11 | 437 | 4.4%
jan-14 | 340 | 3.4%
aug-16 | 292 | 2.9%
nov-11 | 243 | 2.4%
jun-11 | 196 | 2.0%
mar-11 | 163 | 1.6%
mar-19 | 150 | 1.5%
dec-07 | 144 | 1.4%
may-07 | 135 | 1.4%
Other values (201) | 7013 | 70.1%

Most occurring characters

Value | Count | Frequency (%)
- | 10000 | 16.7%
1 | 7716 | 12.9%
0 | 3206 | 5.3%
a | 2520 | 4.2%
J | 2231 | 3.7%
e | 2227 | 3.7%
2 | 2223 | 3.7%
u | 2218 | 3.7%
8 | 2122 | 3.5%
r | 1777 | 3.0%
Other values (23) | 23760 | 39.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20000 | 33.3%
Lowercase Letter | 20000 | 33.3%
Dash Punctuation | 10000 | 16.7%
Uppercase Letter | 10000 | 16.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 2520 | 12.6%
e | 2227 | 11.1%
u | 2218 | 11.1%
r | 1777 | 8.9%
o | 1677 | 8.4%
v | 1677 | 8.4%
n | 1606 | 8.0%
p | 1366 | 6.8%
c | 1273 | 6.4%
b | 946 | 4.7%
Other values (4) | 2713 | 13.6%

Decimal Number
Value | Count | Frequency (%)
1 | 7716 | 38.6%
0 | 3206 | 16.0%
2 | 2223 | 11.1%
8 | 2122 | 10.6%
9 | 1325 | 6.6%
7 | 1095 | 5.5%
6 | 934 | 4.7%
3 | 543 | 2.7%
4 | 504 | 2.5%
5 | 332 | 1.7%

Uppercase Letter
Value | Count | Frequency (%)
J | 2231 | 22.3%
N | 1677 | 16.8%
M | 1659 | 16.6%
A | 1650 | 16.5%
F | 946 | 9.5%
D | 717 | 7.2%
S | 564 | 5.6%
O | 556 | 5.6%

Dash Punctuation
Value | Count | Frequency (%)
- | 10000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 30000 | 50.0%
Latin | 30000 | 50.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 2520 | 8.4%
J | 2231 | 7.4%
e | 2227 | 7.4%
u | 2218 | 7.4%
r | 1777 | 5.9%
N | 1677 | 5.6%
o | 1677 | 5.6%
v | 1677 | 5.6%
M | 1659 | 5.5%
A | 1650 | 5.5%
Other values (12) | 10687 | 35.6%

Common
Value | Count | Frequency (%)
- | 10000 | 33.3%
1 | 7716 | 25.7%
0 | 3206 | 10.7%
2 | 2223 | 7.4%
8 | 2122 | 7.1%
9 | 1325 | 4.4%
7 | 1095 | 3.6%
6 | 934 | 3.1%
3 | 543 | 1.8%
4 | 504 | 1.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 60000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
- | 10000 | 16.7%
1 | 7716 | 12.9%
0 | 3206 | 5.3%
a | 2520 | 4.2%
J | 2231 | 3.7%
e | 2227 | 3.7%
2 | 2223 | 3.7%
u | 2218 | 3.7%
8 | 2122 | 3.5%
r | 1777 | 3.0%
Other values (23) | 23760 | 39.6%

역이름 (station name)
Categorical

Distinct: 22
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

대전역: 922
유성온천역: 695
정부청사역: 682
서대전네거리역: 643
시청역: 633
Other values (17): 6425

Length

Max length: 7
Median length: 3
Mean length: 3.7749
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 현충원역
2nd row: 월드컵경기장역
3rd row: 용문역
4th row: 대동역
5th row: 대전역

Common Values

Value | Count | Frequency (%)
대전역 | 922 | 9.2%
유성온천역 | 695 | 7.0%
정부청사역 | 682 | 6.8%
서대전네거리역 | 643 | 6.4%
시청역 | 633 | 6.3%
용문역 | 601 | 6.0%
탄방역 | 596 | 6.0%
중앙로역 | 593 | 5.9%
반석역 | 437 | 4.4%
월평역 | 435 | 4.3%
Other values (12) | 3763 | 37.6%

Length

Histogram of lengths of the category

선후불구분 (prepaid/postpaid type)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

선불 (prepaid): 8965
후불 (postpaid): 1035

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 선불
2nd row: 선불
3rd row: 선불
4th row: 선불
5th row: 선불

Common Values

Value | Count | Frequency (%)
선불 | 8965 | 89.6%
후불 | 1035 | 10.3%

Length

Histogram of lengths of the category


카드사구분 (card issuer type)
Categorical

HIGH CORRELATION

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

교통카드운임(국민): 1894
교통카드운임(BC): 1188
교통카드운임(하나): 1046
교통카드운임(신한): 1022
교통카드운임(삼성): 952
Other values (12): 3898

Length

Max length: 11
Median length: 10
Mean length: 10.0094
Min length: 9

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 교통카드운임(국민)
2nd row: 교통카드운임(신한)
3rd row: 교통카드운임(LG)
4th row: 교통카드운임(BC)
5th row: 교통카드운임(신한)

Common Values

Value | Count | Frequency (%)
교통카드운임(국민) | 1894 | 18.9%
교통카드운임(BC) | 1188 | 11.9%
교통카드운임(하나) | 1046 | 10.5%
교통카드운임(신한) | 1022 | 10.2%
교통카드운임(삼성) | 952 | 9.5%
교통카드운임(현대) | 601 | 6.0%
교통카드운임(농협) | 592 | 5.9%
교통카드운임(LG) | 581 | 5.8%
한꿈이충전(선불) | 513 | 5.1%
교통카드운임(외환) | 510 | 5.1%
Other values (7) | 1101 | 11.0%

Length

Histogram of lengths of the category

거래금액 (transaction amount)
Real number (ℝ)

SKEWED, ZEROS

Distinct: 472
Distinct (%): 4.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4370.9773
Minimum: 0
Maximum: 1345160
Zeros: 3223
Zeros (%): 32.2%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1100
Q3: 1250
95-th percentile: 13750
Maximum: 1345160
Range: 1345160
Interquartile range (IQR): 1250

Descriptive statistics

Standard deviation: 25363.355
Coefficient of variation (CV): 5.8026738
Kurtosis: 967.93883
Mean: 4370.9773
Median Absolute Deviation (MAD): 1100
Skewness: 23.716844
Sum: 43709773
Variance: 6.432998 × 10^8
Monotonicity: Not monotonic
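
Several of the descriptive statistics for 거래금액 are derived from one another, so they can be cross-checked: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below redoes that arithmetic from the reported values only; it does not touch the underlying data, and the variable names are illustrative.

```python
# Reported values for 거래금액, copied from the report.
mean = 4370.9773
std = 25363.355
n_observations = 10_000

cv = std / mean                 # coefficient of variation
variance = std ** 2             # variance is the squared standard deviation
total = mean * n_observations   # sum of all values

print(f"CV       = {cv:.4f}")        # reported: 5.8026738
print(f"Variance = {variance:.4e}")  # reported: 6.432998 x 10^8
print(f"Sum      = {total:.0f}")     # reported: 43709773
```

The variance check also confirms that the report's variance is a power-of-ten figure of order 10^8, not the literal number 108.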
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 3223 | 32.2%
1250 | 2226 | 22.3%
950 | 600 | 6.0%
1100 | 554 | 5.5%
100 | 535 | 5.3%
2500 | 373 | 3.7%
1350 | 318 | 3.2%
200 | 124 | 1.2%
1900 | 118 | 1.2%
2200 | 99 | 1.0%
Other values (462) | 1830 | 18.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 3223 | 32.2%
1 | 3 | < 0.1%
80 | 8 | 0.1%
100 | 535 | 5.3%
150 | 10 | 0.1%
180 | 1 | < 0.1%
200 | 124 | 1.2%
230 | 3 | < 0.1%
250 | 4 | < 0.1%
280 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1345160 | 1 | < 0.1%
867910 | 1 | < 0.1%
445960 | 1 | < 0.1%
432250 | 1 | < 0.1%
354830 | 1 | < 0.1%
345610 | 1 | < 0.1%
336300 | 1 | < 0.1%
329650 | 1 | < 0.1%
328700 | 1 | < 0.1%
320090 | 1 | < 0.1%

선불잔액 (prepaid balance)
Real number (ℝ)

SKEWED, ZEROS

Distinct: 872
Distinct (%): 8.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26315.325
Minimum: 0
Maximum: 11987730
Zeros: 8938
Zeros (%): 89.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 28750
Maximum: 11987730
Range: 11987730
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 253011.28
Coefficient of variation (CV): 9.6145984
Kurtosis: 740.4862
Mean: 26315.325
Median Absolute Deviation (MAD): 0
Skewness: 21.966149
Sum: 2.6315325 × 10^8
Variance: 6.401471 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 8938 | 89.4%
28750 | 37 | 0.4%
1800 | 8 | 0.1%
1250 | 6 | 0.1%
27500 | 6 | 0.1%
2500 | 5 | 0.1%
4050 | 5 | 0.1%
57500 | 5 | 0.1%
4800 | 4 | < 0.1%
2100 | 4 | < 0.1%
Other values (862) | 982 | 9.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 8938 | 89.4%
8 | 1 | < 0.1%
50 | 1 | < 0.1%
140 | 2 | < 0.1%
150 | 1 | < 0.1%
200 | 2 | < 0.1%
230 | 1 | < 0.1%
250 | 1 | < 0.1%
300 | 1 | < 0.1%
340 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
11987730 | 1 | < 0.1%
8771804 | 1 | < 0.1%
5642659 | 1 | < 0.1%
5075220 | 1 | < 0.1%
3976253 | 1 | < 0.1%
3971150 | 1 | < 0.1%
3844580 | 1 | < 0.1%
3542323 | 1 | < 0.1%
3464730 | 1 | < 0.1%
3427790 | 1 | < 0.1%

후불잔액 (postpaid balance)
Real number (ℝ)

ZEROS

Distinct: 2160
Distinct (%): 21.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74747.3
Minimum: 0
Maximum: 6500564
Zeros: 1341
Zeros (%): 13.4%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1250
Median: 2600
Q3: 19312.5
95-th percentile: 371730
Maximum: 6500564
Range: 6500564
Interquartile range (IQR): 18062.5

Descriptive statistics

Standard deviation: 302859.69
Coefficient of variation (CV): 4.0517811
Kurtosis: 98.062928
Mean: 74747.3
Median Absolute Deviation (MAD): 2600
Skewness: 8.3560124
Sum: 7.47473 × 10^8
Variance: 9.1723995 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1250 | 1513 | 15.1%
0 | 1341 | 13.4%
950 | 487 | 4.9%
2500 | 427 | 4.3%
1100 | 394 | 3.9%
1350 | 263 | 2.6%
3750 | 178 | 1.8%
1900 | 167 | 1.7%
2200 | 101 | 1.0%
2850 | 95 | 0.9%
Other values (2150) | 5034 | 50.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1341 | 13.4%
450 | 3 | < 0.1%
500 | 2 | < 0.1%
550 | 1 | < 0.1%
800 | 38 | 0.4%
880 | 1 | < 0.1%
900 | 1 | < 0.1%
950 | 487 | 4.9%
1050 | 61 | 0.6%
1100 | 394 | 3.9%

Maximum 10 values
Value | Count | Frequency (%)
6500564 | 1 | < 0.1%
6055594 | 1 | < 0.1%
5184020 | 1 | < 0.1%
4854167 | 1 | < 0.1%
4767292 | 1 | < 0.1%
4341828 | 1 | < 0.1%
3949214 | 1 | < 0.1%
3695426 | 1 | < 0.1%
3531414 | 1 | < 0.1%
3482600 | 1 | < 0.1%

승하차구분 (boarding/alighting type)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

승차 (boarding): 6637
하차 (alighting): 3363

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 승차
2nd row: 하차
3rd row: 승차
4th row: 하차
5th row: 승차

Common Values

Value | Count | Frequency (%)
승차 | 6637 | 66.4%
하차 | 3363 | 33.6%

Length

Histogram of lengths of the category


환승구분 (transfer flag)
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

X: 8363
O: 1637

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: X
2nd row: X
3rd row: X
4th row: X
5th row: X

Common Values

Value | Count | Frequency (%)
X | 8363 | 83.6%
O | 1637 | 16.4%

Length

Histogram of lengths of the category


Interactions


Correlations

 | 역이름 | 선후불구분 | 카드사구분 | 거래금액 | 선불잔액 | 후불잔액 | 승하차구분 | 환승구분
역이름 | 1.000 | 0.061 | 0.086 | 0.000 | 0.000 | 0.060 | 0.090 | 0.098
선후불구분 | 0.061 | 1.000 | 0.963 | 0.155 | 0.220 | 0.053 | 0.349 | 0.092
카드사구분 | 0.086 | 0.963 | 1.000 | 0.148 | 0.210 | 0.055 | 0.343 | 0.117
거래금액 | 0.000 | 0.155 | 0.148 | 1.000 | 0.857 | 0.515 | 0.151 | 0.037
선불잔액 | 0.000 | 0.220 | 0.210 | 0.857 | 1.000 | 0.000 | 0.066 | 0.000
후불잔액 | 0.060 | 0.053 | 0.055 | 0.515 | 0.000 | 1.000 | 0.120 | 0.031
승하차구분 | 0.090 | 0.349 | 0.343 | 0.151 | 0.066 | 0.120 | 1.000 | 0.295
환승구분 | 0.098 | 0.092 | 0.117 | 0.037 | 0.000 | 0.031 | 0.295 | 1.000

 | 승하차구분 | 환승구분 | 카드사구분 | 역이름 | 선후불구분
승하차구분 | 1.000 | 0.191 | 0.308 | 0.071 | 0.227
환승구분 | 0.191 | 1.000 | 0.105 | 0.078 | 0.059
카드사구분 | 0.308 | 0.105 | 1.000 | 0.026 | 0.960
역이름 | 0.071 | 0.078 | 0.026 | 1.000 | 0.048
선후불구분 | 0.227 | 0.059 | 0.960 | 0.048 | 1.000

 | 거래금액 | 선불잔액 | 후불잔액 | 역이름 | 선후불구분 | 카드사구분 | 승하차구분 | 환승구분
거래금액 | 1.000 | 0.047 | 0.026 | 0.000 | 0.111 | 0.070 | 0.109 | 0.027
선불잔액 | 0.047 | 1.000 | -0.487 | 0.000 | 0.235 | 0.096 | 0.071 | 0.000
후불잔액 | 0.026 | -0.487 | 1.000 | 0.023 | 0.053 | 0.022 | 0.120 | 0.031
역이름 | 0.000 | 0.000 | 0.023 | 1.000 | 0.048 | 0.026 | 0.071 | 0.078
선후불구분 | 0.111 | 0.235 | 0.053 | 0.048 | 1.000 | 0.960 | 0.227 | 0.059
카드사구분 | 0.070 | 0.096 | 0.022 | 0.026 | 0.960 | 1.000 | 0.308 | 0.105
승하차구분 | 0.109 | 0.071 | 0.120 | 0.071 | 0.227 | 0.308 | 1.000 | 0.191
환승구분 | 0.027 | 0.000 | 0.031 | 0.078 | 0.059 | 0.105 | 0.191 | 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

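(index) | 정산일자 | 역이름 | 선후불구분 | 카드사구분 | 거래금액, 선불잔액, 후불잔액 (digits run together in source) | 승하차구분 | 환승구분
2333 | Jul-09 | 현충원역 | 선불 | 교통카드운임(국민) | 9500950 | 승차 | X
5211 | Jan-14 | 월드컵경기장역 | 선불 | 교통카드운임(신한) | 6600018650 | 하차 | X
1499 | Jun-08 | 용문역 | 선불 | 교통카드운임(LG) | 001900 | 승차 | X
140 | Apr-06 | 대동역 | 선불 | 교통카드운임(BC) | 2400023000 | 하차 | X
6552 | Mar-17 | 대전역 | 선불 | 교통카드운임(신한) | 125001250 | 승차 | X
1963 | Jan-09 | 서대전네거리역 | 선불 | 교통카드운임(국민) | 100050000 | 승차 | X
421 | Mar-07 | 지족역 | 선불 | 교통카드운임(BC) | 0052000 | 승차 | X
7240 | Jul-18 | 탄방역 | 선불 | 교통카드운임(하나) | 250007500 | 하차 | X
9832 | Apr-21 | 용문역 | 선불 | 교통카드운임(하나) | 125001250 | 승차 | X
3603 | Mar-11 | 용문역 | 선불 | 교통카드운임(국민) | 38000117250 | 하차 | X
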
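(index) | 정산일자 | 역이름 | 선후불구분 | 카드사구분 | 거래금액, 선불잔액, 후불잔액 (digits run together in source) | 승하차구분 | 환승구분
1307 | Apr-08 | 대전역 | 선불 | 교통카드운임(국민) | 20003150 | 승차 | O
239 | Jun-06 | 시청역 | 선불 | 교통카드운임(BC) | 006850 | 승차 | X
10829 | Jun-23 | 중앙로역 | 선불 | 교통카드운임(신한) | 125001250 | 승차 | X
7156 | May-18 | 유성온천역 | 후불 | 교통카드운임(유페이) | 094400 | 승차 | X
5467 | Feb-15 | 대동역 | 선불 | 교통카드운임(삼성) | 005600 | 승차 | X
931 | Nov-07 | 중앙로역 | 선불 | 교통카드운임(국민) | 0055450 | 승차 | X
7556 | Nov-18 | 대전역 | 선불 | 교통카드운임(신한) | 435002338392 | 승차 | X
1152 | Jan-08 | 용문역 | 선불 | 교통카드운임(BC) | 100019400 | 승차 | X
3907 | Jun-11 | 시청역 | 선불 | 교통카드운임(외환) | 5700074300 | 하차 | X
3370 | Feb-11 | 정부청사역 | 선불 | 교통카드운임(하나) | 161500356800 | 하차 | X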