Overview

Dataset statistics

Number of variables: 9
Number of observations: 89
Missing cells: 115
Missing cells (%): 14.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.2 KiB
Average record size in memory: 82.5 B
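
As a quick sanity check, the missing-cell figures can be recomputed from the per-column missing counts reported in the Alerts and Variables sections. The sketch below assumes that the percentage is taken over all cells (variables × observations); that assumption is consistent with the 14.4% shown here.

```python
# Figures copied from this report; the per-column counts come from the
# "Missing" alerts (일반수, 무정차수, 갱신아이디).
n_variables = 9
n_observations = 89
missing_per_column = {"일반수": 13, "무정차수": 13, "갱신아이디": 89}

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

print(f"{missing_cells} of {total_cells} cells missing ({missing_pct:.1f}%)")
```

The fully empty 갱신아이디 column accounts for 89 of the 115 missing cells, so dropping it would remove most of the missingness.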

Variable types

Numeric: 5
Categorical: 3
Unsupported: 1

Dataset

Description: Gyeonggi-do BMS spare vehicle status information (경기도_BMS 예비차 상태 정보)
Author: Gyeonggi-do (경기도)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=GTBX9KXJQ425RIVG609U33362683&infSeq=1

Alerts

업체아이디 is highly overall correlated with 시외무정차수 (High correlation)
예비차총수 is highly overall correlated with 일반수 and 4 other fields (High correlation)
일반수 is highly overall correlated with 예비차총수 (High correlation)
무정차수 is highly overall correlated with 예비차총수 and 1 other field (High correlation)
갱신일자 is highly overall correlated with 예비차총수 and 1 other field (High correlation)
좌석수 is highly overall correlated with 예비차총수 (High correlation)
급행차수 is highly overall correlated with 무정차수 (High correlation)
시외무정차수 is highly overall correlated with 업체아이디 and 2 other fields (High correlation)
좌석수 is highly imbalanced (53.0%) (Imbalance)
급행차수 is highly imbalanced (60.3%) (Imbalance)
시외무정차수 is highly imbalanced (63.3%) (Imbalance)
일반수 has 13 (14.6%) missing values (Missing)
무정차수 has 13 (14.6%) missing values (Missing)
갱신아이디 has 89 (100.0%) missing values (Missing)
업체아이디 has unique values (Unique)
갱신아이디 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
예비차총수 has 25 (28.1%) zeros (Zeros)
일반수 has 35 (39.3%) zeros (Zeros)
무정차수 has 43 (48.3%) zeros (Zeros)
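
The report does not state how the imbalance percentages are derived. One candidate is a normalized-entropy score, 1 - H / log2(k), where H is the Shannon entropy of the category frequencies and k is the number of categories. The sketch below is an assumption, not a confirmed description of the tool's internals; the function name `imbalance_score` is hypothetical, and the counts are copied from the Common Values tables of the three categorical variables (with `<NA>` counted as its own category, as the report does).

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k) for a list of category counts.

    H is the Shannon entropy in bits; k is the number of categories.
    0 means perfectly balanced, values near 1 mean one category dominates.
    """
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Category counts as listed in this report ("<NA>" is treated as a category).
category_counts = {
    "좌석수": [71, 13, 4, 1],
    "급행차수": [72, 13, 2, 1, 1],
    "시외무정차수": [76, 7, 2, 2, 2],
}

scores = {name: imbalance_score(c) for name, c in category_counts.items()}
for name, score in scores.items():
    print(f"{name}: {100 * score:.1f}%")
```

Under this assumption the three scores round to 53.0%, 60.3%, and 63.3%, matching the Imbalance alerts.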

Reproduction

Analysis started: 2023-12-10 22:30:56.708998
Analysis finished: 2023-12-10 22:30:59.365647
Duration: 2.66 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

업체아이디 (company ID)
Real number (ℝ)

HIGH CORRELATION, UNIQUE

Distinct: 89
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4112749.4
Minimum: 4100200
Maximum: 4155200
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 933.0 B

Quantile statistics

Minimum: 4100200
5-th percentile: 4100640
Q1: 4102700
Median: 4106300
Q3: 4110700
95-th percentile: 4151640
Maximum: 4155200
Range: 55000
Interquartile range (IQR): 8000

Descriptive statistics

Standard deviation: 17329.737
Coefficient of variation (CV): 0.0042136624
Kurtosis: 1.4466162
Mean: 4112749.4
Median Absolute Deviation (MAD): 4100
Skewness: 1.7730721
Sum: 3.660347 × 10^8
Variance: 3.003198 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
4111700             1      1.1%
4103900             1      1.1%
4108900             1      1.1%
4105900             1      1.1%
4106100             1      1.1%
4110600             1      1.1%
4103600             1      1.1%
4107400             1      1.1%
4103500             1      1.1%
4111300             1      1.1%
Other values (79)   79     88.8%

Minimum 10 values
Value     Count  Frequency (%)
4100200   1      1.1%
4100300   1      1.1%
4100400   1      1.1%
4100500   1      1.1%
4100600   1      1.1%
4100700   1      1.1%
4100800   1      1.1%
4100900   1      1.1%
4101100   1      1.1%
4101200   1      1.1%

Maximum 10 values
Value     Count  Frequency (%)
4155200   1      1.1%
4155100   1      1.1%
4155000   1      1.1%
4153600   1      1.1%
4151800   1      1.1%
4151400   1      1.1%
4151100   1      1.1%
4150700   1      1.1%
4150600   1      1.1%
4150500   1      1.1%

예비차총수 (total spare vehicles)
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 22
Distinct (%): 24.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.2134831
Minimum: 0
Maximum: 36
Zeros: 25
Zeros (%): 28.1%
Negative: 0
Negative (%): 0.0%
Memory size: 933.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 7
95-th percentile: 23.2
Maximum: 36
Range: 36
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 7.5309542
Coefficient of variation (CV): 1.4445149
Kurtosis: 4.3877762
Mean: 5.2134831
Median Absolute Deviation (MAD): 2
Skewness: 2.1265539
Sum: 464
Variance: 56.715271
Monotonicity: Not monotonic
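
Several of these descriptive statistics are simple functions of one another. The sketch below re-derives the coefficient of variation, variance, range, and IQR for 예비차총수 from the mean, standard deviation, and quantiles reported above, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared). All variable names are illustrative.

```python
# Values copied from the 예비차총수 statistics in this report.
mean = 5.2134831
std = 7.5309542
q1, q3 = 0, 7
minimum, maximum = 0, 36

cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range

print(f"CV={cv:.7f} variance={variance:.6f} IQR={iqr} range={value_range}")
```

A CV above 1 together with a median of 2 and a maximum of 36 points to a strongly right-skewed count distribution, in line with the reported skewness of about 2.13.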
Histogram with fixed size bins (bins=22)
Common values
Value               Count  Frequency (%)
0                   25     28.1%
1                   12     13.5%
2                   9      10.1%
3                   9      10.1%
4                   6      6.7%
7                   4      4.5%
5                   3      3.4%
6                   2      2.2%
15                  2      2.2%
8                   2      2.2%
Other values (12)   15     16.9%

Minimum 10 values
Value  Count  Frequency (%)
0      25     28.1%
1      12     13.5%
2      9      10.1%
3      9      10.1%
4      6      6.7%
5      3      3.4%
6      2      2.2%
7      4      4.5%
8      2      2.2%
9      2      2.2%

Maximum 10 values
Value  Count  Frequency (%)
36     1      1.1%
28     2      2.2%
27     1      1.1%
24     1      1.1%
22     1      1.1%
19     1      1.1%
17     1      1.1%
15     2      2.2%
14     1      1.1%
13     2      2.2%

일반수 (regular-bus count)
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 15
Distinct (%): 19.7%
Missing: 13
Missing (%): 14.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9736842
Minimum: 0
Maximum: 27
Zeros: 35
Zeros (%): 39.3%
Negative: 0
Negative (%): 0.0%
Memory size: 933.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 13
Maximum: 27
Range: 27
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 4.8825504
Coefficient of variation (CV): 1.6419196
Kurtosis: 7.7040092
Mean: 2.9736842
Median Absolute Deviation (MAD): 1
Skewness: 2.5102497
Sum: 226
Variance: 23.839298
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values
Value              Count  Frequency (%)
0                  35     39.3%
1                  10     11.2%
3                  6      6.7%
4                  5      5.6%
5                  4      4.5%
2                  4      4.5%
7                  2      2.2%
12                 2      2.2%
13                 2      2.2%
9                  1      1.1%
Other values (5)   5      5.6%
(Missing)          13     14.6%

Minimum 10 values
Value  Count  Frequency (%)
0      35     39.3%
1      10     11.2%
2      4      4.5%
3      6      6.7%
4      5      5.6%
5      4      4.5%
7      2      2.2%
8      1      1.1%
9      1      1.1%
11     1      1.1%

Maximum 10 values
Value  Count  Frequency (%)
27     1      1.1%
16     1      1.1%
15     1      1.1%
13     2      2.2%
12     2      2.2%
11     1      1.1%
9      1      1.1%
8      1      1.1%
7      2      2.2%
5      4      4.5%

좌석수 (seated-bus count)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 4
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Memory size: 844.0 B

Length

Max length: 4
Median length: 1
Mean length: 1.4382022
Min length: 1

Unique

Unique: 1
Unique (%): 1.1%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      71     79.8%
<NA>   13     14.6%
1      4      4.5%
4      1      1.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the value counts 0 (71), na (13), 1 (4), 4 (1)]

무정차수 (non-stop count)
Real number (ℝ)

HIGH CORRELATION, MISSING, ZEROS

Distinct: 15
Distinct (%): 19.7%
Missing: 13
Missing (%): 14.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.6447368
Minimum: 0
Maximum: 27
Zeros: 43
Zeros (%): 48.3%
Negative: 0
Negative (%): 0.0%
Memory size: 933.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 3
95-th percentile: 12
Maximum: 27
Range: 27
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 5.0377348
Coefficient of variation (CV): 1.9048151
Kurtosis: 8.6925659
Mean: 2.6447368
Median Absolute Deviation (MAD): 0
Skewness: 2.7795179
Sum: 201
Variance: 25.378772
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)
Common values
Value              Count  Frequency (%)
0                  43     48.3%
3                  7      7.9%
1                  6      6.7%
2                  4      4.5%
5                  3      3.4%
11                 2      2.2%
4                  2      2.2%
8                  2      2.2%
20                 1      1.1%
9                  1      1.1%
Other values (5)   5      5.6%
(Missing)          13     14.6%

Minimum 10 values
Value  Count  Frequency (%)
0      43     48.3%
1      6      6.7%
2      4      4.5%
3      7      7.9%
4      2      2.2%
5      3      3.4%
7      1      1.1%
8      2      2.2%
9      1      1.1%
10     1      1.1%

Maximum 10 values
Value  Count  Frequency (%)
27     1      1.1%
20     1      1.1%
17     1      1.1%
15     1      1.1%
11     2      2.2%
10     1      1.1%
9      1      1.1%
8      2      2.2%
7      1      1.1%
5      3      3.4%

급행차수 (express count)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 844.0 B

Length

Max length: 4
Median length: 1
Mean length: 1.4494382
Min length: 1

Unique

Unique: 2
Unique (%): 2.2%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      72     80.9%
<NA>   13     14.6%
1      2      2.2%
13     1      1.1%
2      1      1.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the value counts 0 (72), na (13), 1 (2), 13 (1), 2 (1)]

시외무정차수 (intercity non-stop count)
Categorical

HIGH CORRELATION, IMBALANCE

Distinct: 5
Distinct (%): 5.6%
Missing: 0
Missing (%): 0.0%
Memory size: 844.0 B

Length

Max length: 4
Median length: 4
Mean length: 3.5617978
Min length: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <NA>
2nd row: <NA>
3rd row: <NA>
4th row: <NA>
5th row: <NA>

Common Values

Value  Count  Frequency (%)
<NA>   76     85.4%
0      7      7.9%
3      2      2.2%
1      2      2.2%
2      2      2.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: bar plot of the value counts na (76), 0 (7), 3 (2), 1 (2), 2 (2)]

갱신일자 (update timestamp)
Real number (ℝ)

HIGH CORRELATION

Distinct: 40
Distinct (%): 44.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0223948 × 10^13
Minimum: 2.0220701 × 10^13
Maximum: 2.023051 × 10^13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 933.0 B

Quantile statistics

Minimum: 2.0220701 × 10^13
5-th percentile: 2.0220701 × 10^13
Q1: 2.0221209 × 10^13
Median: 2.0221209 × 10^13
Q3: 2.0230112 × 10^13
95-th percentile: 2.023051 × 10^13
Maximum: 2.023051 × 10^13
Range: 9.8090255 × 10^9
Interquartile range (IQR): 8.9030281 × 10^9

Descriptive statistics

Standard deviation: 4.2623893 × 10^9
Coefficient of variation (CV): 0.00021075951
Kurtosis: -1.2668479
Mean: 2.0223948 × 10^13
Median Absolute Deviation (MAD): 2
Skewness: 0.86847063
Sum: 1.7999314 × 10^15
Variance: 1.8167963 × 10^19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Common values
Value               Count  Frequency (%)
20221209125050      16     18.0%
20221209125048      15     16.9%
20221209125049      15     16.9%
20220701090116      3      3.4%
20221209125047      3      3.4%
20220701090115      3      3.4%
20220701085634      1      1.1%
20230510110659      1      1.1%
20230510110339      1      1.1%
20230510110406      1      1.1%
Other values (30)   30     33.7%

Minimum 10 values
Value            Count  Frequency (%)
20220701085634   1      1.1%
20220701090115   3      3.4%
20220701090116   3      3.4%
20221209125047   3      3.4%
20221209125048   15     16.9%
20221209125049   15     16.9%
20221209125050   16     18.0%
20221216094014   1      1.1%
20221223135829   1      1.1%
20221223140025   1      1.1%

Maximum 10 values
Value            Count  Frequency (%)
20230510111110   1      1.1%
20230510111045   1      1.1%
20230510110919   1      1.1%
20230510110659   1      1.1%
20230510110613   1      1.1%
20230510110534   1      1.1%
20230510110502   1      1.1%
20230510110432   1      1.1%
20230510110406   1      1.1%
20230510110339   1      1.1%
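
The profiler treats 갱신일자 as a real number, but the values look like timestamps packed as YYYYMMDDhhmmss integers. If that reading is correct (it is an assumption based on the digit pattern, not something the report states), statistics such as the mean, variance, and MAD above carry little meaning, and the column is better parsed before analysis. A minimal sketch, with the hypothetical helper `parse_update_timestamp`:

```python
from datetime import datetime

# Three raw values copied from the value tables in this report:
# the smallest, the most frequent, and the largest.
raw_values = [20220701085634, 20221209125050, 20230510111110]

def parse_update_timestamp(value: int) -> datetime:
    """Parse an integer such as 20221209125050, assuming a YYYYMMDDhhmmss layout."""
    return datetime.strptime(str(value), "%Y%m%d%H%M%S")

parsed = [parse_update_timestamp(v) for v in raw_values]
for raw, ts in zip(raw_values, parsed):
    print(raw, "->", ts.isoformat())

# Elapsed time between the earliest and latest update seen in the report.
span = parsed[-1] - parsed[0]
print("span in days:", span.days)
```

Once parsed, differences between updates become real durations instead of differences between packed integers.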

갱신아이디 (update ID)
Unsupported

MISSING, REJECTED, UNSUPPORTED

Missing: 89
Missing (%): 100.0%
Memory size: 933.0 B

Interactions

[Figures: 25 pairwise interaction plots of the 5 numeric variables]

Correlations

              업체아이디  예비차총수  일반수  좌석수  무정차수  급행차수  시외무정차수  갱신일자
업체아이디      1.000      0.110       0.000   0.000   0.000     0.000     NaN           0.125
예비차총수      0.110      1.000       0.801   0.873   0.931     0.529     NaN           0.524
일반수          0.000      0.801       1.000   0.528   0.364     0.336     NaN           0.306
좌석수          0.000      0.873       0.528   1.000   0.807     0.303     NaN           0.065
무정차수        0.000      0.931       0.364   0.807   1.000     0.693     NaN           0.536
급행차수        0.000      0.529       0.336   0.303   0.693     1.000     NaN           0.354
시외무정차수    NaN        NaN         NaN     NaN     NaN       NaN       1.000         0.904
갱신일자        0.125      0.524       0.306   0.065   0.536     0.354     0.904         1.000
              급행차수  좌석수  시외무정차수
급행차수        1.000     0.288   NaN
좌석수          0.288     1.000   NaN
시외무정차수    NaN       NaN     1.000
              업체아이디  예비차총수  일반수  무정차수  갱신일자  좌석수  급행차수  시외무정차수
업체아이디      1.000      -0.436      -0.167  -0.214    -0.237    0.000   0.000     1.000
예비차총수      -0.436     1.000       0.679   0.651     0.596     0.571   0.353     1.000
일반수          -0.167     0.679       1.000   0.048     0.306     0.403   0.230     0.000
무정차수        -0.214     0.651       0.048   1.000     0.443     0.492   0.507     0.000
갱신일자        -0.237     0.596       0.306   0.443     1.000     0.110   0.234     0.643
좌석수          0.000      0.571       0.403   0.492     0.110     1.000   0.288     0.000
급행차수        0.000      0.353       0.230   0.507     0.234     0.288   1.000     0.000
시외무정차수    1.000      1.000       0.000   0.000     0.643     0.000   0.000     1.000
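
The High correlation alerts at the top of the report appear to follow from this last matrix. The sketch below flags every off-diagonal pair whose absolute coefficient exceeds 0.5; that threshold is an assumption (the report does not state the cut-off), chosen because it reproduces the number of correlated fields named in each alert. The function name `high_correlations` is hypothetical.

```python
# Column order and coefficients copied from the last correlation matrix above.
columns = ["업체아이디", "예비차총수", "일반수", "무정차수",
           "갱신일자", "좌석수", "급행차수", "시외무정차수"]
matrix = [
    [1.000, -0.436, -0.167, -0.214, -0.237, 0.000, 0.000, 1.000],
    [-0.436, 1.000, 0.679, 0.651, 0.596, 0.571, 0.353, 1.000],
    [-0.167, 0.679, 1.000, 0.048, 0.306, 0.403, 0.230, 0.000],
    [-0.214, 0.651, 0.048, 1.000, 0.443, 0.492, 0.507, 0.000],
    [-0.237, 0.596, 0.306, 0.443, 1.000, 0.110, 0.234, 0.643],
    [0.000, 0.571, 0.403, 0.492, 0.110, 1.000, 0.288, 0.000],
    [0.000, 0.353, 0.230, 0.507, 0.234, 0.288, 1.000, 0.000],
    [1.000, 1.000, 0.000, 0.000, 0.643, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

def high_correlations(names, values, threshold):
    """Map each variable to the other variables with |coefficient| > threshold."""
    flagged = {}
    for i, row_name in enumerate(names):
        partners = [names[j] for j, v in enumerate(values[i])
                    if i != j and abs(v) > threshold]
        if partners:
            flagged[row_name] = partners
    return flagged

flagged = high_correlations(columns, matrix, THRESHOLD)
for name, partners in flagged.items():
    print(f"{name}: {', '.join(partners)}")
```

With only 13 non-`<NA>` values in 시외무정차수, its coefficients of 1.000 rest on very few rows and should be read with caution.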

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Sample

First rows
     업체아이디  예비차총수  일반수  좌석수  무정차수  급행차수  시외무정차수  갱신일자          갱신아이디
0    4111700     1           0       0       1         0         <NA>          20220701085634    <NA>
1    4111100     7           7       0       0         0         <NA>          20221209125050    <NA>
2    4104400     5           5       0       0         0         <NA>          20221209125050    <NA>
3    4111500     0           0       0       0         0         <NA>          20221209125048    <NA>
4    4108500     0           0       0       0         0         <NA>          20230206135049    <NA>
5    4108700     2           2       0       0         0         <NA>          20230127113402    <NA>
6    4110700     0           0       0       0         0         <NA>          20221209125047    <NA>
7    4105700     12          12      0       0         0         <NA>          20230510110534    <NA>
8    4151400     0           0       0       0         0         <NA>          20221209125047    <NA>
9    4105000     5           0       0       5         0         <NA>          20230510110613    <NA>

Last rows
     업체아이디  예비차총수  일반수  좌석수  무정차수  급행차수  시외무정차수  갱신일자          갱신아이디
79   4150600     0           <NA>    <NA>    <NA>      <NA>      0             20220701090115    <NA>
80   4150100     0           <NA>    <NA>    <NA>      <NA>      0             20220701090115    <NA>
81   4151100     1           <NA>    <NA>    <NA>      <NA>      1             20221209125050    <NA>
82   4150300     0           <NA>    <NA>    <NA>      <NA>      0             20230206140738    <NA>
83   4155200     2           <NA>    <NA>    <NA>      <NA>      2             20230510111045    <NA>
84   4150700     0           <NA>    <NA>    <NA>      <NA>      0             20220701090116    <NA>
85   4150500     0           <NA>    <NA>    <NA>      <NA>      0             20220701090116    <NA>
86   4153600     2           <NA>    <NA>    <NA>      <NA>      2             20230510111110    <NA>
87   4155000     1           <NA>    <NA>    <NA>      <NA>      1             20221209125050    <NA>
88   4151800     0           <NA>    <NA>    <NA>      <NA>      0             20220701090116    <NA>