Overview

Dataset statistics

Number of variables: 5
Number of observations: 161
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2
Duplicate rows (%): 1.2%
Total size in memory: 6.7 KiB
Average record size in memory: 42.8 B
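The dataset-level counts above come from simple tallies over rows and cells. Below is a minimal sketch that assumes the table is held as a list of equal-length tuples, with `None` marking a missing cell. `dataset_summary` is a hypothetical helper, not a ydata-profiling function. The duplicate convention is also an assumption: every repeat after a row's first occurrence is counted, which matches the default of pandas `DataFrame.duplicated()`. For this dataset that convention gives the same 2 as counting distinct duplicated rows, because each duplicated row appears exactly twice.

```python
from collections import Counter


def dataset_summary(rows):
    """Headline counts for a table given as a list of equal-length tuples.

    Assumption of this sketch: a missing cell is represented by None.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # Every copy of a row beyond its first occurrence counts as one duplicate.
    duplicates = sum(c - 1 for c in Counter(rows).values())
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }
```

For this report, 2 duplicates out of 161 rows gives 100 × 2 / 161 ≈ 1.24%, which is displayed as 1.2%.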

Variable types

Categorical: 3
Numeric: 2

Alerts

Dataset has 2 (1.2%) duplicate rows [Duplicates]
위도 (latitude) is highly overall correlated with 택시ID (taxi ID) [High correlation]
경도 (longitude) is highly overall correlated with 택시ID [High correlation]
택시ID is highly overall correlated with 위도 and 2 other fields [High correlation]
승객탑승여부 (passenger on board) is highly overall correlated with 택시ID [High correlation]

Reproduction

Analysis started: 2023-12-10 06:24:08.319538
Analysis finished: 2023-12-10 06:24:10.067615
Duration: 1.75 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

택시ID (taxi ID)
Categorical

HIGH CORRELATION 

Distinct: 34
Distinct (%): 21.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Top values:
T_92408066         5
T_94605377         5
T_70612477         5
T_45227948         5
T_45154705         5
Other values (29)  136

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Unique

Unique: 1
Unique (%): 0.6%

Sample

1st row: T_92408066
2nd row: T_15668530
3rd row: T_70612477
4th row: T_45227948
5th row: T_45154705

Common Values

Value              Count  Frequency (%)
T_92408066         5      3.1%
T_94605377         5      3.1%
T_70612477         5      3.1%
T_45227948         5      3.1%
T_45154705         5      3.1%
T_90650218         5      3.1%
T_91089680         5      3.1%
T_66950294         5      3.1%
T_17133403         5      3.1%
T_93872940         5      3.1%
Other values (24)  111    68.9%
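The statistics and table above can be reproduced from raw value counts. Below is a minimal sketch using `collections.Counter`. `frequency_table` is a hypothetical helper. The output layout (a top-N table followed by one aggregated "Other values (k)" row) mirrors the report's text, but it is not taken from ydata-profiling's code.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Summarize a categorical column: distinct/unique counts plus a top-N table.

    Table rows are (value, count, percent of all rows); values beyond the
    first `top` are folded into one "Other values (k)" row.
    """
    n = len(values)
    counts = Counter(values)
    ranked = counts.most_common()  # ties keep first-seen order
    table = [(v, c, 100.0 * c / n) for v, c in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(c for _, c in rest)
        table.append((f"Other values ({len(rest)})", other, 100.0 * other / n))
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": len(counts),
        "distinct_pct": 100.0 * len(counts) / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
        "table": table,
    }
```

With 34 distinct IDs over 161 rows, Distinct (%) is 100 × 34 / 161 ≈ 21.1%. The 24 IDs outside the top 10 account for 111 rows (68.9%). Many IDs are tied at 5 rows, so which of them appear in a top-N list depends on tie-breaking. This sketch breaks ties by first appearance, and the report may do it differently.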

Length

[Figure: histogram of lengths of the category]
Word frequencies (values lowercased and split on whitespace):
Value              Count  Frequency (%)
t_92408066         5      3.1%
t_95484301         5      3.1%
t_43836318         5      3.1%
t_15888261         5      3.1%
t_45520923         5      3.1%
t_15668530         5      3.1%
t_67170025         5      3.1%
t_19257470         5      3.1%
t_42591176         5      3.1%
t_69001117         5      3.1%
Other values (24)  111    68.9%

위도 (latitude)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 112
Distinct (%): 69.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.488582
Minimum: 37.388767
Maximum: 37.73002
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 37.388767
5th percentile: 37.42045
Q1: 37.45636
Median: 37.468243
Q3: 37.508747
95th percentile: 37.56846
Maximum: 37.73002
Range: 0.341253
Interquartile range (IQR): 0.052387
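The quantiles above can be computed by linear interpolation between order statistics. This sketch assumes the report delegates to pandas `Series.quantile`, whose default method is linear interpolation. The report itself does not confirm this. `quantile` and `quantile_summary` are hypothetical helpers.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile at position q * (n - 1) of the sorted data."""
    n = len(sorted_values)
    pos = q * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def quantile_summary(values):
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "min": xs[0],
        "p5": quantile(xs, 0.05),
        "q1": q1,
        "median": quantile(xs, 0.5),
        "q3": q3,
        "p95": quantile(xs, 0.95),
        "max": xs[-1],
        "range": xs[-1] - xs[0],  # Maximum - Minimum
        "iqr": q3 - q1,           # Q3 - Q1
    }
```

With n = 161, the 5th percentile falls exactly at position 0.05 × 160 = 8, which is the ninth-smallest value. In the list of smallest 위도 values further down this section, that value is 37.42045, matching the reported 5th percentile. Range and IQR are plain differences: 37.73002 − 37.388767 = 0.341253 and 37.508747 − 37.45636 = 0.052387.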

Descriptive statistics

Standard deviation: 0.058850422
Coefficient of variation (CV): 0.0015698226
Kurtosis: 6.7164068
Mean: 37.488582
Median absolute deviation (MAD): 0.025592
Skewness: 2.0539725
Sum: 6035.6617
Variance: 0.0034633722
Monotonicity: Not monotonic
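The descriptive statistics above are related by simple formulas. Variance is the squared standard deviation, CV = std / mean, and MAD is the median of absolute deviations from the median. Below is a stdlib sketch that assumes pandas conventions: sample standard deviation (n − 1), bias-corrected skewness, and excess kurtosis. `describe` is a hypothetical helper, and ydata-profiling's exact implementation is not confirmed here.

```python
import math
import statistics


def describe(values):
    """Sample statistics in pandas-style conventions (needs at least 4 values)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)          # sample standard deviation (n - 1)
    dev = [x - mean for x in values]
    m2 = sum(d ** 2 for d in dev) / n       # population central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    g1 = m3 / m2 ** 1.5                     # biased skewness
    g2 = m4 / m2 ** 2 - 3.0                 # biased excess kurtosis
    med = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "mad": statistics.median(abs(x - med) for x in values),
        # Bias-corrected (adjusted Fisher-Pearson) skewness.
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        # Bias-corrected excess kurtosis (0 for a normal distribution).
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0),
        "sum": sum(values),
    }
```

The reported figures are consistent with these relations: 0.058850422² ≈ 0.0034633722, and 0.058850422 / 37.488582 ≈ 0.0015698226. The CV is small mainly because the mean is a large offset from zero. 경도 has a nearly identical standard deviation (0.0584), yet its CV is about 3.4 times smaller because its mean is about 126.7.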
[Figure: histogram with fixed size bins (bins=50)]
Value               Count  Frequency (%)
37.484592           5      3.1%
37.54526            5      3.1%
37.460526           5      3.1%
37.49383            4      2.5%
37.56846            4      2.5%
37.501568           4      2.5%
37.458023           4      2.5%
37.542427           3      1.9%
37.45625            3      1.9%
37.462593           3      1.9%
Other values (102)  121    75.2%
Minimum 10 values:
Value      Count  Frequency (%)
37.388767  1      0.6%
37.388905  1      0.6%
37.38904   1      0.6%
37.38918   1      0.6%
37.420433  2      1.2%
37.420437  1      0.6%
37.420444  1      0.6%
37.42045   1      0.6%
37.423233  3      1.9%
37.423237  1      0.6%
Maximum 10 values:
Value      Count  Frequency (%)
37.73002   1      0.6%
37.730015  1      0.6%
37.730007  1      0.6%
37.730003  1      0.6%
37.729992  1      0.6%
37.56846   4      2.5%
37.568455  1      0.6%
37.54526   5      3.1%
37.542427  3      1.9%
37.542423  2      1.2%
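The two extreme-value lists above rank distinct values, not individual rows. For example, 37.420433 occupies one line with a count of 2. Below is a minimal sketch under that reading. `extreme_values` is a hypothetical helper, and percentages are taken over all rows, as in the report.

```python
from collections import Counter


def extreme_values(values, k=10):
    """Return the k smallest and k largest distinct values as (value, count, pct)."""
    n = len(values)
    counts = Counter(values)
    ordered = sorted(counts)  # distinct values in ascending order
    smallest = [(v, counts[v], 100.0 * counts[v] / n) for v in ordered[:k]]
    largest = [(v, counts[v], 100.0 * counts[v] / n) for v in reversed(ordered[-k:])]
    return smallest, largest
```

The five values near 37.73 stand well apart from the rest of the list, where the next value is 37.56846. This gap is consistent with the positive skewness (2.05) reported for 위도.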

경도 (longitude)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 98
Distinct (%): 60.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 126.70777
Minimum: 126.63418
Maximum: 126.96877
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.5 KiB

Quantile statistics

Minimum: 126.63418
5th percentile: 126.63836
Q1: 126.67371
Median: 126.70456
Q3: 126.72907
95th percentile: 126.76492
Maximum: 126.96877
Range: 0.334585
Interquartile range (IQR): 0.05536

Descriptive statistics

Standard deviation: 0.058437438
Coefficient of variation (CV): 0.00046119853
Kurtosis: 10.096486
Mean: 126.70777
Median absolute deviation (MAD): 0.0264
Skewness: 2.6334851
Sum: 20399.951
Variance: 0.0034149342
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]
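A fixed-size-bin histogram splits the observed range into equal-width intervals. For 경도, each of the 50 bins is 0.334585 / 50 ≈ 0.0066917 degrees wide. Below is a minimal sketch that assumes the numpy.histogram edge convention, where bins are half-open and the last bin also includes the maximum. That the report's plot uses exactly this convention is an assumption. `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(values, bins=50):
    """Count values in `bins` equal-width bins spanning [min, max].

    Each bin is [left, right) except the last, which also holds the maximum.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for x in values:
        i = int((x - lo) / width)
        counts[min(i, bins - 1)] += 1  # the maximum lands in the last bin
    return counts, edges
```

With only 161 points spread over 50 bins, many bins are empty or nearly empty. Together with the handful of values near 126.97, this makes the plot look sparse and right-skewed.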
Value              Count  Frequency (%)
126.76492          5      3.1%
126.67371          5      3.1%
126.66354          5      3.1%
126.65718          5      3.1%
126.73673          5      3.1%
126.63836          5      3.1%
126.63418          4      2.5%
126.69036          4      2.5%
126.646324         4      2.5%
126.71691          4      2.5%
Other values (88)  115    71.4%
Minimum 10 values:
Value       Count  Frequency (%)
126.63418   4      2.5%
126.63768   1      0.6%
126.63778   1      0.6%
126.63787   1      0.6%
126.637955  1      0.6%
126.63836   5      3.1%
126.64631   1      0.6%
126.646324  4      2.5%
126.65718   5      3.1%
126.66354   5      3.1%
Maximum 10 values:
Value       Count  Frequency (%)
126.968765  1      0.6%
126.968506  1      0.6%
126.96826   1      0.6%
126.968     1      0.6%
126.96776   1      0.6%
126.76492   5      3.1%
126.75928   1      0.6%
126.75912   1      0.6%
126.758965  1      0.6%
126.758835  1      0.6%

승객탑승여부 (passenger on board)
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Top values:
미탑승 (not on board)  87
탑승 (on board)        74

Length

Max length: 3
Median length: 3
Mean length: 2.5403727
Min length: 2
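The non-integer mean length follows directly from the two category counts. "미탑승" has 3 characters and "탑승" has 2, so the mean is (87 × 3 + 74 × 2) / 161 = 409 / 161 ≈ 2.5404. The sketch below rebuilds the column from the reported counts and recomputes the length statistics. `length_summary` is a hypothetical helper, and Python's `len()` counts Unicode code points, which is assumed to match the report's notion of length.

```python
import statistics


def length_summary(values):
    """Character-length statistics of a categorical column."""
    lengths = [len(v) for v in values]  # len() counts Unicode code points
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


# Rebuild the column from the reported counts: 87 x "미탑승" and 74 x "탑승".
column = ["미탑승"] * 87 + ["탑승"] * 74
print(length_summary(column))
```

The median length is 3 because the longer value, 미탑승, is the majority (87 of 161 rows).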

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 탑승
2nd row: 탑승
3rd row: 미탑승
4th row: 탑승
5th row: 탑승

Common Values

Value   Count  Frequency (%)
미탑승  87     54.0%
탑승    74     46.0%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Word frequencies (values lowercased and split on whitespace):
Value   Count  Frequency (%)
미탑승  87     54.0%
탑승    74     46.0%

측정 시간 (measurement time)
Categorical

Distinct: 5
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Top values:
2020-06-08 00:00:01  34
2020-06-08 00:00:03  34
2020-06-08 00:00:00  33
2020-06-08 00:00:02  32
2020-06-08 00:00:04  28

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2020-06-08 00:00:00
2nd row: 2020-06-08 00:00:00
3rd row: 2020-06-08 00:00:00
4th row: 2020-06-08 00:00:00
5th row: 2020-06-08 00:00:00

Common Values

Value                Count  Frequency (%)
2020-06-08 00:00:01  34     21.1%
2020-06-08 00:00:03  34     21.1%
2020-06-08 00:00:00  33     20.5%
2020-06-08 00:00:02  32     19.9%
2020-06-08 00:00:04  28     17.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

Word frequencies (values lowercased and split on whitespace):
Value       Count  Frequency (%)
2020-06-08  161    50.0%
00:00:01    34     10.6%
00:00:03    34     10.6%
00:00:00    33     10.2%
00:00:02    32     9.9%
00:00:04    28     8.7%
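In the word table above, each timestamp contributes two words (the date and the time). The percentages are therefore taken over 2 × 161 = 322 words rather than 161 rows, which is why the shared date shows 50.0%. Below is a minimal sketch of that counting. It assumes words are produced by lowercasing and splitting on whitespace, as the tables suggest. ydata-profiling's own tokenizer may handle punctuation or other details differently. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(values):
    """Lowercase each value, split on whitespace, and count the words.

    Percentages are relative to the total number of words, not rows.
    """
    words = [w for v in values for w in v.lower().split()]
    total = len(words)
    return [(w, c, 100.0 * c / total) for w, c in Counter(words).most_common()]


# Rebuild the 측정 시간 column from the reported value counts.
counts = {"00:00:00": 33, "00:00:01": 34, "00:00:02": 32, "00:00:03": 34, "00:00:04": 28}
column = [f"2020-06-08 {t}" for t, c in counts.items() for _ in range(c)]
for word, count, pct in word_frequencies(column):
    print(f"{word}  {count}  {pct:.1f}%")
```

The same lowercasing explains why the 택시ID word table lists t_92408066 rather than T_92408066.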

Interactions

[Figures: pairwise interaction plots of the numeric variables 위도 and 경도]

Correlations

[Figure: correlation heatmap]
              택시ID  위도   경도   승객탑승여부  측정 시간
택시ID        1.000   1.000  1.000  1.000         0.000
위도          1.000   1.000  0.598  0.322         0.000
경도          1.000   0.598  1.000  0.355         0.000
승객탑승여부  1.000   0.322  0.355  1.000         0.000
측정 시간     0.000   0.000  0.000  0.000         1.000
[Figure: correlation heatmap]
              택시ID  측정 시간  승객탑승여부
택시ID        1.000   0.000      0.894
측정 시간     0.000   1.000      0.000
승객탑승여부  0.894   0.000      1.000
[Figure: correlation heatmap]
              위도   경도   택시ID  승객탑승여부  측정 시간
위도          1.000  0.456  0.908   0.339         0.000
경도          0.456  1.000  0.902   0.428         0.000
택시ID        0.908  0.902  1.000   0.894         0.000
승객탑승여부  0.339  0.428  0.894   1.000         0.000
측정 시간     0.000  0.000  0.000   0.000         1.000
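The export does not preserve which coefficient each matrix above shows. As an illustration only, below is plain (uncorrected) Cramér's V, a common association measure for pairs of categorical columns: V = sqrt(chi² / (n × (min(r, c) − 1))). ydata-profiling may use a bias-corrected variant or a different coefficient, so values from this sketch are not expected to match the report. `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two categorical sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for a, ra in row.items():
        for b, cb in col.items():
            expected = ra * cb / n           # count expected under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))
```

An identifier with few rows per category tends to score high on measures like this one. In the toy case where every row has its own ID, V is exactly 1 against any other column. Here 161 rows are spread over 34 taxi IDs, and no ID has more than five rows. A plausible reading of the 택시ID alerts is therefore that they reflect this structure rather than a substantive relationship.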

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion]
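The nullity-by-column chart plots how many cells in each column are present. Below is a minimal sketch of the counts behind it, assuming rows are tuples and `None` marks a missing cell. `nullity_by_column` is a hypothetical helper. For this dataset every column would show all 161 values, since the report finds no missing cells.

```python
def nullity_by_column(columns, rows):
    """Count non-missing cells per column (None is treated as missing)."""
    return {
        name: sum(row[i] is not None for row in rows)
        for i, name in enumerate(columns)
    }
```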

Sample

First rows
     택시ID      위도       경도        승객탑승여부  측정 시간
0    T_92408066  37.448227  126.73584   탑승          2020-06-08 00:00:00
1    T_15668530  37.50838   126.968765  탑승          2020-06-08 00:00:00
2    T_70612477  37.457962  126.6897    미탑승        2020-06-08 00:00:00
3    T_45227948  37.46821   126.69036   탑승          2020-06-08 00:00:00
4    T_45154705  37.504585  126.73096   탑승          2020-06-08 00:00:00
5    T_90650218  37.475273  126.69245   미탑승        2020-06-08 00:00:00
6    T_91089680  37.462593  126.70907   탑승          2020-06-08 00:00:00
7    T_66950294  37.542427  126.73673   미탑승        2020-06-08 00:00:00
8    T_17133403  37.521465  126.669044  미탑승        2020-06-08 00:00:00
9    T_42298201  37.456356  126.71468   미탑승        2020-06-08 00:00:00

Last rows
     택시ID      위도       경도        승객탑승여부  측정 시간
151  T_69587066  37.73002   126.75872   미탑승        2020-06-08 00:00:04
152  T_94605377  37.534145  126.75168   미탑승        2020-06-08 00:00:04
153  T_45520923  37.52068   126.704575  미탑승        2020-06-08 00:00:04
154  T_15888261  37.42045   126.72907   탑승          2020-06-08 00:00:04
155  T_69001117  37.462425  126.68045   탑승          2020-06-08 00:00:04
156  T_95484301  37.53736   126.72796   탑승          2020-06-08 00:00:04
157  T_92920772  37.501564  126.74687   탑승          2020-06-08 00:00:04
158  T_43836318  37.50567   126.71688   탑승          2020-06-08 00:00:04
159  T_43177125  37.49383   126.68102   미탑승        2020-06-08 00:00:04
160  T_93872940  37.491005  126.71512   탑승          2020-06-08 00:00:04

Duplicate rows

Most frequently occurring

   택시ID      위도       경도       승객탑승여부  측정 시간            # duplicates
0  T_42884150  37.460526  126.63836  탑승          2020-06-08 00:00:01  2
1  T_42884150  37.460526  126.63836  탑승          2020-06-08 00:00:03  2
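The duplicate-row table lists each distinct row that occurs more than once, together with its number of occurrences. Below is a minimal sketch over rows held as tuples. `duplicate_rows` is a hypothetical helper, and ordering ties by first appearance is an assumption; the report's tie order may differ.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    return [(r, c) for r, c in Counter(rows).most_common() if c > 1]
```

Both listed duplicates belong to the same taxi, T_42884150, which reports an identical position and status at 00:00:01 and 00:00:03. This pattern would arise if that taxi's records were emitted twice at those timestamps.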