Analyzing "Call a Bike" bike sharing data.

Call a Bike is a service of “Deutsche Bahn” that provides rental bikes for short trips, similar to Citi Bike in New York City. I used it extensively for some time. Recently I found out that they provide individual trip data through their API, so I pulled last year’s data from the “CallaBike” SOAP API.

So it looks like I took 403 trips with Call a Bike in 2014.

After some data cleaning, we will take an initial glimpse at the data.

We see a clear pattern of high usage during the week and at commuting hours.
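
A weekday-by-hour count like the one behind this observation can be sketched in base R. The `starts` vector is made-up sample data, not the real trip log, and the weekday labels are built from `POSIXlt` day numbers so they do not depend on the system locale.

```r
# Hypothetical trip start times (sample data, not the real trip log)
starts <- as.POSIXct(c("2014-03-03 08:02:00",   # Monday morning
                       "2014-03-03 18:10:00",   # Monday evening
                       "2014-03-04 08:05:00",   # Tuesday morning
                       "2014-03-08 14:30:00"),  # Saturday afternoon
                     tz = "UTC")

lt <- as.POSIXlt(starts, tz = "UTC")

# POSIXlt encodes weekdays as 0 (Sunday) to 6 (Saturday)
wday <- factor(lt$wday, levels = 0:6,
               labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# Cross-tabulate trips per weekday and hour of day
usage <- table(wday = wday, hour = lt$hour)
print(usage)
```

With a full year of trips, the same table would show the weekday and commuting-hour clusters directly.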

Commute times are basically the same for the morning/evening trips.

Commute times did not change over the year. There are some ups and downs, but they are basically stable.

According to Google Maps, the distance is 4.9 km, which makes a total of 1969.8 km in 2014 at an average speed of 20.45 km/h.
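
As a quick sanity check, the quoted figures can be turned around to see what they imply. This is only a sketch: it assumes the speed is in km/h and that the average speed was computed as distance over mean trip duration, neither of which is stated explicitly.

```r
distance_km <- 4.9      # distance per trip, as quoted
total_km    <- 1969.8   # yearly total, as quoted
avg_kmh     <- 20.45    # average speed, as quoted (unit assumed to be km/h)

# Number of trips implied by the yearly total
implied_trips <- round(total_km / distance_km)

# Average trip duration in minutes implied by the average speed
implied_minutes <- distance_km / avg_kmh * 60

print(implied_trips)
print(round(implied_minutes, 1))
```

The total corresponds to 402 trips of 4.9 km each, and the speed to a ride of roughly 14.4 minutes.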

As this data indicates the start and end of each commute, we can calculate the time spent at work.
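
One way to do this is to pair each day's morning trip with its evening trip and take the gap between them. The sketch below uses made-up sample trips. The noon cut-off and the column names `start`, `end`, and `date` are assumptions for illustration, while `dt.x`, `dt.y`, and `wt` follow the variable names used in the models further down.

```r
# Hypothetical trip log with start and end times (sample data)
trips <- data.frame(
  start = as.POSIXct(c("2014-03-03 08:00:00", "2014-03-03 18:10:00",
                       "2014-03-04 08:05:00", "2014-03-04 17:50:00"),
                     tz = "UTC"),
  end   = as.POSIXct(c("2014-03-03 08:15:00", "2014-03-03 18:24:00",
                       "2014-03-04 08:19:00", "2014-03-04 18:05:00"),
                     tz = "UTC")
)

trips$date <- as.Date(trips$start, tz = "UTC")
# Drive time per trip in minutes
trips$dt   <- as.numeric(difftime(trips$end, trips$start, units = "mins"))
trips$hour <- as.POSIXlt(trips$start, tz = "UTC")$hour

# Assumption: trips starting before noon go to work, later ones go home
cols    <- c("date", "start", "end", "dt")
morning <- trips[trips$hour < 12, cols]
evening <- trips[trips$hour >= 12, cols]

# merge() appends .x (morning) and .y (evening) to the shared column names
working <- merge(morning, evening, by = "date")

# Time at work in hours: from arriving in the morning to leaving in the evening
working$wt <- as.numeric(difftime(working$start.y, working$end.x,
                                  units = "hours"))
print(working[, c("date", "dt.x", "dt.y", "wt")])
```

Days with only one recorded trip drop out of the merge, which would explain why the models below use fewer observations than there are trips.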

Looks like an easy 50.512-hour week (on average), with the time spent at work being quite similar from Monday to Thursday, and Fridays being more relaxed.

And finally some regression models (using the fantastic stargazer package) to explain the time spent at work …

library(stargazer)
# Reduced model: weekday and month only
lm1 <- lm(wt ~ wday + mon, data = working)
# Full model: adds both commute times (dt.x, dt.y)
lm2 <- lm(wt ~ wday + mon + dt.x + dt.y, data = working)
stargazer(lm1, lm2, type = "html", title = "Explaining time spent at work.")

Explaining time spent at work.
Dependent variable: wt

                      (1)                        (2)
wday2 Di              0.154                      0.115
wday3 Mi              0.022                      0.004
wday4 Do              0.108                      0.144
wday5 Fr              -1.499***                  -1.536***
mon02 Feb             -0.726**                   -0.775***
mon03 Mrz             -0.613**                   -0.680**
mon04 Apr             -0.842***                  -0.930***
mon05 Mai             -1.369***                  -1.303***
mon06 Jun             -1.260***                  -1.262***
mon07 Jul             -0.858***                  -0.947***
mon08 Aug             -0.395                     -0.406
mon09 Sep             -0.600**                   -0.585**
mon10 Okt             -0.922***                  -0.894***
mon11 Nov             -0.559**                   -0.656***
Adjusted R2           0.442                      0.466
Residual Std. Error   0.759 (df = 146)           0.743 (df = 144)
F Statistic           10.069*** (df = 14; 146)   9.743*** (df = 16; 144)
Note: *p<0.1; **p<0.05; ***p<0.01
Weekday and month labels are German abbreviations (Di, Mi, Do, Fr = Tue, Wed, Thu, Fri; Mrz, Mai, Okt = Mar, May, Oct).
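
The row labels are easier to read with R's default dummy coding in mind: `lm()` uses treatment contrasts for unordered factors, so the first level of each factor is absorbed into the intercept and every other level gets a coefficient relative to it. The toy data below is made up, and the level names `"1 Mo"` and `"5 Fr"` are only a guess at how the `wday` factor is labelled.

```r
# Toy data: two Mondays with 10 hours, two Fridays with 8 hours at work
toy <- data.frame(
  wt   = c(10, 10, 8, 8),
  wday = factor(c("1 Mo", "1 Mo", "5 Fr", "5 Fr"))
)

fit <- lm(wt ~ wday, data = toy)

# The intercept is the Monday mean; the second coefficient is the
# Friday difference from Monday
print(coef(fit))
```

Read this way, the Friday coefficient of about -1.5 in the table means roughly an hour and a half less at work than on the baseline weekday, presumably Monday.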

Why is time spent at work negatively correlated with drive time back from work (variable dt.y)? Finally, three models to explain the commute time back from work.

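# Commute time back from work (dt.y) explained by time at work (wt) only
lm3 <- lm(dt.y ~ wt, data = working)
# ... plus weekday
lm4 <- lm(dt.y ~ wt + wday, data = working)
# ... plus weekday and month
lm5 <- lm(dt.y ~ wt + wday + mon, data = working)
stargazer(lm3, lm4, lm5, type = "html", title = "Explaining commute time back from work.")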
Explaining commute time back from work.
Dependent variable: dt.y

                      (1)                     (2)                     (3)
wday2 Di                                      -1.534                  -1.367
wday3 Mi                                      -0.447                  -0.324
wday4 Do                                      0.276                   0.592
wday5 Fr                                      -1.438                  -2.103*
mon02 Feb                                                             -2.597*
mon03 Mrz                                                             -2.163
mon04 Apr                                                             -3.239**
mon05 Mai                                                             -1.090
mon06 Jun                                                             -2.306*
mon07 Jul                                                             -3.666***
mon08 Aug                                                             -0.901
mon09 Sep                                                             -0.042
mon10 Okt                                                             -1.016
mon11 Nov                                                             -3.420***
Adjusted R2           0.014                   0.020                   0.070
Residual Std. Error   3.966 (df = 159)        3.953 (df = 155)        3.852 (df = 145)
F Statistic           3.271* (df = 1; 159)    1.660 (df = 5; 155)     1.800** (df = 15; 145)
Note: *p<0.1; **p<0.05; ***p<0.01

There are more potential factors that explain drive time as well, such as weather conditions (especially west wind).

To sum up: Deutsche Bahn could easily know where I live, where I work, and how much I work. I assume that car sharing data is similarly privacy-sensitive.

Written on December 8, 2015