Here is an example of the data frame I have (it is the "stoptimes" table in a GTFS file):
stoptimes <- data.frame(route = c("route1", "route1", "route1", "route2", "route2", "route2", "route3", "route3", "route3"),
stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1", "stop3", "stop4", "stop5"))
I would like to build a data frame (or list) with one entry per distinct stop (5 here) that associates each stop with a list of all routes that pass through that stop.
How can I build this in R?
For context, later I would like to merge this with the location of each stop, and then create a variable in another data frame that has the number of distinct routes available within a radius of certain other points.
It looks like the OP wants to summarise the data, grouped by stops:
library(dplyr)  # the .by argument needs dplyr >= 1.1.0
stoptimes |>
  summarise(routes = list(route), .by = stops)
stops routes
1 stop1 route1, route2
2 stop2 route1, route2
3 stop3 route1, route2, route3
4 stop4 route3
5 stop5 route3
This will output a list-column for the routes variable.
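As a small sketch of what a list-column means in practice: each cell of routes is an ordinary character vector, so it can be pulled out with [[ and counted with lengths(). The names routes_by_stop, routes_stop3 and n_routes are only illustrative, and .by assumes dplyr 1.1.0 or later.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

stoptimes <- data.frame(
  route = c("route1", "route1", "route1", "route2", "route2", "route2",
            "route3", "route3", "route3"),
  stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
            "stop3", "stop4", "stop5")
)

# routes_by_stop is an illustrative name, not part of the original answer
routes_by_stop <- stoptimes |>
  summarise(routes = list(route), .by = stops)

# Each element of the list-column is a plain character vector
routes_stop3 <- routes_by_stop$routes[[which(routes_by_stop$stops == "stop3")]]

# lengths() returns how many routes are stored for each stop
routes_by_stop$n_routes <- lengths(routes_by_stop$routes)
```

The n_routes column counts entries, not distinct routes, so it only equals the number of distinct routes when each route appears once per stop, as in the sample data.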
We may want a simpler (although less "tidy") output, with character scalars describing the routes, which can be achieved with paste (using its collapse argument) or toString:
stoptimes |>
summarise(routes = toString(route), .by = stops)
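For comparison, here is a sketch of the paste variant. toString(x) behaves like paste(x, collapse = ", "), so paste is mainly useful when a different separator is wanted. The column names routes_comma and routes_pipe and the " | " separator are just illustrative choices.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

stoptimes <- data.frame(
  route = c("route1", "route1", "route1", "route2", "route2", "route2",
            "route3", "route3", "route3"),
  stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
            "stop3", "stop4", "stop5")
)

routes_chr <- stoptimes |>
  summarise(
    routes_comma = toString(route),                 # comma-separated string
    routes_pipe  = paste(route, collapse = " | "),  # custom separator
    .by = stops
  )
```

Both columns hold a single string per stop, which is easy to print or export but has to be split again before the individual routes can be used.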
This simple answer works for the data given. If a route can appear more than once for the same stop, we may have to wrap unique around the routes variable, as in `... routes = list(unique(route)) ...`.
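To illustrate the point about repeats, here is a sketch on a made-up input (stoptimes_dup is hypothetical, not the OP's data) in which the same route shows up several times at one stop. n_distinct() from dplyr is added as one possible way to get the count of distinct routes directly.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument

# Hypothetical data with repeated route/stop pairs
stoptimes_dup <- data.frame(
  route = c("route1", "route1", "route2", "route2", "route1"),
  stops = c("stop1",  "stop1",  "stop1",  "stop2",  "stop2")
)

routes_by_stop <- stoptimes_dup |>
  summarise(
    routes   = list(unique(route)),  # drop repeated routes within a stop
    n_routes = n_distinct(route),    # number of distinct routes per stop
    .by = stops
  )
```

Without unique, stop1 would carry route1 twice in its list.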
There is also an option with tidyr::nest, which will create a nested data.frame with individual tibbles for every value of stops:
library(tidyr)
stoptimes_nested <-
stoptimes |>
nest(.key = "routes",
.by = stops)
stoptimes_nested$routes
[[1]]
# A tibble: 2 × 1
route
<chr>
1 route1
2 route2
[[2]]
# A tibble: 2 × 1
route
<chr>
1 route1
2 route2
[[3]]
# A tibble: 3 × 1
route
<chr>
1 route1
2 route2
3 route3
[[4]]
# A tibble: 1 × 1
route
<chr>
1 route3
[[5]]
# A tibble: 1 × 1
route
<chr>
1 route3
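If plain vectors are easier to work with than nested tibbles, one possible follow-up (a sketch, assuming tidyr 1.3.0 or later for the .by argument of nest) is to pull the route column out of each tibble into a list named by stop. unnest() goes the other way and flattens the nested column back to one row per stop and route. The names routes_list and stoptimes_flat are illustrative.

```r
library(tidyr)  # assumes tidyr >= 1.3.0 for nest(.by = ...)

stoptimes <- data.frame(
  route = c("route1", "route1", "route1", "route2", "route2", "route2",
            "route3", "route3", "route3"),
  stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
            "stop3", "stop4", "stop5")
)

stoptimes_nested <- stoptimes |>
  nest(.key = "routes", .by = stops)

# Extract the route column of each nested tibble, named by its stop
routes_list <- setNames(
  lapply(stoptimes_nested$routes, function(tbl) tbl$route),
  stoptimes_nested$stops
)

# unnest() restores one row per stop/route pair
stoptimes_flat <- stoptimes_nested |>
  unnest(routes)
```

With the named list, routes_list[["stop3"]] returns the routes for a single stop.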