r merge summarize gtfs

Create a summarised column that is a list of values


Here is an example of the data frame I have (it is the "stoptimes" table in a GTFS feed):

stoptimes <- data.frame(route = c("route1", "route1", "route1", "route2", "route2", "route2", "route3", "route3", "route3"),
                    stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1", "stop3", "stop4", "stop5"))

I would like to build a data frame (or list) with one entry per distinct stop (5 here) that associates each stop with a list of all routes that pass through it.

How can I build this in R?

For context, later I would like to merge this with the location of each stop, and then create a variable in another data frame that has the number of distinct routes available within a radius of certain other points.


Solution

  • It looks like the OP wants to summarise the data grouped by stops.

    library(dplyr)  # the .by argument needs dplyr >= 1.1.0
    
    stoptimes |> 
        summarise(routes = list(route), .by = stops)
    
      stops                 routes
    1 stop1         route1, route2
    2 stop2         route1, route2
    3 stop3 route1, route2, route3
    4 stop4                 route3
    5 stop5                 route3
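
    Since the question mentions counting routes per stop later on, here is a minimal sketch of how the routes column might be inspected. The names routes_by_stop, stop3_routes and n_routes are only illustrative and are not part of the original answer.

    ```r
    library(dplyr)

    stoptimes <- data.frame(
        route = c("route1", "route1", "route1", "route2", "route2", "route2",
                  "route3", "route3", "route3"),
        stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
                  "stop3", "stop4", "stop5")
    )

    # Same summarise() call as above, stored under an illustrative name
    routes_by_stop <- stoptimes |>
        summarise(routes = list(route), .by = stops)

    # Each cell of `routes` is a character vector; pull out the one for stop3
    stop3_routes <- routes_by_stop$routes[[which(routes_by_stop$stops == "stop3")]]

    # lengths() counts the elements of each list cell, i.e. routes per stop
    routes_by_stop$n_routes <- lengths(routes_by_stop$routes)
    ```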
    

    The summarise() call above returns routes as a list-column. If a simpler (although less "tidy") output is preferred, with one character string per stop describing the routes, it can be produced with paste or toString:

    stoptimes |> 
        summarise(routes = toString(route), .by = stops)
    

    This simple answer works for the data given. If a route can appear more than once for the same stop, we may have to wrap unique around the routes variable, as in `... routes = list(unique(route)) ...`
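
    As a hedged illustration of the repeated-values case, the sketch below uses a small made-up table, stoptimes_dup, in which route1 visits stop1 twice. The data and the n_routes column are assumptions added for illustration; n_distinct is dplyr's helper for counting unique values.

    ```r
    library(dplyr)

    # Hypothetical data with a repeated route/stop pair (route1 at stop1 twice)
    stoptimes_dup <- data.frame(
        route = c("route1", "route1", "route2", "route2"),
        stops = c("stop1", "stop1", "stop1", "stop2")
    )

    routes_per_stop <- stoptimes_dup |>
        summarise(
            routes   = list(unique(route)),  # each route listed once per stop
            n_routes = n_distinct(route),    # number of distinct routes per stop
            .by = stops
        )
    ```

    Without unique, the list for stop1 would contain route1 twice.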

    There is also an option with tidyr::nest, which will create a nested data frame holding one tibble for each value of stops:

    library(tidyr)
    
    stoptimes_nested <-
        stoptimes |> 
        nest(.key = "routes",
             .by = stops)
    
    stoptimes_nested$routes
    
    [[1]]
    # A tibble: 2 × 1
      route 
      <chr> 
    1 route1
    2 route2
    
    [[2]]
    # A tibble: 2 × 1
      route 
      <chr> 
    1 route1
    2 route2
    
    [[3]]
    # A tibble: 3 × 1
      route 
      <chr> 
    1 route1
    2 route2
    3 route3
    
    [[4]]
    # A tibble: 1 × 1
      route 
      <chr> 
    1 route3
    
    [[5]]
    # A tibble: 1 × 1
      route 
      <chr> 
    1 route3
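
    If a plain named list is enough (the question asks for "a data frame (or list)"), base R's split is an alternative that needs no packages. This is a different technique from the dplyr and tidyr calls above, and the names routes_list and n_routes are only illustrative.

    ```r
    stoptimes <- data.frame(
        route = c("route1", "route1", "route1", "route2", "route2", "route2",
                  "route3", "route3", "route3"),
        stops = c("stop1", "stop2", "stop3", "stop3", "stop2", "stop1",
                  "stop3", "stop4", "stop5")
    )

    # split() groups the route values by stop and returns a named list,
    # with one element per distinct stop
    routes_list <- split(stoptimes$route, stoptimes$stops)

    # lengths() gives the number of routes recorded for each stop
    n_routes <- lengths(routes_list)
    ```

    An individual stop can then be looked up by name, for example routes_list$stop3.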