Tags: r, shiny, dplyr, shinytree

Construct a string from groups with dplyr


I have a large data frame and I am trying to construct a string based on groups within the data frame for displaying in shinyTree.

Here is an example of data:

dat <- data.frame("region" = c(paste("region", rep(1:3, each=4))),
                  "area" = c(paste("area", rep(1:6, each=2))),
                  "name" = c(paste("name",1:12)))

shinyTree requires that the data be supplied as a string that looks like:

listString <- paste0("list('region 1' = list('area 1' = list('name 1'='', 'name 2'=''), 
                                         'area 2' = list('name 3'='', 'name 4'='')),
                       'region 2' = list('area 3' = list('name 5'='', 'name 6'=''), 
                                        'area 4' = list('name 7'='', 'name 8'='')),
                       'region 3' = list('area 5' = list('name 9'='', 'name 10'=''), 
                                        'area 6' = list('name 11'='', 'name 12'='')))")
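To make the target format concrete, here is a minimal sketch that assumes the string is eventually evaluated into a nested list with base R's eval(parse(text = ...)). That evaluation step is an assumption and is not shown in the question. The shortened string below is a stand-in with the same shape as the full one.

```r
# Small stand-in for the full string above (same shape, fewer entries).
listString <- "list('region 1' = list('area 1' = list('name 1'='', 'name 2'='')))"

# Assumption: the string is evaluated into a nested list before it is used.
tree <- eval(parse(text = listString))

# Inspect the nesting: region -> area -> name, with empty strings as leaves.
str(tree)
names(tree[["region 1"]][["area 1"]])
```

Each quoted name becomes a list element name, so any builder only has to get the quoting, commas and parentheses right at every level.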

Is there a way to construct this string using mutate and groups in dplyr? The "list(" elements should be concatenated onto the 1st occurrence of each group.

I have tried nested for loops and nested lapply() calls, with compiler::cmpfun() to speed them up, but building the string is still too slow. My data has 5 "levels" and ~3000 rows, and processing takes ~30 seconds, which is too slow for a Shiny application.

Any help would be greatly appreciated.


Solution

  • Here is a tidyverse solution. The key is to use summarise with str_c(collapse = ) to combine the entries at one level of the hierarchy, then mutate with str_c to wrap them in the list( call for their parent and add the commas/spaces. Setting collapse = turns a character vector into a single string with the desired separator, which is what makes it usable inside summarise. I'd try running the pipeline line by line to see how it comes together: each mutate formats one level, and the following summarise collapses it away. The final [[ just extracts the result as a string instead of a tibble. Since the real data has more levels, I wrapped the repetitive str_c calls in makelist and collapse functions, which makes the code more readable and makes it clearer what happens at each step.

    N.B. A bonus is that summarise drops the variables we no longer need and removes one grouping level each time, so we don't need any extra group_by or select calls!

    library(tidyverse)
    tbl <- tibble(
      "region" = c(paste("region", rep(1:3, each=4))),
      "area" = c(paste("area", rep(1:6, each=2))),
      "name" = c(paste("name",1:12))
    )
    
    makelist <- function(parent, child) str_c("'", parent, "' = list(", child, ")")
    collapse <- function(level) str_c(level, collapse = ", ")
    
    tbl %>%
      mutate(name = str_c("'", name, "'=''")) %>%
      group_by(region, area) %>%
      summarise(names = collapse(name)) %>%
      mutate(area = makelist(area, names)) %>%
      summarise(areas = collapse(area)) %>%
      mutate(region = makelist(region, areas)) %>%
      summarise(regions = collapse(region)) %>%
      mutate(liststr = str_c("list(", regions, ")")) %>%
      `[[`("liststr")  # extract the finished string; column 1 is the unwrapped `regions`
    #> [1] "list('region 1' = list('area 1' = list('name 1'='', 'name 2'=''), 'area 2' = list('name 3'='', 'name 4'='')), 'region 2' = list('area 3' = list('name 5'='', 'name 6'=''), 'area 4' = list('name 7'='', 'name 8'='')), 'region 3' = list('area 5' = list('name 9'='', 'name 10'=''), 'area 6' = list('name 11'='', 'name 12'='')))"
    

    Created on 2018-03-01 by the reprex package (v0.2.0).
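Since the real data has five levels, the same idea can also be written once and applied to any number of hierarchy columns. The following is a minimal base R sketch, not part of the answer above: build_list_string is a hypothetical helper name, and it assumes the columns are passed from the outermost level to the innermost. It has not been timed against the 3000-row data.

```r
# Hypothetical helper (not from the answer above): recursively builds the
# inner part of the list string for any number of hierarchy columns.
build_list_string <- function(df, cols) {
  # Distinct values at the current level, in order of first appearance.
  keys <- unique(as.character(df[[cols[1]]]))
  if (length(cols) == 1) {
    # Innermost level: each value becomes 'value'=''
    return(paste0("'", keys, "'=''", collapse = ", "))
  }
  parts <- vapply(keys, function(k) {
    # Rows belonging to this key, passed down to the next level.
    sub <- df[df[[cols[1]]] == k, , drop = FALSE]
    paste0("'", k, "' = list(", build_list_string(sub, cols[-1]), ")")
  }, character(1))
  paste(parts, collapse = ", ")
}

dat <- data.frame(
  region = paste("region", rep(1:3, each = 4)),
  area   = paste("area", rep(1:6, each = 2)),
  name   = paste("name", 1:12),
  stringsAsFactors = FALSE
)

# Wrap the outermost level in list( ... ) once at the end.
listString <- paste0(
  "list(", build_list_string(dat, c("region", "area", "name")), ")"
)
listString
```

Adding a level then only means adding a column name to the vector passed as cols. Values containing a single quote would need escaping first, which this sketch does not handle.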