
Automated way of defining hierarchic structure in fabletools::aggregate_key


I'm looking for an automated way to build the .spec argument of aggregate_key() from a character vector containing the names of the variables that define the different levels.

The following of course doesn't work, and everything I tried with !!as.name() or do.call() also failed.

levels <- paste( c("L1",'L2','L3'), collapse = '/')

mytsibble %>% aggregate_key(levels, value = sum(value))


Solution

  • fabletools::aggregate_key() supports tidyverse style !! operations for non-standard evaluation.

    This allows you to construct the expression however you like, and use it within aggregate_key() using !!expression.

    For example, using rlang::parse_expr() to convert a string into an expression:

    library(fpp3)
    tourism %>% 
      aggregate_key(Purpose*(State/Region), Trips = sum(Trips))
    #> # A tsibble: 34,000 x 5 [1Q]
    #> # Key:       Purpose, State, Region [425]
    #>    Quarter Purpose      State        Region        Trips
    #>      <qtr> <chr*>       <chr*>       <chr*>        <dbl>
    #>  1 1998 Q1 <aggregated> <aggregated> <aggregated> 23182.
    #>  2 1998 Q2 <aggregated> <aggregated> <aggregated> 20323.
    #>  3 1998 Q3 <aggregated> <aggregated> <aggregated> 19827.
    #>  4 1998 Q4 <aggregated> <aggregated> <aggregated> 20830.
    #>  5 1999 Q1 <aggregated> <aggregated> <aggregated> 22087.
    #>  6 1999 Q2 <aggregated> <aggregated> <aggregated> 21458.
    #>  7 1999 Q3 <aggregated> <aggregated> <aggregated> 19914.
    #>  8 1999 Q4 <aggregated> <aggregated> <aggregated> 20028.
    #>  9 2000 Q1 <aggregated> <aggregated> <aggregated> 22339.
    #> 10 2000 Q2 <aggregated> <aggregated> <aggregated> 19941.
    #> # … with 33,990 more rows
    
    levels <- rlang::parse_expr("Purpose*(State/Region)")
    tourism %>% 
      aggregate_key(.spec = !!levels, Trips = sum(Trips))
    #> # A tsibble: 34,000 x 5 [1Q]
    #> # Key:       Purpose, State, Region [425]
    #>    Quarter Purpose      State        Region        Trips
    #>      <qtr> <chr*>       <chr*>       <chr*>        <dbl>
    #>  1 1998 Q1 <aggregated> <aggregated> <aggregated> 23182.
    #>  2 1998 Q2 <aggregated> <aggregated> <aggregated> 20323.
    #>  3 1998 Q3 <aggregated> <aggregated> <aggregated> 19827.
    #>  4 1998 Q4 <aggregated> <aggregated> <aggregated> 20830.
    #>  5 1999 Q1 <aggregated> <aggregated> <aggregated> 22087.
    #>  6 1999 Q2 <aggregated> <aggregated> <aggregated> 21458.
    #>  7 1999 Q3 <aggregated> <aggregated> <aggregated> 19914.
    #>  8 1999 Q4 <aggregated> <aggregated> <aggregated> 20028.
    #>  9 2000 Q1 <aggregated> <aggregated> <aggregated> 22339.
    #> 10 2000 Q2 <aggregated> <aggregated> <aggregated> 19941.
    #> # … with 33,990 more rows
    

    Created on 2022-07-29 by the reprex package (v2.0.1)

    This would work with your example as:

    levels <- rlang::parse_expr(paste( c("L1",'L2','L3'), collapse = '/'))
    mytsibble %>% aggregate_key(!!levels, value = sum(value))
    

    There are more robust ways to construct the expression (in case the variable names contain * or /). For example, you could use rlang::call2() with symbols and expressions.

    library(rlang)
    call2("*", sym("Purpose"), call2("/", sym("State"), sym("Region")))
    #> Purpose * (State/Region)
    

    Or equivalently (and more compactly) for your strictly nested example:

    purrr::reduce(syms(c("L1",'L2','L3')), call2, .fn = "/")
    #> L1/L2/L3
    


    These expressions can then be used with aggregate_key() using !! once again.
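
    As a minimal sketch of that last step, the folding can be wrapped in a small helper. The name nested_spec() is hypothetical (it is not part of fabletools or rlang), and base R's Reduce() stands in for purrr::reduce() here only to avoid an extra dependency. It assumes the hierarchy is purely nested, as in your example.

```r
library(rlang)

# Hypothetical helper (not part of fabletools): fold column names into a
# left-nested `/` call, e.g. c("L1", "L2", "L3") becomes L1/L2/L3.
nested_spec <- function(vars) {
  stopifnot(is.character(vars), length(vars) >= 1)
  Reduce(function(parent, child) call2("/", parent, child), syms(vars))
}

spec <- nested_spec(c("L1", "L2", "L3"))
print(spec)

# Names are converted with syms(), so a column whose name contains "/" stays
# a single variable instead of being parsed as a division.
odd <- nested_spec(c("Sales/Region", "Store"))
all.vars(odd)
```

    The result would then be unquoted in the same way as before, e.g. mytsibble %>% aggregate_key(!!spec, value = sum(value)).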


    Why didn't !!as.name(levels) work?

    as.name() produces a name (in rlang/tidyverse this is known as a 'symbol'), not an expression. A name/symbol can be thought of as the name of an object, or the name of a variable in the data. Using !!as.name(levels) will try to produce an aggregation of a single column named "L1/L2/L3", not a nested hierarchy of the columns "L1", "L2", and "L3". For this, you need an expression.
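
    The difference can be inspected directly without any data. The following is only an illustrative sketch; the variable names as_symbol and as_call are made up for this example.

```r
library(rlang)

levels <- paste(c("L1", "L2", "L3"), collapse = "/")

as_symbol <- as.name(levels)     # one symbol whose name is "L1/L2/L3"
as_call   <- parse_expr(levels)  # a call: `/`(`/`(L1, L2), L3)

is.symbol(as_symbol)  # TRUE: a single (non-syntactic) variable name
is.call(as_call)      # TRUE: an expression built from three symbols

# The variables each object refers to:
all.vars(as_symbol)   # "L1/L2/L3"
all.vars(as_call)     # "L1" "L2" "L3"
```

    Only the second object describes three separate columns, which is what aggregate_key() needs in order to build the nesting.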