rr-marginaleffects

Does the `by` argument in `avg_comparisons` compute the strata specific marginal effect?


I'm analyzing data from an AB test we just finished running. Our outcome is binary, y, and we have stratified results by a third variable, g.

Because the intervention could vary by g, I've fit a Poisson regression with robust covariance estimation as follows

library(tidyverse)
library(sandwich)
library(marginaleffects)

fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))

From here, I'd like to know the strata specific causal risk ration (which we usually call "lift" in industry). My approach is to use avg_comparisons as follows

avg_comparisons(fit, 
                variables = 'treatment',
                newdata = model_data,
                transform_pre = 'lnratioavg', 
                transform_post = exp, 
                by=c('g'),
                vcov = 'HC')

The result seems to be consistent with calculations of the lift when I filter the data by groups in g.

Question

By passing by=c('g'), am I actually calculating the strata specific risk ratios as I suspect? Is there any hidden "gotchas" or things I have failed to consider?

I can provide data and a minimal working example if need be.


Solution

  • Here’s a very simple base R example to show what is happening under-the-hood:

    library(marginaleffects)
    fit <- glm(carb ~ hp * am, data = mtcars, family = poisson)
    

    Unit level estimates of log ratio associated with a change of 1 in hp:

    cmp <- comparisons(fit, variables = "hp", transform_pre = "lnratio")
    cmp
    # 
    #  Term Contrast Estimate Std. Error   z Pr(>|z|)   2.5 % 97.5 %
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0054     0.0027 2.0    0.047 0.00007 0.0107
    #    hp       +1   0.0054     0.0027 2.0    0.047 0.00007 0.0107
    # --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects --- 
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    #    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
    # Prediction type:  response 
    # Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am
    

    This is equivalent to:

    # prediction grids with 1 unit difference
    lo <- transform(mtcars, hp = hp - .5)
    hi <- transform(mtcars, hp = hp + .5)
    
    # predictions on response scale
    y_lo <- predict(fit, newdata = lo, type = "response")
    y_hi <- predict(fit, newdata = hi, type = "response")
    
    # log ratio
    lnratio <- log(y_hi / y_lo)
    
    # equivalent to `comparisons()`
    all(cmp$estimate == lnratio)
    # [1] TRUE
    

    Now we take the strata specific means, with mean() inside log():

    by(data.frame(am = lo$am, y_lo, y_hi),
       mtcars$am,
       FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
    # mtcars$am: 0
    # [1] 0.005364414
    # ------------------------------------------------------------ 
    # mtcars$am: 1
    # [1] 0.005566092
    

    Same as:

    avg_comparisons(fit, variables = "hp", by = "am", transform_pre = "lnratio") |>
        print(digits = 7)
    # 
    #  Term Contrast am    Estimate  Std. Error        z   Pr(>|z|)        2.5 %
    #    hp mean(+1)  0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
    #    hp mean(+1)  1 0.005566092 0.001553855 3.582118    < 0.001 2.520592e-03
    #       97.5 %
    #  0.010659317
    #  0.008611592
    # 
    # Prediction type:  response 
    # Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
    

    See the list of transformation functions here: https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations

    The only thing is that by applies the function within stratas.