
How to use vector contents as backquoted variable names in tidyr/dplyr


I have the following data frame, and it works as I want with this code:

df <- structure(list(celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), .Label = c("Bcells",
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils",
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"),
    sample = c("SP ID control", "SP ID treated", "SP ID control",
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control",
    "SP ID treated"), `mean(score)` = c(0.160953535029424, 0.155743474395545,
    0.104788051104575, 0.125247035158472, -0.159665650045289,
    -0.134662049979712, 0.196249441751866, 0.212256889027029,
    0.0532668251890109, 0.0738264693971133, 0.151828478029596,
    0.159941552142933, -0.14128323638966, -0.120556640790534,
    0.196518649474078, 0.185264282171863, 0.0654641151966543,
    0.0837989059507186, 0.145111577618456, 0.145448549866796)), .Names = c("celltype",
"sample", "mean(score)"), row.names = c(7L, 8L, 17L, 18L, 27L,
28L, 37L, 38L, 47L, 48L, 57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L,
97L, 98L), class = "data.frame")

library(tidyr)
library(dplyr)

# reshape to one column per sample, then divide treated by control
df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = `SP ID treated` / `SP ID control`)

##          celltype SP ID control SP ID treated pairwise_division
## 1          Bcells    0.16095354    0.15574347         0.9676300
## 2  DendriticCells    0.10478805    0.12524704         1.1952416
## 3     Macrophages   -0.15966565   -0.13466205         0.8434003
## 4       Monocytes    0.19624944    0.21225689         1.0815668
## 5         NKCells    0.05326683    0.07382647         1.3859746
## 6     Neutrophils    0.15182848    0.15994155         1.0534358
## 7       StemCells   -0.14128324   -0.12055664         0.8532976
## 8    StromalCells    0.19651865    0.18526428         0.9427313
## 9        abTcells    0.06546412    0.08379891         1.2800739
## 10       gdTCells    0.14511158    0.14544855         1.0023222

Note that the line

mutate(pairwise_division = `SP ID treated` / `SP ID control`)

refers to the columns with backquotes, because their names contain spaces.
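To illustrate the distinction, here is a small base R sketch (the one-column data frame `wide` is hypothetical): backquotes produce a symbol, which R looks up as a variable or column, while double quotes produce a plain character string. as.name() converts the string into the same symbol.

```r
# Backquotes create a symbol (a variable/column name);
# double quotes create a character string.
from_backquotes <- quote(`SP ID treated`)
from_quotes <- "SP ID treated"

class(from_backquotes)  # "name"
class(from_quotes)      # "character"

# as.name() turns the string into the same symbol
identical(from_backquotes, as.name(from_quotes))  # TRUE

# Hypothetical one-column data frame; check.names = FALSE keeps the space
wide <- data.frame(`SP ID treated` = c(0.15574347, 0.07382647),
                   check.names = FALSE)

# Evaluating the symbol against the data frame fetches that column
eval(from_backquotes, wide)
```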

What I want to do next is to take those column names from a character vector instead. I tried this:

content <- c("SP ID treated" , "SP ID control")    

df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = content[1] / content[2])
df

But it gave me this error:

Error: non-numeric argument to binary operator
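As a quick sketch of where the error comes from: `content[1]` and `content[2]` are ordinary character strings, not references to columns, so the expression inside mutate() asks R to divide one string by another, which base R rejects.

```r
content <- c("SP ID treated", "SP ID control")

# Dividing two strings fails with the same message as in the question
msg <- tryCatch(content[1] / content[2],
                error = function(e) conditionMessage(e))
print(msg)
```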

What's the right way to do it?


Solution

  • If you want to pass column names as strings, you are going to have to use mutate_() rather than mutate(). For example:

    df %>% spread(sample, `mean(score)`) %>%
        mutate_(.dots = list(pairwise_division =
            # substitute() builds the expression `SP ID treated` / `SP ID control`
            substitute(a / b, list(
                a = as.name(content[1]),
                b = as.name(content[2]))
            )))
    

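    as.name() converts each string into a symbol, so the expression built by substitute() refers to the columns `SP ID treated` and `SP ID control` rather than to two character strings.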
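The underscore verbs such as mutate_() have been deprecated since dplyr 0.7.0 in favour of tidy evaluation. Below is a minimal sketch of the current equivalents, assuming a recent dplyr. The two-row data frame `wide` is a hypothetical stand-in for the spread() result, built from values in the output above so the example runs without tidyr.

```r
library(dplyr)

content <- c("SP ID treated", "SP ID control")

# Hypothetical stand-in for the spread() result: two rows copied from the
# output above, keeping the non-syntactic column names intact
wide <- data.frame(
  celltype = c("Bcells", "NKCells"),
  `SP ID control` = c(0.16095354, 0.05326683),
  `SP ID treated` = c(0.15574347, 0.07382647),
  check.names = FALSE
)

# Option 1: the .data pronoun looks a column up by its name given as a string
res_data <- wide %>%
  mutate(pairwise_division = .data[[content[1]]] / .data[[content[2]]])

# Option 2: sym() turns each string into a symbol and !! injects it,
# which mirrors the as.name() + substitute() approach above
res_sym <- wide %>%
  mutate(pairwise_division = (!!sym(content[1])) / (!!sym(content[2])))

res_data
```

Either form can replace the mutate() call in the original pipeline, directly after spread(sample, `mean(score)`).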