r dataframe aggregate cbind tapply

Formatting colnames to be read by cbind


Say I have a data frame called df such that colnames(df) yields:

"A"           "B"          "C"                 "D"           "E"                 "F" 

I would like to aggregate the data in the following way:

aggregate(cbind(`C`,`D`,`E`,`F`)~A+B, data = df, FUN = sum)

Of course I could type this out manually, but my real data has a very large number of columns, so I am trying to turn the output of colnames(df)[3:6] into:

`C`,`D`,`E`,`F` 

So far I have tried toString(colnames(df)[3:6]), which yields:

"C, D, E, F"

But this is a single character string, and cbind does not read it as a list of column names.

Any suggestions?


Solution

  • Instead of cbind, you could use a matrix created from the relevant columns of the data frame.

    aggregate(as.matrix(df[names(df)[3:6]])~A+B, data=df, FUN=sum)
    #       A     B     C     D    E     F
    # 1 -2.44 -1.78  1.90 -1.76 0.46 -0.61
    # 2  1.32 -0.17 -0.43  0.46 0.70  0.50
    # 3 -0.31  1.21 -0.26 -0.64 1.04 -1.72
    

    Or, to answer your question literally, try:

    (ev <- sprintf("cbind(%s)", toString(names(df)[3:6])))
    # [1] "cbind(C, D, E, F)"
    

    I don't think the backticks are needed here; they only matter for non-syntactic column names (e.g. ones containing spaces).

    And then, of course:

    aggregate(eval(parse(text=ev))~A+B, data=df, FUN=sum)
    #       A     B     C     D    E     F
    # 1 -2.44 -1.78  1.90 -1.76 0.46 -0.61
    # 2  1.32 -0.17 -0.43  0.46 0.70  0.50
    # 3 -0.31  1.21 -0.26 -0.64 1.04 -1.72
    

    Data:

    df <- structure(list(A = c(-2.44, 1.32, -0.31), B = c(-1.78, -0.17, 
    1.21), C = c(1.9, -0.43, -0.26), D = c(-1.76, 0.46, -0.64), E = c(0.46, 
    0.7, 1.04), F = c(-0.61, 0.5, -1.72)), class = "data.frame", row.names = c(NA, 
    -3L))
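
If you would rather avoid eval(parse()), one possible variant is to build the whole formula as text and convert it with as.formula(). This is only a sketch under the same toy data; the names value_cols, quoted, fml and res are made up for illustration and are not from the original post. The backticks are added here so the same code would also cope with non-syntactic column names.

```r
# Same toy data as in the answer above.
df <- data.frame(
  A = c(-2.44, 1.32, -0.31),
  B = c(-1.78, -0.17, 1.21),
  C = c(1.9, -0.43, -0.26),
  D = c(-1.76, 0.46, -0.64),
  E = c(0.46, 0.7, 1.04),
  F = c(-0.61, 0.5, -1.72)
)

# Columns to aggregate (illustrative variable names).
value_cols <- names(df)[3:6]

# Wrap each name in backticks so names such as "my col" would still parse.
quoted <- sprintf("`%s`", value_cols)

# Build the full formula as text, then convert it to a formula object.
fml <- as.formula(sprintf("cbind(%s) ~ A + B", paste(quoted, collapse = ", ")))

# Pass the formula object to aggregate() as usual.
res <- aggregate(fml, data = df, FUN = sum)
res
```

If every column other than A and B should be summed, aggregate(. ~ A + B, data = df, FUN = sum) is a shorter option that needs no string building at all.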