I am trying to dynamically build a formula to use in dynlm. I have run into a behaviour of the formula function that I do not understand, which can be seen from this code:
library(data.table)
dt_test <- data.table("a"=rnorm(10), "b"=1:5)
dt_test[, .(.(
formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]
The code above is expected to produce (identical) formulas for each value of b. This formula is enclosed in .(.(...)) to return a list, just so that it can be properly stored in a column from the original data.table.
However, the formula returned does not match the string originally provided: a comma is added between the + and tt, as you can see from the output:
b V1
<int> <list>
1: 1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
2: 2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
3: 3 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
4: 4 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
5: 5 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + , tt + tt2
Essentially, it adds a comma where there is none. It does so even if I rearrange the terms of the sum, but it stops doing it if I erase q_val, for example. The same goes for as.formula.
I would like to understand what is going on and avoid it.
This is just a cosmetic printing issue caused by the way R deparses long formulas.
If you run:
formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"))
You will see that R prints it on two lines by default, breaking just before "tt + tt2" (no matter how wide the console is):
#z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
The line break reflects how R deparses the formula for display: if you run deparse on it, you get a character vector of length 2:
deparse(formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")))
# [1] "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + "
# [2] " tt + tt2"
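If you need the formula as a single string (for logging or display, say), one option is to raise the deparse cutoff. This is only a sketch in base R: the names f, pieces and one_line are mine, and since width.cutoff accepts values only up to 500, a much longer formula could still be split.

```r
# Build the same formula as above (base R only, no data.table needed).
f <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")

# With the default width.cutoff of 60, the text is split into pieces.
pieces <- deparse(f)
length(pieces)

# Raising width.cutoff (maximum 500) keeps this formula in one piece.
one_line <- deparse(f, width.cutoff = 500L)
length(one_line)
one_line
```

On R 4.0.0 or later, deparse1(f) is another way to get a single string.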
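The comma in your output appears to come from data.table joining these two pieces when it formats the list column for printing. However, if you assign the result of your original code to df_formulas, you will see that the formula itself is stored as normal: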
df_formulas <- dt_test[, .(.(
formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]
df_formulas[[2]]
# [[1]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
# <environment: 0x7fa96ff6ffd8>
#
# [[2]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
# tt + tt2
# <environment: 0x7fa96ff6ffd8>
# ....
As you mentioned, this is also why you don't see the comma if you remove some of the variables from the formula: it has nothing to do with which variable you remove. You are simply shortening the formula enough to avoid the automatic line break.
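A quick way to convince yourself of this, sketched in base R: a shorter formula deparses to a single piece, and joining the two pieces of the long one with a comma reproduces the artefact. The shortened formula and the variable names are illustrative, and the paste call only mimics what the printed output suggests data.table does; I have not traced its internals.

```r
long_f  <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
# Hypothetical shortened variant, well under the 60-character cutoff.
short_f <- formula("z_val ~ s_val + L(s_dval, 32:0) + 1 + tt + tt2")

length(deparse(long_f))   # more than one piece: the text passes the cutoff
length(deparse(short_f))  # one piece: short enough to fit

# Joining the pieces with a comma gives text like the printed column.
joined <- paste(deparse(long_f), collapse = ",")
joined

# The formula object itself is unaffected: every variable is still there.
all.vars(long_f)
```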