I would like to add an additional column to an existing dataframe by mutating the optimized result from the OPTIM() function. The code works when I strip the dataframe down to 1 row, but gives the following error when there are 2 rows:
Caused by error in optim()
:
! objective function in optim evaluates to length 2 not 1
library(tidyverse)
library(dbplyr)
hfcn <- function(b, D, U, U2, U3){
Din = ((1 - D)^-b - 1) / b
rsd_2 = ((U / (1 + b*Din*2)^(1/b) - U2)^2)^0.5
rsd_3 = ((U / (1 + b*Din*3)^(1/b) - U3)^2)^0.5
rst_tot = rsd_2 + rsd_3
return(rst_tot)
}
#################################
#### If I create dataframe A with a single row, the code works, but fails when there are 2 rows ####
# A <- data.frame(ID = c("A1")
# , U = c(28844)
# , D = c(0.7941582)
# , U2 = c(3417)
# , U3 = c(2465)
# )
#################################
A <- data.frame(ID = c("A1", "A2")
, U = c(72625, 28844)
, D = c(0.7785440, 0.7941582)
, U2 = c(7916, 3417)
, U3 = c(5409, 2465)
)
A2 <- mutate(A
, C = optim(par = 1.1
, hfcn
, D = A$D
, U = A$U
, U2 = A$U2
, U3 = A$U3
, method = "BFGS"
#, method = "L-BFGS-B"#, lower = 0, upper = 3
)[1]
)
Error in mutate()
:
! Problem while computing C = ...[]
.
Caused by error in optim()
:
! objective function in optim evaluates to length 2 not 1
Backtrace:
I have successfully run the code when I limit the database to a single row. I can run both rows independently through the function and mutate and additional column using OPTIM(), so the code and function inputs check out. I suspect I may have to use a function from purrr but I cannot find anything online that helps me with this problem.
Short answer
You need to pass these values row-wise to optim
and not the entire vector:
A2 <- A %>%
rowwise() %>%
mutate(C = optim(par = 1.1
, hfcn
, D = D
, U = U
, U2 = U2
, U3 = U3
, method = "BFGS"
#, method = "L-BFGS-B"#, lower = 0, upper = 3
)[1]
) %>%
ungroup()
Note, in the tidyverse when you reference a variable you do not need to use $
, as references are understood to be from the data frame, unless specified otherwise.
Long answer
When you pass your function a vector of arguments it will return a value for each set of constants (so your function is vectorized):
hfcn(b = c(.5, .5),
U = c(72625, 28844),
D = c(0.7785440, 0.7941582),
U2 = c(7916, 3417),
U3 = c(5409, 2465))
[1] 2654.797 2043.005
However you are trying to find an optima. So you need to set those values to be constant one set at a time:
optim(par = 1.1,
hfcn,
D = 0.7785440,
U = 72625,
U2 = 7916,
U3 = 5409,
method = "BFGS"
)
You are passing the arguments as a vector, but optim
is not vectorized. It will not find the optima for each set of constants (e.g., hfcn(U = 72625, D = 0.7785440, U2 = 7916, U3 = 5409)
then hfcn(U = 28844, D = 0.7941582, U2 = 3417, U3 = 2465)
)-- this is what the error is saying:
optim(par = 1.1,
hfcn,
D = c(0.7785440, 0.7941582),
U = c(72625, 28844),
U2 = c(7916, 3417),
U3 = c(5409, 2465),
method = "BFGS"
)
Error in optim(par = 1.1, hfcn, D = c(0.778544, 0.7941582), U = c(72625, :
objective function in optim evaluates to length 2 not 1
This is likely why it worked when you only did one row of data. If you had three rows you would get the same error but it would say evaluates to length 3 not 1
.
Additional notes
The above code returns C
as list-column, which is not usually what is expected. So you might try:
A2 <- A %>%
rowwise() %>%
mutate(C = unlist(optim(par = 1.1,
hfcn,
D = D,
U = U,
U2 = U2,
U3 = U3,
method = "BFGS")[1])
) %>%
ungroup()
ID U D U2 U3 C
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1 72625 0.779 7916 5409 0.805
2 A2 28844 0.794 3417 2465 1.11
Also depending on the size of your data set, rowwise
is very inefficient so it could take some time to run. This is normally combatted by transforming your data into a long-format and not using rowwise
.