dataframefunctionpurrrsapplyoptim

Use OPTIM() to MUTATE an additional column to an existing dataframe or tibble


I would like to add an additional column to an existing dataframe by mutating the optimized result from the OPTIM() function. The code works when I strip the dataframe down to 1 row, but gives the following error when there are 2 rows:

Caused by error in optim(): ! objective function in optim evaluates to length 2 not 1


library(tidyverse)
library(dbplyr)


hfcn <- function(b, D, U, U2, U3){
            Din = ((1 - D)^-b - 1) / b
            rsd_2 = ((U / (1 + b*Din*2)^(1/b) - U2)^2)^0.5
            rsd_3 = ((U / (1 + b*Din*3)^(1/b) - U3)^2)^0.5
            rst_tot = rsd_2 + rsd_3
            return(rst_tot)
          }

#################################
####   If I create dataframe A with a single row, the code works, but fails when there are 2 rows ####

# A <- data.frame(ID = c("A1")
#                 , U = c(28844)
#                 , D = c(0.7941582)
#                 , U2 = c(3417)
#                 , U3 = c(2465)
#                 )
#################################

A <- data.frame(ID = c("A1", "A2")
                , U = c(72625, 28844)
                , D = c(0.7785440, 0.7941582)
                , U2 = c(7916, 3417)
                , U3 = c(5409, 2465)
                )



A2 <- mutate(A
             , C = optim(par = 1.1
                          , hfcn
                          , D = A$D
                          , U = A$U
                          , U2 = A$U2
                          , U3 = A$U3
                          , method = "BFGS"
                          #, method = "L-BFGS-B"#, lower = 0, upper = 3
                        )[1]
             )

Error in mutate(): ! Problem while computing C = ...[]. Caused by error in optim(): ! objective function in optim evaluates to length 2 not 1 Backtrace:

  1. dplyr::mutate(...)
  2. stats::optim(...)

I have successfully run the code when I limit the database to a single row. I can run both rows independently through the function and mutate and additional column using OPTIM(), so the code and function inputs check out. I suspect I may have to use a function from purrr but I cannot find anything online that helps me with this problem.


Solution

  • Short answer

    You need to pass these values row-wise to optim and not the entire vector:

    A2 <- A %>% 
      rowwise() %>% 
      mutate(C = optim(par = 1.1
                       , hfcn
                       , D = D
                       , U = U
                       , U2 = U2
                       , U3 = U3
                       , method = "BFGS"
                       #, method = "L-BFGS-B"#, lower = 0, upper = 3
      )[1]
      ) %>% 
      ungroup()
    

    Note, in the tidyverse when you reference a variable you do not need to use $, as references are understood to be from the data frame, unless specified otherwise.

    Long answer

    When you pass your function a vector of arguments it will return a value for each set of constants (so your function is vectorized):

    hfcn(b = c(.5, .5), 
         U = c(72625, 28844), 
         D = c(0.7785440, 0.7941582), 
         U2 = c(7916, 3417), 
         U3 = c(5409, 2465))
    [1] 2654.797 2043.005
    

    However you are trying to find an optima. So you need to set those values to be constant one set at a time:

    optim(par = 1.1, 
          hfcn, 
          D = 0.7785440,
          U = 72625,
          U2 = 7916, 
          U3 = 5409, 
          method = "BFGS"
    )
    

    You are passing the arguments as a vector, but optim is not vectorized. It will not find the optima for each set of constants (e.g., hfcn(U = 72625, D = 0.7785440, U2 = 7916, U3 = 5409) then hfcn(U = 28844, D = 0.7941582, U2 = 3417, U3 = 2465))-- this is what the error is saying:

    optim(par = 1.1, 
          hfcn, 
          D = c(0.7785440, 0.7941582),
          U = c(72625, 28844),
          U2 = c(7916, 3417), 
          U3 = c(5409, 2465), 
          method = "BFGS"
    )
    Error in optim(par = 1.1, hfcn, D = c(0.778544, 0.7941582), U = c(72625,  : 
      objective function in optim evaluates to length 2 not 1
    

    This is likely why it worked when you only did one row of data. If you had three rows you would get the same error but it would say evaluates to length 3 not 1.

    Additional notes

    The above code returns C as list-column, which is not usually what is expected. So you might try:

    A2 <- A %>% 
      rowwise() %>% 
      mutate(C = unlist(optim(par = 1.1, 
                              hfcn,
                              D = D, 
                              U = U, 
                              U2 = U2,
                              U3 = U3, 
                              method = "BFGS")[1])
      ) %>% 
      ungroup()
    
      ID        U     D    U2    U3     C
      <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 A1    72625 0.779  7916  5409 0.805
    2 A2    28844 0.794  3417  2465 1.11 
    

    Also depending on the size of your data set, rowwise is very inefficient so it could take some time to run. This is normally combatted by transforming your data into a long-format and not using rowwise.