roptimizationweighting

Weight optimization to find out least squared error in R


I have actuals and four different models with their prediction & fitted values. With these fitted values, I want to find optimal weights so that (summation(wifi)-actuals)^2 is minimized. Here wi's are the weights I want to find optimally & fi's are the fitted values for each models.

The restrictions I am having for wi's are;

  1. Weights have to be greater than 0,
  2. Weights have to be smaller than 1,
  3. Weights must have sum of 1

I have seen a similar example here [https://stats.stackexchange.com/questions/385372/weight-optimization-in-order-to-maximize-correlation-r] but I could not replicate it for my particular problem.

Let's generate sample data to understand the problem better

actuals <- floor(runif(10, 500,1700)) 
model1_fitted <- floor(runif(10, 600,1800)) 
model2_fitted <- floor(runif(10, 400,1600)) 
model3_fitted <- floor(runif(10, 300,1500)) 
model4_fitted <- floor(runif(10, 300,1200)) 
sample_model <- data.frame(actuals, model1_fitted, model2_fitted,model3_fitted,model4_fitted)

Now, I need to optimally find (w1,w2,w3,w4) so that (summation(wifi)-actuals)^2 is minimized. I want to save the weights, as I mentioned I also have predictions from these four models. If I get the optimal weights, my predicted values for the ensemble model will be a linear function of these weights & predicted values. First predicted value of ensemble will be like below,

ensemble_pred_1 = w1*model1_pred1+w2*model2_pred1+w3*model3_pred1+w4*model4_pred1

Please help me to find optimal wi's so that I can generate the ensemble model as desired.


Solution

  • Frame your problem according to the optimization problem and calculate the required constraints:

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    set.seed(123)
    model1_fitted <- floor(runif(10, 600,1800)) 
    model2_fitted <- floor(runif(10, 400,1600)) 
    model3_fitted <- floor(runif(10, 300,1500)) 
    model4_fitted <- floor(runif(10, 300,1200)) 
    w <- c(0.2,0.3,0.1,0.4) # sample coefficients
    sample_model <- tibble(model1_fitted, model2_fitted,model3_fitted,model4_fitted) %>%
      mutate(actuals= as.vector(as.matrix(.) %*% w)  + rnorm(10,sd=10))
    
    
    X <- as.matrix(sample_model[,1:4])
    y <- as.matrix(sample_model[,5])
    
    # From solve.QP description
    # solving quadratic programming problems of the form min(-d^T b + 1/2 b^T D b) with the constraints A^T b >= b_0.
    
    # Your problem
    # Minimize       || Xw - y ||^2     => Minimize 1/2 w'X'Xw - (y'X)w  => D=X'X , d= X'y
    # Constraint w>0,w<1, sum(w)=1      => A'w >= b0
    
    d <- t(X) %*% y
    D <- t(X) %*% X
    A <- cbind(rep(1,4),diag(4)) #constraint LHS
    b0 <- c(1,numeric(4)) # constraint RHS
    
    library(quadprog)
    soln <- solve.QP(D,d,A,b0,meq = 1)
    w1 <- soln$solution  # Your model wieghts
    w1
    #> [1] 0.20996764 0.29773563 0.07146838 0.42082836
    

    Created on 2019-05-09 by the reprex package (v0.2.1)