Tags: r, regression, linear-regression, restriction

R: Force regression coefficients to add up to 1


I'm trying to run a simple OLS regression with the restriction that the coefficients of two of the variables sum to 1.

I want:

Y = α + β1 * x1 + β2 * x2 + β3 * x3,
where β1 + β2 = 1

I have found how to impose a relation between coefficients such as:

β1 = 2 * β2

But I haven't found how to impose a restriction like:

β1 = 1 - β2

How would I do it in this simple example?

data <- data.frame(
  A = c(1,2,3,4),
  B = c(3,2,2,3),
  C = c(3,3,2,3),
  D = c(5,3,3,4)
)

lm(D ~ A + B + C, data = data)

Thanks!


Solution

    1) CVXR We can compute the coefficients with CVXR directly by specifying the objective and constraint. We assume that D is the response, that the coefficients of A and B must sum to 1, and that b[1] is the intercept while b[2], b[3] and b[4] are the coefficients of A, B and C, respectively.

    library(CVXR)
    
    b <- Variable(4)                            # b[1] intercept; b[2:4] slopes for A, B, C
    X <- cbind(1, as.matrix(data[-4]))          # model matrix: intercept column plus A, B, C
    obj <- Minimize(sum((data$D - X %*% b)^2))  # residual sum of squares
    constraints <- list(b[2] + b[3] == 1)       # coefficients of A and B sum to 1
    problem <- Problem(obj, constraints)
    soln <- solve(problem)
    
    bval <- soln$getValue(b)
    bval
    ##            [,1]
    ## [1,]  1.6428605
    ## [2,] -0.3571428
    ## [3,]  1.3571428
    ## [4,] -0.1428588
    

    The objective is the residual sum of squares and it equals:

    soln$value
    ## [1] 0.07142857
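
    As a quick check, the fitted coefficients of A and B should sum to 1 (up to solver tolerance):

    bval[2] + bval[3]
    ## [1] 1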
    

    2) pracma We can also use the pracma package to compute the coefficients. We specify the model matrix, the response vector, the equality constraint matrix (a plain vector passed as Aeq is treated as a one-row matrix) and the right-hand side of the constraint.

    library(pracma)
    
    lsqlincon(X, data$D, Aeq = c(0, 1, 1, 0), beq = 1) # X is from above
    ## [1]  1.6428571 -0.3571429  1.3571429 -0.1428571
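
    For clarity, the constraint can also be passed as an explicit one-row matrix, which should give the same result:

    lsqlincon(X, data$D, Aeq = matrix(c(0, 1, 1, 0), nrow = 1), beq = 1)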
    

    3) limSolve This package can also solve for the coefficients of constrained regression problems. The positional arguments play the same roles as in (2): the model matrix, the response, the equality constraint matrix and its right-hand side.

    library(limSolve)
    lsei(X, data$D, c(0, 1, 1, 0), 1)
    

    giving:

    $X
                        A          B          C 
     1.6428571 -0.3571429  1.3571429 -0.1428571 
    
    $residualNorm
    [1] 0
    
    $solutionNorm
    [1] 0.07142857
    
    $IsError
    [1] FALSE
    
    $type
    [1] "lsei"
    

    4) nls This can be formulated as an nls problem by substituting the constraint into the model, i.e. writing the coefficient of B as one minus the coefficient of A.

    nls(D ~ b0 + b1 * A + (1-b1) * B + b2 * C, data, 
      start = list(b0 = 1, b1 = 1, b2 = 1))
    ##     D ~ b0 + b1 * A + (1 - b1) * B + b2 * C
    ##     data: data
    ##      b0      b1      b2 
    ##  1.6429 -0.3571 -0.1429 
    ##  residual sum-of-squares: 0.07143
    ##
    ## Number of iterations to convergence: 1 
    ## Achieved convergence tolerance: 2.803e-08
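
    Assigning the fit lets us recover the implied coefficient of B as one minus b1, consistent with the other approaches:

    fit <- nls(D ~ b0 + b1 * A + (1 - b1) * B + b2 * C, data,
      start = list(b0 = 1, b1 = 1, b2 = 1))
    1 - coef(fit)[["b1"]]
    ## [1] 1.357143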
    

    Check

    We can double-check the above by using the lm approach in the other answer: substituting β2 = 1 - β1 into D = α + β1*A + β2*B + β3*C gives D = α + β1*(A - B) + B + β3*C, so B enters with a fixed coefficient of 1, i.e. as an offset:

    lm(D ~ I(A-B) + C + offset(B), data)
    

    giving:

    Call:
    lm(formula = D ~ I(A - B) + C + offset(B), data = data)
    
    Coefficients:
    (Intercept)     I(A - B)            C  
         1.6429      -0.3571      -0.1429  
    

    The I(A - B) coefficient equals the coefficient of A in the original formulation, and one minus it is the coefficient of B. We see that all approaches lead to the same coefficients.
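
    As a final sanity check, a small sketch that rebuilds the full coefficient vector from this fit (the coefficient of B is one minus the I(A - B) coefficient):

    co <- coef(lm(D ~ I(A - B) + C + offset(B), data))
    c(co[1], A = co[[2]], B = 1 - co[[2]], C = co[[3]])
    ## (Intercept)           A           B           C 
    ##   1.6428571  -0.3571429   1.3571429  -0.1428571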