python machine-learning scikit-learn linear-regression statsmodels

Linear Regression Coefficients


I am currently using statsmodels (although I would also be happy to use scikit-learn) to build a linear regression. On this particular model I find that when I add more than one factor, OLS produces wild coefficients: some are extremely large and positive, others extremely large and negative, and they appear to cancel each other out to minimise the cost. As a result, all of the factors come out statistically insignificant. Is there a way to put an upper or lower limit on the coefficients so that OLS has to optimise within those boundaries?


Solution

  • I don't know of a way to constrain OLS so that the absolute value of every coefficient stays below a constant.

    Regularization is a good alternative for this kind of problem, though. L1 regularization penalizes the sum of the absolute values of the coefficients, and L2 penalizes the sum of their squares; either penalty pushes the coefficients of the least significant variables towards zero so that they don't inflate the cost function.

    Take a look at lasso, ridge and elastic net regression. They use L1, L2 and both forms of regularization, respectively; a scikit-learn sketch of the first two follows the statsmodels example below.

    You can try the following in statsmodels:

    # Import OLS
    from statsmodels.regression.linear_model import OLS
    
    # Initialize model
    reg = OLS(endog=y, exog=X)
    
    # Fit with an elastic net penalty. alpha sets the penalty strength
    # (the default alpha=0.0 would apply no penalty at all) and L1_wt
    # mixes L1 (1.0 = pure lasso) with L2 (0.0 = pure ridge); both
    # values here are placeholders you should tune for your data.
    results = reg.fit_regularized(alpha=1.0, L1_wt=0.5)
    
    # Penalized coefficients
    print(results.params)
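
    Since you mentioned you would also be happy to use scikit-learn, here is a minimal sketch of the same idea with its Lasso and Ridge estimators, assuming the same X and y as above; the alpha values are illustrative placeholders that you would normally tune, e.g. with cross-validation:

    # Minimal scikit-learn sketch; the alpha values are placeholders
    # and should be tuned, e.g. via cross-validation
    from sklearn.linear_model import Lasso, Ridge
    
    # L1 (lasso): drives the weakest coefficients to exactly zero
    lasso = Lasso(alpha=1.0).fit(X, y)
    print(lasso.coef_)
    
    # L2 (ridge): shrinks all coefficients towards zero
    ridge = Ridge(alpha=1.0).fit(X, y)
    print(ridge.coef_)

    ElasticNet in the same module combines both penalties if you want a middle ground.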