[SOLVED] Linear regression asymmetric coefficient

Linear regression asymmetric coefficient - dual beta in Python

Suppose a regression has two independent variables, x1 and x2, and we want a different slope for each depending on its sign (x1 > 0 versus x1 < 0, and likewise for x2). This kind of model is used to compute the dual beta, if you need an entry point to the literature.
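
Written out, the model described above could look like the following. This is only a sketch of one common way to write it: the symbols (alpha, the beta superscripts, epsilon) are notation assumed here, and putting x = 0 in the positive part is an arbitrary choice that does not affect the fit.

```latex
y = \alpha
  + \beta_1^{+} x_1^{+} + \beta_1^{-} x_1^{-}
  + \beta_2^{+} x_2^{+} + \beta_2^{-} x_2^{-}
  + \varepsilon,
\qquad
x_j^{+} = \max(x_j, 0), \quad x_j^{-} = \min(x_j, 0)
```

Since x_j = x_j^{+} + x_j^{-}, the ordinary symmetric regression is the special case where the two betas of each variable are equal.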

This topic has been discussed on Cross Validated (Link), so now I am trying to code it. My first attempt uses statsmodels to fit a classic linear regression model:

import numpy as np
import statsmodels.api as sm

spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

# Fit and summarize OLS model
mod = sm.OLS(spector_data.endog, spector_data.exog)

res = mod.fit()
print(res.summary())

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4639      0.162      2.864      0.008       0.132       0.796
x2             0.0105      0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================

How can the positive and negative effects be estimated separately, assuming the response is asymmetric and we want to quantify it (the dual beta coefficient)?

The expected output would look something like this (fictitious values, for illustration only):

==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1+            0.1031      0.162      2.864      0.008       0.132       0.796
x1-            0.4639      0.162      2.864      0.008       0.132       0.796
x2+            0.0111      0.019      0.539      0.594      -0.029       0.050
x2-            0.212       0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================

Solution

From research, there are at least two possibilities.

Split each variable according to its sign (X >= 0 versus X < 0), as in the link given in the question:

df["GPA+"] = (df["GPA"] >= 0) * df["GPA"]

df["GPA-"] = (df["GPA"] < 0) * df["GPA"]

When the data has a time dimension, the dual beta can instead be based on whether the variable increases or decreases over time: difference the column, then split on the sign of the difference, so that a separate estimate is obtained for each direction.

df["diff_GPA"] = df["GPA"].diff(periods=1)

df["diff_GPA+"] = (df["diff_GPA"] >= 0) * df["GPA"]

df["diff_GPA-"] = (df["diff_GPA"] < 0) * df["GPA"]

In both cases, depending on the nature of the dataset, a dual beta can be computed with this feature engineering step, so the estimates can be interpreted from an ordinary OLS fit.