In a regression framework, suppose we have two independent variables x1 and x2, and we want different slopes depending on whether x1>0 or x1<0, and the same for x2. This sort of model is used in the computation of the dual beta, if you need an entry point to the literature.
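Written out, the model described above could look like the following. This is only a sketch of one common way to state it; the symbols (alpha, the beta superscripts, and the max/min split) are notation I am assuming here, not something taken from the linked topic.

```latex
y_i = \alpha
    + \beta_1^{+}\, x_{1,i}^{+} + \beta_1^{-}\, x_{1,i}^{-}
    + \beta_2^{+}\, x_{2,i}^{+} + \beta_2^{-}\, x_{2,i}^{-}
    + \varepsilon_i,
\qquad
x_{k,i}^{+} = \max(x_{k,i}, 0), \quad
x_{k,i}^{-} = \min(x_{k,i}, 0).
```

Since x^+ + x^- = x, the usual single-slope regression is the special case where the "+" and "-" coefficients of a variable are equal.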
This topic has been discussed on the Cross Validated site (Link), so now I am trying to code it. My first attempt uses statsmodels to fit a classic linear regression model:
import numpy as np
import statsmodels.api as sm
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)
# Fit and summarize OLS model
mod = sm.OLS(spector_data.endog, spector_data.exog)
res = mod.fit()
print(res.summary())
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4639      0.162      2.864      0.008       0.132       0.796
x2             0.0105      0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================
How could the positive and negative effects be implemented, assuming the effect is asymmetric and we want to quantify it (the dual beta coefficient)?
The expected output format would be something like this (fictitious values, for illustration only):
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1+            0.1031      0.162      2.864      0.008       0.132       0.796
x1-            0.4639      0.162      2.864      0.008       0.132       0.796
x2+            0.0111      0.019      0.539      0.594      -0.029       0.050
x2-            0.212       0.019      0.539      0.594      -0.029       0.050
x3             0.3786      0.139      2.720      0.011       0.093       0.664
const         -1.4980      0.524     -2.859      0.008      -2.571      -0.425
==============================================================================
From my research, there are at least two possibilities.
Split each variable according to X>0 or X<0, which is the approach described in the linked topic:
df["GPA+"] = (df["GPA"] >= 0) * df["GPA"]
df["GPA-"] = (df["GPA"] < 0) * df["GPA"]
When the data has a time attribute, the dual beta can instead be defined from the increments and decrements of the variable through time: difference the column, then split on the sign of the difference so that an estimate can be computed for both cases.
df["diff_GPA"] = df["GPA"].diff(periods=1)
df["diff_GPA+"] = (df["diff_GPA"] >= 0) * df["GPA"]
df["diff_GPA-"] = (df["diff_GPA"] < 0) * df["GPA"]
In both cases, depending on the nature of the dataset, a dual beta can be computed with this feature engineering step, and the estimates can then be interpreted from the OLS summary.