
Python way to automatically test interaction effects in OLS


In R you can write model = 'Lottery ~ (Literacy + Wealth + Region)^k' and get all main effects and interactions of those variables up to order k.

statsmodels supports some R-style formulas for OLS regression, but it doesn't seem to support the ^k syntax. My dataset is large enough that manually trying combinations of variables is impractical, so I am looking for a way to automate the interaction-effect search.


Solution

  • Formulas are handled by patsy and not by statsmodels directly.

    According to the patsy documentation, the power operator does this: (a + b + c + d) ** 3 expands to all main effects plus all interactions among those terms up to third order.

    See section for ** in https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-language

    Aside: exponentiation in Python is written **, not ^ (which is bitwise XOR).
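To check what the ** operator expands to, here is a minimal sketch that builds a design matrix with patsy and lists the generated terms. The column names a, b, c and the random data are made up for illustration; they are not from the question's dataset.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical numeric predictors, used only to inspect the formula expansion.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=20),
    "b": rng.normal(size=20),
    "c": rng.normal(size=20),
})

# ** 2 requests main effects plus all two-way interactions.
design = dmatrix("(a + b + c) ** 2", df)
term_names = design.design_info.term_names
print(term_names)

# ** 3 additionally includes the three-way interaction a:b:c.
design3 = dmatrix("(a + b + c) ** 3", df)
term_names3 = design3.design_info.term_names
print(term_names3)
```

The same right-hand side can be used in a formula string passed to statsmodels.formula.api.ols, for example "y ~ (a + b + c) ** 2", since statsmodels hands formula parsing to patsy.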