python linear-regression statsmodels

Output of a statsmodels regression


I would like to perform a simple linear regression using statsmodels. I've tried several different methods by now, but I just can't get it to work. The code I have now doesn't raise any errors, but it also doesn't show me the result.

I am trying to create a model for the variable "Direction", which takes the value 0 if the return for the corresponding date was negative and 1 if it was positive. The explanatory variables are the five lags of the returns. The dataframe df13 contains the lags and the direction for each observed date. I tried the code below and, as mentioned, it doesn't give an error but prints "Optimization terminated successfully. Current function value: 0.682314 Iterations 5 <bound method BinaryResults.summary of statsmodels.discrete.discrete_model.LogitResults object at 0x0000021CC267D160"

However, I would like to see the typical summary table with all the beta values, their significance, etc.

Also, since Direction is a binary variable, would it be better to use a logit model instead of a linear model? In the assignment, however, it is described as a linear model.

import os
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from statsmodels.sandbox.regression.predstd import wls_prediction_std

...



X = df13[['Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5']]
Y = df13['Direction']

X = sm.add_constant(X)


model = sm.Logit(Y.astype(float), X.astype(float)).fit()
predictions = model.predict(X)

print_model = model.summary
print(print_model)

Solution

  • I don't know if this is unintentional, but it looks like you need to define X and Y separately:

    X = df13[['Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5']]
    
    Y = df13['Direction']
    

    Secondly, I'm not familiar with statsmodels, but I would try converting your dataframes to NumPy arrays. You can do this with

    Xnum = X.to_numpy()

    ynum = Y.to_numpy()
    

    And try passing those to the regressors.