Hi, I am running the following model with statsmodels and it works fine.
from statsmodels.formula.api import ols
from statsmodels.iolib.summary2 import summary_col #for summary stats of large tables
time_FE_str = ' + C(hour_of_day) + C(day_of_week) + C(week_of_year)'
weather_2_str = ' + C(weather_index) + rain + extreme_temperature + wind_speed'
model = ols("activity_count ~ C(city_id)"+weather_2_str+time_FE_str, data=df)
results = model.fit()
print summary_col(results).tables
print 'F-TEST:'
hypotheses = '(C(weather_index) = 0), (rain=0), (extreme_temperature=0), (wind_speed=0)'
f_test = results.f_test(hypotheses)
However, I do not know how to formulate the hypothesis for the F-test if I want to include the categorical variable C(weather_index). I tried every version I could think of, but I always get an error.
Has anyone faced this issue before?
Any ideas?
F-TEST:
Traceback (most recent call last):
File "C:/VK/scripts_python/predict_activity.py", line 95, in <module>
f_test = results.f_test(hypotheses)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\statsmodels\base\model.py", line 1375, in f_test
invcov=invcov, use_f=True)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\statsmodels\base\model.py", line 1437, in wald_test
LC = DesignInfo(names).linear_constraint(r_matrix)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\design_info.py", line 536, in linear_constraint
return linear_constraint(constraint_likes, self.column_names)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 391, in linear_constraint
tree = parse_constraint(code, variable_names)
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 225, in parse_constraint
return infix_parse(_tokenize_constraint(string, variable_names),
File "C:\Users\Niko\Anaconda2\envs\gl-env\lib\site-packages\patsy\constraint.py", line 184, in _tokenize_constraint
Origin(string, offset, offset + 1))
patsy.PatsyError: unrecognized token in constraint
(C(weather_index) = 0), (rain=0), (extreme_temperature=0), (wind_speed=0)
^
The methods t_test, wald_test and f_test are for hypothesis tests on the parameters directly, not for an entire categorical or composite effect.
results.summary() shows the parameter names that patsy created for the categorical variables. Those names can be used to create contrasts or restrictions for the categorical effects.
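As a rough sketch of that approach, the example below reads the parameter names from the fitted results and builds a restriction matrix with one row per coefficient to test, then passes it to f_test. The data are synthetic and the column values are made up for illustration; only the column names follow the question, and extreme_temperature and the time fixed effects are left out to keep it short. Using a matrix rather than a constraint string is a choice made here to avoid any string parsing, not necessarily the only way to do it.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (assumption: the real df is not available here)
rng = np.random.RandomState(0)
n = 300
df = pd.DataFrame({
    "weather_index": rng.randint(0, 3, size=n),  # 3 categories
    "rain": rng.rand(n),
    "wind_speed": rng.rand(n),
})
df["activity_count"] = 2.0 * df["rain"] + rng.randn(n)

results = ols("activity_count ~ C(weather_index) + rain + wind_speed",
              data=df).fit()

# Parameter names as created by patsy, e.g. 'C(weather_index)[T.1]'
names = list(results.params.index)
print(names)

# Jointly test all weather_index dummies plus rain and wind_speed
tested = [nm for nm in names if nm.startswith("C(weather_index)")]
tested += ["rain", "wind_speed"]

# One row per tested coefficient; each row picks out exactly one parameter
R = np.zeros((len(tested), len(names)))
for i, nm in enumerate(tested):
    R[i, names.index(nm)] = 1.0

f_test = results.f_test(R)  # H0: R @ params = 0
print(f_test)
```

Each dummy column of the categorical variable gets its own row, so the joint test covers the whole categorical effect together with the continuous regressors.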
As an alternative, anova_lm directly computes the hypothesis test that a term, e.g. a categorical variable, has no effect.
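A minimal sketch of the anova_lm route, again on synthetic data with made-up values and a shortened formula. The choice of typ=2 here is an assumption for illustration; which type of sums of squares is appropriate depends on the model.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in data (assumption: the real df is not available here)
rng = np.random.RandomState(1)
n = 300
df = pd.DataFrame({
    "weather_index": rng.randint(0, 3, size=n),
    "rain": rng.rand(n),
    "wind_speed": rng.rand(n),
})
df["activity_count"] = 2.0 * df["rain"] + rng.randn(n)

results = ols("activity_count ~ C(weather_index) + rain + wind_speed",
              data=df).fit()

# One row per formula term; C(weather_index) is tested as a whole
table = anova_lm(results, typ=2)
print(table)
```

The row labelled C(weather_index) reports the F statistic and p-value for the hypothesis that the categorical variable as a whole has no effect.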