I want to run a VIF analysis on a dataset df. Here, X is the subset of df with only the independent variables.
This is my code:
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif=pd.DataFrame()
vif["feature"]=X.columns
vif["value"]=[variance_inflation_factor(X.values,i) for i in range(len(X.columns))]
It is showing an error message:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Entire error message: https://pastebin.com/Bn103xjP
I searched the web but could not find a similar error. I expected the code to run smoothly, because I copied it exactly from a book.
It seems you have boolean data in the dataframe you provided, specifically the last 3 columns (stories_one, stories_two, stories_three). When boolean and numeric columns are mixed, X.values becomes an object array, and numpy.isfinite(), which raises the error, does not support that dtype (see the NumPy documentation for that function). I think you can just remove the non-numeric columns, and if it still doesn't work, cast the data to float, as mentioned in: Python Numpy TypeError: ufunc 'isfinite' not supported for the input types.
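The float cast suggested above could look roughly like the sketch below. The DataFrame here is made-up stand-in data (the columns area and bedrooms and the random values are assumptions, not taken from the question); only the stories_* names and the variance_inflation_factor call come from the post.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical stand-in for X: two numeric columns plus boolean dummy columns.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "area": rng.normal(1500, 300, 50),
    "bedrooms": rng.integers(1, 6, 50),
    "stories_one": rng.random(50) < 0.5,
    "stories_two": rng.random(50) < 0.3,
})

# Mixed bool and numeric columns collapse to a generic object array,
# which is the kind of input np.isfinite rejects.
print(X.values.dtype)

# Cast every column to float so the underlying array is purely numeric.
X_float = X.astype(float)

# Same VIF computation as in the question, now on the float data.
vif = pd.DataFrame()
vif["feature"] = X_float.columns
vif["value"] = [
    variance_inflation_factor(X_float.values, i)
    for i in range(X_float.shape[1])
]
print(vif)
```

If X also contains string columns, astype(float) will fail on them, so those would need to be dropped or encoded first.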