python statsmodels variance

Error while using variance_inflation_factor in Python


I want to run a VIF analysis on a dataset df. Here, X is the subset of df containing only the independent variables.

This is my code:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute the VIF of every column in X
vif = pd.DataFrame()
vif["feature"] = X.columns
vif["value"] = [variance_inflation_factor(X.values, i) for i in range(len(X.columns))]

It is showing an error message:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Entire error message: https://pastebin.com/Bn103xjP

I searched the web but could not find a similar error. I expected it to run smoothly, because I copied this code exactly from a book.


Solution

  • It seems the dataframe you provided contains boolean data, specifically the last three columns (stories_one, stories_two, stories_three). The error comes from numpy.isfinite(), which does not accept the object-dtype array that X.values becomes when boolean and numeric columns are mixed; see the NumPy documentation for that function. I think you can just remove the non-numeric columns, and if it still doesn't work, cast the data to float, as described in the question "Python Numpy TypeError: ufunc 'isfinite' not supported for the input types".
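
The cast-to-float suggestion could look roughly like the sketch below. The data here is a made-up stand-in, not the asker's dataset: the columns area and rooms and the random values are assumptions for illustration, and only the boolean stories_* column names echo the question. It shows that mixing boolean and numeric columns yields an object array, and that casting with astype(float) lets variance_inflation_factor run.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical stand-in data: two numeric columns plus boolean dummy columns.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "area": rng.normal(100.0, 20.0, size=50),
    "rooms": rng.integers(1, 6, size=50),
    "stories_one": rng.random(50) < 0.5,
    "stories_two": rng.random(50) < 0.3,
})

# Mixing bool and numeric columns gives an object-dtype array,
# which np.isfinite (and therefore the VIF computation) rejects.
print(X.values.dtype)

# Cast every column to float so the underlying array is purely numeric.
X_num = X.astype(float)

vif = pd.DataFrame()
vif["feature"] = X_num.columns
vif["value"] = [
    variance_inflation_factor(X_num.values, i) for i in range(X_num.shape[1])
]
print(vif)
```

Casting keeps the dummy columns in the analysis (True becomes 1.0, False becomes 0.0), whereas dropping them removes those features from the VIF table; which is appropriate depends on whether the dummies belong in the model.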