
scikit-learn IsolationForest error - 100. * self.contamination) TypeError: unsupported operand type(s) for *: 'float' and 'type'


I am trying to build an isolation forest for a CSV file, predicting 'pages' from various size values. The 'pages' values are currently 'low' and 'high', and I have encoded them as 0 and 1 so that I can detect anomalies. However, I keep getting this error:

  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/ensemble/_iforest.py", line 312, in fit
    100. * self.contamination)
TypeError: unsupported operand type(s) for *: 'float' and 'type'

I have attached the code below. Thank you so much for your help!

from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# df (the DataFrame read from the CSV) and values (the 'low'/'high' labels)
# are assumed to be defined earlier in the script
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)
print(len(integer_encoded))
df['pages'] = integer_encoded

X = df.iloc[:, 0:101].values
y = df['pages']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model = IsolationForest(n_estimators=50, max_samples='auto', contamination=float)
model.fit(df[['pages']])  # the TypeError is raised here

Solution

  • scikit-learn is open source, so you can read the file the traceback points to here: https://github.com/scikit-learn/scikit-learn/blob/8feb04524caf78c6a84b56fc59917a0eadcb69c2/sklearn/ensemble/_iforest.py

    _iforest.py

    The line in question is line 283 on latest as of June 11th 2020

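    If you look at __init__ near the top of this file, you can see that contamination is an argument to the constructor. It is expected to be either the string 'auto' or a float between 0 and 0.5. If you don't pass it, it defaults to 'auto', in which case fit sets the threshold offset_ to -0.5 and never reaches the 100. * self.contamination line.

    In your code, contamination = float passes the built-in type float itself rather than a number, which is why the multiplication fails with 'float' and 'type'. You need to pass an actual float value, such as 0.1, as the contamination when you initialize the forest.

    EDIT: note that you don't need to pass anything to the constructor for contamination, because it already has a default value.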
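
As a rough illustration of the two options above, here is a minimal sketch. The random data is only a stand-in (an assumption, not the asker's CSV), and 0.1 is an arbitrary example value for contamination, not a recommendation.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data; the real features would come from the CSV.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))

# Option 1: leave contamination out and rely on the default ('auto').
model_auto = IsolationForest(n_estimators=50, max_samples='auto', random_state=0)
model_auto.fit(X)

# Option 2: pass an actual number (a float value, not the type float).
# 0.1 is an arbitrary example: the expected share of outliers.
model_frac = IsolationForest(n_estimators=50, max_samples='auto',
                             contamination=0.1, random_state=0)
model_frac.fit(X)

# predict returns 1 for inliers and -1 for outliers.
labels = model_frac.predict(X)
outlier_share = float(np.mean(labels == -1))
print(model_auto.offset_, outlier_share)
```

With an explicit contamination, the share of training points flagged as -1 should come out close to that value, since the threshold is taken from the corresponding percentile of the training scores.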