I am trying to build an isolation forest on a CSV file I have, predicting 'pages' from various size values. The 'pages' values are currently 'low' and 'high', and I have encoded them as 0 and 1 so that I can detect anomalies. However, I keep getting this error:
    File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/ensemble/_iforest.py", line 312, in fit
      100. * self.contamination)
  TypeError: unsupported operand type(s) for *: 'float' and 'type'
I have attached the code below. Thank you so much for your help!
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import IsolationForest
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)
print(len(integer_encoded))
df['pages']= integer_encoded
X = df.iloc[:, 0:101].values
y = df['pages']
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
model = IsolationForest(n_estimators = 50, max_samples = 'auto', contamination = float)
model.fit(df[['pages']])
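scikit-learn is open source, so you can read the file you need (_iforest.py) here: https://github.com/scikit-learn/scikit-learn/blob/8feb04524caf78c6a84b56fc59917a0eadcb69c2/sklearn/ensemble/_iforest.py
The line in question is line 283 on the latest version as of June 11th, 2020 (line 312 in the installed version shown in your traceback). It computes 100. * self.contamination. Because you passed the built-in type float rather than a number, Python cannot multiply the two and raises the TypeError you see.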
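To see the failure in isolation, the sketch below reproduces only the arithmetic from that line in plain Python. `percentile_from_contamination` is a hypothetical helper that mirrors the `100. * self.contamination` expression; it is not a scikit-learn function.

```python
def percentile_from_contamination(contamination):
    """Hypothetical helper mirroring the expression in IsolationForest.fit:
    the contamination fraction is turned into a percentile of the scores."""
    return 100. * contamination


# A real number works: 0.1 becomes (approximately) the 10th percentile.
print(percentile_from_contamination(0.1))

# Passing the built-in type `float` (what contamination=float does) fails.
try:
    percentile_from_contamination(float)
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for *: 'float' and 'type'
```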
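If you look further up at the __init__ of this file, you can see that contamination is an argument of the constructor. If you don't pass it, it defaults to 'auto'. In that case fit does not compute a percentile at all: it fixes the decision threshold (offset_) at -0.5, following the original Isolation Forest paper.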
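A minimal sketch of that default behaviour follows. It assumes scikit-learn 0.22 or later, where the default became 'auto', and NumPy. The random data is made up purely for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative random data standing in for a scaled feature matrix.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))

# contamination is not passed, so the constructor default 'auto' is used.
model = IsolationForest(n_estimators=50, random_state=0)
print(model.contamination)  # auto

model.fit(X)
# With 'auto', fit sets the threshold directly instead of computing a percentile.
print(model.offset_)  # -0.5
```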
If you do want to set it, pass an actual float value such as 0.1 when you initialize the forest, not the type float itself. The value must lie in the range (0, 0.5].
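With a real float, the percentile branch runs and roughly that fraction of the training points gets flagged. The sketch below is hedged in a few ways:

- It assumes scikit-learn and NumPy are installed.
- It uses made-up data in place of your CSV.
- It fits on the scaled feature matrix (like `X_train` in your code) rather than on `df[['pages']]`. That is presumably what you want, since the encoded 0/1 label column alone gives the forest a single feature to isolate.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Made-up data standing in for the size columns of the CSV.
rng = np.random.RandomState(0)
X_train = StandardScaler().fit_transform(rng.normal(size=(200, 3)))

# contamination is a number in (0, 0.5], not the type float.
model = IsolationForest(n_estimators=50, max_samples='auto',
                        contamination=0.1, random_state=0)
model.fit(X_train)

# predict returns 1 for inliers and -1 for anomalies.
labels = model.predict(X_train)
outlier_fraction = np.mean(labels == -1)
print(outlier_fraction)  # close to the requested 0.1
```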
EDIT: note that you don't need to pass anything for contamination in the constructor at all, because it has that default value.