I am working on a data science project and trying to find the optimal parameters for my model.
This is what I want to test, but it takes forever: it has been running for an hour and I still can't see any output.
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
%matplotlib inline

# Fit one forest per n_estimators value and record its test accuracy
scores = []
for k in range(1, 200):
    rfc = RandomForestClassifier(n_estimators=k)
    rfc.fit(X_train_scaled, y_train)
    y_pred3 = rfc.predict(X_test_scaled)
    scores.append(accuracy_score(y_test, y_pred3))

# plot the relationship between n_estimators (k) and testing accuracy
plt.plot(range(1, 200), scores)
plt.xlabel('Value of n_estimators for Random Forest Classifier')
plt.ylabel('Testing Accuracy')
plt.show()
Is there a way to speed up this process? I am using a 14-inch MacBook Pro with an M1 Pro chip.
I can suggest two simple ways to speed up tuning the parameters of a RandomForestClassifier.
First, enable multi-threading. RandomForestClassifier can build its trees in parallel, and setting n_jobs=-1 lets it use all available CPU cores. The default, n_jobs=None, uses a single core unless you are inside a joblib parallel context.
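A minimal sketch of the first tip, assuming scikit-learn is installed. The question's X_train_scaled, X_test_scaled, y_train and y_test are not available here, so the sketch builds stand-in data with make_classification (an assumption, not your data), and the sweep is shortened to 1 to 49 so it finishes quickly. Swap in your own splits and range.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's scaled data (assumption: replace
# these four names with your real X_train_scaled, X_test_scaled, y_train, y_test).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train_scaled, X_test_scaled, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = []
for k in range(1, 50):  # shortened sweep for the sketch
    # n_jobs=-1 asks scikit-learn to build this forest's trees on all cores;
    # random_state=0 only makes the run reproducible.
    rfc = RandomForestClassifier(n_estimators=k, n_jobs=-1, random_state=0)
    rfc.fit(X_train_scaled, y_train)
    scores.append(accuracy_score(y_test, rfc.predict(X_test_scaled)))
```

Note that n_jobs parallelizes work within a single fit (across the trees of one forest), so the very small forests at the start of the sweep gain little; the benefit grows with n_estimators.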
The second way is to reduce the number of iterations in your loop. Sweeping every value from 1 to 199 means 199 separate fits, and because each fit builds its forest from scratch, that adds up to 19,900 trees in total. Instead, you can increase the step size, for example testing every 5th value, or focus on a more likely range, say, from 10 to 100.
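A sketch of the second tip that combines both suggestions (a step of 5 over the range 10 to 100), again using synthetic stand-in data as an assumption. It prints the total number of trees each sweep has to build, which is where the time saving comes from, and then picks the best value on the coarse grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: replace with your own scaled splits).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train_scaled, X_test_scaled, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

full_grid = range(1, 200)        # the original sweep: 199 separate fits
coarse_grid = range(10, 101, 5)  # 10, 15, ..., 100: 19 separate fits

# Each fit builds n_estimators trees from scratch, so total work is the grid's sum.
print(sum(full_grid), "trees vs", sum(coarse_grid), "trees")
# prints: 19900 trees vs 1045 trees

scores = []
for k in coarse_grid:
    rfc = RandomForestClassifier(n_estimators=k, random_state=0)
    rfc.fit(X_train_scaled, y_train)
    scores.append(accuracy_score(y_test, rfc.predict(X_test_scaled)))

# Map the highest test accuracy back to its n_estimators value.
best_k = coarse_grid[scores.index(max(scores))]
print("best n_estimators on this grid:", best_k)
```

If the coarse sweep shows a promising region, you can rerun a finer sweep only around best_k instead of over the whole range.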
Both options work well together and can significantly cut down the computation time.