python, machine-learning, tpot

TPOT error in Python: cannot set using a slice indexer with a different length than the value


I'm trying to use TPOT to optimize the hyperparameters of a random forest with a genetic algorithm. I get an error and am not sure how to fix it. Below is the essential code I'm using.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X = my_df_features
y = my_df_target

X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=42)

model_parameters = {'n_estimators': [100,200],
                      "max_depth" : [None, 5, 10],
                      "max_features" : [10]}  

# This seems to work perfectly fine when I run it
# model_tuned = GridSearchCV(RandomForestClassifier(),model_parameters, cv=5)

# This does not seem to work 
model_tuned = TPOTClassifier(generations= 2, population_size= 2, offspring_size= 2,
                                      verbosity= 2, early_stop= 10,
                                      config_dict=
                                      {'sklearn.ensemble.RandomForestClassifier': model_parameters}, 
                                      cv = 5)
 

model_tuned.fit(X_train,y_train)

When using TPOTClassifier (as opposed to GridSearchCV), the last line above produces the following error:

ValueError: cannot set using a slice indexer with a different length than the value

Solution

  • I ran TPOT on the iris dataset with the same configuration and got no error:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier
    from sklearn import datasets
    iris = datasets.load_iris()
    X = iris.data 
    y = iris.target
    
    X_train, X_test, y_train, y_test = train_test_split(X,y, random_state=42)
    
    model_parameters = {'n_estimators': [100,200],
                          "max_depth" : [None, 5, 10],
                          "max_features" : [X_train.shape[1]]}  # number of feature columns; works for arrays and DataFrames
    
    
    model_tuned = TPOTClassifier(generations= 2, 
                                 population_size= 2, 
                                 offspring_size= 2,
                                 verbosity= 2, 
                                 early_stop= 10, 
                                 config_dict={'sklearn.ensemble.RandomForestClassifier': 
                                 model_parameters}, 
                                 cv = 5)
     
    
    model_tuned.fit(X_train,y_train)
    

    I suspect the problem is the shape or type of your dataset, possibly because you are passing pandas DataFrames rather than NumPy arrays.

    Try converting them before the split (to_numpy is a method, so it needs parentheses):

    X = X.to_numpy()
    y = y.to_numpy()
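
A minimal sketch of that conversion, so you can check the types and shapes before calling fit. The toy frames below are made-up stand-ins for my_df_features and my_df_target (I don't know what your real data looks like), and the non-default index is only an assumption about what might differ from the iris arrays:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for my_df_features / my_df_target from the question.
my_df_features = pd.DataFrame(
    {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [1.0, 0.0, 1.0, 0.0]},
    index=[10, 11, 12, 13],  # non-default index, chosen on purpose
)
my_df_target = pd.Series([0, 1, 0, 1], index=[10, 11, 12, 13], name="label")

# to_numpy is a method: without the parentheses X would be the bound
# method object, not an array.
X = my_df_features.to_numpy()

# ravel() gives a 1-D target whether it started as a Series or as a
# one-column DataFrame.
y = np.ravel(my_df_target.to_numpy())

# Sanity checks before handing the data to train_test_split / TPOT.
print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```

If X and y come out as ndarrays with the same number of rows and y is one-dimensional, pass them to train_test_split and TPOTClassifier exactly as in the iris example above. If the error persists after that, the cause is probably something other than the pandas types.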