python, tpot

TPOT taking too long to train


I've been trying to use TPOT for the first time on a dataset that has approximately 7,000 rows. When I try to train TPOT on the training set, which is 25% of the dataset as a whole, it takes too long. I've been running the code for approximately 45 minutes on Google Colab and the optimization progress is still at 4%. I've just been following the example at http://epistasislab.github.io/tpot/examples/. Is it typical for TPOT to take this long? Because so far I don't think it's worth even trying to use it.


Solution

  • TPOT can take quite a long time depending on the dataset you have. Consider what TPOT is doing: it evaluates thousands of analysis pipelines and fits thousands of ML models on your dataset in the background, and if you have a large dataset, all of that fitting can take a long time, especially on a less powerful machine.

    If you'd like faster results, you have a few options (combined in the sketch after this list):

    1. Use the "TPOT light" configuration (set via the config_dict parameter), which restricts TPOT to simpler models and will run faster.

    2. Set the n_jobs parameter to -1 or a number greater than 1, which will allow TPOT to evaluate pipelines in parallel. -1 will use all of the available cores and speed things up significantly if you have a multicore machine.

    3. Subsample the data using the subsample parameter. The default is 1.0, corresponding to using 100% of your training data. You can subsample to lower percentages of the data and TPOT will run faster.
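
    All three options are constructor arguments on TPOTClassifier. Here is a minimal sketch combining them, using scikit-learn's digits dataset as a stand-in for your own X and y; the specific values for generations, population_size, and subsample are illustrative, not recommendations:

        from tpot import TPOTClassifier
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split

        # Placeholder data standing in for your own features and labels.
        X, y = load_digits(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, train_size=0.75, test_size=0.25
        )

        tpot = TPOTClassifier(
            generations=5,             # fewer generations also shortens the run
            population_size=20,
            config_dict='TPOT light',  # option 1: search only simpler, faster models
            n_jobs=-1,                 # option 2: evaluate pipelines on all available cores
            subsample=0.5,             # option 3: fit each pipeline on 50% of the training data
            verbosity=2,
        )
        tpot.fit(X_train, y_train)
        print(tpot.score(X_test, y_test))

    You can apply any one of these on its own; they simply trade search thoroughness (and possibly final accuracy) for runtime.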