reinforcement-learning, stable-baselines

Is it possible to use a stable-baselines model as the baseline for another model?


I recently trained a stable-baselines PPO model for a couple of days and it is performing well on test environments. Essentially, I am trying to iterate on this model. I was wondering if it is possible to use this model as a new baseline for future model training. So, instead of starting with a naive policy for my environment, it could use this model as a starting point and potentially learn a better approach to solving the environment.


Solution

  • The answer is yes. You basically need to do the following things to achieve this:

    1. Save your PPO model after training it in the original environment, using the save functionality provided by stable-baselines.
    2. When training on your new environment, load the saved PPO model rather than creating a new PPO model. The starting point will then be the trained policy, and it will evolve from there for the better (see the sketch after this list).
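
    A minimal sketch of the two steps, assuming the TensorFlow-based stable-baselines with PPO2 and a hypothetical CartPole-v1 environment standing in for yours:

    ```python
    import gym
    from stable_baselines import PPO2

    # Step 1: train and save the model in the original environment
    env = gym.make("CartPole-v1")          # placeholder for your environment
    model = PPO2("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=100_000)   # your long training run
    model.save("ppo_baseline")             # hypothetical file name

    # Step 2: load the saved model for further training instead of creating a new one
    new_env = gym.make("CartPole-v1")      # placeholder for the new/updated environment
    model = PPO2.load("ppo_baseline", env=new_env)
    model.learn(total_timesteps=100_000)   # continues from the trained policy
    model.save("ppo_baseline_v2")
    ```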

    One thing you might be interested in is Transfer Learning, which systematically does what you want: train a model in one environment and reuse it as the baseline for a new environment, saving training time on the second one. The key to making it work is ensuring that the two environments are highly similar; if they are totally different, the pre-trained policy from the first environment may not help much.

    In addition, the network architecture of the PPO policy stays the same when you do this, whereas in practice the optimal architecture can differ from one environment to another.
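
    For context, here is a small sketch of how the architecture is fixed at creation time and carried along with the saved model (the net_arch value below is just an illustrative assumption):

    ```python
    import gym
    from stable_baselines import PPO2

    env = gym.make("CartPole-v1")  # placeholder environment

    # The policy network layout is set when the model is first created...
    model = PPO2("MlpPolicy", env, policy_kwargs=dict(net_arch=[64, 64]))
    model.save("ppo_fixed_arch")

    # ...and is stored alongside the weights, so a loaded model keeps that layout.
    model = PPO2.load("ppo_fixed_arch", env=env)
    ```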