I have a custom Boid flocking environment in OpenAI Gym, trained with PPO from Stable-Baselines3. I wanted it to achieve flocking similar to Reynolds' model (Video), or close enough, but it isn't learning.
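For context, my training setup is roughly the sketch below. BoidFlockingEnv and the boid_env module are placeholders standing in for my actual environment class, not real library names:

```python
# Minimal sketch of the training setup (placeholder env names, not my exact code).
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

from boid_env import BoidFlockingEnv  # hypothetical module/class

env = BoidFlockingEnv()
check_env(env)  # validates the custom env's observation/action spaces and API

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_boids")
```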
My results after 100,000 timesteps of training:
I have adjusted the calculate_reward function my model uses so the reward encourages Reynolds-model-like behavior, but I can't see any apparent improvement.
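For reference, my reward is shaped roughly like the sketch below: per-boid terms for cohesion, separation, and alignment, in the spirit of Reynolds' three rules. The weights, perception radius, and minimum distance are illustrative placeholders, not my exact tuned values:

```python
import numpy as np

def calculate_reward(positions, velocities, perception_radius=5.0, min_dist=1.0):
    """Sketch of a Reynolds-style reward; weights are guesses, not tuned values.

    positions, velocities: (n_boids, 2) arrays for the current step.
    """
    n = len(positions)
    reward = 0.0
    for i in range(n):
        offsets = positions - positions[i]
        dists = np.linalg.norm(offsets, axis=1)
        neighbors = (dists < perception_radius) & (dists > 0)  # exclude self
        if not neighbors.any():
            reward -= 1.0  # isolated boid: discourage drifting away from the flock
            continue
        # Cohesion: reward staying close to the local centroid
        centroid = positions[neighbors].mean(axis=0)
        reward -= 0.1 * np.linalg.norm(centroid - positions[i])
        # Separation: penalize crowding inside the minimum distance
        reward -= 1.0 * np.sum(dists[neighbors] < min_dist)
        # Alignment: reward matching the mean heading of neighbors (cosine similarity)
        mean_v = velocities[neighbors].mean(axis=0)
        denom = np.linalg.norm(mean_v) * np.linalg.norm(velocities[i])
        if denom > 1e-8:
            reward += 0.5 * float(np.dot(mean_v, velocities[i]) / denom)
    return reward / n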
After running it for 2 million timesteps, I can see that all the boids now just move away from each other.
Two insights: the training time was too short, and the reward function needs to be modified.
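On the first insight, one option is to resume from the saved checkpoint rather than retraining from scratch; Stable-Baselines3 supports this via reset_num_timesteps=False. A sketch, reusing the placeholder names from above:

```python
# Sketch: extending a saved run instead of starting over (placeholder paths/names).
from stable_baselines3 import PPO
from boid_env import BoidFlockingEnv  # hypothetical, as above

env = BoidFlockingEnv()
model = PPO.load("ppo_boids", env=env)
model.learn(total_timesteps=2_000_000, reset_num_timesteps=False)
model.save("ppo_boids_continued")
```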
2 million timesteps of training, 3000-step rollout: https://drive.google.com/file/d/10-VSBmoxZfyO_KTS2a-7VWIWQSwggg9A/view?usp=drive_link