I have a custom Boid flocking environment in OpenAI Gym, trained with PPO from Stable-Baselines3. I wanted it to achieve flocking similar to Reynolds' model (Video), or close enough, but it isn't learning.
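For context, my training setup is roughly the sketch below. BoidFlockingEnv and the boid_env module are placeholders standing in for my actual environment class, not real library names:

```python
# Minimal sketch of the training setup (placeholder env names, not my exact code).
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

from boid_env import BoidFlockingEnv  # hypothetical module/class

env = BoidFlockingEnv()
check_env(env)  # validates the custom env's observation/action spaces and API

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_boids")
```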
My results after 100,000 timesteps of training:
I have adjusted the calculate_reward function my model uses so the reward encourages Reynolds-model-like behavior, but I can't see any apparent improvement.
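For reference, my reward is shaped roughly like the sketch below: per-boid terms for cohesion, separation, and alignment, in the spirit of Reynolds' three rules. The weights, perception radius, and minimum distance are illustrative placeholders, not my exact tuned values:

```python
import numpy as np

def calculate_reward(positions, velocities, perception_radius=5.0, min_dist=1.0):
    """Sketch of a Reynolds-style reward; weights are guesses, not tuned values.

    positions, velocities: (n_boids, 2) arrays for the current step.
    """
    n = len(positions)
    reward = 0.0
    for i in range(n):
        offsets = positions - positions[i]
        dists = np.linalg.norm(offsets, axis=1)
        neighbors = (dists < perception_radius) & (dists > 0)  # exclude self
        if not neighbors.any():
            reward -= 1.0  # isolated boid: discourage drifting away from the flock
            continue
        # Cohesion: reward staying close to the local centroid
        centroid = positions[neighbors].mean(axis=0)
        reward -= 0.1 * np.linalg.norm(centroid - positions[i])
        # Separation: penalize crowding inside the minimum distance
        reward -= 1.0 * np.sum(dists[neighbors] < min_dist)
        # Alignment: reward matching the mean heading of neighbors (cosine similarity)
        mean_v = velocities[neighbors].mean(axis=0)
        denom = np.linalg.norm(mean_v) * np.linalg.norm(velocities[i])
        if denom > 1e-8:
            reward += 0.5 * float(np.dot(mean_v, velocities[i]) / denom)
    return reward / n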
After running it for 2 million timesteps, I can see that all the boids now just move away from each other.
Two insights: the training time was too short, and the reward function needs to be modified.
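On the first insight, one option is to resume from the saved checkpoint rather than retraining from scratch; Stable-Baselines3 supports this via reset_num_timesteps=False. A sketch, reusing the placeholder names from above:

```python
# Sketch: extending a saved run instead of starting over (placeholder paths/names).
from stable_baselines3 import PPO
from boid_env import BoidFlockingEnv  # hypothetical, as above

env = BoidFlockingEnv()
model = PPO.load("ppo_boids", env=env)
model.learn(total_timesteps=2_000_000, reset_num_timesteps=False)
model.save("ppo_boids_continued")
```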
2 million timesteps of training, 3000-step rollout: https://drive.google.com/file/d/10-VSBmoxZfyO_KTS2a-7VWIWQSwggg9A/view?usp=drive_link