I'm currently working on solving a VRP with Optaplanner/Timefold. We have the requirement to not only optimize towards low overall tour time/distance but also towards high vehicle utilization.
Our goal for e.g. 4 Vehicles would look like this schematic, where x
stands for a load (note that this schematic is just for visualizing the question and the actual model is more realistic ;) ):
Vehicle 1 |x|x|x|
Vehicle 2 |x|x|x|
Vehicle 3 |x| | |
Vehicle 4 | | | |
So basically we would like to use as few vehicles with as close to full utilization as possible (which is a hard optimization problem in itself I guess). However, we are currently sometimes getting results a little more like this:
Vehicle 1 |x|x| |
Vehicle 2 |x|x| |
Vehicle 3 |x|x| |
Vehicle 4 |x| | |
What we've decided to try for now, is penalizing vehicles with deviations from full utilization linearly until a certain threshold and then give a bigger soft constant soft penalty to entities (except for unused vehicles). However, I think this might just move the vehicle utilizations towards said threshold and then be kind of random again.
Another idea we've had is just giving a penalty to every vehicle in use.
Are there other and better approaches to achieve the first result with a higher probability?
There are many ideas how this could be achieved. What is surprising is that you seem to be getting well load-balanced vehicles out of the box; typically there need to be constraints that take care of load-balancing. In this case, what you seem to need is an actual anti-load-balancing constraint. Take a look at our article on measuring unfairness, which could give you some pointers.
Sufficiently penalizing every vehicle in use, as you suggest, should certainly help with keeping vehicle use down.