I want to gradually decrease the clip_range (epsilon, exploration vs. exploitation parameter) throughout training in my PPO model.
I have tried to simply run "model.clip_range = new_value", but this doesn't work.
In the docs here, it says "clip_range (Union[float, Callable[[float], float]]) – Clipping parameter, it can be a function of the current progress remaining (from 1 to 0)."
Does anyone know how to actually change this parameter during training, or how to input "a function of the current progress remaining"?
I've solved the issue.
You need to have a slightly funky setup where a function outputs another function. At this link, they give the following example:
def linear_schedule(initial_value):
    """
    Linear learning rate schedule.

    :param initial_value: (float or str)
    :return: (function)
    """
    if isinstance(initial_value, str):
        initial_value = float(initial_value)

    def func(progress):
        """
        Progress will decrease from 1 (beginning) to 0.

        :param progress: (float)
        :return: (float)
        """
        return progress * initial_value

    return func
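You can sanity-check the returned schedule by calling it directly. This sketch repeats the linear_schedule above (without docstrings) so it runs on its own:

```python
def linear_schedule(initial_value):
    # Outer function captures the starting value (str values are cast to float)
    if isinstance(initial_value, str):
        initial_value = float(initial_value)

    def func(progress):
        # Inner function maps progress remaining (1 -> 0) to the current value
        return progress * initial_value

    return func

schedule = linear_schedule(0.003)
print(schedule(1.0))  # start of training -> 0.003
print(schedule(0.5))  # halfway -> 0.0015
print(schedule(0.0))  # end of training -> 0.0
```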
So essentially, what you have to do is write a function, myscheduler(), which doesn't necessarily need any inputs, and the output of that function must be another function that takes "progress" (measured from 1 to 0 as training goes on) as its only input. That "progress" value will be passed to the function by PPO itself. So, I suppose the "under the hood" order of events is something like:

1. You call myscheduler() when constructing the model, so PPO receives the inner function instead of a constant.
2. PPO stores that function in place of the fixed parameter value.
3. At each update, PPO computes the progress remaining (counting down from 1 to 0) and calls the stored function with it to get the current value.
In my case, I wrote something like this:
def lrsched():
    def reallr(progress):
        lr = 0.003
        if progress < 0.85:
            lr = 0.0005
        if progress < 0.66:
            lr = 0.00025
        if progress < 0.33:
            lr = 0.0001
        return lr

    return reallr
Then, you use that function in the following way:
model = PPO(..., learning_rate=lrsched())
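Since the original question was about clip_range, the same pattern should apply there too (per the docs quoted above, clip_range also accepts a function of the progress remaining). A minimal sketch; the function name clip_sched and the epsilon breakpoints are my own choices, not from the docs:

```python
def clip_sched():
    # Step schedule for PPO's clipping parameter, same shape as lrsched()
    def clip(progress):
        # progress counts down from 1 (start of training) to 0 (end)
        eps = 0.3
        if progress < 0.85:
            eps = 0.2
        if progress < 0.33:
            eps = 0.1
        return eps

    return clip

# Passed the same way as the learning-rate schedule (assuming stable-baselines3):
# model = PPO(..., learning_rate=lrsched(), clip_range=clip_sched())
print(clip_sched()(1.0))  # start of training -> 0.3
print(clip_sched()(0.5))  # mid-training -> 0.2
print(clip_sched()(0.1))  # near the end -> 0.1
```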