reinforcement-learning policy deterministic stable-baselines

Does "deterministic=True" make sense in box, multi-binary or multi-discrete environments?


Using Stable Baselines 3:

Given that deterministic=True always returns the action with the highest probability, what does that mean for environments where the action space is "box", "multi-binary" or "multi-discrete", where the agent is supposed to select multiple actions at the same time? How does deterministic=True work in these environments, and does it work in the way it is supposed to at all?

The question is partly based on this question:

What does "deterministic=True" in stable baselines3 library means?

and is potentially related to another question of mine:

Reinforcement learning deterministic policies worse than non deterministic policies


Solution

  • All that deterministic does is return the mode of the distribution instead of a sample:

        def get_actions(self, deterministic: bool = False) -> th.Tensor:
            """
            Return actions according to the probability distribution.
            :param deterministic:
            :return:
            """
            if deterministic:
                return self.mode()
            return self.sample()
    

    It does not matter which action space you use. Mathematically, exactly one action is taken at each step in RL. The fact that your action space "looks" multi-dimensional just makes the actual (joint) action space exponentially large in the number of dimensions, that's all. What happens then depends on the specific agent. Often you will have an independent distribution per action group (e.g. a separate head in a neural network), and thus each group will get its own "most likely action". With a more advanced neural network, one could instead parametrise a full joint distribution, for example with an autoregressive model.
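
As a rough illustration of the point about independent distributions per action group, here is a minimal sketch in plain Python. The probabilities and the helper names (`group_probs`, `factorised_mode`, `joint_mode`) are made up for the example and are not part of Stable Baselines 3. It assumes a multi-discrete space with three groups of sizes 3, 2 and 4, and checks that taking the most likely choice per group gives the same result as a brute-force search over the whole joint action space.

```python
import itertools
import math

# Hypothetical per-group probabilities for a multi-discrete space with
# group sizes 3, 2 and 4 (the numbers are invented for illustration).
group_probs = [
    [0.2, 0.5, 0.3],
    [0.6, 0.4],
    [0.1, 0.1, 0.7, 0.1],
]


def factorised_mode(probs):
    """Pick the most likely choice in each group independently."""
    return tuple(max(range(len(p)), key=p.__getitem__) for p in probs)


def joint_mode(probs):
    """Brute-force argmax over every joint action (one choice per group)."""
    joint_actions = itertools.product(*(range(len(p)) for p in probs))
    # Under independence, the joint probability is the product of the parts.
    return max(
        joint_actions,
        key=lambda a: math.prod(p[i] for p, i in zip(probs, a)),
    )


# The joint space is the product of the group sizes: 3 * 2 * 4.
n_joint_actions = math.prod(len(p) for p in group_probs)

print(n_joint_actions)
print(factorised_mode(group_probs))
print(joint_mode(group_probs))
```

With independent groups the two modes coincide, so the agent never has to enumerate the joint space. A policy with a genuinely joint distribution (e.g. autoregressive) would not have this shortcut in general.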

    In short: yes, it makes the same sense as it would in any other action space. The question is more about how you parametrise the policy. With a naive parametrisation the policy is less expressive, but in practice it is used in many agents without any issues.
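
To make "the mode" concrete for each kind of action space, here is a small plain-Python sketch of what a mode could look like under the usual naive parametrisations: categorical for discrete, one categorical per group for multi-discrete, independent Bernoulli bits for multi-binary, and a diagonal Gaussian for box. The function names are hypothetical and this is not the Stable Baselines 3 implementation; tie-breaking and any clipping or squashing to the bounds of the box are deliberately left out.

```python
def categorical_mode(probs):
    """Discrete: index of the most probable action."""
    return max(range(len(probs)), key=probs.__getitem__)


def multi_categorical_mode(probs_per_group):
    """Multi-discrete: one independent argmax per group."""
    return [categorical_mode(p) for p in probs_per_group]


def bernoulli_mode(probs):
    """Multi-binary: each bit is 1 if its probability exceeds 0.5.

    Ties at exactly 0.5 go to 0 here, which is an arbitrary choice.
    """
    return [1 if p > 0.5 else 0 for p in probs]


def diag_gaussian_mode(mean, log_std):
    """Box: the mode of a Gaussian is its mean, whatever the spread is."""
    return list(mean)


print(categorical_mode([0.1, 0.7, 0.2]))
print(multi_categorical_mode([[0.2, 0.8], [0.5, 0.3, 0.2]]))
print(bernoulli_mode([0.9, 0.2, 0.6]))
print(diag_gaussian_mode([0.3, -1.2], [0.0, 0.5]))
```

In every case the deterministic action is a single, well-defined point of the (possibly multi-dimensional) action space; only the shape of the returned value differs.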