model reinforcement-learning policy mdp

What is the difference between a model and a policy in reinforcement learning?


Both definitions seem to describe a mapping from states to actions, so what is the difference? Or am I wrong?


Solution

  • This article really sums it up for you:
    What is Model-Based Reinforcement Learning?

    To Model or Not to Model

    “Model” is one of those terms that gets thrown around a lot in machine learning (and in scientific disciplines more generally), often with a relatively vague explanation of what we mean. Fortunately, in reinforcement learning, a model has a very specific meaning: it refers to the different dynamic states of an environment and how these states lead to a reward.

    ...The policy is whatever strategy you use to determine what action/direction to take based on your current state/location.

    The overall goal of reinforcement learning (or any learning, really) is to develop a policy, that is, a rule for which behaviours or actions to take in each situation within a specific domain.

    The reinforcement factor is that you continually re-run the learning process based on the results of prior learning: you apply the new policy and learn from the results to improve the policy.

    In model-based reinforcement learning we use a model to represent the environment or domain. The model documents the facts, or states, as well as the possible actions. Knowing these facts, the policy can target those states and actions specifically in each repetition cycle, testing and improving the accuracy of the policy, just as each cycle improves the quality of the model.

    Another way to look at the two is that the model is a record or result of the prior learning; it is the updated view of the environment. The model deals in facts or assumed facts based on past policy execution results. It holds the records of past executions, and this data can be used to approximate the outcomes of taking certain actions from specific states. The policy is the actual learning about behaviours, whereas the model is the set of facts that back up and confirm that learning. In mapping terms, the policy maps a state to an action, whereas the model maps a state and an action to a predicted next state and reward, so only the policy is a state-to-action mapping.

    This diagram from the same article simplifies the relationship between model and policy in Reinforcement Learning:

    A flow diagram of model-based RL
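
To make the distinction above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from the article: a hypothetical four-cell corridor with the goal in cell 3, the names `model`, `plan`, `STATES` and `ACTIONS`, the discount factor, and the use of value iteration as the planning step. The point is only the shape of the two objects: the model takes a state and an action and predicts what happens next, while the policy takes a state and returns an action.

```python
from typing import Dict, Tuple

State = int    # hypothetical corridor cells 0..3; cell 3 is the goal
Action = str   # "left" or "right"

STATES = range(4)
ACTIONS = ("left", "right")


def model(state: State, action: Action) -> Tuple[State, float]:
    """Model: (state, action) -> (predicted next state, predicted reward)."""
    if state == 3:
        # Assumed for this toy example: the goal is absorbing and pays nothing more.
        return 3, 0.0
    step = 1 if action == "right" else -1
    next_state = min(max(state + step, 0), 3)  # walls at both ends
    reward = 1.0 if next_state == 3 else 0.0   # reward only on reaching the goal
    return next_state, reward


def plan(gamma: float = 0.9, sweeps: int = 50) -> Dict[State, Action]:
    """Derive a policy (state -> action) by value iteration over the model."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            outcomes = [model(s, a) for a in ACTIONS]
            values[s] = max(r + gamma * values[s2] for s2, r in outcomes)

    def score(s: State, a: Action) -> float:
        s2, r = model(s, a)
        return r + gamma * values[s2]

    # The policy is just a lookup from state to the best-scoring action.
    return {s: max(ACTIONS, key=lambda a: score(s, a)) for s in STATES}


policy = plan()
print(model(2, "right"))  # the model answers "what happens if I do this here?"
print(policy[2])          # the policy answers "what should I do here?"
```

In this sketch the model is given rather than learned; in model-based RL it would typically be estimated from experience, and a model-free method would learn the policy without ever building `model` at all.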