You have a policy, which is effectively a probability distribution over actions for all my states. A value function determines the best course of action to achieve the highest reward.
So I have a random policy. I compute its value function. I update my policy with a new distribution according to that value function. Then I get the value function of this new, updated policy and re-evaluate once again.
From this definition I have trouble understanding how value iteration would then work, and I think this comes from a misunderstanding of what a value function is.
Is a value function not the best course of action, but rather just a course of action that will determine a reward? Does policy iteration simply look for a value function that provides a higher reward than its current one and then update immediately, which gives a new distribution of actions for my states (a new policy), and then do this iteratively for every one of its states until convergence?
In that case, is value iteration looking for the single best possible action at every state in the sequence (as opposed to one that is just better)? I am struggling here to understand why one wouldn't update the policy.
Is my understanding of the policy, the value function, etc. correct?
I think my understanding of policy is certainly incorrect: if a policy is simply a distribution over all the possible actions for my states, then I'm not entirely sure what "updating" it means. If it is simply updating the distribution, then how exactly does value iteration even work when it starts from a "worse" distribution, given that the policy is random when initialized? I can't understand how these two approaches would both converge and end up equally good.
"You have a policy, which is effectively a probability distribution over actions for all my states."
Yes
"A value function determines the best course of action to achieve the highest reward."
No. A value function tells you, for a given policy, the expected cumulative reward of taking action $a$ in state $s$.
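In symbols, for the action-value case (the standard textbook definition, where $\gamma$ is the discount factor and the expectation is over trajectories generated by following the policy $\pi$ after the first action):

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s,\, A_t = a\right]$$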
Forget about value iteration and policy iteration for a moment. The two things you should try to understand are policy evaluation and policy improvement.
In policy evaluation, you figure out the state-value function for a given policy (which tells you your expected cumulative reward for being in a state and then acting according to the policy thereafter). For every state, you look at all the neighboring states and calculate the expected value of the policy in that state (a sum of the neighbors' values, weighted by the policy's action probabilities and the transition probabilities, plus the rewards along the way). You have to loop through all the states doing this over and over. This converges in the limit to the true state-value function for that policy (in practice, you stop when the changes become small).
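Here's a rough sketch of that evaluation sweep. It assumes a small tabular MDP stored as a hypothetical transition model `P[s][a] = [(prob, next_state, reward), ...]`, a stochastic policy `policy[s][a]` giving the probability of action `a` in state `s`, and a discount factor `gamma`; none of these names come from the original question, they're just one convenient representation.

```python
import numpy as np

def policy_evaluation(P, policy, gamma=0.9, tol=1e-8):
    """Iteratively approximate the state-value function of a fixed policy."""
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Expected one-step lookahead under the policy: weight each
            # action by its policy probability and each successor by its
            # transition probability, adding reward plus discounted value.
            v_new = sum(
                policy[s][a] * prob * (reward + gamma * V[s_next])
                for a in range(len(P[s]))
                for prob, s_next, reward in P[s][a]
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:  # stop once a full sweep barely changes anything
            return V
```

Updating `V[s]` in place (rather than from a frozen copy of the previous sweep) still converges and usually does so a little faster; either variant matches the description above.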
In policy improvement, you examine a state-value function and ask, in every state, what's the best action I could take according to the value function? The action the current policy takes might not lead to the highest value neighbor. If it doesn't, we can trivially make a better policy by acting in a way to reach a better neighbor. The new policy that results is better (or at worst, the same).
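Policy improvement is then just a one-step greedy lookahead. A sketch, using the same hypothetical `P` and `gamma` as above:

```python
import numpy as np

def policy_improvement(P, V, gamma=0.9):
    """Return a policy that is greedy with respect to the value function V."""
    new_policy = []
    for s in range(len(P)):
        n_actions = len(P[s])
        # Value of taking each action in s and then following the policy
        # whose values are (approximately) V.
        q = np.array([
            sum(prob * (reward + gamma * V[s_next])
                for prob, s_next, reward in P[s][a])
            for a in range(n_actions)
        ])
        greedy = np.zeros(n_actions)
        greedy[np.argmax(q)] = 1.0  # put all probability on the best action
        new_policy.append(greedy)
    return new_policy
```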
Policy iteration is just repeated policy evaluation and policy improvement.
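With the two hypothetical helpers sketched above, that loop might look like this:

```python
import numpy as np

def policy_iteration(P, gamma=0.9):
    """Alternate full evaluation and greedy improvement until the policy is stable."""
    # Start from an arbitrary policy; here, uniform random over actions.
    policy = [np.ones(len(P[s])) / len(P[s]) for s in range(len(P))]
    while True:
        V = policy_evaluation(P, policy, gamma)       # evaluate the current policy
        new_policy = policy_improvement(P, V, gamma)  # act greedily w.r.t. V
        if all(np.array_equal(p, q) for p, q in zip(policy, new_policy)):
            return new_policy, V                      # no change: policy is stable
        policy = new_policy
```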
In value iteration, you truncate the evaluation step. So instead of following the full evaluation process to convergence, you do one step of looking at neighboring states, and instead of taking an expectation under the policy, you do policy improvement immediately by storing the maximum neighboring value. Evaluation and improvement are smudged together. You repeat this smudged step over and over, until the change in the values is very small. The principal idea of why this converges is the same; you're evaluating the policy, then improving it until it can no longer be improved.
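In code, that collapse amounts to replacing the policy-weighted average with a max over actions, and only reading off the greedy policy at the end. A sketch with the same hypothetical `P` and `gamma`:

```python
import numpy as np

def value_iteration(P, gamma=0.9, tol=1e-8):
    """Sweep with a max backup until values stop changing, then extract the greedy policy."""
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Truncated evaluation and improvement in one step: keep the best
            # one-step lookahead value rather than a policy-weighted average.
            v_new = max(
                sum(prob * (reward + gamma * V[s_next])
                    for prob, s_next, reward in P[s][a])
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    # The greedy action in each state, recovered from the converged values.
    policy = [
        int(np.argmax([
            sum(prob * (reward + gamma * V[s_next])
                for prob, s_next, reward in P[s][a])
            for a in range(len(P[s]))
        ]))
        for s in range(n_states)
    ]
    return V, policy
```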
There are a bunch of ways you might go about understanding policy and value iteration. You can read more about this evaluation-and-improvement framing in Reinforcement Learning: An Introduction, 2nd ed. I've left out some important details about discounting, but hopefully the overall picture is clearer now.