machine-learning, reinforcement-learning, reward

Reward Function in MIT Deep Traffic Challenge?


I have been playing around with the MIT DeepTraffic Challenge, and have also been watching the lecture and reading the slides.

After getting a general understanding of the architecture, I was wondering what exactly the reward function given by the environment is.

  1. Is it the same as the input of the grid cell (max. drivable speed)?
  2. Are they using reward clipping or not?

I also found this JavaScript codebase, but it does not really help my understanding either.


Solution

  • The reward is the average speed, shifted and scaled to fall within the interval [-3, 3].

    The implementation of the DeepTraffic environment is located in this file: https://selfdrivingcars.mit.edu/deeptraffic/gameopt.js

    I'm working on making it readable. Here's the work-in-progress version: https://github.com/mljack/deeptraffic/blob/master/gameopt.js

        var reward = (avgSpeedMeasurement - 60) / 20;
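
To see how that line maps speeds onto rewards, here is a minimal sketch that wraps it in a function. The name `scaledReward` is hypothetical and not taken from gameopt.js; only the constants 60 and 20 come from the line above. Under this formula, an average speed of 0 gives -3, 60 gives 0, and 120 gives 3, which is where the [-3, 3] interval would come from if the measured speed stays between 0 and 120.

```javascript
// Hypothetical helper (not part of gameopt.js): same arithmetic as the
// reward line above, wrapped in a function so it can be tried in isolation.
function scaledReward(avgSpeedMeasurement) {
  // Shift by 60 so an average speed of 60 yields zero reward,
  // then divide by 20 to compress the range.
  return (avgSpeedMeasurement - 60) / 20;
}

console.log(scaledReward(0));   // -3
console.log(scaledReward(60));  // 0
console.log(scaledReward(120)); // 3
```

Note that this line on its own only shifts and scales the value; it does not clamp it. Whether the surrounding code clips the reward elsewhere is not shown in this snippet.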