I am building a DQN for an OpenAI Gym environment. My observation space is a single discrete value, but my actions are:
self.action_space = Tuple((Discrete(3), Box(-100, 100, (1,))))  # gym.spaces.Tuple, so the action space is a valid Space
e.g. [1, 56], [0, 24], [2, -78], ...
My current neural network is:
model = Sequential()
model.add(Dense(24, activation='relu', input_shape=states)) # (1,)
model.add(Dense(24, activation='relu'))
model.add(Dense(2, activation='linear'))
(I copied it from a tutorial whose network only outputs 1 discrete value in the range [0,1].)
I understand that I need to change the last layer of my neural network but what would it be in my case?
My guess is that the last layer should have 3 binary outputs and 1 continuous output, but I don't know whether it is possible to have different kinds of outputs within the same layer.
As you've already noted in your comment, DQN is not compatible with continuous action spaces because of how DQN selects actions: it takes the argmax over a of Q(s,a).
It's impossible to evaluate Q(s,a) for all a when a is continuous.
Having said that, Policy Gradient methods (which are compatible with continuous action spaces) run into the same issue as in your question, because with policy gradient you need to provide a probability for each action that you take. Something like this could work:
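Since Discrete(3) has three choices, take the softmax of the first three outputs, which gives you a probability for each discrete value, and then use the fourth output for the continuous part (for example as the mean of a distribution you sample from); together these give you your action. You then need to derive the probability of that action, which is given by the combined probability of all of your outputs: if the two parts are sampled independently, that is the product of the discrete probability and the continuous density (or the sum of their logs).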
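A minimal sketch of that idea in plain NumPy, not a definitive implementation. It assumes the network's last layer has four linear outputs: three logits for the Discrete(3) part and one value used as the mean of a Gaussian for the Box part. The function names, the fixed `log_std`, and the choice of a Gaussian are my own assumptions for illustration; they are not from the question.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def sample_hybrid_action(outputs, rng, log_std=0.0, low=-100.0, high=100.0):
    """Sample (discrete, continuous) from 4 raw network outputs.

    outputs[:3] are logits for the discrete choice (assumption).
    outputs[3] is the mean of a Gaussian for the continuous value (assumption).
    log_std is a fixed hyperparameter here; it could also be learned.
    """
    outputs = np.asarray(outputs, dtype=float)
    probs = softmax(outputs[:3])
    mean = outputs[3]
    std = np.exp(log_std)

    # Sample each part independently.
    discrete = int(rng.choice(3, p=probs))
    continuous = float(rng.normal(mean, std))

    # Log-probability of the joint action = sum of the two log terms
    # (equivalently, the product of the probability and the density).
    log_p_discrete = np.log(probs[discrete])
    log_p_continuous = (-0.5 * ((continuous - mean) / std) ** 2
                        - log_std - 0.5 * np.log(2.0 * np.pi))
    log_prob = float(log_p_discrete + log_p_continuous)

    # Clip only the value sent to the environment; the log-probability
    # above is that of the unclipped sample.
    env_action = (discrete, float(np.clip(continuous, low, high)))
    return env_action, log_prob

rng = np.random.default_rng(0)
action, log_prob = sample_hybrid_action([0.1, 0.2, 0.3, 5.0], rng)
print(action, log_prob)
```

In a policy gradient update you would scale `log_prob` by the return (or advantage) and differentiate it with respect to the network weights, which in practice means writing the same computation with your framework's tensor operations instead of NumPy.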