I'm getting:
assert q_values.shape == (len(state_batch), self.nb_actions)
AssertionError
q_values.shape <class 'tuple'>: (1, 1, 10)
(len(state_batch), self.nb_actions) <class 'tuple'>: (1, 10)
which comes from the SARSA agent in the keras-rl library:
rl.agents.sarsa.SARSAAgent#compute_batch_q_values
batch = self.process_state_batch(state_batch)
q_values = self.model.predict_on_batch(batch)
assert q_values.shape == (len(state_batch), self.nb_actions)
Here is my code:
class MyEnv(Env):
    def __init__(self):
        self._reset()

    def _reset(self) -> None:
        self.i = 0

    def _get_obs(self) -> List[float]:
        return [1] * 20

    def reset(self) -> List[float]:
        self._reset()
        return self._get_obs()
model = Sequential()
model.add(Dense(units=20, activation='relu', input_shape=(1, 20)))
model.add(Dense(units=10, activation='softmax'))
logger.info(model.summary())
policy = BoltzmannQPolicy()
agent = SARSAAgent(model=model, nb_actions=10, policy=policy)
optimizer = Adam(lr=1e-3)
agent.compile(optimizer, metrics=['mae'])
env = MyEnv()
agent.fit(env, 1, verbose=2, visualize=True)
I was wondering if someone could explain how the dimensions should be set up and how this works with these libraries. I'm putting in a list of 20 inputs and want an output of 10.
Let's first build a simple toy environment. The maze is `[1,1,0,1,1,0,1,1,0]`, and there are two actions:

`0`: Move to the next block of the maze.
`1`: Hop over the next block, i.e. skip the next block and move to the one after it.

To implement our env in gym we need to implement two methods, `step` and `reset`:
import gym
import numpy as np
from gym import spaces

class FooEnv(gym.Env):
    def __init__(self):
        self.maze = [1, 1, 0, 1, 1, 0, 1, 1, 0]
        self.curr_state = 0
        self.action_space = spaces.Discrete(2)  # 0: move one block, 1: hop over a block
        self.observation_space = spaces.Discrete(1)

    def step(self, action):
        if action == 0:
            self.curr_state += 1
        if action == 1:
            self.curr_state += 2
        if self.curr_state >= len(self.maze):
            # walked past the end of the maze
            reward = 0.
            done = True
        else:
            if self.maze[self.curr_state] == 0:
                # landed on a hole
                reward = 0.
                done = True
            else:
                reward = 1.
                done = False
        return np.array(self.curr_state), reward, done, {}

    def reset(self):
        self.curr_state = 0
        return np.array(self.curr_state)
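For example, you can sanity-check the environment by stepping through it by hand (using the `FooEnv` defined above):

env = FooEnv()
obs = env.reset()                   # start at position 0
obs, reward, done, _ = env.step(1)  # hop two blocks to position 2
print(obs, reward, done)            # maze[2] == 0, so: 2 0.0 True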
Now, given the current state, we want the NN to predict the action to take (`0` or `1`):

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from rl.agents.sarsa import SARSAAgent
from rl.policy import BoltzmannQPolicy

model = Sequential()
model.add(Dense(units=16, activation='relu', input_shape=(1,)))
model.add(Dense(units=8, activation='relu'))
model.add(Dense(units=2, activation='softmax'))

policy = BoltzmannQPolicy()
agent = SARSAAgent(model=model, nb_actions=2, policy=policy)
optimizer = Adam(lr=1e-3)
agent.compile(optimizer, metrics=['acc'])

env = FooEnv()
agent.fit(env, 10000, verbose=1, visualize=False)

# Test the trained agent with
# agent.test(env, nb_episodes=5, visualize=False)
Output
Training for 10000 steps ...
Interval 1 (0 steps performed)
10000/10000 [==============================] - 54s 5ms/step - reward: 0.6128
done, took 53.519 seconds
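Note that with `input_shape=(1,)` the model's output for a batch of states has shape `(batch_size, nb_actions)`, which is exactly what the assertion in `compute_batch_q_values` checks. A quick way to verify this on the model above:

import numpy as np
q_values = model.predict_on_batch(np.array([[3]]))  # a batch of one state
print(q_values.shape)  # (1, 2) == (len(state_batch), nb_actions)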
If your environment is a 2D grid, say of size n x m, then the input shape of the NN will be (n, m), and you should flatten it before passing it to the Dense layers, like below:

model.add(Flatten(input_shape=(n, m)))
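Applied to the model in the question, a sketch of that fix (assuming the observation keeps its `(1, 20)` shape) would be:

from keras.models import Sequential
from keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten(input_shape=(1, 20)))           # flatten (1, 20) -> (20,)
model.add(Dense(units=20, activation='relu'))
model.add(Dense(units=10, activation='softmax'))  # output shape: (batch_size, 10)

With the `Flatten` layer in front, the predicted q_values have shape `(batch_size, 10)` instead of `(batch_size, 1, 10)`, so the assertion passes.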
Check this example from the keras-rl docs.