python numpy tensorflow machine-learning tf-agent

TF-Agents: shape of reward, step_type and discount


I am trying to train an agent with TF-Agents, based on the code in this tutorial. I am currently customizing the py_environment for my own use. Apart from the environment code, everything else is exactly the same as in the tutorial.

def compute_avg_return(environment, policy, num_episodes=10):

  total_return = 0.0
  for _ in range(num_episodes):

    time_step = environment.reset()
    episode_return = 0.0

    while not time_step.is_last():
      action_step = policy.action(time_step) # <----- error on this line
      time_step = environment.step(action_step.action)
      episode_return += time_step.reward
    total_return += episode_return

  avg_return = total_return / num_episodes
  return avg_return.numpy()[0]

compute_avg_return(eval_env, random_policy, num_eval_episodes)

I got the following error when running the above code for the first time:

ValueError: Received a mix of batched and unbatched Tensors, or Tensors are not compatible with Specs.  num_outer_dims: 1.
Saw tensor_shapes:
   TimeStep(
{'discount': TensorShape([1]),
 'observation': TensorShape([1, 50, 30]),
 'reward': TensorShape([1]),
 'step_type': TensorShape([1])})
And spec_shapes:
   TimeStep(
{'discount': TensorShape([]),
 'observation': TensorShape([1, 50, 30]),
 'reward': TensorShape([]),
 'step_type': TensorShape([])})

According to the error log, the shape of my observation matches the spec, yet the error still says the tensors are not compatible. So I think the problem is the shape of 'discount', 'reward' and 'step_type'?

But what should I do about it? I can't find anything that shows how to define or alter the shape of these attributes.


Solution

  • Wow, it turns out the data type (dtype) of my observation was incorrect, not any of the shapes. I found it by adding print statements inside the library's validation utilities.

    This happened because I was building the observation with pandas DataFrame.to_numpy(), which in my case returned np.float64 instead of float32.
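
For illustration, here is a minimal sketch of the dtype mismatch and one way to fix it, using only pandas and NumPy. The random 50 x 30 frame and the `describe_mismatch` helper are hypothetical stand-ins (not part of TF-Agents). The expected shape `(1, 50, 30)` is taken from the error log above, and `float32` is assumed to be the dtype declared in the observation spec.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 50 rows x 30 feature columns.
df = pd.DataFrame(np.random.rand(50, 30))

# For a frame of float64 columns, to_numpy() returns a float64 array.
raw = df.to_numpy()

# Request float32 explicitly and add the leading axis seen in the spec.
obs = df.to_numpy(dtype=np.float32)[np.newaxis, ...]


def describe_mismatch(array, expected_shape, expected_dtype):
    """Return a list of readable differences (empty if compatible)."""
    problems = []
    if array.shape != tuple(expected_shape):
        problems.append(f"shape {array.shape} != {tuple(expected_shape)}")
    if array.dtype != np.dtype(expected_dtype):
        problems.append(f"dtype {array.dtype} != {np.dtype(expected_dtype)}")
    return problems


# The shapes agree in both cases; only the dtype differs for `raw`.
print(describe_mismatch(raw[np.newaxis, ...], (1, 50, 30), np.float32))
print(describe_mismatch(obs, (1, 50, 30), np.float32))
```

Calling `.astype(np.float32)` on the array returned by `to_numpy()` should work equally well. TF-Agents also provides `tf_agents.environments.utils.validate_py_environment`, which may catch this kind of spec mismatch in a custom environment before it is wrapped and passed to a policy.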