pytorch, reinforcement-learning

When using TensorDictPrioritizedReplayBuffer, should I store a "td_error" field in the TensorDict data?


Say you are training DDPG or any other algorithm that uses a prioritized replay buffer. When using torchrl's TensorDictPrioritizedReplayBuffer, after you calculate the TD error you use it to call TensorDictPrioritizedReplayBuffer.update_tensordict_priority().

My question is: when calling TensorDictPrioritizedReplayBuffer.extend() or TensorDictPrioritizedReplayBuffer.add() on the replay buffer, should the TensorDict data contain a td_error field or not?

Since you can calculate the TD error during the network update anyway, you could compute it only when needed instead of storing it in the TensorDict. So not storing the td_error field might work.

But if I do need to store td_error, that implies I have to calculate the temporal difference error every time I add an entry to the replay buffer, which does not seem very efficient when I think about it.

My main concern is whether the sampler will work as it should when td_error is not stored inside the TensorDict data, especially if you want to dump the replay buffer to disk using LazyMemmapStorage. I think the sampler (PrioritizedSampler) does its own priority tracking with a sum tree, so it might just work, but I'm not sure.

Anyway, thank you for your answer.


Solution

  • My question is: when calling TensorDictPrioritizedReplayBuffer.extend() or TensorDictPrioritizedReplayBuffer.add() on the replay buffer, should the TensorDict data contain a td_error field or not?

    No, you should add it later (after computing the loss) and then call rb.update_tensordict_priority()

    My main concern is whether the sampler will work as it should when td_error is not stored inside the TensorDict data, especially if you want to dump the replay buffer to disk using LazyMemmapStorage. I think the sampler (PrioritizedSampler) does its own priority tracking with a sum tree, so it might just work, but I'm not sure.

    You're right to be concerned, but in practice new entries are assigned a default priority (something like 1.0) that makes "new" data likely to be sampled.