I'm trying to use torchrl's SyncDataCollector with a DQN I implemented myself in torch. Since the DQN uses Conv2d and Linear layers, I have to calculate the correct input size for the first Linear layer. The size variable in the following net
import torch
from torch import nn
# Assumption: NoisyLinear is torchrl's layer; the post does not show its origin.
from torchrl.modules import NoisyLinear


class PixelDQN(nn.Module):
    def __init__(self, input_shape, n_actions) -> None:
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Number of flattened features, measured with a dummy batch of one.
        size = self.conv(torch.zeros(1, *input_shape)).size()[-1]
        self.fc_adv = nn.Sequential(
            NoisyLinear(size, 256),
            nn.ReLU(),
            NoisyLinear(256, n_actions),
        )
        self.fc_val = nn.Sequential(
            NoisyLinear(size, 256),
            nn.ReLU(),
            NoisyLinear(256, 1),
        )

    def forward(self, x: torch.Tensor):
        print(x.shape)  # debug output
        conv = self.conv(x)
        print(conv.shape)  # debug output
        adv = self.fc_adv(conv)
        val = self.fc_val(conv)
        # Dueling head: state value plus mean-centred advantages.
        outp = val + (adv - adv.mean(dim=1, keepdim=True))
        return outp
is responsible for that. As you can see, I expect batched inputs, since I will use a replay buffer and sample batches from it.
I wrap that DQN in the following way and then use the SyncDataCollector:
n_obs = [4, 84, 84]
n_act = 6
agent = QValueActor(
    module=PixelDQN(n_obs, n_act), in_keys=["pixels"], spec=env.action_spec
)
policy_explore = EGreedyModule(
    env.action_spec, eps_end=EPS_END, annealing_num_steps=ANNEALING_STEPS
)
agent_explore = TensorDictSequential(
    agent, policy_explore
)
collector = SyncDataCollector(
    env,
    agent_explore,
    frames_per_batch=FRAMES_PER_BATCH,
    init_random_frames=INIT_RND_STEPS,
    postproc=MultiStep(gamma=GAMMA, n_steps=N_STEPS),
)
This fails, however, because the SyncDataCollector doesn't batch the observations from the env before passing them to the DQN, so the flattened conv output no longer matches the size computed in __init__ and the Linear layer gets the wrong input dimension:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x49 and 3136x256)
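The two shapes in that error can be reproduced with plain arithmetic. The following is a rough sanity check, assuming unpadded convolutions and nn.Flatten's default start_dim=1; the helper name conv_out is made up for illustration:

```python
def conv_out(size, kernel, stride):
    """Output length of an unpadded convolution along one spatial axis."""
    return (size - kernel) // stride + 1


# Follow one spatial axis of the 84x84 frame through the three conv layers.
side = 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    side = conv_out(side, kernel, stride)

channels = 64  # output channels of the last Conv2d

# Batched input (1, 4, 84, 84): Flatten keeps dim 0 and merges the rest,
# giving the feature count that __init__ measured for the Linear layers.
batched_features = channels * side * side

# Unbatched input (4, 84, 84): the conv output is (64, 7, 7), so Flatten
# keeps the channel dim instead and only merges the two spatial dims.
unbatched_shape = (channels, side * side)

print(side, batched_features, unbatched_shape)
```

So the Linear layer was built for 3136 features per sample, but an unbatched observation reaches it as a 64x49 matrix, which is the mat1 shape in the traceback.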
I already tried setting buffer=True in SyncDataCollector. I also tried to use
agent_explore = TensorDictSequential(
    UnsqueezeTransform(0, allow_positive_dim=True), agent, policy_explore
)
as this was kinda suggested by ChatGPT, but it didn't seem to have any effect.
I also tried the UnsqueezeTransform in my env creation, but that didn't work either. My env looks like this:
def make_env(env_name: str):
    return TransformedEnv(
        GymEnv(env_name, from_pixels=True),
        Compose(
            RewardSum(),
            EndOfLifeTransform(),
            NoopResetEnv(noops=30),
            ToTensorImage(),
            Resize(84, 84),
            GrayScale(),
            FrameSkipTransform(frame_skip=4),
            CatFrames(N=4, dim=-3),
        ),
    )
I could pull the size calculation into the forward pass of my PixelDQN and check the shape of the input tensor to adapt it, but this seems like a weird thing to do, since it would mean running the size calculation on every single forward pass.
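A cheaper variant of that idea, given here only as a sketch and not as a recommended fix: keep the size calculation in __init__ and check nothing but x.dim() in forward, adding a batch dimension when a single (C, H, W) observation arrives. The class name BatchSafeConv is made up for illustration, and it covers only the conv part of PixelDQN.

```python
import torch
from torch import nn


class BatchSafeConv(nn.Module):
    """Hypothetical conv trunk that accepts (C, H, W) and (B, C, H, W) inputs."""

    def __init__(self, input_shape):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # The feature count is still computed once, with an explicit batch dim.
        with torch.no_grad():
            self.size = self.conv(torch.zeros(1, *input_shape)).shape[-1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        unbatched = x.dim() == 3  # a single (C, H, W) observation
        if unbatched:
            x = x.unsqueeze(0)  # add a temporary batch dim
        out = self.conv(x)
        # Drop the temporary batch dim again so the output matches the input.
        return out.squeeze(0) if unbatched else out
```

The per-call cost is a single dimension check, so the size calculation itself never has to be repeated in the forward pass.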
I found the solution: I changed to UnsqueezeTransform(-4, in_keys=["pixels"]) within agent_explore, and now I get the wanted behaviour (:
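To illustrate why dim=-4 yields a batch dimension for a (C, H, W) observation, here is a small shape-only sketch. It assumes the transform follows the index rule of torch.unsqueeze, where a negative dim is mapped to dim + ndim + 1; the helper unsqueeze_shape is hypothetical and works on plain tuples, not tensors.

```python
def unsqueeze_shape(shape, dim):
    """Hypothetical helper: the shape torch.unsqueeze(dim) would produce."""
    ndim = len(shape)
    if not -ndim - 1 <= dim <= ndim:
        raise IndexError(f"dim {dim} out of range for {ndim} dimensions")
    if dim < 0:
        dim += ndim + 1  # negative dims count from the end of the new shape
    return shape[:dim] + (1,) + shape[dim:]


obs_shape = (4, 84, 84)  # stacked grayscale frames from the env
print(unsqueeze_shape(obs_shape, -4))
```

For a three-dimensional observation, -4 resolves to position 0, so the network sees a batch of one with shape (1, 4, 84, 84).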