Tags: machine-learning, deep-learning, pytorch, reinforcement-learning

torchrl: Using SyncDataCollector with a custom PyTorch DQN


I'm trying to use torchrl's SyncDataCollector with a DQN I implemented myself in plain PyTorch. Because the DQN mixes Conv2d and Linear layers, I have to compute the correct input size for the first Linear layer; that is what the size calculation in the following net is for:

import torch
import torch.nn as nn
from torchrl.modules import NoisyLinear


class PixelDQN(nn.Module):
    def __init__(self, input_shape, n_actions) -> None:
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Probe the conv stack with a dummy *batched* input to get the
        # flattened feature size for the first linear layer.
        size = self.conv(torch.zeros(1, *input_shape)).size()[-1]
        # Dueling architecture: separate advantage and value streams.
        self.fc_adv = nn.Sequential(
            NoisyLinear(size, 256),
            nn.ReLU(),
            NoisyLinear(256, n_actions),
        )
        self.fc_val = nn.Sequential(
            NoisyLinear(size, 256),
            nn.ReLU(),
            NoisyLinear(256, 1),
        )

    def forward(self, x: torch.Tensor):
        print(x.shape)  # debug: incoming observation shape
        conv = self.conv(x)
        print(conv.shape)  # debug: flattened conv features
        adv = self.fc_adv(conv)
        val = self.fc_val(conv)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        outp = val + (adv - adv.mean(dim=1, keepdim=True))
        return outp

As you can see, the network expects batched inputs, since I will sample batches from a replay buffer during training.
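For example, a quick sanity check with an arbitrary batch size of 32 works as expected:

net = PixelDQN([4, 84, 84], 6)
q_values = net(torch.zeros(32, 4, 84, 84))  # batched input: [B, C, H, W]
print(q_values.shape)  # torch.Size([32, 6])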

I wrap that DQN in the following way and then use the SyncDataCollector:

from tensordict.nn import TensorDictSequential
from torchrl.collectors import SyncDataCollector
from torchrl.data.postprocs import MultiStep
from torchrl.modules import EGreedyModule, QValueActor

n_obs = [4, 84, 84]
n_act = 6

# Read "pixels" from the tensordict and write greedy Q-value actions.
agent = QValueActor(
    module=PixelDQN(n_obs, n_act), in_keys=["pixels"], spec=env.action_spec
)
# Epsilon-greedy exploration on top of the greedy actor.
policy_explore = EGreedyModule(
    env.action_spec, eps_end=EPS_END, annealing_num_steps=ANNEALING_STEPS
)
agent_explore = TensorDictSequential(agent, policy_explore)

collector = SyncDataCollector(
    env,
    agent_explore,
    frames_per_batch=FRAMES_PER_BATCH,
    init_random_frames=INIT_RND_STEPS,
    postproc=MultiStep(gamma=GAMMA, n_steps=N_STEPS),
)

This fails, however, because the SyncDataCollector passes the observations from the env to the DQN without a batch dimension. The size computed in __init__ (for a batched input) no longer matches what Flatten produces at runtime, so the first Linear layer receives an input of the wrong shape: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x49 and 3136x256)
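The numbers in the error are consistent with the conv stack receiving an unbatched [4, 84, 84] observation: Conv2d accepts 3D (unbatched) input, producing [64, 7, 7], and Flatten (which starts at dim 1 by default) turns that into [64, 49] instead of [1, 3136]. A minimal repro outside of torchrl:

net = PixelDQN([4, 84, 84], 6)
print(net.conv(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 3136]) -- batched, as in __init__
print(net.conv(torch.zeros(4, 84, 84)).shape)     # torch.Size([64, 49])  -- unbatched, as from the env
net(torch.zeros(4, 84, 84))  # raises the RuntimeError above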

I already tried setting buffer=True on the SyncDataCollector. I also tried

agent_explore = TensorDictSequential(
  UnsqueezeTransform(0, allow_positive_dim=True), agent, policy_explore
)

which was loosely suggested by ChatGPT, but it didn't seem to have any effect.

I also tried applying the UnsqueezeTransform in my env creation, but that didn't work either. My env looks like this:

from torchrl.envs import (
    CatFrames, Compose, EndOfLifeTransform, FrameSkipTransform,
    GrayScale, GymEnv, NoopResetEnv, Resize, RewardSum,
    ToTensorImage, TransformedEnv,
)

def make_env(env_name: str):
    return TransformedEnv(
        GymEnv(env_name, from_pixels=True),
        Compose(
            RewardSum(),
            EndOfLifeTransform(),
            NoopResetEnv(noops=30),
            ToTensorImage(),
            Resize(84, 84),
            GrayScale(),
            FrameSkipTransform(frame_skip=4),
            CatFrames(N=4, dim=-3),
        ),
    )
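Checking the observation spec confirms the env emits observations without a batch dimension (the env name here is just an example):

env = make_env("ALE/Breakout-v5")  # example env name
print(env.observation_spec["pixels"].shape)  # torch.Size([4, 84, 84]) -- no batch dim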

I could move the size calculation into the forward pass of my PixelDQN and adapt it to the shape of the input tensor, but this seems like a weird thing to do, since it would mean running the size calculation on every single forward pass.
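A less invasive variant of that idea would be to add the missing batch dimension inside forward instead of redoing the size calculation; a rough sketch of that workaround:

def forward(self, x: torch.Tensor):
    unbatched = x.dim() == 3           # [C, H, W] straight from the env
    if unbatched:
        x = x.unsqueeze(0)             # -> [1, C, H, W]
    conv = self.conv(x)
    adv = self.fc_adv(conv)
    val = self.fc_val(conv)
    q = val + (adv - adv.mean(dim=1, keepdim=True))
    return q.squeeze(0) if unbatched else q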


Solution

  • I found the solution: changing to UnsqueezeTransform(-4, in_keys=["pixels"]) within agent_explore gives the desired behaviour.
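For reference, a sketch of the working composition with that fix applied:

from torchrl.envs import UnsqueezeTransform

agent_explore = TensorDictSequential(
    UnsqueezeTransform(-4, in_keys=["pixels"]),  # [4, 84, 84] -> [1, 4, 84, 84]
    agent,
    policy_explore,
)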