Tags: python, pytorch, inference

Running a PyTorch model at inference, i.e. with batch_size == 1 and not the batch size it was trained with


I have trained a PyTorch model and now I want to use it. To keep things simple here, I just used a linear model that takes a tensor of shape (250, 120, 8); 250 is the batch size and (120, 8) is the size of one data sample. Now I want to feed the model a single tensor of size (120, 8) and get my result, but that doesn't work: the model always expects a tensor of size (250, 120, 8). I looked through the PyTorch documentation but couldn't find anything about this.

So how can I use my model at inference time when I am only interested in a single data sample?

import torch

import transformer  # my own module that defines the linear model

hparams = {}
hparams['num_features'] = 8
hparams['seq_len'] = 120

model = transformer.linear(hparams=hparams)
model.load_state_dict(torch.load(file_path))
model.eval()

# this works
print(model(torch.rand(250, 120, 8), 'cpu').shape)  # returns (250, desired_output_size)

# but I want to do this
model(torch.rand(120, 8), 'cpu')  # here I get an error message

Solution

  • Most layers in PyTorch assume the first dimension is the batch dimension. You can unsqueeze a singleton dimension onto your input to give it shape [1, ...]. This doesn't copy any data (it just returns a view), so it should not add any noticeable latency.

    x = torch.rand(120, 8)    # shape [120, 8]
    x = x.unsqueeze(0)        # shape [1, 120, 8]
    y = model(x, 'cpu')       # shape [1, desired_output_size]
    y = y.squeeze(0)          # shape [desired_output_size]
    

    Of course you can do the squeeze/unsqueeze all in one line if you want.

    x = torch.rand(120, 8)
    y = model(x.unsqueeze(0), 'cpu').squeeze(0)
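
    If you want to try the pattern end-to-end without your own checkpoint, here is a minimal, self-contained sketch. The nn.Sequential model below is only a stand-in for your transformer.linear class (which additionally takes a device string in its forward call, so the call signature here differs); wrapping the call in torch.no_grad() is a standard extra step at inference time.

    import torch
    import torch.nn as nn

    # Placeholder model: flattens a (120, 8) sample and maps it to 4 outputs.
    # Substitute your own transformer.linear model here.
    model = nn.Sequential(nn.Flatten(), nn.Linear(120 * 8, 4))
    model.eval()

    x = torch.rand(120, 8)             # a single sample, no batch dimension

    with torch.no_grad():              # no gradients needed at inference
        y = model(x.unsqueeze(0))      # add batch dim -> input shape [1, 120, 8]
        y = y.squeeze(0)               # drop batch dim -> output shape [4]

    print(y.shape)                     # torch.Size([4])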