I am new with LSTM and ran into a problem. I'm trying to predict a variable using 7 features in time steps of 4. I am working with PyTorch.
From my initial data frame (traindf), I created tensors for every feature and the target (Y) by:
featureX_train = torch.tensor(traindf.featureX[:test].values).view(-1, 4, 1)
Y_train = torch.tensor(traindf.Y[:test].values).view(-1, 4, 1)
...
featureX_test = torch.tensor(traindf.featureX[test:].values).view(-1, 4, 1)
Y_test = torch.tensor(traindf.Y[test:].values).view(-1, 4, 1)
I concatenated all the feature tensors into one X_train and one X_test. All tensors are float32:
print(X_train.shape, Y_train.shape)
print(X_test.shape, Y_test.shape)
torch.Size([24436, 4, 7]) torch.Size([24436, 4, 1])
torch.Size([6109, 4, 7]) torch.Size([6109, 4, 1])
Eventually, I have a train and test data set:
train_dataset = TensorDataset(X_train, Y_train)
test_dataset = TensorDataset(X_test, Y_test)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
Preview of my data:
print(train_dataset[0])
print(test_dataset[0])
(tensor([[ 7909.0000, 8094.0000, 9119.0000, 8666.0000, 17599.0000, 13657.0000,
10158.0000],
[ 7909.0000, 8073.0000, 9119.0000, 8636.0000, 17609.0000, 13975.0000,
10109.0000],
[ 7939.5000, 8083.5000, 9166.5000, 8659.5000, 18124.5000, 13971.0000,
10142.0000],
[ 7951.0000, 8064.0000, 9201.0000, 8663.0000, 17985.0000, 13967.0000,
10076.0000]]), tensor([[41.],
[41.],
[41.],
[41.]]))
(tensor([[ 8411.0000, 8530.0000, 9439.0000, 9101.0000, 17368.0000, 14174.0000,
11111.0000],
[ 8460.0000, 8651.5000, 9579.5000, 9355.5000, 17402.0000, 14509.0000,
11474.5000],
[ 8436.0000, 8617.0000, 9579.0000, 9343.0000, 17318.0000, 14288.0000,
11404.0000],
[ 8519.0000, 8655.0000, 9580.0000, 9348.0000, 17566.0000, 14640.0000,
11404.0000]]), tensor([[59.],
[59.],
[59.],
[59.]]))
My LSTM model:
class LSTMModel(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super().__init__()
self.lstm = nn.LSTM(input_size, hidden_size)
self.linear = nn.Linear(hidden_size, output_size)
def forward(self, x):
x, _ = self.lstm(x)
# x = self.linear(x[:, -1, :])
x = self.linear(x)
return x
model = LSTMModel(input_size=7, hidden_size=32, output_size=1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
model.train()
When I try:
for X, Y in train_loader:
optimizer.zero_grad()
Y_pred = model(X)
loss = loss_fn(Y_pred, Y)
print(loss)
I get (correctly I assume) Loss: tensor(1318.9419, grad_fn=<MseLossBackward0>)
However, when I run:
for X, Y in train_loader:
optimizer.zero_grad()
Y_pred = model(X)
loss = loss_fn(Y_pred, Y)
# Now apply backward pass
loss.backward()
optimizer.step()
print(loss)
I get: tensor(nan, grad_fn=<MseLossBackward0>)
I have tried normalizing the data:
mean = X.mean()
std = X.std()
X_normalized = (X - mean) / std
Y_pred = model(X_normalized)
But it yields the same result. Why do I yield 'nan' after applying loss.backward()
in such a loop? How can I fix this? Thanks in advance!
My X_train contained few nan values. By removing the matrices with nan values, I solved this issue:
mask = torch.isnan(X_train).any(dim=1).any(dim=1)
X_train = X_train[~mask]
# Do the same for Y_train as it needs to be the same size
Y_train = Y_train[~mask]
# Create the TensorDataset for the training set
train_dataset = TensorDataset(X_train, Y_train)