I am training an LSTM model on data from yfinance. The process is fairly standard: I download the data with yf.download(tickers=ticker) where ticker='AAPL', and smooth it with df.rolling(30, min_periods=1).mean(). Then I adapt the data for training like this:
def create_ds_for_forecasting(df, window_range):
    df_values = df.copy()
    X, y = [], []
    # pair each window of length window_range with the same window shifted one step ahead
    for i in np.arange(0, len(df_values) - window_range - 1):
        X.append(df_values[i:i + window_range])
        y.append(df_values[i + 1:i + window_range + 1])
    return torch.Tensor(np.array(X)).to(device), torch.Tensor(np.array(y)).to(device)
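For reference, this is roughly how the data is prepared and how that function is called. The window length, the MinMaxScaler and the 80/20 chronological split are placeholders on my part, but the scaler corresponds to the scaler.inverse_transform call further down:

    import numpy as np
    import torch
    import yfinance as yf
    from sklearn.preprocessing import MinMaxScaler

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    window_range = 10  # placeholder window length

    # download and smooth, as described above
    df = yf.download(tickers='AAPL')[['Close']]
    df = df.rolling(30, min_periods=1).mean()

    # scale to [0, 1]; this is the scaler whose inverse_transform I call at the end
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(df.values)          # shape (n_days, 1)

    # chronological train/test split (the 80/20 ratio is just an example)
    split = int(len(scaled) * 0.8)
    df_train, df_test = scaled[:split], scaled[split:]

    X_train, y_train = create_ds_for_forecasting(df_train, window_range)
    X_test, y_test = create_ds_for_forecasting(df_test, window_range)
    # X_*: (n_windows, window_range, 1); y_*: same shape, shifted one step ahead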
Next, I train the following model using nn.SmoothL1Loss as criterion and Adam as optimizer.
from torch import nn

class ModeloLSTM(nn.Module):
    def __init__(self, num_layers, hidden_size, input_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            num_layers=self.num_layers,
            hidden_size=self.hidden_size,
            batch_first=True
        ).to(device)
        self.fc = nn.Linear(hidden_size, 1).to(device)
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Dynamically initialize the hidden state per batch
        if self.batch_size != 0:
            # batched input: x is (batch, seq_len, features)
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        else:
            # unbatched input: x is (seq_len, features)
            h0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out)  # one prediction per timestep; the last one is out[:, -1, :]
        out = self.tanh(out)
        return out
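The model, criterion and optimizer that the loops below refer to are set up like this (the hyper-parameters match what I use later; the learning rate is just a placeholder):

    modelo = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=64)
    criterion = nn.SmoothL1Loss()
    optimizer = torch.optim.Adam(modelo.parameters(), lr=1e-3)  # lr is a placeholder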
Everything looks normal, and these are the train + test results.
In case you are wondering whether I also trained on the test data: I didn't. These are the train and test loops.
## TRAIN LOOP
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(X_train, y_train), shuffle=True, batch_size=64, drop_last=True)
num_epochs = 5
for epoch in range(num_epochs):
    for inputs, label in loader:
        outputs = modelo(inputs)
        loss = criterion(outputs, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
## TEST LOOP
y_pred = []
loader = DataLoader(X_test, batch_size=batch_size)
with torch.no_grad():
    for x_batch in loader:
        # keep only the prediction for the last timestep of each window
        y_pred_i = modelo(x_batch)[:, -1, :]
        y_pred.append(y_pred_i)
y_pred = torch.cat(y_pred, dim=0)
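To compare against the ground truth, I take the last target step of each test window; something along these lines:

    # y_pred: (n_test_windows, 1); compare with the last step of each target window
    y_true = y_test[:, -1, :]
    test_loss = criterion(y_pred, y_true)
    print(f'Test SmoothL1 loss: {test_loss.item():.6f}')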
Now, here comes the issue. I save the model weights and load them into a new, unbatched instance of the original model, where h0 and c0 have shape (num_layers, hidden_size), using model.load_state_dict(modelo.state_dict()), with model built with batch_size equal to zero. Then I use this loop to make predictions for the future.
days_to_simulate = 3 * 3 * window_range  # 3 months
input_data = df_test[-window_range:]
input_data = torch.Tensor(input_data).to(device)

model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
model.load_state_dict(modelo.state_dict())
model.eval()

with torch.no_grad():
    # seed the rollout with the prediction for the last observed window
    seq_prediction = model(input_data)[-1, :].unsqueeze(-1).to(device)
    for i in range(0, days_to_simulate):
        if i < window_range:
            # the window still mixes real data with predictions
            input_data = torch.cat((input_data[-window_range + i:, :], seq_prediction), dim=0)
        else:
            # from here on, the window consists entirely of the model's own predictions
            input_data = seq_prediction[-window_range:]
        next_pred = model(input_data)[-1, :].unsqueeze(-1).to(device)
        seq_prediction = torch.cat((seq_prediction, next_pred), dim=0)
import pandas as pd
import matplotlib.pyplot as plt

starting_dates = pd.date_range(start=df.index[-window_range], periods=window_range)
predicted_dates = pd.date_range(start=df.index[-1], periods=days_to_simulate + 1)
starting_series = pd.Series(df[-window_range:].values.flatten(), index=starting_dates)
predicted_series = pd.Series(scaler.inverse_transform(seq_prediction.detach().cpu().numpy()).flatten(), index=predicted_dates)

plt.figure(figsize=(12, 6))
plt.plot(starting_series.index, starting_series.values, linestyle='-', label='Real data')
plt.plot(predicted_series.index, predicted_series.values, linestyle='-', label='Prediction')
plt.title('Stock price prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
However, for some reason, this new unbatched model with the exact same weights converges to a stationary value, as shown in the resulting plot.
In the final results, the predicted values (in orange) always seem to converge to a fixed value. Why is this? Batching should not affect the performance of a model like this. By the way, the reason I go through this process is to deploy the model in a custom personal app: I download the weights, instantiate a new model class and finally reload the original model state into the new instance.
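Concretely, the save/reload step looks roughly like this (the file name is just an example):

    # on the training side
    torch.save(modelo.state_dict(), 'lstm_weights.pt')

    # in the app: rebuild the same architecture, then load the downloaded weights
    model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
    model.load_state_dict(torch.load('lstm_weights.pt', map_location=device))
    model.eval()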
I realized, maybe a bit too late, that I am using x.size(0) to define the batch dimension of h0 and c0, which means the batch size is inferred from the input, so feeding the network data with a different batch size would still work. In other words, I could keep the batched code path and simply feed data with shape (1, window_range, features), as in the sketch below.
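A minimal example of what I mean, reusing modelo and df_test from above:

    # feed the most recent window as a batch of size one: shape (1, window_range, 1)
    last_window = torch.Tensor(df_test[-window_range:]).unsqueeze(0).to(device)
    with torch.no_grad():
        next_step = modelo(last_window)[:, -1, :]   # shape (1, 1)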
However, that doesn't explain why the output stabilizes at a constant value. It seems to me that this might be related to the fact that I am making future predictions from the model's own predictions. That changes the question: how can I prevent an LSTM from settling onto a single constant value when it is fed its own predictions?
After doing a lot of research, I realized that the issue has to do with the use of an LSTM in the first place.
LSTMs and RNNs are criticized for being bad precisely at forecasting future values of a sequence; they are more often used for predicting intermediate values, as in speech recognition or sentiment analysis.
Further research showed me that, for forecasting, it is recommended to use Seq2Seq models such as an LSTM encoder-decoder, or attention-based models that don't rely on step-by-step autoregression.
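For reference, the kind of LSTM encoder-decoder I mean looks roughly like this; it is only a sketch (forecast_len and the greedy decoding are my assumptions), not the model I trained. The decoder still feeds its own output back in, but the hidden state is carried across steps and the whole horizon is produced, and trained on, in one forward pass, unlike my rollout loop above:

    class Seq2SeqLSTM(nn.Module):
        def __init__(self, input_size=1, hidden_size=50, num_layers=1, forecast_len=30):
            super().__init__()
            self.forecast_len = forecast_len
            self.encoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.decoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, input_size)

        def forward(self, x):
            # x: (batch, window_range, input_size)
            _, (h, c) = self.encoder(x)        # summarize the observed window
            dec_in = x[:, -1:, :]              # start from the last observed value
            outputs = []
            for _ in range(self.forecast_len):
                dec_out, (h, c) = self.decoder(dec_in, (h, c))
                step = self.fc(dec_out)        # (batch, 1, input_size)
                outputs.append(step)
                dec_in = step                  # feed the prediction back in
            return torch.cat(outputs, dim=1)   # (batch, forecast_len, input_size)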