I am training an LSTM model on data from yfinance. The process is fairly standard: I download the data with yf.download(tickers=ticker) where ticker='AAPL', and smooth it with df.rolling(30, min_periods=1).mean(). Then I adapt the data for training like this:
def create_ds_for_forecasting(df, window_range):
    df_values = df.copy()
    X, y = [], []
    # pair each window of length window_range with the same window shifted one step ahead
    for i in np.arange(0, len(df_values) - window_range - 1):
        X.append(df_values[i:i + window_range])
        y.append(df_values[i + 1:i + window_range + 1])
    return torch.Tensor(np.array(X)).to(device), torch.Tensor(np.array(y)).to(device)
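For reference, this is roughly how the data is prepared and how that function is called. The window length, the MinMaxScaler and the 80/20 chronological split are placeholders on my part, but the scaler corresponds to the scaler.inverse_transform call further down:

    import numpy as np
    import torch
    import yfinance as yf
    from sklearn.preprocessing import MinMaxScaler

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    window_range = 10  # placeholder window length

    # download and smooth, as described above
    df = yf.download(tickers='AAPL')[['Close']]
    df = df.rolling(30, min_periods=1).mean()

    # scale to [0, 1]; this is the scaler whose inverse_transform I call at the end
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(df.values)          # shape (n_days, 1)

    # chronological train/test split (the 80/20 ratio is just an example)
    split = int(len(scaled) * 0.8)
    df_train, df_test = scaled[:split], scaled[split:]

    X_train, y_train = create_ds_for_forecasting(df_train, window_range)
    X_test, y_test = create_ds_for_forecasting(df_test, window_range)
    # X_*: (n_windows, window_range, 1); y_*: same shape, shifted one step ahead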
Next, I train the following model using nn.SmoothL1Loss as criterion and Adam as optimizer.
from torch import nn

class ModeloLSTM(nn.Module):
    def __init__(self, num_layers, hidden_size, input_size, batch_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.lstm = nn.LSTM(
            input_size=self.input_size,
            num_layers=self.num_layers,
            hidden_size=self.hidden_size,
            batch_first=True
        ).to(device)
        self.fc = nn.Linear(hidden_size, 1).to(device)
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Dynamically initialize the hidden state per batch
        if self.batch_size != 0:
            # batched input: x is (batch, seq_len, features)
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        else:
            # unbatched input: x is (seq_len, features)
            h0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
            c0 = torch.zeros(self.num_layers, self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out)  # one prediction per timestep; the last one is out[:, -1, :]
        out = self.tanh(out)
        return out
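The model, criterion and optimizer that the loops below refer to are set up like this (the hyper-parameters match what I use later; the learning rate is just a placeholder):

    modelo = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=64)
    criterion = nn.SmoothL1Loss()
    optimizer = torch.optim.Adam(modelo.parameters(), lr=1e-3)  # lr is a placeholder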
Everything looks normal, and these are the train + test results.
In case you are wondering whether I also trained on the test data: I didn't. These are the train and test loops.
## TRAIN LOOP
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(X_train, y_train), shuffle=True, batch_size=64, drop_last=True)
num_epochs = 5
for epoch in range(num_epochs):
    for inputs, label in loader:
        outputs = modelo(inputs)
        loss = criterion(outputs, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
## TEST LOOP
y_pred = []
loader = DataLoader(X_test, batch_size=batch_size)
with torch.no_grad():
    for x_batch in loader:
        # keep only the prediction for the last timestep of each window
        y_pred_i = modelo(x_batch)[:, -1, :]
        y_pred.append(y_pred_i)
y_pred = torch.cat(y_pred, dim=0)
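To compare against the ground truth, I take the last target step of each test window; something along these lines:

    # y_pred: (n_test_windows, 1); compare with the last step of each target window
    y_true = y_test[:, -1, :]
    test_loss = criterion(y_pred, y_true)
    print(f'Test SmoothL1 loss: {test_loss.item():.6f}')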
Now, here comes the issue. I save the model weights and load them into a new, unbatched instance of the original model, where h0 and c0 have shape (num_layers, hidden_size), using model.load_state_dict(modelo.state_dict()), with model built with batch_size equal to zero. Then I use this loop to make predictions for the future.
days_to_simulate = 3 * 3 * window_range  # 3 months
input_data = df_test[-window_range:]
input_data = torch.Tensor(input_data).to(device)

model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
model.load_state_dict(modelo.state_dict())
model.eval()

with torch.no_grad():
    # seed the rollout with the prediction for the last observed window
    seq_prediction = model(input_data)[-1, :].unsqueeze(-1).to(device)
    for i in range(0, days_to_simulate):
        if i < window_range:
            # the window still mixes real data with predictions
            input_data = torch.cat((input_data[-window_range + i:, :], seq_prediction), dim=0)
        else:
            # from here on, the window consists entirely of the model's own predictions
            input_data = seq_prediction[-window_range:]
        next_pred = model(input_data)[-1, :].unsqueeze(-1).to(device)
        seq_prediction = torch.cat((seq_prediction, next_pred), dim=0)
import pandas as pd
import matplotlib.pyplot as plt

starting_dates = pd.date_range(start=df.index[-window_range], periods=window_range)
predicted_dates = pd.date_range(start=df.index[-1], periods=days_to_simulate + 1)
starting_series = pd.Series(df[-window_range:].values.flatten(), index=starting_dates)
predicted_series = pd.Series(scaler.inverse_transform(seq_prediction.detach().cpu().numpy()).flatten(), index=predicted_dates)

plt.figure(figsize=(12, 6))
plt.plot(starting_series.index, starting_series.values, linestyle='-', label='Real data')
plt.plot(predicted_series.index, predicted_series.values, linestyle='-', label='Prediction')
plt.title('Stock price prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
However, for some reason, this new unbatched model with the exact same weights converges to a stationary value, as shown in the resulting plot.
In the final results, the predicted values (in orange) always seem to converge to a fixed value. Why is this? Batching should not affect the performance of a model like this. By the way, the reason I go through this process is to deploy the model in a custom personal app: I download the weights, instantiate a new model class and finally reload the original model state into the new instance.
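Concretely, the save/reload step looks roughly like this (the file name is just an example):

    # on the training side
    torch.save(modelo.state_dict(), 'lstm_weights.pt')

    # in the app: rebuild the same architecture, then load the downloaded weights
    model = ModeloLSTM(num_layers=1, hidden_size=50, input_size=1, batch_size=0)
    model.load_state_dict(torch.load('lstm_weights.pt', map_location=device))
    model.eval()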
I realized, maybe a bit too late, that I am using x.size(0) to define the batch dimension of h0 and c0, which means the batch size is inferred from the input, so feeding the network data with a different batch size would still work. In other words, I could keep the batched code path and simply feed data with shape (1, window_range, features), as in the sketch below.
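A minimal example of what I mean, reusing modelo and df_test from above:

    # feed the most recent window as a batch of size one: shape (1, window_range, 1)
    last_window = torch.Tensor(df_test[-window_range:]).unsqueeze(0).to(device)
    with torch.no_grad():
        next_step = modelo(last_window)[:, -1, :]   # shape (1, 1)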
However, that doesn't explain why the output stabilizes at a constant value. It seems to me that this might be related to the fact that I am making future predictions from the model's own predictions. That changes the question: how can I prevent an LSTM from settling onto a single constant value when it is fed its own predictions?
After doing a lot of research, I realized that the issue has to do with the use of an LSTM in the first place.
LSTMs and RNNs are criticized for being bad precisely at forecasting future values of a sequence; they are more often used for predicting intermediate values, as in speech recognition or sentiment analysis.
Further research showed me that, for forecasting, it is recommended to use Seq2Seq models such as an LSTM encoder-decoder, or attention-based models that don't rely on step-by-step autoregression.
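For reference, the kind of LSTM encoder-decoder I mean looks roughly like this; it is only a sketch (forecast_len and the greedy decoding are my assumptions), not the model I trained. The decoder still feeds its own output back in, but the hidden state is carried across steps and the whole horizon is produced, and trained on, in one forward pass, unlike my rollout loop above:

    class Seq2SeqLSTM(nn.Module):
        def __init__(self, input_size=1, hidden_size=50, num_layers=1, forecast_len=30):
            super().__init__()
            self.forecast_len = forecast_len
            self.encoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.decoder = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, input_size)

        def forward(self, x):
            # x: (batch, window_range, input_size)
            _, (h, c) = self.encoder(x)        # summarize the observed window
            dec_in = x[:, -1:, :]              # start from the last observed value
            outputs = []
            for _ in range(self.forecast_len):
                dec_out, (h, c) = self.decoder(dec_in, (h, c))
                step = self.fc(dec_out)        # (batch, 1, input_size)
                outputs.append(step)
                dec_in = step                  # feed the prediction back in
            return torch.cat(outputs, dim=1)   # (batch, forecast_len, input_size)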