python pytorch u8darts

Multiple-series training input gives NaN loss, while the same data as a single-series input does not


I want to train an N-BEATS time series model using Darts. I have a time series DataFrame for each user, so I want to use multiple-series training, but when I feed the list of TimeSeries to the model I immediately get NaN losses during training. If I concatenate all users' TimeSeries into one, I get a normal loss. In both cases the data is scaled, filled, and cast to float32:

data = scaler.transform(filler.transform(data)).astype(np.float32)
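One way to narrow this down is to check every series on its own, since a single bad series in the list is enough to turn the loss into NaN. The sketch below is a minimal check, not part of the original code: it works on plain NumPy arrays, `find_problem_series` is a hypothetical helper name, and the example data is made up. With Darts, the arrays could presumably be obtained with `[ts.values() for ts in list_of_target_ts]`.

```python
import numpy as np

def find_problem_series(arrays, min_length):
    """Return (index, reason) pairs for arrays that could break training."""
    problems = []
    for i, arr in enumerate(arrays):
        arr = np.asarray(arr, dtype=np.float32)
        if len(arr) < min_length:
            # Too short to hold one input window plus one output window
            problems.append((i, "too short"))
        elif not np.isfinite(arr).all():
            # NaN or inf values survived scaling and filling
            problems.append((i, "contains NaN or inf"))
    return problems

# Made-up stand-in data, one array per user
example = [
    np.array([0.1, 0.2, 0.3, 0.4, 0.5]),
    np.array([0.1, np.nan, 0.3, 0.4, 0.5]),
    np.array([0.1, 0.2]),
]

# min_length = input_chunk_length + output_chunk_length = 4 + 1
problems = find_problem_series(example, min_length=4 + 1)
print(problems)
```

The same check can be run on the covariate series, since a NaN in a past covariate affects the loss just as a NaN in the target does.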

Here is the code that I use to combine the list of TimeSeries into a single TimeSeries. I also have a pure Darts version of it, but it is much slower for the same result.

SPLIT = 0.8

if concatenate_to_one_ts:
    all_dfs = []
    all_dfs_cov = []

    for i in range(len(list_of_target_ts)):
        all_dfs.append(list_of_target_ts[i].pd_series())
        all_dfs_cov.append(list_of_cov_ts[i].pd_dataframe())
        
    all_dfs = pd.concat(all_dfs)
    all_dfs_cov = pd.concat(all_dfs_cov)
    
    nbr_train_sample = int(len(all_dfs) * SPLIT)

    all_dfs_train = all_dfs[:nbr_train_sample]
    all_dfs_test = all_dfs[nbr_train_sample:]
    
    list_of_target_ts_train = TimeSeries.from_series(all_dfs_train.reset_index(drop=True))
    list_of_target_ts_test = TimeSeries.from_series(all_dfs_test.reset_index(drop=True))
    
    all_dfs_cov_train = all_dfs_cov[:nbr_train_sample]
    all_dfs_cov_test = all_dfs_cov[nbr_train_sample:]
    
    list_of_cov_ts_train = TimeSeries.from_dataframe(all_dfs_cov_train.reset_index(drop=True))
    list_of_cov_ts_test = TimeSeries.from_dataframe(all_dfs_cov_test.reset_index(drop=True))
else:
    nbr_train_sample = int(len(list_of_target_ts) * SPLIT)
    list_of_target_ts_train = list_of_target_ts[:nbr_train_sample]
    list_of_target_ts_test = list_of_target_ts[nbr_train_sample:]

    list_of_cov_ts_train = list_of_cov_ts[:nbr_train_sample]
    list_of_cov_ts_test = list_of_cov_ts[nbr_train_sample:]

model = NBEATSModel(input_chunk_length=4,
                    output_chunk_length=1,
                    batch_size=512,
                    n_epochs=5,
                    nr_epochs_val_period=1, 
                    model_name="NBEATS_test",
                    generic_architecture=True,
                    force_reset=True,
                    save_checkpoints=True,
                    show_warnings=True,
                    log_tensorboard=True, 
                    torch_device_str='cuda:0'
                   )

model.fit(series=list_of_target_ts_train,
          past_covariates=list_of_cov_ts_train,
          val_series=list_of_target_ts_test,
          val_past_covariates=list_of_cov_ts_test,
          verbose=True,
          num_loader_workers=20)

With multiple-series training I get: Epoch 0: 8%|██████████▉ | 2250/27807 [03:00<34:11, 12.46it/s, loss=nan, v_num=logs, train_loss=nan.0

With single-series training I get: Epoch 0: 24%|█████████████████████████▋ | 669/2783 [01:04<03:24, 10.33it/s, loss=0.00758, v_num=logs, train_loss=0.00875]

I am also confused by the number of samples per epoch at the same batch size. From what I read here: https://unit8.com/resources/training-forecasting-models/ the single series should yield more samples, since the window cut-off does not happen at the boundaries of each individual series.
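The expectation above can be made concrete with a small sliding-window count. This is a sketch under assumptions: the series lengths are made up, `n_windows` is a hypothetical helper, and the last figure assumes that Darts' default sequential training dataset sizes itself as the number of series times the sample count of the longest series, which is worth verifying against the Darts version in use.

```python
def n_windows(series_length, input_chunk_length, output_chunk_length):
    """Number of (input, output) windows that fit in one series."""
    window = input_chunk_length + output_chunk_length
    return max(series_length - window + 1, 0)

INPUT_CHUNK = 4
OUTPUT_CHUNK = 1

# Made-up lengths, one per user
lengths = [100, 50, 10]

# Multiple series: windows cannot cross a series boundary
per_series = [n_windows(n, INPUT_CHUNK, OUTPUT_CHUNK) for n in lengths]
multi_total = sum(per_series)

# Single concatenated series: windows may span two users
single_total = n_windows(sum(lengths), INPUT_CHUNK, OUTPUT_CHUNK)

# Assumed Darts behaviour: dataset length is
# (number of series) x (samples of the longest series)
assumed_dataset_length = len(lengths) * max(per_series)

print(per_series, multi_total, single_total, assumed_dataset_length)
```

By pure window counting, the concatenated series gains only `(number of series - 1) * (window - 1)` extra samples, so a roughly tenfold gap in iterations per epoch is not explained by the window cut alone. If the assumption about the dataset length holds, series of very unequal length would inflate the multiple-series count. Note also that the two branches split differently: the concatenated branch takes the first 80% of time steps, while the list branch takes the first 80% of users.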


Solution