I currently have a dataset with multiple features, where each row is a time-series and each column is a time step. For example:
How should I re-shape the data so that I can properly represent the sequential information when I use a pytorch LSTM?
Currently I’ve left it the way it is, transformed the features into tensors, wrapped them in a Variable, and reshaped them using this code:
X_train_tensors = Variable(torch.Tensor(X_train), requires_grad=True)
X_test_tensors = Variable(torch.Tensor(X_test), requires_grad=True)
y_train_tensors = Variable(torch.Tensor(y_train), requires_grad=True)
y_test_tensors = Variable(torch.Tensor(y_test))
The final shape looks like:
torch.Size([num_rows, 1, num_features])
The LSTM runs fine; however, I’m worried that I haven’t captured the sequential nature of the dataset by keeping it in this orientation. Should I have made every row a time sequence and the columns time steps? In that case, what would the final shape look like, and how could I transform the data using PyTorch tools?
There's no point using an LSTM with your current configuration. LSTMs are useful for processing variable-length sequences. If the number of features is fixed and your tensors are all of size (num_rows, 1, num_features), you can squeeze that to (num_rows, num_features) and put them through an MLP.
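As a minimal sketch of that suggestion (the sizes and hidden dimension here are placeholders, not values from the question):

```python
import torch
import torch.nn as nn

# Hypothetical sizes standing in for the real dataset dimensions
num_rows, num_features = 32, 10

X = torch.randn(num_rows, 1, num_features)  # the shape from the question
X = X.squeeze(1)                            # -> (num_rows, num_features)

# A small MLP over the fixed-size feature vector
mlp = nn.Sequential(
    nn.Linear(num_features, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
out = mlp(X)  # -> (num_rows, 1)
```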
If you want to use an LSTM-type approach, you would do something like this:

1. Start with tensors of size (num_rows, num_features), where all the features are integer values (I'm inferring this from your spreadsheet example).
2. Pass them through an nn.Embedding layer to get tensors of size (num_rows, num_features, d_features).
3. Put the tensors of size (num_rows, num_features, d_features) through the LSTM.

That said, if the number of features for your input is fixed, there's no need to use an LSTM. LSTMs are used when you have to process variable-length sequences.
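The embedding-then-LSTM recipe above can be sketched like this; the vocabulary size, embedding dimension, and hidden size are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: number of distinct integer feature values,
# embedding dimension, and LSTM hidden size
num_rows, num_features = 32, 10
num_values, d_features, hidden = 100, 16, 32

# Step 1: integer-valued features of shape (num_rows, num_features)
X = torch.randint(0, num_values, (num_rows, num_features))

# Step 2: embed each integer -> (num_rows, num_features, d_features)
emb = nn.Embedding(num_values, d_features)
X_emb = emb(X)

# Step 3: run through the LSTM; batch_first=True treats the
# num_features axis as the time axis
lstm = nn.LSTM(input_size=d_features, hidden_size=hidden, batch_first=True)
output, (h_n, c_n) = lstm(X_emb)  # output: (num_rows, num_features, hidden)
```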
As an aside, it looks like you're using the Variable syntax, which was deprecated several years ago; you should check out the current documentation for PyTorch.
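For example, in current PyTorch you create tensors directly and request gradients on the tensor itself (the array below is a dummy stand-in for X_train):

```python
import numpy as np
import torch

# Dummy data standing in for the question's X_train
X_train = np.random.rand(8, 1, 10).astype(np.float32)

# Modern API: torch.tensor instead of Variable(torch.Tensor(...));
# requires_grad is set directly on the tensor. Note that inputs
# normally don't need gradients at all, only model parameters do.
X_train_tensors = torch.tensor(X_train, requires_grad=True)
```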