train_data.shape
(11458167, 10)
# Define the input shape
feature_num = 10  # number of features
timesteps = 1
# Reshape the input to shape (num_instances, timesteps, num_features)
train_data_reshaped = train_data.reshape(train_data.shape[0], timesteps, feature_num)
test_data_reshaped = test_data.reshape(test_data.shape[0], timesteps, feature_num)
I have a dataset with 10 features and I want to try different time step values, because a value of 1 will not capture the sequence in my data. However, when I change the time steps value (here to 10), I get this error:
ValueError: cannot reshape array of size 114581670 into shape (11458167,10,10)
Can you explain why this error is happening and how I can solve it?
Try different Time steps to find the optimal value
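You cannot increase timesteps while keeping both num_instances and features the same as before: the array holds 11458167 × 10 = 114581670 values, but the shape (11458167, 10, 10) would need 1145816700. The correct call is train_data.reshape(-1, timesteps, features), which lets NumPy infer the number of windows. However, this only works if the number of instances is divisible by timesteps without a remainder. With 11458167 rows and timesteps=10 it is not, so the 7 trailing rows would have to be dropped first.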
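As a minimal sketch of that reshape, assuming a 2D NumPy array of shape (num_instances, num_features): the helper name to_windows and the small stand-in array are made up for illustration and are not part of the question's code. Rows that do not fill a complete window are dropped from the end before reshaping.

```python
import numpy as np


def to_windows(data, timesteps):
    """Reshape (num_instances, num_features) into non-overlapping windows.

    Hypothetical helper for illustration: trailing rows that do not fill
    a complete window are discarded.
    """
    num_features = data.shape[1]
    # Largest row count that is a multiple of timesteps
    usable = (data.shape[0] // timesteps) * timesteps
    return data[:usable].reshape(-1, timesteps, num_features)


# Small stand-in for train_data: 1007 rows, 10 features
train_data = np.zeros((1007, 10))
windows = to_windows(train_data, timesteps=10)
print(windows.shape)  # (100, 10, 10): the last 7 rows were dropped
```

The same call can be repeated in a loop over candidate timesteps values, since each call starts again from the original 2D array.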
Furthermore, there are two types of windows you can create: non-overlapping windows, which simply reshape the data as described above, and sliding (overlapping) windows, where a window is moved over the data so that a data point can be contained in multiple windows.
However, you do not need to do this yourself. I have written a small utility library called mlnext-framework that contains such functionality. The temporalize method is a wrapper around numpy.lib.stride_tricks.sliding_window_view for generating non-overlapping (reshape) and overlapping (sliding) windows.
Given some data:
>>> import numpy as np
>>> import mlnext
>>> i, j = np.ogrid[:6, :3]
>>> data = 10 * i + j
>>> print(data)
[[ 0  1  2]
 [10 11 12]
 [20 21 22]
 [30 31 32]
 [40 41 42]
 [50 51 52]]
Non-Overlapping windows:
>>> # Transform 2d data into 3d
>>> mlnext.temporalize(data=data, timesteps=2, verbose=True)
Old shape: (6, 3). New shape: (3, 2, 3).
[[[ 0  1  2]
  [10 11 12]]

 [[20 21 22]
  [30 31 32]]

 [[40 41 42]
  [50 51 52]]]
As you can see, each data point is contained in exactly one window. If the original shape could not be evenly divided by time steps, then the superfluous data points (at the end) would be discarded.
Sliding windows:
>>> # Transform 2d into 3d with stride=1
>>> mlnext.temporalize(data, timesteps=3, stride=1, verbose=True)
Old shape: (6, 3). New shape: (4, 3, 3).
[[[ 0  1  2]
  [10 11 12]
  [20 21 22]]

 [[10 11 12]
  [20 21 22]
  [30 31 32]]

 [[20 21 22]
  [30 31 32]
  [40 41 42]]

 [[30 31 32]
  [40 41 42]
  [50 51 52]]]
As you can see, the second window starts with [10 11 12], which is the second data point overall. The step size can be configured with stride.
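If you would rather not add a dependency, the same windows can be sketched with NumPy alone. This is an illustration under assumptions, not the library's actual implementation: the helper name sliding_windows is hypothetical, and sliding_window_view requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(data, timesteps, stride=1):
    """Build windows of shape (num_windows, timesteps, num_features).

    Hypothetical helper for illustration, not the mlnext implementation.
    """
    # sliding_window_view appends the window axis last:
    # (num_windows, num_features, timesteps)
    view = sliding_window_view(data, window_shape=timesteps, axis=0)
    # Reorder to (num_windows, timesteps, num_features),
    # then keep every stride-th window
    return view.transpose(0, 2, 1)[::stride]


i, j = np.ogrid[:6, :3]
data = 10 * i + j

windows = sliding_windows(data, timesteps=3, stride=1)
print(windows.shape)  # (4, 3, 3)
print(windows[1])     # second window, starts at the second data point
```

Setting stride equal to timesteps gives the non-overlapping case. The result is a read-only view onto the original array, so it costs almost no extra memory; copying it (for example with np.array(windows)) multiplies memory use by roughly timesteps when stride=1, which matters for a dataset with millions of rows.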