I want to train an LSTM-based RNN model for binary classification, using a TensorFlow Keras model with LSTM layers. For that I need training and test input and output as well as validation input and output, which I wanted to generate with sklearn's train_test_split.
def prepare_data(self, satellites):
    """
    Prepare time-series data for RNN.
    """
    feature_sequences = []
    labels = []
    for sat in satellites:
        if sat.manoeuvrability is not None:
            # Stack the orbital parameters as time-series features
            # (epochs will be the time dimension)
            features = np.column_stack((
                sat.apoapses,
                sat.periapses,
                sat.inclinations,
                sat.mean_motions,
                sat.eccentricities,
                sat.semimajor_axes,
                sat.orbital_energy
            ))
            feature_sequences.append(features)
            labels.append(sat.manoeuvrability)
    X = np.array(feature_sequences, dtype=object)
    y = np.array(labels)
    return train_test_split(X, y, test_size=0.2, random_state=42)
train_test_split returns None for me. Removing the dtype=object cast from the np.array call instead leads to a
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (73,) + inhomogeneous part.
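I assume this happens because my satellites have different numbers of epochs, so the stacked feature arrays differ in length along the time axis. A minimal reproduction of the error with made-up shapes:

import numpy as np

a = np.ones((5, 7))   # 5 epochs, 7 orbital features
b = np.ones((8, 7))   # 8 epochs, 7 orbital features
X = np.array([a, b])  # raises the same "inhomogeneous shape" ValueError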
How do I properly form my feature vector for sklearn's train_test_split if I want to pass the literal time series as arguments? The time dependence is important in my case, so I really can't work around it by manually collapsing each time series to its average or something similar.
I simplified your code to this:
import numpy as np
from sklearn.model_selection import train_test_split

def prepare_data():
    feature_sequences = []
    labels = []
    for i in range(10):
        features = np.column_stack((2*i*5, "hello"))  # wrong?
        # features = (2*i*5, "hello")  # correct
        feature_sequences.append(features)
        labels.append(i)
    X = np.array(feature_sequences, dtype=object)
    y = np.array(labels)
    return train_test_split(X, y, test_size=0.2, random_state=42)
The returned split for the features is a 3D array, which it shouldn't be. Replace the column_stack line with the commented-out tuple and the resulting split looks better.
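In your original case the per-satellite sequences additionally have different numbers of epochs, so even a correct stacking cannot form a regular array. One common way to keep the full time series is to pad the ragged sequences to a common length before splitting, so NumPy can build a regular 3-D array that both train_test_split and an LSTM layer accept. A minimal sketch with made-up data (73 satellites, 7 orbital features, variable epoch counts); the shapes, the random data, and the padding strategy are assumptions, not your actual pipeline:

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical ragged data: one (epochs, features) array per satellite,
# with a different number of epochs per satellite.
rng = np.random.default_rng(42)
feature_sequences = [rng.normal(size=(rng.integers(50, 120), 7)) for _ in range(73)]
labels = rng.integers(0, 2, size=73)

# Pad along the time axis so every sample has the same length; the result
# is a regular float array of shape (samples, max_epochs, features).
X = pad_sequences(feature_sequences, dtype="float32", padding="post", value=0.0)
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# X_train.shape -> (58, max_epochs, 7), directly usable as LSTM input;
# a Masking layer can tell the LSTM to ignore the zero-padded steps.

If the epoch counts vary a lot, padding everything to the longest sequence can waste memory; bucketing by length or truncating via the maxlen argument are possible alternatives.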