Tags: python, tensorflow, regression, data-generation

How to create a dataset for multi-output regression with a sliding-window approach


I want to build a plain DNN model. My data set is large: X_train is 8000000x7 and y_train is 8000000x2. How can I create a dataset with a sliding window of 100 data points to feed the neural network?

If I build the dataset myself with the following code, I run into a memory allocation error because the dataset is so large.

import numpy as np

def data_set(x_data, y_data, num_steps=160):
    X, y = list(), list()
    # Loop over the entire data set
    for i in range(x_data.shape[0]):
        # Compute the end index of the current sliding window
        end_ix = i + num_steps
        # Stop once the target index would fall outside the data set
        if end_ix >= x_data.shape[0]:
            break
        # Get a sequence of data for x
        seq_X = x_data[i:end_ix]
        # The target for y is the row right after the window
        seq_y = y_data[end_ix]
        # Append the sequences to the lists
        X.append(seq_X)
        y.append(seq_y)
    # Make final arrays (this materializes every window in memory)
    x_array = np.array(X)
    y_array = np.array(y)
    return x_array, y_array

To avoid this, is there a dataset generator with a sliding window that I can use to feed the DNN?

Thanks in advance


Solution

  • You can use the tf.data.Dataset.window method to achieve that.

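    import tensorflow as tf

    dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
    window_size = 100  # rows per window, as asked in the question
    stride = 1         # rows between the starts of consecutive windows
    dataset = dataset.window(window_size, shift=stride, drop_remainder=True)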
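Note that window on its own yields nested datasets rather than tensors, so each window still has to be flattened before it can be fed to a model. Below is a minimal sketch of one way to do that, assuming the same target convention as the loop in the question (the y row right after each window). The helper name make_window_dataset and its parameters window_size, shift and batch_size are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import tensorflow as tf

def make_window_dataset(x_data, y_data, window_size=100, shift=1, batch_size=32):
    """Lazily pair each window of x with the y row that follows it.

    Hypothetical helper: the name and the default values are assumptions.
    """
    # Drop the last x row so every window has a following y row.
    x_ds = tf.data.Dataset.from_tensor_slices(x_data[:-1])
    # window() yields one small dataset per window ...
    x_ds = x_ds.window(window_size, shift=shift, drop_remainder=True)
    # ... so batch each of them into a (window_size, n_features) tensor.
    x_ds = x_ds.flat_map(lambda w: w.batch(window_size, drop_remainder=True))
    # Target for the window starting at row i is y_data[i + window_size].
    y_ds = tf.data.Dataset.from_tensor_slices(y_data[window_size::shift])
    ds = tf.data.Dataset.zip((x_ds, y_ds))
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The windows are built on the fly as the dataset is iterated, so only the original arrays plus a few batches are held in memory. If this fits your setup, the result can be passed straight to training, for example model.fit(make_window_dataset(X_train, y_train), epochs=...), where model is your own Keras model.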