python, keras, conv-neural-network, mini-batch

Custom CNN mini-batch (Keras, TF) to avoid repeated measurements in training/testing


I am currently building a 1D-CNN for classification. The predictors are spectra (X-matrix with 779 features), and the dependent variable contains two classes.

However, the X-matrix contains repeated measurements (series of 15-20 replicates). It is crucial that replicates of the same measurement do not end up both in the set used for training and in the set used to evaluate the loss. Is there a way to build "custom" mini-batches that would avoid this?


Solution

  • You should try using data generators.

    A DataGenerator is an object that takes the X_train and y_train matrices as input and puts the samples into batches according to some criterion. It can also be used to handle volumes of data that are too large to load into memory at once.

    Here is an example of how to implement one.

    Basically, __getitem__ returns the next batch, so that's the place to implement all the conditions you might need.

    import numpy as np
    import keras
    
    class DataGenerator(keras.utils.Sequence):
        'Generates data for Keras'
        def __init__(self, X, labels, batch_size=32, dim=(32,32,32), n_channels=1,
                     n_classes=10, shuffle=True):
            'Initialization'
            self.dim = dim
            self.batch_size = batch_size
            self.labels = labels
            self.X = X
            self.n_channels = n_channels
            self.n_classes = n_classes
            self.shuffle = shuffle
            self.on_epoch_end()
    
        def __len__(self):
            'Denotes the number of batches per epoch'
            return int(np.floor(len(self.X) / self.batch_size))
    
        def __getitem__(self, index):
            'Generate one batch of data'
            # Generate the indexes of the batch; this is the place to add any
            # custom conditions, e.g. drawing whole replicate series per batch
            list_IDs_temp = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
    
            # Generate data
            X, y = self.__data_generation(list_IDs_temp)
    
            return X, y
    
        def on_epoch_end(self):
            'Updates indexes after each epoch'
            self.indexes = np.arange(len(self.X))
            if self.shuffle:
                np.random.shuffle(self.indexes)
    
        def __data_generation(self, list_IDs_temp):
            'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
            # Initialization
            X = np.empty((self.batch_size, *self.dim, self.n_channels))
            y = np.empty((self.batch_size), dtype=int)
    
            # Generate data
            for i, ID in enumerate(list_IDs_temp):
                # Store sample
                X[i,] = self.X[ID,]
    
                # Store class
                y[i] = self.labels[ID]
    
            return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
    

    Source: This
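
    To tie this back to the question: the replicate constraint is easiest to enforce when the row indices are split into the training and validation sets that feed two separate generators, so that each series of 15-20 replicates lands entirely on one side of the split. Below is a minimal sketch of one way to do that with scikit-learn's GroupShuffleSplit, re-using the DataGenerator class above; the synthetic spectra, the groups array (one replicate-series label per row) and the small 1D-CNN are hypothetical stand-ins, not something prescribed by the original answer.

    import numpy as np
    import keras
    from sklearn.model_selection import GroupShuffleSplit

    # Synthetic stand-ins shaped like the data in the question:
    # 300 spectra with 779 features, in series of 15 replicates, two classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 779, 1))      # channel axis added up front
    groups = np.repeat(np.arange(20), 15)   # replicate-series label per row
    y = (groups % 2).astype(int)            # dummy binary class labels

    # Keep every replicate series entirely on one side of the split.
    gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    train_idx, val_idx = next(gss.split(X, y, groups=groups))

    train_gen = DataGenerator(X[train_idx], y[train_idx], batch_size=32,
                              dim=(779,), n_channels=1, n_classes=2)
    val_gen = DataGenerator(X[val_idx], y[val_idx], batch_size=32,
                            dim=(779,), n_channels=1, n_classes=2)

    # Minimal 1D-CNN just to show the generators in use.
    model = keras.models.Sequential([
        keras.layers.Conv1D(16, 7, activation='relu', input_shape=(779, 1)),
        keras.layers.GlobalMaxPooling1D(),
        keras.layers.Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Recent Keras/tf.keras accepts a Sequence directly in fit();
    # older versions use model.fit_generator() instead.
    model.fit(train_gen, validation_data=val_gen, epochs=5)

    With the split handled up front, the generator itself only has to decide how batches are drawn within each set, for example drawing whole replicate series per batch inside __getitem__.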