Tags: performance, tensorflow, adpcm

Predictive Coder in TensorFlow? Can it be done efficiently?


For a compression application I am trying to implement the simplest form of differential pulse-code modulation (DPCM). If you do not know DPCM: it is just a differential encoding scheme in which you quantize the prediction error of a predictor and send the quantized prediction error to the decoder, which can invert the process. So basically, in the simplest case, you compute

    e(n) = x(n) - xhat(n-1)

with xhat(n) being the reconstructed sample. You then quantize e(n) and reconstruct the sample according to

    xhat(n) = xhat(n-1) + Q(e(n))

where Q denotes the quantizer.
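To make the scheme concrete, here is a minimal NumPy sketch of this scalar DPCM loop; the uniform midtread quantizer with step size step is just a stand-in for whatever Q you actually use:

import numpy as np

def dpcm_encode_decode(x, step=0.1):
    # Scalar DPCM: quantize the prediction error and track the
    # reconstruction xhat(n) that the decoder would also compute.
    xhat = np.zeros_like(x, dtype=np.float32)
    last = 0.0                            # initial state xhat(-1) = 0
    for n in range(len(x)):
        e = x[n] - last                   # e(n) = x(n) - xhat(n-1)
        e_q = step * np.round(e / step)   # Q(e(n)), uniform stand-in quantizer
        xhat[n] = last + e_q              # xhat(n) = xhat(n-1) + Q(e(n))
        last = xhat[n]
    return xhat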

I implemented this in TensorFlow; however, the resulting code is extremely slow because of the for loop, which I believe is necessary and which prevents vectorization (I am not sure vectorization is even possible here). My current code is

import numpy as np
import tensorflow as tf


class DPCM(tf.keras.Model):
    def __init__(self, **kwargs):
        super(DPCM, self).__init__(**kwargs)
        self.quantizer = None

    def quantize(self, x):
        # Map each prediction-error vector to its nearest codebook entry;
        # self.quantizer is a fitted scikit-learn KMeans instance.
        x_np = x.numpy().astype(np.float32)
        x_np_q = self.quantizer.cluster_centers_[self.quantizer.predict(x_np), :]
        return x_np_q

    def SetQuantizer(self, quantizer, bypass=False):
        self.quantizer = quantizer

    # @tf.function
    def call(self, inputs):
        if self.quantizer is not None:
            reconstructed = tf.TensorArray(tf.float32, size=tf.shape(inputs)[1],
                                           dynamic_size=True)

            # Initial predictor state: xhat(-1) = 0
            last_sample = tf.zeros(shape=(tf.shape(inputs)[0], 1, tf.shape(inputs)[2]))
            for i in range(tf.shape(inputs)[1]):
                # e(n) = x(n) - xhat(n-1)
                pred_error = inputs[:, i, :] - last_sample
                # Q(e(n)); tf.py_function drops back to eager/NumPy for the codebook lookup
                pred_error_q = tf.py_function(self.quantize, [pred_error[:, 0, :]], tf.float32)
                pred_error_q = tf.expand_dims(pred_error_q, axis=0)

                # xhat(n) = xhat(n-1) + Q(e(n))
                reconstructed = reconstructed.write(i, pred_error_q + last_sample)
                last_sample = reconstructed.read(i)

            out = tf.transpose(reconstructed.stack(), [1, 2, 0, 3])
            out = tf.squeeze(out, axis=0)
            return out

        else:
            return inputs

inputs is of shape [batchsize, 3999, 8]. The quantizer is simply the codebook of scikit-learn's KMeans after fitting it to the raw prediction errors. This code works, but it is EXTREMELY slow. Is it possible to speed it up somehow? Recurrent neural networks are apparently implemented in TensorFlow without a problem, so I assume it must be possible to do this much faster.
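For reference, the quantizer is set up roughly like this; the cluster count and the training data here are placeholders for my actual prediction errors:

import numpy as np
import tensorflow as tf
from sklearn.cluster import KMeans

# Placeholder for the raw prediction errors the codebook is fit to, shape [num_samples, 8]
pred_errors = np.random.randn(10000, 8).astype(np.float32)

kmeans = KMeans(n_clusters=16).fit(pred_errors)  # 16 codewords, i.e. a 4-bit codebook

model = DPCM()
model.SetQuantizer(kmeans)
reconstructed = model(tf.random.normal([1, 3999, 8]))  # one sequence of the stated shape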


Solution

  • OK, I was suffering from tunnel vision. This might never be efficient on a GPU, since GPUs are optimized for large parallel operations rather than short sequential ones. However, after switching to the CPU using with tf.device('/cpu:0'): the speed improved drastically, to the expected level. Training is still surprisingly slow, though, for such a small model (< 10000 neurons).
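For completeness, the device pinning amounts to wrapping the call in a device scope, with model and inputs as in the question:

import tensorflow as tf

# The per-step ops in the DPCM loop are tiny, so kernel-launch overhead
# dominates on the GPU; pinning the loop to the CPU avoids that.
with tf.device('/cpu:0'):
    reconstructed = model(inputs)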