python, tensorflow, keras, nonlinear-optimization, levenberg-marquardt

Keras implementation of the Levenberg-Marquardt optimization algorithm as a custom optimizer


I am trying to implement the Levenberg-Marquardt algorithm as a Keras optimizer, as described here, but I have several problems. The biggest one is this error:

TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn.

After a quick search I found out this is connected to how TensorFlow runs programs with graphs, which I don't understand in detail. I found this answer from SO useful, but it is about a loss function, not an optimizer.
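If it helps, the failing pattern seems to boil down to something like this (my understanding: the loss that get_updates receives is a single scalar tensor, so iterating over it fails in graph mode):

from keras import backend as K

loss = K.placeholder(shape=())  # the loss arrives as one scalar tensor
for m in loss:                  # raises the TypeError quoted above
    print(m)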

So to the point.

My attempt looks like this:

from keras.optimizers import Optimizer
from keras.legacy import interfaces
from keras import backend as K

class Levenberg_Marquardt(Optimizer):
    def __init__(self, tau=1e-2, lambda_1=1e-5, lambda_2=1e+2, **kwargs):
        super(Levenberg_Marquardt, self).__init__(**kwargs)
        with K.name_scope(self.__class__.__name__):
            self.iterations = K.variable(0, dtype='int64', name='iterations')
            self.tau = K.variable(tau, name='tau')
            self.lambda_1 = K.variable(lambda_1, name='lambda_1')
            self.lambda_2 = K.variable(lambda_2, name='lambda_2')

    @interfaces.legacy_get_updates_support
    def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]
        error = [K.int_shape(m) for m in loss]  # <- loss is a single tensor; iterating it raises the TypeError quoted above
        for p, g, err in zip(params, grads, error):
            H = K.dot(g, K.transpose(g)) + self.tau * K.eye(K.max(g))
            w = p - K.pow(H, -1) * K.dot(K.transpose(g), err)  # ended at step 3 from http://mads.lanl.gov/presentations/Leif_LM_presentation_m.pdf
            if self.tau > self.lambda_2:
                w = w - 1/self.tau * err
            if self.tau < self.lambda_1:
                w = w - K.pow(H, -1) * err
            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                w = p.constraint(w)
            self.updates.append(K.update_add(err, w))
        return self.updates

    def get_config(self):
        config = {'tau': float(K.get_value(self.tau)),
                  'lambda_1': float(K.get_value(self.lambda_1)),
                  'lambda_2': float(K.get_value(self.lambda_2))}
        base_config = super(Levenberg_Marquardt, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

Q1 Can I fix this error without going deep into TensorFlow? (I wish I could do this while staying at the Keras level.)

Q2 Do I use the Keras backend in the correct way?

I mean, in this line

H = K.dot(g, K.transpose(g)) + self.tau * K.eye(K.max(g))

should I use a Keras backend function, NumPy, or pure Python, so that the code runs without problems given that the input data are NumPy arrays?
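To illustrate, this is the kind of symbolic shape check I can do with backend functions; as far as I can tell, NumPy functions do not work on these graph nodes:

from keras import backend as K

g = K.placeholder(shape=(3, 1))  # symbolic tensor, no NumPy data behind it
H = K.dot(g, K.transpose(g))     # builds a graph op of shape (3, 3)
print(K.int_shape(H))            # (3, 3)
# np.dot(g, np.transpose(g)) would fail here: g is a graph node, not an array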

Q3 This question is more about the algorithm itself.

Am I even implementing LMA correctly? I must say I am not sure how to deal with the boundary conditions; the tau/lambda values I have simply guessed. Maybe you know a better way?
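For reference, the update I am trying to reproduce (step 3 of the linked presentation, as I understand it) is

w_{k+1} = w_k - (J^T J + \tau I)^{-1} J^T (f(x; w_k) - y)

where J is the Jacobian of the residuals with respect to the weights, so my H above is meant to approximate J^T J.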

I was trying to understand how every other optimizer in Keras works, but even the SGD code looks ambiguous to me.

Q4 Do I need to change the local file optimizers.py in any way?

In order to run it properly, I was initializing my optimizer with:

myOpt = Levenberg_Marquardt()

and then simply passing it to the compile method. Yet after a quick look at the source code of optimizers.py, I found there are places in the code with explicitly written optimizer names (e.g. the deserialize function). Is it important to extend this for my custom optimizer, or can I leave it be?

I would really appreciate any help and direction for future actions.


Solution

    Q1 Can I fix this error without going deep into TensorFlow? (I wish I could do this while staying at the Keras level.)

    A1 I believe that even if this error is fixed, there are still problems in the implementation: the algorithm needs inputs that Keras does not supply. For example, the error term f(x;w_0)-y from the document is not available to a Keras optimizer.
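    To make that concrete, here is a minimal sketch of the legacy Optimizer interface (illustration only, not a working optimizer):

    from keras.optimizers import Optimizer

    class Sketch(Optimizer):
        def get_updates(self, loss, params):
            # Only the scalar loss and the weight list arrive here; the
            # residual vector f(x;w_0)-y never reaches the optimizer, so
            # the Jacobian that LM needs cannot be built from these inputs.
            grads = self.get_gradients(loss, params)
            return []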

    Q2 Do I use the Keras backend in the correct way?

    A2 Yes, you must use the Keras backend for this calculation, because g is a tensor object and not a NumPy array. However, I believe the correct calculation for H should be H = K.dot(K.transpose(g), g): treating g as a 1xN row vector, this is the outer product that produces the required NxN matrix.
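    A quick shape check of both orders (assuming, purely for illustration, that g is stored as a 1xN row vector):

    import numpy as np
    from keras import backend as K

    g = K.variable(np.ones((1, 3)))    # 1xN row vector, illustration only
    outer = K.dot(K.transpose(g), g)   # (N,1) x (1,N) -> (N,N)
    inner = K.dot(g, K.transpose(g))   # (1,N) x (N,1) -> (1,1)
    print(K.int_shape(outer), K.int_shape(inner))  # (3, 3) (1, 1)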

    Q3 This question is more about the algorithm itself.

    A3 As stated in A1, I am not sure that Keras supports the required inputs for this algorithm.

    Q4 Do I need to change the local file optimizers.py in any way?

    A4 The provided line of code will run the optimizer if it is supplied as the optimizer argument to the model's compile function in Keras. The Keras library supports calling the built-in classes and functions by name purely as a convenience.
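    For example, a minimal sketch (Levenberg_Marquardt is the class from the question; the model here is a hypothetical placeholder):

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential([Dense(1, input_dim=4)])

    # Passing the instance directly hooks in the custom optimizer;
    # no changes to optimizers.py are needed.
    model.compile(loss='mse', optimizer=Levenberg_Marquardt())

    # Only the string shortcut, e.g. optimizer='sgd', goes through the
    # optimizer names hard-coded in keras.optimizers.deserialize.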