[SOLVED] Computing Gini index in tensorflow

I'm trying to write the Gini index calculation as a TensorFlow cost function. The Gini index is defined here: https://en.wikipedia.org/wiki/Gini_coefficient

A NumPy solution would be:

import numpy as np

def ginic(actual, pred):
    n = len(actual)
    a_s = actual[np.argsort(pred)]  # actual values ordered by ascending prediction
    a_c = a_s.cumsum()
    giniSum = a_c.sum() / a_s.sum() - (n + 1) / 2.0
    return giniSum / n
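
To make the formula concrete, here is a small worked example. It repeats the function above so that it runs on its own; the arrays actual, good_pred, and bad_pred are made-up illustration values, not data from the question.

```python
import numpy as np

def ginic(actual, pred):
    n = len(actual)
    a_s = actual[np.argsort(pred)]
    a_c = a_s.cumsum()
    giniSum = a_c.sum() / a_s.sum() - (n + 1) / 2.0
    return giniSum / n

# Hypothetical toy data: two positives, two negatives.
actual = np.array([1.0, 0.0, 1.0, 0.0])
good_pred = np.array([0.9, 0.1, 0.8, 0.3])  # ranks both positives highest
bad_pred = np.array([0.1, 0.9, 0.2, 0.7])   # ranks both positives lowest

# good_pred: sorted actual = [0, 0, 1, 1], cumsum = [0, 0, 1, 2]
#   -> (3 / 2 - 2.5) / 4 = -0.25
good = ginic(actual, good_pred)
# bad_pred: sorted actual = [1, 1, 0, 0], cumsum = [1, 2, 2, 2]
#   -> (7 / 2 - 2.5) / 4 = 0.25
bad = ginic(actual, bad_pred)
print(good, bad)
```

With this sign convention, a prediction that ranks the larger actual values highest gives a negative value, and the reversed ranking gives a positive one.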

Can someone help me figure out how to do this in TensorFlow? For example, as far as I know, TensorFlow has no argsort that can be used inside a function that is differentiated.

Solution

You can perform the argsort with tf.nn.top_k(). This function returns a tuple whose second element holds the indices. Because top_k returns them in descending order, the order must be reversed to match np.argsort, which sorts in ascending order.

import tensorflow as tf

def ginicTF(actual: tf.Tensor, pred: tf.Tensor):
    n = int(actual.get_shape()[-1])  # static length; the shape must be known
    # top_k returns indices in descending order; reversing them matches np.argsort
    inds = tf.reverse(tf.nn.top_k(pred, n)[1], axis=[0])
    a_s = tf.gather(actual, inds)  # equivalent of NumPy integer-array indexing
    a_c = tf.cumsum(a_s)
    giniSum = tf.reduce_sum(a_c) / tf.reduce_sum(a_s) - (n + 1) / 2.0
    return giniSum / n
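
The index trick described above can be checked in isolation. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution (the rest of this answer uses the 1.x session API); the pred values are hypothetical and chosen to have no ties.

```python
import numpy as np
import tensorflow as tf

# Hypothetical predictions with distinct values (no ties).
pred = np.array([0.9, 0.1, 0.8, 0.3], dtype=np.float32)
n = pred.shape[0]

# top_k with k = n sorts the whole vector, largest first.
desc = tf.nn.top_k(pred, k=n).indices
# Reversing along axis 0 turns that into ascending order.
asc = tf.reverse(desc, axis=[0])

tf_inds = asc.numpy()
np_inds = np.argsort(pred)
print(tf_inds, np_inds)
```

Both index arrays come out as [1, 3, 2, 0] here. With tied predictions the two orders may differ, since top_k places the lower index first among equal elements and the reversal then puts it last.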

Here is code you can use to verify that ginicTF returns the same numerical value as your NumPy function ginic:

# TensorFlow 1.x API (sessions and placeholders)
sess = tf.InteractiveSession()
ac = tf.placeholder(shape=(50,), dtype=tf.float32)
pr = tf.placeholder(shape=(50,), dtype=tf.float32)
actual = np.random.normal(size=(50,))
pred = np.random.normal(size=(50,))
print('numpy version: {:.4f}'.format(ginic(actual, pred)))
print('tensorflow version: {:.4f}'.format(ginicTF(ac, pr).eval(feed_dict={ac: actual, pr: pred})))
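
The code above targets the TensorFlow 1.x session API. For readers on TensorFlow 2.x, a rough equivalent might look like the sketch below. It assumes eager execution and uses tf.argsort in place of the top_k trick; the name gini_tf2 and the toy arrays are illustrative and not part of the original answer.

```python
import numpy as np
import tensorflow as tf

def gini_tf2(actual, pred):
    """Hypothetical TF 2.x variant of ginicTF using tf.argsort."""
    actual = tf.convert_to_tensor(actual, dtype=tf.float32)
    pred = tf.convert_to_tensor(pred, dtype=tf.float32)
    n = tf.cast(tf.size(actual), tf.float32)  # length as a float tensor
    inds = tf.argsort(pred)                   # ascending, like np.argsort
    a_s = tf.gather(actual, inds)
    a_c = tf.cumsum(a_s)
    gini_sum = tf.reduce_sum(a_c) / tf.reduce_sum(a_s) - (n + 1.0) / 2.0
    return gini_sum / n

# Hypothetical toy data; the hand-computed value for these inputs is -0.25.
actual = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)
pred = np.array([0.9, 0.1, 0.8, 0.3], dtype=np.float32)
result = float(gini_tf2(actual, pred))
print('tensorflow 2.x version: {:.4f}'.format(result))
```

Because the length is read with tf.size at run time, this variant does not need a statically known shape, unlike the int(actual.get_shape()[-1]) call in ginicTF.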