python tensorflow vectorization xor bitwise-xor

Vectorized XOR on tensors with float values in TensorFlow

I have two tensors t1 and t2 of the same shape (in my case [64, 64, 3]). I need to compute the XOR of these two tensors, but I couldn't figure out a way to do so.

import tensorflow as tf
from bitstring import BitArray

@tf.function
def xor(x1, x2):
  # view each float as its 64-bit IEEE 754 bit pattern
  a = BitArray(float=x1, length=64)
  b = BitArray(float=x2, length=64)
  a ^= b  # XOR the two bit patterns
  return a.float

This xor function computes the XOR of two float values in Python.
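
For reference, the same scalar operation can be sketched with only the standard library, which makes the bit-level reinterpretation explicit. This is a minimal sketch rather than the code from the question: the helper name xor_float64 is hypothetical, and it assumes 64-bit IEEE 754 doubles, matching length=64 above.

```python
import struct


def xor_float64(x: float, y: float) -> float:
    """XOR the 64-bit IEEE 754 bit patterns of two Python floats."""
    # Pack each float into 8 bytes, then read those bytes back
    # as an unsigned 64-bit integer.
    (xi,) = struct.unpack("<Q", struct.pack("<d", x))
    (yi,) = struct.unpack("<Q", struct.pack("<d", y))
    # XOR the integers and reinterpret the result as a float again.
    (result,) = struct.unpack("<d", struct.pack("<Q", xi ^ yi))
    return result


c = xor_float64(1.1, 7.7)
# XOR is its own inverse, so applying it again with the same
# operand recovers the original input exactly.
print(xor_float64(c, 7.7) == 1.1)
```

The result of such an XOR is rarely a meaningful number on its own; what matters is that the operation is exactly reversible.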

Sample input tensors:

t1 = tf.constant([[1.1, 2.2, 3.3],
                  [4.4, 5.5, 6.6]], dtype=tf.float64)
t2 = tf.constant([[7.7, 8.8, 9.9],
                  [10.1, 11.11, 12.12]], dtype=tf.float64)

I can't seem to find a way to compute the XOR of two tensors.

How can I write a vectorized version of the xor function that computes the XOR of each pair of floats from two tensors of any shape (similar to tf.add, tf.matmul, etc.)? I tried np.vectorize, among other things.
How can I write the xor function efficiently? To use the GPU in TensorFlow, I need to express every statement with TensorFlow ops, e.g. tf.add or tf.matmul. Since TensorFlow has no native support for bitstring, is there any way to convert a float to its bit representation in TensorFlow (inside the xor function) so that I can apply tf.bitwise.bitwise_xor to it afterwards?

Solution

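You'll probably need a custom C++ op to do this. The TensorFlow docs have a nice tutorial on how to build one. Here's an example to get you started; note that it is registered for float (float32) inputs, whereas the tensors in the question are float64.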

xor_op.cc

#include <cstdint>
#include <cstring>

#include "tensorflow/core/framework/common_shape_fns.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_types.h"

namespace tensorflow {
using shape_inference::InferenceContext;

REGISTER_OP("Xor")
    .Input("input_tensor_a: float")
    .Input("input_tensor_b: float")
    .Output("output_tensor: float")
    .SetShapeFn([](InferenceContext* c) {
      return shape_inference::UnchangedShapeWithRankAtLeast(c, 1);
    });

class XorOp : public OpKernel {
 public:
  explicit XorOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  // XOR the raw 32-bit patterns of two floats and store the result in *c.
  // std::memcpy is used instead of pointer casts such as *(int*)a, which
  // violate strict aliasing and are undefined behavior in C++.
  static void XorFloats(const float* a, const float* b, float* c) {
    std::uint32_t bits_a, bits_b;
    std::memcpy(&bits_a, a, sizeof(bits_a));
    std::memcpy(&bits_b, b, sizeof(bits_b));
    const std::uint32_t bits_c = bits_a ^ bits_b;
    std::memcpy(c, &bits_c, sizeof(bits_c));
  }

  void Compute(OpKernelContext* ctx) override {
    // get input tensors
    const Tensor& input_fst = ctx->input(0);
    const Tensor& input_snd = ctx->input(1);

    // the loop below indexes both inputs in lockstep, so shapes must match
    OP_REQUIRES(ctx, input_fst.shape().IsSameSize(input_snd.shape()),
                errors::InvalidArgument("inputs must have the same shape"));

    TTypes<float, 1>::ConstFlat c_in_fst = input_fst.flat<float>();
    TTypes<float, 1>::ConstFlat c_in_snd = input_snd.flat<float>();

    // allocate output tensor
    Tensor* output_tensor = nullptr;
    OP_REQUIRES_OK(ctx,
                   ctx->allocate_output(0, input_fst.shape(), &output_tensor));

    auto output_flat = output_tensor->flat<float>();
    // 64-bit index so very large tensors do not overflow the counter
    const std::int64_t N = c_in_fst.size();

    for (std::int64_t i = 0; i < N; ++i) {
      XorFloats(&c_in_fst(i), &c_in_snd(i), &output_flat(i));
    }
  }
};

REGISTER_KERNEL_BUILDER(Name("Xor").Device(DEVICE_CPU), XorOp);

}  // namespace tensorflow

Let's build the op:

$ TF_LFLAGS=($(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))'))
$ TF_CFLAGS=($(python -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'))
$ 
$ g++ -std=c++14 -shared xor_op.cc -o xor_op.so -fPIC ${TF_CFLAGS[@]} ${TF_LFLAGS[@]} -O2

Let's run the op and see if it works.

main.py

import tensorflow as tf


def main():
    xor_module = tf.load_op_library("./xor_op.so")
    xor_op = xor_module.xor

    # make some data
    a = tf.constant(
        [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]],
        dtype=tf.float32)

    b = tf.constant(
        [[7.7, 8.8, 9.9], [10.1, 11.11, 12.12]],
        dtype=tf.float32)
    
    c = xor_op(a, b)

    print(f"a: {a}")
    print(f"b: {b}")
    print(f"c: {c}")


if __name__ == "__main__":
    main()

# a: [[1.1 2.2 3.3]
#     [4.4 5.5 6.6]]
# b: [[ 7.7   8.8   9.9 ]
#     [10.1  11.11 12.12]]
# c: [[3.3319316e+38 2.3509887e-38 3.7713776e-38]
#     [6.3672620e-38 4.7666294e-38 5.3942895e-38]]
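
The first entry of c can be cross-checked by hand without TensorFlow. The sketch below is an illustration using only the standard library (the helper names float32_bits and bits_to_float32 are hypothetical); it rounds 1.1 and 7.7 to float32, XORs their bit patterns, and reinterprets the result.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x after rounding it to float32."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 value."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value


a_bits = float32_bits(1.1)  # 0x3F8CCCCD
b_bits = float32_bits(7.7)  # 0x40F66666
c_bits = a_bits ^ b_bits    # 0x7F7AAAAB

print(hex(c_bits))
print(bits_to_float32(c_bits))  # roughly 3.33e+38, in line with c[0][0] above
```

The XOR leaves the exponent bits almost all set, which is why the result is so close to the largest finite float32 value.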

Cool. Let's test a little more rigorously.

test.py

import tensorflow as tf
from tensorflow.python.platform import test as test_lib


class XorOpTest(test_lib.TestCase):
    def setUp(self):
        # import the custom op
        xor_module = tf.load_op_library("./xor_op.so")
        self._xor_op = xor_module.xor

        # make some data
        self.a = tf.constant(
            [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]],
            dtype=tf.float32)

        self.b = tf.constant(
            [[7.7, 8.8, 9.9], [10.1, 11.11, 12.12]],
            dtype=tf.float32)

    def test_xor_op(self):
        c = self._xor_op(self.a, self.b)
        self.assertAllEqual(self._xor_op(c, self.b), self.a)


if __name__ == "__main__":
    test_lib.main()

# [ RUN      ] XorOpTest.test_xor_op
# [       OK ] XorOpTest.test_xor_op
# ----------------------------------------------------------------------
# Ran 1 test in 0.005s
# 
# OK

I'll leave it to you to extend this to work on a GPU. If you're curious, the idea behind XorFloats, reinterpreting a float's bits as an integer, is the same bit-level trick used in the well-known fast inverse square root algorithm.
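
As a possible alternative to writing a GPU kernel, the same reinterpretation might be expressible with built-in ops: tf.bitcast to view the floats as same-width integers, tf.bitwise.bitwise_xor on those integers, and tf.bitcast back. The sketch below is not part of the custom op above. The helper name float_xor is hypothetical, it assumes both inputs have the same shape and share a dtype of float32 or float64, and whether it actually runs on the GPU depends on those ops having GPU kernels in your TensorFlow build.

```python
import tensorflow as tf

# Integer type with the same bit width as each supported float type.
_INT_TYPE = {tf.float32: tf.int32, tf.float64: tf.int64}


@tf.function
def float_xor(x, y):
    """Element-wise XOR of the raw bit patterns of two float tensors."""
    int_type = _INT_TYPE[x.dtype]
    xi = tf.bitcast(x, int_type)  # reinterpret the bits, no value conversion
    yi = tf.bitcast(y, int_type)
    return tf.bitcast(tf.bitwise.bitwise_xor(xi, yi), x.dtype)


t1 = tf.constant([[1.1, 2.2, 3.3], [4.4, 5.5, 6.6]], dtype=tf.float64)
t2 = tf.constant([[7.7, 8.8, 9.9], [10.1, 11.11, 12.12]], dtype=tf.float64)

c = float_xor(t1, t2)
# XOR-ing with t2 again should recover t1 bit for bit.
print(bool(tf.reduce_all(tf.equal(float_xor(c, t2), t1))))
```

Because it is built only from element-wise ops, this version works on tensors of any shape, including the [64, 64, 3] case from the question.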