python, tensorflow, tensorflow2.0, multi-gpu

Not able to use Embedding Layer with tf.distribute.MirroredStrategy


I am trying to parallelize a model with an embedding layer on TensorFlow 2.4.1, but it throws the following error:

InvalidArgumentError: Cannot assign a device for operation sequential/emb_layer/embedding_lookup/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node sequential/emb_layer/embedding_lookup/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU XLA_CPU XLA_GPU 
Cast: GPU CPU XLA_CPU XLA_GPU 
Const: GPU CPU XLA_CPU XLA_GPU 
ResourceSparseApplyAdagradV2: CPU 
_Arg: GPU CPU XLA_CPU XLA_GPU 
ReadVariableOp: GPU CPU XLA_CPU XLA_GPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  sequential_emb_layer_embedding_lookup_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adagrad_adagrad_update_update_0_resourcesparseapplyadagradv2_accum (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  sequential/emb_layer/embedding_lookup/ReadVariableOp (ReadVariableOp) 
  sequential/emb_layer/embedding_lookup/axis (Const) 
  sequential/emb_layer/embedding_lookup (GatherV2) 
  gradient_tape/sequential/emb_layer/embedding_lookup/Shape (Const) 
  gradient_tape/sequential/emb_layer/embedding_lookup/Cast (Cast) 
  Adagrad/Adagrad/update/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0

     [[{{node sequential/emb_layer/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_631]

I simplified the model to a minimal example to make the issue reproducible:

import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # Build and compile the model inside the strategy scope
    user_model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10, 2, name="emb_layer")
    ])
    user_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1), loss="mse")

user_model.fit([1], [[1, 2]], epochs=3)

Any help would be highly appreciated. Thanks!


Solution

  • I finally figured out the problem; posting an answer in case anyone else runs into this.

    TensorFlow does not yet have a complete GPU implementation of the Adagrad optimizer. The ResourceSparseApplyAdagradV2 op, which applies the sparse gradient update produced by the embedding layer, only has a CPU kernel (visible in the colocation debug info above), while the variables are placed on the GPU by the distribution strategy. The conflicting device constraints cause the error, so Adagrad cannot be used with an embedding layer under data-parallel strategies like MirroredStrategy. Switching to Adam or RMSprop works fine, as in the sketch below.
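
    A minimal sketch of the working version (the same toy model as in the question, with only the optimizer swapped to Adam):

    import tensorflow as tf

    mirrored_strategy = tf.distribute.MirroredStrategy()
    with mirrored_strategy.scope():
        user_model = tf.keras.Sequential([
            tf.keras.layers.Embedding(10, 2, name="emb_layer")
        ])
        # Adam's sparse update op has a GPU kernel, so placing the
        # embedding variables on the GPU replicas no longer conflicts
        # with the optimizer's device requirements.
        user_model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss="mse")

    user_model.fit([1], [[1, 2]], epochs=3)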