python, tensorflow, multihead-attention

Issue adding an attention block inside a deep neural network for a regression problem


I want to add a tf.keras.layers.MultiHeadAttention layer between two dense layers of a neural network. However, I am getting an IndexError.

The detailed code is as follows:

    x1 = Dense(58, activation='relu')(x1)
    x1 = Dropout(0.1)(x1)
    print(x1.shape)

    attention = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=58, dropout=0.1, output_shape=x1.shape)(x1, x1)
    x1 = Dropout(0.2)(attention)
    x1 = Dense(59, activation='relu')(x1)
    output = Dense(1, activation='linear')(x1)

    model = tf.keras.models.Model(inputs=input1, outputs=output)

With the above code I am getting the following error:

IndexError: Exception encountered when calling layer 'softmax' (type Softmax).

tuple index out of range

Call arguments received by layer 'softmax' (type Softmax):
  • inputs=tf.Tensor(shape=(None, 2), dtype=float32)
  • mask=None

Note that x1.shape = (None, 58).


Solution

  • The problem is solved now. The MultiHeadAttention layer in TensorFlow expects a 3D input tensor of shape (batch, sequence, features). Therefore, to introduce an attention block into an ordinary feed-forward network, the inputs and outputs of that block need to be reshaped accordingly. The updated code is as follows:

        x1 = Dense(58, activation='relu')(x1)
        x1 = Dropout(0.1)(x1)
        x1 = tf.expand_dims(x1, axis=1) # add a length-1 sequence axis: (batch, 1, features)
        print(x1.shape)
    
        attention = tf.keras.layers.MultiHeadAttention(num_heads=3, key_dim=x1.shape[2], dropout=0.2)(x1, x1)
        x1 = Dropout(0.2)(attention)
        x1 = tf.keras.layers.LayerNormalization()(x1)
        x1 = tf.squeeze(x1, axis=1) # remove the sequence axis again: back to (batch, features)
        x1 = Dense(10, activation='relu')(x1)
        output = Dense(1, activation='linear')(x1)
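
    For completeness, below is a minimal end-to-end sketch of the fixed model. The input width (20 features here), the optimizer, and the dummy training data are placeholders, since the original definition of input1 is not shown in the question; it assumes TensorFlow 2.x with the built-in Keras functional API, where tf.expand_dims / tf.squeeze can be applied directly to Keras tensors.

        import numpy as np
        import tensorflow as tf
        from tensorflow.keras.layers import Input, Dense, Dropout, LayerNormalization

        n_features = 20  # placeholder input width; adjust to your dataset

        input1 = Input(shape=(n_features,))
        x1 = Dense(58, activation='relu')(input1)
        x1 = Dropout(0.1)(x1)

        # Add a length-1 "sequence" axis so the tensor becomes 3D: (batch, 1, 58)
        x1 = tf.expand_dims(x1, axis=1)

        attention = tf.keras.layers.MultiHeadAttention(
            num_heads=3, key_dim=x1.shape[2], dropout=0.2)(x1, x1)
        x1 = Dropout(0.2)(attention)
        x1 = LayerNormalization()(x1)

        # Remove the sequence axis again: back to (batch, 58)
        x1 = tf.squeeze(x1, axis=1)
        x1 = Dense(10, activation='relu')(x1)
        output = Dense(1, activation='linear')(x1)

        model = tf.keras.models.Model(inputs=input1, outputs=output)
        model.compile(optimizer='adam', loss='mse')
        model.summary()

        # Dummy data just to confirm the graph builds and trains
        X = np.random.rand(256, n_features).astype('float32')
        y = np.random.rand(256, 1).astype('float32')
        model.fit(X, y, epochs=2, batch_size=32, verbose=0)

    With a single expanded time step, the attention block acts mainly as a learned mixing of the 58 features. An alternative is to use tf.keras.layers.Reshape((1, 58)) and Reshape((58,)) instead of tf.expand_dims / tf.squeeze, which keeps the whole graph expressed as Keras layers.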