I'm currently investigating the effect of masking attention scores in the MultiHeadAttention layers of a Transformer model for time series classification. I have built a model that accepts a time series and a mask:
import keras
from keras import layers
from keras_nlp.layers import SinePositionEncoding

# SEED and transformer_encoder are defined elsewhere in my script.
def build(params: dict) -> keras.Model:
    input_dim = 1
    sequence_size = params["sequence_size"]
    n_classes = params["n_classes"]
    encoder_blocks = params["encoder_blocks"]
    n_heads = params["encoder_heads"]
    encoder_mlp = params["mlp_dim"]
    conv_filters = params["conv_filters"]
    encoder_dropout = params["encoder_dropout"]
    mlp_dropout = params["mlp_dropout"]
    learning_rate = params["learning_rate"]

    # Two inputs: the time series and a per-sample attention mask
    inputs = keras.Input(shape=(sequence_size, input_dim), name="sequence_input")
    mask_input = keras.Input(shape=(sequence_size, sequence_size), name="mask_input")

    x = inputs + SinePositionEncoding()(inputs)
    for _ in range(encoder_blocks):
        x = transformer_encoder(
            x,
            head_size=sequence_size,
            num_heads=n_heads,
            con_filters=conv_filters,
            attention_mask=mask_input,
            dropout=encoder_dropout,
            seed=SEED,
        )

    # Classification head
    x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
    x = layers.Dense(encoder_mlp, activation="relu")(x)
    x = layers.Dropout(mlp_dropout, seed=SEED)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)

    model = keras.Model(inputs=[inputs, mask_input], outputs=outputs)
    model.compile(
        loss="categorical_crossentropy",
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["categorical_accuracy", "f1_score"],
    )
    return model
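For context, transformer_encoder is not shown above; here is a minimal sketch of a block with this signature, assuming it follows the standard attention + feed-forward encoder pattern and simply forwards the mask to MultiHeadAttention's attention_mask argument (the details of the real block may differ):

def transformer_encoder(x, head_size, num_heads, con_filters, attention_mask, dropout, seed):
    # Self-attention; positions where attention_mask is 0/False cannot be attended to
    attn = layers.MultiHeadAttention(
        num_heads=num_heads, key_dim=head_size, dropout=dropout
    )(x, x, attention_mask=attention_mask)
    x = layers.LayerNormalization(epsilon=1e-6)(x + attn)
    # Position-wise feed-forward implemented as 1x1 convolutions
    ff = layers.Conv1D(filters=con_filters, kernel_size=1, activation="relu")(x)
    ff = layers.Dropout(dropout, seed=seed)(ff)
    ff = layers.Conv1D(filters=x.shape[-1], kernel_size=1)(ff)
    return layers.LayerNormalization(epsilon=1e-6)(x + ff)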
When I call model.fit([x_train, dummy_mask], y_train), everything works fine. But if I call model.evaluate([x_train, dummy_mask], y_train), I get a ValueError:

ValueError: Data cardinality is ambiguous. Make sure all arrays contain the same number of samples.
'x' sizes: 60, 577
'y' sizes: 60

The shapes involved are:
x_train shape = (60, 577, 1)
dummy_mask shape = (577, 577, 1)
I don't have a clue why that is. Any suggestions?
PS: I am fully aware that calling .evaluate() with training data makes no sense; I did it to narrow the error down. At first I guessed something was wrong with my test data, but when the model could not be evaluated even on the data it was trained on, something else had to be off.
Well, turns out I forgot that Keras treats each input array as a collection of samples, and my masks had the wrong shape. x_train has 60 data points, each with a sequence length of 577 and a feature dimension of 1. dummy_mask has shape (577, 577, 1), which is obviously wrong: Keras reads its first axis as 577 samples. The right shape for dummy_mask is (60, 577, 577), or more generally (x_train.shape[0], sequence_size, sequence_size), when fitting the model.
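Building a dummy mask with the right cardinality is then straightforward (a sketch, assuming an all-ones mask, i.e. no position masked, is the intended baseline):

import numpy as np

n_samples = x_train.shape[0]       # 60
sequence_size = x_train.shape[1]   # 577

# One full (sequence_size, sequence_size) mask per sample; every position visible
dummy_mask = np.ones((n_samples, sequence_size, sequence_size), dtype=bool)

model.fit([x_train, dummy_mask], y_train)
model.evaluate([x_train, dummy_mask], y_train)  # same sample count on axis 0, no ValueError

Keras splits every input array along axis 0 into samples, so all inputs and the labels must agree on that first dimension; with shape (577, 577, 1) the mask looked like 577 samples, which is exactly where the 'x' sizes: 60, 577 in the error came from.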