I am building an LSTM model with multiple inputs (the number is given by n_inputs). I scaled the inputs to (0, 1) and replaced all the NaN values in the inputs with -1. Now I want the model to ignore those NaN values, so I use a mask as follows:
model = Sequential()
model.add(Masking(mask_value=-1, input_shape=(window, n_inputs)))
model.add(LSTM(units=n_units, return_sequences=True))
model.add(Dropout(dropout_rate))
model.add(LSTM(units=n_units))
model.add(Dropout(dropout_rate))
model.add(Dense(units=1))
I am afraid that the Masking layer forces the model to completely ignore a time step if any of its inputs has a NaN value (I am not sure how to check whether this is the case). What I want instead is: for each time step, ignore only the NaN inputs and pass the others that are valid.
My question is: does the Masking layer exclude all the time steps for which at least one input is NaN? If it does, how can I get the model to ignore only the NaN inputs?
OK, so because I didn't know the answer and was curious about it, I ran some experiments. I first created a sequence made of 3 time steps and 3 features:
import numpy as np
import tensorflow as tf

inputs = np.ones([1, 3, 3]).astype(np.float32)
and I created a simple network where I print the outputs of two intermediate layers:
inp = tf.keras.layers.Input(shape=(3, 3))
mask = tf.keras.layers.Masking(mask_value=-np.inf)(inp)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(mask)
model_mask = tf.keras.models.Model(inp, mask)
model = tf.keras.models.Model(inp, out)
print(model_mask(inputs))
print(model(inputs))
I used a Dense layer because it supports masking and it makes it easier to see what is happening, but the process is the same with RNNs. I also chose to set the mask value to -inf so I can see whether the masked values are really masked. The weights of the Dense layer are set to one and biases are disabled, so for each time step this Dense layer computes the sum of the inputs.
First, I set all the features of the last time step to the mask value:
inputs[0, 2, :] = -np.inf
and this is what I get:
tf.Tensor(
[[[ 1. 1. 1.]
[ 1. 1. 1.]
[nan nan nan]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
[ 3.]
[nan]]], shape=(1, 3, 1), dtype=float32)
So the mask was correctly taken into account.
Then, with a fresh inputs array, I set only the first feature of the last time step to the mask value:
inputs[0, 2, 0] = -np.inf
and my outputs are:
tf.Tensor(
[[[ 1. 1. 1.]
[ 1. 1. 1.]
[-inf 1. 1.]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
[ 3.]
[-inf]]], shape=(1, 3, 1), dtype=float32)
So I conclude that the masking was not applied: the Masking layer only masks a time step when all of its features equal the mask value, so a time step where only some of the inputs equal the mask value is passed through unchanged.
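You can also check this directly by looking at the boolean mask the layer produces with compute_mask. Here is a minimal sketch of such a check, this time using the question's mask value of -1 (it is not part of the experiment above):
x = np.ones([1, 3, 3], dtype=np.float32)
x[0, 2, :] = -1    # every feature of the last time step equals the mask value
y = np.ones([1, 3, 3], dtype=np.float32)
y[0, 2, 0] = -1    # only one feature of the last time step equals the mask value

masking = tf.keras.layers.Masking(mask_value=-1)
print(masking.compute_mask(x))  # mask is [True, True, False]: last step masked
print(masking.compute_mask(y))  # mask is [True, True, True]: nothing masked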
I tried a small workaround, and I hope this example is usable for your project. First, I drop the vanilla Masking layer from Keras and use my own mask instead. The idea is to create a mask that puts a 1 on masked values and a 0 on real values. For instance, if your valid values are all non-negative, you replace your NaN values with -1 and you create your custom_mask:
inputs = np.array([[[1, 2, 1], [0.5, 2, 1], [1, 0, 3]]], dtype=np.float32)
inputs[:, 1, 0] = -1   # pretend these two entries were NaN
inputs[:, 2, 2] = -1
custom_mask = inputs.copy()
custom_mask[inputs[:, :, :] >= 0] = 0   # 0 on valid values
custom_mask[inputs[:, :, :] < 0] = 1    # 1 on masked values
with inputs and custom_mask respectively:
[[[ 1. 2. 1.]
[-1. 2. 1.]
[ 1. 0. -1.]]]
[[[0. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]]]
Then, you multiply your mask by 1E9 and subtract it from your tensor, in order to push the values you want to mask towards a very large negative number. A simple ReLU then sets the masked values to 0:
inp = tf.keras.layers.Input(shape=(3, 3))
input_mask = tf.keras.activations.relu(inp - custom_mask * 1E9)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(input_mask)
model = tf.keras.models.Model(inp, out)
print(model(inputs))
which gives:
tf.Tensor(
[[[4.]
[3.]
[1.]]], shape=(1, 3, 1), dtype=float32)
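If you want to plug this trick back into your LSTM model, one option is to build the mask from the input itself, since your valid inputs are scaled to (0, 1) and the NaN placeholder is -1. The following is only a sketch of that idea (window, n_inputs, n_units and dropout_rate are the names from your question), and keep in mind that it sets the masked features to 0 rather than truly removing them from the computation:
inp = tf.keras.layers.Input(shape=(window, n_inputs))

# Values < 0 (the NaN placeholder -1) are pushed to a huge negative number
# and clipped to 0 by the ReLU; valid values in (0, 1) pass through unchanged.
masked_inp = tf.keras.layers.Lambda(
    lambda x: tf.keras.activations.relu(x - tf.cast(x < 0, x.dtype) * 1E9))(inp)

x = tf.keras.layers.LSTM(units=n_units, return_sequences=True)(masked_inp)
x = tf.keras.layers.Dropout(dropout_rate)(x)
x = tf.keras.layers.LSTM(units=n_units)(x)
x = tf.keras.layers.Dropout(dropout_rate)(x)
out = tf.keras.layers.Dense(units=1)(x)
model = tf.keras.models.Model(inp, out)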