I am building an LSTM model with multiple inputs (the number is given by n_inputs). I scaled the inputs to (0, 1) and replaced all the NaN values in the inputs with -1. Now I want the model to ignore those NaN values, so I use a mask as follows:
model = Sequential()
model.add(Masking(mask_value=-1, input_shape=(window, n_inputs)))
model.add(LSTM(units=n_units, return_sequences=True))
model.add(Dropout(dropout_rate))
model.add(LSTM(units=n_units))
model.add(Dropout(dropout_rate))
model.add(Dense(units=1))
I am afraid that the Masking layer forces the model to completely ignore a time step if any of its inputs has a NaN value (I am not sure how to check whether this is the case). What I want instead is: for each time step, ignore only the NaN inputs and pass the others that are valid.
My question is: does the Masking layer exclude all the time steps for which at least one input is NaN? If it does, how can I get the model to ignore only the NaN inputs?
OK, so because I didn't know the answer and was curious about it, I ran some experiments. I first created a sequence made of 3 time steps and 3 features:
import numpy as np
import tensorflow as tf

inputs = np.ones([1, 3, 3]).astype(np.float32)
and I created a simple network where I print the outputs of two intermediate layers:
inp = tf.keras.layers.Input(shape=(3, 3))
mask = tf.keras.layers.Masking(mask_value=-np.inf)(inp)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(mask)
model_mask = tf.keras.models.Model(inp, mask)
model = tf.keras.models.Model(inp, out)
print(model_mask(inputs))
print(model(inputs))
I used a Dense layer because it supports masking and it makes it easier to see what is happening, but the process is the same with RNNs. I also chose to set the mask value to -inf so I can see whether the masked values are really masked. The weights of the Dense layer are set to one and biases are disabled, so for each time step this Dense layer computes the sum of the inputs.
First, I set all the features of the last time step to the mask value:
inputs[0, 2, :] = -np.inf
and this is what I get:
tf.Tensor(
[[[ 1. 1. 1.]
[ 1. 1. 1.]
[nan nan nan]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
[ 3.]
[nan]]], shape=(1, 3, 1), dtype=float32)
So the mask was correctly taken into account.
Then, with a fresh inputs array, I set only the first feature of the last time step to the mask value:
inputs[0, 2, 0] = -np.inf
and my outputs are:
tf.Tensor(
[[[ 1. 1. 1.]
[ 1. 1. 1.]
[-inf 1. 1.]]], shape=(1, 3, 3), dtype=float32)
tf.Tensor(
[[[ 3.]
[ 3.]
[-inf]]], shape=(1, 3, 1), dtype=float32)
So I conclude that the masking was not applied: the Masking layer only masks a time step when all of its features equal the mask value, so a time step where only some of the inputs equal the mask value is passed through unchanged.
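You can also check this directly by looking at the boolean mask the layer produces with compute_mask. Here is a minimal sketch of such a check, this time using the question's mask value of -1 (it is not part of the experiment above):
x = np.ones([1, 3, 3], dtype=np.float32)
x[0, 2, :] = -1    # every feature of the last time step equals the mask value
y = np.ones([1, 3, 3], dtype=np.float32)
y[0, 2, 0] = -1    # only one feature of the last time step equals the mask value

masking = tf.keras.layers.Masking(mask_value=-1)
print(masking.compute_mask(x))  # mask is [True, True, False]: last step masked
print(masking.compute_mask(y))  # mask is [True, True, True]: nothing masked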
I tried a small workaround, and I hope this example is usable for your project. First, I drop the vanilla Masking layer from Keras and use my own mask instead. The idea is to create a mask that puts a 1 on masked values and a 0 on real values. For instance, if your valid values are all non-negative, you replace your NaN values with -1 and you create your custom_mask:
inputs = np.array([[[1, 2, 1], [0.5, 2, 1], [1, 0, 3]]], dtype=np.float32)
inputs[:, 1, 0] = -1   # pretend these two entries were NaN
inputs[:, 2, 2] = -1
custom_mask = inputs.copy()
custom_mask[inputs[:, :, :] >= 0] = 0   # 0 on valid values
custom_mask[inputs[:, :, :] < 0] = 1    # 1 on masked values
with inputs and custom_mask respectively:
[[[ 1. 2. 1.]
[-1. 2. 1.]
[ 1. 0. -1.]]]
[[[0. 0. 0.]
[1. 0. 0.]
[0. 0. 1.]]]
Then, you multiply your mask by 1E9 and subtract it from your tensor, in order to push the values you want to mask towards a very large negative number. A simple ReLU then sets the masked values to 0:
inp = tf.keras.layers.Input(shape=(3, 3))
input_mask = tf.keras.activations.relu(inp - custom_mask * 1E9)
out = tf.keras.layers.Dense(1,
                            kernel_initializer=tf.keras.initializers.Ones(),
                            use_bias=False)(input_mask)
model = tf.keras.models.Model(inp, out)
print(model(inputs))
which gives:
tf.Tensor(
[[[4.]
[3.]
[1.]]], shape=(1, 3, 1), dtype=float32)
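If you want to plug this trick back into your LSTM model, one option is to build the mask from the input itself, since your valid inputs are scaled to (0, 1) and the NaN placeholder is -1. The following is only a sketch of that idea (window, n_inputs, n_units and dropout_rate are the names from your question), and keep in mind that it sets the masked features to 0 rather than truly removing them from the computation:
inp = tf.keras.layers.Input(shape=(window, n_inputs))

# Values < 0 (the NaN placeholder -1) are pushed to a huge negative number
# and clipped to 0 by the ReLU; valid values in (0, 1) pass through unchanged.
masked_inp = tf.keras.layers.Lambda(
    lambda x: tf.keras.activations.relu(x - tf.cast(x < 0, x.dtype) * 1E9))(inp)

x = tf.keras.layers.LSTM(units=n_units, return_sequences=True)(masked_inp)
x = tf.keras.layers.Dropout(dropout_rate)(x)
x = tf.keras.layers.LSTM(units=n_units)(x)
x = tf.keras.layers.Dropout(dropout_rate)(x)
out = tf.keras.layers.Dense(units=1)(x)
model = tf.keras.models.Model(inp, out)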