tensorflow machine-learning keras deep-learning loss

Keras: Validation accuracy stays the exact same but validation loss decreases

I know that the problem can't be with the dataset because I've seen other projects use the same dataset. Here is my data preprocessing code:

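import pandas as pd
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences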
dataset = pd.read_csv('political_tweets.csv')
dataset.head()
dataset = pd.read_csv('political_tweets.csv')["tweet"].values
y_train = pd.read_csv('political_tweets.csv')["dem_or_rep"].values

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(dataset, y_train, test_size=0.1)

max_words = 10000
print(max_words)
max_len = 25

tokenizer = Tokenizer(num_words = max_words, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n1234567890', lower=False,oov_token="<OOV>")

tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, max_len, padding='post', truncating='post')

tokenizer.fit_on_texts(x_test)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = pad_sequences(x_test, max_len, padding='post', truncating='post')

And my model:

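from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, GRU, GlobalMaxPooling1D, Dense, Dropout
from tensorflow.keras.optimizers import RMSprop

model = Sequential([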
    Embedding(max_words+1,64,input_length=max_len),
    Bidirectional(GRU(64, return_sequences = True), merge_mode='concat'),
    GlobalMaxPooling1D(),
    Dense(64,kernel_regularizer=regularizers.l2(0.02)),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),

])
model.summary()

model.compile(loss='binary_crossentropy', optimizer=RMSprop(learning_rate=0.0001), metrics=['accuracy'])
model.fit(x_train,y_train, batch_size=128, epochs=500, verbose=1, shuffle=True, validation_data=(x_test, y_test))

Both of my losses decrease, my training accuracy increases, but the validation accuracy stays at 50% (which is awful considering I am doing a binary classification model).

Epoch 1/500
546/546 [==============================] - 35s 64ms/step - loss: 1.7385 - accuracy: 0.5102 - val_loss: 1.2458 - val_accuracy: 0.5102
Epoch 2/500
546/546 [==============================] - 34s 62ms/step - loss: 0.9746 - accuracy: 0.5137 - val_loss: 0.7886 - val_accuracy: 0.5102
Epoch 3/500
546/546 [==============================] - 34s 62ms/step - loss: 0.7235 - accuracy: 0.5135 - val_loss: 0.6943 - val_accuracy: 0.5102
Epoch 4/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6929 - accuracy: 0.5135 - val_loss: 0.6930 - val_accuracy: 0.5102
Epoch 5/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6928 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 6/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6927 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 7/500
546/546 [==============================] - 37s 68ms/step - loss: 0.6925 - accuracy: 0.5136 - val_loss: 0.6932 - val_accuracy: 0.5106
Epoch 8/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6892 - accuracy: 0.5403 - val_loss: 0.6958 - val_accuracy: 0.5097
Epoch 9/500
546/546 [==============================] - 35s 63ms/step - loss: 0.6815 - accuracy: 0.5633 - val_loss: 0.7013 - val_accuracy: 0.5116
Epoch 10/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6747 - accuracy: 0.5799 - val_loss: 0.7096 - val_accuracy: 0.5055

I've seen other posts on this topic and they say to add dropout, crossentropy, decrease the learning rate, etc. I have done all of this and none of it works. Any help is greatly appreciated. Thanks in advance!

Solution

A couple of observations for your problem:

I am not particularly familiar with the dataset, but I trust that it is widely used without problems. Still, check whether its classes are balanced. train_test_split() has a parameter called stratify; if you pass y to it, the training set and the test set keep the same class proportions as the full dataset.
Your phenomenon with validation loss and validation accuracy is not out of the ordinary. Imagine that in the first epochs the neural network assigns 55% confidence to a ground-truth positive example (GT == 1). As training advances, the network learns, and it now assigns 90% confidence to that same example. Since the threshold for calculating the accuracy is 50%, you have the same accuracy in both situations. Nevertheless, the loss has changed significantly, since 90% >> 55%.
Your training seems to advance (slowly but surely). Have you considered using Adam as an off-the-shelf optimizer?
If the low accuracy persists over more epochs, you may be suffering from underfitting, a well-known phenomenon in which your model is unable to capture the dependencies in your data. To mitigate or avoid underfitting, you may want to use a more complex model, for example two stacked LSTM or GRU layers.
At this stage, remove the Dropout() layer, since you have underfitting, not overfitting.
Decrease the batch_size. A very large batch_size can steer training toward poor minima, leaving your network unable to learn or generalize properly.
If none of these work, try starting with a lower learning rate, say 0.00001 instead of 0.0001.
Revisit the dataset preprocessing steps and make sure the sentences are converted to sequences properly.
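
On the stratify suggestion, here is a minimal sketch of what it does. The texts and labels are hypothetical stand-ins (a 70/30 imbalance I made up), not your tweets, and random_state is fixed only to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 "tweets" with a 70/30 label imbalance.
texts = np.array([f"tweet {i}" for i in range(1000)])
labels = np.array([1] * 700 + [0] * 300)

# stratify=labels asks the split to preserve the class proportions
# of the full label array in both the training and the test part.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42
)

# Fraction of positive labels in each part; both should sit near 0.7.
train_pos = y_train.mean()
test_pos = y_test.mean()
print("positive fraction (train):", train_pos)
print("positive fraction (test): ", test_pos)
```

Printing the same two fractions for your real split is a quick way to see whether the classes are balanced in the first place.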
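
To make the loss-versus-accuracy point concrete, here is a small worked example in plain Python. The four labels and the predicted probabilities are made up for illustration; the helper functions are my own sketch of mean binary cross-entropy and thresholded accuracy, not Keras internals.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    eps = 1e-7  # clip probabilities so log() never sees exactly 0 or 1
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 1, 0, 0]
early = [0.55, 0.55, 0.45, 0.45]     # barely on the right side of 0.5
late = [0.90, 0.90, 0.10, 0.10]      # confidently on the right side
constant = [0.50, 0.50, 0.50, 0.50]  # a model that has learned nothing

for name, probs in [("early", early), ("late", late), ("constant", constant)]:
    print(name, round(binary_crossentropy(y_true, probs), 4), accuracy(y_true, probs))
```

The early and late predictions have the same accuracy but very different losses. The constant 0.5 predictions give a loss of ln 2, about 0.6931, which is the value your log settles on in epochs 4 to 7. That is consistent with a network that outputs roughly 0.5 for every sample, although other explanations are possible.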
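
The optimizer, capacity and dropout suggestions could be combined roughly as follows. This is a sketch, not a tuned architecture: build_model is a hypothetical helper name, the layer sizes are copied from your model, and the Adam learning rate of 1e-3 is simply the Keras default, which I am assuming is a reasonable starting point.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, GRU, GlobalMaxPooling1D, Dense
from tensorflow.keras.optimizers import Adam

def build_model(max_words=10000, embed_dim=64, units=64):
    """Two stacked bidirectional GRU layers, no Dropout, Adam optimizer."""
    model = Sequential([
        Embedding(max_words + 1, embed_dim),
        # Both GRU layers return full sequences so the pooling layer
        # still receives one vector per time step.
        Bidirectional(GRU(units, return_sequences=True)),
        Bidirectional(GRU(units, return_sequences=True)),
        GlobalMaxPooling1D(),
        Dense(64, activation='relu'),
        Dense(1, activation='sigmoid'),
    ])
    model.compile(
        loss='binary_crossentropy',
        optimizer=Adam(learning_rate=1e-3),  # assumption: Keras default rate
        metrics=['accuracy'],
    )
    return model

model = build_model()
# Smoke test on a dummy batch of 2 padded sequences of length 25.
dummy_batch = np.zeros((2, 25), dtype="int32")
predictions = model(dummy_batch)
print(predictions.shape)
```

You would then call model.fit() as before, with a smaller batch_size such as 32 (again an arbitrary value to start from, not a recommendation backed by experiments on this dataset).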
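
One preprocessing detail worth checking is the second tokenizer.fit_on_texts(x_test) call. The toy class below is a hypothetical stand-in, not the Keras Tokenizer; it only mimics frequency-ranked indexing, which to my understanding is how the Keras Tokenizer assigns indices, so please verify this against your installed version. It shows how a second fit can change the index of a word the model has already been trained on.

```python
from collections import Counter

class FrequencyVocab:
    """Toy frequency-ranked vocabulary (hypothetical, not the Keras class)."""

    def __init__(self):
        self.counts = Counter()
        self.word_index = {}

    def fit_on_texts(self, texts):
        # Counts accumulate across calls, then the whole index is rebuilt.
        for text in texts:
            self.counts.update(text.split())
        ranked = sorted(self.counts.items(), key=lambda kv: -kv[1])
        # Index 1 goes to the most frequent word seen so far.
        self.word_index = {word: i + 1 for i, (word, _) in enumerate(ranked)}

train_texts = ["tax cuts now", "tax reform now", "tax vote"]
test_texts = ["vote vote vote vote", "vote reform"]

vocab = FrequencyVocab()
vocab.fit_on_texts(train_texts)
index_seen_in_training = dict(vocab.word_index)

vocab.fit_on_texts(test_texts)  # mirrors the second fit in the question
index_after_refit = dict(vocab.word_index)

print("tax during training:", index_seen_in_training["tax"])
print("tax after refit:    ", index_after_refit["tax"])
```

If the real tokenizer behaves the same way, the test sequences are encoded with a different word-to-index mapping than the one the embedding layer was trained on. Fitting on x_train only and then calling texts_to_sequences on both sets avoids that.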