I am using Keras to train a 1D CNN on time-series input data for binary classification. The model part of the code is the following:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, BatchNormalization, Dropout, Flatten, Dense
from tensorflow.keras import callbacks

modelo = Sequential()
modelo.add(Conv1D(filters=32, kernel_size=5, activation='relu', input_shape=(max_size, 1)))
modelo.add(BatchNormalization())
modelo.add(Conv1D(filters=32, kernel_size=5, activation='relu'))
modelo.add(BatchNormalization())
modelo.add(Dropout(0.3))
modelo.add(Flatten())
modelo.add(Dense(64, activation='relu'))
modelo.add(Dropout(0.3))
modelo.add(Dense(1, activation='sigmoid'))
modelo.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
early_stopping = callbacks.EarlyStopping(monitor='loss', patience=1, restore_best_weights=True)
modelo.fit(X, labels, epochs=100, batch_size=5, callbacks=[early_stopping])
scores = modelo.evaluate(X, labels)
print("\nAccuracy: %.2f%%" % (scores[1]*100))
When the fit function is executed, 4 epochs are computed (training stops because of early_stopping), and only the initial epoch has a low accuracy. However, the accuracy reported by the evaluate function is only 54%.
Why is there such a large discrepancy between the training accuracies reported during the epochs and the accuracy printed at the end? I would expect the final accuracy to be around the average of the per-epoch training accuracies; at least, that is the case when I use an LSTM or an MLP.
The output is the following:
Epoch 1/100
C:\ProgramData\miniconda3\envs\spyder-cf\Lib\site-packages\keras\src\layers\convolutional\base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs)
10/10 ━━━━━━━━━━━━━━━━━━━━ 2s 39ms/step - accuracy: 0.5768 - loss: 5.3461
Epoch 2/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 38ms/step - accuracy: 0.9291 - loss: 0.4829
Epoch 3/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 39ms/step - accuracy: 0.9692 - loss: 0.0785
Epoch 4/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 0.9272 - loss: 0.2117
2025-03-09 20:19:38.223377: E tensorflow/core/framework/node_def_util.cc:676] NodeDef mentions attribute use_unbounded_threadpool which is not in the op definition: Op<name=MapDataset; signature=input_dataset:variant, other_arguments: -> handle:variant; attr=f:func; attr=Targuments:list(type),min=0; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1; attr=use_inter_op_parallelism:bool,default=true; attr=preserve_cardinality:bool,default=false; attr=force_synchronous:bool,default=false; attr=metadata:string,default=""> This may be expected if your graph generating binary is newer than this binary. Unknown attributes will be ignored. NodeDef: {{node ParallelMapDatasetV2/_15}}
2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - accuracy: 0.4538 - loss: 1.3638
Accuracy: 54.00%
Fortunately, I get very good results with the LSTM and the MLP, so I do not strictly need the CNN to perform well. However, I would like to know why the CNN does not perform as well, or whether something is wrong, because I am comparing accuracy vs. epoch during training between the CNN and the other models.
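As a sanity check, the per-epoch numbers can be made directly comparable with the final result of evaluate by also passing the same data as validation_data, so that Keras reports an inference-mode accuracy after each epoch. This is only a sketch; X and labels are the same arrays used in the fit call above:

modelo.fit(
    X, labels,
    epochs=100,
    batch_size=5,
    callbacks=[early_stopping],
    validation_data=(X, labels),  # val_accuracy is computed in inference mode
)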
After some investigation, I have found the problem. The model included two Dropout layers, which are active during training but disabled during evaluation. This was affecting the final accuracy reported by evaluate, because the model uses all of its connections during inference. After removing the Dropout layers, the model was able to train correctly.
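The difference between the two modes can also be observed directly by calling the model on the same data with training=True and training=False. The following is a minimal sketch, assuming X and labels are the NumPy arrays used above and that labels contains 0/1 values:

import numpy as np

# Training mode: Dropout is active and BatchNormalization uses batch statistics.
probs_train_mode = modelo(X, training=True).numpy().ravel()
# Inference mode: Dropout is disabled and BatchNormalization uses moving averages.
probs_infer_mode = modelo(X, training=False).numpy().ravel()

acc_train_mode = np.mean((probs_train_mode > 0.5) == labels)
acc_infer_mode = np.mean((probs_infer_mode > 0.5) == labels)
print("training-mode accuracy:  %.2f%%" % (acc_train_mode * 100))
print("inference-mode accuracy: %.2f%%" % (acc_infer_mode * 100))

A large gap between the two numbers points to the training-only behaviour of these layers as the source of the discrepancy between the fit metrics and the evaluate result.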