I am using mini-batches to train my model, which is as follows:
SimpleModel(
  (embed): Embedding(vocab_size, embedding_size, max_norm=2)
  (model): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=in_features, out_features=1, bias=True)
  )
  (sig): Sigmoid()
)
These are the specifics of the model. When training with mini-batches, after 2-3 mini-batches all the outputs become 0.
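For reference, here is a minimal sketch of how the module is defined (simplified; vocab_size, embedding_size and in_features stand in for my actual values, and the forward wiring just mirrors the printed structure above):

import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self, vocab_size, embedding_size, in_features):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embedding_size, max_norm=2)
        self.model = nn.Sequential(
            nn.Flatten(start_dim=1, end_dim=-1),
            nn.Linear(in_features=in_features, out_features=1, bias=True),
        )
        self.sig = nn.Sigmoid()

    def forward(self, x):
        # embed token ids, flatten to (batch, seq_len * embedding_size),
        # project to a single score, squash to (0, 1)
        return self.sig(self.model(self.embed(x)))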
The training function looks like this, while the outer training loop is the usual one (sketched after the function):
def trainer(train_loader, model, optimizer, criterion):
    model.train()
    it_loss = 0
    counter = 0
    for data in train_loader:
        optimizer.zero_grad()
        msgs = data['msg']
        targets = data['target']
        out = model(msgs)
        print(out)  # debug: inspect the raw sigmoid outputs
        loss = criterion(out, targets)
        loss.backward()
        optimizer.step()
        it_loss += loss.item() * msgs.shape[0]  # accumulate sum of per-sample losses
        counter += msgs.shape[0]
    return it_loss / counter  # average loss over the epoch
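By "usual" I mean an outer loop along these lines (num_epochs is arbitrary):

num_epochs = 10  # arbitrary
for epoch in range(num_epochs):
    epoch_loss = trainer(train_loader, model, optimizer, criterion)
    print(f"epoch {epoch}: average loss {epoch_loss:.4f}")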
I have tried various optimizers and similar tweaks, and the data is not severely imbalanced, as these class counts show:
0 3900
1 1896
Name: count, dtype: int64
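(These counts come from pandas value_counts; a self-contained reconstruction, where df and 'target' stand in for my actual dataframe and label column:)

import pandas as pd

# hypothetical reconstruction of the check above
df = pd.DataFrame({'target': [0] * 3900 + [1] * 1896})
print(df['target'].value_counts())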
What could be the possible reason, and how can I solve it?
Edit:
The output of the first mini-batch looks like:
tensor([[0.4578],
        [0.4569],
        [0.4686],
        ...,
        [0.4602],
        [0.4674],
        [0.4398]], grad_fn=<SigmoidBackward0>)
while the output of the 4th or 5th mini-batch looks like:
tensor([[0.0057],
        [0.0058],
        [0.0058],
        ...,
        [0.0058],
        [0.0057],
        [0.0059]], grad_fn=<SigmoidBackward0>)
Furthermore, it gradually becomes exactly 0.
Edit 2:
Changed the optimizer and loss function: I used the RAdam optimizer and changed the loss function to BCELoss, and it worked.
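Concretely, the change was along these lines (the learning rate is a placeholder):

import torch
import torch.nn as nn

criterion = nn.BCELoss()  # expects probabilities, which the final Sigmoid already produces
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)  # lr=1e-3 is a placeholder

BCELoss on top of a Sigmoid output is the standard pairing for binary classification; the numerically more stable variant is to drop the Sigmoid and apply BCEWithLogitsLoss to the raw logits.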