algorithmgithubforwardcrflog-likelihood

Forward and forward-backward algorithms in CRF


In the papers about CRF (like this or this), authors all mention the forward-backward algorithm, yet implementations in GitHub (or the basic implementation in PyTorch tutorial) seem to only use the forward algorithm for calculating the negative log-likelihood to be optimized with SGD.

If I want to train a NER on BiLSTM features and the only type of queries I will make is just like "given a sentence, find the named entities", do I need the forward-backward algorithm? Or more generally, what is the difference between these two algorithms and which one is used when?


Solution

  • I think the reason why only forward algorithm is used in the PyTorch tutorial is that for calculating the partition function, only forward or backward pass is needed. Forward-backward algorithm would be needed to calculate the marginal probabilities though.