
GIZA++ - How is alignment score calculated?


This might be more of a math problem, but I couldn't find any relevant documentation elsewhere.

I just want to figure out which equation GIZA++ uses to calculate the alignment score.

Might anyone have an idea?

Thank you for your help in advance.


Solution

  • If it helps, I found this document, which includes the following description:

    Implements full IBM-4 alignment model with a dependency of word classes as described in (Brown et al. 1993)

    Following that reference leads to the paper "The Mathematics of Statistical Machine Translation: Parameter Estimation" (Brown et al., 1993), which you can find in PDF format here.

    The paper details the math behind the five IBM alignment models (Models 1 to 5) and is too long to paste here. Check whether its description of Model 4 is detailed enough for your purposes; going by the description quoted above, I assume Model 4 is what GIZA++ uses.

    There is also this PDF, which summarises the models and training process.
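
To give a feel for what such a score looks like, here is a minimal sketch of the simplest model in that paper, IBM Model 1, not the Model 4 score that GIZA++ is assumed to report. Under Model 1 the probability of a foreign sentence f (length m) and an alignment a, given an English sentence e (length l), is epsilon / (l + 1)^m multiplied by the product of the translation probabilities t(f_j | e_{a_j}). Model 4 adds fertility and distortion terms on top of this lexical product. The function name, the toy sentences, and the probability table below are hypothetical and are not taken from GIZA++.

```python
from typing import Dict, List, Tuple

NULL = "NULL"  # the empty English word at position 0, as in Brown et al. (1993)


def model1_alignment_score(
    english: List[str],
    foreign: List[str],
    alignment: List[int],
    t: Dict[Tuple[str, str], float],
    epsilon: float = 1.0,
) -> float:
    """Return P(f, a | e) under IBM Model 1 (illustrative helper, not GIZA++ code).

    alignment[j] is the English position (0 = NULL, 1..l) that the j-th
    foreign word is aligned to; t maps (foreign_word, english_word) to
    a translation probability.
    """
    if len(alignment) != len(foreign):
        raise ValueError("need exactly one alignment link per foreign word")

    e = [NULL] + list(english)           # shift so that index 0 is NULL
    l, m = len(english), len(foreign)

    # Uniform alignment prior: epsilon / (l + 1)^m
    score = epsilon / (l + 1) ** m

    # Lexical translation term: product of t(f_j | e_{a_j})
    for f_word, a_j in zip(foreign, alignment):
        if not 0 <= a_j <= l:
            raise ValueError(f"alignment index {a_j} outside 0..{l}")
        score *= t.get((f_word, e[a_j]), 0.0)  # unseen pairs get probability 0
    return score


# Toy example with made-up probabilities.
toy_t = {("das", "the"): 0.5, ("Haus", "house"): 0.8}
toy_score = model1_alignment_score(
    english=["the", "house"],
    foreign=["das", "Haus"],
    alignment=[1, 2],
    t=toy_t,
)
print(toy_score)
```

In the toy example the score is (1 / 3^2) * 0.5 * 0.8. Real implementations usually work with log probabilities to avoid underflow on long sentences.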