python-2.7 nltk bleu

Could I use nltk.translate.bleu_score.sentence_bleu for calculating BLEU scores in Chinese?


If I have Chinese word lists, like:

reference = ['我', '是', '好', '人']
hypothesis = ['我', '是', '善良的', '人']

Could I use nltk.translate.bleu_score.sentence_bleu(references, hypothesis) for a Chinese translation task? Is it the same as for English? What about word lists in Japanese?


Solution

  • TL;DR

    Yes.


    In Long

    The BLEU score measures n-gram overlap, so it is agnostic to the language, but it depends on the sentences being split into tokens. So yes, it can compare Chinese, Japanese and other languages, as long as the sentences are tokenized.
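
    Since everything hinges on tokenization, here is a minimal sketch of one way to turn unsegmented Chinese or Japanese text into token lists. The helper name char_tokenize is hypothetical, and splitting into single characters is only an assumption made for illustration; a word segmenter (e.g. jieba for Chinese or MeCab for Japanese) would give word-level tokens instead, and the choice changes the resulting score. The snippet is written for Python 3.

```python
def char_tokenize(text):
    """Hypothetical helper: split text into single-character tokens.

    Whitespace is dropped. This is character-level tokenization, an
    assumption for illustration, not a word segmenter.
    """
    return [ch for ch in text if not ch.isspace()]


# Chinese: the sentences from the question, written without spaces.
zh_ref = char_tokenize('我是好人')
zh_hyp = char_tokenize('我是善良的人')

# Japanese: handled the same way, since only a list of tokens is needed.
ja_ref = char_tokenize('私は学生です')
```

    The resulting lists can be passed on in the same shape as the word lists in the question, e.g. sentence_bleu([zh_ref], zh_hyp).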

    Note the caveats of using the BLEU score at sentence level. BLEU was never designed with sentence-level comparison in mind; here's a nice discussion: https://github.com/nltk/nltk/issues/1838

    Most probably, you'll see a warning when you have really short sentences, e.g.

    >>> from nltk.translate import bleu
    >>> ref = '我 是 好 人'.split()
    >>> hyp = '我 是 善良的 人'.split()
    >>> bleu([ref], hyp)
    /usr/local/lib/python2.7/site-packages/nltk/translate/bleu_score.py:490: UserWarning: 
    Corpus/Sentence contains 0 counts of 3-gram overlaps.
    BLEU scores might be undesirable; use SmoothingFunction().
      warnings.warn(_msg)
    0.7071067811865475
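
    To see where the "0 counts of 3-gram overlaps" in the warning comes from, the clipped n-gram matches can be counted by hand. This is a simplified sketch for a single reference, written for Python 3; the function names ngrams and modified_precision are chosen for illustration and this is not NLTK's actual implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams) for one reference.

    Each hypothesis n-gram count is clipped to its count in the reference.
    """
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped, total


ref = '我 是 好 人'.split()
hyp = '我 是 善良的 人'.split()

# Matches and totals for n = 1..4, the orders BLEU uses by default.
overlaps = {n: modified_precision(ref, hyp, n) for n in range(1, 5)}
for n, (clipped, total) in overlaps.items():
    print(n, clipped, total)
```

    For these two sentences there are 3 of 4 matching unigrams and 1 of 3 matching bigrams, but no matching 3-grams or 4-grams, which is what triggers the warning.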
    

    You can use the smoothing functions in https://github.com/alvations/nltk/blob/develop/nltk/translate/bleu_score.py#L425 to compensate for the missing higher-order n-gram overlaps in short sentences.

    >>> from nltk.translate.bleu_score import SmoothingFunction
    >>> smoothie = SmoothingFunction().method4
    >>> bleu([ref], hyp, smoothing_function=smoothie)
    0.2866227639866161