I am calculating the BLEU score between two sentences that look very similar to me, but the score I get is very low. Is that supposed to happen?
prediction = "I am ABC."
reference = "I'm ABC."
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction
# Tokenize the sentences
prediction_tokens = prediction.split()
reference_tokens = reference.split()
# Calculate BLEU score
bleu_score = sentence_bleu([reference_tokens], prediction_tokens, smoothing_function=SmoothingFunction().method4)
# Print the BLEU score
print(f"BLEU score: {bleu_score:.4f}")
The output is 0.0725.
Yes, this is expected, for two reasons:

1. BLEU counts exact token n-gram overlap. After split(), your prediction is ['I', 'am', 'ABC.'] and your reference is ["I'm", 'ABC.']; the contraction "I'm" does not match the two tokens "I am", so the only overlapping unigram is "ABC." and there are no overlapping bigrams at all.

2. BLEU was designed for corpus-level evaluation of longer texts. On a two- or three-token sentence there are almost no higher-order n-grams to match, so the 3-gram and 4-gram precisions are zero and the geometric mean collapses; the smoothing function is the only thing keeping the score above 0.
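To make the token mismatch concrete, here is a small sketch using NLTK's sentence_bleu as in your snippet. The contraction expansion is just an illustrative normalization for this one pair (not a general fix), and the weights are restricted to 1- and 2-grams because the sentences are too short for 3- and 4-grams:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

prediction = "I am ABC."
reference = "I'm ABC."

# Token-level view of the mismatch: only "ABC." is shared.
print(prediction.split())  # ['I', 'am', 'ABC.']
print(reference.split())   # ["I'm", 'ABC.']

# Illustrative normalization: expand the contraction so both sides
# use the same surface tokens.
normalized_reference = reference.replace("I'm", "I am")

# Limit BLEU to unigrams and bigrams, since a 3-token sentence has
# no 4-grams for the default (0.25, 0.25, 0.25, 0.25) weights to score.
score = sentence_bleu(
    [normalized_reference.split()],
    prediction.split(),
    weights=(0.5, 0.5),
    smoothing_function=SmoothingFunction().method4,
)
print(f"BLEU after normalization: {score:.4f}")  # 1.0000
```

If your pairs differ only in surface forms like contractions, a string-similarity or embedding-based metric (e.g. character-level edit distance, or BERTScore) will usually track your intuition better than sentence-level BLEU.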
Hope that's the answer you were looking for 🤓.