I want to highlight the differences between two strings in a colour using Python code.
Example 1:
sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I am enjoying the summer breeze on the beach while I am doing some pilates."
Expected result (the part marked by asterisks should be in red):
I *am* enjoying the summer breeze on the beach while I *am doing* some pilates.
Example 2:
sentence1 = "My favourite season is Autumn while my sister's favourite season is Winter."
sentence2 = "My favourite season is Autumn, while my sister's favourite season is Winter."
Expected result (the comma is different):
"My favourite season is Autumn*,* while my sister's favourite season is Winter."
I tried this:
sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I'm enjoying the summer breeze on the beach while I am doing some pilates."
# Split the sentences into words
words1 = sentence1.split()
words2 = sentence2.split()
# Find the index where the sentences differ
index_of_difference = next((i for i, (word1, word2) in enumerate(zip(words1, words2)) if word1 != word2), None)
# Highlight differing part "am doing" in red
highlighted_words = []
for i, (word1, word2) in enumerate(zip(words1, words2)):
    if i == index_of_difference:
        highlighted_words.append('\033[91m' + word2 + '\033[0m')
    else:
        highlighted_words.append(word2)
highlighted_sentence = ' '.join(highlighted_words)
print(highlighted_sentence)
And I got this:
I'm enjoying the summer breeze on the beach while I *am* doing some
Instead of this:
I'm enjoying the summer breeze on the beach while I *am doing* some pilates.
How can I solve this?
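I believe the main issue with your code is how it gets the indexes of the differences: next() returns only the first index where the words differ, and zip() stops at the end of the shorter sentence, so only "am" is highlighted and the final word is dropped. Here is a solution that uses difflib from the Python standard library: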
from difflib import Differ
# Return the string with escape sequences wrapped around the words at the given indexes
def highlight_string_at_idxs(string, indexes):
    # hl = "\x1b[38;5;160m"  # 8-bit colour alternative
    hl = "\x1b[91m"  # bright red foreground
    reset = "\x1b[0m"
    words_with_hl = []
    for string_idx, word in enumerate(string.split(" ")):
        if string_idx in indexes:
            words_with_hl.append(hl + word + reset)
        else:
            words_with_hl.append(word)
    return " ".join(words_with_hl)

# Return indexes of the additions to s2 compared to s1
def get_indexes_of_additions(s1, s2):
    diffs = list(Differ().compare(s1.split(" "), s2.split(" ")))
    indexes = []
    adj_idx = 0  # Adjust index to compensate for diff lines that are not words of s2
    for diff_idx, diff in enumerate(diffs):
        if diff.startswith("+"):
            indexes.append(diff_idx - adj_idx)
        elif diff.startswith(("-", "?")):
            # "-" lines are removed words; "?" lines are Differ's intraline hints
            adj_idx += 1
    return indexes

sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I am enjoying the summer breeze on the beach while I am doing some pilates."
addition_idxs = get_indexes_of_additions(sentence1, sentence2)
hl_sentence2 = highlight_string_at_idxs(sentence2, addition_idxs)
print(hl_sentence2)
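Output (the words marked by asterisks are printed in red):
*I am* enjoying the summer breeze on the beach while I *am doing* some pilates.
Note that "I" is highlighted as well, because a word-level comparison treats "I'm" as replaced by the two words "I" and "am".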
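As an alternative, here is a minimal sketch that uses difflib.SequenceMatcher.get_opcodes() instead of parsing Differ's text output, so there are no "+", "-" or "?" prefixes to account for. The function name highlight_changes and its start, end and by_char parameters are my own for illustration, not part of difflib. With by_char=True it compares character by character, which is one way to get the comma-only highlight from Example 2. Note that the word-level mode splits on any whitespace and rejoins with single spaces.

```python
from difflib import SequenceMatcher

RED = "\x1b[91m"   # bright red foreground
RESET = "\x1b[0m"  # reset all attributes


def highlight_changes(s1, s2, start=RED, end=RESET, by_char=False):
    """Return s2 with the parts that differ from s1 wrapped in start/end.

    Hypothetical helper: compares words by default, characters if by_char is True.
    """
    if by_char:
        tokens1, tokens2, sep = list(s1), list(s2), ""
    else:
        tokens1, tokens2, sep = s1.split(), s2.split(), " "

    matcher = SequenceMatcher(a=tokens1, b=tokens2, autojunk=False)
    pieces = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = sep.join(tokens2[j1:j2])
        if not chunk:
            # "delete" opcodes cover tokens that exist only in s1
            continue
        if tag == "equal":
            pieces.append(chunk)
        else:
            # "replace" and "insert" opcodes are the parts new in s2
            pieces.append(start + chunk + end)
    return sep.join(pieces)


sentence1 = "I'm enjoying the summer breeze on the beach while I do some pilates."
sentence2 = "I am enjoying the summer breeze on the beach while I am doing some pilates."
print(highlight_changes(sentence1, sentence2))

sentence3 = "My favourite season is Autumn while my sister's favourite season is Winter."
sentence4 = "My favourite season is Autumn, while my sister's favourite season is Winter."
print(highlight_changes(sentence3, sentence4, by_char=True))
```

Passing start="*" and end="*" swaps the colour codes for asterisks, which is handy when the terminal does not render ANSI escape sequences. Because consecutive changed words come back as one opcode, "am doing" is wrapped in a single pair of escape sequences rather than one pair per word.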