python, algorithm, sequence-alignment

Connecting input sentences with overlapping words


The task is to join input sentences that overlap at their ends. My problem is removing the overlapping parts correctly.

Input: the first line is the number of sentences to connect; the following lines are the sentences. Output: the connected sentence.

Examples:

Input:

2
The harder you work for something, the
something, the greater you'll feel when you achieve it.

Output:

The harder you work for something, the greater you'll feel when you achieve it.

My code:

def connect(sentence1,sentence2):
  x= None
  y= None
  for i in range(len(sentence2)):
    if sentence2[:len(sentence2)-i] in sentence1 and len(sentence2[:len(sentence2)-i]) != 1:
        y =(sentence1+' '+sentence2[len(sentence2)-i:].strip())
        x =True
        break
  return x,y
n = int(input())
lst = []
for i in range(n):
  a = input()
  lst.append(a)
for i in lst:
  for j in lst:
    if i ==j:
        pass
    elif True in connect(i,j):
        lst.remove(i)
        lst.remove(j)
        lst.append(connect(i,j)[1])
print(lst[0])

Input 1:

3
The fool doth think he is wise,
wise man knows himself to be a fool.
wise, but the wise

Output 1: incorrect

The fool doth think he is wise, man knows himself to be a fool. but the wise

Expected output 1:

The fool doth think he is wise, but the wise man knows himself to be a fool.

Input 2:

7
afraid of greatness.
Be not afraid
some achieve greatness,
greatness thrust upon them.
greatness. Some
Some are born great, some
greatness, and others have greatness

Output 2: error

line 21, in <module>
    lst.remove(i)
ValueError: list.remove(x): x not in list

Expected output 2:

Be not afraid of greatness. Some are born great, some achieve greatness, and others have greatness thrust upon them.

Solution

  • Keep track of the length of the overlap, whether the end of the result matches the start of a sentence (end-start) or the start of the result matches the end of a sentence (start-end), so that you can cut exactly that many characters from one of the two parts you concatenate:

    sentences = """afraid of greatness.
    Be not afraid
    some achieve greatness,
    greatness thrust upon them.
    greatness. Some
    Some are born great, some
    greatness, and others have greatness""".split("\n")
    
    result = sentences.pop(0) # start with any part, and take it out
    
    while sentences:
        for s in list(sentences):
            atEnd   = next((p for p in range(1,len(s)) if result[-p:]==s[:p]),0)
            if atEnd:                        # length of overlapping end-start
                result = result + s[atEnd:]  # append to end, cut starting overlap
                sentences.remove(s)
                continue
            atStart = next((p for p in range(1,len(s)) if result[:p]==s[-p:]),0)
            if atStart:                      # length of overlapping start-end
                result = s[:-atStart]+result # insert at start, cut ending overlap
                sentences.remove(s)
    
    print(result)
    
    Output:
    
    Be not afraid of greatness. Some are born great, some achieve greatness, and others have greatness thrust upon them.
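
The same idea can be sketched as a pair of reusable functions. This is a hedged variant, not the code above: the names `overlap_len` and `join_fragments` and the `min_len` parameter are made up for illustration. It assumes the longest overlap should be preferred over the shortest, that overlaps shorter than two characters are coincidences (mirroring the `!= 1` check in the question), and that an input with no matching fragment should raise an error rather than loop forever.

```python
def overlap_len(left, right, min_len=2):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    `min_len` is an assumption to skip one-character coincidences.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def join_fragments(fragments, min_len=2):
    """Merge overlapping fragments into one string, in any input order."""
    remaining = list(fragments)   # work on a copy, never on the caller's list
    result = remaining.pop(0)     # start with any fragment
    while remaining:
        for frag in remaining:
            size = overlap_len(result, frag, min_len)
            if size:                              # end of result meets start of frag
                result += frag[size:]
                break
            size = overlap_len(frag, result, min_len)
            if size:                              # end of frag meets start of result
                result = frag[:-size] + result
                break
        else:
            # No fragment could be attached on either side: stop instead of spinning.
            raise ValueError("no remaining fragment overlaps the result")
        remaining.remove(frag)    # safe: the for loop has already been left
    return result


if __name__ == "__main__":
    print(join_fragments([
        "The fool doth think he is wise,",
        "wise man knows himself to be a fool.",
        "wise, but the wise",
    ]))
```

Removing the fragment only after leaving the `for` loop avoids the `ValueError: list.remove(x): x not in list` from the question, which comes from removing items from a list while two nested loops are still iterating over it.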