python  python-2.7  difflib

Python - getting just the difference between strings


What's the best way of getting just the difference from two multiline strings?

import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# Passing the raw strings makes ndiff compare them character by character
diff = difflib.ndiff(a, b)
print ''.join(diff)

This produces:

  t  e  s  t  i  n  g     t  h  i  s     i  s     w  o  r  k  i  n  g     
     t  e  s  t  i  n  g     t  h  i  s     i  s     w  o  r  k  i  n  g     1     
+  + t+ e+ s+ t+ i+ n+ g+  + t+ h+ i+ s+  + i+ s+  + w+ o+ r+ k+ i+ n+ g+  + 2

What's the best way of getting exactly:

testing this is working 2?

Would regex be the solution here?


Solution

  • a = 'testing this is working \n testing this is working 1 \n'
    b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
    
    splitA = set(a.split("\n"))
    splitB = set(b.split("\n"))
    
    diff = splitB.difference(splitA)
    diff = ", ".join(diff)  # ' testing this is working 2, more things if there were...'
    

    Essentially, this turns each string into a set of lines and takes the set difference, i.e. everything in B that is not in A. The result is then joined into one string. Note that sets discard line order and duplicate lines.

    Edit: This is a convoluted way of saying what @ShreyasG said - [x for x in splitB if x not in splitA]...
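For reference, here is a minimal sketch of two order-preserving variants of the same idea. The character-by-character output in the question comes from passing whole strings to difflib.ndiff, which expects sequences of lines; splitting with splitlines() first gives a line-level diff. The function names added_lines and added_lines_ndiff are made up for illustration and are not part of difflib.

```python
import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'


def added_lines(old, new):
    """Return the lines of `new` that do not occur in `old`, in their original order."""
    old_lines = set(old.splitlines())  # set only for fast membership tests
    return [line for line in new.splitlines() if line not in old_lines]


def added_lines_ndiff(old, new):
    """Return the lines that difflib.ndiff marks as added (prefix '+ ')."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # Each ndiff line starts with a two-character code; strip it off.
    return [line[2:] for line in diff if line.startswith('+ ')]


print(added_lines(a, b))        # [' testing this is working 2']
print(added_lines_ndiff(a, b))  # [' testing this is working 2']
```

Both results keep the leading space from the input; calling .strip() on each line gives exactly 'testing this is working 2'. The two variants can differ on other inputs: the first ignores where a line appears, while the ndiff version also reports a line as added when it has moved or is repeated.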