
How can I format the output of Python's difflib.HtmlDiff to make it readable?


I am trying to output the difference between two text files with Python 2's difflib library, using the HtmlDiff class to generate an HTML file.

import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

res = difflib.HtmlDiff().make_table(V1, V2)

text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()

However, the output HTML looks like this in a browser:

[Screenshot: the rendered diff table, comparing the two strings character by character]

The table compares the two strings one character at a time, which makes it unreadable.

What should I change to make the comparison more human-friendly (e.g. full sentences on each side)?

If I pass the input as lists of lines instead, the output respects the lines, but it does not show the differences:

import difflib

V1 = ['This has four words']
V2 = ['This has more than four words']

res = difflib.HtmlDiff().make_table(V1, V2)

text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()

Resulting HTML (as viewed in a browser):

[Screenshot: the rendered diff table, with one full line on each side and no visible highlighting of the differences]
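A likely explanation for both symptoms, offered as a sketch rather than a definitive fix: HtmlDiff's make_table() and make_file() expect sequences of lines, so a plain string is iterated one character at a time and every character becomes its own row. In addition, make_table() returns only the <table> markup, while the stylesheet that colors the diff_add, diff_chg and diff_sub classes is emitted only by make_file(). Writing make_file()'s output to disk should therefore show the changes highlighted. The helper name build_diff_page and the file name diff.html are hypothetical, and the code assumes Python 3.

```python
import difflib


def build_diff_page(old_text, new_text):
    """Return a complete HTML page comparing two texts line by line.

    build_diff_page is a hypothetical helper name used for illustration.
    """
    # Split each string into a list of lines so HtmlDiff compares lines,
    # not individual characters.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # make_file() wraps the table in a full page that includes the <style>
    # block defining diff_add, diff_chg and diff_sub (make_table() omits it).
    return difflib.HtmlDiff().make_file(old_lines, new_lines,
                                        fromdesc='V1', todesc='V2')


V1 = 'This has four words'
V2 = 'This has more than four words'

html = build_diff_page(V1, V2)
with open('diff.html', 'w') as f:  # hypothetical output path
    f.write(html)
```

If only the table is wanted (for example, to embed it in an existing page), make_table() can still be used, but the page then needs its own CSS rules for those class names.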


Solution

  • To get inline markup, you can use difflib.SequenceMatcher together with the function defined in this answer: https://stackoverflow.com/a/788780/2318649

    This gives the following code:

    import difflib
    
    def show_diff(seqm):
        # function from https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline
        """Unify operations between two compared strings.

        seqm is a difflib.SequenceMatcher instance whose a & b are strings.
        """
        output = []
        for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
            if opcode == 'equal':
                output.append(seqm.a[a0:a1])
            elif opcode == 'insert':
                output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
            elif opcode == 'delete':
                output.append("<del>" + seqm.a[a0:a1] + "</del>")
            elif opcode == 'replace':
                # Not handled: any changed span aborts the rendering
                raise NotImplementedError("what to do with 'replace' opcode?")
            else:
                raise RuntimeError("unexpected opcode: %s" % opcode)
        return ''.join(output)
    
    
    V1 = 'This has four words but fewer than eleven'
    V2 = 'This has more than four words'
    
    # Compare the two strings character by character;
    # these inputs produce only equal, insert and delete opcodes
    sm = difflib.SequenceMatcher(None, V1, V2)
    
    html = "<html><body>" + show_diff(sm) + "</body></html>"
    
    with open("output.html", "w") as f:
        f.write(html)
    

    which produces:

    [Screenshot: the rendered page, with " more than" marked as inserted and " but fewer than eleven" marked as deleted]
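The show_diff function above raises NotImplementedError on 'replace' opcodes, which SequenceMatcher emits whenever a span is changed into different text rather than purely inserted or deleted (for example, comparing 'abc' with 'axc'). A minimal variant, assuming it is acceptable to render a replacement as a deletion followed by an insertion, could look like this. It also passes each fragment through html.escape() so that characters such as < in the compared text do not break the markup. The name show_diff_html is hypothetical, and html.escape requires Python 3.

```python
import difflib
import html


def show_diff_html(seqm):
    """Render a SequenceMatcher over two strings as inline HTML.

    Hypothetical variant of show_diff: 'replace' is shown as <del> + <ins>
    instead of raising, and all text is HTML-escaped.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        old = html.escape(seqm.a[a0:a1])  # text from the first string
        new = html.escape(seqm.b[b0:b1])  # text from the second string
        if opcode == 'equal':
            output.append(old)
        elif opcode == 'insert':
            output.append('<ins>' + new + '</ins>')
        elif opcode == 'delete':
            output.append('<del>' + old + '</del>')
        elif opcode == 'replace':
            # Show the removed text immediately followed by its replacement.
            output.append('<del>' + old + '</del><ins>' + new + '</ins>')
        else:
            raise RuntimeError('unexpected opcode: %s' % opcode)
    return ''.join(output)


sm = difflib.SequenceMatcher(None, 'abc', 'axc')
page = '<html><body>' + show_diff_html(sm) + '</body></html>'
```

Here the middle character produces a 'replace' opcode, so the body becomes a<del>b</del><ins>x</ins>c instead of raising an exception.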