c++ diff git-patch

How to serialize a diff of two folders optimally in C++


I'm trying to develop a diff format that covers multiple files, recursively across folders. Consider a source directory containing the patched files and a destination directory containing the original files. I want to write a minimal-size diff file that expresses the differences between all files in the two directories and that can be applied to the original files to transform them into the patched files.

For this purpose I found the dtl library. Which algorithm or feature of the library should I use to write a file diff to disk that I can later read back and apply to patch the file? Is there any example code for this? I tried writing the result of the shortest edit script (SES) to disk, but I realized that I had to store the character and the operation for every single byte. This makes the output larger than the file being compared, which defeats the purpose of the diff: storing the entire target file would have used less space.
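One way around the per-byte overhead, sketched here without dtl, is to collapse consecutive identical operations into runs: a kept or deleted run only needs its length, and only added runs carry literal bytes. The types `Edit` and `Run` and the on-disk layout (1-byte opcode, 4-byte little-endian count, then the bytes for added runs) are assumptions made for this illustration, not anything defined by dtl.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-byte edit, similar in spirit to one SES entry.
enum class Op : std::uint8_t { Keep = 0, Delete = 1, Add = 2 };
struct Edit { Op op; char byte; };

// A run of identical operations; only Add runs store literal bytes.
struct Run { Op op; std::uint32_t count; std::string data; };

// Collapse a per-byte script into runs.
std::vector<Run> compress(const std::vector<Edit>& script) {
    std::vector<Run> runs;
    for (const Edit& e : script) {
        if (runs.empty() || runs.back().op != e.op) runs.push_back({e.op, 0, {}});
        ++runs.back().count;
        if (e.op == Op::Add) runs.back().data.push_back(e.byte);
    }
    return runs;
}

// Assumed layout per run: opcode (1 byte), count (4 bytes LE), Add payload.
std::string serialize(const std::vector<Run>& runs) {
    std::string out;
    for (const Run& r : runs) {
        out.push_back(static_cast<char>(r.op));
        for (int b = 0; b < 4; ++b)
            out.push_back(static_cast<char>((r.count >> (8 * b)) & 0xFF));
        if (r.op == Op::Add) out += r.data;
    }
    return out;
}

// Inverse of serialize; stops at the first incomplete run header.
std::vector<Run> deserialize(const std::string& in) {
    std::vector<Run> runs;
    std::size_t i = 0;
    while (i + 5 <= in.size()) {
        Run r{static_cast<Op>(static_cast<unsigned char>(in[i])), 0, {}};
        for (int b = 0; b < 4; ++b)
            r.count |= static_cast<std::uint32_t>(
                           static_cast<unsigned char>(in[i + 1 + b])) << (8 * b);
        i += 5;
        if (r.op == Op::Add) { r.data = in.substr(i, r.count); i += r.count; }
        runs.push_back(std::move(r));
    }
    return runs;
}

// Rebuild the patched data from the original data and the runs.
std::string apply(const std::string& original, const std::vector<Run>& runs) {
    std::string out;
    std::size_t pos = 0;  // read position in the original data
    for (const Run& r : runs) {
        switch (r.op) {
            case Op::Keep:   out.append(original, pos, r.count); pos += r.count; break;
            case Op::Delete: pos += r.count; break;
            case Op::Add:    out += r.data; break;
        }
    }
    return out;
}
```

With this layout a long unchanged region costs 5 bytes no matter how large it is, so the patch size tracks the amount of changed data rather than the file size. A real format would probably also want variable-length counts, a checksum of the original file, and compression of the added bytes.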

As a point of reference, this is very similar to how version control systems like git or svn operate, but I don't want to use those, since I'm mainly dealing with binary files and only need to create and apply patches.


Solution

  • After some more searching, I found the HDiffPatch project. It appears to work fine, but it seems to take a long time on larger folder comparisons:

    diff usage: hdiffz [options] oldPath newPath outDiffFile
    patch usage: hpatchz [options] oldPath diffFile outNewPath
    

    EDIT:
    Another good option is open-vcdiff, but it only supports individual files.