I am trying to implement word-level matching in Google Diff Match Patch, but it is beating me up.
The result I get is:
=I've never been =|-a-|=t=|= th=|-e-|=se places=|
=I've never been =|=t=|+o+|= th=|+o+|=se places=|
The result I want is:
=I've never been =|-at these-|= places=|
=I've never been =|+to those+|= places=|
The documentation says:
make a copy of diff_linesToChars and call it diff_linesToWords. Look for the line that identifies the next line boundary: lineEnd = text.indexOf('\n', lineStart);
In the C# version, I found the line to change in diff_linesToCharsMunge, which I changed to:
lineEnd = text.Replace(@"/[\n\.,;:]/ g"," ").IndexOf(" ", lineStart);
However, there is no change in granularity; it still finds differences at the character level.
I am calling:
List<Diff> differences = diffs.diff_main(linepair.Original, linepair.Corrected, true);
diffs.diff_cleanupSemantic(differences);
I have stepped through to make sure that it is hitting the change I made (incidentally, there is a hardcoded minimum of 100 characters before it kicks in).
I have created a sample .NET project with a diff-match program. It probably uses an older version of the DiffMatchPatch file, but the word and line modes work.
For your sample text above, I get the following output:
at these | to those
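For anyone who wants to see the idea without the sample project, here is a minimal sketch of the word-to-character encoding step. It is not the library's own code: WordTokens, Split, Encode and Decode are hypothetical names I made up, and the token rule (runs of word characters, or one non-word character) is my assumption about what "word level" should mean. Two things to note about the original attempt: string.Replace takes its first argument as literal text, so a JavaScript-style regex literal never matches anything, and as far as I can tell diff_main with checklines set to true re-diffs each replaced block character by character after the coarse pass, which would bring back the character-level output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of DiffMatchPatch.
public static class WordTokens
{
    // Assumed token rule: a run of word characters, or one non-word character.
    public static List<string> Split(string text)
    {
        var tokens = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\w+|\W"))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    // Replaces every token with one char whose code is the token's index in
    // the shared table. Use the same table and index for both texts.
    // Note: a char can only address 65536 distinct tokens.
    public static string Encode(string text, List<string> table,
                                Dictionary<string, int> index)
    {
        var sb = new StringBuilder();
        foreach (string token in Split(text))
        {
            if (!index.TryGetValue(token, out int code))
            {
                code = table.Count;
                table.Add(token);
                index[token] = code;
            }
            sb.Append((char)code);
        }
        return sb.ToString();
    }

    // Maps each char back to its token and concatenates the result.
    public static string Decode(string encoded, List<string> table)
    {
        var sb = new StringBuilder();
        foreach (char c in encoded)
        {
            sb.Append(table[c]);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        // Slot 0 holds an empty entry so no token is encoded as '\0'.
        var table = new List<string> { "" };
        var index = new Dictionary<string, int>();

        string a = WordTokens.Encode("I've never been at these places", table, index);
        string b = WordTokens.Encode("I've never been to those places", table, index);

        // a and b are what you would diff with checklines = false; afterwards
        // run Decode over the text of every Diff to get words back.
        Console.WriteLine(WordTokens.Decode(a, table));
        Console.WriteLine(WordTokens.Decode(b, table));
    }
}
```

With the library, the encoded strings would go through diff_main with checklines set to false, each resulting Diff's text would be decoded back through the table, and diff_cleanupSemantic would run last. I would expect that final cleanup to fold the single space between the changed words into the edit, giving "at these" and "to those" as one change, but I have not verified this against the current C# version.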