java regex string string-comparison google-diff-match-patch

Java: Getting the affected sentence in Google-diff-match-patch


Example :

  1. Old String : The quick brown fox jumped over lazy rabbit. Curiosity killed the cat.
  2. New String : The quick brown lion jumped over lazy rabbit. Curiosity killed the cat.

Expected output : The quick brown lion jumped over lazy rabbit.

What I am getting now:

Diff(DELETE,"fox")
Diff(INSERT,"lion")

So I have no context as to where the word fox was deleted and where lion was added. Even 15 characters to the left and right of the changed text would do. The code I have now:

diff_match_patch diffMatchPatch = new diff_match_patch();
LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldText, newText);
for (diff_match_patch.Diff d : deltas) {
    if ((d.operation == diff_match_patch.Operation.DELETE) || (d.operation == diff_match_patch.Operation.INSERT)) {
        System.out.println(d);
    }
}
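
For reference, a character window like the one described above can be derived by tracking an offset while walking the diffs in order: the EQUAL and DELETE texts together make up the old string, so summing their lengths gives the position of each deletion. The following is only a minimal sketch under stated assumptions. `Op`, `Piece`, `window` and `deletionContexts` are hypothetical names, `Op` and `Piece` merely stand in for `diff_match_patch.Operation` and `diff_match_patch.Diff`, and the diff list is written by hand instead of coming from `diff_main`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DiffContext {

    // Hypothetical stand-ins for diff_match_patch.Operation and diff_match_patch.Diff.
    enum Op { EQUAL, DELETE, INSERT }

    static final class Piece {
        final Op op;
        final String text;
        Piece(Op op, String text) { this.op = op; this.text = text; }
    }

    // Returns text[start, end) widened by up to `radius` characters on each side.
    static String window(String text, int start, int end, int radius) {
        int from = Math.max(0, start - radius);
        int to = Math.min(text.length(), end + radius);
        return text.substring(from, to);
    }

    // For every DELETE piece, returns the deleted text with its context in oldText.
    static List<String> deletionContexts(String oldText, List<Piece> pieces, int radius) {
        List<String> result = new ArrayList<>();
        int oldPos = 0; // current offset into oldText
        for (Piece p : pieces) {
            if (p.op == Op.DELETE) {
                result.add(window(oldText, oldPos, oldPos + p.text.length(), radius));
            }
            if (p.op != Op.INSERT) {
                oldPos += p.text.length(); // EQUAL and DELETE text both come from oldText
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String oldText = "The quick brown fox jumped over lazy rabbit. Curiosity killed the cat.";
        // Hand-written diff for illustration; real code would use the list from diff_main.
        List<Piece> pieces = Arrays.asList(
                new Piece(Op.EQUAL, "The quick brown "),
                new Piece(Op.DELETE, "fox"),
                new Piece(Op.INSERT, "lion"),
                new Piece(Op.EQUAL, " jumped over lazy rabbit. Curiosity killed the cat."));
        for (String context : deletionContexts(oldText, pieces, 15)) {
            System.out.println(context); // he quick brown fox jumped over la
        }
    }
}
```

Insertions could be handled the same way with a second offset into the new string that advances on EQUAL and INSERT pieces.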

Any help would be nice. Thanks a lot. :-) If anything in my explanation is unclear, please let me know.

Edit: New code added from the answer:

diff_match_patch diffMatchPatch = new diff_match_patch();
LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(notes1.getNotetext(), notes.getNotetext());
for (diff_match_patch.Diff d : deltas) {
    if ((d.operation == diff_match_patch.Operation.DELETE) || (d.operation == diff_match_patch.Operation.INSERT)) {
        Pattern myPattern = Pattern.compile("(\\. |^)(.*" + d.text + ".*)(\\. )");
        Matcher m = myPattern.matcher(notes1.getNotetext());
        while (m.find()) {
            System.out.println("Found " + d.operation + " of: " + d.text + " in sentence: " + m.group());
        }
    }
}

The output I am getting is wrong; it looks something like this:
Found DELETE of: I  in sentence: I yoyo am also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: oyo am in sentence: I yoyo am also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: a in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found INSERT of: a in sentence: kshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found INSERT of: r in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: ks in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: ay in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found INSERT of: ul in sentence: akshay also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: In this, in sentence: rahul also working on a webapp in which the user can make changes to a text area. In this, he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: ang in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: s in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found INSERT of: ck in sentence: rahul also working on a webapp in which the user can make changes to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 
Found DELETE of: rahul  in sentence: rahul also working on a webapp in which the user can make check to a text area.  he can either write one paragraph, one sentence. So what I am currently trying to do is to split the whole paragraph by a dot separator. Once that is done, I would like to check which sentences have changed. I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays. But it is not working, I am getting zero String modified from it. Kindly let me know what I am doing wrong. 

I would like to know when a whole word or sentence is deleted or inserted, so I can save it properly in the database. Any help would be nice. Thanks a lot. :-)

Edit: The answer below works perfectly to get 2 separate Strings which can be persisted in the database.


Solution

  • After extensive reconsideration, I think this is not a case for a regular expression. The same changes may appear in several lines, so you have to check your input line by line like this:

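    import java.util.Arrays;
    import java.util.LinkedList;
    import java.util.List;

    // diff_match_patch must also be importable (upstream it lives in package
    // name.fraser.neil.plaintext); the class name SentenceDiff is arbitrary.
    public class SentenceDiff {

      //-------------------------Example Strings---------------------------------------------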
      private static String oldText = "I yoyo am also working on a \n webapp in which the user can make changes to a text area. " +
          "In this, he can either write one paragraph, one sentence." +
          " So what I am currently trying to do is to split the whole paragraph by a dot separator. " +
          "Once that is done, I would like to check which sentences have changed." +
          " I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays." +
          " But it is not working, I am getting zero String modified from it."+
          " Kindly let me know what I am doing wrong.";
    
      private static String newText = "akshay is also working on a \n webapp in which the user can make changes to a text area. " +
          "He can either write one paragraph, one sentence." +
          " So what I am currently trying to do is to split the whole paragraph by a dot separator. " +
          "Once that is done, I would like to check which sentences have changed." +
          " I am currently doing it using for loop, which is not accurate as I have to length of array to Math.minimum of both String arrays." +
          " But it is not working, I am getting zero String modified from it.";
      //-------------------------Example Strings end --------------------------------------
    
      private static diff_match_patch diffMatchPatch;
    
      public static void main(String[] args) {
    
        diffMatchPatch = new diff_match_patch();
        //Split text into List of strings
        List<String> oldTextList = Arrays.asList(oldText.split("(\\.|\\n)"));
        List<String> newTextList = Arrays.asList(newText.split("(\\.|\\n)"));
    
        //If we have different length
        int counter = Math.max(oldTextList.size(), newTextList.size()); 
        StringBuilder sb = new StringBuilder();
    
        for(int current = 0; current < counter; current++){
          String oldString = null;
          String newString = null;
    
          if(oldTextList.size() <= current){
            oldString = "";
            newString = newTextList.get(current);
    
          } else if (newTextList.size() <= current){
            oldString = oldTextList.get(current);
            newString = "";
          } else {
            if (isLineDifferent(oldTextList.get(current), newTextList.get(current))){
              oldString = oldTextList.get(current);
              newString = newTextList.get(current);
            }
          }
          if(oldString != null && newString != null) {
            //---- Insert into database here -----
            sb.append("Changes for Line: " + current + "\n");
            sb.append("Old: " + oldString + "; New: " + newString +";\n");
          }
        }
    
        System.out.println(sb.toString());
      }
    
      // Returns true if the diff contains any operation other than EQUAL.
      private static boolean isLineDifferent(String oldString, String newString) {
        LinkedList<diff_match_patch.Diff> deltas = diffMatchPatch.diff_main(oldString, newString);
        for (diff_match_patch.Diff d : deltas) {
          if (d.operation != diff_match_patch.Operation.EQUAL) {
            return true;
          }
        }
        return false;
      }
    }
    

    This should net you the following result:

    Changes for Line: 0
    Old: I yoyo am also working on a ; New: akshay is also working on a ;
    Changes for Line: 2
    Old:  In this, he can either write one paragraph, one sentence; New:  He can either write one paragraph, one sentence;
    Changes for Line: 8
    Old:  Kindly let me know what I am doing wrong; New: ;
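
One caveat visible in this output: splitting on every dot also splits inside `Math.minimum`, which is why the last change is reported as line 8 and not line 7. If that matters, one possible alternative is the JDK's `java.text.BreakIterator`, which looks for sentence boundaries instead of bare dots. The following is only a minimal sketch under that assumption. `SentenceSplitter` and `sentences` are hypothetical names, the result depends on the JDK's built-in break rules, and the `\n` handling from the code above would still need to be added separately.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {

    // Splits text into trimmed sentences using the JDK's sentence BreakIterator.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "I have to length of array to Math.minimum of both String arrays. "
                + "But it is not working.";
        // Prints each detected sentence in brackets so the boundaries are visible.
        for (String s : sentences(text)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The resulting lists could be fed into the same index-by-index comparison as `oldTextList` and `newTextList` above. Unlike `split`, the sentences keep their trailing dot.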
    

    Please note that I only added the ";" as a separator in the StringBuilder output so you can discern where the strings end. This is, of course, still not perfect; a few things to consider: