Tags: regex, google-apps-script, google-docs

Replacing a regex that is not supported in Google Apps Script


A short background of what I am trying to achieve: I have a Google Doc and a Google Sheet. The Google Doc contains text, and the Google Sheet contains two columns: a word and its translation. The function gets the body of the Google Doc and is supposed to go over the "words" column, find every occurrence of each word in the body, and replace it with its translation, but only where the occurrence is a whole word and an exact match. This is easier to explain with an example. Let's say I have the word "pop" and it is translated to "pretty". I want the function to replace the word except in cases like:

  1. pop's
  2. allpop
  3. popping
  4. etc.

So, as mentioned, the word should be replaced only when it is an exact match and a whole word.

This is the function. The regex itself works fine; the problem is that it is not supported by Google Script. I couldn't come up with a replacement regex that both works there and meets my requirements. I am attaching the code so that, if anything above is unclear and you are familiar with regex, you can see what I mean.

function replaceText(body, words, origin, translated) {

  for(var i=0; i<words.length; i++){
    var word = words[i][origin-1];
    var regex = RegExp("(?:\\b)" + word + "\\b(?!\\')",'gi');

    var translation = words[i][translated-1];  // must be defined before it is used below

    Logger.log(body.getText().match(regex));
    Logger.log(body.replaceText(regex, translation));
    var foundElement = body.replaceText(regex, translation);
  }

  return body;
}

Also, in case it helps, here is the list of regex syntax supported by Google Script: https://github.com/google/re2/wiki/Syntax


Solution

  • First, (?:\\b) should just be \\b. The word boundary is zero-width anyway, so wrapping it in a non-capturing group adds nothing.

    Second, I understand that your issue is specifically with replaceText. The line body.getText().match(regex); works because match is a regular JavaScript string method, which supports the usual regexes. The issue is that you need replaceText, and that one is different.

    Third, replaceText does not take a regular expression object as a parameter: its arguments are strings. Check the docs again.

    Finally, since we don't want to treat ' as a word boundary and don't have lookahead support, one solution is to escape ' by replacing it with an alphanumeric string weird enough that it won't occur naturally. At the end, replace that string with ' again.

    function translate() {
      var body = DocumentApp.getActiveDocument().getBody();
      var escape = "uJKiy5hzXNUWFDl7k2pSZoDZ8ipv6LR1ArTi6gXu";  // from https://www.random.org/strings/?num=2&len=20&digits=on&upperalpha=on&loweralpha=on&unique=on&format=html&rnd=new
      body.replaceText("'", escape);
      // the loop would begin here
      var word = "pop";    
      body.replaceText("(?i)\\b" + word + "\\b", "translation");
      // loop would end here.
      body.replaceText(escape, "'");
    }
    

    Note that the case-insensitive flag is (?i), and that replacement with replaceText is always global.
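Because replaceText expects the pattern as a string, it can help to build that string in one place. The following is only a sketch: buildSearchPattern is a hypothetical helper name, and escaping regex metacharacters in the word is an extra precaution that the original code does not take.

```javascript
// Hypothetical helper: builds the pattern *string* to pass to body.replaceText().
function buildSearchPattern(word) {
  // Escape regex metacharacters so that a word such as "a.b" is matched literally.
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?i) is the inline case-insensitive flag; "\\b" in a JS string literal
  // becomes \b (word boundary) in the pattern itself.
  return "(?i)\\b" + escaped + "\\b";
}
```

Inside the loop this would be used as body.replaceText(buildSearchPattern(word), translation).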

    And watch out for curly apostrophes: if they need special treatment too, escape them similarly but with a different random string.
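To see the whole escape, replace, restore sequence in one place, here is a minimal sketch that runs on a plain string rather than on a document body, so it can be tried outside Apps Script. The function name translateText and the second placeholder string are assumptions made for illustration. In a real script each step would be a body.replaceText call with a string pattern, as in the function above.

```javascript
// Placeholder tokens are arbitrary: long alphanumeric strings assumed not to occur in the text.
var STRAIGHT_TOKEN = "uJKiy5hzXNUWFDl7k2pSZoDZ8ipv6LR1ArTi6gXu";
var CURLY_TOKEN = "Qz7Lm2Xv9Rk4Tb8Wn3Ys6Hd1Pf5Gc0JaEeUuIiOo";

// pairs is an array of [word, translation] rows, like the sheet data.
function translateText(text, pairs) {
  // 1. Hide straight and curly apostrophes behind alphanumeric tokens,
  //    so that "pop's" no longer has a word boundary after "pop".
  var result = text.split("'").join(STRAIGHT_TOKEN)
                   .split("\u2019").join(CURLY_TOKEN);

  // 2. Replace whole-word, case-insensitive matches of each word.
  pairs.forEach(function (pair) {
    var regex = new RegExp("\\b" + pair[0] + "\\b", "gi");
    // A function replacement avoids special handling of "$" in the translation.
    result = result.replace(regex, function () { return pair[1]; });
  });

  // 3. Put the apostrophes back.
  return result.split(STRAIGHT_TOKEN).join("'")
               .split(CURLY_TOKEN).join("\u2019");
}
```

With the pair ["pop", "pretty"], a standalone "pop" is replaced, while "pop's", "allpop" and "popping" are left alone.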