regexgoogle-bigquerystemmingre2

Trying to convert plural words to singular words using regex but want to ignore a few words


I am currently trying to replace some of the plural words like removing "s" from "birds" and replacing it as "bird" in bigquery

but I want them to ignore a few words like "less", "james", "this".

I was able to come up with this which ignores the "less" but still butchers james.

    SELECT REGEXP_REPLACE("James likes to chase birds","([^s])s\\b", "\\1" )

The output I am getting is "Jame like to chase bird" but what I am expecting is "James like to chase bird"

Update: I tried to use negative lookahead, but unfortunately, Bigquery Regex(RE2) doesn't support this.


Solution

  • You can use an alternation here, a regex with two alternatives. In the first alternative, you can capture all the words that are exceptions, into Group 1, and in the second one, use your regex. The replacement will be both group values concatenated:

    (?i)\b(less|james|this)\b|([^s])s\b
    

    Replace with \1\2. See the regex demo. Details:

    In your code, use

    SELECT REGEXP_REPLACE("James likes to chase birds", r"(?i)\b(less|james|this)\b|([^s])s\b", r"\1\2" )