I am currently trying to replace some of the plural words like removing "s" from "birds" and replacing it as "bird" in bigquery
but I want them to ignore a few words like "less", "james", "this".
I was able to come up with this which ignores the "less" but still butchers james.
SELECT REGEXP_REPLACE("James likes to chase birds","([^s])s\\b", "\\1" )
The output I am getting is "Jame like to chase bird" but what I am expecting is "James like to chase bird"
Update: I tried to use negative lookahead, but unfortunately, Bigquery Regex(RE2) doesn't support this.
You can use an alternation here, a regex with two alternatives. In the first alternative, you can capture all the words that are exceptions, into Group 1, and in the second one, use your regex. The replacement will be both group values concatenated:
(?i)\b(less|james|this)\b|([^s])s\b
Replace with \1\2
. See the regex demo. Details:
(?i)
- a case insensitive modifier\b(less|james|this)\b
- Group 1: less
, james
, or this
as a whole word|
- or([^s])
- Group 1: any char other than s
(NOTE: if you want to only match a letter other than s
, you can use [^\W\d_s]
)s\b
- s
at the end of a word.In your code, use
SELECT REGEXP_REPLACE("James likes to chase birds", r"(?i)\b(less|james|this)\b|([^s])s\b", r"\1\2" )