regex reason

How do I do regex substitutions with multiple capture groups?


I'm trying to allow users to filter strings of text using a glob pattern whose only control character is *. Under the hood, I figured the easiest way to filter the list of strings would be to use Js.Re.test[https://rescript-lang.org/docs/manual/latest/api/js/re#test_], and it is easy.

Ignoring the * on the user filter string for now, what I'm having difficulty with is escaping all the RegEx control characters. Specifically, I don't know how to replace the capture groups within the input text to create a new string.

So far, I've got this, but it's not quite right:

let input = "test^ing?123[foo";

let escapeRegExCtrl = searchStr => {
    let re = [%re("/([\\^\\[\\]\\.\\|\\\\\\?\\{\\}\\+][^\\^\\[\\]\\.\\|\\\\\\?\\{\\}\\+]*)/g")];

    let break = ref(false);
    while (!break.contents)  {
        switch (Js.Re.exec_ (re, searchStr)) {
            | Some(result) => {
                let match = Js.Re.captures(result)[0];
                Js.log2("Matching: ", match)
            }
            | None => {
                break := true;
            }
        }
    }
};
input -> escapeRegExCtrl

Disregarding the fact that the "test" portion of the string is skipped, the code above produces:

Matching: ^ing  
Matching: ?123 
Matching: [foo

With the above example, what I'm ultimately trying to produce is this (with a leading and trailing .*):

.*test\^ing\?123\[foo.*

But I'm unsure how to build a contiguous string from the matched capture groups.

(echo "test^ing?123[foo" | sed -r 's_([\^\?\[])_\\\1_g' would get the work done on the command line)


EDIT

Based on Chris Maurer's answer, there is a method in the JS library that does what I was looking for. A little digging exposed the ReasonML proxy for that method: https://rescript-lang.org/docs/manual/latest/api/js/string#replacebyre
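For reference, here is a minimal sketch in plain JavaScript of the kind of replace call involved. It assumes that the ReasonML binding linked above wraps JavaScript's String.prototype.replace; the function name is reused from my attempt above, and the character class only covers the three characters in the example, so it is illustrative rather than complete. The $1 in the replacement plays the same role as \1 in the sed command.

```javascript
// Sketch: escape a few regex control characters using one capture group.
// Only ^, ? and [ are handled here, mirroring the sed example.
function escapeRegExCtrl(searchStr) {
  // "$1" refers to capture group 1; "\\" is a single literal backslash.
  return searchStr.replace(/([\^\?\[])/g, "\\$1");
}

const input = "test^ing?123[foo";
// Add the leading and trailing .* from the desired output.
const pattern = ".*" + escapeRegExCtrl(input) + ".*";
console.log(pattern); // .*test\^ing\?123\[foo.*
```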


Solution

  • Let me see if I have this right: you want to implement a character matcher where everything is literal except *. Presumably the * is supposed to work as it does in the Windows dir command, matching zero or more characters.

    Furthermore, you want to implement it by passing a user-entered character string directly to a Regexp match function after suitably sanitizing it to only deal with the *.

    If I have this right, then it sounds like you need to do two things to get the string ready for Js.Re.test:

    1. Quote all the special regex characters, and
    2. Turn all instances of * into .* or maybe .*?

    Let's keep this simple and process the string in two steps, each one using JavaScript's replace. The special characters in a regex are \ ^ $ . | ? * + ( ) [ ] { }. Leaving out * (it is handled in the second step) and suitably quoting the rest for replace:

    str.replace(/[\[\]\\\^\$\.\|\?\+\(\)\{\}]/g, '\\$&')

    The character class is just all of those special characters, quoted. In the replacement, $& says to insert whatever matched, and the doubled backslash puts a literal \ in front of it (a single '\$&' would be read as plain '$&' and escape nothing). Then pass that result to a second replace for the * to .*? transformation:

    str.replace(/\*+/g, '.*?')
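
The two steps above can be combined into one helper. This is only a sketch in plain JavaScript: globToRegExp is a hypothetical name, and the sample list is made up for illustration. Since RegExp.prototype.test looks for a match anywhere in the string, no leading or trailing .* is added here.

```javascript
// Sketch: turn a glob whose only wildcard is * into a RegExp.
// globToRegExp is a hypothetical helper name, not a library function.
function globToRegExp(glob) {
  // Step 1: backslash-escape every regex metacharacter except *.
  const escaped = glob.replace(/[\[\]\\^$.|?+(){}]/g, "\\$&");
  // Step 2: turn each run of * into a lazy "match anything".
  const source = escaped.replace(/\*+/g, ".*?");
  return new RegExp(source);
}

// Usage: filter a list of strings with a user-supplied glob.
const items = ["test^ing?123[foo", "testing", "a.b"];
const re = globToRegExp("test^*[foo");
const matches = items.filter((s) => re.test(s));
console.log(matches); // [ 'test^ing?123[foo' ]
```

To require the whole string to match the glob rather than any substring, one option is to wrap the source in ^ and $ before constructing the RegExp.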