javascript regex ecmascript-2020

Why does the new "matchAll" in JavaScript return an iterator (vs. an array)?


ES2020 contains a new String.prototype.matchAll method, which returns an iterator. I'm sure I'm missing something dumb/obvious, but I don't see why it doesn't just return an array instead.

Can someone please explain the logic there?

EDIT: Just to clarify something from the comments, I'm operating on the assumption that iterators haven't simply replaced arrays as the way all new JS APIs return multiple values. If I missed that memo, and all new JS functions do return iterators, a link to said memo would 100% qualify as a valid answer.

But again, I suspect that no such blanket change was made, and that the designers of JavaScript made a specific choice, for this specific method, to have it return an iterator ... and the logic of that choice is what I'm trying to understand.


Solution

  • This is described in the proposal document:

    Many use cases may want an array of matches - however, clearly not all will. Particularly large numbers of capturing groups, or large strings, might have performance implications to always gather all of them into an array. By returning an iterator, it can trivially be collected into an array with the spread operator or Array.from if the caller wishes to, but it need not.
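
As a minimal sketch of the "collect it into an array" step the proposal mentions (the sample string, pattern, and variable names here are made up for illustration):

```javascript
// Hypothetical sample input; the names below are illustrative only.
const text = 'a1 b2 c3';
const pattern = /([a-z])(\d)/g; // matchAll requires the g flag

// Collect every match eagerly with the spread operator...
const viaSpread = [...text.matchAll(pattern)];

// ...or with Array.from, which can also map each match in the same pass.
const letters = Array.from(text.matchAll(pattern), (m) => m[1]);

console.log(viaSpread.length); // 3
console.log(letters);          // [ 'a', 'b', 'c' ]
```

Either form gives callers who want an array a one-liner, while callers who don't want one never pay for building it.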

    .matchAll is lazy. When using the iterator, the regex only evaluates the next match in the string once the prior match has been iterated over. This means that if the regex is expensive, you can extract the first few matches and then have your JS logic stop iterating, so no further matches are attempted.
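
One way to see that laziness is to drive the iterator by hand with next(). This is only a sketch; the input string and variable names are invented for the example:

```javascript
const matches = 'one two three four'.matchAll(/\w+/g);

// Each call to next() runs the regex just far enough to find one more match.
const first = matches.next();
const second = matches.next();

console.log(first.value[0], first.value.index);   // one 0
console.log(second.value[0], second.value.index); // two 4

// 'three' and 'four' have not been searched for yet; they are only
// matched if next() is called again.
```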

    For a trivial example of the lazy evaluation in action:

    for (const match of 'axxxxxxxxxxxxxxxxxxxxxxxxxxxxy'.matchAll(/a|(x+x+)+y./g)) {
      if (match[0] === 'a') {
        console.log('Breaking out');
        break;
      }
    }
    console.log('done');

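    Without the break, the regular expression will go on to attempt a second match. Because the string ends right after the y, the trailing . has nothing to match, so (x+x+)+ backtracks through a huge number of ways to split up the x's before failing, which is a very expensive operation.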

    If matchAll returned an array, and iterated over all matches immediately while creating the array, it would not be possible to bail out.
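
To make that contrast concrete, here is a rough sketch of what an array-returning version might look like. matchAllEager is a hypothetical helper written for this example, not a real or proposed API, and its empty-match handling is simplified:

```javascript
// Hypothetical eager variant: finds every match up front and returns an array.
function matchAllEager(str, regex) {
  if (!regex.global) {
    throw new TypeError('matchAllEager needs a regex with the g flag');
  }
  const re = new RegExp(regex.source, regex.flags); // copy so lastIndex stays private
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    results.push(m);
    if (m[0] === '') re.lastIndex++; // simplified guard against looping on empty matches
  }
  return results;
}

const text = 'k1 k2 k3 k4 k5';

// Eager: all five matches exist before the caller sees the first one.
const eager = matchAllEager(text, /k\d/g);
console.log(eager.length); // 5

// Lazy: stop after the first match; the rest are never searched for.
let seen = 0;
for (const m of text.matchAll(/k\d/g)) {
  seen++;
  if (m[0] === 'k1') break;
}
console.log(seen); // 1
```

With the eager helper, the loop over the whole string has already finished by the time the caller gets the array back, so there is no point at which a break could save any work.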