
How do I instruct Scala's regex matcher to select the leftmost alternation?


I have hit a regular expression alternation problem when trying to retrieve a value with scala.util.matching.Regex. I have tried both the default and the unanchored form of the regex with the findAllMatchIn method.

Using the regex pattern (1e)|(a1) with the sample text a1e, I want to receive 1e because I placed it first in the alternation. I'm confused as to why I actually receive a1 instead, which is in the second alternation position.

After looking through a number of questions that deal with multiple matches across an input string, I have not found any that address how to handle alternation other than putting the alternatives in the right order, which I have done.

What method(s), if any, can I use either to get back my desired value of 1e, or to get back two values, one for each alternative that matched?

Below is the code sample from my IntelliJ Scala Worksheet (here's the code in Scastie):

import scala.util.matching.Regex

val regexStringLongOrLat: String =
  """(1e)|(a1)"""
val regexLongOrLat: Regex = regexStringLongOrLat.r.unanchored

val sample =
  "a1e"

val test2 =
  regexLongOrLat
    .findAllMatchIn(sample)
    .toList

println(test2) // I expect List(1e), but this prints List(a1)
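Printing each match's start and end offsets makes the behaviour visible: the only match starts at offset 0. For contrast, the sketch also runs two patterns whose alternatives can both match at offset 0, which is where alternation order does decide the winner. The object and value names are illustrative only.

```scala
import scala.util.matching.Regex

object MatchPositions {
  val pattern: Regex = """(1e)|(a1)""".r

  // Start offset, end offset and matched text of every match found in "a1e".
  val spans: List[(Int, Int, String)] =
    pattern.findAllMatchIn("a1e").map(m => (m.start, m.end, m.matched)).toList

  // Both alternatives can match at offset 0 here, so the one listed first wins.
  val shortFirst: Option[String] = "a1|a1e".r.findFirstIn("a1e")
  val longFirst: Option[String]  = "a1e|a1".r.findFirstIn("a1e")

  def main(args: Array[String]): Unit = {
    println(spans)      // List((0,2,a1))
    println(shortFirst) // Some(a1)
    println(longFirst)  // Some(a1e)
  }
}
```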

Solution

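  • It returns "a1" because that's the first match the regex engine encounters as it scans the string from left to right. At each starting offset it tries the alternatives in order, but a match that starts earlier always wins: at offset 0, "1e" fails and "a1" succeeds, so "a1" is reported. The search then resumes after the consumed "a1", and the remaining "e" matches neither alternative, so "1e" is never found. Alternation order only decides between alternatives that can match at the same offset, and making the regex .unanchored doesn't change this, since findAllMatchIn already searches the whole string.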

    In this case you can get what you want with a negative look-ahead that stops "a1" from matching when it is followed by "e".

    "(1e|a1(?!e))".r
                  .findAllIn("a1e")
                  .toList
    //res0: List[String] = List(1e)
    

    Note: "(a1(?!e)|1e)".r returns the same results.
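If you instead want every alternative that matches anywhere, including overlapping ones (the "two values" case from the question), one option not covered above is a different technique: wrap the alternation in a zero-width positive look-ahead and read the text from a capturing group. A minimal sketch, assuming overlapping matches are acceptable; `BothAlternatives` and `found` are illustrative names:

```scala
import scala.util.matching.Regex

object BothAlternatives {
  // The look-ahead consumes no characters; group 1 captures whichever
  // alternative matched at the current offset.
  val overlapping: Regex = """(?=(1e|a1))""".r

  // Every match is empty, so the search moves forward one character at a
  // time and can report matches that overlap in the input.
  val found: List[String] =
    overlapping.findAllMatchIn("a1e").map(_.group(1)).toList

  def main(args: Array[String]): Unit =
    println(found) // List(a1, 1e)
}
```

Because each match is empty, the underlying java.util.regex.Matcher advances one character after every empty match, which is what lets a1 (offset 0) and 1e (offset 1) both be reported. Results come back in order of starting offset, so a1 is still listed first.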