regex scala

Extract Int from regex using pattern matching without extracting as String and then casting toInt


I have a year, expressed in the format XXYY-ZZ. For example, the year 2020-21 would represent a year spanning 2020 to 2021. I need to extract XXYY, YY and ZZ as Ints to use in calculations later.

Using pattern matching and a regex, I can extract the values I want as Strings, like this:

import scala.util.matching.Regex
val YearFormatRegex: Regex = "(20([1-9][0-9]))-([1-9][0-9])".r

"2020-21" match {
  // Pass an explicit tuple to println rather than relying on auto-tupling of the arguments
  case YearFormatRegex(fullStartYear, start, end) => println((fullStartYear, start, end))
  case _                                          => println("did not match")
}
// will print (2020,20,21)

However, I need the values as Ints. Is there a way to extract them as Ints without scattering .toInt all over the place? The regex only matches digits in those groups, so extracting them as Strings and then parsing them as Ints seems like an unnecessary step that I would like to avoid if possible.


Solution

  • If you simply want to encapsulate the conversion, one option is to create your own extractor object built around your regular expression, e.g.:

    import scala.util.matching.Regex
    
    object Year {
      
      private val regex: Regex = "(20([1-9][0-9]))-([1-9][0-9])".r
      
      def unapply(s: String): Option[(Int, Int, Int)] =
        s match {
          // The groups are captured as Strings; convert them once, here
          case regex(fullStartYear, from, to) => Some((fullStartYear.toInt, from.toInt, to.toInt))
          case _ => None
        }
    }
    
    "2020-21" match {
      case Year(fullStartYear, start, end) => fullStartYear - start + end
      case _ => 0
    } // returns 2020 - 20 + 21 = 2021
    

    You can read more about extractor objects in the official Scala documentation.

    You can also experiment with this code on Scastie.
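
If you would rather keep the original regex as it is, a related option is a small, reusable extractor that converts a single captured group, nested inside the regex pattern. The following is only a sketch under a few assumptions: `AsInt` and `parseYear` are hypothetical names introduced for illustration, `toIntOption` requires Scala 2.13 or later, and the top-level definitions assume a Scala 3 source file, a worksheet, or the REPL.

```scala
import scala.util.matching.Regex

// Hypothetical helper: matches only when the String parses as an Int.
object AsInt {
  def unapply(s: String): Option[Int] = s.toIntOption // Scala 2.13+
}

val YearFormatRegex: Regex = "(20([1-9][0-9]))-([1-9][0-9])".r

// Hypothetical function, used here only to show the nested pattern.
def parseYear(s: String): Option[(Int, Int, Int)] =
  s match {
    // Each captured group is passed through AsInt, so the bound names are Ints
    case YearFormatRegex(AsInt(fullStartYear), AsInt(start), AsInt(end)) =>
      Some((fullStartYear, start, end))
    case _ => None
  }
```

The conversion still happens, but it lives in one place, and `AsInt` can be reused with any other regex whose groups are numeric.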