I am trying to write a multiplatform library for comparing SemVer versions. I extracted the regular expression provided by the website to validate the pre-release part:
val preReleasePattern = Regex("""(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*""")
If I execute the following function on JVM and Native, it returns true. However, if I execute it in Node.js, it returns false.
fun main() {
    val preReleasePattern = Regex("""(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*""")
    println(preReleasePattern matches "01s")
}
I am using Kotlin 1.6.0 and the Regex class comes from the kotlin.text package. Is my pattern incorrect, or do I have to write a different regular expression dedicated to the JS environment?
As was suspected in the comments above, this is not due to regex dialect differences but to the bug reported at https://youtrack.jetbrains.com/issue/KT-49065 (which is wrongly attributed to dialect differences there). It will be fixed by https://github.com/JetBrains/kotlin/pull/5402 once that PR gets accepted.
Kotlin/JS simply has a bug in the matches implementation. The JavaScript RegExp class has no such facility, so it is faked by finding a match and then checking whether it starts at the start and ends at the end of the input. Instead of | you can also simply use a lazy quantifier like in .*? to trigger the bug. The regex engine gives back the first matched substring, Kotlin/JS sees that it is not the whole string and thus returns false, ignoring the fact that the pattern could also have matched the whole input.
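The find-then-check strategy described above can be imitated in plain Kotlin to see why it goes wrong. This is only a sketch of the behavior, not the actual Kotlin/JS source; the helper name naiveMatches and the value lazyPattern are made up for illustration.

```kotlin
// Hypothetical helper that imitates the find-then-check strategy:
// take the first match the engine reports and test whether it spans the input.
fun naiveMatches(regex: Regex, input: String): Boolean {
    val match = regex.find(input) ?: return false
    // An empty match at index 0 has the range 0..-1, which only covers an empty input.
    return match.range.first == 0 && match.range.last == input.length - 1
}

// The pattern from the question, unchanged.
val preReleasePattern = Regex("""(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*""")

// A lazy quantifier triggers the same problem without any alternation.
val lazyPattern = Regex(".*?")
```

On "01s" the first alternative 0 already succeeds, so find reports just "0" and naiveMatches(preReleasePattern, "01s") is false, even though a real full match (preReleasePattern matches "01s" on JVM) is true. The same happens with lazyPattern on "abc", where find reports an empty match.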
As a work-around, wrap your pattern in a non-capturing group and add start and end anchors; then it works uniformly on all platforms (you can see it in the mentioned PR, as that is also the fix for JS).
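A minimal sketch of that work-around, assuming the pattern from the question; the helper name anchored and the value anchoredPreRelease are made up for illustration. With the anchors in place, the first match the engine can report is already the whole input, so a find-then-check implementation gives the right answer too.

```kotlin
// Hypothetical helper: wrap the pattern in a non-capturing group and anchor both ends.
// The group keeps the anchors from binding to only the first or last alternative.
fun anchored(pattern: String): Regex = Regex("^(?:" + pattern + ")$")

// The pre-release pattern from the question, wrapped and anchored.
val anchoredPreRelease = anchored("""(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*""")
```

Here anchoredPreRelease matches "01s" is true, while anchoredPreRelease matches "01" is false, because a purely numeric identifier must not have a leading zero.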
Btw, no, the Kotlin compiler does not rewrite any regex patterns, and it should not. For one, this might not even be possible. Besides that, the documentation clearly states that the platform-specific dialects are used. So you either have to restrict yourself to the common subset or use different patterns for different platforms.