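The Rust Regex crate offers the regex! syntax extension, which makes it possible to compile a regex during the program's own compile time. This is attractive for two reasons: the regex does not have to be compiled at run time, and a malformed pattern is reported by the compiler instead of surfacing as a run-time error.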
Unfortunately, the docs say:
WARNING: The regex! compiler plugin is orders of magnitude slower than the normal Regex::new(...) usage. You should not use the compiler plugin unless you have a very special reason for doing so.
This sounds like a completely different regex engine is used for regex! than for Regex::new(). Why isn't regex!() just a wrapper for Regex::new() to combine the advantages from both worlds? As I understand it, these syntax-extension compiler plugins can execute arbitrary code; why not Regex::new()?
The answer is very subtle: one feature of the macro is that the result of regex! can be put into static data, like so:
static r: Regex = regex!("t?rust");
The main problem is that Regex::new() uses heap allocations during regex compilation. A static must be initialized with a value the compiler can compute ahead of time, which rules out heap-allocated data, so supporting static storage would require a rewrite of the Regex::new() engine. You can also read burntsushi's comment about this issue on reddit.
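To make the static-storage constraint concrete, here is a minimal sketch in current stable Rust. It does not use the regex crate at all: InlineProgram and HeapProgram are hypothetical stand-ins for a compiled regex, one backed by a fixed-size buffer and one backed by a Vec, and the 16-byte capacity is an arbitrary assumption.

```rust
/// Hypothetical stand-in for a compiled regex that lives in a fixed-size
/// buffer, so building it needs no heap allocation.
struct InlineProgram {
    bytes: [u8; 16],
    len: usize,
}

impl InlineProgram {
    /// A `const fn` can be evaluated by the compiler inside a `static` initializer.
    const fn new(pattern: &str) -> InlineProgram {
        let src = pattern.as_bytes();
        assert!(src.len() <= 16);
        let mut bytes = [0u8; 16];
        let mut i = 0;
        while i < src.len() {
            bytes[i] = src[i];
            i += 1;
        }
        InlineProgram { bytes, len: src.len() }
    }

    fn as_str(&self) -> &str {
        std::str::from_utf8(&self.bytes[..self.len]).unwrap()
    }
}

/// Hypothetical heap-backed counterpart, closer to what a runtime
/// regex compiler would produce.
struct HeapProgram {
    bytes: Vec<u8>,
}

impl HeapProgram {
    fn new(pattern: &str) -> HeapProgram {
        HeapProgram { bytes: pattern.as_bytes().to_vec() }
    }
}

// Accepted: the initializer is computed entirely at compile time.
static INLINE: InlineProgram = InlineProgram::new("t?rust");

// Rejected if uncommented: `HeapProgram::new` is not a `const fn`,
// so it cannot be called in a static initializer.
// static HEAP: HeapProgram = HeapProgram::new("t?rust");

fn main() {
    // The heap-backed value can only be built once the program is running.
    let heap = HeapProgram::new("t?rust");
    assert_eq!(INLINE.as_str().as_bytes(), heap.bytes.as_slice());
    println!("{}", INLINE.as_str());
}
```

The point of the sketch is only the contrast between the two initializers; a real regex program is far more complex than a byte buffer.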
There are some suggestions about how to improve regex!:
- Drop static support and just validate the regex string at compile time while still compiling the regex at runtime
- Keep static support by using a similar trick as lazy_static! does

As of the beginning of 2017, the developers are focused on stabilizing the standard API to release version 1.0. Since regex! requires a nightly compiler anyway, it has a low priority right now.
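Both suggestions can be sketched without a compiler plugin. What follows is an illustration under assumptions, not how the regex crate implements anything: Pattern is a hypothetical stand-in for a compiled regex, parens_balanced is a deliberately tiny validity rule (it ignores escapes such as \( and everything else a real parser checks), and std::sync::OnceLock from the standard library is swapped in for lazy_static! because it uses the same deferred-initialization idea.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled regex; building it allocates on the heap.
#[derive(Debug)]
struct Pattern {
    source: String,
}

impl Pattern {
    fn new(source: &str) -> Pattern {
        Pattern { source: source.to_string() }
    }
}

/// Toy compile-time validation: only checks that parentheses are balanced.
const fn parens_balanced(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut depth: usize = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'(' {
            depth += 1;
        } else if bytes[i] == b')' {
            if depth == 0 {
                return false;
            }
            depth -= 1;
        }
        i += 1;
    }
    depth == 0
}

const SOURCE: &str = "t?rust";

// First suggestion: the check runs during compilation, so a pattern that
// fails it breaks the build instead of failing at run time.
const _: () = assert!(parens_balanced(SOURCE));

// Second suggestion: the static holds only an empty cell; the heap-allocating
// construction is deferred until the first access.
static PATTERN: OnceLock<Pattern> = OnceLock::new();

fn pattern() -> &'static Pattern {
    PATTERN.get_or_init(|| Pattern::new(SOURCE))
}

fn main() {
    let first = pattern();
    let second = pattern();
    // Both calls return the same lazily built instance.
    assert!(std::ptr::eq(first, second));
    println!("{:?}", first);
}
```

The trade-off is the one the suggestions describe: the pattern is still compiled at run time, once, but the program keeps a static handle to it and gets an early error for patterns that fail the check.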
However, the compiler-plugin approach could offer even better performance than Regex::new(), which is already super fast: since the regex's DFA could be compiled into code instead of data, it has the potential to run a bit faster and benefit from compiler optimizations. But more research has to be done in the future to know for sure.
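The difference between a DFA as data and a DFA as code can be illustrated with a hand-written sketch for the pattern t?rust. This is an assumption-laden toy, not output of the regex crate or of regex!: the state numbering, the DEAD sentinel, and both function names are made up for illustration, and the matchers only accept a whole-string match.

```rust
const DEAD: u8 = u8::MAX; // sentinel for "no transition"
const ACCEPT: u8 = 5;     // state reached after a complete match

/// DFA as data: a transition table indexed by state and input byte.
const fn build_table() -> [[u8; 256]; 6] {
    let mut t = [[DEAD; 256]; 6];
    t[0][b't' as usize] = 1; // optional leading 't'
    t[0][b'r' as usize] = 2; // or start directly with 'r'
    t[1][b'r' as usize] = 2;
    t[2][b'u' as usize] = 3;
    t[3][b's' as usize] = 4;
    t[4][b't' as usize] = 5;
    t
}

static TABLE: [[u8; 256]; 6] = build_table();

/// Generic interpreter loop: every step is a memory lookup in the table.
fn matches_table(input: &str) -> bool {
    let mut state = 0u8;
    for &b in input.as_bytes() {
        state = TABLE[state as usize][b as usize];
        if state == DEAD {
            return false;
        }
    }
    state == ACCEPT
}

/// DFA as code: the same transitions written as a match expression, which
/// the compiler is free to turn into branches and optimize like any other code.
fn matches_code(input: &str) -> bool {
    let mut state = 0u8;
    for &b in input.as_bytes() {
        state = match (state, b) {
            (0, b't') => 1,
            (0, b'r') | (1, b'r') => 2,
            (2, b'u') => 3,
            (3, b's') => 4,
            (4, b't') => 5,
            _ => return false,
        };
    }
    state == 5
}

fn main() {
    for input in ["trust", "rust", "ttrust", "trus", "trustx", ""] {
        // Both representations must agree on every input.
        assert_eq!(matches_table(input), matches_code(input));
        println!("{:?} -> {}", input, matches_code(input));
    }
}
```

Whether the code form is actually faster depends on the pattern and the optimizer, which is why the answer treats it as potential rather than a measured result.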