regexdenial-of-service

Is it possible to write a regex engine that uses grouping but does not do backtracking?


I am trying to understand redos in details and it is more or less clear why (a|a)+x fails on aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab string, but I am curious if there is any example which does not use grouping? I read that the Thompson engine is not vulnerable to this problem, because it does not do backtracking, but as far as I understand this means that it cannot have grouping either. Is it possible to do grouping without backtracking and the vulnerability that comes with it?


Solution

  • Ok, so the answer is Re2 according to Wiktor Stribiżew. It is possible to support grouping without backtracking. Re2 does not support stuff like backreferences, which is completely understandable, at least I doubt many engines prepare for patterns like this one /(a)(a|\1)+b/.test("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaax"). They write that lookaround is not fully supported either because of this. So these engines have limits, but they are good enough. Many people try to solve everything with a single pattern, which fuels this problem.