Suppose I have a regular expression set R, how can I find a string s that matches as many regular expressions as possible?
For example, if R = {a\*b, b\*, c}
, then s
could be "b"
. I am not sure, maybe z3 solver could be helpful?
Yes, z3 can handle this via regular-expressions and optimization. You can do this directly using SMTLib, or bindings from other languages to z3 (C, C++, Java, Haskell etc.) Below, find Python and Haskell versions:
Using the Python bindings to z3:
from z3 import *
re1 = Concat(Star(Re("a")), Re("b"))
re2 = Star(Re("b"))
re3 = Re("c")
s = String('s')
o = Optimize()
i = If(InRe(s, re1), 1, 0) + If(InRe(s, re2), 1, 0) + If(InRe(s, re3), 1, 0)
o.maximize(i)
r = o.check()
if r == sat:
print(o.model())
else:
print("Solver said: %s" % r)
When I run this, I get:
[s = "b"]
which finds the string b
as you described.
Here's the same example, coded using the SBV library in Haskell:
{-# LANGUAGE OverloadedStrings #-}
import Data.SBV
import Data.SBV.RegExp
find = optimize Lexicographic $ do
s <- sString "s"
let res = [KStar "a" * "b", KStar "b", KStar "c"]
count :: SInteger
count = sum [oneIf (s `match` re) | re <- res]
maximize "matchCount" count
When run, this produces:
Optimal model:
s = "b" :: String
matchCount = 2 :: Integer
which also shows you how many of the reg-exps matched. (You can program so it also tells you exactly which ones matched as well.)