[SOLVED] find a string match as many regular expressions as possible in a regular expression set

find a string match as many regular expressions as possible in a regular expression set

Suppose I have a regular expression set R, how can I find a string s that matches as many regular expressions as possible?

For example, if R = {a\*b, b\*, c}, then s could be "b". I am not sure, maybe z3 solver could be helpful?

Solution

Yes, z3 can handle this via regular-expressions and optimization. You can do this directly using SMTLib, or bindings from other languages to z3 (C, C++, Java, Haskell etc.) Below, find Python and Haskell versions:

Python

Using the Python bindings to z3:

from z3 import *

re1 = Concat(Star(Re("a")), Re("b"))
re2 = Star(Re("b"))
re3 = Re("c")

s = String('s')

o = Optimize()
i = If(InRe(s, re1), 1, 0) + If(InRe(s, re2), 1, 0) + If(InRe(s, re3), 1, 0)
o.maximize(i)

r = o.check()
if r == sat:
    print(o.model())
else:
    print("Solver said: %s" % r)

When I run this, I get:

[s = "b"]

which finds the string b as you described.

Haskell

Here's the same example, coded using the SBV library in Haskell:

{-# LANGUAGE OverloadedStrings #-}

import Data.SBV
import Data.SBV.RegExp

find = optimize Lexicographic $ do
          s <- sString "s"

          let res = [KStar "a" * "b", KStar "b", KStar "c"]

              count :: SInteger
              count = sum [oneIf (s `match` re) | re <- res]

          maximize "matchCount" count

When run, this produces:

Optimal model:
  s          = "b" :: String
  matchCount =   2 :: Integer

which also shows you how many of the reg-exps matched. (You can program so it also tells you exactly which ones matched as well.)