regexz3z3pysbv

find a string match as many regular expressions as possible in a regular expression set


Suppose I have a regular expression set R, how can I find a string s that matches as many regular expressions as possible?

For example, if R = {a\*b, b\*, c}, then s could be "b". I am not sure, maybe z3 solver could be helpful?


Solution

  • Yes, z3 can handle this via regular-expressions and optimization. You can do this directly using SMTLib, or bindings from other languages to z3 (C, C++, Java, Haskell etc.) Below, find Python and Haskell versions:

    Python

    Using the Python bindings to z3:

    from z3 import *
    
    re1 = Concat(Star(Re("a")), Re("b"))
    re2 = Star(Re("b"))
    re3 = Re("c")
    
    s = String('s')
    
    o = Optimize()
    i = If(InRe(s, re1), 1, 0) + If(InRe(s, re2), 1, 0) + If(InRe(s, re3), 1, 0)
    o.maximize(i)
    
    r = o.check()
    if r == sat:
        print(o.model())
    else:
        print("Solver said: %s" % r)
    

    When I run this, I get:

    [s = "b"]
    

    which finds the string b as you described.

    Haskell

    Here's the same example, coded using the SBV library in Haskell:

    {-# LANGUAGE OverloadedStrings #-}
    
    import Data.SBV
    import Data.SBV.RegExp
    
    find = optimize Lexicographic $ do
              s <- sString "s"
    
              let res = [KStar "a" * "b", KStar "b", KStar "c"]
    
                  count :: SInteger
                  count = sum [oneIf (s `match` re) | re <- res]
    
              maximize "matchCount" count
    

    When run, this produces:

    Optimal model:
      s          = "b" :: String
      matchCount =   2 :: Integer
    

    which also shows you how many of the reg-exps matched. (You can program so it also tells you exactly which ones matched as well.)