haskellfunctional-programmingcompositionpointfree

Can Bird-Meertens be used to calculate a linear-time solution to this longest valid parentheses problem in Haskell?


The problem: compute the length of the longest matched/valid substring of parentheses given a string composed only of ( and ) characters.

Here's a naive solution:

isValid :: String -> Bool
isValid = go' (0 :: Int)
  where
    go' = memo2 go
    go 0 [] = True
    go 0 (')' : _ ) = False
    go n (')' : cs) = go' (n-1) cs
    go n ('(' : cs) = go' (n+1) cs
    go _ _ = False

solution :: String -> Int
solution = longest . valid . substrings
  where
    substrings = tails >=> inits
    valid      = filter isValid
    longest    = maximum . map length

Now I find I can come up with these inefficient, simple solutions to programming problems quite easily, but the efficient versions are more elusive. The whole idea of Bird-Meertens formalism is that if you can come up with a slow version, the calculus will give you a fast version, and so I was hopeful that it might be of some use.

Additionally, this looks superficially similar to the example on Wikipedia, since we are computing all possible subarrays, mapping something, and taking a maximum at the end.

Can Bird-Meertens help us here?

I attempted to inline definitions and got here:

solution = maximum . map length . concat . map (filter isValid . tails) . inits

But it's not obvious to me that any of the relevant rules are applicable from this point.


Solution

  • I believe this can be done in linear time with quite a compact algorithm. At each position, we will compute a [Int] whose head is the length of the longest valid string ending at that position and whose remaining elements represent lengths of earlier valid substrings separated by open parens. The answer is then just the largest head. With this intuition in mind, it should be relatively easy to read:

    solution :: String -> Int
    solution = maximum . map head . scanl cons [0] where
        cons ns = \case
            ')' -> case ns of
                n:n':nt -> n+n'+2:nt
                _ -> [0]
            '(' -> 0:ns
            _ -> [0]
    

    Unfortunately, this solution was created not via Birds-Meerten-style calculation, but rather via the Feynman algorithm:

    1. Write down the problem.
    2. Think very hard.
    3. Write down the solution.

    I suspect with some effort, I could find some calculation that transforms this into an inefficient but readable specification (perhaps not the exact specification you started with), and then I could present that by reading the calculation backwards to show how to move from spec to optimized version, but is it really worth it?