algorithmrecursionbrace-expansion

How to recursively expand nested braces as cartesian product


(I deliberately removed the "Python" tag someone added so as to keep this question language-agnostic.)

I think I managed to solve this iteratively by unwinding a stack for each brace, which is a solution I'm happy with (assuming it works). But I failed to implement a single function recursion. Can someone please help? What I tried turned out too complex for me to envision, with how different scenarios for appending rather than expanding are needed depending on the relationship between the commas and braces.

Is there an elegant recursion for something like this? I would also appreciate any recommendations for other types of established parsing/language methods I could study for this kind of problem.

Problem description: given a string of letters, valid curly braces, and commas; expand the braces so that the result includes all possibilities of the substring of letters immediately to the left of the braces concatenated with each of the comma-separated substrings of letters inside the braces. Braces can be nested.

Examples:

"abc{d,e}f{g,hi}" => ['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']


"abc{d{0,1},e}f{g{0{a,b},1,2},hi}" =>

['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']

(Hopefully working) iterative Python code:

# Brace expansion
# Example:
# input "abc{d,e}f{g,hi}"
# output ["abcdfg", "abcdfhi", "abcefg", "abcefhi"]
def f(s):
  stack = []

  i = 0

  while i < len(s):
    j = i

    while s[j] not in ['{', '}', ',']:
      j += 1

    if s[i:j]:
      stack.append([s[i:j]])

    i = j

    if s[i] == '{':
      stack.append('{')
      i += 1

    if s[i] == ',':
      i += 1

    # Unwind stack
    if s[i] == '}':
      # Get comma, separated elements
      lst = []
      while stack[-1] != '{':
        lst.extend(reversed(stack.pop()))
      lst = reversed(lst)
      stack.pop() # Remove '{'
      pfxs = reversed(stack.pop())
      stack.append([pfx + sfx for pfx in pfxs for sfx in lst])
      i += 1

  # Cartesian product of stack
  result = []

  for i in range(1, len(stack)):
    result = [pfx + sfx for pfx in stack[i-1] for sfx in stack[i]]
    
  return result

Output:

s = "abc{d,e}f{g,hi}"

print(s)
print(f(s))

s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"

print("")
print(s)
print(f(s))

"""
abc{d,e}f{g,hi}
['abcdfg', 'abcdfhi', 'abcefg', 'abcefhi']

abc{d{0,1},e}f{g{0{a,b},1,2},hi}
['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi',
 'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi',
 'abcefg0a', 'abcefg0b', 'abcefg1', 'abcefg2', 'abcefhi']
"""

Solution

  • To differentiate between the case where recursion is initiated by a comma or by a brace, I would suggest to pass as argument the prefixes that should be prefixed to whatever the recursive call will process.

    I would also tokenize the input using a regular expression.

    Here is how that would look:

    import re
    from itertools import product 
    
    def expand(s):
        tokens = re.finditer(r"[^{},]+|.|$", s)
        
        def recur(prefixes):
            token = next(tokens).group(0) or "}"
            if token == "}":
                return prefixes
            elif token == ",":
                return prefixes + recur([""])
            elif token == "{":
                return recur([a + b for a, b in product(prefixes, recur([""]))])
            else:
                return recur([prefix + token for prefix in prefixes])
    
        return recur([""])
    

    Example call:

    s = "abc{d{0,1},e}f{g{0{a,b},1,2},hi}"
    
    print(expand(s))
    

    Output:

    ['abcd0fg0a', 'abcd0fg0b', 'abcd0fg1', 'abcd0fg2', 'abcd0fhi', 
     'abcd1fg0a', 'abcd1fg0b', 'abcd1fg1', 'abcd1fg2', 'abcd1fhi', 
     'abcefg0a',  'abcefg0b',  'abcefg1',  'abcefg2',  'abcefhi']
    

    Version with input validation

    If the solution should raise a controlled exception when the braces are not balanced, then extend the function to this:

    def expand(s):
        tokens = re.finditer(r"[^{},]+|.|$", s)
        
        def recur(prefixes, until):
            token = next(tokens).group(0)
            if token == until:
                return prefixes
            elif token == ",":
                return prefixes + recur([""], until)
            elif token == "{":
                return recur([a + b for a, b in product(prefixes, recur([""], "}"))], until)
            elif token == "}":
                raise ValueError("Unexpected closing brace")
            elif token:
                return recur([prefix + token for prefix in prefixes], until)
            else:
                raise ValueError("Missing closing brace")
        
        return recur([""], "")