To easily understand my problem, below is a simplified version of some example input.
['foo', 1, 'a', 'foo', 2, 'foo', 1, 'b', 'foo', -1, 'foo', -1, "bar", 1, "c", "bar", 2, 'baz', 1, 'd', 'baz', -1, "bar", 3, "e", "bar", 4, 'qux', 1, 'stu', 1, 'f', 'stu', -1, 'qux', -1, 'bar', -1]
(I used "stu" because I ran out of placeholder names.)
The strings are function names (sort of, specifics later). The numbers after the function names specify the position of the argument in the function that follows. A position of -1 closes the function.
For example, ['foo',1,'a','foo',2,'b','foo',-1]
should be equivalent to foo('a', 'b')
.
This should also work when nested:
['foo', 1, 'a', 'foo', 2, 'foo', 1, 'b', 'foo', -1, 'foo', -1]
should be equivalent to foo('a', foo('b'))
and
['bar', 1, 'c', 'bar', 2, 'baz', 1, 'd', 'baz', -1, 'bar', 3, 'e', 'bar', 4, 'qux', 1, 'stu', 1, 'f', 'stu',-1, 'qux', -1, 'bar', -1]
should be equivalent to bar('c', baz('d'), e, qux(stu('f')))
.
My desired function should return a list. For example,
['foo', 1, 'a', 'foo', 2, 'foo', 1, 'b', 'foo', -1, 'foo', -1, 'bar', 1, 'c', 'bar', -1]
should result in
[['foo', 'a', ['foo', 'b']], ['bar', 'c']]
Now that the problem is clearer, my actual problem is slightly different. All elements of the list are integers. The function names are not strings, they're sequences of three integers. Thus, ['foo',1,'a','foo',2,'b','foo',-1]
is actually [1, 1, 1, 1, 104, 1, 1, 1, 2, 105, 1, 1, 1, -1]
.
The function names ([1, 1, 1]
in the above example) act as dictionary keys. The dictionary (called constructs
) looks something like this:
constructs = {
1: {
1: {
1: lambda *chars : print(''.join(chr(char) for char in chars))
}
}
}
So finally, the example should result in something like
[[lambda *chars : print(''.join(chr(char) for char in chars)), 104, 105]]
All the specifications about nesting and such should all still apply. I have no clue how to reliably and elegantly implement this, please help!
Thanks in advance.
Edit: I forgot to say that 0
is always ignored and skipped over, and a 0
following a function call will escape the function call and cause it to be treated as an argument. All of this functionality is implemented to some degree in my attempt so far, but it doesn't work when the same function is nested within itself. It's also inefficient and inelegant, with lots of potential problems, so I took to Stack Overflow for help writing a better one. Feel free to use it as a starting point!
Edit 2: here is the code for my attempt so far:
constructs = {
1: {
1: {
1: print,
}
}
}
def parse(code: list) -> list:
if len(code) <= 1:
return code
result = []
in_function = 0
for i, token in enumerate(code):
if in_function > 0:
in_function -= 1
continue
if token == 0:
continue
if result and result[-1][0][3] != -1:
if token in constructs and code[i + 1] in constructs[token] and code[i + 2] in constructs[token][code[i + 1]]:
if i < len(code) - 4 and code[i + 4] == 0:
result[-1][-1].append(token)
else:
if code[i + 3] == result[-1][0][3] + 1:
result[-1].append([])
result[-1][0] = code[i:i + 4]
in_function = 3
else:
result[-1][-1].append(token)
else:
if token in constructs and code[i + 1] in constructs[token] and code[i + 2] in constructs[token][code[i + 1]]:
if code[i + 3] == 1:
result.append([code[i:i + 4], []])
in_function = 3
else:
raise SyntaxError(f'function {code[i:i + 3]} has no previous separator {code[i + 3] - 1}')
else:
raise SyntaxError(f'function {code[i:i + 3]} was not recognized')
for i, function in enumerate(result):
result[i][0] = constructs[result[i][0][0]][result[i][0][1]][result[i][0][2]]
for j, argument in enumerate(result[i][1:]):
result[i][j + 1] = parse(argument)
return result
It works for parse([1, 1, 1, 1, 'Hello world', 1, 1, 1, 2, 'etc', 1, 1, 1, -1])
but not for parse([1, 1, 1, 1, 1, 1, 1, 1, 'Hello world', 1, 1, 1, -1, 1, 1, 1, 2, 'etc', 1, 1, 1, -1])
.
A valid list following this grammar rule can be parsed recursively with a function which, given a valid index, tries to parse the tokens starting from the index as keys to a function followed by the argument position (with -1 ending the function call), or otherwise defaults to a scalar value.
In addition to the parsed object, the function should also return the index to the next token in order to advance the index according to the number of tokens consumed at the current level.
Since there can be one or more of such expressions in the input list, the function should be called iteratively with returning values appended to an output list until the index reaches the end of the input:
def parse(code):
def parse_expr(index):
while index < size and code[index] == 0:
index += 1 # skip 0s
try:
call = []
while True:
key1, key2, key3, pos = code[index: (next_index := index + 4)]
if next_index < size and code[next_index] == 0:
raise ValueError # escape the token
if not call:
call.append(constructs[key1][key2][key3])
if pos == -1:
return call, next_index
obj, index = parse_expr(next_index)
call.append(obj)
except (ValueError, KeyError):
return code[index], index + 1
size = len(code)
index = 0
result = []
while index < size:
obj, index = parse_expr(index)
result.append(obj)
return result
so that:
print(parse([1, 1, 1, 1, 'Hello world', 1, 1, 1, 2, 'etc', 1, 1, 1, -1]))
print(parse([1, 1, 1, 1, 1, 1, 1, 1, 'Hello world', 1, 1, 1, -1, 1, 1, 1, 2, 'etc', 1, 1, 1, -1]))
print(parse([1, 1, 1, 1, 'a', 1, 1, 1, 2, 1, 1, 1, 1, 'b', 1, 1, 1, -1, 1, 1, 1, -1, 1, 1, 1, 1, 'c', 1, 1, 1, -1]))
outputs:
[[<built-in function print>, 'Hello world', 'etc']]
[[<built-in function print>, [<built-in function print>, 'Hello world'], 'etc']]
[[<built-in function print>, 'a', [<built-in function print>, 'b']], [<built-in function print>, 'c']]
Demo: https://ideone.com/4ct2KA
The code above assumes the input to be valid and ignores the keys to a function after the first argument since they are redundant. Argument positions are ignored too since they are always incremented from 1, although they can be used for input validation as necessary.