I am trying to split any mathematical string into a list by operators (i.e. "+", "-", "/", "*"), while keeping anything inside matching brackets together as one list element.
Here are some arbitrary examples with the desired outputs:
import math
equation = "5+5*10"
equation_segmented = ["5", "+", "5", "*", "10"]
equation = "(2*2)-5*(math.sqrt(9)+2)"
equation_segmented = ["(2*2)", "-", "5", "*", "(math.sqrt(9)+2)"]
equation = "(((5-3)/2)*0.5)+((2*2))*(((math.log(5)+2)-2))"
equation_segmented = ["(((5-3)/2)*0.5)", "+", "((2*2))", "*", "(((math.log(5)+2)-2))"]
Note: alphabetical letters (or symbols like "π") should be included in the brackets too.
My first thought was using a regex:
import re
equation_segmented = re.split("([\+|\-|\*|\/]|\(.*\))", equation)
The problem here, however, is that it does not account for matching brackets.
I then thought of iterating through the string manually and keeping track of the parentheses with a counter, but did not get it to work (I was pretty much only able to write my own 're.split' function).
Lastly I went back to regex (equation_segmented = re.split("([\+|\-|\*|\/])", equation)) and thought about just splitting the string by operators, to then "".join() all the list elements in matching brackets afterwards - yet again to no avail.
This might be a problem for a parser, but I am not sure where to start.
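For reference, the split-then-join idea from the last attempt can be made to work. The following is only a minimal sketch, assuming balanced parentheses and no unary operators; the name `split_then_merge` is hypothetical and not part of the original post. It splits on operators with `re.split`, then glues pieces back together while the running parenthesis depth is above zero.

```python
import re


def split_then_merge(equation: str) -> list[str]:
    # Split on operators; the capture group keeps the operators in the result.
    # Empty strings (possible at the edges) are dropped.
    pieces = [p for p in re.split(r"([+\-*/])", equation) if p]

    merged: list[str] = []
    depth = 0  # parenthesis depth after the pieces processed so far
    for piece in pieces:
        if depth > 0:
            # Still inside brackets: this piece belongs to the previous element.
            merged[-1] += piece
        else:
            merged.append(piece)
        depth += piece.count("(") - piece.count(")")
    return merged


print(split_then_merge("(2*2)-5*(math.sqrt(9)+2)"))
```

The depth is updated after each piece is placed, so a piece that opens a bracket starts a new element and everything up to the matching close is appended to it.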
A custom (non-regex) function is trivial. All you need to do is ensure that you keep track of opening and closing parentheses.
Assuming the formula strings are syntactically correct (in particular, that the parentheses are balanced), then:
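OPS = set("+-*/")
PMAP = {"(": 1, ")": -1}

def tokenizer(s: str) -> list[str]:
    result = [""]
    pcount = 0  # current parenthesis nesting depth
    for c in s:
        pcount += PMAP.get(c, 0)
        if c in OPS:
            if pcount == 0:
                # Top-level operator: emit it and start a new empty token
                result.extend([c, ""])
                continue
        # Anything else (including operators inside brackets) joins the current token
        result[-1] += c
    return result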
equations = [
    "5+5*10",
    "(2*2)-5*(math.sqrt(9)+2)",
    "(((5-3)/2)*0.5)+((2*2))*(((math.log(5)+2)-2))",
]

for eq in equations:
    print(tokenizer(eq))
Output:
['5', '+', '5', '*', '10']
['(2*2)', '-', '5', '*', '(math.sqrt(9)+2)']
['(((5-3)/2)*0.5)', '+', '((2*2))', '*', '(((math.log(5)+2)-2))']
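If the input cannot be assumed to be well formed, the same loop can also check the parentheses. The following is a sketch only; the name `tokenize_checked` and the choice of `ValueError` are assumptions made for illustration, not part of the code above.

```python
OPS = set("+-*/")


def tokenize_checked(s: str) -> list[str]:
    result = [""]
    depth = 0  # current parenthesis nesting depth
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before any matching '('
                raise ValueError(f"unmatched ')' in {s!r}")
        if c in OPS and depth == 0:
            # Top-level operator: emit it and start a new token
            result.extend([c, ""])
        else:
            result[-1] += c
    if depth != 0:
        # At least one '(' was never closed
        raise ValueError(f"unmatched '(' in {s!r}")
    return result


print(tokenize_checked("(2*2)-5*(math.sqrt(9)+2)"))
```

For balanced input this returns the same list as `tokenizer`. Like the original, it does not treat a leading unary minus specially, so a string such as "-5+3" would start with an empty element.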