python math simplify

How to simplify a huge arithmetic expression?


I have a huge expression like:

x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52] + FUNC1(z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC0(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49)) + FUNC2(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49), a49, x49) + y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51] + FUNC1(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + 
FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53], x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + 
RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + movr[54] + m1[54]) + RET(h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC0(a49) + FUNC2(a49, x49, y49) + w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50] + FUNC1(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + movr[53] + m1[53]) + RET(a49 + v49 + FUNC1(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + 
FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]) + RET(x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52], y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) ...

I need to simplify this expression by replacing its repeated subexpressions with variables. For example:

a = RET(v49, t49, z49) 
b = w49 + h49 + FUNC1(v49) + a + movr[50] + m1[50]
and so on...

My problem is that the expression is really huge (about 2 MB long), so doing this manually without mistakes is nearly impossible.

So my question is: is there an app that will do this, or a Python program?

I can program in Python easily, but I don't know of an algorithm for this.

Any help is appreciated.


Solution

  • The following function extracts all function calls and puts them into variables.

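    def simplify(progstr, variable_prefix='x'):
        # Pad with spaces so the leftward scan for a delimiter always terminates.
        progstr = f' {progstr} '
        prog = []
        while '(' in progstr:
            # The first ')' closes an innermost call: nothing nested lies inside it.
            # (str.index raises ValueError if the parentheses are unbalanced.)
            i = progstr.index(')')
            # Walk left to the matching '(' ...
            i2 = progstr.rindex('(', 0, i)
            # ... then further left over the function name, up to a delimiter.
            i2 -= 1
            while progstr[i2] not in [',', ' ', '(', ')']:
                i2 -= 1
            variable = progstr[i2+1:i+1]
            vname = f'{variable_prefix}{len(prog)}'
            # Replace every occurrence of this call by the new variable name.
            progstr = progstr.replace(variable, vname)
            prog.append(f'{vname} = {variable}')
        # Whatever is left (minus the padding) is the final expression.
        prog.append(progstr[1:-1])
        return '\n'.join(prog)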
    
    expression = 'x49 + t49 + FUNC1(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51]) + RET(y49 + z49 + FUNC1(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50]) + RET(w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49, t49) + movr[51] + m1[51], w49 + h49 + FUNC1(v49) + RET(v49, t49, z49) + movr[50] + m1[50], v49) + movr[52] + m1[52]'
    
    print(simplify(expression, 'x'))
    

    prints

    x0 = FUNC1(v49)
    x1 = RET(v49, t49, z49)
    x2 = FUNC1(w49 + h49 + x0 + x1 + movr[50] + m1[50])
    x3 = RET(w49 + h49 + x0 + x1 + movr[50] + m1[50], v49, t49)
    x4 = FUNC1(y49 + z49 + x2 + x3 + movr[51] + m1[51])
    x5 = RET(y49 + z49 + x2 + x3 + movr[51] + m1[51], w49 + h49 + x0 + x1 + movr[50] + m1[50], v49)
    x49 + t49 + x4 + x5 + movr[52] + m1[52]
    

    Besides making the code more readable, this avoids a lot of repeated computation, which should speed it up considerably (especially if the individual function calls are costly). Here, for example, FUNC1(v49) is executed only once instead of 5 times.
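
    As a rough check of both claims, the sketch below evaluates a small expression directly and through its extracted-variable form. FUNC1 and RET here are hypothetical stand-ins (the real functions are not shown in the question), the variable values are arbitrary, and the `result = ...` line is added only so the final value can be read back.

```python
calls = {"FUNC1": 0}

def FUNC1(a):
    # Hypothetical stand-in for the real FUNC1; it also counts its invocations.
    calls["FUNC1"] += 1
    return 3 * a + 1

def RET(a, b, c):
    # Hypothetical stand-in for the real RET.
    return a + 2 * b - c

# Arbitrary sample values for the free variables.
env = {"FUNC1": FUNC1, "RET": RET, "v49": 2, "w49": 5, "t49": 7}

original = "FUNC1(w49 + FUNC1(v49)) + RET(w49 + FUNC1(v49), v49, t49)"

# The same expression with each call pulled out into a variable.
simplified = """\
x0 = FUNC1(v49)
x1 = FUNC1(w49 + x0)
x2 = RET(w49 + x0, v49, t49)
result = x1 + x2
"""

def run_original():
    calls["FUNC1"] = 0
    value = eval(original, dict(env))
    return value, calls["FUNC1"]

def run_simplified():
    calls["FUNC1"] = 0
    scope = dict(env)
    exec(simplified, scope)
    return scope["result"], calls["FUNC1"]

print(run_original())    # (value, number of FUNC1 calls)
print(run_simplified())  # same value, fewer FUNC1 calls
```

    Both runs should return the same value, while the extracted form calls FUNC1 once less, because FUNC1(v49) is no longer evaluated twice.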

    (Edit): How it works:

    While there are parentheses in the expression, do the following: scan the expression from left to right until you encounter a closing bracket (call its position j), then walk left until you encounter the matching opening bracket, then keep walking left until you encounter a whitespace, comma or bracket (call its position i). The segment expression[i+1:j+1] is then the first innermost function call. Replace every occurrence of that segment in the expression with a fresh variable name x, and add x = expression[i+1:j+1] to your list of variables.
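
    A single step of that loop can be sketched in isolation as follows. The helper name `first_innermost_call` and the toy expression are made up for illustration; the helper assumes balanced parentheses and call-style syntax as in the question.

```python
def first_innermost_call(expression):
    """Return the text of the first call that contains no nested call."""
    j = expression.index(")")         # first closing bracket
    i = expression.rindex("(", 0, j)  # nearest opening bracket to its left
    i -= 1
    # Walk left over the function name until a delimiter or the string start.
    while i >= 0 and expression[i] not in ", ()":
        i -= 1
    return expression[i + 1 : j + 1]

expr = "a + F(b + G(c), G(c))"

call = first_innermost_call(expr)        # the innermost call G(c)
step1 = expr.replace(call, "x0")         # both copies of G(c) become x0

call2 = first_innermost_call(step1)      # now F(...) has no nested call left
step2 = step1.replace(call2, "x1")

print(call, "->", step1)
print(call2, "->", step2)
```

    After two steps no parentheses remain, so the loop would stop with the assignments x0 = G(c) and x1 = F(b + x0, x0) and the final expression a + x1.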

    Some remarks on the code: