
Dividing nested calls into several lines


I have a function like this in Python (the capital letters can represent constants, functions, anything, but not function calls):

def f(x):
    a = foo1(A, B, foo3(E, foo2(A, B)))
    b = foo3(a, E)
    return b

and I want to break it up into "atomic" operations like this:

def f(x):
    tmp1 = foo2(A, B)
    tmp2 = foo3(E, tmp1)
    a = foo1(A, B, tmp2)
    b = foo3(a, E)
    return b

In other words, exactly one function call and one assignment per line.

Is there a way to implement this source-code transformation in Python? I'm looking for a program that takes the string representation of such a function and returns the transformed version. I know I need to use the AST representation, but I don't really know how to proceed.


Solution

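  • This regex-based approach should do what you want. It repeatedly extracts the leftmost innermost call, that is, one whose argument list contains no parentheses, into a temporary variable: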

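    import re

    s = """def f(x):
        a = foo1(A, B, foo3(E, foo2(A, B)))
        b = foo3(a, E)
        return b"""

    # Group 1: the line's indentation. Group 2: the leftmost innermost call,
    # i.e. a name followed by parentheses that contain no other parentheses.
    reg = re.compile(r'^(\s+).*?(\w+\([^()]*\))')

    result = []
    count = 1
    for line in s.splitlines():
        match = reg.search(line)
        while match:
            # Hoist the innermost call into its own temporary assignment...
            indent = match[1]
            var = f'tmp_{count}'
            result.append(f'{indent}{var} = {match[2]}')
            count += 1
            # ...then substitute the temporary for the call and search again.
            line = line[:match.start(2)] + var + line[match.end(2):]
            match = reg.search(line)
        # Lines without a call (or with every call replaced) are kept as-is.
        result.append(line)

    print('\n'.join(result))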
    

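    It prints the following. Note that every call is hoisted, including the outermost one, so each original assignment ends up as a plain copy from the last temporary (e.g. `a = tmp_3`):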

    def f(x):
        tmp_1 = foo2(A, B)
        tmp_2 = foo3(E, tmp_1)
        tmp_3 = foo1(A, B, tmp_2)
        a = tmp_3
        tmp_4 = foo3(a, E)
        b = tmp_4
        return b
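
Since the question asks about the AST, here is a minimal sketch of the same transformation using the standard `ast` module instead of a regex. It assumes Python 3.9+ (for `ast.unparse`) and only handles the case described in the question: assignments whose right-hand side is a call with positional arguments that are either names or further calls. The names `flatten_calls` and `hoist_args` are made up for this illustration; keyword arguments, calls inside `return` statements, and nested blocks such as `if` or `for` bodies are not covered.

```python
import ast


def flatten_calls(source):
    """Return `source` with nested calls hoisted into tmpN assignments."""
    tree = ast.parse(source)
    counter = 0

    def hoist_args(call, out):
        # Replace every argument that is itself a call with a fresh
        # temporary, appending the temporary's assignment to `out`.
        nonlocal counter
        new_args = []
        for arg in call.args:
            if isinstance(arg, ast.Call):
                hoist_args(arg, out)  # innermost calls are emitted first
                counter += 1
                name = f"tmp{counter}"
                out.append(ast.Assign(
                    targets=[ast.Name(id=name, ctx=ast.Store())],
                    value=arg,
                ))
                arg = ast.Name(id=name, ctx=ast.Load())
            new_args.append(arg)
        call.args = new_args

    # Collect the functions first so the tree is not modified while walking.
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    for func in functions:
        new_body = []
        for stmt in func.body:
            # Only top-level "target = call(...)" statements are rewritten.
            if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Call):
                hoist_args(stmt.value, new_body)
            new_body.append(stmt)
        func.body = new_body

    # The new nodes have no line numbers yet; ast.unparse expects them.
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


example = """def f(x):
    a = foo1(A, B, foo3(E, foo2(A, B)))
    b = foo3(a, E)
    return b
"""

print(flatten_calls(example))
```

Because the outermost call stays in the original assignment, this variant should produce the `tmp1`/`tmp2` form shown in the question rather than the extra `a = tmp_3` copy. Unlike the regex, it is also not confused by parentheses inside string literals, although `ast.unparse` regenerates the source and so drops comments and original formatting.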