
Polars transform metadata of expressions


Is it possible in Python Polars to transform the root names in an expression's metadata? For example, if I have an expression like

expr = pl.col("A").dot(pl.col("B")).alias("AdotB")

can I add a suffix to the root names (and to the alias), i.e. transform the expression into

pl.col("A_suffix").dot(pl.col("B_suffix")).alias("AdotB_suffix")

I know that expr.meta.root_names() returns a list of the column names, but I could not find a way to transform them.


Solution

  • There is an example in the Polars test suite that rewrites query plan nodes from Python using callbacks:

    However, I can't find an equivalent API for rewriting expressions.

    Out of interest, there is .meta.serialize(), which can dump an expression to JSON:

    expr.meta.serialize(format="json")
    
    # '{"Alias":[{"Agg":{"Sum":{"BinaryExpr":{"left":{"Column":"A"},"op":"Multiply","right":{"Column":"B"}}}}},"AdotB"]}'
    #    ^^^^^                                         ^^^^^^^^^^                             ^^^^^^^^^^        ^^^^^
    

    Technically, you could modify the Alias and Column values and then .deserialize() the result back into an expression.

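    import io
    import json

    import polars as pl

    def suffix_all(expr: pl.Expr, suffix: str) -> pl.Expr:
        def _add_suffix(obj):
            # json.loads calls this hook for every decoded JSON object (dict),
            # innermost first, so every Column/Alias node gets visited.
            if "Column" in obj:
                obj["Column"] = obj["Column"] + suffix
            if "Alias" in obj:
                # An Alias node is [inner_expression, name]; suffix the name.
                obj["Alias"][-1] += suffix
            return obj

        # Dump the expression tree to JSON (the layout is version-dependent).
        ast = expr.meta.serialize(format="json")
        new_ast = json.loads(ast, object_hook=_add_suffix)

        # Rebuild an expression from the modified JSON.
        return pl.Expr.deserialize(io.BytesIO(json.dumps(new_ast).encode()), format="json")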
    
    df = pl.DataFrame({"A_suffix": [2, 7, 3], "B_suffix": [10, 7, 1]})
    
    expr = pl.col("A").dot(pl.col("B")).alias("AdotB")
    
    df.with_columns(expr.pipe(suffix_all, "_suffix"))
    
    shape: (3, 3)
    ┌──────────┬──────────┬──────────────┐
    │ A_suffix ┆ B_suffix ┆ AdotB_suffix │
    │ ---      ┆ ---      ┆ ---          │
    │ i64      ┆ i64      ┆ i64          │
    ╞══════════╪══════════╪══════════════╡
    │ 2        ┆ 10       ┆ 72           │
    │ 7        ┆ 7        ┆ 72           │
    │ 3        ┆ 1        ┆ 72           │
    └──────────┴──────────┴──────────────┘
    

    This does seem to "work" in this case, but the serialize docs contain a warning:

    Serialization is not stable across Polars versions

    So it is probably not a recommended approach in general.