Is it possible in Python Polars to transform the root names in an expression's metadata? E.g. if I have an expression like
expr = pl.col("A").dot(pl.col("B")).alias("AdotB")
is there a way to add a suffix to each root name, i.e. transform the expression into
pl.col("A_suffix").dot(pl.col("B_suffix")).alias("AdotB_suffix")
I know that expr.meta.root_names()
gives back a list of the column names, but I could not find a way to transform them.
There is an example in the tests that does query plan node rewriting in Python with callbacks:
But I can't see any equivalent API for rewriting expressions?
Out of interest, there is .serialize()
which can dump to JSON.
expr.meta.serialize(format="json")
# '{"Alias":[{"Agg":{"Sum":{"BinaryExpr":{"left":{"Column":"A"},"op":"Multiply","right":{"Column":"B"}}}}},"AdotB"]}'
#    ^^^^^                                         ^^^^^^^^^^                             ^^^^^^^^^^        ^^^^^
Technically, you could modify the Alias and Column values, and .deserialize()
back into an expression.
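import json

import polars as pl


def suffix_all(expr, suffix):
    # json.loads calls this hook for every decoded JSON object, innermost first.
    def _add_suffix(obj):
        if "Column" in obj:
            obj["Column"] = obj["Column"] + suffix
        if "Alias" in obj:
            # "Alias" holds [inner_expr, name]; the name is the last item.
            obj["Alias"][-1] += suffix
        return obj

    ast = expr.meta.serialize(format="json")
    new_ast = json.loads(ast, object_hook=_add_suffix)
    return pl.Expr.deserialize(json.dumps(new_ast).encode(), format="json")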
df = pl.DataFrame({"A_suffix": [2, 7, 3], "B_suffix": [10, 7, 1]})
expr = pl.col("A").dot(pl.col("B")).alias("AdotB")
df.with_columns(expr.pipe(suffix_all, "_suffix"))
shape: (3, 3)
┌──────────┬──────────┬──────────────┐
│ A_suffix ┆ B_suffix ┆ AdotB_suffix │
│ ---      ┆ ---      ┆ ---          │
│ i64      ┆ i64      ┆ i64          │
╞══════════╪══════════╪══════════════╡
│ 2        ┆ 10       ┆ 72           │
│ 7        ┆ 7        ┆ 72           │
│ 3        ┆ 1        ┆ 72           │
└──────────┴──────────┴──────────────┘
This does seem to "work" in this case, but the serialize docs do contain a warning:
Serialization is not stable across Polars versions
So it's probably not a recommended approach in general.
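If you go the JSON route anyway, one way to soften the version-stability problem is to make the rewrite fail loudly when it finds nothing to rename. Below is a minimal stdlib-only sketch, not something Polars provides. `suffix_ast` is a hypothetical helper, and it assumes the serialized layout shown above (`"Column"` holding a string, `"Alias"` holding `[inner_expr, name]`), which may well differ in other Polars versions.

```python
import json


def suffix_ast(ast_json: str, suffix: str) -> tuple[str, int]:
    """Suffix Column/Alias names in a serialized expression and count the edits.

    Hypothetical helper: assumes the JSON layout from the example above.
    """
    edits = 0

    def _add_suffix(obj):
        nonlocal edits
        # Only touch the shapes we expect; anything else is left alone.
        if isinstance(obj.get("Column"), str):
            obj["Column"] += suffix
            edits += 1
        alias = obj.get("Alias")
        if isinstance(alias, list) and alias and isinstance(alias[-1], str):
            alias[-1] += suffix
            edits += 1
        return obj

    new_ast = json.loads(ast_json, object_hook=_add_suffix)
    if edits == 0:
        # Nothing matched: the serialized format has probably changed.
        raise ValueError("no Column/Alias nodes found in serialized expression")
    return json.dumps(new_ast, separators=(",", ":")), edits


# The JSON string from the serialize() example above.
ast = '{"Alias":[{"Agg":{"Sum":{"BinaryExpr":{"left":{"Column":"A"},"op":"Multiply","right":{"Column":"B"}}}}},"AdotB"]}'
new_ast, n_edits = suffix_ast(ast, "_suffix")
print(n_edits)  # 3 (two columns and one alias)
```

The returned string could then be encoded and handed to `pl.Expr.deserialize(..., format="json")` the same way `suffix_all` does. The check only catches the case where nothing matched, so it is a guard against silent no-ops and does not guarantee the rewrite is correct.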