I am extremely happy with the polars expression syntax, so much so that a lot of my feature engineering is expressed in polars expressions.
However, I am now trying to move the feature engineering to JSON or YAML files (for MLOps reasons).
The question is: how could I encode the following as a JSON file?
import polars as pl

configuration = {
    'features': [
        pl.col('col1').fill_null(0).log().le(0.2).alias('feature1'),
        pl.col('col2').fill_null(0).log().le(0.2).alias('feature2'),
        pl.col('col3').fill_null(0).log().le(0.2).alias('feature3')
    ],
    'filters': [
        pl.col('col4') >= 500_000,
        pl.col('col5').is_in(['A', 'B'])
    ]
}
# This is how I use it, just for context
X = (df
    .filter(pl.all_horizontal(configuration['filters']))  # AND all filter predicates row-wise
    .select(configuration['features'])
)
Any ideas on how I could serialize (or re-write) this as JSON such that it could be converted back to Polars expressions?
Note that this question overlaps considerably with "Possible to Stringize a Polars Expression?", but it is not a duplicate.
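As of polars 0.18.1, we directly support serializing and deserializing expressions to and from JSON. The example below uses the `meta.serialize` / `pl.Expr.deserialize` spelling from recent releases; check the API reference for your installed version.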
import io

import polars as pl


def test_expression_json() -> None:
    # create an expression
    e = pl.col("foo").sum().over("bar")
    # serialize to a JSON string
    json_str = e.meta.serialize(format="json")
    # deserialize back to an expression; deserialize expects a path or a
    # file-like object, so wrap the string in StringIO
    round_tripped = pl.Expr.deserialize(io.StringIO(json_str), format="json")
    # assert expression equality
    assert round_tripped.meta == e