python-polars

Serializing Polars expressions to a JSON or YAML file?


I am extremely happy with the polars expression syntax, so much so that a lot of my feature engineering is expressed in polars expressions.

However, I am now trying to move the feature engineering to JSON or YAML files (for MLOps reasons).

The question is: how could I encode this as a JSON file?


configuration = {
    'features': [
        pl.col('col1').fill_null(0).log().le(0.2).alias('feature1'),
        pl.col('col2').fill_null(0).log().le(0.2).alias('feature2'),
        pl.col('col3').fill_null(0).log().le(0.2).alias('feature3'),
    ],
    'filters': [
        pl.col('col4') >= 500_000,
        pl.col('col5').is_in(['A', 'B']),
    ],
}

# This is how I use it - just for context
X = (
    df
    .filter(pl.all(configuration['filters']))
    .select(configuration['features'])
)

Any ideas on how I could serialize (or re-write) this as JSON such that it could be converted back to Polars expressions?

Note that this question has a lot of overlap with "Possible to Stringize a Polars Expression?", but it's not a duplicate.


Solution

  • As of polars 0.18.1, we directly support serializing expressions to JSON and deserializing them back.

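    import polars as pl

    def test_expression_json() -> None:
        # create an expression
        e = pl.col("foo").sum().over("bar")

        # serialize to a JSON string (named json_str to avoid shadowing the stdlib json module)
        json_str = e.meta.serialize(format="json")

        # deserialize back to an expression
        round_tripped = pl.Expr.deserialize(json_str.encode(), format="json")

        # assert expression equality; `.meta ==` compares the expressions structurally,
        # whereas a plain `==` would build a new comparison expression
        assert round_tripped.meta == e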