python, dataframe, vectorization, python-polars

Operation on all columns of a type in modern Polars


I have a piece of code that works in Polars 0.20.19, but I don't know how to make it work in Polars 1.10.

The working code (in Polars 0.20.19) is very similar to the following:

import polars as pl

def format_all_string_fields_polars() -> pl.Expr:
  return (
      pl.when(
          (pl.col(pl.Utf8).str.strip().str.lengths() == 0) | # ERROR ON THIS LINE
          (pl.col(pl.Utf8) == "NULL")
      )
      .then(None)
      .otherwise(pl.col(pl.Utf8).str.strip())
      .keep_name()
  )

df.with_columns(format_all_string_fields_polars())

I have replaced pl.Utf8 with pl.String, but I still get the same error:

AttributeError: 'ExprStringNameSpace' object has no attribute 'strip'

The function should apply the when/then/otherwise expression to every string column of the DataFrame, replacing each column under its original name, while still returning all columns (including the non-string ones).

How do I convert this function to a working piece of code in Polars 1.10?


Solution

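The error comes from the renamed string methods, not from pl.Utf8 (which is still accepted as an alias of pl.String). These methods were renamed during the 0.19 releases, and the old names no longer exist in 1.x:

  • str.strip() → str.strip_chars()
  • str.lengths() → str.len_bytes() (str.len_chars() counts characters instead; for a zero-length check either works)
  • keep_name() → name.keep()

import polars as pl

def format_all_string_fields_polars() -> pl.Expr:
    return (
        pl.when(
            # Empty after stripping whitespace, or the literal string "NULL"
            (pl.col(pl.String).str.strip_chars().str.len_chars() == 0) |
            (pl.col(pl.String) == "NULL")
        )
        .then(None)
        .otherwise(pl.col(pl.String).str.strip_chars())
        # Keep each column's original name instead of "literal"
        .name.keep()
    )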