python python-polars

How to replace negative values in a polars DataFrame?


I want to replace the negative float values in a polars DataFrame with NaN, and I use this code:

x_origin.select(pl.col(pl.Float64), pl.when(pl.col(pl.Float64)<0).then(np.nan).otherwise(pl.col(pl.Float64)))

But it fails with this error:

Traceback (most recent call last):
  File "/home/wangyang1/.local/share/JetBrains/IntelliJIdea2024.3/python/helpers-pro/pydevd_asyncio/pydevd_asyncio_utils.py", line 117, in _exec_async_code
    result = func()
             ^^^^^^
  File "<input>", line 1, in <module>
  File "/home/wangyang1/.conda/envs/torchhydro1/lib/python3.11/site-packages/polars/dataframe/frame.py", line 9113, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangyang1/.conda/envs/torchhydro1/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2029, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.DuplicateError: the name 'literal' is duplicate
It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.

But I don't want to rename these columns.

How can I solve this? I haven't found an answer in the polars documentation.


Solution

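  • The problem is that a when/then/otherwise expression takes its output name from the .then() branch, and .then(np.nan) is a literal, which is named 'literal'. pl.col(pl.Float64) expands to one expression per Float64 column, so every result ends up with that same name: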

    pl.select(np.nan)
    
    shape: (1, 1)
    ┌─────────┐
    │ literal │ # <- name 'literal'
    │ ---     │
    │ f64     │
    ╞═════════╡
    │ NaN     │
    └─────────┘
    

    Appending .name.keep() makes each result take the name of its original column instead. With with_columns, the results then replace the Float64 columns in place:

    df.with_columns(
        pl.when(pl.col(pl.Float64)<0).then(np.nan).otherwise(pl.col(pl.Float64))
          .name.keep()
    )