I want to replace negative float values in a polars DataFrame with NaN, and I use this code:
x_origin.select(pl.col(pl.Float64), pl.when(pl.col(pl.Float64)<0).then(np.nan).otherwise(pl.col(pl.Float64)))
But it crashed with this:
Traceback (most recent call last):
File "/home/wangyang1/.local/share/JetBrains/IntelliJIdea2024.3/python/helpers-pro/pydevd_asyncio/pydevd_asyncio_utils.py", line 117, in _exec_async_code
result = func()
^^^^^^
File "<input>", line 1, in <module>
File "/home/wangyang1/.conda/envs/torchhydro1/lib/python3.11/site-packages/polars/dataframe/frame.py", line 9113, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/wangyang1/.conda/envs/torchhydro1/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2029, in collect
return wrap_df(ldf.collect(callback))
^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.DuplicateError: the name 'literal' is duplicate
It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.
But I don't want to rename these columns.
How can I solve this? I haven't found an answer in the polars documentation.
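The problem is that .then(np.nan) gives the result the name of the literal, which is 'literal'.
Since pl.col(pl.Float64) expands to one expression per float column, this produces multiple results that all share the name 'literal':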
pl.select(np.nan)
shape: (1, 1)
┌─────────┐
│ literal │ # <- name 'literal'
│ ---     │
│ f64     │
╞═════════╡
│ NaN     │
└─────────┘
.name.keep()
can be appended to the expression so that each result takes the name of its original column instead.
df.with_columns(
pl.when(pl.col(pl.Float64)<0).then(np.nan).otherwise(pl.col(pl.Float64))
.name.keep()
)