I am working with a large dataframe (198,619 rows x 19,110 columns), so I am using the polars package to read in the TSV file. Pandas just takes too long.
However, I now face an issue: I want to transform each cell's value x by raising 2 to that power, i.e. 2^x.
I run the following line as an example:
df_copy = df
df_copy[:,1] = 2**df[:,1]
But I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/tmp/pbs.98503.hn-10-03/ipykernel_196334/3484346087.py in <module>
1 df_copy = df
----> 2 df_copy[:,1] = 2**df[:,1]
~/.local/lib/python3.9/site-packages/polars/internals/frame.py in __setitem__(self, key, value)
1845
1846 # dispatch to __setitem__ of Series to do modification
-> 1847 s[row_selection] = value
1848
1849 # now find the location to place series
~/.local/lib/python3.9/site-packages/polars/internals/series.py in __setitem__(self, key, value)
512 self.__setitem__([key], value)
513 else:
--> 514 raise ValueError(f'cannot use "{key}" for indexing')
515
516 def estimated_size(self) -> int:
ValueError: cannot use "slice(None, None, None)" for indexing
This should be simple but I can't figure it out as I'm new to Polars.
The secret to harnessing the speed and flexibility of Polars is to learn to use Expressions. As such, you'll want to avoid Pandas-style indexing methods.
Let's start with this data:
import polars as pl
nbr_rows = 4
nbr_cols = 5
df = pl.DataFrame({
"col_" + str(col_nbr): pl.int_range(col_nbr, nbr_rows + col_nbr, eager=True)
for col_nbr in range(0, nbr_cols)
})
df
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 0 ┆ 1 ┆ 2 ┆ 3 ┆ 4 │
│ 1 ┆ 2 ┆ 3 ┆ 4 ┆ 5 │
│ 2 ┆ 3 ┆ 4 ┆ 5 ┆ 6 │
│ 3 ┆ 4 ┆ 5 ┆ 6 ┆ 7 │
└───────┴───────┴───────┴───────┴───────┘
In Polars we would express your calculations as:
df_copy = df.select(pl.lit(2).pow(pl.all()).name.keep())
print(df_copy)
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 1.0 ┆ 2.0 ┆ 4.0 ┆ 8.0 ┆ 16.0 │
│ 2.0 ┆ 4.0 ┆ 8.0 ┆ 16.0 ┆ 32.0 │
│ 4.0 ┆ 8.0 ┆ 16.0 ┆ 32.0 ┆ 64.0 │
│ 8.0 ┆ 16.0 ┆ 32.0 ┆ 64.0 ┆ 128.0 │
└───────┴───────┴───────┴───────┴───────┘