Some posts online, such as "Selecting with Indexing is an anti-pattern in Polars: How to parse and transform (select/filter?) a CSV that seems to require so?", suggest that indexing like df["a"] is discouraged in favor of df.get_column("a"). But the reference guide they link to is gone now: https://pola-rs.github.io/polars-book/user-guide/howcani/selecting_data/selecting_data_indexing.html#selecting-with-indexing
So is it discouraged? And what about the pandas-like attribute syntax pl.col.a instead of pl.col("a")?
For reference: you can find an old version of that page through the Wayback Machine. Keep in mind that such pages may be outdated and may contain information that is no longer relevant or even correct.
pl.col.name works the same as pl.col("name"), so you can use either based on your personal preference. However, for the same reason you'll see df["col"] used over df.col in pandas, you must use pl.col("column name") if the column name is not a valid Python attribute name (e.g. it contains spaces), so some prefer to always use col() for consistency.
df["column"] is directly equivalent to df.get_column("column") (in fact, it calls get_column under the hood), so again, which one you use is a matter of personal preference. Ideally, however, you should not use either of them in the first place.
To get the most out of Polars, you should use the Lazy API as much as you can. You can use neither lf['col'] nor lf.get_column() in lazy mode.
Instead of extracting a series and then applying operations to it, first define an expression that applies those operations, then pass it to select() or filter(). Remember that you can and should compose and chain expressions.