I have just started converting some stock-trading Python code from pandas to Polars. I am a complete newbie to Polars and not much better at Python, but I get by. I am migrating for the performance gains. The following is my pandas-based ATR function:
def ATR(df: pa.DataFrame, window_size: int = 14) -> pa.DataFrame:
    high, low, prev_close = df['high'], df['low'], df['close'].shift()
    tr_all = [high - low, high - prev_close, low - prev_close]
    tr_all = [tr.abs() for tr in tr_all]
    tr = pa.concat(tr_all, axis = 1).max(axis = 1)
    df['ATR'] = tr.ewm(alpha = 1/window_size, min_periods = window_size, adjust = False, ignore_na = True).mean()
    return df
I call this function like this:
raw_dd_df = ATR(raw_dd_df, window_size = slowline_period)
and it produces results similar to the following:
date open high low ... volume vwap mid ATR
300 2024-01-01 1.27300 1.27330 1.26936 ... 0 1.27189 1.27300 NaN
299 2024-01-02 1.27291 1.27597 1.26105 ... 242445 1.26779 1.26707 NaN
298 2024-01-03 1.26123 1.26765 1.26123 ... 296035 1.26414 1.26384 NaN
297 2024-01-04 1.26644 1.27295 1.26565 ... 270883 1.26830 1.26730 NaN
296 2024-01-05 1.26816 1.27710 1.26113 ... 333038 1.26949 1.26987 NaN
.. ... ... ... ... ... ... ... ... ...
4 2024-12-26 1.25296 1.25474 1.25005 ... 245898 1.25241 1.25242 0.009308
3 2024-12-27 1.25187 1.25925 1.25046 ... 234639 1.25464 1.25443 0.009282
2 2024-12-29 1.25756 1.25756 1.25756 ... 0 1.25756 1.25756 0.008846
1 2024-12-30 1.25726 1.26070 1.25059 ... 243089 1.25587 1.25610 0.008910
0 2024-12-31 1.25493 1.25688 1.25048 ... 227765 1.25336 1.25303 0.008784
The following is my attempt to rewrite the function to use Polars:
def pl_ATR(df: pl.DataFrame, window_size: int = 14) -> pl.DataFrame:
    high, low, prev_close = df['High'], df['Low'], df['Close'].shift()
    tr_all = [high - low, high - prev_close, low - prev_close]
    tr_all = [tr.abs() for tr in tr_all]
    tr = pl.concat(tr_all, rechunk = True).max()
    df['ATR'] = pl.Expr.ewm_mean(tr, alpha = 1/window_size, min_samples = window_size, adjust = False, ignore_nulls = True)
    return df
I am not sure if I have got tr = pl.concat(tr_all, rechunk = True).max() right, but I get errors at the next line (when trying to call pl.Expr.ewm_mean):
Traceback (most recent call last):
File "/home/stuart/Projects/Python/Trading/Scratches/Polars/demo_polars.py", line 48, in <module>
df = pl_ATR(df, window_size = 20)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/stuart/Projects/Python/Trading/Scratches/Polars/demo_polars.py", line 37, in pl_ATR
df['ATR'] = pl.Expr.ewm_mean(tr, alpha = 1/window_size, min_samples = window_size, adjust = False, ignore_nulls = True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/stuart/Projects/Python/Env/lib/python3.12/site-packages/polars/_utils/deprecation.py", line 119, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/stuart/Projects/Python/Env/lib/python3.12/site-packages/polars/expr/expr.py", line 9488, in ewm_mean
return self._from_pyexpr(
^^^^^^^^^^^^^^^^^
AttributeError: 'float' object has no attribute '_from_pyexpr'
I have spent hours trying to get it working, but to no avail. Can anybody help please?
Regards, Stuart
Let's start with some data:
from datetime import date
import pandas as pd
import polars as pl
data = {
    "date": pl.date_range(date(2025, 4, 1), date(2025, 4, 6), eager=True),
    "high": [102, 104, 103, 107, 110, 112],
    "low": [98, 99, 100, 103, 106, 108],
    "close": [101, 103, 102, 106, 109, 111],
}
df_pl = pl.DataFrame(data)
df_pd = df_pl.to_pandas()
# shape: (6, 4)
# ┌────────────┬──────┬─────┬───────┐
# │ date ┆ high ┆ low ┆ close │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ date ┆ i64 ┆ i64 ┆ i64 │
# ╞════════════╪══════╪═════╪═══════╡
# │ 2025-04-01 ┆ 102 ┆ 98 ┆ 101 │
# │ 2025-04-02 ┆ 104 ┆ 99 ┆ 103 │
# │ 2025-04-03 ┆ 103 ┆ 100 ┆ 102 │
# │ 2025-04-04 ┆ 107 ┆ 103 ┆ 106 │
# │ 2025-04-05 ┆ 110 ┆ 106 ┆ 109 │
# │ 2025-04-06 ┆ 112 ┆ 108 ┆ 111 │
# └────────────┴──────┴─────┴───────┘
Calling your (pandas) ATR function will give us an output to check against, using a window size of 3 for simplicity.
ATR(df_pd, window_size=3)
# date high low close ATR
# 0 2025-04-01 102 98 101 NaN
# 1 2025-04-02 104 99 103 NaN
# 2 2025-04-03 103 100 102 3.888889
# 3 2025-04-04 107 103 106 4.259259
# 4 2025-04-05 110 106 109 4.172840
# 5 2025-04-06 112 108 111 4.115226
With Polars, when you see polars.Expr.ewm_mean in the docs, that means you call the ewm_mean method on an expression (e.g., my_expr.ewm_mean(...)). Expressions can be created with pl.col("some_column") and are also returned when you combine expressions with each other (e.g., pl.col("high") - pl.col("low")). In practice, that means you need your "true range" calculation as an expression. Read the expressions and contexts section of the user guide for further details.
As a general rule in Polars, prefer expressions over accessing columns as Series (df["some_column"]).
With that said, here is a Polars solution. Not dissimilar to your pandas code, just using expressions.
def pl_ATR(
    high: str | pl.Expr = "high",
    low: str | pl.Expr = "low",
    close: str | pl.Expr = "close",
    *,
    window_size: int = 14,
) -> pl.Expr:
    # If caller passed in strings for column names, convert them to expressions
    # Calling `pl.col("column_name")` gives us a reference to a column
    # and is of type expression (pl.Expr)
    # This isn't required, just makes things nice for the caller
    # and allows different column names to be passed
    if isinstance(high, str): high = pl.col(high)
    if isinstance(low, str): low = pl.col(low)
    if isinstance(close, str): close = pl.col(close)

    # Define the previous close as an expression given we will re-use it
    prev_close = close.shift()

    # `max_horizontal` is like `max(axis=1)` in pandas - it operates horizontally
    # In Polars, horizontal operations generally have a dedicated function
    # rather than an `axis` parameter
    true_range = pl.max_horizontal(
        high - low,  # abs() isn't needed here as this will never be negative
        (high - prev_close).abs(),
        (low - prev_close).abs(),
    )

    # We also return an expression, that can be evaluated in a context
    # (read the user guide link if confused)
    return true_range.ewm_mean(
        alpha=1 / window_size,
        min_samples=window_size,
        adjust=False,
        ignore_nulls=True,
    )
Now using this function, we can see it matches the pandas output!
df_pl.with_columns(ATR=pl_ATR(window_size=3))
# shape: (6, 5)
# ┌────────────┬──────┬─────┬───────┬──────────┐
# │ date ┆ high ┆ low ┆ close ┆ ATR │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ date ┆ i64 ┆ i64 ┆ i64 ┆ f64 │
# ╞════════════╪══════╪═════╪═══════╪══════════╡
# │ 2025-04-01 ┆ 102 ┆ 98 ┆ 101 ┆ null │
# │ 2025-04-02 ┆ 104 ┆ 99 ┆ 103 ┆ null │
# │ 2025-04-03 ┆ 103 ┆ 100 ┆ 102 ┆ 3.888889 │
# │ 2025-04-04 ┆ 107 ┆ 103 ┆ 106 ┆ 4.259259 │
# │ 2025-04-05 ┆ 110 ┆ 106 ┆ 109 ┆ 4.17284 │
# │ 2025-04-06 ┆ 112 ┆ 108 ┆ 111 ┆ 4.115226 │
# └────────────┴──────┴─────┴───────┴──────────┘
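If you want to convince yourself where those numbers come from, the calculation can be reproduced by hand. This is a plain-Python sketch, not how Polars implements it: it assumes the recursive form used with adjust=False (y_0 = x_0, then y_t = (1 - alpha) * y_{t-1} + alpha * x_t) and that the first window_size - 1 results are masked. The variable names are mine.

```python
# Sample data from above
high = [102, 104, 103, 107, 110, 112]
low = [98, 99, 100, 103, 106, 108]
close = [101, 103, 102, 106, 109, 111]
window_size = 3
alpha = 1 / window_size

# True range per row; the first row has no previous close, so only high - low applies
true_range = []
for i in range(len(high)):
    candidates = [high[i] - low[i]]
    if i > 0:
        candidates += [abs(high[i] - close[i - 1]), abs(low[i] - close[i - 1])]
    true_range.append(max(candidates))

# Recursive exponentially weighted mean (the adjust=False form)
smoothed = []
for x in true_range:
    smoothed.append(x if not smoothed else (1 - alpha) * smoothed[-1] + alpha * x)

# Hide results until `window_size` observations have been seen
atr = [y if i + 1 >= window_size else None for i, y in enumerate(smoothed)]

print(true_range)  # [4, 5, 3, 5, 4, 4]
print([None if y is None else round(y, 6) for y in atr])
# [None, None, 3.888889, 4.259259, 4.17284, 4.115226]
```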