In Polars / pandas I can do a rolling sum where row each row the window is (row - 10 minutes, row]
. For example:
import polars as pl
data = {
"timestamp": [
"2023-08-04 10:00:00",
"2023-08-04 10:05:00",
"2023-08-04 10:10:00",
"2023-08-04 10:10:00",
"2023-08-04 10:20:00",
"2023-08-04 10:20:00",
],
"value": [1, 2, 3, 4, 5, 6],
}
df = pl.DataFrame(data).with_columns(pl.col("timestamp").str.strptime(pl.Datetime))
print(
df.with_columns(pl.col("value").rolling_sum_by("timestamp", "10m", closed="right"))
)
This outputs
shape: (6, 2)
┌─────────────────────┬───────┐
│ timestamp ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═══════╡
│ 2023-08-04 10:00:00 ┆ 1 │
│ 2023-08-04 10:05:00 ┆ 3 │
│ 2023-08-04 10:10:00 ┆ 9 │
│ 2023-08-04 10:10:00 ┆ 9 │
│ 2023-08-04 10:20:00 ┆ 11 │
│ 2023-08-04 10:20:00 ┆ 11 │
└─────────────────────┴───────┘
How can I do this in DuckDB? Closest I could come up with is:
rel = duckdb.sql("""
SELECT
timestamp,
value,
SUM(value) OVER roll AS rolling_sum
FROM df
WINDOW roll AS (
ORDER BY timestamp
RANGE BETWEEN INTERVAL 10 minutes PRECEDING AND CURRENT ROW
)
ORDER BY timestamp;
""")
print(rel)
but that makes the window [row - 10 minutes, row]
, not (row - 10 minutes, row]
Alternatively, I could do
rel = duckdb.sql("""
SELECT
timestamp,
value,
SUM(value) OVER roll AS rolling_sum
FROM df
WINDOW roll AS (
ORDER BY timestamp
RANGE BETWEEN INTERVAL '10 minutes' - INTERVAL '1 microsecond' PRECEDING AND CURRENT ROW
)
ORDER BY timestamp;
""")
but I'm not sure about how robust that'd be?
Maybe not particularly neat, but from the top of my head you could exclude the rows which are exactly 10 minutes back by additional window clause
import duckdb
rel = duckdb.sql("""
SELECT
timestamp,
value,
SUM(value) OVER roll - coalesce(SUM(value) OVER exclude, 0) AS rolling_sum
FROM df
WINDOW roll AS (
ORDER BY timestamp
RANGE BETWEEN INTERVAL 10 minutes PRECEDING AND CURRENT ROW
), exclude AS (
ORDER BY timestamp
RANGE BETWEEN INTERVAL 10 minutes PRECEDING AND INTERVAL 10 minutes PRECEDING
)
ORDER BY timestamp;
""")
print(rel)
┌─────────────────────┬───────┬─────────────┐
│ timestamp │ value │ rolling_sum │
│ timestamp │ int64 │ int128 │
├─────────────────────┼───────┼─────────────┤
│ 2023-08-04 10:00:00 │ 1 │ 1 │
│ 2023-08-04 10:05:00 │ 2 │ 3 │
│ 2023-08-04 10:10:00 │ 3 │ 9 │
│ 2023-08-04 10:10:00 │ 4 │ 9 │
│ 2023-08-04 10:20:00 │ 5 │ 11 │
│ 2023-08-04 10:20:00 │ 6 │ 11 │
└─────────────────────┴───────┴─────────────┘