Is there an equivalent of this pandas functionality in PySpark?
pandasDataFrame.rolling('2s', min_periods=1).sum()
where the column in question has timestamps like this:
2013-01-01 09:00:00 0.0
2013-01-01 09:00:02 1.0
2013-01-01 09:00:05 3.0
(Documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html)
:
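Use the window function in Spark. Note that F.window assigns each row to a fixed, non-overlapping time bucket (a tumbling window), so you still need to group by the resulting column and aggregate to get per-bucket sums: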
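from pyspark.sql import functions as F

# Assign each row to a fixed 2-second bucket based on the "tmst" timestamp column.
df = df.withColumn(
    "window",
    F.window("tmst", "2 seconds"),  # the interval string must be spelled "seconds"
)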