I have a series of timestamps (+ other data) that come from 2 separate streams of data ticking at different rates, an example below (NB: the frequency of the real data has some jitter so it's not a simple fixed stride like below)
src,idx,ts
B,1,20
A,1,100
A,2,200
A,3,300
B,2,320
A,4,400
A,5,500
A,6,600
B,3,620
for each A tick, I need to calculate the offset from the preceding B tick so it would become
src,idx,ts
A,1,80
A,2,180
A,3,280
A,4,80
A,5,180
A,6,280
how to do this in pandas without iteration?
I thought of some sort of rolling window but with a dynamic/criteria based window or some hybrid of merge_asof and group by but can't think of a way to do it.
You could group by changing B and subtract the first (B-row) from each group ts. Then maybe filter by not equal B to reproduce your desired final df:
import pandas as pd
df = pd.DataFrame(
{"src": ["B", "A", "A", "A", "B", "A", "A", "A", "B"],
"idx": [1, 1, 2, 3, 2, 4, 5, 6, 3],
"ts": [20, 100, 200, 300, 320, 400, 500, 600, 620]}
)
df["ts"] -= df.groupby(df.src.eq("B").cumsum())["ts"].transform("first")
df.query("src != 'B'")
More detail:
df.src.eq("B").cumsum()
gives a Series which increases by one each time a "B" is encountered. This is what we want to group the DataFrame into sections between subsequent "B"s. For each group between each B (inclusive) and the following B (exclusive), we subtract the ts
value at the first B position from all ts
values within the group, hence resetting to zero at each B.