
Sharing Beam State across different DoFns


Is Beam State shared across different DoFns?

Let's say I have two stateful DoFns, StatefulDoFn1 and StatefulDoFn2, and this pipeline in pseudocode:

pipeline = readInput ... applyDoFn(StatefulDoFn1) ... map { do something else } ... applyDoFn(StatefulDoFn2)

If I declare myState identically in both stateful DoFns, will what I write in StatefulDoFn1 be visible to StatefulDoFn2? We implemented a pipeline on the assumption that the answer is yes, but it seems to be no.


Solution

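  • No. State is local to each stateful DoFn, and within that DoFn it is further scoped to each key (and to each window, if you are using windowing). Declaring myState identically in both DoFns therefore gives you two independent pieces of state, not one shared one.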
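
To make the scoping concrete, here is a toy model in plain Python. It is only a sketch of the idea and does not use the Beam API: `ToyStateBackend`, `stateful_dofn_1`, `stateful_dofn_2`, and the key and window names are all hypothetical. The assumption it illustrates is that a state cell is addressed by the transform, the key, the window, and the state name together, so the same name in two DoFns points at two different cells.

```python
class ToyStateBackend:
    """Toy stand-in for a runner's state storage. Hypothetical, not a Beam API."""

    def __init__(self):
        self._cells = {}

    def read(self, transform, key, window, state_id):
        # Returns None when nothing has been written to this cell yet.
        return self._cells.get((transform, key, window, state_id))

    def write(self, transform, key, window, state_id, value):
        # The transform name is part of the address, not just the state name.
        self._cells[(transform, key, window, state_id)] = value


def stateful_dofn_1(backend, key, window, value):
    """Keeps a running sum per key and window in its own 'myState' cell."""
    total = (backend.read("StatefulDoFn1", key, window, "myState") or 0) + value
    backend.write("StatefulDoFn1", key, window, "myState", total)
    # Emitting the total is how a downstream step gets to see it.
    return key, total


def stateful_dofn_2(backend, key, window, upstream_total):
    """Declares 'myState' too, but it resolves to a different cell."""
    previous = backend.read("StatefulDoFn2", key, window, "myState")
    backend.write("StatefulDoFn2", key, window, "myState", upstream_total)
    return key, previous, upstream_total


backend = ToyStateBackend()
first = stateful_dofn_1(backend, "user-a", "window-0", 5)
second = stateful_dofn_2(backend, first[0], "window-0", first[1])
print(first)   # what StatefulDoFn1 emitted
print(second)  # the middle field is what StatefulDoFn2 found in its own myState
```

In the toy model, `stateful_dofn_2` finds its own `myState` empty even though `stateful_dofn_1` has already written to a cell with the same name. In a real pipeline the counterpart of the return value is the element StatefulDoFn1 outputs, so anything StatefulDoFn2 needs has to travel in that output (or reach it some other way, for example as a side input).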