Why does Dataflow generate the following error when joining two streams where one has been windowed into sliding windows?
TypeError: Cannot convert GlobalWindow to apache_beam.utils.windowed_value._IntervalWindowBase [while running 'B/Map(_from_proto_str)-ptransform-24']
I have created a reproducible example below that works on DirectRunner, but produces the error on DataflowRunner.
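    from apache_beam import Flatten, Pipeline, WindowInto
    from apache_beam.io import ReadFromPubSub
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import SlidingWindows

    pipeline_options = PipelineOptions(pipeline_args, streaming=True, save_main_session=True)
    with Pipeline(options=pipeline_options) as pipeline:
        x = pipeline | "A" >> ReadFromPubSub(topicA) | WindowInto(SlidingWindows(2, 1))
        y = pipeline | "B" >> ReadFromPubSub(topicB)
        (x, y) | Flatten()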
Job ID: 2022-08-25_14_39_18-6045313873222423696
SDK Version: Apache Beam Python 3.9 SDK 2.40.0
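PCollections must have consistent windowing to be flattened. In your pipeline, x is in sliding windows while y is still in the default global window, which is why the runner fails when it tries to convert a GlobalWindow to an interval window. Applying the same WindowInto to y before the Flatten should resolve it. I'm not sure why the direct runner doesn't enforce this (though I bet it would give errors later on, e.g. if a GroupByKey was attempted on the flatten's output), but we should catch this at construction time. Filed https://github.com/apache/beam/issues/22903 to make this more explicit.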