apache-flinkflink-streamingflink-table-api

what state/s does flink create when using the tableApi to join multiple tables?


I have multiple streams that I want to join (A to B, B to C , C to D...) to create one Z
when using the table api and joining 3 tables
select * from A inner join B on a.pk_id = b.fk_id inner join C on b.pk_id = c.fk_id
what is/are the underlying state/s looks like?
the keys are different from each source, if it is running in parallel. does Flink reshuffle the data?


Solution

  • You can figure this out by looking at the job graph in the web UI. There's a shuffle being done everywhere you see a HASH connection.

    This information is also included in the output of EXPLAIN <query>, but that's harder to grok (look for Exchange).