apache-flink flink-sql

Flink SQL produces a chain of joins, which causes state to bloat


I have this SQL:

select `from all tables`
FROM table1 i
    LEFT JOIN table2 ip
    ON i.tenantId = ip.tenantId AND i.id = ip.id
    LEFT JOIN table2 t
    ON i.tenantId = t.tenantId AND i.id = t.id
    LEFT JOIN table3 iap
    ON i.tenantId = iap.tenantId AND i.id = iap.id
    LEFT JOIN table4 ia
    ON i.tenantId = ia.tenantId AND i.id = ia.id
    LEFT JOIN table5 iae
    ON i.tenantId = iae.tenantId AND i.id = iae.id
    LEFT JOIN table6 iss
    ON i.tenantId = iss.tenantId AND i.id = iss.id
    LEFT JOIN table7 io
    ON i.tenantId = io.tenantId AND i.id = io.id
    LEFT JOIN table8 ipa
    ON i.tenantId = ipa.tenantId AND i.id = ipa.id
    LEFT JOIN table9 iara
    ON i.tenantId = iara.tenantId AND i.id = iara.id

This results in a chain of joins that looks like this: [image]

With this plan, every join holds the output of the previous join plus the newly joined table. As a result the state bloats, because the same data is stored in the state of multiple joins. I am using Flink 1.19.1. Is there a way to avoid this?
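To make the duplication concrete, here is a rough back-of-the-envelope model in Python. It is not Flink code, and the function names are hypothetical. It assumes one row per key in each table and that every binary join keeps both of its inputs in state, with the left input of each join being the already-joined wide row. It counts how many copies of source rows are held per key.

```python
def chained_join_state(num_inputs: int) -> int:
    """Source-row copies held per key by a left-deep chain of binary joins.

    Assumption: join number k keeps its left input (a wide row already
    spanning k source tables) and its right input (1 source table).
    """
    total = 0
    for k in range(1, num_inputs):
        left_width = k    # accumulated result of the previous joins
        right_width = 1   # the table being joined in at this step
        total += left_width + right_width
    return total


def multiway_join_state(num_inputs: int) -> int:
    """Source-row copies per key if one operator stores each input once."""
    return num_inputs


if __name__ == "__main__":
    n = 10  # table1 plus the nine LEFT JOIN inputs in the query above
    print("chained binary joins:", chained_join_state(n))
    print("single multi-way join:", multiway_join_state(n))
```

Under these assumptions the ten-input query keeps 54 row copies per key across the chain, versus 10 if each input were stored only once. Real numbers depend on row widths, how many rows match per key, and state TTL.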


Solution

  • For now, I think you're stuck with this behavior. However, in a future release, FLIP-516 will add the capability for Flink SQL to join many tables in a single operator.