I am trying to implement affinity for sql queries with Apache Ignite. Let's imagine I have the following classes (the code is available at https://github.com/hostettler/distibuted-queries)
//I do not put here all the configuration, but it is available in my small test if interested
class EntityAKey {
@AffinityKeyMapped
String id;
}
class EntityBKey {
String id;
@AffinityKeyMapped
String entityAId;
}
...
String joinSql = "select count(1) from \"EntityA\".EntityA as a inner join \"EntityB\".EntityB as b on a.id = b.entityA ";
SqlFieldsQuery q1 = new SqlFieldsQuery(joinSql);
q1.setDistributedJoins(false);
FieldsQueryCursor<List<?>> c = cacheA.query(q1);
List<List<?>> l = c.getAll();
In this case (A <--- B), everything is fine I can even instruct the sql engine to work per group of partitions and I can set setDistributedJoins(false) to instruct the engine that it does not have to look at other partitions.
Now, that works as long as ALL the joins of the query are relying on exactly the SAME affinity. So assuming I am adding the classes C and D such as (C<---D) and A has a relationship to C such as
class EntityAKey {
@AffinityKeyMapped
String id;
}
class EntityA {
String id;
String entityC; //Id of entityC but THIS IS NOT PART OF THE KEY
}
class EntityBKey {
String id;
@AffinityKeyMapped
String entityAId;
}
class EntityCKey {
@AffinityKeyMapped
String id;
}
class EntityDKey {
String id;
@AffinityKeyMapped
String entityCId;
}
String joinSql = "select count(1) from \"EntityA\".EntityA as a "
+ " inner join \"EntityB\".EntityB as b on a.id = b.entityA"
+ " inner join \"EntityC\".EntityC as c on a.entityC = c.id"
+ " inner join \"EntityD\".EntityD as d on d.entityC = a.entityC";
SqlFieldsQuery q1 = new SqlFieldsQuery(joinSql);
q1.setDistributedJoins(false); //only works if setDistributedJoins(true)
q1.setPageSize(100);
FieldsQueryCursor<List<?>> c = cacheA.query(q1);
List<List<?>> l = c.getAll();
In that case the result incorrect (as mentionned in the documentation) simply because we have two different affinities -- on A(id) and on C(id) -- thus the join between A and B works fine and same between C and D but obviously not the whole query.
All that long introduction for the actual questions:
a) set setDistributedJoins(true) works just fine but my understanding is that in that case the query is not done on multiple threads (within a node) as it cannot rely on affinity to prune partitions. Is that understanding correct?
b) split the query in two different queries one from A->B and C->D with setDistributedJoins(false) and then reconcile the 2 queries with a third query with setDistributed(true). This is not practical for real life queries having 10-15 joins on potentially 3-4 different type of affinities
c) Changing the model so that there is the affinity everywhere by artificially having say entityA id in entityC. This is not practical because it means that cache.get will become much more cumbersome as I do not know the entityA id
Would you agree that the Apache Ignite optimizer should do that pruning for me and should apply affinity where possible?
Will that be solved by the adoption of Calcite?
Thanks a lot for your help.
In short, you can: