hazelcast, hazelcast-imap

Hazelcast Predicates API with multiple filters


We are currently evaluating Hazelcast for our needs (and the possibility of going with a license). In one of the test set-ups, we have an IMap whose values look like this:

class Market {

    boolean useable;
    long idA;
    long idB;
    long idC;

}

And we are trying to issue a query on it using the Predicates API:

Predicate<Long, Market> predicate = Predicates.and(
        Predicates.equal("useable",true),
        Predicates.in("idA", idsA.toArray(Long[]::new)),
        Predicates.in("idB", idsB.toArray(Long[]::new)),
        Predicates.in("idC", idsC.toArray(Long[]::new)));
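For intuition, the predicate above selects the same entries as this plain-Java filter (a self-contained sketch with hypothetical sample data; no Hazelcast dependency, and the `Market` record here only mirrors the value class from the question):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PredicateSemantics {

    // Mirrors the Market value class from the question.
    public record Market(boolean useable, long idA, long idB, long idC) {}

    // Plain-Java equivalent of:
    // Predicates.and(equal("useable", true), in("idA", ...), in("idB", ...), in("idC", ...))
    public static List<Market> filter(List<Market> markets,
                                      Set<Long> idsA, Set<Long> idsB, Set<Long> idsC) {
        return markets.stream()
                .filter(m -> m.useable()
                        && idsA.contains(m.idA())
                        && idsB.contains(m.idB())
                        && idsC.contains(m.idC()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Market> markets = List.of(
                new Market(true, 1L, 10L, 100L),
                new Market(false, 1L, 10L, 100L),  // rejected: not useable
                new Market(true, 2L, 10L, 100L),
                new Market(true, 9L, 10L, 100L)); // rejected: idA not in set

        List<Market> matched = filter(markets, Set.of(1L, 2L), Set.of(10L), Set.of(100L));
        System.out.println(matched.size()); // prints 2
    }
}
```

On the cluster, the difference is only where this filtering runs (on each member, ideally backed by indexes) and that each candidate value may need to be deserialized to be evaluated or returned.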

These idsA, idsB, and idsC collections each hold very few values (5 at most), and there are around 500_000 entries in this map, spread across 3 nodes (we use the Compact serializer). We have tried several index configurations for these attributes; here is one with single-attribute hash indexes:

        - type: HASH
          attributes:
            - useable
          
        - type: HASH
          attributes:
            - idA

        - type: HASH
          attributes:
            - idB
   
        - type: HASH
          attributes:
            - idC

Another configuration we tried:

        - type: HASH
          attributes:
            - useable

        - type: SORTED
          attributes:
            - idA
            - idB
            - idC

While we do see in the Management Center that these indexes are hit, responses are slow, so we tried enabling:

config.setProperty(ClusterProperty.QUERY_PREDICATE_PARALLEL_EVALUATION.getName(), "true");

and:

hazelcast:
  executor-service:
    "hz:query":
      pool-size: 64

But neither brought any significant performance improvement. A simple:

wrk -t50 -c50 -d60s "http://localhost:8080/markets"

shows:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.06s   350.08ms   1.98s    69.28%

An average latency of 1 second is far too high.

Before we go into profiling: is our choice of indexes correct?


Solution

  • We captured a few flame graphs with async-profiler, and the indexes are doing fine for the amount of data we have. We have also enabled these:

      @Bean
      HazelcastConfigCustomizer predicatesParallelExecutor() {
        return hzInstanceConfig -> {
          hzInstanceConfig.setProperty(ClusterProperty.QUERY_PREDICATE_PARALLEL_EVALUATION.getName(), "true");
          hzInstanceConfig.setProperty(ClusterProperty.CLIENT_ENGINE_THREAD_COUNT.getName(), "50");
          hzInstanceConfig.setProperty(ClusterProperty.CLIENT_ENGINE_QUERY_THREAD_COUNT.getName(), "50");
        };
      }
    

    Still, the results are poor: the profiles show we are dominated by Compact deserialization.
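Since whole-value Compact deserialization is the cost, one option worth trying is a projection, which extracts only the requested attributes on the members instead of materializing full Market objects on the caller (a sketch against the Hazelcast 5 API; the map name, the returned attribute, and the method shape are assumptions, not the setup from the question):

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import com.hazelcast.projection.Projections;
import com.hazelcast.query.Predicate;
import com.hazelcast.query.Predicates;

import java.util.Collection;

public class MarketQuery {

    // Returns only the idA attribute of matching entries instead of full Market
    // values, so the caller never deserializes whole Compact Market objects.
    public static Collection<Long> matchingIdAs(HazelcastInstance hz,
                                                Long[] idsA, Long[] idsB, Long[] idsC) {
        IMap<Long, Market> markets = hz.getMap("markets"); // map name is an assumption

        Predicate<Long, Market> predicate = Predicates.and(
                Predicates.equal("useable", true),
                Predicates.in("idA", idsA),
                Predicates.in("idB", idsB),
                Predicates.in("idC", idsC));

        return markets.project(Projections.singleAttribute("idA"), predicate);
    }
}
```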