hazelcast hazelcast-jet

Running a streaming operation ONLY on the node which contains the relevant KEY


Let's say I have a large IStreamMap on a large cluster and I only want to run an operation on a few keys. I could just write a filter expression as shown below, but my understanding is that this will run on all nodes, so 99% of the nodes will be forced to stream the map even though ultimately nothing comes out of it. Is there a way to get the Hazelcast Jet cluster to run the operation ONLY on the nodes that own those keys? The code below ought to work, but I don't think it's efficient. (In my case, I might be running this operation many times on large distributed maps, so I would not want every node to execute it when I can tell ahead of time that 99% of the nodes are not relevant to the selected keys.)

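final IStreamMap<String, Integer> streamMap = instance1.getMap("source");
// stream of entries; each entry exposes its key via getKey()
streamMap.stream()
         // keys are Strings, so compare with equals() rather than == or =
         .filter(entry -> "1".equals(entry.getKey()) || "9999999".equals(entry.getKey()))
         .forEach(entry -> { /* do something interesting */ });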

Solution

  • As far as I recall, IStreamMap was removed from Hazelcast Jet about three years ago. You should use Jet through its Pipeline API instead.

    You can try using a map source with a predicate:

    Pipeline p = Pipeline.create();
    BatchStage<Entry<K, V>> stage = p.readFrom(Sources.map("name", 
            (Map.Entry<K, V> mapEntry) -> myCondition(mapEntry), 
            e -> e));
    

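    This will still scan the entire map, though: every member evaluates the predicate against its local entries. If you simply have a set of keys you're interested in, then IMap.executeOnKeys() is perhaps a better match for your use case, since it runs an entry processor only on the members that own those keys.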
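
To illustrate why a key-based call can skip most of the cluster, here is a minimal toy model of key routing. It uses only the JDK and is not Hazelcast code. The class name `KeyRoutingSketch`, the use of `String.hashCode()` as the partition hash, and the round-robin assignment of partitions to members are all assumptions made for illustration; Hazelcast hashes the serialized key and manages partition ownership itself. Only the default partition count of 271 is taken from Hazelcast.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeyRoutingSketch {
    // Hazelcast's default partition count; everything else here is a stand-in.
    public static final int PARTITION_COUNT = 271;

    // Assumption: String.hashCode() replaces Hazelcast's real hash of the serialized key.
    public static int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Assumption: partitions are spread round-robin over members numbered 0..memberCount-1.
    public static int ownerOf(int partitionId, int memberCount) {
        return partitionId % memberCount;
    }

    // Groups the requested keys by the member that owns each key's partition.
    // Only members that appear in the result would need to do any work.
    public static Map<Integer, Set<String>> groupKeysByOwner(Collection<String> keys, int memberCount) {
        Map<Integer, Set<String>> byOwner = new TreeMap<>();
        for (String key : keys) {
            int owner = ownerOf(partitionOf(key), memberCount);
            byOwner.computeIfAbsent(owner, m -> new TreeSet<>()).add(key);
        }
        return byOwner;
    }

    public static void main(String[] args) {
        int memberCount = 100;
        Map<Integer, Set<String>> plan = groupKeysByOwner(List.of("1", "9999999"), memberCount);
        System.out.println("members contacted: " + plan.size() + " of " + memberCount);
        plan.forEach((member, keys) -> System.out.println("member " + member + " -> " + keys));
    }
}
```

With two keys, at most two of the 100 modeled members appear in the plan, whereas a filter over the whole map asks all of them to scan their local data. On a real cluster the routing is done for you: IMap.executeOnKeys() takes the set of keys and an EntryProcessor, and you never compute partitions by hand.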