Tags: java, parallel-processing, java-stream, short-circuiting

Can `Stream.allMatch()` call the predicate multiple times for the same element?


I'm trying to implement short-circuiting processing for an external input of type java.util.stream.Stream (think Stream.forEach(), but with short-circuiting). I do not care about the order of the elements, but unless processing is short-circuited, their identity and count are important: every element must be processed exactly once. I already have a solution based on Stream.spliterator() and a while loop; however, it is not parallelized and is hard to read. I've received an unverified hint to use allMatch() to short-circuit the operation instead. The resulting code could then look like this:

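    // isCancelled is assumed to be a volatile boolean field; send(...) is the downstream consumer
    <T> void process(Stream<T> input) {
        input.allMatch(element -> {
            if (isCancelled) {
                return false; // allMatch stops once the predicate returns false
            }
            send(element); // should receive all elements, in any order, exactly once unless cancelled
            return true;
        });
    }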
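For comparison, the existing sequential solution the question describes (Stream.spliterator() plus a while loop) might look roughly like this. It is only a sketch: the cancelled flag and the send consumer are hypothetical stand-ins for the question's isCancelled and send(). Because tryAdvance() hands over at most one element per call, nothing is delivered twice, but all work happens on the calling thread, which is the lack of parallelism the question mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class SequentialProcess {
    // Hypothetical stand-in for the question's isCancelled flag.
    static final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Pulls one element at a time from the stream's spliterator and stops
    // as soon as cancellation is observed or the source is exhausted.
    static <T> void process(Stream<T> input, Consumer<? super T> send) {
        Spliterator<T> spliterator = input.spliterator();
        while (!cancelled.get()) {
            // tryAdvance passes at most one element to `send`
            // and returns false once no elements remain.
            if (!spliterator.tryAdvance(send)) {
                break;
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> received = new ArrayList<>();
        process(Stream.of(1, 2, 3, 4), received::add);
        System.out.println(received); // prints [1, 2, 3, 4]
    }
}
```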

Will allMatch() ever apply its predicate argument to the same element multiple times?

I see that the following test does not fail on my machine. Can I rely on this behavior?

    @Test
    public void allMatch() {
        AtomicLong count = new AtomicLong(0);
        IntStream.range(0, Integer.MAX_VALUE).parallel().mapToObj(ignored -> new Object()).allMatch(o -> {
            count.incrementAndGet(); // count every predicate invocation
            return true;
        });
        Assert.assertEquals(Integer.MAX_VALUE, count.get()); // expected value first
    }
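A count-based test cannot tell "every element exactly once" apart from "some elements twice and others never", since both can produce the same total. A stricter check, sketched here with boxed Integer values instead of fresh Objects and a smaller range (both assumptions), records each value in a concurrent set. The predicate is deliberately stateful, so this is a diagnostic rather than a pattern to copy, and a passing run still only describes one JVM on one machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ExactlyOnceCheck {
    // Returns true if, in this particular run, every value in [0, n)
    // reached the predicate exactly once.
    static boolean eachElementSeenOnce(int n) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        // Set.add returns false for a value already present, so a repeat
        // makes the predicate fail and allMatch return false.
        boolean noRepeats = IntStream.range(0, n)
                .parallel()
                .boxed()
                .allMatch(seen::add);
        // No repeats plus n distinct entries also rules out skipped elements.
        return noRepeats && seen.size() == n;
    }

    public static void main(String[] args) {
        System.out.println(eachElementSeenOnce(1_000_000));
    }
}
```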

In my view, given the nature of the underlying Spliterator.trySplit(), the allMatch() approach is reliable, but I'd like some confirmation, and a prognosis for future Java versions.
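The reliance on trySplit() rests on its documented contract: the Spliterator Javadoc states that the spliterator returned by trySplit() covers elements that, upon return, are no longer covered by the original, so the two parts never overlap. A small sketch that splits once and drains both parts shows this; the helper name splitOnce is made up for illustration. Note that this is a property of the spliterators themselves, not a promise about how many times a stream operation invokes a predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Splits the source's spliterator once and drains both parts separately.
    static <T> List<List<T>> splitOnce(List<T> source) {
        Spliterator<T> rest = source.spliterator();
        // trySplit() may return null if the spliterator declines to split.
        Spliterator<T> prefix = rest.trySplit();
        List<T> fromPrefix = new ArrayList<>();
        List<T> fromRest = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(fromPrefix::add);
        }
        rest.forEachRemaining(fromRest::add);
        List<List<T>> parts = new ArrayList<>();
        parts.add(fromPrefix);
        parts.add(fromRest);
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7));
        // Together, the two parts contain each source element exactly once.
        System.out.println(splitOnce(source));
    }
}
```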

For reference, the actual processing I'm trying to simplify is PushSpliterator.parallel().


Solution

    Yes.

    No.

    It depends.

    The spec

    Stream is a spec. What you're asking about is behaviour. The problem is this: if the spec does not guarantee a certain behaviour and you write code that relies on it anyway, any Java update, or even a different Java implementation, may break your app, and it is your fault. A bug report about it would be closed as WONTFIX/WORKS_AS_INTENDED, and rightly so.

    The rule, then, is simple: follow the spec. Which means you have asked the wrong question.

    Let's look at the spec (specifically, the 'Parameters' section of allMatch):

    predicate - a non-interfering, stateless predicate to apply to elements of this stream

    So your question is immaterial: the answer must not matter to your code. Clearly you want it to matter, and therefore the answer is yes: you need to write code that stays correct even if the predicate is invoked multiple times for the same element.


    The current implementation

    No, never. No implementation I know of does it today. But code that relies on this needs a whole page of HERE BE DRAGONS, DANGERZONE!!-style commentary, plus a bevy of tests that must be rerun every time you switch hardware or JVM implementation, and even then you have no actual guarantee.

    Multithreaded behaviour tends to be non-deterministic: each run can behave differently because it depends on the sequencing of threads, about which the JVM explicitly makes no guarantees. It is very easy to write code that is broken and nevertheless runs correctly every single time, forever, on the specific hardware/JVM combination you are testing it on today, and then breaks next week during the big demo.

    TL;DR: You're using streams wrong. If you want to shove side effects into your stream ops, don't use streams.
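    Taking that advice literally, one stream-free shape for the question's problem is to hand disjoint index ranges to an ExecutorService and have each task check a shared cancellation flag before every element. This is a rough sketch under stated assumptions: the input has already been materialized as a List (the question starts from a Stream), cancelled and send are hypothetical stand-ins for the question's isCancelled and send(), and send must be thread-safe. Because every index belongs to exactly one task, the "each element at most once, exactly once if never cancelled" property follows from the code itself rather than from unspecified library behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class ChunkedProcess {
    // Hypothetical cancellation flag shared by all worker threads.
    static final AtomicBoolean cancelled = new AtomicBoolean(false);

    // Sends each element at most once, and exactly once if never cancelled.
    // `send` must be thread-safe; `threads` must be at least 1.
    static <T> void process(List<T> input, Consumer<? super T> send, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int size = input.size();
            int chunk = (size + threads - 1) / threads; // ceiling division
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < size; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, size);
                // Each task owns the disjoint index range [from, to).
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to && !cancelled.get(); i++) {
                        send.accept(input.get(i));
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for the task and surface any exception it threw
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

    Cancellation here is cooperative: tasks notice the flag between elements, so an element whose send() is already in progress when the flag flips still completes.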