
Does Stream.generate(s).limit(n) guarantee exactly n calls to the generator function s, or is there a preferred alternative?


I have a source of data that I know has n elements, which I can access by repeatedly calling a method on an object. For the sake of example, let's call it myReader.read(). I want to create a stream of data containing those n elements. Let's also say that I don't want to call the read() method more times than the amount of data I want to return, as it will throw an exception (e.g. NoSuchElementException) if the method is called after the end of the data is reached.

I know I can create this stream by using the IntStream.range method, and mapping each element using the read method. However, this feels a little weird since I'm completely ignoring the int values in the stream (I'm really just using it to produce a stream with exactly n elements).

Stream<String> myStream =
        IntStream.range(0, n).mapToObj(i -> myReader.read());

An approach I've considered is using Stream.generate(supplier) followed by Stream.limit(maxSize). Based on my understanding of the limit function, this feels like it should work.

Stream<String> myStream = Stream.generate(myReader::read).limit(n);

However, nowhere in the API documentation do I see an indication that the Stream.limit() method guarantees that exactly maxSize elements are generated by the stream it's called on. It seems conceivable that a stream implementation could be allowed to call the generator function more than n times, so long as the end result contained only the results of the first n calls and the API contract for a short-circuiting intermediate operation was met.

Stream.limit JavaDocs

Returns a stream consisting of the elements of this stream, truncated to be no longer than maxSize in length. This is a short-circuiting stateful intermediate operation.

Stream operations and pipelines documentation

An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. [...] Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.

Is it safe to rely on Stream.generate(generator).limit(n) only making n calls to the underlying generator? If so, is there some documentation of this fact that I'm missing?

And to avoid the XY Problem: what is the idiomatic way of creating a stream by performing an operation exactly n times?


Solution

  • Stream.generate creates an unordered Stream. This implies that the subsequent limit operation is not required to use the first n elements, as there is no “first” when there’s no order; it may select any n elements. The implementation may exploit this freedom, e.g. for higher parallel processing performance.

    The following code

    IntSummaryStatistics s =
        Stream.generate(new AtomicInteger()::incrementAndGet)
            .parallel()
            .limit(100_000)
            .collect(Collectors.summarizingInt(Integer::intValue));
    
    System.out.println(s);
    

    prints something like

    IntSummaryStatistics{count=100000, sum=5000070273, min=1, average=50000,702730, max=100207}
    

    on my machine, though the max value may vary. It demonstrates that the Stream has selected exactly 100000 elements, as required, but not the elements from 1 to 100000. Since the generator produces strictly ascending numbers, it’s clear that it has been called more than 100000 times to produce a number higher than that.

    Another example

    System.out.println(
        Stream.generate(new AtomicInteger()::incrementAndGet)
            .parallel()
            .map(String::valueOf)
            .limit(10)
            .collect(Collectors.toList())
    );
    

    prints something like this on my machine (JDK-14)

    [4, 8, 5, 6, 10, 3, 7, 1, 9, 11]
    

    With JDK-8, it even prints something like

    [4, 14, 18, 24, 30, 37, 42, 52, 59, 66]
    

    If a construct like

    IntStream.range(0, n).mapToObj(i -> myReader.read())
    

    feels weird due to the unused i parameter, you may use

    Collections.nCopies(n, myReader).stream().map(TypeOfMyReader::read)
    

    instead. This doesn’t show an unused int parameter and works equally well; in fact, it’s internally implemented as IntStream.range(0, n).mapToObj(i -> element). There is no way around some counter, visible or hidden, to ensure that the method is called exactly n times. Note that, since read is likely a stateful operation, the result will always behave like an unordered stream once parallel processing is enabled. Still, the IntStream and nCopies approaches create a finite stream that never invokes the method more than the specified number of times.
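
To check the call counts discussed above, here is a minimal sketch. CountingReader is a hypothetical stand-in for myReader (it is not part of any library): it counts its read() calls and throws once it is asked for more than it holds. The class and method names are made up for illustration. The two finite approaches are run sequentially, and the parallel generate/limit pipeline is included only to report how often the generator was called.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ExactReadCount {

    /** Hypothetical stand-in for myReader: counts calls, fails when read past the end. */
    public static final class CountingReader {
        private final int size;
        private final AtomicInteger calls = new AtomicInteger();

        public CountingReader(int size) {
            this.size = size;
        }

        public String read() {
            int call = calls.incrementAndGet();
            if (call > size) {
                throw new NoSuchElementException("read past end, call #" + call);
            }
            return "item-" + call;
        }

        public int calls() {
            return calls.get();
        }
    }

    /** Finite source with a visible counter: one read() per int in the range. */
    public static List<String> readViaRange(CountingReader reader, int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> reader.read())
                .collect(Collectors.toList());
    }

    /** Finite source with a hidden counter: n references to the same reader. */
    public static List<String> readViaNCopies(CountingReader reader, int n) {
        return Collections.nCopies(n, reader).stream()
                .map(CountingReader::read)
                .collect(Collectors.toList());
    }

    /** Returns how often the generator was invoked by a parallel generate/limit pipeline. */
    public static int generatorCallsInParallel(int n) {
        AtomicInteger calls = new AtomicInteger();
        Stream.generate(calls::incrementAndGet)
                .parallel()
                .limit(n)
                .collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        int n = 5;

        CountingReader a = new CountingReader(n);
        System.out.println(readViaRange(a, n) + " calls=" + a.calls());

        CountingReader b = new CountingReader(n);
        System.out.println(readViaNCopies(b, n) + " calls=" + b.calls());

        // At least 100000; may be higher, and may differ between runs and JDK versions.
        System.out.println("generator calls: " + generatorCallsInParallel(100_000));
    }
}
```

With the two finite variants, the reader's call count equals n, so the NoSuchElementException is never triggered. The generator count from the parallel pipeline is only bounded from below by n.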