I have a source of data that I know has n elements, which I can access by repeatedly calling a method on an object. For the sake of example, let's call it myReader.read(). I want to create a stream of data containing those n elements. Let's also say that I don't want to call the read() method more times than the number of elements I want to return, as it will throw an exception (e.g. NoSuchElementException) if it is called after the end of the data is reached.
I know I can create this stream by using the IntStream.range method, and mapping each element using the read method. However, this feels a little weird since I'm completely ignoring the int values in the stream (I'm really just using it to produce a stream with exactly n elements).
Stream<String> myStream =
IntStream.range(0, n).mapToObj(i -> myReader.read());
An approach I've considered is using Stream.generate(supplier) followed by Stream.limit(maxSize). Based on my understanding of the limit function, this feels like it should work.
Stream<String> myStream = Stream.generate(myReader::read).limit(n);
However, nowhere in the API documentation do I see an indication that the Stream.limit() method guarantees that exactly maxSize elements are generated by the stream it's called on. It seems conceivable that a stream implementation could be allowed to call the generator function more than n times, so long as the end result was just the results of the first n calls, and so long as it meets the API contract for being a short-circuiting intermediate operation.
Returns a stream consisting of the elements of this stream, truncated to be no longer than maxSize in length. This is a short-circuiting stateful intermediate operation.
An intermediate operation is short-circuiting if, when presented with infinite input, it may produce a finite stream as a result. [...] Having a short-circuiting operation in the pipeline is a necessary, but not sufficient, condition for the processing of an infinite stream to terminate normally in finite time.
Is it safe to rely on Stream.generate(generator).limit(n) only making n calls to the underlying generator? If so, is there some documentation of this fact that I'm missing?
And to avoid the XY Problem: what is the idiomatic way of creating a stream by performing an operation exactly n times?
Stream.generate creates an unordered Stream. This implies that the subsequent limit operation is not required to use the first n elements, as there is no “first” when there’s no order, but may select any n elements. The implementation may exploit this permission, e.g. for higher parallel processing performance.
The following code
IntSummaryStatistics s =
Stream.generate(new AtomicInteger()::incrementAndGet)
.parallel()
.limit(100_000)
.collect(Collectors.summarizingInt(Integer::intValue));
System.out.println(s);
prints something like
IntSummaryStatistics{count=100000, sum=5000070273, min=1, average=50000,702730, max=100207}
on my machine, though the max number may vary. It demonstrates that the Stream has selected exactly 100000 elements, as required, but not the elements from 1 to 100000. Since the generator produces strictly ascending numbers, it’s clear that it has been called more than 100000 times to produce numbers higher than that.
Another example
System.out.println(
Stream.generate(new AtomicInteger()::incrementAndGet)
.parallel()
.map(String::valueOf)
.limit(10)
.collect(Collectors.toList())
);
prints something like this on my machine (JDK-14)
[4, 8, 5, 6, 10, 3, 7, 1, 9, 11]
With JDK-8, it even prints something like
[4, 14, 18, 24, 30, 37, 42, 52, 59, 66]
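The over-invocation can also be made visible by counting the generator calls directly. The following is only a sketch: the class name GeneratorCallCount and the method countGeneratorCalls are made up for this illustration, and the exact count depends on the JDK version, the machine, and the run.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GeneratorCallCount {

    /**
     * Runs Stream.generate(...).parallel().limit(limit) and returns how many
     * times the generator had been invoked when the terminal operation returned.
     * (Hypothetical helper, not part of any library.)
     */
    static int countGeneratorCalls(int limit) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> result = Stream.generate(calls::incrementAndGet)
                .parallel()
                .limit(limit)
                .collect(Collectors.toList());
        // limit() still guarantees the size of the result ...
        if (result.size() != limit) {
            throw new IllegalStateException("unexpected size " + result.size());
        }
        // ... but not the number of generator invocations.
        return calls.get();
    }

    public static void main(String[] args) {
        int limit = 100_000;
        int calls = countGeneratorCalls(limit);
        // The call count is at least the limit and may well be larger.
        System.out.println("elements: " + limit + ", generator calls: " + calls);
    }
}
```

If read() throws once the data is exhausted, any surplus call of this kind is enough to make the pipeline fail.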
If a construct like
IntStream.range(0, n).mapToObj(i -> myReader.read())
feels weird due to the unused i parameter, you may use
Collections.nCopies(n, myReader).stream().map(TypeOfMyReader::read)
instead. This doesn’t show an unused int parameter and works equally well, as in fact, it’s internally implemented as IntStream.range(0, n).mapToObj(i -> element). There is no way around some counter, visible or hidden, to ensure that the method will be called n times. Note that, since read is likely a stateful operation, the resulting behavior will always be like an unordered stream when enabling parallel processing, but the IntStream and nCopies approaches create a finite stream that will never invoke the method more than the specified number of times.
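As a minimal sketch of both finite variants, the following uses a made-up CountingReader as a stand-in for myReader (the class, its calls() accessor, and the "item-" values are assumptions for illustration only). It throws NoSuchElementException when read past the end, so any extra call would surface as a failure. The streams are kept sequential here, since the reader is not thread-safe.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ExactCallsDemo {

    /** Hypothetical reader: yields "item-0" .. "item-(size-1)", then throws. */
    static final class CountingReader {
        private final int size;
        private int calls; // not thread-safe; sequential use only

        CountingReader(int size) {
            this.size = size;
        }

        String read() {
            if (calls >= size) {
                throw new NoSuchElementException("read() called after end of data");
            }
            return "item-" + calls++;
        }

        int calls() {
            return calls;
        }
    }

    /** Calls read() once per index of the range; the index itself is ignored. */
    static List<String> readWithRange(CountingReader reader, int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> reader.read())
                .collect(Collectors.toList());
    }

    /** Same effect, with the counter hidden inside the nCopies list. */
    static List<String> readWithNCopies(CountingReader reader, int n) {
        return Collections.nCopies(n, reader).stream()
                .map(CountingReader::read)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CountingReader first = new CountingReader(3);
        System.out.println(readWithRange(first, 3));    // [item-0, item-1, item-2]
        System.out.println(first.calls());              // 3

        CountingReader second = new CountingReader(3);
        System.out.println(readWithNCopies(second, 3)); // [item-0, item-1, item-2]
        System.out.println(second.calls());             // 3
    }
}
```

Both helpers finish without an exception even though the reader holds exactly n elements, which is the property the generate/limit combination does not promise.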