I get an OOM in Java 19 even with 8GB if I do this:
IntStream
.iterate(0, i -> i + 1)
.skip(2)
.limit(10_000_000)
.filter(i -> checkSum(i) <= 20)
.parallel()
.count();
However, I don't get any OOM if I omit skip(2):
IntStream
.iterate(0, i -> i + 1)
//.skip(2)
.limit(10_000_000)
.filter(i -> checkSum(i) <= 20)
.parallel()
.count();
where checksum(...) is
public static long checkSum(long n) {
long result = 0;
long remaining = n;
while (0 < remaining) {
long remainder = remaining % 10;
result += remainder;
remaining = (remaining - remainder) / 10;
}
return result;
}
Why does skip() in this parallel stream expression in Java 19 cause an OOM even with 8GB?
I know I should use range(...) instead of iterate()+limit() with or without skip(). However, that doesn't answer me this question. I would like to understand what's the issue here.
skip()
- is a stateful operation which guarantees that n
first elements of the stream (with respect of the encounter order, if the stream is ordered) would be omitted.
It would be cheap in a sequential pipeline, but might be costful while running in parallel if the stream is ordered. Documentation warns about that and suggests loosening the constraint ordering if possible.
API Note:
While
skip()
is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values ofn
, sinceskip(n)
is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such asgenerate(Supplier)
) or removing the ordering constraint withBaseStream.unordered()
may result in significant speedups ofskip()
in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization withskip()
in parallel pipelines, switching to sequential execution withBaseStream.sequential()
may improve performance.
Emphasis added
The following unordered Stream would run without issues (result of the stream execution might not be consistent because in unordered stream threads a free to discard any n
elements, not n
first):
IntStream
.iterate(0, i -> i + 1)
.unordered()
.skip(2)
.limit(10_000_000)
.filter(i -> checkSum(i) <= 20)
.parallel()
.count();