Let's say we have a piece of code such as:
Arrays.stream(queries)
      .limit(queries.length - 1)
      .mapToInt(i -> i)
      .sum();
Where queries is an array of N integers. To make the question concrete, let's say it's 1 million integers, so the array will take up ~4 MB (1M * 4 bytes per integer).
Will the stream take up any considerable space on top of that? Or will we use roughly 4 MB and stream through the array without re-allocating the entire array to run the code above (not counting the space needed to run the JVM itself)?
The answer is:
[A] an implementation detail. The Java spec simply isn't going to tell you, and therefore any exact answer needs to be caveated with '... on this hardware, this OS, this VM implementation, this version, under these circumstances'. However...
[B] Whatever the answer is, it is both 'pretty fast / not much space', and it is definitely 'not dependent on the value of N'.
The 'Stream' in Stream is not chosen for fun: the stream API does, in fact, stream. It does not take the entire array, then make a new object containing all values ready for streaming, then have limit make yet another new giant array (one size smaller), and then have mapToInt make yet another. That is not how it works.
A Stream is a pipeline. Nothing happens until you run a terminal operation (sum is terminal). You can check this:
Arrays.stream(queries).mapToInt(i -> {
    System.out.println(i);
    return i.intValue();
});
This doesn't print anything. At all. Because this is just a half-baked stream pipeline: there is no terminal operation, so nothing 'flows'.
If you invoke sum on the above, THEN the prints start happening. Specifically, the terminal operation (sum(), here) starts 'pulling values from the stream'. This travels up: sum asks mapToInt for a value, and to do this, mapToInt asks limit for a value (and will then take that value, roll it through the i -> i lambda, and provide it to sum). limit will then ask Arrays.stream for a value, and that will then actually read a single item from the array. There ARE intermediate tracker objects involved, but their size doesn't depend on N. For example, the object returned by Arrays.stream(queries) holds a reference to the queries array (about 64 bits of data regardless of how large that array is; it's just a pointer) and an int that tracks where we're at.[1]
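To see this pull-per-element behavior, the question's pipeline can be instrumented with a counter. (The counter, method, and class names here are just illustration, not anything the JDK provides; the stream calls themselves are the real java.util.stream API.)

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class PullDemo {
    // Hypothetical instrumentation: counts how often the mapToInt lambda
    // actually runs, i.e. how many values were pulled through it.
    static int calls = 0;

    static int sumAllButLast(Integer[] queries) {
        return Arrays.stream(queries)
                .limit(queries.length - 1)
                .mapToInt(i -> { calls++; return i; }) // runs once per pulled value
                .sum();
    }

    public static void main(String[] args) {
        Integer[] queries = {1, 2, 3, 4, 5};
        System.out.println(sumAllButLast(queries)); // 1+2+3+4 = 10
        System.out.println(calls);                  // the lambda ran once per element that made it past limit
    }
}
```

Each value travels through the whole pipeline individually; at no point does any stage hold a copy of the array.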
The object representing the limit part of it all just has a single int that tracks how many values have been provided so far. limit acts as if there are no more values to provide when either the thing it is pulling from runs out, or limit items have already been provided, whichever occurs sooner.
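That 'whichever occurs sooner' rule can be sketched in a few lines (the class and method names are made up for illustration; IntStream.of, iterate, limit, and count are the real API):

```java
import java.util.stream.IntStream;

public class LimitDemo {
    // The source runs out first: only 2 values exist, and limit(5) doesn't mind.
    static long shortSource() {
        return IntStream.of(1, 2).limit(5).count();
    }

    // The limit is reached first: the source is endless, but only 3 values are pulled.
    static long shortLimit() {
        return IntStream.iterate(0, i -> i + 1).limit(3).count();
    }

    public static void main(String[] args) {
        System.out.println(shortSource()); // 2
        System.out.println(shortLimit());  // 3
    }
}
```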
And so on. Thus, exactly how big those tracker objects are is an implementation detail, but they are 'small' (at least relative to a million-int array!) and do not depend on the size of the stream. In fact, literally infinite streams can exist, no problem. And they do: check the API of Stream itself, where you can rather easily make a stream that returns an infinite number of 1 values, for example.
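For instance, IntStream.generate (a real java.util.stream factory; the wrapper class and method name here are just illustration) builds exactly that endless stream of 1s, and you can still sum a slice of it in constant space:

```java
import java.util.stream.IntStream;

public class InfiniteDemo {
    // Sums the first n values of an infinite stream of 1s. No value is ever
    // stored: each 1 is produced on demand, added to the running sum, and
    // discarded, so memory use does not depend on n.
    static int sumOfOnes(int n) {
        return IntStream.generate(() -> 1).limit(n).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfOnes(1_000_000)); // 1000000
    }
}
```

Drop the limit and the same pipeline would simply never terminate, but it still would not run out of memory.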
[1] I'm oversimplifying. Streams also have the property that, depending on a few circumstances, they can be parallelized. When you involve counters, parallelizing becomes very difficult, so these trackers are a bit more complicated. If you want the full details, look at Spliterator and StreamSupport. But this oversimplified explanation is sufficient for understanding that no amount of intermediate stream ops is going to put you at risk of running out of memory.