javabenchmarkingjava-stream

Why does IntStream.range(0, 100000).parallel().foreach take longer than normal for loop


I am just starting to learn about the Streams and parallel in Java and I was wondering why a normal for loop takes less time than IntStream paralleled at adding items to an array.

package parallel;

import java.util.stream.IntStream;

public class Parallel {

    public static void main(String[] args) {
         final int[] intArray = new int[100000];
        long startTime = System.currentTimeMillis(); 
        IntStream.range(0, 100000).parallel().forEach(i ->  intArray[i]=i);
        long endTime = System.currentTimeMillis();
        System.out.println("Parallel time: " + (endTime-startTime));
        final int[] intArray2 = new int[100000];
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        startTime = System.currentTimeMillis();
        for(int i = 0; i < 100000; i++){
            intArray2[i] = i;
        }
        endTime = System.currentTimeMillis();
        System.out.println("Non parallel time: " + (endTime-startTime));
    }
}

Getting results like this.

Parallel time: 110

Non parallel time: 7


Solution

  • The operation you perform for each element is very simple, it's just an assignment, which is very fast. In the parallel version, you have a lot of overhead by starting the multiple threads that handle the operations. This alone will likely already take longer than what the very simple operation takes when applied non-parallel.

    Also, in the non-parallel version, the values are written very linearly to the array, which CPU architecture already has single-threat/single-core optimizations for quite awhile, as do compilers and intermediary compilers (which convert code like C to assembly). In the parallel version though, you'll might get conflicts as each thread tries to write to the same array (although on different positions, but probably still on the same cache line), and as several threads access different parts of the array, you might also get cache misses which slow things down.

    With a more expensive operation, the overhead of the parallel version becomes smaller compared to the total costs, which will in the end result in faster execution than the non-parallel case.