javamultithreadingmathparallel-processingeulers-number

Improve performance of parallel calculation of euler number


I am trying to calculate e=∑(3−4k^2/(2k+1)!); k=0..10000 However I got stuck and can't get a desired performance boost using multithreading.

Given a number of threads, I tried to divide the whole sum to k / numberOfThreads chunks and submit futures for each partial sum. I think the bad part might be the factorial calculation or the granularity. I tried with a smaller step, but didn't get a big improvement. Maybe a different approach is needed.

ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
int step = k / numberOfThreads ;
BigDecimal result = BigDecimal.ZERO;
for (int j = 0; j <= k; j += step) {
    Future<BigDecimal> future = executor.submit(new EulerCalculator(j, j + step));
    futures.add(future);
}
for (Future<BigDecimal> future : futures) {
    result = result.add(future.get());
}
public class EulerCalculator implements Callable<BigDecimal> {
    private int start;
    private int end;

    public BigDecimal call() {
        long numerator = 3 - 4 * start * start;
        BigDecimal denominator = factorial(2 * start + 1);
        BigDecimal partialSum = BigDecimal.valueOf(numerator)
                                .divide(denominator, 1000, RoundingMode.HALF_EVEN);
        for (int i = start + 1 ; i < end; i++) {
            numerator = 3 - 4 * i * i;
            denominator = denominator.multiply(BigDecimal.valueOf(2 * i * (2*i + 1)));
            partialSum = partialSum.add(BigDecimal.valueOf(numerator)
                                        .divide(fact, 1000, RoundingMode.HALF_EVEN));
        }

        return partialSum;
    }

    private BigDecimal factorial(int cur) {
        BigDecimal fact = BigDecimal.ONE;
        for (int i = 2; i <= cur; i++) {
            fact = fact.multiply(BigDecimal.valueOf(i));
        }

        return fact;
    }
}

Best results from a few runs on a quad-core:

k = 10000

threads = 1: 345ms

threads = 2: 216ms

threads = 4: 184ms

threads = 8: 225ms


Solution

  • Your factorial part is not a constant time operation, but O(n). This means your first thread will have significantly less work than the last thread. Therefore you are not distributing work evenly.

    There are generally three methods to tackle this.

    You can make uneven step, i.e. larger step for smaller k. This is highly inefficient though, as you are making same multiplication for thousands of times.

    You can try to switch to an approximate algorithm to calculate factorial to bring it to constant time. For small k you can use iteration to prevent loss of precision though, as the penalty will be low and there aren't many small k anyway.

    Another way is to build a big array holding all factorial that may be used in calculation, which must be run before you submit any task. This caching method lose less precision. See comment below on how to parallelize this process.