I am trying to calculate e=∑(3−4k^2/(2k+1)!); k=0..10000
However I got stuck and can't get a desired performance boost using multithreading.
Given a number of threads, I tried to divide the whole sum to k / numberOfThreads
chunks and submit futures for each partial sum.
I think the bad part might be the factorial calculation or the granularity. I tried with a smaller step, but didn't get a big improvement. Maybe a different approach is needed.
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<BigDecimal>> futures = new ArrayList<>(numberOfThreads);
int step = k / numberOfThreads ;
BigDecimal result = BigDecimal.ZERO;
for (int j = 0; j <= k; j += step) {
Future<BigDecimal> future = executor.submit(new EulerCalculator(j, j + step));
futures.add(future);
}
for (Future<BigDecimal> future : futures) {
result = result.add(future.get());
}
public class EulerCalculator implements Callable<BigDecimal> {
private int start;
private int end;
public BigDecimal call() {
long numerator = 3 - 4 * start * start;
BigDecimal denominator = factorial(2 * start + 1);
BigDecimal partialSum = BigDecimal.valueOf(numerator)
.divide(denominator, 1000, RoundingMode.HALF_EVEN);
for (int i = start + 1 ; i < end; i++) {
numerator = 3 - 4 * i * i;
denominator = denominator.multiply(BigDecimal.valueOf(2 * i * (2*i + 1)));
partialSum = partialSum.add(BigDecimal.valueOf(numerator)
.divide(fact, 1000, RoundingMode.HALF_EVEN));
}
return partialSum;
}
private BigDecimal factorial(int cur) {
BigDecimal fact = BigDecimal.ONE;
for (int i = 2; i <= cur; i++) {
fact = fact.multiply(BigDecimal.valueOf(i));
}
return fact;
}
}
Best results from a few runs on a quad-core:
k = 10000
threads = 1: 345ms
threads = 2: 216ms
threads = 4: 184ms
threads = 8: 225ms
Your factorial part is not a constant time operation, but O(n). This means your first thread will have significantly less work than the last thread. Therefore you are not distributing work evenly.
There are generally three methods to tackle this.
You can make uneven step, i.e. larger step for smaller k. This is highly inefficient though, as you are making same multiplication for thousands of times.
You can try to switch to an approximate algorithm to calculate factorial to bring it to constant time. For small k you can use iteration to prevent loss of precision though, as the penalty will be low and there aren't many small k anyway.
Another way is to build a big array holding all factorial that may be used in calculation, which must be run before you submit any task. This caching method lose less precision. See comment below on how to parallelize this process.