python, java, arrays, performance, hashmap

Why is the performance bad when I use a HashMap to solve the Codility problem MinAbsSum?


Regarding the Codility problem MinAbsSum at https://app.codility.com/programmers/lessons/17-dynamic_programming/min_abs_sum/, the performance is bad when I use a HashMap to solve the problem. I expect the performance to be similar to that of an array. I prefer to use a HashMap because an array wastes a lot of memory if the number range is too large. Any idea why the HashMap is so much slower than the array?

import java.util.*;

public class MinAbsSumUsingHashMap {
    // Total score = 54%
    // Detected time complexity: O(N**2 * max(abs(A)))
    public int solution(int[] A) {
        // 1. Convert all numbers to absolute numbers
        // 2. Sum all the converted absolute numbers
        // 3. Count the occurrence of all the converted absolute numbers
        int sum = 0;
        Map<Integer, Integer> numberCountMap = new TreeMap<>();
        for (int num : A) {
            int absNum = Math.abs(num);
            sum += absNum;
            numberCountMap.merge(absNum, 1, Integer::sum);
        }

        AllPossibleSumNumbers allPossibleSumNumbers = new AllPossibleSumNumbers(sum);
        for (Map.Entry<Integer, Integer> entry : numberCountMap.entrySet()) {
            int absNum = entry.getKey();
            int count = entry.getValue();
            allPossibleSumNumbers.addPossibleSumNumber(absNum, count);
        }

        float halfSum = sum / 2f;
        int halfSumFloor = (int) Math.floor(halfSum);
        for (int bestPossibleHalfSum = halfSumFloor; bestPossibleHalfSum >= 0; bestPossibleHalfSum--) {
            if (allPossibleSumNumbers.isPossibleSumNumber(bestPossibleHalfSum)) {
                float halfDifference = halfSum - bestPossibleHalfSum;
                float fullDifference = halfDifference * 2;
                return (int) fullDifference;
            }
        }
        return -1; // Ideally, it will not return -1.
    }

    class AllPossibleSumNumbers {
        private final Map<Integer, Integer> allPossibleSumCounts = new HashMap<>();
        private final int expectedSum;

        public AllPossibleSumNumbers(int expectedSum) {
            this.expectedSum = expectedSum;

            // Sum 0 can be achieved without any number
            int possibleSumNumber = 0;
            int count = 0;
            allPossibleSumCounts.put(possibleSumNumber, count);
        }

        public void addPossibleSumNumber(int num, int count) {
            for (int possibleSum = 0; possibleSum <= expectedSum; possibleSum++) {
                if (isPossibleSumNumber(possibleSum)) {
                    // we can set like this because no value num is needed to obtain the possibleSum
                    allPossibleSumCounts.put(possibleSum, count);
                } else if (possibleSum >= num) {
                    // It is not a possible sum number, check the previous one
                    int prevPossibleSum = possibleSum - num;
                    Integer prevCount = allPossibleSumCounts.get(prevPossibleSum);
                    if (prevCount != null) {
                        // previous one is a possible sum number
                        int currCount = prevCount - 1; // currCount should be reduced by 1 because the num has been used once to achieve the current sum
                        if (currCount >= 0) {
                            // Is currCount >= 0 necessary? Got the same score after adding currCount >= 0 condition :(
                            allPossibleSumCounts.put(possibleSum, currCount);
                        }
                    }
                }
            }
        }

        public boolean isPossibleSumNumber(int num) {
            Integer count = allPossibleSumCounts.get(num);
            if (count == null) {
                return false;
            } else {
                return count >= 0;
            }
        }
    }
}

Similar code using an array achieves a 100% score.

import java.util.*;

public class MinAbsSumUsingArray {
    // Total score = 100%
    // Detected time complexity: O(N * max(abs(A))**2)
    public int solution(int[] A) {
        // 1. Convert all numbers to absolute numbers
        // 2. Sum all the converted absolute numbers
        // 3. Count the occurrence of all the converted absolute numbers
        int sum = 0;
        Map<Integer, Integer> numberCountMap = new TreeMap<>();
        for (int num : A) {
            int absNum = Math.abs(num);
            sum += absNum;
            numberCountMap.merge(absNum, 1, Integer::sum);
        }

        AllPossibleSumNumbers allPossibleSumNumbers = new AllPossibleSumNumbers(sum);
        for (Map.Entry<Integer, Integer> entry : numberCountMap.entrySet()) {
            int absNum = entry.getKey();
            int count = entry.getValue();
            allPossibleSumNumbers.addPossibleSumNumber(absNum, count);
        }

        float halfSum = sum / 2f;
        int halfSumFloor = (int) Math.floor(halfSum);
        for (int bestPossibleHalfSum = halfSumFloor; bestPossibleHalfSum >= 0; bestPossibleHalfSum--) {
            if (allPossibleSumNumbers.isPossibleSumNumber(bestPossibleHalfSum)) {
                float halfDifference = halfSum - bestPossibleHalfSum;
                float fullDifference = halfDifference * 2;
                return (int) fullDifference;
            }
        }
        return -1; // Ideally, it will not return -1.
    }

    class AllPossibleSumNumbers {
        private final int[] allPossibleSumCounts;

        public AllPossibleSumNumbers(int expectedSum) {
            allPossibleSumCounts = new int[expectedSum + 1];

            // first one is 0. the rest are -1
            for (int i = 1; i < allPossibleSumCounts.length; i++) {
                allPossibleSumCounts[i] = -1;
            }
        }

        public void addPossibleSumNumber(int num, int count) {
            for (int possibleSum = 0; possibleSum < allPossibleSumCounts.length; possibleSum++) {
                if (isPossibleSumNumber(possibleSum)) {
                    // we can set like this because no value num is needed to obtain the possibleSum
                    allPossibleSumCounts[possibleSum] = count;
                } else if (possibleSum >= num) {
                    // It is not a possible sum number, check the previous one
                    int prevPossibleSum = possibleSum - num;
                    int prevCount = allPossibleSumCounts[prevPossibleSum];
                    int currCount = prevCount - 1; // currCount should be reduced by 1 because the num has been used once to achieve the current sum
                    allPossibleSumCounts[possibleSum] = currCount;
                }
            }
        }

        public boolean isPossibleSumNumber(int num) {
            return allPossibleSumCounts[num] >= 0;
        }
    }
}

Interestingly, I just tried the same thing in Python, and both versions work fine. I assume the Python dictionary is equivalent to the Java HashMap. Please let me know if I am wrong.

def solution(A):
    array_length = len(A)
    # 1. Convert all numbers to absolute numbers
    for best_possible_half_sum in range(array_length):
        A[best_possible_half_sum] = abs(A[best_possible_half_sum])
    # 2. Sum all the converted absolute numbers
    abs_sum = sum(A)
    # 3. Count the occurrence of all the converted absolute numbers
    num_counts = {}
    for num in A:
        if num in num_counts:
            num_counts[num] += 1
        else:
            num_counts[num] = 1

    # Sum 0 can be achieved without any number
    all_possible_sum_counts = {0: 0}

    for num, count in num_counts.items():
        for possible_sum in range(abs_sum):
            if possible_sum in all_possible_sum_counts and all_possible_sum_counts[possible_sum] >= 0:
                # we can set like this because no value num is needed to obtain the possible_sum
                all_possible_sum_counts[possible_sum] = count
            elif (possible_sum >= num):
                # It is not a possible sum number, check the previous one
                prev_possible_sum = possible_sum - num
                if prev_possible_sum in all_possible_sum_counts:
                    # previous one is a possible sum number
                    prev_count = all_possible_sum_counts[prev_possible_sum]
                    curr_count = prev_count - 1 # currCount should be reduced by 1 because the num has been used once to achieve the current sum
                    all_possible_sum_counts[possible_sum] = curr_count
    
    half_sum = abs_sum / 2
    half_sum_floor = abs_sum // 2
    for best_possible_half_sum in range(half_sum_floor, -1, -1):
        if best_possible_half_sum in all_possible_sum_counts and all_possible_sum_counts[best_possible_half_sum] >= 0:
            half_difference = half_sum - best_possible_half_sum
            full_difference = half_difference * 2
            return int(full_difference)

Below is the array (list) version.

def solution(A):
    array_length = len(A)
    # 1. Convert all numbers to absolute numbers
    for best_possible_half_sum in range(array_length):
        A[best_possible_half_sum] = abs(A[best_possible_half_sum])
    # 2. Sum all the converted absolute numbers
    abs_sum = sum(A)
    # 3. Count the occurrence of all the converted absolute numbers
    num_counts = {}
    for num in A:
        if num in num_counts:
            num_counts[num] += 1
        else:
            num_counts[num] = 1

    # first one is 0. the rest are -1
    all_possible_sum_counts = [-1] * (abs_sum + 1)
    all_possible_sum_counts[0] = 0

    for num, count in num_counts.items():
        for possible_sum in range(abs_sum):
            if all_possible_sum_counts[possible_sum] >= 0:
                # we can set like this because no value num is needed to obtain the possible_sum
                all_possible_sum_counts[possible_sum] = count
            elif (possible_sum >= num):
                # It is not a possible sum number, check the previous one
                prev_possible_sum = possible_sum - num
                prev_count = all_possible_sum_counts[prev_possible_sum]
                curr_count = prev_count - 1 # currCount should be reduced by 1 because the num has been used once to achieve the current sum
                all_possible_sum_counts[possible_sum] = curr_count
    
    half_sum = abs_sum / 2
    half_sum_floor = abs_sum // 2
    for best_possible_half_sum in range(half_sum_floor, -1, -1):
        if all_possible_sum_counts[best_possible_half_sum] >= 0:
            half_difference = half_sum - best_possible_half_sum
            full_difference = half_difference * 2
            return int(full_difference)

Solution

  • TL;DR: the array is pretty small in memory (<10 MiB), so accesses are fast and the memory overhead is still small; meanwhile, the overheads of a hash map are huge, especially at run time.


    About Java

    In Java, an int[] is a sequence of native (signed) integers (typically 4 bytes each) stored contiguously in memory. Accessing an item is very efficient because there is no software overhead for this. On mainstream CPUs, it can be done in a single CPU instruction. As long as the array fits in the CPU cache, random accesses are very fast.

    A HashMap<Integer, Integer> is equivalent to a Hashtable<Integer, Integer> internally, apart from the handling of null and synchronisation. It basically contains an array of bucket objects. Each bucket is typically a structure containing a hash, the key and the value, as well as a reference to the next structure, forming a linked list (see the documentation of Hashtable for more information). The number of buckets is basically the capacity of the container. There are generally more buckets than items so as to avoid collisions (see the load factor). Items are hashed, and the number of buckets can be adapted at runtime, causing a re-hashing of all items (quite expensive). The latter is typically done with a power-of-n growth policy (where n is often between 1 excluded and 2 included). The default load factor is 0.75, which means that for 1000 items we should expect 1300~2700 buckets. Considering that each bucket is significantly bigger than just an Integer or even an int, we can safely say that the memory overhead of hash maps is significant.
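
    As a hedged illustration of the capacity/load-factor relationship described above, the sketch below pre-sizes a HashMap so that inserting a known number of entries never triggers a resize (the class name and entry count are made up for the example):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMapSketch {
        public static void main(String[] args) {
            int expectedEntries = 1_000;
            float loadFactor = 0.75f; // HashMap's default load factor
            // The table is resized (and every entry re-hashed) once size > capacity * loadFactor,
            // so choosing capacity >= expectedEntries / loadFactor avoids intermediate rehashing.
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor); // 1334 here
            Map<Integer, Integer> map = new HashMap<>(initialCapacity, loadFactor);
            for (int i = 0; i < expectedEntries; i++) {
                map.put(i, i);
            }
            System.out.println(map.size()); // 1000, inserted without any intermediate resize
        }
    }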

    Indeed, on top of that, it is not just ints that are stored but Integer objects, which are handled completely differently internally. Such values are autoboxed, but they are managed like full objects except for tiny integer values, which are cached. This means the Java implementation internally uses a pointer to a dynamically-allocated, garbage-collected Integer object. You cannot use the native int data type as a type parameter for such a container (or any container, actually) because of how generics work. On a 64-bit machine with a 64-bit JVM, pointers should theoretically be 8 bytes wide. That being said, some JVMs like Hotspot can compress pointers to objects so they only take 4 bytes in many cases. This is called CompressedOops (see this post and this one for more information; thanks to @Holger for pointing this out). In this case, with the Hotspot JVM, the array of bucket objects is actually an array of 4-byte pointers referencing dynamically-allocated, garbage-collected bucket objects. On such a platform, each bucket typically takes 24 bytes (hash + 3 pointers + some padding). Two of the three pointers reference dynamically-allocated, garbage-collected Integer objects. All objects take at least 16 bytes on modern JVMs due to a 12-byte header and the 8-byte alignment required by CompressedOops. An Integer object therefore takes 16 bytes (a 12-byte header + a 4-byte value). This is 4 times more than an int item in an int[] array! Even worse: you need 2 Integer objects for each key-value pair in the HashMap, so 32 bytes. Because of all of that, we can safely say that, on such a mainstream platform, a HashMap takes at least 14 times more memory than the number of ints actually used in an alternative array. It is also much slower because of all the aforementioned overhead (i.e. multiple memory indirections, GC, hashing, allocations).
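
    The per-entry arithmetic above can be summarised in a small back-of-the-envelope sketch (the byte counts are the Hotspot/CompressedOops estimates from the previous paragraph, not values measured on your machine):

    public class EntryFootprintEstimate {
        public static void main(String[] args) {
            // Estimated sizes on a 64-bit Hotspot JVM with CompressedOops (see above):
            int nodeBytes = 24;     // bucket node: hash + 3 compressed pointers + padding
            int boxedIntBytes = 16; // Integer: 12-byte header + 4-byte value
            int perMappingBytes = nodeBytes + 2 * boxedIntBytes; // key + value -> ~56 bytes
            int perIntBytes = 4;    // one native int in an int[]
            System.out.println("HashMap<Integer,Integer>: ~" + perMappingBytes + " bytes per mapping");
            System.out.println("int[]                   : " + perIntBytes + " bytes per slot");
            System.out.println("ratio                   : ~" + (perMappingBytes / perIntBytes) + "x"); // ~14x
        }
    }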

    Thus, yes, a HashMap<Integer, Integer> can theoretically be smaller in memory than an array when the number of elements actually used in the array is tiny compared to its size (let us call this the load factor of the array). For any array load factor >0.1, the hash map will take more memory than a basic array. For an array load factor of 0.03~0.10, both should take a significant amount of memory in practice. For array load factors <0.03, the hash map should be better. The question is then: how many items are actually used in the array? Or, put differently: how big is the array load factor? Not to mention that even with big arrays taking more memory, the hash map will still be much slower due to all its overheads. At least, this will be the case unless the array load factor is so tiny that the array is really huge in memory and memory overheads (e.g. uncached accesses, paging, etc.) appear. A rough derivation of this break-even point is sketched below.
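
    Using the same per-entry estimates as above, the break-even point can be derived directly (again, these are rough figures, not measurements):

    public class ArrayLoadFactorBreakEven {
        public static void main(String[] args) {
            double hashMapBytesPerEntry = 56.0; // estimate from the previous paragraphs
            double arrayBytesPerSlot = 4.0;     // one native int per slot, used or not
            // The array wins as soon as (used / size) * hashMapBytesPerEntry > arrayBytesPerSlot,
            // i.e. when the "array load factor" exceeds roughly 4 / 56 ~= 0.07.
            double breakEven = arrayBytesPerSlot / hashMapBytesPerEntry;
            System.out.printf("break-even array load factor ~= %.2f%n", breakEven); // ~0.07
        }
    }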

    It turns out that "N is an integer within the range [0..20,000]" and "each element of array A is an integer within the range [−100..100]" in your exercise, so the array should take less than 10 MiB. In practice, the sum should often be significantly smaller because numbers can be negative (likely less than a few MiB). The array should fit in the quite-fast L3 cache of all recent mainstream CPUs. Thus, accessing such an array is pretty cheap. Meanwhile, there is a lot to do to access a hash map (which might not even fit in the L2 cache). You should not care about the memory overhead of a single small array.
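
    For the bounds of this exercise, the worst-case size of the DP array (the int[] of size sum + 1 in your MinAbsSumUsingArray solution) can be checked with a quick calculation:

    public class WorstCaseArraySize {
        public static void main(String[] args) {
            int maxN = 20_000;     // "N is an integer within the range [0..20,000]"
            int maxAbsValue = 100; // "each element of array A is an integer within the range [-100..100]"
            long maxSum = (long) maxN * maxAbsValue; // 2,000,000
            long arrayBytes = (maxSum + 1) * 4L;     // int[] of size sum + 1
            System.out.println("worst-case sum   : " + maxSum);
            System.out.printf("worst-case array : %.1f MiB%n", arrayBytes / (1024.0 * 1024.0)); // ~7.6 MiB
        }
    }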


    About (C)Python

    Regarding Python, the default implementation (CPython) is an interpreter, which is insanely slow. Accessing a list is much slower than accessing an int[] array in Java. Indeed, with a mainstream optimised Java VM, the code is compiled at runtime thanks to a just-in-time (JIT) compiler. Any trivial operation like the addition of 2 integers is typically >100 times slower than in Java. In the end, you might not even see a difference between a list and a dict because the other overheads are too big. You can test that with PyPy (JIT) instead of CPython (interpreter). Still, a Python list stores integer objects, like an ArrayList<Integer> in Java, which is generally significantly slower than an int[] containing contiguous native integers. Besides, integers in Python have a variable size (no bound), as opposed to int or Integer in Java, introducing even more overhead (even in PyPy) and further hiding the difference between the two implementations.
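
    To make the ArrayList<Integer>-vs-int[] comparison mentioned above concrete, here is a rough, hedged micro-benchmark sketch in Java (timings vary per machine, and a proper benchmark would use JMH; this only illustrates the boxing and indirection overhead):

    import java.util.ArrayList;
    import java.util.List;

    public class BoxedVsNativeSum {
        public static void main(String[] args) {
            int n = 5_000_000;
            int[] primitives = new int[n];
            List<Integer> boxed = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                primitives[i] = i;
                boxed.add(i); // autoboxing: most elements become separate Integer objects (only tiny values are cached)
            }

            long t0 = System.nanoTime();
            long sum1 = 0;
            for (int v : primitives) sum1 += v; // contiguous native ints, cache friendly
            long t1 = System.nanoTime();
            long sum2 = 0;
            for (int v : boxed) sum2 += v;      // pointer chasing + unboxing per element
            long t2 = System.nanoTime();

            System.out.println("int[]              : " + (t1 - t0) / 1_000_000 + " ms (sum=" + sum1 + ")");
            System.out.println("ArrayList<Integer> : " + (t2 - t1) / 1_000_000 + " ms (sum=" + sum2 + ")");
        }
    }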

    I assume the Python dictionary is equivalent to the Java HashMap

    They are implemented very differently. In Java's HashMap, a single bucket stores multiple entries. Meanwhile, a CPython dict uses open addressing, so each bucket only contains 1 item (hence no linked lists; see this article on the matter). Open addressing tends to be faster when both the load factor is not high (e.g. <0.9) and the hash function is well balanced. I expect open addressing to be faster in this case since the numbers are small, the hash function should not be too bad, and the default load factor is low enough (at the expense of a higher memory usage). Besides, iterating over key-value pairs (like you do) is cheaper in a hash map with open addressing (like in CPython) when the load factor is not too high, because it internally just travels a contiguous array of buckets; meanwhile, an implementation with linked lists (like Java's) performs more (possibly unpredictable) indirections in memory due to the linked list in each bucket. Pointer chasing is generally inefficient on modern CPUs. A toy sketch of the open-addressing idea is shown below.
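
    The following is a minimal, hypothetical open-addressing table with linear probing for int keys, only to illustrate the "one entry per slot, probe the next slot on collision" idea; it is not how CPython's dict is actually implemented (CPython uses a more elaborate probe sequence and stores hashes alongside entries):

    public class ToyOpenAddressingMap {
        private static final int EMPTY = Integer.MIN_VALUE; // sentinel marking a free slot (hypothetical choice)
        private final int[] keys;
        private final int[] values;

        public ToyOpenAddressingMap(int capacity) {
            keys = new int[capacity];
            values = new int[capacity];
            java.util.Arrays.fill(keys, EMPTY);
        }

        public void put(int key, int value) {
            int i = Math.floorMod(Integer.hashCode(key), keys.length);
            while (keys[i] != EMPTY && keys[i] != key) {
                i = (i + 1) % keys.length; // linear probing: try the next slot on collision
            }
            keys[i] = key;
            values[i] = value;
        }

        public Integer get(int key) {
            int i = Math.floorMod(Integer.hashCode(key), keys.length);
            while (keys[i] != EMPTY) {
                if (keys[i] == key) return values[i];
                i = (i + 1) % keys.length;
            }
            return null; // not found
        }

        public static void main(String[] args) {
            ToyOpenAddressingMap map = new ToyOpenAddressingMap(16); // low load factor, no resizing in this toy
            map.put(3, 30);
            map.put(19, 190); // 19 % 16 == 3: collides with key 3 and probes to the next slot
            System.out.println(map.get(3) + " " + map.get(19)); // 30 190
        }
    }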