java, memory, garbage-collection

High memory usage with a large number of objects in Java


I have the sample code below, which loops 50 million times and creates MyDataHolder objects. MyDataHolder has two variants: one is HashMap-based and the other is variable-based (see MyDataHolder_map and MyDataHolder_variable).

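import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class MemTest {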
    public static void main(String[] args) {
        List<MyDataHolder_map> list = new LinkedList<>();
        for (int i = 0; i < 50_000_000; i++) {
            if(i % 100_000==0) System.out.println(i);
            MyDataHolder_map d = new MyDataHolder_map();
            d.add("key1", "value1"+i);
            d.add("key2", "value2"+i);
            d.add("key3", "value3"+i);
            d.add("key4", "value4"+i);
            d.add("key5", "value5"+i);
            d.add("key6", "value6"+i);
            list.add(d);
        }
    }
}
// use Map to store data
class MyDataHolder_map {
    private Map<String, String> data = new HashMap<>(6);

    public void add(String key, String value) {
        data.put(key, value);
    }
}

// use variables
class MyDataHolder_variable {
    private String key1;
    private String key2;
    private String key3;
    private String key4;
    private String key5;
    private String key6;

    public void add(String key, String value) {
        switch (key) {
            case "key1" -> key1 = value;
            case "key2" -> key2 = value;
            case "key3" -> key3 = value;
            case "key4" -> key4 = value;
            case "key5" -> key5 = value;
            case "key6" -> key6 = value;
        }
    }
}

I run the code with a 32 GB heap (-Xmx32g -verbose:gc) and notice that the variable-based MyDataHolder prints the following at the end of the program:

With Variable based MyDataHolder
[32.495s][info][gc] GC(34) Pause Young (Normal) (G1 Evacuation Pause) 24032M->24072M(31312M) 1259.276ms
[35.537s][info][gc] GC(27) Pause Remark 24128M->24128M(32768M) 2.067ms
[59.101s][info][gc] GC(27) Pause Cleanup 24128M->24128M(32768M) 1.132ms
[59.234s][info][gc] GC(27) Concurrent Mark Cycle 39850.818ms

Notice that it used 24128M, i.e. about 24 GB of RAM. However, the HashMap-based MyDataHolder fails, and it prints:

[81.370s][info][gc] GC(73) Pause Young (Normal) (G1 Preventive Collection) 32581M->32583M(32768M) 42.816ms
[81.458s][info][gc] GC(74) To-space exhausted
[81.458s][info][gc] GC(74) Pause Young (Concurrent Start) (G1 Preventive Collection) 32615M->32647M(32768M) 82.420ms
[81.458s][info][gc] GC(75) Concurrent Mark Cycle
[81.519s][info][gc] GC(76) To-space exhausted
[81.519s][info][gc] GC(76) Pause Young (Normal) (G1 Preventive Collection) 32663M->32679M(32768M) 52.343ms

Notice it consumed all 32 GB RAM.

Question:

  1. Why does the program take 24 GB with the variable-based version? It should be far less: there are 6 unique strings per iteration, i.e. 6 * 50M = 300M strings, and each string is 7 bytes, so the total is 300M * 7 bytes = 1.9 GB. I understand there is overhead for references and objects, but that can't be 20+ GB. What is going on?
  2. Why does the HashMap-based MyDataHolder need more space (over 32 GB) than the variable-based one? Does HashMap have very high memory overhead?

PS: I am using Java 17.


Solution

  • Why does the program take 24 GB with the variable-based version? It should be far less: there are 6 unique strings per iteration, i.e. 6 * 50M = 300M strings, and each string is 7 bytes, so the total is 300M * 7 bytes = 1.9 GB. I understand there is overhead for references and objects, but that can't be 20+ GB. What is going on?

    (The following is making some assumptions about your JVM version, heap size and compressed OOPS options. For different assumptions, the object sizes may be different. But the overall point remains the same.)

    The overheads are much larger than you realize.

    1. Every Java object has a 12 byte header. Every Java array has a 16 byte header.

    2. The allocated size of a Java heap node (for an object or array) is rounded up to a multiple of 8 bytes.

    3. A Java String object (in Java 9 and later) has the following fields:

      private final byte[] value;
      private final byte coder;
      private int hash;
      private boolean hashIsZero;
      

    That adds up to at least 14 bytes (or 10 bytes with compressed OOPS) depending on how the fields are aligned in memory.

    4. The value field is a reference to an array, which is a separate heap node with its own header.

    If you add that all up, a 7 character string (comprising ASCII characters) would occupy round_up(12 + 14) + round_up(16 + 7) = 32 + 24 = 56 bytes.
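The arithmetic above can be sketched in code. This is only a rough estimate under the stated assumptions (12-byte object header, 16-byte array header, 8-byte references because compressed OOPs are typically off with a 32 GB heap). The class name StringFootprint and its methods are hypothetical helpers, and the holder and LinkedList node layouts are my own assumptions rather than measured values.

```java
class StringFootprint {
    // Assumed layout: 64-bit HotSpot, 12-byte object header, 16-byte array header,
    // 8-byte references (compressed OOPs off, as expected for a 32 GB heap).
    static final long OBJECT_HEADER = 12;
    static final long ARRAY_HEADER = 16;
    static final long REFERENCE = 8;
    static final long STRING_FIELDS = REFERENCE + 1 + 4 + 1; // value, coder, hash, hashIsZero

    // Heap nodes are rounded up to a multiple of 8 bytes.
    static long roundUp8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // String object plus its backing byte[] (one byte per Latin-1 character).
    static long stringBytes(int latin1Chars) {
        return roundUp8(OBJECT_HEADER + STRING_FIELDS) + roundUp8(ARRAY_HEADER + latin1Chars);
    }

    // One loop iteration of the variable-based version (assumed layout): a holder with
    // 6 reference fields, one LinkedList node (item, next, prev), and 6 value strings.
    static long perIterationBytes(int valueChars) {
        long holder = roundUp8(OBJECT_HEADER + 6 * REFERENCE);
        long listNode = roundUp8(OBJECT_HEADER + 3 * REFERENCE);
        return holder + listNode + 6 * stringBytes(valueChars);
    }

    public static void main(String[] args) {
        System.out.println(stringBytes(7)); // the 7-character case from the text
        // Most values look like "value1" + an 8-digit number, i.e. 14 characters.
        long total = perIterationBytes(14) * 50_000_000L;
        System.out.println(total / 1_000_000_000.0 + " GB (rough estimate)");
    }
}
```

Under these assumptions each iteration retains roughly 488 bytes, which over 50 million iterations comes to about 24 GB, in line with the GC log in the question. A tool that inspects real object layouts would give more reliable numbers than this back-of-the-envelope model.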

    Why does the HashMap-based MyDataHolder need more space (over 32 GB) than the variable-based one? Does HashMap have very high memory overhead?

    Basically yes. I won't do the calculations because they are rather complicated, but in the simpler case, a hash-chain node that holds a map entry has an overhead of 40 bytes + 8 bytes for each slot in the hash array. (And that doesn't include key and value objects themselves.)
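As a rough illustration of those figures, the sketch below multiplies them out for the maps in the question. It assumes 6 entries per map and a hash array of 8 slots (HashMap rounds the requested capacity of 6 up to a power of two). The class HashMapOverhead and its method are hypothetical, and the estimate deliberately leaves out the HashMap object itself, the array header, and the key and value objects.

```java
class HashMapOverhead {
    static final long NODE_BYTES = 40; // per hash-chain node, figure from the text above
    static final long SLOT_BYTES = 8;  // per hash-array slot, figure from the text above

    // Entry-related overhead for one map. Excludes the HashMap object,
    // the array header, and the key and value objects.
    static long entryOverhead(int entries, int tableSlots) {
        return entries * NODE_BYTES + tableSlots * SLOT_BYTES;
    }

    public static void main(String[] args) {
        long perHolder = entryOverhead(6, 8); // 6 entries, assumed 8-slot table
        long total = perHolder * 50_000_000L; // 50 million holders
        System.out.println(perHolder + " bytes per holder");
        System.out.println(total / 1_000_000_000.0 + " GB in total (rough estimate)");
    }
}
```

With these inputs the extra cost is about 304 bytes per holder, or roughly 15 GB across 50 million holders, on top of what the variable-based version already needs. That alone is enough to push the program past a 32 GB heap.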