I am experiencing outofmemory issue while joining 2 datasets; one contains 39M rows other contain 360K rows.
I have 2 worker nodes, each of the worker node has maximum memory of 125 GB.
In Yarn Memory allocated for all YARN containers on a node = 96GB
Minimum Container Size (Memory) = 3072
In Hive settings :
hive.tez.java.opts=-Xmx2728M -Xms2728M -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB
hive.tez.container.size=3410
What values I should set to get rid of outofmemory issue.
I solved it by using increasing the Yarn Memory allocated Minimum Container Size (Memory) = 3072 to 3840 Memory allocated for all YARN containers on a node 96 to 120 GB ( each node had 120GB)
Percentage of physical CPU allocated for all containers on a node 80%
Number of virtual cores 8
https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-hive-out-of-memory-error-oom