Spark, Hadoop, Tez, etc. all have properties that can be manually configured, for example:
yarn.nodemanager.resource.memory-mb
spark.executor.memory
pig.exec.reducers.bytes.per.reducer, pig.exec.reducers.max
...
Is there an equivalent property for PIG_HEAPSIZE? It seems like it can only be set via an environment variable. What is this environment variable doing behind the scenes? Which properties does it affect?
Pig relies on an execution engine such as Tez, Spark, or MapReduce, so the heap sizes for the actual job come from those engines' own configurations (for example spark.executor.memory) rather than from a Pig-specific property.
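As an illustration (the property names below are the standard MapReduce/Spark memory settings; the values and the script name myscript.pig are only placeholders, not recommendations), you set the engine-side heap by passing the engine's own properties to Pig, e.g. with -D on the command line:

    # MapReduce engine: container sizes and the mapper/reducer JVM heaps
    # (-D options must come before other pig arguments)
    pig -Dmapreduce.map.memory.mb=4096 \
        -Dmapreduce.map.java.opts=-Xmx3276m \
        -Dmapreduce.reduce.memory.mb=8192 \
        -Dmapreduce.reduce.java.opts=-Xmx6553m \
        -x mr myscript.pig

    # Spark engine: the analogous knobs are Spark's own
    # spark.executor.memory / spark.driver.memory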
The only thing PIG_HEAPSIZE controls is the heap of the local client JVM that runs the pig command itself (the grunt shell or script driver). That is why it is just a local environment variable on the machine where you launch Pig, not a remote/cluster-side configuration property.
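Concretely (a sketch; the launcher logic below is paraphrased from Apache Pig's bin/pig script, not quoted verbatim), you export the variable before invoking pig, and the launcher turns it into an -Xmx flag for that client JVM:

    # value is in MB; affects only the JVM started on this machine
    export PIG_HEAPSIZE=4096
    pig myscript.pig

    # inside bin/pig, roughly:
    if [ "$PIG_HEAPSIZE" != "" ]; then
        JAVA_HEAP_MAX="-Xmx${PIG_HEAPSIZE}m"   # passed to the java command that launches Pig
    fi

So it does not map onto any Hadoop/Spark job property; memory on the cluster side is governed by the engine properties shown above.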