Tags: pyspark, hive, hive-configuration

Increase max row size in HIVE


I have a pyspark job with these configs:

from pyspark.sql import SparkSession

self.spark = SparkSession.builder.appName("example") \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("hive.exec.max.dynamic.partitions", "5000000") \
    .config("hive.exec.max.dynamic.partitions.pernode", "1000000") \
    .enableHiveSupport() \
    .getOrCreate()

I cannot find anywhere how to set a configuration to increase the max row size to 150 MB. I have only found such a setting in Impala.

Thanks in advance.


Solution

  • There is no such configuration in Hive, because Hive does not require whole rows to fit in memory the way Impala does and can process rows of virtually unlimited size. A single string column can be up to 2 GB, and a table can have many tens of thousands of columns. What does need to fit in a single container's memory is the batch of rows being processed at once (most probably many thousands of rows), but the usual mapper or reducer container size is already more than 1 GB and can be increased if needed.
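
So rather than looking for a row-size limit, the practical knob is container/executor memory. Below is a minimal sketch of how that could be set from the same SparkSession builder; the 4g/4096 values are placeholders, not recommendations, and the mapreduce.* settings are an assumption that only matters if Hive actually launches MapReduce tasks (with pure Spark execution the executor settings are what count).

from pyspark.sql import SparkSession

# Sketch only: give the processes that handle large rows more memory
# instead of hunting for a "max row size" option.
spark = SparkSession.builder.appName("example") \
    .config("spark.executor.memory", "4g") \
    .config("spark.executor.memoryOverhead", "1g") \
    .config("mapreduce.map.memory.mb", "4096") \
    .config("mapreduce.reduce.memory.mb", "4096") \
    .enableHiveSupport() \
    .getOrCreate()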