apache-spark, spark-streaming, azure-databricks, delta-live-tables

How to properly set up Spark on Databricks and a DLT pipeline?


I have a DLT pipeline in Databricks, created from the Azure portal. I would like to increase the maximum size of a streaming message, which is 10 MB by default.

Could someone show me how to properly configure the pipeline, and which configuration parameter to use? I need to double the maximum size of a streaming message.

I noticed that one possible parameter could be the Spark setting "spark.sql.autoBroadcastJoinThreshold". I tried adding this configuration at the top of my notebook, as well as in the pipeline's job cluster settings controlled via a JSON file.
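For reference, the notebook attempt looked roughly like this (a sketch; the 20 MB value is illustrative, since the goal is to double the 10 MB default):

    # Attempted at the top of the notebook (sketch; value is illustrative)
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "20971520")  # 20 MB in bytes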


Solution

  • You can set this configuration directly when creating your DLT pipeline: the cluster configuration can be fine-tuned there under the Advanced options (see the sketch below).


    fetch.message.max.bytes: this determines the largest size of a message that can be fetched by the consumer.
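    If the stream is read from Kafka through Structured Streaming (an assumption; the question does not name the source), this kind of consumer property is usually passed on the reader with a kafka. prefix rather than as a cluster Spark conf. Note that fetch.message.max.bytes is the legacy consumer name; the current Kafka consumer uses max.partition.fetch.bytes and fetch.max.bytes. A minimal sketch of a DLT table that doubles the 10 MB limit (broker, topic, and table names are placeholders):

        import dlt

        @dlt.table(name="raw_events")  # hypothetical table name
        def raw_events():
            return (
                spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
                .option("subscribe", "events")                     # placeholder topic
                # Kafka consumer properties take a "kafka." prefix;
                # 20971520 bytes = 20 MB, double the 10 MB default in the question.
                .option("kafka.max.partition.fetch.bytes", "20971520")
                .option("kafka.fetch.max.bytes", "20971520")
                .load()
            )

    If the value must instead live in the pipeline itself, it can be added as a key/value pair in the Advanced configuration section of the pipeline UI, which ends up in the pipeline's settings JSON.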