As per the doc (https://docs.databricks.com/en/optimizations/spark-ui-guide/spark-job-gaps.html), any execution of code that is not Spark will show up in the timeline as gaps. For example, you could have a loop in Python which calls native Python functions. This code is not executing in Spark, so it can show up as a gap in the timeline.
Where does the non-Spark code (plain Python code) run? Is it on a worker or on the driver?
The driver node is responsible for orchestrating the application, so native Python code that isn't tied directly to a Spark operation runs on the driver node. Because this code does not run in parallel on the worker nodes, no Spark jobs are submitted while it executes, and that is why it appears as a gap in the timeline. (Plain Python inside a UDF is the exception: Spark ships that code to the workers and runs it as part of a Spark job, so it does not create a gap.)
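To make the distinction concrete, here is a minimal sketch. The helper `driver_side_work` and the `pipeline` function are hypothetical names for illustration; `spark` is assumed to be an existing SparkSession (e.g. the one Databricks provides in a notebook). Only the `count()`, `selectExpr(...)`, and `collect()` calls trigger Spark jobs; the loop in between runs solely on the driver and produces the timeline gap.

```python
def driver_side_work(n):
    """Plain Python loop: runs entirely on the driver, submits no
    Spark jobs, so the Spark UI timeline shows a gap while it runs."""
    checksum = 0
    for i in range(n):
        checksum += i * i
    return checksum


def pipeline(spark):
    """Hypothetical sketch of driver-only code interleaved with Spark jobs.
    `spark` is assumed to be an existing SparkSession."""
    total = spark.range(1_000_000).count()      # Spark job: visible in the timeline
    checksum = driver_side_work(100_000)        # driver-only: gap in the timeline
    result = spark.range(total).selectExpr("sum(id)").collect()  # next Spark job
    return total, checksum, result
```

If the gap is long, the usual fix is to push such work into Spark itself (e.g. express it as DataFrame operations or a UDF) so it runs in parallel on the workers instead of serially on the driver.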