
Why do the job running time and the command execution times not match in a Databricks notebook?


I have an Azure Databricks job that is triggered from ADF via an API call. I want to see why the job takes n minutes to complete its tasks. In the job run results, the job execution time is reported as 15 minutes, but the individual cell/command execution times add up to only about 4-5 minutes.

The interactive cluster was already up and running when the job was triggered. Why doesn't the sum of the individual cell execution times match the overall job execution time? Where can I see what took the additional time?


Solution

  • Please see the references below; they explain in detail how to measure execution time and workload metrics in Spark.

    References:

    How to measure the execution time of a query on Spark

    https://db-blog.web.cern.ch/blog/luca-canali/2017-03-measuring-apache-spark-workload-metrics-performance-troubleshooting

    https://spark.apache.org/docs/latest/monitoring.html
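
    As a rough starting point before digging into the Spark metrics, here is a minimal sketch for timing each notebook step yourself and recording when it started, so that gaps between steps become visible. The names `timed`, `timings` and `unaccounted_seconds` are hypothetical helpers made up for this example, not part of Databricks or Spark, and the job duration passed in is assumed to be the value you read from the job run page.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone

# Collected measurements: (label, start time in UTC, elapsed seconds).
timings = []


@contextmanager
def timed(label):
    """Time the wrapped block and record its wall-clock start time."""
    start_wall = datetime.now(timezone.utc)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.append((label, start_wall, elapsed))
        print(f"{label}: started {start_wall.isoformat()}, took {elapsed:.3f}s")


def unaccounted_seconds(job_duration_seconds):
    """Job duration minus the sum of all timed blocks recorded so far."""
    return job_duration_seconds - sum(elapsed for _, _, elapsed in timings)
```

    Wrap the body of each cell in `with timed("cell name"):` and, at the end of the notebook, call `unaccounted_seconds(15 * 60)` with the duration reported for the job run. Comparing the printed start timestamps with the job's own start time shows whether the missing time falls before the first cell, between cells, or after the last one.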