triggers, azure-data-factory

Prevent an Azure Data Factory pipeline from starting while a previous run is still in progress


I have an Azure Data Factory pipeline with a trigger set to fire every 5 minutes. Sometimes the pipeline takes more than 5 minutes to finish. In that case the trigger fires again and creates another instance of the pipeline, and two instances of the same pipeline running together cause problems in my ETL. How can I make sure that only one instance of my pipeline runs at a time?

As a result, several instances of my pipeline are running at the same time.


Solution

  • A few options I can think of:

    OPT 1

    Specify a 5-minute timeout on your pipeline activities:

    https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
    https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities#activity-policy
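As a rough illustration of OPT 1, the sketch below builds the `policy` section of an activity definition with a 5-minute timeout, using the `d.hh:mm:ss` format that activity policies accept. The activity name `CopyToStaging` and the helper `with_timeout` are hypothetical, made up for this example only.

```python
import json

def with_timeout(activity: dict, minutes: int) -> dict:
    """Return a copy of an activity definition with policy.timeout set.

    The timeout string uses the d.hh:mm:ss format of ADF activity policies.
    """
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    policy = dict(activity.get("policy", {}))  # keep existing policy fields
    policy["timeout"] = f"{days}.{hours:02d}:{mins:02d}:00"
    return {**activity, "policy": policy}

# Hypothetical activity definition, trimmed to the fields relevant here.
copy_activity = {"name": "CopyToStaging", "type": "Copy", "policy": {"retry": 0}}

limited = with_timeout(copy_activity, 5)
print(json.dumps(limited, indent=2))
```

Keep in mind that a timeout only ends a slow activity; it does not delay the next trigger, so it fits best when abandoning the slow run is acceptable.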

    OPT 2

    1) Create a one-row, one-column SQL table named RunStatus that holds a single bit: 1 means "completed", 0 means "running".

    2) At the end of your pipeline, add a Stored Procedure activity that sets the bit to 1.

    3) At the start of your pipeline, add a Lookup activity that reads the bit.

    4) Use the output of the Lookup in an If Condition activity: if the bit is 1, set it to 0 and run the rest of the pipeline; if it is 0, a previous run is still in progress, so skip this run.
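The flag logic from the steps above can be sketched outside ADF as follows. This is only a simulation under stated assumptions: SQLite stands in for the real SQL database, the column name `IsCompleted` and the function names are hypothetical, and the lookup and the "mark as running" step are folded into a single UPDATE (a variation on the steps above) so that two overlapping starts cannot both read a 1.

```python
import sqlite3

def init_status(conn: sqlite3.Connection) -> None:
    """Create the one-row RunStatus table; 1 = completed, 0 = running."""
    conn.execute("CREATE TABLE RunStatus (IsCompleted INTEGER NOT NULL)")
    conn.execute("INSERT INTO RunStatus (IsCompleted) VALUES (1)")
    conn.commit()

def try_start(conn: sqlite3.Connection) -> bool:
    """Claim the flag: flip 1 -> 0 and report whether this run may proceed."""
    cur = conn.execute("UPDATE RunStatus SET IsCompleted = 0 WHERE IsCompleted = 1")
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means another run holds the flag

def finish(conn: sqlite3.Connection) -> None:
    """Equivalent of the stored procedure at the end of the pipeline."""
    conn.execute("UPDATE RunStatus SET IsCompleted = 1")
    conn.commit()

conn = sqlite3.connect(":memory:")
init_status(conn)

first = try_start(conn)    # first trigger: allowed to run
second = try_start(conn)   # overlapping trigger: should skip
finish(conn)               # first run completes
third = try_start(conn)    # next trigger: allowed again
print(first, second, third)
```

One design point to watch: if a run fails before reaching the final step, the bit stays at 0 and blocks later runs, so the "set to 1" step probably needs to execute on the failure path as well.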

    To get the most out of this option, you can turn the table into a log in which a new row with start and end times is recorded for each run (before initiating a new run, check that the previous run has an end time). Such a log also shows how long your pipeline takes to run, which can help you decide whether to add more resources or increase the interval between runs.
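A minimal sketch of the log variant, again simulated with SQLite rather than the real database. The table name `RunLog`, its columns, and the function names are assumptions for illustration; the sketch also assumes the row is inserted when a run starts and its end time is filled in when the run finishes.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RunLog ("
    "RunId INTEGER PRIMARY KEY AUTOINCREMENT, "
    "StartTime TEXT NOT NULL, "
    "EndTime TEXT)"  # NULL while the run is still in progress
)

def previous_run_finished(conn) -> bool:
    """True if there is no earlier run or the latest run has an end time."""
    row = conn.execute(
        "SELECT EndTime FROM RunLog ORDER BY RunId DESC LIMIT 1"
    ).fetchone()
    return row is None or row[0] is not None

def start_run(conn, now: datetime) -> int:
    cur = conn.execute("INSERT INTO RunLog (StartTime) VALUES (?)", (now.isoformat(),))
    conn.commit()
    return cur.lastrowid

def end_run(conn, run_id: int, now: datetime) -> None:
    conn.execute("UPDATE RunLog SET EndTime = ? WHERE RunId = ?", (now.isoformat(), run_id))
    conn.commit()

def average_seconds(conn):
    """Average duration of finished runs, or None if there are none yet."""
    rows = conn.execute(
        "SELECT StartTime, EndTime FROM RunLog WHERE EndTime IS NOT NULL"
    ).fetchall()
    durations = [
        (datetime.fromisoformat(e) - datetime.fromisoformat(s)).total_seconds()
        for s, e in rows
    ]
    return sum(durations) / len(durations) if durations else None

t0 = datetime(2024, 1, 1, 12, 0, 0)
can_start_initially = previous_run_finished(conn)
run_id = start_run(conn, t0)
blocked_while_running = not previous_run_finished(conn)
end_run(conn, run_id, t0 + timedelta(minutes=7))
can_start_again = previous_run_finished(conn)
avg = average_seconds(conn)
print(can_start_initially, blocked_while_running, can_start_again, avg)
```

An average duration above the trigger interval, as with the 7-minute run in this example, is the signal that the interval or the resources need revisiting.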

    OPT 3

    Monitor the pipeline runs with the SDKs (I have not tried this, so treat it only as a pointer): https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically
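For OPT 3, an untested sketch of what such a check might look like with the Python SDK. It assumes the `azure-identity` and `azure-mgmt-datafactory` packages and uses `pipeline_runs.query_by_factory` with `RunFilterParameters` and `RunQueryFilter` as I understand them from the SDK; the function names `other_run_active` and `query_recent_runs` are hypothetical. The SDK imports sit inside the query function so that the decision logic can be exercised without Azure access.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

ACTIVE_STATUSES = {"Queued", "InProgress"}

def other_run_active(runs, current_run_id=None) -> bool:
    """True if any run other than the current one is queued or in progress."""
    return any(
        run.status in ACTIVE_STATUSES and run.run_id != current_run_id
        for run in runs
    )

def query_recent_runs(subscription_id, resource_group, factory_name,
                      pipeline_name, hours=24):
    """Fetch recent runs of one pipeline (requires Azure credentials)."""
    # Imported here so the helper above works without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

    client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
    now = datetime.now(timezone.utc)
    params = RunFilterParameters(
        last_updated_after=now - timedelta(hours=hours),
        last_updated_before=now + timedelta(minutes=1),
        filters=[RunQueryFilter(operand="PipelineName", operator="Equals",
                                values=[pipeline_name])],
    )
    return client.pipeline_runs.query_by_factory(
        resource_group, factory_name, params
    ).value

# Offline demonstration with stand-in run objects instead of real SDK results.
sample_runs = [
    SimpleNamespace(run_id="a", status="Succeeded"),
    SimpleNamespace(run_id="b", status="InProgress"),
]
print(other_run_active(sample_runs, current_run_id="c"))
```

Inside a pipeline, the current run's ID is available as `@pipeline().RunId`, which could be passed along as `current_run_id` so that a run does not count itself as a conflict.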

    Hopefully at least one of these works for you.