cadence-workflow temporal-workflow uber-cadence

Pending decision tasks are never picked up for execution and eventually time out in Uber Cadence workflow


What could cause decision tasks not to get picked up for execution in a Cadence cluster? They remain in the pending state and finally time out. I don't see any error logs. How do I debug this?


Solution

  • It’s very likely that no worker is available and actively polling the tasklist for tasks.

    The best way to confirm is to click on the tasklist name in the web UI and see which workers are behind the tasklist. Since it’s a decision task, check the decision handlers for the tasklist.

    You can also use the CLI to describe the tasklist, which gives the same information:

    cadence tasklist desc --tl <tasklist name>
    

    In some extremely rare cases (I have never seen it personally, but I have heard it happened at Uber with a large-scale cluster), the Cadence server loses the task. In that case you can use the CLI to either regenerate the task or reset the workflow to unblock it:

    To regenerate task:

    cadence adm wf refresh-tasks -w <wf id>
    

    To reset:

    cadence wf reset --reset_type LastDecisionCompleted -w <wf id>
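
If it helps, the checks above can be collected into a small helper that prints the three commands in the order you would try them. This is only a sketch: `triage_commands` and `CADENCE_CMD` are names made up for illustration, not part of the Cadence CLI, and the helper prints the commands instead of running them so you can review each one first. Depending on your setup you may also need to pass your domain to the CLI; that is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: print the triage commands for a decision task stuck in pending state.
# CADENCE_CMD and triage_commands are illustrative names (assumptions),
# not something the Cadence CLI provides.

CADENCE_CMD="${CADENCE_CMD:-cadence}"

# Usage: triage_commands <tasklist name> <workflow id>
# Assumes neither argument contains spaces; quote them yourself otherwise.
triage_commands() {
  local tasklist="$1"
  local workflow_id="$2"

  # Step 1: check whether any worker is polling the tasklist.
  printf '%s tasklist desc --tl %s\n' "$CADENCE_CMD" "$tasklist"

  # Step 2 (rare case, task lost by the server): regenerate the task.
  printf '%s adm wf refresh-tasks -w %s\n' "$CADENCE_CMD" "$workflow_id"

  # Step 3 (alternative to step 2): reset to the last completed decision.
  printf '%s wf reset --reset_type LastDecisionCompleted -w %s\n' \
    "$CADENCE_CMD" "$workflow_id"
}

# When called with exactly two arguments, print the commands for them.
if [ "$#" -eq 2 ]; then
  triage_commands "$1" "$2"
fi
```

Start with the first command. Only move on to the refresh or reset if the tasklist clearly has workers polling it and the decision task is still stuck.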