
Buildbot: worker is idle


I've noticed that the distribution of builds between workers is sub-optimal: about 80% of the time, builds are assigned to workers that are already busy.

In the image, tmp_worker1 could process triggered_build_1, but it sits idle instead. For some reason, triggered_build_1 is waiting to acquire a lock and is assigned to the busy example-worker.


My setup is as follows (main source code below):

# triggerable scheduler
c['schedulers'].append(schedulers.Triggerable(name="trigger_from_main",
    builderNames=['triggered_build_0', 'triggered_build_1', 'triggered_build_2']))

# main builder factory
factory_main = util.BuildFactory()

# trigger
factory_main.addStep(steps.Trigger(
    schedulerNames=['trigger_from_main'],
    waitForFinish=True,
    haltOnFailure=True,
    name='trigger'
))

# main builder 
c['builders'].append(
    util.BuilderConfig(name="test_main",
        workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
        factory=factory_main,
    )
)

# lock
worker_lock = [util.WorkerLock("worker_builds", maxCount=1).access('counting')]

# 1st of 3 sub-builder
c['builders'].append(
    util.BuilderConfig(name="triggered_build_0",
        workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
        factory=factory_subbuild,
        locks=worker_lock,
    )
)

# 2nd of 3 sub-builder
...
# 3rd of 3 sub-builder
...

Solution

  • This behavior comes from the interaction between locks and the way the master distributes builds.

    When a build needs to run, the master picks a worker and checks that the build's requirements are met before dispatching it (link to the source code).

    The lock itself is taken only later, when the build starts on the worker.

    So if the master checks the requirements of the next build before the previous build has acquired the lock, it can dispatch the new build to the same worker, even though both builds need the same lock.

    You can fix this by putting the worker that was just assigned a build into quarantine, which gives it enough time to take the lock. You can do this with the canStartBuild function, which runs just before the build is assigned to the worker (docs).

    def canStartBuildLockQuarantine(builder, wfb, request):
        # Put the worker in quarantine for 5 seconds so the build just
        # assigned to it has time to take the lock
        wfb.worker.quarantine_timeout = 5
        wfb.worker.putInQuarantine()
        # Reset wfb.worker.quarantine_timeout
        wfb.worker.resetQuarantine()
        return True

    Then pass it to each builder that takes the lock.

    c['builders'].append(
        util.BuilderConfig(name="triggered_build_0",
            workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
            factory=factory_subbuild,
            canStartBuild=canStartBuildLockQuarantine,
            locks=worker_lock,
        )
    )
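
To make the race described above easier to see, here is a minimal toy model in plain Python. It does not use Buildbot at all: `ToyWorker` and `dispatch` are hypothetical names invented for this sketch, and the real master's logic is more involved. The only assumption it encodes is the one from the explanation above: availability is checked at dispatch time, while the lock is taken only once the build has started.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a worker; not the Buildbot API."""
    name: str
    lock_held: bool = False     # becomes True only after a build has started
    quarantined: bool = False   # set by the dispatcher when quarantine is enabled
    assigned: list = field(default_factory=list)


def dispatch(builds, workers, quarantine):
    """Assign each build to the first worker that looks free at dispatch time."""
    for build in builds:
        for worker in workers:
            if worker.lock_held or worker.quarantined:
                continue
            worker.assigned.append(build)
            if quarantine:
                # Mimics the canStartBuild workaround: hide the worker
                # from the dispatcher until its build has taken the lock.
                worker.quarantined = True
            break
    # Builds start (and take the lock) only after dispatching is done,
    # which is too late to influence the loop above.
    for worker in workers:
        if worker.assigned:
            worker.lock_held = True
    return {w.name: list(w.assigned) for w in workers}


names = ["example-worker", "tmp_worker0", "tmp_worker1"]
builds = ["triggered_build_0", "triggered_build_1", "triggered_build_2"]

no_quarantine = dispatch(builds, [ToyWorker(n) for n in names], quarantine=False)
with_quarantine = dispatch(builds, [ToyWorker(n) for n in names], quarantine=True)

print(no_quarantine)
print(with_quarantine)
```

Without quarantine, all three builds in this toy model land on example-worker because its lock still looks free at dispatch time. With quarantine, each worker receives one build, which is the effect the canStartBuild workaround aims for.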