c linux pthreads posix priority-inversion

Does SCHED_IDLE actually preclude execution on a non-idle core?


I'm trying to implement an unprivileged test case for unbounded priority inversion in the absence of priority inheritance mutexes, using SCHED_IDLE. The test works with SCHED_FIFO and different realtime priorities (deadlocking with a non-PI mutex, resolving immediately with a PI mutex). To include it in a test suite that runs without realtime privileges, I'd like to use SCHED_IDLE for the "low" thread instead, with the "medium" and "high" priority threads both being SCHED_OTHER. In that case it's not really priority "inversion", but the concept should still work: the "medium" thread should preclude execution of the "low" one.

Unfortunately, the test fails to differentiate between PI and non-PI mutexes; it makes forward progress either way. Apparently the SCHED_IDLE task is running even when there is another runnable task. CPU affinity has been set to bind all threads to the same core, so the low priority task can't migrate to a different core to run. I'm also aware that SCHED_IDLE tasks are supposed to run with elevated priority while in kernelspace to prevent kernelspace priority inversion, so I've tried to ensure that the "low" thread doesn't enter kernelspace by making it busy-loop in userspace. strace shows no indication that it makes a syscall during the time it should not be making forward progress.

Does Linux's SCHED_IDLE just allow idle tasks to run when the core is not actually idle? Or is there something else I might be missing?

Here is the test code, slightly adapted so that it can be run either in realtime mode or SCHED_IDLE:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <semaphore.h>
#include <time.h>   /* time(), clock_gettime() */

sem_t sem;

void *start1(void *p)
{
    pthread_mutex_lock(p);
    sem_post(&sem);
    sem_post(&sem);
    usleep(100000);
    pthread_mutex_unlock(p);
    return 0;
}

void *start2(void *p)
{
    sem_wait(&sem);
    time_t t0 = time(0);
    while (pthread_mutex_trylock(p)) {
        if (time(0)>t0+5) return 0;
    }
    pthread_mutex_unlock(p);
    return 0;
}

void *start3(void *p)
{
    sem_wait(&sem);
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 5;
    int r;
    if ((r = pthread_mutex_timedlock(p, &ts))) {
        printf("failed: %d %s\n", r, strerror(r));
    } else {
        pthread_mutex_unlock(p);
    }
    return 0;
}

int main(int argc, char **argv)
{
    int policy = argc>1 ? SCHED_IDLE : SCHED_FIFO;
    int a = sched_get_priority_min(policy);
    pthread_attr_t attr;
    pthread_t t1,t2,t3;
    struct sched_param param = {0};

    cpu_set_t set = {0};
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, policy);

    pthread_mutexattr_t ma;
    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
    pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_t mtx;
    pthread_mutex_init(&mtx, &ma);

    sem_init(&sem, 0, 0);

    param.sched_priority = a+1;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t2, policy==SCHED_IDLE ? 0 : &attr, start2, &mtx)) return 1;

    param.sched_priority = a+2;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t3, policy==SCHED_IDLE ? 0 : &attr, start3, &mtx)) return 1;

    param.sched_priority = a;
    pthread_attr_setschedparam(&attr, &param);
    if (pthread_create(&t1, &attr, start1, &mtx)) return 1;

    pthread_join(t1, 0);
    pthread_join(t2, 0);
    pthread_join(t3, 0);
    return 0;
}

Solution

  • Does Linux's SCHED_IDLE just allow idle tasks to run when the core is not actually idle? Or is there something else I might be missing?

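    This is correct. SCHED_IDLE does not mean "run only when the core is otherwise idle". It gives tasks a very low but non-zero weighting - about 70% less CPU time than a nice 19 task. A runnable SCHED_OTHER thread on the same core therefore slows the SCHED_IDLE thread down drastically but cannot starve it, so the "low" thread eventually gets enough CPU time to release the mutex, with or without priority inheritance.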
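
To get a feel for what a "very low but non-zero" weighting means in practice, here is a minimal sketch of the proportional-share arithmetic. The weight constants (1024 for nice 0, 15 for nice 19, 3 for SCHED_IDLE) are assumptions taken from recent Linux kernel sources and may differ on other kernel versions. `cpu_share` and `print_share` are hypothetical helper names, and the model ignores wakeup preemption, cgroups, and other scheduler details.

```c
#include <stdio.h>

/* Load weights ASSUMED from recent Linux kernel sources; verify them
   against the sources of the kernel you are actually running. */
enum {
    WEIGHT_NICE_0  = 1024, /* default SCHED_OTHER task */
    WEIGHT_NICE_19 = 15,   /* lowest-priority SCHED_OTHER task */
    WEIGHT_IDLE    = 3     /* SCHED_IDLE task */
};

/* Approximate long-run CPU fraction of a task with weight `w` when it
   shares one core with a single always-runnable task of weight `other`.
   This is a simplified proportional-share model, not the real scheduler. */
double cpu_share(unsigned w, unsigned other)
{
    return (double)w / (double)(w + other);
}

/* Print the share a task of weight `w` would get against a nice 0 spinner. */
void print_share(const char *label, unsigned w)
{
    printf("%-12s %.2f%%\n", label, 100.0 * cpu_share(w, WEIGHT_NICE_0));
}
```

Calling `print_share("SCHED_IDLE", WEIGHT_IDLE)` and `print_share("nice 19", WEIGHT_NICE_19)` from a `main` shows both shares against a busy nice 0 thread. Under these assumed weights the SCHED_IDLE share is a fraction of a percent, which is small but still adds up to several milliseconds of CPU time over the test's 5-second timeout. That is plausibly enough for the "low" thread to return from `usleep` and call `pthread_mutex_unlock`.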