Each scheduling policy (like SCHED_NORMAL, SCHED_FIFO, etc.) is implemented as a struct sched_class, and instances like fair_sched_class, rt_sched_class, dl_sched_class, etc., are defined using the DEFINE_SCHED_CLASS() macro:
#define DEFINE_SCHED_CLASS(name) \
	const struct sched_class name##_sched_class \
		__aligned(__alignof__(struct sched_class)) \
		__section("__" #name "_sched_class")
These class instances are then placed in a specific order in memory by the linker script (vmlinux.lds.h) via the SCHED_DATA section:
#define SCHED_DATA \
	STRUCT_ALIGN(); \
	__sched_class_highest = .; \
	*(__stop_sched_class) \
	*(__dl_sched_class) \
	*(__rt_sched_class) \
	*(__fair_sched_class) \
	*(__idle_sched_class) \
	__sched_class_lowest = .;
Early in sched_init(), the following check is made:
BUG_ON(&idle_sched_class != &fair_sched_class + 1 ||
       &fair_sched_class != &rt_sched_class + 1 ||
       &rt_sched_class != &dl_sched_class + 1);
My question: Why is it so critical that these sched_class structures be contiguous in memory?
I understand that they need to be ordered for priority purposes. But why is a gap-free layout mandatory? Why is there a hard check on "&a == &b + 1" rather than just relying on the logical order?
This is because the scheduler code relies on the classes being laid out in order and contiguously, so that most of the code that iterates over them reduces to simple pointer arithmetic. Given a pointer to a sched class, ptr++ gives the next (lower-priority) class and ptr-- gives the previous (higher-priority) class. Several macros and utility functions in sched.h do exactly this. Here are some examples:
static inline const struct sched_class *next_active_class(const struct sched_class *class)
{
	class++;
#ifdef CONFIG_SCHED_CLASS_EXT
	if (scx_switched_all() && class == &fair_sched_class)
		class++;
	if (!scx_enabled() && class == &ext_sched_class)
		class++;
#endif
	return class;
}
#define for_class_range(class, _from, _to) \
	for (class = (_from); class < (_to); class++)

#define for_each_class(class) \
	for_class_range(class, __sched_class_highest, __sched_class_lowest)

#define for_active_class_range(class, _from, _to) \
	for (class = (_from); class != (_to); class = next_active_class(class))

#define for_each_active_class(class) \
	for_active_class_range(class, __sched_class_highest, __sched_class_lowest)
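To see why the "&a == &b + 1" property matters, here is a minimal user-space sketch of the same idea. It is only a model, not kernel code: a plain array stands in for the layout that the linker script produces, the struct has a single made-up name field, and class_above() and list_classes() are hypothetical helpers introduced for illustration. The point it tries to show is that once the objects are adjacent and ordered, "next class", "iterate over all classes" and "which class has higher priority" all become pointer arithmetic or pointer comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct sched_class (assumed field). */
struct sched_class {
	const char *name;
};

/*
 * Model of the contiguous layout: an array guarantees that the classes
 * are adjacent in memory and ordered from highest to lowest priority,
 * which is what SCHED_DATA arranges for the real objects.
 */
static const struct sched_class classes[] = {
	{ "stop" }, { "dl" }, { "rt" }, { "fair" }, { "idle" },
};

#define NR_CLASSES (sizeof(classes) / sizeof(classes[0]))

/* Model of the boundary symbols: first class and one past the last class. */
#define sched_class_highest (&classes[0])
#define sched_class_lowest  (&classes[NR_CLASSES])

#define for_class_range(class, _from, _to) \
	for (class = (_from); class < (_to); class++)

#define for_each_class(class) \
	for_class_range(class, sched_class_highest, sched_class_lowest)

/* Hypothetical helper: a lower address means a higher-priority class. */
static int class_above(const struct sched_class *a, const struct sched_class *b)
{
	return a < b;
}

/*
 * Hypothetical helper: walk every class from highest to lowest priority
 * and write the names into buf, separated by spaces.
 * Returns the number of classes written.
 */
static size_t list_classes(char *buf, size_t len)
{
	const struct sched_class *cls;
	size_t n = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for_each_class(cls) {
		/* Stop if the next name (plus separator and NUL) would not fit. */
		if (strlen(buf) + strlen(cls->name) + 2 > len)
			break;
		if (n)
			strcat(buf, " ");
		strcat(buf, cls->name);
		n++;
	}
	return n;
}
```

If there were any gap between two of the objects, cls++ would land in the gap instead of on the next class, and the loop would read garbage. That is the situation the BUG_ON() in sched_init() is guarding against: the iteration macros have no other way to find the next class.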