I am writing a program that benchmarks a large number of generated schedules for a particular algorithm. This takes a long time, mostly because each schedule has to be compiled, and I was wondering if there are any ways to speed up this process.
For example, could I use AOT compilation or generators? I don't think it is possible to give a generator a different schedule after it has been created (e.g. to have the schedule as an input parameter).
Or are there any compiler flags that can give a significant speed-up?
I also saw that the autoscheduler uses a cost model to predict the execution time of a schedule, which would solve my problem. However, I cannot figure out whether or how I can use this cost model in my own program, or whether it works for every schedule or only for schedules that the autoscheduler itself generated.
Unfortunately there's no great answer. The bulk of the compile time is in Halide lowering and in LLVM, which must be done separately for every schedule, so just reusing a Generator won't help you. You can use Func::specialize on a boolean input param to switch between schedules at runtime, but that doesn't save you much compile time relative to compiling the options separately.
The cost model in the autoscheduler is specific to its representation of the subspace of Halide schedules that it explores, and wouldn't work on arbitrary Halide schedules.
There's one trick that might help: If your algorithm is long and complicated, and you know where some of the compute_roots should be (e.g. the last thing before a conv layer), then you can break your algorithm into multiple pieces and independently search over schedules for each. Compiling smaller algorithms is moderately faster, but more importantly this will make the overall search more efficient in terms of the number of samples it needs to take.
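To make the sample-count argument concrete, here is a rough sketch in plain C++ (no Halide calls). It assumes an exhaustive search, and that once the compute_root boundaries are fixed each piece's runtime does not depend on how the other pieces are scheduled. The `StageCosts` table and all function names are hypothetical, made up for illustration; in a real search the costs would come from your own benchmarks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table: costs[s][c] is the measured runtime of candidate
// schedule c for pipeline piece s (pieces are separated by compute_root).
// Each piece is assumed to have at least one candidate.
using StageCosts = std::vector<std::vector<double>>;

// Number of compile-and-benchmark samples if every combination of
// per-piece schedules is tried on the whole pipeline at once.
std::uint64_t joint_samples(const StageCosts &costs) {
    std::uint64_t n = 1;
    for (const auto &stage : costs) n *= stage.size();
    return n;
}

// Number of samples if each piece is searched on its own.
std::uint64_t independent_samples(const StageCosts &costs) {
    std::uint64_t n = 0;
    for (const auto &stage : costs) n += stage.size();
    return n;
}

// Index of the cheapest candidate for each piece, chosen independently.
std::vector<std::size_t> best_per_stage(const StageCosts &costs) {
    std::vector<std::size_t> best;
    for (const auto &stage : costs) {
        std::size_t arg = 0;
        for (std::size_t c = 1; c < stage.size(); ++c) {
            if (stage[c] < stage[arg]) arg = c;
        }
        best.push_back(arg);
    }
    return best;
}
```

With three pieces that have 3, 4 and 2 candidate schedules, the joint search needs 3 * 4 * 2 = 24 samples while the independent search needs 3 + 4 + 2 = 9, and the gap grows quickly as pieces and candidates are added.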