Trio seems to start tasks in the nursery in exactly the opposite order that tasks were given at

I don't expect trio to run in any particular order. It is async, after all. But I noticed something strange and wanted to ask if anyone else could explain what might have happened:

I wanted to test the rate of data ingestion from Google's Pub Sub if I send a small message one at a time. In order to focus on the I/O of pushing to Pub Sub, I sent messages async, and I use trio because, well, I want to keep my head from exploding.
I specifically wanted to look at how fast Pub Sub would be if I turned on it's ordering capability. I really just wanted to test throughput, and since I was using an async process, I didn't expect any ordering of messages, but I tagged the messages just out of curiosity.
I noticed that the messages were processed in pub sub (and therefore sent to pub sub) at exactly the opposite order that is written in the imperative code.

Here is the important snippet (I can provide more if it is helpful):

async with open_nursery() as nursery:
    for num in range(num_messages):
        logger.info(f"===Creating data entry # {num}===")
        raw_data = gen_sample(DATASET, fake_generators=GENERATOR) # you can ignore this, it is just a toy data generator. It is synchronous code, but _very_ fast.
        raw_data["message number"] = num  # <== This is the CRITICAL LINE, adding the message number so that I can observe the ordering.
        data = dumps(raw_data).encode("utf-8")
        nursery.start_soon(publish, publisher, topic_path, data, key)

and here is the publish function:

async def publish(
    publisher: PublisherClient, topic: str, data: bytes, ordering_key: str
):
    future = publisher.publish(topic, data=data, ordering_key=ordering_key)
    result = future.result()
    logger.info(
        f"Published {loads(data)} on {topic} with ordering key {ordering_key}   "
        f"Result: {result}"
    )

And when I look at the logs in Pub/Sub, they are 100% consistently in reverse order, such that I see "message number" 50_000 first, then 49_999, 49_998, ..., 3, 2, 1. Pub Sub is maintaining ordering. This means somehow, the async code above is "first" starting the very last task to reach nursery.start_soon.

I'm not sure why that is. I don't understand exactly how Pub Sub's Future works, because the documentation is sparse (at least what I found), so it is possible that the "problem" lies with Google's PublisherClient.publish() method, or Google's result() method that the returned future uses.

But it seems to me that it is actually due to the nursery.start_soon. Any ideas why it would be exactly in the opposite order of how things are written imperatively?

Solution

Heh, nicely spotted.

So what's going on here is that Trio intentionally randomizes the order that tasks get scheduled, to help you catch places where you're accidentally relying on scheduler details to coordinate between tasks (since this is fragile, hard to reason about, and constrains trio's ability to improve the scheduler in the future).

Originally, the way this worked is that when tasks became runnable, they'd get appended to a list, and then each scheduler tick would use random.shuffle to randomize the list, then loop over it and run each task.

However! It turns out that randomly shuffling a list is actually kind of expensive, esp for something you're doing on every scheduler tick. So as an optimization, we switched to a much simpler randomization scheme: flip a coin, and if it's heads do nothing, if it's tails then reverse the list. The intuition here is that this is way cheaper, but it still guarantees that any two tasks will be scheduled both as A->B and B->A on some fraction of runs, and this should be enough to catch most cases where you're accidentally relying on scheduling order.

So if you do enough runs, you should find that your log lines are actually 50% in ascending order, and 50% in descending order. (But you shouldn't rely on this :-))