My understanding is that hardware architectures and operating systems are designed not to block the CPU. When a blocking operation needs to happen, the operating system registers an interrupt and moves on to something else, so that the CPU's precious time is always used effectively.
It makes me wonder why most programming languages were designed with blocking APIs. More importantly, since the operating system handles I/O asynchronously, registering interrupts and dealing with results later when they are ready, I'm really puzzled about how our programming language APIs escape this asynchrony. How does the OS provide synchronous system calls for programming languages that use blocking APIs?
Where does this synchrony come from? Certainly not from the hardware level. So, is there an infinite loop somewhere that I don't know about, spinning and spinning until some interrupt is triggered?
Your observations are correct - the operating system interacts with the underlying hardware asynchronously to perform I/O requests.
The behavior of blocking I/O comes from threads. Typically, the OS provides threads as an abstraction for user-mode programs to use. But sometimes, green/lightweight threads are provided by a user-mode runtime or virtual machine, as in Go, Erlang, Java (Project Loom), etc. If you aren't familiar with threads as an abstraction, read up on the background theory in any OS textbook.
Each thread has a state consisting of a fixed set of registers, a dynamically growing/shrinking stack (for function arguments, local variables, saved registers, and return addresses), and an instruction pointer that holds the address of the next instruction. Blocking I/O is implemented like this: when a thread calls an I/O function, the underlying platform hosting the thread (Java VM, Linux kernel, etc.) suspends the thread so that it cannot be scheduled for execution, and submits the I/O request to the platform below. When the platform receives the completion of the I/O request, it makes the result available to that thread (for example, as the return value of the call) and puts the thread back on the scheduler's execution queue. From the thread's point of view, the call simply took a while to return. That's all there is to the magic.
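To make the suspend/resume mechanism concrete, here is a toy sketch in Python. It is only an illustration, not how any real kernel is written: generators stand in for threads, a `yield` stands in for a blocking read, and the names `toy_thread` and `run`, the file names, and the fake file contents are all made up for the example.

```python
from collections import deque

def toy_thread(name, log):
    # Each `yield` plays the role of a blocking read: the thread stops here
    # and only continues once the scheduler hands back the result.
    first = yield ("read", f"{name}-1.txt")
    log.append(f"{name} got {first}")
    second = yield ("read", f"{name}-2.txt")
    log.append(f"{name} got {second}")

def run(threads):
    ready = deque((t, None) for t in threads)  # runnable threads and the value to resume them with
    pending = deque()                          # suspended threads with their outstanding request
    while ready or pending:
        if ready:
            thread, result = ready.popleft()
            try:
                request = thread.send(result)  # run the thread until its next "blocking" call
            except StopIteration:
                continue                       # thread finished
            pending.append((thread, request))  # suspend it: not schedulable until the I/O completes
        else:
            # Nothing is runnable, so pretend a completion interrupt arrives.
            thread, (_op, path) = pending.popleft()
            ready.append((thread, f"<contents of {path}>"))

log = []
run([toy_thread("A", log), toy_thread("B", log)])
print(log)
```

Each toy thread reads like straight-line blocking code, yet the scheduler interleaves A and B whenever one of them is waiting. Where this toy fakes a completion when nothing is runnable, a real kernel would typically idle the CPU until the next interrupt rather than spin in a loop.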
Why are threads popular? Well, I/O requests happen in some sort of context. You don't just read a file or write a file as a standalone operation; you read a file, run a specific algorithm to process the result, and issue further I/O requests. A thread is one way to keep track of your progress. Another way is known as "continuation passing style": every time you perform an I/O operation (A), you pass a callback or function pointer that explicitly specifies what needs to happen after the I/O completes (B), and the call (A) returns immediately (non-blocking / asynchronous). This way of programming asynchronous I/O is considered hard to reason about and even harder to debug, because you no longer have a meaningful call stack: the stack unwinds as soon as each I/O call returns, so by the time the callback runs, the chain of callers that led to it is gone. This is discussed at length in the great essay "What Color is Your Function?".
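Here is a small Python sketch of the same two-step task written both ways. Everything in it is hypothetical and simplified: `FILES` is an in-memory stand-in for a disk, `read_blocking` and `read_async` are invented helpers, and the `completions` queue is a pretend event loop.

```python
from collections import deque

FILES = {"config": "data.txt", "data.txt": "42"}  # hypothetical in-memory "disk"
completions = deque()                              # callbacks waiting for their "I/O" to finish

def read_blocking(name):
    # Thread style: the caller gets the data as a return value.
    return FILES[name]

def read_async(name, callback):
    # Continuation-passing style: return immediately, run the callback later.
    completions.append(lambda: callback(FILES[name]))

def process_blocking():
    path = read_blocking("config")  # progress is tracked by the call stack
    value = read_blocking(path)
    return int(value) + 1

def process_cps(done):
    # Progress is tracked by which callback is registered next.
    def after_config(path):
        read_async(path, after_data)
    def after_data(value):
        done(int(value) + 1)
    read_async("config", after_config)

results = []
process_cps(results.append)
started_empty = (results == [])  # process_cps returned before any "I/O" completed

while completions:               # pretend event loop delivering completions
    completions.popleft()()

print(process_blocking(), results)
```

Both versions compute the same thing, but in the callback version `after_data` is called from the event loop, not from `process_cps`, which is why a stack trace taken inside it says little about how the program got there.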
Note that the platform has no obligation to provide a threading abstraction to its users. The OS or language VM can very well expose an asynchronous I/O API to user code. But the vast majority of platforms (with exceptions like Node.js) choose to provide threads because threads are much easier for humans to reason about.
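As a rough illustration that the non-blocking interface is also there if you ask for it, here is a sketch using Python's standard `socket` and `selectors` modules on a local socket pair. Note that this is readiness-based non-blocking I/O (the OS tells you when a read would succeed) rather than a completion-based asynchronous API, and the variable names are just for the example.

```python
import selectors
import socket

a, b = socket.socketpair()  # two connected local sockets
b.setblocking(False)        # ask the platform not to suspend us on recv

try:
    b.recv(16)              # no data yet: a blocking socket would suspend the thread here
    would_block = False
except BlockingIOError:
    would_block = True      # the non-blocking socket reports "not ready" instead

sel = selectors.DefaultSelector()
sel.register(b, selectors.EVENT_READ)  # ask to be told when b becomes readable

a.sendall(b"ping")
ready = sel.select(timeout=1.0)        # list of (key, events) for sockets that are ready
received = [key.fileobj.recv(16) for key, _ in ready]

sel.unregister(b)
sel.close()
a.close()
b.close()

print(would_block, received)
```

With `setblocking(False)` the program has to keep track of what to do once the data arrives, which is exactly the bookkeeping a blocking call on a thread does for you.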