Here is my understanding of how an application talks with WindowServer:
Where can I find documentation for the red parts?
For example, I would like to know how NSWindowController manages to create a window. I could not find this in any of the documented APIs (CoreGraphics, QuartzCore, etc). Is there some other API providing this (shown as ??? in the diagram)?
The ideal answer to my question would be a small C program that manages to launch an application by using only low-level RPCs (I do realize that such a program would only work on a single version of macOS if it relies on undocumented APIs).
I'm giving my perspective from the PC, rather than Mac, side, because on the PC, the old world of direct and unfettered access to the computer screen lasted for much longer.
Back in DOS, the graphics card was the API.
For example, there was an implicit frame buffer at a fixed address once you switched your VGA card to that practical 320x200, 256-color mode. This was the environment in which many people first learned graphics programming.
With the VESA standards for SVGA, there were more and more new modes - same thing, but guided by a committee. You could do things like 1024x768, 256 colors, with bank switching, and serious applications used these modes.
Meanwhile, Windows had come along, and Windows applications didn't have any regular frame buffer access. They could try accessing those special memory addresses, only to find, more often than not, that the graphics card would not show the data as expected, but either ignore it or smear it around the screen as ugly pixel dust. The card wasn't running in any of those convenient VESA modes from the DOS era. It was being driven through intricate graphics drivers, usually provided by the graphics card manufacturers, which communicated with the graphics hardware in the most arcane (yet usually also very quick) manner.
Windows applications at this point were in the same spot as all early MacOS applications.
Applications needing the frame buffer would thus rather be written as DOS applications, which thankfully could run in parallel to regular Windows applications. And they even had access to extended memory, so they had access to megabytes of contiguous memory.
At some point, those VESA modes became limiting and annoying, while people wanted a fast equivalent to frame buffer access for real applications. MacOS and Windows apps were bad at games with fast, rotating, colorful graphics, which is one of the reasons why there were so many adventure games with one person slowly walking about a static environment.
Enter QuickDraw and DirectX
While feeling and actually being very empowering, this is the exact point where the common wish for true low-level access you're expressing breaks down. The connection to this other, nostalgically remembered, world of the past is severed.
You now had your frame buffers back in a way - you could quickly and directly manipulate an area of the screen in a fully supported way. Plus neat and nifty accelerated functionality like drawing geometrical shapes with hardware acceleration, if available.
People thus didn't focus so much on what they had lost in the process, and much more on what they were gaining.
3D was added... And so we landed in this day and age where we take it for granted to be able to casually draw millions of polygons per second inside windows or full-screen (hint: it's still a window, just cheating) on macOS and Windows and Linux.
The Shock
I've been exactly in your place a couple of times by now, finding it super high-friction to set up a simple macOS application without dealing with storyboards &c.
I've looked at your approach first, too. Turns out, for all of the reasons above, there is no lower level API to be found. The only thing remotely constant is NSWindow and everything connected to it.
Pure NSWindow without built-in assistance from Storyboard, SwiftUI, &c. turns out to be horrible. The Carbon API is gone. There's no longer an equivalent to a simple Win32 or classic MacOS program.
On the other hand, try writing one of those and you'll be surprised at the amount of boilerplate for things you've taken for granted to such a degree that you've either forgotten them or never even realized they exist. It is not the nice, cozy fireplace I kinda had expected when I last went down that route.
So what to do, then?
Given that SwiftUI doesn't look like it's anywhere near being able to entirely replace Cocoa (good, non-trivial SwiftUI apps are 37% Cocoa for the many, many aspects where SwiftUI is either too high-level or too immature or &c.), don't give up on Cocoa. You can use just the most basic stuff, and I believe it'll work for maybe longer than SwiftUI 😅
My Preferred Approach:
Reduce that storyboard to the absolute minimum without going against the grain. Make your one main window, use it for basic settings and such.
Then go for the event loop and drawing essentials. Retrieve your user input the normal way, connecting it from the storyboard in a single place.
Write the drawing function. Draw your content.
Be happy. You've created exactly the environment you've been looking for in those impossible places 🤩
It's just so regrettable that Apple doesn't provide this minimal mode as a project option. Visual Studio did last time I looked, and it just feels reassuringly free.