Tags: architecture, data-oriented-design

Data-oriented design - how are data dependencies solved?


I saw "Data-Oriented Design and C++" by Mike Acton and I found it quite interesting. I don't understand how data dependencies are solved though.

Imagine I have a simple 2D engine with:

  • physical data - to handle physics
  • graphical data - to render sprites
  • sound data - to play sounds

Graphical data and sound data depend on the position stored in the physical data. The position can be referenced from the physical data, but in my opinion that kills the whole point of DOD: having the required data in the same memory location.
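
A minimal sketch of the kind of layout I mean (the struct names are just placeholders): the graphics and sound data hold pointers back into the physics data, so those systems end up chasing pointers instead of reading contiguous memory.

```cpp
// Hypothetical layout illustrating the concern: sprites and sound emitters
// point back into the physics data rather than having the position locally.
#include <vector>

struct PhysicsBody {
    float x, y;    // position
    float vx, vy;  // velocity
};

struct Sprite {
    const PhysicsBody* body;  // indirection back into the physics array
    int textureId;
};

struct SoundEmitter {
    const PhysicsBody* body;  // same indirection, used for panning/attenuation
    int clipId;
};

std::vector<PhysicsBody> bodies;     // contiguous physics data
std::vector<Sprite> sprites;         // each sprite points into `bodies`
std::vector<SoundEmitter> emitters;  // each emitter points into `bodies`
```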

How is such a situation handled in data-oriented design?


Solution

  • DOD is more of a general way of going about designing your architecture than a specific technique: there's no singular way of doing it, but the common thread is focusing on how to efficiently represent the data first and foremost. Linus Torvalds exhibited that mindset with the Linux kernel and Git, though those are very different domains from games.

    As a basic example, if you are designing an image processing application and you weren't thinking in a data-oriented fashion, but instead focused on supporting the widest range of pixel formats through the easiest interfaces to use, you might come up with an abstract Pixel and maybe even a heap allocation per pixel. At that point you're paying the cost of a virtual pointer (often larger than the pixel itself), dynamic dispatch per pixel, possibly another layer of indirection, and potentially a complete loss of spatial locality. If, instead, you thought about how to efficiently represent the data first, you'd probably abstract at the coarser Image level (a collection of possibly millions of pixels), if you abstract at all, so that you don't pay any of that overhead on a per-pixel basis.
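
    To make that concrete, here is a rough sketch of the two designs; the type names are hypothetical and only illustrate the difference in data representation, not code from the talk.

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <vector>

    // Per-pixel abstraction: a vtable pointer per pixel (often bigger than the
    // pixel itself), dynamic dispatch per pixel, and heap allocations scattered
    // across memory.
    struct AbstractPixel {
        virtual ~AbstractPixel() = default;
        virtual void brighten(float amount) = 0;
    };

    struct RgbaPixel : AbstractPixel {
        std::uint8_t r, g, b, a;
        void brighten(float amount) override {
            r = static_cast<std::uint8_t>(std::min(255.0f, r + amount));
            g = static_cast<std::uint8_t>(std::min(255.0f, g + amount));
            b = static_cast<std::uint8_t>(std::min(255.0f, b + amount));
        }
    };

    struct PixelImage {
        std::vector<std::unique_ptr<AbstractPixel>> pixels;  // one allocation each
    };

    // Coarser Image-level abstraction: one contiguous buffer per image, with the
    // per-pixel loop written against tightly packed data.
    struct RgbaImage {
        int width = 0, height = 0;
        std::vector<std::uint8_t> pixels;  // 4 bytes per pixel, contiguous

        void brighten(float amount) {
            for (std::size_t i = 0; i < pixels.size(); i += 4) {
                for (std::size_t c = 0; c < 3; ++c) {  // skip the alpha channel
                    pixels[i + c] = static_cast<std::uint8_t>(
                        std::min(255.0f, pixels[i + c] + amount));
                }
            }
        }
    };
    ```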

    That said, for games, the common way to approach what you are talking about is often to make the data centrally accessible. This might seem like a violation of software engineering principles, but if you use something like an entity-component system, any given type of component will typically only be accessed by a small number of systems, so the scope of that data tends to be small enough to effectively maintain invariants.
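
    For the position example from the question, a minimal sketch of that centrally-accessible layout might look like the following (the names and exact layout are assumptions for illustration): the physics system owns and updates one contiguous array of positions, and the render and sound systems read it by entity index.

    ```cpp
    #include <cstddef>
    #include <vector>

    // One contiguous, centrally-accessible array of positions, indexed by entity id.
    struct Positions {
        std::vector<float> x, y;
    };

    // The physics system owns the write access.
    void physics_update(Positions& pos, const std::vector<float>& vx,
                        const std::vector<float>& vy, float dt) {
        for (std::size_t i = 0; i < pos.x.size(); ++i) {
            pos.x[i] += vx[i] * dt;  // sequential pass over contiguous memory
            pos.y[i] += vy[i] * dt;
        }
    }

    // The render system only reads: copy positions into a sprite vertex stream.
    void fill_sprite_positions(const Positions& pos, std::vector<float>& vertexXY) {
        vertexXY.resize(pos.x.size() * 2);
        for (std::size_t i = 0; i < pos.x.size(); ++i) {
            vertexXY[2 * i + 0] = pos.x[i];
            vertexXY[2 * i + 1] = pos.y[i];
        }
    }

    // The sound system only reads: crude left/right panning from the x position.
    void update_sound_pans(const Positions& pos, std::vector<float>& pan,
                           float listenerX) {
        for (std::size_t i = 0; i < pan.size(); ++i)
            pan[i] = (pos.x[i] - listenerX) * 0.01f;
    }
    ```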


    As for events that might occur in the game, like two entities colliding in the physics system for which the sound system might want to play a sound, there are many ways to keep the physics and sound systems decoupled from each other. One is to use an event queue.
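
    A bare-bones sketch of such a queue (an assumed design, not the only way to do it): the physics system pushes collision events during its update, and the sound system drains them later in the frame, so neither system calls into the other directly.

    ```cpp
    #include <cstdint>
    #include <vector>

    struct CollisionEvent {
        std::uint32_t entityA;
        std::uint32_t entityB;
        float impulse;  // lets the sound system pick/scale an impact sound
    };

    struct EventQueue {
        std::vector<CollisionEvent> events;

        void push(const CollisionEvent& e) { events.push_back(e); }

        // Consume every queued event with the given handler, then clear the queue.
        template <typename Fn>
        void drain(Fn&& handler) {
            for (const CollisionEvent& e : events) handler(e);
            events.clear();
        }
    };

    // Physics system, during collision resolution:
    //   queue.push({a, b, impulse});
    //
    // Sound system, later in the frame (playImpactSound is hypothetical):
    //   queue.drain([&](const CollisionEvent& e) {
    //       if (e.impulse > threshold) playImpactSound(e.entityA, e.entityB);
    //   });
    ```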

    As for the data required by one system also being needed by another, sharing it is generally quite practical. If you wanted to run these systems in parallel with each other, they'd still have to copy the shared data, potentially update it, and somehow coordinate their results. That said, in my opinion it is much more productive to avoid fiddling with that and to just parallelize what a single system is doing (e.g., with a parallel for loop), because typically only a handful of systems in an ECS are hotspots that do the really heavy lifting, and you can easily distribute the work of those specific systems across threads without trying to run many systems concurrently and opening up that can of worms.
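
    For example, a hotspot system's inner loop can be parallelized on its own; this sketch uses a C++17 parallel algorithm, which is just one of several ways to get a parallel for loop (OpenMP or a job system would work as well).

    ```cpp
    #include <algorithm>
    #include <cstddef>
    #include <execution>
    #include <numeric>
    #include <vector>

    struct Bodies {
        std::vector<float> x, y, vx, vy;
    };

    void integrate(Bodies& b, float dt) {
        std::vector<std::size_t> indices(b.x.size());
        std::iota(indices.begin(), indices.end(), 0);

        // Each body is updated independently of the others, so the hot loop of
        // this one system can be spread across threads safely.
        std::for_each(std::execution::par, indices.begin(), indices.end(),
                      [&](std::size_t i) {
                          b.x[i] += b.vx[i] * dt;
                          b.y[i] += b.vy[i] * dt;
                      });
    }
    ```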