data-oriented-design

Data-oriented design in practice?


There has been one more question on what data-oriented design is, and there's one article which is often referred to (and I've read it like 5 or 6 times already). I understand the general concept of this, especially when dealing with, for example, 3d models, where you'd like to keep all vertexes together, and not pollute your faces with normals, etc.

However, I do have a hard time visualizing how data-oriented design might work for anything but the most trivial cases (3d models, particles, BSP-trees, and so on). Are there any good examples out there which really embraces data-oriented design and shows how this might work in practice? I can plow through large code-bases if needed.

What I'm especially interested in is the mantra "where there's one there are many", which I can't really seem to connect with the rest here. Yes, there are always more than one enemy, yet, you still need to update each enemy individually, cause they're not moving the same way now are they? Same goes for the 'balls'-example in the accepted answer to the question above (I actually asked this in a comment to that answer, but haven't gotten a reply yet). Is it simply that the rendering will only need the positions, and not the velocities, whereas the game simulation will need both, but not the materials? Or am I missing something? Perhaps I've already understood it and it's a far more straightforward concept than I thought.

Any pointers would be much appreciated!


Solution

  • So, what is DOD all about? Obviously, it's about performance, but it's not just that. It's also about well-designed code that is readable, easy to understand and even reusable. Now Object Oriented design is all about designing code and data to fit into encapsulated virtual "objects". Each object is a seperate entity with variables for properties that object might have and methods to take action on itself or other objects in the world. The advantage of OO design is that it's easy to mentally model your code into objects because the whole (real) world around us seems to work in the same way. Objects with properties that can interact with each other.

    Now the problem is that the cpu in your computer works in a completely different way. It works best when you let it do the same things again and again. Why is that? Because of a little thing called cache. Accessing RAM on a modern computer can take 100 or 200 CPU cycles (and the CPU has to wait all that time!), which is way too long. So there's this small portion of memory on the CPU that can be accessed really quickly, cache memory. Problem is it's only a few MB tops. So every time you need data that wasn't in cache, you still need to go the long way to RAM. That's not just that way for data, the same goes for code. Trying to execute a function that's not in instruction cache will cause a stall while the code is loaded from RAM.

    Back to OO programming. Objects are big, but most functions need only a small portion of that data, so we're wasting cache by loading unnecessary data. Methods call other methods which call other methods, thrashing your instruction cache. Still, we often do a lot of the same stuff over and over again. Let's take a bullet from a game for example. In a naive implementation each bullet could be a separate object. There might be a bullet manager class. It calls the first bullet's update function. It updates the 3D position using the direction/velocity. This causes a lot of other data from the object to be loaded into the cache. Next, we call the World Manager class to check for a collision with other objects. This loads lots of other stuff into the cache, maybe it even causes code from the original bullet manager class to get dropped from instruction cache. Now we return to the bullet update, there was no collision, so we return to bullet manager. It might need to load some code again. Next up, bullet #2 update. This loads lots of data into the cache, calls world... etc. So in this hypthetical situation, we've got 2 stalls for loading code and let's say 2 stalls for loading data. That's at least 400 cycles wasted, for 1 bullet, and we haven't taken bullets that hit something else into account. Now a CPU runs at 3+ GHz so we're not going to notice one bullet, but what if we've got 100 bullets? Or even more?

    So this is the where there's one there's many story. Yes, there are some cases where you've only got on object, your manager classes, file access, etc. But more often, there's a lot of similar cases. Naive, or even not-naive object oriented design will lead to lots of problems. So enter data oriented design. The key of DOD is to model your code around your data, not the other way around as with OO-design. This starts at the first stages of design. You do not first design your OO code and then optimize it. You start by listing and examining your data and thinking out how you want to modify it(I'll get to a practical example in a moment). Once you know how your code is going to modify the data you can lay it out in a way that makes it as efficient as possible to process it. Now you may think this can only lead to a horrible soup of code and data everywhere but that is only the case if you design it badly (bad design is just as easy with OO programming). If you design it well, code and data can be neatly designed around specific functionality, leading to very readable and even very reusable code.

    So back to our bullets. Instead of creating a class for each bullet, we only keep the bullet manager. Each bullet has a position and a velocity. Each bullet's position needs to be updated. Each bullet has to have a collision check and all bullets that have hit something need to take some action accordingly. So just by taking a look at this description I can design this whole system in a much better way. Let's put the positions of all bullets in an array/vector. Let's put the velocity of all bullets in an array/vector. Now let's start by iterating allong those two arrays and updating each position value with it's corresponding velocity. Now, all data loaded into the data cache is data we're going to use. We can even put a smart pre-loading command to already pre-load some array data in advance so the data's in cache when we get to it. Next, collision check. I'm not going into detail here, but you can imagine how updating all bullets after each other can help. Also note that if there's a collision, we're not going to call a new function or do anything. We just keep a vector with all bullets that had collision and when collision checking is done, we can update all those after each other. See how we just went from lots of memory access to almost none by laying our data out differently? Did you also notice how our code and data, even though not designed in an OO way any more, are still easy to understand and easy to reuse?

    So to get back to the "where there's one there's many". When designing OO code you think about one object, the prototype/class. A bullet has a velocity, a bullet has a position, a bullet will move each frame by it's velocity, a bullet can hit something, etc. When you think about that, you will think about a class, with velocity, position, and an update function which moves the bullet and checks for collision. However, when you have multiple objects you need to think about all of them. Bullets have positions, velocity. Some bullets may have collision. Do you see how we're not thinking about an individual object any longer? We're thinking about all of them and are designing code a lot differently now.

    I hope this helps answer your second part of the question. It's not about whether you need to update each enemy or not, it's about the most efficient way to update them. And while designing only your enemies using DOD may not help gain much performance, designing the entire game around these principles (only where applicable!) may lead to a lot of performance gains!

    So onto the first part of the question, that is other examples of DOD. I'm sorry but I don't have that much there. There is one really good example though, I came across this some time ago, a series on data oriented design of a behavior tree by Bjoern Knafla: http://bjoernknafla.com/data-oriented-behavior-tree-overview You probably want to start at the first one in the series of 4, links are in the article itself. Hope this still helps, despite the old question. Or maybe some other SO user come across this question and have some use from this answer.