c++ data-oriented-design

I'm trying out Data Oriented Design - Can I do this with std::vector?


Okay, here's sample code comparing an Object Oriented Programming (OOP) solution vs. a Data Oriented Design (DOD) solution for updating a bunch of balls.

#include <cstddef>  // size_t
#include <cstdio>   // printf
#include <vector>

// Timer is assumed to be a user-provided stopwatch helper exposing
// Start(), Stop(), and Get_Microseconds(); its definition isn't shown here.

const size_t ArraySize = 1000;

class Ball
{
public:
    float x,y,z;
    Ball():
        x(0),
        y(0),
        z(0)
    {
    }

    void Update()
    {
        x += 5;
        y += 5;
        z += 5;
    }
};

std::vector<Ball> g_balls(ArraySize);

class Balls
{
public:
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> z;

    Balls():
        x(ArraySize,0),
        y(ArraySize,0),
        z(ArraySize,0)
    {
    }

    void Update()
    {
        const size_t num = x.size();
        if(num == 0)
        {
            return;
        }

        const float* lastX = &x[num - 1];

        float* pX = &x[0];
        float* pY = &y[0];
        float* pZ = &z[0];
        for( ; pX <= lastX; ++pX, ++pY, ++pZ)
        {
            *pX += 5;
            *pY += 5;
            *pZ += 5;
        }
    }
};

int main()
{
    Balls balls;

    Timer time1;
    time1.Start();
    balls.Update();
    time1.Stop();

    Timer time2;
    time2.Start();
    const size_t arrSize = g_balls.size();
    if(arrSize > 0)
    {
        const Ball* lastBall = &g_balls[arrSize - 1];
        Ball* pBall = &g_balls[0];
        for( ; pBall <= lastBall; ++pBall)
        {
            pBall->Update();
        }
    }
    time2.Stop();


    printf("Data Oriented design time: %f\n",time1.Get_Microseconds());
    printf("OOB oriented design  time: %f\n",time2.Get_Microseconds());

    return 0;
}

Now, this does compile and run in Visual Studio, though I'm wondering whether I'm actually allowed to do this, i.e. whether I can rely on the following working:

const float* lastX = &x[num - 1];//remember, x is a std::vector of floats

float* pX = &x[0];//remember, x is a std::vector of floats
float* pY = &y[0];//remember, y is a std::vector of floats
float* pZ = &z[0];//remember, z is a std::vector of floats
for( ; pX <= lastX; ++pX, ++pY, ++pZ)
{
    *pX += 5;
    *pY += 5;
    *pZ += 5;
}

From my understanding, the data in a std::vector is supposed to be contiguous, though I'm not sure, given how it's stored internally, whether this will be an issue on another platform or whether it breaks the standard. Also, this was the only way I was able to get the DOD solution to outdo the OOP solution; any other way of iterating wasn't as fast. I could use iterators, though I'm pretty sure they would only beat the OOP version with optimizations enabled, i.e. in release mode.

So, is this a good way to do DOD (the best way?), and is this legal C++?

[EDIT] Okay, for DOD this is a poor example; the x, y, z should be packaged in a Vector3. So, while DOD ran faster than OOP in debug, in release it was another story. Again, this is a bad example of how you would want to use DOD efficiently, though it does show its shortcomings if you need to access a bunch of data at the same time. The key to using DOD properly is to "design data based on access patterns".
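
A minimal sketch of what that Vector3 packaging might look like (the names here are just illustrative):

struct Vector3
{
    float x, y, z;
    Vector3() : x(0), y(0), z(0) {}
};

class Ball
{
public:
    void Update()
    {
        pos.x += 5;
        pos.y += 5;
        pos.z += 5;
    }
private:
    Vector3 pos;
};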


Solution

  • As pointed out already, vector storage has been guaranteed to be contiguous since C++03, and C++11 additionally provides a data method which returns a direct pointer to the internal array it uses. Here's your ISO C++ standard quote:

    23.2.6 Class template vector [vector]

    [...] The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
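
    For instance, a minimal sketch of code relying on that guarantee (the names are placeholders, not from the question):

    #include <vector>

    // Contiguity means &v[0] + i == &v[i], so a raw pointer walk over the
    // vector's storage is well-defined.
    void AddFive(std::vector<float>& v)
    {
        float* p   = v.data();        // C++11; &v[0] also works when !v.empty()
        float* end = p + v.size();
        for ( ; p != end; ++p)
        {
            *p += 5.0f;
        }
    }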

    That said, I wanted to jump in mainly because of the way you're benchmarking and using "DOD":

    So, while DOD ran faster in debug than OOP, in release it was another story.

    This kind of sentence doesn't make much sense because DOD is not synonymous with using SoAs for everything, especially if that leads to a performance degradation.

    Data-oriented design is just a generalized approach to design where you consider, in advance, how to store and efficiently access data. It becomes one of the first things to consider upfront when approaching designs with this mindset. The opposite is starting off by, say, designing an architecture and trying to figure out all the functionality it should provide, along with objects and abstractions and pure interfaces and so forth, and then leaving the data as an implementation detail to be filled in later. DOD starts off with the data as a fundamental thing to consider in the design stage, not an implementation detail to be filled in as an afterthought. It's useful in performance-critical cases where performance is a fundamental design-level requirement demanded by customers, not just an implementation luxury.

    In some cases the efficient representation of the data structures actually leads to new features, somewhat allowing the data itself to design the software. Git is an example of such software: its features revolve around the changeset data structure to some degree, and its efficiency led to new features being conceived. In those cases the software's features and user-end design actually evolve out of its efficiency, opening new doors because the efficiency allows things to be done interactively that were previously thought to be too computationally expensive to do in any reasonable amount of time. Another example is ZBrush, which reshaped the VFX industry by allowing things people thought were impossible a couple of decades ago, like interactively sculpting 20 million polygon meshes with a sculpting brush to achieve a level of detail no one had seen before in the late 90s and early 2000s. Another is voxel cone tracing, which is allowing even games on the PlayStation to have indirect lighting with diffuse reflections; something people would otherwise think requires minutes or hours to render a single frame, not 60+ frames per second, without such data-oriented techniques. So sometimes an effective DOD approach will actually yield new features in the software that people formerly thought were not possible, because it broke the proverbial sound barrier.

    A DOD mindset could still lead to a design that utilizes an AoS representation if that's deemed more efficient. An AoS often excels, for example, in cases where you need random access and where all or most of the interleaved fields are hot, i.e. frequently accessed and/or modified together; a sketch of that case follows below.
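
    As a rough sketch of that case (the names below are made up for illustration), an interleaved AoS record keeps all of the hot fields together, so a single random lookup pulls them in with the same cache line or two:

    #include <cstddef>
    #include <vector>

    // Illustrative only: every field is hot and always used together,
    // which favors interleaving them per record (AoS).
    struct Particle
    {
        float x, y, z;     // position
        float vx, vy, vz;  // velocity
    };

    std::vector<Particle> particles;

    // A random-access update touches one contiguous Particle record rather
    // than gathering from six separate arrays.
    void Nudge(std::size_t i, float dt)
    {
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }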

    This is just my spin on it, but in my opinion DOD doesn't have to arrive at the most efficient data representation upfront. What it does need is to arrive upfront at interface designs that leave sufficient breathing room to optimize as needed. An example of software which seems to lack the foresight that a DOD mindset would provide would be a video compositing program that represents its data like this:

    // An abstract interface per pixel: every single pixel carries a vptr.
    class IPixel
    {
    public:
        virtual ~IPixel() {}
        ...
    };
    

    Just one glance at the code above reveals a significant lack of foresight as to how to design things for efficient data representation and access. For starters, if you consider a 32-bit RGBA pixel, the cost of the virtual pointer, assuming a 64-bit pointer size and alignment, would quadruple the size of a single pixel (64-bit vptr + 32 bits of pixel data + 32 bits of padding for alignment of the vptr). So anyone applying a DOD mindset would generally steer clear of such interface designs like the plague. They might still benefit from an abstraction, however, like being able to use the same code for images with many different pixel formats. But in that case I'd expect this:

    // An abstract interface per image: the vptr cost is paid once per
    // container of (possibly millions of) pixels, not once per pixel.
    class IImage
    {
    public:
       virtual ~IImage() {}
       ...
    };
    

    ... which trivializes the overhead of a vptr, virtual dispatch, possible loss of contiguity, etc. to the level of an entire image (possibly millions of pixels) and not something paid per-pixel.
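
    To make the earlier size arithmetic concrete, here's a minimal sketch (the struct names are hypothetical) comparing a plain 32-bit RGBA pixel against one that carries a vptr:

    #include <cstdint>
    #include <cstdio>

    // Hypothetical types, purely to illustrate the per-pixel vptr cost.
    struct PlainPixel
    {
        std::uint8_t r, g, b, a;       // 4 bytes total
    };

    struct VirtualPixel
    {
        virtual ~VirtualPixel() {}
        std::uint8_t r, g, b, a;       // vptr + 4 bytes of data + padding
    };

    int main()
    {
        // On a typical 64-bit implementation this prints 4 and 16:
        // the virtual version is roughly four times the size.
        std::printf("PlainPixel:   %zu bytes\n", sizeof(PlainPixel));
        std::printf("VirtualPixel: %zu bytes\n", sizeof(VirtualPixel));
    }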

    Typically a DOD mindset tends to lead to coarser rather than granular interface designs (interfaces for whole containers, as with the image interface above representing a container of pixels, or sometimes even containers of containers). The main reason is that you don't have much breathing room to centrally optimize if you have a codebase like this:

    [image: diagram of a codebase where many systems depend directly on a granular, per-object Ball design]

    Because now let's say you want to multithread the processing of many balls at once everywhere. You can't without rewriting every part of the codebase that uses balls individually. As another example, let's say you want to change the representation of a ball from AoS to SoA. That would require rewriting Ball to become Balls, along with every part of the codebase that used the former Ball design. Similar story if you want to process balls on the GPU. So typically a DOD mindset would tend to favor a coarser design, like Balls:

    [image: diagram of the same codebase depending on a single, coarser Balls interface instead of individual Ball objects]

    In this second case, you can apply all the optimizations you ever need to process balls in parallel, represent them with SoAs, etc. -- whatever you want without rewriting the codebase. But that said, the Balls implementation might still privately store each individual Ball using an AoS:

    class Balls
    {
    public:
        ...
    private:
        struct Ball
        {
            ...
        };
        vector<Ball> balls;
    };
    

    ... or not. It doesn't matter nearly as much at this point because you're free to change the private implementation of Balls all you like now without affecting the rest of the codebase.

    Finally for your benchmark, what does it do? It's basically looping through a bunch of single-precision floats and adding 5 to them. It doesn't make any real difference in that case whether you store one array of floats or a thousand. If you store more arrays, then inevitably that adds some overhead with no benefit whatsoever if all you're going to be doing is looping through all floats and adding 5 to them.

    To get real mileage out of an SoA representation, you can't just write code that does exactly the same thing to every field. SoAs typically excel in a sequential access pattern over non-trivial input sizes when you actually need to do something different with each field, like transforming the x/y/z fields with a transformation matrix using efficient SIMD instructions (either handwritten or generated by your optimizer) that process 4+ balls at once, not simply adding 5 to a boatload of floats. They especially excel when not all fields are hot, like a physics system not being interested in the sprite field of a particle (which would be wasteful to load into a cache line only to go unused). So to test the differences between an SoA and an AoS representation, you need a sufficiently real-world kind of benchmark to see practical differences.
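
    As a rough illustration (not a benchmark, and all names here are made up), this is the kind of per-field work where an SoA layout gives the optimizer simple, contiguous, vectorizable sweeps and never drags cold data through the cache:

    #include <cstddef>
    #include <vector>

    // Hypothetical SoA particle store: the hot simulation fields live in
    // separate arrays; cold data like a sprite id is never touched below.
    struct Particles
    {
        std::vector<float> x, y, z;
        std::vector<float> vx, vy, vz;
        std::vector<int>   sprite;   // cold: unused by the physics pass
    };

    // Integrate positions: each field gets different work, and each loop is a
    // straight contiguous sweep the compiler can auto-vectorize with SIMD.
    void Integrate(Particles& p, float dt)
    {
        const std::size_t n = p.x.size();
        for (std::size_t i = 0; i < n; ++i) p.x[i] += p.vx[i] * dt;
        for (std::size_t i = 0; i < n; ++i) p.y[i] += p.vy[i] * dt;
        for (std::size_t i = 0; i < n; ++i) p.z[i] += p.vz[i] * dt;
    }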