
GLSL: Array of Structs vs Struct of Arrays in OpenGL Buffers


From reading various resources on the Internet, a structure of arrays (SoA) seems to be a very performant way to store your data if you are processing large arrays sequentially.

For example, in C++

struct CoordFrames
{
    float* x_pos;
    float* y_pos;
    float* z_pos;
    float* scaleFactor;
    float* x_quat;
    float* y_quat;
    float* z_quat;
    float* w_quat;
};

allowing faster processing of a large array (thanks to SIMD) than an array of

struct CoordFrame
{
    glm::vec3 position;
    float scaleFactor;
    glm::quat quaternion;
};
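A minimal sketch of the difference on the CPU, assuming plain floats in place of glm::vec3 and glm::quat and std::vector in place of the raw float pointers. The names CoordFrameAoS, CoordFramesSoA, scaleAoS and scaleSoA are hypothetical and only serve the illustration. The SoA loop walks contiguous float arrays, which is the access pattern compilers can usually auto-vectorize; the AoS loop touches the same fields at a 32-byte stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical AoS element: plain floats stand in for glm::vec3 / glm::quat.
struct CoordFrameAoS {
    float x, y, z;
    float scaleFactor;
    float qx, qy, qz, qw;
};

// Hypothetical SoA container: std::vector replaces the raw float* members.
struct CoordFramesSoA {
    std::vector<float> x_pos, y_pos, z_pos, scaleFactor;
};

// AoS: consecutive x values are sizeof(CoordFrameAoS) bytes apart.
inline void scaleAoS(std::vector<CoordFrameAoS>& frames) {
    for (CoordFrameAoS& f : frames) {
        f.x *= f.scaleFactor;
        f.y *= f.scaleFactor;
        f.z *= f.scaleFactor;
    }
}

// SoA: every array is contiguous, so each iteration reads neighbouring floats.
inline void scaleSoA(CoordFramesSoA& frames) {
    const std::size_t n = frames.x_pos.size();
    assert(frames.y_pos.size() == n && frames.z_pos.size() == n &&
           frames.scaleFactor.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        frames.x_pos[i] *= frames.scaleFactor[i];
        frames.y_pos[i] *= frames.scaleFactor[i];
        frames.z_pos[i] *= frames.scaleFactor[i];
    }
}
```

Both functions compute the same result; only the memory layout they traverse differs.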

GPUs are processors designed for massively parallel computing. SIMD is a "must have" here, so the conclusion would be that structures of arrays are the most useful layout on a GPU.

But ...

Question: Does it make sense to use structures of arrays in a GLSL shader?

Which is true? Or are GPUs so highly optimized for the way we usually write shaders that it doesn't really make any difference?


Solution

  • I do not think it would help in general, although I currently have no hard numbers.

    Many modern GPUs do indeed use an SoA format. However, the array part is usually the multiple invocations of the shader, so when you look at a single invocation it is as if you execute without SIMD. Therefore, especially with uniform variables, an SoA layout of the variables makes no significant performance difference.
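The idea that "the array part is the multiple invocations" can be modelled on the CPU. This is only a rough sketch under stated assumptions: kLanes is a hypothetical wave width of 8, VReg stands in for one hardware vector register, and shadeOne is a made-up per-invocation computation. Real hardware does this in its register file, not in a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical wave width, for illustration

// One "vector register": a single float for each shader invocation (lane).
using VReg = std::array<float, kLanes>;

// What the shader author writes: scalar-looking code for one invocation.
inline float shadeOne(float x, float scale) { return x * scale + 1.0f; }

// What the hardware effectively runs: the same operation across all lanes.
// The SoA layout lives in the registers, not in the shader source.
inline VReg shadeWave(const VReg& x, const VReg& scale) {
    VReg out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = shadeOne(x[lane], scale[lane]);
    }
    return out;
}

// Usage sketch: every lane of the wave result equals the scalar result.
inline void checkWave() {
    VReg x{}, scale{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        x[lane] = static_cast<float>(lane);
        scale[lane] = 2.0f;
    }
    const VReg out = shadeWave(x, scale);
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        assert(out[lane] == shadeOne(x[lane], scale[lane]));
    }
}
```

From the point of view of shadeOne, nothing is vectorized, which is why rearranging variables inside a single invocation gains little.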

    Some other GPUs actually have an AoS layout. For example, Intel Sandy Bridge (the 2011 Core generation) executes 2 vertex shaders at the same time on a core but has an 8-wide SIMD unit, essentially laid out as 2 vec4s. Working with vectors can therefore make it easier for the compiler to optimize your code.

    If we look at SoA on the CPU, the two major benefits are better cache utilization and easier use of SIMD instructions.

    The better cache utilization is basically the same on the GPU. However, you often optimize your data structures for a single draw operation anyway, so there are no members left that you could drop to improve cache utilization. It would probably still be wasteful, though, to include an array of materials as AoS when rendering a shadow map, for example.
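To put a rough number on the shadow map case, here is a small arithmetic sketch. InstanceAoS and its fields are hypothetical and the sizes are chosen only for illustration; the assumption is that a depth-only pass needs the transform but none of the material data, and that each record is fetched whole.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-instance record in AoS form; names and sizes are
// illustrative assumptions, not taken from any real engine.
struct InstanceAoS {
    float modelMatrix[16];    // 64 bytes: the only field a depth-only pass reads
    float albedo[4];          // 16 bytes: material data, unused by the shadow pass
    float materialParams[4];  // 16 bytes: material data, unused by the shadow pass
};

// Fraction of the fetched bytes that the pass actually uses.
inline double usefulFraction(std::size_t usedBytes, std::size_t fetchedBytes) {
    assert(fetchedBytes > 0 && usedBytes <= fetchedBytes);
    return static_cast<double>(usedBytes) / static_cast<double>(fetchedBytes);
}

// AoS: the whole record travels through the cache, material data included.
inline double shadowPassFractionAoS() {
    return usefulFraction(sizeof(float) * 16, sizeof(InstanceAoS));
}

// Transforms split into their own array: everything fetched is used.
inline double shadowPassFractionSplit() {
    return usefulFraction(sizeof(float) * 16, sizeof(float) * 16);
}
```

With these made-up sizes, about two thirds of the AoS traffic is useful in the shadow pass, while the split layout wastes nothing. The real ratio depends entirely on what the record contains.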

    Using SIMD instructions is much less of a problem: from the perspective of a single shader invocation you are not really using SIMD, so there are no restrictions on your loads and stores. Depending on the architecture, there may be instructions that load multiple elements at once, but with the AMD GCN architecture, for example, you can use the individually loaded variables afterwards and can therefore just load an entire struct and use it.

    I would guess that if you are computation-limited it does not really matter, and if you are bandwidth-limited you should reduce the size of the loaded data, where an SoA layout could possibly help you reach that goal.

    If it is just the array of 16 lights, I would not worry, though: it is pretty small and will probably not use significant bandwidth.

    As for the interleaved attributes, this is probably very GPU-dependent. For example, on Sandy Bridge, with 2 vertex shader invocations, you get much better locality for those two vertices by interleaving the attributes.

    However, on AMD GCN, where a single core can execute 64 shader invocations at the same time, you are probably going to get good locality even if you do not interleave your attributes, as each attribute should fill entire cache lines (assuming the vertices are close together if you do indexed rendering).
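The locality argument can be checked with a little arithmetic. The sketch below counts the distinct cache lines touched when a group of invocations reads consecutive vertices. Everything here is an assumption made for illustration: 64-byte cache lines, buffers that start on a line boundary, three vec4 attributes per vertex (16 bytes each, 48 bytes interleaved), and the hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct cache lines touched when `lanes` consecutive vertices
// each read `readBytes` bytes starting at byte offset index * strideBytes.
inline std::size_t cacheLinesTouched(std::size_t lanes, std::size_t strideBytes,
                                     std::size_t readBytes, std::size_t lineBytes) {
    assert(readBytes > 0 && lineBytes > 0);
    std::set<std::size_t> lines;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        const std::size_t first = lane * strideBytes;
        const std::size_t last = first + readBytes - 1;
        for (std::size_t line = first / lineBytes; line <= last / lineBytes; ++line) {
            lines.insert(line);
        }
    }
    return lines.size();
}

// Interleaved: one buffer, 48-byte vertices, each lane reads its whole vertex.
inline std::size_t linesInterleaved(std::size_t lanes) {
    return cacheLinesTouched(lanes, 48, 48, 64);
}

// Separate: three tightly packed vec4 arrays, one read of 16 bytes from each.
inline std::size_t linesSeparate(std::size_t lanes) {
    return 3 * cacheLinesTouched(lanes, 16, 16, 64);
}
```

Under these assumptions, 2 invocations touch 2 lines when interleaved and 3 when separate, while 64 invocations touch 48 lines either way, which matches the intuition that wide hardware loses little from non-interleaved attributes. Indexed rendering with scattered indices would change these numbers.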

    Just remember that performance characteristics can vary between GPUs, drivers, and what you are trying to do. Nothing beats a good benchmark for the specific problem.