c++ undefined-behavior cpu-cache pointer-conversion

Interpreting part of an array as an object by casting a pointer to an array element


Say you have an array of floats representing raw data, and several types representing 2D shapes that only have floats as members, like this:

#include <iostream>
#include <array>

struct point_t {
    float x, y;
    void print() const {
        std::cout << "point " << x
            << "," << y << std::endl;
    };
};
struct circle_t {
    float x, y, r;
    void print() const {
        std::cout << "circle " << x  << ","
            << y << "," << r << std::endl;
    };
};
struct rectangle_t {
    float x1, y1, x2, y2;
    void print() const {
        std::cout << "rectangle " << x1 << "," << y1
            << "," << x2 << "," << y2 << std::endl;
    };
};

int main() {

    std::array<float, 50> data{6.1f, 0.3f, 15.4f, 23.2f, 6.1f, 30.f, 35.f, 40.f, 40.f};
    
    //casting objects from the array:
    point_t& point = *(point_t*)&data[0];
    circle_t& circle = *(circle_t*)&data[2];
    rectangle_t& rectangle = *(rectangle_t*)&data[5];

    point.print();
    circle.print();
    rectangle.print();
}

//output:
//point 6.1,0.3
//circle 15.4,23.2,6.1
//rectangle 30,35,40,40

This compiles and appears to work. Is it undefined behavior or dangerous, and if so how can the approach be improved?

For context, this is meant for intensive simulations and games where performance is critical. The goal is to manually control where objects of different types and sizes are created and moved in memory, so that they are packed together with very good cache locality. This includes updating the data as objects are created, destroyed, or moved in physical space (they may be periodically moved in memory to follow a space-filling curve, such as a Morton curve, which tends to reduce cache misses during spatial queries).

The data will change chaotically and unexpectedly and keeping objects which need to interact with each other as close together as possible in memory is critical.
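For readers unfamiliar with the Morton ordering mentioned above, here is a minimal sketch of how a 2D Morton key can be computed by interleaving the bits of two coordinates. It assumes 16-bit unsigned grid coordinates, and the names `spread_bits` and `morton2d` are hypothetical, chosen only for illustration. How floating-point positions are quantized to grid cells is left open.

```cpp
#include <cstdint>

// Hypothetical helper: spread the low 16 bits of v so that bit i moves to bit 2*i.
inline std::uint32_t spread_bits(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Hypothetical helper: interleave x (even bits) and y (odd bits) into one key.
// Sorting objects by this key places spatially close cells near each other.
inline std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Objects could then be ordered in the buffer by `morton2d` of their grid cell. That ordering step is not shown here.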

From what I understand, std::memcpy and std::bit_cast both involve copying data to create a new object, which would be a performance hit. Putting the objects into a std::variant or a union reduces performance too: std::variant performs type-safety checks and uses extra memory to track the active type. A std::variant is also at least as large as the largest type it stores, a problem that unions share. That bloat pushes everything slightly further apart in memory, which increases cache misses. Apparently unions may also hinder the compiler's ability to keep values in CPU registers, hurting performance. I've benchmarked accessing a member normally vs. through a union in a loop and found that unions are slightly slower on my system.
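It may be worth measuring the std::memcpy route before ruling it out. Copying the bytes into a local object of a trivially copyable type is well-defined, and optimizing compilers commonly turn a small fixed-size memcpy into ordinary loads and stores. That is an optimizer behaviour to verify on your own toolchain, not a language guarantee. A minimal sketch, assuming C++17; `load_as` and `store_as` are hypothetical helper names, not standard functions:

```cpp
#include <cstring>
#include <type_traits>

// Same data layout as circle_t in the question (print() omitted for brevity).
struct circle_t { float x, y, r; };

// Hypothetical helper: read a T by value out of a float buffer.
template <class T>
T load_as(const float* src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy requires a trivially copyable type");
    static_assert(sizeof(T) % sizeof(float) == 0,
                  "T must occupy a whole number of floats");
    T out;
    std::memcpy(&out, src, sizeof(T));  // copies bytes; no aliasing violation
    return out;
}

// Hypothetical helper: write a T back into a float buffer.
template <class T>
void store_as(float* dst, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy requires a trivially copyable type");
    std::memcpy(dst, &value, sizeof(T));
}
```

The trade-off is that the caller works on a copy and must call `store_as` to publish changes, rather than mutating the buffer through a reference.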

I am aware of entity component systems as a solution to cache locality for these kinds of applications, but I'm curious about a more traditional object-oriented approach that still achieves good locality without running into strange problems caused by undefined behavior.


Solution

  • The goal is to manually control where objects of different types and sizes are created and moved in memory, to pack them together with very good cache locality.

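    This is what placement new was created for. The casts in the question are undefined behavior, because no point_t, circle_t or rectangle_t object ever begins its lifetime inside the float array. Placement new instead constructs an object of the requested type at the address you choose. You can do it like this: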

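    #include <new> // placement new

    // point_t, circle_t and rectangle_t are defined as in the question.
    int main() {
    
        // Raw storage, aligned for float and large enough for 50 floats.
        alignas(float) char data[50*sizeof(float)];
        
        // Construct each object at a chosen byte offset inside the buffer.
        auto point = new(&data[0*sizeof(float)]) point_t{6.1f, 0.3f};
        auto circle = new(&data[2*sizeof(float)]) circle_t{15.4f, 23.2f, 6.1f};
        auto rectangle = new(&data[5*sizeof(float)]) rectangle_t{30.f, 35.f, 40.f, 40.f};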
    
        point->print();
        circle->print();
        rectangle->print();
    }
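The question also mentions moving objects around inside the buffer. A minimal sketch of one way to do that with the same technique, assuming C++17: move-construct the object at its new address with placement new, then destroy the old one. `relocate` is a hypothetical helper name. The caller is assumed to guarantee that the destination is suitably aligned for the type, large enough, and does not overlap the source object.

```cpp
#include <memory>       // std::destroy_at
#include <new>          // placement new
#include <type_traits>
#include <utility>      // std::move

// Same data layout as circle_t in the question (print() omitted for brevity).
struct circle_t { float x, y, r; };

// Hypothetical helper: relocate *obj to the storage at dst.
// Preconditions (not checked here): dst is aligned for T, has room for
// sizeof(T) bytes, and does not overlap *obj.
template <class T>
T* relocate(T* obj, void* dst) {
    static_assert(std::is_nothrow_move_constructible_v<T>,
                  "relocation should not throw halfway through");
    T* moved = ::new (dst) T(std::move(*obj));  // new object begins its lifetime at dst
    std::destroy_at(obj);                       // old object ends its lifetime
    return moved;                               // use this pointer from now on
}
```

After the call, the old pointer must no longer be used. Only the returned pointer refers to a live object.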