Tags: c++, multithreading, caching, simulation, lockless

Cache line padding for variables that are a multiple of cache line size


I am creating a multi-threaded discrete event simulation framework. The core of the framework uses atomics and lockless programming techniques to achieve very fast execution across many threads. This requires me to align some variables to cache lines and pad the rest of the cache line so that unrelated variables never share a line and cause contention. Here is how I do it:

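// Compute the number of padding bytes needed to reach the next cache line boundary.
// u64 and CACHELINE_SIZE are defined elsewhere (not shown here).
constexpr u64 CLPAD(u64 _objSize) {
  return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +      // whole cache lines already used
      (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -   // one more line if there is a remainder
      _objSize;
}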

alignas(CACHELINE_SIZE) MyObject myObj;
char padding[CLPAD(sizeof(myObj))];

This works great for me, but I stumbled upon an issue today when I used this approach for a new object type. The CLPAD() function returns the number of chars needed to pad the input type up to the next cache line boundary. However, if I pass in a type whose size is an exact multiple of the cache line size, CLPAD() returns 0. If you attempt to create a zero-sized array, you get this warning/error:

ISO C++ forbids zero-size array 'padding'
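
To make the arithmetic concrete, here is a small compile-time check of what CLPAD() returns for a few sizes. The u64 alias and the CACHELINE_SIZE value of 64 are assumptions made only for this sketch; the question does not show how either is defined.

```cpp
#include <cstdint>

using u64 = std::uint64_t;           // assumed alias, not shown in the question
constexpr u64 CACHELINE_SIZE = 64;   // assumed value, for illustration only

// Same computation as in the question.
constexpr u64 CLPAD(u64 _objSize) {
  return ((_objSize / CACHELINE_SIZE) * CACHELINE_SIZE) +
      (((_objSize % CACHELINE_SIZE) > 0) * CACHELINE_SIZE) -
      _objSize;
}

// Sizes that do not fill a whole number of cache lines get a positive pad.
static_assert(CLPAD(1) == 63, "1 byte needs 63 bytes of padding");
static_assert(CLPAD(40) == 24, "40 bytes need 24 bytes of padding");
static_assert(CLPAD(130) == 62, "130 bytes need 62 bytes to reach 192");

// Exact multiples need no padding, which is what produces the zero-size array.
static_assert(CLPAD(64) == 0, "one full cache line needs no padding");
static_assert(CLPAD(128) == 0, "two full cache lines need no padding");
```

The last two assertions are the problem case: char padding[0] is not valid ISO C++.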

I know I could modify CLPAD() to return CACHELINE_SIZE in this case, but then I'm burning a cache line's worth of space for no reason.

How can I make the declaration of 'padding' disappear if CLPAD returns 0?


Solution

  • Taking a page from std::aligned_storage<>, I've come up with the following:

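    // Primary template: sizeof(T) is not a multiple of CACHELINE_SIZE,
    // so explicit padding up to the next cache line boundary is needed.
    template<class T, bool = false>
    struct padded
    {
        using type = struct
        {
            alignas(CACHELINE_SIZE)T myObj;
            char padding[CLPAD(sizeof(T))];
        };
    };
    
    // Specialization: sizeof(T) is already a multiple of CACHELINE_SIZE,
    // so the padding member is omitted entirely.
    template<class T>
    struct padded<T, true>
    {
        using type = struct
        {
            alignas(CACHELINE_SIZE)T myObj;
        };
    };
    
    // Picks the specialization when no padding is required.
    template<class T>
    using padded_t = typename padded<T, (sizeof(T) % CACHELINE_SIZE == 0)>::type;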
    

    Usage (the values in the comments assume CACHELINE_SIZE is 128):

    struct alignas(32) my_type_1 { char c[32]; }; // char c[32] to silence MSVC warning
    struct my_type_2 { char c[CACHELINE_SIZE * 2]; }; // ditto
    
    int main()
    {
        padded_t<my_type_1> pt0;
        padded_t<my_type_2> pt1;
    
        sizeof(pt0);    // 128
        alignof(pt0);   // 128
    
        sizeof(pt1);    // 256
        alignof(pt1);   // 128
    }
    

    You can provide a function to access myObj however you wish.
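
    As a possible simplification (a different technique from the one above, not a drop-in replacement for it): because sizeof of any type is always a multiple of its alignment, putting alignas on a wrapper struct makes the compiler add the tail padding itself, so neither a padding array nor a specialization is needed. A minimal sketch, where cache_aligned, get(), small_type, and exact_type are hypothetical names and CACHELINE_SIZE is assumed to be 64:

```cpp
#include <cstddef>

constexpr std::size_t CACHELINE_SIZE = 64;  // assumed value, for illustration only

// Hypothetical wrapper: the alignment requirement forces sizeof to be
// rounded up to a multiple of CACHELINE_SIZE by the compiler.
template <class T>
struct alignas(CACHELINE_SIZE) cache_aligned {
    T value;

    // Simple accessors for the wrapped object.
    T& get() { return value; }
    const T& get() const { return value; }
};

struct small_type { char c[24]; };                  // smaller than one cache line
struct exact_type { char c[CACHELINE_SIZE * 2]; };  // exactly two cache lines

static_assert(sizeof(cache_aligned<small_type>) == CACHELINE_SIZE,
              "rounded up to one full cache line");
static_assert(alignof(cache_aligned<small_type>) == CACHELINE_SIZE,
              "aligned to the cache line size");
static_assert(sizeof(cache_aligned<exact_type>) == CACHELINE_SIZE * 2,
              "no extra cache line is burned for exact multiples");
```

    Whether this fits depends on how the objects are laid out; it only guarantees the padding when each padded object is its own cache_aligned<T>.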