c performance caching

Full cache-line alignment or half?


I have a ring buffer that is read from and written to very frequently. The buffer structure is defined as follows:

#include <stdint.h>

typedef struct {
  uint8_t*  buffer;
  int32_t   readIdx;
  int32_t   readBase;
  int32_t   writeIdx;
  int32_t   writeBase;
  uint32_t  capacity;
  uint32_t  reserved;
} RBuffer_t;  // 32 bytes with 8-byte pointers: half of a 64-byte cache line

The structure currently occupies half a cache line. I don't know whether padding it to a full cache line (64 bytes) is necessary.

Is there any performance difference between this definition and a full-cache-line definition, and if so, which one is better?

Is it worth reserving more space to align it with the cache line?

Thanks. P


Solution

  • Reserving more space to pad the structure out to a full cache line probably won’t make a performance difference.

    More importantly, you should align this structure so that it does not straddle a cache-line boundary. A 32-byte structure with only its natural 8-byte alignment can start in one 64-byte line and end in the next, so accessing it may occupy two cache lines instead of one and waste cache space. (Whether this leads to significant performance degradation is questionable.) Depending on the processor architecture, spanning multiple cache lines could also lead to more cache misses or extra memory bandwidth usage during writebacks. Aligning the structure to 32 bytes is already enough to keep it inside a single 64-byte line.

    In any case, if your performance requirements are extremely strict, it is worth benchmarking both layouts to see whether there is any actual difference.
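As a rough illustration of the alignment point, here is a minimal sketch assuming a 64-byte cache line and a C11 compiler. The names `CACHE_LINE_SIZE`, `RBufferAligned_t`, `cache_lines_touched` and `rbuffer_alloc` are made up for this example and are not part of the original code. It uses `alignas` on the first member to force cache-line alignment of the whole structure, and a small helper to count how many lines an object at a given address would touch.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>    /* aligned_alloc (C11; not available on every platform) */

/* Assumption: 64-byte cache lines, as stated in the question. */
#define CACHE_LINE_SIZE 64

/* Same fields as RBuffer_t. alignas on the first member raises the alignment
 * of the whole struct, and the compiler pads its size up to 64 bytes. */
typedef struct {
  alignas(CACHE_LINE_SIZE) uint8_t* buffer;
  int32_t   readIdx;
  int32_t   readBase;
  int32_t   writeIdx;
  int32_t   writeBase;
  uint32_t  capacity;
  uint32_t  reserved;
} RBufferAligned_t;

static_assert(alignof(RBufferAligned_t) == CACHE_LINE_SIZE,
              "struct should be cache-line aligned");
static_assert(sizeof(RBufferAligned_t) == CACHE_LINE_SIZE,
              "struct should occupy exactly one cache line");

/* Number of cache lines covered by `size` bytes starting at `p` (size > 0). */
static size_t cache_lines_touched(const void* p, size_t size) {
  uintptr_t first = (uintptr_t)p / CACHE_LINE_SIZE;
  uintptr_t last  = ((uintptr_t)p + size - 1) / CACHE_LINE_SIZE;
  return (size_t)(last - first + 1);
}

/* Heap allocation needs an aligned allocator; plain malloc only guarantees
 * alignment suitable for fundamental types. Release with free(). */
static RBufferAligned_t* rbuffer_alloc(void) {
  return aligned_alloc(CACHE_LINE_SIZE, sizeof(RBufferAligned_t));
}
```

With this layout, `cache_lines_touched(rb, sizeof *rb)` should come out as 1 for any pointer returned by `rbuffer_alloc()`, whereas a 32-byte object that happens to start at offset 48 within a line would touch 2. If you would rather keep the structure at 32 bytes, using `alignas(32)` instead should also keep it within one line; which variant is faster, if either, is something only a benchmark on your target hardware can tell you.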