What is the correct solution to work with bitmaps in terms of pointer arithmetic and conversion?

When working with bitmaps, it nearly always mandatory that every row of the image begins at a 4-byte boundary. This can lead to padding, in cases where the bitdepth of the image is less than 32 bytes.

Example: An 24 bit RGB image has 3 bytes per pixel and if the width of the image is 1026 pixels, then one line contains 1026 * 3 bytes = 3078 bytes. 3078 is not divisible by 4, so we need to add 2 bytes for padding to form a full scanline.

The space in bytes needed for one row is often called pitch.

This means I cannot simply use a formula like y * width + x to access the pixel at (x, y), instead I must do some sort of pointer arithmetic.

Let's say I have a struct to access one pixel and a struct that contains the image value:

struct PixelRGB24
{
  unsigned char r, g, b;
};

struct ImageRGB24
{
  unsigned int width;
  unsigned int heigth;
  unsigned int pitch;
  void *pixels;
};

Which code is conform with the C++ standard (no undefined behaviour)? (omitted boundary checks for simplicity)

Weird pointer arithmetic

PixelRGB24 GetPixel(unsigned int x, unsigned int y, ImageRGB24 *image)
{
  PixelRGB24 *pixel = reinterpret_cast<PixelRGB24 *>(reinterpret_cast<uintptr_t>(image->pixels) + y * image->pitch + 3 * x);

  return *pixel;
}

Casting chain void * to char * to void * to PixelRGB24

PixelRGB24 GetPixel(unsigned int x, unsigned int y, ImageRGB24 *image)
{
  char *row_offset = static_cast<char *>(image->pixels) + y * image->pitch;
  void *pixel_offset_raw = static_cast<void *>(pixel_offset);
  PixelRGB24 *pixel = static_cast<PixelRGB24 *>(pixel_offset_raw) + x;
  return *pixel;
}

Same as 2., but with reinterpret_cast to avoid void * conversion step

PixelRGB24 GetPixel(unsigned int x, unsigned int y, ImageRGB24 *image)
{
  char *row_offset = static_cast<char *>(image->pixels) + y * image->pitch;
  PixelRGB24 *pixel = reinterpret_cast<PixelRGB24 *>(pixel_offset_raw) + x;
  return *pixel;
}

Or do all three implementations cause undefined behaviour and if so, what is the correct solution?

Solution

The question about pointer arithmetic is covered by Can std::uintptr_t be used to avoid undefined behavior of out-of-bounds pointer arithmetic? (answer: no, don't use uintptr_t to do pointer arithmetic), so I'll treat this question as asking only about the conversion.

A reinterpret_cast to PixelRGB24* followed by a dereference is valid if the data was constructed as an array of PixelRGB24 objects. Otherwise, it is invalid (but tends to work, if portability is not a concern). Given that there is assumed padding between some objects, this would require an advanced construction technique to pull off (pixels would be an array of bytes that "provide storage" for PixelRGB24 objects). Since this advanced technique was not mentioned, and since the code uses void* (code smell), I am going to infer that we are not in that situation.

Before we proceed further, I would like to impose some corrections to the types of fields in the structs. The components of a pixel are not characters; they are 8-bit numbers. Also, the image data is not some unknown type; it is a sequence of bytes that will be interpreted as pixel data. Fixing the type of pixels has an advantage in reducing the number of casts needed later. It is valid regardless of how the pixel data is constructed because three "byte pointers" – char*, unsigned char*, and std::byte* – are special in that they can be used to examine the data of any object. That is, for these types, unlike for PixelRGB24*, it is valid to reinterpret_cast to them and dereference.

#include <cstddef>
#include <cstdint>

struct PixelRGB24
{
    uint8_t r, g, b; // <-- semantically numbers, not characters
};

struct ImageRGB24
{
    unsigned int width;
    unsigned int heigth;
    unsigned int pitch;
    std::byte *pixels; // <-- raw data, not unknown/void
};

The good news is that PixelRGB24 (either version) is trivially copyable, so its data can be simply copied using std::memcpy. The thing to be concerned with for this approach is whether or not the compiler added padding to PixelRGB24. Padding is problematic for both this approach and for the code in the question, so I'll just add a static_assert to detect padding (maybe it should be in the struct definition instead). On the bright side, padding in this particular struct seems unlikely in most (all?) of the common programming environments.

#include <cstring>

PixelRGB24 GetPixel(unsigned int x, unsigned int y, ImageRGB24 *image)
{
    // Ensure that there is no padding
    PixelRGB24 retval;
    static_assert(sizeof(retval) == 3);

    // Calculate the start of the pixel (pointer arithmetic)
    auto *pixel = image->pixels + y*image->pitch + x*sizeof(retval);
    // Extract the pixel data
    std::memcpy(&retval, pixel, sizeof(retval));
    // Note: Named return value optimization avoids a second copy here. In essence,
    // the above `memcpy` makes explicit the copy that would occur when your
    // versions of the function `return *pixel;`.
    return retval;
}

The above is a common way to extract a trivially copyable object from an array of bytes. It is useful if your example is just an example and you really need to handle different pixel types, perhaps by turning the pixel type into a template parameter. However, if generality is not needed, there is another approach available in your particular situation.

When the object being extracted can be constructed from bytes, the function goes down to three lines. A caveat is that the bytes used in the construction must be one of the special types, so that dereferencing after a reinterpret_cast is valid. A benefit of this approach is that padding is no longer a concern.

PixelRGB24 GetPixel(unsigned int x, unsigned int y, ImageRGB24 *image)
{
    auto *pixel = image->pixels + y*image->pitch + x*3;
    // `std::byte cannot initialize `uint8_t`, but `unsigned char` can.
    auto *components = reinterpret_cast<unsigned char*>(pixel);
    // Note that the "magic number" 3 in the first line is explained by the
    // three values used in the line below.
    return {*components, *(components+1), *(components+2)};
}

Another benefit of this approach is that it can be used to adapt image data to pixel data when the two do not agree on the order of the components, such as converting BGR image data to RGB pixel data (reverse the components in the return statement).

Advanced:

I think this goes beyond the scope of the question, but I'll point out that it is possible to combine the above techniques to create a function template that will get a pixel regardless of the order of the components and regardless of the size of the components. An integer sequence could be used to specify the order of the components, while memcpy could be used to extract components that are larger than a byte from the image's data. How far you take this depends on your needs (flexibility vs. over-engineering) and (if significant) on how much your compiler can optimize. Demo