I've been looking for a fast algorithm that converts a 24-bit RGB bitmap to a 16-bit (RGB565) bitmap using dithering. I want something in C/C++ where I can actually control how the dithering is applied. GDI+ seems to provide some conversion methods, but I can't tell whether they dither, and if they do, what mechanism they use (Floyd-Steinberg?).
Does anyone have a good example of bitmap color-depth conversion with dithering?
As you mentioned, Floyd-Steinberg dithering is popular because it's simple and fast. For the subtle differences between 24-bit and 16-bit color, the results will be nearly optimal visually.
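For reference, Floyd-Steinberg pushes each pixel's quantization error onto the four neighbors that haven't been processed yet, using fixed weights (here * is the pixel that was just quantized, and the scan runs left to right, top to bottom):

            *    7/16
    3/16  5/16   1/16

All sixteen sixteenths of the error get redistributed, so the overall brightness of the image is preserved; that's what the division by 16 and the 7/3/5/1 factors in the code below implement.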
It was suggested that I use the sample picture Lena, but I decided against it; despite its long history as a test image, I consider it too sexist for modern sensibilities. Instead I present a picture of my own. First up is the original, followed by the conversion to dithered RGB565 (and converted back to 24-bit for display).
And the code, in C++:
#include <vector>

// BYTE is the Windows typedef for unsigned char; if you aren't including
// <windows.h>, "typedef unsigned char BYTE;" works just as well.

inline BYTE Clamp(int n)
{
    n = n > 255 ? 255 : n;
    return n < 0 ? 0 : n;
}

// Accumulated error for one pixel position, one component per channel.
struct RGBTriplet
{
    int r;
    int g;
    int b;
    RGBTriplet(int _r = 0, int _g = 0, int _b = 0) : r(_r), g(_g), b(_b) {}
};

void RGB565Dithered(const BYTE * pIn, int width, int height, int strideIn, BYTE * pOut, int strideOut)
{
    // Errors diffused onto the row below; the buffers are two entries wider
    // than the image so the writes at x, x+1 and x+2 (and the read at x+1)
    // never need bounds checks.
    std::vector<RGBTriplet> oldErrors(width + 2);
    for (int y = 0; y < height; ++y)
    {
        std::vector<RGBTriplet> newErrors(width + 2);
        RGBTriplet errorAhead;  // error pushed onto the next pixel in this row (weight 7)
        for (int x = 0; x < width; ++x)
        {
            // Add the diffused error to the source pixel. The stored errors are
            // pre-multiplied by their Floyd-Steinberg weights, so a single
            // division by 16 happens here.
            int b = (int)(unsigned int)pIn[3*x] + (errorAhead.b + oldErrors[x+1].b) / 16;
            int g = (int)(unsigned int)pIn[3*x + 1] + (errorAhead.g + oldErrors[x+1].g) / 16;
            int r = (int)(unsigned int)pIn[3*x + 2] + (errorAhead.r + oldErrors[x+1].r) / 16;
            // Quantize to 5/6/5 bits per channel.
            int bAfter = Clamp(b) >> 3;
            int gAfter = Clamp(g) >> 2;
            int rAfter = Clamp(r) >> 3;
            // Pack and store as a little-endian RGB565 pixel.
            int pixel16 = (rAfter << 11) | (gAfter << 5) | bAfter;
            pOut[2*x] = (BYTE) pixel16;
            pOut[2*x + 1] = (BYTE) (pixel16 >> 8);
            // Compute the quantization error against the reconstructed 8-bit value
            // and distribute it with the Floyd-Steinberg weights:
            // 7 (right), 3 (below left), 5 (below), 1 (below right).
            int error = r - ((rAfter * 255) / 31);
            errorAhead.r = error * 7;
            newErrors[x].r += error * 3;
            newErrors[x+1].r += error * 5;
            newErrors[x+2].r = error * 1;
            error = g - ((gAfter * 255) / 63);
            errorAhead.g = error * 7;
            newErrors[x].g += error * 3;
            newErrors[x+1].g += error * 5;
            newErrors[x+2].g = error * 1;
            error = b - ((bAfter * 255) / 31);
            errorAhead.b = error * 7;
            newErrors[x].b += error * 3;
            newErrors[x+1].b += error * 5;
            newErrors[x+2].b = error * 1;
        }
        pIn += strideIn;
        pOut += strideOut;
        oldErrors.swap(newErrors);
    }
}
I won't guarantee this code is perfect; I already had to fix one of those subtle errors that I alluded to in another comment. However, it did generate the results above. It takes 24-bit pixels in BGR order, as used by Windows, and produces R5G6B5 16-bit pixels in little-endian order.
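If it helps, here's a minimal sketch of how the function might be called. The ConvertExample wrapper and the tightly packed buffers are just for illustration; a real Windows DIB pads each row to a multiple of 4 bytes, so in practice the strides would come from the bitmap itself.

#include <vector>

// Hypothetical wrapper, just to show the calling convention: tightly packed
// BGR24 in, tightly packed little-endian RGB565 out.
void ConvertExample(const BYTE * bgr24, int width, int height, std::vector<BYTE> & out565)
{
    int strideIn  = width * 3;   // bytes per source row (no row padding assumed)
    int strideOut = width * 2;   // bytes per destination row
    out565.resize(strideOut * height);
    RGB565Dithered(bgr24, width, height, strideIn, out565.data(), strideOut);
}

Adjust the strides if your rows are padded or if you're converting in place from a larger buffer.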