c++, c, bitmap, gdi+, dithering

What is a good, optimized C/C++ algorithm for converting a 24-bit bitmap to 16-bit with dithering?


I've been looking for an optimized (i.e., quick) algorithm that converts a 24-bit RGB bitmap to a 16-bit (RGB565) bitmap using dithering. I'm looking for something in C/C++ where I can actually control how the dithering is applied. GDI+ seems to provide some methods, but I can't tell whether they dither or not, and if they do, what mechanism they use (Floyd-Steinberg?).

Does anyone have a good example of bitmap color-depth conversion with dithering?
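
For context, the GDI+ route I've been looking at is Bitmap::ConvertFormat, which is a GDI+ 1.1 API (define GDIPVER as 0x0110 before including gdiplus.h; Windows Vista/7 or later). It takes a DitherType, but the documentation doesn't say which error-diffusion kernel DitherTypeErrorDiffusion actually uses. A rough sketch of that call, assuming the palette arguments are ignored for a non-indexed target like RGB565:

    #define GDIPVER 0x0110      // Bitmap::ConvertFormat exists only in GDI+ 1.1
    #include <windows.h>
    #include <gdiplus.h>
    #pragma comment(lib, "gdiplus.lib")

    // Ask GDI+ itself to convert a loaded image to RGB565 with error diffusion.
    void ConvertWithGdiplus(const wchar_t* path)
    {
        using namespace Gdiplus;

        GdiplusStartupInput startupInput;
        ULONG_PTR token = 0;
        if (GdiplusStartup(&token, &startupInput, NULL) != Ok)
            return;
        {
            Bitmap bmp(path);
            // The docs don't name the kernel behind DitherTypeErrorDiffusion,
            // and I'm assuming palettetype/palette are ignored for 16 bpp.
            bmp.ConvertFormat(PixelFormat16bppRGB565,
                              DitherTypeErrorDiffusion,
                              PaletteTypeCustom,
                              NULL,
                              0.0f);
            // ... inspect the converted pixels or save the bitmap here ...
        }
        GdiplusShutdown(token);
    }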


Solution

  • As you mentioned, the Floyd-Steinberg dithering method is popular because it's simple and fast. For the subtle differences between 24-bit and 16-bit color, the results will be visually near-optimal.
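
    For reference, Floyd-Steinberg spreads each pixel's quantization error over the not-yet-processed neighbours in sixteenths, using the same 7/3/5/1 weights that appear in the code below:

                 *     7/16
         3/16   5/16   1/16

    where * is the pixel just quantized and the second row is the scanline beneath it.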

    It was suggested that I use the sample picture Lena, but I decided against it; despite its long history as a test image, I consider it too sexist for modern sensibilities. Instead I present a picture of my own. First up is the original, followed by the conversion to dithered RGB565 (converted back to 24-bit for display).

    [Image: original]   [Image: Floyd-Steinberg dithered RGB565]

    And the code, in C++:

    #include <windows.h>   // for BYTE
    #include <vector>
    
    // Clamp an int into the 0..255 range of a byte.
    inline BYTE Clamp(int n)
    {
        n = n>255 ? 255 : n;
        return n<0 ? 0 : n;
    }
    
    // Per-channel signed error accumulator used by the diffusion loops.
    struct RGBTriplet
    {
        int r;
        int g;
        int b;
        RGBTriplet(int _r = 0, int _g = 0, int _b = 0) : r(_r), g(_g), b(_b) {}
    };
    
    void RGB565Dithered(const BYTE * pIn, int width, int height, int strideIn, BYTE * pOut, int strideOut)
    {
        // Errors diffused down from the previous row; padded by one entry on
        // each side so the x-1/x+1 neighbours never need bounds checks.
        std::vector<RGBTriplet> oldErrors(width + 2);
        for (int y = 0;  y < height;  ++y)
        {
            std::vector<RGBTriplet> newErrors(width + 2);
            RGBTriplet errorAhead;   // error carried to the next pixel in this row (the 7/16 share)
            for (int x = 0;  x < width;  ++x)
            {
                // Input is BGR; add the accumulated error (stored premultiplied
                // by its Floyd-Steinberg weight, hence the single /16 here).
                int b = (int)(unsigned int)pIn[3*x] + (errorAhead.b + oldErrors[x+1].b) / 16;
                int g = (int)(unsigned int)pIn[3*x + 1] + (errorAhead.g + oldErrors[x+1].g) / 16;
                int r = (int)(unsigned int)pIn[3*x + 2] + (errorAhead.r + oldErrors[x+1].r) / 16;
                // Quantize to 5/6/5 bits and pack as little-endian R5G6B5.
                int bAfter = Clamp(b) >> 3;
                int gAfter = Clamp(g) >> 2;
                int rAfter = Clamp(r) >> 3;
                int pixel16 = (rAfter << 11) | (gAfter << 5) | bAfter;
                pOut[2*x] = (BYTE) pixel16;
                pOut[2*x + 1] = (BYTE) (pixel16 >> 8);
                // Per-channel quantization error, spread to the neighbours with the
                // 7/3/5/1 weights (right, below-left, below, below-right).
                int error = r - ((rAfter * 255) / 31);
                errorAhead.r = error * 7;
                newErrors[x].r += error * 3;
                newErrors[x+1].r += error * 5;
                newErrors[x+2].r = error * 1;
                error = g - ((gAfter * 255) / 63);
                errorAhead.g = error * 7;
                newErrors[x].g += error * 3;
                newErrors[x+1].g += error * 5;
                newErrors[x+2].g = error * 1;
                error = b - ((bAfter * 255) / 31);
                errorAhead.b = error * 7;
                newErrors[x].b += error * 3;
                newErrors[x+1].b += error * 5;
                newErrors[x+2].b = error * 1;
            }
            // Advance to the next scanline and make this row's downward errors
            // the "old" errors for the row below.
            pIn += strideIn;
            pOut += strideOut;
            oldErrors.swap(newErrors);
        }
    }
    

    I won't guarantee this code is perfect; I already had to fix one of the subtle errors I alluded to in another comment. However, it did generate the results above. It takes 24-bit pixels in BGR order, as used by Windows, and produces R5G6B5 16-bit pixels in little-endian order.
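
    For completeness, here is a sketch of how a caller might feed it a 24-bpp buffer, reusing the declarations above. The stride math (rounding each row up to a 4-byte boundary, as Windows DIBs do) is an assumption about the caller's data, not something the routine requires:

    // Hypothetical caller: dithers a padded 24-bpp BGR buffer into a new RGB565 buffer.
    void ConvertImage(const BYTE * src24, int width, int height)
    {
        int strideIn  = ((width * 3) + 3) & ~3;   // 24-bpp rows padded to 4 bytes
        int strideOut = ((width * 2) + 3) & ~3;   // 16-bpp rows padded to 4 bytes
    
        std::vector<BYTE> dst16(strideOut * height);
        RGB565Dithered(src24, width, height, strideIn, dst16.data(), strideOut);
    
        // dst16 now holds little-endian R5G6B5 pixels, strideOut bytes per row.
    }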