performance bitmap system.drawing colormap system.drawing.imaging

C#: Fastest way to replace a bunch of colors in a high-quality image?


I have some 4K images (.png, 3840×2160 pixels), and in each of them I want to replace around 2500 colors.

I have a Color[] array with the 2500 current colors and another one with the 2500 new colors, which are different every time.

Currently I have a System.Drawing.Imaging.ColorMap array into which I push 2500 ColorMap objects, with the OldColor and NewColor properties of each one set to the corresponding values from my two color arrays.

Then I can do the following:

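[...]
// Requires: using System.Drawing; using System.Drawing.Imaging;
using (ImageAttributes imageAttributes = new ImageAttributes())
using (Graphics graphics = Graphics.FromImage(myImage))
{
    // Remap every OldColor in colorMapArray to its NewColor while drawing.
    imageAttributes.SetRemapTable(colorMapArray);
    Rectangle rectangle = new Rectangle(0, 0, myImage.Width, myImage.Height);
    // Draw the image onto itself so the remapped pixels overwrite the originals.
    graphics.DrawImage(myImage, rectangle, 0, 0, myImage.Width, myImage.Height, GraphicsUnit.Pixel, imageAttributes);
}
myImage.Save("myImageWithReplacedColors.png");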

This technically works and replaces all of my colors with their new values, but it's very slow (it can take several seconds).

What options do I have to speed up this process? Or are there other ways to get the same result? I've been looking for a few days but haven't really found anything.

If it helps, the 4K image and the array of current colors to be replaced are always the same, so I could store them in a preprocessed form, for example as a byte array(?)


Solution

  • You can first extract the pixels of the image based on this post. Then, you can convert each Color object to a simple 32-bit integer using ((int)color.R << 16) | ((int)color.G << 8) | color.B (note that this ignores the alpha channel). Next, build a Dictionary<int, int> that maps each old color value to its new value. For each pixel of the image, convert its bytes to an int, look that value up in the dictionary, and convert the resulting integer back to bytes. This process can be parallelized using multiple threads because Dictionary is safe for concurrent reads (as long as no thread modifies it).

    Alternatively, you can use a basic int[256*256*256] lookup table instead of the Dictionary<int, int> class. The table can be much faster if it fits in the CPU cache, but it takes 64 MiB (16 Mi entries of 4 bytes each) and most modern processors do not have an L3 cache that big yet. Even so, it can still be faster because only a few cache lines (up to 2500) are loaded into the cache, and they are probably well distributed in memory (see cache associativity).

    A modern mainstream processor (e.g. an x86-64 one) should be able to do that in a fraction of a second: typically between 0.01 and 0.3 seconds on a mainstream PC.
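
The two approaches above could be sketched roughly as follows. This is a minimal sketch, not a tuned implementation. The class and method names (ColorRemapper, Pack, BuildMap, BuildTable, Remap) are made up for illustration. Reading the pixels via LockBits and Marshal.Copy is only one possible way to extract them, assumed here because the linked post is not reproduced. Keeping each pixel's alpha byte unchanged is also an assumption, since the packing formula covers only R, G and B.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Hypothetical helper class; all names here are illustrative.
public static class ColorRemapper
{
    const int AlphaMask = unchecked((int)0xFF000000);
    const int RgbMask = 0x00FFFFFF;

    // Pack a color as 0xRRGGBB, ignoring alpha (same formula as in the answer).
    public static int Pack(Color c) => (c.R << 16) | (c.G << 8) | c.B;

    // Build the map once; afterwards it is only read, so threads can share it.
    public static Dictionary<int, int> BuildMap(Color[] oldColors, Color[] newColors)
    {
        var map = new Dictionary<int, int>(oldColors.Length);
        for (int i = 0; i < oldColors.Length; i++)
            map[Pack(oldColors[i])] = Pack(newColors[i]);
        return map;
    }

    // Alternative: a 2^24-entry table (64 MiB) that starts as the identity mapping.
    public static int[] BuildTable(Color[] oldColors, Color[] newColors)
    {
        var table = new int[1 << 24];
        for (int i = 0; i < table.Length; i++) table[i] = i;
        for (int i = 0; i < oldColors.Length; i++)
            table[Pack(oldColors[i])] = Pack(newColors[i]);
        return table;
    }

    // Remap 0xAARRGGBB pixels in place with the dictionary, preserving alpha.
    public static void Remap(int[] pixels, Dictionary<int, int> map)
    {
        if (pixels.Length == 0) return;
        // Each thread processes a contiguous range of pixels.
        Parallel.ForEach(Partitioner.Create(0, pixels.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                int argb = pixels[i];
                if (map.TryGetValue(argb & RgbMask, out int rgb))
                    pixels[i] = (argb & AlphaMask) | rgb;
            }
        });
    }

    // Same loop with the lookup table: one array read per pixel, no hashing.
    public static void Remap(int[] pixels, int[] table)
    {
        if (pixels.Length == 0) return;
        Parallel.ForEach(Partitioner.Create(0, pixels.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                pixels[i] = (pixels[i] & AlphaMask) | table[pixels[i] & RgbMask];
        });
    }

    // Copy the bitmap's pixels into an int[], remap them, and copy them back.
    public static void Remap(Bitmap bitmap, Dictionary<int, int> map)
    {
        int width = bitmap.Width, height = bitmap.Height;
        var pixels = new int[width * height];
        var rect = new Rectangle(0, 0, width, height);
        BitmapData data = bitmap.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
        try
        {
            // Copy row by row so that any stride (even a negative one) is handled.
            for (int y = 0; y < height; y++)
                Marshal.Copy(data.Scan0 + y * data.Stride, pixels, y * width, width);
            Remap(pixels, map);
            for (int y = 0; y < height; y++)
                Marshal.Copy(pixels, y * width, data.Scan0 + y * data.Stride, width);
        }
        finally
        {
            bitmap.UnlockBits(data);
        }
    }
}
```

With the arrays from the question, usage would look something like ColorRemapper.Remap(myBitmap, ColorRemapper.BuildMap(oldColors, newColors)) followed by the usual Save call. Since the image and the old colors never change, the extracted int[] pixel buffer could also be cached between runs, so that only the map and the remapping loop have to be redone.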