I am writing a software which does a lot of image operations / compositing on a lot of (potential big) images.
Multi threading helps a lot speed wise, but QT does not allow using multiple QPainter on the same image at the same time.
So i would have to do my image operation / compositing in each thread on a copy and then blit it back, which reduces the performance by a lot (depending on the use case obviously).
So i came up with an idea which seem to work, but feels extremely hacky.
I get the target image data (QImage::bits
) pointer, and provide this to the worker thread.
In the worker thread i recreate a new QImage from the provided pointer. This means, no copy, no blitting. It seem to work fine, as long as i make sure each pixel/tile is only worked on in one thread and i don't detach the target image.
My question is: Is this safe and are there any other issues that could arise from this approach ?
Example code
QImage source = ...;
QImage target = ...;
QPainter::CompositionMode compositionMode = QPainter::CompositionMode_SourceOver;
// calculate tiles
QList<QRect> tiles;
for(int y = rect.top(); y < rect.top() + rect.height(); y += tileSize){
for(int x = rect.left(); x < rect.left() + rect.width(); x += tileSize){
QRect tile(
x, y,
x + tileSize > rect.left() + rect.width() ? rect.left() + rect.width() - x : tileSize,
y + tileSize > rect.top() + rect.height() ? rect.top() + rect.height() - y : tileSize
);
tiles.append(tile);
}
}
// Get target pixel pointer and do threaded operation on each tile
uchar *targetPix = target.bits();
auto target_size = target.size();
auto targetFormat = target.format();
QList<int> lol = QtConcurrent::blockingMapped(tiles, [&target_size, &targetFormat, &source, targetPix, &compositionMode](const QRect &r){
QImage tile_target(targetPix, target_size.width(), target_size.height(), targetFormat);
QPainter p(&tile_target);
p.setCompositionMode(compositionMode);
// do you image operations here. For now we just do a simple draw
p.drawImage(r.topLeft(), source, r);
return 1; // In reallity this would return sensible data ;)
});
(This example increased the speed in my test by about 4.6 times btw. Depends on the operation and system of course.)
This is indeed tricky (but that's often needed when you want bleeding edge performance), but it should be (possible to make it) thread safe (for certain operations). Off course this depends on the operations you perform on the tile_target
.
It's up to you that you don't even access the bits outside the assigned tile (i.e. the portion of tile_target
outside the rect r
).
As tile_target
refers the whole image, it's up to you to ensure you don't access bits outside this target tile. Some problematic cases:
Possible solution?: One option to allow accessing and/or writing to neighboring bits of a tile, is to divide your image in stripes and process your image in two steps:
This procedures allows you to modify half of the neighboring stripes (useful for anti-aliasing) or (if nobody writes) to access all the bits of the next and/or previous stripe (useful for filtering purposes).
This shouldn't decrease the efficiency significantly, if you create enough stripes to keep all your cpu(s) busy (i.e. typically twice the number of threads your cpu(s) support).
That shouldn't be an issue with your current implementation. QImage::bits
already detaches (if needed) the image (target
) from any other possibly existing copy. As you perform the concurrent operation by blocking the calling thread. The original image (target
) will exist at least as long as the tile_target
images exist.
Use a library that is dedicated to multi-threaded image processing, or at least allow referencing a subimage.
Pass a copy of the tile (see QImage::copy
) to each thread and write the result back in the original image (either using mutexes or by performing this operation in the calling thread). Depending on the calculation insensitivity of the image operation, this extra copy may or may not be negligible. For the OP, this doesn't seem a viable option.
Note that those safer approaches may generate (slightly) different results from the single threaded result in case of anti-aliasing or filtering. Those artifacts may be minimized by taking the tiles as large as possible (i.e. create not more tiles than the number of threads your cpu supports)
Image processing is often much faster when using your GPU, especially for things like filtering. But that's not something QImage
supports out of the box.