c++ loops c++11 optimization sfml

How to optimise large loops for debug mode


I have implemented a pixel mask class used for pixel-perfect collision checking. I am using SFML, so the implementation is fairly straightforward:

Loop through each pixel of the image and decide whether it's true or false based on its transparency value. Here is the code I have used:

    // Create an Image from the given texture
    sf::Image image(texture.copyToImage());

    // measure the time this function takes
    sf::Clock clock;
    sf::Time time = sf::Time::Zero;
    clock.restart();

    // Reserve memory for the pixelMask vector to avoid repeating allocation
    pixelMask.reserve(image.getSize().x);

    // Loop through every pixel of the texture
    for (unsigned int i = 0; i < image.getSize().x; i++)
    {
        // Create the mask for one line
        std::vector<bool> tempMask;
        // Reserve memory for tempMask to avoid repeated allocation
        tempMask.reserve(image.getSize().y);

        for (unsigned int j = 0; j < image.getSize().y; j++)
        {
            // If the pixel is not transparent
            if (image.getPixel(i, j).a > 0)
                // Some part of the texture is there --> push back true
                tempMask.push_back(true);
            else
                // The user can't see this part of the texture --> push back false
                tempMask.push_back(false);
        }
        pixelMask.push_back(tempMask);
    }

    time = clock.restart();
    std::cout << std::endl << "The creation of the pixel mask took: " << time.asMicroseconds() << " microseconds (" << time.asSeconds() << ")";

I have used an instance of sf::Clock to measure the time.

My problem is that this function takes ages (e.g. 15 seconds) for larger images (e.g. 1280x720). Interestingly, this only happens in debug mode. When compiling the release version, the same texture/image takes 0.1 seconds or less.

I have tried to reduce memory allocations by using the resize() method, but it didn't change much. I know that looping through almost 1 million pixels is slow, but it should not be 15 seconds slow, should it?

Since I want to test my code in debug mode (for obvious reasons) and I don't want to wait 5 min till all the pixel masks have been created, what I am looking for is basically a way to:

Thanks for your help!


Solution

  • Optimizing For Debug

    Optimizing for debug builds is generally a very counter-productive idea. It could even have you optimize for debug in a way that not only makes maintaining code more difficult, but may even slow down release builds. Debug builds in general are going to be much slower to run. Even with the flattest kind of C code I write which doesn't pose much for an optimizer to do beyond reasonable register allocation and instruction selection, it's normal for the debug build to take 20 times longer to finish an operation. That's just something to accept rather than change too much.

    That said, I can understand the temptation to do so at times. Sometimes you want to debug a certain part of the code, only for the other operations in the software to take ages, so you wait a long time before you even reach the code you want to trace through. In those cases I find it helpful, if you can, to separate debug mode input sizes from release mode (e.g. having the debug mode only work with an input that is 1/10th of the original size). The downside is that this introduces discrepancies between release and debug, but the productivity gains sometimes outweigh that. Another strategy is to build parts of your code in release and just debug the parts you're interested in, like building a plugin in debug against a host application in release.
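
One way to separate debug and release input sizes is to key the workload off the standard NDEBUG macro. This is only a minimal sketch: the names debugDivisor and workloadSize are made up for illustration, and it assumes your release configuration defines NDEBUG (most do, but that depends on your build setup).

```cpp
#include <cstddef>

// Hypothetical divisor: how much smaller the debug workload should be.
// Assumption: release builds define NDEBUG, debug builds do not.
constexpr std::size_t debugDivisor =
#ifdef NDEBUG
    1;   // release: use the full-size input
#else
    10;  // debug: use roughly 1/10th of the input
#endif

// Hypothetical helper: scales a size down for debug builds, rounding up
// so that a non-empty input never shrinks to zero.
constexpr std::size_t workloadSize(std::size_t fullSize)
{
    return fullSize == 0 ? 0 : (fullSize + debugDivisor - 1) / debugDivisor;
}
```

A loader could then, for example, build masks only for the first workloadSize(textureCount) textures while debugging unrelated code.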

    Approach at Your Own Peril

    With that aside, if you really want to make your debug builds run faster and accept all the associated risks, then the main way is to give your compiler less work to optimize away. That typically means flatter code with more plain old data types, fewer function calls, and so forth.

    First and foremost, you might be spending a lot of time on debug mode assertions for safety. See things like checked iterators and how to disable them: https://msdn.microsoft.com/en-us/library/aa985965.aspx
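
As a rough sketch of what disabling those checks can look like, assuming an MSVC toolchain: the _ITERATOR_DEBUG_LEVEL macro controls the standard library's iterator checking there. The countSet function is a made-up example of a hot loop, not something from the question.

```cpp
// Assumption: MSVC toolchain. The macro must be defined before the first
// standard library header is included, and with the same value in every
// translation unit and library that gets linked together.
#if defined(_MSC_VER) && !defined(_ITERATOR_DEBUG_LEVEL)
#define _ITERATOR_DEBUG_LEVEL 0 // 0 = no iterator checks, 2 = full debug checks
#endif

#include <cstddef>
#include <vector>

// Hypothetical hot loop that benefits from unchecked element access.
std::size_t countSet(const std::vector<char>& mask)
{
    std::size_t count = 0;
    for (std::size_t j = 0; j < mask.size(); ++j)
        count += mask[j] != 0;
    return count;
}
```

In practice the macro is usually set project-wide through the compiler's preprocessor definitions rather than in a source file. Libraries built with a different value (prebuilt debug binaries of SFML, for instance) will most likely fail to link against your code, so treat this as something to try carefully rather than a drop-in fix.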

    For your case, you can easily flatten your nested loop into a single loop. There's no need to create these pixel masks with separate containers per scanline, since you can always get at your scanline data with some basic arithmetic (y*image_width or y*image_stride). So initially I'd flatten the loop. That might even help modestly for release mode. I don't know the SFML API so I'll illustrate with pseudocode.

    const int num_pixels = image.w * image.h;
    vector<bool> pixelMask(num_pixels);
    for (int j=0; j < num_pixels; ++j)
        pixelMask[j] = image.pixelAlpha(j) > 0;
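
To make the flattening idea concrete without depending on SFML, here is a minimal runnable sketch. AlphaImage is a hypothetical stand-in type (not part of SFML) that holds one alpha byte per pixel in row-major order, and buildMask and isSolid are illustrative names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image: alpha values only, stored row by row.
struct AlphaImage
{
    std::size_t width;
    std::size_t height;
    std::vector<unsigned char> alpha; // width * height entries
};

// Builds one flat mask for the whole image with a single loop.
std::vector<bool> buildMask(const AlphaImage& image)
{
    const std::size_t numPixels = image.width * image.height;
    std::vector<bool> mask(numPixels);
    for (std::size_t j = 0; j < numPixels; ++j)
        mask[j] = image.alpha[j] > 0;
    return mask;
}

// Row-major lookup: pixel (x, y) lives at index y * width + x.
bool isSolid(const std::vector<bool>& mask, std::size_t width,
             std::size_t x, std::size_t y)
{
    return mask[y * width + x];
}
```

The collision check then only needs the mask and the image width, so there is no per-scanline container to allocate or copy.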
    

    Just that already might help a lot. Hopefully SFML lets you access pixels with a single index without having to specify column and row (x and y). If you want to go even further, it might help to grab the pointer to the array of pixels from SFML (also hopefully possible) and use that:

    const int num_pixels = image.w * image.h;
    vector<bool> pixelMask(num_pixels);
    
    // Placeholder accessor: assumes the pixels are exposed as packed 32-bit values.
    const unsigned int* pixels = image.getPixels();
    for (int j=0; j < num_pixels; ++j)
    {
        // Assuming 32-bit pixels with alpha in the top byte
        // (should probably use uint32_t).
        // Note that no right shift is necessary when you just want 
        // to check for non-zero values.
        const unsigned int alpha = pixels[j] & 0xff000000;
        pixelMask[j] = alpha > 0;
    }
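
For reference, SFML 2.x does expose the raw buffer: sf::Image::getPixelsPtr() returns a read-only pointer to 8-bit RGBA components, four bytes per pixel. The sketch below takes a plain byte pointer so it stays independent of SFML. maskFromRgba is an illustrative name, and the layout (alpha as the fourth byte of each pixel) is the assumption it relies on. Reading the alpha byte directly also sidesteps the byte-order assumption behind the 0xff000000 mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a flat mask from a tightly packed RGBA8 buffer.
// Assumption: 4 bytes per pixel in R, G, B, A order, no row padding.
std::vector<bool> maskFromRgba(const std::uint8_t* rgba,
                               std::size_t width, std::size_t height)
{
    const std::size_t numPixels = width * height;
    std::vector<bool> mask(numPixels);
    for (std::size_t j = 0; j < numPixels; ++j)
        mask[j] = rgba[4 * j + 3] > 0; // byte 3 of each pixel is alpha
    return mask;
}
```

With SFML 2.x the call would presumably look like maskFromRgba(image.getPixelsPtr(), image.getSize().x, image.getSize().y), after checking that the pointer is not null (which is what an empty image returns).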
    

    Also, vector<bool> typically stores each boolean as a single bit. That saves memory but costs a few extra instructions per random access. Sometimes you can get a speedup even in release by just using more memory. I'd test both release and debug and time carefully, but you can try this:

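    const int num_pixels = image.w * image.h;
    vector<char> pixelMask(num_pixels);
    
    // Placeholder accessor, as above.
    const unsigned int* pixels = image.getPixels();
    // Write through a raw pointer to skip the vector's checked operator[] in debug.
    char* pixelUsed = &pixelMask[0];
    for (int j=0; j < num_pixels; ++j)
    {
        const unsigned int alpha = pixels[j] & 0xff000000;
        pixelUsed[j] = alpha > 0;
    }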
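
To compare the two containers in both build configurations, a small timing helper along these lines may be enough. It is a sketch only: timeFill is a made-up name, it uses std::chrono instead of sf::Clock to stay self-contained, and a single run like this is a coarse measurement rather than a proper benchmark.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills `mask` from an alpha buffer and returns the elapsed microseconds.
// `mask` must already have alpha.size() elements; it can be
// std::vector<bool> or std::vector<char>.
template <typename Mask>
long long timeFill(Mask& mask, const std::vector<std::uint8_t>& alpha)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t j = 0; j < alpha.size(); ++j)
        mask[j] = alpha[j] > 0;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling it once with a std::vector<bool> and once with a std::vector<char> of the same size, in both debug and release, should show whether the extra memory buys anything on your machine.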