cudanvidianpp

nppiFilter breaks output image


I wrote an example of BoxFilter using NPP, but the output image looks broken. This is my code:

#include <stdio.h>
#include <string.h>

#include <ImagesCPU.h>
#include <ImagesNPP.h>
#include <Exceptions.h>

#include <npp.h>
#include "utils.h"


void boxfilter1_transform( Npp8u *data, int width, int height ){
    size_t size = width * height * 4;

    // declare a host image object for an 8-bit RGBA image
    npp::ImageCPU_8u_C4 oHostSrc(width, height);

    Npp8u *nDstData = oHostSrc.data();
    memcpy(nDstData, data, size * sizeof(Npp8u));

    // declare a device image and copy construct from the host image,
    // i.e. upload host to device
    npp::ImageNPP_8u_C4 oDeviceSrc(oHostSrc);

    // create struct with box-filter mask size
    NppiSize oMaskSize = {3, 3};

    // Allocate memory for pKernel
    Npp32s hostKernel[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
    Npp32s *pKernel;

    checkCudaErrors( cudaMalloc((void**)&pKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s)) );
    checkCudaErrors( cudaMemcpy(pKernel, hostKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s),
                                cudaMemcpyHostToDevice) );

    Npp32s nDivisor = 9;

    // create struct with ROI size given the current mask
    NppiSize oSizeROI = {oDeviceSrc.width() - oMaskSize.width + 1, oDeviceSrc.height() - oMaskSize.height + 1};
    // allocate device image of appropriatedly reduced size
    npp::ImageNPP_8u_C4 oDeviceDst(oSizeROI.width, oSizeROI.height);
    // set anchor point inside the mask
    NppiPoint oAnchor = {2, 2};

    // run box filter
    NppStatus eStatusNPP;
    eStatusNPP = nppiFilter_8u_C4R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                   oDeviceDst.data(), oDeviceDst.pitch(),
                                   oSizeROI, pKernel, oMaskSize, oAnchor, nDivisor);
    //printf("NppiFilter error status %d\n", eStatusNPP);
    NPP_DEBUG_ASSERT(NPP_NO_ERROR == eStatusNPP);

    // declare a host image for the result
    npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
    // and copy the device result data into it
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
    memcpy(data, oHostDst.data(), size * sizeof(Npp8u));

    return;
}

Most part of code was copied from example boxFilterNPP.cpp. And the output image: http://img153.imageshack.us/img153/7716/o8z.png

Why it can be?


Solution

  • You have a striding problem. Change this line:

    npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
    

    To this:

    npp::ImageCPU_8u_C4 oHostDst(oDeviceSrc.size());
    

    What is happening?

    Let's assume your input image is 600x450.

    The solution is to create oHostDst at 600x450 and let the .copyTo operation handle the difference in line sizes and pitches.

    The original code didn't have this problem because there were no unpitched copies anywhere in that code (e.g. no use of raw memcpy). As long as you handle the source and destination pitch and width explicitly at every copy step, it does not matter whether you create the final image as 600x450 or 596x446. But your final memcpy operation was not handling pitch and width explicitly, instead it implicitly assumed both source and destination were of the same size, and this was not the case.