I wrote an example of BoxFilter using NPP, but the output image looks broken. This is my code:
#include <stdio.h>
#include <string.h>
#include <ImagesCPU.h>
#include <ImagesNPP.h>
#include <Exceptions.h>
#include <npp.h>
#include "utils.h"
void boxfilter1_transform( Npp8u *data, int width, int height ){
size_t size = width * height * 4;
// declare a host image object for an 8-bit RGBA image
npp::ImageCPU_8u_C4 oHostSrc(width, height);
Npp8u *nDstData = oHostSrc.data();
memcpy(nDstData, data, size * sizeof(Npp8u));
// declare a device image and copy construct from the host image,
// i.e. upload host to device
npp::ImageNPP_8u_C4 oDeviceSrc(oHostSrc);
// create struct with box-filter mask size
NppiSize oMaskSize = {3, 3};
// Allocate memory for pKernel
Npp32s hostKernel[9] = {1, 1, 1, 1, 1, 1, 1, 1, 1};
Npp32s *pKernel;
checkCudaErrors( cudaMalloc((void**)&pKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s)) );
checkCudaErrors( cudaMemcpy(pKernel, hostKernel, oMaskSize.width * oMaskSize.height * sizeof(Npp32s),
cudaMemcpyHostToDevice) );
Npp32s nDivisor = 9;
// create struct with ROI size given the current mask
NppiSize oSizeROI = {oDeviceSrc.width() - oMaskSize.width + 1, oDeviceSrc.height() - oMaskSize.height + 1};
// allocate device image of appropriately reduced size
npp::ImageNPP_8u_C4 oDeviceDst(oSizeROI.width, oSizeROI.height);
// set anchor point inside the mask
NppiPoint oAnchor = {2, 2};
// run box filter
NppStatus eStatusNPP;
eStatusNPP = nppiFilter_8u_C4R(oDeviceSrc.data(), oDeviceSrc.pitch(),
oDeviceDst.data(), oDeviceDst.pitch(),
oSizeROI, pKernel, oMaskSize, oAnchor, nDivisor);
//printf("NppiFilter error status %d\n", eStatusNPP);
NPP_DEBUG_ASSERT(NPP_NO_ERROR == eStatusNPP);
// declare a host image for the result
npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
// and copy the device result data into it
oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
memcpy(data, oHostDst.data(), size * sizeof(Npp8u));
// release the device-side kernel buffer allocated above
checkCudaErrors( cudaFree(pKernel) );
return;
}
Most of the code was copied from the boxFilterNPP.cpp sample. Here is the output image: http://img153.imageshack.us/img153/7716/o8z.png
Why could this be?
You have a striding problem. Change this line:
npp::ImageCPU_8u_C4 oHostDst(oDeviceDst.size());
To this:
npp::ImageCPU_8u_C4 oHostDst(oDeviceSrc.size());
What is happening?
Let's assume your input image is 600x450.
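The sizes in the steps below come from two pieces of arithmetic: the ROI formula in your code (source size minus mask size plus one) and the row length of a tightly packed 4-byte RGBA host image. A small plain C++ sketch of both, assuming unpitched host rows as described for oHostSrc; Size2D, packedPitchC4 and validRoi are hypothetical names for illustration, not NPP API:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration; not an NPP type.
struct Size2D {
    int width;
    int height;
};

// Row length in bytes of a tightly packed (unpitched) 8-bit RGBA image.
std::size_t packedPitchC4(int width) {
    return static_cast<std::size_t>(width) * 4;
}

// Same formula as oSizeROI in the question: the mask must fit entirely inside
// the source, so each dimension shrinks by (mask size - 1).
Size2D validRoi(Size2D src, Size2D mask) {
    assert(mask.width <= src.width && mask.height <= src.height);
    return Size2D{src.width - mask.width + 1, src.height - mask.height + 1};
}
```

With the 3x3 mask from your code, validRoi({600, 450}, {3, 3}) gives 598x448; 596x446 is what a 5x5 mask would give.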
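1. oHostSrc is 600x450, and its pitch is 600 x 4 = 2400 bytes.
2. The memcpy from data to oHostSrc is OK, because both have the same width and pitch.
3. oDeviceSrc picks up its size from oHostSrc (600x450).
4. oDeviceDst is slightly smaller than oDeviceSrc, because it only picks up the size of the ROI. With your 3x3 mask that is (600 - 3 + 1) x (450 - 3 + 1) = 598x448.
5. You create oHostDst with the same size as oDeviceDst, so 598x448.
6. The .copyTo operation copies the (pitched) 598x448 oDeviceDst image into the (unpitched) oHostDst, also 598x448. This step is still fine.
7. The final memcpy breaks the image, because it copies a 598x448 oHostDst image into a 600x450 data region. Each 598-pixel row of oHostDst is laid into the 600-pixel rows of data, so every row ends up shifted 2 pixels further left than the one before it. And since memcpy copies 600 x 450 x 4 bytes, it also reads past the end of oHostDst.

The solution is to create oHostDst at 600x450 (the size of oDeviceSrc) and let the .copyTo operation handle the difference in row widths and pitches.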
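To see why a flat copy shears the image, it helps to compute where each pixel lands. The following plain C++ sketch (no NPP involved) models the final memcpy from the question. flatCopyLanding and flatCopy are hypothetical helpers, and flatCopy clamps its length to the source size so the sketch stays well-defined, whereas the real call reads out of bounds:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

constexpr std::size_t kBpp = 4;  // bytes per 8-bit RGBA pixel

// Where pixel (x, y) of a tightly packed image that is srcW pixels wide ends up
// when its bytes are copied flat into a packed image that is dstW pixels wide.
std::pair<std::size_t, std::size_t> flatCopyLanding(std::size_t x, std::size_t y,
                                                    std::size_t srcW, std::size_t dstW) {
    assert(x < srcW);
    const std::size_t flat = y * srcW + x;  // pixel index in the source byte stream
    return {flat % dstW, flat / dstW};      // (column, row) in the destination
}

// Models memcpy(data, oHostDst.data(), size): a flat byte copy that ignores
// row widths. Length clamped to the source size to avoid an out-of-bounds read.
void flatCopy(std::vector<unsigned char>& dst, const std::vector<unsigned char>& src) {
    std::memcpy(dst.data(), src.data(), std::min(dst.size(), src.size()));
}
```

For example, flatCopyLanding(0, 1, 598, 600) returns (598, 0): the first pixel of result row 1 lands 2 pixels before the start of row 1 in data, and row y is displaced by 2y pixels, which is what produces the progressive skew.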
The boxFilterNPP.cpp sample didn't have this problem because it contains no unpitched copies anywhere (e.g. no raw memcpy). As long as you handle the source and destination pitch and width explicitly at every copy step, it does not matter whether you create the final host image as 600x450 or 598x448. Your final memcpy did not handle pitch and width explicitly; it implicitly assumed that source and destination had the same size, and that was not the case.
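Both options can be sketched in plain C++ to show that either works once every copy is pitch-aware. copy2D is a hypothetical host-side stand-in for a cudaMemcpy2D-style row-by-row copy, and roi stands in for the filtered result (in option A, the device image that copyTo reads; in option B, the ROI-sized oHostDst after copyTo). The function names are illustrative only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kBpp = 4;  // bytes per 8-bit RGBA pixel

// Row-by-row copy in the style of cudaMemcpy2D: widthBytes are copied per row,
// and source and destination each advance by their own pitch.
void copy2D(unsigned char* dst, std::size_t dstPitch,
            const unsigned char* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
}

// Option A (the suggested fix): a frame-sized host buffer is filled by a pitched
// copy; the final flat memcpy is then safe because sizes and pitches match.
void storeViaFrameSizedHost(std::vector<unsigned char>& data, std::size_t frameW,
                            std::size_t frameH, const std::vector<unsigned char>& roi,
                            std::size_t roiW, std::size_t roiH) {
    // Zeroed here; a real host allocation would hold unspecified bytes.
    std::vector<unsigned char> hostDst(frameW * frameH * kBpp, 0);
    copy2D(hostDst.data(), frameW * kBpp, roi.data(), roiW * kBpp, roiW * kBpp, roiH);
    assert(data.size() == hostDst.size());
    std::memcpy(data.data(), hostDst.data(), data.size());
}

// Option B: keep a ROI-sized host buffer (roi), but make the last copy into
// data pitch-aware instead of a flat memcpy.
void storeViaRoiSizedHost(std::vector<unsigned char>& data, std::size_t frameW,
                          const std::vector<unsigned char>& roi,
                          std::size_t roiW, std::size_t roiH) {
    copy2D(data.data(), frameW * kBpp, roi.data(), roiW * kBpp, roiW * kBpp, roiH);
}
```

Inside the 598x448 region both options produce identical pixels. They differ only at the border: copyTo can only write the 598x448 filtered region, so with option A the 2 rightmost columns and 2 bottom rows of data are overwritten with whatever oHostDst held there, while option B leaves the original input pixels in place.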