Tags: opencv, image-processing, signal-processing, fftw, dct

Differences between OpenCV and FFTW 2D DCT


I am trying to figure out how to match the output from FFTW to OpenCV's 2D DCT. The output appears to be scaled by something related to the square root of the number of rows and columns. If the rows and columns are multiples of each other I can work out a rule to scale the output, but otherwise I cannot find a general rule.

For example, this matrix:

0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12

OpenCV output:

[41.56921938165306, -11.77350269189626, -9.906343049429816e-16, -1.154700538379252, 
2.654396620040385e-16, -0.2264973081037412;
 -15.78040416381215, 0, 0, 0, 0, 0;
 6.797823324068903e-16, 0, 0, 0, 0, 0;
 -1.649620627042291, 0, 0, 0, 0, 0;
 0, 0, 0, 0, 0, 0;
 -0.4921096019966774, 0, 0, 0, 0, 0;
 -1.64113972635833e-15, 0, 0, 0, 0, 0;
 -0.1241948195350329, 0, 0, 0, 0, 0]

FFTW output:

1152 -230.713 0 -22.6274 0 -4.43842
-309.232 0 0 0 0 0
0 0 0 0 0 0
-32.3258 0 0 0 0 0
0 0 0 0 0 0
-9.64334 0 0 0 0 0
0 0 0 0 0 0
-2.43371 0 0 0 0 0
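
For reference, the element-wise ratios between the two outputs above work out to (my arithmetic, rounded):

1152 / 41.5692 ≈ 27.71 ≈ 4*sqrt(8*6)
230.713 / 11.7735 ≈ 309.232 / 15.7804 ≈ 19.60 ≈ 2*sqrt(2)*sqrt(8*6)

so the ratio for the first row and column is different from the ratio for the top-left element, which is the part I cannot generalize.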

Does anyone know how to scale this output so that it always matches OpenCV's output, no matter the size of the matrix?

Example code showing how I do this:

#include <opencv2/core.hpp>
#include <fftw3.h>
#include <iostream>

// Unnormalized 2D DCT-II (FFTW's REDFT10 in both dimensions) of a rows x cols array.
void compute2Ddct(const double* input, double* output, int rows, int cols) {
    // FFTW_ESTIMATE planning does not overwrite the input array, so the const_cast is safe here.
    fftw_plan plan = fftw_plan_r2r_2d(rows, cols, const_cast<double*>(input), output,
                                      FFTW_REDFT10, FFTW_REDFT10, FFTW_ESTIMATE);
    fftw_execute(plan);
    fftw_destroy_plan(plan);
}

int main() {
    const int rows = 8;
    const int cols = 6;

    double image[rows * cols];

    // Test matrix: element (i, j) = i + j.
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            image[i * cols + j] = i + j;
        }
    }

    // OpenCV DCT. The Mat constructor only wraps the existing buffer,
    // so clone it first to keep cv::dct from overwriting `image`.
    cv::Mat opencv_dct(rows, cols, CV_64F, image);
    opencv_dct = opencv_dct.clone();
    cv::dct(opencv_dct, opencv_dct, 0);

    std::cout << opencv_dct << std::endl;

    // FFTW DCT of the same data.
    double dct_output[rows * cols];
    compute2Ddct(image, dct_output, rows, cols);

    std::cout << std::endl;

    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            std::cout << dct_output[i * cols + j] << " ";
        }
        std::cout << std::endl;
    }

    return 0;
}

Solution

  • It looks like FFTW scales each 1D transform by 2 (see docs). The 2D transform is thus scaled by 4.

    And OpenCV scales each 1D transform by sqrt(2/N), except the first element, which is scaled by sqrt(1/N) (see docs). The 2D transform is thus scaled by sqrt(1/N)*sqrt(1/M) for the top-left element, by sqrt(2/N)*sqrt(1/M) or sqrt(1/N)*sqrt(2/M) for the rest of the first column and first row, and by sqrt(2/N)*sqrt(2/M) for all other elements.

    So, to transform the FFTW results into the OpenCV results (for an NxM transform), we need to scale the top-left element by 1/(4*sqrt(N*M)), the rest of the first row and first column by 1/(2*sqrt(2*N*M)), and everything else by 1/(2*sqrt(N*M)).

    The simplest implementation, I think, would be to scale everything by 1/(2*sqrt(N*M)), then scale the top row by an additional 1/sqrt(2), and then the left column by an additional 1/sqrt(2). (The top-left element would be scaled 3 times!) A minimal sketch of this is below.
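
    Here is a rough sketch of that approach (not part of the original answer; the helper name and the in-place scaling are my own choices, and it assumes the FFTW result is stored row-major, as in the question's code):

#include <cmath>

// Scales a row-major FFTW REDFT10 x REDFT10 result in place so that it matches
// cv::dct's normalization for a rows x cols transform.
void scaleFftwDctToOpenCV(double* data, int rows, int cols) {
    const double base = 1.0 / (2.0 * std::sqrt(static_cast<double>(rows) * cols));
    const double invSqrt2 = 1.0 / std::sqrt(2.0);

    // Everything gets 1/(2*sqrt(N*M)).
    for (int i = 0; i < rows * cols; ++i)
        data[i] *= base;

    // The top row gets an extra 1/sqrt(2).
    for (int j = 0; j < cols; ++j)
        data[j] *= invSqrt2;

    // The left column gets an extra 1/sqrt(2); the top-left element picks up both,
    // for a total factor of 1/(4*sqrt(N*M)).
    for (int i = 0; i < rows; ++i)
        data[i * cols] *= invSqrt2;
}

    Calling this right after compute2Ddct(image, dct_output, rows, cols); in the question's code should reproduce the cv::dct values, e.g. 1152 / (4*sqrt(8*6)) ≈ 41.57 for the top-left element.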