c++opencv

Difference in OpenCV (C++) saturation behavior between single-channel and multi-channel matrices during matrix operations


For a single channel matrix m, an integer c and a cv::Scalar m_avg, the operation

m = c*(m - m_avg)

results in saturation being applied to m only after the right-hand-side has been evaluated. For a multi-channel image, saturation is applied to c*m before subtracting c*m_avg.

This is on Windows 10 using MSVC2019 and C++17. OpenCV is 4.9.0, from the official sources (https://opencv.org/releases/), and also compiled with MSVC2019.

My use-case is a simple contrast operation. A value close to the average of an image (m_avg) is subtracted from the image, and a contrast factor (c) is applied to this difference. The full equation for my use-case is

m = c*(m - m_avg) + m_avg

, but the final addition of m_avg isn't needed to demonstrate the issue. For the three-channel case, I use the same value to subtract from each channel, but a similar algorithm could use the average of each individual channel to construct the 3-channel m_avg scalar.

Below demonstrates the issue. m1 is a single-channel matrix, m3 is a 3-channel matrix:

#include <iostream>
#include <opencv2/core/core.hpp>

int main()
{
  std::cout << "c m1 m3" << std::endl;
  for (int c = 1; c <= 9; c++){
    cv::Mat m1 = cv::Mat::zeros(1,1, CV_8UC1);
    cv::Mat m3 = cv::Mat::zeros(1,1, CV_8UC3);
    m1 += cv::Scalar(100);
    m3 += cv::Scalar(100,100,100);
    cv::Scalar sub_m1 = cv::Scalar(70);
    cv::Scalar sub_m3 = cv::Scalar(70,70,70);
    m1 = c*(m1 - sub_m1);
    m3 = c*(m3 - sub_m3);
    int m1_val = m1.at<uchar>(0,0);
    int m3_val = m3.at<cv::Vec3b>(0,0)[0];
    std::cout << c << " " << m1_val << " "  << m3_val << std::endl;
  }

  return 0;
}

Output:

c m1 m3
1 30 30
2 60 60
3 90 45 // For multi-channel 3*100 gets capped at 255. After subtraction of 3*70, result is 45
4 120 0 // For multi-channel 4*100 gets capped at 255. After subtraction of 4*70, result is 0
5 150 0
6 180 0
7 210 0
8 240 0
9 255 0 // The single-channel result is only capped once c*(30) is greater than 255

I don't have a problem if this is intentional and documented (I've failed to find anything documenting such behavior). But I want to take advantage of the single-channel behavior by splitting a 3-channel matrix into 3 single-channel matrices, performing the operation and then merging them back. This has proved faster than converting to a signed matrix type in order to perform these operations. But I don't feel comfortable doing the split/math/merge method if this is accidental.

Edit: I've opened an issue for this in the OpenCV tracker (https://github.com/opencv/opencv/issues/26138)


Solution

  • This is definitely weird and IMHO unexpected behaviour. I'm not sure if it's intentional (TODO: dig further through MatExpr history). However, with some research, it's explainable.

    The main culprit here is the Scalar you're subtracting. It makes a difference whether it's only "single channel" (first value set to non-zero -- e.g. your sub_m1), or "multi-channel" (values other than the first are non-zero -- e.g. your sub_m3).


    Analysis

    Let's assume we have (like in your example) some Mat m, a Scalar sub_m and an int c and the following statement:

    m = c * (m - sub_m);
    

    The question is, what does that statement mean in terms of OpenCV code? What does it do underneath the surface? Answering that will involve OpenCV's matrix expressions, both MatExpr class, and the free-standing operator overloads, so keep the reference handy. We will need to dig through the OpenCV source code a little as well, but let's start from the top and drill down to details as we progress.

    1. m - sub_m is evaluated, using MatExpr operator- (const Mat& a, const Scalar& s). It produces an intermediate MatExpr, let's call it p.

    2. c * p is evaluated, using MatExpr operator* (double s, const MatExpr& e). It produces another intermediate MatExpr, let's call it q.

    3. m = q is evaluated using Mat& cv::Mat::operator= (const MatExpr& expr).


    OK, great, but that still doesn't tell us much, let's dig deeper into the implementation. (Note: I removed some bits of code that are not relevant to the explanation.)

    First the subtract operator that creates the first intermediate MatExpr p:

    MatExpr operator - (const Mat& a, const Scalar& s)
    {
        MatExpr e;
        MatOp_AddEx::makeExpr(e, a, Mat(), 1, 0, -s);
        return e;
    }
    

    Which just calls void MatOp_AddEx::makeExpr(...) to populate the MatExpr and return it.

    inline void MatOp_AddEx::makeExpr(MatExpr& res, const Mat& a, const Mat& b, double alpha, double beta, const Scalar& s)
    {
        res = MatExpr(&g_MatOp_AddEx, 0, a, b, Mat(), alpha, beta, s);
    }
    

    The MatExpr object has attributes (NB: Clarify, elaborate)

    With that in mind, we can see that our first intermediate MatExpr p has:

    Notice that no real calculations happened so far, lazy evaluation is at play.


    Next step, the multiplication operator that creates MatExpr q:

    MatExpr operator * (double s, const MatExpr& e)
    {
        MatExpr en;
        e.op->multiply(e, s, en);
        return en;
    }
    

    Recall that e.op is an instance of MatOp_AddEx, so it calls void MatOp_AddEx::multiply(...) const:

    void MatOp_AddEx::multiply(const MatExpr& e, double s, MatExpr& res) const
    {
        res = e;
        res.alpha *= s;
        res.beta *= s;
        res.s *= s;
    }
    

    Hmm, what's happening here? The result is a copy of p with some changes applied:

    And still, no real work has happened, only some scalar values got multiplied. However, we can see the first sign of a potential problem: the expression seems to have been transformed into m = c * m - c *sub_m;.


    Finally the assignment operator, where the laziness ends, and the operation is calculated and written to the destination:

    inline
    Mat& Mat::operator = (const MatExpr& e)
    {
        e.op->assign(e, *this);
        return *this;
    }
    

    Just calling void MatOp_AddEx::assign(...) const. That one is a bit long, so I won't copy it all here. However, we don't have any b, so more than half of the code is irrelevant. Type is left at default -1, so there is not temporary and dst == m.

    The next relevant check is if( e.s.isReal() && (dst.data != m.data || fabs(e.alpha) != 1)). We know that alpha is not 1 and dst == m, so the deciding factor is bool Scalar_<_Tp>::isReal() const:

    return this->val[1] == 0 && this->val[2] == 0 && this->val[3] == 0;
    

    i.e. only the first element is set to a non-zero value (with zero being the default for arguments not provided). If that is the case, the calculation that happens is:

    e.a.convertTo(m, _type, e.alpha, e.s[0]);
    

    A simple Mat::convertTo, with alpha argument set to c, and beta set to -(c * sub_m[0]). Therefore, this calculates c * m - c * sub_m[0] with saturation only happening only on the result.

    Otherwise, (when the Scalar has more than one/first non-zero value), we fall through to the final else, which does the following calculation:

    e.a.convertTo(dst, e.a.type(), e.alpha);
    cv::add(dst, e.s, dst);
    

    Here, the Mat::convertTo is only used to calculate c*m (since beta can only be a double). Saturation happens at the end of this calculation. After that, it adds the scaled Scalar with value of (-c * sub_m) to the result. Another saturation happens after that.


    Solution

    "I want to take advantage of the single-channel behavior..."

    Ditch the matrix expression, use the underlying operations instead.

    1. All Scalar values are equal (like in your example). Make it a Mat::convertTo.

    2. With different values for each channel, it's going to get a bit more tricky, since we want only single saturation to happen. A sub-optimal alternative is to have a temporary of sufficient resolution, like an array of doubles (TODO: crosscheck convertTo implementation). Then you'd use cv::subtract (setting the appropriate destination datatype). Follow that with a convertTo to apply the constant scaling and do the saturation. A better performing implementation might be using parallelFor, thus avoiding the temporary Mat.

    TODO: Add solution examples.