c++ opencv computer-vision camera-calibration

opencv calibrateCamera function yielding bad results


I'm trying to get opencv camera calibration working but having trouble getting it to output valid data. I have an uncalibrated camera that I would like to calibrate, but to test my code I am using an Azure Kinect camera (the color camera), since the SDK supplies the correct intrinsics for it and I can verify them. I've collected 30 images of a chessboard from slightly different angles, which I understand should be sufficient, and run the calibration function, but no matter what flags I pass in I get values for fx and fy that are pretty different from the correct fx and fy, and distortion coefficients that are WILDLY different. Am I doing something wrong? Do I need more or better data?

A sample of the images I'm using can be found here: https://www.dropbox.com/sh/9pa94uedoe5mlxz/AABisSvgWwBT-bY65lfzp2N3a?dl=0

Save them in c:\calibration_test to run the code below.

#include <filesystem>
#include <iostream>

#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace std;

namespace fs = std::filesystem;

static bool extractCorners(cv::Mat colorImage, vector<cv::Point3f>& corners3d,
                           vector<cv::Point2f>& corners) {
  // Each square is 20x20mm
  const float kSquareSize = 0.020f;
  const cv::Size boardSize(7, 9);
  // Offset that puts the board coordinate origin at the center of the board
  const cv::Point3f kCenterOffset(
      (float)(boardSize.width - 1) * kSquareSize * 0.5f,
      (float)(boardSize.height - 1) * kSquareSize * 0.5f, 0.f);

  cv::Mat image;
  // cv::imread loads 3-channel BGR images by default, so convert from BGR.
  cv::cvtColor(colorImage, image, cv::COLOR_BGR2GRAY);

  int chessBoardFlags =
      cv::CALIB_CB_ADAPTIVE_THRESH | cv::CALIB_CB_NORMALIZE_IMAGE;
  if (!cv::findChessboardCorners(image, boardSize, corners, chessBoardFlags)) {
    return false;
  }

  cv::cornerSubPix(
      image, corners, cv::Size(11, 11), cv::Size(-1, -1),
      cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30,
                       0.1));

  // Construct the corresponding 3D object points on the board plane
  for (int i = 0; i < boardSize.height; ++i)
    for (int j = 0; j < boardSize.width; ++j)
      corners3d.push_back(cv::Point3f(j * kSquareSize, i * kSquareSize, 0) -
                          kCenterOffset);

  return true;
}

int main() {
  vector<cv::Mat> frames;
  for (const auto& p : fs::directory_iterator("c:\\calibration_test\\")) {
    cv::Mat frame = cv::imread(p.path().string());
    if (!frame.empty()) {  // skip any non-image files in the directory
      frames.push_back(frame);
    }
  }

  int numFrames = (int)frames.size();
  vector<vector<cv::Point2f>> corners(numFrames);
  vector<vector<cv::Point3f>> corners3d(numFrames);

  int framesWithCorners = 0;
  for (int i = 0; i < numFrames; ++i) {
    if (extractCorners(frames[i], corners3d[framesWithCorners],
                       corners[framesWithCorners])) {
      ++framesWithCorners;
    }
  }

  numFrames = framesWithCorners;
  corners.resize(numFrames);
  corners3d.resize(numFrames);

  // Camera intrinsics come from the Azure Kinect API
  cv::Matx33d cameraMatrix(914.111755f, 0.f, 960.887390f, 0.f, 913.880615f,
                           551.566528f, 0.f, 0.f, 1.f);
  vector<float> distCoeffs = {0.576340079f,     -2.71203661f, 0.000563957903f,
                              -0.000239689150f, 1.54344523f,  0.454746544f,
                              -2.53860712f,     1.47272563f};

  cv::Size imageSize = frames[0].size();
  vector<cv::Mat> rotations;     // per-view rotation vectors (rvecs)
  vector<cv::Mat> translations;  // per-view translation vectors (tvecs)
  int flags = cv::CALIB_USE_INTRINSIC_GUESS | cv::CALIB_FIX_PRINCIPAL_POINT |
              cv::CALIB_RATIONAL_MODEL;
  double result =
      cv::calibrateCamera(corners3d, corners, imageSize, cameraMatrix,
                          distCoeffs, rotations, translations, flags);

  // After this call, cameraMatrix has different values for fx and fy, and
  // WILDLY different distortion coefficients.

  cout << "fx: " << cameraMatrix(0, 0) << endl;
  cout << "fy: " << cameraMatrix(1, 1) << endl;
  cout << "cx: " << cameraMatrix(0, 2) << endl;
  cout << "cy: " << cameraMatrix(1, 2) << endl;
  for (size_t i = 0; i < distCoeffs.size(); ++i) {
    cout << "d" << i << ": " << distCoeffs[i] << endl;
  }

  return 0;
}

Some sample output is:

fx: 913.143
fy: 917.965
cx: 960.887
cy: 551.567
d0: 0.327596
d1: -73.1837
d2: -0.00125972
d3: 0.002805
d4: -7.93086
d5: 0.295437
d6: -73.481
d7: -3.25043
d8: 0
d9: 0
d10: 0
d11: 0
d12: 0
d13: 0

Any idea what I'm doing wrong?

Bonus question: Why do I get 14 distortion coefficients back instead of 8? If I leave off CALIB_RATIONAL_MODEL then I only get 5 (three radial and two tangential).


Solution

  • You need to take images from the whole field of view of the camera to correctly capture the lens distortion characteristics. The images you provide only show the chessboard in one position, slightly angled.

    Ideally you should have images of the chessboard evenly distributed over the x and y axes of the image plane, right up to the edges of the image. Make sure a sufficient white border around the board is always visible, though, for robust detection.

    You should also try to capture images where the chessboard is nearer to the camera and farther away, not just at a uniform distance. The different angles you provide look good, on the other hand.
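    A minimal sketch (not part of the original answer) for checking that distribution, reusing corners and imageSize from your code: splat every detected corner onto one canvas and look for uncovered regions, especially near the image borders.

// Minimal sketch: accumulate all detected corners on a single canvas to see
// how much of the field of view the calibration data actually covers.
// Drop into main() once imageSize is set; needs <opencv2/highgui.hpp>.
cv::Mat coverage = cv::Mat::zeros(imageSize, CV_8UC3);
for (const auto& view : corners)
  for (const auto& pt : view)
    cv::circle(coverage, pt, 4, cv::Scalar(0, 255, 0), cv::FILLED);
cv::imshow("corner coverage", coverage);
cv::waitKey(0);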

    You can find an extensive guide on how to ensure good calibration results in this answer: How to verify the correctness of calibration of a webcam?
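    Along those lines, a quick sanity check is the per-view RMS reprojection error (the return value of calibrateCamera is the overall RMS error, which your code stores in result but never prints). A minimal sketch, assuming rotations and translations are declared as vector<cv::Mat> and run right after calibrateCamera; needs <cmath> for sqrt:

// Minimal sketch: report the RMS reprojection error of each view so that
// outlier frames (bad detections, motion blur) can be spotted and dropped.
for (int i = 0; i < numFrames; ++i) {
  vector<cv::Point2f> projected;
  cv::projectPoints(corners3d[i], rotations[i], translations[i], cameraMatrix,
                    distCoeffs, projected);
  double err = cv::norm(corners[i], projected, cv::NORM_L2);
  cout << "view " << i
       << " RMS error: " << err / sqrt((double)projected.size()) << endl;
}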

    Comparing your camera matrix to the one coming from the Azure Kinect API, it doesn't look bad at all. The principal point is pretty much spot on and the focal length is in a reasonable range. If you improve the quality of the input with my tips and the SO answer I have linked, the results should be even closer. Comparing sets of distortion coefficients by their numerical distance doesn't really work, though: the error function is not convex, so you can have lots of local minima that produce relatively good results but are far from the global minimum that would yield the best coefficients. Two quite different coefficient sets can therefore describe almost the same lens mapping, so compare the mappings rather than the raw numbers (see the sketch below).
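    As a hedged sketch (not part of the original answer), you could sample a grid of pixels and compare where the two models map them, using the Kinect SDK coefficients quoted in the question as one model and the calibrateCamera output as the other. Drop into main() after the calibration call; needs <algorithm> and <cmath> for max/hypot:

// Hedged sketch: compare two distortion models by where they map pixels,
// not by coefficient distance. kinectDist holds the Azure Kinect SDK values
// from the question; distCoeffs is the calibrateCamera output. Passing
// cameraMatrix as P keeps the undistorted points in pixel coordinates.
vector<float> kinectDist = {0.576340079f,     -2.71203661f, 0.000563957903f,
                            -0.000239689150f, 1.54344523f,  0.454746544f,
                            -2.53860712f,     1.47272563f};
double maxDiff = 0.0;
for (int y = 0; y < imageSize.height; y += 40) {
  for (int x = 0; x < imageSize.width; x += 40) {
    vector<cv::Point2f> src{cv::Point2f((float)x, (float)y)}, a, b;
    cv::undistortPoints(src, a, cameraMatrix, kinectDist, cv::noArray(),
                        cameraMatrix);
    cv::undistortPoints(src, b, cameraMatrix, distCoeffs, cv::noArray(),
                        cameraMatrix);
    maxDiff = max(maxDiff, (double)hypot(a[0].x - b[0].x, a[0].y - b[0].y));
  }
}
cout << "max pixel difference between the two models: " << maxDiff << endl;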

    Regarding your bonus question: OpenCV's distortion vector can hold up to 14 coefficients, laid out as (k1, k2, p1, p2, k3, k4, k5, k6, s1, s2, s3, s4, taux, tauy). With CALIB_RATIONAL_MODEL only the first 8 are estimated; the thin-prism terms (s1..s4) and the tilted-sensor terms (taux, tauy) would additionally require CALIB_THIN_PRISM_MODEL and CALIB_TILTED_MODEL. So only 8 values are filled in in your output, and the 6 trailing zeros are unestimated entries that have no influence.