opencv, camera-calibration

Does a smaller reprojection error always mean a better calibration?


During camera calibration, the usual advice is to use many images (>10) with variations in pose, depth, etc. However, I notice that the fewer images I use, the smaller the reprojection error usually is. For example, with 27 images cv::calibrateCamera returns 0.23, and with just 3 I get 0.11. This may be due to the fact that during calibration we are solving a least-squares problem for an overdetermined system.
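For reference, my setup looks roughly like the sketch below (illustrative names such as boardSize and squareSize are mine, not anything special); the double returned by cv::calibrateCamera is the RMS reprojection error I am quoting:

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Calibrate from detected corner sets and return the overall RMS
// reprojection error (this return value is where 0.23 / 0.11 come from).
double calibrateFromViews(const std::vector<std::vector<cv::Point2f>>& imagePoints,
                          cv::Size imageSize, cv::Size boardSize, float squareSize,
                          cv::Mat& cameraMatrix, cv::Mat& distCoeffs)
{
    // One copy of the planar board model (Z = 0) per calibration view.
    std::vector<cv::Point3f> board;
    for (int r = 0; r < boardSize.height; ++r)
        for (int c = 0; c < boardSize.width; ++c)
            board.emplace_back(c * squareSize, r * squareSize, 0.f);
    std::vector<std::vector<cv::Point3f>> objectPoints(imagePoints.size(), board);

    std::vector<cv::Mat> rvecs, tvecs;
    return cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                               cameraMatrix, distCoeffs, rvecs, tvecs);
}
```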

QUESTIONS:

  1. Do we actually use the reprojection error as an absolute measure of how good a calibration is? For example, if I calibrate with 3 images and get 0.11, and then calibrate with 27 other images and get 0.23, can we really say that "the first calibration is better"?

  2. OpenCV uses the same images both for calibration and for calculating the error. Isn't that a form of overfitting? Wouldn't it be more correct to use two different sets: one to compute the calibration parameters and one to compute the error? In that case, I would use the same (test) set to calculate the error for all my calibration results from different (training) sets (see the sketch below). Wouldn't that be fairer?
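Concretely, what I have in mind is something like this (a sketch, and my own helper, not an OpenCV routine; the pose of each test view is re-estimated with cv::solvePnP while the intrinsics stay fixed, so only the intrinsics are actually held out):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// RMS reprojection error of fixed intrinsics on a held-out (test) set.
double heldOutError(const std::vector<std::vector<cv::Point3f>>& testObjPts,
                    const std::vector<std::vector<cv::Point2f>>& testImgPts,
                    const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs)
{
    double sumSq = 0.0;
    std::size_t total = 0;
    for (std::size_t i = 0; i < testObjPts.size(); ++i) {
        cv::Mat rvec, tvec;
        // Fit only the pose of this test view; the intrinsics are not touched.
        cv::solvePnP(testObjPts[i], testImgPts[i], cameraMatrix, distCoeffs,
                     rvec, tvec);

        std::vector<cv::Point2f> projected;
        cv::projectPoints(testObjPts[i], rvec, tvec,
                          cameraMatrix, distCoeffs, projected);

        double err = cv::norm(testImgPts[i], projected, cv::NORM_L2);
        sumSq += err * err;
        total += testObjPts[i].size();
    }
    return std::sqrt(sumSq / total);
}
```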


Solution

  • Sorry if this is too late - I only just saw it.

    The error is the reprojection error of the fit: find points on an image, estimate the real-world model from them, recalculate where those points would land on the image given that model, and report the difference. In a way this is a bit circular - a model that is only correct for those few images will report a very good error, while feeding it lots of images produces a much more generally correct model, but with a larger error, just because you are trying to stretch it to fit a much bigger space.
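    In OpenCV terms, that loop looks roughly like the sketch below (illustrative names; cv::calibrateCamera computes the equivalent internally for the RMS value it returns):

    ```cpp
    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    // "Recalculate where those points would be on the image given the model"
    // and report the RMS difference to the detected points.
    double rmsReprojectionError(const std::vector<std::vector<cv::Point3f>>& objectPoints,
                                const std::vector<std::vector<cv::Point2f>>& imagePoints,
                                const std::vector<cv::Mat>& rvecs,
                                const std::vector<cv::Mat>& tvecs,
                                const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs)
    {
        double sumSq = 0.0;
        std::size_t total = 0;
        for (std::size_t i = 0; i < objectPoints.size(); ++i) {
            std::vector<cv::Point2f> projected;
            cv::projectPoints(objectPoints[i], rvecs[i], tvecs[i],
                              cameraMatrix, distCoeffs, projected);
            // Squared distance between detected and reprojected points.
            double err = cv::norm(imagePoints[i], projected, cv::NORM_L2);
            sumSq += err * err;
            total += objectPoints[i].size();
        }
        return std::sqrt(sumSq / total);
    }
    ```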

    There does come a point where adding more images doesn't improve the fit, and may even add noise, since points are never detected perfectly. What's important is to cover a bigger range of parameters - more angles and positions - rather than adding equivalent data.

    Using the same image set to estimate the parameters and to report the error isn't really a problem here, because the fit has a real meaning in terms of actual physical lens parameters - it's not like training and testing a neural net on the same data.

    edit: a better calibration routine than OpenCV's (although based on the same concept) is included in 3D-DIC (free but not OSS; register on the site to get the download link) - specifically, see the calibration manual.