[SOLVED] OpenCV Principal Component Analysis terminology

OpenCV Principal Component Analysis terminology - what actually is a 'sample'?

I'm working with Principal Component Analysis (PCA) in OpenCV. The constructor signature for the case I'm interested in is:

PCA(InputArray data, InputArray mean, int flags, double retainedVariance);

For the InputArray 'data', the documentation describes the layout flags as follows:

CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows.
CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.

My question is about the term 'samples': I'm not sure what counts as a sample in this context.

For example, let's say I have 4 sets of data and, for the sake of illustration, label them A-D. Each set A through D has 8 elements. They are arranged in the Mat variable I'll pass as the InputArray as follows:

[Image not preserved: diagram of sets A-D arranged in the Mat]

The question is, which is it:

My sets are samples?
My data elements are samples?

Another way of asking:

Do I have 4 samples (CV_PCA_DATA_AS_COL)?
Or do I have 4 sets of 8 samples (CV_PCA_DATA_AS_ROW)?

As a guess, I'd choose CV_PCA_DATA_AS_COL (i.e. I have 4 samples) - but that's just where my head is at... Until I learn the correct terminology it seems the word 'sample' could apply to either reasoning.
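
To make the two readings concrete, here is a minimal sketch in plain C++ with no OpenCV involved. It assumes an 8x4 matrix with one set per column, and it uses a made-up helper, meanPerDimension (not an OpenCV function), that averages over whatever is treated as a sample. The length of the returned mean is the dimensionality that each reading implies.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // indexed as matrix[row][col]

// Hypothetical helper (not part of OpenCV): averages over the samples and
// returns one mean value per dimension.
//   samplesAreColumns == true  -> each column is a sample, each row a dimension
//   samplesAreColumns == false -> each row is a sample, each column a dimension
std::vector<double> meanPerDimension(const Matrix& m, bool samplesAreColumns) {
    const std::size_t rows = m.size();
    const std::size_t cols = rows ? m[0].size() : 0;
    const std::size_t dims = samplesAreColumns ? rows : cols;
    const std::size_t samples = samplesAreColumns ? cols : rows;

    std::vector<double> mean(dims, 0.0);
    if (samples == 0) return mean;

    // Accumulate every entry into the slot of the dimension it belongs to.
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            mean[samplesAreColumns ? r : c] += m[r][c];

    for (double& v : mean) v /= static_cast<double>(samples);
    return mean;
}
```

With an 8x4 input, the column reading yields a mean with 8 entries (4 samples, 8 dimensions), while the row reading yields a mean with 4 entries (8 samples, 4 dimensions).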

Solution

Ugh...

So the answer was found by reversing the logic behind the documentation for the PCA::project step...

Mat PCA::project(InputArray vec)

vec – input vector(s); must have the same dimensionality and the same layout as the input data used at PCA phase, that is, if CV_PCA_DATA_AS_ROW are specified, then vec.cols==data.cols (vector dimensionality)

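i.e. a 'sample' is equivalent to a 'set', and the elements within a set are its 'dimensions'. With CV_PCA_DATA_AS_ROW each row is one sample and the number of columns is the vector dimensionality; with CV_PCA_DATA_AS_COL the roles are swapped. So 4 sets of 8 elements are 4 samples of dimensionality 8.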

(and my guess was correct :)