
OpenCV Principal Component Analysis terminology - what actually is a 'sample'?


I'm working with Principal Component Analysis (PCA) in openCV. The constructor inputs for the case I'm interested in are:

PCA(InputArray data, InputArray mean, int flags, double retainedVariance);

Regarding the InputArray 'data', the documentation states that the appropriate flags are:

CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows. CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
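For reference, the two flags describe the same data in transposed form. Below is a short sketch in standard PCA notation. $N$ (the number of samples), $d$ (the number of elements per sample) and $x_i$ are names introduced here, not OpenCV identifiers:

$$
\text{CV\_PCA\_DATA\_AS\_ROW: } X \in \mathbb{R}^{N \times d},\ \text{row } i = x_i^{\top}
\qquad
\text{CV\_PCA\_DATA\_AS\_COL: } X \in \mathbb{R}^{d \times N},\ \text{column } i = x_i
$$

$$
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \in \mathbb{R}^{d},
\qquad
C \propto \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{\top} \in \mathbb{R}^{d \times d}
$$

The principal components are eigenvectors of $C$, so each one has $d$ elements, the same as a single sample. Because the $N$ centred vectors $x_i - \mu$ sum to zero, $\operatorname{rank} C \le \min(N-1, d)$. The scale factor on $C$ is left unspecified here because it does not change the eigenvectors.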

My question pertains to the use of the term 'samples' in that I'm not sure what a sample is in this context.

For example, let's say I have 4 sets of data, labelled A-D for the sake of illustration. Each set has 8 elements. They are then set up in the Mat variable I'll use as the InputArray as follows:

[Image: the 8 × 4 Mat, with sets A-D stored one per column and each column holding that set's 8 elements]

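The question is, which flag applies here: CV_PCA_DATA_AS_ROW or CV_PCA_DATA_AS_COL?

Another way of asking: is a 'sample' one of the sets A-D (4 samples of 8 elements each), or is it the group of i-th elements taken across all four sets (8 samples of 4 elements each)?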

As a guess, I'd choose CV_PCA_DATA_AS_COL (i.e. I have 4 samples), but that's just where my head is at. Until I learn the correct terminology, it seems the word 'sample' could apply to either reading.


Solution

  • Ugh...

    So the answer was found by working backwards from the documentation for the PCA::project step:

    Mat PCA::project(InputArray vec)
    

    vec – input vector(s); must have the same dimensionality and the same layout as the input data used at PCA phase, that is, if CV_PCA_DATA_AS_ROW are specified, then vec.cols==data.cols (vector dimensionality)

    i.e. a 'sample' is equivalent to a 'set' (A-D), and the elements within each set are its 'dimensions' (8 of them).

    (and my guess was correct: with the sets stored as columns, the flag is CV_PCA_DATA_AS_COL :)