I'm working with Principal Component Analysis (PCA) in OpenCV. The constructor signature for the case I'm interested in is:
PCA(InputArray data, InputArray mean, int flags, double retainedVariance);
Regarding the InputArray 'data', the documentation states that the appropriate flags are:
CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows. CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
My question is about the term 'samples': I'm not sure what a sample is in this context.
For example, let's say I have 4 sets of data, labelled A-D for the sake of illustration. Each set A through D has 8 elements. They are arranged in the Mat variable I'll use as the InputArray as follows:
The question is, which is it:
Another way of asking:
?
As a guess, I'd choose CV_PCA_DATA_AS_COL (i.e. I have 4 samples) - but that's just where my head is at... Until I learn the correct terminology it seems the word 'sample' could apply to either reasoning.
Ugh...
So the answer was found by reversing the logic behind the documentation for the PCA::project step...
Mat PCA::project(InputArray vec)
vec – input vector(s); must have the same dimensionality and the same layout as the input data used at PCA phase, that is, if CV_PCA_DATA_AS_ROW are specified, then vec.cols==data.cols (vector dimensionality)
i.e. a 'sample' is equivalent to one of my 'sets', and the number of elements in each set is the 'dimensionality'.
(and my guess was correct :)
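
To make the terminology concrete, here is a minimal sketch in plain C++ (no OpenCV calls) of how I understand the two layouts. The names SampleLayout, PcaShape, interpretShape and meanSample are hypothetical helpers made up for this illustration; they are not part of the OpenCV API, and the enum only stands in for CV_PCA_DATA_AS_ROW / CV_PCA_DATA_AS_COL. The matrix is assumed to be stored row-major.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CV_PCA_DATA_AS_ROW / CV_PCA_DATA_AS_COL.
enum class SampleLayout { AsRows, AsCols };

// How a rows x cols matrix is read under a given layout.
struct PcaShape {
    std::size_t sampleCount;  // number of samples (the "sets", e.g. A-D)
    std::size_t dimension;    // number of elements in each sample
};

PcaShape interpretShape(std::size_t rows, std::size_t cols, SampleLayout layout) {
    if (layout == SampleLayout::AsCols) {
        return {cols, rows};  // each column is one sample
    }
    return {rows, cols};      // each row is one sample
}

// Mean sample of a row-major rows x cols matrix.
// The result has one entry per dimension, not one per sample.
std::vector<double> meanSample(const std::vector<double>& data,
                               std::size_t rows, std::size_t cols,
                               SampleLayout layout) {
    const PcaShape shape = interpretShape(rows, cols, layout);
    std::vector<double> mean(shape.dimension, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // The index along the sample's own axis selects the dimension.
            const std::size_t d = (layout == SampleLayout::AsCols) ? r : c;
            mean[d] += data[r * cols + c];
        }
    }
    for (double& m : mean) {
        m /= static_cast<double>(shape.sampleCount);
    }
    return mean;
}
```

With an 8 x 4 matrix whose columns are the sets A-D, interpretShape(8, 4, SampleLayout::AsCols) gives 4 samples of dimension 8, and the mean has 8 entries. That matches the reading above: the sets are the samples and the elements are the dimensions.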