I have a dataset; for simplicity, let's say it has 1000 samples (each one a vector).
I want to split my data into training and test sets for cross-validation, but NOT randomly (1). For example, for 4-fold cross-validation I should get:
fold1: train = 1:250, test = 251:1000
fold2: train = 251:500, test = [1:250; 501:1000]
fold3: train = 501:750, test = [1:500; 751:1000]
fold4: train = 751:1000, test = 1:750
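A minimal sketch of how the index vectors in this layout can be computed directly, assuming (as in the example) that k divides n. The names block and rest are illustrative only; in the layout above the block is the training set, so swap the two if the block should be the test set, as in standard k-fold cross-validation.

```matlab
% Sketch: build the contiguous fold layout listed above as index vectors.
% Assumes n is divisible by k (1000 and 4 here), as in the example.
n = 1000;                                  % number of samples
k = 4;                                     % number of folds
chunk = n / k;                             % 250 samples per block

for f = 1:k
    block = (f - 1) * chunk + 1 : f * chunk;   % contiguous block for fold f
    rest  = setdiff(1:n, block);               % all other samples, sorted
    % first line printed: fold1: block = 1:250, rest has 750 samples
    fprintf('fold%d: block = %d:%d, rest has %d samples\n', ...
            f, block(1), block(end), numel(rest));
end
```

Because nothing is random, running this twice always yields the same folds.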
I am aware of cvpartition, but AFAIK it splits the data randomly, which is not what I need.
I guess I could write the code myself, but I figured there was probably a function I could use.
(1) The data is already shuffled, and I need to be able to reproduce the experiments easily.
Here is a function that does it in general:
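function [test, train] = kfolds(data, k)
% KFOLDS  Split the rows of DATA into K contiguous (non-random) folds.
%   TEST{f} is the f-th block of rows; TRAIN{f} holds every other row.
%   If K does not divide the number of rows, the last block also takes
%   the leftover rows, so every sample appears in exactly one test set.
n = size(data, 1);
test = cell(k, 1);
train = cell(k, 1);
chunk = floor(n / k);
for f = 1:k
    firstRow = (f - 1) * chunk + 1;
    if f < k
        lastRow = f * chunk;
    else
        lastRow = n;   % last fold absorbs the remainder
    end
    test{f} = data(firstRow:lastRow, :);
    train{f} = data([1:firstRow-1, lastRow+1:n], :);
end
end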
It's not an elegant one-liner, but it is fairly robust: k does not need to be a factor of the number of samples, it works on a 2-D matrix (splitting by rows), and it returns the actual sets rather than indices.
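Since kfolds returns copies of the rows rather than indices, a separate label vector has to be split with the same indices. A minimal sketch, assuming hypothetical example data X and y and an illustrative variable foldId, assigns each row a contiguous fold number and uses one logical mask for both:

```matlab
% Hypothetical example data: 10 samples with 2 features, plus one label each.
X = reshape(1:20, 10, 2);
y = (1:10)';
k = 3;
n = size(X, 1);

% Contiguous fold number per row, no randomness involved.
% For n = 10 and k = 3 this gives 1 1 1 2 2 2 3 3 3 3.
foldId = ceil((1:n)' * k / n);

for f = 1:k
    isTest = (foldId == f);                     % rows in fold f's block
    Xtest  = X(isTest, :);   ytest  = y(isTest);
    Xtrain = X(~isTest, :);  ytrain = y(~isTest);
    % ... train on (Xtrain, ytrain), evaluate on (Xtest, ytest) ...
end
```

The ceil expression spreads the leftover rows across the later folds instead of piling them into the last one, and the same mask keeps X and y aligned.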