I have a temporal dataset (1,000,000 x 70) containing information about the activities of 20 subjects. I need to subsample the dataset, as it has more than a million rows. How can I select a suitable set of observations for each subject? Later, I need to apply PCA and K-means to it. Kindly help me with the steps to follow. I'm working in MATLAB.
I'm not really clear on what you're looking for. If you just want to subsample a matrix in MATLAB, here is one way to do it:
% myData: 1000000 x 70 matrix (rows = observations, columns = variables)
nbDataPts = size(myData, 1); % Number of observations (rows) in the data
subsampleRatio = 0.1; % Fraction of the data you want to keep
nbSamples = round(subsampleRatio * nbDataPts); % How many observations to keep
sampleIdx = round(linspace(1, nbDataPts, nbSamples)); % Evenly spaced row indices to keep
sampledData = myData(sampleIdx, :); % Subsampled data, nbSamples x 70
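Since the question asks for observations from each subject, the same evenly spaced idea can be applied subject by subject. Here is a minimal sketch, assuming rows are observations and that column 1 holds a subject ID from 1 to 20. Both are my assumptions, since the question does not say how subjects are identified. The data is synthetic so the snippet runs on its own; replace it with your matrix.

```matlab
% Assumption: rows = observations, column 1 = subject ID (1..20).
% Synthetic stand-in data; replace with your own matrix.
rng(0);
nbRows = 10000;
subjectId = randi(20, nbRows, 1);
myData = [subjectId, randn(nbRows, 69)];

subsampleRatio = 0.1;               % Fraction of each subject's rows to keep
subjects = unique(subjectId);       % Column vector of distinct subject IDs
keepIdx = [];
for s = subjects'
    rowsOfSubject = find(subjectId == s);   % This subject's rows, in temporal order
    nbKeep = max(1, round(subsampleRatio * numel(rowsOfSubject)));
    pick = round(linspace(1, numel(rowsOfSubject), nbKeep)); % Evenly spaced positions
    keepIdx = [keepIdx; rowsOfSubject(pick(:))];
end
keepIdx = sort(keepIdx);            % Restore the original temporal order
sampledData = myData(keepIdx, :);   % Each subject keeps roughly 10% of its rows
```

Sampling within each subject keeps every subject represented in proportion to its share of the data, which plain evenly spaced sampling over the whole matrix does not guarantee if the rows are grouped unevenly.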
Then, if you want to apply PCA and K-means, I suggest you take a look at the documentation for the pca and kmeans functions (Statistics and Machine Learning Toolbox).
Try to work with it, and open a new question if a specific problem arises.
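As a starting point for the PCA and K-means step, here is a rough sketch using zscore, pca and kmeans from the Statistics and Machine Learning Toolbox. The 90% variance threshold and k = 5 are placeholder values I picked for illustration, not recommendations, and the data is synthetic. If your matrix contains a subject ID column, drop it before this step.

```matlab
% Synthetic stand-in for the subsampled data (observations x variables).
rng(0);
sampledData = randn(1000, 70);

X = zscore(sampledData);                    % Standardize each column
[coeff, score, ~, ~, explained] = pca(X);   % explained = % variance per component

% Keep enough components to reach 90% cumulative variance (hypothetical threshold).
nbComponents = find(cumsum(explained) >= 90, 1);
reduced = score(:, 1:nbComponents);

k = 5;                                      % Hypothetical number of clusters
[clusterIdx, centroids] = kmeans(reduced, k, 'Replicates', 5);
```

Here clusterIdx gives the cluster label of each observation, and centroids holds the cluster centers in the reduced PCA space. Standardizing first matters when the 70 variables are on different scales, since both PCA and K-means are sensitive to scale.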