matlab, cluster-analysis, k-means, pca, temporal-tables

K-Means on temporal dataset


I have a temporal dataset (1000000x70) containing information about the activities of 20 subjects. Because it has more than a million rows, I need to subsample it. How can I best select a set of observations for each subject? Afterwards, I need to apply PCA and K-means to the result. Please help me with the steps to follow. I'm working in MATLAB.


Solution

  • I'm not entirely sure what you're looking for. If you just want to subsample a matrix in MATLAB, here is one way to do it:

    % Data assumed to be 70 x 1000000 (one observation per column);
    % transpose first (myData.') if yours is stored as 1000000 x 70
    myData;
    nbDataPts = size(myData, 2); % Number of observations (columns)
    
    subsampleRatio = 0.1;        % Fraction of the data to keep
    nbSamples = round(subsampleRatio * nbDataPts);  % Number of observations to keep
    sampleIdx = round(linspace(1, nbDataPts, nbSamples)); % Evenly spaced indices of the observations to keep
    
    sampledData = myData(:, sampleIdx);  % Subsampled data
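
    If you need the same fraction from each subject rather than from the dataset as a whole, the same idea can be applied per subject. This is only a sketch under assumptions not stated in the question: the data is stored as in the question (one observation per row), and there is a vector subjectId (a hypothetical name) holding one subject label per row. The random matrix below is a stand-in for the real data.

    ```matlab
    % Hypothetical stand-in data: rows are observations, columns are features
    nbObs      = 2000;
    nbFeatures = 70;
    myData     = rand(nbObs, nbFeatures);
    subjectId  = repmat((1:20)', nbObs / 20, 1);  % assumed: one subject label per row

    subsampleRatio = 0.1;              % fraction of each subject's rows to keep
    subjects  = unique(subjectId);
    sampleIdx = [];

    for s = 1:numel(subjects)
        rowsOfSubject = find(subjectId == subjects(s));   % rows of this subject, in original order
        nbKeep = max(1, round(subsampleRatio * numel(rowsOfSubject)));
        pick   = round(linspace(1, numel(rowsOfSubject), nbKeep)); % evenly spaced positions
        sampleIdx = [sampleIdx; rowsOfSubject(pick(:))];  % collect the selected row numbers
    end

    sampleIdx     = sort(sampleIdx);   % restore the overall temporal order
    sampledData   = myData(sampleIdx, :);
    sampledLabels = subjectId(sampleIdx);
    ```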
    
    

    Then, if you want to apply PCA and K-means, I suggest you take a look at the documentation for the pca and kmeans functions (Statistics and Machine Learning Toolbox).
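
    As a rough starting point, the two steps might be chained as follows. This is a sketch, not a definitive recipe. It assumes the Statistics and Machine Learning Toolbox is available, that the subsampled data has one observation per row (transpose it first if the features are in rows), and that the 95% variance threshold and the number of clusters are placeholder values you would choose for your own problem. The random matrix stands in for the real data.

    ```matlab
    % Hypothetical stand-in for the subsampled data: rows = observations, columns = features
    sampledData = rand(500, 70);

    X = zscore(sampledData);                          % standardize each feature (column)

    [coeff, score, ~, ~, explained] = pca(X);         % score: observations expressed in PC space
    nbComponents = find(cumsum(explained) >= 95, 1);  % assumed threshold: keep 95% of the variance
    reducedData  = score(:, 1:nbComponents);

    nbClusters = 5;                                   % assumed value; choose for your problem
    [clusterIdx, centroids] = kmeans(reducedData, nbClusters, 'Replicates', 5);
    ```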

    Try to work with it, and open a new question if a specific problem arises.