
Organising large datasets in MATLAB


I have a problem I hope you can help me with.

I have imported a large dataset (a 200000 x 5 cell array) into MATLAB with the following columns:

'Year' 'Country' 'X' 'Y' 'Value'

Columns 1 and 5 contain numeric values, while columns 2 to 4 contain strings.

I would like to arrange all this information into a variable that would have the following structure:

NewVariable{Country_1 : Country_n , Year_1 : Year_n}(Y_1 : Y_n , X_1 : X_n)

All I can think of is looping through the whole dataset and using if and strcmp to find matching Country, Year, X and Y values, but this seems like the least efficient way of achieving what I am trying to do.

Can anyone help me out?

Thanks in advance.


Solution

  • As mentioned in the comments, you can store the data in a table and use categorical arrays:

    % some arbitrary data:
    country = repmat('ca',10,1);
    country = [country; repmat('cb',10,1)];
    country = [country; repmat('cc',10,1)];
    T = table(repmat((2001:2005)',6,1),cellstr(country),...
        cellstr(repmat(['x1'; 'x2'; 'x3'],10,1)),...
        cellstr(repmat(['y1'; 'y2'; 'y3'],10,1)),...
        randperm(30)','VariableNames',{'Year','Country','X','Y','Value'});
    % convert all non-numeric columns to categorical arrays:
    T.Country = categorical(T.Country);
    T.X = categorical(T.X);
    T.Y = categorical(T.Y);
    % example: select rows by comparing the categorical and numeric columns:
    newVar = T(T.Country=='cb' & T.Year==2004,:);
    

    The table class is made for exactly this kind of task and is very convenient. Just extend the logical expression in the last line, T.Country=='cb' & T.Year==2004, to match your needs. Tell me if this helps ;)
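
If you really need the nested NewVariable{Country, Year}(Y, X) layout from the question, here is a rough sketch of one way to get there from the table. It is only an illustration under a few assumptions: the 4-row cell array C is a made-up stand-in for the imported 200000 x 5 cell, each (Country, Year, Y, X) combination appears at most once (a duplicate would overwrite the earlier value), and combinations that do not occur are filled with NaN. The loop runs over countries and years only, not over the rows of the dataset.

```matlab
% Hypothetical sample standing in for the imported 200000 x 5 cell array
C = {2001,'ca','x1','y1',10;
     2001,'ca','x2','y1',20;
     2001,'ca','x1','y2',30;
     2002,'cb','x2','y2',40};
T = cell2table(C,'VariableNames',{'Year','Country','X','Y','Value'});
T.Country = categorical(T.Country);
T.X = categorical(T.X);
T.Y = categorical(T.Y);

countries = categories(T.Country);   % labels for the rows of the outer cell array
years     = unique(T.Year);          % labels for the columns of the outer cell array
nX = numel(categories(T.X));
nY = numel(categories(T.Y));

% Integer codes for every row (categories are sorted alphabetically by default)
cIdx = double(T.Country);
[~,tIdx] = ismember(T.Year,years);
yIdx = double(T.Y);
xIdx = double(T.X);

NewVariable = cell(numel(countries),numel(years));
for c = 1:numel(countries)
    for t = 1:numel(years)
        rows = cIdx==c & tIdx==t;    % all records for this country and year
        M = NaN(nY,nX);              % NaN marks (Y,X) pairs with no record
        M(sub2ind([nY nX],yIdx(rows),xIdx(rows))) = T.Value(rows);
        NewVariable{c,t} = M;
    end
end
```

With this sample, NewVariable{1,1}(2,1) is the value for country 'ca', year 2001, Y 'y2' and X 'x1'. Keep countries, years, categories(T.Y) and categories(T.X) around, since they tell you which label each index stands for.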