I'm struggling on splitting a txt file made of 1 column of N blocks made of 1000 rows. The first 3 rows of each block is an header, followed by 1000 rows of data. Row 1001 will be the next block starting with the three headers in this way:
#Header1
#Header2_1,Header2_2
#Header3
Data1_1-Data1_2
...
Data1000_1-Data1001_2
Header1
Header2
Header3
Data1002_1-Data1002_2
...
Data2003_1-Data2003_2
:::
Each row of data is made of two sets, so I'd like to split into two columns the data, keeping each header in this way
Block1 | Block2 | Block3 ...
Header1 | Header1
Header2 1_1 | Header2 1_2 | Header2 2_1 | Header2 2_2
Header3 | Header3
Data1_1 | Data1_2 | Data1002_1 | Data1002_2
... | ...
Data1001_1 | Data1001_2 | Data2003_1 | Data2003_2
I tried with reshape but without success:
filename = 'text.txt';
data = splitlines(filename);
NRows = 1001;
NumBlocks = length(data)/NRows
Data_Reshaped = reshape(data,NRows,NumBlocks);
but it works partially, because I obtain a 1001 x 12 cell array and each cell has
Data1_1-Data1_2
with the -
as delimiter. How can I split again this cell into two columns in ordered way and get a 1001x24 cell arrays?
Than the result should be exported in excel but I guess writetable
function should do the job.
You need to do three things to make your data flat/uniform enough to put into Excel
Split the single-header rows into two headers, the 2nd of which is just an empty string but you need two columns so you can stack with others properly
Split the two-header row into two headers, using ,
as the delimiter
Split the data rows into two, using -
as the delimiter
I've created an example (at the bottom of this answer) with 3 data rows for testing:
First, read in the data
% Use fileread to get the entire file contents, split on newlines
dat = fileread( 'text.txt' );
dat = splitlines( dat );
% Define the schema for this data: 3 header rows then 3 data rows
nHeaders = 3;
nDataRows = 3;
nPerBlock = nHeaders + nDataRows;
% Do an initial reshape to get each "block" into its own column of the cell
dat = reshape( dat, nPerBlock, [] );
Now we can do the numbered operations above to get a cell array with two-columns per cell:
% Split single-item headers from 'Header1' into {'Header1',''}
dat([1,3],:) = cellfun( @(x) {x,''}, dat([1,3],:), 'uni', 0 );
% Split two-item headers from 'Header2 2_1,Header2 2_2' into {'Header2 2_1','Header2 2_2'}
dat(2,:) = cellfun( @(x) strsplit( x, ',' ), dat(2,:), 'uni', 0 );
% Split the data from 'Data1_1-Data1_2' into {'Data1_1','Data1_2'}
dat(nHeaders+1:end,:) = cellfun( @(x) strsplit( x, '-' ), dat(nHeaders+1:end,:), 'uni', 0 );
Now you need to do a final bit of restructuring to expand and flatten the cell array:
dat = arrayfun( @(r) [dat{r,:}], 1:size(dat,1), 'uni', 0 );
dat = vertcat( dat{:} );
Final output:
Since your data is a cell array rather than a table at this point (despite looking a bit like a table), you probably want to use writecell()
to write into Excel or a csv.
Example data text.txt
:
#Header1
#Header2 1_1,Header2 1_2
#Header3
Data1_1-Data1_2
Data2_1-Data2_2
Data3_1-Data3_2
#Header1
#Header2 2_1,Header2 2_2
#Header3
Data4_1-Data4_2
Data5_1-Data5_2
Data6_1-Data6_2
#Header1
#Header2 3_1,Header2 3_2
#Header3
Data7_1-Data7_2
Data8_1-Data8_2
Data9_1-Data9_2