matlab textscan file-import

Blank cells when reading substrings and numbers from a string with textscan


I have a text file that consists of line after line of data in an XML-like format, like this:

<item type="newpoint1" orient_zx="0.8658983248810842" orient_zy="0.4371062806139187" orient_zz="0.2432245678709263" electrostatic_force_x="0" electrostatic_force_y="0" electrostatic_force_z="0" cust_attr_HMTorque_0="0" cust_attr_HMTorque_1="0" cust_attr_HMTorque_2="0" vel_x="0" vel_y="0" vel_z="0" orient_xx="-0.2638371745169712" orient_xy="-0.01401379799313232" orient_xz="0.9644654264455047" pos_x="0" cust_attr_BondForce_0="0" pos_y="0" cust_attr_BondForce_1="0" pos_z="0.16" angvel_x="0" cust_attr_BondForce_2="0" angvel_y="0" id="1" angvel_z="0" charge="0" scaling_factor="1" cust_attr_BondTorque_0="0" cust_attr_BondTorque_1="0" cust_attr_BondTorque_2="0" cust_attr_Damage_0="0" orient_yx="0.4249823952954215" cust_attr_HMForce_0="0" cust_attr_Damage_1="0" orient_yy="-0.8993006799250595" cust_attr_HMForce_1="0" orient_yz="0.1031903618333235" cust_attr_HMForce_2="0" />

I'm only interested in the values inside the double quotes, so I'm trying to read this with textscan. To do this, I take the first line and use a regex find/replace to swap every number for %f and every string for %s, like this:

expression = '"[-+]?\d*\.?\d*"';   % a quoted number, e.g. "0.16"
expression2 = '"\w*?"';            % a quoted word, e.g. "newpoint1"

newStr = regexprep(firstline,expression,'"%f"');                    % numbers -> "%f"
FormatString = sprintf('%s',regexprep(newStr,expression2,'"%s"'));  % words -> "%s"
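For illustration, the two substitutions can be run on a shortened line. The sample line below is an assumption (the real file has many more attributes), and the sprintf('%s', ...) wrapper is left out because it returns its character-vector argument unchanged. A minimal sketch:

```matlab
% Shortened sample line (assumed for illustration; the real lines are much longer)
firstline = '<item type="newpoint1" pos_x="0" pos_z="0.16" />';

expression  = '"[-+]?\d*\.?\d*"';   % a quoted number such as "0" or "0.16"
expression2 = '"\w*?"';             % a quoted word such as "newpoint1"

% Step 1: every quoted number becomes "%f"
newStr = regexprep(firstline, expression, '"%f"');
% Step 2: every remaining quoted word becomes "%s"
% ("%f" is not matched here because % is not a word character)
FormatString = regexprep(newStr, expression2, '"%s"');

disp(FormatString)
% <item type="%s" pos_x="%f" pos_z="%f" />
```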

Then I re-open the file and read it using this format string with the following call:

while ~feof(InputFile) % Read all lines in file    
    data = textscan(InputFile,FormatString,'delimiter','\n');    
end

But all I get is an array of empty cells. I can't see what my mistake is; can someone point me in the right direction?

Clarification:

MathWorks provides the following textscan example for removing literal text, which is what I'm trying to do.

"Remove the literal text 'Level' from each field in the second column of the data from the previous example."

filename = fullfile(matlabroot,'examples','matlab','scan1.dat');
fileID = fopen(filename);
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
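A minimal sketch of the same literal-text behavior on an in-memory character vector. The input line is made up for illustration, in the same shape as the rows of scan1.dat; textscan accepts a character vector in place of a file ID:

```matlab
% Hypothetical one-line input shaped like the scan1.dat rows
str = 'Day1 Level5 3.25';

% The literal 'Level' in the format is matched and discarded,
% so the second column holds only the number after it
C = textscan(str, '%s Level%d %f');

C{1}   % 1x1 cell containing 'Day1'
C{2}   % int32 value 5 (%d reads int32 by default)
C{3}   % double value 3.25
```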

Solution

  • OK, after looking at this with fresh eyes today, I spotted my problem.

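    newStr = regexprep(firstline,expression,'"%f"');
    % %q consumes the double quotes itself, so the replacement has no quotes around it
    FormatString = sprintf('%s',regexprep(newStr,expression2,'%q'));
    
    data = textscan(InputFile,FormatString,'delimiter',' ');   % single-space delimiter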
    

    The string replacement needed to be switched to the %q specifier, which reads a string enclosed in double quotes and drops the quotes, and the textscan delimiter needed to be set back to a single space. The code works fine now.
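A sketch of the fixed format-string construction on a shortened sample line (the line is an assumption for illustration). As an aside, and not the method used in the answer above, regexp with named tokens can pull every name="value" pair out directly, with str2double turning the non-numeric values into NaN, so no format string is needed at all:

```matlab
% Shortened sample line (assumed for illustration)
firstline = '<item type="newpoint1" pos_x="0" pos_z="0.16" />';

% The fix from the answer: numbers -> "%f", quoted words -> %q (no quotes,
% because %q reads the surrounding double quotes itself)
newStr = regexprep(firstline, '"[-+]?\d*\.?\d*"', '"%f"');
FormatString = regexprep(newStr, '"\w*?"', '%q');
% FormatString: <item type=%q pos_x="%f" pos_z="%f" />

% Alternative approach (swapped in, not the poster's method):
% one struct per name="value" pair, with fields .name and .value
pairs  = regexp(firstline, '(?<name>\w+)="(?<value>[^"]*)"', 'names');
names  = {pairs.name};               % {'type','pos_x','pos_z'}
values = str2double({pairs.value});  % NaN for 'newpoint1', then 0 and 0.16
```

Applied per line (for example inside an fgetl loop), the regexp route does not depend on every line having the same attribute order.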