javascript loadrunner vugen truclient

Using a JavaScript regex in an Evaluate JS on object step in the VuGen TruClient protocol


While creating a script in Vugen using the TruClient protocol (Firefox), I have an Evaluate JS on object step that finds an object, with the following object.innerHTML:

Foo Bar<br />BAZ
<br />

I need to extract BAZ from this text to use elsewhere, so I have the following code in the JS portion, to extract it using a regex:

var regex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i;  // Shows as red in the TC JS editor, but no error icon shows, so not sure what the error may be.
var matches = [];
var match;
matches = regex.exec(object.innerHTML);
match = matches[matches.length - 1];
window.alert(match);  // For debugging purposes

However, when I run the script, this fails with the following error:

** 6: Evaluate JavaScript var regex = /Foo\s+B... alert(match); on Foo Bar
** failed - an argument is invalid: 'Code': JavaScript exception
'TypeError: matches is null' during evaluation

I have tested this regex in an online regex tester, and it works as expected.

Using the webtoolkit online JS tester, I've successfully tested the following variant on the code, to ensure that it extracts what I need it to:

var data = "Foo Bar<br />BAZ<br />";
var regex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i;
var matches = [];
var match;
matches = regex.exec(data);
match = matches[matches.length - 1];
alert(match);

This returns BAZ as expected.

Edit

I originally assumed this was a VuGen/TruClient-specific issue. However, after sleeping on it and reading Michael Galos' answer (below), I realized that it was a generic JavaScript issue, so I added the JavaScript tag as well.


Solution

  • Thank you to Michael Galos for providing part of the answer. However, his answer did not fully resolve the issue.

    I inserted debugging code to write the object.innerHTML to the console to examine it more closely. Finally, after the n+1th time running it and watching the output, I observed that the page source was:

    Foo Bar<br />BAZ
    <br />
    

    But object.innerHTML returned this as:

    Foo Bar<br>BAZ <br>
    

    As a result, I modified the regex as follows:

    var re = /<br\s*\/?>\s*(.*?)\s*<br\s*\/?>/i;
    

    Changing the pattern for the <br /> tag from <br\s+\/> to <br\s*\/?> lets it match <br />, <br/>, or <br>. The \s* matches zero or more whitespace characters, and the \/? optionally matches the / character.

    Adding \s* before the capture group skips any leading whitespace. Adding ? after .* inside the capture group makes it non-greedy, so the \s* that follows consumes any trailing whitespace and neither ends up in the captured text.

    This now successfully matches any combination of the following on either single or multiple lines, returning only BAZ:

    Foo Bar<br />BAZ<br />
    Foo Bar<br>BAZ<br>
    Foo Bar<br />     BAZ     <br />
    Foo Bar<br>     BAZ     <br>
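
For anyone hitting the same 'matches is null' error, here is a minimal sketch of the extraction with a null guard, since exec() returns null when nothing matches. The helper name extractBetweenBr is hypothetical and not part of TruClient. In the actual Evaluate JS on object step you would pass object.innerHTML instead of the sample strings used here.

```javascript
// Hypothetical helper (not a TruClient API): returns the text between two
// <br> tags, or null when the pattern does not match.
function extractBetweenBr(html) {
  // Accepts <br>, <br/> and <br />, and keeps surrounding whitespace
  // out of the captured text.
  var re = /<br\s*\/?>\s*(.*?)\s*<br\s*\/?>/i;
  var matches = re.exec(html);
  // exec() returns null on no match, so guard before indexing.
  return matches === null ? null : matches[1];
}

// The original pattern requires "<br />" with whitespace and a slash,
// so it fails on the markup as serialized by innerHTML.
var originalRegex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i;
var serialized = "Foo Bar<br>BAZ <br>";

console.log(originalRegex.exec(serialized));               // null
console.log(extractBetweenBr(serialized));                 // BAZ
console.log(extractBetweenBr("Foo Bar<br />BAZ\n<br />")); // BAZ
```

Returning null instead of indexing into the result lets the calling step decide how to handle a page whose markup has changed, rather than failing with a TypeError.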