While creating a script in VuGen using the TruClient protocol (Firefox), I have an Evaluate JS on object step that finds an object with the following object.innerHTML:
Foo Bar<br />BAZ
<br />
I need to extract BAZ from this text to use elsewhere, so I have the following code in the JS portion to extract it using a regex:
var regex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i; // Shows as red in the TC JS editor, but no error icon shows, so not sure what the error may be.
var matches = [];
var match;
matches = regex.exec(object.innerHTML);
match = matches[matches.length - 1];
window.alert(match); // For debugging purposes
However, when I run the script, this fails with the following error:
** 6: Evaluate JavaScript var regex = /Foo\s+B... alert(match); on Foo Bar
** failed - an argument is invalid: 'Code': JavaScript exception
'TypeError: matches is null' during evaluation
I have tested this regex here, and it works as expected.
Using the webtoolkit online JS tester, I've successfully tested the following variant of the code to ensure that it extracts what I need:
var data = "Foo Bar<br />BAZ<br />";
var regex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i;
var matches = [];
var match;
matches = regex.exec(data);
match = matches[matches.length - 1];
alert(match);
This returns BAZ as expected.
I originally assumed this was a VuGen/TruClient-specific issue. However, after sleeping on it and reading Michael Galos' answer (below), I realized that it was a generic JavaScript issue, so I added the JavaScript tag to this as well.
Thank you to Michael Galos for providing part of the answer. However, his answer did not resolve the complete issue.
I inserted debugging code to write object.innerHTML to the console to examine it more closely. Finally, after the (n+1)th time running it and watching the output, I observed that the page source was:
Foo Bar<br />BAZ
<br />
But JavaScript captured this as:
Foo Bar<br>BAZ <br>
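To make the mismatch concrete, here is a minimal sketch that runs my original regex against both forms. The variable names are mine, and it assumes the two strings below are exactly the page source and the serialized innerHTML shown above:

```javascript
// Assumption: these strings mirror the page source and the innerHTML output above.
var pageSource = "Foo Bar<br />BAZ\n<br />";
var serialized = "Foo Bar<br>BAZ <br>";

// The original pattern, which requires whitespace and a slash inside each <br /> tag.
var originalRegex = /Foo\s+Bar<br\s+\/>(.*)\s*<br \/>/i;

// exec() returns an array on success and null when nothing matches.
var sourceMatch = originalRegex.exec(pageSource);
var serializedMatch = originalRegex.exec(serialized);

console.log(sourceMatch ? sourceMatch[1] : null); // captured text from the page source
console.log(serializedMatch);                     // null, so indexing it throws a TypeError
```

Because exec() returns null for the serialized form, reading matches.length on the result is what produces the "matches is null" TypeError.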
As a result, I modified the regex as follows:
var re = /<br\s*\/?>\s*(.*?)\s*<br\s*\/?>/i;
Changing the pattern for the <br /> tag from <br\s+\/> to <br\s*\/?> matches either <br /> or <br>. The \s* matches zero or more whitespace characters, and the \/? optionally matches the / character.
Adding \s* before the capture group consumes any leading whitespace. Adding ? after the .* inside the capture group makes it non-greedy, so the \s* that follows the group consumes any trailing whitespace instead of the group capturing it.
This now successfully matches any of the following, on either a single line or multiple lines, returning only BAZ:
Foo Bar<br />BAZ<br />
Foo Bar<br>BAZ<br>
Foo Bar<br /> BAZ <br />
Foo Bar<br> BAZ <br>
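As a rough check, the sketch below runs the modified regex over the four variants above plus the multi-line form. The helper name extractBetweenBreaks is hypothetical (it is not part of TruClient), and the null guard is my addition so that a non-matching string no longer throws:

```javascript
var re = /<br\s*\/?>\s*(.*?)\s*<br\s*\/?>/i;

// Hypothetical helper: returns the text between the two <br> tags,
// or null when the pattern does not match.
function extractBetweenBreaks(html) {
  var m = re.exec(html);
  return m === null ? null : m[1];
}

var samples = [
  "Foo Bar<br />BAZ<br />",
  "Foo Bar<br>BAZ<br>",
  "Foo Bar<br /> BAZ <br />",
  "Foo Bar<br> BAZ <br>",
  "Foo Bar<br />BAZ\n<br />" // the multi-line form from the page source
];

// Each entry should be the captured text with surrounding whitespace removed.
var results = samples.map(extractBetweenBreaks);
console.log(results);
```

In the TruClient step, the same helper could be called with object.innerHTML in place of the sample strings.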