Tags: java, xpath, jmeter, jtidy, tag-soup

JTidy StringIndexOutOfBoundsException in JMeter


I want to retrieve content from a webpage using JMeter.
The data I'm looking for is inside a JavaScript block:

(...)
<map id="id1">
  <script type="text/javascript">
    var name="Lionel Richie";
    var song="Hello";
    var lyrics="Is it me you're looking for ?";
  </script>
(...)
  <script type="text/javascript">
    var name="Waldo";
  </script>
</map>
(...)

Let's say I want the value of the name variable from the script block inside the map with id="id1" that also contains a song variable.

I use an XPath Extractor to get the script content (the CSS/JQuery Extractor won't return the JavaScript content, as it's not pure HTML):

.//map[@id='id1']/script[contains(.,'song')]
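As a sanity check outside JMeter, here is a minimal Java sketch that evaluates this expression with the JDK's javax.xml.xpath API. The class name, method name and sample markup are illustrative assumptions: the sample is a hand-cleaned, well-formed version of the snippet above, not the real page.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ScriptXPathDemo {

    // The XPath expression from the question, unchanged.
    static final String EXPRESSION = ".//map[@id='id1']/script[contains(.,'song')]";

    // Returns the text of the first matching script element,
    // or an empty string when nothing matches.
    static String extractScript(String wellFormedXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(EXPRESSION, new InputSource(new StringReader(wellFormedXml)));
        } catch (XPathExpressionException e) {
            // Raised for an invalid expression or for input that is not well-formed XML.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, already well-formed sample (assumption: no namespaces).
        String xml =
              "<html><body><map id=\"id1\">"
            + "<script type=\"text/javascript\">"
            + " var name=\"Lionel Richie\"; var song=\"Hello\"; "
            + "</script>"
            + "<script type=\"text/javascript\"> var name=\"Waldo\"; </script>"
            + "</map></body></html>";
        System.out.println(extractScript(xml));
    }
}
```

With well-formed input the expression selects only the script block that mentions song, which suggests the expression itself is fine and the difficulty lies in parsing the page.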

XPath won't find the data because my HTML is dirty (missing closing tags and so on), so I need to clean it up with JTidy by enabling the "Use Tidy (tolerant parser)" option.

Remarks:
- I do not own the webpage I'm processing, so I have to deal with this hideous HTML.
- There are many map elements in the page, each containing a script with a song variable, so as far as I know I can't use a regexp directly.

Problem:

My HTML contains international characters such as wé hà bêêêê... (yep, French, sorry about that), and JTidy doesn't handle this case properly: see bug #205, "StringIndexOutOfBoundsException while lexing script content".

As a result, the XPath Extractor fails and my entire test plan is stuck.

I designed a custom solution, but I find it a bit complex. Maybe there is a better way to handle this.

My solution:

I used the tagsoup Java library to clean the HTML output and store it in a JMeter variable. That variable is then processed by the XPath Extractor (tick the "JMeter variable" option under "Apply to"), and finally I used a regexp to get my Lionel Richie stuff working:

JMeter
 |-> HTTP Request
      |-> BeanShell PostProcessor -> tagsoup > var RESPONSE
      |-> XPath Extractor, Apply to var RESPONSE > var XPATH_OUTPUT
      |-> Regular Expression Extractor, Apply to var XPATH_OUTPUT

To get tagsoup working with JMeter, just put the jar in the lib directory, and then use a BeanShell PostProcessor.

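BeanShell code used:

import java.io.*;
import org.xml.sax.*;
import org.ccil.cowan.tagsoup.*;

// Response body of the previous sampler
String rep = prev.getResponseDataAsString();

// TagSoup parser configured with its HTML schema
XMLReader r = new Parser();
HTMLSchema theSchema = new HTMLSchema();
r.setProperty(Parser.schemaProperty, theSchema);

// Write the cleaned markup as UTF-8 so it matches the decoding below;
// without an explicit charset the writer uses the platform default,
// which can corrupt accented characters
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Writer w = new OutputStreamWriter(outStream, "UTF-8");

// Serialize the SAX events as XML, using the schema namespace without a prefix
XMLWriter x = new XMLWriter(w);
x.setPrefix(theSchema.getURI(), "");
r.setContentHandler(x);

r.parse(new InputSource(new StringReader(rep)));
w.flush();

String encodedRep = outStream.toString("UTF-8");

// Expose the well-formed document to the XPath Extractor
vars.put("RESPONSE", encodedRep);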

Solution

  • Use Regular Expression Extractor with the following regex:

    (?s)var name="([^"]+?)";.+?var song=

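    It uses single-line mode, the (?s) flag, in which . also matches line breaks, so .+? can span the lines between the name and song declarations.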

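    To see how this regex behaves, here is a minimal Java sketch using java.util.regex. The class name, method name and sample input are illustrative assumptions; in JMeter the same pattern would go into the Regular Expression Extractor instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameRegexDemo {

    // Same pattern as above: (?s) makes '.' match line breaks as well.
    static final Pattern NAME_BEFORE_SONG =
            Pattern.compile("(?s)var name=\"([^\"]+?)\";.+?var song=");

    // Returns the captured name, or null when the pattern does not match.
    static String extractName(String html) {
        Matcher m = NAME_BEFORE_SONG.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample modelled on the snippet in the question;
        // the unclosed <p> stands in for the "dirty" HTML.
        String html =
              "<map id=\"id1\"><p>\n"
            + "  <script type=\"text/javascript\">\n"
            + "    var name=\"Lionel Richie\";\n"
            + "    var song=\"Hello\";\n"
            + "  </script>\n"
            + "  <script type=\"text/javascript\">\n"
            + "    var name=\"Waldo\";\n"
            + "  </script>\n"
            + "</map>";
        System.out.println(extractName(html)); // prints: Lionel Richie
    }
}
```

    Because the regex works on the raw response text, it needs no HTML cleanup step. One caveat: .+? may span several script blocks, so a name declared in a block without song that comes before a block with song would be captured instead.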