I am using WebHarvest to try to retrieve data from Woot.com, and I'm getting a few different errors. I am able to fetch the website with the first process, but when I try to test XPath inside the variable window I get the error: org.xml.sax.SAXParseException; lineNumber: 86; columnNumber: 99; The reference to entity "pt2" must end with the ';' delimiter. If I try to use the pretty-print function, it returns: XML is not well-formed: the reference to entity "pt2" must end with the ';' delimiter. {line: 86, col:99]. Lastly, inside the script I am writing, if I put in the xpath tag with an expression, I get: Element type "xpath" must be followed by either attribute specifications, ">" or "/>". Can someone tell me what I am doing wrong? I am very new to WebHarvest and don't have any experience with this kind of program.
My code is:
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <xpath expression="(//div[@class="overview"])[1]//h2/text()">
        <html-to-xml>
            <http url="http://www.woot.com/"/>
        </html-to-xml>
    </xpath>
</config>
To make the XML well-formed, you have to use ' instead of " inside the value of the expression attribute. Here is the corrected configuration:
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <xpath expression="(//div[@class='overview'])[1]//h2/text()">
        <html-to-xml>
            <http url="http://www.woot.com/"/>
        </html-to-xml>
    </xpath>
</config>
You can wrap an attribute value in either ' or ", but the same quote character cannot appear unescaped inside that value. Here are a few examples:
<xpath expression='(//div[@class="overview"])[1]//h2/text()'> --- valid
<xpath expression='(//div[@class='overview'])[1]//h2/text()'> --- invalid
<xpath expression="(//div[@class="overview"])[1]//h2/text()"> --- invalid
<xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'> --- valid
<xpath expression="(//div[@class='overview'])[1]//h2/text()"> --- valid
<xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"> --- valid
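If you want to check a quoting variant yourself before pasting it into a config, any XML parser will tell you whether it is well-formed. Below is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser is an acceptable stand-in for the parser WebHarvest uses (conforming parsers should agree on well-formedness). The labels, the CANDIDATES table and the helper names are made up for illustration. Each start tag is given a closing </xpath> so that it forms a complete document, and the list includes forms escaped with the predefined XML entities &apos; and &quot;.

```python
import xml.etree.ElementTree as ET

# Hypothetical labels mapped to candidate elements; each is closed with
# </xpath> so the parser sees a complete, single-element document.
CANDIDATES = {
    "single outside, double inside":
        """<xpath expression='(//div[@class="overview"])[1]//h2/text()'></xpath>""",
    "single outside, single inside":
        """<xpath expression='(//div[@class='overview'])[1]//h2/text()'></xpath>""",
    "double outside, double inside":
        """<xpath expression="(//div[@class="overview"])[1]//h2/text()"></xpath>""",
    "double outside, single inside":
        """<xpath expression="(//div[@class='overview'])[1]//h2/text()"></xpath>""",
    "single outside, &apos; inside":
        """<xpath expression='(//div[@class=&apos;overview&apos;])[1]//h2/text()'></xpath>""",
    "double outside, &quot; inside":
        """<xpath expression="(//div[@class=&quot;overview&quot;])[1]//h2/text()"></xpath>""",
}


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def expression_of(xml_text: str) -> str:
    """Return the decoded value of the expression attribute."""
    return ET.fromstring(xml_text).get("expression")


results = {label: is_well_formed(doc) for label, doc in CANDIDATES.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

The parser decodes the entities before the attribute value is handed on, so the escaped forms yield the same XPath string as the mixed-quote forms.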
Hope this helps.