
XQuery error in WebHarvest


I'm using WebHarvest to parse some HTML. I'm trying to write a function that trims a string, but WebHarvest's IDE reports the error below when the function runs, and I don't understand what's wrong.

Error:

Error executing XQuery expression (Xquery=[declare variable $xqsource external; let $result := normalize-space($xqsource) return $result])!

Edit2: The log reports the following SAX Error:

[...] Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog

I don't understand what this means in this case.

Function parameter: sourceString, the string to trim.

<function name="trim">
    <return>
        <xquery>
            <xq-param name="xqsource">
                <var name="sourceString" />
            </xq-param>
            <xq-expression><![CDATA[
                declare variable $xqsource external;

                let $result := normalize-space($xqsource)
                    return 
                     $result
                ]]>
            </xq-expression>
        </xquery>
    </return>
</function>

Edit: sourceString is a string made up of alphanumeric characters, newlines, and spaces, such as:

" blabla - bla2

"


Solution

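  • The default type of xq-param is node() (see the manual). WebHarvest therefore tries to parse the content of your variable as XML. Your string is plain text rather than a document that starts with an element, so the XML parser fails with "Content is not allowed in prolog". A SAXParseException is an XML parsing error, not an XQuery error.

    Declare the parameter as a string by adding a type attribute to your xq-param: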

    <xq-param name="xqsource" type="string">
      <var name="sourceString" />
    </xq-param>
    

    Does that help?
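
    For reference, here is a sketch of the whole function from the question with that one attribute added. It is untested and assumes the rest of your configuration is unchanged. The let/return pair is collapsed into a single expression, which is equivalent in XQuery.

    ```xml
    <!-- Sketch: the trim function from the question, with type="string" on xq-param -->
    <function name="trim">
        <return>
            <xquery>
                <!-- type="string" passes the value as a string; the default node() type would parse it as XML -->
                <xq-param name="xqsource" type="string">
                    <var name="sourceString" />
                </xq-param>
                <xq-expression><![CDATA[
                    declare variable $xqsource external;
                    normalize-space($xqsource)
                ]]></xq-expression>
            </xquery>
        </return>
    </function>
    ```

    Note that normalize-space does more than trim: it also collapses each run of whitespace inside the string, including newlines, into a single space. For the sample input in the question, that should give "blabla - bla2".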