
Web-Harvest: grabbing multiple URLs from a list


I'm trying to fetch multiple web pages whose URLs are built from a predefined list of codes. Here is my configuration:

    <?xml version="1.0" encoding="UTF-8"?>
    <config>

      <script>
        <![CDATA[
          String[] codes = new String[] {"18","21","24","25","26"};
          SetContextVar("codes", codes);
        ]]>
      </script>

      <loop item="link">
        <list>
          <var name="codes" />
        </list>
        <body>
          <var-def name="webpage">
            <html-to-xml>
              <http url="${sys.fullUrl('http://www.someurl.com/',link)}"/>
            </html-to-xml>
          </var-def>
        </body>
      </loop>
    </config>

The error is: "Variable assignment: codes: Can't assign org.webharvest.runtime.variables.ListVariable to java.lang.String"

What am I missing here?


Solution

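  • Please try this example. Instead of a script, it stores the codes as a small XML fragment, uses XPath to turn each Code value into a list item, and loops over that list, writing each code to its own text file: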

    <config>
    
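      <!-- Build a small XML document that holds the codes. The CDATA
           sections keep the markup as plain text inside var-def. -->
      <var-def name="Codes">
        <![CDATA[<Codes>]]>
        <![CDATA[<Code>]]>18<![CDATA[</Code>]]>
        <![CDATA[<Code>]]>21<![CDATA[</Code>]]>
        <![CDATA[<Code>]]>24<![CDATA[</Code>]]>
        <![CDATA[<Code>]]>25<![CDATA[</Code>]]>
        <![CDATA[<Code>]]>26<![CDATA[</Code>]]>
        <![CDATA[</Codes>]]>
      </var-def>
    
      <!-- The XPath result is a list with one item per Code value;
           the loop exposes the current item as ${CodesLoop}. -->
      <loop item="CodesLoop" index="i">
        <list>
          <xpath expression="//Code/text()">
            <var name="Codes"/>
          </xpath>
        </list>
        <body>
          <!-- Write each code to its own file, e.g. D:\ABC\18.txt -->
          <file action="write" path="D:\ABC\${CodesLoop}.txt" charset="UTF-8">
            <template>${CodesLoop}</template>
          </file>
        </body>
      </loop>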
    </config>
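    If the goal is still to download the pages rather than write the codes to files, the same XPath-built list can drive the http call from the question. This is a minimal, untested sketch: it assumes each code is simply appended to the placeholder base URL http://www.someurl.com/ via sys.fullUrl, exactly as in the question, and the output folder D:\ABC\ and the .xml file names are only illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <!-- Same trick as above: the codes as a plain-text XML fragment -->
  <var-def name="Codes">
    <![CDATA[<Codes>]]>
    <![CDATA[<Code>]]>18<![CDATA[</Code>]]>
    <![CDATA[<Code>]]>21<![CDATA[</Code>]]>
    <![CDATA[<Code>]]>24<![CDATA[</Code>]]>
    <![CDATA[<Code>]]>25<![CDATA[</Code>]]>
    <![CDATA[<Code>]]>26<![CDATA[</Code>]]>
    <![CDATA[</Codes>]]>
  </var-def>

  <loop item="code" index="i">
    <!-- One list item per Code text node -->
    <list>
      <xpath expression="//Code/text()">
        <var name="Codes"/>
      </xpath>
    </list>
    <body>
      <!-- Fetch the page for the current code and clean it into XML.
           http://www.someurl.com/ is the placeholder from the question. -->
      <var-def name="webpage">
        <html-to-xml>
          <http url="${sys.fullUrl('http://www.someurl.com/', code)}"/>
        </html-to-xml>
      </var-def>
      <!-- Illustrative output location: one cleaned page per code -->
      <file action="write" path="D:\ABC\${code}.xml" charset="UTF-8">
        <var name="webpage"/>
      </file>
    </body>
  </loop>
</config>
```

    The list is produced by an ordinary processor (xpath), so no script or context-variable juggling is involved; to change the codes you only edit the var-def.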