xml web-scraping webobjects webharvest

What is wrong with my Web-Harvest authentication config?


I have recently started using Web-Harvest as a web scraping tool. I am at the start of a project in which I need to authenticate (log in) to a web site. Before I begin, note that [URL] in the code stands in for the actual URL of the web page.

So, I am trying to post login information by executing the following config:

<config>
    <var-def name="result">
        <http method="post" url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1" multipart="true">
            <http-param name="login">[myusername]</http-param>
            <http-param name="password">[mypassword]</http-param>
        </http>
    </var-def>
</config>

How do I retrieve the response and follow the redirect? When I log in manually, the extension below is appended to the URL. It appears to contain a randomised component and a session id. I suppose that is something I need to incorporate in my solution?

[URL]/nP8oIdbhk8MTXkrQ7Y2Z1g/0.3.0;jsessionid=2EF81CDA9A2EFF0B14E45889BC279BFA

Below is part of the page source, which might be key to the problem. Is it a WebObjects problem? Is it a JavaScript problem? Am I the problem? :)

<body onload="document.form.login.focus();">
   <form name="form" onsubmit="showDiv();return true;" method="post" action="/webreservations/WebObjects/WebReservations.woa/wa/Login">
...
</form>
</body>

Any help is greatly appreciated.


Solution

  • Make sure you send all the parameters the login form requires. It may need more than just the username and password, such as hidden form fields.
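
One way to check which parameters the form expects is to let Web-Harvest fetch the login page and list the names of its input fields. The config below is only a sketch under a few assumptions: that the login page is served by a plain GET to the same URL used in the question, that the form is still named "form" as in the posted source, and that the variable names loginPage and fieldNames (made up for this example) do not clash with anything else in your config.

```xml
<config>
    <!-- Fetch the login page and clean the HTML into well-formed XML.
         Assumption: a GET to this URL returns the page containing the form. -->
    <var-def name="loginPage">
        <html-to-xml>
            <http url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1"/>
        </html-to-xml>
    </var-def>

    <!-- Collect the name attribute of every input element inside the login form.
         Assumption: the form is named "form", as in the posted page source. -->
    <var-def name="fieldNames">
        <xpath expression="//form[@name='form']//input/@name">
            <var name="loginPage"/>
        </xpath>
    </var-def>
</config>
```

Any name that shows up in fieldNames but is missing from the POST is a candidate for an extra http-param element. Whether the server actually requires it is something only a test against the real site can confirm.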