I have recently started using Web-Harvest as a web scraping tool. I am at the start of a project where I need to authenticate (log in) to a web site. Note that [URL] in the code below stands in for the actual URL of the web page.
I am trying to post the login information by executing the following config:
<config>
  <var-def name="result">
    <!-- "&" must be written as "&amp;" inside an XML attribute value -->
    <http method="post" url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1" multipart="true">
      <http-param name="login">[myusername]</http-param>
      <http-param name="password">[mypassword]</http-param>
    </http>
  </var-def>
</config>
How do I retrieve the response and follow the redirect? When I log in manually, the path below is appended to the URL. It seems to contain a random-looking token followed by a session id. I suppose that is something I need to incorporate in my solution?
[URL]/nP8oIdbhk8MTXkrQ7Y2Z1g/0.3.0;jsessionid=2EF81CDA9A2EFF0B14E45889BC279BFA
Below is part of the page source, which might be key to the problem. Is it a WebObjects problem? Is it a JavaScript problem? Am I the problem? :)
<body onload="document.form.login.focus();">
<form name="form" onsubmit="showDiv();return true;" method="post" action="/webreservations/WebObjects/WebReservations.woa/wa/Login">
...
</form>
</body>
Any help is greatly appreciated.
Make sure you are sending every parameter the login form requires. It may need more than just the username and password, for example hidden form fields.
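One way to check which fields the form expects is to fetch the login page first and list the name of every input inside the form. The config below is only a minimal sketch. It assumes the login form is served at the same address used in the question and that the form is named "form", as in the page source shown. The variable names loginPage and fieldNames are made up for illustration.

```xml
<config>
  <!-- Fetch the login page with a plain GET and clean it into well-formed XML.
       Assumption: the form lives at the same address the question posts to. -->
  <var-def name="loginPage">
    <html-to-xml>
      <http url="[URL]/webreservations/WebObjects/WebReservations.woa/wa/Login?language=1&amp;server=1"/>
    </html-to-xml>
  </var-def>

  <!-- Collect the name attribute of every input inside the login form,
       including hidden ones that a browser would submit automatically. -->
  <var-def name="fieldNames">
    <xpath expression="//form[@name='form']//input/@name">
      <var name="loginPage"/>
    </xpath>
  </var-def>
</config>
```

Any name that shows up in fieldNames but is missing from the POST is a candidate for an extra http-param element in the login request.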