php web-crawler goutte data-scrubbing

Form submit with multiple redirection


I'm trying to fetch data from a website where submitting the form redirects to a loading page, which then automatically redirects to the final results page. The issue is that the crawler only gets the content of the loading page and never reaches the final results page, which is what I actually need. Can someone please tell me how I can achieve that? If it's not possible, what would be an alternative way to do this?


Solution

  • If you're using curl, you can try the following:

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    If you still aren't getting past the loading page, it's possible it's not an HTTP redirect.

    In that case you'll have to manually parse the target location. A lot of websites use a meta refresh tag for such loading pages. Look for something similar to the following:

    <meta http-equiv="refresh" content="5; url=http://example.com/" />

    You can easily parse the above with a regex or any DOM parsing library for PHP.

    Another possibility is a JavaScript redirect. Look for lines containing window.location in the source code. curl does not execute JavaScript, so you would have to extract that URL and request it yourself.
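
To make the manual parsing concrete, here is a minimal sketch that pulls the redirect target out of a loading page's HTML, checking for a meta refresh tag first and a window.location assignment second. The function name findRedirectTarget is made up for illustration, and the regexes only cover the common forms, so treat it as a starting point rather than a complete solution:

```php
<?php
// Hypothetical helper (not part of curl or Goutte): returns the redirect
// target found in the loading page's HTML, or null if none is found.
function findRedirectTarget(string $html): ?string
{
    // 1. Meta refresh, e.g.
    //    <meta http-equiv="refresh" content="5; url=http://example.com/" />
    if (preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        foreach ($tags[0] as $tag) {
            // Skip meta tags that are not refresh tags.
            if (!preg_match('/http-equiv\s*=\s*["\']?refresh/i', $tag)) {
                continue;
            }
            // Take whatever follows "url=" inside the tag.
            if (preg_match('/url\s*=\s*[\'"]?([^\'"\s>]+)/i', $tag, $m)) {
                // Attribute values may be entity-encoded (&amp; etc.).
                return html_entity_decode($m[1]);
            }
        }
    }

    // 2. JavaScript redirect with a literal URL, e.g.
    //    window.location = "...", window.location.href = "...",
    //    window.location.replace("...") or window.location.assign("...")
    $js = '/window\.location(?:\.href)?\s*(?:=|\.(?:replace|assign)\s*\()\s*[\'"]([^\'"]+)[\'"]/i';
    if (preg_match($js, $html, $m)) {
        return $m[1];
    }

    return null;
}
```

Once you have the target, request it with the same curl handle and cookie settings you used for the form submit, since the results page may depend on the session. The returned URL can be relative, in which case you'd need to resolve it against the loading page's URL first. If the script builds the URL dynamically instead of using a string literal, this approach won't find it.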