javascript php jquery web-scraping cross-domain

Need to scrape the contents of a website that requires an "I agree" cookie to be set


From everything I've read, it seems that this is impossible. But here is my scenario:

I need to scrape the contents of a table containing for-sale housing information. The page is not password protected or anything, but you first have to click an "I Agree" link on the previous page so that a cookie gets set saying you agree that the content may not be 100% accurate. Only then are you shown the data. Is there any way at all to accomplish this using PHP/jQuery/JavaScript? I know you cannot use an iframe because the page is cross-domain. I also do not have access to the other website.

Thanks for any answers, as I'm not really expecting anything positive. :) And many thanks if you can tell me how to do this. :D


Solution

  • Use a server-side script (PHP with cURL) to fetch the page and return the information you need. Make sure your request includes the HTTP Cookie header that represents the "I agree" cookie.

    Sample:

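    <?php

    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
    // Send the "I agree" cookie; use the cookie name and value the target site actually sets.
    curl_setopt($ch, CURLOPT_COOKIE, 'I_Agree=1');
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $responseBody = curl_exec($ch);

    // curl_exec() returns false on failure.
    if ($responseBody === false) {
        $error = curl_error($ch);
        curl_close($ch);
        exit('Request failed: ' . $error);
    }

    curl_close($ch);

    // Read the information you need from $responseBody and return it as the response body

    ?>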
    

    Now you can access the information from your own website by calling the server-side script above. For details on how to use cURL, take a look at the documentation.
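
The "read the information you need from $responseBody" step could look roughly like the sketch below, which uses PHP's built-in DOMDocument and DOMXPath classes. The function name extractTableRows, the sample HTML, and the assumption that the listings sit in the first table on the page are all illustrative; the real page will likely need a more specific XPath query.

```php
<?php

/**
 * Extract the rows of the first <table> in an HTML string.
 * Returns a list of rows, each a list of trimmed cell texts.
 * (Hypothetical helper, not part of any library.)
 */
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Assumption: the data lives in the first table on the page.
    foreach ($xpath->query('(//table)[1]//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Stand-in for the $responseBody returned by curl_exec(); the listing is made up.
$responseBody = '<html><body><table>'
    . '<tr><th>Address</th><th>Price</th></tr>'
    . '<tr><td>12 Example St</td><td>250000</td></tr>'
    . '</table></body></html>';

$rows = extractTableRows($responseBody);

// Emit the rows as JSON so client-side JavaScript/jQuery can consume them.
echo json_encode($rows);
```

When this runs behind a web server, sending a Content-Type: application/json header before the echo lets jQuery parse the response automatically. Because the request to the other site is made by your server, the browser's cross-domain restrictions do not apply to it.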