Tags: r, web-scraping, screen-scraping, proxy, proxy-server

How to access links using a different proxy server in R?


I am trying to use a different proxy server to do web scraping in R. I am using the use_proxy function from httr, but with no luck.

Please find the snippet of my code below:

 GET("http://had.co.nz", use_proxy("202.40.185.107", 8080), verbose())

It spits out the following error:

 Error in curl::curl_fetch_memory(url, handle = handle) : 
 Timeout was reached: [had.co.nz] Connection timed out after 10000 milliseconds

Can anyone help me change my proxy server in R so I can avoid getting blocked by the website owner? I thought the above method would be the easiest, but it is not working for me. I would very much appreciate it if any web scraping wizard could suggest a better approach or a fix for this issue.

Thanks in advance!


Solution

  • To use a proxy, you need to be able to connect to it. Are you sure you can reach the proxy server 202.40.185.107:8080? You can check that easily by, for example, setting 202.40.185.107:8080 as the proxy in your browser, or by pinging 202.40.185.107 from the command line (ping only checks the host, not the port, so something like curl -x 202.40.185.107:8080 http://had.co.nz is a more direct test).

    You could try a different proxy. I found this one online and it is free. Just a word of caution - if you are using a proxy to avoid being blocked by the website owner, the proxy you are using can be blocked by the website owner as well.

    GET("http://had.co.nz", use_proxy("35.169.156.54", 3128), verbose())