I am using LWP::UserAgent to download a CSV file on a Unix server:
my $ua1 = LWP::UserAgent->new();
my $res = $ua1->get($equity_history_url, @netscape_like_headers);
The server keeps responding with a 404 (Not Found) error. Despite that, I can download the file in a browser: http://www.nseindia.com/content/equities/scripvol/datafiles/18-08-2013-TO-17-08-2015ADANIPOWERALLN.csv
and the code works with other pages.
Suspecting the request headers, I tried passing headers similar to my browser's, which I captured using Wireshark:
my @netscape_like_headers = (
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36',
'Accept-Language' => 'en-US,en;q=0.8',
'Accept-Charset' => 'iso-8859-1,*,utf-8',
'Accept-Encoding' => 'gzip, deflate, sdch',
'Upgrade-Insecure-Requests' => '1',
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Connection' => 'keep-alive'
);
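For what it's worth, one way to confirm that these headers are actually attached to the request is to build the HTTP::Request by hand and print it before sending anything. This is a sketch using two of the header pairs from the list above (the rest are passed the same way):

```perl
use strict;
use warnings;
use HTTP::Request;

# Two of the browser-captured header pairs from the question;
# the remaining pairs are passed in the same key/value style.
my @netscape_like_headers = (
    'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36',
    'Accept'     => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
);

# Build the request object by hand so the outgoing headers can be
# inspected before anything goes over the wire.
my $req = HTTP::Request->new(
    GET => 'http://www.nseindia.com/content/equities/scripvol/datafiles/18-08-2013-TO-17-08-2015ADANIPOWERALLN.csv',
    [ @netscape_like_headers ],
);

print $req->as_string;
```

The same pairs can then be handed to `$ua->get($url, @netscape_like_headers)` exactly as in the question.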
Still no luck. Any suggestions?
When I use LWP::UserAgent on that URL without your headers, I get a 403 Forbidden response back. Same with curl:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $res = $ua->get(
'http://www.nseindia.com/content/equities/scripvol/datafiles/18-08-2013-TO-17-08-2015ADANIPOWERALLN.csv');
print $res->as_string;
__END__
HTTP/1.1 403 Forbidden
Connection: close
Date: Tue, 18 Aug 2015 12:19:13 GMT
Server: AkamaiGHost
Content-Length: 388
Content-Type: text/html
Expires: Tue, 18 Aug 2015 12:19:13 GMT
Client-Date: Tue, 18 Aug 2015 12:19:13 GMT
Client-Peer: 104.85.166.76:80
Client-Response-Num: 1
Mime-Version: 1.0
Title: Access Denied
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.nseindia.com/content/equities/scripvol/datafiles/18-08-2013-TO-17-08-2015ADANIPOWERALLN.csv" on this server.<P>
Reference #18.48a65568.1439900353.1006096e
</BODY>
</HTML>
When I added your headers, it worked the first time I tried it. When I re-ran it, it gave a 404 Not Found. Now when I click the link in the browser, it gives a 404 as well.
I believe they are preventing you from downloading the file multiple times. If you are on a dial-up connection or broadband with a non-static IP address, try reconnecting to get a fresh one, or use a proxy.
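If you do go the proxy route, LWP::UserAgent supports it directly. A minimal sketch, where the proxy address is a placeholder you would replace with a real one:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Send all http:// requests through a proxy (placeholder address).
$ua->proxy('http', 'http://my.proxy.example:8080/');

# Alternatively, pick up the proxy configuration from the environment
# variables http_proxy / no_proxy instead:
# $ua->env_proxy;
```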
They may also have terms of service that forbid using automation tools to access their resources, because those pages are not meant to be used as APIs.
In fact, they do not allow what you are trying to do! Point 12 of their terms of service clearly states that:
You may not conduct any systematic or automated data collection activities (including scraping, data mining, data extraction and data harvesting) on or in relation to our website without our express written consent.