I want to parse an HTML document from a URL in Java.
When I enter the URL in my browser (Chrome), it doesn't display the HTML page but downloads it instead.
The URL is the link behind a "download" button on a webpage: "https://www.shazam.com/myshazam/download-history". If I paste it into my browser, it downloads fine. But when I try to download it with Java, I get a 401 (Unauthorized) error.
I checked Chrome's network tool while loading the URL and noticed that my profile-data and registration cookies were passed with the HTTP GET.
I tried a lot of different methods, but none worked. So my question is: how do I reproduce this in Java? How can I download the HTML file and parse it?
Update:
This is what we have found so far (thanks to Andrew Regan):
BasicCookieStore store = new BasicCookieStore();
store.addCookie(new BasicClientCookie("profile-data", "value"));
store.addCookie(new BasicClientCookie("registration", "value"));

Executor executor = Executor.newInstance();
String output = executor.use(store)
        .execute(Request.Get("https://www.shazam.com/myshazam/download-history"))
        .returnContent().asString();
The last line throws a NullPointerException. The rest of the code works fine for loading unprotected webpages.
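One plausible cause (an assumption, not something we confirmed): the BasicClientCookie instances above are created without a domain or path, so they may never match the request URL and are silently dropped. The JDK's own java.net.HttpCookie illustrates the domain-matching rule; the ".shazam.com" domain here is inferred from the URL:

```java
import java.net.HttpCookie;

public class CookieDomainDemo {
    public static void main(String[] args) {
        HttpCookie cookie = new HttpCookie("profile-data", "value");
        // No domain was set, so no request host can ever match this cookie.
        System.out.println(cookie.getDomain()); // null

        cookie.setDomain(".shazam.com");
        cookie.setPath("/");
        // With an explicit domain, the cookie matches the request host.
        System.out.println(HttpCookie.domainMatches(cookie.getDomain(), "www.shazam.com")); // true
    }
}
```

The same setDomain/setPath calls exist on Apache's BasicClientCookie, so setting them before adding the cookies to the store would be the first thing to try.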
I found the answer myself. Using HttpURLConnection, this approach can be used to "authenticate" to a variety of services. I used Chrome's built-in network tools to read the cookie values from the GET request.
HttpURLConnection con = (HttpURLConnection) new URL("https://www.shazam.com/myshazam/download-history").openConnection();
con.setRequestMethod("GET");
// Cookie values copied from Chrome's network tab
con.addRequestProperty("Cookie", "registration=Cookie_Value_Here;profile-data=Cookie_Value_Here");

try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
    }
}
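As a small readability tweak, the hard-coded header string can be assembled from name/value pairs instead. buildCookieHeader below is a hypothetical helper, not part of any library:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CookieHeaderDemo {
    // Join name=value pairs with "; ", the separator browsers use in the Cookie header.
    static String buildCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>(); // preserves insertion order
        cookies.put("registration", "Cookie_Value_Here");
        cookies.put("profile-data", "Cookie_Value_Here");
        // The result can be passed to con.addRequestProperty("Cookie", ...)
        System.out.println(buildCookieHeader(cookies));
        // prints: registration=Cookie_Value_Here; profile-data=Cookie_Value_Here
    }
}
```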