I'm making a program that is meant to connect to a website, get the HTML, and save it to a file. Simple enough, but if a website has a Captcha, it will send me the HTML for the Captcha page instead.
I learnt that if you have already completed the Captcha and you give Java the resulting cookies, the site will let you through, because the cookies show that the Captcha has already been solved.
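import java.io.FileWriter;
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

public static void testhtmlgetter() throws IOException
{
    // Install a default cookie manager. It only stores cookies the server sends back;
    // nothing here loads the cookies from my browser session.
    CookieManager cookieManager = new CookieManager();
    CookieHandler.setDefault(cookieManager);
    CookieStore cookieStore = cookieManager.getCookieStore(); // obtained but never populated
    URLConnection connection = new URL("website link here").openConnection();
    String content;
    try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
        scanner.useDelimiter("\\Z"); // read the whole response as a single token
        content = scanner.hasNext() ? scanner.next() : "";
    }
    System.out.println(content);
    // Save the HTML; try-with-resources closes the writer even if write() fails
    try (FileWriter writer = new FileWriter("E:\\java code\\scraper\\output\\filename.txt")) {
        writer.write(content);
    }
}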
I have tried all manner of other people's solutions using Java's CookieManager, but most other posts just tell you how to get cookies from a website, not how to actually use those cookies.
I just want to know how to access a webpage I have already logged into, using cookies.
How I get these cookies isn't important either; I could just copy them from F12 > Application > Storage > Cookies if needed.
How am I supposed to do this?
Your approach of using URLConnection etc. is too low level. You would need to replicate parts of the HTTP protocol yourself, such as sending the right HTTP headers, and much more.
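For reference, doing it by hand means setting the Cookie request header yourself. The following is only a minimal sketch: the class name `CookieHeaderSketch`, both method names, and the UTF-8 decoding are assumptions made for illustration, and the cookie names and values would be whatever you copied from the browser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, not part of any library.
final class CookieHeaderSketch {

    // Joins name/value pairs into a Cookie header value such as "a=1; b=2".
    static String buildCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // Fetches a page and sends the given cookies along with the request.
    static String fetchWithCookies(String url, Map<String, String> cookies) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        // Request headers must be set before the connection is opened,
        // i.e. before getInputStream() is called.
        connection.setRequestProperty("Cookie", buildCookieHeader(cookies));
        try (InputStream in = connection.getInputStream()) {
            // Assumes the page is UTF-8 encoded.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Pass an insertion-ordered map (for example a `LinkedHashMap`) if you want the cookies sent in a particular order. Some sites also tie these cookies to the browser's User-Agent or IP address, so a request like this may still come back with the Captcha page.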
Rather, use a higher-level library to read the website. Many libraries exist.
For instance, Apache HttpClient; chapter 3 of its tutorial covers cookies.
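To show the general shape of the idea (a client with a cookie store you fill in before making the request), here is a sketch that uses the JDK's own `java.net.http.HttpClient` (Java 11+) rather than Apache HttpClient, so it needs no extra dependency. The class name `CookieClientSketch` and its method names are hypothetical, and the cookie name and value stand in for whatever you copy from the browser.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of any library.
final class CookieClientSketch {

    // Builds a CookieManager that already holds one cookie for the given site.
    static CookieManager cookieManagerWith(URI site, String name, String value) {
        CookieManager manager = new CookieManager();
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");   // CookieManager matches by path, so set one explicitly
        cookie.setVersion(0);  // plain "name=value" form, as browsers send it
        manager.getCookieStore().add(site, cookie);
        return manager;
    }

    // Downloads the page with the stored cookies attached and writes the body to a file.
    static void savePage(URI page, CookieManager manager, Path output)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(manager)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(page).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        Files.writeString(output, response.body());
    }
}
```

You would call `cookieManagerWith(...)` once per copied cookie (or extend it to accept several) and then pass the manager to `savePage(...)`. Apache HttpClient follows the same pattern with its own cookie store classes.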
If you are purely interested in the content (and don't need to use Java) then you could use one of the many HTTP tools, such as: