I was trying to build a crawler that collects HTML sourcecodes from websites, which I have in a .csv file. Everything seems to be working fine whenever I place the link in
url = new URL ("http://example.com")
but whenever I try to place the link in a variable ("text" in this example) I get an error, telling me that there has been a malformedURLException.
Here is my code:
String text ="http://stackoverflow.com/questions/9827143/continuing-execution-after-an-exception-is-thrown-in-java";
// get the sourcecode of the link you just grabbed
url = new URL(text);
PrintWriter writer = new PrintWriter("sourcecode.txt", "UTF-8");
You have hidden characters in your string. You probably copied the URL from a Word file or a text file that was converted in Windows. There is a BOM marker in its beginning. When I do this:
System.out.println( Arrays.toString(text.getBytes(StandardCharsets.UTF_16BE)));
This is the output I get:
[-2, -1, 0, 104, 0, 116, 0, 116, 0, 112, 0, 58, 0, 47, 0, 47, 0, 115, 0, 116, 0, 97, 0, 99, 0, 107, 0, 111, 0, 118, 0, 101, 0, 114, 0, 102, 0, 108, 0, 111, 0, 119, 0, 46, 0, 99, 0, 111, 0, 109, 0, 47, 0, 113, 0, 117, 0, 101, 0, 115, 0, 116, 0, 105, 0, 111, 0, 110, 0, 115, 0, 47, 0, 57, 0, 56, 0, 50, 0, 55, 0, 49, 0, 52, 0, 51, 0, 47, 0, 99, 0, 111, 0, 110, 0, 116, 0, 105, 0, 110, 0, 117, 0, 105, 0, 110, 0, 103, 0, 45, 0, 101, 0, 120, 0, 101, 0, 99, 0, 117, 0, 116, 0, 105, 0, 111, 0, 110, 0, 45, 0, 97, 0, 102, 0, 116, 0, 101, 0, 114, 0, 45, 0, 97, 0, 110, 0, 45, 0, 101, 0, 120, 0, 99, 0, 101, 0, 112, 0, 116, 0, 105, 0, 111, 0, 110, 0, 45, 0, 105, 0, 115, 0, 45, 0, 116, 0, 104, 0, 114, 0, 111, 0, 119, 0, 110, 0, 45, 0, 105, 0, 110, 0, 45, 0, 106, 0, 97, 0, 118, 0, 97]
The first two bytes are the unicode BOM character. Be careful where you get your strings from. If you export your CSV from Excel, and the file contains only URLs, try to export it as ASCII only.