I have a resource (a static HTML page) that I want to use for testing. But when I load the static page, some characters come out incorrectly encoded. I tried the class StringEscapeUtils, but it doesn't work. My function:
private HtmlPage getStaticPage() throws IOException, ClassNotFoundException {
final Reader reader = new InputStreamReader(this.getClass().getResourceAsStream("/" + "testPage" + ".html"), "UTF-8");
final StringWebResponse response = new StringWebResponse(StringEscapeUtils.unescapeHtml4(IOUtils.toString(reader)), StandardCharsets.UTF_8, new URL(URL_PAGE));
return HTMLParser.parseHtml(response, WebClientFactory.getInstance().getCurrentWindow());
}
import org.apache.commons.lang3.StringEscapeUtils;
final Reader reader = new InputStreamReader(this.getClass().getResourceAsStream("/" + "testPage" + ".html"), "UTF-8");
For the reader, use the encoding of the file (from your comment I guess this is windows-1252 in your case). Then read the file into a string (e.g. with Commons IO).
Then you can process it like this:
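As a rough sketch of why the reader's charset matters, the snippet below decodes the same bytes once as windows-1252 and once as UTF-8. It uses only the JDK (Java 9+ for readAllBytes) instead of Commons IO; the class name ResourceDecoding, the helper readAs, and the sample bytes are made up for illustration, and windows-1252 is only the guess mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ResourceDecoding {

    // Hypothetical helper: reads the whole stream and decodes it with the given charset.
    static String readAs(final InputStream in, final Charset charset) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) {
        // "café" as a windows-1252 editor would store it: é is the single byte 0xE9.
        final byte[] fileBytes = {'c', 'a', 'f', (byte) 0xE9};

        // Decoding with the charset the file was actually saved in keeps the text intact.
        final String right = readAs(new ByteArrayInputStream(fileBytes),
                Charset.forName("windows-1252"));

        // Decoding the same bytes as UTF-8 turns the lone 0xE9 into U+FFFD.
        final String wrong = readAs(new ByteArrayInputStream(fileBytes),
                StandardCharsets.UTF_8);

        System.out.println(right.equals("caf\u00e9"));
        System.out.println(wrong.contains("\uFFFD"));
    }
}
```

With a real resource, the stream from getResourceAsStream("/testPage.html") would take the place of the ByteArrayInputStream, and the decoded string can be handed to StringWebResponse without any further unescaping.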
final StringWebResponse tmpResponse = new StringWebResponse(anHtmlCode,
new URL("http://www.wetator.org/test.html"));
final WebClient tmpWebClient = new WebClient(aBrowserVersion);
try {
final HtmlPage tmpPage = HTMLParser.parseHtml(tmpResponse, tmpWebClient.getCurrentWindow());
return tmpPage;
} finally {
tmpWebClient.close();
}
If you still have a problem, please make a simple sample out of your page that shows the problem and upload it here together with your code.