
Preserving text area line breaks using Jsoup


I have an HTML string that includes a text area with multi-line content, such as the one below:

<textarea id="textbox" name="textbox">*This is line 1
*This is line 2
*This is line 3
</textarea>

I am attempting to parse the HTML string using Jsoup and return the contents of this text area with its line breaks preserved.

As per https://www.baeldung.com/jsoup-line-breaks#:~:text=Jsoup%20removes%20the%20newline%20character,Jsoup%20and%20disable%20pretty%2Dprint, I am disabling pretty-print so that newline characters are not replaced with whitespace.

However, when I run the method below:

private void printTextbox(String htmlStr){
    // Jsoup.parse(String) is enough here; the two-argument overload takes a base URI, not a charset
    final Document doc = Jsoup.parse(htmlStr);
    doc.outputSettings().prettyPrint(false);
    System.out.println(doc.select("#textbox").val());
}

I still get a single-line string:

*This is line 1*This is line 2*This is line 3

How can I preserve the line breaks?
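One way to narrow this down, before changing any Jsoup settings, is to check whether the string handed to Jsoup.parse contains newline characters at all. The sketch below uses only the JDK; the class LineBreakCheck and the helper countLineBreaks are hypothetical names, and the sample strings stand in for the real htmlStr.

```java
public class LineBreakCheck {
    // Hypothetical diagnostic helper: counts '\n' characters in the input.
    // A result of 0 means the line breaks were gone before Jsoup ever saw the string.
    static long countLineBreaks(String s) {
        return s.chars().filter(c -> c == '\n').count();
    }

    public static void main(String[] args) {
        // Stand-in for an htmlStr that still has its line breaks
        String withBreaks = "<textarea id=\"textbox\">*This is line 1\n*This is line 2\n*This is line 3\n</textarea>";
        // Stand-in for an htmlStr whose line breaks were lost upstream
        String withoutBreaks = withBreaks.replace("\n", "");
        System.out.println(countLineBreaks(withBreaks));    // 3
        System.out.println(countLineBreaks(withoutBreaks)); // 0
    }
}
```

If the count is 0 for the real input, the line breaks were already lost before parsing, and no Jsoup output setting can bring them back.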


Solution

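  • This was actually caused by my HTTP request code, which did not preserve line breaks when reading the response body containing the HTML string. BufferedReader.readLine() returns each line without its line terminator, so appending the lines one after another joins them into a single line.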

    After switching

    public String processResponseBody(HttpURLConnection con) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuffer content = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            // readLine() strips the line terminator, so the lines end up joined together
            content.append(inputLine);
        }
        in.close();
        return content.toString();
    }
    

    to

    public String processResponseBody(HttpURLConnection con) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
        String inputLine;
        StringBuffer content = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            // Re-add the line break that readLine() stripped
            content.append(inputLine).append("\n");
        }
        in.close();
        return content.toString();
    }
    

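    the line breaks are preserved in the HTML string passed to Jsoup. There is also no need to change the output settings, so doc.outputSettings().prettyPrint(false); can be removed from the earlier code: pretty-printing only affects how Jsoup serializes HTML (for example via html()), not the value returned by val().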
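Appending "\n" works, but it turns every line terminator (including "\r\n") into "\n" and adds one after the last line even if the body did not end with one. If the body should reach Jsoup exactly as sent, a minimal sketch can read the stream without splitting it into lines at all. It assumes Java 9+ (for InputStream.readAllBytes()) and a UTF-8 body; the class ResponseBodyReader and the method readBody are hypothetical names, and a ByteArrayInputStream stands in for con.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResponseBodyReader {
    // Reads the whole stream as raw bytes, so "\n" and "\r\n" terminators are kept
    // exactly as sent. Assumes the body is UTF-8; real code would normally take the
    // charset from the Content-Type response header instead.
    static String readBody(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for con.getInputStream() so the sketch runs without a network call
        byte[] body = "<textarea id=\"textbox\">*This is line 1\r\n*This is line 2\r\n</textarea>"
                .getBytes(StandardCharsets.UTF_8);
        String html = readBody(new ByteArrayInputStream(body));
        System.out.println(html.contains("line 1\r\n*This is line 2")); // true
    }
}
```

With the HttpURLConnection from the answer, this would be called as readBody(con.getInputStream()). Reading bytes sidesteps readLine() entirely, which is what removed the line breaks in the first place.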