Tags: java, servlets, flying-saucer, htmlcleaner

Generate PDF file in an appropriate format


I am generating a PDF file with the flying-saucer library. The source is legacy HTML, so I first clean it into well-formed XHTML with the HtmlCleaner library.

After this I serialize the XML as a string and pass it to the iText module of flying-saucer, which renders it and creates the PDF.

I write this PDF to the response OutputStream. After the response is committed I get a dialog asking whether to save or open it. However, the file is not saved as a PDF file; I have to right-click it and open it in Adobe Reader or another PDF reader.

How do I make it open in the PDF reader, and have the file saved with a .pdf extension? What would be an effective and user-friendly way to handle this? Any help will be greatly appreciated!

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.io.StringBufferInputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.PrettyXmlSerializer;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XmlSerializer;
import org.w3c.dom.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
import org.xhtmlrenderer.resource.XMLResource;

public class MyPDF extends HttpServlet {


public MyPDF() {
    super();
}


public void destroy() {
    super.destroy(); 
}


public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    doPost(request, response);
}


public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {

    response.setContentType("text/pdf");

    String html = request.getParameter("source");

    try 
    {
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();

        TagNode node = cleaner.clean(html);

        //String content = "<" + node.getName() + ">" + cleaner.getInnerHtml(node) + "</" + node.getName() + ">";
        //System.out.println("content " +content);

        OutputStream os = response.getOutputStream();
        System.out.println("encoding " +response.getCharacterEncoding());

        final XmlSerializer xmlSerializer = new PrettyXmlSerializer(props);
        final String html1 = xmlSerializer.getAsString(node);

        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(html1);
        renderer.layout();

        renderer.createPDF(os);
        os.close();
    } 
    catch (Exception ex) 
    {
        ex.printStackTrace();
    }


}


public void init() throws ServletException {

}

}

Solution

  • Your MIME type is incorrect for PDF. It should be application/pdf.

    Change

    response.setContentType("text/pdf");
    

    to

    response.setContentType("application/pdf");
    

    See RFC 3778 (https://www.rfc-editor.org/rfc/rfc3778), which registers the application/pdf media type.

    Edit: I totally overlooked the "save as .pdf" part of the question. You'll also need to add something like the following, before any of the PDF is written to the output stream:

    response.setHeader("content-disposition", "attachment; filename=yourFileName.pdf");
    

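    to tell the browser what the default file name should be. With attachment the browser prompts to save the file; use inline instead if you would rather have it displayed in the browser's PDF viewer.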
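
To keep the two headers consistent, the value of the content-disposition header can be built in one place. The following is only a minimal sketch: PdfHeaders, PDF_MIME_TYPE and contentDisposition are hypothetical names invented for this illustration, not part of the servlet API, flying-saucer, or HtmlCleaner. It assumes you are happy to replace any character outside a conservative ASCII set with an underscore.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of any library mentioned above) that
 * builds the header values needed to serve a PDF from a servlet.
 */
final class PdfHeaders {

    /** The registered media type for PDF (RFC 3778). */
    static final String PDF_MIME_TYPE = "application/pdf";

    private PdfHeaders() {
        // Static helper; no instances.
    }

    /**
     * Builds a Content-Disposition header value.
     *
     * @param inline   true to ask the browser to display the PDF,
     *                 false to ask it to offer a download
     * @param fileName suggested file name, with or without ".pdf"
     */
    static String contentDisposition(boolean inline, String fileName) {
        // Assumption: replace anything outside a conservative ASCII set,
        // so quotes, slashes, and line breaks cannot break the header.
        String safe = fileName.replaceAll("[^A-Za-z0-9._-]", "_");

        // Make sure the suggested name carries a .pdf extension.
        if (!safe.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            safe = safe + ".pdf";
        }

        String type = inline ? "inline" : "attachment";
        return type + "; filename=\"" + safe + "\"";
    }
}
```

In doPost you would then call response.setContentType(PdfHeaders.PDF_MIME_TYPE) and response.setHeader("Content-Disposition", PdfHeaders.contentDisposition(false, "yourFileName")) before calling renderer.createPDF(os), since headers set after the response is committed are ignored.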