java pdf itext superscript xmlworker

How do I render HTML superscript tags in a PDF using itextpdf XML Worker?


I am using itextpdf version 5.5.6. I am passing HTML that contains a superscript tag, e.g. <sup>ABC</sup>, along with other HTML content, but the text ABC appears as normal text in the PDF. It looks like the <sup> tag is being ignored or escaped. Below is the code I use to generate the PDF with itextpdf.

// fontProvider, cssResolver, document and writer are created elsewhere (not shown).
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

// Pipeline chain: CSS resolver -> HTML -> PDF writer
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

// Parse the HTML buffer as UTF-8
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
byte[] htmlBytes = htmlBufferForPDF.toString().getBytes("UTF-8");
ByteArrayInputStream stream = new ByteArrayInputStream(htmlBytes);
p.parse(stream, Charset.forName("UTF-8"));

Any suggestions to solve this issue will be very helpful.

Thanks


Solution

  • The following works for me with iTextSharp / XML Worker 5.5.11 (C#), using the ParseXHtml overload that takes a CSS stream and explicitly setting the style for <sup>.

    HTML:

    string HTML = @"
    <html><head>
    <title>Test HTML</title>
    </head><body>
    <div>The 1<sup>st</sup> day of the month</div>
    </body></html>
    ";
    

    Parsing code:

    string css = "sup { vertical-align: super; font-size: 0.8em; }";
    using (var stream = new MemoryStream())
    {
        using (var document = new Document())
        {
            PdfWriter writer = PdfWriter.GetInstance(document, stream);
            document.Open();
            using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(HTML)))
            {
                using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(
                        writer, document, htmlStream, cssStream
                    );
                }
            }
        }
        File.WriteAllBytes(OUTPUT, stream.ToArray());
    }
    

    Output:

    [image: screenshot of the generated PDF]
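
Since the question is about Java, here is a rough Java port of the C# code above. Treat it as a minimal sketch: it assumes the iText 5 and XML Worker jars (itextpdf and xmlworker) are on the classpath, and I have only reasoned about it against the 5.5.x API, not tested it on 5.5.6 specifically. The class name SuperscriptPdfExample, the renderPdf helper and the output file name superscript.pdf are made up for illustration.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical class name; mirrors the C# answer above.
public class SuperscriptPdfExample {

    static final String HTML =
            "<html><head><title>Test HTML</title></head><body>"
            + "<div>The 1<sup>st</sup> day of the month</div>"
            + "</body></html>";

    // Same explicit style as in the C# answer.
    static final String CSS = "sup { vertical-align: super; font-size: 0.8em; }";

    // Hypothetical helper: renders the HTML with the given CSS and returns the PDF bytes.
    static byte[] renderPdf(String html, String css) throws IOException, DocumentException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, out);
        document.open();
        try (InputStream htmlStream = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
             InputStream cssStream = new ByteArrayInputStream(css.getBytes(StandardCharsets.UTF_8))) {
            // Overload that takes the HTML and a separate CSS stream.
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, htmlStream, cssStream);
        }
        // Closing the document flushes the PDF into the output stream.
        document.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Output file name is an assumption for this example.
        Files.write(Paths.get("superscript.pdf"), renderPdf(HTML, CSS));
    }
}
```

If you need to keep the custom pipeline from the question instead of XMLWorkerHelper, the same idea should apply: make sure the sup rule is known to the cssResolver you pass to CssResolverPipeline.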