IText prevent row breaking across multiple pages using XML Worker

we are using iText 5.5.7 with XML Worker and have encountered an issue with long tables where rows that run off the end of the page are split in two over to the next page (see image).

We have tried using page-break-inside:avoid; as suggested in Prevent page break in text block with iText, XMLWorker and iText Cut between pages in PDF from HTML table but to no effect.

we have tried

wrapping each row in a <tbody> and applying page break avoid (no effect)
targeting tr, td and applying page break (no effect)
wrapping the contents of each td in a div and applying page break (itext stops processing rows once it gets to end of page)

We are under the impression page-break-inside:avoid is supported but have yet to see confirmation of this. Is there an example or best practice for creating this effect using XML worker or is the Java api needed to do this level of manipulation?

cheers

Rows currently splitting across pages:

Desired effect: Rows with too much data wrap to next page

Solution

.NET developer, but you should be able to easily translate the following C# code.

Anytime the default XML Worker implementation doesn't meet your needs, you're basically left with an exercise in looking through the source code. First, see if the XML Worker supports the tag you want in the Tags class. There's a nice implementation for <table> that supports the page-break-inside:avoid style, but it only works at the <table> level, not the row <tr> level. Luckily, it's not that much work to override the End() method for Table.

If the tag is not supported, you need to roll your own custom tag processor by inheriting from AbstractTagProcessor, but not going there for this answer.

Anyway, on to the code. Instead of blowing away the default implementation by changing the behavior of the page-break-inside:avoid style, we can use a custom HTML attribute and have the best of both worlds:

public class TableProcessor : Table
{
    // custom HTML attribute to keep <tr> on same page if possible
    public const string NO_ROW_SPLIT = "no-row-split";
    public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
    {
        IList<IElement> result = base.End(ctx, tag, currentContent);
        var table = (PdfPTable)result[0];

        if (tag.Attributes.ContainsKey(NO_ROW_SPLIT))
        {
            // if not set,  table **may** be forwarded to next page
            table.KeepTogether = false;
            // next two properties keep <tr> together if possible
            table.SplitRows = true;
            table.SplitLate = true;
        }
        return new List<IElement>() { table };
    }
}

And a simple method to generate some test HTML:

public string GetHtml()
{
    var html = new StringBuilder();
    var repeatCount = 15;
    for (int i = 0; i < repeatCount; ++i) { html.Append("<h1>h1</h1>"); }

    var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer vestibulum sollicitudin luctus. Curabitur at eros bibendum, porta risus a, luctus justo. Phasellus in libero vulputate, fermentum ante nec, mattis magna. Nunc viverra viverra sem, et pulvinar urna accumsan in. Quisque ultrices commodo mauris, et convallis magna. Duis consectetur nisi non ultrices dignissim. Aenean imperdiet consequat magna, ac ornare magna suscipit ac. Integer fermentum velit vitae porttitor vestibulum. Morbi iaculis sed massa nec ultricies. Aliquam efficitur finibus dolor, et vulputate turpis pretium vitae. In lobortis lacus diam, ut varius tellus varius sed. Integer pulvinar, massa quis feugiat pulvinar, tortor nisi bibendum libero, eu molestie est sapien quis odio. Lorem ipsum dolor sit amet, consectetur adipiscing elit.";

    // default iTextSharp.tool.xml.html.table.Table (AbstractTagProcessor)
    // is at the <table>, **not <tr> level
    html.Append("<table style='page-break-inside:avoid;'>");
    html.AppendFormat(
        @"<tr><td style='border:1px solid #000;'>DEFAULT IMPLEMENTATION</td>
            <td style='border:1px solid #000;'>{0}</td></tr>",
        text
    );
    html.Append("</table>");

    // overriden implementation uses a custom HTML attribute to keep:
    // <tr> together - see TableProcessor
    html.AppendFormat("<table {0}>", TableProcessor.NO_ROW_SPLIT);
    for (int i = 0; i < repeatCount; ++i)
    {
        html.AppendFormat(
            @"<tr><td style='border:1px solid #000;'>{0}</td>
            <td style='border:1px solid #000;'>{1}</td></tr>",
            i, text
        );
    }
    html.Append("</table>");
    return html.ToString();
}

Finally the parsing code:

using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
{
    using (var document = new Document())
    {
        PdfWriter writer = PdfWriter.GetInstance(
            document, stream
        );
        document.Open();

        // instantiate custom tag processor and add to `HtmlPipelineContext`.
        var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
        tagProcessorFactory.AddProcessor(
            new TableProcessor(),
            new string[] { HTML.Tag.TABLE }
        );
        var htmlPipelineContext = new HtmlPipelineContext(null);
        htmlPipelineContext.SetTagFactory(tagProcessorFactory);

        var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
        var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);

        var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        var cssResolverPipeline = new CssResolverPipeline(
            cssResolver, htmlPipeline
        );

        var worker = new XMLWorker(cssResolverPipeline, true);
        var parser = new XMLParser(worker);
        using (var stringReader = new StringReader(GetHtml()))
        {
            parser.Parse(stringReader);
        }
    }
}

Full source.

The default implementation is maintained - first <table> is kept together instead of being split over two pages:

And the custom implementation keeps rows together in the second <table>: