we are using iText 5.5.7 with XML Worker and have encountered an issue with long tables where rows that run off the end of the page are split in two over to the next page (see image).
We have tried using page-break-inside:avoid;
as suggested in Prevent page break in text block with iText, XMLWorker and iText Cut between pages in PDF from HTML table but to no effect.
we have tried
<tbody>
and applying page break avoid (no effect) tr, td
and applying page break (no effect)td
in a div
and applying page break (itext stops processing rows once it gets to end of page)We are under the impression page-break-inside:avoid
is supported but have yet to see confirmation of this. Is there an example or best practice for creating this effect using XML worker or is the Java api needed to do this level of manipulation?
cheers
Rows currently splitting across pages:
.NET developer, but you should be able to easily translate the following C# code.
Anytime the default XML Worker implementation doesn't meet your needs, you're basically left with an exercise in looking through the source code. First, see if the XML Worker
supports the tag you want in the Tags class. There's a nice implementation for <table>
that supports the page-break-inside:avoid
style, but it only works at the <table>
level, not the row <tr>
level. Luckily, it's not that much work to override the End()
method for Table.
If the tag is not supported, you need to roll your own custom tag processor by inheriting from AbstractTagProcessor, but not going there for this answer.
Anyway, on to the code. Instead of blowing away the default implementation by changing the behavior of the page-break-inside:avoid
style, we can use a custom HTML
attribute and have the best of both worlds:
public class TableProcessor : Table
{
// custom HTML attribute to keep <tr> on same page if possible
public const string NO_ROW_SPLIT = "no-row-split";
public override IList<IElement> End(IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
{
IList<IElement> result = base.End(ctx, tag, currentContent);
var table = (PdfPTable)result[0];
if (tag.Attributes.ContainsKey(NO_ROW_SPLIT))
{
// if not set, table **may** be forwarded to next page
table.KeepTogether = false;
// next two properties keep <tr> together if possible
table.SplitRows = true;
table.SplitLate = true;
}
return new List<IElement>() { table };
}
}
And a simple method to generate some test HTML
:
public string GetHtml()
{
var html = new StringBuilder();
var repeatCount = 15;
for (int i = 0; i < repeatCount; ++i) { html.Append("<h1>h1</h1>"); }
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer vestibulum sollicitudin luctus. Curabitur at eros bibendum, porta risus a, luctus justo. Phasellus in libero vulputate, fermentum ante nec, mattis magna. Nunc viverra viverra sem, et pulvinar urna accumsan in. Quisque ultrices commodo mauris, et convallis magna. Duis consectetur nisi non ultrices dignissim. Aenean imperdiet consequat magna, ac ornare magna suscipit ac. Integer fermentum velit vitae porttitor vestibulum. Morbi iaculis sed massa nec ultricies. Aliquam efficitur finibus dolor, et vulputate turpis pretium vitae. In lobortis lacus diam, ut varius tellus varius sed. Integer pulvinar, massa quis feugiat pulvinar, tortor nisi bibendum libero, eu molestie est sapien quis odio. Lorem ipsum dolor sit amet, consectetur adipiscing elit.";
// default iTextSharp.tool.xml.html.table.Table (AbstractTagProcessor)
// is at the <table>, **not <tr> level
html.Append("<table style='page-break-inside:avoid;'>");
html.AppendFormat(
@"<tr><td style='border:1px solid #000;'>DEFAULT IMPLEMENTATION</td>
<td style='border:1px solid #000;'>{0}</td></tr>",
text
);
html.Append("</table>");
// overriden implementation uses a custom HTML attribute to keep:
// <tr> together - see TableProcessor
html.AppendFormat("<table {0}>", TableProcessor.NO_ROW_SPLIT);
for (int i = 0; i < repeatCount; ++i)
{
html.AppendFormat(
@"<tr><td style='border:1px solid #000;'>{0}</td>
<td style='border:1px solid #000;'>{1}</td></tr>",
i, text
);
}
html.Append("</table>");
return html.ToString();
}
Finally the parsing code:
using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
{
using (var document = new Document())
{
PdfWriter writer = PdfWriter.GetInstance(
document, stream
);
document.Open();
// instantiate custom tag processor and add to `HtmlPipelineContext`.
var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
tagProcessorFactory.AddProcessor(
new TableProcessor(),
new string[] { HTML.Tag.TABLE }
);
var htmlPipelineContext = new HtmlPipelineContext(null);
htmlPipelineContext.SetTagFactory(tagProcessorFactory);
var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
var cssResolverPipeline = new CssResolverPipeline(
cssResolver, htmlPipeline
);
var worker = new XMLWorker(cssResolverPipeline, true);
var parser = new XMLParser(worker);
using (var stringReader = new StringReader(GetHtml()))
{
parser.Parse(stringReader);
}
}
}
The default implementation is maintained - first <table>
is kept together instead of being split over two pages:
And the custom implementation keeps rows together in the second <table>
: