How to retrieve only the visible text (inside the border) of a PDField?


I'm working on a Java project using Apache PDFBox 3.0.3. I want to display a string in a PDField of an AcroForm. The field allows multi-line text and about 9 lines of text can fit in it.

The problem is that the length of the given string can be even longer than that. In that case, I want to add an empty copy of the same page to the document and continue to fill the new PDField with the part of the given string that did not fit into the first field.

Adding new pages is already working, but the second part is not, because I cannot find a way to determine where the visible text in the first field ends.

My idea is to check which part of the string is visible in the PDField and to cut this part out of the given string. Then I would put the remaining part of the string into the next PDField, and so on.
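The trimming step of this idea could look roughly like the sketch below, assuming the visible text has already been obtained by some means. `VisibleTextSplitter` and `remainderAfter` are hypothetical names, not part of PDFBox. The comparison ignores whitespace, on the assumption that text read back from a rendered field may contain line breaks at the wrap points that are not present in the original string.

```java
// Hypothetical helper, not part of PDFBox.
final class VisibleTextSplitter {
    private VisibleTextSplitter() {}

    /**
     * Returns the part of fullText that follows visibleText.
     * Whitespace is skipped on both sides while matching, so extra
     * line breaks in visibleText do not break the comparison.
     * Matching stops at the first non-whitespace character that differs.
     */
    static String remainderAfter(String fullText, String visibleText) {
        int i = 0;   // read position in fullText
        int end = 0; // index just past the last matched character in fullText

        for (int j = 0; j < visibleText.length(); j++) {
            char v = visibleText.charAt(j);
            if (Character.isWhitespace(v)) {
                continue; // ignore whitespace in the visible text
            }
            // Skip whitespace in the full text before comparing.
            while (i < fullText.length() && Character.isWhitespace(fullText.charAt(i))) {
                i++;
            }
            if (i >= fullText.length() || fullText.charAt(i) != v) {
                break; // mismatch: keep everything after the last match
            }
            i++;
            end = i;
        }

        // Drop leading whitespace so the next field does not start with a blank.
        while (end < fullText.length() && Character.isWhitespace(fullText.charAt(end))) {
            end++;
        }
        return fullText.substring(end);
    }
}
```

The returned remainder would then become the value of the field on the next page, repeated until the remainder is empty. A caller should also stop if the remainder does not get shorter between two rounds, since that means nothing could be matched.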

The problem is that the first PDField does actually contain all of the text as a value, but only the first 9 lines are visible (the rest is outside of the field's borders).

When I execute the following line, I get the whole text of the field, including the text that lies outside the field's borders and is therefore invisible. That is not useful here, because I want only the visible part:

acroForm.getField("fieldname").getValueAsString();

How can I retrieve only the visible part of the text in a PDField?

Setting a limit on the number of characters is not an option, because such a limit does not take line breaks into account.


Solution

  • I finally found a solution for retrieving only the visible part of a PDField.

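    import java.awt.geom.Rectangle2D;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.text.PDFTextStripperByArea;

    //Flatten the AcroForm first to remove the PDField but keep the text.
    acroForm.flatten();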
    //Get the page of the document where the text is located.
    PDPage page = document.getPage(0);
    //Create a PDFTextStripperByArea.
    PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    //Create a Rectangle2D. All the text which is located within the rectangle will be
    //included in the stripper. So put the rectangle exactly over the visible part of
    //the text. The params for the coordinates are (x, y, width, height). You likely need
    //to experiment a bit to find the right values.
    Rectangle2D rectangle2D = new Rectangle2D.Float(50, 525, 500, 140);
    //Add the rectangle as a region for the stripper.
    stripper.addRegion("region", rectangle2D);
    //Extract the region.
    stripper.extractRegions(page);
    //Get the text.
    String text = stripper.getTextForRegion("region");
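
Instead of finding the rectangle by experiment, it may be possible to derive it from the field's own widget rectangle, read before flattening. In PDFBox that rectangle is available through `field.getWidgets().get(0).getRectangle()` and the page height through `page.getCropBox().getHeight()`. PDF rectangles are measured from the bottom-left corner of the page, while the region passed to `PDFTextStripperByArea` appears to be measured from the top-left corner with y pointing down, so the y value has to be converted. The sketch below shows only that conversion with plain numbers. `StripperRegions` and `fromPdfRectangle` are hypothetical names, and the code assumes an unrotated page whose visible area starts at (0, 0).

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
final class StripperRegions {
    private StripperRegions() {}

    /**
     * Converts a rectangle given in PDF user space (origin bottom-left, y up)
     * into a rectangle with the origin at the top-left and y pointing down.
     * Assumption: the page is not rotated and its visible area starts at (0, 0).
     */
    static Rectangle2D fromPdfRectangle(double lowerLeftX, double lowerLeftY,
                                        double width, double height,
                                        double pageHeight) {
        // Distance from the top edge of the page to the top edge of the rectangle.
        double top = pageHeight - (lowerLeftY + height);
        return new Rectangle2D.Double(lowerLeftX, top, width, height);
    }
}
```

With illustrative numbers, a field whose lower-left corner is at (50, 127), with width 500 and height 140, on a page 792 points high would give the region (50, 525, 500, 140). Shrinking the result by a point or two on each side may help keep the border itself out of the region.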