I'm given a document of A4 pages with 8 A7 sections on each page. I need to extract the data from each A7 area of each page because they're related.
Is it possible to split each A4 page into 8 A7 pages and go through the data?
This is the PDF file I'm dealing with: https://s3.us-east-2.amazonaws.com/s3.barcodegen-website.io/programada+pdf+teste.pdf
(Regarding A4/A7 paper sizes, see ISO 216 at Wikipedia.)
Splitting PDF pages raises a number of secondary issues, such as what to do with half a "glyph" (or half a hyperlink). Internal hyperlinks will usually be discarded, but external ones may need keeping.
We also need to test for duplication of resources: a source A4 file of 526 KB (539,607 bytes) may come out at a slightly different size, here 537 KB (550,093 bytes). The result is sometimes, oddly, smaller, but here it is only slightly larger.
Using an image approach is not acceptable, as at this scale the barcodes are clearly likely to be destroyed.
Image on the left (notice the bad infill); vector on the right, which is accurate for scanning.
Cropped duplication is not always a good solution, as contents can overlap within a page. In this case, however, that can be resolved by decimating each page into 4 x 2 pages, seen here in facing pairs. At that stage we can also see that the offsets vary and are not perfect for such splitting. Thus either the source positions need altering or the page boundaries need sliding in different directions.
Corrected result, as seen in Acrobat Reader etc.
mutool poster -x 4 -y 2 -r programada.pdf output.pdf
The nearest to the desired cropping is:
cpdf -shift-boxes "-20 0" TOTVS.pdf -o tempout1.pdf
cpdf -chop "4 2" tempout1.pdf -o tempout2.pdf
mutool trim -b MediaBox -o final.pdf tempout2.pdf
or
cpdf -shift-boxes "-20 0" TOTVS.pdf -o tempout1.pdf
mutool poster -x 4 -y 2 -r tempout1.pdf tempout2.pdf
mutool trim -b MediaBox -o final.pdf tempout2.pdf
These should produce similar, cleaner A7-size pages.
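To make the geometry of the 4 x 2 split and the -20 shift concrete, here is a minimal sketch in plain Python (no PDF library) that only computes where the eight crop rectangles would fall. It assumes a landscape A4 page of 841.89 x 595.28 points and reads the "-20 0" shift as moving the boxes 20 points to the left. The function name tile_boxes and the top-row-first ordering are illustrative choices, not something the tools above are known to use.

```python
# Illustrative geometry only: assumes a landscape A4 page measured in PDF points
# (1 pt = 1/72 inch). The page size and the -20 pt shift are assumptions that
# mirror the commands in this answer; check the real MediaBox of your file.
A4_LANDSCAPE = (841.89, 595.28)  # (width, height)


def tile_boxes(page_w, page_h, cols=4, rows=2, shift_x=0.0, shift_y=0.0):
    """Return (left, bottom, right, top) boxes, top row first, left to right."""
    tile_w = page_w / cols
    tile_h = page_h / rows
    boxes = []
    for row in range(rows):
        # PDF coordinates start at the bottom-left, so the first row is the highest.
        top = page_h - row * tile_h + shift_y
        for col in range(cols):
            left = col * tile_w + shift_x
            boxes.append((left, top - tile_h, left + tile_w, top))
    return boxes


boxes = tile_boxes(*A4_LANDSCAPE, shift_x=-20)
for box in boxes:
    print(" ".join(f"{value:.2f}" for value in box))
```

Each rectangle comes out at about 210.47 x 297.64 points, which is close to A7 (74 x 105 mm). If the barcodes still straddle a boundary, adjusting shift_x or shift_y in a sketch like this is a quick way to pick values to try before running the real tools.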