I need to determine document page information from a PostScript or a PCL file. Preferably in Java, but a solution using Ghostscript/GhostPCL would be fine as well.
Here is what I tried for each piece of information:
Page color
This can be achieved with Ghostscript/GhostPCL using the device called inkcov, which reports the ink coverage per page.
PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.ps
PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=inkcov -o- input.pcl
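Since Java was the preferred language, here is a minimal sketch of how the captured inkcov output could be turned into a colour/monochrome flag per page. It assumes the device prints one line per page of the form ` 0.10022  0.09563  0.10071  0.06259 CMYK OK` (four coverage fractions for C, M, Y and K) and that you have already collected stdout into a string. The class name `InkCovParser` and the method `colorPages` are made up for illustration and are not part of Ghostscript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: classifies pages from captured inkcov output. */
final class InkCovParser {
    private static final String NUM = "(\\d+(?:\\.\\d+)?)";
    // Assumed line format: " 0.00000  0.00000  0.00000  0.02230 CMYK OK"
    private static final Pattern LINE = Pattern.compile(
        "^\\s*" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+CMYK\\b.*$");

    /** Returns one flag per page: true if any C, M or Y coverage is non-zero. */
    static List<Boolean> colorPages(String output) {
        List<Boolean> pages = new ArrayList<>();
        for (String line : output.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // skip banner lines, "Page N" lines, warnings, etc.
            }
            double c = Double.parseDouble(m.group(1));
            double mg = Double.parseDouble(m.group(2));
            double y = Double.parseDouble(m.group(3));
            pages.add(c > 0 || mg > 0 || y > 0);
        }
        return pages;
    }
}
```

Note that a page counts as colour here as soon as any C, M or Y ink is reported, so greys that end up composed from CMY would be flagged as colour too. You may want a small threshold instead of a strict comparison with zero.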
Page size
There is a device called bbox which gives me the bounding box per page for PostScript or PCL6 documents.
PostScript
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.ps
PCL6
gpcl6win64 -dNOPAUSE -dBATCH -sDEVICE=bbox -o- input.pcl
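For completeness, a small Java sketch for reading the bbox results. It assumes the device emits, per page, a `%%BoundingBox:` line with integers followed by a `%%HiResBoundingBox:` line with four decimal numbers (llx lly urx ury, in points), and that this text has been captured into a string (as far as I know the bbox device writes to stderr, so capture that stream too). `BBoxParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the per-page HiResBoundingBox values. */
final class BBoxParser {
    private static final String NUM = "(-?\\d+(?:\\.\\d+)?)";
    private static final Pattern HIRES = Pattern.compile(
        "^%%HiResBoundingBox:\\s*" + NUM + "\\s+" + NUM + "\\s+" + NUM + "\\s+" + NUM + "$");

    /** Returns one {llx, lly, urx, ury} array per page, in points. */
    static List<double[]> parse(String captured) {
        List<double[]> boxes = new ArrayList<>();
        for (String line : captured.split("\\R")) {
            Matcher m = HIRES.matcher(line.trim());
            if (!m.matches()) {
                continue; // ignore the integer %%BoundingBox line and anything else
            }
            double[] box = new double[4];
            for (int i = 0; i < 4; i++) {
                box[i] = Double.parseDouble(m.group(i + 1));
            }
            boxes.add(box);
        }
        return boxes;
    }
}
```

The width and height of the marked area are then `box[2] - box[0]` and `box[3] - box[1]`.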
But in the end the bounding box is only an inaccurate approximation of the page size. I checked the post "Getting the page sizes of a PostScript document", but the solution there does not seem to work with my Ghostscript version 9.5.
The bbox device should provide accurate information. In what way is it inaccurate?
You need to bear in mind that it's possible some objects (e.g. images) might mark the page with white space. That still counts as marking the page for the purposes of the bbox device. If you want to only count non-white output samples, then you need to render the document (at the final resolution you intend to use) and actually count the non-white pixels. That's a potentially very slow operation because it needs to read every output colour sample of what could be a very large image.
It's not hard to code though, and you could use the inkcov device as a basis for doing both operations in the same pass.
Or you could just have GhostPDL deliver the rendered bitmap for you and code a solution to the bounding box using some other tool/language.
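If you go the route of doing it in another tool, the pixel scan described above could look roughly like this in Java. This is only a sketch: it assumes the page has already been rendered to a bitmap and loaded as a `BufferedImage`, and it treats any pixel that is not pure white as a mark. `InkBounds` and `nonWhiteBounds` are illustrative names.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical helper: bounding box of all non-white pixels in a rendered page. */
final class InkBounds {
    /** Returns the enclosing rectangle in pixel coordinates, or null for a blank page. */
    static Rectangle nonWhiteBounds(BufferedImage img) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // Mask off alpha and compare the RGB part with pure white.
                if ((img.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // no marks at all
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }
}
```

Keep in mind that image coordinates have their origin at the top left, whereas PostScript bounding boxes are measured from the bottom left, so the y values need flipping if you want to compare the two.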
Ah, are you actually looking for the requested media size, rather than the bounding box? That's not the same thing at all. The bounding box returns the smallest rectangle which encloses all the marks on the output; it doesn't tell you how big the requested media was. So a small rectangle in the bottom left would give you a tiny BBox, even if the media was large.
You can reasonably easily get the media size requests from PostScript by writing a small PostScript program, but you can't do that with PCL. Perhaps the easiest solution in both cases is to render the content to a file at 72 dpi, then read the width/height of the rendered output in pixels. At 72 dpi one pixel corresponds to one point, so that gives you the media size in points.
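A sketch of the reading step in Java, assuming each page was rendered to its own image file at 72 dpi (for example with something like `-sDEVICE=png16m -r72 -o page-%03d.png`, which is my choice of device and file naming, not a requirement). It uses the standard `javax.imageio` API to read only the image header rather than decoding the whole bitmap. `PageSizeReader` is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical helper: reads the pixel dimensions of a rendered page image. */
final class PageSizeReader {
    /** Returns {width, height} in pixels; at 72 dpi these equal the size in points. */
    static int[] sizeInPixels(File image) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(image)) {
            if (in == null) {
                throw new IOException("Cannot open " + image);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("Unsupported image format: " + image);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // Only the header is parsed here, not the pixel data.
                return new int[] { reader.getWidth(0), reader.getHeight(0) };
            } finally {
                reader.dispose();
            }
        }
    }
}
```

An A4 page rendered this way should come out at roughly 595 x 842 pixels, and US Letter at 612 x 792.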
Or use the pdfwrite device to convert the input into PDF and then the pdf_info.ps PostScript program can be used to give you the sizes of the pages from the PDF file.