I am using Apache PDFBox to create a very simple PDF with one line of text that conforms to PDF/A-2b, and I want to use VeraPDF to check this PDF for conformance. VeraPDF tells me that the PDF is not compliant and shows me two failed assertions:
TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.6.2.1, testNumber=1], status=failed, message=The Catalog dictionary of a conforming file shall contain the Metadata key whose value is a metadata stream as defined in ISO 32000-1:2008, 14.3.2., location=Location [level=CosDocument, context=root/document[0]], locationContext=null, errorMessage=null]
TestAssertion [ruleId=RuleId [specification=ISO 19005-2:2011, clause=6.2.4.3, testNumber=4], status=failed, message=DeviceGray shall only be used if a device independent DefaultGray colour space has been set when the DeviceGray colour space is used, or if a PDF/A OutputIntent is present., location=Location [level=CosDocument, context=root/document[0]/pages[0](4 0 obj PDPage)/contentStream[0](6 0 obj PDContentStream)/operators[3]/fillCS[0]], locationContext=null, errorMessage=null]
My code looks something like this:
try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
     PDDocument document = new PDDocument();
     COSStream cosStream = new COSStream()) {
    PDPage page = new PDPage();
    document.addPage(page);

    PDDocumentInformation documentInformation = new PDDocumentInformation();
    documentInformation.setTitle("Name");
    documentInformation.setCreator("Creator");
    documentInformation.setSubject("Subject");
    document.setDocumentInformation(documentInformation);

    try (ByteArrayOutputStream xmpOutputStream = new ByteArrayOutputStream();
         OutputStream cosXMPStream = cosStream.createOutputStream()) {
        XMPMetadata xmp = XMPMetadata.createXMPMetadata();

        PDFAIdentificationSchema pdfaSchema = xmp.createAndAddPFAIdentificationSchema();
        pdfaSchema.setPart(2);
        pdfaSchema.setConformance("B");

        DublinCoreSchema dublinCoreSchema = xmp.createAndAddDublinCoreSchema();
        dublinCoreSchema.setTitle("Name");
        dublinCoreSchema.addCreator("Creator");
        dublinCoreSchema.setDescription("Subject");

        XMPBasicSchema basicSchema = xmp.createAndAddXMPBasicSchema();
        Calendar creationDate = Calendar.getInstance();
        basicSchema.setCreateDate(creationDate);
        basicSchema.setModifyDate(creationDate);
        basicSchema.setMetadataDate(creationDate);
        basicSchema.setCreatorTool("Creator Tool");

        new XmpSerializer().serialize(xmp, xmpOutputStream, true);
        cosXMPStream.write(xmpOutputStream.toByteArray());
        document.getDocumentCatalog().setMetadata(new PDMetadata(cosStream));
    }

    PDViewerPreferences prefs = new PDViewerPreferences(page.getCOSObject());
    prefs.setDisplayDocTitle(true);
    document.getDocumentCatalog().setViewerPreferences(prefs);

    File fontFile = new File("C:\\Windows\\Fonts\\arial.ttf");
    PDType0Font font = PDType0Font.load(document, fontFile);

    PDPageContentStream contentStream = new PDPageContentStream(document, page);
    contentStream.beginText();
    contentStream.setFont(font, 12);
    contentStream.newLineAtOffset(100, 700);
    contentStream.showText("Hello PDF/A-2b World!");
    contentStream.endText();
    contentStream.close();

    document.save(baos);

    try (PDFAParser parser = Foundries.defaultInstance().createParser(new ByteArrayInputStream(baos.toByteArray()), PDFAFlavour.PDFA_2_B)) {
        PDFAValidator validator = Foundries.defaultInstance().createValidator(PDFAFlavour.PDFA_2_B, false);
        ValidationResult result = validator.validate(parser);
        System.out.println(result.isCompliant());
    }
}
When I inspect the generated PDF with debugger-app-2.0.31.jar, I can find the metadata. When I compare the metadata with a PDF file from the VeraPDF regression tests (e.g. this one), the only difference that seems relevant to me is the begin attribute of the xpacket header. It is empty in the VeraPDF test file (<?xpacket begin=''), and it seems to contain the BOM start sequence in the file created by PDFBox (<?xpacket begin="", with the BOM between the quotes).
Can someone tell me whether this is an error in VeraPDF or in PDFBox? Is there a solution for this problem? Can someone also explain the second error to me and offer a solution?
The CreatePDFA example from the PDFBox source code handles the metadata part slightly differently, and I was able to validate it with VeraPDF. (Your version looked OK to me at first, but it isn't; see the update at the end.)
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);
The second problem is the missing output intent. Add this code:
// sRGB output intent
InputStream colorProfile = CreatePDFA.class.getResourceAsStream(
        "/org/apache/pdfbox/resources/pdfa/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(doc, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
doc.getDocumentCatalog().addOutputIntent(intent);
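If that sRGB.icc resource is not on your classpath, one possible substitute is the sRGB profile that ships with the JDK. This is only a sketch using the standard java.awt.color classes: the class and method names (SrgbProfileSource, srgbProfileBytes, looksLikeIccProfile, openSrgbProfile) are made up for illustration, and I have not checked that VeraPDF accepts the JDK's profile in an output intent, so validate the result before relying on it.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: supplies the JDK's built-in sRGB ICC profile as bytes or as a stream.
class SrgbProfileSource {

    // Raw bytes of the sRGB profile bundled with the JDK.
    static byte[] srgbProfileBytes() {
        return ICC_Profile.getInstance(ColorSpace.CS_sRGB).getData();
    }

    // Cheap sanity check: an ICC profile has a 128-byte header
    // with the signature "acsp" at byte offset 36.
    static boolean looksLikeIccProfile(byte[] data) {
        return data.length >= 128
                && "acsp".equals(new String(data, 36, 4, StandardCharsets.US_ASCII));
    }

    // Stream that could be handed to code expecting an InputStream with ICC data.
    static InputStream openSrgbProfile() {
        return new ByteArrayInputStream(srgbProfileBytes());
    }

    public static void main(String[] args) {
        byte[] data = srgbProfileBytes();
        System.out.println("profile size: " + data.length
                + ", looks like ICC: " + looksLikeIccProfile(data));
    }
}
```

The stream returned by openSrgbProfile() could then be passed in place of colorProfile in the output intent code above.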
About the PDFMergerExample and your original code:
Both that example and your code use new PDMetadata(cosStream). This constructor doesn't add the two mandatory dictionary entries of a metadata stream (Type and Subtype). Add this to your code:
cosStream.setName(COSName.TYPE, "Metadata");
cosStream.setName(COSName.SUBTYPE, "XML");
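As for the begin attribute in the question: as far as I know, the XMP specification expects U+FEFF (the BOM character) as the value of begin and allows an empty value only for backward compatibility, so the BOM that PDFBox writes should not be what VeraPDF complains about. If you want to check which variant a metadata stream uses, here is a small stdlib-only sketch. The class and method names (XPacketBeginCheck, beginValue, describe) are invented for illustration, and it assumes the XMP bytes are UTF-8 encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports what the xpacket header's begin attribute contains.
class XPacketBeginCheck {

    // Matches <?xpacket begin="..." or <?xpacket begin='...' and captures the value.
    private static final Pattern BEGIN =
            Pattern.compile("<\\?xpacket\\s+begin=(['\"])(.*?)\\1");

    // Returns the begin attribute value, or null if no xpacket header is found.
    static String beginValue(byte[] xmpBytes) {
        String xml = new String(xmpBytes, StandardCharsets.UTF_8); // assumes UTF-8
        Matcher m = BEGIN.matcher(xml);
        return m.find() ? m.group(2) : null;
    }

    // Classifies the begin attribute as missing, empty, BOM, or something else.
    static String describe(byte[] xmpBytes) {
        String value = beginValue(xmpBytes);
        if (value == null) {
            return "no xpacket header";
        }
        if (value.isEmpty()) {
            return "empty";
        }
        if (value.equals("\uFEFF")) {
            return "BOM (U+FEFF)";
        }
        return "unexpected";
    }

    public static void main(String[] args) {
        byte[] withBom = "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(describe(withBom));
        System.out.println(describe(withoutBom));
    }
}
```

You could feed it the bytes you already have in xmpOutputStream.toByteArray() to see which form your serializer produced.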