I am trying to write an .xslt file (as input for Apache FOP) whose purpose is to generate an "accessible" document. In our case that means the generated PDF file must pass the checks done by PAC, the PDF Accessibility Checker.
My file is still practically in a "hello world" state:
```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xml:lang="en-US">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <fo:root font-family="Arial">
      <fo:layout-master-set>
        <fo:simple-page-master
            master-name="A4-portrait" page-height="29.7cm" page-width="21.0cm">
          <fo:region-body region-name="main-content" margin="2cm" />
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4-portrait">
        <fo:flow flow-name="main-content">
          <fo:block font-family="Arial">Hello, <xsl:value-of select="data/name" />! This is some sample text.</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
```
but I am already getting a couple of accessibility errors.
One is "Text object not tagged". The text object referenced is the fo:block containing the "Hello ..." text. I googled around for quite some time, but I found no helpful explanation, description, or guidance on what has to be done in such an .fo/.xslt file to create a tagged element in the resulting PDF.
Does some kind soul have an idea or a good link or description on that?
Later addition:
To force FOP to locate the required fonts, one has to add snippets like this to the `conf\fop.xconf` file:
```xml
<fonts>
  ...
  <font kerning="yes" embed-url="file:///c:/Windows/Fonts/arial.ttf" embedding-mode="subset">
    <font-triplet name="Arial" style="normal" weight="normal"/>
  </font>
  ...
</fonts>
```
Note that these URLs, when pointing to local files, have to use the `file:` protocol and need to contain a drive specifier (here: `c:`). They didn't work for me without these.
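For orientation, here is a sketch of where such a `<fonts>` element usually sits in a minimal fop.xconf, namely inside the PDF renderer configuration. The second (bold) entry and the `arialbd.ttf` file name are assumptions for a typical Windows installation; they are only needed if the stylesheet uses bold Arial text.

```xml
<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <!-- font settings for PDF output live inside the PDF renderer element -->
    <renderer mime="application/pdf">
      <fonts>
        <!-- regular Arial, embedded as a subset, as in the snippet above -->
        <font kerning="yes" embed-url="file:///c:/Windows/Fonts/arial.ttf" embedding-mode="subset">
          <font-triplet name="Arial" style="normal" weight="normal"/>
        </font>
        <!-- assumption: the bold face is a separate file with its own triplet -->
        <font kerning="yes" embed-url="file:///c:/Windows/Fonts/arialbd.ttf" embedding-mode="subset">
          <font-triplet name="Arial" style="normal" weight="bold"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```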
In order to add accessibility information to the resulting PDF, you can use the `role` attribute on the FO elements to specify the structural/semantic function of the contained text.
From FOP's Accessibility page:
The PDF Reference defines a set of standard Structure Types to tag content. For example, ‘P’ is used for identifying paragraphs, ‘H1’ to ‘H6’ for headers, ‘L’ for lists, ‘Div’ for block-level groups of elements, etc. This standard set is aimed at improving interoperability between applications producing or consuming PDF.
FOP provides a default mapping of Formatting Objects to elements from that standard set. For example, fo:page-sequence is mapped to ‘Part’, fo:block is mapped to ‘P’, fo:list-block to ‘L’, etc.
You may want to customize that mapping to improve the accuracy of the tagging or deal with particular FO constructs. For example, you may want to make use of the ‘H1’ to ‘H6’ tags to make the hierarchical structure of the document appear in the PDF. This is achieved by using the role XSL-FO property:
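As a minimal sketch (my own illustration, not copied from the FOP documentation), the question's template could tag a heading and a paragraph explicitly. The `data/title` element is hypothetical; replace it with whatever the input XML actually contains.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <fo:root font-family="Arial">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="A4-portrait"
            page-height="29.7cm" page-width="21.0cm">
          <fo:region-body region-name="main-content" margin="2cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="A4-portrait">
        <fo:flow flow-name="main-content">
          <!-- role="H1": tag this block as a level-1 heading instead of the default P -->
          <!-- data/title is a hypothetical input element -->
          <fo:block role="H1" font-size="18pt">
            <xsl:value-of select="data/title"/>
          </fo:block>
          <!-- role="P" matches FOP's default for fo:block; stated explicitly here -->
          <fo:block role="P">Hello, <xsl:value-of select="data/name"/>! This is some sample text.</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
```

With accessibility enabled in FOP, the structure tree of the resulting PDF should then contain an H1 element followed by a P element, which can be inspected in the tag tree shown by a PDF viewer or checker.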
The link to the PDF Reference seems to be broken, so here are some working links:
role
Note that, even if you don't use the `role` attribute, FOP should be using a default value (for example, `P` for `fo:block` elements). So, if the accessibility checker gives you a warning, maybe something else is wrong:
- accessibility may not be enabled (use the `-a` command line option, set `userAgent.setAccessibility(true)` in the Java code, or add `<accessibility>true</accessibility>` to the fop.xconf configuration file)
- PDF/A-1a mode may not be enabled (use the `-pdfprofile PDF/A-1a` command line option, the `userAgent.getRendererOptions().put("pdf-a-mode", "PDF/A-1a")` Java instruction, or the `<pdf-a-mode>PDF/A-1a</pdf-a-mode>` element in the configuration file)
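If you go the configuration-file route for both points above, a minimal fop.xconf sketch combining them might look like the following. It only uses the two elements named above; the placement of `<pdf-a-mode>` inside the PDF renderer element is my understanding of FOP's configuration layout, so double-check it against your FOP version.

```xml
<?xml version="1.0"?>
<fop version="1.0">
  <!-- produce a tagged PDF (same intent as the -a command line option) -->
  <accessibility>true</accessibility>
  <renderers>
    <renderer mime="application/pdf">
      <!-- request PDF/A-1a output (same intent as -pdfprofile PDF/A-1a) -->
      <pdf-a-mode>PDF/A-1a</pdf-a-mode>
      <!-- the <fonts> element shown in the question belongs here as well -->
    </renderer>
  </renderers>
</fop>
```

As far as I know, FOP refuses to produce a PDF/A document when a font is not embedded (this includes the base-14 fonts), which is another reason the Arial font configuration from the question is needed.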