xml xml-parsing saxon apache-fop

"Content is not allowed in prolog" error yet nothing before XML declaration


I have already checked the existing questions about this error, and this issue does not seem to be the same thing.

They all seem to boil down to two things:

  1. There are one or more (possibly invisible) characters before the opening <?xml ...?> declaration.
  2. There is some byte sequence in the body that does not fit the encoding declared in the <?xml ...?> declaration.

As for #1, I checked my file with xxd; the result is shown here:

$ xxd sample.fo
00000000: 3c3f 786d 6c20 7665 7273 696f 6e3d 2231  <?xml version="1
00000010: 2e30 2220 656e 636f 6469 6e67 3d22 5554  .0" encoding="UT
00000020: 462d 3822 3f3e 5465 7374 206d 6174 6572  F-8"?>Test mater
00000030: 6961 6c20 6963 6f6e 7354 6869 7320 746f  ial iconsThis to
00000040: 7069 6320 7465 7374 7320 7468 6520 4d61  pic tests the Ma
00000050: 7465 7269 616c 2049 636f 6e73 2e52 4544  terial Icons.RED
00000060: 434f 4d20 4c61 626f 7261 746f 7269 6573  COM Laboratories
00000070: 2c20 496e 632e 0a20 2020 2020 2020 2020  , Inc..
00000080: 2020 2020 2020 2020 2020 2054 6865 7365             These
00000090: 2061 7265 2074 6865 2074 6573 7473 2066   are the tests f
000000a0: 6f72 2074 6865 204d 4920 4449 5441 3a0a  or the MI DITA:.
000000b0: 2020 2020 2020 2020 2020 2020 2020 2020
000000c0: 2020 2020 2020 2020 5465 7374 2074 6865          Test the
000000d0: 2022 6b65 7962 6f61 7264 5f61 7272 6f77   "keyboard_arrow
000000e0: 5f64 6f77 6e22 2069 636f 6e2e 5465 7374  _down" icon.Test
000000f0: 2074 6865 206d 6972 726f 722d 696d 6167   the mirror-imag
00000100: 6520 2272 6570 6c79 2220 6963 6f6e 2e54  e "reply" icon.T
00000110: 6865 2069 636f 6e73 2061 7265 2072 656e  he icons are ren
00000120: 6465 7265 6420 696e 2074 6865 204d 6174  dered in the Mat
00000130: 6572 6961 6c49 636f 6e73 2066 6f6e 742e  erialIcons font.
00000140: 0a09 0954 6573 7420 2331 3a43 6c69 636b  ...Test #1:Click
00000150: 2074 6865 203c 666f 3a69 6e6c 696e 6520   the <fo:inline
00000160: 786d 6c6e 733a 666f 3d22 6874 7470 3a2f  xmlns:fo="http:/
00000170: 2f77 7777 2e77 332e 6f72 672f 3139 3939  /www.w3.org/1999
00000180: 2f58 534c 2f46 6f72 6d61 7422 2066 6f6e  /XSL/Format" fon
00000190: 742d 7765 6967 6874 3d22 626f 6c64 2220  t-weight="bold"
000001a0: 6c69 6e65 2d68 6569 6768 743d 2231 3030  line-height="100
000001b0: 2522 3e3c 666f 3a69 6e6c 696e 6520 786d  %"><fo:inline xm
000001c0: 6c6e 733a 6178 663d 2268 7474 703a 2f2f  lns:axf="http://
000001d0: 7777 772e 616e 7465 6e6e 6168 6f75 7365  www.antennahouse
000001e0: 2e63 6f6d 2f6e 616d 6573 2f58 534c 2f45  .com/names/XSL/E
000001f0: 7874 656e 7369 6f6e 7322 2066 6f6e 742d  xtensions" font-
00000200: 6661 6d69 6c79 3d22 4d61 7465 7269 616c  family="Material
00000210: 4963 6f6e 7322 3eee 8c93 3c2f 666f 3a69  Icons">...</fo:i
00000220: 6e6c 696e 653e 3c2f 666f 3a69 6e6c 696e  nline></fo:inlin
00000230: 653e 2069 636f 6e2e 436c 6963 6b20 7468  e> icon.Click th
00000240: 6520 3c66 6f3a 696e 6c69 6e65 2078 6d6c  e <fo:inline xml
00000250: 6e73 3a66 6f3d 2268 7474 703a 2f2f 7777  ns:fo="http://ww
00000260: 772e 7733 2e6f 7267 2f31 3939 392f 5853  w.w3.org/1999/XS
00000270: 4c2f 466f 726d 6174 2220 666f 6e74 2d77  L/Format" font-w
00000280: 6569 6768 743d 2262 6f6c 6422 206c 696e  eight="bold" lin
00000290: 652d 6865 6967 6874 3d22 3130 3025 223e  e-height="100%">
000002a0: 3c66 6f3a 696e 6c69 6e65 2078 6d6c 6e73  <fo:inline xmlns
000002b0: 3a61 7866 3d22 6874 7470 3a2f 2f77 7777  :axf="http://www
000002c0: 2e61 6e74 656e 6e61 686f 7573 652e 636f  .antennahouse.co
000002d0: 6d2f 6e61 6d65 732f 5853 4c2f 4578 7465  m/names/XSL/Exte
000002e0: 6e73 696f 6e73 2220 666f 6e74 2d66 616d  nsions" font-fam
000002f0: 696c 793d 224d 6174 6572 6961 6c49 636f  ily="MaterialIco
00000300: 6e73 2220 6178 663a 7472 616e 7366 6f72  ns" axf:transfor
00000310: 6d3d 2273 6361 6c65 5828 2d31 2922 3eee  m="scaleX(-1)">.
00000320: 859e 3c2f 666f 3a69 6e6c 696e 653e 3c2f  ..</fo:inline></
00000330: 666f 3a69 6e6c 696e 653e 2069 636f 6e2e  fo:inline> icon.

As for #2, I checked with file:

$ file sample.fo
sample.fo: XML 1.0 document, UTF-8 Unicode text, with very long lines
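
Both checks can also be scripted rather than read off a hex dump. The following is a minimal Python sketch, not part of the original investigation; the function name `check_xml_bytes` and its result keys are made up for illustration. It reports any bytes that precede `<?xml`, whether a UTF-8 byte order mark is present, and the offset of the first byte that is not valid UTF-8.

```python
import codecs


def check_xml_bytes(data: bytes) -> dict:
    """Report the two usual suspects: leading bytes and invalid UTF-8."""
    decl_at = data.find(b"<?xml")
    # Anything before the declaration (a BOM, whitespace, stray characters).
    leading = data[:decl_at] if decl_at > 0 else b""
    try:
        data.decode("utf-8", errors="strict")
        bad_offset = None
    except UnicodeDecodeError as exc:
        bad_offset = exc.start  # offset of the first undecodable byte
    return {
        "has_bom": data.startswith(codecs.BOM_UTF8),
        "leading_bytes": leading,
        "first_invalid_utf8_offset": bad_offset,
    }


if __name__ == "__main__":
    # Assumes the file from the question is in the current directory.
    with open("sample.fo", "rb") as fh:
        print(check_xml_bytes(fh.read()))
```

A clean result from this sketch only rules out these two causes; it does not show that the document is well-formed.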

The only unusual bytes I can think of are the two Material Icons codepoints, which are 3-byte UTF-8 characters and appear to be properly encoded, as verified with an online UTF-8 tool:

  1. Icon "keyboard_arrow_down" is codepoint e313 which is encoded ee 8c 93
  2. Icon "reply" is codepoint e15e which is encoded ee 85 9e
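
The same encodings can be confirmed locally. This is a small Python sketch (the helper name `utf8_hex` is illustrative, and `bytes.hex` with a separator needs Python 3.8 or later) that encodes each codepoint and prints the bytes in the same form as the xxd dump.

```python
def utf8_hex(codepoint: int) -> str:
    """Return the UTF-8 encoding of a codepoint as space-separated hex."""
    return chr(codepoint).encode("utf-8").hex(" ")


icons = {"keyboard_arrow_down": 0xE313, "reply": 0xE15E}
for name, cp in icons.items():
    print(f"{name}: U+{cp:04X} -> {utf8_hex(cp)}")
# keyboard_arrow_down: U+E313 -> ee 8c 93
# reply: U+E15E -> ee 85 9e
```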

As the xxd output indicates, my XML header seems valid:

<?xml version="1.0" encoding="UTF-8"?>

I also tried manually inserting a space after the encoding as suggested in one of the answers to the other questions:

<?xml version="1.0" encoding="UTF-8" ?>

which made no difference. So I am baffled as to the problem, especially given the error reported:

[Fatal Error] sample.fo:1:39: Content is not allowed in prolog.
Jul 24, 2018 9:56:34 AM org.apache.fop.cli.Main startFOP
SEVERE: Exception
org.apache.fop.apps.FOPException: org.xml.sax.SAXParseException; systemId: file:/tmp/sample.fo; lineNumber: 1; columnNumber: 39; Content is not allowed in prolog.
javax.xml.transform.TransformerException: org.xml.sax.SAXParseException; systemId: file:/tmp/sample.fo; lineNumber: 1; columnNumber: 39; Content is not allowed in prolog.
        at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:296)
        at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:116)
        at org.apache.fop.cli.Main.startFOP(Main.java:186)
        at org.apache.fop.cli.Main.main(Main.java:217)
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXParseException; systemId: file:/tmp/sample.fo; lineNumber: 1; columnNumber: 39; Content is not allowed in prolog.
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:502)
        at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:293)
        ... 3 more
Caused by: org.xml.sax.SAXParseException; systemId: file:/tmp/sample.fo; lineNumber: 1; columnNumber: 39; Content is not allowed in prolog.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:485)
        ... 4 more

Just for completeness, the FO file was generated by Saxon and the PDF conversion was attempted with fop-2.2:

$ fop -version
FOP Version 2.2
$ fop -fo sample.fo -pdf sample.pdf
[Fatal Error] sample.fo:1:39: Content is not allowed in prolog.
...

Solution

  • Elaborating on what @MartinHonnen has already helpfully commented...

    The error,

    Content is not allowed in prolog.

    arises because the XML prolog, which is everything before the root element in an XML document, contains textual content that is not allowed there. The offending content does not have to precede the XML declaration.

    Specifically, the prolog in XML is defined in the context of an XML document:

    [1] document      ::= prolog element Misc*
    

    Note that prolog precedes element, the single root element of the XML document.

    Most answers focus on the problem where there is text (visible or invisible) at the beginning of the prolog, before the XML declaration, but note that non-whitespace text cannot appear anywhere else in the prolog either, including after the XML declaration:

    [22] prolog      ::= XMLDecl? Misc* (doctypedecl Misc*)?
    [23] XMLDecl     ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    [24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
    [25] Eq          ::= S? '=' S?
    [26] VersionNum  ::= '1.' [0-9]+
    [27] Misc        ::= Comment | PI | S
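
The effect of the Misc production can be tried with any conforming parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, which is built on expat rather than Xerces, so its error wording differs from "Content is not allowed in prolog". The helper `is_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# Each sample places something different between XMLDecl and the root element.
samples = {
    "whitespace": DECL + "\n  <root/>",
    "comment": DECL + "<!-- note --><root/>",
    "processing instruction": DECL + "<?target data?><root/>",
    "text": DECL + "Test material icons<root/>",
}

for label, doc in samples.items():
    print(f"{label}: {'accepted' if is_well_formed(doc) else 'rejected'}")
```

Only the "text" sample should be rejected; the other three place only Misc items in that position.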
    

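    In your case, you have text content (Test material...) appearing between the XML declaration (XMLDecl) and the root element (element). The declaration is 38 characters long, so the reported position, line 1, column 39, is exactly the first character of that text. A comment, processing instruction, or whitespace can appear there, but not text.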
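
To locate this kind of stray text without reading a hex dump, something like the following Python sketch may help. It is a rough heuristic, not a parser: the function name `stray_prolog_text` is hypothetical, and it only inspects the gap between the XML declaration and the first `<` that follows it.

```python
import re

# Matches an XML declaration at the very start of the data.
XML_DECL = re.compile(rb"^<\?xml\s[^?]*\?>")


def stray_prolog_text(data: bytes):
    """Return (offset, snippet) for non-whitespace text found between the
    XML declaration and the first '<' after it, or None if the gap is clean.

    Simplification: text that follows a comment, PI, or DOCTYPE is not
    examined.
    """
    match = XML_DECL.match(data)
    start = match.end() if match else 0
    first_lt = data.find(b"<", start)
    end = first_lt if first_lt != -1 else len(data)
    gap = data[start:end]
    stripped = gap.lstrip(b" \t\r\n")
    if not stripped:
        return None
    offset = start + len(gap) - len(stripped)  # 0-based byte offset
    return offset, stripped[:20]


sample = b'<?xml version="1.0" encoding="UTF-8"?>Test material icons<fo:inline/>'
print(stray_prolog_text(sample))
# (38, b'Test material icons')
```

The 0-based offset 38 corresponds to the 1-based column 39 in the FOP error message.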