I'm using saxonche version 12.3.0 installed using pip in Python 3.11.5.
When using collection() in XSLT, document-uri() doesn't seem to be working.
I've created a small test:
Python
import saxonche

with saxonche.PySaxonProcessor(license=False) as proc:
    xsltproc = proc.new_xslt30_processor()
    executable = xsltproc.compile_stylesheet(stylesheet_file="test.xsl")
    # No template name is passed, so the xsl:initial-template is invoked.
    content = executable.call_template_returning_string()
    print(f"\ncontent:\n{content}")
XSLT 3.0 (test.xsl)
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" expand-text="yes">
  <xsl:output indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template name="xsl:initial-template">
    <xsl:for-each select="collection('input?select=*.(xml|XML)&amp;content-type=application/xml')">
      <xsl:message>Processing "{document-uri()}"...</xsl:message>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
Output
Processing ""...
Processing ""...
Processing ""...
content:
<doc>First test XML instance.</doc>
<doc>Second test XML instance.</doc>
<doc>Third test XML instance.</doc>
Expected Output
Processing "file:/C:/test_saxonche_collection/input/test_01.xml"...
Processing "file:/C:/test_saxonche_collection/input/test_02.xml"...
Processing "file:/C:/test_saxonche_collection/input/test_03.xml"...
content:
<doc>First test XML instance.</doc>
<doc>Second test XML instance.</doc>
<doc>Third test XML instance.</doc>
Is this a bug in Saxon?
This is not a bug in Saxon.
The issue is that, by default, the collection is not stable [1], and documents read using collection() are only added to the document pool if the collection is stable.
From the docs:
Documents read using the collection() function are added to the document pool (and therefore have a document-uri() property) if and only if the collection is stable.
I was able to resolve this two different ways.
The first was by adding the stable=yes query param to my collection() call (inside the stylesheet attribute, each & is written as &amp;):
collection('input?select=*.(xml|XML)&content-type=application/xml&stable=yes')
The second was by using base-uri() instead of document-uri():
<xsl:message>Processing "{base-uri()}"...</xsl:message>
I'm going to stick with using base-uri(), since there are performance concerns with making a collection stable (per the docs linked above).
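If the stylesheet text is generated from Python, the collection URI can be assembled and escaped with the standard library. This is only a small sketch: collection_uri is a hypothetical helper of my own, not part of saxonche, and it simply joins the query parameters without percent-encoding them so that the select pattern stays exactly as written above.

```python
from xml.sax.saxutils import escape


def collection_uri(directory: str, params: dict[str, str]) -> str:
    """Hypothetical helper: join query parameters onto a collection directory URI.

    The values are deliberately not percent-encoded, so a pattern such as
    '*.(xml|XML)' is passed through unchanged.
    """
    query = "&".join(f"{name}={value}" for name, value in params.items())
    return f"{directory}?{query}" if query else directory


uri = collection_uri(
    "input",
    {
        "select": "*.(xml|XML)",
        "content-type": "application/xml",
        "stable": "yes",  # the parameter discussed above
    },
)

# The XPath-level string, as it would appear outside of XML markup.
print(uri)

# The same string escaped for use inside the select="..." attribute
# of the stylesheet, where '&' has to be written as '&amp;'.
print(escape(uri))
```

The first print shows the URI as the XPath function sees it; the second shows the form that goes into the stylesheet source.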
Additional Info
1: For more information on what a "stable" collection is, take a look at the Saxon documentation. I've summarized a couple of key points here:
From Collection catalogs:
If stable="false" is specified, however, the URI is dereferenced directly, and the document is not added to the document pool, which means that a subsequent retrieval of the same document will not return the same node [emphasis added].
From Directories as collections:
Making a collection stable has the effect that the entire result of the collection() function is retained in a cache for the duration of the query or transformation, and any further calls on collection() with the same absolute URI return this saved collection retrieved from this cache.
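To make the two quotes above more concrete, here is a toy model in plain Python. It is not how Saxon is implemented; ToyProcessor, Doc, and their attributes are names I made up purely to illustrate the described behaviour: a non-stable collection dereferences the sources afresh each time and leaves document_uri empty, while a stable collection goes through a document pool and a collection cache.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Doc:
    """Toy stand-in for a document node; compared by identity, like nodes."""
    text: str
    document_uri: Optional[str] = None


class ToyProcessor:
    """Illustrative model only (hypothetical, not Saxon's actual design)."""

    def __init__(self, sources: dict[str, str]):
        self.sources = dict(sources)      # document URI -> raw content
        self.pool: dict[str, Doc] = {}    # the "document pool"
        self.collection_cache: dict[str, list[Doc]] = {}

    def collection(self, uri: str, stable: bool = False) -> list[Doc]:
        if not stable:
            # Non-stable: every call builds fresh nodes, nothing is pooled,
            # so the nodes have no document-uri property.
            return [Doc(text) for text in self.sources.values()]
        if uri not in self.collection_cache:
            # Stable: each document is added to the pool (and so gets a
            # document-uri), and the whole result is cached per collection URI.
            self.collection_cache[uri] = [
                self.pool.setdefault(doc_uri, Doc(text, doc_uri))
                for doc_uri, text in self.sources.items()
            ]
        return self.collection_cache[uri]


proc = ToyProcessor({"file:/input/test_01.xml": "<doc>First</doc>"})

first, second = proc.collection("input"), proc.collection("input")
print(first[0] is second[0], first[0].document_uri)

stable_1 = proc.collection("input", stable=True)
stable_2 = proc.collection("input", stable=True)
print(stable_1[0] is stable_2[0], stable_1[0].document_uri)
```

In this model the non-stable calls return distinct nodes without a URI, while the stable calls return the same pooled node with its URI, at the cost of keeping every document in memory for the whole run.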