I have a large sequence of maps: 15,000 rows of 85 columns, initially created from SQL:
<xsl:variable name="original-sequence" select="sql:prepared-query($connection, $sql)()"/>
It's pretty fast to create for its size, taking about 200ms. I then extend it with an additional key by looping through each map in the sequence and calling map:merge((., map{'newcol': $newcol}))
or map:put(., 'newcol', $newcol)
to return a new sequence of maps:
<xsl:variable name="new-sequence" as="map(*)*">
  <xsl:for-each select="$original-sequence">
    <xsl:variable name="newcol" select="()"/> <!-- placeholder instead of real data -->
    <xsl:sequence select="map:put(., 'newcol', $newcol)"/>
  </xsl:for-each>
</xsl:variable>
This is where the performance issue arises. Simply adding one key to the existing maps takes even longer than the initial sql:prepared-query
call: about 300ms. However, if I instead create 15,000 new maps with map{'newcol': $newcol}
, that is fast: 14ms. There seems to be something expensive about adding a key to existing maps that have many keys, even more expensive than creating the maps in the first place.
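For what it's worth, one workaround I've considered (an untested sketch, assuming rows are accessed positionally and the usual map namespace binding) is to exploit the fact that building small single-entry maps is fast: keep the new column as a separate, parallel sequence of small maps, and merge per row only at the point of use, so the merge cost is paid only for rows actually consumed:

```xml
<!-- Untested sketch: build the new column as a parallel sequence of
     single-entry maps (cheap to construct), instead of modifying the
     wide maps up front. -->
<xsl:variable name="newcols" as="map(*)*"
              select="for $row in $original-sequence
                      return map{'newcol': ()}"/>

<!-- At the point of use, combine row $i with its delta; map:merge's
     default duplicates handling ("use-first") keeps the first value
     if the key ever clashes. -->
<xsl:variable name="row" as="map(*)"
              select="map:merge(($original-sequence[$i], $newcols[$i]))"/>
```

But this only defers the question of why the up-front map:put is so slow.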
The docs for map:put
say "Saxon meets this requirement without the overhead of creating a complete copy of the map every time an entry is added.", and "In effect the cost of adding an entry is constant, independent of the size of the map.". But I am seeing more cost with larger maps. If I reduce the initial SQL map from 85 columns to 3 columns, then adding the key is fast.
The docs for map:put
also say "Performance may suffer if there is a need to check the types of all the keys and values against a required type (e.g. an as="map(xs:integer, xs:string)" attribute on xsl:variable).". But since my variable is map(*)*
, it seems that should not apply. Perhaps, though, the initial map sequence from SQL has something to do with this.
The docs for map:merge
talk about saxon:key-type
and saxon:final
as optimization hints, and mention internal reorganizations, which are costly. It may be that I'm triggering such a reorganization, but I'm not sure why.
So I'm requesting suggestions for how best to handle adding information to large maps. I'm using Saxon PE 11.6.
The reason for this effect is that the result of evaluating the sql:prepared-query
function is a map implemented as a DictionaryMap
, described in the Javadoc as:
A simple implementation of MapItem where the strings are keys, and modification is unlikely.
This is a calculated gamble: we use a data structure that is cheaper to create, and more expensive to modify. For your particular application, the gamble doesn't pay off.
We use the same structure when parsing JSON. Experience suggests that when users create maps by parsing JSON, they very rarely modify the result (in fact, they very rarely access the map by key: for most such maps, they are either discarded, or copied unchanged to the serialized output). However, we have no real way of objectively assessing whether our experience in such areas is truly representative of real workloads: as I say, we are gambling.
As it happens, over the last couple of weeks I've been reviewing the map implementations to be offered in Saxon 13 (see https://blog.saxonica.com/mike/2025/08/implementing-jnodes.html). One of the ideas I've been playing with is that when we create a structure like a DictionaryMap
with no intrinsic support for functional modification, we should implement the first few calls on map:put()
or map:remove()
not by copying the entire map to a different data structure, but rather by applying a delta. The other possibility, which we have used to good effect in other areas, is to apply a learning strategy, so if the map returned by one call on sql:prepared-query
gets modified, then the maps returned by future calls will be designed to be amenable to modification.