Implementing Maps to support JNodes

Since I last wrote about ordered maps, the XPath 4.0 specifications have now introduced the idea of JNodes, which adds a new set of requirements to the specification.

I wrote about JNodes in my paper at Balisage 2025, and I won't repeat the introduction here. Suffice it to say that a JNode can wrap a specific entry in an XDM map, and it allows you (among other things) to:

Establish which of two entries in the same map comes first in document order (or equivalently, in map order);
Get the preceding or following siblings of a map entry (the entries that precede it or follow it in map order).

The prototype implementation of JNodes that I demonstrated at Balisage included support for these operations, but the implementation would be hopelessly inefficient in production. The current implementation of ordered maps on the Saxon 13 development branch uses the LinkedHashMap implementation from the VAVR library, and this doesn't have anything that would support either of these operations.

I'm therefore proposing to use a home-grown map implementation as described below.

First note that map:put and map:remove operations are rare, and it's OK to have a map implementation that allows the map to be constructed by a mutable builder class in one phase of operation, and thereafter to be read-only. If a map:put or map:remove operation is attempted, we can rebuild the map using a different (immutable | persistent | functional) data structure. So let's first consider a structure that doesn't support map:put and map:remove operations directly. I'm calling this a FixedMap.

A FixedMap is underpinned by three Java structures:

keys: An array of keys: type AtomicValue[].
values: An array of values: type GroundedValue[].
index: A map from atomic match keys to integers: type Map<AtomicMatchKey, Integer>>. Note that this map doesn't need to be ordered. The class AtomicMatchKey is a surrogate for an XDM atomic value whose Java equals() method respects the XDM equality semantics: for example, if the key is an xs:string then the AtomicMatchKey is an instance of Saxon's UnicodeString class.

The basic operations on XDM maps are simply and efficiently implemented as follows:

Get the value corresponding to a given key: set i = index.get(key); return if i == null then null else values[i].
Get all keys in map order: iterate over keys.
Get all values in map order: iterate over values.

The new operations required to support JNode navigation can also be efficiently implemented:

Determine which of two entries with keys X and Y is first in document order: compare index.get(X) with index.get(Y) (an integer comparison).
Get preceding or following siblings of an entry with key X: find index.get(X) and then iterate forwards or backwards in the keys and values arrays from this integer offset.

All very straightforward. Now, what about map:put and map:remove?

When one of these operations occurs, I propose to copy the FixedMap to an ExtensibleMap. This will have a very similar structure to the FixedMap, except that:

The arrays of keys and values are replaced by immutable lists of keys and values. I propose to implement these immutable lists using the existing Saxon class ZenoChain. This is described in a previous blog. This uses the same logic as the ZenoString which I introduced in a 2021 Balisage paper. It provides efficient support for getting an entry at a particular integer offset, and for appending individual entries at the end of the list: therefore for map:get and map:put.
The map from AtomicMatchKey to Integer is replaced by an immutable map. There are plenty of implementations available (we could revert to the Michael Froh implementation used in earlier Saxon releases), because there is no need for this structure to support ordering.

The design relies on the fact that adding or removing entries to the map does not change the integer offset of existing entries. That's fine for map:put, which always puts new entries at the end (in terms of map ordering). To support map:remove, I propose to keep a separate list holding the integer offsets of map entries that have been removed; a traversal of the map keys or arrays needs to check this list and skip an entry if it is present. When the number of removed entries exceeds some threshold (a rare event) we can rebuild the map from scratch.

The new design looks as if all the main operations perform in constant or logarithmic time, with traversal of the map in linear time; and it seems to be reasonably economical in terms of memory usage.

The design allows the use of an optimization whereby the index (from atomic match keys to integer offsets) is built lazily, the first time map:get is called. This is useful when processing JSON, because it's likely that many maps constructed during JSON parsing will either be copied unchanged to the transformation output, or will never be referenced at all, so the work needed to construct an index can be saved (it's still necessary to check for duplicate keys, but this can be done cheaply with a simple Bloom filter).

We currently have an optimized implementation of maps for cases where all the keys are known in advance to be instances of xs:string. This can potentially save a bit of space because in place of the StringValue representing each string-valued key, we can simply store the underlying UnicodeString. This saves holding the wrapper object and the type annotation, which can be significant in the case of very large maps.

One of the benefits of this design is that it reduces our dependencies on third-party code, which in turn simplifies porting Saxon across different platforms, and transpiling to C# for the .NET platform. In the current Saxon 13 code base, we're not only dependent on the VAVR library, but we're actually dependent on a fork of that library that uses a different implentation of linked hash maps, to avoid a performance bug.

Saxon diaries

Ordered Maps and JNodes