Ordered Maps and JNodes

By Michael Kay on August 19, 2025 at 09:00a.m.

Since I last wrote about ordered maps, the XPath 4.0 specifications have now introduced the idea of JNodes, which adds a new set of requirements to the specification.

I wrote about JNodes in my paper at Balisage 2025, and I won't repeat the introduction here. Suffice it to say that a JNode can wrap a specific entry in an XDM map, and it allows you (among other things) to:

  1. Establish which of two entries in the same map comes first in document order (or equivalently, in map order);
  2. Get the preceding or following siblings of a map entry (the entries that precede it or follow it in map order).

The prototype implementation of JNodes that I demonstrated at Balisage included support for these operations, but the implementation would be hopelessly inefficient in production. The current implementation of ordered maps on the Saxon 13 development branch uses the LinkedHashMap implementation from the VAVR library, and this doesn't have anything that would support either of these operations.

I'm therefore proposing to use a home-grown map implementation as described below.

First note that map:put and map:remove operations are rare, and it's OK to have a map implementation that allows the map to be constructed by a mutable builder class in one phase of operation, and thereafter to be read-only. If a map:put or map:remove operation is attempted, we can rebuild the map using a different (immutable | persistent | functional) data structure. So let's first consider a structure that doesn't support map:put and map:remove operations directly. I'm calling this a FixedMap.

A FixedMap is underpinned by three Java structures:

The basic operations on XDM maps are simply and efficiently implemented as follows:

The new operations required to support JNode navigation can also be efficiently implemented:

All very straightforward. Now, what about map:put and map:remove?

When one of these operations occurs, I propose to copy the FixedMap to an ExtensibleMap. This will have a very similar structure to the FixedMap, except that:

The design relies on the fact that adding or removing entries to the map does not change the integer offset of existing entries. That's fine for map:put, which always puts new entries at the end (in terms of map ordering). To support map:remove, I propose to keep a separate list holding the integer offsets of map entries that have been removed; a traversal of the map keys or arrays needs to check this list and skip an entry if it is present. When the number of removed entries exceeds some threshold (a rare event) we can rebuild the map from scratch.

The new design looks as if all the main operations perform in constant or logarithmic time, with traversal of the map in linear time; and it seems to be reasonably economical in terms of memory usage.

The design allows the use of an optimization whereby the index (from atomic match keys to integer offsets) is built lazily, the first time map:get is called. This is useful when processing JSON, because it's likely that many maps constructed during JSON parsing will either be copied unchanged to the transformation output, or will never be referenced at all, so the work needed to construct an index can be saved (it's still necessary to check for duplicate keys, but this can be done cheaply with a simple Bloom filter).

We currently have an optimized implementation of maps for cases where all the keys are known in advance to be instances of xs:string. This can potentially save a bit of space because in place of the StringValue representing each string-valued key, we can simply store the underlying UnicodeString. This saves holding the wrapper object and the type annotation, which can be significant in the case of very large maps.

One of the benefits of this design is that it reduces our dependencies on third-party code, which in turn simplifies porting Saxon across different platforms, and transpiling to C# for the .NET platform. In the current Saxon 13 code base, we're not only dependent on the VAVR library, but we're actually dependent on a fork of that library that uses a different implentation of linked hash maps, to avoid a performance bug.